2013-02-10 79 views

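Update column values with HTML::TreeBuilder

I have an HTML file with several tables (all tables have the same number of columns and the same column names). The tables are separated by other HTML tags.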

For every row of every table, I want to change the value of cell 1 and cell 3.

This is what I have so far (thanks to @depesz):

#!/usr/bin/env perl 
use strict; 
use warnings; 
use utf8; 
use open qw(:std :utf8); 

use HTML::TreeBuilder; 

my $input_file_name = shift; 

my $tree = HTML::TreeBuilder->new(); 
$tree->parse_file($input_file_name) or die "Cannot open or parse $input_file_name\n"; 
$tree->elementify(); 

my @tables = $tree->find_by_tag_name('table'); 
for my $table (@tables) {
    foreach my $row ($table->find_by_tag_name('tr')) {
        # search within the current row, not the whole table
        foreach my $column ($row->find_by_tag_name('td')) {
            # how do I change the text of first and 3rd column text to "removed"
        }
    }
}

print $tree->as_HTML(); 
exit; 

It works well for iterating over all the rows in the HTML file. I just don't know how to do the last bit: changing the text in columns 1 and 3.

Answers


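The HTML::TreeBuilder::XPath module allows more convenient access to the HTML nodes in a document.

Take a look at this program as an example. It seems to do what you need.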

use strict; 
use warnings; 

use HTML::TreeBuilder::XPath; 

my $tree = HTML::TreeBuilder::XPath->new_from_file('anon.html'); 

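for my $table ($tree->findnodes('//table')) {
    my $row = 0;
    # './/tr' is relative to the current table; a bare '//tr' would
    # search from the document root and match rows of every table
    for my $tr ($table->findnodes('.//tr')) {
        $row++;
        # select only the first and third <td> children of this row
        for my $td ($tr->findnodes('td[position() = 1 or position() = 3]')) {
            $td->delete_content;
            $td->push_content("name$row");
        }
    }
}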

print $tree->as_HTML('<>&', ' '); 
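If you would rather stay with plain HTML::TreeBuilder, as in the question, something along these lines should also work. This is only a sketch: the helper name `blank_cells` and the inline sample HTML are made up for illustration, and it assumes the cells you care about are `td` elements that are direct children of each `tr`.

```perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): replaces the content of the
# 1st and 3rd <td> of every table row with $replacement and returns the HTML.
sub blank_cells {
    my ($html, $replacement) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    for my $table ($tree->look_down(_tag => 'table')) {
        for my $row ($table->look_down(_tag => 'tr')) {
            # content_list returns child elements and plain text strings;
            # keep only the <td> elements directly under this row
            my @cells = grep { ref $_ && $_->tag eq 'td' } $row->content_list;

            for my $i (0, 2) {              # first and third cell
                my $cell = $cells[$i] or next;
                $cell->delete_content;
                $cell->push_content($replacement);
            }
        }
    }

    my $out = $tree->as_HTML('<>&', ' ');
    $tree->delete;                          # release the tree
    return $out;
}

# Sample input, invented for this example
print blank_cells(
    '<table><tr><td>a</td><td>b</td><td>c</td></tr></table>',
    'removed',
);
```

Picking the cells by index from `content_list` plays the same role as the `position()` test in the XPath version; rows with fewer than three cells are simply left with whatever cells they have.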

Works like a charm. Thanks! – smithy 2013-02-10 15:51:14