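Downloading XML results using LWP::UserAgent in Perl
I'm hoping for some help with a Perl problem.
I need to download an XML file of query results, parse the results, grab the "next" link from the XML file, and then download and repeat.
I've been able to download and parse the first result set fine.
I grab the next URL, but the results returned don't seem to change, i.e. the second time through the loop, $res->content
is the same as the first time. As a result, the value of $url
never changes after the first download.
I suspect this is a scoping problem, but I can't seem to get a handle on it.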
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use Data::Dumper;
use XML::LibXML;

my $url = "http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead&cc=bhlead&type=simple&rgn=Entire+Finding+Aid&q1=civil+war&Submit=Search;debug=xml";

while ($url ne "") {
    my $ua = LWP::UserAgent->new();
    $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
    $ua->timeout(30);
    # max-age is a Cache-Control directive, not a header of its own
    $ua->default_header('Pragma' => 'no-cache', 'Cache-Control' => 'max-age=0');

    print "Download URL:\n$url\n\n";
    my $res = $ua->get($url);
    if ($res->is_error) {
        print STDERR __LINE__, " Error: ", $res->status_line, "\n";
        exit;
    }

    my $parser = XML::LibXML->new();
    my $doc = $parser->load_xml(string => $res->content);

    # grab the url of the next result set ("" if there is none, which ends the loop)
    $url = $doc->findvalue('//ResultsLinks/SliceNavigationLinks/NextHitsLink');
    print "NEXT URL:\n$url\n\n";
}
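The XPath step can be checked on its own, without any network access. The sketch below is a hedged example: the XML strings are made-up, minimal documents that only mirror the element path used in the question (`ResultsLinks/SliceNavigationLinks/NextHitsLink`), the `example.org` URL is a placeholder, and `next_link` is a hypothetical helper name. It shows that `findvalue` yields the link text when the node exists and an empty string when it does not, which is what lets `while ($url ne "")` terminate.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, minimal stand-in for one page of results; only the
# element path matches the XPath used in the question.
my $page_with_next = <<'XML';
<Results>
  <ResultsLinks>
    <SliceNavigationLinks>
      <NextHitsLink>http://example.org/search?start=26;size=25</NextHitsLink>
    </SliceNavigationLinks>
  </ResultsLinks>
</Results>
XML

# Hypothetical last page: no NextHitsLink element at all.
my $last_page = '<Results><ResultsLinks><SliceNavigationLinks/></ResultsLinks></Results>';

# Return the text of the NextHitsLink element, or '' if there is none.
sub next_link {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    # findvalue may return an object that stringifies; force a plain string
    return "" . $doc->findvalue('//ResultsLinks/SliceNavigationLinks/NextHitsLink');
}

print next_link($page_with_next), "\n";
print "[", next_link($last_page), "]\n";
```

Because an empty result is simply `""`, no extra "is there a next page?" check is needed in the loop.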
What output do you get from the `print` lines? – cjm 2011-02-15 06:22:48
Download URL: http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead&cc=bhlead&type=simple&rgn=Entire+Finding+Aid&q1=civil+war&Submit=Search;debug=xml
Download URL: http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead;cc=bhlead;type=simple;rgn=Entire%20Finding%20Aid;q1=civil%20war;debug=xml;view=reslist;subview=short;sort=occur;start=26;size=25
NEXT URL: http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead;cc=bhlead;type=simple;rgn=Entire%20Finding%20Aid;q1=civil%20war;debug=xml;view=reslist;subview=short;sort=occur;start=26;size=25 – Matt 2011-02-15 14:17:49
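In the output above, the NEXT URL extracted from the second page is identical to the URL that was just downloaded (`start=26`), so the loop would keep requesting the same page. Whatever the server-side cause, a pagination loop can protect itself by remembering which URLs it has already fetched. The sketch below assumes hypothetical names: `crawl` and its `$fetch_next` callback are illustrative and not part of LWP; in the real script the callback would wrap `$ua->get` plus the `findvalue` lookup. The fake `%next` table simply reproduces the symptom seen in the output.

```perl
use strict;
use warnings;

# Follow "next" links starting at $start_url. $fetch_next is a code ref
# (hypothetical) that takes a URL and returns the next URL, or '' when
# there are no more pages. Stops early if a URL repeats.
sub crawl {
    my ($start_url, $fetch_next) = @_;
    my %seen;
    my @visited;
    my $url = $start_url;
    while ($url ne '') {
        # $seen{$url}++ is false the first time a URL is seen, true afterwards
        if ($seen{$url}++) {
            warn "Next link repeats an already-fetched URL, stopping: $url\n";
            last;
        }
        push @visited, $url;
        $url = $fetch_next->($url);
    }
    return @visited;
}

# Fake "server" reproducing the symptom: page 2 links back to itself.
my %next = (
    'page1' => 'page2',
    'page2' => 'page2',
);
my @pages = crawl('page1', sub { $next{ $_[0] } // '' });
print join(', ', @pages), "\n";   # page1, page2
```

The guard does not explain why the server returns the same link, but it turns an endless loop into a clear warning that points at the repeated URL.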