Help with a Linux shell script using wget and sed

Hi, can someone help me set up a shell script that does the following?

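1. wget http://site.com/xap/wp7?p=1
2. Look at the HTML and extract every product name that appears in title="Free Shipping ProductName">... For example, from title="Free Shipping HD7-Case001">, HD7-Case001 is what gets extracted.
3. Write the output to products.txt.
4. Then repeat the process from step 1 in a loop. In the URL http://site.com/xap/wp7?p=1, the "1" is the page number, which goes up to 50: http://..wp7?p=1, http://..wp7?p=2, http://..wp7?p=3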
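A minimal sketch of steps 1 to 4 with wget and sed, assuming the titles appear exactly as title="Free Shipping NAME" in the page source and that `grep -o` is available (GNU and BSD grep both have it, though it is not strictly POSIX). The function names `extract_products` and `scrape_pages` are hypothetical, not from the question:

```shell
#!/bin/sh

# Read HTML on stdin and print one product name per line.
# grep -o emits each title="Free Shipping ..." match on its own line,
# so several products on the same HTML line are all caught; sed then
# strips the "title=\"Free Shipping " prefix and the closing quote.
extract_products() {
    grep -o 'title="Free Shipping [^"]*"' |
        sed -e 's/^title="Free Shipping //' -e 's/"$//'
}

# Hypothetical driver: fetch pages 1..50 of the URL from the question
# and collect every extracted name in products.txt.
scrape_pages() {
    page=1
    while [ "$page" -le 50 ]; do
        wget -q -O - "http://site.com/xap/wp7?p=$page" | extract_products
        page=$((page + 1))
    done > products.txt
}
```

Calling `scrape_pages` runs the whole job. The redirect sits after `done`, so all 50 pages end up in one file instead of each page overwriting the previous one.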

I've done some research of my own; this is as far as I've gotten writing the code myself... it definitely still needs a lot of work:

#! /bin/sh 
... 

while read page; do 
wget -q -O- "http://site.com/xap/wp7?p=$page" | 
sed ... 

done < "products.txt"
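The skeleton above reads its page numbers from products.txt, which is the file the results are supposed to go to. A filled-in version of the same `while read page` shape could instead take the numbers from `seq 1 50` and do the extraction with sed alone. This sketch assumes at most one matching title per HTML line (the greedy `.*` keeps only the last match on a line) and that `seq` is installed; `extract_with_sed` and `fetch_all` are hypothetical names:

```shell
#!/bin/sh

# Print the product name from every line containing title="Free Shipping NAME".
# -n suppresses sed's normal output; the p flag prints only lines where the
# substitution succeeded, which by then hold just the captured \(...\) group.
extract_with_sed() {
    sed -n 's/.*title="Free Shipping \([^"]*\)".*/\1/p'
}

# Same structure as the question's loop, but page numbers come from seq
# and products.txt is written to rather than read from.
fetch_all() {
    seq 1 50 | while read -r page; do
        wget -q -O - "http://site.com/xap/wp7?p=$page" | extract_with_sed
    done > products.txt
}
```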


2011-01-28 acctman

http://xmlstar.sourceforge.net/ – 2011-01-28 07:47:31

Is there a specific reason you need to use wget and sed to solve this? – 2011-01-28 07:55:06

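#!/bin/bash

for page in {1..50}
do
    # download page N to stdout, then reduce it to one product name per line
    wget -q "http://site.com/xap/wp7?p=$page" -O - \
    | tr '"' '\n' | grep "^Free Shipping " | cut -d ' ' -f 3
done > products.txt   # redirect once, after the loop, so earlier pages are not overwritten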

The tr turns every double quote into a newline, so the output of tr will look like this:

<html> 
... 
... <tag title= 
Free Shipping [Product] 
> ...

Basically, this is a way of putting each product on a line of its own.

Next, grep throws away every line except those starting with "Free Shipping ", so its output should look like this:

Free Shipping [Product1] 
Free Shipping [Product2] 
...

Next, cut extracts the third "column" (fields separated by spaces), so the output should be:

[Product1] 
[Product2] 
...
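To see each stage of that pipeline on its own, it can be run on a sample line instead of a downloaded page. The sample HTML below is made up for illustration. It also shows one caveat: `cut -f 3` keeps only the third word, so a product name containing spaces gets truncated, whereas `-f 3-` keeps everything from the third field to the end of the line:

```shell
#!/bin/sh

# Made-up HTML standing in for one downloaded page.
sample='<a title="Free Shipping HD7-Case001">A</a><a title="Free Shipping HD7 Slim Case">B</a>'

# Stage 1: every double quote becomes a newline, so each title value
# ends up on its own line.
stage1=$(printf '%s\n' "$sample" | tr '"' '\n')

# Stage 2: keep only the lines that start with "Free Shipping ".
stage2=$(printf '%s\n' "$stage1" | grep '^Free Shipping ')

# Stage 3a: third space-separated field only (what the answer uses).
names_f3=$(printf '%s\n' "$stage2" | cut -d ' ' -f 3)

# Stage 3b: field 3 through the end of the line, so multi-word names survive.
names_f3_on=$(printf '%s\n' "$stage2" | cut -d ' ' -f 3-)

printf '%s\n' "$names_f3"
printf '%s\n' "$names_f3_on"
```

With this sample, `-f 3` yields HD7-Case001 and HD7, while `-f 3-` yields HD7-Case001 and HD7 Slim Case.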


2011-01-28 08:59:00

You could combine it with PHP to do the XML parsing.

The bash script with wget:

#!/bin/bash

for page in {1..50} 
do 
    wget -q -O /tmp/$page.xml "http://site.com/xap/wp7?p=$page" 
    php -q xml.php $page >> products.txt 
done

xml.php

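<?php
$file = '/tmp/' . $argv[1] . '.xml';
// assuming the following format:
// <Products><Product title="Free Shipping ProductName"/></Products>

$xml = simplexml_load_file($file);
if ($xml === false) {
    exit(1); // the downloaded page is not well-formed XML
}
// print every product title, minus the "Free Shipping " prefix, one per line
foreach ($xml->Product as $product) {
    echo str_replace('Free Shipping ', '', (string) $product['title']), "\n";
}
/* adjust the replacement as needed; the key part is reaching the right node attribute */
?>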

Not a great idea, but PHP's SimpleXML offers some easy ways to parse XML.
Hopefully this can kick-start some ideas.


2011-01-28 08:41:16 ajreal
