2012-04-21

I'm stuck with a big problem in Batch: how to filter this HTML text and set variables

I use wget to send a request to a website, then I receive an HTML page, and I need to filter this sample of the HTML:

...more code above...

     <div id="song_html" class="show1"> 
      <div class="left"> 
      <!-- info mp3 here --> 
       256 kbps<br />3:21<br />6.13 mb   </div> 
      <div id="right_song"> 
       <div style="font-size:15px;"><b>Marilyn Manson - Tainted Love (Manson Remix) mp3</b></div> 
       <div style="clear:both;"></div> 
       <div style="float:left;"> 
        <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> 
         <div style="float:left;"><a href="http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3" rel="nofollow" target="_blank" style="color:green;">Download</a></div> 
               <div style="margin-left:8px; float:left; width:27px; text-align:center;"><a href="javascript:void(0)" onclick="showPlayer_new(37119, '91da6888c92ccb4198dbc78cb30f311635751694', 'marilyn+manson', 'tainted+love')" rel="nofollow" id="lk37119" class="play_now">Play</a></div>      
                     <div style="margin-left:8px; float:left;"><a href="javascript:void(0)" onclick="showEmbed_new(37119, '91da6888c92ccb4198dbc78cb30f311635751694')" rel="nofollow" id="em37119" class="embed">Embed</a></div> 
               <div style="margin-left:8px; float:left;"><a href="http://www.ringtonematcher.com/go/?sid=WDLL&artist=marilyn+manson&song=tainted+love" rel="nofollow" target="_blank" style="color:red;" title="Send Marilyn Manson - Tainted Love Ringtone to your Cell">Descarga Tono</a></div> 
         <div style="clear:both;"></div> 
        </div> 
        <div id="player37119" style="float:left; margin-left:10px;" class="player"></div> 
       </div> 
       <div style="clear:both;"></div> 
      </div> 
      <div style="clear:both;"></div> 
     </div> 

     <div id="song_html" class="show2"> 
      <div class="left"> 
      <!-- info mp3 here --> 
          </div> 
      <div id="right_song"> 
       <div style="font-size:15px;"><b>Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3</b></div> 
       <div style="clear:both;"></div> 
       <div style="float:left;"> 
        <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> 
         <div style="float:left;"><a href="http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3" rel="nofollow" target="_blank" style="color:green;">Download</a></div> 
               <div style="margin-left:8px; float:left; width:27px; text-align:center;"><a href="javascript:void(0)" onclick="showPlayer_new(668416, 'ac5b8834fa26b892fc1436db4678aca9d8acfdb1', 'spaz+marilyn+manson+metric', 'grow+up+and+blow+the+great+big+dj%3a%2f%2fspaz%2c+marilyn+manson')" rel="nofollow" id="lk668416" class="play_now">Play</a></div>      
                     <div style="margin-left:8px; float:left;"><a href="javascript:void(0)" onclick="showEmbed_new(668416, 'ac5b8834fa26b892fc1436db4678aca9d8acfdb1')" rel="nofollow" id="em668416" class="embed">Embed</a></div> 
               <div style="margin-left:8px; float:left;"><a href="http://www.ringtonematcher.com/go/?sid=WDLL&artist=spaz+marilyn+manson+metric&song=grow+up+and+blow+the+great+big+dj%3a%2f%2fspaz%2c+marilyn+manson" rel="nofollow" target="_blank" style="color:red;" title="Send Spaz Marilyn Manson Metric - Grow Up And Blow The Great Big Dj://spaz, Marilyn Manson Ringtone to your Cell">Descarga Tono</a></div> 
         <div style="clear:both;"></div> 
        </div> 
        <div id="player668416" style="float:left; margin-left:10px;" class="player"></div> 
       </div> 
       <div style="clear:both;"></div> 
      </div> 
      <div style="clear:both;"></div> 
     </div> 

    <div id="morelink" style="margin:10px; text-align:center;"><a href="" rel="nofollow" onClick="toggle(); return false;">Show More Results</a></div> 


       <div id="song_html" class="show3"> 
      <div class="left"> 
      <!-- info mp3 here --> 
       3:10<br />   </div> 
      <div id="right_song"> 
       <div style="font-size:15px;"><b>Marilyn Manson - MARILYN MANSON - Rock is Dead mp3</b></div> 
       <div style="clear:both;"></div> 
       <div style="float:left;"> 
        <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> 
         <div style="float:left;"><a href="http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3" rel="nofollow" target="_blank" style="color:green;">Download</a></div> 
               <div style="margin-left:8px; float:left; width:27px; text-align:center;"><a href="javascript:void(0)" onclick="showPlayer_new(670124, '14a52b596082676bed6a9d860c383488a486e1dc', 'marilyn+manson', '-+rock+is+dead')" rel="nofollow" id="lk670124" class="play_now">Play</a></div>      
                     <div style="margin-left:8px; float:left;"><a href="javascript:void(0)" onclick="showEmbed_new(670124, '14a52b596082676bed6a9d860c383488a486e1dc')" rel="nofollow" id="em670124" class="embed">Embed</a></div> 
               <div style="margin-left:8px; float:left;"><a href="http://www.ringtonematcher.com/go/?sid=WDLL&artist=marilyn+manson&song=-+rock+is+dead" rel="nofollow" target="_blank" style="color:red;" title="Send Marilyn Manson - - Rock Is Dead Ringtone to your Cell">Descarga Tono</a></div> 
         <div style="clear:both;"></div> 
        </div> 
        <div id="player670124" style="float:left; margin-left:10px;" class="player"></div> 
       </div> 
       <div style="clear:both;"></div> 
      </div> 
      <div style="clear:both;"></div> 
     </div> 

</div> 
</div> 
<!-- ================= --> 

...more code below...

...and I want to set variables such as "Name", "Bitrate", "Size" and "Download", and print all of this information in Batch, like this:

1st result: 
[Name]  Marilyn Manson - Tainted Love (Manson Remix) mp3 
[Info]  Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb. 
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3 

2nd result: 
[Name]  Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3 
[Info]  NO INFO. 
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3 

3rd result: 
[Name]  Marilyn Manson - MARILYN MANSON - Rock is Dead mp3 
[Info]  Length: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3 
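
A minimal sketch of how the Name and Download fields could be pulled out with sed, written as POSIX shell functions. The names `extract_name` and `extract_download` are hypothetical, and the patterns assume (as in the sample) that each title sits between `<b>` and `</b>` and that each download URL is the `href` of an anchor whose text is exactly `Download`. Under cmd.exe, sed scripts generally need double quotes instead of single quotes.

```shell
# Hypothetical helpers; both read HTML lines on stdin.

# Print the text between <b> and </b> (the song name) for lines that contain it.
extract_name() {
  sed -n 's|.*<b>\(.*\)</b>.*|\1|p'
}

# Print the href of an <a ...>Download</a> link. The Play, Embed and
# "Descarga Tono" links are skipped because their anchor text differs.
extract_download() {
  sed -n 's|.*<a href="\([^"]*\)"[^>]*>Download</a>.*|\1|p'
}
```

For example, `extract_download < page.html` would print one URL per result for a page shaped like the sample.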

I've tried "Findstr", "Find", "SED", "GREP" and "FART", but I can't find a way (with one-liners and character delimiters) to do it right...

The only thing I can see that could make it possible is this line:

<!-- ================= --> 

I could use it as an END delimiter, because that line marks the end of the MP3 downloads and their info...
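
The END-delimiter idea can be sketched in awk: copy lines until the marker line and stop there. `results_section` is a hypothetical name, and the sketch assumes the marker appears as the literal text `<!-- ================= -->`; everything from the marker on is dropped.

```shell
# Hypothetical helper: print every line before the first end-of-results marker.
# Reads the files given as arguments, or stdin if there are none.
results_section() {
  awk '/<!-- ================= -->/ { exit } { print }' "$@"
}
```

Stopping at the marker means any later `<b>` or `Download` text elsewhere on the page cannot be mistaken for a result.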

Can someone help me?

Thank you

Answers


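The batch file below relies on the fact that the data you want sits a fixed number of lines below each "info mp3 here" line. The data is also extracted according to its position within the line. If some of the data does not follow these rules, the program will need to be modified.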
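
The fixed-offset assumption can be expressed compactly in awk. This is a sketch, not a translation of the batch file: the helper name `fixed_offset_fields` and the `info:`/`name:`/`download:` prefixes are made up for illustration, and the offsets (+1, +3 and +7 from each marker line) are the ones the batch file uses with the sample HTML.

```shell
# Hypothetical helper: for each line n containing "info mp3 here", print the raw
# lines n+1 (bitrate/length/size), n+3 (title) and n+7 (Download link).
fixed_offset_fields() {
  awk '
    /info mp3 here/   { n = NR }
    n && NR == n + 1  { print "info:" $0 }
    n && NR == n + 3  { print "name:" $0 }
    n && NR == n + 7  { print "download:" $0 }
  ' "$@"
}
```

As with the batch file, any change in the page layout shifts these offsets, so the helper would need to be adjusted along with it.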

@echo off
setlocal EnableDelayedExpansion
findstr /N /C:"info mp3 here" %1 > "%~N1.tmp"
set lastLine=-1
(for /F "usebackq delims=:" %%a in ("%~N1.tmp") do (
    set /A skip=%%a-lastLine
    for /L %%i in (1,1,!skip!) do set /P info=
    set /P =& set /P name=
    for /L %%i in (1,1,4) do set /P download=
    set "name=!name:*<b>=!"
    for /F "delims=<" %%n in ("!name!") do echo [Name]  %%n
    set "info=!info:<br />= !"
    set "info=!info:</div>=!"
    set bitrate=
    set length=
    set size=
    set value=
    for %%t in (!info!) do (
        if not defined value (
            set value=%%t
        ) else (
            if %%t equ kbps (
                set "bitrate=Bitrate: !value! kbps. "
                set value=
            ) else if %%t equ mb (
                set "size=Size: !value! mb."
                set value=
            ) else (
                set "length=Length: !value!. "
                set value=%%t
            )
        )
    )
    if defined value (
        set "length=Length: !value!. "
    )
    set info=!bitrate!!length!!size!
    if not defined info set info=NO INFO.
    echo [Info]  !info!
    set "download=!download:"=$!"
    for /F "tokens=4 delims=$" %%d in ("!download!") do echo [Download] %%d
    set /A lastLine=%%a+6
)) < %1
del "%~N1.tmp"

Output:

[Name]  Marilyn Manson - Tainted Love (Manson Remix) mp3
[Info]  Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3

[Name]  Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3
[Info]  NO INFO.
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3

[Name]  Marilyn Manson - MARILYN MANSON - Rock is Dead mp3
[Info]  Length: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3
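
The `for %%t` loop in the batch file decides what each value is by looking at the token that follows it: `kbps` marks a bitrate, `mb` marks a size, and any other token means the pending value was the length. Below is the same classification as a POSIX shell function. The name `format_info` is hypothetical, and the function assumes the info text contains no wildcard characters, since it relies on word splitting.

```shell
# Hypothetical helper: turn the raw line after "info mp3 here" into the
# "[Info]" text, mirroring the token classification in the batch file.
format_info() {
  # Replace "<br />" with a space and drop "</div>", as the batch file does.
  line=$(printf '%s\n' "$1" | sed -e 's|<br />| |g' -e 's|</div>||g')
  bitrate= length= size= value=
  for tok in $line; do              # unquoted on purpose: split into words
    if [ -z "$value" ]; then
      value=$tok                    # hold a value until its unit is seen
    elif [ "$tok" = kbps ]; then
      bitrate="Bitrate: $value kbps. "; value=
    elif [ "$tok" = mb ]; then
      size="Size: $value mb."; value=
    else
      length="Length: $value. "; value=$tok   # pending value had no unit
    fi
  done
  if [ -n "$value" ]; then
    length="Length: $value. "       # a final value without a unit
  fi
  info="$bitrate$length$size"
  info=${info% }                    # drop one trailing space, if any
  printf '%s\n' "${info:-NO INFO.}"
}
```

For the first sample record, `format_info '       256 kbps<br />3:21<br />6.13 mb   </div>'` prints `Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.`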

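Thank you! Can you tell me how many lines I need to "cut" from the beginning of the sample HTML I posted? If I try the sample HTML code, I get a failed output. Thanks again. I mean, which line needs to be the first line of the file? – ElektroStudios 2012-04-24 03:11:04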


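I think my program should work with any input file; the output I posted was produced from the sample file above. Type `findstr /N /C:"info mp3 here" thefile.html` and check that the numbers correspond to the "info mp3 here" lines. If there is still a problem, cut the file a few lines above the first "info mp3 here" line. Remember that, as stated above, my program will fail if some of the data is not formatted the way I assumed. – Aacini 2012-04-24 05:12:09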


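A few days ago I tried "yourcode.bat file.html" and, as I said, got a failed output, but I have just tried again and everything is OK. I don't know why... Maybe the first time I saved the file in UTF or Unicode format by mistake, and now it's in ANSI. Thanks again!!! – ElektroStudios 2012-04-25 11:59:55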
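
The encoding guess is plausible: a file saved as "Unicode" in Notepad is UTF-16 with a byte-order mark, while line-oriented tools such as `findstr` and `set /P` generally expect single-byte text. A quick check from a POSIX shell is sketched below. `has_utf16_bom` is a hypothetical name; it only looks for the FF FE / FE FF byte-order marks and cannot detect UTF-16 saved without one.

```shell
# Hypothetical helper: succeed if the file starts with a UTF-16 byte-order mark
# (FF FE for little-endian, FE FF for big-endian).
has_utf16_bom() {
  bom=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$bom" = fffe ] || [ "$bom" = feff ]
}
```

If it reports a BOM, something like `iconv -f UTF-16 -t UTF-8 page.html > page-converted.html` should give a single-byte file for an all-ASCII page like this one.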


Here is a script that will parse out the information you want.

The script takes the name of the HTML file as its argument.

The output is written to a file whose name is the input file name with '.parsed' appended.

The comments at the top of the script explain the patterns used to find the requested information in the HTML file.

Replace the two instances of "TAB" with a literal tab character, and make sure to keep the single space before each tab.
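
If typing a literal tab is awkward, POSIX sed bracket expressions offer `[[:blank:]]`, which matches a space or a tab. The sketch below rewrites the script's first address and leading-whitespace substitution that way. `info_lines` is a hypothetical name; it prints only lines whose first non-blank character is a digit, with the leading blanks removed, and does not add the `[Info]` prefix.

```shell
# Hypothetical helper: keep lines whose first non-blank character is a digit
# and strip the leading spaces/tabs, without a literal tab in the script.
info_lines() {
  sed -n 's/^[[:blank:]]*\([0-9]\)/\1/p'
}
```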

#!/bin/bash

# Parse HTML with sed, suppressing all unwanted lines
    # "Info" lines all start with a number (ignoring whitespace)
    # Bitrate and file size can be identified by looking for
    # the unit (kbps, mb) immediately following the numeric data
    # Length is identified by the colon in the middle of numeric data
    # File names are delimited by <b> and </b>
    # Lines with the URL all contain Download</a>
    # The </a> isn't necessary, but I thought it would be safer to
    # include it since one could imagine "Download" appearing in a file name
# Pipe output to Awk for reordering of the parsed lines
# and addition of "NO INFO" lines where necessary

sed -n '
/^[ TAB]*[0-9]/ {
    s/^[ TAB]*/[Info]  /
    s/\([0-9]*:[0-9]*\)[^0-9]*/Length: \1. /
    s/\([0-9\.]* .bps\)[^0-9L]*/Bitrate: \1. /
    s/\([0-9\.]* .b\)[^p][^0-9LB]*/Size: \1./
    p
}
/<b>/ {
    s|</b>.*||
    s|.*<b>\(.*\)|[Name]  \1|
    p
}
\|Download</a>| {
    s/^.*\(http:[^"]*\).*/[Download] \1/
    p
}' "$1" | awk 'BEGIN { no_info = "[Info]  NO INFO.";
        info = no_info }
      { if ($1 == "[Name]") name = $0;
       else if ($1 == "[Info]") info = $0;
       else {
        printf("%s\n%s\n%s\n\n", name, info, $0);
        info = no_info
       } }' > "$1.parsed"
exit 0

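Hi, I really appreciate your help... but this code is for Linux. I said I need to do it in Batch; the title says Batch, and I meant that I had already tried SED (and GREP) for Windows. Sorry if that confused you. I don't know how to "convert" the Linux commands into a Batch .bat file properly, but thank you. I'll study your code... bye – ElektroStudios 2012-04-22 02:58:44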


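Sorry. I completely missed the Batch part. Unfortunately, I don't speak Batch. Maybe someone will translate it. I would downvote myself, but apparently that is not an option. – 2012-04-22 03:02:46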


TXR 65 (runs on Windows; a MinGW-compiled .exe is available)

@(collect) 
<div id="song_html" class="show@num">
      <div class="left"> 
      <!-- info mp3 here --> 
@(gather :vars ((bitrate nil) (length nil) (size nil))) 
@bitrate kbps@(skip)
@(skip)@{length /\d+:\d\d/}@(skip) 
@(skip)@{size /\d+\.\d\d/} mb@(skip)
@(until) 
      <div id="right_song"> 
@(end) 
@(bind info @(if (or bitrate length size) 
       (let ((s (make-string-output-stream))) 
       (if bitrate 
        (format s "Bitrate: ~a kbps. " bitrate)) 
       (if length 
        (format s "Length: ~a. " length)) 
       (if size 
        (format s "Size: ~a mb. " size)) 
       (get-string-from-stream s)) 
       "NO INFO.")) 
      <div id="right_song"> 
       <div style="font-size:15px;"><b>@title</b></div> 
       <div style="clear:both;"></div> 
       <div style="float:left;"> 
        <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> 
         <div style="float:left;"><a href="@link" rel="nofollow" target="_blank" style="color:green;">Download</a></div> 
@(until) 
<!-- ================= --> 
@(end) 
@(output) 
@ (repeat) 
[Name]  @title 
[Info]  @info 
[Download] @link 
@ (end) 
@(end) 

Run:

$ txr data.txr data.html 
[Name]  Marilyn Manson - Tainted Love (Manson Remix) mp3 
[Info]  Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb. 
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3 
[Name]  Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3 
[Info]  NO INFO. 
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3 
[Name]  Marilyn Manson - MARILYN MANSON - Rock is Dead mp3 
[Info]  Length: 3:10. 
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3 

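Hi Kaz, thanks, I'll try your code, but I don't understand the important part about using it on Windows. I downloaded txr65.zip from your site, but I can't find any executable. Maybe you mean I need to install TXR on Linux and then compile it with... MinGW (on Linux?)? Can you help with my noob question? – ElektroStudios 2012-04-22 16:08:25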


No need for that; there are prebuilt binaries: http://www.nongnu.org/txr/#downloads – Kaz 2012-04-22 17:13:06