2012-07-14 38 views

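Hacker News: how to extract the comment hierarchy

I am trying to parse the comment threads on the forum news.ycombinator.com. However, after looking at the HTML, there seems to be no hierarchy for nesting the comments, which would make parsing really hard. For example, here is a parent comment and its child: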

<!-- This part below draws the upvote/downvote images --> 
<table border=0><tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td><td valign=top><center><a id=up_4241971 href="vote?for=4241971&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4241971></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; "> 


<!-- This part below is user/time and permalink info for a parent comment --> 
<span class="comhead"><a href="user?id=JshWright">JshWright</a> 7 hours ago | <a href="item?id=4241971">link</a></span></div><br> 


<!-- This part below is actual Comment --> 
<span class="comment"><font color=#000000>I just got my Verizon Galaxy S3, and ordered the 20-pack of NFC tags offered by <a href="http://tagsfordroid.com" rel="nofollow">http://tagsfordroid.com</a><p>I think I know what my Dad felt like when he got his first label printer... Within days it seemed like every object in his office was labeled...<p>I've got a tag in my car to automatically send my wife a "Headed home" SMS, a tag on my night stand to toggle between 'night' (silent) and 'day' (loud) volume settings, a tag by my back door to launch CardioTrainer when I go out for a run (this one may have crossed the "I've run out of ideas" line...). I'm using the keychain tag to dial a response number for the fire department I'm a member of.</font></span><p><font size=1><u><a href="reply?id=4241971&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr> 


<!-- This part below is upvote/downvote arrow for child of parent --> 
<tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td><td valign=top><center><a id=up_4242025 href="vote?for=4242025&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4242025></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; "> 

<!-- This part has user/time/permalink for child comment --> 
<span class="comhead"><a href="user?id=msbmsb">msbmsb</a> 7 hours ago | <a href="item?id=4242025">link</a></span></div><br> 

<!-- This part is the content of the child comment --> 
<span class="comment"><font color=#000000>I did the same thing. Tag next to the entry-way light switch for changing to an "at-home" profile, tag next to the bed for switching between night mode and morning mode, tag at work, keychain tag for switching between car mode and quiet mode.<p>And profile switching is just the basics. You can have a tag that connects guests' NFC-enabled phones to your wifi without having to hand out the password, for instance.<p>NFC task launcher + tasker is an amazing combination that opens up all kinds of possibilities.</font></span><p><font size=1><u><a href="reply?id=4242025&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr><tr><td> 

So how does Hacker News store the hierarchical structure of its comments, and how can I replicate it when I scrape their data?

Answers

2

In the table, the indentation is done with an image tag:

...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td>... 
...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td>... 

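Presumably you will be reading and parsing these anyway. A comment is a child of the nearest preceding comment with a smaller width, so by keeping an internal stack of the width values you can reconstruct the actual thread structure.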
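The stack idea could be sketched roughly as follows in Python, using only the standard library. This is a minimal sketch under assumptions, not a tested scraper: it assumes every comment row contains the `s.gif` spacer image followed by a vote link whose id starts with `up_` (as in the question's snippet), and the names `SpacerParser` and `build_tree` are made up for illustration.

```python
from html.parser import HTMLParser


class SpacerParser(HTMLParser):
    """Collect (comment_id, indent_width) pairs in document order.

    Assumption: each comment has an s.gif spacer <img> followed by an
    <a id=up_NNN> vote link, as in the markup shown in the question.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._pending_width = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and (attrs.get("src") or "").endswith("/s.gif"):
            # Remember the spacer width until the matching comment id shows up.
            self._pending_width = int(attrs.get("width") or 0)
        elif tag == "a" and (attrs.get("id") or "").startswith("up_"):
            if self._pending_width is not None:
                comment_id = attrs["id"][len("up_"):]
                self.rows.append((comment_id, self._pending_width))
                self._pending_width = None


def build_tree(rows):
    """Turn (comment_id, width) pairs into nested dicts using a width stack."""
    roots = []
    stack = []  # (width, node) for the current chain of open ancestors
    for comment_id, width in rows:
        node = {"id": comment_id, "children": []}
        # Drop everything that is not shallower than this comment.
        while stack and stack[-1][0] >= width:
            stack.pop()
        siblings = stack[-1][1]["children"] if stack else roots
        siblings.append(node)
        stack.append((width, node))
    return roots
```

Feeding the page source to `SpacerParser().feed(...)` and passing its `rows` to `build_tree` should give a list of top-level comments, each with a `children` list. Because only the relative order of the widths matters, the sketch does not need to assume a fixed number of pixels per nesting level.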


Wow! I missed that. Thanks a lot. – yayu 2012-07-14 05:24:32