I want to use grep to extract parts of some HTML. The HTML I'm working with looks like this:
<a class="title may-blank" data-event-action="title" href="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/" tabindex="1" data-href-url="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/" data-inbound-url="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/?utm_content=title&utm_medium=hot&utm_source=reddit&utm_name=frontpage" rel="">We can play singleplayer games OFF THE INTERNET? Are they seriously that out of touch to advertise this?</a>
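There are many lines like this one.
From each, I only want the value between the quotes in href="http://xxxxxxxx" and the text that follows rel="">, shown here as yyyyyyyyyy; everything else is unnecessary.
The output should look like this, with each block on its own line: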
<a href="http://xxxxxxxx" rel="">yyyyyyyyyy</a>
Any ideas on how I could go about doing this?
It looks like a reddit link, so you might also want to look at the [reddit API](https://www.reddit.com/dev/api/) instead of parsing the HTML by hand. – user3151902
See https://stackoverflow.com/a/1732454/1682509 – Reeno
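
As the comments suggest, a real HTML parser is likely to be more robust here than grep and regular expressions. Below is a minimal sketch, assuming Python's standard-library `html.parser` is an acceptable substitute for grep. The names `AnchorExtractor` and `simplify_links` and the `sample` string are hypothetical, made up for illustration. The sketch keeps only the `href` value and the link text of each `<a>` element and rebuilds the tag in the requested form.

```python
from html import escape
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects (href, text) pairs for every <a> element that has an href.

    Hypothetical helper class, not part of any library.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def simplify_links(html_text):
    """Return one '<a href="..." rel="">text</a>' string per link found."""
    parser = AnchorExtractor()
    parser.feed(html_text)
    parser.close()
    # Re-escape the values so the rebuilt tags are still valid HTML.
    return [
        '<a href="{}" rel="">{}</a>'.format(
            escape(href, quote=True), escape(text, quote=False)
        )
        for href, text in parser.links
    ]


# Assumed sample input, shaped like the placeholder in the question.
sample = '<a class="title" href="http://xxxxxxxx" tabindex="1" rel="">yyyyyyyyyy</a>'
print("\n".join(simplify_links(sample)))
# prints: <a href="http://xxxxxxxx" rel="">yyyyyyyyyy</a>
```

To process a saved page, the file contents can be read into a string and passed to `simplify_links`, then the returned lines joined with newlines. Because the parser tracks tags rather than matching text patterns, the order of the attributes and the presence of extra `data-*` attributes should not matter.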