Scraping JavaScript from a website

-4

I am trying to scrape https://www.mmoga.co.uk/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/ using Python 2. The data I need is represented in the page as follows:

<a href="/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15,70000-FIFA-15-Xbox-One-Ultimate-Team-Coins/" class="smallBoldText" style="text-decoration:none;" title="70.000 FIFA 15 Xbox One Ultimate Team Coins">

I need to retrieve the contents of the title attribute. I tried the code below, but it does not seem to work:

i=0 
while i< len(titles): 
    htmltext = urllib.urlopen("https://www.mmoga.com/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/") 
    data = json.load(htmltext) 
    mmogaamount.append(data["title"]) 
    print mmogaamount 
    i+=1

Source

2015-02-17 andy

What does this have to do with JavaScript? – MattDMo 2015-02-17 23:23:33

json.load only loads JSON data, and you are dealing with HTML here. Either learn to use regular expressions, or look into "BeautifulSoup", a Python library for scraping HTML. – sol 2015-02-17 23:22:50
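As a rough illustration of the regular-expression route mentioned in the comment above, here is a minimal sketch that runs on the anchor tag quoted in the question rather than on the live page. The names `TITLE_RE` and `extract_titles` are made up for this example, and the pattern assumes that `class` appears before `title` inside the tag, as it does in the quoted markup. Regular expressions are brittle against real-world HTML, so treat this as a quick hack and not a general solution.

```python
import re

# Markup copied from the question; in practice this would be the page's HTML text.
html = (
    '<a href="/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15,'
    '70000-FIFA-15-Xbox-One-Ultimate-Team-Coins/" '
    'class="smallBoldText" style="text-decoration:none;" '
    'title="70.000 FIFA 15 Xbox One Ultimate Team Coins">'
)

# Match <a ...> tags carrying class="smallBoldText" and capture the title value.
# Assumption: the class attribute precedes the title attribute, as in the sample.
TITLE_RE = re.compile(r'<a\b[^>]*\bclass="smallBoldText"[^>]*\btitle="([^"]*)"')


def extract_titles(markup):
    """Return the title attribute of every matching anchor tag (hypothetical helper)."""
    return TITLE_RE.findall(markup)


titles = extract_titles(html)
print(titles)
```

If the attribute order or quoting on the real page differs, the pattern will silently return an empty list, which is one reason an HTML parser is usually the safer choice.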

This will get you started:

import requests 
from bs4 import BeautifulSoup 


# get html 
content = requests.get("https://www.mmoga.co.uk/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/").content 
# pass html to beautifulSoup 
soup = BeautifulSoup(content, "html.parser")
# find tr tag we want based on the class 
tr = soup.body.find("tr",attrs={"class":"row1"}) 
# extract the titles from the "smallBoldText" class 
print([x["title"] for x in tr.find_all(attrs={"class":"smallBoldText"}) if x.has_attr("title")]) 
['70.000 FIFA 15 Xbox One Ultimate Team Coins']

I suggest checking out the bs4 docs; there are several tutorials that are very easy to follow.
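If installing bs4 is not an option, roughly the same extraction can be sketched with the standard library's `html.parser` module. This is only an illustration under stated assumptions: the `TitleCollector` class is a hypothetical helper, and `sample` is a simplified stand-in built from the anchor tag in the question and the `row1` row mentioned above, not the real page. Unlike the bs4 snippet, it does not restrict the search to the `row1` row.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect title attributes from tags whose class list includes a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values can be None for valueless attributes, hence the "or".
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("title"):
            self.titles.append(attr_map["title"])


# Simplified stand-in markup (assumption), not the live page.
sample = (
    '<table><tr class="row1"><td>'
    '<a href="/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15,'
    '70000-FIFA-15-Xbox-One-Ultimate-Team-Coins/" '
    'class="smallBoldText" style="text-decoration:none;" '
    'title="70.000 FIFA 15 Xbox One Ultimate Team Coins"></a>'
    '</td></tr></table>'
)

collector = TitleCollector("smallBoldText")
collector.feed(sample)
print(collector.titles)
```

For real pages bs4 is usually more forgiving of malformed markup, so this variant is best seen as a dependency-free fallback.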

Source

2015-02-17 23:27:36
