2014-10-04
from bs4 import BeautifulSoup 
import requests 

a = 0 
while a == 0: 
    word = input("What word do you want to know? ") 

    url = "http://dictionary.cambridge.org/dictionary/british/" + word.lower() 
    r = requests.get(url) 

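    # Name a parser explicitly; otherwise BeautifulSoup picks one and warns.
    soup = BeautifulSoup(r.content, "html.parser")

    word = soup.find("span", {"class": "pos"})
    definition = soup.find("span", {"class": "def"})

    # Iterating over a Tag yields its children (strings and nested tags),
    # so nested tags are printed with their markup. The first loop variable
    # is no longer `a`, which would overwrite the flag checked by `while a == 0`.
    for item in word:
        print(item)
    for b in definition:
        print(b)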

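Python Web Scraping - Trying to Extract Text

I'm trying to make a basic dictionary program as a beginner in Python web scraping. The problem is that I'm trying to extract the definition of a word, but I can't figure out how to remove the tags and get the definition into a readable form.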

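Above is the code I've written so far. Printing b just prints a whole bunch of tags that contain the text I'm looking for, but the text isn't displayed in a readable way. Any tips would be greatly appreciated.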

P.S. I'm new to this site and to programming, so please be kind.
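To see why the loops above print markup, here is a small sketch that runs offline. The HTML snippet is hypothetical, loosely modelled on a dictionary entry; the real Cambridge page structure is an assumption, not copied from the site. It compares iterating over a Tag with reading its .text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a definition span containing a nested link.
html = (
    '<span class="def">a very large <a class="query" href="#">snake</a> '
    'that kills animals for food</span>'
)
soup = BeautifulSoup(html, "html.parser")
definition = soup.find("span", {"class": "def"})

# Iterating over a Tag walks its direct children: plain strings and nested
# tags alike, so a nested tag is printed together with its markup.
children = [str(child) for child in definition]
for child in children:
    print(child)

# .text (the same as .get_text()) concatenates all text inside the tag
# and drops the markup.
print(definition.text)
```

The second child prints as the full anchor tag, while .text yields only the readable sentence.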

Answer


You've already found the right tags. Now just get their .text:

word = soup.find("span", {"class": "pos"}).text 
definition = soup.find("span", {"class": "def"}).text 

print(word) 
print(definition) 

For the input "python", it prints:

noun 
a very large snake that kills animals for food by wrapping itself around them and crushing them 
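One caveat with the code above: soup.find() returns None when no matching span exists (for example, if the word isn't in the dictionary), and None has no .text, so the program would stop with an AttributeError. A minimal sketch of a guard follows; span_text is a hypothetical helper name, and the "not found" page is made-up markup.

```python
from bs4 import BeautifulSoup


def span_text(soup, css_class):
    """Hypothetical helper: return the text of the first <span> with the
    given class, or None when the page has no such span."""
    tag = soup.find("span", {"class": css_class})
    # find() returns None when nothing matches; calling .text on None
    # would raise AttributeError, so check first.
    if tag is None:
        return None
    return tag.text


# Hypothetical page for a word the dictionary does not know.
soup = BeautifulSoup("<p>No results found.</p>", "html.parser")
definition = span_text(soup, "def")
if definition is None:
    print("No definition found.")
else:
    print(definition)
```

With a real page, the soup would come from BeautifulSoup(r.content, "html.parser") as in the question.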

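I tried that, but got the following error: result = definition.find_all("a", {"class": "query"}).text AttributeError: 'ResultSet' object has no attribute 'text'. After that, I tried adding '.text' to the end of print(definition). That fixed the problem, but now every line goes on a new line. How can I make it a single sentence? – Lawase 2014-10-05 20:28:12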
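A sketch of both problems in this comment, again on hypothetical markup. find_all() returns a ResultSet, which is a list of tags, so .text must be taken from each element rather than from the list. For the line breaks, one possible cause is that the page's own text contains newlines; collapsing all whitespace into single spaces gives one sentence. Another possible cause is keeping the question's "for b in definition:" loop after definition is already a string: iterating over a string yields one character at a time, so each character lands on its own line, and printing the string directly avoids that.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the definition is split across lines and contains
# a nested link, as a real dictionary page might.
html = """
<span class="def">a very large
  <a class="query" href="#">snake</a>
  that kills animals for food</span>
"""
soup = BeautifulSoup(html, "html.parser")
definition = soup.find("span", {"class": "def"})

# find_all() returns a ResultSet (a list of tags), which has no .text;
# take .text from each element instead.
links = [a.text for a in definition.find_all("a", {"class": "query"})]

# The raw text keeps the page's line breaks. Splitting on any whitespace
# and re-joining with single spaces turns it into one sentence.
sentence = " ".join(definition.text.split())
print(links)
print(sentence)
```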


@Lawase Why are you using find_all? Use exactly the same code I posted. Thanks. – alecxe 2014-10-05 20:34:03


Great, thanks a lot! – Lawase 2014-10-05 21:51:06