2017-04-14 97 views
6

Python NLTK download gives a parser error

I am trying to run the following commands:

import nltk 
nltk.download('all') 

But I get this error:

Traceback (most recent call last):
  File "./update.py", line 3, in <module>
    nltk.download('all')
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 534, in incr_download
    try: info = self._info_or_id(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 508, in _info_or_id
    return self.info(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 875, in info
    self._update_index()
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 825, in _update_index
    ElementTree.parse(compat.urlopen(self._url)).getroot())
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143

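I am new to Python, so I don't really know what I should do. I looked at the source module reported above and noticed that it is trying to download an XML file. So I ran the following command, and it did not give me any error.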

compat.urlopen('https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml') 

So I think the problem is not with the download but with the parser. Can someone suggest how to proceed from here?
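One way to confirm that the download succeeds and only the parsing fails is to run the two steps separately. The following is a minimal sketch using only the standard library rather than NLTK's own downloader code; the helper names `fetch_index` and `check_index` are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ElementTree

# URL taken from the question; NLTK's Downloader reads the same index file.
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def fetch_index(url=INDEX_URL):
    """Download the raw index bytes without trying to parse them."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def check_index(raw_bytes):
    """Return None if the XML parses, otherwise (line, column, message)."""
    try:
        ElementTree.fromstring(raw_bytes)
    except ElementTree.ParseError as err:
        # err.position is the (line, column) pair reported by the parser.
        line, column = err.position
        return line, column, str(err)
    return None

# Example use (needs network access):
#     print(check_index(fetch_index()))
```

If `fetch_index()` returns without raising but `check_index()` returns a tuple, the network side is fine and the file content itself is malformed.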

+0

I am getting the same problem here – Bart

+0

This problem started happening for me a few hours ago – silentser

Answers

1

The problem is in the XML returned by NLTK.

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143 

At 23:143 we can see the problem, a missing "=":

... unzip="1" unzipped_size"1917" url="https... 
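To see how a missing "=" produces this kind of error, here is a small sketch that parses a made-up one-line element with the same defect and marks the reported position with a caret. The `sample` string and the `show_parse_error` helper are hypothetical and are not taken from the real index.xml.

```python
import xml.etree.ElementTree as ElementTree


def show_parse_error(xml_text):
    """Return the parser message plus the offending line with a caret under it."""
    try:
        ElementTree.fromstring(xml_text)
    except ElementTree.ParseError as err:
        # position is (line, column); the line number is 1-based.
        line_no, column = err.position
        bad_line = xml_text.splitlines()[line_no - 1]
        return f"{err}\n{bad_line}\n{' ' * column}^"
    return "well-formed"


# Hypothetical element: unzipped_size has no "=" before its value.
sample = '<package id="demo" unzip="1" unzipped_size"1917" url="https://example.invalid/demo.zip"/>'
print(show_parse_error(sample))
```

Applied to the downloaded index text, the same helper would point at the attribute that is missing its "=".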

NLTK will surely fix this soon; until then, I don't know what the best workaround is.

6

index.xml had a typo. It has been patched. I just checked, and nltk.download('all') works fine!

See: nltk/nltk_data#70

+0

Yeah, it works fine now.. thanks – user3602300