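How do I get rid of the exceptions.TypeError error?
I'm writing a scraper with Scrapy. One of the things I want it to do is compare the root domain of the current page with the root domain of each link on it. If the domains differ, it should go on to extract the data. This is my current code: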
class MySpider(Spider):
    name = 'smm'
    allowed_domains = ['*']
    start_urls = ['http://en.wikipedia.org/wiki/Social_media']

    def parse(self, response):
        items = []
        for link in response.xpath("//a"):
            # Extract the root domain for the main website from the canonical URL
            hostname1 = link.xpath('/html/head/link[@rel=''canonical'']').extract()
            hostname1 = urlparse(hostname1).hostname
            # Extract the root domain for the link
            hostname2 = link.xpath('@href').extract()
            hostname2 = urlparse(hostname2).hostname
            # Compare if the root domain of the website and the root domain of the link are different.
            # If so, extract the items & build the dictionary
            if hostname1 != hostname2:
                item = SocialMediaItem()
                item['SourceTitle'] = link.xpath('/html/head/title').extract()
                item['TargetTitle'] = link.xpath('text()').extract()
                item['link'] = link.xpath('@href').extract()
                items.append(item)
        return items
However, when I run it, I get this error:
Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "C:\Anaconda\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "C:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 382, in callback
    self._startRunCallbacks(result)
  File "C:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "C:\Anaconda\lib\site-packages\twisted\internet\defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "E:\Usuarios\Daniel\GitHub\SocialMedia-Web-Scraper\socialmedia\socialmedia\spiders\SocialMedia.py", line 16, in parse
    hostname1 = urlparse(hostname1).hostname
  File "C:\Anaconda\lib\urlparse.py", line 143, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "C:\Anaconda\lib\urlparse.py", line 176, in urlsplit
    cached = _parse_cache.get(key, None)
exceptions.TypeError: unhashable type: 'list'
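Can anyone help me get rid of this error? I think it has something to do with list keys, but I don't know how to fix it. Thank you very much!
Dani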
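The traceback points at the urlparse(hostname1) call. In Scrapy, .extract() returns a list of strings rather than a single string, and urlparse uses its argument as part of a cache key (the _parse_cache.get(key, None) frame), which is why a list fails with "unhashable type: 'list'". Below is a minimal sketch of one possible fix, assuming only the first match matters. first_hostname is a hypothetical helper name, not part of Scrapy, and the import fallback covers the Python 2 urlparse module shown in the traceback.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, as in the traceback


def first_hostname(extracted):
    """Return the hostname of the first URL in a list of strings, or None.

    `extracted` stands in for what `.extract()` returns: a list that may
    be empty or hold several values. (Hypothetical helper, not Scrapy API.)
    """
    if not extracted:
        return None
    # Pass a single string to urlparse, never the whole list.
    return urlparse(extracted[0]).hostname


print(first_hostname(['http://en.wikipedia.org/wiki/Social_media']))
print(first_hostname([]))
```

In the spider this would be used as hostname2 = first_hostname(link.xpath('@href').extract()). Separately, Python reads '/html/head/link[@rel=''canonical'']' as three adjacent string literals, so the quotes around canonical are lost. Something like "/html/head/link[@rel='canonical']/@href" is probably closer to what was intended, since it selects the URL itself rather than the whole link element.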
Thanks Lawrence. I'll try it as soon as I can. – 2014-12-05 16:40:52
@DaniValverde: Did it help? – bosnjak 2014-12-09 10:44:41
Hi Lawrence, I haven't tried it yet, I'm abroad. I'll give it a try when I get back. – 2014-12-09 10:46:28
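One more thing to watch once the TypeError is gone: many href values on a page are relative (for example /wiki/Facebook), and urlparse gives those a hostname of None, so they would all look like they point to a different domain. Here is a rough sketch, assuming the standard library's urljoin is used to resolve each link against the page URL first. is_external is a hypothetical helper, and it compares full hostnames rather than true root domains, so en.wikipedia.org and de.wikipedia.org would count as different.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2


def is_external(page_url, href):
    """Return True if href, resolved against page_url, is on another host.

    Hypothetical helper for illustration; compares hostnames only.
    """
    # Relative links and fragments inherit the page's host after joining.
    absolute = urljoin(page_url, href)
    return urlparse(absolute).hostname != urlparse(page_url).hostname


page = 'http://en.wikipedia.org/wiki/Social_media'
for href in ['/wiki/Facebook', 'http://www.example.com/', '#History']:
    print(href, is_external(page, href))
```

In the spider, page_url would be response.url and href the first value extracted from @href.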