2017-09-15 96 views
1

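I am using nltk to run sentiment analysis for a project. I searched GH and found nothing similar for the sentiment_analyser or polarity_scores calls. The sentiment_analyser error is: 'bytes' object has no attribute 'encode'.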

I also looked at Python 3.4 - 'bytes' object has no attribute 'encode'. It is not a duplicate, because I am not calling bcrypt.gensalt().encode('utf-8'), although it does hint that the problem is some kind of wrong type.

Can anyone help resolve this error?

The error I get:

/lib/python3.5/site-packages/nltk/sentiment/vader.py in __init__(self, text)
    154     def __init__(self, text):
    155         if not isinstance(text, str):
--> 156             text = str(text.encode('utf-8'))
    157         self.text = text
    158         self.words_and_emoticons = self._words_and_emoticons()

AttributeError: 'bytes' object has no attribute 'encode'

The DataFrame, df_stocks.head(5):

  prices articles 
2007-01-01 12469 What Sticks from '06. Somalia Orders Islamist... 
2007-01-02 12472 Heart Health: Vitamin Does Not Prevent Death ... 
2007-01-03 12474 Google Answer to Filling Jobs Is an Algorithm... 
2007-01-04 12480 Helping Make the Shift From Combat to Commerc... 
2007-01-05 12398 Rise in Ethanol Raises Concerns About Corn as...     

The code is below; the error occurs on the last line:

import numpy as np 
import pandas as pd 
from nltk.classify import NaiveBayesClassifier 
from nltk.corpus import subjectivity 
from nltk.sentiment import SentimentAnalyzer 
from nltk.sentiment.util import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import unicodedata 
for date, row in df_stocks.T.iteritems(): 
    sentence = unicodedata.normalize('NFKD', df_stocks.loc[date, 'articles']).encode('ascii','ignore') 
    ss = sid.polarity_scores(sentence) 

Thanks

+0

Possible duplicate - https://stackoverflow.com/questions/38246412/python-3-4-bytes-object-has-no-attribute-encode –

+0

[Possible duplicate of Python 3.4 - 'bytes' object has no attribute 'encode'](https://stackoverflow.com/questions/38246412/python-3-4-bytes-object-has-no-attribute-encode) – eyllanesc

+0

It seems df_stocks.loc[date, 'articles'] is not a unicode str. What is df_stocks? – aircraft

Answers

1

According to the unicodedata.normalize() docs, the function converts a Unicode string to the requested normal form.

import unicodedata 

print(unicodedata.normalize('NFKD', u'abcdあäasc').encode('ascii', 'ignore')) 

This prints:

b'abcdaasc' 

So the problem is here: df_stocks.loc[date, 'articles'] is not a Unicode string.
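As a rough illustration of the type mismatch in the traceback: in Python 3, .encode('ascii', 'ignore') returns bytes, and the check shown at vader.py lines 155-156 then calls .encode() on that bytes object. The sketch below assumes a helper named to_ascii_str and a stand-in named mimic_vader_init; both are hypothetical names made up for this example, and the stand-in only copies the two lines visible in the traceback rather than using nltk itself. Decoding back to str after stripping non-ASCII characters is one possible way to avoid the error, not necessarily the only one.

```python
import unicodedata


def to_ascii_str(text):
    """Hypothetical helper: strip non-ASCII characters but return str, not bytes."""
    normalized = unicodedata.normalize('NFKD', text)
    # encode() yields bytes in Python 3; decode() turns it back into str
    return normalized.encode('ascii', 'ignore').decode('ascii')


def mimic_vader_init(text):
    """Hypothetical stand-in that copies the check at vader.py lines 155-156."""
    if not isinstance(text, str):
        text = str(text.encode('utf-8'))  # fails for bytes: no .encode()
    return text


# A str passes the isinstance check untouched
sentence = to_ascii_str(u'abcdあäasc')
print(type(sentence).__name__, sentence)
print(mimic_vader_init(sentence))

# Passing bytes, as the loop in the question does, reproduces the error
try:
    mimic_vader_init(u'abcdあäasc'.encode('ascii', 'ignore'))
except AttributeError as exc:
    print(exc)
```

In the loop from the question, this would mean passing either the original str value or the decoded result to sid.polarity_scores, instead of the bytes returned by .encode('ascii', 'ignore').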

+0

Yes, you are right... in Python 3 it is of type str, so the mapping to Unicode works now... I just realized the code is a port from Python 2, which probably caused this error – Mike

+0

Glad to help – aircraft
