2017-04-16 109 views

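AttributeError when creating a document-term matrix

I am trying to create a document-term matrix represented as a pandas DataFrame. This is my code so far: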

df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete 

When I run this code, I get the following error:

'list' object has no attribute 'lower' 

How can I get rid of this error?

Answers

Wrap the list objects in str() to convert them to strings:

df_profession['Athlete_Clean'] = str(df_profession['Athlete Biographies']).lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = str(df_profession['Athlete_Clean']).split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete 
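
A note on the likely cause: CountVectorizer lowercases each document itself, so it expects one string per row. After `.str.split()` each row is a list, which would explain "'list' object has no attribute 'lower'". Below is a minimal sketch of one way to keep each row a single string while still removing digits, punctuation, and stop words. The names `clean_text`, `build_dtm`, and `STOPWORDS` are hypothetical, and the tiny stop-word set is only a stand-in for `stopwords.words('english')`.

```python
import string

# Hypothetical stand-in for stopwords.words('english'); swap in the NLTK list if available.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was"}


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase, drop digits and punctuation, remove stop words, return ONE string."""
    text = text.lower()
    # Remove digits and punctuation character by character.
    text = "".join(
        ch for ch in text if not ch.isdigit() and ch not in string.punctuation
    )
    # Filter stop words token by token, then join back into a single string.
    return " ".join(word for word in text.split() if word not in stopwords)


def build_dtm(texts, index=None):
    """Build a document-term matrix as a DataFrame from an iterable of raw strings."""
    # Imported here so clean_text stays usable without pandas / scikit-learn installed.
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    countvec = CountVectorizer()
    counts = countvec.fit_transform([clean_text(t) for t in texts])
    # get_feature_names_out() replaces get_feature_names() in recent scikit-learn releases.
    return pd.DataFrame(
        counts.toarray(), columns=countvec.get_feature_names_out(), index=index
    )
```

With the frame from the question, this could be called as `build_dtm(df_profession['Athlete Biographies'], index=df_profession.index)`, assuming the index should come from `df_profession` itself. Because every row stays one string, the result has exactly one row per biography.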

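So this seems to have gotten past that problem, but now I get "ValueError: Length of values does not match length of index". Any suggestions as to why this is appearing? – Jberk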

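That error is internal to the pandas library, so I'm not sure. It may be worth a new question. If you do post it as a new question, I suggest using the dataframe tag. – JacobIRR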

OK, thanks JacobIRR. I'll go ahead and create a new question about this new error. – Jberk
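
On the ValueError raised in the comments: a plausible explanation is that `str()` applied to a whole column returns the printed representation of that column as one string, so `.split()` produces a flat list of tokens whose length has nothing to do with the number of rows. Assigning that list back to a column then fails. The sketch below reproduces this on a hypothetical two-row frame; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame standing in for df_profession.
df = pd.DataFrame({"Athlete Biographies": ["Ran 100m in 2016", "Won gold"]})

# str() on a whole Series returns its printed representation as one string,
# so split() yields tokens from that printout (index labels, dtype footer, ...).
tokens = str(df["Athlete Biographies"]).lower().split()

error_message = None
try:
    # A list is assigned row by row, so its length must equal len(df).
    df["Athlete_Clean"] = tokens
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Using the `.str` accessor or `.apply` on the column, rather than `str()` on the column as a whole, keeps one value per row and so avoids the length mismatch.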