2017-06-12

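Data management and graphing with Python

I need to go through a CSV file containing information about some video games and create a new variable based on each game's user score. Here is my code: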

#Imports 
import pandas 
import numpy as np 
import matplotlib.pyplot as plt 

data = pandas.read_csv("Data Collections/metacritic_games_2016_11.csv",  encoding='latin-1') 
data['year'] = pandas.DatetimeIndex(data['release']).year 
data = data[data["year"] >= 2000] 

rating = [] 
for index, row in data.iterrows():
    if row['user_score'] >= 7.5:
        rating.append("Good")
    elif row['user_score'] >= 6.5:
        rating.append("Average")
    elif row['user_score'] >= 0:
        rating.append("Bad")

data["new_rating"] = pandas.Series(rating) 

year = 2000 
index = 0 
while year != 2016:
    vals = data[data["year"] == year]["new_rating"].value_counts()
    plt.bar(index, vals["Bad"], color='#494953')
    plt.bar(index, vals["Average"], color='#6A7EFC', bottom=vals["Bad"])
    plt.bar(index, vals["Good"], color='#FF5656', bottom=vals["Average"] + vals["Bad"])
    index += 1
    year += 1

plt.show() 

However, I keep getting this error:

    if row['user_score'] >= 7.5:
TypeError: '>=' not supported between instances of 'str' and 'float'

I'm not sure what to do. Any help is appreciated.


Try casting row['user_score'] to float.


If my answer solved your problem, please accept it by clicking the check mark to the left of the answer.

Answer


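For some reason, at least one of the numbers in the user_score column is being treated as a string. Assuming it isn't a value like "seventeen", you can fix it with: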

data['user_score'] = data['user_score'].astype(float) 
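If the column does turn out to contain entries that are not numbers at all, astype(float) will raise a ValueError. One possible workaround, sketched below, is pandas.to_numeric with errors='coerce', which converts unparseable entries to NaN. The sample values (including the "tbd" placeholder) are made up for illustration and are not taken from the actual CSV.

```python
import pandas

# Hypothetical sample standing in for the real CSV column.
data = pandas.DataFrame({"user_score": ["8.1", "6.9", "tbd", "4.0"]})

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
data["user_score"] = pandas.to_numeric(data["user_score"], errors="coerce")

# Optionally drop the rows that ended up without a usable score.
scored = data.dropna(subset=["user_score"])
```

Whether dropping those rows is appropriate depends on what the non-numeric entries mean in your data.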

I would also suggest replacing the code that creates your rating column. Instead of this:

rating = [] 
for index, row in data.iterrows():
    if row['user_score'] >= 7.5:
        rating.append("Good")
    elif row['user_score'] >= 6.5:
        rating.append("Average")
    elif row['user_score'] >= 0:
        rating.append("Bad")

data["new_rating"] = pandas.Series(rating) 

you should do something like this:

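# Bin edges for the three groups; np is numpy, as imported in the question.
group_boundaries = [0, 6.5, 7.5, np.inf]
group_labels = ['bad', 'average', 'good']

# right=False makes each bin include its lower edge, matching the >= checks above.
data['rating'] = pandas.cut(data['user_score'],
                            bins=group_boundaries,
                            labels=group_labels,
                            right=False)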
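To see how the binning behaves at the boundaries, and how the per-year counts needed for the stacked bars could be obtained without a loop, here is a small self-contained sketch. The data frame is invented for illustration, and using pandas.crosstab for the counts is my suggestion rather than part of the original code.

```python
import numpy as np
import pandas

# Hypothetical data: two years, with scores sitting on the bin edges.
data = pandas.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001],
    "user_score": [8.0, 6.5, 3.2, 7.5, 0.0],
})

# Left-closed bins: [0, 6.5), [6.5, 7.5), [7.5, inf)
data["rating"] = pandas.cut(
    data["user_score"],
    bins=[0, 6.5, 7.5, np.inf],
    labels=["bad", "average", "good"],
    right=False,
)

# One row per year, one column per rating; missing combinations become 0.
counts = pandas.crosstab(data["year"], data["rating"])
```

Because combinations that do not occur are filled with 0, looking up a rating that is absent in a given year returns 0 rather than raising a KeyError. A table like counts can then be drawn with counts.plot(kind="bar", stacked=True).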

Thanks, I understand now!