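Pandas DataFrame replaces strings with NaN when using pd.concat
I have a pandas DataFrame made up of strings, i.e. 'P1', 'P2', 'P3', ..., null.
When I try to concatenate this DataFrame with another one, all of the strings are replaced with 'NaN'.
See my code below: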
import operator
import pandas as pd

# Load the bug descriptions and keep the 'what' field of each bug's first entry
descriptions = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/short_desc.json')
descriptions = descriptions.reset_index(drop=True)
descriptions['desc'] = descriptions.short_desc.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
f1 = pd.DataFrame(descriptions['desc'])

# Load the bug priorities the same way
bugPrior = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/priority.json')
bugPrior = bugPrior.reset_index(drop=True)
bugPrior['priority'] = bugPrior.priority.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
f2 = pd.DataFrame(bugPrior['priority'])

# Concatenate the two frames (this is the step after which the NaN values appear)
df = pd.concat([f1, f2])
print(df.head())
The output is as follows:
                                                desc  priority
0   Usability issue with external editors (1GE6IRL)       NaN
1             API - VCM event notification (1G8G6RR)       NaN
2  Would like a way to take a write lock on a tea...       NaN
3  getter/setter code generation drops "F" in ".....       NaN
4  Create Help Index Fails with seemingly incorre...       NaN
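Any ideas on how I can stop this from happening?
Ultimately, my goal is to get everything into one DataFrame so that I can drop all rows that have 'null' values. That would also help with the code that comes later.
Thanks.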
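One likely explanation, sketched here under assumptions: pd.concat stacks frames vertically by default (axis=0), so the rows of the first frame have no 'priority' value and are filled with NaN, which is all that head() shows. Passing axis=1 places the frames side by side, aligned on the index, and dropna() can then remove the rows with missing values. The small frames below are hypothetical stand-ins for f1 and f2, not the real Eclipse data.

```python
import pandas as pd

# Hypothetical stand-ins for f1 and f2 (the real frames come from the JSON files)
f1 = pd.DataFrame({"desc": ["Usability issue", "API event notification", "Write lock"]})
f2 = pd.DataFrame({"priority": ["P3", None, "P1"]})

# Default axis=0 stacks rows: f1's rows get NaN in 'priority',
# and f2's rows get NaN in 'desc'.
stacked = pd.concat([f1, f2])
print(stacked.shape)        # (6, 2)

# axis=1 aligns the frames on their index and places the columns side by side.
side_by_side = pd.concat([f1, f2], axis=1)
print(side_by_side.shape)   # (3, 2)

# Drop every row that still has a missing value in any column.
cleaned = side_by_side.dropna()
print(cleaned.shape)        # (2, 2)
```

This relies on both frames sharing the same index after reset_index(drop=True). If the two JSON files do not list the bugs in the same order, joining on the bug id would be safer than positional alignment.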
Thanks for your help, this dataset has been driving me nuts, and this is only the data import! – JohnWayne360