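I have a pandas.DataFrame with redundant column names, because the source files (.csv) name their columns inconsistently. As a result, the combined DataFrame has one column for each spelling of a name, and those columns are mostly NaN: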
Bike # Bikenumber Bike# SubscriberType SubscriptionType
NaN NaN W20848 NaN Subscriber
NaN NaN W20231 NaN Subscriber
NaN NaN W00785 NaN Subscriber
NaN NaN W00126 NaN Subscriber
NaN NaN W20929 NaN Casual
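Is there a way to create a new column and fill it from whichever of several columns has a value? And if more than one of those columns is not NaN, can I choose which column the value is taken from?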
Bike# Bikenumber Bike # Selected_Num
number1 number2 NaN number2
When I fill from a single column, I get this:
sample['Bike_Num'] = sample['Bike #'].fillna(sample['Bike#'])
print(sample)
Bike # Bikenumber Bike# SubscriberType SubscriptionType Bike_Num
NaN NaN W20848 NaN Subscriber W20848
NaN NaN W20231 NaN Subscriber W20231
NaN NaN W00785 NaN Subscriber W00785
NaN NaN W00126 NaN Subscriber W00126
NaN NaN W20929 NaN Casual W20929
This fails:
sample['Bike_Num'] = sample['Bike #'].fillna(sample['Bike#'], sample['Bikenumber'])
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
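For reference, fillna accepts only one fill value per call, so a second Series cannot be passed as an extra fallback (in older pandas versions the second positional argument appears to land in the method parameter, which would explain the truth-value error). A minimal sketch of two ways to fill from several columns in a chosen priority order follows. The frame contents and the priority list are made up for illustration; only the column names come from the data above.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for `sample`; only the column names match the real data.
sample = pd.DataFrame({
    "Bike #": [np.nan, "W00001", np.nan],
    "Bikenumber": [np.nan, "W00002", "W00003"],
    "Bike#": ["W20848", np.nan, "W00004"],
})

# Assumed priority: the first non-NaN column in this list wins.
priority = ["Bikenumber", "Bike #", "Bike#"]

# Option 1: chain one fillna call per fallback column.
chained = sample["Bikenumber"].fillna(sample["Bike #"]).fillna(sample["Bike#"])

# Option 2: back-fill across the columns in priority order, then keep the
# first column, which now holds the first non-NaN value of each row.
coalesced = sample[priority].bfill(axis=1).iloc[:, 0]

sample["Bike_Num"] = coalesced
print(sample)
```

Reordering the priority list changes which column wins when several of them hold a value in the same row.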
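Wouldn't it be better to clean the data when it is read from the CSV? How is the data read from the csv files?
@StephenRauch: I read ~20 csv files from a directory in a 'for' loop and concatenate them with 'total_df = pd.concat(dfs, ignore_index=True)'.
Are you using 'pandas.read_csv'? Also, do I understand correctly that you basically have a list of synonyms for some of the column names?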
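If the column names really are just synonyms, one option is to rename them as each file is read, so the concatenated frame never gets the redundant columns. This is only a sketch: the SYNONYMS mapping and the target names Bike_Num and Subscription_Type are hypothetical, and the in-memory CSV text stands in for the real files in the directory.

```python
import io

import pandas as pd

# Hypothetical mapping from each known spelling to one canonical name.
SYNONYMS = {
    "Bike #": "Bike_Num",
    "Bikenumber": "Bike_Num",
    "Bike#": "Bike_Num",
    "SubscriberType": "Subscription_Type",
    "SubscriptionType": "Subscription_Type",
}

# In-memory stand-ins for two csv files with inconsistent headers.
csv_a = io.StringIO("Bike#,SubscriptionType\nW20848,Subscriber\nW20231,Subscriber\n")
csv_b = io.StringIO("Bikenumber,SubscriberType\nW00785,Subscriber\n")

# Rename right after reading; names missing from the mapping are left alone.
dfs = [pd.read_csv(f).rename(columns=SYNONYMS) for f in (csv_a, csv_b)]
total_df = pd.concat(dfs, ignore_index=True)
print(total_df)
```

This assumes no single file contains two spellings of the same column; if one did, the rename would produce duplicate column names that still need to be merged.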