Problem removing duplicates from a pandas Series

I'm struggling to get rid of duplicates in a pandas Series:

0  RWAY001 
1  RWAY001 
2  RWAY002 
3  RWAY002 
... 
112 RWAY057 
113 RWAY057 
114 RWAY058 
115 RWAY058 
Length: 116

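Calling drop_duplicates() on the Series seems to cut the length down to 58, but the index still seems to run over the original range of 116 rows, just skipping the duplicates: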

0  RWAY001 
2  RWAY002 
... 
112 RWAY057 
114 RWAY058 
Length: 58

So it looks as if there are still rows with NaN values in between. I tried dropna(), but it had no effect on the data.

Here is my code:

import pandas as pd

df = pd.read_csv(path + flnm)  # path and flnm are defined elsewhere (not shown)
fields = df.file
fields = fields.drop_duplicates()
print(fields)

Any help would be appreciated. Thanks.

2016-05-30 Evgeni

I think you need reset_index with the parameter drop=True:

fields.reset_index(inplace=True, drop=True)

Or:

fields = fields.reset_index(drop=True)
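
As for why dropna() changed nothing: as far as I can tell, drop_duplicates removes the duplicate rows outright and the surviving rows simply keep their original labels, so the gaps in the index are missing labels, not hidden NaN rows. Here is a minimal sketch of that, using a small made-up Series in place of your file column (the variable names are only for illustration):

```python
import pandas as pd

# Toy data standing in for the "file" column (made up for illustration)
fields = pd.Series(["RWAY001", "RWAY001", "RWAY002", "RWAY002"], name="file")

deduped = fields.drop_duplicates()

# Surviving rows keep their original labels; nothing is filled with NaN
labels_before = list(deduped.index)      # labels 0 and 2 remain
nan_count = int(deduped.isna().sum())    # no NaN values, so dropna() has nothing to remove

# reset_index(drop=True) discards the old labels and renumbers from 0
renumbered = deduped.reset_index(drop=True)
labels_after = list(renumbered.index)

# Without drop=True the old labels are kept as a new "index" column,
# and the result is a DataFrame rather than a Series
as_frame = deduped.reset_index()
```

So drop=True is what keeps the result a Series; leaving it out gives back a two-column DataFrame with the old labels preserved.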

Sample:

import pandas as pd 

df = pd.DataFrame({'file': {0: 'RWAY001', 1: 'RWAY001', 2: 'RWAY002', 3: 'RWAY002', 112: 'RWAY057', 113: 'RWAY057', 114: 'RWAY058', 115: 'RWAY058'}})
print(df)
        file
0    RWAY001
1    RWAY001
2    RWAY002
3    RWAY002
112  RWAY057
113  RWAY057
114  RWAY058
115  RWAY058

print(df.file.drop_duplicates())
0      RWAY001
2      RWAY002
112    RWAY057
114    RWAY058
Name: file, dtype: object

print(df.file.drop_duplicates().reset_index(drop=True))
0    RWAY001
1    RWAY002
2    RWAY057
3    RWAY058
Name: file, dtype: object
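
One practical reason to renumber: with the gapped index, looking a row up by label can fail even though the row count looks right, while positional access still works. A small sketch, again on made-up data with illustrative variable names:

```python
import pandas as pd

s = pd.Series(["RWAY001", "RWAY001", "RWAY002", "RWAY002"], name="file")
deduped = s.drop_duplicates()            # index labels are now 0 and 2

# Positional access ignores the labels entirely
second_by_position = deduped.iloc[1]

# Label-based access fails because label 1 was dropped
try:
    deduped.loc[1]
    label_1_found = True
except KeyError:
    label_1_found = False

# After renumbering, label and position agree again
clean = deduped.reset_index(drop=True)
second_by_label = clean.loc[1]
```

If you only ever iterate over the values or use .iloc, the original labels are harmless and you could leave the index alone.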

2016-05-30 08:48:47 jezrael

That did the trick. Thanks! – Evgeni
