2017-10-29 143 views
-1

我在大熊猫数据帧以下列柱: ​​熊猫数据帧添加基于字符串

在“统计”列,每个统计以由空格隔开。我想为每个统计信息创建新的列。问题是不是每行都有每种类型的属性。例如。第2行没有“trey”。我该如何完成这一壮举?

我想这一点,但每个“后,刚添加了新列:

nba_2017_revised4 = nba_2017_revised3.join(nba_2017_revised3['Stats'].str.split(' ', 7, expand=True).rename(columns={0:'Points', 1:'Rebounds', 2:'Assists', 3:'Steals', 4:'Turnovers', 5:'3_Pointers', 6:'FG_Attempts', 7:'FT_Attempts'})) 

enter image description here

 
Date First Last Stats Minutes DKP Team Opp DRPM 
0 20170412.0 Ron Baker 8pt 1rb 4as 2to 1trey 3-6fg 1-2ft 29.350000 14.75 nyk phi -0.56 
1 20170409.0 Ron Baker 11pt 8rb 8as 1st 2to 1trey 5-12fg 38.100000 34.50 nyk tor -0.56 
2 20170407.0 Ron Baker 2pt 2rb 7as 1to 1-7fg 30.500000 14.50 nyk mem -0.56 
3 20170406.0 Ron Baker 12pt 2rb 2as 2to 5-9fg 2-2ft 27.166667 16.50 nyk was -0.56 
4 20170404.0 Ron Baker 9pt 4rb 6as 2st 4to 1trey 4-7fg 0-1ft 37.300000 25.50 nyk chi -0.56 

感谢。

+4

没有图片请以文本形式添加数据。我们如何复制数据以尝试我们的解决方案。 – Dark

+0

什么是预期输出 – Dark

+0

nba_2017_revised4 = nba_2017_revised3.join(nba_2017_revised3 ['Stats']。str.split('',7,expand = True).rename(columns = {0:'Points',1''Rebounds' ,2:'助攻',3:'抢断',4:'失误',5:'3_Pointers',6:'FG_Attempts',7:'FT_Attempts'})) –

回答

1

我会使用正则表达式来解析每个拆分,通过在最后连续字母作为列值和字符串之前的值作为值。

import pandas as pd 
import re 

pat = lambda x: re.match('^(.+?)([a-z]+)$', x).groups() 
prs = lambda s: pd.Series(*zip(*[pat(x) for x in s.split()])) 

df.drop('Stats', 1).join(df.Stats.apply(prs)) 

     Date First Last Minutes DKP Team Opp DRPM as fg ft pt rb st to trey 
0 20170412 Ron Baker 29.350000 14.75 nyk phi -0.56 4 3-6 1-2 8 1 NaN 2 1 
1 20170409 Ron Baker 38.100000 34.50 nyk tor -0.56 8 5-12 NaN 11 8 1 2 1 
2 20170407 Ron Baker 30.500000 14.50 nyk mem -0.56 7 1-7 NaN 2 2 NaN 1 NaN 
3 20170406 Ron Baker 27.166667 16.50 nyk was -0.56 2 5-9 2-2 12 2 NaN 2 NaN 
4 20170404 Ron Baker 37.300000 25.50 nyk chi -0.56 6 4-7 0-1 9 4 2 4 1 
+0

我很高兴看到你的结果,piRSquared,但我在尝试代码时出现以下错误:AttributeError:'float'object has no attribute'split' –

+0

@ GilO'Brien这是因为'Stats'中的一些值列是'np.nan',它们被定义为'float'。你应该用'''填充na值。尝试这个'df.drop('Stats',1).join(df.Stats.fillna('')。apply(prs))' – piRSquared

+1

它工作!我欠你一杯啤酒! –