2016-03-15 90 views

Python and pandas: splitting a pipe-delimited DataFrame column

Suppose I have a DataFrame (df):

userid    subcategory            timestamp                 smartexpenseid                                      companyid
20648196  SmartExpense Declined  2016-03-06T16:44:55.702Z  11771712||91164585||||                              9797
43124398  SmartExpense Declined  2016-03-06T17:09:06.033Z  11111111|249178181?CARRT?266298850196|93461910||||  63177
76764125  SmartExpense Declined  2016-03-06T19:44:19.078Z  137177|250155900?HOTEL?270593373724|92826286||||    199412

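I want to split the smartexpenseid column of this pandas DataFrame into separate columns in the same DataFrame. Each value has the following layout:

11111111|249178181?CARRT?266298850196|93461910|||| -> "CctKey|TripId?SegType?SegId|EreceiptId|PctKey|MeKey|RcKey|CapKey"

Can someone please suggest the best way to do this in Python?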

Answers


Try this:

(?<CctKey>\d+)\|(?<TripId>\d*)\??(?<SegType>[^?]*)\??(?<SegId>\d*)\|(?<EreceiptId>\d+)\|(?<PctKey>[^|]*)\|(?<MeKey>[^|]*)\|(?<RcKey>[^|]*)\|(?<CapKey>[^|\n\s]*) 

Demo

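In Python, remove the ?<name> syntax from every group (Python's re module writes named groups as (?P<name>...) instead, so the form above is rejected as written):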

(\d+)\|(\d*)\??([^?]*)\??(\d*)\|(\d+)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|\n\s]*) 
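As a minimal sketch of how the pattern above could be used from Python while keeping the field names, the groups can be rewritten in Python's (?P<name>...) form. The helper name split_smartexpenseid is hypothetical and not part of the original answer; the sample value is the one from the question.

```python
import re

# Same pattern as in the answer, with Python-style named groups (?P<name>...).
PATTERN = re.compile(
    r"(?P<CctKey>\d+)\|(?P<TripId>\d*)\??(?P<SegType>[^?]*)\??(?P<SegId>\d*)\|"
    r"(?P<EreceiptId>\d+)\|(?P<PctKey>[^|]*)\|(?P<MeKey>[^|]*)\|"
    r"(?P<RcKey>[^|]*)\|(?P<CapKey>[^|\n\s]*)"
)


def split_smartexpenseid(value):
    """Return a dict of the named fields, or None if the value does not match.

    Hypothetical helper, for illustration only.
    """
    match = PATTERN.match(value)
    return match.groupdict() if match else None


fields = split_smartexpenseid("11111111|249178181?CARRT?266298850196|93461910||||")
print(fields["TripId"], fields["SegType"], fields["SegId"])
# prints: 249178181 CARRT 266298850196
```

Fields that are empty in the input (PctKey, MeKey, RcKey and CapKey in this sample) come back as empty strings.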

regex = "(?<CctKey>\d+)\|(?<TripId>\d*)\??(?<SegType>[^?]*)\??(?<SegId>\d*)\|(?<EreceiptId>\d+)\|(?<PctKey>[^|]*)\|(?<MeKey>[^|]*)\|(?<RcKey>[^|]*)\|(?<CapKey>[^|\n\s]*)" df2['smartexpenseid']. I tried adding it in Python and it gave me an error that says only "syntax error".


Remove the ?<name> part from every group: '(\d+)\|(\d*)\??([^?]*)\??(\d*)\|(\d+)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|\n\s]*)' https://regex101.com/r/fV5cM3/2
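For the DataFrame itself, one possible approach (a sketch, not necessarily what the answerer had in mind) is pandas' Series.str.extract, which returns one column per capture group. The frame below is rebuilt from the question's sample rows, and the (?P<name>...) group names are a substitution for the unnamed groups so that the new columns get readable names. The pattern has to be a quoted string, ideally a raw string; an unquoted or unterminated pattern is one plausible cause of a bare "syntax error".

```python
import pandas as pd

# Sample rows copied from the question (only the relevant columns).
df = pd.DataFrame({
    "userid": [20648196, 43124398, 76764125],
    "smartexpenseid": [
        "11771712||91164585||||",
        "11111111|249178181?CARRT?266298850196|93461910||||",
        "137177|250155900?HOTEL?270593373724|92826286||||",
    ],
})

# Raw string so the backslashes reach the regex engine unchanged.
pattern = (
    r"(?P<CctKey>\d+)\|(?P<TripId>\d*)\??(?P<SegType>[^?]*)\??(?P<SegId>\d*)\|"
    r"(?P<EreceiptId>\d+)\|(?P<PctKey>[^|]*)\|(?P<MeKey>[^|]*)\|"
    r"(?P<RcKey>[^|]*)\|(?P<CapKey>[^|\n\s]*)"
)

# One new column per named group; rows that do not match would become NaN.
parts = df["smartexpenseid"].str.extract(pattern, expand=True)

# Attach the new columns to the original frame.
df = df.join(parts)
print(df[["userid", "CctKey", "SegType", "EreceiptId"]])
```

The extracted values are strings; converting keys such as CctKey to integers would be a separate step.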


Thank you very much, TIm007. This really helped!