2017-08-25 39 views

Python CSV: how to extract data from a DataFrame by condition, edit the extracted data, and put it back into the DataFrame

Sample CSV data:

ID,AC_Input_Voltage,AC_Input_Current,DC_Output_Voltage,DC_Output_Current,DC_Output_Power,Input_Active_Power,Input_Reactive_Power,Input_Apparent_Power,Line_Frequency,DC_Ref,AC_Ref,Time_Stamp 
8301,418,13.2,34.4,136,4673,1,-1,5524.5,0,49,0,22/6/2017 05:11:00 
8301,419.3,2.3,0.7,-0.9,-0.6,1,-1,946.2,0,50,0,22/6/2017 05:11:01 
8301,417.7,15.2,30.3,196.5,5962,1,-1,6355,0,49,0,22/6/2017 05:11:02 
8301,418.7,2.3,0.7,-0.9,-0.6,1,-1,944.7,0,50,0,22/6/2017 05:11:03 
8301,419.3,3.4,53.6,10.8,580.2,1,-1,1432.8,0,49,0,22/6/2017 05:11:04 
8301,417.7,13.6,30.1,170.4,5122.7,1,-1,5681.8,0,50,0,22/6/2017 05:11:05 
8301,418,11.5,41.2,105,4328.2,1,-1,4796.9,0,49,0,22/6/2017 05:11:07 
8301,419.7,2.3,0.8,-0.9,-0.7,1,-1,946.9,0,51,0,22/6/2017 05:11:08 
8301,419.7,2.3,40.6,-0.7,-27.9,1,-1,974,0,49,0,22/6/2017 05:11:09 
8301,417.4,14.9,30.4,194.4,5903.8,1,-1,6215.4,0,51,0,22/6/2017 05:11:10 
8301,417.7,14.7,30.5,186.2,5682.9,1,-1,6139.5,0,49,0,22/6/2017 05:11:11 
8301,418,12,31.5,141.5,4456.9,1,-1,5012.5,0,51,0,22/6/2017 05:11:12 
8301,419,2.3,0.7,-1.4,-0.9,1,-1,945.4,0,49,0,22/6/2017 05:11:13 
8301,419,2.3,0.7,-0.9,-0.6,1,-1,945.4,0,50,0,22/6/2017 05:11:14 
8301,419.7,2.3,0.8,-0.9,-0.7,1,-1,946.9,0,50,0,22/6/2017 05:11:15 
8301,419,2.3,0.7,-0.9,-0.6,1,-1,945.4,0,49,0,22/6/2017 05:11:16 
8301,419,2.3,32.9,-0.2,-5.7,1,-1,972.4,0,51,0,22/6/2017 05:11:17 
8301,419.3,2.3,50.3,0.3,17.3,1,-1,973.2,0,49,0,22/6/2017 05:11:18 
8301,417.4,15.2,30.5,197.4,6010.5,1,-1,6350,0,50,0,22/6/2017 05:11:19 
8301,418.7,2.3,0.9,-0.9,-0.7,1,-1,944.7,0,49,0,22/6/2017 05:11:20 
8301,419,2.3,42.9,-0.2,-7.4,1,-1,972.4,0,50,0,22/6/2017 05:11:21 
8301,417.4,13.9,30.4,180,5477.6,1,-1,5811.8,0,49,0,22/6/2017 05:11:22 
8301,419.7,2.3,0.9,-0.9,-0.8,1,-1,946.9,0,50,0,22/6/2017 05:11:23 
8301,418.7,2.3,0.7,-0.9,-0.6,1,-1,944.7,0,50,0,22/6/2017 05:11:24 
8301,418.3,2.3,0.6,-0.9,-0.5,1,-1,943.9,0,49,0,22/6/2017 05:11:25 

I tried the code below and managed to edit the data and put it into a new DataFrame (df_filter2):

import numpy as np 
from datetime import date,time,datetime 
import pandas as pd 
import csv 

df = pd.read_csv('Data.csv') 
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"], dayfirst=True) # convert to datetime; the CSV uses day/month/year

def getMask(start,end): 
    mask = (df['Time_Stamp'] > start) & (df['Time_Stamp'] <= end) 
    return mask

start = '2017-06-22 05:00:00' 
end = '2017-06-22 05:20:00' 
timerange = df.loc[getMask(start, end)] 

df_filter = timerange[timerange["AC_Input_Current"].le(3.0)] # new df with rows where the current is less than or equal to 3.0
#print(df_filter) 

where = (df_filter[df_filter["Time_Stamp"].diff().dt.total_seconds() > 1] ["Time_Stamp"] - pd.Timedelta("1s")).astype(str).tolist() # Find where diff > 1 second 
df_filter2 = timerange[timerange["Time_Stamp"].isin(where)].copy() # Create new df with those rows; .copy() avoids SettingWithCopyWarning
#print(df_filter2) 
df_filter2["AC_Input_Current"] = 0.0 # Set AC_Input_Current to 0.0 

#display spikes (high possibility of data being a spike) 
for index, row in df_filter2.iterrows(): 
    values = row.astype(str).tolist() 
    print(','.join(values)) 

Output: the edited rows below are in the DataFrame df_filter2.

8301,418.0,0.0,34.4,136.0,4673.0,1,-1,5524.5,0,49,0,2017-06-22 05:11:00 
8301,417.7,0.0,30.3,196.5,5962.0,1,-1,6355.0,0,49,0,2017-06-22 05:11:02 
8301,418.0,0.0,41.2,105.0,4328.2,1,-1,4796.9,0,49,0,2017-06-22 05:11:07 
8301,418.0,0.0,31.5,141.5,4456.9,1,-1,5012.5,0,51,0,2017-06-22 05:11:12 
8301,417.4,0.0,30.5,197.4,6010.5,1,-1,6350.0,0,50,0,2017-06-22 05:11:19 
8301,417.4,0.0,30.4,180.0,5477.6,1,-1,5811.8,0,49,0,2017-06-22 05:11:22 

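What I want is to put the output from df_filter2 back into the main DataFrame df, replacing the rows in df that have the same Time_Stamp with the rows from df_filter2. How can I do that?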

Answer

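Set Time_Stamp as the index of both DataFrames, then assign the df_filter2 values to df at the matching index labels.

First, make sure both DataFrames have Time_Stamp in the same format and use the same column names. For the sample data provided, I used: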

# copy df sample data from OP 
df = pd.read_clipboard(sep=",", parse_dates=["Time_Stamp"]) 
# now copy df_filter2 sample data 
df_filter2 = pd.read_clipboard(sep=",", header=None, names=df.columns, parse_dates=[12]) 
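The "same format" requirement can also be met when reading from a file rather than the clipboard. The following is a minimal sketch, assuming the day/month/year layout shown in the question's CSV; the names csv_text and raw are made up for illustration, and only three of the columns are reproduced.

```python
import io

import pandas as pd

# Two rows from the question's sample CSV (subset of columns for brevity).
csv_text = """ID,AC_Input_Current,Time_Stamp
8301,13.2,22/6/2017 05:11:00
8301,2.3,22/6/2017 05:11:01
"""

raw = pd.read_csv(io.StringIO(csv_text))

# An explicit format removes any day/month guessing during parsing.
raw["Time_Stamp"] = pd.to_datetime(raw["Time_Stamp"], format="%d/%m/%Y %H:%M:%S")

# Both frames should end up with a datetime dtype before they are matched.
print(pd.api.types.is_datetime64_any_dtype(raw["Time_Stamp"]))
```

If one frame holds strings and the other holds datetimes, the index labels will likely not match, so it seems worth checking the dtype of Time_Stamp on both sides first.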

Now set Time_Stamp as the index and replace the matching rows:

df = df.set_index("Time_Stamp") 
df_filter2 = df_filter2.set_index("Time_Stamp") 
df.loc[df_filter2.index] = df_filter2 
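To see the index-aligned replacement in isolation, here is a small sketch on a three-row frame. The frame names main and edited and the cutoff of 15 are illustrative assumptions, not part of the original data flow; the final reset_index is an optional extra step that restores Time_Stamp as an ordinary column.

```python
import pandas as pd

# Hypothetical miniature of the main frame (values taken from the sample data).
main = pd.DataFrame({
    "Time_Stamp": pd.to_datetime([
        "2017-06-22 05:11:00",
        "2017-06-22 05:11:01",
        "2017-06-22 05:11:02",
    ]),
    "AC_Input_Current": [13.2, 2.3, 15.2],
    "DC_Ref": [49, 50, 49],
})

# Stand-in for df_filter2: an edited copy of some of the rows.
edited = main.loc[main["AC_Input_Current"] > 15].copy()
edited["AC_Input_Current"] = 0.0

# Index both frames by Time_Stamp, then overwrite the rows whose labels match.
main = main.set_index("Time_Stamp")
edited = edited.set_index("Time_Stamp")
main.loc[edited.index] = edited

# Optional: turn Time_Stamp back into a regular column.
main = main.reset_index()
print(main)
```

This relies on Time_Stamp being unique in the main frame; with duplicate timestamps the label-based assignment may fail or touch more rows than intended.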

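UPDATE (per comments)
To be clear, here is a complete working example that starts from a data dictionary, builds df, and uses the OP's code to generate df_filter2. Only minor changes were made (for example, defining Time_Stamp as pd.Timestamp in the original data, and adding .loc in places).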

# sample data 
import pandas as pd 
from pandas import Timestamp 

data = {'AC_Input_Current': {0: 13.199999999999999, 1: 2.2999999999999998,2: 15.199999999999999,3: 2.2999999999999998,4: 3.3999999999999999,5: 13.6,6: 11.5,7: 2.2999999999999998,8: 2.2999999999999998,9: 14.9,10: 14.699999999999999,11: 12.0,12: 2.2999999999999998,13: 2.2999999999999998,14: 2.2999999999999998,15: 2.2999999999999998,16: 2.2999999999999998,17: 2.2999999999999998,18: 15.199999999999999,19: 2.2999999999999998,20: 2.2999999999999998,21: 13.9,22: 2.2999999999999998,23: 2.2999999999999998,24: 2.2999999999999998}, 
'AC_Input_Voltage': {0: 418.0,1: 419.30000000000001,2: 417.69999999999999,3: 418.69999999999999,4: 419.30000000000001,5: 417.69999999999999,6: 418.0,7: 419.69999999999999,8: 419.69999999999999,9: 417.39999999999998,10: 417.69999999999999,11: 418.0,12: 419.0,13: 419.0,14: 419.69999999999999,15: 419.0,16: 419.0,17: 419.30000000000001,18: 417.39999999999998,19: 418.69999999999999,20: 419.0,21: 417.39999999999998,22: 419.69999999999999,23: 418.69999999999999,24: 418.30000000000001}, 
'DC_Output_Current': {0: 136.0,1: -0.90000000000000002,2: 196.5,3: -0.90000000000000002,4: 10.800000000000001,5: 170.40000000000001,6: 105.0,7: -0.90000000000000002,8: -0.69999999999999996,9: 194.40000000000001,10: 186.19999999999999,11: 141.5,12: -1.3999999999999999,13: -0.90000000000000002,14: -0.90000000000000002,15: -0.90000000000000002,16: -0.20000000000000001,17: 0.29999999999999999,18: 197.40000000000001,19: -0.90000000000000002,20: -0.20000000000000001,21: 180.0,22: -0.90000000000000002,23: -0.90000000000000002,24: -0.90000000000000002}, 
'DC_Output_Power': {0: 4673.0,1: -0.59999999999999998,2: 5962.0,3: -0.59999999999999998,4: 580.20000000000005,5: 5122.6999999999998,6: 4328.1999999999998,7: -0.69999999999999996,8: -27.899999999999999,9: 5903.8000000000002,10: 5682.8999999999996,11: 4456.8999999999996,12: -0.90000000000000002,13: -0.59999999999999998,14: -0.69999999999999996,15: -0.59999999999999998,16: -5.7000000000000002,17: 17.300000000000001,18: 6010.5,19: -0.69999999999999996,20: -7.4000000000000004,21: 5477.6000000000004,22: -0.80000000000000004,23: -0.59999999999999998,24: -0.5}, 
'DC_Output_Voltage': {0: 34.399999999999999,1: 0.69999999999999996,2: 30.300000000000001,3: 0.69999999999999996,4: 53.600000000000001,5: 30.100000000000001,6: 41.200000000000003,7: 0.80000000000000004,8: 40.600000000000001,9: 30.399999999999999,10: 30.5,11: 31.5,12: 0.69999999999999996,13: 0.69999999999999996,14: 0.80000000000000004,15: 0.69999999999999996,16: 32.899999999999999,17: 50.299999999999997,18: 30.5,19: 0.90000000000000002,20: 42.899999999999999,21: 30.399999999999999,22: 0.90000000000000002,23: 0.69999999999999996,24: 0.59999999999999998}, 
'DC_Ref': {0: 49,1: 50,2: 49,3: 50,4: 49,5: 50,6: 49,7: 51,8: 49,9: 51,10: 49,11: 51,12: 49,13: 50,14: 50,15: 49,16: 51,17: 49,18: 50,19: 49,20: 50,21: 49,22: 50,23: 50,24: 49}, 
'Input_Apparent_Power': {0: 5524.5,1: 946.20000000000005,2: 6355.0,3: 944.70000000000005,4: 1432.8,5: 5681.8000000000002,6: 4796.8999999999996,7: 946.89999999999998,8: 974.0,9: 6215.3999999999996,10: 6139.5,11: 5012.5,12: 945.39999999999998,13: 945.39999999999998,14: 946.89999999999998,15: 945.39999999999998,16: 972.39999999999998,17: 973.20000000000005,18: 6350.0,19: 944.70000000000005,20: 972.39999999999998,21: 5811.8000000000002,22: 946.89999999999998,23: 944.70000000000005,24: 943.89999999999998}, 
'Time_Stamp': {0: Timestamp('2017-06-22 05:11:00'),1: Timestamp('2017-06-22 05:11:01'),2: Timestamp('2017-06-22 05:11:02'),3: Timestamp('2017-06-22 05:11:03'),4: Timestamp('2017-06-22 05:11:04'),5: Timestamp('2017-06-22 05:11:05'),6: Timestamp('2017-06-22 05:11:07'),7: Timestamp('2017-06-22 05:11:08'),8: Timestamp('2017-06-22 05:11:09'),9: Timestamp('2017-06-22 05:11:10'),10: Timestamp('2017-06-22 05:11:11'),11: Timestamp('2017-06-22 05:11:12'),12: Timestamp('2017-06-22 05:11:13'),13: Timestamp('2017-06-22 05:11:14'),14: Timestamp('2017-06-22 05:11:15'),15: Timestamp('2017-06-22 05:11:16'),16: Timestamp('2017-06-22 05:11:17'),17: Timestamp('2017-06-22 05:11:18'),18: Timestamp('2017-06-22 05:11:19'),19: Timestamp('2017-06-22 05:11:20'),20: Timestamp('2017-06-22 05:11:21'),21: Timestamp('2017-06-22 05:11:22'),22: Timestamp('2017-06-22 05:11:23'),23: Timestamp('2017-06-22 05:11:24'),24: Timestamp('2017-06-22 05:11:25')}} 
df = pd.DataFrame(data) 

Several columns have constant values:

df["AC_Ref"] = 0 
df["ID"] = 8301 
df["Input_Active_Power"] = 1 
df["Input_Reactive_Power"] = -1 
df["Line_Frequency"] = 0 

Now construct df_filter2:

def getMask(start,end): 
    mask = (df['Time_Stamp'] > start) & (df['Time_Stamp'] <= end) 
    return mask
start = '2017-06-22 05:00:00' 
end = '2017-06-22 05:20:00' 
timerange = df.loc[getMask(start, end)] 
df_filter = timerange.loc[timerange["AC_Input_Current"].le(3.0)] 
where = (df_filter.loc[df_filter["Time_Stamp"].diff().dt.total_seconds() > 1, "Time_Stamp"] - pd.Timedelta("1s")).astype(str).tolist() 
df_filter2 = timerange.loc[timerange["Time_Stamp"].isin(where)].copy() 
df_filter2["AC_Input_Current"] = 0.0 
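The selection logic above (keep the low-current rows, find gaps longer than one second between them, then step back one second) can also be written without converting timestamps to strings. This is a sketch under that assumption; find_spike_times is a hypothetical helper name, and the four rows are the 05:11:01 to 05:11:04 readings from the sample data.

```python
import pandas as pd


def find_spike_times(frame, threshold=3.0):
    """Return timestamps one second before each gap in the low-current rows."""
    # Timestamps of rows whose current is at or below the threshold.
    low = frame.loc[frame["AC_Input_Current"].le(threshold), "Time_Stamp"]
    # A gap larger than one second means at least one row was filtered out.
    after_gap = low[low.diff() > pd.Timedelta("1s")]
    # Step back one second to land on the row just before the gap ends.
    return after_gap - pd.Timedelta("1s")


sample = pd.DataFrame({
    "Time_Stamp": pd.to_datetime([
        "2017-06-22 05:11:01",
        "2017-06-22 05:11:02",
        "2017-06-22 05:11:03",
        "2017-06-22 05:11:04",
    ]),
    "AC_Input_Current": [2.3, 15.2, 2.3, 3.4],
})

spike_times = find_spike_times(sample)
spikes = sample.loc[sample["Time_Stamp"].isin(spike_times)].copy()
print(spikes)
```

Keeping the values as datetimes sidesteps string-to-datetime matching inside isin, which newer pandas releases may warn about.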

Finally, replace the rows in df with the matching rows from df_filter2 (matched by Time_Stamp):

df = df.set_index("Time_Stamp") 
df_filter2 = df_filter2.set_index("Time_Stamp") 
df.loc[df_filter2.index] = df_filter2 

We can check to make sure the replacement happened:

assert(all(df.AC_Input_Current.sort_values()[:5].values == df_filter2.AC_Input_Current.values)) 
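A check tied to the timestamps themselves, rather than to sort order, may be easier to reason about. The sketch below also shows a possible shortcut when only one column changes: assign through a boolean mask on the main frame directly. The names df_demo, spike_times, and is_spike are assumptions made for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Time_Stamp": pd.to_datetime([
        "2017-06-22 05:11:00",
        "2017-06-22 05:11:01",
        "2017-06-22 05:11:02",
    ]),
    "AC_Input_Current": [13.2, 2.3, 15.2],
})

# Hypothetical list of timestamps that were flagged as spikes.
spike_times = pd.to_datetime(["2017-06-22 05:11:02"])

# Boolean mask of the rows to overwrite, then a single-column assignment.
is_spike = df_demo["Time_Stamp"].isin(spike_times)
df_demo.loc[is_spike, "AC_Input_Current"] = 0.0

# Flagged rows are now zero; every other row keeps its original reading.
assert (df_demo.loc[is_spike, "AC_Input_Current"] == 0.0).all()
assert (df_demo.loc[~is_spike, "AC_Input_Current"] != 0.0).all()
```

Note that the rows being zeroed are the high-current readings next to the low-current ones, so rows that read 2.3 are expected to stay at 2.3.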

Instead of using 'pd.read_clipboard', if I were to use 'read_csv', as in 'df = df.read_csv('MainD2.csv', sep=',', parse_dates=["Time_Stamp"])', how would I change 'df_filter2'? Sorry, I'm still learning Python.

Your code already generates 'df_filter2' once you have 'df'. Just use it.

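I tried the code you gave and checked the data in 'df' to see whether the rows from 'df_filter2' whose 'AC_Input_Current' value was set to 0 are in 'df'. Apparently they are not, because the 'AC_Input_Current' value is still '2.3'.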