Split a DataFrame randomly (dependent on unique values)

I have a DataFrame df that looks like this:

| A | B | ... | 
--------------------- 
| one | ... | ... | 
| one | ... | ... | 
| one | ... | ... | 
| two | ... | ... | 
| three | ... | ... | 
| three | ... | ... | 
| four | ... | ... | 
| five | ... | ... | 
| five | ... | ... |

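As you can see, A has 5 unique values. I want to split the DataFrame randomly. For example, I want 3 of the unique values in DataFrame df1 and the other 2 in DataFrame df2. My problem is that the values in A are not unique per row: the same value can appear in several rows. I don't want the rows that share a value to be split across the two DataFrames.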

So the resulting DataFrames would look like this:

DataFrame df1 with 3 unique values:

| A | B | ... | 
--------------------- 
| one | ... | ... | 
| one | ... | ... | 
| one | ... | ... | 
| three | ... | ... | 
| three | ... | ... | 
| five | ... | ... | 
| five | ... | ... |

DataFrame df2 with 2 unique values:

| A | B | ... | 
--------------------- 
| two | ... | ... | 
| four | ... | ... |

Is there an easy way to achieve this? I thought about grouping, but I don't know how to split from there...

Source

2017-06-29 ScientiaEtVeritas

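You would have to extract the unique values of that column into a list, split this list into two lists, and then select rows from your DataFrame based on the two lists.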
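The steps described in this comment can be sketched as follows. This is only a minimal illustration, assuming the example data from the question with a placeholder column `B`; the names `keys`, `keys1`, `keys2` and the 3/2 split point are illustrative choices, not part of the original comment.

```python
import random

import pandas as pd

# Example data shaped like the question's table (column B is a placeholder).
df = pd.DataFrame({
    "A": ["one", "one", "one", "two", "three", "three", "four", "five", "five"],
    "B": range(9),
})

# 1. Extract the unique values of column A into a list.
keys = df["A"].unique().tolist()

# 2. Shuffle the list and split it into two lists (3 keys and 2 keys here).
random.shuffle(keys)
keys1, keys2 = keys[:3], keys[3:]

# 3. Select rows based on which list their key belongs to.
df1 = df[df["A"].isin(keys1)]
df2 = df[df["A"].isin(keys2)]
```

Because the split is done on the list of keys rather than on the rows, all rows that share a value in `A` end up in the same DataFrame.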

Setup

import pandas as pd

df = pd.DataFrame({'A': {0: 'one',
    1: 'one', 
    2: 'one', 
    3: 'two', 
    4: 'three', 
    5: 'three', 
    6: 'four', 
    7: 'five', 
    8: 'five'}, 
'B': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8}})

Solution

# Get 2 unique keys from column A for df1. You can control the split either
# by an absolute number of keys or by a fraction. Check the docs for .sample().
df1_keys = df.A.drop_duplicates().sample(2)
df1 = df[df.A.isin(df1_keys)]
# Anything not in df1_keys is assigned to df2.
df2 = df[~df.A.isin(df1_keys)]

df1_keys
Out[294]:
7    five
0     one
Name: A, dtype: object

df1
Out[295]:
      A  B
0   one  0
1   one  1
2   one  2
7  five  7
8  five  8

df2
Out[296]:
       A  B
3    two  3
4  three  4
5  three  5
6   four  6
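The answer mentions that the split can also be controlled by a fraction. A minimal sketch of that variant, assuming the same example data, is shown here; the value `frac=0.6` and the seed `random_state=0` are arbitrary choices for illustration, not taken from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "one", "two", "three", "three", "four", "five", "five"],
    "B": range(9),
})

# Draw 60% of the unique keys (3 of the 5 here).
# random_state is an assumed, arbitrary seed that makes the draw repeatable.
df1_keys = df["A"].drop_duplicates().sample(frac=0.6, random_state=0)

df1 = df[df["A"].isin(df1_keys)]
# Anything not in df1_keys goes to df2.
df2 = df[~df["A"].isin(df1_keys)]
```

Note that the fraction applies to the number of unique keys, not to the number of rows, so the row counts of `df1` and `df2` depend on how often each selected key occurs.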

Source

2017-06-29 09:45:38 Allen

import numpy as np

v = df['A'].unique()            # Get the unique values
np.random.shuffle(v)            # Shuffle them in place
v1, v2 = np.array_split(v, 2)   # Split the unique values into two arrays

Finally, index your DataFrame using the .isin() method to get the desired result.

r1 = df[df['A'].isin(v1)] 
r2 = df[df['A'].isin(v2)]
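If the sizes of the two groups need to be chosen explicitly, the same idea could be wrapped in a small helper. The sketch below is one possible way to do it, not part of the answer: `split_by_unique`, its parameters `n_first` and `seed`, and the example seed value are all hypothetical names and values introduced for illustration.

```python
import numpy as np
import pandas as pd


def split_by_unique(df, column, n_first, seed=None):
    """Split df in two so that no value of `column` appears in both parts.

    Hypothetical helper (not a pandas or NumPy function): `n_first`
    randomly chosen unique values go to the first frame, the rest to
    the second.
    """
    rng = np.random.default_rng(seed)
    keys = rng.permutation(df[column].unique())  # shuffled copy of the keys
    in_first = df[column].isin(keys[:n_first])   # boolean row mask
    return df[in_first], df[~in_first]


df = pd.DataFrame({
    "A": ["one", "one", "one", "two", "three", "three", "four", "five", "five"],
    "B": range(9),
})

# 3 unique values in r1, the remaining 2 in r2; the seed is an arbitrary choice.
r1, r2 = split_by_unique(df, "A", n_first=3, seed=42)
```

Using a single boolean mask and its negation guarantees that every row lands in exactly one of the two results.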

Source

2017-06-29 09:41:12
