How to get a non-shuffled train_test_split in sklearn

If I want a random train/test split, I use the sklearn helper function:

In [1]: from sklearn.model_selection import train_test_split 
    ...: train_test_split([1,2,3,4,5,6]) 
    ...: 
Out[1]: [[1, 6, 4, 2], [5, 3]]

What is the most concise way to get a non-shuffled train/test split, i.e.

[[1,2,3,4], [5,6]]

Edit: currently I use

train, test = data[:int(len(data) * 0.75)], data[int(len(data) * 0.75):]
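A slightly more general version of the slicing approach, as a sketch. The helper name `ordered_split` is hypothetical, and the rounding rule (the test set gets the last `ceil(test_size * n)` items) is an assumption about how sklearn sizes the test set. It works on plain lists as well as arrays.

```python
import math

def ordered_split(data, test_size=0.25):
    """Split a sequence into (train, test) without shuffling.

    Hypothetical helper: the test set is the last ceil(test_size * n) items,
    the rounding rule assumed here to match sklearn's behaviour.
    """
    n = len(data)
    n_test = math.ceil(test_size * n)  # round the test size up
    n_train = n - n_test               # everything before that is training data
    return data[:n_train], data[n_train:]

train, test = ordered_split([1, 2, 3, 4, 5, 6])
print(train, test)  # [1, 2, 3, 4] [5, 6]
```

With the default `test_size=0.25` and six items, the test set gets `ceil(1.5) = 2` items, which reproduces the `[[1,2,3,4], [5,6]]` split asked for above.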

but I'm hoping for something better. I've opened an issue on sklearn: https://github.com/scikit-learn/scikit-learn/issues/8844

EDIT 2: My PR has been merged. In scikit-learn version 0.19 and later, you can pass the parameter shuffle=False to train_test_split to get a non-shuffled split.


2017-05-08 maxymoo

Use numpy.split:

import numpy as np 
data = np.array([1,2,3,4,5,6]) 

np.split(data, [4])   # modify the index here to specify where to split the array 
# [array([1, 2, 3, 4]), array([5, 6])]
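np.split also accepts several indices, which gives a three-way ordered split (for example train/validation/test). The 6/2/2 proportions below are only an illustrative choice.

```python
import numpy as np

data = np.arange(1, 11)  # [1, 2, ..., 10]

# Each index is a cut point, applied in order:
# data[0:6] -> train, data[6:8] -> validation, data[8:] -> test
train, val, test = np.split(data, [6, 8])
print(train.tolist(), val.tolist(), test.tolist())
# [1, 2, 3, 4, 5, 6] [7, 8] [9, 10]
```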

If you want to split by percentage, you can compute the split index from the shape of the data:

data = np.array([1,2,3,4,5,6]) 
p = 0.6 

idx = int(p * data.shape[0]) + 1  # since the percentage may end up to be a fractional 
             # number, modify this as you need, usually shouldn't 
             # affect much if data is large 
np.split(data, [idx]) 
# [array([1, 2, 3, 4]), array([5, 6])]
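The `+ 1` in that index adds one extra training row whenever `p * data.shape[0]` is already a whole number. A sketch of one alternative, rounding to the nearest index instead (my own choice, not part of the answer; note that Python's `round` sends exact halves to the nearest even integer):

```python
import numpy as np

data = np.arange(1, 11)  # 10 samples
p = 0.5                  # intended training fraction

idx_plus_one = int(p * data.shape[0]) + 1  # 6: one more than half
idx_rounded = round(p * data.shape[0])     # 5: exactly half

train, test = np.split(data, [idx_rounded])
print(idx_plus_one, idx_rounded)      # 6 5
print(train.tolist(), test.tolist())  # [1, 2, 3, 4, 5] [6, 7, 8, 9, 10]
```

For the answer's own example (`p = 0.6`, 6 samples), `round(0.6 * 6)` is also 4, so both versions agree there.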


2017-05-08 00:18:24 Psidom

Thanks, this looks almost like what I want, but what if I don't know the value I want to split at? That is, say I just want a 60/40 split? – maxymoo

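Hmm, yes, I was hoping to avoid something like this, but maybe it's not possible. In that case, do you think it might be clearer to just do 'data[:int(len(data) * p)], data[int(len(data) * p):]'? – maxymoo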

Yes. That definitely works. – Psidom

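I have nothing to add to Psidom's answer except an easy copy-paste function:

import numpy as np

def non_shuffling_train_test_split(X, y, test_size=0.2):
    # Last ceil(test_size * n) rows form the test set (avoids an off-by-one)
    i = X.shape[0] - int(np.ceil(test_size * X.shape[0]))
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test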
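Because X and y are cut at the same index, row k of X_test still lines up with y_test[k]. A self-contained check of that property on made-up data (the arrays and the split index below are illustrative only):

```python
import numpy as np

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features; row k is [2k, 2k+1]
y = np.arange(10) * 10            # label of sample k is 10 * k

split_at = 8                      # first 8 rows train, last 2 test
X_train, X_test = np.split(X, [split_at])  # splits along axis 0 (rows)
y_train, y_test = np.split(y, [split_at])

print(X_test.tolist(), y_test.tolist())  # [[16, 17], [18, 19]] [80, 90]
```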

Update: this functionality has since been built in, so now you can do:

from sklearn.model_selection import train_test_split 
train_test_split(X, y, test_size=0.2, shuffle=False)


2017-05-28 09:29:22 Anake

All you need to do is set the shuffle parameter to False and the stratify parameter to None:

In [49]: train_test_split([1,2,3,4,5,6], shuffle=False, stratify=None)
Out[49]: [[1, 2, 3, 4], [5, 6]]


2017-08-16 04:55:01

Hey mayank, actually 'stratify=None' is the default (see "EDIT 2" in the original question). – maxymoo
