2012-02-01 191 views
12
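  • I have a numpy matrix of shape (4601, 58).
  • I want to split the matrix randomly by rows into 60%, 20%, and 20% parts.
  • This is for a machine learning task.
  • Is there a numpy function that randomly selects rows?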

Answers

17

You can use numpy.random.shuffle, which shuffles the rows of the array in place, and then slice the result:

import numpy as np 

N = 4601 
data = np.arange(N*58).reshape(-1, 58) 
np.random.shuffle(data) 

a = data[:int(N*0.6)] 
b = data[int(N*0.6):int(N*0.8)] 
c = data[int(N*0.8):] 
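
The same 60/20/20 split can also be written without modifying data in place. This is only a sketch of one alternative, not part of the original answer: it assumes a NumPy version that provides np.random.default_rng, and the seed 0 is an arbitrary choice made for repeatability.

```python
import numpy as np

N = 4601
data = np.arange(N * 58).reshape(-1, 58)

# Assumption: seed 0 is arbitrary, used only so the result is repeatable.
rng = np.random.default_rng(0)

# permutation() returns a shuffled copy (rows permuted along axis 0);
# the original `data` array is left untouched.
shuffled = rng.permutation(data)

# Cut the shuffled rows at the 60% and 80% marks.
a, b, c = np.split(shuffled, [int(N * 0.6), int(N * 0.8)])

print(a.shape, b.shape, c.shape)
```

Because the cut points are truncated with int(), the three parts are close to, but not exactly, 60%, 20%, and 20% of the rows.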
3

If you want to select rows at random, you can simply use random.sample from the Python standard library:

import random 

population = range(4601)  # your number of rows
k = 100  # number of rows to sample (example value, choose your own)
choice = random.sample(population, k)

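random.sample samples without replacement, so you don't need to worry about duplicate rows ending up in choice. Given a numpy array called matrix, you can select the rows by indexing, like this: matrix[choice]

Of course, k can be equal to the total number of elements in the population, in which case choice contains a random ordering of all the row indices. You can then partition choice however you need.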
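
As a rough sketch of that last idea, the index list can be cut at the 60% and 80% marks and each piece used to index the array. The placeholder matrix, the seed, and the names end_a and end_b are assumptions made for illustration only.

```python
import random
import numpy as np

# Placeholder data with the shape from the question.
matrix = np.arange(4601 * 58).reshape(-1, 58)
n_rows = matrix.shape[0]

random.seed(0)  # arbitrary seed, only for repeatability

# Sampling all n_rows indices without replacement gives a random ordering.
choice = random.sample(range(n_rows), n_rows)

# Hypothetical cut points for a 60/20/20 split.
end_a = int(n_rows * 0.6)
end_b = int(n_rows * 0.8)

a = matrix[choice[:end_a]]
b = matrix[choice[end_a:end_b]]
c = matrix[choice[end_b:]]

print(a.shape, b.shape, c.shape)
```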

7

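As a complement to HYRY's answer: suppose you want to shuffle several arrays x, y, z consistently, where all of them share the same first dimension: x.shape[0] == y.shape[0] == z.shape[0] == n_samples

You can do this: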

rng = np.random.RandomState(42) # reproducible results with a fixed seed 
indices = np.arange(n_samples) 
rng.shuffle(indices) 
x_shuffled = x[indices] 
y_shuffled = y[indices] 
z_shuffled = z[indices] 

Then split each shuffled array as in HYRY's answer.
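
Put together, the shuffle and the split might look like the sketch below. The small x and y arrays and the train/val/test names are made up for illustration; the point is that the same index array is applied to every array, so row i of x stays paired with element i of y.

```python
import numpy as np

n_samples = 10
# Toy data (an assumption for this example): x[i, 0] == 3 * y[i] for every i.
x = np.arange(n_samples * 3).reshape(n_samples, 3)
y = np.arange(n_samples)

rng = np.random.RandomState(42)  # fixed seed for reproducible results
indices = rng.permutation(n_samples)

# Split the shuffled indices 60/20/20, then index every array with them.
train_idx, val_idx, test_idx = np.split(
    indices, [int(n_samples * 0.6), int(n_samples * 0.8)]
)

x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
x_test, y_test = x[test_idx], y[test_idx]

print(x_train.shape, x_val.shape, x_test.shape)
```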

1

Since you need this for machine learning, here is a method I wrote:

import numpy as np 

def split_random(matrix, percent_train=70, percent_test=15): 
    """ 
    Splits matrix data into randomly ordered sets 
    grouped by provided percentages. 

    Usage: 
    rows = 100 
    columns = 2 
    matrix = np.random.rand(rows, columns) 
    training, testing, validation = split_random(
        matrix, percent_train=80, percent_test=10)

    percent_validation 10 
    training (80, 2) 
    testing (10, 2) 
    validation (10, 2) 

    Returns: 
    - training_data: percentage_train e.g. 70% 
    - testing_data: percent_test e.g. 15% 
    - validation_data: remainder from 100% e.g. 15% 
    Created by Uki D. Lucas on Feb. 4, 2017 
    """ 

    percent_validation = 100 - percent_train - percent_test 

    if percent_validation < 0:
        print("Make sure that the sum of the training and testing "
              "percentages is less than or equal to 100%.")
        percent_validation = 0
    else:
        print("percent_validation", percent_validation)

    #print(matrix) 
    rows = matrix.shape[0] 
    np.random.shuffle(matrix) 

    end_training = int(rows*percent_train/100)  
    end_testing = end_training + int((rows * percent_test/100)) 

    training = matrix[:end_training] 
    testing = matrix[end_training:end_testing] 
    validation = matrix[end_testing:] 
    return training, testing, validation 

# TEST: 
rows = 100 
columns = 2 
matrix = np.random.rand(rows, columns) 
training, testing, validation = split_random(matrix, percent_train=80, percent_test=10) 

print("training",training.shape) 
print("testing",testing.shape) 
print("validation",validation.shape) 

print(split_random.__doc__) 
  • training (80, 2)
  • testing (10, 2)
  • validation (10, 2)
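
One thing to keep in mind with the function above: np.random.shuffle reorders its argument in place, so the caller's matrix is shuffled as a side effect. If the original row order matters, one option is to shuffle a copy instead. The snippet below is only a small illustration of that behavior, with made-up variable names.

```python
import numpy as np

matrix = np.arange(20).reshape(10, 2)
original = matrix.copy()  # kept only to compare against afterwards

# Shuffle a copy so the caller's array keeps its row order.
work = matrix.copy()
np.random.shuffle(work)  # in-place shuffle of the rows of `work`

# `matrix` still matches `original`; `work` holds the same rows, reordered.
print(np.array_equal(matrix, original))
```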