2017-08-13
1

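Dividing a long list into shorter lists in R

I have a long list of objects that I need to divide into smaller lists, each with 20 entries. The catch is that each object may appear only once in any given list.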

# Create some example data... 
# Make a list of objects. 
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance') 

# Generate a longer list, with a random sequence and number of repetitions for each entry 
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE))) 

print(LONG.LIST) 

Name 
1   Cup 
2 Distance 
3  Roof 
4  Pencil 
5  Lunch 
6  Toast 
7  Watch 
8  Bottle 
9   Car 
10  Roof 
11  Lunch 
12 Forever 
13  Cheese 
14 Oranges 
15  Ocean 
16 Chocolate 
17  Socks 
18  Leaves 
19 Oranges 
20 Distance 
21  Green 
22  Paper 
23  Red 
24  Paper 
25  Trees 
26 Chocolate 
27  Bottle 
28  Dog 
29  Wind 
30  Parrot 
etc.... 

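Using the example generated above, 'Distance' appears at positions 2 and 20, 'Lunch' at 5 and 11, and 'Oranges' at 14 and 19, so the first list without duplicates would need to extend to include 'Green', 'Paper' and 'Red'. The second list would then start with 'Paper' at position 24.

The final list will probably be incomplete, so it would be good to pad it with NAs.

It would be easiest if the output were a data frame with each list in a separate column.

I don't know where to start, so any suggestions are much appreciated. Thanks!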

Do you mean this? 'library(tidyverse); LONG.LIST %>% group_by(Name) %>% mutate(grp = row_number()) %>% group_by(grp) %>% mutate(ind = row_number()) %>% spread(grp, Name)' – akrun

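@akrun - Great, thanks! It seems to work well. If you'd like to write it up as an answer, I'll accept it. I'm not familiar with the tidyverse; could you explain in a bit more detail what is going on? The only thing I would change is to order each list alphabetically. – EcologyTom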

@EcologyTom Do you mean that each list should start from the (24n + 1)th index of LONG.LIST? –

Answer

3

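We can do this with the tidyverse. Group by 'Name' and create a sequence-number column ('grp') that numbers each occurrence of a name. Use 'grp' in a second group_by to create a new sequence column 'ind', then convert to 'wide' format with spread and order each column alphabetically.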

library(tidyverse) 
LONG.LIST %>% 
    group_by(Name) %>% 
    mutate(grp = row_number()) %>% 
    group_by(grp) %>% 
    mutate(ind = row_number()) %>% 
    spread(grp, Name) %>% 
    mutate_at(vars(-one_of("ind")), funs(.[order(as.character(.))])) 
# A tibble: 40 x 12 
#  ind  `1`  `2`  `3`  `4`  `5`  `6`  `7`  `8`  `9`  `10`  `11` 
# <int> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> 
# 1  1  Bag  Bag  Bag  Bag  Bag  Bag  Bag  Bag  Cup Distance Distance 
# 2  2  Blue  Blue  Book  Book  Book Cloud  Cup  Cup Distance Train  NA 
# 3  3  Book  Book Bottle Cloud Cloud  Cup Distance Distance Train  NA  NA 
# 4  4 Bottle Bottle Cheese  Cup  Cup Distance  Dog Hammock  NA  NA  NA 
# 5  5  Car  Car Cloud Distance Distance  Dog Hammock  Moon  NA  NA  NA 
# 6  6 Cheese Cheese  Cup  Dog  Dog Hammock  Moon Parrot  NA  NA  NA 
# 7  7 Chocolate Chocolate Distance  Fork Hammock Horse Paper Train  NA  NA  NA 
# 8  8  Cloud  Cloud  Dog Hammock Horse  Moon Parrot  NA  NA  NA  NA 
# 9  9  Cup  Cup  Fork Hippo  Mint Paper Train  NA  NA  NA  NA 
#10 10 Distance Distance Green Horse  Moon Parrot  NA  NA  NA  NA  NA 
# ... with 30 more rows 
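
For readers without the tidyverse installed, the same idea can be sketched in base R. This is only an illustration of the grouping logic described above, not the code from the answer: the helper name `split_by_occurrence` and the `list_` column prefix are made up for the example, and it assumes a non-empty character (or factor) vector as input.

```r
# Hypothetical helper (not from the original answer): base R only.
# Puts the k-th occurrence of each value into list k, sorts each list
# alphabetically, and pads the shorter lists with NA.
split_by_occurrence <- function(x) {
  x <- as.character(x)
  # Occurrence counter per value: 1 for the first time a value is seen,
  # 2 for the second time, and so on.
  occ <- ave(seq_along(x), x, FUN = seq_along)
  # One group per occurrence number; a value cannot repeat inside a group.
  groups <- lapply(split(x, occ), sort)
  # Pad every group to the length of the longest one.
  n_rows <- max(lengths(groups))
  padded <- lapply(groups, function(g) {
    c(g, rep(NA_character_, n_rows - length(g)))
  })
  out <- as.data.frame(padded, stringsAsFactors = FALSE)
  names(out) <- paste0("list_", names(groups))
  out
}

# Small deterministic example; with the question's data the call would be
# split_by_occurrence(LONG.LIST$Name).
demo <- split_by_occurrence(c("b", "a", "b", "c", "a", "b"))
print(demo)
```

Like the pipeline above, this guarantees that no name repeats within a column, but it does not cap a list at 20 entries.
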

Thanks @akrun. I get an error though: Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "c('col_list', 'lazy_dots')" – EcologyTom

@EcologyTom I'm using 'tidyr_0.6.3' and 'dplyr_0.7.2' on 'R 3.4.1'. Could you please check your versions? – akrun

@EcologyTom It's not clear whether this is due to the version difference. Could you try '%>% spread(grp, Name) %>% as.data.frame() %>% mutate_at(vars(-one_of("ind")), funs(.[order(as.character(.))]))' – akrun
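
A note for anyone running this on newer package versions: in later releases funs() is deprecated (dplyr 0.8.0) and spread() and mutate_at() are superseded by pivot_wider() and across(). The following is a rough equivalent of the same pipeline, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0. It is a sketch rather than a confirmed fix for the error reported above, and the small input data frame is invented for the example.

```r
library(dplyr)
library(tidyr)

# Illustrative input (an assumption); the question's data frame LONG.LIST
# has the same single 'Name' column.
LONG.LIST <- data.frame(Name = c("b", "a", "b", "c", "a", "b"),
                        stringsAsFactors = FALSE)

wide <- LONG.LIST %>%
  group_by(Name) %>%
  mutate(grp = row_number()) %>%   # k-th occurrence of each name
  group_by(grp) %>%
  mutate(ind = row_number()) %>%   # row position inside list k
  ungroup() %>%
  pivot_wider(names_from = grp, values_from = Name) %>%
  # Sort each list column alphabetically, keeping the NA padding at the end.
  mutate(across(-ind, ~ sort(as.character(.x), na.last = TRUE)))

print(wide)
```
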