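Combining rows in R

I am working with a very long data frame in R and have run into a problem. My data frame is actually made up of two smaller data frames. I then converted the timing from months to years so that both share a common timeline.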

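However, the problem I now have is that I sometimes get two rows with the same time value (one row per questionnaire), whereas I want only one row per time value for each respondent. (I have attached an image of the problem, which may be more insightful than my explanation.) Note that at this point I still want the data frame in long format; I just want to get rid of the "extra" rows.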

Can anyone tell me how to do this?

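Here is the head of the data (via dput), where nomem_encr = ID, timeline.compressed = time, sel01 to sel04 = items from the first questionnaire, and close_num and gener_sat = items from the second questionnaire.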

structure(list(nomem_encr = c(800009L, 800009L, 800009L, 800012L, 
800015L, 800015L), timeline.compressed = c(79, 79, 95, 79, 28, 
28), sel01 = c(NA, 6L, NA, NA, NA, 7L), sel02 = c(NA, 6L, NA, 
NA, NA, 7L), sel03 = c(NA, 3L, NA, NA, NA, 5L), sel04 = c(NA, 
6L, NA, NA, NA, 6L), close_num = c(1, NA, 0.2, 1, 0.8, NA), gener_sat = c(7L, 
NA, 7L, 8L, 7L, NA)), .Names = c("nomem_encr", "timeline.compressed", 
"sel01", "sel02", "sel03", "sel04", "close_num", "gener_sat"), class = "data.frame", row.names = c(NA, 
6L))

https://i.stack.imgur.com/3p038.png

2017-10-16 Elisabeth

You could also provide sample data: use 'head' to create a subset and 'dput' to show us how to reproduce it. – Olivia

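In reply to your first comment: I'm afraid I don't fully understand your point. As far as I can tell, in each row either the X variables or the Y variables are answered. However, sometimes two rows have the same time value, i.e., the X and Y variables were answered at the same time. What I want is to combine those rows into a single row in which both the X and the Y variables are filled in. – Elisabeth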

How do we know which rows you need to trim? – jaySf

Using the reshape2 and dplyr packages

Load the libraries and the data:

library(reshape2) 
library(dplyr) 

x <- structure(
  list(
    nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
    timeline.compressed = c(79, 79, 95, 79, 28, 28),
    sel01 = c(NA, 6L, NA, NA, NA, 7L),
    sel02 = c(NA, 6L, NA, NA, NA, 7L),
    sel03 = c(NA, 3L, NA, NA, NA, 5L),
    sel04 = c(NA, 6L, NA, NA, NA, 6L),
    close_num = c(1, NA, 0.2, 1, 0.8, NA),
    gener_sat = c(7L, NA, 7L, 8L, 7L, NA)
  ),
  .Names = c(
    "nomem_encr", "timeline.compressed",
    "sel01", "sel02", "sel03", "sel04", "close_num", "gener_sat"
  ),
  class = "data.frame",
  row.names = c(NA, 6L)
)
x

This is what your data looks like:

nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat 
1  800009     79 NA NA NA NA  1.0   7 
2  800009     79  6  6  3  6  NA  NA 
3  800009     95 NA NA NA NA  0.2   7 
4  800012     79 NA NA NA NA  1.0   8 
5  800015     28 NA NA NA NA  0.8   7 
6  800015     28  7  7  5  6  NA  NA

Now we melt the data into long format:

melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>% 
head(15)

Output:

nomem_encr timeline.compressed variable value 
1  800009     79 sel01 NA 
2  800009     79 sel01  6 
3  800009     95 sel01 NA 
4  800012     79 sel01 NA 
5  800015     28 sel01 NA 
6  800015     28 sel01  7 
7  800009     79 sel02 NA 
8  800009     79 sel02  6 
9  800009     95 sel02 NA 
10  800012     79 sel02 NA 
11  800015     28 sel02 NA 
12  800015     28 sel02  7 
13  800009     79 sel03 NA 
14  800009     79 sel03  3 
15  800009     95 sel03 NA

If we cast the melted data frame, the default behavior is to count how many entries there are for each item:

melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>% 
    dcast(
    formula = nomem_encr + timeline.compressed ~ variable 
)

Output:

Aggregation function missing: defaulting to length 
    nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat 
1  800009     79  2  2  2  2   2   2 
2  800009     95  1  1  1  1   1   1 
3  800012     79  1  1  1  1   1   1 
4  800015     28  2  2  2  2   2   2

There are 2 entries for the item identified by 800009 and 79 (using nomem_encr and timeline.compressed as the identifying variables).
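To see which (ID, time) pairs produce those counts before any reshaping, a base-R check along these lines may help. It rebuilds the sample data with data.frame() instead of the dput output; `key`, `dup_rows` and `n_per_key` are illustrative names, not part of the original answer.

```r
# Sample data from the question, rebuilt with data.frame()
x <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28),
  sel01 = c(NA, 6L, NA, NA, NA, 7L),
  sel02 = c(NA, 6L, NA, NA, NA, 7L),
  sel03 = c(NA, 3L, NA, NA, NA, 5L),
  sel04 = c(NA, 6L, NA, NA, NA, 6L),
  close_num = c(1, NA, 0.2, 1, 0.8, NA),
  gener_sat = c(7L, NA, 7L, 8L, 7L, NA)
)

# The identifying variables used on the left of the dcast formula
key <- x[c("nomem_encr", "timeline.compressed")]

# TRUE for every row whose (ID, time) pair occurs more than once,
# i.e. the rows that need to be merged
dup_rows <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(x[dup_rows, ])

# Number of rows per (ID, time) pair; these are the counts dcast
# reports when it defaults to length
n_per_key <- table(paste(x$nomem_encr, x$timeline.compressed))
print(n_per_key)
```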

We can change the default behavior to something else, such as sum:

melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>% 
    dcast(
    formula = nomem_encr + timeline.compressed ~ variable, 
    fun.aggregate = function(xs) sum(xs, na.rm = TRUE) 
)

Output:

nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat 
1  800009     79  6  6  3  6  1.0   7 
2  800009     95  0  0  0  0  0.2   7 
3  800012     79  0  0  0  0  1.0   8 
4  800015     28  7  7  5  6  0.8   7
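With sum, a cell that contained only NA comes out as 0 (for example sel01 for 800009 / 95), because sum(c(NA, NA), na.rm = TRUE) is 0. If you would rather keep such cells missing, one option is an aggregation function that returns NA when every value is missing. `sum_or_na` is a hypothetical helper, sketched under the assumption that each cell holds at most one non-missing answer; it could then be passed as `fun.aggregate = sum_or_na` in the dcast() call.

```r
# Hypothetical helper: sum the non-missing values, but return NA instead
# of 0 when all values in the cell are missing
sum_or_na <- function(xs) {
  if (all(is.na(xs))) NA_real_ else sum(xs, na.rm = TRUE)
}

sum_or_na(c(NA, 6))           # one answered value: 6
sum_or_na(c(NA, NA))          # nothing answered: NA rather than 0
sum(c(NA, NA), na.rm = TRUE)  # for comparison, the plain sum gives 0
```

According to the reshape2 documentation, dcast's default fill for missing combinations is fun.aggregate applied to a zero-length vector, which for this helper would also be NA.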

2017-10-16 14:39:53

This seems to work. Thank you very much! – Elisabeth

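Update: I just noticed that when I use this code, it returns zeros, ones and occasionally twos for my data instead of the actual values. I copied and pasted your syntax and applied it to the whole dataset. Any idea what might be going wrong? I also get this error: Aggregation function missing: defaulting to length – Elisabeth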

structure(list(nomem_encr = c(800009L, 800009L, 800012L, 800015L, 800015L, 800015L), timeline.compressed = c(79, 95, 79, 28, 40, 52), sel01 = c(1L, 0L, 0L, 1L, 1L, 0L), sel02 = c(1L, 0L, 0L, 1L, 1L, 0L), sel03 = c(1L, 0L, 0L, 1L, 1L, 0L), close_num = c(1L, 1L, 1L, 1L, 1L, 1L), gener_sat = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("nomem_encr", "timeline.compressed", "sel01", "sel02", "sel03", "close_num", "gener_sat"), class = "data.frame", row.names = c(NA, 6L)) – Elisabeth
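"Aggregation function missing: defaulting to length" is the same message shown in the length example of the first answer: it suggests dcast() was called without fun.aggregate, so every cell holds a row count, which would explain the 0/1/2 values. As an alternative that skips melting and casting entirely, here is a base-R sketch using aggregate(). It assumes the two questionnaires never both answer the same variable at the same (ID, time) pair; `first_non_na` is a hypothetical helper that keeps the first non-missing value in each group (the sort-based dplyr answer keeps the smallest instead).

```r
# Sample data from the question
x <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28),
  sel01 = c(NA, 6L, NA, NA, NA, 7L),
  sel02 = c(NA, 6L, NA, NA, NA, 7L),
  sel03 = c(NA, 3L, NA, NA, NA, 5L),
  sel04 = c(NA, 6L, NA, NA, NA, 6L),
  close_num = c(1, NA, 0.2, 1, 0.8, NA),
  gener_sat = c(7L, NA, 7L, 8L, 7L, NA)
)

# Hypothetical helper: first non-missing value in a group, or NA if none
first_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[1] else NA
}

# One row per (ID, time) pair. na.action = na.pass keeps rows containing NA;
# the default na.omit would drop every row here, since each row has some NA.
res <- aggregate(
  . ~ nomem_encr + timeline.compressed,
  data = x,
  FUN = first_non_na,
  na.action = na.pass
)

# Sort by ID, then time, for readability
res <- res[order(res$nomem_encr, res$timeline.compressed), ]
print(res)
```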

You can do this with dplyr + tidyr:

library(dplyr) 
library(tidyr) 

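df %>% 
    group_by(nomem_encr, timeline.compressed) %>% 
    # sort() drops NAs, so sort(.)[1] is the smallest non-missing value per
    # group (NA if there is none); the ~ lambda replaces the deprecated funs()
    summarize_all(~ sort(.)[1])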

Result:

# A tibble: 4 x 8 
# Groups: nomem_encr [?] 
    nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat 
     <int>    <dbl> <int> <int> <int> <int>  <dbl>  <int> 
1  800009     79  6  6  3  6  1.0   7 
2  800009     95 NA NA NA NA  0.2   7 
3  800012     79 NA NA NA NA  1.0   8 
4  800015     28  7  7  5  6  0.8   7
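The summarize step relies on how sort() treats missing values: by default sort() removes NA (na.last = NA), so sort(.)[1] returns the smallest non-missing value in each group, and NA when the group has no answers at all. A quick base-R check; `first_sorted` is just an illustrative wrapper around the expression used above.

```r
# Same expression as inside summarize_all(), wrapped for illustration
first_sorted <- function(v) sort(v)[1]

first_sorted(c(NA, 6L))  # 6: the NA row is dropped by sort()
first_sorted(c(NA, NA))  # NA: sort() returns an empty vector, and [1] of it is NA
first_sorted(c(5L, 3L))  # 3: if both rows had a value, the smaller one is kept
```

The last case only matters if both questionnaires ever report the same variable at the same time point; in that situation this approach silently keeps the minimum.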

If you want to replace the NAs with zeros, you can do the following:

df %>% 
    group_by(nomem_encr, timeline.compressed) %>% 
    summarize_all(~ sort(.)[1]) %>% 
    # replace NA with 0 in every non-grouping column
    mutate_all(~ replace(., is.na(.), 0))

Result:

# A tibble: 4 x 8 
# Groups: nomem_encr [3] 
    nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat 
     <int>    <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> 
1  800009     79  6  6  3  6  1.0   7 
2  800009     95  0  0  0  0  0.2   7 
3  800012     79  0  0  0  0  1.0   8 
4  800015     28  7  7  5  6  0.8   7
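The mutate_all() step corresponds to the common base-R idiom of assigning into is.na(). The sketch below uses a small subset of the collapsed result (sel01 and close_num only). As an assumption about what you might need later, it first records whether the first questionnaire was answered, because after the replacement a filled-in 0 can no longer be told apart from a genuine answer of 0; `answered_q1` is a hypothetical column name.

```r
# Subset of the collapsed result: one row per (ID, time) pair
collapsed <- data.frame(
  nomem_encr = c(800009L, 800009L, 800012L, 800015L),
  timeline.compressed = c(79, 95, 79, 28),
  sel01 = c(6L, NA, NA, 7L),
  close_num = c(1, 0.2, 1, 0.8)
)

# Hypothetical flag: remember whether the first questionnaire was answered
collapsed$answered_q1 <- !is.na(collapsed$sel01)

# Base-R equivalent of mutate_all(~ replace(., is.na(.), 0))
collapsed[is.na(collapsed)] <- 0
print(collapsed)
```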

Data:

df = structure(list(nomem_encr = c(800009L, 800009L, 800009L, 800012L, 
800015L, 800015L), timeline.compressed = c(79, 79, 95, 79, 28, 
28), sel01 = c(NA, 6L, NA, NA, NA, 7L), sel02 = c(NA, 6L, NA, 
NA, NA, 7L), sel03 = c(NA, 3L, NA, NA, NA, 5L), sel04 = c(NA, 
6L, NA, NA, NA, 6L), close_num = c(1, NA, 0.2, 1, 0.8, NA), gener_sat = c(7L, 
NA, 7L, 8L, 7L, NA)), .Names = c("nomem_encr", "timeline.compressed", 
"sel01", "sel02", "sel03", "sel04", "close_num", "gener_sat"), class = "data.frame", row.names = c(NA, 
6L))

2017-10-16 19:48:08 useR
