Using R - condensing multiple columns into a new column without repeating content

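I am a botanist and a beginner R user. I wonder if you could help me find a solution for a script I am writing. I have been using R to streamline the process of creating text from spreadsheets. For that I use the MonographaR package, and it works well for me. The problem is in handling the data.frame. My spreadsheet (a CSV file) basically consists of species columns, character rows, and the cells where they intersect. I would like a final script that lets me merge two or more columns into a new column of the original spreadsheet. When the cells have different content, the new cell must hold each distinct value separated by comma + space (", "). When the cells have the same content, the new cell must hold that content only once, without repeating it. The scripts I tried to write with concatenation, cbind, etc. repeated the cell contents, which is not what I want.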

My initial CSV looks like this:

        cattleya.minor cattleya.maxima cattleya.pumila
colour  red            red             red
surface sharp          smooth          sharp
leaves  1              3               4

I would like the final result to look like this:

        cattleya      cattleya.minor cattleya.maxima cattleya.pumila
colour  red           red            red             red
surface sharp, smooth sharp          smooth          sharp
leaves  1, 3, 4       1              3               4

Thank you very much indeed.

Source

2016-08-12 T. M.

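Your data isn't [tidy](http://vita.had.co.nz/papers/tidy-data.pdf), because you've got different types of data (strings, integers) within the same column. It would be better to transform the data so that each column is a variable and each row is an observation. – alistaire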

As @alistaire commented, things will be easier if you start with "tidy" data.

# Starting data (which I've called "dat") 
dat

 cattleya.minor cattleya.maxima cattleya.pumila 
colour    red    red    red 
surface   sharp   smooth   sharp 
leaves    1    3    4

library(reshape2) 
library(tibble) 
library(dplyr) 

# Make data tidy 
dat.tidy = dat %>% 
    rownames_to_column(var="Characteristic") %>%    # Turn rownames into a data column 
    melt(id.var="Characteristic", variable.name="Species") %>% # Reshape to "long" format 
    dcast(Species ~ Characteristic)        # Cast back to wide so that each characteristic gets its own column 

dat.tidy

  Species colour leaves surface 
1 cattleya.minor red  1 sharp 
2 cattleya.maxima red  3 smooth 
3 cattleya.pumila red  4 sharp

# Summarize by genus 
dat.tidy %>% 
    group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>%  # Collapse to genus (remove species designation) 
    summarise_all(funs(paste(unique(.), collapse=", "))) %>% # For each characteristic, paste together the unique values for a given genus 
    select(-Species)

 Genus colour leaves  surface 
1 cattleya red 1, 3, 4 sharp, smooth
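
The question asks for the collapsed values as a new first column next to the original species columns. A minimal base R sketch of that layout follows, assuming the data are read in as character strings. The helper name `collapse_unique` is hypothetical and is not part of the answer above or of any package.

```r
# Rebuild the example data from the question as character columns
dat <- data.frame(
  cattleya.minor  = c("red", "sharp", "1"),
  cattleya.maxima = c("red", "smooth", "3"),
  cattleya.pumila = c("red", "sharp", "4"),
  row.names = c("colour", "surface", "leaves"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, keep every distinct value once
# (in order of first appearance) and join the values with ", "
collapse_unique <- function(df, sep = ", ") {
  apply(df, 1, function(x) paste(unique(trimws(x)), collapse = sep))
}

# New genus column first, followed by the original species columns
result <- data.frame(
  cattleya = collapse_unique(dat),
  dat,
  stringsAsFactors = FALSE
)

result
```

Because `unique()` keeps values in order of first appearance, the surface row comes out as "sharp, smooth" rather than alphabetically sorted. To collapse only some of the columns, pass a subset such as `dat[, c("cattleya.minor", "cattleya.pumila")]` to the helper.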

Source

2016-08-12 02:32:20 eipi10

Thanks @alistaire & @eipi10!

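eipi10, I am glad to be getting close to my goal. I ran the script exactly as you suggested, with the same dataset. It works well, but there seems to be a small problem in the last command block, on the line select(-Species). Would you check it? R returns the following: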

> dat <- read.csv("dat.csv") 
> dat 
     cattleya.minor cattleya.maxima cattleya.pumila 
color    red    red    red 
surface   sharp   smooth   sharp 
leaves    1    3    4 
> 
> # Make data tidy 
> dat.tidy = dat %>% 
+ rownames_to_column(var="Characteristic") %>%    # Turn  rownames into a data column 
+ melt(id.var="Characteristic", variable.name="Species") %>% # Reshape to "long" format 
+ dcast(Species ~ Characteristic)        # Cast back to wide so that each characteristic gets its own column 
Warning message: 
attributes are not identical across measure variables; they will be dropped 
> 
> dat.tidy 
      Species color leaves surface 
1 cattleya.minor red  1 sharp 
2 cattleya.maxima red  3 smooth 
3 cattleya.pumila red  4 sharp 
> 
> # Summarize by genus 
> dat.tidy %>% 
+ group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>% # Collapse to genus (remove species designation) 
+ summarise_all(funs(paste(unique(.), collapse=", "))) # For each characteristic, paste together the unique values for a given genus 
# A tibble: 1 x 5 
    Genus           Species color leaves   surface 
    <chr>           <chr> <chr> <chr>   <chr> 
1 cattleya cattleya.minor, cattleya.maxima, cattleya.pumila red 1, 3, 4 sharp, smooth 
> select(-Species) 
Error in select_(.data, .dots = lazyeval::lazy_dots(...)) : 
    objeto 'Species' não encontrado (my free translation: object 'Species' not found) 
>

Source

2016-08-12 16:56:35

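That's because, while editing my answer, I accidentally deleted the '%>%' at the end of the line before select(-Species). Sorry about that. I've fixed it now. Without the '%>%' at the end of the previous line, R treats 'select(-Species)' as a separate statement, which causes the error. 'select(-Species)' just removes the 'Species' column, so if you want to keep the 'Species' column in the summary output, you can delete that line. – eipi10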
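
For readers on current package versions, the same pipeline can be sketched as follows. This is an assumption-laden variant and not the code from the answer: it swaps reshape2's `melt()`/`dcast()` for tidyr's `pivot_longer()`/`pivot_wider()`, and swaps `summarise_all(funs(...))` for `summarise(across(...))`. The object name `genus_summary` and the column name `Value` are made up for illustration. Note that every step except the last ends with `%>%`, which is what keeps `select(-Species)` inside the chain.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question, stored as character columns
dat <- data.frame(
  cattleya.minor  = c("red", "sharp", "1"),
  cattleya.maxima = c("red", "smooth", "3"),
  cattleya.pumila = c("red", "sharp", "4"),
  row.names = c("colour", "surface", "leaves"),
  stringsAsFactors = FALSE
)

genus_summary <- dat %>%
  rownames_to_column(var = "Characteristic") %>%                 # row names become a column
  pivot_longer(-Characteristic,
               names_to = "Species", values_to = "Value") %>%    # long format
  pivot_wider(names_from = Characteristic,
              values_from = Value) %>%                           # one column per characteristic
  group_by(Genus = gsub("(.*)\\..*", "\\1", Species)) %>%        # drop the species epithet
  summarise(across(everything(),
                   ~ paste(unique(.x), collapse = ", "))) %>%    # the pipe must end this line
  select(-Species)                                               # last step, no trailing pipe

genus_summary
```

If the `%>%` after `summarise(...)` is left out, the statement ends there and `select(-Species)` runs on its own, with no data frame passed to it.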

Fantastic solution! Thank you very much. –
