2014-10-06

I have a list of data frames, each with a "Name" column and a "Sample" column. How can I merge the list of data frames on a single column?

df1: 

Name Sample1 
A 23 
B 445 
C 456 

df2: 

Name Sample2 
A 45 
B 984 
C 374 

How can I merge all the data frames in my list so that they end up like this:

merged: 

Name Sample1 Sample2 
A 23 45 
B 445 984 
C 456 374 

I have tried the answers to similar questions on SO, but none of them produced the expected result. For example: merged.data.frame = Reduce(function(...) merge(..., all=T), list.of.data.frames)

Edit:

My original script is listed below:

# List all files in current working directory 
fs <- list.files() 

# Load the data from each file into a list of data frames 
dfs <- lapply(fs, read.table, header=TRUE, sep="\t") 

# Select only the Name and Concentration columns from the list of data frames 
dfs <- lapply(dfs, subset, select=c(Name, Concentration)) 

# Sort each dataframe alphabetically by the Name column 
dfs <- lapply(dfs, function(df){df[order(df$Name),]}) 

# Rename each Concentration heading with the basename of the filename where the data originates 
for (i in 1:length(dfs)){colnames(dfs[[i]])[2] <- substr(fs[i], 1, nchar(fs[i]) - 4)} 

# Merge all the dataframes together by the Name column 

# Write merged dataframe out to a tab-delimited file 
write.table(dfs, ".", sep="\t") 
Try 'join_all' from 'library(plyr)'. Also, 'Reduce(function(...) merge(..., all = TRUE, by = "Name"), list(df1, df2))' gives the expected output based on the 2 datasets shown. – akrun 2014-10-06 16:44:12

Answers

1

This works for me. Is it an accurate reproduction of the problem you are trying to solve?

> X <- replicate(20, data.frame(name=letters, r=runif(26)), simplify=FALSE) 
> for(i in 1:20) names(X[[i]])[2] <- paste0("Sample", i) 
> M <- Reduce(function(x,y)merge(x,y,by="name"), X) 
> str(M) 
'data.frame': 26 obs. of 21 variables: 
$ name : Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ... 
$ Sample1 : num 0.17 0.957 0.443 0.181 0.113 ... 
$ Sample2 : num 0.8983 0.1802 0.7817 0.0818 0.7741 ... 
$ Sample3 : num 0.7473 0.6888 0.0195 0.9815 0.9674 ... 
$ Sample4 : num 0.557 0.331 0.902 0.177 0.504 ... 
$ Sample5 : num 0.0784 0.1561 0.5524 0.2631 0.2082 ... 
$ Sample6 : num 0.7455 0.5604 0.7232 0.5651 0.0727 ... 
$ Sample7 : num 0.721 0.807 0.902 0.965 0.41 ... 
$ Sample8 : num 0.209 0.17 0.207 0.303 0.258 ... 
$ Sample9 : num 0.736 0.566 0.125 0.417 0.521 ... 
$ Sample10: num 0.639 0.778 0.499 0.57 0.934 ... 
$ Sample11: num 0.0104 0.1629 0.4513 0.4821 0.383 ... 
$ Sample12: num 0.2 0.95563 0.39992 0.00256 0.69283 ... 
$ Sample13: num 0.466 0.735 0.857 0.695 0.673 ... 
$ Sample14: num 0.562 0.873 0.269 0.151 0.628 ... 
$ Sample15: num 0.809 0.75 0.414 0.644 0.953 ... 
$ Sample16: num 0.7729 0.0129 0.5654 0.5705 0.7514 ... 
$ Sample17: num 0.239 0.454 0.538 0.596 0.743 ... 
$ Sample18: num 0.29 0.1 0.806 0.66 0.668 ... 
$ Sample19: num 0.461 0.739 0.474 0.64 0.418 ... 
$ Sample20: num 0.631 0.369 0.913 0.655 0.641 ... 
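Your example works perfectly, so there must be something wrong with my data preprocessing. That is odd, because right up to the step where I need to merge the data frames, my data looks just like the dummy data you created. – jma1991 2014-10-06 18:35:46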

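One thing to watch out for is whether the "Name" column is being set up properly as a factor; try 'stringsAsFactors = FALSE' for 'read.table'. Also, if the set of names differs between files, you may need 'all = TRUE' for merge, otherwise you will only get the rows that appear in *all* of them. – 2014-10-06 18:49:26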
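To illustrate the point about 'all = TRUE', here is a minimal sketch. The data are made up for the example (the second data frame has a name "D" that the first one lacks), and the variable names 'inner' and 'full' are only illustrative.

```r
# Hypothetical data: the two data frames share only some names.
df1 <- data.frame(Name = c("A", "B", "C"), Sample1 = c(23, 445, 456),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("B", "C", "D"), Sample2 = c(984, 374, 12),
                  stringsAsFactors = FALSE)
dfs <- list(df1, df2)

# Default merge keeps only the names present in every data frame.
inner <- Reduce(function(x, y) merge(x, y, by = "Name"), dfs)

# all = TRUE keeps every name and fills the missing values with NA.
full <- Reduce(function(x, y) merge(x, y, by = "Name", all = TRUE), dfs)

print(inner)
print(full)
```

With these inputs, 'inner' should contain only B and C, while 'full' should contain A through D with NA where a sample is missing.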

2

It should work (as shown below):

dats <- list(df1=data.frame(Name=c("A", "B", "C"), Sample1=c(23, 445, 456), stringsAsFactors=FALSE), 
      df2=data.frame(Name=c("A", "B", "C"), Sample2=c(45, 984, 374), stringsAsFactors=FALSE), 
      df3=data.frame(Name=c("A", "B", "C"), Sample3=c(66, 111, 2), stringsAsFactors=FALSE)) 

dats 

## $df1 
## Name Sample1 
## 1 A  23 
## 2 B  445 
## 3 C  456 
## 
## $df2 
## Name Sample2 
## 1 A  45 
## 2 B  984 
## 3 C  374 
## 
## $df3 
## Name Sample3 
## 1 A  66 
## 2 B  111 
## 3 C  2 

# with by="Name" 
merged.data.frame <- Reduce(function(...) merge(..., by="Name", all=TRUE), dats) 

merged.data.frame 

## Name Sample1 Sample2 Sample3 
## 1 A  23  45  66 
## 2 B  445  984  111 
## 3 C  456  374  2 

# without by="Name" (same result) 
merged.data.frame <- Reduce(function(...) merge(..., all=TRUE), dats) 

## Name Sample1 Sample2 Sample3 
## 1 A  23  45  66 
## 2 B  445  984  111 
## 3 C  456  374  2 
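A note on why the two calls agree here: when 'by' is omitted, merge() joins on every column name the two data frames have in common, and in this example "Name" is the only shared column. The sketch below uses made-up data in which the value column has not been renamed (as would happen if the "Concentration" renaming step in the question were skipped), to show how the two calls can then differ.

```r
# Hypothetical data: the value column has the same name in both data frames.
a <- data.frame(Name = c("A", "B"), Concentration = c(1, 2),
                stringsAsFactors = FALSE)
b <- data.frame(Name = c("A", "B"), Concentration = c(1, 5),
                stringsAsFactors = FALSE)

# Without `by`, merge() joins on all shared columns: Name AND Concentration.
no_by <- merge(a, b, all = TRUE)

# With by = "Name", the clashing value columns are kept side by side
# and get the default .x / .y suffixes.
with_by <- merge(a, b, by = "Name", all = TRUE)

print(no_by)
print(with_by)
```

So passing by = "Name" explicitly is the safer habit whenever the data frames might share more than the key column.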
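I just tried your command, and it does join them together, but there are thousands of duplicated rows in the data frame. I have attached my full script to my original question, in case it is some kind of type error I am running into? – jma1991 2014-10-06 18:16:20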
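The input files are not shown, so the following is only a guess at one possible cause: if a value in "Name" occurs more than once within a data frame, merge() returns every pairing of the matching rows, and the row count grows with each additional merge. The data below are invented to show the effect, and keeping the first row per name is just one option, appropriate only if the repeats really are redundant.

```r
# Hypothetical data: "A" appears twice in each data frame.
df1 <- data.frame(Name = c("A", "A", "B"), Sample1 = c(1, 2, 3),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("A", "A", "B"), Sample2 = c(10, 20, 30),
                  stringsAsFactors = FALSE)
dfs <- list(df1, df2)

# Every "A" row in df1 is paired with every "A" row in df2.
merged <- Reduce(function(x, y) merge(x, y, by = "Name", all = TRUE), dfs)
print(nrow(merged))

# Check each data frame for repeated keys before merging.
has_dups <- sapply(dfs, function(df) anyDuplicated(df$Name) > 0)
print(has_dups)

# One option (an assumption about what is wanted): keep the first row per name.
dedup <- lapply(dfs, function(df) df[!duplicated(df$Name), ])
clean <- Reduce(function(x, y) merge(x, y, by = "Name", all = TRUE), dedup)
print(clean)
```

Running the 'anyDuplicated' check on the real list of data frames would confirm or rule out this explanation before changing anything else in the script.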
