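I'm working with 9 files that hold different data (the proteins measured in each tissue). Each file represents a different tissue and contains protein expression values (as numbers). I'm trying to combine the data into one data.frame. I followed the approaches for merging several data frames with different column lengths and for manipulating columns. I read every file with
read.delim("fileName.txt")
After that I put all the data frames into a list:
l <- list(data.frame1,..etc)
and then I used the plyr library with do.call(rbind.fill, l).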
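A minimal sketch of the combining step, assuming each table should keep a record of which tissue it came from. The toy tables, their values, and the `source_tissue` column name are hypothetical stand-ins, not the real files. It uses base `rbind()`, which only works when all tables share the same columns; `rbind.fill()` from plyr is the variant that fills missing columns with `NA`.

```r
# Toy stand-ins for two of the nine per-tissue tables (hypothetical values).
tables <- list(
  Lung  = data.frame(protein = c("ZYX Zyxin", "ZW10"),
                     expression_value = c(3.2, 1.1),
                     stringsAsFactors = FALSE),
  Blood = data.frame(protein = c("ZYX Zyxin", "ZP2"),
                     expression_value = c(2.4, 0.7),
                     stringsAsFactors = FALSE)
)

# Record the originating list element (tissue) on every row
# before stacking, so rows stay traceable afterwards.
tagged <- Map(function(df, tissue) {
  df$source_tissue <- tissue
  df
}, tables, names(tables))

# Stack the tagged tables into one data.frame.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

Naming the list elements by tissue is the design choice here: the names travel with the data instead of depending on the position of each table in the list.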
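My question:
1) I want to iterate over the list of 9 data.frames, find the unique entries in them, and plot them in a histogram. If I find several entries with the same name but from different tissues, each one should be added to the histogram above the correct tissue label. That is: I go to the first data.frame in the list, take its first entry, search whether that entry is found in the other data.frames, and if it is, add it to the histogram.
The histogram has the 9 tissues on the x axis, and the y axis shows the values from my files. I don't know how to make the histogram (and the code) change the names correctly, or how to show the bars in the right positions.
I also don't know how to set up the axis so that the tissue name appears under each bar.
Here is some basic code that does not do what I want: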
i <- 1
for (val in list2[1:9]) {
  # pseudocode: if (val appears in one of the other data.frames)
  #   plot a bar over the correct tissue.
  hist(val[i, 8], breaks = 11, col = "blue", density = 13, angle = 45,
       labels = c("Lung", "ErythroleukemicCellLine", "TCells", "Blood", "liver",
                  "BLimpho", "pancreas", "prostate", "Bladder"),
       main = fileName[i, 1])
  dev.new()  # each hist in a new window
  i <- i + 1
}
Thanks, yigeal
Here are the last few lines of the dput output for one of the files, after reading it with read.delim("nameOfFile.txt"):
dput(BloodErythroleukemicCellLineFile)
"Tax_Id=9606 Gene_Symbol=ZNF589 Uncharacterized protein",
"Tax_Id=9606 Gene_Symbol=ZNF598 Isoform 1 of Zinc finger protein 598",
"Tax_Id=9606 Gene_Symbol=ZNF609 Zinc finger protein 609",
"Tax_Id=9606 Gene_Symbol=ZNF610 Isoform 1 of Zinc finger protein 610",
"Tax_Id=9606 Gene_Symbol=ZNF613 Isoform 1 of Zinc finger protein 613",
"Tax_Id=9606 Gene_Symbol=ZNF614 Zinc finger protein 614",
"Tax_Id=9606 Gene_Symbol=ZNF622 Zinc finger protein 622",
"Tax_Id=9606 Gene_Symbol=ZNF625 Zinc finger protein 625",
"Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 1 of Zinc finger protein 638",
"Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 4 of Zinc finger protein 638",
"Tax_Id=9606 Gene_Symbol=ZNF646 Isoform 1 of Zinc finger protein 646",
"Tax_Id=9606 Gene_Symbol=ZNF658B Zinc finger protein 658B",
"Tax_Id=9606 Gene_Symbol=ZNF667 Zinc finger protein 667, isoform CRA_a",
"Tax_Id=9606 Gene_Symbol=ZNF671 Zinc finger protein 671",
"Tax_Id=9606 Gene_Symbol=ZNF687 Isoform 1 of Zinc finger protein 687",
"Tax_Id=9606 Gene_Symbol=ZNF687 Zinc finger protein 687",
"Tax_Id=9606 Gene_Symbol=ZNF691 cDNA FLJ56317, highly similar to Zinc finger protein 691",
"Tax_Id=9606 Gene_Symbol=ZNF700 Zinc finger protein 700",
"Tax_Id=9606 Gene_Symbol=ZNF714 Isoform 1 of Zinc finger protein 714",
"Tax_Id=9606 Gene_Symbol=ZNF72 Zinc finger protein 72 (Fragment)",
"Tax_Id=9606 Gene_Symbol=ZNF721 zinc finger protein 721",
"Tax_Id=9606 Gene_Symbol=ZNF76 Isoform 2 of Zinc finger protein 76",
"Tax_Id=9606 Gene_Symbol=ZNF782 Zinc finger protein 782",
"Tax_Id=9606 Gene_Symbol=ZNF787 Zinc finger protein 787",
"Tax_Id=9606 Gene_Symbol=ZNF800 Zinc finger protein 800",
"Tax_Id=9606 Gene_Symbol=ZNF827 21 kDa protein", "Tax_Id=9606 Gene_Symbol=ZNF828 Zinc finger protein 828",
"Tax_Id=9606 Gene_Symbol=ZNF837 Zinc finger protein 837",
"Tax_Id=9606 Gene_Symbol=ZNF878 Zinc finger protein 878",
"Tax_Id=9606 Gene_Symbol=ZNF891 Zinc finger protein 891",
"Tax_Id=9606 Gene_Symbol=ZNHIT2 Zinc finger HIT domain-containing protein 2",
"Tax_Id=9606 Gene_Symbol=ZP2 Zona pellucida sperm-binding protein 2",
"Tax_Id=9606 Gene_Symbol=ZRANB2 Isoform 1 of Zinc finger Ran-binding domain-containing protein 2",
"Tax_Id=9606 Gene_Symbol=ZSWIM6 Zinc finger SWIM domain-containing protein 6",
"Tax_Id=9606 Gene_Symbol=ZUFSP 32 kDa protein", "Tax_Id=9606 Gene_Symbol=ZW10 Centromere/kinetochore protein zw10 homolog",
"Tax_Id=9606 Gene_Symbol=ZWINT ZW10 interactor", "Tax_Id=9606 Gene_Symbol=ZYG11B Isoform 1 of Protein zyg-11 homolog B",
"Tax_Id=9606 Gene_Symbol=ZYX cDNA FLJ53160, highly similar to Zyxin",
"Tax_Id=9606 Gene_Symbol=ZYX Uncharacterized protein", "Tax_Id=9606 Gene_Symbol=ZYX Zyxin"
), class = "factor")), .Names = c("proteinIdentifier", "protein",
"spectra", "unique_peptides", "FDR", "local_FDR", "sequence_coverage",
"expression_value", "expression_percentile", "organism", "tissue",
"localization", "condition", "experiment", "annotation"), class = "data.frame", row.names = c(NA,
-4802L))
The full output in the console is much longer.
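I edited your question to make it more readable. Please ask only one question per post. For everything about the plyr library, see the manual; '?rbind.fill' will tell you all you need to know. – 2011-05-26 14:41:36
Could you provide the dput output for two of your data.frames (or at least their top rows), so that we have something to work with? – 2011-05-26 15:00:48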
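For the plotting part of the question, one possible direction is a bar plot per protein rather than `hist()`, since `hist()` bins a vector of values and does not place one bar per category. The sketch below is only an illustration under assumptions: the toy data, the three tissue names, and the helper `values_by_tissue()` are hypothetical, and it presumes the combined table has `protein`, `tissue`, and `expression_value` columns as listed in the dput output.

```r
# Hypothetical combined table; the real one would come from the nine files.
tissues <- c("Lung", "Blood", "Liver")
combined <- data.frame(
  protein = c("ZYX Zyxin", "ZW10", "ZYX Zyxin", "ZP2", "ZYX Zyxin"),
  tissue = c("Lung", "Lung", "Blood", "Blood", "Liver"),
  expression_value = c(3.2, 1.1, 2.4, 0.7, 1.9),
  stringsAsFactors = FALSE
)

# Count the distinct tissues each protein occurs in and keep
# the proteins seen in more than one tissue.
n_tissues <- tapply(combined$tissue, combined$protein,
                    function(x) length(unique(x)))
shared <- names(n_tissues)[n_tissues > 1]

# Hypothetical helper: one value per tissue, in a fixed tissue order,
# with NA where the protein was not found. If a protein occurs several
# times in one tissue, match() picks only the first occurrence.
values_by_tissue <- function(data, protein_name, tissue_levels) {
  rows <- data[data$protein == protein_name, ]
  out <- rows$expression_value[match(tissue_levels, rows$tissue)]
  names(out) <- tissue_levels
  out
}

# One bar plot per shared protein; names.arg puts the tissue name
# under each bar and las = 2 rotates the labels so long names fit.
for (p in shared) {
  vals <- values_by_tissue(combined, p, tissues)
  barplot(vals, names.arg = names(vals), las = 2, col = "blue",
          main = p, ylab = "expression_value")
}
```

Fixing the tissue order once in `tissues` keeps every bar in the same position across plots, so a missing tissue shows up as an empty slot instead of shifting the other bars.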