2017-09-11 54 views
-1

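Comparison of two data frames

I have an Excel table of 15200 rows that corresponds to a structure analysis of trees. I have all the structures (48 structures), and they have been counted on each tree. For example, tree 12607 has 3 structures CV11, 1 structure IN12, and none (0) of the remaining structures. So the table is a huge grid with a lot of 0s and a few counts of structures per tree. The last column is the value of the tree, computed from the structures on it (each structure contributes a number of points to the tree by its presence).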

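The question is: are there structures, or combinations of structures, that give a tree a high value? Of course, from the value of each structure we can see which ones are worth more than others (for example, structure CV11 has value 15 and structure IN12 has value 4). But what I want to know is this: if we take all the trees whose final value is above 100 (a new data frame "data100") and compare them with the trees whose final value is below 100 (another data frame "data0"), can we find a significant difference in the number and occurrence of the structures found on these trees? A structure with a high value might turn up only on trees below 100, for example because it does not allow other structures to be found on the same tree.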

Voilà, I hope I have given enough detail. If you have any idea or suggestion for solving this problem, that would be great!

Here is my script.

> data100 
     CV11 CV12 CV13 CV14 CV15 CV21 CV22 CV23 CV24 CV25 CV26 CV31 CV32 CV33 CV41 CV42 CV43 CV44 CV51 CV52 IN11 IN12 IN13 
1  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
2  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
3  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
4  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
5  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
6  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 
7  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
8  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
9  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
10  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
11  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
12  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 
13  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
14  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
15  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
     IN14 IN21 IN22 IN23 IN31 IN32 IN33 IN34 BA11 BA12 BA21 DE11 DE12 DE13 DE14 DE15 GR11 GR12 GR13 GR21 GR22 GR31 GR32 
1  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
2  0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 
3  0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
4  0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
5  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
6  0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 
7  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
8  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
9  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
10  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
11  0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 
12  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 3 0 0 
13  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 3 0 0 
14  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 
15  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
     EP11 EP12 EP13 EP14 EP21 EP31 EP32 EP33 EP34 EP35 NE11 NE12 NE21 OT11 OT12 OT21 OT22 ecoval 
1  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0 
2  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  56 
3  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  10 
4  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  10 
5  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  4 
6  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  24 
7  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0 
8  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0 
9  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0 
10  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0 
11  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  18 
12  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  63 
13  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  77 
14  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  54 
15  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  20 
[ reached getOption("max.print") -- omitted 60749 rows ] 
> sortdata100<-data100[order(data100$ecoval,decreasing=TRUE),] 

> rsortdata100<-sortdata100[sortdata100$ecoval>100,] 
> rsortdata100<-na.omit(rsortdata100) # 181 rows 
> rsortdata100 
     CV11 CV12 CV13 CV14 CV15 CV21 CV22 CV23 CV24 CV25 CV26 CV31 CV32 CV33 CV41 CV42 CV43 CV44 CV51 CV52 IN11 IN12 IN13 
1291  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
1083  0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
3919  0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 
14685 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
4021  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
5452  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
14686 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 
4022  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 
1013  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
2895  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
4719  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 
682  0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 
3444  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
1299  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 
2713  0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 
     IN14 IN21 IN22 IN23 IN31 IN32 IN33 IN34 BA11 BA12 BA21 DE11 DE12 DE13 DE14 DE15 GR11 GR12 GR13 GR21 GR22 GR31 GR32 
1291  0 0 0 0 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
1083  3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
3919  0 0 1 0 2 0 0 0 2 0 0 0 3 0 0 0 0 0 0 11 0 0 0 
14685 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
4021  0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
5452  0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 
14686 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 2 
4022  0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
1013  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
2895  0 0 0 1 0 0 0 0 4 0 0 3 0 4 3 0 0 0 0 0 0 0 0 
4719  0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
682  0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 
3444  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
1299  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 
2713  0 0 0 2 0 3 0 0 2 0 0 0 1 5 1 0 0 0 0 0 0 0 0 
     EP11 EP12 EP13 EP14 EP21 EP31 EP32 EP33 EP34 EP35 NE11 NE12 NE21 OT11 OT12 OT21 OT22 ecoval 
1291  0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1192 
1083  0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 424 
3919  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 380 
14685 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 370 
4021  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 358 
5452  0 0 0 0 0 0 1 0 0 11 0 0 0 0 1 0 0 356 
14686 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 354 
4022  0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 346 
1013  0 8 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 326 
2895  0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 325 
4719  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 324 
682  0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 311 
3444  0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 306 
1299  0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 302 
2713  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 302 
[ reached getOption("max.print") -- omitted 166 rows ] 
> data0<-sortdata100[sortdata100$ecoval<100,] 
> data0<-na.omit(data0) 
> data0 
     CV11 CV12 CV13 CV14 CV15 CV21 CV22 CV23 CV24 CV25 CV26 CV31 CV32 CV33 CV41 CV42 CV43 CV44 CV51 CV52 IN11 IN12 IN13 
4728  0 0 0 1 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 
5339  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
11766 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
796  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
3561  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 
10581 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 
10618 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 
14376 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 
14389 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 
790  0 0 0 1 0 0 0 0 1 0 0 2 0 0 0 0 0 0 0 0 1 0 0 
3974  0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 
4739  0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 
156  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
2740  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
2950  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 
     IN14 IN21 IN22 IN23 IN31 IN32 IN33 IN34 BA11 BA12 BA21 DE11 DE12 DE13 DE14 DE15 GR11 GR12 GR13 GR21 GR22 GR31 GR32 
4728  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 
5339  1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 
11766 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 
796  1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
3561  0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
10581 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
10618 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 
14376 1 0 0 0 0 0 0 0 1 0 0 0 0 2 0 0 0 0 0 0 0 0 0 
14389 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 0 
790  0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 
3974  0 0 0 0 0 0 0 0 1 0 0 0 4 0 0 0 1 0 0 0 0 0 0 
4739  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
156  0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 
2740  0 0 0 0 0 0 0 0 0 0 0 0 0 6 2 0 0 0 0 0 0 0 0 
2950  0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
     EP11 EP12 EP13 EP14 EP21 EP31 EP32 EP33 EP34 EP35 NE11 NE12 NE21 OT11 OT12 OT21 OT22 ecoval 
4728  0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0  99 
5339  0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0  99 
11766 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1  99 
796  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  98 
3561  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  98 
10581 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0  98 
10618 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0  98 
14376 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  98 
14389 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  98 
790  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  97 
3974  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  97 
4739  0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 1 0  97 
156  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  96 
2740  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0  96 
2950  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  96 
[ reached getOption("max.print") -- omitted 14984 rows ] 
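Note that the two filters above use `> 100` and `< 100`, so a tree whose `ecoval` is exactly 100 ends up in neither data frame. The following is only a minimal sketch of a split that covers every row; the `toy` data frame and the decision to put the value 100 in the "low" group are assumptions made for illustration, not part of the original script.

```r
# Hypothetical toy data: one structure column plus the tree value.
toy <- data.frame(
  CV11   = c(0, 1, 0, 2),
  ecoval = c(99, 100, 101, NA)
)

# Drop rows with missing values once, then split on a single condition,
# so every remaining tree lands in exactly one group.
# Assumption: a value of exactly 100 counts as "low".
toy <- na.omit(toy)
groups <- split(toy, ifelse(toy$ecoval > 100, "high", "low"))

nrow(groups$high)
nrow(groups$low)
```

Sorting by `ecoval` beforehand is not required for the split itself; it only affects the order in which rows are printed.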
+2

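Sorry, this is not clear to me either. Please read the info on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and on how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754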

Answers

0

Maybe something like this?

library(dplyr) 
data %>% group_by(ecoval > 100) %>% summarize_all(mean) 

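This should give you the mean of each column, computed separately for the rows with ecoval > 100 and the rows with ecoval <= 100.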
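For readers without dplyr, roughly the same grouped means can be obtained in base R with `aggregate`. This is a sketch on a made-up `toy` data frame (its values and the column name `high` are assumptions chosen for illustration), not the asker's real data.

```r
# Hypothetical toy data: counts for two structures plus the tree value.
toy <- data.frame(
  CV11   = c(0, 1, 0, 2, 3),
  IN12   = c(1, 0, 0, 0, 1),
  ecoval = c(10, 56, 0, 150, 302)
)

# Mean count of each structure, separately for trees with
# ecoval <= 100 (high = FALSE) and ecoval > 100 (high = TRUE).
structure_cols <- setdiff(names(toy), "ecoval")
group_means <- aggregate(
  toy[structure_cols],
  by  = list(high = toy$ecoval > 100),
  FUN = mean
)
print(group_means)
```

The result has one row per group, so a larger mean in the `TRUE` row than in the `FALSE` row means that structure is, on average, more frequent on the high-value trees.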

+0

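Thank you very much for your answer! I am not sure how to interpret the R output: what are the FALSE and TRUE rows? What are the means on the row named TRUE? –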

+0

`A tibble: 2 × 65  ecoval > 100 CV11 CV12 CV13 CV14 CV15 CV21 CV22 CV23 CV24  1 FALSE 0.00299880 0.003398641 0.0003332001 0..0005331201 0.005997601 0.00206584 0.003531921 0.00146608  2 TRUE 0.03314917 0.154696133 0.0441988950 0.535911602 0.0552486188 0.060773481 0.03867403 0.077348066 0.03867403` –

+0

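I grouped the rows by the condition `ecoval > 100`, so the row containing `TRUE` summarizes the trees with `ecoval > 100`, while the row containing `FALSE` summarizes the trees with `ecoval <= 100` :) –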
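The grouped means describe the two groups but do not say whether a difference is significant. One possible way to check this, sketched here on simulated data, is to run Fisher's exact test on the presence or absence of each structure against the high/low grouping and then correct the p-values for multiple testing. Everything in the block is an assumption for illustration: the simulated counts, the made-up scoring weights, the choice of Fisher's test, and the Benjamini-Hochberg adjustment.

```r
set.seed(1)  # simulated data only, so the run is repeatable
n <- 200
toy <- data.frame(
  CV14 = rpois(n, 0.5),
  DE13 = rpois(n, 0.5)
)
# Hypothetical scoring: CV14 contributes far more points than DE13.
toy$ecoval <- 60 * toy$CV14 + 5 * toy$DE13 + rpois(n, 20)

high <- factor(toy$ecoval > 100, levels = c(FALSE, TRUE))
structure_cols <- setdiff(names(toy), "ecoval")

# For each structure, test whether its presence (count > 0) is
# associated with belonging to the high-value group.
p_raw <- sapply(structure_cols, function(col) {
  present <- factor(toy[[col]] > 0, levels = c(FALSE, TRUE))
  fisher.test(table(present, high))$p.value
})

# Adjust for testing many structures at once (Benjamini-Hochberg).
p_adj <- p.adjust(p_raw, method = "BH")
print(sort(p_adj))
```

Because `ecoval` is itself computed from the structure counts, a small p-value here mostly confirms that a structure contributes to crossing the threshold. A count-based test such as `wilcox.test` could be swapped in if the number of occurrences matters more than mere presence.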