2016-01-06

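Rearranging data / similar to pivot table?

I am struggling badly with a data rearrangement problem. The data below contains agreements (rows) that either collapsed or remained stable (column "collapse"), together with provisions that were reduced, kept, added, or absent (columns "diff.pps_leadership", "diff.pps_cabinet", etc.).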

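I would like to rearrange these data so that I can see what percentage of the agreements that reduced, kept, or added a specific provision collapsed. The rows should be the provisions (diff.pps_leadership ...), the columns should be "reduced", "kept", and "added", and each cell should contain the percentage collapsed (only among those agreements that reduced, kept, or added the provision, not among the total).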

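In Excel I would do this with a pivot table, but I have not been able to get there with R. I tried the cast, aggregate, melt, and transpose commands, all without success.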

Eventually, the result should look similar to this: https://docs.google.com/spreadsheets/d/1yhIbvTQTYkkwSFVxWEnPwvSvwTc0vuTYZxa15Eh1lT8/edit?usp=sharing

I hope my question is not too specific. I would be grateful for any hint or advice.

example <- structure(list(Agreement = structure(c(8L, 4L, 6L, 9L, 2L, 3L, 
7L, 10L, 5L, 1L), .Label = c("Abuja Agreement", "Accra Peace Agreement", 
"Arusha Agreement", "Arusha/Global Ceasefire Agreement", "Comprehensive Peace Agreement", 
"InterabsentCongolese Dialogue", "Lome Agreement", "Lusaka Protocol", 
"Ouagadougou Agreement", "Tansitional Constituion"), class = "factor"), 
    diff.pps_cabinet = structure(c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L), .Label = c("kept", "reduced"), class = "factor"), 
    diff.pps_leadership = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 
    3L, 3L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.mps_milcmd = structure(c(3L, 2L, 3L, 3L, 3L, 3L, 1L, 
    3L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.mps_armyint = structure(c(3L, 2L, 2L, 3L, 3L, 3L, 1L, 
    3L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.eps_commission = structure(c(1L, 1L, 1L, 1L, 3L, 1L, 
    3L, 1L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.eps_company = structure(c(1L, 2L, 1L, 1L, 3L, 1L, 1L, 
    1L, 2L, 3L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.veto_leg = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("absent", "added"), class = "factor"), 
    diff.tps_devolution = structure(c(2L, 1L, 2L, 3L, 1L, 1L, 
    1L, 2L, 2L, 1L), .Label = c("absent", "kept", "reduced"), class = "factor"), 
    diff.ca.psh = structure(c(3L, 2L, 1L, 1L, 4L, 1L, 1L, 1L, 
    4L, 1L), .Label = c("absent", "added", "kept", "reduced"), class = "factor"), 
    collapse = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 
    1L), .Label = c("collapse", "stable"), class = "factor")), .Names = c("Agreement", 
"diff.pps_cabinet", "diff.pps_leadership", "diff.mps_milcmd", 
"diff.mps_armyint", "diff.eps_commission", "diff.eps_company", 
"diff.veto_leg", "diff.tps_devolution", "diff.ca.psh", "collapse" 
), class = "data.frame", row.names = c(NA, -10L)) 

@akrun, it's just the hyphens they've used in "<-" that are causing the error. – A5C1D2H2I1M1N2O1R2T1

Answers

1

The following gets the job done:

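library(data.table)
setDT(example)

# Provision columns to tabulate
mvs <- c("diff.pps_cabinet", "diff.pps_leadership",
         "diff.mps_milcmd", "diff.mps_armyint")

# Provision statuses; each one becomes a pair of output columns
vls <- c("reduced", "kept", "added", "absent")

# Reshape to long format (one row per agreement and provision), then,
# per provision, build a summary string and the agreement names for
# every status in vls
melt(example, c("Agreement", "collapse"), mvs
     )[ , setNames(vapply(
       vls, function(vv) list(
         paste0(
           # s: collapsed agreements with status vv; sum(idx): all agreements with status vv
           s <- sum(collapse[idx <- value == vv] == "collapse"),
           " out of ", sum(idx), " = ", floor(100 * s/sum(idx)), "% collapsed"),
         paste(Agreement[idx], collapse = "\n")),
       vector("list", 2)),
       paste0(rep(vls, each = 2),
              c(".percent", ".names"))), by = variable]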

This currently prints NaN wherever no agreement has the given status (0/0); to fix this, replace sum(idx) in the denominator with (if (!any(idx)) 1 else sum(idx)).
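The guarded denominator can be tried out in isolation. The following is a minimal base R sketch, assuming toy vectors in place of the melted data; the helper name pct_collapsed and the toy values are made up for illustration and are not part of the answer above.

```r
# Toy vectors standing in for one provision's melted rows (hypothetical data)
value    <- c("reduced", "reduced", "kept")
collapse <- c("collapse", "stable", "stable")

# Hypothetical helper: floored % of collapsed agreements among those with status vv
pct_collapsed <- function(vv) {
  idx <- value == vv
  s <- sum(collapse[idx] == "collapse")
  # Guard against 0/0 when no agreement has status vv
  denom <- if (!any(idx)) 1 else sum(idx)
  floor(100 * s / denom)
}

pct_collapsed("reduced")  # 1 of 2 "reduced" agreements collapsed
pct_collapsed("added")    # no "added" agreements: 0 instead of NaN
```

With the guard in place an empty status yields 0; whether 0 or an explicit NA is the better placeholder depends on how the table will be read.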


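Many thanks for your effort! This is already very close to what I am looking for. Unfortunately, the percentages and numbers of observations in the cells are not quite what I need. For example, the cell diff.pps_cabinet / reduced.percent now reads "9 out of 10" where it should read "5 out of 9": 9 agreements (out of 10 in total) reduced the provision, and 5 of those collapsed. – zoowalker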
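The counts in this comment can be checked directly. A minimal base R sketch, assuming the two vectors below are transcribed correctly from the factor codes of diff.pps_cabinet and collapse in the question's example data; the variable names are chosen here for illustration.

```r
# diff.pps_cabinet and collapse, transcribed from the example data in the question
cabinet  <- c("reduced", "kept", rep("reduced", 8))
collapse <- c("collapse", "stable", "stable", "collapse", "stable",
              "collapse", "collapse", "stable", "stable", "collapse")

idx <- cabinet == "reduced"
n_status    <- sum(idx)                          # agreements that reduced the provision
n_collapsed <- sum(collapse[idx] == "collapse")  # of those, how many collapsed
paste0(n_collapsed, " out of ", n_status)
```

The denominator is the number of agreements with the given status, not the total number of agreements.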


@zoowalk updated. – MichaelChirico


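Very nice, many thanks. The only remaining issue is that I would like only the names of those agreements that collapsed, not all of them. If I understand correctly, this concerns the expression paste(Agreement[idx], collapse = "\n"), which would only need an additional filter for the collapsed agreements. I thought Agreement[idx <- value == vv] == "collapse", collapse = "\n") might be the way forward, but unfortunately it does not work. My apologies if my initial request was not clear enough. – zoowalker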
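One possible way to restrict the names to collapsed agreements is to combine both conditions inside the subscript, rather than comparing the selected names themselves to "collapse". A minimal base R sketch, assuming toy vectors with made-up agreement names; inside the data.table call above, the same subscript would presumably replace Agreement[idx].

```r
# Toy vectors (hypothetical names and values)
Agreement <- c("A", "B", "C", "D")
value     <- c("reduced", "reduced", "kept", "reduced")
collapse  <- c("collapse", "stable", "collapse", "collapse")

vv  <- "reduced"
idx <- value == vv

# Keep only agreements that have status vv AND collapsed
collapsed_names <- paste(Agreement[idx & collapse == "collapse"], collapse = "\n")
collapsed_names
```

Note that the first collapse in the subscript is the data column, while collapse = "\n" is the separator argument of paste(); the two do not clash.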