Problems with levels in a xtab in R

For a sample data frame:

df <- structure(list(area = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
             2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
             4L, 4L, 4L), .Label = c("a1", "a2", "a3", "a4"), class = "factor"), 
        result = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
           1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L), 
        weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2, 
           1, 0.1, 6, 2.3, 1.6, 1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6, 
           1.6, 1.8)), .Names = c("area", "result", "weight"), class = "data.frame", row.names = c(NA, 
                                 -26L))

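I am trying to identify the areas with the highest and lowest percentage of positive results, then produce a weighted crosstab (xtab) for them, which will then be used to calculate the risk difference.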

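library(data.table)

# Per-area row count, number of positive results, and weighted percentage positive
df.summary <- setDT(df)[, .(.N, freq.1 = sum(result == 1),
                            result = weighted.mean(result == 1, w = weight) * 100),
                        by = area]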

#Include only regions with highest or lowest percentage 
df.summary <- data.table(df.summary) 
incl <- df.summary[c(which.min(result), which.max(result)),area] 
df.new <- df[df$area %in% incl,] 
incl

'incl' contains the two areas I want, but it still carries all four levels:

[1] a2 a3 
Levels: a1 a2 a3 a4

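How do I get rid of the unused levels? For the subsequent analysis I want only those two areas, with just the two corresponding levels. Any ideas?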

Source

2016-02-26 KT_1

I found this elsewhere online (e.g. Problems with levels in a xtab in R):

df.new$area <- factor(df.new$area)

It works!

Hope it is useful to others.
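To see why re-creating the factor helps, here is a minimal sketch in base R. The data frame 'toy' and its values are made up for illustration and are not the asker's data, and the last two lines use one simple reading of "risk difference" (the difference in weighted proportion of result == 1 between the two areas), which is an assumption rather than the asker's stated method.

```r
# Hypothetical toy data, standing in for the subset df.new
toy <- data.frame(
  area   = factor(c("a1", "a2", "a2", "a3", "a3", "a4")),
  result = c(0L, 1L, 0L, 1L, 1L, 0L),
  weight = c(1, 2, 3, 1.5, 0.5, 1)
)
sub <- toy[toy$area %in% c("a2", "a3"), ]

# Subsetting rows keeps the unused levels a1 and a4,
# so they show up as all-zero rows in the weighted crosstab
tab_before <- xtabs(weight ~ area + result, data = sub)
dim(tab_before)  # 4 rows by 2 columns

# Re-creating the factor keeps only the levels that actually occur
sub$area <- factor(sub$area)
tab_after <- xtabs(weight ~ area + result, data = sub)
dim(tab_after)   # 2 rows by 2 columns

# Assumed definition: weighted proportion with result == 1 per area,
# then the difference between the two areas
risk <- prop.table(tab_after, margin = 1)[, "1"]
risk_diff <- unname(risk["a3"] - risk["a2"])
```

droplevels(sub) should give the same result for every factor column at once, and xtabs() also accepts drop.unused.levels = TRUE if only the table needs cleaning.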

Source

2016-02-26 10:47:31

But it is a data.table, so 'df.new[, area := factor(area)]' is more idiomatic and avoids repeating the variable name 'df.new'.
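The data.table form suggested in the comment can be sketched as follows. It assumes the data.table package is installed, and the table 'dt' with its values is hypothetical, used only to keep the example self-contained.

```r
library(data.table)

# Hypothetical toy table, standing in for the asker's df
dt <- data.table(
  area   = factor(c("a1", "a2", "a3", "a4", "a2", "a3")),
  result = c(0L, 1L, 1L, 0L, 0L, 1L)
)

# Row subsetting returns a new data.table that still has all four levels
dt.new <- dt[area %in% c("a2", "a3")]
levels_before <- levels(dt.new$area)

# := updates the column by reference, so no reassignment of dt.new is needed
dt.new[, area := factor(area)]
levels_after <- levels(dt.new$area)
```

Because := modifies dt.new in place, the table name appears only once, which is the repetition the comment refers to.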
