2016-02-26 91 views

Problems with levels in a xtab in R

For a sample data frame:

df <- structure(list(area = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
             2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
             4L, 4L, 4L), .Label = c("a1", "a2", "a3", "a4"), class = "factor"), 
        result = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
           1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L), 
        weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2, 
           1, 0.1, 6, 2.3, 1.6, 1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6, 
           1.6, 1.8)), .Names = c("area", "result", "weight"), class = "data.frame", row.names = c(NA, 
                                 -26L)) 

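I am trying to find the areas with the highest and lowest percentage of result == 1, then produce a weighted crosstab, which I will then use to calculate a risk difference.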

library(data.table)  # needed for setDT() and the [, .(...), by = ] syntax
df.summary <- setDT(df)[, .(.N, freq.1 = sum(result == 1),
                            result = weighted.mean(result == 1, w = weight) * 100),
                        by = area]

#Include only regions with highest or lowest percentage 
df.summary <- data.table(df.summary) 
incl <- df.summary[c(which.min(result), which.max(result)),area] 
df.new <- df[df$area %in% incl,] 
incl 

incl contains the two areas I want, but it still has four levels:

[1] a2 a3 
Levels: a1 a2 a3 a4 

How do I get rid of the unused levels? For the subsequent analysis I want only two levels, matching the two areas. Any ideas?

Answers


I found this elsewhere online (e.g. Problems with levels in a xtab in R):

df.new$area <- factor(df.new$area) 

It works!

Hope it is useful to others.
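
As a minimal sketch of why this works: calling factor() on an existing factor rebuilds it from only the values that are present, so the unused levels disappear. The toy data frame below is made up for illustration and merely stands in for df.new; droplevels() is a base R alternative that should have the same effect here.

```r
# Toy data standing in for df.new (values are illustrative, not from the question)
toy <- data.frame(
  area   = factor(c("a2", "a2", "a3", "a3"),
                  levels = c("a1", "a2", "a3", "a4")),
  result = c(1L, 0L, 1L, 0L),
  weight = c(1.6, 2.3, 1.4, 2.0)
)

levels(toy$area)                            # still lists all four levels
xtabs(weight ~ area + result, data = toy)   # keeps all-zero rows for a1 and a4

toy$area <- factor(toy$area)                # rebuild the factor from the values present
# toy <- droplevels(toy)                    # base R alternative

levels(toy$area)                            # only the two areas in the data remain
tab <- xtabs(weight ~ area + result, data = toy)
tab                                         # weighted 2 x 2 table
```

With the unused levels gone, the weighted crosstab has one row per remaining area, which is the shape a risk-difference calculation on two areas would expect.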


But it is a data.table, so 'df.new[, area := factor(area)]' is more idiomatic and saves repeating the variable name 'df.new'.
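
A short sketch of that data.table idiom, assuming the data.table package is installed. The table dt and its values are hypothetical stand-ins for df.new. The point it illustrates is that := updates the column by reference, so no reassignment is needed.

```r
library(data.table)

# Toy table standing in for df.new (illustrative values only)
dt <- data.table(
  area   = factor(c("a2", "a3", "a2"),
                  levels = c("a1", "a2", "a3", "a4")),
  weight = c(1.6, 1.4, 2.3)
)

dt[, area := factor(area)]   # modifies dt in place, dropping unused levels
levels(dt$area)
```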