2017-05-05

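Convert for-loop output stored in a list to a data frame in R

I'd like to ask this question with simplified code (the logic is a bit odd, but it closely mirrors my real situation). The code I'm actually using is long and would add a lot of words without much value. I'm happy to add whatever is needed to answer the question.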

I have a situation with a for loop, for example:

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16), 
       "Vanilla" = c(0.64), "Blueberry" = c(.75)) 

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace=T, prob = c(1-data2[i],data2[i]))) 

    lossCol <- freqSim*(runif(n=100, min=0, max=7000)) 

    costAvg <- mean(as.numeric(unlist(lossCol))) 
    costSD <- sd(as.numeric(unlist(lossCol))) 

    costAvg <- formatC(costAvg, format='d', big.mark=",") 
    costSD <- formatC(costSD, format='d', big.mark= ",") 

    stats <- list() 
    stats[[i]] <- list(costAvg,costSD) 

    print(stats[[i]]) 
} 

which prints vectors such as:

[[1]] 
[1] "1,261" 

[[2]] 
[1] "2,103" 

[[1]] 
[1] "313" 

[[2]] 
[1] "1,165" 

[[1]] 
[1] "2,073" 

[[2]] 
[1] "2,206" 

[[1]] 
[1] "2,417" 

[[2]] 
[1] "2,258" 

Ideally I'd like something that looks like a matrix:

        Chocolate Strawberry Vanilla Blueberry
Label A     1,261        313   2,073     2,417
Label B     2,103      1,165   2,206     2,258

Is there any way to do this without throwing myself off a cliff?

Answers

Here's a simple solution:

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16), 
     "Vanilla" = c(0.64), "Blueberry" = c(.75)) 

stats <- data.frame(row.names = c("Label A", "Label B")) 

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace=T, 
      prob = c(1-data2[i],data2[i]))) 

    lossCol <- freqSim*(runif(n=100, min=0, max=7000)) 

    costAvg <- mean(as.numeric(unlist(lossCol))) 
    costSD <- sd(as.numeric(unlist(lossCol))) 

    costAvg <- formatC(costAvg, format='d', big.mark=",") 
    costSD <- formatC(costSD, format='d', big.mark= ",") 

    stats["Label A", i] <- costAvg 
    stats["Label B", i] <- costSD 
} 

colnames(stats) <- colnames(data2) 

Result:

        Chocolate Strawberry Vanilla Blueberry
Label A       764        470   2,003     2,932
Label B     1,674      1,418   2,202     2,315

If you can, I'd encourage you to look at tidyr for these kinds of operations rather than doing them in base R.

We can do this by using simplify2array:

res <- simplify2array(stats) 
dimnames(res) <- list(paste("Label", c("A", "B")), names(data2)) 

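Note: for this to work, make sure stats <- list() is defined outside the for loop. In the question it is re-created on every iteration, so only the last element is kept.

A better option is to pre-allocate the list with its length outside the loop, i.e.: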

stats <- vector("list", length(data2)) 
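A minimal sketch that puts the two notes together, assuming the simulation from the question: stats is pre-allocated outside the loop, filled inside it, and then reshaped with simplify2array(). The set.seed() call is an addition for reproducibility only, and the loop body is lightly condensed (data2[[i]] instead of data2[i], no intermediate data frame).

```r
set.seed(1)  # added only so the sketch is reproducible

data2 <- data.frame(Chocolate = 0.25, Strawberry = 0.16,
                    Vanilla = 0.64, Blueberry = 0.75)

# Pre-allocate one slot per flavour, outside the loop
stats <- vector("list", length(data2))

for (i in seq_along(data2)) {
  p <- data2[[i]]  # probability for this flavour
  freqSim <- sample(0:1, 100, replace = TRUE, prob = c(1 - p, p))
  lossCol <- freqSim * runif(n = 100, min = 0, max = 7000)

  stats[[i]] <- list(formatC(mean(lossCol), format = "d", big.mark = ","),
                     formatC(sd(lossCol), format = "d", big.mark = ","))
}

# Every element has length 2, so simplify2array() returns a 2 x 4 (list) matrix
res <- simplify2array(stats)
dimnames(res) <- list(paste("Label", c("A", "B")), names(data2))
res
```

If stats[[i]] is built with c() instead of list(), simplify2array() returns a plain character matrix instead of a list matrix.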

To get exactly the table you showed as output, try this. I didn't have time to apply proper naming conventions, so please bear with me.

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16), 
        "Vanilla" = c(0.64), "Blueberry" = c(.75)) 
x = c("Chocolate", "Strawberry", "Vanilla", "Blueberry") 
y = c("Label A", "Label B") 

data3 = matrix(nrow = 2, ncol = 4) 
colnames(data3) = x 
row.names(data3) = y 

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace = T, prob = c(1-data2[i],data2[i]))) 

    lossCol <- freqSim*(runif(n=100, min=0, max=7000)) 

    costAvg <- mean(as.numeric(unlist(lossCol))) 
    costSD <- sd(as.numeric(unlist(lossCol))) 

    costAvg <- formatC(costAvg, format='d', big.mark=",") 
    costSD <- formatC(costSD, format='d', big.mark= ",") 

    data3[1, i] = costAvg 
    data3[2, i] = costSD 
} 
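The same 2 x 4 table can also be produced without pre-allocating anything, by letting vapply() iterate over the columns of data2. This is a sketch under the question's assumptions; summarise_flavour() is a hypothetical helper name, and the seed is only there to make the run repeatable.

```r
set.seed(42)  # added only to make the sketch reproducible

data2 <- data.frame(Chocolate = 0.25, Strawberry = 0.16,
                    Vanilla = 0.64, Blueberry = 0.75)

# Hypothetical helper: simulate losses for one purchase probability p
# and return both formatted statistics as a named character vector
summarise_flavour <- function(p, n = 100, max_loss = 7000) {
  lossCol <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p)) *
    runif(n, min = 0, max = max_loss)
  c(`Label A` = formatC(mean(lossCol), format = "d", big.mark = ","),
    `Label B` = formatC(sd(lossCol), format = "d", big.mark = ","))
}

# vapply() loops over the columns of data2; each call fills one column
summary_tbl <- vapply(data2, summarise_flavour, character(2))
summary_tbl
```

vapply() takes the row names from the named vector returned by the first call and the column names from data2, so no dimnames need to be set by hand.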

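Below is an example using dplyr (and tidyr). It won't give you the matrix you want, but it is a more appropriate approach that avoids the for loop: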

library(dplyr)
library(tidyr)

freqSim <- lapply(names(data2), function(x)
        sample(0:1, 100, replace = TRUE,
        prob = c(1 - data2[[x]], data2[[x]])))
names(freqSim) <- names(data2)

lossCol <- lapply(freqSim, function(x) x * runif(n = 100, min = 0, max = 7000))

do.call(data.frame, lossCol) %>% 
    gather(type, val) %>% 
    group_by(type) %>% 
    summarise(mean = mean(val), sd = sd(val)) %>% 
    # format() only inserts the thousands separator; it does not round
    mutate_at(vars(mean, sd), ~ format(., big.mark = ","))

# A tibble: 4 × 3
        type       mean        sd
       <chr>      <chr>     <chr>
1  Blueberry 2,911.8587 2,481.310
2  Chocolate   810.6141 1,820.357
3 Strawberry   680.2027 1,659.491
4    Vanilla 2,302.0011 2,305.148
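The decimals in this output come from the last step: format() has no format = "d" option (that argument belongs to formatC()), so only big.mark takes effect. A small base-R sketch of the difference, using a few values copied from the output above; with dplyr 1.0 or later (an assumption about the installed version) the same formatC() call could go inside mutate(across(c(mean, sd), ...)).

```r
# A few summary values copied from the tibble above (illustration only)
summ <- data.frame(type = c("Blueberry", "Chocolate", "Strawberry"),
                   mean = c(2911.8587, 810.6141, 680.2027),
                   sd   = c(2481.310, 1820.357, 1659.491))

# format() keeps the decimals; it only inserts the thousands separator
format(summ$mean, big.mark = ",")

# formatC(format = "d") rounds to whole numbers before adding separators
summ[c("mean", "sd")] <- lapply(summ[c("mean", "sd")], formatC,
                                format = "d", big.mark = ",")
summ
```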

If you really want matrix-shaped output, you can do this in base R with outer. For example, to compute the mean and median of every column of mtcars, you can do:

> outer(list(mean=mean, median=median), as.data.frame(mtcars), Vectorize(function(f,y) f(y))) 
             mpg    cyl       disp       hp      drat      wt     qsec     vs      am   gear   carb
mean   20.090625 6.1875 230.721875 146.6875 3.5965625 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125
median 19.200000 6.0000 196.300000 123.0000 3.6950000 3.32500 17.71000 0.0000 0.00000 4.0000 2.0000

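The first argument to outer is a named list of the functions to apply, the second is the columns to iterate over, and the last argument is a function that applies one of those functions to one column. Vectorize is needed here because outer calls its FUN argument only once, with both arguments already expanded to full length, so FUN has to work element by element.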
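A toy check of why Vectorize() is needed, using two hypothetical numeric columns instead of mtcars: outer() hands FUN the whole expanded list of functions at once, so a plain function(f, y) f(y) fails, while the vectorized version is applied pair by pair.

```r
funs <- list(mean = mean, max = max)
cols <- list(a = c(1, 2, 3), b = c(10, 20))

# Without Vectorize(), f is the whole list of functions, and a list cannot be called
plain <- tryCatch(outer(funs, cols, function(f, y) f(y)),
                  error = function(e) conditionMessage(e))
plain  # an error message rather than a matrix

# With Vectorize(), each (function, column) pair is evaluated separately
res <- outer(funs, cols, Vectorize(function(f, y) f(y)))
res
```

The row and column names of res come from the names of funs and cols, which is why the named list in the mtcars example produces the mean and median row labels.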

In your case, I would split your code into three parts:

Generate the samples:

>  freqSim <- lapply(data2, function(x) sample(0:1, length(1:100), replace=T, prob=c(1-x,x)) *(runif(n=100, min=0, max=7000))) 

It looks like this:

> str(freqSim) 
List of 4 
$ Chocolate : num [1:100] 0 0 0 0 0 ... 
$ Strawberry: num [1:100] 0 0 0 0 0 0 0 0 0 0 ... 
$ Vanilla : num [1:100] 4175 1456 0 1201 852 ... 
$ Blueberry : num [1:100] 0 3896 3794 5096 2901 ... 

Declare your functions:

> funs <- list(`Label A`=function(x) formatC(mean(x), format='d', big.mark=","), 
       `Label B`=function(x) formatC(sd(x), format='d', big.mark=",")) 

Use outer:

> outer(funs, freqSim, Vectorize(function(f,y) f(y))) 
        Chocolate Strawberry Vanilla Blueberry
Label A "518"     "427"      "2,044" "2,441"
Label B "1,417"   "1,290"    "2,250" "2,259"
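Because the question title asks for a data frame, the character matrix returned by outer() can be converted with as.data.frame(); the row labels become row names. A sketch that rebuilds the matrix from the values printed above (rebuilt by hand for illustration, not by rerunning the simulation):

```r
# Character matrix with the same layout as the outer() result above
res <- matrix(c("518", "1,417", "427", "1,290", "2,044", "2,250", "2,441", "2,259"),
              nrow = 2,
              dimnames = list(c("Label A", "Label B"),
                              c("Chocolate", "Strawberry", "Vanilla", "Blueberry")))

# Keep the formatted strings as character columns
stats_df <- as.data.frame(res, stringsAsFactors = FALSE)
stats_df
str(stats_df)
```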