2016-08-11 38 views

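Parallel foreach changes data to NA when running dropterm

I have a GAMLSS model that I am trying to fit to multiple subsets of my data. Each month has to be analysed separately, so I use a foreach loop to iterate over the months. However, when I parallelize the loop, the results of dropterm all come back as NA. Here is a similar example using built-in data: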

library(dplyr) 
library(gamlss) 
library(MASS) 
library(doParallel) # also attaches foreach, needed for %dopar% and %do% 
nCores <- detectCores() 
gamlssCl <- makeCluster(nCores) 
registerDoParallel(gamlssCl) 
test.par <- foreach(s = unique(iris$Species), 
        .packages = c('dplyr', 'gamlss', 'MASS')) %dopar% { 
    species.data <- filter(iris, Species == s) 
    model <- gamlss(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length, 
        data = species.data, 
        family = GA) 
    var.rank <- dropterm(model, test = 'Chisq') %>% 
        mutate(Variable = row.names(.)) %>% 
        arrange(AIC) %>% 
        filter(Variable != '<none>') 

    var.rank 
} 
stopCluster(gamlssCl) 
test.par 
# [[1]] 
# Df AIC LRT Pr(Chi)  Variable 
# 1 NA NA NA  NA Sepal.Length 
# 2 NA NA NA  NA Sepal.Width 
# 3 NA NA NA  NA Petal.Length 
# 
# [[2]] 
# Df AIC LRT Pr(Chi)  Variable 
# 1 NA NA NA  NA Sepal.Length 
# 2 NA NA NA  NA Sepal.Width 
# 3 NA NA NA  NA Petal.Length 
# 
# [[3]] 
# Df AIC LRT Pr(Chi)  Variable 
# 1 NA NA NA  NA Sepal.Length 
# 2 NA NA NA  NA Sepal.Width 
# 3 NA NA NA  NA Petal.Length 

test.serial <- foreach(s = unique(iris$Species)) %do% { 
    species.data <- filter(iris, Species == s) 
    model <- gamlss(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length, 
        data = species.data, 
        family = GA) 
    var.rank <- dropterm(model, test = 'Chisq') %>% 
        mutate(Variable = row.names(.)) %>% 
        arrange(AIC) %>% 
        filter(Variable != '<none>') 

    var.rank 
} 
test.serial 
# [[1]] 
# Df  AIC  LRT Pr(Chi)  Variable 
# 1 1 -31.66335 0.06406465 0.8001832 Sepal.Width 
# 2 0 -29.72741 0.00000000  NA Petal.Length 
# 3 1 -29.43731 2.29010516 0.1302011 Sepal.Length 
# 
# [[2]] 
# Df  AIC  LRT  Pr(Chi)  Variable 
# 1 0 31.03608 0.000000   NA Petal.Length 
# 2 1 33.81852 4.782442 2.875132e-02 Sepal.Width 
# 3 1 56.00459 26.968510 2.067972e-07 Sepal.Length 
# 
# [[3]] 
# Df  AIC   LRT  Pr(Chi)  Variable 
# 1 1 16.29265 0.08628226 7.689578e-01 Sepal.Width 
# 2 0 18.20637 0.00000000   NA Petal.Length 
# 3 1 77.14978 60.94341742 5.873901e-15 Sepal.Length 

Note: the error does not show up when using glm instead of gamlss.

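This example does not reproduce your problem on my machine. Perhaps a redundant question, but do you have the latest versions of all packages (and of R)? Have you tried running the code after restarting R? – Vandenman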

I cannot reproduce it either. If Vandenman's suggestions do not help, you could update your question with the output of sessionInfo(). – user20650

Interestingly, I can reproduce the problem on R 3.2.3... [sessionInfo() incoming ...] > sessionInfo() R version 3.2.3 (2015-12-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] splines parallel stats graphics grDevices utils datasets methods base – AlexR

Answers

Sorry, I have no solution, but here is a minimal example that illustrates the problem and does not depend on foreach.

First, do this:

library("gamlss") 
data <- subset(iris, Species == "setosa") 
model <- gamlss(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length, 
       data = data, family = GA) 
## GAMLSS-RS iteration 1: Global Deviance = -37.7274 
## GAMLSS-RS iteration 2: Global Deviance = -37.7274 

model2 <- dropterm(model, test = "Chisq") 
print(model2) 
## Single term deletions for 
## mu 
## 
## Model: 
## Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length 
##    Df  AIC  LRT Pr(Chi) 
## <none>   -29.727     
## Sepal.Length 1 -29.437 2.29011 0.1302 
## Sepal.Width 1 -31.663 0.06406 0.8002 
## Petal.Length 0 -29.727 0.00000 

and then save the results to a file:

saveRDS(list(model = model, model2 = model2), file = "gamlss.rds") 

Then, in a fresh R session (R --vanilla), do this:

> library("gamlss") 
Loading required package: splines 
Loading required package: gamlss.data 
Loading required package: gamlss.dist 
Loading required package: MASS 
Loading required package: nlme 
Loading required package: parallel 
********** GAMLSS Version 5.0-1 ********** 
For more on GAMLSS look at http://www.gamlss.org/ 
Type gamlssNews() to see new features/changes/bug fixes. 

> gamlss <- readRDS("gamlss.rds") 
> model <- gamlss$model 
> class(model) 
[1] "gamlss" "gam" "glm" "lm" 

> model2 <- dropterm(model, test = "Chisq") 
Model with term Sepal.Length has failed 
Model with term Sepal.Width has failed 
Model with term Petal.Length has failed 

> print(model2) 
Single term deletions for 
mu 

Model: 
Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length 
      Df  AIC LRT Pr(Chi) 
<none>   -29.727    
Sepal.Length      
Sepal.Width       
Petal.Length 

Compare model2 from the fresh R session with the one from the first session above:

> all.equal(model2, gamlss$model2) 
[1] "Component “Df”: 'is.NA' value mismatch: 1 in current 4 in target"  
[2] "Component “AIC”: 'is.NA' value mismatch: 0 in current 3 in target"  
[3] "Component “LRT”: 'is.NA' value mismatch: 1 in current 4 in target"  
[4] "Component “Pr(Chi)”: 'is.NA' value mismatch: 2 in current 4 in target" 

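Something is clearly wrong here.

I suspect that the model object contains one or more so-called promises that are not preserved correctly when the object is transferred to another R process (as is the case when using a SNOW cluster).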
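A related mechanism that may be worth checking is sketched below with base R only. It uses lm() as a stand-in for gamlss(), so whether the same thing happens inside gamlss is an assumption, not something the example proves. The sketch shows that a fitted model stores its call unevaluated, so any function that refits the model (as update() does here) needs the object named in `data =` to exist where the call is re-evaluated. The names fit_in_function and local_data are made up for the illustration.

```r
# Base-R illustration only: lm() stands in for gamlss(), which is an assumption.
fit_in_function <- function() {
  # The data frame exists only inside this function's environment.
  local_data <- subset(iris, Species == "setosa")
  lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = local_data)
}

fit <- fit_in_function()

# The fitted object stores the call unevaluated: `data` is just a symbol.
print(fit$call$data)

# That symbol does not resolve in the calling environment ...
found_before <- exists("local_data")
print(found_before)

# ... so refitting via update() fails, because the stored call is
# re-evaluated here and the data object cannot be found.
refit_result <- tryCatch(
  update(fit, . ~ . - Sepal.Width),
  error = function(e) conditionMessage(e)
)
print(refit_result)

# Once an object of that name is available, the same refit succeeds.
local_data <- subset(iris, Species == "setosa")
refit_ok <- update(fit, . ~ . - Sepal.Width)
print(class(refit_ok))
```

If something similar applies to the saved gamlss object, recreating the `data` object in the fresh session before calling dropterm would be a quick way to test it.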

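I think this is a problem in the gamlss package itself. The problem seems to be that gamlss objects cannot be serialized. I suggest you report this to the package maintainer. Feel free to use my minimal example in the report.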