2017-06-16

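For loop for sequentially adjusted regressions in R

Most questions about traditional loops in R get answered with functions that need less code and are usually more flexible.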

Correct me if I'm wrong, but I feel that loops still win when the order of iteration matters.

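In my case, I want to build sequentially and cumulatively adjusted logistic regression models. For each model I want to store the OR and CI, plus a column showing what is being adjusted for. This is my expected output: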

Model       OR   CI
Biomarker
+Age
+Sex
+Smoking

Here is what I have done:

df1 <- subset(df, select = c(age_cat, is_female, smoking_category, 
           bmi_calc, has_diabetes, sbp_mean, 
           alcohol_category, highest_education, 
           occupation, household_income)) 
model <- data.frame(NULL) 

for (i in seq_along(df1)) { 
    model <- exp(cbind(OR = coef(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")), 
                                     family = binomial, data = df)), 
                       confint(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")), 
                                   family = binomial, data = df)))) 
} 

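My outcome variable is stroke (istroke, coded 0 or 1). My exposure of interest is the biomarker (hscrp_mgl). I know I'm making a fundamental mistake somewhere. I have looked through other SO posts, but most of them are not after sequentially and cumulatively adjusted regression models.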
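A likely cause of the problem: inside the loop, i is a column index (1, 2, ...), not a column name. paste() therefore builds formulas such as istroke ~ log2(hscrp_mgl)+1, where +1 only adds the intercept, so every pass refits the crude model and overwrites model. The sketch below keeps an explicit for loop but adds covariates cumulatively by name and stores the biomarker's OR and CI for each model. It runs on simulated stand-in data: the data frame, its values, and the three-covariate vector covars are hypothetical. It also uses stats::reformulate() instead of paste() to build each formula.

```r
# Hypothetical simulated data standing in for the real df (not the asker's data)
set.seed(1)
n <- 500
df <- data.frame(
  hscrp_mgl        = rexp(n),  # strictly positive, so log2() is defined
  age_cat          = factor(sample(c("(29,59]", "(59,69]"), n, replace = TRUE)),
  is_female        = rbinom(n, 1, 0.5),
  smoking_category = factor(sample(1:4, n, replace = TRUE)),
  istroke          = rbinom(n, 1, 0.3)
)

# Covariates in the order they should be added (hypothetical subset)
covars <- c("age_cat", "is_female", "smoking_category")
term   <- "log2(hscrp_mgl)"  # the exposure whose OR is reported

# Preallocate one row per model: the crude model plus one per added covariate
res <- data.frame(Model = c("Biomarker", paste0("+", covars)),
                  OR = NA_real_, CI_low = NA_real_, CI_high = NA_real_,
                  stringsAsFactors = FALSE)

for (k in 0:length(covars)) {
  # Cumulative right-hand side: biomarker plus the first k covariates, by NAME
  rhs <- c(term, covars[seq_len(k)])
  fit <- glm(reformulate(rhs, response = "istroke"), family = binomial, data = df)
  ci  <- suppressMessages(confint(fit))  # profile CI, computed once per model
  res[k + 1, c("OR", "CI_low", "CI_high")] <- unname(exp(c(coef(fit)[term], ci[term, ])))
}
res
```

Preallocating res and filling row k + 1 keeps the loop explicit and the order of adjustment visible, without growing the table on each pass.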

Please let me know if this is a duplicate or if anything is unclear.

EDIT

My original dataset df contains all the variables in df1, my outcome variable, and a few more. Here is a reproducible sample of it:

age_cat is_female smoking_category bmi_calc has_diabetes  sbp_mean istroke 
(59,69]  0   4   19.6   0    103.5   0 
(59,69]  1   1   19.1   0     138   0 
(29,59]  0   4   26.8   0    155.5   0 
(29,59]  0   1   23.1   0     130   1 
(29,59]  1   1   22.7   0     126   1 
(59,69]  0   4    25   0    182.5   0 
(29,59]  1   1    20   0     96   1 
(29,59]  1   2    23.9   0    134.5   0 
(59,69]  0   4    24.4   0    160.5   1 

EDIT: a more reproducible example:

df <- data.frame(age = c(50, 60, 50, 40, 70, 90, 30), 
      gender = c(0, 1, 1, 0, 1, 1, 1), 
      smoke = c(4, 3, 2, 1, 4, 3, 4), 
      BMI = c(19, 20, 21, 22, 23, 24, 25), 
      SBP = c(100, 120, 140, 110, 120, 130, 120), 
      diab = c(0, 1, 1, 1, 0, 1, 1), 
      stroke = c(0, 1, 0, 0, 1, 1, 1)) 
dput(df) 
structure(list(age = c(50, 60, 50, 40, 70, 90, 30), gender = c(0, 
1, 1, 0, 1, 1, 1), smoke = c(4, 3, 2, 1, 4, 3, 4), BMI = c(19, 
20, 21, 22, 23, 24, 25), SBP = c(100, 120, 140, 110, 120, 130, 
120), diab = c(0, 1, 1, 1, 0, 1, 1), stroke = c(0, 1, 0, 0, 1, 
1, 1)), .Names = c("age", "gender", "smoke", "BMI", "SBP", "diab", 
"stroke"), row.names = c(NA, -7L), class = "data.frame") 

Could you please provide a reproducible example of df? – OmaymaS


@OmaymaS, please see the edit. – Mak


Please check whether this is what you want. It's just a start. – OmaymaS

Answers


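In your case, lapply may actually be a better approach than for. It returns a list of data.frames that you can row-bind once at the end, instead of growing model on every iteration.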
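The toy sketch below compares the two patterns using made-up numbers and no regression. The first grows a data frame with rbind() inside a for loop. The second builds one small data frame per iteration with lapply() and binds them once with do.call(rbind, ...). Both produce the same table, but the second avoids rebuilding the accumulated frame on every pass.

```r
# Made-up numbers only; no models are fitted here.

# Pattern 1: grow a data frame inside a for loop (rbind builds a new frame each pass)
grown <- data.frame()
for (k in 1:3) {
  grown <- rbind(grown, data.frame(step = k, value = k^2))
}

# Pattern 2: build one small data frame per iteration, then bind them all once
pieces <- lapply(1:3, function(k) data.frame(step = k, value = k^2))
bound  <- do.call(rbind, pieces)

all.equal(grown, bound)
```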

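The following example randomizes hscrp_mgl because it is not in the posted data, so ignore the results and focus on the process. The confidence intervals are also split into low and high bounds in separate columns.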

set.seed(456) 
df <- data.frame(hscrp_mgl = abs(rnorm(250)), 
       age = sample(100, 1000, replace=TRUE), 
       gender = sample(0:1, 1000, replace=TRUE), 
       smoke = sample(1:4, 1000, replace=TRUE), 
       BMI = sample(19:25, 1000, replace=TRUE), 
       SBP = sample(c(100, 120, 140, 110, 120, 130, 120), 
           1000, replace=TRUE), 
       diab = sample(0:1, 1000, replace=TRUE), 
       stroke = sample(0:1, 1000, replace=TRUE)) 

# ITERATE THROUGH COLUMN NUMBERS 2 TO ncol(df) - 1 (SUBSETTING OUT FIRST AND LAST) 
modeldfs <- lapply(2:(ncol(df) - 1), function(i) { 
    strf <- paste("stroke ~ log2(hscrp_mgl)", 
       paste(names(df)[2:i], collapse = "+"), sep = "+") 
    print(strf) 

    # FIT DYNAMIC CUMULATIVE FORMULA USING names() TO PASS IN COLUMN NAME 
    fit <- glm(as.formula(strf), family=binomial, data=df) 

    # BIND MODEL STATS 
    data.frame(OR = exp(coef(fit)[i+1]), 
      CI_2.5 = exp(confint(fit)[i+1,1]), 
      CI_97.5 = exp(confint(fit)[i+1,2])) 
}) 

model <- do.call(rbind, modeldfs) 
model 

Output

[1] "stroke ~ log2(hscrp_mgl)+age" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP+diab" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
# > model <- do.call(rbind, modeldfs) 
# > model 
      OR CI_2.5 CI_97.5 
age 1.003285 0.9989043 1.007701 
gender 1.067117 0.8318796 1.369055 
smoke 1.005926 0.9005196 1.123717 
BMI 1.011281 0.9505659 1.075928 
SBP 1.003252 0.9929368 1.013692 
diab 1.139586 0.8880643 1.462925 
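In the code above, coef(fit)[i+1] returns the coefficient of the most recently added covariate. Positions 1 and 2 hold the intercept and log2(hscrp_mgl), so for i = 2 the third coefficient is age. That is why the rows are labelled age, gender, and so on. Also, confint(fit) is evaluated twice per model, which matches the doubled "Waiting for profiling to be done..." lines.

If the quantity of interest is the biomarker's OR in each cumulatively adjusted model, as the question's expected output suggests, one possible variant is sketched below. It starts from the crude model, always extracts the log2(hscrp_mgl) term, and profiles once per model. It reuses the same simulated data, so the numbers are not meaningful; covars and the Model labels are my own additions.

```r
set.seed(456)
df <- data.frame(hscrp_mgl = abs(rnorm(250)),
                 age = sample(100, 1000, replace = TRUE),
                 gender = sample(0:1, 1000, replace = TRUE),
                 smoke = sample(1:4, 1000, replace = TRUE),
                 BMI = sample(19:25, 1000, replace = TRUE),
                 SBP = sample(c(100, 120, 140, 110, 120, 130, 120), 1000, replace = TRUE),
                 diab = sample(0:1, 1000, replace = TRUE),
                 stroke = sample(0:1, 1000, replace = TRUE))

covars <- setdiff(names(df), c("hscrp_mgl", "stroke"))  # age ... diab, in column order
b      <- "log2(hscrp_mgl)"                              # always report the biomarker term

# k = 0 is the crude model; k = 1, 2, ... add covariates cumulatively
modeldfs <- lapply(0:length(covars), function(k) {
  rhs <- c(b, covars[seq_len(k)])
  fit <- glm(reformulate(rhs, response = "stroke"), family = binomial, data = df)
  ci  <- suppressMessages(confint(fit))  # profile once, reuse both bounds
  data.frame(Model   = if (k == 0) "Biomarker" else paste0("+", covars[k]),
             OR      = exp(unname(coef(fit)[b])),
             CI_2.5  = exp(ci[b, 1]),
             CI_97.5 = exp(ci[b, 2]),
             stringsAsFactors = FALSE)
})

model <- do.call(rbind, modeldfs)
model
```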

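Thanks @Parfait. Maybe it was not clear from the post, but I want **cumulatively adjusted** ORs and CIs. The first iteration would be stroke ~ hscrp (the crude model). The next should give the ORs for stroke ~ hscrp + age, then stroke ~ hscrp + age + gender, and so on. That is why I think I need a traditional loop rather than a function: the order of the sequential, cumulative iterations matters here. – Mak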


You can still use lapply by applying 'paste(..., collapse)' to a dynamic range of column names in the formula. See the edit, which prints out each formula. – Parfait
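Here is a minimal sketch of that idea using a hypothetical three-covariate vector. Applying paste(..., collapse = "+") to covars[1:k] gives the cumulative right-hand sides, and as.formula() can then turn each string into a model formula.

```r
# Hypothetical covariate names, in the order they should be added
covars <- c("age", "gender", "smoke")

# Right-hand side for step k: the first k covariates joined with "+"
rhs <- vapply(seq_along(covars),
              function(k) paste(covars[1:k], collapse = "+"),
              character(1))

formulas <- paste("stroke ~ log2(hscrp_mgl)", rhs, sep = "+")
formulas
# as.formula(formulas[k]) can then be passed to glm()
```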


Thanks a lot @Parfait! This looks right on the money. I'll check it when I'm back in my department on Monday and let you know how it goes! PS: it's very elegant! – Mak


I don't have a data frame with hscrp_mgl to reproduce the results and confirm they are what you want, but you can try the following.

Get the names of all the features you want to iterate over:

x <- setdiff(names(df), c("stroke", "hscrp_mgl")) 

Using purrr::map

Create a data frame with the feature names in the first column, then use purrr::map inside mutate() to add the required values.

library(dplyr)  # data_frame(), mutate(), %>% 
library(purrr)  # map() 

model <- data_frame(Model = x) %>% 
    mutate(OR = map(Model, ~exp(coef(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
                                         family = binomial, data = df)))), 
           CI = map(Model, ~exp(confint(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
                                            family = binomial, data = df))))) 

You will get something like this:

# A tibble: 6 × 3 
    Model  OR   CI 
    <chr> <list>  <list> 
1 age <dbl [3]> <dbl [3 × 2]> 
2 gender <dbl [3]> <dbl [3 × 2]> 
3 smoke <dbl [3]> <dbl [3 × 2]> 
4 BMI <dbl [3]> <dbl [3 × 2]> 
5 SBP <dbl [3]> <dbl [3 × 2]> 
6 diab <dbl [3]> <dbl [3 × 2]> 

Using purrr::map and broom

You can also use broom functions to extract the data you need from the models:

  • Add the fitted models as a column
  • Use tidy to get the coefficients and add them as OR
  • Get the confidence intervals with confint_tidy and add them as CI

model2 <- data_frame(Model = x) %>% 
    mutate(model_details = map(Model, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
            family=binomial, data=df))) %>% 
    mutate(OR = map(model_details, broom::tidy), 
     CI = map(model_details, broom::confint_tidy)) 
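As far as I know, recent broom versions let tidy() on a glm fit take conf.int and exponentiate arguments. That gives ORs and their CIs from a single call, instead of separate tidy() and confint_tidy() columns. Below is a minimal sketch for one model on hypothetical simulated data. Inside the pipeline above, it would replace the two map() calls with map(model_details, broom::tidy, conf.int = TRUE, exponentiate = TRUE).

```r
library(broom)

# Simulated stand-in data (hypothetical, not the asker's)
set.seed(42)
df <- data.frame(hscrp_mgl = rexp(500),
                 age = sample(30:80, 500, replace = TRUE),
                 stroke = rbinom(500, 1, 0.3))

fit <- glm(stroke ~ log2(hscrp_mgl) + age, family = binomial, data = df)

# exponentiate = TRUE reports exp(estimate) and exp(CI), i.e. ORs;
# conf.int = TRUE adds conf.low / conf.high columns
tab <- tidy(fit, conf.int = TRUE, exponentiate = TRUE)
tab[, c("term", "estimate", "conf.low", "conf.high")]
```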

Cumulative adjustment

For cumulative adjustment, you can try the following:

cnames <- x  # the feature names from the setdiff() step above 

model <- data_frame(Model = cnames) %>% 
    mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(cnames[1:.y], collapse = "+"))) %>% 
    mutate(model_details = map(Model_adjust, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
             family=binomial, data=df))) %>% 
    mutate(OR = map(model_details, broom::tidy), 
     CI = map(model_details, broom::confint_tidy)) 

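The extra step adds a column listing the variables included at each step (shown on its own below). The subsequent steps then use Model_adjust to fit the models: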

model <- data_frame(Model = cnames) %>% 
    mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(cnames[1:.y], collapse = "+"))) 

# A tibble: 6 × 2
  Model  Model_adjust
  <chr>  <chr>
1 age    age
2 gender age+gender
3 smoke  age+gender+smoke
4 BMI    age+gender+smoke+BMI
5 SBP    age+gender+smoke+BMI+SBP
6 diab   age+gender+smoke+BMI+SBP+diab
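Without purrr, the same Model_adjust column can be built in base R with Reduce() and accumulate = TRUE. In this sketch, cnames is filled in by hand with the six covariate names from the example data.

```r
cnames <- c("age", "gender", "smoke", "BMI", "SBP", "diab")

# Reduce(..., accumulate = TRUE) keeps every intermediate result,
# so element k is the first k names joined with "+"
Model_adjust <- Reduce(function(acc, nm) paste(acc, nm, sep = "+"),
                       cnames, accumulate = TRUE)
Model_adjust
```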

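Thanks for your reply @OmaymaS. Does this give me ORs for the individual relationships, e.g. stroke ~ hscrp + age, then stroke ~ hscrp + gender? Or does it adjust cumulatively, e.g. stroke ~ hscrp + age, then stroke ~ hscrp + age + gender, and so on? I want the latter: ORs and CIs, in table form, for the sequential and cumulative adjustments of the model. – Mak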


@Mak, check the added Cumulative adjustment section. – OmaymaS


Thanks @OmaymaS. Although I find the broom package very useful, I think Parfait's approach suits my purpose better. – Mak