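How to analyze multiple-response questions in a weighted sample using the R survey package?
I'm fairly new to R. I'd like to know how to use the 'survey' package (http://r-survey.r-forge.r-project.org/survey/) to analyze a multiple-response question for a weighted sample. The tricky part is that more than one response can be ticked, so the responses are stored across several columns.
Example:
I have survey data from 500 respondents who were drawn at random from 10 districts across the country. Let's say the main question asked (stored in column H1_AreYouHappy) was: 'Are you happy?' - Yes/No/Don't know
Respondents were then asked a follow-up question: 'Why are you happy?' This was a multiple-choice question where more than one answer box could be ticked, so the responses are stored in separate columns, e.g.:
H1Yes_Why1 (0/1, i.e. option box ticked or not ticked) - 'because of the economy';
H1Yes_Why2 (0/1) - 'because I am healthy';
H1Yes_Why3 (0/1) - 'because of my social life'.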
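I'm using the R 'survey' package to apply post-stratification weights based on the de facto population size of each district (the fake data set, myDataFrame, is defined further down):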
library(survey)
# Create an unweighted survey object
mySurvey.unweighted <- svydesign(ids=~1, data=myDataFrame)
# Choose which variable contains the sample distribution to be weighted by
sample.distribution <- list(~District)
# Specify (from Census data) how often each level occurs in the population
population.distribution <- data.frame(
  District = c('Green', 'Red', 'Orange', 'Blue', 'Purple', 'Grey', 'Black', 'Yellow', 'White', 'Lavender'),
  freq = c(0.1824885, 0.0891206, 0.1381343, 0.1006533, 0.1541269, 0.0955853, 0.0268172, 0.0398353, 0.0809459, 0.0922927))
# Apply the weights
mySurvey.rake <- rake(design = mySurvey.unweighted, sample.margins=sample.distribution, population.margins=list(population.distribution))
# Calculate the weighted mean for the main question
svymean(~H1_AreYouHappy, mySurvey.rake)
# How can I calculate the WEIGHTED means for the multiple choice - multiple response follow-up question?
Here is my fake data set:
districts <- c('Green', 'Red','Orange','Blue','Purple','Grey','Black','Yellow','White','Lavender')
myDataFrame <- data.frame(H1_AreYouHappy = sample(c('Yes', 'No', 'Dont Know'), 500, replace = TRUE),
                          H1Yes_Why1 = sample(0:1, 500, replace = TRUE),
                          H1Yes_Why2 = sample(0:1, 500, replace = TRUE),
                          H1Yes_Why3 = sample(0:1, 500, replace = TRUE),
                          District = sample(districts, 500, replace = TRUE), stringsAsFactors = TRUE)
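How can I calculate weighted means for the multiple-choice question (i.e. across the 0/1 response columns) once the post-stratification weights from the 'survey' package have been applied?
If I wanted it unweighted, I could use this function to calculate the frequencies across all columns that match my prefix 'H1Yes_Why':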
multipleResponseFrequencies = function(data, question.prefix) {
  # Find the columns with the questions
  a = grep(question.prefix, names(data))
  # Find the total number of responses (ticked boxes)
  b = sum(data[, a] != 0)
  # Find the totals for each question
  d = colSums(data[, a] != 0)
  # Find the number of respondents who ticked at least one box
  e = sum(rowSums(data[, a]) != 0)
  # d and b combined into one vector. This is the overall frequency
  f = as.numeric(c(d, b))
  result <- data.frame(question = c(names(d), "Total"),
                       freq = f,
                       percent = (f/b)*100,
                       percentofcases = (f/e)*100)
  result
}
multipleResponseFrequencies(myDataFrame, 'H1Yes_Why')
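Any help would be greatly appreciated.
You might be better off working through one of the examples at http://asdfree.com/ –
@AnthonyDamico Do your examples show how to analyze multiple-response questions? Is there a specific example? – SmallChess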
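One possible direction, offered as a sketch rather than a definitive answer: because each option is stored as a 0/1 column, its weighted mean is simply the weighted share of respondents who ticked that box, so passing all three columns to `svymean()` in one formula may already give what is asked for. The block below rebuilds the fake data and the raked design from the question, restricts the design to 'Yes' respondents with `subset()`, and then mirrors the unweighted frequency table using weighted totals from `svytotal()`. The `set.seed(1)` call, the helper column `AnyWhy`, the object names `why`, `happy`, `whyMeans` and `weightedFrequencies`, and the capitalized `Freq` column name are assumptions introduced here, not part of the question.

```r
library(survey)

set.seed(1)  # assumption: fixed seed so the fake data are reproducible
districts <- c('Green', 'Red', 'Orange', 'Blue', 'Purple',
               'Grey', 'Black', 'Yellow', 'White', 'Lavender')
myDataFrame <- data.frame(
  H1_AreYouHappy = sample(c('Yes', 'No', 'Dont Know'), 500, replace = TRUE),
  H1Yes_Why1 = sample(0:1, 500, replace = TRUE),
  H1Yes_Why2 = sample(0:1, 500, replace = TRUE),
  H1Yes_Why3 = sample(0:1, 500, replace = TRUE),
  District = sample(districts, 500, replace = TRUE),
  stringsAsFactors = TRUE)

# Population shares per district, as in the question
# (assumption: column named Freq, as the rake() documentation describes)
population.distribution <- data.frame(
  District = districts,
  Freq = c(0.1824885, 0.0891206, 0.1381343, 0.1006533, 0.1541269,
           0.0955853, 0.0268172, 0.0398353, 0.0809459, 0.0922927))

# svydesign() warns that equal probabilities are assumed; that matches the question
mySurvey.unweighted <- svydesign(ids = ~1, data = myDataFrame)
mySurvey.rake <- rake(design = mySurvey.unweighted,
                      sample.margins = list(~District),
                      population.margins = list(population.distribution))

# Hypothetical helper column: 1 if at least one 'why' box was ticked
mySurvey.rake <- update(mySurvey.rake,
  AnyWhy = as.numeric(H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 > 0))

why <- ~H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3

# Weighted share of ALL respondents ticking each box (with standard errors)
svymean(why, mySurvey.rake)

# The follow-up only applies to 'Yes' respondents, so restrict the design
happy <- subset(mySurvey.rake, H1_AreYouHappy == 'Yes')
whyMeans <- svymean(why, happy)
whyMeans

# Weighted analogue of the unweighted frequency table
d <- coef(svytotal(why, happy))                  # weighted total per option
b <- sum(d)                                      # weighted total of all ticks
e <- as.numeric(coef(svytotal(~AnyWhy, happy)))  # weighted cases with >= 1 tick
weightedFrequencies <- data.frame(
  question = c(names(d), "Total"),
  weighted.total = c(d, b),
  percent = c(d, b) / b * 100,
  percentofcases = c(d, b) / e * 100,
  row.names = NULL)
weightedFrequencies
```

Since the population margins are given as proportions, the weighted totals are on a proportion scale rather than head counts, but the two percentage columns are unaffected by that scaling. Using `subset()` on the design object, rather than filtering the data frame before calling `svydesign()`, keeps the full design information available for the standard errors.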