2017-04-27
0

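Likert in R with unequal number of factor levels

I have some survey data on a 5-point Likert scale. However, in some response columns, some of the factor levels are missing. Here is the data: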

Increased student engagement ,Instructional time effectiveness increased,Increased student confidence,Increased student performance in class assignments,Increased learning of the students,Added unique learning activities

Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree

Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree

Disagree,Strongly disagree,Neither agree nor disagree,Disagree,Disagree,Neither agree nor disagree

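As you can see, some response columns are missing some factor levels. For example, in the first column, Agree and Strongly disagree are missing (for simplicity, I have pasted a subset of the actual data set).

I am using the following code in R: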

facultyData <- read_excel("FacultyResponsesForR.xlsx") 
facultyData[] <- lapply(facultyData, factor) 
facultyData[1:6] <- lapply(facultyData[1:6], factor, levels=1:5) 
likertData <- likert(facultyData, nlevels = 5) 
plot(likertData) 

However, this results in the following error:

Error in mean(as.numeric(items[, i]), na.rm = TRUE) : 
    (list) object cannot be coerced to type 'double' 

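I have tried the solution mentioned in other posts (one of them is the commented-out line in my code, facultyData[] <- lapply(facultyData[], factor, levels=1:5)), but it does not work either.

Before the lapply call is executed, the data looks like this: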

# A tibble: 14 × 1 
    `Increased student engagement` 
          <fctr> 
1     Strongly agree 
2       Agree 
3       Agree 
4       Agree 
5       Agree 
6       Agree 
7       Agree 
8       Agree 
9       Agree 
10  Neither agree nor disagree 
11  Neither agree nor disagree 
12  Neither agree nor disagree 
13  Neither agree nor disagree 
14      Disagree 

After executing it, the data is overwritten with NA values. Why does this happen?

> facultyData[1:6] <- lapply(facultyData[1:6], factor, levels=1:5) 
> facultyData[,1] 
# A tibble: 14 × 1 
    `Increased student engagement` 
          <fctr> 
1        NA 
2        NA 
3        NA 
4        NA 
5        NA 
6        NA 
7        NA 
8        NA 
9        NA 
10        NA 
11        NA 
12        NA 
13        NA 
14        NA 

After changing the code as follows, the data is preserved (it does not become NA), but I still get the same error:

mylevels <- c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree') 
facultyData <- read_excel("FacultyResponsesForR.xlsx") 
facultyData[] <- lapply(facultyData, factor) 
facultyData[1:6] <- lapply(facultyData[1:6], factor, levels=mylevels) 

This solution did not work for me: https://github.com/jbryer/likert/blob/master/demo/UnusedLevels.R

Answers

2

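Retyping your data was not fun, and it took some figuring out, but I think this will help you. Someone may have a shorter way. Let me know if it helps.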

df <- rbind(c("Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree"), 
      c("Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree"), 
      c("Disagree","Strongly disagree","Neither agree nor disagree","Disagree","Disagree","Neither agree nor disagree")) 
df <- as.data.frame(df) 
colnames(df) <- c("Increased student engagement", "Instructional time effectiveness increased", "Increased student confidence", "Increased student performance in class assignments", "Increased learning of the students", "Added unique learning activities") 

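# Lookup table pairing each label with its position on the 5-point scale
lookup <- data.frame(levels = 1:5, mylabels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree')) 

# Convert the labels to integer codes 1-5, then back to factors that carry all five levels
df.1 <- as.data.frame(apply(df, 2, function(x) match(x, lookup$mylabels))) 
df.new <- as.data.frame(lapply(as.list(df.1), factor, levels = lookup$levels, labels = lookup$mylabels)) 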

str(df.new) 
'data.frame': 3 obs. of 6 variables: 
$ Increased.student.engagement      : Factor w/ 5 levels "Strongly disagree",..: 5 3 2 
$ Instructional.time.effectiveness.increased  : Factor w/ 5 levels "Strongly disagree",..: 5 3 1 
$ Increased.student.confidence      : Factor w/ 5 levels "Strongly disagree",..: 5 3 3 
$ Increased.student.performance.in.class.assignments: Factor w/ 5 levels "Strongly disagree",..: 5 3 2 
$ Increased.learning.of.the.students    : Factor w/ 5 levels "Strongly disagree",..: 5 3 2 
$ Added.unique.learning.activities     : Factor w/ 5 levels "Strongly disagree",..: 5 3 3 
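
As for why the original `levels=1:5` call turned everything into NA, here is a minimal base R sketch. It assumes the columns hold the text labels shown in the question; the vector `x` is made up for illustration from the first data column. `factor()` keeps only values that match one of the supplied levels, and no text label matches "1" to "5". Passing the labels themselves as levels is also a shorter route to the same result as the lookup table above.

```r
# Labels exactly as they appear in the survey, in scale order
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Illustrative column in which some levels never occur
x <- c("Strongly agree", "Neither agree nor disagree", "Disagree")

# levels = 1:5 matches none of the text values, so every entry becomes NA
bad <- factor(x, levels = 1:5)

# Using the labels as levels keeps the data and retains the unused levels
good <- factor(x, levels = mylevels)

print(bad)
print(table(good))
print(as.integer(good))  # position of each answer on the 5-point scale
```

The integer codes of `good` agree with the first column of the `str()` output above.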

I found that the main problem was the `read_excel` function. I used `facultyData <- read.csv('FacultyResponsesForR.csv', colClasses = c('factor','factor','factor','factor','factor','factor'))` instead. – vipin8169

Glad you figured it out –

Thanks for your help :) – vipin8169
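
The thread does not spell out why `read_excel` leads to the coercion error. A likely explanation, offered here as an assumption and not something confirmed above: `read_excel` returns a tibble, and `[, i]` on a tibble yields a one-column tibble (a list) instead of a plain vector, which `as.numeric()` cannot coerce. The base R sketch below imitates that behaviour with `drop = FALSE`, so it needs no extra packages; the `items` data frame and its column `q1` are made up for illustration.

```r
# A one-column data frame standing in for the survey items
items <- data.frame(q1 = factor(c("Agree", "Disagree", "Agree")))

# A plain data.frame drops a single column to a vector, so this works
as_vector <- as.numeric(items[, 1])
print(as_vector)

# drop = FALSE keeps a one-column data frame, imitating what a tibble returns;
# as.numeric() then fails, and the error message is captured for inspection
as_list <- tryCatch(
  as.numeric(items[, 1, drop = FALSE]),
  error = function(e) conditionMessage(e)
)
print(as_list)
```

If this is indeed the cause, wrapping the data in `as.data.frame()` before passing it to `likert()` may be enough, which would also fit with the `read.csv` workaround succeeding.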

2

I created an Excel file using your sample data. Reading it with read_excel gives the following result:

library(readxl) 
dat <- read_excel("factor_labels.xlsx") 
dat 
#> # A tibble: 3 × 6 
#> `Increased student engagement` 
#>       <chr> 
#> 1     Strongly agree 
#> 2  Neither agree nor disagree 
#> 3      Disagree 
#> # ... with 5 more variables: `Instructional time effectiveness 
#> # increased` <chr>, `Increased student confidence` <chr>, `Increased 
#> # student performance in class assignments` <chr>, `Increased learning 
#> # of the students` <chr>, `Added unique learning activities` <chr> 

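You are correct that read_excel does not convert character variables to factors. This is deliberate, because it is often unnecessary or inappropriate to treat character variables as categorical. Even when we do want to convert to factors, it is good practice to do so explicitly, to make sure the factors have the appropriate levels in the correct order (by default, a factor is created with only the levels present in the variable, sorted alphabetically). Sometimes we may want to do something more complicated, such as renaming or regrouping levels, but here we do not want to change the levels, only to specify the full set of levels. One way to create the required factors is with mutate_all from dplyr: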

mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree", 
    "Agree", "Strongly agree") 

library(dplyr) 
#> 
#> Attaching package: 'dplyr' 
#> The following objects are masked from 'package:stats': 
#> 
#>  filter, lag 
#> The following objects are masked from 'package:base': 
#> 
#>  intersect, setdiff, setequal, union 
dat <- dat %>% mutate_all(factor, levels = mylevels) 
dat 
#> # A tibble: 3 × 6 
#> `Increased student engagement` 
#>       <fctr> 
#> 1     Strongly agree 
#> 2  Neither agree nor disagree 
#> 3      Disagree 
#> # ... with 5 more variables: `Instructional time effectiveness 
#> # increased` <fctr>, `Increased student confidence` <fctr>, `Increased 
#> # student performance in class assignments` <fctr>, `Increased learning 
#> # of the students` <fctr>, `Added unique learning activities` <fctr> 
lapply(dat, levels) 
#> $`Increased student engagement` 
#> [1] "Strongly disagree"   "Disagree"     
#> [3] "Neither agree nor disagree" "Agree"      
#> [5] "Strongly agree"    
#> 
#> $`Instructional time effectiveness increased` 
#> [1] "Strongly disagree"   "Disagree"     
#> [3] "Neither agree nor disagree" "Agree"      
#> [5] "Strongly agree"    
#> 
#> $`Increased student confidence` 
#> [1] "Strongly disagree"   "Disagree"     
#> [3] "Neither agree nor disagree" "Agree"      
#> [5] "Strongly agree"    
#> 
#> $`Increased student performance in class assignments` 
#> [1] "Strongly disagree"   "Disagree"     
#> [3] "Neither agree nor disagree" "Agree"      
#> [5] "Strongly agree"    
#> 
#> $`Increased learning of the students` 
#> [1] "Strongly disagree"   "Disagree"     
#> [3] "Neither agree nor disagree" "Agree"      
#> [5] "Strongly agree"    
#> 
#> $`Added unique learning activities` 
#> [1] "Strongly disagree"   "Disagree"     
#> [3] "Neither agree nor disagree" "Agree"      
#> [5] "Strongly agree" 

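Note the change from <chr> to <fctr> in the printed output. Compare this with the read.csv solution:

# stringsAsFactors = TRUE was the default before R 4.0.0; set it explicitly on newer versions
facultyData <- read.csv("factor_labels.csv", stringsAsFactors = TRUE) 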
lapply(facultyData, levels) 
#> $Increased.student.engagement 
#> [1] "Disagree"     "Neither agree nor disagree" 
#> [3] "Strongly agree"    
#> 
#> $Instructional.time.effectiveness.increased 
#> [1] "Neither agree nor disagree" "Strongly agree"    
#> [3] "Strongly disagree"   
#> 
#> $Increased.student.confidence 
#> [1] "Neither agree nor disagree" "Strongly agree"    
#> 
#> $Increased.student.performance.in.class.assignments 
#> [1] "Disagree"     "Neither agree nor disagree" 
#> [3] "Strongly agree"    
#> 
#> $Increased.learning.of.the.students 
#> [1] "Disagree"     "Neither agree nor disagree" 
#> [3] "Strongly agree"    
#> 
#> $Added.unique.learning.activities 
#> [1] "Neither agree nor disagree" "Strongly agree" 

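Because the variables in the data set do not all contain every level, the number of levels varies from column to column and the levels are not always in a logical order, so this would need fixing. This is a common source of errors and frustration!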
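
One possible way to repair columns like these, sketched in base R: rebuild every column as a factor with the full, ordered set of levels. The small `facultyData` frame below is a made-up stand-in for the `read.csv` result, with shortened, hypothetical column names.

```r
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Stand-in for the read.csv result: each column only has the levels it happens to contain
facultyData <- data.frame(
  engagement = factor(c("Strongly agree", "Neither agree nor disagree", "Disagree")),
  confidence = factor(c("Strongly agree", "Neither agree nor disagree",
                        "Neither agree nor disagree"))
)
print(sapply(facultyData, nlevels))  # 3 and 2: the level counts differ per column

# Re-create every column with the same five levels in scale order;
# going through as.character() avoids any reliance on the old integer codes
facultyData[] <- lapply(facultyData, function(col) {
  factor(as.character(col), levels = mylevels)
})

print(sapply(facultyData, nlevels))
print(lapply(facultyData, levels))
```

Assigning into `facultyData[]` keeps the object a data frame, and the values themselves are unchanged because every existing label is one of the five levels.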