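Creating variables using multiple variables

I am trying to clean a dataset and create three variables named Adventure, Action, and Comedy. The original dataset has 3000 observations (imported as: dat). I am showing only a few observations: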
id Runtime Genres
37 75 animation, adventure, family, fantasy, musical
1 162 action, adventure, fantasy, sci_fi
95 126 action, fantasy
100 101 comedy, drama, fantasy
82 136 action, adventure, sci-fi
99 117 animation, adventure, comedy, family, sport
91 95 animation, comedy, crime, family
After importing the dataset into R, I split the Genres column into five columns using the following R code:
dat1 <- dat %>% separate (Genres, c("Genres1","Genres2" ,"Genres3" ,"Genres4" ,"Genres5"), sep=",", extra = "drop", fill = "right")
id Runtime Genres1 Genres2 Genres3 Genres4 Genres5
37 75 animation adventure family fantasy musical
1 162 action adventure fantasy sci_fi
95 126 action fantasy
100 101 comedy drama fantasy
82 136 action adventure sci-fi
99 117 animation adventure comedy family sport
91 95 animation comedy crime family
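How can I collapse all the genre columns into one indicator column each for action, adventure, and comedy?
I tried the following code to create the adventure column: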
dat1 ["adventure"] <- NA
dat1$adventure <- ifelse(dat1$Genres1=="adventure",1,(ifelse(dat1$Genres2=="adventure",1,0)))
Following a suggestion, I shortened the code and dropped the step that creates an empty adventure column:
dat1$adventure <- ifelse((dat1$Genres1=="adventure" | dat1$Genres2=="adventure" | dat1$Genres3=="adventure" | dat1$Genres4=="adventure"),1, 0)
id Runtime Genres1 Genres2 Genres3 Genres4 Genres5 Adventure
37 75 animation adventure family fantasy musical 0
1 162 action adventure fantasy sci_fi 0
95 126 action fantasy 0
100 101 comedy drama fantasy 0
82 136 action adventure sci-fi 0
99 117 animation adventure comedy family sport 0
91 95 animation comedy crime family 0
The code is able to extract adventure from Genres1, but returns zero for Genres2.
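One possible explanation for the zeros, offered as a guess rather than a confirmed diagnosis: splitting on `","` alone keeps the space that follows each comma, so the second and later pieces look like `" adventure"` and never equal `"adventure"`. The sketch below uses base R only, on a toy `dat` that copies four rows from the question. Splitting with `strsplit()` and looping over the three genres are illustrative choices, not the original approach.

```r
# Toy data copying four rows from the question (not the full 3000-row dataset)
dat <- data.frame(
  id = c(37, 1, 95, 100),
  Runtime = c(75, 162, 126, 101),
  Genres = c("animation, adventure, family, fantasy, musical",
             "action, adventure, fantasy, sci_fi",
             "action, fantasy",
             "comedy, drama, fantasy"),
  stringsAsFactors = FALSE
)

# Splitting on "," alone keeps the space that follows each comma
parts <- strsplit(dat$Genres[1], ",")[[1]]
raw_match <- parts[2] == "adventure"              # compares " adventure"
trimmed_match <- trimws(parts[2]) == "adventure"  # compares "adventure"

# Split on a comma plus optional whitespace, then test membership per row
genre_list <- strsplit(dat$Genres, ",\\s*")
for (g in c("adventure", "action", "comedy")) {
  dat[[g]] <- as.integer(vapply(genre_list, function(x) g %in% x, logical(1)))
}

print(dat[, c("id", "adventure", "action", "comedy")])
```

Because the indicators are built from the original `Genres` string, this does not depend on how many `GenresN` columns were created, and rows with fewer than five genres produce no `NA` comparisons.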
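I have edited the question. I tried some of the suggestions, but I do not know how to proceed because there are 3000 observations.
Following the suggestion, I formed a vector of genres and assigned it to dat2: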
dat2 <- c("adventure", "comedy", "action", "drama", "animation", "fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror", "musical","history", "war", "documentary", "biography")
table(factor(dat2))
action adventure animation biography comedy documentary drama
1 1 1 1 1 1 1
family fantasy history horror musical mystery romance
1 1 1 1 1 1 1
sci-fi thriller war
1 1 1
Then I created a function:
fun1 <- function("adventure", "comedy", "action", "drama", "animation",
"fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror",
"musical","history", "war", "documentary", "biography")) {
vector_of_cur_genres <- separate(i, sep = ", ")
result <- table(factor(vector_of_cur_genres, dat2))
return(result)
}
# Results
fun1 <- function("adventure", "comedy", "action", "drama",
"animation", "fantasy", "mystery", "family", "sci-fi", "thriller",
"romance", "horror", "musical","history", "war", "documentary",
"biography")) {
Error: unexpected string constant in "fun1 <- function("adventure""
> vector_of_cur_genres <- separate(i, sep = ", ")
Error: Please supply column name
> result <- table(factor(vector_of_cur_genres, dat2))
Error in factor(vector_of_cur_genres, dat2) :
object 'vector_of_cur_genres' not found
> return(result)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"
mat <- mapply(fun1,dat2$Genres)
Error in match.fun(FUN) : object 'fun1' not found
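The first error arises because the formal arguments of an R function must be names, not string constants; every later error follows from `fun1` never being defined. A minimal sketch of what the suggested function may have been meant to do is given below, under two assumptions: that the function takes one genre string as its argument (called `genre_string` here, a hypothetical name), and that base R `strsplit()` stands in for `separate()`, which works on data frame columns rather than on a single string.

```r
# Vector of genre levels, as in the question
dat2 <- c("adventure", "comedy", "action", "drama", "animation", "fantasy",
          "mystery", "family", "sci-fi", "thriller", "romance", "horror",
          "musical", "history", "war", "documentary", "biography")

# genre_string is a hypothetical parameter name: one row's Genres value
fun1 <- function(genre_string) {
  # Split on a comma plus optional whitespace to get the row's genres
  vector_of_cur_genres <- strsplit(genre_string, ",\\s*")[[1]]
  # Count each known genre; genres not listed in dat2 are dropped
  table(factor(vector_of_cur_genres, levels = dat2))
}

# Two rows copied from the question as toy input
genres <- c("animation, adventure, family, fantasy, musical",
            "action, fantasy")

# One row per observation, one column per genre in dat2
mat <- t(sapply(genres, fun1, USE.NAMES = FALSE))
print(mat[, c("adventure", "action", "comedy")])
```

With the real data, the call would presumably be applied to `dat$Genres`, and the resulting matrix could be attached to the data frame with `cbind()`.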
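FYI, there is no need to create an empty new column before assigning to it; the assignment creates it.
Welcome to Stack Overflow! [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – zx8754
Possibly, convert the data from wide to long, then tabulate. – zx8754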
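The wide-to-long idea in the last comment could be sketched roughly as follows. This is one reading of the suggestion, not the commenter's code: it uses base R only, starts from the original `Genres` column rather than the split `GenresN` columns, and runs on a toy `dat` with three rows copied from the question.

```r
# Toy data copying three rows from the question
dat <- data.frame(
  id = c(37, 1, 95),
  Genres = c("animation, adventure, family, fantasy, musical",
             "action, adventure, fantasy, sci_fi",
             "action, fantasy"),
  stringsAsFactors = FALSE
)

# Long format: one row per (id, genre) pair
genre_list <- strsplit(dat$Genres, ",\\s*")
long <- data.frame(
  id = rep(dat$id, lengths(genre_list)),
  genre = unlist(genre_list),
  stringsAsFactors = FALSE
)

# Cross-tabulate id by genre, then keep only the genres of interest
counts <- as.data.frame.matrix(table(long$id, long$genre))
indicators <- counts[, c("action", "adventure")]
print(indicators)
```

The row names of `indicators` are the `id` values as strings, sorted in increasing order, so the result can be matched back to `dat` by `id`.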