2016-04-27 63 views
0

R string parsing challenge

I am working with a column that contains strings like the following:

 Col1 
     ------------------------------------------------------------------ 
     Department of Mechanical Engineering, Department of Computer Science 
     Division of Advanced Machining, Center for Mining and Metallurgy 
     Department of Aerospace, Center for Science and Delivery 

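What I am trying to do is pull out each substring that starts with Department, Division, or Center and runs up to the comma (the comma separates the strings), and give each one its own column. The final output should look like this: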

 Dept_Mechanical_Eng  Dept_Computer_Science  Div_Adv_Machining  Cntr_Mining_Metallurgy  Dept_Aerospace  Cntr_Science_Delivery
                   1                      1                  0                       0               0                      0
                   0                      0                  1                       1               0                      0
                   0                      0                  0                       0               1                      1

I butchered the actual names in the expected output for aesthetic reasons. Any help parsing this string is much appreciated.

+4

'library(splitstackshape); cSplit_e(mydf, "Col1", ",", type = "character", drop = TRUE, fill = 0)'. You could also look at 'strsplit' + 'mtabulate' from "qdapTools". – A5C1D2H2I1M1N2O1R2T1
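For readers without those packages, the split-then-tabulate idea from the comment can be approximated in base R. This is only a sketch under assumptions: the data frame name 'mydf' is borrowed from the comment, the sample rows are copied from the question, and the helper names 'parts', 'units', and 'result' are made up for illustration. It is not the packages' own implementation.

```r
# Hypothetical data frame mirroring the question's Col1
mydf <- data.frame(
  Col1 = c(
    "Department of Mechanical Engineering, Department of Computer Science",
    "Division of Advanced Machining, Center for Mining and Metallurgy",
    "Department of Aerospace, Center for Science and Delivery"
  ),
  stringsAsFactors = FALSE
)

# Split each row on commas and trim the surrounding whitespace
parts <- lapply(strsplit(mydf$Col1, ",", fixed = TRUE), trimws)

# Every distinct unit, in order of first appearance
units <- unique(unlist(parts))

# One 0/1 row per input row: 1 where the row contains that unit exactly
indicators <- do.call(rbind, lapply(parts, function(p) as.integer(units %in% p)))
colnames(indicators) <- make.names(units)

result <- as.data.frame(indicators)
print(result)
```

Because membership is tested with %in% on the already split pieces, a unit only counts when the whole name matches, so one name being a prefix of another does not produce a false 1.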

Answers

0

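This is very similar to a question I just answered that tabulated a different text example. Are you and that asker in the same class? See: Count the number of times (frequency) a string occurs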

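# Sample input: three rows, each holding two comma-separated units
inp <- "Department of Mechanical Engineering, Department of Computer Science
     Division of Advanced Machining, Center for Mining and Metallurgy
     Department of Aerospace, Center for Science and Delivery"
# Split on commas and line breaks; the factor levels are the distinct units
inp2 <- factor(scan(text=inp,what="",sep=","))
#Read 6 items
# One element per input row
inp3 <- readLines(textConnection(inp))

# For each distinct unit, flag the rows that contain it (1) or not (0)
as.data.frame(setNames(lapply(levels(inp2), function(ll) as.numeric(grepl(ll, inp3))), trimws(levels(inp2))))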
  Department.of.Aerospace Division.of.Advanced.Machining
1                       0                              0
2                       0                              1
3                       1                              0
  Center.for.Mining.and.Metallurgy Center.for.Science.and.Delivery
1                                0                               0
2                                1                               0
3                                0                               1
  Department.of.Computer.Science Department.of.Mechanical.Engineering
1                              1                                    1
2                              0                                    0
3                              0                                    0
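A possible variation on the approach above, offered as a sketch rather than as the answerer's own code: passing strip.white = TRUE to scan() trims each field so the names do not depend on indentation, and fixed = TRUE in grepl() matches each name literally instead of as a regular expression. The variable names 'units', 'rows', and 'res' are made up for illustration.

```r
inp <- "Department of Mechanical Engineering, Department of Computer Science
Division of Advanced Machining, Center for Mining and Metallurgy
Department of Aerospace, Center for Science and Delivery"

# strip.white = TRUE trims leading/trailing blanks from every field
units <- unique(scan(text = inp, what = "", sep = ",",
                     strip.white = TRUE, quiet = TRUE))

# One element per input row
con <- textConnection(inp)
rows <- readLines(con)
close(con)

# fixed = TRUE treats each unit name as a literal string, not a regex
res <- as.data.frame(setNames(
  lapply(units, function(u) as.integer(grepl(u, rows, fixed = TRUE))),
  units
))
print(res)
```

One caveat that still applies: grepl() does substring matching, so if one unit name were contained in another (say, a hypothetical "Department of Aerospace Engineering"), the shorter name would be flagged in both rows.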
+0

Ah :) Thanks 42, that works. –