2014-09-02
0

Split string recursively

Say I have text like this:

pattern = "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')" 

The challenge is to split it into words, using the word separators in

c(" ","-","/","\\","_",":","(",")",".",",") 

as delimiters.

Desired result:

"This" "is" "some" "word" "expression" "I'd" "like" "to" "parse" "intelligently" "using" "special" "symbols" "like" 

Approach

I can do this with sapply or a for loop:

keywords = unlist(strsplit(pattern," ")) 
keywords = unlist(strsplit(keywords,"-")) 

# etc.

Question:

But what would a solution using Reduce(f, x, init, accumulate = TRUE) look like?
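The repeated strsplit calls above can be folded into one Reduce call. The following is only a minimal sketch: the helper name `split_once` and the choice of `fixed = TRUE` (so that each separator is treated as a literal string rather than a regular expression) are assumptions made for illustration, not part of the original question.

```r
pattern <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
separators <- c(" ", "-", "/", "\\", "_", ":", "(", ")", ".", ",")

# One folding step (hypothetical helper): split every current piece
# on a single literal separator and flatten the result.
split_once <- function(pieces, sep) unlist(strsplit(pieces, sep, fixed = TRUE))

# Reduce(f, x, init): start from the whole string and apply each separator in turn.
words <- Reduce(split_once, separators, pattern)
words <- words[nzchar(words)]  # drop any empty pieces

# With accumulate = TRUE, Reduce returns every intermediate result as a list.
steps <- Reduce(split_once, separators, pattern, accumulate = TRUE)

print(words)
```

Because the apostrophe is not among the separators, this keeps "I'd" together, and the quoted '.' leaves two stray "'" tokens at the end.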

Answers

4

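You can use the option perl = TRUE and split on punctuation or whitespace. Note that the apostrophe counts as punctuation, so "I'd" is split as well: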

> strsplit(pattern, '[[:punct:]]|[[:space:]]', perl = TRUE) 
[[1]] 
[1] "This"   "is"   "some"   "word"   "expression" 
[6] "I"    "d"    "like"   "to"   "parse"   
[11] "intelligently" "using"   "special"  "symbols"  "like"   
[16] ""              ""              ""              ""
+0

Very elegant indeed! – 2014-09-02 10:25:39

+0

Although... – 2014-09-02 10:34:21

+0

Actually, I don't mind "I" + "d" versus "I'd". For simplicity, I will – 2014-09-02 10:45:56

5

You shouldn't need Reduce here. You should be able to do something like the following:

splitters <- c(" ","/","\\","_",":","(",")",".",",","-") # dash should come last 
pattern <- paste0("[", paste(splitters, collapse = ""), "]") 
string <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')" 
strsplit(string, pattern)[[1]] 
# [1] "This"   "is"   "some"   "word"   
# [5] "expression" "I'd"   "like"   "to"   
# [9] "parse"   "intelligently" "using"   "special"  
# [13] "symbols"  "like"   "'"    "'" 

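Note that in a regular expression character class, - should come first or last, so I have edited the "splitters" vector accordingly. Also, you may want to add a + at the end of "pattern" in case you want a run of consecutive separators (multiple spaces, for instance) to count as a single split point.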
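To illustrate the effect of the trailing +, here is a small sketch. The input string `messy` is made up for this example and the character class is reduced to a space and a dash to keep it short.

```r
# Hypothetical input with doubled separators, for illustration only.
messy <- "a  b--c"

single <- strsplit(messy, "[ -]")[[1]]   # every separator splits on its own
merged <- strsplit(messy, "[ -]+")[[1]]  # a run of separators counts once

print(single)  # "a" ""  "b" ""  "c"
print(merged)  # "a" "b" "c"
```

Without the +, each extra separator produces an empty string in the result.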

+0

@DavidArenburg, it's closer now. – A5C1D2H2I1M1N2O1R2T1 2014-09-02 10:42:14

+0

Very helpful in cases where you need to add customization to the other answer – 2014-09-02 10:49:10

+0

Any reason why "dash should come last"? – 2014-09-02 12:47:14
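On the dash question: inside a character class, a - placed between two characters is read as a range, while a - in the first or last position is a literal dash. With the original ordering of the separators, the class would start with a space, a dash and a slash, which reads as a range from space to slash; exactly which characters such a range covers is locale- and implementation-dependent. A minimal sketch of the difference, using made-up classes for illustration:

```r
in_range     <- grepl("[a-c]", "b")  # "a-c" is a range, so "b" matches
literal_b    <- grepl("[ac-]", "b")  # trailing "-" is literal, "b" is not in the class
literal_dash <- grepl("[ac-]", "-")  # the literal dash itself matches

print(c(in_range, literal_b, literal_dash))  # TRUE FALSE TRUE
```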

2

I would go with the following (this keeps "I'd" together):

strsplit(pattern, "[^[:alnum:][:digit:]']") 
## [[1]] 
## [1] "This"   "is"   "some"   "word"   "expression" "I'd"   "like"   "to"   "parse"   
## [10] "intelligently" "using"   "special"  "symbols"  "like"   "'"    "'"
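The two trailing "'" tokens come from the quoted '.' at the end of the string. One possible way to get exactly the desired result from the question is to drop every token that contains no alphanumeric character. This is only a sketch; the filtering step and the trailing + are additions for illustration and not part of the answer above.

```r
pattern <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

# Split on runs of characters that are neither alphanumeric nor an apostrophe.
tokens <- strsplit(pattern, "[^[:alnum:]']+")[[1]]

# Keep only tokens that contain at least one alphanumeric character.
words <- tokens[grepl("[[:alnum:]]", tokens)]

print(words)
```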