2017-03-07

Regex to match up to the first occurrence of a closing bracket

I have a character vector named cars as follows:

cars 
[1] "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair" 
[2] "Other car(21, model-155) looked in good condition but car (36, model-8878) looked to be in terrible condition." 

I need to extract the following parts from the strings:

car(52;model-14557) 
car(21, model-155) 
car (36, model-8878) 

I tried to extract them with the following piece of code:

stringr::str_extract_all(cars, "(.car\\s{0,5}\\(([^]]+)\\))") 

This gave me the following output:

[[1]] 
[1] " car(52;model-14557) had a good engine(workable condition)" 

[[2]] 
[1] " car(21, model-155) looked in good condition but car (36, model-8878)" 

Is there a way to extract just the word car together with its associated number and model?

Answer


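Your regex does not work because [^]]+ matches one or more characters other than ], not characters other than ( and ). Since there is no ] in your strings, the match runs from the first ( all the way to the end, and the greedy + only backtracks as far as the last ) it can find.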

Use:

> cars <- c("Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair","Other car(21, model-155) looked in good condition but car (36, model-8878) looked to be in terrible condition.") 
> library(stringr) 
> str_extract_all(cars, "\\bcar\\s*\\([^()]+\\)") 
[[1]] 
[1] "car(52;model-14557)" 

[[2]] 
[1] "car(21, model-155)" "car (36, model-8878)" 

The regex is \bcar\s*\([^()]+\); see the online regex demo here.

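It matches:

  • \b - a word boundary
  • car - the literal character sequence car
  • \s* - 0 or more whitespace characters
  • \( - a literal (
  • [^()]+ - 1 or more characters other than ( and )
  • \) - a literal )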
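One way to see what each token contributes is to run the pattern against a few hand-made strings. The test strings below are hypothetical (they are not from the question), and grepl() simply reports which ones the pattern accepts.

```r
# Pattern from the answer: \bcar\s*\([^()]+\)
pattern <- "\\bcar\\s*\\([^()]+\\)"

# Hypothetical test strings, roughly one per token of the pattern
x <- c(
  "car(1, model-2)",     # basic case
  "car   (3, model-4)",  # \s* allows any number of spaces before "("
  "scar(5)",             # \b rejects "car" inside a longer word
  "car()",               # [^()]+ needs at least one character
  "car(a(b)c)"           # [^()]+ cannot cross a nested "("
)

# perl = TRUE selects the PCRE engine, which handles \b and \s predictably
hit <- grepl(pattern, x, perl = TRUE)
print(data.frame(x, hit))
```

The last row shows a limitation: the pattern assumes flat, non-nested parentheses, which is what the strings in the question contain.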

Note that the same regex gives the same result with the following base R code:

> regmatches(cars, gregexpr("\\bcar\\s*\\([^()]+\\)", cars)) 
[[1]] 
[1] "car(52;model-14557)" 

[[2]] 
[1] "car(21, model-155)" "car (36, model-8878)" 
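Since the title asks for a match up to the first closing bracket, a lazy quantifier is another common way to say that. The sketch below swaps .*? (with perl = TRUE) in for the answer's [^()]+; it is an alternative for comparison, not what the answer uses.

```r
cars <- c(
  "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair",
  "Other car(21, model-155) looked in good condition but car (36, model-8878) looked to be in terrible condition."
)

# Alternative (not from the answer): .*? is lazy, so it expands one
# character at a time and stops at the first ")" after "car("
lazy <- regmatches(cars, gregexpr("\\bcar\\s*\\(.*?\\)", cars, perl = TRUE))
print(lazy)
```

For the strings in the question both versions return the same matches. They differ on edge cases: .*? also accepts empty parentheses such as car(), and on car(a(b) it returns car(a(b) because only ) stops it, whereas [^()]+ rejects both.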

Exactly what I wanted. Thanks! – SBista
