2017-02-27 64 views

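Using regular expressions with R

I have a character vector in R. Some of the strings have a "(number)" pattern appended, for example "Louisville (7)". I am trying to remove this "(number)" suffix with a regular expression, but I cannot figure it out. I can find the rows whose string contains a space followed by a character, but there has to be a way to match just these number suffixes.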

dat <- c("Alabama-Birmingham", "Arizona State", "Canisius", "UCF", "George Washington", 
      "Green Bay", "Iona", "Louisville (7)", "UMass", "Memphis", "Michigan State", 
      "Milwaukee", "Nebraska", "Niagara", "Northern Kentucky", "Notre Dame (21)", 
      "Quinnipiac", "Siena", "Tulsa", "Washington State", "Wright State", 
      "Xavier") 

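# Attempt: the unescaped parentheses act as a capture group, so " (.*)"
# matches any space followed by anything, not a literal "(number)".
rows <- grep(" (.*)", dat)
fixed <- gsub(" (.*)", "", dat[rows])
dat <- fixed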

Answers

2

First, you need to escape the parentheses. It is also better to be more specific about what is inside them:

gsub("\\s+\\(\\d+\\)", "", dat) 
 [1] "Alabama-Birmingham" "Arizona State"      "Canisius"
 [4] "UCF"                "George Washington"  "Green Bay"
 [7] "Iona"               "Louisville"         "UMass"
[10] "Memphis"            "Michigan State"     "Milwaukee"
[13] "Nebraska"           "Niagara"            "Northern Kentucky"
[16] "Notre Dame"         "Quinnipiac"         "Siena"
[19] "Tulsa"              "Washington State"   "Wright State"
[22] "Xavier"
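
To illustrate why the escaping matters, here is a minimal sketch on a shortened vector (x is a hypothetical three-element sample, not the full dat). Unescaped, ( and ) form a capture group, so " (.*)" means "a space followed by anything". Escaped, they match literal parentheses.

```r
# Hypothetical sample for illustration only
x <- c("Arizona State", "Louisville (7)", "Notre Dame (21)")

# Unescaped parentheses are a capture group: " (.*)" matches from the
# first space to the end of the string, so plain names are truncated too.
unescaped <- gsub(" (.*)", "", x)

# Escaped parentheses match literally; \\d+ requires digits inside them.
escaped <- gsub("\\s+\\(\\d+\\)", "", x)

print(unescaped)  # "Arizona" "Louisville" "Notre"
print(escaped)    # "Arizona State" "Louisville" "Notre Dame"
```

The first call damages "Arizona State" and "Notre Dame (21)" because the match starts at the first space, which is why the pattern has to name the literal parentheses.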

This works great, thank you for the help. – Developing

0

We can use sub:

sub("\\s*\\(.*", "", dat) 
# [1] "Alabama-Birmingham" "Arizona State"      "Canisius"
# [4] "UCF"                "George Washington"  "Green Bay"
# [7] "Iona"               "Louisville"         "UMass"
#[10] "Memphis"            "Michigan State"     "Milwaukee"
#[13] "Nebraska"           "Niagara"            "Northern Kentucky"
#[16] "Notre Dame"         "Quinnipiac"         "Siena"
#[19] "Tulsa"              "Washington State"   "Wright State"
#[22] "Xavier"
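
The two patterns give the same result on dat, but they are not equivalent in general. The sketch below uses a hypothetical vector y, where "Miami (FL)" is an invented entry with a non-numeric parenthetical, to show the difference. The third, end-anchored pattern is an assumption added for illustration and does not come from either answer.

```r
# Hypothetical sample: "Miami (FL)" has a non-numeric parenthetical
y <- c("Louisville (7)", "Miami (FL)", "UMass")

# Digits-only pattern: removes "(7)" but leaves "(FL)" alone.
digits_only <- gsub("\\s+\\(\\d+\\)", "", y)

# Open-ended pattern: removes everything from the first "(" onward.
from_paren <- sub("\\s*\\(.*", "", y)

# Assumed variant: digits only, and only at the end of the string.
anchored <- sub("\\s*\\(\\d+\\)$", "", y)

print(digits_only)  # "Louisville" "Miami (FL)" "UMass"
print(from_paren)   # "Louisville" "Miami" "UMass"
print(anchored)     # "Louisville" "Miami (FL)" "UMass"
```

Which one is appropriate depends on whether parentheticals other than the numeric suffix should be kept.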