Why doesn't this regex work in R

I have tried grep, grepl, regexpr, and gregexpr, and all of them either fail or return non-integer results.

The object is "test", a character vector of address strings. A sample:

[9972] "1350 Hwy 160 W\nFort Mill, SC 29715"                 
[9973] "Sonoran Desert Dentistry\n9220 E Raintree Dr\nSte 102\nScottsdale, AZ 85260"       
[9974] "3252 Vilas Rd\nCottage Grove, WI 53527"                
[9975] "224 W Cottage Grove Rd\nCottage Grove, WI 53527"              
[9976] "320 W Cottage Grove Rd\nCottage Grove, WI 53527"              
[9977] "7914 State Road 19\nDane, WI 53529"                 
[9978] "106 Dane St\nDane, WI 53529"

The goal is to extract everything after the last "\n", so that I keep only the city through the zip code, like "Cottage Grove, WI 53527".

Here is a sample of the grep call and regex that does not work:

> grep("\\[^\\]+$", test) 
integer(0)

Any help would be great.

Source

2015-11-20 frameworkgeek

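There are no backslashes in those lines of text. You need to know that the print output of character values containing escape characters differs from the cat output. Read ?Quotes and try cat on some of the lines. (...and I think "[^\\]" would match anything.)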
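To make the comment above concrete, here is a minimal sketch using one address from the question. The variable names x, has_backslash, and has_newline are just for illustration. It also suggests why the original call returned integer(0): the pattern "\\[^\\]+$" reaches the regex engine as \[^\]+$, which needs a literal "[" character, and none of the addresses contain one.

```r
x <- "106 Dane St\nDane, WI 53529"

print(x)      # shows the escape sequence \n inside the quoted string
cat(x, "\n")  # writes the real characters, so the address spans two lines

# The string holds a single newline character, not a backslash followed by "n".
has_backslash <- grepl("\\", x, fixed = TRUE)  # FALSE: no literal backslash
has_newline   <- grepl("\n", x, fixed = TRUE)  # TRUE: one real newline
nchar("\n")                                    # 1
```

So any pattern meant to find the line break should target the newline character itself rather than a backslash.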

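grep() does not change text. It only finds it, and returns either the indices of the matching elements or the matching elements themselves. To change the matched text, you want sub() or gsub(). In this case sub() is appropriate, since you want to remove everything up to and including the last newline in each string. The following should do it.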

sub(".*\n", "", test) 
# [1] "Fort Mill, SC 29715"  "Scottsdale, AZ 85260"  
# [3] "Cottage Grove, WI 53527" "Cottage Grove, WI 53527" 
# [5] "Cottage Grove, WI 53527" "Dane, WI 53529" 
# [7] "Dane, WI 53529"

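.* is greedy and matches any characters
\n is the newline we are looking for

Because .* is greedy, this removes everything up to and including the last \n.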
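As a rough sketch of the greedy behavior, using two of the addresses from the question (the names addr, greedy, and last_line are only for illustration): grep() merely reports where a newline occurs, sub() with the greedy pattern strips everything through the last newline, and an alternative I am assuming is equivalent here is to match the trailing run of non-newline characters directly with regexpr() and regmatches().

```r
addr <- c("1350 Hwy 160 W\nFort Mill, SC 29715",
          "Sonoran Desert Dentistry\n9220 E Raintree Dr\nSte 102\nScottsdale, AZ 85260")

# grep() only locates matches: it returns the indices of elements with a newline.
grep("\n", addr)

# Greedy .* runs to the LAST newline, so only the final line of each string is left.
greedy <- sub(".*\n", "", addr)

# Alternative: extract the final run of characters that are not newlines.
# Note that regmatches() silently drops elements with no match.
last_line <- regmatches(addr, regexpr("[^\n]+$", addr))

greedy
last_line
```

Note that this relies on R's default regex engine, where "." also matches a newline. With perl = TRUE the dot stops at line breaks by default, so the bracket-based alternative may be the safer choice if you switch engines.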

Data:

test <- c("1350 Hwy 160 W\nFort Mill, SC 29715", "Sonoran Desert Dentistry\n9220 E Raintree Dr\nSte 102\nScottsdale, AZ 85260", 
"3252 Vilas Rd\nCottage Grove, WI 53527", "224 W Cottage Grove Rd\nCottage Grove, WI 53527", 
"320 W Cottage Grove Rd\nCottage Grove, WI 53527", "7914 State Road 19\nDane, WI 53529", 
"106 Dane St\nDane, WI 53529")

Source

2015-11-20 02:17:05

Hey, I owe you a beer. – frameworkgeek
