Match and replace multiple quoted strings with regex

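I want to replace all spaces inside quotes with underscores in R. I'm not sure how to properly define the quoted strings when there is more than one of them. My initial attempt fails, and I haven't even got as far as handling single/double quotes.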

require(stringi) 
s = "The 'quick brown' fox 'jumps over' the lazy dog" 
stri_replace_all(s, regex="('.*) (.*')", '$1_$2') 
#> [1] "The 'quick brown' fox 'jumps_over' the lazy dog"

Thanks for your help.

Source

2017-05-25 geotheory

Only if you have a vulgar mind ;) – geotheory

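Do you need to account for escape sequences inside the quotes? Are you dealing with properly escaped strings? If you can match each relevant '....' substring as a whole, you can then replace any text inside the match. –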

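Let's assume you need to match all non-overlapping substrings that start with ', then have 1 or more characters other than ', and then end with '. The pattern is '[^']+'.

Then, you can use the following base R code: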

x = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog" 
gr <- gregexpr("'[^']+'", x) 
mat <- regmatches(x, gr) 
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_") 
x 
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"
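The same steps can be wrapped in a small function so they apply to a whole character vector. This is only a sketch: the name underscore_quoted and its pattern argument are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (illustrative name): replace whitespace with "_"
# inside every substring matched by `pattern`, for each element of `x`.
underscore_quoted <- function(x, pattern = "'[^']+'") {
  m <- gregexpr(pattern, x)                  # locate all quoted chunks
  regmatches(x, m) <- lapply(regmatches(x, m), gsub,
                             pattern = "\\s", replacement = "_")
  x
}

v <- c("The 'quick brown' fox", "no quotes here", "'a b' and 'c d e'")
res <- underscore_quoted(v)
print(res)
```

Elements without any quoted substring should come back unchanged, because there is nothing for regmatches<- to replace in them.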

See this R demo. Alternatively, use gsubfn:

> library(gsubfn) 
> rx <- "'[^']+'" 
> s = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog" 
> gsubfn(rx, ~ gsub("\\s", "_", x), s) 
[1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog" 

To support escape sequences, you can use a more complex PCRE regex:

(?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*'

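Details:

(?<!\\) - no \ immediately before the current location
(?:\\{2})* - zero or more sequences of 2 \s
\K - match reset operator
' - a single quote
[^'\\]* - zero or more chars other than ' and \
(?:\\.[^'\\]*)* - zero or more sequences of:
- \\. - a \ followed by any char but a newline
- [^'\\]* - zero or more chars other than ' and \
' - a single quote.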
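To see what the pattern described above actually picks out, it can help to extract the matches before replacing anything. The snippet below is a small sketch with a made-up test string y; note that every backslash of the regex has to be doubled inside an R string literal.

```r
# The PCRE pattern from above as an R string literal (backslashes doubled)
rx <- "(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'"

# Made-up sample: escaped quotes outside and inside a quoted substring
y <- "say \\'no\\' to 'a \\'b\\' c'"
cat(y, sep = "\n")   # show the string as the regex engine sees it

# Extract all matches; perl = TRUE is required for the lookbehind and \K
m <- regmatches(y, gregexpr(rx, y, perl = TRUE))[[1]]
cat(m, sep = "\n")
```

Here the escaped quotes around no are skipped because of the lookbehind, while the escaped quotes inside the last substring are consumed by the \\. branch, so the whole 'a \'b\' c' chunk is the only match.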

And the R demo would look like this:

x = "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog" 
cat(x, sep="\n") 
gr <- gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", x, perl=TRUE) 
mat <- regmatches(x, gr) 
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_") 
cat(x, sep="\n")

Output:

The \' \\\' \\\\'quick \'cunning\' brown' fox 'jumps up \'and\' over' the lazy dog 
The \' \\\' \\\\'quick_\'cunning\'_brown' fox 'jumps_up_\'and\'_over' the lazy dog

Source

2017-05-25 23:15:18

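I had the same idea - I'm not sure there is any benefit to saving 'mat' separately, since you have to run regmatches twice anyway. +1 regardless - regmatches<- is a very useful function indeed. – thelatemail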

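Yes, I also think base R with the PCRE regex option is very powerful, and it is the most convenient option when escape sequences have to be handled (see the update). –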

Comprehensive, thanks Wiktor. I won't pretend to understand the PCRE example.. – geotheory

Try this:

require(stringi) 
s = "The 'quick brown' fox 'jumps over' the lazy dog" 
stri_replace_all(s, regex="('[a-z]+) ([a-z]+')", '$1_$2')

Source

2017-05-25 23:06:15 AChervony

This assumes there is only a single space between ' + letters and letters + '. –

This does not work for more than two words inside the quotes. – Rahul

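I think .* is too greedy. That is why being specific and going for letters may help. You would need to modify the pattern if your string contains uppercase letters or special characters. – AChervony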
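The greediness point can be checked directly in base R. The following is a minimal sketch using the string from the question; it only compares what a greedy '.*' and the negated character class '[^']+' match, and does not perform any replacement.

```r
s <- "The 'quick brown' fox 'jumps over' the lazy dog"

# Greedy: .* runs from the first quote all the way to the last one
greedy <- regmatches(s, regexpr("'.*'", s))
print(greedy)

# Negated class: [^']+ cannot cross a quote, so each chunk is matched separately
chunks <- regmatches(s, gregexpr("'[^']+'", s))[[1]]
print(chunks)
```

The greedy pattern returns one long match spanning both quoted parts, while the negated class returns the two quoted substrings separately, which is what the replacement step needs.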
