Python regular expression: getting the text after a quoted string

I want to write a Python program that extracts the artist's name from Pandora tweets. For example, given this tweet:

I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.

I want to get back only the name Luther Vandross. I don't know much about regular expressions, so I tried the following code:

print re.findall('".+?" by [\w+]+', text)

But the result is:

"I Can Make It Better" by Luther

Do you have any idea how I could write a regular expression in Python to get just the artist's name?

Source

2015-06-21 Filipe

>>> s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.''' 

>>> import re 
>>> m = re.search('to "?(.*?)"? by (.*?) on #?Pandora', s) 
>>> m 
<_sre.SRE_Match object; span=(14, 69), match='to "I Can Make It Better" by Luther Vandross on P> 
>>> m.groups() 
('I Can Make It Better', 'Luther Vandross')

More test cases:

>>> tests = [ 
    '''I'm listening to "Don't Turn Out The Lights (D.T.O.T.L.)" by NKOTBSB on #Pandora''', 
    '''I'm listening to G.O.D. Remix by Canton Jones on #Pandora''', 
    '''I'm listening to "It's Been Awhile" by @staindmusic on Pandora #pandora http://pdora.co/R1OdxE''', 
    '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0''', 
    '''I'm listening to "El Preso (2000)" by Fruko Y Sus Tesos on #Pandora http://pdora.co/1GtOHC1''',
    '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc''',
    '''I'm listening to "Space Age Pimpin'" by 8 Ball & MJG on Pandora #pandora http://pdora.co/1h8swun'''
]
>>> expr = re.compile('to "?(.*?)"? by (.*?) on #?Pandora')
>>> for s in tests:
     print(expr.search(s).groups())

("Don't Turn Out The Lights (D.T.O.T.L.)", 'NKOTBSB')
('G.O.D. Remix', 'Canton Jones')
("It's Been Awhile", '@staindmusic')
('Everlong', '@foofighters')
('El Preso (2000)', 'Fruko Y Sus Tesos')
('Cat Daddy', 'Rej3ctz')
("Space Age Pimpin'", '8 Ball & MJG')
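If you want to reuse this in a program, it helps to handle tweets that do not match at all, because `search` returns `None` in that case and calling `.groups()` on it raises an error. Here is a minimal sketch, assuming the same pattern as above; the function name `parse_pandora_tweet` and the group names `title` and `artist` are my own choices for illustration, not part of any library.

```python
import re

# Same pattern as above, with named groups for readability.
# The group names "title" and "artist" are illustrative choices.
PANDORA_RE = re.compile(r'to "?(?P<title>.*?)"? by (?P<artist>.*?) on #?Pandora')


def parse_pandora_tweet(tweet):
    """Return (title, artist), or None if the tweet does not match."""
    m = PANDORA_RE.search(tweet)
    if m is None:
        return None
    return m.group("title"), m.group("artist")


print(parse_pandora_tweet(
    '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0'''
))  # -> ('Everlong', '@foofighters')
print(parse_pandora_tweet("Just a regular tweet with no song in it"))  # -> None
```

Checking for `None` before reading the groups keeps a loop over real tweets from crashing on the first one that is not a Pandora share.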

Source

2015-06-21 16:35:56 poke

Thank you very much! I managed to make it work with this =) – Filipe

I scanned the #Pandora hashtag on Twitter for more examples and adjusted the expression so that it works for all of those patterns. – poke

You need to use a capturing group.

print re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})', text)

I used a repetition quantifier because the name may consist of just a first name, a first and last name, or a first, middle, and last name.
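To see what the `{0,2}` quantifier does, here is a small sketch that runs the same pattern over names of one, two, and three words. The sample strings are made up for illustration (only the Luther Vandross one comes from the question), and the last one shows a case this pattern is not designed for: a name that does not start with a capital letter.

```python
import re

pattern = re.compile(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})')

# Hypothetical tweet fragments, except the second, which is from the question.
samples = [
    '"Hello" by Adele on #Pandora',
    '"I Can Make It Better" by Luther Vandross on Pandora',
    '"Pride and Joy" by Stevie Ray Vaughan on #Pandora',
    '"Everlong" by @foofighters on #Pandora',
]

for text in samples:
    print(pattern.findall(text))
# -> ['Adele']
# -> ['Luther Vandross']
# -> ['Stevie Ray Vaughan']
# -> []
```

Because the pattern has exactly one capturing group, `findall` returns only the captured name rather than the whole match.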

Source

2015-06-21 16:37:22

Thank you very much for your help =) – Filipe

print re.findall(r'".+?" by ([A-Z][a-z]+(?: [A-Z][a-z]+)*)', text)

You can try this.

https://regex101.com/r/vH0iN5/5

Source

2015-06-21 16:37:58 vks

You can use this lookaround-based regex:

s = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'
print re.search(r'(?<=by ).+?(?= on)', s).group()
Luther Vandross
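For comparison, here is a small sketch of the lookaround idea next to the capture-group idea. Lookarounds only check the surrounding text without making it part of the match, so `.group()` already holds just the name; with a capture group the delimiters are matched and `group(1)` picks the name out. The variable names are mine, and the delimiters are written with their surrounding spaces so the result carries no stray whitespace. Note that Python's `re` module only accepts fixed-width patterns inside a lookbehind.

```python
import re

tweet = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora'

# Lookarounds: the delimiters are checked but not included in the match.
with_lookaround = re.search(r'(?<= by ).+?(?= on )', tweet).group()

# Capture group: the delimiters are matched, and group(1) selects the name.
with_group = re.search(r' by (.+?) on ', tweet).group(1)

print(with_lookaround)  # -> Luther Vandross
print(with_group)       # -> Luther Vandross
```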

Source

2015-06-21 16:38:48 anubhava

Your regex is close, but you can change the delimiters to `" by ` and ` on`. You also need to use parentheses to create a capturing group.

You can use a regex like this:

" by (.+?) on


The idea behind this expression is to capture the content between `" by ` and ` on`, using a simple non-greedy pattern.

Match information

MATCH 1 
1. [43-58] `Luther Vandross`

Code

import re
p = re.compile(r'" by (.+?) on')
test_str = u"I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.\n"

m = re.search(p, test_str)
print(m.group(1))  # Luther Vandross
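To make the two points of this answer concrete (the non-greedy quantifier and the parentheses), here is a small sketch. The tweet text is made up so that ` on` appears twice, which is where the greedy and non-greedy versions start to differ.

```python
import re

# Made-up tweet text containing " on" twice, to expose the difference.
text = 'I\'m listening to "Hello" by Adele on Pandora, playing on repeat'

lazy = re.search(r'" by (.+?) on', text).group(1)
greedy = re.search(r'" by (.+) on', text).group(1)
print(lazy)    # -> Adele
print(greedy)  # -> Adele on Pandora, playing

# Without parentheses, findall returns the whole match;
# with one capture group, it returns only the group.
print(re.findall(r'" by .+? on', text))    # -> ['" by Adele on']
print(re.findall(r'" by (.+?) on', text))  # -> ['Adele']
```

The non-greedy `.+?` stops at the first ` on` it can, while the greedy `.+` runs to the last one, which is why the non-greedy form is the safer choice here.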

Source

2015-06-21 16:39:07

Thank you for your help =). I was having some trouble understanding how regular expressions work, but this answer made it much clearer. – Filipe

@Filipe Glad to help. –
