2015-07-19 51 views
1

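Trouble matching a pattern at the beginning or end of a string in a Python regex

I am struggling with a Python regular expression. I want to find any of N, S, E, W, NB, SB, EB, WB, including at the beginning or end of the string. My regex easily finds them in the middle of the string, but fails at the beginning or the end.

Can anyone suggest what I am doing wrong with dirPattern in my code sample below?

Note: I know I have some other issues to deal with (e.g., 'W of'), but I think I know how to modify the regex for those.

Thanks in advance.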

import re 

nameList = ['Boulder Highway and US 95 NB', 'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15', 
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean', 
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W', 
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran', 
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East', 
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)'] 

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'} 

dirPattern = re.compile(r'[ ^]([NSEW])B?[ $]') 

print('name\tmatch\tdirSting\tdirection') 
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        if dirString in dirMap:
            direction = dirMap[dirString]
    print('%s\t%s\t%s\t%s'%(name, match, dirString, direction))

Some sample expected output:

name match dirSting direction

Boulder Highway and US 95 NB <_sre.SRE_Match object at 0x7f68af836648> N North

Boulder Hwy and US 95 SB <_sre.SRE_Match object at 0x7f68ae836648> S South

Buffalo and Summerlin N <_sre.SRE_Match object at 0x7f68af826648> N North

Charleston and I-215 W <_sre.SRE_Match object at 0x7f68cf836648> W West

Flamingo and NB I-15 <_sre.SRE_Match object at 0x7f68af8365d0> N North

S Buffalo and Summerlin <_sre.SRE_Match object at 0x7f68aff36648> S South

Gibson and I-215 EB <_sre.SRE_Match object at 0x7f68afa36648> E East

However, the examples where the direction is at the beginning or end give:

Boulder Highway and US 95 NB None None None

+2

'^' and '$' *inside square brackets* no longer mean start/end of string, you know? – jonrsharpe

+0

Jon, thanks. I did not know that, although I was beginning to suspect it. –

+1

What are you trying to do? You can also use 'direction = dirMap.get(dirString)', if the dictionary –

Answers

0

The regex modification in this code did the trick. It also handles things like 'W of' and 'at E', and similar cases:

import re 

nameList = ['Boulder Highway and US 95 NB', 'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15', 
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean', 
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W', 
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran', 
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East', 
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)'] 

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'} 

# The spaces inside the groups are significant: '(?:^| )', '(?<! at )', '(?<! of )'.
dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of)(?: |$)')

print('name\tdirSting\tdirection') 
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        direction = dirMap.get(dirString)
    print('> %s\t\t%s\t%s'%(name, dirString, direction))

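The regex can be read as follows:

(?:^| ) starts with either the beginning of the string or a space

(?<! at ) not preceded by 'at'

(?<! of ) not preceded by 'of'

([NSEW]) any one of 'N', 'S', 'E', 'W' (this will be in match.group(1))

B? optionally followed by 'B' (as in "bound")

(?! of) not followed by 'of'

(?: |$) ends with either a space or the end of the string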
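The breakdown above can also be written directly into the pattern with re.VERBOSE. This is only a sketch, assuming the literal spaces inside the groups are significant; verbose mode ignores bare whitespace, so each literal space is written as `[ ]`. The names `DIR_PATTERN_VERBOSE` and `find_direction` are made up for this illustration.

```python
import re

# Sketch of the same pattern in verbose form (assumed equivalent to dirPattern).
# Bare whitespace is ignored under re.VERBOSE, so a literal space is spelled [ ].
DIR_PATTERN_VERBOSE = re.compile(r"""
    (?:^|[ ])        # start of string, or a space
    (?<![ ]at[ ])    # not preceded by ' at '
    (?<![ ]of[ ])    # not preceded by ' of '
    ([NSEW])         # direction letter, captured as group 1
    B?               # optional 'B' (as in "bound")
    (?![ ]of)        # not followed by ' of'
    (?:[ ]|$)        # a space, or end of string
""", re.VERBOSE)


def find_direction(name):
    """Return the captured direction letter, or None when nothing matches."""
    match = DIR_PATTERN_VERBOSE.search(name)
    return match.group(1) if match else None


for sample in ['Boulder Highway and US 95 NB',
               'S Buffalo and Summerlin',
               'I-80 at E 4TH St Kietzke Ln']:
    print(sample, '->', find_direction(sample))
```

Keeping the comments next to each group makes it harder for a stray or missing space to go unnoticed.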

The final output is:

Boulder Highway and US 95 NB N North

Boulder Hwy and US 95 SB S South

Buffalo and Summerlin N N North

Charleston and I-215 W W West

Eastern and I-215 S S South

Flamingo and NB I-15 N North

S Buffalo and Summerlin S South

Flamingo and SB I-15 S South

Gibson and I-215 EB E East

I-15 at 3.5 miles N of Jean None None

I-15 NB S I-215 (dual) N North

I-15 SB 4.3 mile N of Primm S South

I-15 SB S of Russell S South

I-515 SB at Eastern W S South

I-580 at I-80 N E N North

I-580 at I-80 S W S South

I-80 at E 4TH St Kietzke Ln None None

I-80 East of W McCarran None None

LV Blvd at I-215 S S South

S Buffalo and I-215 W S South

S Decatur and I-215 WB S South

Sahara and I-15 East None None

Sands and Wynn South Gate None None

Silverado Ranch and I-15 (west side) None None

Side note: I decided I did not want the end-of-string case. For that, the regex is:

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of)')

1

You need to use lookarounds.

dirPattern = re.compile(r'(?<!\S)([NSEW])B?(?!\S)') 

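[ ^] matches a space or a literal caret symbol. (?<!\S) is a negative lookbehind asserting that the match is not preceded by a non-space character. (?!\S) asserts that the match is not followed by a non-space character.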
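A small sketch comparing the two patterns may make the difference concrete. The helper name `direction_letter` is made up for this illustration; the patterns are the ones quoted in the question and in this answer.

```python
import re

# Pattern from the question: inside [...] the ^ and $ are ordinary characters.
broken = re.compile(r'[ ^]([NSEW])B?[ $]')
# Pattern from this answer: whitespace or a string edge on both sides.
fixed = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')


def direction_letter(pattern, name):
    """Return group 1 of the first match, or None when there is no match."""
    m = pattern.search(name)
    return m.group(1) if m else None


for name in ['Charleston and I-215 W', 'S Buffalo and Summerlin',
             'Flamingo and NB I-15', '^W$']:
    print(name, direction_letter(broken, name), direction_letter(fixed, name), sep='\t')
```

Note that this pattern only checks for surrounding whitespace, so on its own it would still pick up the 'E' in 'I-80 at E 4TH St Kietzke Ln'; excluding 'at E' or 'N of' needs additional lookarounds.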

The reason I used negative lookarounds rather than positive ones is that Python's default re module does not support (?<=^| ).
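The limitation can be checked with a short sketch: the standard re module requires a fixed-width lookbehind, and the alternatives in (?<=^| ) have widths 0 and 1. The helper name `compiles` is made up for this illustration.

```python
import re


def compiles(pattern):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Positive lookbehind with alternatives of different widths (0 for ^, 1 for a space).
print(compiles(r'(?<=^| )([NSEW])B?'))
# Negative lookbehind on a single character class has a fixed width of 1.
print(compiles(r'(?<!\S)([NSEW])B?'))
```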

+0

*“carrot symbol”* - [caret](https://en.wikipedia.org/wiki/Caret)? – jonrsharpe

+0

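Avinash, thanks for the tip. In my answer, I started using lookarounds to handle cases like 'E 2nd St' or 'I-15 W' (both to be excluded; what I want is N, NB, etc., but only on its own, that is, at the start followed by a space, at the end preceded by a space, or in the middle with a space before and after). Your answer may get me there, but right now I don't see how. –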