Regex to filter the product model (Python)

I'm trying to extract the product line and the product model from HP product descriptions.

Examples:

HP EliteDesk 800 G1 SFF (H3S08US#ABA) 
HP Pro 3400 Series MT (H3S08US#ABA) 
HP EliteBook 8460p (H3S08US#ABA)

Expected output:

Production line: EliteDesk 
Production model: 800 G1 

Production line: Pro 
Production model: 3400 Series 

Production line: EliteBook 
Production model: 8460p

Here's what I have so far:

import re
match = re.search(r'([a-zA-Z]+) ([a-zA-Z]*\d+[a-zA-Z]*)', model)  # model holds one description line
product_line, product_model = match.group(1), match.group(2)

However, for the first and second examples the model comes out as just 800 and 3400 instead of 800 G1 and 3400 Series.
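A quick way to see why: the second group `[a-zA-Z]*\d+[a-zA-Z]*` matches a single token with no spaces in it, so it stops at the first space after the number. (The search also skips past "HP" because the word right after it, e.g. "EliteDesk", contains no digit.) This sketch just runs the question's pattern on the three examples; the variable names are illustrative.

```python
import re

examples = [
    "HP EliteDesk 800 G1 SFF (H3S08US#ABA)",
    "HP Pro 3400 Series MT (H3S08US#ABA)",
    "HP EliteBook 8460p (H3S08US#ABA)",
]

# The question's pattern: group 2 is one run of letters/digits with no spaces,
# so it can never reach "G1" or "Series" after the number.
question_re = re.compile(r"([a-zA-Z]+) ([a-zA-Z]*\d+[a-zA-Z]*)")

found = [question_re.search(e).groups() for e in examples]
for line, model in found:
    print(line, "->", repr(model))
```

Only the third example comes out right, because its whole model ("8460p") happens to be a single token.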

Is there a better way to extract this information? Thanks in advance for all your help.


2017-02-28 weijie lin

Is the product line always exactly one word? –

With regex and split

You could just use:

"HP (\w+) (.*?) \((.*)\)"

Here is an example on Regex101.com.
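To make each part of that pattern explicit, here is the same regex written with `re.VERBOSE` so it can carry comments. This is only an annotated restatement, assuming the same three sample lines; the names `COMPACT_PATTERN` and `VERBOSE_PATTERN` are illustrative.

```python
import re

text = """HP EliteDesk 800 G1 SFF (H3S08US#ABA)
HP Pro 3400 Series MT (H3S08US#ABA)
HP EliteBook 8460p (H3S08US#ABA)"""

# The answer's pattern, unchanged (raw string so the backslashes reach re intact).
COMPACT_PATTERN = re.compile(r"HP (\w+) (.*?) \((.*)\)")

# The same regex spelled out with re.VERBOSE. In verbose mode ordinary spaces
# and "#..." comments are ignored, so literal spaces are written as "[ ]".
VERBOSE_PATTERN = re.compile(r"""
    HP[ ]        # literal "HP " prefix
    (\w+)[ ]     # group 1: product line, one word such as EliteDesk
    (.*?)[ ]     # group 2: lazily take everything up to the first " (", e.g. 800 G1 SFF
    \((.*)\)     # group 3: serial number between the parentheses
""", re.VERBOSE)

for line, rest, serial in VERBOSE_PATTERN.findall(text):
    print(line, "|", rest, "|", serial)
```

Group 2 still includes trailing tags such as SFF or MT; that is why the answer keeps only the first two words of it.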

import re

text = """HP EliteDesk 800 G1 SFF (H3S08US#ABA)
HP Pro 3400 Series MT (H3S08US#ABA)
HP EliteBook 8460p (H3S08US#ABA)"""

# Raw string so "\w" and "\(" are passed to the regex engine unchanged
pattern = re.compile(r"HP (\w+) (.*?) \((.*)\)")

for line, model, serial in pattern.findall(text):
    print("Production line : %s" % line)
    print("Production model : %s" % " ".join(model.split(" ")[:2]))  # Only the first two words
    print("Serial number : %s" % serial)
    print()

It outputs:

Production line : EliteDesk 
Production model : 800 G1 
Serial number : H3S08US#ABA 

Production line : Pro 
Production model : 3400 Series 
Serial number : H3S08US#ABA 

Production line : EliteBook 
Production model : 8460p 
Serial number : H3S08US#ABA
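If you need the values back rather than printed, the same idea can be wrapped in a small function. `parse_hp_description` is a hypothetical name, and the named groups (`line`, `rest`, `serial`) are only a readability choice; the logic is the answer's regex plus its "first two words" rule.

```python
import re

# Same regex as the answer, with named groups for readability.
_DESC_RE = re.compile(r"HP (?P<line>\w+) (?P<rest>.*?) \((?P<serial>.*)\)")

def parse_hp_description(desc):
    """Return (line, model, serial) for one description, or None if it doesn't match."""
    m = _DESC_RE.search(desc)
    if m is None:
        return None
    # Keep only the first two words of the model part, as the answer does;
    # this drops trailing tags such as SFF or MT.
    model = " ".join(m.group("rest").split()[:2])
    return m.group("line"), model, m.group("serial")

for desc in ["HP EliteDesk 800 G1 SFF (H3S08US#ABA)",
             "HP Pro 3400 Series MT (H3S08US#ABA)",
             "HP EliteBook 8460p (H3S08US#ABA)"]:
    print(parse_hp_description(desc))
```

The "first two words" rule is a heuristic: for a made-up description like `HP Foo 400 SFF (X)` it would return `400 SFF` as the model.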

Regex only

If you want a regex-only solution, you could use:

pattern = re.compile(r"HP ([a-z]+) (\d+[a-z]?(?: \w+)?) .*?\((.*)\)", re.IGNORECASE)
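Here, group 2 takes the digits, an optional single letter (the "p" in 8460p), and then at most one more word (G1, Series), so no post-processing with split is needed. A sketch plugging that pattern into the same loop, assuming the same sample text:

```python
import re

text = """HP EliteDesk 800 G1 SFF (H3S08US#ABA)
HP Pro 3400 Series MT (H3S08US#ABA)
HP EliteBook 8460p (H3S08US#ABA)"""

# Group 1: product line; group 2: number, optional letter, optional second word;
# group 3: serial number inside the parentheses.
pattern = re.compile(r"HP ([a-z]+) (\d+[a-z]?(?: \w+)?) .*?\((.*)\)", re.IGNORECASE)

results = pattern.findall(text)
for line, model, serial in results:
    print("Production line : %s" % line)
    print("Production model : %s" % model)
    print("Serial number : %s" % serial)
    print()
```

This prints the same three blocks as the regex-and-split version.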

Split only

text = """HP EliteDesk 800 G1 SFF (H3S08US#ABA)
HP Pro 3400 Series MT (H3S08US#ABA)
HP EliteBook 8460p (H3S08US#ABA)"""

for line in text.splitlines():
    words = line.split()
    hp_line = words[1]                    # the word right after "HP"
    hp_model = " ".join(words[2:-1][:2])  # first two words between line and serial
    serial = words[-1].strip("()")        # drop the surrounding parentheses
    print("Production line : %s" % hp_line)
    print("Production model : %s" % hp_model)
    print("Serial number : %s" % serial)
    print()


2017-02-28 15:46:17

Thanks a lot. I'm working through it now. –
