2015-06-21

Convert multi-line script output to a dictionary using regex

I got the following script output:

*************************************************** 
[g4u2680c]: searching for domains 
--------------------------------------------------- 
host = g4u2680c.houston.example.com 
     ipaddr = [16.208.16.72] 
     VLAN = [352] 
     Gateway= [16.208.16.1] 
     Subnet = [255.255.248.0] 
     Subnet = [255.255.248.0] 
     Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c] 

host = g4u2680c.houston.example.com 
     ipaddr = [16.208.16.72] 
     VLAN = [352] 
     Gateway= [16.208.16.1] 
     Subnet = [255.255.248.0] 
     Subnet = [255.255.248.0] 
     Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c] 

* script completed Mon Jun 15 06:13:14 UTC 2015 ** 
* sleeping 30 to avoid DOS on dns via a loop ** 

I need to extract the two host entries into dictionaries, with the brackets removed.

Here is my code:

#!/usr/bin/env python 

import re 

text="""*************************************************** 
[g4u2680c]: searching for domains 
--------------------------------------------------- 
host = g4u2680c.houston.example.com 
     ipaddr = [16.208.16.72] 
     VLAN = [352] 
     Gateway= [16.208.16.1] 
     Subnet = [255.255.248.0] 
     Subnet = [255.255.248.0] 
     Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c] 

host = g4u2680c.houston.example.com 
     ipaddr = [16.208.16.72] 
     VLAN = [352] 
     Gateway= [16.208.16.1] 
     Subnet = [255.255.248.0] 
     Subnet = [255.255.248.0] 
     Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c] 

* script completed Mon Jun 15 06:13:14 UTC 2015 ** 
* sleeping 30 to avoid DOS on dns via a loop ** 
*************************************************** 
""" 

seq = re.compile(r"host.+?\n\n",re.DOTALL) 

a=seq.findall(text) 

matches = re.findall(r'\w.+=.+', a[0]) 

matches = [m.split('=', 1) for m in matches] 

matches = [ [m[0].strip().lower(), m[1].strip().lower()] for m in matches] 

#should have function with regular expression to remove bracket here 

d = dict(matches) 

print(d) 

This is as far as I got. For the first host it gives:

{'subnet': '[255.255.248.0]', 'vlan': '[352]', 'ipaddr': '[16.208.16.72]', 'cluster': '[g4u2679c g4u2680c g9u1484c g9u1485c]', 'host': 'g4u2680c.houston.example.com', 'gateway': '[16.208.16.1]'} 

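I need help finding a regular expression that removes the brackets, since the values in the dictionary include data both with and without brackets.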

Or is there a better and simpler way to convert the raw script output into a dictionary?
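For the bracket-removal step marked by the comment in the question's code, a minimal sketch of such a function. It assumes a value is either bare or wrapped in exactly one pair of square brackets; `strip_brackets` is a hypothetical helper name, not part of the original script:

```python
import re

# Hypothetical helper pattern: the whole value wrapped in one "[...]" pair.
BRACKETS = re.compile(r'^\[(.*)\]$')

def strip_brackets(value):
    """Return value without surrounding square brackets; bare values pass through."""
    value = value.strip()
    m = BRACKETS.match(value)
    return m.group(1) if m else value

# Pairs shaped like the output of the question's list comprehensions.
matches = [['host', 'g4u2680c.houston.example.com'],
           ['ipaddr', '[16.208.16.72]'],
           ['cluster', '[g4u2679c g4u2680c g9u1484c g9u1485c]']]

d = dict((k, strip_brackets(v)) for k, v in matches)
print(d)
```

Because the pattern requires both brackets, a value with only an opening or only a closing bracket is returned unchanged rather than half-stripped.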


Check my answer.. –

Answers


You can use: (\w+)\s*=\s*\[?([^\n\]]+)\]?

demo

import re 
p = re.compile(r'(\w+)\s*=\s*\[?([^\n\]]+)\]?', re.MULTILINE) 
test_str = u"host = g4u2680c.houston.example.com\n   ipaddr = [16.208.16.72]\n   VLAN = [352]\n   Gateway= [16.208.16.1]\n   Subnet = [255.255.248.0]\n   Subnet = [255.255.248.0]\n   Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]\n\nhost = g4u2680c.houston.example.com\n   ipaddr = [16.208.16.72]\n   VLAN = [352]\n   Gateway= [16.208.16.1]\n   Subnet = [255.255.248.0]\n   Subnet = [255.255.248.0]\n   Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]\n" 

re.findall(p, test_str) 
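The pattern's second group stops before any `]`, so the brackets never reach the result and the pairs can go straight into `dict`. To get one dictionary per host, a sketch that first splits the output at blank lines (an assumption about the format that holds for the sample) and then applies the same pattern to each block:

```python
import re

# Pattern from the answer: key, '=', optional '[', value, optional ']'.
PAIR = re.compile(r'(\w+)\s*=\s*\[?([^\n\]]+)\]?')

test_str = ("host = g4u2680c.houston.example.com\n"
            "   ipaddr = [16.208.16.72]\n"
            "   VLAN = [352]\n"
            "\n"
            "host = g4u2680c.houston.example.com\n"
            "   ipaddr = [16.208.16.72]\n"
            "   Gateway= [16.208.16.1]\n")

# Split into blocks at blank lines and keep only blocks describing a host.
blocks = [b for b in re.split(r'\n\s*\n', test_str) if 'host' in b]

# One dict per host block; group 2 already excludes the brackets.
hosts = [dict(PAIR.findall(block)) for block in blocks]
for h in hosts:
    print(h)
```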

Nice. And thanks for the demo site :) –


OK, thanks to you too. If this helped you, please accept the answer. –


You can simply use re.findall and dict:

>>> dict([(i,j.strip().strip('[]')) for i,j in re.findall(r'(\w+)\s*=\s*(.+)',text)]) 
{'Subnet': '255.255.248.0', 'VLAN': '352', 'ipaddr': '16.208.16.72', 'Cluster': 'g4u2679c g4u2680c g9u1484c g9u1485c', 'host': 'g4u2680c.houston.example.com', 'Gateway': '16.208.16.1'} 

The brackets are removed with the str.strip method.
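`str.strip('[]')` treats its argument as a set of characters, not a substring: it removes any run of `[` and `]` from both ends and leaves bracket-free values such as the hostname unchanged. It stops at the first character outside the set, so surrounding whitespace (like the trailing spaces in the pasted output) has to be stripped first. A few checks illustrating this:

```python
# strip(chars) removes leading/trailing characters drawn from the set `chars`.
print(repr('[352]'.strip('[]')))                         # '352'
print(repr('g4u2680c.houston.example.com'.strip('[]')))  # unchanged
print(repr('[352] '.strip('[]')))                        # '352] ': the space blocks ']'
print(repr('[352] '.strip().strip('[]')))                # '352'
print(repr('[[352]]'.strip('[]')))                       # '352': repeated brackets go too
```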


Your solution doesn't match the hostname, since the hostname has no brackets. –


@SharuzzamanAhmatRaslan If you want the hostname too, you can iterate over 're.findall()' and remove the brackets with 'str.strip'. – Kasramvd


I wish I could accept more than one answer, because yours is interesting too –
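Following Kasramvd's suggestion to iterate over `re.findall()`, a sketch that strips the brackets from each value and starts a new dictionary whenever the key is `host`, so the two host records stay separate instead of overwriting each other in one dict. Grouping on the `host` key is an assumption based on the sample output:

```python
import re

text = """host = g4u2680c.houston.example.com
     ipaddr = [16.208.16.72]
     VLAN = [352]

host = g4u2680c.houston.example.com
     ipaddr = [16.208.16.72]
     Subnet = [255.255.248.0]
     Subnet = [255.255.248.0]
"""

records = []
for key, value in re.findall(r'(\w+)\s*=\s*(.+)', text):
    # Start a new record at each 'host' line (or if a pair precedes any host).
    if key == 'host' or not records:
        records.append({})
    # Trim whitespace first so strip('[]') can reach the brackets.
    records[-1][key] = value.strip().strip('[]')

for r in records:
    print(r)
```

A repeated key within one record, such as `Subnet`, keeps its last value.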


You can try this:

matches = [m.replace('[','').replace(']','').split('=', 1) for m in matches]
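Plugged into the question's pipeline, the `replace` calls drop every bracket before the split, so all values come out clean. A sketch that also loops over every block returned by `seq.findall` instead of only `a[0]`, giving one dictionary per host. Like the question's pattern, it relies on each host block ending with a blank line; the shortened sample text is only for brevity:

```python
import re

text = """host = g4u2680c.houston.example.com
     ipaddr = [16.208.16.72]
     VLAN = [352]

host = g4u2680c.houston.example.com
     Gateway= [16.208.16.1]
     Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]

* script completed Mon Jun 15 06:13:14 UTC 2015 **
"""

# Same block pattern as the question: from 'host' up to the next blank line.
seq = re.compile(r"host.+?\n\n", re.DOTALL)

hosts = []
for block in seq.findall(text):
    matches = re.findall(r'\w.+=.+', block)
    # Remove all brackets, then split each line once on '='.
    matches = [m.replace('[', '').replace(']', '').split('=', 1) for m in matches]
    hosts.append(dict((k.strip().lower(), v.strip()) for k, v in matches))

for h in hosts:
    print(h)
```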