Python: parsing nested key-value data

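I want to write a Python script that can parse log entries of the following kind, which consist of keys and values. Any key may or may not have another set of nested key-value pairs as its value. An example is shown below. The nesting depth varies with the log I receive, so the parsing has to be dynamic. Each level of nesting is, however, enclosed in curly braces.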

The string with the keys and values looks like this:

Countries =  { 
    "USA" = 0; 
    "Spain" = 0; 
    Connections = 1; 
    Flights =   { 
     "KLM" = 11; 
     "Air America" = 15; 
     "Emirates" = 2; 
     "Delta" = 3; 
    }; 
    "Belgium" = 1; 
    "Czech Republic" = 0; 
    "Netherlands" = 1; 
    "Hungary" = 0; 
    "Luxembourg" = 0; 
    "Italy" = 0; 

};

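The data can contain multiple nested blocks as well. I want to write a function that parses this and puts it into a data structure (a dict or similar) so that I can get the value for a specific key like this: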

print countries.belgium 
      value should be printed as 1

Similarly,

print countries.flights.delta 
      value should be printed as 3.

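Note that not all keys in the input are quoted (for example Connections and Flights).

Any pointers on where I could start? Is there a Python library that can parse something like this?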

Source

2016-02-29 user2605278

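I have created a sample Python script that does the job; adjust it as you like. It converts your format into a nested dictionary, and it is as dynamic as you want.

Have a look here: Paste bin. Code: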

import re 
import ast 

data = """ { Countries = { USA = 1; "Connections" = { "1 Flights" = 0; "10 Flights" = 0; "11 Flights" = 0; "12 Flights" = 0; "13 Flights" = 0; "14 Flights" = 0; "15 Flights" = 0; "16 Flights" = 0; "17 Flights" = 0; "18 Flights" = 0; "More than 25 Flights" = 0; }; "Single Connections" = 0; "No Connections" = 0; "Delayed" = 0; "Technical Fault" = 0; "Others" = 0; }; }""" 


def arrify(string):
    # Rewrite the log syntax as a Python dict literal: '=' -> ':' and ';' -> ','.
    string = string.replace("=", " : ")
    string = string.replace(";", " , ")
    # Drop the existing quotes; every key and value is re-quoted below.
    string = string.replace("\"", "")
    stringDict = string.split()
    newArr = []
    quoteClosed = True
    for i, splitStr in enumerate(stringDict):
        if i > 0:
            if not isDelim(splitStr):
                # Open a quote on the first word after a delimiter ...
                if isDelim(newArr[i-1]) and quoteClosed:
                    splitStr = "\"" + splitStr
                    quoteClosed = False

                # ... and close it on the last word before the next delimiter.
                if isDelim(stringDict[i+1]) and not quoteClosed:
                    splitStr += "\""
                    quoteClosed = True

        newArr.append(splitStr)

    newString = " ".join(newArr)
    newDict = ast.literal_eval(newString)
    return normalizeDict(newDict)


def isDelim(string):
    # Compare against whole tokens; a substring test would also accept "" or ":,".
    return str(string) in ("{", ":", ",", "}")


def normalizeDict(dic):
    # Walk the nested dict and convert numeric strings to ints.
    for key, value in dic.items():
        if type(value) is dict:
            dic[key] = normalizeDict(value)
            continue
        dic[key] = normalize(value)
    return dic


def normalize(string):
    try:
        return int(string)
    except ValueError:
        return string


print(arrify(data))

Result from the sample data:

{'Countries': {'USA': 1, 'Technical Fault': 0, 'No Connections': 0, 'Delayed': 0, 'Connections': {'17 Flights': 0, '10 Flights': 0, '11 Flights': 0, 'More than 25 Flights': 0, '14 Flights': 0, '15 Flights': 0, '12 Flights': 0, '18 Flights': 0, '16 Flights': 0, '1 Flights': 0, '13 Flights': 0}, 'Single Connections': 0, 'Others': 0}}

You can then get the values just as you would from a normal dictionary :) Hope it helps...
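The script above expects the whole input to be wrapped in an outer pair of braces, which the sample in the question does not have. One alternative that avoids `ast.literal_eval` is a small tokenizer plus a recursive parser. The following is only a minimal sketch, assuming keys are either quoted strings or bare words and values are integers, strings, or nested `{ ... }` blocks; the names `tokenize`, `parse_entries` and `parse_log` are made up for this example.

```python
import re

# A token is a quoted string, a punctuation mark, or a bare word/number.
TOKEN_RE = re.compile(r'"([^"]*)"|([{}=;])|([^\s{}=;"]+)')


def tokenize(text):
    """Return a list of (kind, value) pairs; kind is 'str', 'punct' or 'word'."""
    tokens = []
    for quoted, punct, word in TOKEN_RE.findall(text):
        if punct:
            tokens.append(("punct", punct))
        elif word:
            tokens.append(("word", word))
        else:
            tokens.append(("str", quoted))
    return tokens


def parse_entries(tokens, pos):
    """Parse 'key = value;' entries from tokens[pos:] up to '}' or the end.

    Returns the entries as a dict plus the position where parsing stopped.
    """
    entries = {}
    while pos < len(tokens) and tokens[pos] != ("punct", "}"):
        kind, key = tokens[pos]
        if kind == "punct" or tokens[pos + 1:pos + 2] != [("punct", "=")]:
            raise ValueError("expected 'key =' at token %d" % pos)
        pos += 2
        if pos >= len(tokens):
            raise ValueError("missing value for key %r" % key)
        kind, value = tokens[pos]
        if (kind, value) == ("punct", "{"):
            # Nested block: recurse, then require the closing brace.
            value, pos = parse_entries(tokens, pos + 1)
            if pos >= len(tokens):
                raise ValueError("missing '}' for key %r" % key)
        elif kind == "punct":
            raise ValueError("missing value for key %r" % key)
        elif kind == "word":
            try:
                value = int(value)  # bare numbers become ints
            except ValueError:
                pass  # any other bare word stays a string
        pos += 1  # step past the value (or the closing brace)
        if tokens[pos:pos + 1] == [("punct", ";")]:
            pos += 1  # the ';' terminator is treated as optional
        entries[key] = value
    return entries, pos


def parse_log(text):
    """Parse a whole log entry into nested dicts."""
    tokens = tokenize(text)
    entries, pos = parse_entries(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected '}' at token %d" % pos)
    return entries


sample = ('Countries = { "USA" = 0; Connections = 1; '
          'Flights = { "KLM" = 11; "Delta" = 3; }; "Czech Republic" = 0; };')
countries = parse_log(sample)["Countries"]
print(countries["Flights"]["Delta"])   # 3
print(countries["Czech Republic"])     # 0
```

Because the tokenizer treats quoted and bare keys the same way, no quote-fixing pass is needed, and the recursion follows the braces to whatever depth the log has.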

Source

2016-02-29 10:14:16 rrw

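You really need to include the code in your answer. Just linking to it is not enough. – Blckknght

@richmondwang, exactly what I was looking for. However, this time my dynamic string is as follows, and it gives me a syntax error: – user2605278

What data did you pass? @user2605278 – rrw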

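Iterate over the data and check whether each element is itself a set of key-value pairs; if it is, call the function recursively. Something like this: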

def parseNestedData(data):
    # A dict is a nested block: recurse into each of its values.
    if isinstance(data, dict):
        for k in data.keys():
            parseNestedData(data.get(k))
    else:
        # Anything else is a leaf value.
        print(data)

Output:

>>> Countries =  { 
"USA" : 0, 
"Spain" : 0, 
"Connections" : 1, 
"Flights" :   { 
    "KLM" : 11, 
    "Air America" : 15, 
    "Emirates" : 2, 
    "Delta" : 3, 
}, 
"Belgium" : 1, 
"Czech Republic" : 0, 
"Netherlands" : 1, 
"Hungary" : 0, 
"Luxembourg" : 0, 
"Italy" :0 
}; 

>>> Countries 
{'Connections': 1, 
'Flights': {'KLM': 11, 'Air America': 15, 'Emirates': 2, 'Delta': 3}, 
'Netherlands': 1, 
'Italy': 0, 
'Czech Republic': 0, 
'USA': 0, 
'Belgium': 1, 
'Hungary': 0, 
'Luxembourg': 0, 'Spain': 0} 
>>> parseNestedData(Countries) 
1 
11 
15 
2 
3 
1 
0 
0 
0 
1 
0 
0 
0
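The walk above prints only the values, so it is hard to tell which key each number belongs to. A variant that also reports the path of keys might look like the sketch below; `iter_leaves` is a hypothetical helper name, and the small `countries` dict is a shortened stand-in for the data above.

```python
def iter_leaves(data, path=()):
    """Yield (path, value) for every non-dict value in a nested dict."""
    if isinstance(data, dict):
        for key, value in data.items():
            # Recurse with the current key appended to the path so far.
            yield from iter_leaves(value, path + (key,))
    else:
        yield path, data


countries = {"USA": 0, "Flights": {"KLM": 11, "Delta": 3}, "Belgium": 1}
for path, value in iter_leaves(countries):
    print(".".join(path), "=", value)
```

Each leaf is reported together with its full key path, for example `Flights.Delta = 3`.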

Source

2016-02-29 09:25:46 Himanshu

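Thanks, Himanshu. How can I get the value for, say, Czech Republic (it should return just 0)? – user2605278

Doesn't this also need some preprocessing? Not all keys are enclosed in double quotes, for example Connections. – user2605278

If you know that the Czech Republic key exists at the first level, then just do data.get('Czech Republic') – Himanshu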
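If the level at which a key sits is not known in advance, a recursive search is one option. The sketch below assumes the data has already been turned into nested dicts; `find_key` is a made-up helper, not part of any library. It checks the current level first and then each nested dict in turn, returning the first match.

```python
_MISSING = object()  # sentinel, so that stored values such as 0 or None still count as found


def find_key(data, wanted, default=None):
    """Return the value stored under `wanted` at any depth, or `default`."""
    if isinstance(data, dict):
        if wanted in data:
            return data[wanted]
        for value in data.values():
            found = find_key(value, wanted, _MISSING)
            if found is not _MISSING:
                return found
    return default


countries = {"USA": 0, "Flights": {"KLM": 11, "Delta": 3}, "Czech Republic": 0}
print(find_key(countries, "Czech Republic"))  # 0
print(find_key(countries, "Delta"))           # 3
```

If the same key name appears in several blocks, only the first one found is returned, so this suits logs where key names are unique.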

Defining a class to process and store the information can give you something like this:

import re


class datastruct:
    def __init__(self, data_in):
        # Grab the body of the "Flights = { ... };" block.
        flights = re.findall(r'(?:Flights\s=\s*\{)([\s"A-Z=0-9;a-z]*)};', data_in)
        flight_dict = {}
        for flight in flights[0].split(';')[0:-1]:
            key, val = self.split_data(flight)
            flight_dict[key] = val

        # Every quoted key with a one- or two-digit value. This also matches
        # the flights, which are filtered out below.
        countries = re.findall(r'("[A-Za-z]+\s?[A-Za-z]*"\s=\s[0-9]{1,2})', data_in)
        countries_dict = {}
        for country in countries:
            key, val = self.split_data(country)
            if key not in flight_dict:
                countries_dict[key] = val

        connections = re.findall(r'(?:Connections\s=\s)([0-9]*);', data_in)
        self.country = countries_dict
        self.flight = flight_dict
        self.connections = int(connections[0])

    def split_data(self, data2):
        # Split '"key" = value' into a clean key and an integer value.
        item = data2.split('=')
        key = item[0].strip().strip('"')
        val = int(item[1].strip())
        return key, val

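Note that if the data does not exactly match what I have assumed below, the regexes may need adjusting. The data can be set up and referenced as follows: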

raw_data = 'Countries =  { "USA" = 0; "Spain" = 0; Connections = 1; Flights =   {  "KLM" = 11;  "Air America" = 15;  "Emirates" = 2;  "Delta" = 3; }; "Belgium" = 1; "Czech Republic" = 0; "Netherlands" = 1; "Hungary" = 0; "Luxembourg" = 0; "Italy" = 0;};' 

flight_data = datastruct(raw_data) 
print("No. Connections:",flight_data.connections) 
print("Country 'USA':",flight_data.country['USA'],'\n')
print("Flight 'KLM':",flight_data.flight['KLM'],'\n') 

for country in flight_data.country.keys(): 
    print("Country: {0} -> {1}".format(country,flight_data.country[country]))
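The class above hard-codes three fields. For the attribute-style access asked for in the question (`countries.flights.delta`), a generic wrapper around any nested dict is another possibility. This is only a sketch: `AttrView` is a hypothetical name, and the rule that lookup ignores case and maps `_` to a space is an assumption made here so that keys such as "Czech Republic" can be reached.

```python
class AttrView:
    """Read-only wrapper that exposes a nested dict through attributes.

    Assumption for this sketch: lookup ignores case and treats '_' in the
    attribute name as a space, so view.czech_republic finds 'Czech Republic'.
    """

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails; refusing private
        # names avoids infinite recursion if _data is not set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        wanted = name.replace("_", " ").lower()
        for key, value in self._data.items():
            if key.lower() == wanted:
                # Wrap nested dicts so that attribute access can be chained.
                return AttrView(value) if isinstance(value, dict) else value
        raise AttributeError(name)


countries = AttrView({"Belgium": 1, "Flights": {"Delta": 3}, "Czech Republic": 0})
print(countries.belgium)          # 1
print(countries.flights.delta)    # 3
print(countries.czech_republic)   # 0
```

Keys that differ only in case, or that contain characters not allowed in identifiers, would still need plain dict access.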

Source

2016-02-29 15:30:56
