Custom format to JSON

How can I convert the following line (I don't know what format it is) to JSON?

[root=Root [key1=value1, key2=value2, key3=Key3 [key3_1=value3_1, key3_2=value3_2, key3_3=Key3_3 [key3_3_1=value3_3_1]], key4=value4]]

where Root, Key3 and Key3_3 denote complex elements.

to

{
  "root" : {
    "key1" : "value1",
    "key2" : "value2",
    "key3" : {
      "key3_1" : "value3_1",
      "key3_2" : "value3_2",
      "key3_3" : {
        "key3_3_1" : "value3_3_1"
      }
    },
    "key4" : "value4"
  }
}

I'm looking for an approach, not a finished solution. If you downvoted this question, please comment on why.

Source

2014-10-29 joshu

Are 'root', 'key1', 'key3_1', etc. the standard key names for this format? – aa8y 2014-10-29 06:19:44

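Let x be the string holding the serialization above.

First, let's replace the occurrences of Root, Key3 and Key3_3 with empty strings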

# Fragments such as "root=Root [" need to become "root=[".
# To do that, we match the regex pattern "\w+ \[".
# It matches every word (letters, digits or underscore) that is followed by
# a single space and then "[", i.e. "Root [", "Key3 [" and "Key3_3 [".
# We replace each match with "[" (which we will later replace with "{"),
# giving us the transformation "root=Root [" => "root=[".
import re
o = re.compile(r'\w+ \[')
y = o.sub('[', x)

Then, let's split the resulting string into words and non-words

# Here we split the string into two lists, one containing the words
# and the other containing the separators (non-words) between them.
# The idea is to split and recombine the source string with quotes around every word.

w = re.compile(r'\W+')
nw = re.compile(r'\w+')

words = w.split(y)[1:-1]  # ignore the first and last elements, which are empty
nonwords = nw.split(y)  # runs of non-word characters, i.e. anything not in a-zA-Z0-9_
struct = '"{}"'.join(nonwords)  # output skeleton with a quoted placeholder for each word
almost_there = struct.format(*words)  # insert the words into the skeleton

Finally, replace the square brackets with curly ones, and = with :

jeeson = almost_there.replace(']', '}').replace('=', ':').replace('[', '{')
# '{"root":{"key1":"value1", "key2":"value2", "key3":{"key3_1":"value3_1", "key3_2":"value3_2", "key3_3":{"key3_3_1":"value3_3_1"}}, "key4":"value4"}}'
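Put together, the three steps can be wrapped in one function and checked with json.loads. This is only a minimal sketch: the name custom_to_json is hypothetical, and it assumes every key and value consists solely of word characters (letters, digits, underscore), as in the sample line.

```python
import json
import re


def custom_to_json(x):
    """Convert the custom bracket format to a JSON string (hypothetical helper)."""
    # Step 1: drop the type names, e.g. "root=Root [" becomes "root=[".
    y = re.sub(r'\w+ \[', '[', x)
    # Step 2: split into words and the separators between them,
    # then rejoin with double quotes around every word.
    words = re.split(r'\W+', y)[1:-1]  # first and last items are empty strings
    nonwords = re.split(r'\w+', y)
    quoted = '"{}"'.join(nonwords).format(*words)
    # Step 3: swap the brackets for braces and "=" for ":".
    return quoted.replace('[', '{').replace(']', '}').replace('=', ':')


x = ('[root=Root [key1=value1, key2=value2, key3=Key3 [key3_1=value3_1, '
     'key3_2=value3_2, key3_3=Key3_3 [key3_3_1=value3_3_1]], key4=value4]]')
parsed = json.loads(custom_to_json(x))  # raises ValueError if the result is not valid JSON
print(parsed['root']['key3']['key3_3'])
```

Loading the result with json.loads is a cheap way to confirm that the string really is valid JSON rather than something that merely looks like it.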

Source

2014-10-29 07:02:50

This is not a simple regex problem. The data he wants to parse could be far more complex, and it won't contain any predefined keys (names). – aa8y 2014-10-29 09:01:16

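@ArunAllamsetty The task was to convert the given line to JSON, and no description of the format was given. If the program solves that, it may not be a complete solution for the poster, but at the very least it shouldn't be downvoted. So +1 (I was going to post a similar solution before I actually found this one) – pmod 2014-10-29 09:23:38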

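@ArunAllamsetty, I don't see why this isn't a simple regex problem. Can you provide a valid input that my method above fails to convert? From what I can see, the input is a sequence of words ('key1', 'value1', etc.) separated by tokens ('=', ',', '[', ']'), and there is one particular expression, a word bounded by '=' and ' [', that needs to be parsed out. – 2014-10-29 12:06:43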

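It took me about two hours, but I think I have something that works for all cases of the format you provided. If it doesn't, I'm sure it will only need a small change. Even though you only asked for the idea, I was going to code it anyway, so here is the Python code.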

import json


def to_json(cust_str):
    left_indices = []  # positions of every '[' in the string, used as a stack
    levels = {}        # nesting level -> number of '[' opened at that level

    level = 0
    for i, char in enumerate(cust_str):
        if char == '[':
            level += 1
            left_indices.append(i)
            if level in levels:
                levels[level] += 1
            else:
                levels[level] = 1
        elif char == ']':
            level -= 1

    level = max(levels.keys())  # start at the deepest level
    value_stack = []            # parsed dicts waiting to be attached to a parent
    while True:
        # The rightmost remaining '[' always opens a group with nothing nested in it.
        left_index = left_indices.pop()
        right_index = cust_str.find(']', left_index) + 1
        values = {}
        pairs = cust_str[left_index:right_index][1:-1].split(',')

        if levels[level] > 0:
            # Groups at the current level contain only plain string values.
            for pair in pairs:
                pair = pair.split('=')
                values[pair[0].strip()] = pair[1]
        else:
            # The current level is exhausted, so move up one level.
            level -= 1
            for pair in pairs:
                pair = pair.split('=')
                if pair[1][-1] == ' ':
                    # A trailing space marks a type name whose group was already parsed.
                    values[pair[0].strip()] = value_stack.pop()
                else:
                    values[pair[0].strip()] = pair[1]
        value_stack.append(values)
        levels[level] -= 1
        # Cut the parsed group out of the string.
        cust_str = cust_str[:left_index] + cust_str[right_index:]

        if levels[1] == 0:
            return json.dumps(values)


if __name__ == '__main__':
    # Data in custom format
    cust_str = '[root=Root [key1=value1, key2=value2, key3=Key3 [key3_1=value3_1, key3_2=value3_2, key3_3=Key3_3 [key3_3_1=value3_3_1]], key4=value4]]'
    # Data in JSON format
    json_str = to_json(cust_str)
    print(json_str)

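The idea is to map each nesting level of the custom format to the number of dicts at that level, that is, the number of values at that level that are not plain strings. Alongside that, we keep track of the indices of the [ characters in the given string. We then start from the innermost dict, popping the stack that holds the [ (left) indices and parsing each group. As each one is parsed, we remove it from the string and continue. The rest you can read in the code.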
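The bookkeeping pass described above can be looked at in isolation. The sketch below is only an illustration: scan_brackets is a hypothetical name, the sample string is made up, and collections.Counter stands in for the hand-rolled dict counting in to_json.

```python
from collections import Counter


def scan_brackets(cust_str):
    """Return the positions of every '[' and a count of '[' per nesting depth."""
    left_indices = []   # acts as a stack: the last entry is the rightmost '['
    levels = Counter()  # depth -> number of '[' opened at that depth
    depth = 0
    for i, char in enumerate(cust_str):
        if char == '[':
            depth += 1
            left_indices.append(i)
            levels[depth] += 1
        elif char == ']':
            depth -= 1
    return left_indices, dict(levels)


# Hypothetical sample with three nesting levels.
sample = '[root=Root [key1=value1, key2=Key2 [key2_1=value2_1]]]'
indices, levels = scan_brackets(sample)
print(indices)
print(levels)
```

Because the last entry in the index list is the rightmost [, no other [ follows it, so the group it opens cannot contain a nested group. That is what lets the answer parse groups straight off the top of the stack.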

I ran it on the data you provided, and the result is as follows.

{
  "root": {
    "key2": "value2",
    "key3": {
      "key3_2": "value3_2",
      "key3_3": {
        "key3_3_1": "value3_3_1"
      },
      "key3_1": "value3_1"
    },
    "key1": "value1",
    "key4": "value4"
  }
}

To make sure it works for more general cases, I used this custom string.

[root=Root [key1=value1, key2=Key2 [key2_1=value2_1], key3=Key3 [key3_1=value3_1, key3_2=Key3_2 [key3_2_1=value3_2_1], key3_3=Key3_3 [key3_3_1=value3_3_1]], key4=value4]]

And parsed it.

{
  "root": {
    "key2": {
      "key2_1": "value2_1"
    },
    "key3": {
      "key3_2": {
        "key3_2_1": "value3_2_1"
      },
      "key3_3": {
        "key3_3_1": "value3_3_1"
      },
      "key3_1": "value3_1"
    },
    "key1": "value1",
    "key4": "value4"
  }
}

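As far as I can tell, that is how it should be parsed. Also, remember not to strip the values, because the logic depends on the trailing space at the end of those values that should end up holding dicts (if that makes sense).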
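The trailing-space dependency is easier to see on a small case. In the sketch below, the input string is made up for illustration; it cuts out the innermost group the same way to_json does and then inspects what is left of each value.

```python
# Hypothetical input with one nested group.
s = '[key3=Key3 [key3_1=value3_1], key4=value4]'

# Cut out the innermost group, the same way to_json does.
left = s.rfind('[')
right = s.find(']', left) + 1
remainder = s[:left] + s[right:]

for pair in remainder[1:-1].split(','):
    key, value = pair.split('=')
    # A value ending in a space is a leftover type name, i.e. a nested dict.
    kind = 'nested dict' if value.endswith(' ') else 'plain string'
    print(key.strip(), '->', repr(value), kind)
```

Stripping the values before this check would turn 'Key3 ' into 'Key3', and the nested dict would then be indistinguishable from an ordinary string value.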

Source

2014-10-29 09:00:12 aa8y
