2013-04-08
REGEX reformatting

I want to reformat a JSON file and remove a large part of it. Here is the original JSON file:

 "2597401":[{"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2- 5":{"libA":["libpgc.so"], 
       "flavor":["default"]}},   
       "startEpoch":"1338497979", 
       "runTime":"1022", 
       "execType":"user:binary",    
       "exec":"ft.D.64", 
       "numNodes":"4", 
       "sha1":"5a79879235aa31b6a46e73b43879428e2a175db5", 
       "execEpoch":1336766742, 
       "execModify":"Fri May 11 15:05:42 2012", 
       "startTime":"Thu May 31 15:59:39 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"1881400168","text":"239574","data":"22504"}}, 
       {"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"], 
       "flavor":["default"]}}, 
       "startEpoch":"1338497946", 
       "runTime":"33" "execType":"user:binary", 
       "exec":"cg.C.64", 
       "numNodes":"4", 
       "sha1":"caf415e011e28b7e4e5b050fb61cbf71a62a9789", 
       "execEpoch":1336766735, 
       "execModify":"Fri May 11 15:05:35 2012", 
       "startTime":"Thu May 31 15:59:06 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"29630984","text":"225749","data":"20360"}}, 
       {"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2-5": {"libA":["libpgc.so"], 
       "flavor":["default"]}}, 
       "startEpoch":"1338500447", 
       "runTime":"145", 
       "execType":"user:binary", 
       "exec":"mg.D.64", 
       "numNodes":"4", 
       "sha1":"173de32e1514ad097b1c051ec49c4eb240f2001f", 
       "execEpoch":1336766756, 
       "execModify":"Fri May 11 15:05:56 2012", 
       "startTime":"Thu May 31 16:40:47 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"456954120","text":"426186","data":"22184"}},{"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"], 
       "flavor":["default"]}}, 
       "startEpoch":"1338499002", 
       "runTime":"1444", 
       "execType":"user:binary", 
       "exec":"lu.D.64", 
       "numNodes":"4", 
       "sha1":"c6dc16d25c2f23d2a3321d4feed16ab7e10c2cc1", 
       "execEpoch":1336766748, 
       "execModify":"Fri May 11 15:05:48 2012", 
       "startTime":"Thu May 31 16:16:42 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"199850984","text":"474218","data":"27064"}}], 

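For each jobID, I only want to keep the "exec" field and the jobID. How can I construct a regex to dump the rest of the data? Ideally, I need the following: JobID exec1 exec2 exec3
Is there a way to do this?

Thanks in advance.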

Do you mean '{"2597401":[{"JobID":2597401,"exec":"ft.D.64"}]}'? – 2013-04-08 00:22:45

Sort of. The initial number is the jobID, so ideally I want something like this: 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64. The same job has multiple execs, so I want the jobID and the execs. – amber4478 2013-04-08 00:26:09
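
If the goal is only the output described in that comment (the jobID followed by its execs), a regex can extract the wanted pairs instead of deleting everything else. Below is a minimal Python sketch, assuming every record's "jobID" appears before its "exec" (true in the question's data); execs_by_job and format_jobs are hypothetical names. Because it never parses the JSON, it also tolerates the missing comma in the data:

```python
import re

# Captures "jobID":"<id>" and "exec":"<name>" pairs. The key must end right
# after "exec", so execType, execEpoch and execModify are not captured.
PAIR = re.compile(r'"(jobID|exec)"\s*:\s*"([^"]*)"')


def execs_by_job(text):
    """Map each jobID to its exec names, in order of appearance.

    Assumes each record's "jobID" comes before its "exec", as in the
    question's data. The text is not validated as JSON.
    """
    jobs = {}
    current = None
    for key, value in PAIR.findall(text):
        if key == "jobID":
            current = value
            jobs.setdefault(current, [])
        elif current is not None:
            jobs[current].append(value)
    return jobs


def format_jobs(jobs):
    """One line per job: 'JobID exec1 exec2 ...'."""
    return "\n".join(" ".join([job, *execs]) for job, execs in jobs.items())


# A shortened copy of the question's data, keeping the missing comma
# after "runTime":"33".
sample = '''"2597401":[{"jobID":"2597401", "exec":"ft.D.64", "numNodes":"4"},
{"jobID":"2597401", "runTime":"33" "execType":"user:binary", "exec":"cg.C.64"},
{"jobID":"2597401", "exec":"mg.D.64"},{"jobID":"2597401", "exec":"lu.D.64"}],'''

print(format_jobs(execs_by_job(sample)))
# prints: 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64
```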

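Use a JSON library that will read the JSON, let you manipulate it, and save it back. Unlike your code, that JSON library has already been written, tested, and debugged. Regular expressions are not a magic wand you wave at every problem that involves text. – 2013-04-08 00:26:28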
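
Following that advice, here is a minimal sketch with Python's standard json module. It assumes the file has first been made valid JSON, i.e. wrapped in braces as '{"2597401": [...]}' with the missing comma restored; the sample is a trimmed, hypothetical version of the question's data, and keep_job_and_exec is an illustrative name:

```python
import json

# Hypothetical, valid-JSON version of the question's data (records trimmed).
text = '''{"2597401": [
  {"jobID": "2597401", "exec": "ft.D.64", "runTime": "1022"},
  {"jobID": "2597401", "exec": "cg.C.64", "runTime": "33"},
  {"jobID": "2597401", "exec": "mg.D.64", "runTime": "145"},
  {"jobID": "2597401", "exec": "lu.D.64", "runTime": "1444"}
]}'''


def keep_job_and_exec(data):
    """Reduce every record to its "jobID" and "exec" fields."""
    return {
        job_id: [{"jobID": rec["jobID"], "exec": rec["exec"]} for rec in records]
        for job_id, records in data.items()
    }


data = json.loads(text)          # use json.load(f) for an open file
slim = keep_job_and_exec(data)
print(json.dumps(slim))          # still valid JSON, ready to save
for job_id, records in slim.items():
    # one line per job: 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64
    print(job_id, *(rec["exec"] for rec in records))
```

Working on parsed objects means nesting, quoting and commas are handled by the parser rather than by the pattern.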

Answer

Since you didn't specify your regex engine, I'll assume you are using  for my answer.

Based on the JSON format, you can use this regex to match the parts you don't need and replace them with nothing:

/(,\s*(*SKIP))?+("(?!jobID"|exec)[^"]+"\s*+:\s*+("[^"]*"|{(?2)?+(?>,\s*(?2))*}|\[(?3)?+(?>,\s*(?3))*\]))(?(1)|,?)/g 
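
Note what the key test '(?!jobID"|exec)' does: the lookahead leaves alone a key that is exactly jobID, and also any key that merely starts with exec, which is why execType, execEpoch and execModify survive in the result. Python's re module supports this negative lookahead (unlike the (*SKIP) verb and the (?2)/(?3) recursion), so the key part of the pattern can be checked on its own. A small sketch over the question's field names:

```python
import re

# Only the key part of the answer's pattern: a quoted key that is NOT
# exactly "jobID" and does NOT start with "exec".
UNWANTED_KEY = re.compile(r'"(?!jobID"|exec)[^"]+"')

keys = ["jobID", "account", "user", "pkgT", "startEpoch", "runTime",
        "execType", "exec", "numNodes", "sha1", "execEpoch", "execModify",
        "startTime", "numCores", "sizeT"]

# A key is removed when the whole quoted key matches the unwanted-key test.
removed = [k for k in keys if UNWANTED_KEY.fullmatch(f'"{k}"')]
kept = [k for k in keys if k not in removed]
print(kept)
# ['jobID', 'execType', 'exec', 'execEpoch', 'execModify']
```

To keep only "exec" itself, the lookahead would need to test the full key, for example '(?!(?:jobID|exec)")'.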

Here is what you get after the regex replacement:

 "2597401":[{"jobID":"2597401", 
       "execType":"user:binary",    
       "exec":"ft.D.64", 
       "execEpoch":1336766742, 
       "execModify":"Fri May 11 15:05:42 2012"}, 
       {"jobID":"2597401" "execType":"user:binary", 
       "exec":"cg.C.64", 
       "execEpoch":1336766735, 
       "execModify":"Fri May 11 15:05:35 2012"}, 
       {"jobID":"2597401", 
       "execType":"user:binary", 
       "exec":"mg.D.64", 
       "execEpoch":1336766756, 
       "execModify":"Fri May 11 15:05:56 2012"},{"jobID":"2597401", 
       "execType":"user:binary", 
       "exec":"lu.D.64", 
       "execEpoch":1336766748, 
       "execModify":"Fri May 11 15:05:48 2012"}], 

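As you can see, the resulting string has invalid syntax in '"jobID":"2597401" "execType":"user:binary"', which comes from a syntax error in the given data (the missing comma after '"runTime":"33"' in the second record)...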
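
A JSON parser reports this kind of problem instead of silently producing broken output. A small check with Python's json module on the two fields around the missing comma (the strings are cut-down, hypothetical samples, and parse_or_report is an illustrative name):

```python
import json

# Cut-down samples: the question's second record has no comma after "33".
broken = '{"runTime":"33" "execType":"user:binary"}'
fixed = '{"runTime":"33", "execType":"user:binary"}'


def parse_or_report(text):
    """Return the parsed object, or None after printing the parser's complaint."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print(f"invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None


parse_or_report(broken)        # prints the parser's error and position
print(parse_or_report(fixed))  # {'runTime': '33', 'execType': 'user:binary'}
```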

And here is an explanation:

/(,\s*(*SKIP))?+ 
# Possessively try to match a comma and the whitespace after it.
# If the comma is matched, the (*SKIP) verb is reached: should the
# rest of the pattern then fail, (*SKIP) makes the engine resume its
# search from that point instead of retrying one character later.

# Key-value pairs not worth keeping.
(
    "(?!jobID"|exec)[^"]+" # Any key except exactly "jobID" or one starting with "exec".
    \s*+:\s*+ # The colon, with optional whitespace on either side.
    (# The value; nested values are matched recursively.
    "[^"]*" 
     # String literals, boring.
    | {(?2)?+(?>,\s*(?2))*} 
     # Or: an object of key-value pairs we don't care about
     # (recursing into group 2, so nested keys get the same key test).
    | \[(?3)?+(?>,\s*(?3))*\] 
     # Or: an array of values we don't care about
     # (recursing into group 3).
) 
) 
(?(1)|,?) 
# If no leading comma was consumed (group 1 unset), remove an optional
# trailing comma instead, so the result string is still valid JSON.
/gx 
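
To see what the recursive part accepts, here is a hypothetical Python mirror of the value grammar in group 3: a string, an object of "key": value pairs, or an array of values, with whitespace allowed around the colon and after commas. It is a plain recursive-descent sketch (match_value and match_pair are illustrative names); it leaves out the key lookahead, the (*SKIP) handling and the comma balancing:

```python
import re

# value := "string" | { pair (, pair)* } | [ value (, value)* ]
# pair  := "key" : value   (whitespace allowed around ':' and after ',')
WS = re.compile(r"\s*")
STRING = re.compile(r'"[^"]*"')


def match_value(s, i):
    """Return the index just past the value starting at s[i], or None."""
    if i < len(s) and s[i] == '"':
        m = STRING.match(s, i)
        return m.end() if m else None
    if i < len(s) and s[i] in "{[":
        close = "}" if s[i] == "{" else "]"
        item = match_pair if s[i] == "{" else match_value  # the recursion
        j = i + 1
        if j < len(s) and s[j] == close:
            return j + 1                      # empty {} or []
        j = item(s, j)
        while j is not None and j < len(s) and s[j] == ",":
            j = item(s, WS.match(s, j + 1).end())
        if j is not None and j < len(s) and s[j] == close:
            return j + 1
    return None                               # no rule for bare numbers


def match_pair(s, i):
    """Return the index just past a "key": value pair at s[i], or None."""
    m = STRING.match(s, i)
    if not m:
        return None
    j = WS.match(s, m.end()).end()
    if j >= len(s) or s[j] != ":":
        return None
    return match_value(s, WS.match(s, j + 1).end())


pkg = '{"pgi/7.2-5":{"libA":["libpgc.so"],\n  "flavor":["default"]}}'
print(match_value(pkg, 0) == len(pkg))  # True: the whole pkgT value matches
```

Like the regex, it has no rule for bare numbers such as 1336766742. That is harmless here only because the one numeric field, execEpoch, starts with exec and is never a candidate for removal.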

Here is a regex demo.