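Python: iteratively write JSON/dictionary objects to a file (one at a time)

I have a large for loop in which I create JSON objects, and I would like to write the object from each iteration to a file. I would like to be able to use the file later in a similar fashion (reading one object at a time). My JSON objects contain newlines, so I can't simply dump each object as one line in the file. How can I achieve this?

To make it more concrete, consider the following: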

for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object
    with open('file.json', 'a') as f:
        stream_dump(dict_obj, f)

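stream_dump is the function I'm looking for.

Note that I don't want to build one big list and dump the whole list with something like json.dump(obj, file). I want to be able to append the object to the file in each iteration.

Thanks.

Source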

2016-04-24 CentAu

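Unless I've misunderstood your question, it seems you could write a delimiter line after each object you write, using a delimiter that won't occur in your data, such as "-----", and start a new object whenever you see the delimiter while reading. – alpert

Ah, I see. That definitely works. I thought there might be some other stream-processing solution. – CentAu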
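
A minimal sketch of the delimiter idea from the comment above. The separator "-----" comes from the comment; the helper names append_record and iter_records, the sample data, and the temporary file path are made up for illustration. It relies on the fact that json.dump escapes newlines inside strings, so a line consisting only of the separator cannot come from the JSON output itself.

```python
import json
import os
import tempfile

SEPARATOR = "-----"  # delimiter line suggested in the comment


def append_record(path, obj):
    """Append one pretty-printed JSON object followed by a separator line."""
    with open(path, "a") as f:
        json.dump(obj, f, indent=2)
        f.write("\n" + SEPARATOR + "\n")


def iter_records(path):
    """Yield objects one at a time, buffering lines up to each separator."""
    buf = []
    with open(path) as f:
        for line in f:
            if line.rstrip("\n") == SEPARATOR:
                yield json.loads("".join(buf))
                buf = []
            else:
                buf.append(line)


# Illustrative data: each object has a string value containing a newline.
path = os.path.join(tempfile.mkdtemp(), "file.json")
for i in range(3):
    append_record(path, {"id": i, "text": "line one\nline two"})

records = list(iter_records(path))
```

Only one object's worth of lines is held in memory at a time while reading, which is the property the question asks for.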

You need a subclass of JSONEncoder and then a proxy for the build_dict function:

# Python 3: the Sequence ABC lives in collections.abc
import collections.abc
import json


mycollection = [1, 2, 3, 4]


def build_dict(_id):
    d = dict()
    d['my_' + str(_id)] = _id
    return d


class SeqProxy(collections.abc.Sequence):
    """Sequence view that applies func to each item of coll on access."""

    def __init__(self, func, coll, *args, **kwargs):
        super(SeqProxy, self).__init__(*args, **kwargs)

        self.func = func
        self.coll = coll

    def __len__(self):
        return len(self.coll)

    def __getitem__(self, key):
        return self.func(self.coll[key])


class JsonEncoderProxy(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)


jsonencoder = JsonEncoderProxy()
collproxy = SeqProxy(build_dict, mycollection)


for chunk in jsonencoder.iterencode(collproxy):
    print(chunk)

Output:

[ 
{ 
"my_1" 
: 
1 
} 
, 
{ 
"my_2" 
: 
2 
} 
, 
{ 
"my_3" 
: 
3 
} 
, 
{ 
"my_4" 
: 
4 
} 
]

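To read it back chunk by chunk, you need to use JSONDecoder and pass a callable as object_hook. This hook is called with each newly decoded object (each dict in your list) when you call JSONDecoder.decode(json_string).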
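
A small sketch of the object_hook mechanism described above. The hook name collect and the sample JSON string are assumptions chosen for illustration. Note that decode() still takes the complete JSON text as one string, so the hook sees objects one at a time but the input itself is not streamed from disk.

```python
import json

seen = []  # every dict the decoder produces, in decoding order


def collect(obj):
    """Hypothetical hook: record each decoded JSON object, then return it."""
    seen.append(obj)
    return obj


decoder = json.JSONDecoder(object_hook=collect)
result = decoder.decode('[{"my_1": 1}, {"my_2": 2}]')
```

Whatever the hook returns replaces the decoded dict in the result, so returning obj unchanged keeps the normal decoding behavior.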

Source

2016-04-24 19:23:28 mementum

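Perfect, thanks. Just one question: what does 'SeqProxy' do? – CentAu

Your collection doesn't return a 'dict' for each item (you call 'build_dict' on each item), so 'SeqProxy' wraps your collection and returns the result of 'build_dict' whenever the 'JSONEncoder' requests the next item in the list to serialize. – mementum

Correct me if I'm wrong: this solves two problems: (a) a proxy is needed to call the custom 'build_dict' function on each item of the collection; (b) serializing in chunks is already provided by the JSON module through the 'iterencode' function. I was focused on (b) and didn't understand the code until I realized it is entirely about (a). – lenz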

Since you are generating the file yourself, you can simply write one JSON object per line:

import json

for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object
    with open('file.json', 'a') as f:
        f.write(json.dumps(dict_obj))
        f.write('\n')

Then read them back by iterating over the lines:

with open('file.json', 'r') as f:
    for line in f:
        dict_obj = json.loads(line)

This isn't a great general-purpose solution, but it is a simple one if you are both the producer and the consumer.
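
As a self-contained sketch of the one-object-per-line approach, the following fills in the stream_dump name from the question; stream_load, the sample data, and the temporary file path are assumptions for illustration. It also shows why newlines inside string values are not a problem: json.dumps escapes them, so each object still occupies exactly one line.

```python
import json
import os
import tempfile


def stream_dump(obj, f):
    """Write obj as one line of JSON; json.dumps escapes embedded newlines."""
    f.write(json.dumps(obj))
    f.write("\n")


def stream_load(f):
    """Yield one decoded object per non-empty line."""
    for line in f:
        if line.strip():
            yield json.loads(line)


# Illustrative data: the string value contains a real newline character.
path = os.path.join(tempfile.mkdtemp(), "file.json")
originals = [{"id": i, "text": "first\nsecond"} for i in range(3)]

with open(path, "a") as f:
    for obj in originals:
        stream_dump(obj, f)

with open(path) as f:
    restored = list(stream_load(f))
```

Opening the file once around the loop, rather than once per iteration, avoids repeated open and close calls; appending per iteration as in the answer works as well.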

Source

2016-04-24 19:49:23 larsks

-1

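Simple solution:

Remove all whitespace characters from your JSON document:

import string

def remove_whitespaces(txt):
    """Remove every whitespace character, including any inside string values."""
    for ch in string.whitespace:
        txt = txt.replace(ch, '')
    return txt

Obviously, you can also use json.dumps(json.loads(json_txt)) (which, by the way, also verifies that the text is valid JSON).

Now you can write your documents to the file, one per line.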
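
A brief sketch of the json.dumps(json.loads(json_txt)) round trip mentioned above, using a made-up pretty-printed document. Unlike stripping whitespace character by character, it only drops the formatting whitespace between tokens and keeps whitespace inside string values intact (as escapes).

```python
import json

# Illustrative input: pretty-printed JSON whose string value holds an
# escaped newline (backslash followed by n in the JSON text).
pretty = """{
    "name": "example",
    "note": "two\\nlines"
}"""

# Parsing validates the text; re-serializing yields a single line.
compact = json.dumps(json.loads(pretty))
```

Invalid input raises an exception in json.loads, which is the validation side effect the answer refers to.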

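Solution two:

Create an [AnyStr]IO stream, write a valid document into the IO (with your documents as part of an object or list), and then write the IO to a file (or upload it to the cloud).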
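
One way to read this suggestion, sketched with io.StringIO; the sample ids, the key format, and the temporary file path are assumptions for illustration. The objects are written into the in-memory stream one at a time as elements of a JSON list, and the finished document is then written out in one go. Note that the whole document still sits in memory until it is flushed to the file.

```python
import io
import json
import os
import tempfile

buf = io.StringIO()
buf.write("[")
for i, _id in enumerate([1, 2, 3]):
    if i:
        buf.write(",\n")  # separate list elements
    json.dump({"my_" + str(_id): _id}, buf)
buf.write("]")

# Write the completed, valid JSON document to a file.
path = os.path.join(tempfile.mkdtemp(), "file.json")
with open(path, "w") as f:
    f.write(buf.getvalue())

with open(path) as f:
    loaded = json.load(f)
```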

Source

2016-04-24 20:11:53

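What happens if the whitespace is an integral part of the content? – mementum

Good observation! In any case, json.dumps(json.loads(json_txt)) works perfectly in that case. –

Why would you remove all the whitespace? I don't see how this relates to the OP. If you want the JSON dump on a single line, use 'json.dump(..., indent=None)' (which is in fact already the default). In any case, newlines inside text nodes are escaped. – lenz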
