How to parse ndjson in Pharo with NeoJSON

I want to parse ndjson (newline-delimited JSON) data with NeoJSON on Pharo Smalltalk.

ndjson data looks like this:

{"smalltalk": "cool"} 
{"pharo": "cooler"}

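Currently, I convert my file stream to a string, split it on newlines, and then parse the individual pieces with NeoJSON. This seems to use an unnecessary (and extremely large) amount of memory and time, probably because of converting streams to strings and back again. What is an efficient way to do this task?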

If you are looking for sample data: NYPL-publicdomain: pd_items_1.ndjson


2016-01-20 MartinW

This is the answer from Sven (the author of NeoJSON) on the Pharo users mailing list (he is not on SO):

Reading the "format" is easy: just keep sending #next for each JSON expression (whitespace is ignored).

| data reader | 
data := '{"smalltalk": "cool"} 
{"pharo": "cooler"}'. 
reader := NeoJSONReader on: data readStream. 
Array streamContents: [ :out | 
    [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].

Avoiding intermediate data structures is easy too: use streaming.

| client reader data networkStream | 
(client := ZnClient new) 
    streaming: true; 
    url: 'https://github.com/NYPL-publicdomain/data-and-utilities/blob/master/items/pd_items_1.ndjson?raw=true'; 
    get. 
networkStream := ZnCharacterReadStream on: client contents. 
reader := NeoJSONReader on: networkStream. 
data := Array streamContents: [ :out | 
    [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ]. 
client close. 
data.

It took a couple of seconds; after all, the 50K items are 80MB+ over the network.
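The same loop should also work on a local file stream, which is the case the question describes. The following is only a minimal sketch: the file name `sample.ndjson` and its location in the temp directory are assumptions made so the snippet is self-contained, and it relies on `FileReference>>readStreamDo:` handing NeoJSONReader a character stream.

```smalltalk
| file items |
"Hypothetical sample file, written here only to make the sketch self-contained."
file := FileLocator temp / 'sample.ndjson'.
file ensureDelete.
file writeStreamDo: [ :out |
    out
        nextPutAll: '{"smalltalk": "cool"}';
        nextPut: Character lf;
        nextPutAll: '{"pharo": "cooler"}';
        nextPut: Character lf ].

"Read one JSON value at a time straight from the file stream;
 no intermediate string and no manual line splitting."
items := file readStreamDo: [ :in |
    | reader |
    reader := NeoJSONReader on: in.
    Array streamContents: [ :out |
        [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ] ].

file ensureDelete.
items.
```

With a real data set, replace the temp file with a reference to the actual file, for example `'pd_items_1.ndjson' asFileReference`, and drop the writing and deleting steps.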


2016-01-21 12:52:24 EstebanLM

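Would it work if you open a new ReadWriteStream, write a ${ to it first, then stream all of the content from the original stream into it, comma separated, and then write a trailing $}? The resulting stream should be suitable for NeoJSON...? This might be an STTCPW (Simplest Thing That Could Possibly Work) attack on the problem, but the W is important ;-) and it should be faster and less memory-consuming, because NeoJSON would only make one pass over it.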


2016-01-21 11:18:24

I guess you mean $[ and $] to make an array? That's nice. – MartinW

I guess you are right ;-) –
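The array-wrapping idea from this exchange can be sketched as follows. This is an illustration under assumptions, not the commenters' code: it builds the wrapped text in memory with `String streamContents:` rather than a ReadWriteStream, and it skips blank lines so that a trailing newline does not produce an empty array element.

```smalltalk
| ndjson wrapped items |
ndjson := '{"smalltalk": "cool"}
{"pharo": "cooler"}
'.

"Wrap the records in $[ ... $] and separate them with commas,
 so the whole input becomes a single JSON array."
wrapped := String streamContents: [ :out |
    out nextPut: $[.
    (ndjson lines reject: [ :line | line trimBoth isEmpty ])
        do: [ :line | out nextPutAll: line ]
        separatedBy: [ out nextPut: $, ].
    out nextPut: $] ].

"A single parse now answers an Array of Dictionaries."
items := NeoJSONReader fromString: wrapped.
```

Note that this version still copies the whole input once, so for large files it does not remove the memory cost the question is about; sending #next repeatedly to one reader avoids the copy altogether.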

You could try something like this:

| input reader | 
input := FileStream readOnlyFileNamed: 'resources/pd_items_1.ndjson.txt'. 
[ 
Array 
    streamContents: [ :strm | 
     | ln | 
     [ (ln := input nextLine) isNil ] 
      whileFalse: [ strm nextPut: (NeoJSONReader fromString: ln) ] ] ] timeToRun.

Unless this is what you have already tried...


2016-01-21 12:50:09 Carlo
