2023年2月6日 17:49:39go评论102阅读模式

英文:

Write gzipped Json objects into a Json file without loading it

问题

我要将字典以gzipped json对象的形式写入一个json文件。

我有一些解决方案，但随着文件变得越来越大，附加过程变得越来越慢。因此，加载文件不是解决办法。

我在这里找到了解决方案：

def append_record_seek(data,filename):
    print('append_record_seek started with data:{data} filename:{filename}')
    with open(filename, mode="r+") as file:
        file.seek(os.stat(filename).st_size - 1)
        file.write(",{}".format(json.dumps(data)))

后来，我想要将该文件读取为一个字典列表。

以下是我的最小代码示例：

import global_variables as gv
import time
import json as json
import base64
import io
import sys
import cv2
import gzip
import numpy as np
import os
from numpy import asarray
from json import JSONEncoder
data = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
path = r'C:/Videotool/Data'
name = 'test'
filename = path + '/' + name + '.json'
isExist = os.path.exists(path)
if not isExist:
    os.makedirs(path)
os.chdir(path)
def first_writer(data, filename):
    print(f'first_writer started with data:{data} filename:{filename}')
    with open(filename, 'w') as file:
        file.write('[')
        file.write(json.dumps(data))
        file.write(',')
        file.write(']')
def append_record_seek(data, filename):
    print('append_record_seek started with data:{data} filename:{filename}')
    with open(filename, mode="r+") as file:
        file.seek(os.stat(filename).st_size - 1)
        file.write(",{}".format(json.dumps(data))
for x in range(6):
    print(f'step:{x}')
    file_exists = os.path.exists(name+'.json')
    if file_exists:
        print('file_exists')
        append_record_seek(data, filename)
    else:
        print('else')
        first_writer(data, filename)

非压缩的结果应该如下所示：

[{"brand": "Ford", "model": "Mustang", "year": 1964},
{"brand": "Ford", "model": "Mustang", "year": 1964},
{"brand": "Ford", "model": "Mustang", "year": 1964},
{"brand": "Ford", "model": "Mustang", "year": 1964},
{"brand": "Ford", "model": "Mustang", "year": 1964}]

我的结果是：

[{"brand": "Ford", "model": "Mustang", "year": 1964},,,,,,]

如果这样可以工作，我希望在写入之前对数据进行压缩。

我希望有人可以帮助。更新：

我已经获得了正确的JSON格式：

def first_writer(data, filename):
    print(f'first_writer started with data:{data} filename:{filename}')
    with open(filename, 'w') as file:
        file.write("[{}]".format(json.dumps(data)))
def append_record_seek(data, filename):
    print('append_record_seek started with data:{data} filename:{filename}')
    with open(filename, mode="r+") as file:
        file.seek(os.stat(filename).st_size - 1)
        file.write(",{}]".format(json.dumps(data))

现在我需要对数据进行压缩。

英文:

I want to write dicts as gzipped json objects into a json file.

I had some solutions, but as the file got bigger the appending process got slower and slower.
So loading the file was not the way.

I found the solution here with:

def append_record_seek(data,filename):
    print(&#39;append_record_seek started with data:{data} filename:{filename}&#39;)
    with open (filename, mode=&quot;r+&quot;) as file:
        file.seek(os.stat(filename).st_size -1)
        file.write( &quot;,]&quot;.format(json.dumps(data)) )

Later i want to read that file as a list of dicts.

Here is my minimal Code example:

import global_variables as gv
import time
import json as json
import base64
import io
import sys
import cv2
import gzip
import numpy as np
import os
from numpy import asarray
from json import JSONEncoder
data = {
  &quot;brand&quot;: &quot;Ford&quot;,
  &quot;model&quot;: &quot;Mustang&quot;,
  &quot;year&quot;: 1964
}
path = r&#39;C:/Videotool/Data&#39;
name = &#39;test&#39;
filename = path + &#39;/&#39; + name + &#39;.json&#39;
isExist = os.path.exists(path)
if not isExist:
    os.makedirs(path)
os.chdir(path)
def first_writer(data,filename):
    print(f&#39;first_writer started with data:{data} filename:{filename}&#39;)
    with open (filename, &#39;w&#39;) as file:
        file.write(&#39;[&#39;)
        file.write(json.dumps(data))
        file.write(&#39;,&#39;)
        file.write(&#39;]&#39;)
        
def append_record_seek(data,filename):
    print(&#39;append_record_seek started with data:{data} filename:{filename}&#39;)
    with open (filename, mode=&quot;r+&quot;) as file:
        file.seek(os.stat(filename).st_size -1)
        file.write( &quot;,]&quot;.format(json.dumps(data)) )
for x in range(6):
    print(f&#39;step:{x}&#39;)
    file_exists = os.path.exists(name+&#39;.json&#39;)
    if file_exists:
        print(&#39;file_exists&#39;)
        append_record_seek(data,filename)
    else:
        print(&#39;else&#39;)
        first_writer(data,filename)

the non zipped result should be looking like:

[{&quot;brand&quot;: &quot;Ford&quot;, &quot;model&quot;: &quot;Mustang&quot;, &quot;year&quot;: 1964},
{&quot;brand&quot;: &quot;Ford&quot;, &quot;model&quot;: &quot;Mustang&quot;, &quot;year&quot;: 1964},
{&quot;brand&quot;: &quot;Ford&quot;, &quot;model&quot;: &quot;Mustang&quot;, &quot;year&quot;: 1964},
{&quot;brand&quot;: &quot;Ford&quot;, &quot;model&quot;: &quot;Mustang&quot;, &quot;year&quot;: 1964},
{&quot;brand&quot;: &quot;Ford&quot;, &quot;model&quot;: &quot;Mustang&quot;, &quot;year&quot;: 1964}]

My result is : [{"brand": "Ford", "model": "Mustang", "year": 1964},,,,,,]

If that works, i want to zip the dumps before writing.

I hope somebody can help

Update:
I've got the right Json format with:

def first_writer(data,filename):
    print(f&#39;first_writer started with data:{data} filename:{filename}&#39;)
    with open (filename, &#39;w&#39;) as file:
        file.write( &quot;[{}]&quot;.format(json.dumps(data)) )
        
def append_record_seek(data,filename):
    print(&#39;append_record_seek started with data:{data} filename:{filename}&#39;)
    with open (filename, mode=&quot;r+&quot;) as file:
        file.seek(os.stat(filename).st_size -1)
        file.write( &quot;,{}]&quot;.format(json.dumps(data)) )

Now i have to get that zipped...

答案1

得分: 1

以下是代码部分的翻译：

import gzip
from copy import copy
import json
# just test data
x = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
z = {
  "brand": "Mato",
  "model": "Laatikko",
  "year": 2023
}
l = []
# populate the initial "json" in the list l
for i in range(3):
  y = copy(x)
  y["year"] += i
  l.append(y)
# write list of dicts as jsons string into file and compress it via gzip
# it doesnt really matter how this was originally done..
with open("data.gz", "wb") as f:
   f.write(gzip.compress(bytes(json.dumps(l, indent=2),"utf-8"))
# then, append a new entry to the same file -- which will get uncompressed
# with the previously stored *valid* json structure..
with open("data.gz", "ab") as f:
   f.write(gzip.compress(bytes(json.dumps(z, indent=2),"utf-8"))

希望这对你有所帮助。

英文:

NOTE: This is not the answer to the question, as there is none, this will just highlight that a single compressed file can be generated and decompressed later but it will not be valid json.

import gzip
from copy import copy
import json
# just test data
x = {
  &quot;brand&quot;: &quot;Ford&quot;,
  &quot;model&quot;: &quot;Mustang&quot;,
  &quot;year&quot;: 1964
}
z = {
  &quot;brand&quot;: &quot;Mato&quot;,
  &quot;model&quot;: &quot;Laatikko&quot;,
  &quot;year&quot;: 2023
}
l = []
# populate the initial &quot;json&quot; in the list l
for i in range(3):
  y = copy(x)
  y[&quot;year&quot;] += i
  l.append(y)
# write list of dicts as jsons string into file and compress it via gzip
# it doesnt really matter how this was originally done..
with open(&quot;data.gz&quot;, &quot;wb&quot;) as f:
   f.write(gzip.compress(bytes(json.dumps(l, indent=2),&quot;utf-8&quot;)))
# then, append a new entry to the same file -- which will get uncompressed
# with the previously stored *valid* json structure..
with open(&quot;data.gz&quot;, &quot;ab&quot;) as f:
   f.write(gzip.compress(bytes(json.dumps(z, indent=2),&quot;utf-8&quot;)))

This will result a file that looks like this when uncompressed

[
  {
    &quot;brand&quot;: &quot;Ford&quot;,
    &quot;model&quot;: &quot;Mustang&quot;,
    &quot;year&quot;: 1964
  },
  {
    &quot;brand&quot;: &quot;Ford&quot;,
    &quot;model&quot;: &quot;Mustang&quot;,
    &quot;year&quot;: 1965
  },
  {
    &quot;brand&quot;: &quot;Ford&quot;,
    &quot;model&quot;: &quot;Mustang&quot;,
    &quot;year&quot;: 1966
  }
]{
  &quot;brand&quot;: &quot;Mato&quot;,
  &quot;model&quot;: &quot;Laatikko&quot;,
  &quot;year&quot;: 2023
}

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将经过压缩的JSON对象写入JSON文件，而无需加载它。

问题

答案1

在pandas中比较两个列表会引发TypeError错误：“float”对象不可迭代。

连接 Pandas 行如果时间是连续的。

置信区间为二进制数据大于1的原因是什么？

Error managing bytes array while trying to code my first Burp extension

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。