It is not a good idea to use JSON for this. JSON is a text format, not intended for binary data (which is exactly what executable files, libraries and resource files such as images are). JSON is also stored as a single structured document, so you have to read and parse the entire file, which can exhaust available memory if the file is large.

Binary data, on the other hand, can be collected into one binary file (even if it is just a naive concatenation of files one after another; better yet some container format, even plain ZIP), while JSON keeps the meta-information: the names of the archived files and the offset of each file inside the common binary file (if they are simply concatenated).

If desired, the JSON can be appended to that same binary file, for example at its end. Appending at the end is more convenient than at the beginning, because by then all the information about the files is already known (including each file's offset from the start of the common file), and when a new file is added you can cut off the tail (the old index), append the new file, and then append a new index containing all the old entries plus the new one. If the index sat at the start of the file, you would first have to write the JSON with all offsets computed in advance, then append all the files; and when adding a new file you would have to write a new JSON header and then rewrite all the data from the old file after it. Incidentally, this is why the ZIP format stores its "header" (the central directory) at the end of the file.

The upshot: keep the bulky binary data in one place and the meta-information separately. Here is how to pack several files into one:

import shutil

def concat_files(src_files, dest_name):
    # Write all source files back to back into dest_name and yield
    # (file name, offset from the start, size) for each of them.
    offset = 0
    with open(dest_name, 'wb') as dest:
        for filename in src_files:
            with open(filename, 'rb') as file:
                shutil.copyfileobj(file, dest)  # block-wise copy, no full read into memory
            new_offset = dest.tell()
            yield filename, offset, new_offset - offset
            offset = new_offset
files = ['abc.txt', 'def.txt', 'xyz.txt']
offsets = list(concat_files(files, 'file.bin'))
print(offsets) # [('abc.txt', 0, 1810), ('def.txt', 1810, 129769), ('xyz.txt', 131579, 2197)]
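The resulting list can be written straight to JSON, for example (a minimal sketch; the file name file.json is just an illustration):

import json

with open('file.json', 'w') as f:
    json.dump(offsets, f)    # persist the index next to file.bin

with open('file.json') as f:
    offsets = json.load(f)   # note: JSON round-trips the tuples as lists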
The function takes a list of files and the name of the output file, writes all the files from the list into that one file, and yields for each file its offset from the start of the combined file (the number of the byte at which the file begins) and its size. To read a single file back out of the combined file, you seek to its beginning and read a number of bytes equal to the original file's size. The list of offsets can be kept in JSON (as sketched above) and used in any way.

The shutil.copyfileobj function (https://docs.python.org/3/library/shutil.html#shutil.copyfileobj) allows copying large files without loading them into memory in full. Copying happens in blocks; the block size can be set with the third parameter.

The extraction function:

def extract(file_name, dest_name, offset, size, block_size=1024):
    with open(file_name, 'rb') as file:
        file.seek(offset)  # jump to the start of the embedded file
        with open(dest_name, 'wb') as dest:
            while size > 0:
                # never read past the stored size of the file
                block = file.read(min(block_size, size))
                if not block:  # unexpected end of file
                    break
                dest.write(block)
                size -= len(block)
Usage example:
extract('file.bin', 'def.txt', 1810, 129769)
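To illustrate the "index at the end of the file" idea from the beginning of the answer, here is a minimal sketch (the 8-byte length footer is my own convention, not any standard, and the names append_index/read_index are made up for the example; struct and os are from the standard library):

import json
import os
import struct

def append_index(bin_name, index):
    # Append the JSON index to the binary file, followed by an 8-byte
    # little-endian footer holding the length of the JSON blob.
    data = json.dumps(index).encode('utf-8')
    with open(bin_name, 'ab') as f:
        f.write(data)
        f.write(struct.pack('<Q', len(data)))

def read_index(bin_name):
    # Read the footer first, then jump back to the JSON itself.
    with open(bin_name, 'rb') as f:
        f.seek(-8, os.SEEK_END)
        (length,) = struct.unpack('<Q', f.read(8))
        f.seek(-8 - length, os.SEEK_END)
        return json.loads(f.read(length).decode('utf-8'))

append_index('file.bin', offsets)
print(read_index('file.bin'))

(To add more files later you would truncate the old index off the end first; that step is left out here.)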
Bear in mind that with this approach the files are stored uncompressed, while many files compress well, so it may make sense to use even plain ZIP instead of your own binary format, especially since support for it ships with Python out of the box (the zipfile module).
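For comparison, the same workflow with the standard zipfile module (the names file.zip and out are just illustrations):

import zipfile

# Pack the files into a compressed archive.
with zipfile.ZipFile('file.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    for filename in ['abc.txt', 'def.txt', 'xyz.txt']:
        zf.write(filename)

# Extract a single file without unpacking the whole archive;
# the archive keeps its own index, so no separate JSON is needed.
with zipfile.ZipFile('file.zip') as zf:
    zf.extract('def.txt', path='out')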