You don't have to store the tree. You can store just the counters for each symbol at the beginning of the file, and the tree can be rebuilt before decoding. If each counter takes 8 bytes, you'll need 8 B * 256 = 2 KiB, which is not much.

UPD
If 2 KiB is more than you want to spend, the counters can be stored more compactly. For example, introduce the following format for a counter: prefix + counter. The prefix is a three-bit integer equal to the size of the counter in bytes. Examples:
The counter is 0. Stored as 000 (3 bits).
The counter is 222. Stored as 00111011110 (11 bits).
The counter is 1024. Stored as 0100000010000000000 (19 bits).
So a counter with value n takes ceil(ceil(log_2(n + 1)) / 8) * 8 + 3 bits.

UPD2
I've come up with another way that I think is better than the previous one.
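As an aside on the prefix + counter format from the first update, here is a minimal Python sketch of it. The function names `encode_counter` and `decode_counter` are my own, and a string of '0'/'1' characters stands in for a real bit stream, so treat it as an illustration rather than a finished serializer.

```python
def encode_counter(n):
    """Return the prefix + counter encoding of n as a string of '0'/'1'."""
    if n < 0 or n >= 1 << 56:
        raise ValueError("a 3-bit prefix can describe at most 7 bytes")
    size = (n.bit_length() + 7) // 8       # bytes needed; 0 when n == 0
    prefix = format(size, "03b")           # three-bit size prefix
    body = format(n, "0{}b".format(8 * size)) if size else ""
    return prefix + body


def decode_counter(bits, pos=0):
    """Read one counter starting at bits[pos]; return (value, next position)."""
    size = int(bits[pos:pos + 3], 2)       # counter size in bytes
    end = pos + 3 + 8 * size
    value = int(bits[pos + 3:end], 2) if size else 0
    return value, end
```

Concatenating `encode_counter(c)` for all 256 counters and reading them back with repeated `decode_counter` calls would give the compact header; the decoder always knows where the next counter starts because the prefix carries the length.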
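To start with, here is the tree-building algorithm I'll use:

    # Node, WEIGHT, ALPHABET and create_node_with_children are assumed to be defined elsewhere.
    # Keep the nodes sorted by weight, lightest first.
    nodes = sorted((Node(WEIGHT[c], c) for c in ALPHABET), key=lambda n: n.weight)
    while len(nodes) != 1:
        l, r = nodes[0:2]  # the two lightest nodes
        nodes = nodes[2:]
        new_node = create_node_with_children(l, r)
        insert_index = 0
        if len(nodes) > 1:  # on the last two steps the order no longer matters
            while insert_index < len(nodes) and nodes[insert_index].weight < new_node.weight:
                insert_index += 1
        nodes[insert_index:insert_index] = [new_node]  # insert, keeping the list sorted
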
If you look carefully at the code, you can see that to restore the tree's topology it is enough to know the initial order of the symbols in nodes and the value of insert_index at every step. Let's count how much memory that takes.

The initial permutation takes 255 bytes (the first or the last symbol need not be stored, since it can be found by elimination).

Next, note that the number of iterations of the algorithm is fixed (255), and at the i-th iteration (counting from zero) insert_index lies in the range [0, 255 - i), so fewer bits can be spent on storing insert_index at the later stages. Specifically:

i | number of bits per index
[0, 127) | 8
[127, 191) | 7
[191, 223) | 6
[223, 239) | 5
[239, 247) | 4
[247, 251) | 3
[251, 253) | 2
[253, 255) | 0 // nothing needs to be stored here
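
The table can be reproduced mechanically. Below is a small Python sketch; the name `index_bits` and the `alphabet_size` parameter are my own additions, and it assumes a full 256-symbol alphabet as in the text. At iteration i there are 255 - i possible values of insert_index, and the last two iterations store nothing because insert_index is always 0 there.

```python
def index_bits(i, alphabet_size=256):
    """Bits needed to store insert_index at iteration i (counted from zero)."""
    choices = alphabet_size - 1 - i        # insert_index lies in [0, 255 - i)
    if choices <= 2:                       # last two steps: index is always 0
        return 0
    return (choices - 1).bit_length()      # ceil(log2(choices))


# Total size of all stored indices, in bits, over the 255 iterations.
total_bits = sum(index_bits(i) for i in range(255))
```

Grouping consecutive values of i by `index_bits(i)` gives the rows of the table, and `total_bits` is the sum of the row sizes.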
Storing the insertion indices takes 127 * 8 + 64 * 7 + 32 * 6 + 16 * 5 + 8 * 4 + 4 * 3 + 2 * 2 = 1784 bits = 223 bytes.

The total is 255 + 223 = 478 bytes, a number that, unlike the previous method, does not depend on the size of the input data.
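
To illustrate the idea end to end, here is a hedged Python sketch of recording the indices on the encoder side and replaying them on the decoder side. The names `build` and `rebuild`, and the representation of a tree as nested `(left, right)` tuples with bare symbols as leaves, are assumptions made for this example; packing the order and the indices into bits is left out.

```python
def build(weights):
    """Merge the two lightest nodes repeatedly, recording each insert position.

    weights maps symbol -> count. Returns (tree, initial order, insert indices).
    """
    # Each entry is (weight, subtree); the list is kept lightest first.
    nodes = sorted(((w, s) for s, w in weights.items()), key=lambda t: t[0])
    order = [s for _, s in nodes]          # what the decoder needs up front
    indices = []
    while len(nodes) > 1:
        (wl, l), (wr, r) = nodes[:2]
        nodes = nodes[2:]
        new = (wl + wr, (l, r))
        k = 0
        if len(nodes) > 1:                 # order is irrelevant on the last two steps
            while k < len(nodes) and nodes[k][0] < new[0]:
                k += 1
        nodes.insert(k, new)
        indices.append(k)
    return nodes[0][1], order, indices


def rebuild(order, indices):
    """Replay the recorded insert positions; no weights are needed."""
    nodes = list(order)
    for k in indices:
        l, r = nodes[:2]
        nodes = nodes[2:]
        nodes.insert(k, (l, r))
    return nodes[0]
```

For example, after `tree, order, indices = build({"a": 5, "b": 2, "c": 1, "d": 1})`, calling `rebuild(order, indices)` should return the same nested tuples as `tree`, even though `rebuild` never sees a weight.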