It's a long story, so put the kettle on. I can't copy all of the documentation into the answer, so you will have to read the docs yourself. There will be no experiments with dis (https://docs.python.org/3/library/dis.html): the author of the question has presumably already played with it and decided to dig deeper.

The question is broad, but since it's popular, I'll touch on the topic for one chosen implementation and one chosen version: CPython 3.6.1. I chose this version because it is the latest (so the answer will become outdated later) and because the question explicitly mentions version 3.5. Besides, the recent versions are easy enough to follow, so lifting the veil takes little effort.

The starting point will be the module https://docs.python.org/3/library/py_compile.html. Why this module and not the C sources? "This module provides a function to generate a byte-code file from a source file": just what the doctor ordered.

We'll also prepare a test example in advance; it lives in a separate file, test.py:

    import time


    def my_gen():
        for i in range(10):
            time.sleep(1)
            yield i**2


    if __name__ == "__main__":
        for elem in my_gen():
            print(elem)

Now we can generate the bytecode for this file like this:

    import py_compile

    py_compile.compile("test.py", "test.pyc")
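
To make this step easy to try without touching test.py, here is a minimal sketch that compiles a throwaway source file into an explicitly named pyc and checks that the file starts with the interpreter's magic number. The file names `demo.py` and `demo.pyc` are made up for illustration; `importlib.util.MAGIC_NUMBER` is the public name for the same magic bytes discussed later in the answer.

```python
import importlib.util
import os
import py_compile
import tempfile

SOURCE = "def square(x):\n    return x ** 2\n"

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo.py")   # hypothetical file names
    pyc_path = os.path.join(tmp, "demo.pyc")
    with open(src_path, "w") as f:
        f.write(SOURCE)

    # cfile is given explicitly, so the PEP 3147 __pycache__ path is not used
    written_path = py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as f:
        header_magic = f.read(4)

# The first four bytes of a pyc are the magic number of the writing interpreter
magic_matches = header_magic == importlib.util.MAGIC_NUMBER
print(written_path, magic_matches)
```

Passing `doraise=True` turns compilation errors into a `py_compile.PyCompileError` instead of a message on stderr, which is usually more convenient in scripts.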

The *.pyc extension is just an extension, a naming convention; it could be anything, what matters is the contents. Run it as usual:

    python test.pyc

Now it's time to look at the function that generated the bytecode. You can read the source yourself; here is a brief summary.

The first thing that happens is that the path and name of the output bytecode file are determined. We specified them explicitly. If they are not given, the path is determined according to https://www.python.org/dev/peps/pep-3147/ / https://www.python.org/dev/peps/pep-0488/ and depends on those rules.

Next, the source text is turned into bytes. Another module is responsible for this, https://docs.python.org/3/library/importlib.html#module-importlib, or more precisely the class https://docs.python.org/3/library/importlib.html#importlib.machinery.SourceFileLoader. Surprisingly, all the code that does this is:

    with _io.FileIO(path, 'r') as file:
        return file.read()

In other words, the source file is simply read. In our code it looks like this:

    loader = importlib.machinery.SourceFileLoader('<py_compile>', file)
    source_bytes = loader.get_data(file)
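
As a small self-contained illustration of this reading step, the sketch below writes a one-line source file into a temporary directory and reads it back through `SourceFileLoader.get_data`. The module name `<demo>` and the file name `demo.py` are placeholders, not anything py_compile uses.

```python
import importlib.machinery
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.py")       # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"x = 1\n")

    # The first argument is only a module name label; the path is what is read
    loader = importlib.machinery.SourceFileLoader("<demo>", path)
    source_bytes = loader.get_data(path)

# get_data returns the raw bytes of the file, with no decoding applied
print(type(source_bytes), source_bytes)
```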

Then the source is parsed into an AST (Abstract Syntax Tree) and compiled. There is no room here to describe how the source travels through all the functions and checks, but in brief it comes down to the built-in https://docs.python.org/3/library/functions.html#compile:

    code1 = compile(source_bytes, file, mode='exec', dont_inherit=True)
    code = loader.source_to_code(source_bytes, file)
    print(code, type(code))
    print(code == code1)
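
The following sketch repeats the comparison on an in-memory source and also makes the AST stage visible by going through `ast.parse` explicitly. The `<demo>` file name is a placeholder; `source_to_code` does not need the file to exist because the bytes are passed in directly.

```python
import ast
import importlib.machinery
import types

source_bytes = b"def f():\n    yield 1\n"
filename = "<demo>"  # hypothetical name, only stored in co_filename

loader = importlib.machinery.SourceFileLoader("<demo>", filename)

# Three routes to a code object: the loader, the builtin, and an explicit AST
via_loader = loader.source_to_code(source_bytes, filename)
via_builtin = compile(source_bytes, filename, "exec", dont_inherit=True)
tree = ast.parse(source_bytes, filename)
via_ast = compile(tree, filename, "exec", dont_inherit=True)

is_code = isinstance(via_loader, types.CodeType)
print(type(tree).__name__, is_code)
print(via_loader.co_code == via_builtin.co_code == via_ast.co_code)
```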

I copied the line from the py_compile source (loader.source_to_code) and also wrote it myself using the compile function: the result is exactly the same. The result is an instance of the code class (see https://docs.python.org/3/library/code.html#module-code; the type itself is also exposed as types.CodeType).

Finally, the most important part: turning the compiled code into the contents of the bytecode file. In py_compile this is done like so:

    source_stats = loader.path_stats(file)
    bytecode = importlib._bootstrap_external._code_to_bytecode(
        code, source_stats['mtime'], source_stats['size'])

source_stats looks like {'mtime': 1493229661.7715623, 'size': 180}. mtime is the modification time of the source file, expressed as a timestamp. We will need it later.

The contents of _code_to_bytecode:

    data = bytearray(MAGIC_NUMBER)
    data.extend(_w_long(mtime))
    data.extend(_w_long(source_size))
    data.extend(marshal.dumps(code))

I was surprised, but the conversion into bytecode itself takes four lines.

MAGIC_NUMBER is a magic number, new for every new version. For 3.6.1 it is (3379).to_bytes(2, 'little') + b'\r\n'. The list of all magic numbers can be found in the file importlib/_bootstrap_external.py. Note that besides the number itself there is also a carriage return and line feed.

The function _w_long is simply (int(x) & 0xFFFFFFFF).to_bytes(4, 'little'). You can check all of this with a HEX editor against the actual contents of the pyc file. In other words, the code itself begins after 2 + 2 + 4 + 4 bytes.

All of that is minor, though. The real workhorse is the marshal module and its function marshal.dumps (https://docs.python.org/3/library/marshal.html#marshal.dumps). marshal is "a module containing functions that can read and write Python values in a binary format. The format is specific to Python, but independent of machine architecture issues".

Sadly, we cannot go any further in Python, so we have to turn to the C sources. You can trace the fate of the object passed to marshal.dumps yourself, but eventually it ends up in the writing function at https://github.com/python/cpython/blob/3.6/Python/marshal.c#L334. It is long, but relatively simple. marshal can write many kinds of objects to a file, but we are interested in one type, code; for that, see the branch that handles code objects at https://github.com/python/cpython/blob/3.6/Python/marshal.c#L532. As this function shows, a lot of information is written besides the operation codes themselves (co_code), the argument count being the simplest item on the list. What these variables mean exactly can be seen (the fields are commented) in the PyCodeObject structure: https://github.com/python/cpython/blob/3.6/Include/code.h#L21

And that is not all, things are only getting started. We can see what is going on at the byte level by parsing the pyc file, without having to dig into C.
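
The header fields and the marshal payload described above can be reproduced in a few lines of plain Python. This is only a sketch: `w_long` copies the expression quoted for `_w_long`, `header_36` is a made-up helper that lays out the 12-byte CPython 3.6 header from the text (newer interpreters use a different header layout), and the magic number is taken from whatever interpreter runs the code, so it equals 3379 only on 3.6.

```python
import importlib.util
import marshal


def w_long(x):
    # Same expression as quoted for _w_long: 4 bytes, little-endian, truncated
    return (int(x) & 0xFFFFFFFF).to_bytes(4, "little")


def header_36(mtime, source_size, magic=importlib.util.MAGIC_NUMBER):
    # Hypothetical helper: 3.6-style header = magic (2 + 2) + mtime (4) + size (4)
    return bytes(magic) + w_long(mtime) + w_long(source_size)


magic = importlib.util.MAGIC_NUMBER
magic_value = int.from_bytes(magic[:2], "little")  # version-specific number
header = header_36(1493229661.7715623, 180)        # values from source_stats

# The payload after the header is simply the marshalled code object
code = compile("x = 1\n", "<demo>", "exec")
payload = marshal.dumps(code)
type_byte = payload[0] & 0x7F  # mask off the 128 reference flag

print(magic_value, magic[2:], header.hex())
print(chr(type_byte), len(payload))
```

The float mtime is truncated to an integer by `int(x)`, which is why the timestamp in the file has one-second resolution.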

The point is that the Python code object exposes all the necessary attributes. I have listed them in the order in which they are written to the pyc:

    args = ['co_argcount', 'co_kwonlyargcount', 'co_nlocals', 'co_stacksize', 'co_flags', 'co_code', 'co_consts',
            'co_names', 'co_varnames', 'co_freevars', 'co_cellvars', 'co_filename', 'co_name', 'co_firstlineno',
            'co_lnotab']
    for arg in args:
        print(arg, getattr(code, arg, None))

Having printed the attribute values, and knowing that w_long writes four bytes and W_TYPE writes one, you can easily see with a HEX editor what exactly gets written. co_code itself can be analysed with dis; in recent versions it has a convenient function, https://docs.python.org/3/library/dis.html#dis.get_instructions:

    import dis

    for instruction in dis.get_instructions(code):
        print(instruction)
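
One detail worth seeing in practice: the module-level code object does not contain the generator's instructions directly. The function body is a separate code object stored in `co_consts`, which is why a code object shows up inside a tuple when the pyc is parsed. The sketch below walks those nested objects; `iter_code_objects` is a helper name invented for this example, and the source is a shortened version of test.py without the sleep.

```python
import dis
import types

SOURCE = """
def my_gen():
    for i in range(10):
        yield i ** 2
"""

module_code = compile(SOURCE, "<demo>", "exec")


def iter_code_objects(code):
    # Hypothetical helper: yield a code object and every code object nested in it
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


names = [c.co_name for c in iter_code_objects(module_code)]
opnames = [
    instruction.opname
    for c in iter_code_objects(module_code)
    for instruction in dis.get_instructions(c)
]

print(names)
print("YIELD_VALUE" in opnames)
```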

Its output contains the bytecode, the opcode and everything else you need.

To consolidate the material, the reader can parse the pyc file entirely by hand. I am not going to write the whole parser, it is rather laborious, but based on what is written here you can finish it yourself:

    from io import BytesIO
    import struct
    import datetime

    # struct format strings for 1-, 2- and 4-byte values in both byte orders
    byte1_little = b"<c"
    byte2_little = b"<h"
    byte4_little = b"<I"
    byte1_big = b">c"
    byte2_big = b">h"
    byte4_big = b">I"

    # marshal type codes, see the TYPE_* definitions in marshal.c
    INTEGER_TYPE = int.from_bytes(b'i', 'little')
    CODE_TYPE = int.from_bytes(b'c', 'little')
    SHORT_TUPLE_TYPE = int.from_bytes(b')', 'little')
    NONE_TYPE = int.from_bytes(b'N', 'little')


    def parse_string(raw_bytes):
        # Our guide here is https://github.com/python/cpython/blob/3.6/Python/marshal.c#L427
        # One byte for the type (already read by the caller), four bytes for the size
        SIZE = int.from_bytes(raw_bytes.read(4), 'little')
        print("BYTES TO READ: ", SIZE)
        s = raw_bytes.read(SIZE)
        return s


    def parse_code(raw_bytes):
        # Five 4-byte fields come first, then co_code as a marshalled string
        CO_ARGCOUNT = struct.unpack(byte4_little, raw_bytes.read(4))[0]
        co_kwonlyargcount = struct.unpack(byte4_little, raw_bytes.read(4))[0]
        CO_NLOCALS = struct.unpack(byte4_little, raw_bytes.read(4))[0]
        CO_STACKSIZE = struct.unpack(byte4_little, raw_bytes.read(4))[0]
        CO_FLAGS = struct.unpack(byte4_little, raw_bytes.read(4))[0]
        print("CODE STATS:", CO_ARGCOUNT, co_kwonlyargcount, CO_NLOCALS, CO_STACKSIZE, CO_FLAGS)
        # Renamed from CODE_TYPE so it no longer shadows the module-level constant
        string_type = raw_bytes.read(1)
        # The type must be a string
        assert string_type == b's'
        CODE_ITSELF = parse_string(raw_bytes)
        return CODE_ITSELF


    def parse_long(raw_bytes):
        VALUE = int.from_bytes(raw_bytes.read(4), 'little')
        print("LONG VALUE:", VALUE)
        return VALUE


    def parse_none(raw_bytes):
        # Already parsed: the type byte is the whole value
        return None


    def parse_tuple(raw_bytes):
        # A small tuple stores its size in 1 byte, not 4
        # https://github.com/python/cpython/blob/3.6/Python/marshal.c#L471
        SIZE = int.from_bytes(raw_bytes.read(1), 'little')
        print("TUPLE LEN:", SIZE)
        # Each element starts with its own type byte; nested SHORT_TUPLEs are handled by analogy
        for index in range(SIZE):
            NEXT_TYPE = int.from_bytes(raw_bytes.read(1), 'little')
            if NEXT_TYPE == INTEGER_TYPE or NEXT_TYPE == INTEGER_TYPE | 128:
                print("LONG TYPE")
                parse_long(raw_bytes)
            # None takes 1 byte
            elif NEXT_TYPE == NONE_TYPE:
                print("NONE TYPE")
                parse_none(raw_bytes)
            # code
            elif NEXT_TYPE == CODE_TYPE or NEXT_TYPE == CODE_TYPE | 128:
                print("CODE TYPE")
                parse_code(raw_bytes)
            elif NEXT_TYPE == SHORT_TUPLE_TYPE:
                parse_tuple(raw_bytes)


    with open("test.pyc", "rb") as f:
        raw_bytes = BytesIO(f.read())

    # 3379 is the magic number of CPython 3.6.1
    MAGIC_NUMBER = struct.unpack(byte2_little, raw_bytes.read(2))[0]
    assert MAGIC_NUMBER == 3379
    # skip \r\n
    raw_bytes.read(2)
    TIMESTAMP = struct.unpack(byte4_little, raw_bytes.read(4))[0]
    print(datetime.datetime.fromtimestamp(TIMESTAMP))
    SIZE = struct.unpack(byte4_little, raw_bytes.read(4))[0]
    print(SIZE)
    OBJ_TYPE = int.from_bytes(raw_bytes.read(1), 'little')
    # An object of type TYPE_CODE, see https://github.com/python/cpython/blob/3.6/Python/marshal.c#L46
    # The 128 flag is set in the function at https://github.com/python/cpython/blob/3.6/Python/marshal.c#L288
    assert OBJ_TYPE == CODE_TYPE | 128
    parse_code(raw_bytes)
    NEXT_TYPE = int.from_bytes(raw_bytes.read(1), 'little')
    # Small tuple
    if NEXT_TYPE == SHORT_TUPLE_TYPE:
        parse_tuple(raw_bytes)

The code is somewhat incomplete and handles only a few types, but I hope the idea itself is clear: the parser mirrors the dumps function. Where dumps uses w_object (write object) and its helpers for simple types, you need r_object (read object) and simple reads: https://github.com/python/cpython/blob/3.6/Python/marshal.c#L910. All that is left is to read it carefully.
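
If the goal is only to check a hand-written parser against a reference, the standard library can do the reading side for you: skip the header and hand the rest to `marshal.loads`. The sketch below is not part of py_compile; `load_pyc` and the demo file names are invented for the example. It assumes the 12-byte header described above for 3.6 and the 16-byte header (an extra 4-byte flags field, PEP 552) used by CPython 3.7 and later.

```python
import importlib.util
import marshal
import os
import py_compile
import sys
import tempfile
import types

# 12 bytes up to 3.6 (magic, mtime, size); 3.7+ inserts a 4-byte flags field
HEADER_SIZE = 16 if sys.version_info >= (3, 7) else 12


def load_pyc(path):
    # Hypothetical helper: return the code object stored in a pyc file
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was written by a different interpreter version")
    return marshal.loads(data[HEADER_SIZE:])


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")   # hypothetical file names
    pyc = os.path.join(tmp, "demo.pyc")
    with open(src, "w") as f:
        f.write("result = sum(i ** 2 for i in range(4))\n")
    py_compile.compile(src, cfile=pyc, doraise=True)
    loaded = load_pyc(pyc)

# The loaded object is an ordinary code object and can be executed
namespace = {}
exec(loaded, namespace)
print(type(loaded).__name__, namespace["result"])
```

marshal data is not safe to load from untrusted sources, so this is a debugging aid rather than a general pyc reader.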