The processor's main job is not to move information but to transform it. A register is the same kind of working memory, except that it is wired directly to a set of execution units that perform arithmetic and other operations on data. Those wiring paths are very wide: every bit of a register gets its own large set of transistors for specific operations. That is where the difficulty of increasing the bit width comes from. When the register grows, all the execution units grow with it, at least doubling in size.

Look at the instruction set: any data transformation involves at least one register, and some operations work only on registers. On x86 you can add a memory operand to a register. But you cannot, for example, multiply a memory cell in place, and you cannot add the values of two memory cells without first loading one of them into a register, because the execution unit is wired directly only to the registers.

Q: Why can't we read 64 bits through a 16-bit register?

We can, but where would we read them to, and why? In fact, modern processors do exactly that when they fill their internal caches, using the full width of the bus; the registers are not involved. So we have read 64 bits into the cache, and now we need to multiply them by 3, say. But we have a 16-bit register, so how do we multiply? Right: in parts, with a lot of extra shifts, spending precious clock cycles on it. So the bus width is secondary. What matters is the width of the register. That is what came to be called the machine word.

Q: But do addresses have to be kept in registers?

Yes, they do. The processor has to be told where to fetch the data from. And where would that address live? In memory? Then what would it look like: take the address stored at the address that is stored at the address in the instruction...
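To make the "multiply in parts" point from the first question concrete, here is a minimal sketch in Python that imitates a processor limited to 16-bit registers multiplying a 64-bit value by 3. The helper names (`split_limbs`, `join_limbs`, `mul_small_16bit`) and the step counter are made up for illustration; a real 16-bit CPU would do this with its own multiply and add-with-carry instructions.

```python
MASK16 = 0xFFFF  # a 16-bit register holds values 0..65535

def split_limbs(value, count=4):
    """Split a value into `count` 16-bit pieces, least significant first."""
    return [(value >> (16 * i)) & MASK16 for i in range(count)]

def join_limbs(limbs):
    """Reassemble 16-bit pieces (least significant first) into one integer."""
    result = 0
    for i, limb in enumerate(limbs):
        result |= limb << (16 * i)
    return result

def mul_small_16bit(limbs, factor):
    """Multiply a multi-piece number by a 16-bit factor, one piece at a time.

    Returns (result_limbs, steps). The result keeps the same number of
    pieces, so anything carried out of the top piece is dropped, just as
    in a fixed-width 64-bit multiply.
    """
    carry = 0
    out = []
    steps = 0
    for limb in limbs:
        product = limb * factor + carry  # always fits in 32 bits
        out.append(product & MASK16)     # low half stays in this piece
        carry = product >> 16            # high half moves to the next piece
        steps += 1
    return out, steps

value = 0x0123456789ABCDEF
limbs, steps = mul_small_16bit(split_limbs(value), 3)
result = join_limbs(limbs)
print(hex(result), "in", steps, "multiply-and-carry steps")
```

A 64-bit register does the same job with a single multiply; the 16-bit version needs four multiplies plus the carry handling between them, which is the extra work the answer refers to.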
And if we have to work through a block of data in sequence (say we are processing an array), that address has to be incremented, and an increment is something we can only do in a register.

By the way, instruction length and processor bit width are different things. MIPS packed all of its instructions into 32 bits. The x86 platform, on the other hand, has had variable-length instructions from time immemorial, from short one-byte ones to long monsters with a pile of prefixes. Processor bit width = register width = the maximum amount of information processed by one instruction (as a rule one from the base instruction set; SSE and the like do not count).

Speed: who said bit width plays the key role? Yes, bit width has an effect. But the boom in 64-bit processors and OSes is an excellent example of marketing. 64-bit code is often slower than 32-bit code. If a program does not need to address more than 4 GB of memory but its code stores 64-bit addresses, every pointer takes twice the space and the program can grow to as much as twice the size. Bigger size means more reads into the cache and more memory used. And so the race for gigabytes of RAM begins... There is even a move in the opposite direction now: the x32 ABI, code with 32-bit pointers running in 64-bit mode, is being developed.

But take RSA encryption, which is used in the same ubiquitous SSL. It needs a very large amount of heavy computation on big numbers. Say we have no special processor instructions for it. Of course, a processor that works with 64-bit registers will do the job roughly twice as fast, because it can process twice as much information per clock cycle. In tasks like this the gain from wide registers is hard to overestimate.

Q: So a 64-bit OS runs on a 64-bit processor, and a 32-bit one on a 32-bit processor?

No. A 64-bit OS consists of 64-bit code that can work with 64-bit addresses. Of course, it can only do that on a 64-bit processor. The arrival of such OSes was inevitable. Marketing played a significant role, though.
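The RSA argument above can be illustrated with a rough sketch: adding two 2048-bit numbers piece by piece, once with 32-bit pieces and once with 64-bit pieces, and counting the add-with-carry steps. This is only a model in Python, not how any real crypto library is written; the function names and the step counter are assumptions made for the example.

```python
def to_pieces(n, piece_bits, total_bits):
    """Split n into total_bits // piece_bits pieces, least significant first."""
    mask = (1 << piece_bits) - 1
    return [(n >> (piece_bits * i)) & mask
            for i in range(total_bits // piece_bits)]

def from_pieces(pieces, piece_bits):
    """Reassemble pieces (least significant first) into one integer."""
    result = 0
    for i, piece in enumerate(pieces):
        result |= piece << (piece_bits * i)
    return result

def add_pieces(a, b, piece_bits):
    """Add two equal-length piece lists with carry propagation.

    Returns (sum_pieces, carry_out, steps), where steps is the number of
    register-sized additions performed.
    """
    mask = (1 << piece_bits) - 1
    carry = 0
    out = []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & mask)     # the part that fits in the register
        carry = s >> piece_bits  # 0 or 1, carried into the next piece
    return out, carry, len(out)

TOTAL_BITS = 2048
a = (1 << 2047) + 123456789
b = (1 << 2048) - 1

for width in (32, 64):
    pieces, carry, steps = add_pieces(to_pieces(a, width, TOTAL_BITS),
                                      to_pieces(b, width, TOTAL_BITS), width)
    total = from_pieces(pieces, width) + (carry << TOTAL_BITS)
    print(width, "bit registers:", steps, "additions, correct:", total == a + b)
```

For addition the number of steps halves when the register width doubles, which matches the "twice as fast" estimate. Multiplication of big numbers tends to benefit even more, since the number of partial products depends on the number of pieces in both operands.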
90 percent of respondents believe that a 64-bit OS is needed to address more than 4 GB of RAM on the Intel platform. Yes, 32-bit Windows was artificially forced into that limit. But Intel processors in 32-bit mode support PAE (https://ru.wikipedia.org/wiki/PAE): up to 64 GB of RAM, with a single process limited to 4 GB. 32-bit Linux feels fine with such amounts.

As for the history and the difficulty of building 128-bit registers... the question is only one of cost. Yes, in some systems outside the mass market it was done long ago, and for the mass market there is no need. Back then it would also have been expensive, because, as we said at the beginning, every bit of a register means a bunch of execution units, and with the manufacturing technologies of the time, fitting that many transistors on a die was, let's say, difficult. Full 128-bit processors are simply not needed, least of all for the mass market: where would anyone find more than 16 EB of memory to address? In fact, all Intel processors now have 16 SSE registers of 128 bits each; they are not general-purpose registers, they are for computation. And modern Xeons for serious computation have 32 ZMM registers of 512 bits each (https://ru.wikipedia.org/wiki/AVX).
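The memory limits mentioned above follow directly from the address width, and the arithmetic is easy to check. The sketch below assumes the original PAE scheme with 36-bit physical addresses (that figure is not stated in the answer itself); the function name is made up for the example.

```python
GIB = 2 ** 30  # bytes in a gibibyte
EIB = 2 ** 60  # bytes in an exbibyte

def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits

per_process_gib = addressable_bytes(32) // GIB   # 32-bit virtual addresses
pae_physical_gib = addressable_bytes(36) // GIB  # assumed 36-bit PAE physical addresses
flat_64_eib = addressable_bytes(64) // EIB       # full 64-bit addresses

# Total storage in the vector register files mentioned in the answer.
sse_bytes = 16 * 128 // 8  # 16 registers of 128 bits
zmm_bytes = 32 * 512 // 8  # 32 registers of 512 bits

print(per_process_gib, "GiB per process,", pae_physical_gib, "GiB with PAE,",
      flat_64_eib, "EiB with 64-bit addresses")
print(sse_bytes, "bytes of SSE registers,", zmm_bytes, "bytes of ZMM registers")
```

This is why a 32-bit process stays within 4 GB even when PAE lets the machine as a whole use 64 GB: the pointer a program holds is still 32 bits wide, and only the physical address behind it is wider.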