
Monday, November 16, 2020

Parallel Computer Architecture: Efficient Micro Parallel Synchronization Mechanisms

I have often wondered why modern computer instruction set architectures do not have more efficient parallel synchronization mechanisms.  Mainstream microprocessor designs currently support two types of parallelism.

  • Very fine grain
    • Hardware-based, implicit, instruction-level parallelism
    • Implemented via advanced pipeline register renaming.
    • Synchronization delays on the order of a single cycle.
  • Very coarse grain
    • Software-based, explicit thread synchronization primitives
    • Implemented via atomic memory instructions (sketched just below this list).
    • Synchronization delays on the order of tens of thousands of cycles or more.
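
To put a concrete face on the coarse grain end of that list, here is roughly what software level synchronization looks like today: a minimal spinlock built on a C11 atomic read-modify-write.  This is just an illustrative sketch, not code from any particular system.

#include <stdatomic.h>

/* Coarse grain synchronization as it exists today: a spinlock built on an
 * atomic exchange.  A blocked thread either burns cycles here or takes a
 * far more expensive trip through the OS scheduler. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}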
With CPU clock frequencies beginning to plateau, it may be time to revisit architectural synchronization models as a way to continue improving overall program performance.  If any bright PhD candidates reading this are fishing for a dissertation topic, please consider this.

Parallel Architecture Models

At the process level we have the architectural notion of an interrupt.  But at the thread level nothing comparable exists.  We have to rely on threads spinning in a loop, reading and writing a shared memory location together with memory synchronization barriers, with no architectural specification of how long any of this can take.  This is ridiculous.  We can't have efficient parallel programming if the programming model has no mechanism to facilitate it.  We need some data queue, message passing mechanism, or interrupt that operates at the instruction architecture level if we are to enable efficient parallel programming.
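
To be concrete about the pattern I am complaining about, here is the spin-on-a-shared-location idiom sketched with C11 atomics.  The names are mine and the details vary, but this is essentially the only portable hand-off mechanism the architecture gives a thread today.

#include <stdatomic.h>
#include <stdint.h>

/* Today's thread level hand-off: the producer writes a value and flips a
 * flag, the consumer spins on the flag.  The release/acquire ordering is
 * the memory synchronization barrier, and the architecture says nothing
 * about how long the hand-off takes. */
static _Atomic uint64_t mailbox;
static atomic_bool      ready;

void producer(uint64_t value)
{
    atomic_store_explicit(&mailbox, value, memory_order_relaxed);
    atomic_store_explicit(&ready, true, memory_order_release);
}

uint64_t consumer(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin: no architectural way to wait on the data itself */
    return atomic_load_explicit(&mailbox, memory_order_relaxed);
}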

Explicit Instruction Level Parallelism

I would like to see an efficient, software-visible, instruction-level synchronization mechanism.  For example, something like a 'Queue Register'.  Some existing IO registers track read and write state.  I'm thinking some general purpose registers could similarly be architected for managing data flow synchronization at the register data level.  Such registers could essentially stall the execution pipeline on reads until a write to that register has occurred, so the register effectively acts as a 'data queue' at the instruction execution level.  This would enable software control of fine grain parallelism, opening up potentially more real parallelism than relying on hardware to extract parallelism from an inherently sequential programming model.
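
Since no shipping instruction set has such a register, the best I can do here is emulate the intended semantics in software.  The sketch below models one hypothetical queue register as a one-slot channel: a read blocks until a write has occurred and then marks the register empty again.  On real hardware the spin loops would simply be the pipeline stalling, and qreg_read / qreg_write would each be a single instruction; the names and layout are invented for illustration.

#include <stdatomic.h>
#include <stdint.h>

/* Software emulation of one hypothetical 'queue register'.
 * full == true means the register holds data that has not been read yet.
 * Assumes one producer and one consumer per register. */
typedef struct {
    _Atomic uint64_t value;
    atomic_bool      full;
} qreg_t;

/* A write stalls until the previous value has been consumed. */
void qreg_write(qreg_t *q, uint64_t v)
{
    while (atomic_load_explicit(&q->full, memory_order_acquire))
        ;                                /* in hardware: a pipeline stall */
    atomic_store_explicit(&q->value, v, memory_order_relaxed);
    atomic_store_explicit(&q->full, true, memory_order_release);
}

/* A read stalls until a write has occurred, then empties the register. */
uint64_t qreg_read(qreg_t *q)
{
    while (!atomic_load_explicit(&q->full, memory_order_acquire))
        ;                                /* in hardware: a pipeline stall */
    uint64_t v = atomic_load_explicit(&q->value, memory_order_relaxed);
    atomic_store_explicit(&q->full, false, memory_order_release);
    return v;
}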

Since all compute state needs to be visible in order to stop, save, and later restart a process, status bits will also need to track the read/write data state of each queue register.  CPU pipelines could be redesigned to key off these explicit register data states instead of implicit internal hardware states.  Just as current hardware threads swap in whichever thread has data ready, these new threads could work the same way.  The primary difference is that the data ready state is now architecturally visible to software.
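
As a rough picture of what 'all compute state is visible' means here, the saved per-thread architectural state might look something like the struct below, with one full/empty status bit per queue register saved and restored alongside the data so a context switch cannot lose a pending hand-off.  The register counts, names, and layout are of course invented for illustration.

#include <stdint.h>

#define NUM_QREGS 8   /* hypothetical number of queue registers */

/* Hypothetical architectural thread state: ordinary registers plus the
 * queue registers, each with a software visible full/empty status bit
 * that must be saved and restored across a context switch. */
typedef struct {
    uint64_t pc;
    uint64_t gpr[32];          /* ordinary general purpose registers */
    uint64_t qreg[NUM_QREGS];  /* queue register data                */
    uint8_t  qreg_full;        /* one full/empty bit per queue reg   */
} thread_arch_state_t;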

Further note that these hardware queue registers are effectively thread ready state registers, analogous to the ready flags in operating system thread schedulers.  Since these ready flags are intended for micro, data level parallelism, they should be closely aligned to the real register thread state supported by the hardware, as opposed to some arbitrary virtual state that relies on time slicing and swapping threads in and out of hardware.  While time slicing is theoretically possible, it would degrade performance by a factor of ten thousand or so, entirely defeating the advantage of micro level parallelism.

So there is a different mindset when programming this level of parallelism.  This type of parallelism should have some awareness of the number of hardware threads efficiently supported by the hardware, as opposed to very coarse grain parallelism that has little concern for real hardware thread counts.  The implication is that this level of coding is more appropriate for hand coded assembly or for compilers.
 
References

https://riscv.org/community/directory-of-working-groups/

https://en.wikipedia.org/wiki/MIMD

https://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing

https://en.wikipedia.org/wiki/Parallel_programming_model

https://www.cs.utexas.edu/~trips/index.html

https://scholar.google.com/scholar?q=parallel+computer+architecture&hl=en&as_sdt=0&as_vis=1&oi=scholart

Sunday, October 20, 2019

Computer Architecture: Security Done Right

I don't get why it is so hard to build a secure computer.  I read so much about encryption technology and software being 'cracked' and companies spending millions of dollars patching their computers' security bugs.  Microsoft installs critical patches on my computer every few days, and every time that happens I cross my fingers that I can still boot my computer in the morning.  What the hell?  It's not that hard to build a secure computer.  Here is how you do it.


  1. Secure the core
    • Use a Harvard computer architecture.  In other words, build entirely separate logic pathways (including caches) for instructions and data.  This means that you can't have 'self-modifying code', but so what?  Less than 1% of the code in the world is really self-modifying, and that code is not really required and can be worked around in other ways at modest performance cost (a concrete example of the pattern this outlaws appears just after this list).  It is like trading turds for diamonds.  Easy decision.
    • A little more detail for clarity for CPU designers.  Yes, I literally mean separate data paths and caches for instructions and data, all the way to the ALU instruction units.  No buses or caches should ever be shared, period.  Think entirely different systems.  This guarantees that no matter what magic software does, it cannot modify the expected behavior of software instructions delivered to the CPU.  Period.  End of story.  Drop the mic.  Case closed.  No encryption coprocessor.  No decryption keys.  No asterisk.  No buts.  The CPU is now naturally secure by design.  Trying to secure every line of software in a non-Harvard architecture is like trying to stop rain one drop at a time.  Totally nuts.
    • You can and should still have caches for performance, but since the caches aren't shared there is no security concern.  There is no coherence or complexity issue either: because the instruction and data paths are separate by definition, you don't have to worry about coherence between the instruction and data caches.
  2. Secure the box
    • All software should be delivered and run from 'ROM sticks' (ROM = Read Only Memory).  Imagine a small box with several USB-like slots in it.  Each slot contains a stick from a trusted software vendor.  Installing new software from a trusted source literally means replacing the ROM stick in that slot in your computer.  Pulling the stick out of the slot 'uninstalls' it.  All software runs from ROM, not a hard drive.
    • Finally you should then only buy software ROM sticks from trusted software vendors.
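
As promised under step 1, here is the canonical pattern a strict Harvard split outlaws: a program writes machine code bytes into memory as data and then jumps to them.  This is a minimal x86-64 Linux sketch of the JIT trick that browsers and managed runtimes rely on; with fully separate instruction and data paths the instruction side never sees those writes, so the jump cannot work.

/* The pattern a strict Harvard split forbids: bytes written as data and
 * then executed as code.  Minimal x86-64 Linux sketch; the constants and
 * layout are for illustration only. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* Machine code for: mov eax, 42 ; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, code, sizeof code);           /* code arrives as data... */
    int (*fn)(void) = (int (*)(void))buf;     /* ...and is then executed */
    printf("%d\n", fn());                     /* prints 42 on x86-64     */
    return 0;
}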
Congratulations.  You now have a secure computer.  Go sell a million of these to the military and get rich.  You're welcome.

A couple more points for clarity and background.
  • The hard drive  
    • Once the CPU core and software ROM path are secured per the above, the hard drive becomes the next vulnerability point to secure.  This is because if software is allowed to write arbitrary information to your hard drive, it indirectly gains the ability to behave in malicious, unpredictable ways.  This is similar to the self-modifying code problem in the CPU, but at a higher level - the file IO level.  You should only buy software from trusted software vendors who certify that their software does not store any 'cookies' or such on your hard drive to affect software behavior.
    • The good news is that if the CPU and box are secured per steps 1 and 2 above, the options for malicious actors to do malicious things to your computer are limited to finding bizarre behavior effects for unusual data.  Even then, since they can't execute code on your computer, their hands are indeed very, very tied.  Cool.

  • JavaScript is evil  

    • JavaScript by design is intended to be downloaded per website and executed on the user's local machine.  Isn't that like parking an RV in a crime-ridden neighborhood in the middle of the night, opening the front door, putting a neon Welcome sign above the door, and then acting surprised when a stranger comes in and misbehaves?  Sheesh.
    • For this and several other technical reasons, let me suggest RISC-V, discussed in a previous post, for those looking for an open universal machine language as an alternative to JavaScript.

Further Info: