To get myself back up to speed with the MFM, I went back and reread Dave Ackley’s two papers on the subject: “Pursue Robust Indefinite Scalability”, Ackley, Cannon, 2011, and “A Movable Architecture for Robust Spatial Computing”, Ackley, Cannon, Williams, 2011.
Then I dove further, reading publications cited in these two papers; at least, those that are accessible on the Internet. There are two main design considerations that have come to mind as I have been reading through this literature.
The first is that for very large, indefinitely scalable arrays, power considerations will dominate any design effort, and asynchronous arrays are the only ones that satisfy this design constraint. The reason is that, neglecting leakage currents, CMOS circuits consume power only when they switch. For large synchronous CMOS logic circuits, the current drawn by constant clock switching at hundreds of MHz to GHz speeds becomes the dominant power drain. For example, the datasheet for the fifth generation Intel Core processor family lists I-ccmax values from 18 to 40 Amps!
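To put rough numbers on this, here is a minimal sketch of the textbook first-order estimate for CMOS switching power, P = a * C * V^2 * f, where a is the fraction of the circuit that switches per cycle. The capacitance, voltage, frequency and activity values below are made up for illustration; they are not taken from any datasheet or from the MFM papers.

```python
def dynamic_power_watts(activity, capacitance_farads, supply_volts, clock_hz):
    """First-order CMOS switching power: P = a * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_farads * supply_volts ** 2 * clock_hz


# Illustrative values only (assumptions, not measurements):
# 1 nF of switched capacitance, 1 V supply, 1 GHz clock.
C, V, F = 1e-9, 1.0, 1e9

# Synchronous case: the clock tree toggles every cycle, so activity is ~1.
sync_power = dynamic_power_watts(1.0, C, V, F)

# Asynchronous case: assume only 1% of the cells are awake at any moment.
async_power = dynamic_power_watts(0.01, C, V, F)

print(f"synchronous:  {sync_power:.3f} W")
print(f"asynchronous: {async_power:.3f} W")
```

The point of the comparison is only that power scales linearly with the activity factor, so an array whose cells are mostly asleep draws proportionally less.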
Reducing power consumption in CMOS circuitry means clocking individual logic elements only when necessary. In practice, each individual processor cell in an asynchronous array is given a ring oscillator to act as its own local clock. The crucial feature of this arrangement is that the ring oscillator turns on only when its processor cell is accessed, then turns off when the cell’s internal programming has run to a stopping point. Therefore, when not in use, the individual processor cells turn off, effectively consuming no power at all. The potential drawback of this scheme is that once a cell has gone to “sleep”, it will only wake up again when accessed from a neighboring cell; that is, only when its ring oscillator gets “rung” again.
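The wake-on-access behavior can be sketched in a few lines of simulation. This is a toy model under my own assumptions, not anything from the MFM papers: the `Cell` class, the `run` function and the "each wake-up pokes the right-hand neighbor" rule are all hypothetical, chosen only to show that cells which are never accessed never tick.

```python
from collections import deque


class Cell:
    """A cell whose local clock runs only while it is handling an access."""

    def __init__(self, work_steps):
        self.work_steps = work_steps  # ticks the local oscillator runs per wake-up
        self.ticks = 0                # total local clock ticks consumed so far
        self.wakeups = 0              # number of times a neighbor has "rung" this cell

    def wake(self):
        # The oscillator starts, runs until the cell's program reaches a
        # stopping point, then stops again.
        self.wakeups += 1
        self.ticks += self.work_steps


def run(cells, first, hops):
    """Wake cell `first`; each wake-up then pokes the right-hand neighbor.

    At most `hops` wake-ups are processed in total (assumed toy rule).
    """
    pending = deque([first])
    for _ in range(hops):
        if not pending:
            break
        i = pending.popleft()
        cells[i].wake()
        pending.append((i + 1) % len(cells))
    return [c.ticks for c in cells]


cells = [Cell(work_steps=3) for _ in range(8)]
ticks = run(cells, first=0, hops=4)
print(ticks)  # cells 4..7 were never accessed, so their clocks never ran
```

In a synchronous version of the same array, all eight cells would be clocked on every cycle regardless of whether they had anything to do.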
The second consideration came to mind as I read through this paper: “Embedding Universal Delay-Insensitive Circuits in Asynchronous Cellular Spaces”, Lee, Adachi, Peper, Morita, 2003.
It struck me as a good example of the kind of disconnect one finds between the theory of array computation and the reality of hardware design. In this case, the authors considered a communication race condition; that is, the situation in which two neighboring cells initiate communication at the same time. The question of “who goes first?” then comes into play, with the possibility of a resulting lockup. The authors presented a theoretical fix which they called delay-insensitive circuits. Theoretically, this is a fascinating piece of work in that it shows that every synchronous array is equivalent to some asynchronous version of it. But in reality, at the timescales where such communication race conditions would occur, digital electronics starts to behave in an analog fashion, in which case the paper’s straightforward theoretical results will no longer apply.
Race conditions within asynchronous arrays will always be the norm, not the exception. Some way has to be found so that, when communication is initiated between individual cells, they will automatically know who gets to go first and who waits. Just like in any social group, there has to be some kind of dominance hierarchy imposed on the global array.
In terms of what can be accomplished in a practical sense, one has to break the array’s global symmetry – directions have to be defined: up/down, right/left, top/bottom. Then communication is given preferred directions, either based on a rule set or by the addition of a “breathing mode”; that is, a phase during which all communication goes in one preferred direction, followed by an alternate period when all communication flows in the opposite direction, with information flowing back and forth within the array something like the tides in the ocean.
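Both options can be sketched as simple arbitration functions. This is only an illustration under assumptions of my own: the direction names, the `PRIORITY` ordering, and the use of an integer `tick` as a stand-in for whatever mechanism would actually signal the breathing phase are all hypothetical. In real asynchronous hardware, distributing that phase is itself a design problem.

```python
# Assumed fixed ranking of directions; lower number wins a tie.
PRIORITY = {"north": 0, "east": 1, "south": 2, "west": 3}


def arbitrate_by_rule(requests):
    """Rule-set option: among simultaneous requests, the highest-ranked
    direction goes first. `requests` must be a non-empty set of directions."""
    return min(requests, key=PRIORITY.__getitem__)


def allowed_direction(tick, half_period, axis=("east", "west")):
    """Breathing-mode option: traffic flows along axis[0] for `half_period`
    ticks, then along axis[1] for the next `half_period`, and so on."""
    return axis[(tick // half_period) % 2]


def arbitrate_breathing(requests, tick, half_period):
    """Grant the request that matches the current phase, or None if the
    only pending requests run against the tide and must wait."""
    direction = allowed_direction(tick, half_period)
    return direction if direction in requests else None


# Two neighbors start talking at once: the left cell wants to send east,
# the right cell wants to send west.
clash = {"east", "west"}
print(arbitrate_by_rule(clash))          # fixed rule: east always wins
print(arbitrate_breathing(clash, 1, 4))  # first half-period: east flows
print(arbitrate_breathing(clash, 5, 4))  # second half-period: west flows
```

The fixed rule is simpler but permanently favors one side; the breathing mode gives each direction its turn at the cost of making some requests wait for the phase to change.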
This is one of the aspects of asynchronous array design that makes simulation qualitatively different from running on bare silicon. In simulation, the programming for the virtual machine can be written to arbitrate any race condition that might come up between the individual CA-atoms. But when you forgo the virtual machine and try to run your asynchronous cellular automata on bare-metal processors, the processor coding itself has to take care of these race conditions. It’s an extra layer of hardware design that needs to be added to the MFM concept in order to make it work for real.