Wild Iris Discovery: The GA144

The inspiration for the direction I’m heading in design-wise is the GA144 processor: an asynchronous array of 144 separate processors arranged in an 8 x 18 matrix, all fabricated together on a single die. So to understand where my design ideas are headed, it is best to start with a description of the GA144, covering both its strengths and its deficiencies.

For those not familiar with the GA144 from GreenArrays, the best short exposition I’ve seen of how the chip works and how it’s programmed is the video "FD 2014 Daniel Kalny".

More information can be found at the GreenArrays web site, www.greenarraychips.com/

The GA144 is without a doubt one of the most frustratingly disappointing pieces of silicon ever made. So much so that I’ve personally taken to calling it the Stephen Hawking of computer chips: a brilliant mind stuck in a useless body.

There is no end of projects that can be built around the GA144. The above YouTube video from Daniel Kalny is a great example of this. But the sad truth is that, from a commercial point of view, the same functionality can be fit into any number of standard processor parts from companies like Microchip, Silicon Labs, and TI that are both cheaper and easier to program with industry-standard compiler tools. So, despite its potential, the GA144 has remained a silicon oddity without attaining any commercial success.

A chip this intriguing in potential begs to be used for something. But the big question is, what? This is where the problems start for the GA144. Every commercial application you might think of ends up requiring more I/O pins than the designers of this part gave it. In other words, the GA144 seems to be an ASIC designed with no specific application in mind.

The only way to get signals or data into or out of this part is through a handful of cells along the edges of the array. That means getting a signal to any internal cell requires your data stream to pass through every cell between the edge and the one you’re targeting. So a lot of the cells in the array end up functioning simply as connections between adjacent cells. This would be okay if each cell had enough program memory to hold more functionality than just acting as one member of a bucket brigade transferring data across the array.
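To put a rough number on the bucket-brigade cost, here is a minimal Python sketch. It assumes an idealized 8 x 18 grid in which data moves one cell at a time between horizontal or vertical neighbors and, generously, that every edge cell can serve as an entry point. The function names are my own, and since the real part only offers a handful of I/O-capable edge cells, actual paths can be longer than this lower bound.

```python
ROWS, COLS = 8, 18  # array dimensions given in the post


def relay_cells_needed(row: int, col: int) -> int:
    """Lower bound on the cells that must act as relays to reach (row, col).

    Assumes (hypothetically) that data may enter at ANY edge cell and then
    hops straight in toward the target, one neighbor at a time.
    """
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("cell is outside the array")
    # Number of cells sitting between the target and each of the four edges.
    return min(row, ROWS - 1 - row, col, COLS - 1 - col)


def count_interior_cells() -> int:
    """Count cells that cannot be reached without at least one relay cell."""
    return sum(
        1
        for r in range(ROWS)
        for c in range(COLS)
        if relay_cells_needed(r, c) > 0
    )


if __name__ == "__main__":
    print("relays to reach cell (3, 9):", relay_cells_needed(3, 9))
    print("cells with no direct edge access:", count_interior_cells())
```

Even under these generous assumptions, two thirds of the cells (96 of 144) can only be reached by borrowing at least one neighbor as a relay.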

Another frustrating aspect of the GA144 is that it was not laid out in a symmetric fashion. That is, you can’t tile single GA144s together to form a much larger array, because the top/bottom and right/left edges of the chip don’t match up pin to pin. This means that if you did try to use the GA144 as a tile element in a much larger array, neighboring GA144s would be forced to talk to each other through a single SERDES link between one chip and the next. Even this might not be a showstopper, except that the GA144 only has two SERDES links, and both are located on the same side of the part. As I ran into design details like this last one, it just made me want to pull my hair out and scream, “What were you idiots thinking when you made this part?!”

On the other hand, there are some amazing things about the GA144. First is the asynchronous operation of the individual cells. Each cell has its own ring oscillator for its internal clock. This ring oscillator only turns on when the cell is accessed by one of its neighbors, and it only stays on until the cell completes its current program call. The cell then goes to sleep and waits until it is accessed again. The result is an array whose current draw can be as little as one percent, or even less, of that of a comparable FPGA part.

This by itself might not seem like a big deal, but from a hardware design point of view, it is huge. It’s not uncommon for processors and FPGA parts to draw currents on the order of amps. For a single processor on a board, such current draws are not an issue. But if you want to start creating large arrays of processors, you’re very quickly looking at thousands of amps of current to power your processor array. This becomes a huge wall to designing large processor arrays. The low current draw of the asynchronous array concept means that such chips can be tiled by the thousands and still be run from a power supply of a few tens of amps.
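The arithmetic behind that claim is simple enough to sketch in a few lines of Python. The array size and per-chip current below are purely illustrative assumptions of mine, chosen only to match the "order of amps" and "one percent" figures above; they are not measurements of any real part.

```python
def supply_current_amps(
    num_chips: int, amps_per_chip: float, active_fraction: float = 1.0
) -> float:
    """Total supply current for an array of identical chips.

    active_fraction scales the per-chip draw to model cells that sleep
    most of the time (1.0 means always on).
    """
    return num_chips * amps_per_chip * active_fraction


NUM_CHIPS = 2000        # hypothetical array size ("by the thousands")
AMPS_PER_CHIP = 1.0     # hypothetical draw, "on the order of amps"
ASYNC_FRACTION = 0.01   # the "one percent" figure from the text

conventional = supply_current_amps(NUM_CHIPS, AMPS_PER_CHIP)
asynchronous = supply_current_amps(NUM_CHIPS, AMPS_PER_CHIP, ASYNC_FRACTION)

if __name__ == "__main__":
    print(f"always-on array:    {conventional:.0f} A")
    print(f"asynchronous array: {asynchronous:.0f} A")
```

With these assumed numbers, the always-on array needs about 2000 A while the asynchronous one needs about 20 A, which is the difference between an impractical power system and an ordinary bench supply.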

Another positive of the asynchronous array concept has to do with pipelining. Most of the new parts get their speed into the gigahertz range by making use of pipelining in their internal structures, whereas the asynchronous array is naturally pipelined just by its construction. Each cell in the array does its little thing and then passes the result on to its neighboring cells. In this way, a process passes like a wave through the asynchronous array, starting from one edge and flowing through until the result comes out the opposite edge. One can take advantage of this by having multiple waves of processing going on simultaneously. Another trick for matrix operations is to have the matrix elements come in one side of the array while constants for the matrix operation flow in from a different edge, with the result flowing out yet another edge of the array.
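The "multiple waves" idea can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions: a single row of cells, each holding one value at a time, stepped in lockstep for clarity (the real array is asynchronous, with each cell waking only when a neighbor hands it data). The function name and the three example cell operations are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_pipeline(
    cells: Sequence[Callable[[int], int]], inputs: Sequence[int]
) -> Tuple[List[int], int]:
    """Push every input through a chain of cells; return (outputs, steps).

    On each step, every cell takes the value its upstream neighbor held,
    applies its own operation, and holds the result for the next step.
    """
    held: List[Optional[int]] = [None] * len(cells)
    pending = list(inputs)
    outputs: List[int] = []
    steps = 0
    while pending or any(v is not None for v in held):
        # The last cell hands its finished value out of the array.
        if held[-1] is not None:
            outputs.append(held[-1])
        # Update back to front so each cell reads its neighbor's old value.
        for i in range(len(cells) - 1, 0, -1):
            upstream = held[i - 1]
            held[i] = cells[i](upstream) if upstream is not None else None
        # A new wave enters at the first cell if any input is waiting.
        held[0] = cells[0](pending.pop(0)) if pending else None
        steps += 1
    return outputs, steps


if __name__ == "__main__":
    chain = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    results, steps = run_pipeline(chain, [1, 2, 3, 4])
    print(results, "in", steps, "steps")
```

In this model four inputs clear a three-cell chain in 7 steps, because up to three waves are in flight at once. Handling the inputs strictly one at a time would take 12 cell operations back to back.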

But for the GA144, trying to use this trick for matrix multiplications just doesn’t work. Again, it comes back to the fact that there are not enough I/O pins around the edges of the array to get data in and out of the processor at a pace that can keep up with how fast the GA144 can go.

And yet one more aspect of the GA144 that ends up just teasing you with its potential is that each of the cells is a stack-based processor element. For those not familiar with programming for a stack-based engine, the best example might be the old HP calculators that used what is called Reverse Polish Notation. Rather than storing the data to be processed in registers, everything is pushed onto or popped off of a stack, with the ALU element just working on the top of the stack.

For example, in the infix style familiar from register-based machines and ordinary calculators, you would write {4 + 5 =}, but for a stack-based engine you would just write {4 5 +}. Programs written for stack-based engines can be very compact and run very fast. But they can also be frustratingly difficult for most programmers to work with, because they force you to pay absolute attention to the order in which operations are done. In other words, when programming for a stack-based processor, you can’t just give variables names and then let your compiler tools worry about the exact machine-level code that your program generates.
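For readers who have never used an RPN calculator, here is a minimal Python sketch of how a stack engine evaluates {4 5 +}. It is an illustration of the general technique only, not the GA144's instruction set; the function name and the choice of four arithmetic operators are my own.

```python
from typing import Callable, Dict, List

# Each operator pops two values and pushes one result.
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def eval_rpn(program: str) -> float:
    """Evaluate a space-separated Reverse Polish Notation expression."""
    stack: List[float] = []
    for token in program.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            b = stack.pop()  # top of stack is the SECOND operand
            a = stack.pop()
            stack.append(OPERATORS[token](a, b))
        else:
            stack.append(float(token))  # numbers are simply pushed
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack[0]


if __name__ == "__main__":
    print(eval_rpn("4 5 +"))      # the example from the text
    print(eval_rpn("2 3 4 * +"))  # no parentheses needed: 2 + (3 * 4)
    print(eval_rpn("5 4 -"), eval_rpn("4 5 -"))  # operand order matters
```

Notice that nothing in the program text names a variable or a register. The meaning lives entirely in the order of the tokens, which is exactly what makes this style compact for the machine and demanding for the programmer.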

The YouTube video linked above contains a number of examples of program coding for a stack-based processor. The reason such code examples look so cryptic to those not familiar with programming in such languages is that the visual cues most programmers look for when reading a piece of source code aren’t there; those cues are hidden, so to speak, in the order in which the operations are performed.

This latter observation is why stack-based programming languages like Forth never made it commercially. Writing in assembly for a register-based engine is already beyond the ability and patience of most programmers. Adding the extra frustration of also having to keep track of the order you do things in becomes “a bridge too far”.

On the other hand, when you’re working at the level of a tiny processor core that’s trying to make maximum use of the silicon resources available, stack-based engines come into their own and are probably the most efficient processing structure to use at this cellular level.

My goal for the next few months is to see if I can re-create the GA144 asynchronous array structure using discrete FPGA parts; in other words, to re-create at printed-circuit-board scale the structure that’s found in the GA144 part at the silicon level. By going this design route, I can give myself access to all the input/output pathways that the pin-out of the GA144 part doesn’t give you. I will thus be able to explore the full range of functional possibilities that such an asynchronous array structure can bring to the table. The other advantage is that, by using discrete FPGAs for the cells, I can give myself many more options in terms of programming at the cellular level of the array. (More on this last point in future posts.)

4 comments:

JustAguy, March 10, 2017 at 9:38 AM
Nice post on a rather obscure processor. I've been looking for information about this and it is nice to get some feedback from someone who actually used it. I'm relatively new to embedded systems so reading about such info from more experienced developers is always interesting.
Anonymous, November 18, 2024 at 10:32 AM
I am thinking about using it in a special education device interfaced between a 1000tvl CCTV camera and a 15.6" display (via an HDMI upscaler). The NTSC signal would be processed to save educational slides, present slides, display results, etc. It would present lessons and keep data on Is and Ps (I = independently selected correct answers and P = prompting needed) when the student attempts a lesson. The only input device is the camera (and possibly a microphone) and the only output is the display (and possibly a speaker).

Teachers can photograph student data displays with their phones. During the lesson the student inputs answers by blocking an answer option on the display from the camera. The system responds to a correct answer by highlighting the outline of the correct answer in green and proceeding. The system responds to an incorrect answer by highlighting that answer's outline box in red and the correct one in green.

I have purchased the video components and am still attempting to get enough confidence by reading the GreenArrays literature to purchase a dev board. I had some success programming in Forth back in the 1980s and it was the only language that was any fun for me.

I am tired of all the overhead that comes with operating systems. I am hoping that with this system, once it's done, it's done, and it will keep on running with no maintenance. This also prevents other people's great ideas for changes from busting this system.

The current manual system often allows the student to stress out the instructor by attempting ambiguous responses to questions, to give the teacher the illusion that a correct answer may have been provided. The student will quickly learn that the only way to move forward is to block a portion of an answer with his fingers and there is no going back for another attempt until a future session.
Anonymous, February 27, 2025 at 5:28 AM
I finally purchased the EVAL Board for the GA144 last night for development of the Special Education device.

Saturday, February 7, 2015
