Sunday, March 8, 2015

Very Large Asynchronous Arrays of Simple Processors

I’ve spent the last month searching on the web for applications that a large asynchronous array might be useful for. Having tried a number of different combinations of search terms, the title of this blog post represents the most promising of those collections.

And as I spent time thinking about the idea of a large asynchronous array, it seems the question is breaking into two aspects; the answer to one depending on the answer to the other. One question being, what would such an array look like at the hardware level? And its mirror question, what would you use such an array for?

When you approach the problem from the hardware level, a number of very intriguing conclusions come to the forefront.

First, if you are going to create a large asynchronous array, what kind of part packages at the IC level would you want to tile with? On one hand, you want to be able to have sufficient signal pins so that each simple processor cell in your array has full access to all its neighbors across any IC pin-to-pin transition. But even a single simple processor cell will have on the order of 100 signal pins. This would imply that your individual IC chips would be limited to no more than four simple cells per die. And if you used ball grid parts, even a small asynchronous array would start to take up quite a large PC board area. There’s also the challenge that one loses a lot of speed crossing a chip-to-chip transition. A quick order-of-magnitude estimate suggests that crossing a chip-to-chip barrier reduces your potential clock speeds by a factor of 10.

One way to deal with this clock speed loss crossing a chip-to-chip transition is to maximize the number of individual simple processor cells you place on a single die. But then you run into the problem that the GA144 has; that is, the lack of sufficient I/O to tile directly one GA144 to another; which again causes you severe loss in speed when crossing a chip-to-chip boundary.

What this conundrum is trying to tell us is that our very large asynchronous array wants to be a wafer scale construct. That is, you deal with the loss of speed between chips by just not cutting your wafer up into chips in the first place. This is where things get intriguing: subtracting out the area taken up by bonding pads, the GA144 is approximately in area. This means you could tile 400 GA144’s onto a 10-cm x 10-cm wafer scale chip. This in turn translates into 57,600 individual F18A processor cells on an area about the size of two postcards. Wow! Letting our imagination run a little further, imagine stacking 20 such layers together in a 3-D arrangement. We now have something on the order of 1 million F18A processor cells in a volume about the size of a small book.

The one thing that invariably kills the design of any large processor array is heat. But this is the area where asynchronous arrays come to the forefront. So again using the GA144 as a worked example, its typical quiescent current draw is 7-µA. That means that for our 10-cm x 10-cm wafer scale chip, its typical quiescent current draw will be on the order of 3-mA; a ridiculously small number.

But what about the case when the GA144 is running? The typical full-on current draw for a single F18A cell is 3.75-mA. Multiplying this current draw by the number of individual F18A cells that could be tiled on our 10-cm x 10-cm chip gives a total current draw of about 200-A. That, of course, would probably melt our array. But if only 2 to 4 percent of our individual cells were running at any one time, the total power dissipation at 1.2-V would still be less than 10-watts. This is a very doable amount of heat to dissipate.

What the asynchronous array hardware wants to be is a wafer-sized chip upon which tens of thousands of individual simple processor cells have been placed, with the qualification that only a few percent of these individual processor cells are active at any one moment. What this, in turn, tells us about any application that we might run on our asynchronous array, is that it must be “sparse” in its operation.

The one downside of a wafer-sized chip is the inevitable presence of defects in the fabrication process. This implies that for any large array, there will be at least some dead processor cells, so the neighbors to such a dead cell will have to have, as part of their programming, a way to route around it. This is something that will have to be included in the basic design and programming of an individual simple processor cell.

No comments:

Post a Comment