Sunday, March 8, 2015

Very Large Asynchronous Arrays of Simple Processors

I’ve spent the last month searching the web for applications that a large asynchronous array might be useful for. I tried a number of different combinations of search terms, and the title of this blog post is the most promising of them.

As I spent time thinking about the idea of a large asynchronous array, the question seemed to break into two parts, with the answer to each depending on the answer to the other. The first question: what would such an array look like at the hardware level? And its mirror question: what would you use such an array for?

When you approach the problem from the hardware level, a number of very intriguing conclusions come to the forefront.

First, if you are going to create a large asynchronous array, what kind of IC packages would you want to tile with? On one hand, you want enough signal pins that each simple processor cell in your array has full access to all its neighbors across any IC pin-to-pin transition. But even a single simple processor cell will have on the order of 100 signal pins. This would imply that your individual ICs would be limited to no more than four simple cells per die. And if you used ball grid parts, even a small asynchronous array would start to take up quite a large PC board area. There’s also the challenge that one loses a lot of speed crossing a chip-to-chip transition. A quick order-of-magnitude estimate suggests that crossing a chip-to-chip barrier reduces your potential clock speeds by a factor of 10.

One way to deal with this clock speed loss is to maximize the number of individual simple processor cells you place on a single die. But then you run into the problem that the GA144 has: it lacks sufficient I/O to tile one GA144 directly to another, which again causes a severe loss in speed when crossing a chip-to-chip boundary.

What this conundrum is trying to tell us is that our very large asynchronous array wants to be a wafer scale construct. That is, you deal with the loss of speed between chips by just not cutting your wafer up into chips in the first place. This is where things get intriguing: subtracting out the area taken up by bonding pads, the GA144 is approximately 25 sq. mm in area. This means you could tile 400 GA144s onto a 10-cm x 10-cm wafer scale chip (10,000 sq. mm divided by 25 sq. mm). At 144 cells per GA144, this in turn translates into 57,600 individual F18A processor cells on an area about the size of two postcards. Wow! Letting our imagination run a little further, imagine stacking 20 such layers together in a 3-D arrangement. We now have something on the order of 1 million F18A processor cells in a volume about the size of a small book.
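The tiling arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `tiling_estimate` is made up for illustration, and it assumes perfect tiling with no area lost to interconnect, scribe lines, or the fact that real wafers are round.

```python
# Back-of-the-envelope tiling estimate using the figures quoted in the text.
WAFER_SIDE_MM = 100      # 10-cm x 10-cm wafer scale chip
GA144_AREA_MM2 = 25      # approximate GA144 area without bonding pads
CELLS_PER_GA144 = 144    # F18A cells on one GA144
LAYERS = 20              # imagined 3-D stack depth


def tiling_estimate(wafer_side_mm, chip_area_mm2, cells_per_chip, layers):
    """Return (chips per layer, cells per layer, cells in the whole stack)."""
    chips = (wafer_side_mm * wafer_side_mm) // chip_area_mm2
    cells_per_layer = chips * cells_per_chip
    return chips, cells_per_layer, cells_per_layer * layers


chips, cells, stacked = tiling_estimate(
    WAFER_SIDE_MM, GA144_AREA_MM2, CELLS_PER_GA144, LAYERS
)
print(chips, cells, stacked)  # 400 57600 1152000
```

The 20-layer total comes out at 1,152,000 cells, which is where the "on the order of 1 million" figure comes from.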

The one thing that invariably kills the design of any large processor array is heat. But this is the area where asynchronous arrays come to the forefront. So again using the GA144 as a worked example, the typical quiescent current draw of a single GA144 is 7 µA. That means that for our 10-cm x 10-cm wafer scale chip, with its 400 GA144s, the typical quiescent current draw will be on the order of 3 mA (400 x 7 µA = 2.8 mA); a ridiculously small number.

But what about the case when the GA144 is running? The typical full-on current draw for a single F18A cell is 3.75 mA. Multiplying this current draw by the 57,600 individual F18A cells that could be tiled on our 10-cm x 10-cm chip gives a total current draw of about 200 A (216 A before rounding). That, of course, would probably melt our array. But if only 2 to 4 percent of our individual cells were running at any one time, the total power dissipation at 1.2 V would be roughly 5 to 10 watts. This is a very doable amount of heat to dissipate.
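Here is a small sketch of that current and power budget, using only the numbers quoted in this post (7 µA quiescent per GA144, 3.75 mA per running F18A cell, a 1.2 V supply, 400 chips, 57,600 cells). The function names are hypothetical, and the model is deliberately crude: it treats idle cells as drawing nothing and assumes every active cell draws its full typical current.

```python
# Rough current and power budget for the 10-cm x 10-cm array.
QUIESCENT_A_PER_GA144 = 7e-6   # typical quiescent draw of one GA144 (from the text)
ACTIVE_A_PER_CELL = 3.75e-3    # typical full-on draw of one F18A cell (from the text)
SUPPLY_V = 1.2                 # supply voltage as used in the text
CHIPS = 400
CELLS = 57_600


def quiescent_current(chips=CHIPS):
    """Total quiescent current in amps when nothing is running."""
    return chips * QUIESCENT_A_PER_GA144


def active_power(fraction, cells=CELLS, volts=SUPPLY_V):
    """Power in watts when the given fraction of cells is running flat out."""
    return fraction * cells * ACTIVE_A_PER_CELL * volts


def max_active_fraction(budget_w, cells=CELLS, volts=SUPPLY_V):
    """Largest fraction of running cells that stays within a power budget."""
    return budget_w / (cells * ACTIVE_A_PER_CELL * volts)


print(f"quiescent: {quiescent_current() * 1e3:.1f} mA")
print(f"all cells running: {CELLS * ACTIVE_A_PER_CELL:.0f} A")
for pct in (2, 4):
    print(f"{pct}% active: {active_power(pct / 100):.1f} W")
print(f"10 W budget allows {max_active_fraction(10.0) * 100:.2f}% active")
```

With the unrounded 216 A total, 2 percent activity works out to about 5.2 W and 4 percent to about 10.4 W, so a strict 10 W budget corresponds to a little under 4 percent of the cells running at once.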

What the asynchronous array hardware wants to be is a wafer-sized chip upon which tens of thousands of individual simple processor cells have been placed, with the qualification that only a few percent of these individual processor cells are active at any one moment. What this, in turn, tells us about any application that we might run on our asynchronous array, is that it must be “sparse” in its operation.

The one downside of a wafer-sized chip is the inevitable presence of defects in the fabrication process. This implies that for any large array, there will be at least some dead processor cells, so the neighbors to such a dead cell will have to have, as part of their programming, a way to route around it. This is something that will have to be included in the basic design and programming of an individual simple processor cell.
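To make the idea of routing around a dead cell concrete, here is a minimal sketch. It is not how the cells themselves would do it: it assumes a global map of dead cells and runs a centralized breadth-first search over a rectangular grid where each cell talks only to its four nearest neighbors. A real array would need a distributed scheme built into each cell's programming. The function name and grid model are made up for illustration.

```python
from collections import deque


def route_around_dead_cells(rows, cols, dead, src, dst):
    """Return a shortest path of live cells from src to dst, or None.

    Cells are (row, col) tuples; `dead` is a set of cells that failed.
    """
    if src in dead or dst in dead:
        return None
    parent = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:   # walk back to the source
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            in_grid = 0 <= nr < rows and 0 <= nc < cols
            if in_grid and nxt not in dead and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


# A 3 x 3 array whose center cell is dead: the message has to detour.
path = route_around_dead_cells(3, 3, {(1, 1)}, (1, 0), (1, 2))
print(path)
```

In the 3 x 3 example the direct route through the center would take three cells; with the center dead, the shortest detour takes five.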

An EDS Outline for the First Proof-of-Principle Prototype

Regarding the writing of a formal EDS, I have templates in Microsoft Word that I use for creating such documentation. But since I’m working for myself and not some medical device company, where the FDA is always looking over your shoulder, I will just stick to writing outlines rather than full formal documentation.

The meta-goal of this project will be the creation of an ASIC-level asynchronous array of simple processors. (Yes, I know that sounds a bit ambitious, but it doesn't cost anything to dream.) While that level of hardware creation might be beyond my resources, there are two layers of hardware creation that are still accessible to me and my pocketbook. The first layer will be the creation of a simple proof-of-principle prototype board, while the second layer will be the creation of a larger 8 x 8 asynchronous array.

So this blog post will restrict itself to an EDS outline for that first layer proof-of-principle prototype. The IDS for this project was already outlined in a past blog post, so this post will just concentrate on outlining the EDS side of the design specifications.

As a hobbyist, my primary design constraint is budget. Building a larger 8 x 8 asynchronous array will be an expensive out-of-pocket undertaking. Before I take on such a hardware design project, I want to be confident that the hardware will work and do what I want it to do. One of the functions of this proof-of-principle prototype will therefore be to validate the functionality of the various pieces of the final project.

So what are the aspects of the final design that I want to validate using this reduced complexity prototype? In no particular order...

  • Daisy chain programming via the JTAG interface. 
  • Programming environment. 
  • Ring oscillator design. 
  • Clock speeds attainable. 
  • Machine code command structure. 
  • Run-time debug environment. 
  • Will the chosen FPGA parts be adequate for their expected functionality? 
  • Power supply questions about current draw during operation. 

In other words, this first level of hardware design is for the validation of what will become the basic processor cell that will be tiled into a larger asynchronous array. So validation of this first proof-of-principle prototype needs to be against whatever applications a full-sized ASIC-level asynchronous array of simple processors would be running. But I don’t know what those are yet, so this EDS outline will have to remain a work in progress for now.

Saturday, March 7, 2015

Writing Design Specifications

Over the years, working as an engineer, the approach I take to design specification writing has matured into a two-part process. First there is what I call an external design specification (EDS), followed by what I call the internal design specification (IDS). The EDS is a black box view of the project from the perspective of the end user or customer. The IDS, on the other hand, is a white box view of the project written from the perspective of the engineers who are doing the actual design of the product. Then, at the conclusion of a project, the process of “validation” is done to the external design specification, while the process of “verification” is done to the internal design specification.

The IDS is generally written separately from the EDS and is put together by the engineering staff that will be responsible for designing, building, testing and otherwise turning the original EDS into a working piece of hardware. Starting with a well-written EDS, the IDS should virtually write itself.

Sadly, more often than not, what passes for a project’s EDS is generated by the marketing and sales departments. Rather than a clear outline of what needs to be accomplished, the project design specifications take the form of a vague wish list of features; some of which might be impossible to attain, while others turn into a set of mutually contradictory design goals. In cases like this, it’s the engineering management and staff that have to fill in for all of the missing design specs. It’s at this point projects start to fall apart. Looking back on my career, I would have to say every time I watched this happen, the project in question invariably ended up getting terminated, six months to a year later, without ever coming close to completion.

So what information should a well-written and robust external design specification contain?

  • An EDS needs to set forth its meta-goal. That is, what is the intent of this project? It’s so easy to get caught up in the design of a particular piece of hardware that one fails to ask the very basic question, “What are we trying to do with this in the first place?” I’ve had the experience myself of getting partially into a project design only to realize that there is actually a much easier and better way to do it than was originally envisioned. An EDS needs to be flexible enough that, if something like this comes up, it can go through a revision process and take off in a new direction. An EDS which is written too specifically can trap the design team into an effort which, before it’s even done, people realize is not going to work. I’ve noticed as a physicist working in the engineering world that this kind of question comes naturally to me, but it generally seems to be an area of challenge for those trained academically as engineers.
  • An EDS needs to set forth a single design priority for the project that should then be religiously held to throughout the design and development process. This could be cost, size, speed, power consumption, a particular user interface or target environment, etc. The important thing is that a project design can have only one priority. Once you allow a project to have more than one priority, the dreaded problem of feature-creep sets in.
  • An EDS cannot assume as fact information which is not yet available. Another way to formulate this point is to say that an EDS should never try to specify the design of the final product in its first revision. Once a design team gets involved with a project, all sorts of unknowns will surface that will invariably force a review and amendment of the original design specifications. At the very least, a project should pass through three stages: a prototype stage, a refinement stage, and the final stage. And for each of these stages, there should be a unique revision of the original EDS.

In some ways, you can think of an EDS as a tool or template, which if used properly, will help you formulate your ideas into a doable project that has the greatest chance of a successful completion.

So the first task in creating an EDS for my project is to narrow down exactly what it is I intend to accomplish. I’ve spent a lot of time over the last month reviewing literature on the subject of asynchronous processor arrays and have narrowed down my field of interest to the subject of very large asynchronous arrays of simple processors (VLAASPs). As I've Googled around the web looking for technical papers on such an architecture, I’ve found next to nothing. I’m taking from this that I’ve stumbled onto a field of inquiry which might still be open for an individual like me to make some contributions to. So the overarching goal for this project will be the eventual publication of original results.

The project at this point breaks into two separate but parallel pathways. The reason for this is that what goes into the individual simple processor cells is going to be determined by the kinds of applications that the asynchronous array is going to be running. So before I can develop the Verilog code to be installed into the various FPGA parts, I need to have some idea of what these individual cells are going to be doing. But the kind of applications I might search for in the literature will be constrained by what the underlying hardware can do in the first place.

So the first pathway in this process will be the creation of a piece of working hardware that would function as a demonstration platform for my ideas. The first pathway breaks down further into the construction of a proof-of-principle prototype followed by the creation of a larger asynchronous array. The second pathway will be to search for applications for which a very large asynchronous array would be uniquely suited. There will no doubt be multiple blog posts on both of these subjects over the course of the next few months as my ideas converge on some doable but useful project goals.