Using Antifuse-Based FPGAs in Performance Critical Digital Designs

 

Barry Fagin

Thayer School of Engineering

Dartmouth College

Hanover, NH  03755

barry.fagin@dartmouth.edu

 

 

 

Abstract

 

We present experimental results on the use of antifuse-based FPGAs in a performance-critical digital design, performed at the recently completed Thayer Rapid Prototyping Facility.  Our case study focuses on the design of a special purpose ALU for gene sequence analysis.  Our work indicates the existence of highly nonlinear relationships between design changes and critical path lengths, due to the overwhelming influence of routing on performance.  This suggests that the standard paradigms for digital design are not appropriate for FPGAs.  We compare our results to previous work using SRAM-based technology, and discuss the implications of our results for digital design and rapid prototyping.

 

 

1.0 Introduction

 

 

         We have previously reported in [1] the results of experimental investigations concerning the use of SRAM-based FPGAs in high-performance digital designs.  This paper describes similar experiments using antifuse-based devices.  Both efforts were performed at the Thayer Rapid Prototyping Facility [2], now essentially complete.

 

            The Thayer RPF is an integrated digital systems laboratory, designed to permit all stages of the design process to be carried out in a single facility.  The emphasis is on board-level systems, consistent with our research objective of producing working hardware quickly.  This is facilitated by the use of a PCB prototyping system.  This system employs a plotter/etcher and a drill machine.  The plotter/etcher uses commercial inkjet printing technology to spray a resistive ink on copper film.  The film is then etched with sodium persulfate, dissolving all exposed copper.  After a rinse, the ink is scrubbed off with steel wool, and the film is tin plated.  The drilling machine and laminating equipment are then used to produce a finished board.  Both the plotter/etcher and the drill are controlled by a PC, using vendor-supplied software.

 

            The system produces boards that are well behaved electrically, and supports multilayer prototypes.  Minimum trace width and spacing are both 5/1000 inches.  Figure 1 shows a simple 2-layer, 300 hole board produced at the RPF.  This board took about an hour to etch and drill.

Figure 1: Sample PCB Prototype Board

 


2.0 A Gene Sequence ALU

 

         We are currently using the Thayer Rapid Prototyping Facility to perform a number of experiments in special purpose computation.  One of these experiments entails the construction of a special purpose computer for molecular genetics.

 

            Many applications of computers in molecular biology are essentially string comparison problems.  A typical computational task in molecular biology, for example, is the determination of the relationships between two sequences of DNA.  These relationships can be specified in terms of additions, deletions, and changes of bases.  A computerized analysis of these relationships can help the molecular biologist understand both evolutionary history and biological function.

 

            Algorithms for solving these problems are well known [3], perhaps more so by computer scientists than biologists.  This type of problem is solved with a dynamic programming algorithm.  The two sequences to be compared are placed along the top row and left column of a matrix.  The value at a particular entry in the matrix reflects the 'similarity' between the subsequences corresponding to the appropriate row and column.  To determine the value at a particular cell, the values of the west, north, and northwest neighbors are examined.  The value for the cell is based on a parametrized weighting of these values, and the process repeats.  As shown in Figure 2, the computation can proceed in parallel wavefronts along the diagonals of the matrix.
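The wavefront ordering can be sketched in Python.  This is a minimal illustration under stated assumptions, not the implementation used at the RPF: the names `cell_cost` and `align_wavefront`, the unit gap and mismatch weights, and the choice of a minimizing (edit-distance style) recurrence are all hypothetical choices made for the example.

```python
def cell_cost(west, north, northwest, a, b, gap=1, mismatch=1):
    """Hypothetical weighting: minimum over the three neighboring paths."""
    diagonal = northwest + (0 if a == b else mismatch)
    return min(west + gap, north + gap, diagonal)


def align_wavefront(seq_a, seq_b, cost=cell_cost):
    """Fill the comparison matrix one anti-diagonal (wavefront) at a time."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    m = [[0] * cols for _ in range(rows)]
    # Boundary row and column: a prefix aligned against the empty sequence.
    # These values assume the unit gap weight used by cell_cost.
    for i in range(rows):
        m[i][0] = i
    for j in range(cols):
        m[0][j] = j
    # All cells with the same i + j lie on one anti-diagonal.  Each depends
    # only on the two previous diagonals, so the cells of a single diagonal
    # are independent and could be evaluated in parallel.
    for d in range(2, rows + cols - 1):
        first_i = max(1, d - cols + 1)
        last_i = min(rows - 1, d - 1)
        for i in range(first_i, last_i + 1):
            j = d - i
            m[i][j] = cost(m[i][j - 1], m[i - 1][j], m[i - 1][j - 1],
                           seq_a[i - 1], seq_b[j - 1])
    return m[rows - 1][cols - 1]
```

The inner loop is written sequentially here; the point of the sketch is only that its iterations do not depend on one another, which is what makes a hardware wavefront possible.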

 

 

Figure 2: Gene Sequence Alignment Computation

            The basic operation of this type of computation is a 5-input function, in which the west, north, and northwest neighbors of cell [m,n] along with nucleotides m and n in sequences A and B are used to determine the new value of a cell.  This suggests that performance can be improved through the use of a special "gene ALU" that uses a cell's neighboring values and nucleotide information to calculate a new value, based on a user-supplied weighting function.  This is shown in Figure 3.  Because the choice of weighting function is itself often an experimental variable, freezing this function in hardware is a poor design decision.  For this reason, we targeted the gene ALU to the Actel 1010/1020 series of FPGAs.
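A behavioral model of such a 5-input cell function might look like the following Python sketch.  Everything here is an assumption made for illustration: the factory `make_gene_alu`, the minimizing recurrence, the clamp to an 8-bit value, and the example `transition_weight` function (which charges less for a purine-to-purine or pyrimidine-to-pyrimidine change than for a change across classes) are hypothetical and do not describe the actual FPGA logic.

```python
from typing import Callable

MAX_VALUE = 0xFF  # assumed 8-bit cell value; larger results saturate

Weighting = Callable[[str, str], int]


def make_gene_alu(substitution: Weighting, gap: int):
    """Return a 5-input cell function closed over user-supplied weights.

    Swapping the weighting means building a new function, loosely
    analogous to reprogramming the device instead of freezing the
    weights in hardware.
    """
    def gene_alu(west, north, northwest, nuc_a, nuc_b):
        candidates = (
            north + gap,                                # north path
            northwest + substitution(nuc_a, nuc_b),     # northwest path
            west + gap,                                 # west path
        )
        return min(min(candidates), MAX_VALUE)
    return gene_alu


PURINES = {"A", "G"}


def transition_weight(a, b):
    """Example weighting (hypothetical values): 0 match, 1 same class, 2 otherwise."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return 1 if same_class else 2
```

For example, `make_gene_alu(transition_weight, gap=2)` yields one candidate ALU, while `make_gene_alu(lambda a, b: int(a != b), gap=1)` yields a plain edit-distance cell; the surrounding matrix computation does not change.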

 

 

 

 

Figure 3:  Gene ALU (targeted for Actel 1010/1020 FPGA)

 

 

            The basic design for the gene ALU is shown in Figure 4.  (The schematic was created using Viewlogic's Workview® design package, the design entry tool at the Rapid Prototyping Facility.)  The three paths of comparison are easily visible in the design: north values at the top, northwest in the middle, and west at the bottom.  The eight-bit output value appears at the right of the schematic, going directly to output pads.  Figure 4 is hierarchical; boxed symbols include latches and comparators, which were also specially designed and have their own schematics.

 

            We expected to improve the performance of the design through a repeated process of critical path identification, redesign, and resimulation.  Our expectation, based on digital design experience with other technologies, was that each iteration would yield smaller and smaller delays along the critical path, or perhaps a new critical path on which the process would be repeated.  In either case, we expected a reasonably linear interaction between designer and device, in which each iteration would yield performance improvements over the previous one until a point of diminishing returns was reached.  Our actual experience with the design was quite different.

Figure 4: Gene ALU Schematic

3.0 Timing Analysis

 

            Once the schematic was created and functionally simulated, we began with the random pin assignment A of Figure 5.  This produced a critical path of 181.9ns, through the north input path as shown.  We then employed the more careful pin assignment of B in Figure 5, in which all related signals were placed next to each other.  This resulted in a slightly shorter delay along a different critical path.  We then attempted pin assignment C, and found a new critical path with a longer delay than the original one.
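To make the notion of a shifting critical path concrete, the following Python sketch finds the slowest path through a small directed acyclic graph of nets.  The two netlists, the node names, and every delay value are invented for illustration; none of them is taken from the gene ALU or from Actel timing data.  The function `critical_path` is likewise a hypothetical helper, not part of any vendor tool.

```python
def critical_path(edges, source, sink):
    """Return (delay_ns, path) for the slowest source-to-sink path in a DAG.

    edges maps a node name to a list of (successor, delay_ns) pairs.
    Returns None if the sink cannot be reached from the source.
    """
    memo = {}

    def longest_from(node):
        if node == sink:
            return 0.0, [sink]
        if node not in memo:
            best = None
            for succ, delay_ns in edges.get(node, []):
                tail = longest_from(succ)
                if tail is None:
                    continue  # this successor never reaches the sink
                total = delay_ns + tail[0]
                if best is None or total > best[0]:
                    best = (total, [node] + tail[1])
            memo[node] = best
        return memo[node]

    return longest_from(source)


# Same logic, two hypothetical routings (all delays are made-up numbers).
ROUTING_1 = {
    "in": [("north_cmp", 40.0), ("west_cmp", 35.0)],
    "north_cmp": [("out", 60.0)],
    "west_cmp": [("out", 55.0)],
}
ROUTING_2 = {
    "in": [("north_cmp", 30.0), ("west_cmp", 35.0)],  # north net improved
    "north_cmp": [("out", 60.0)],
    "west_cmp": [("out", 75.0)],                      # west net rerouted
}
```

In this toy case the second routing shortens the original critical path through `north_cmp`, yet the overall worst-case delay grows because a different path becomes critical, which is the qualitative behavior reported for the pin assignments.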

 

 

 

Figure 5: Pin assignments and critical paths

           

            Further experimentation yielded similar nonlinear relationships between design changes and performance.  To obtain a clearer understanding of the phenomenon, we examined the software estimates of the critical path at 12 points in the design space (2 designs, 2 dies, and 3 pin assignments).  These are shown in Table 1.

                                                        

 

Table 1: CRITICAL PATH LENGTHS (ns)

Design      Die     pinout A    pinout B    pinout C    Average
MODIFIED    1010    185.8       183.1       218.9       195.9
MODIFIED    1020    186.0       194.6       223.6       201.4
MODIFIED    both                                        198.67
ORIGINAL    1010    171.1       169.6       184.5       175.1
ORIGINAL    1020    198.2       192.2       214.9       201.8
ORIGINAL    both                                        188.42
Average             186.7       187.8       202.6       193.5

 

 

         We may make several observations from Table 1.  The difference between the slowest and fastest designs, for example,  is about 31%, a large value considering the similarities between designs.  We see as well that in every case performance is worse with a larger device.  This is consistent with previous results reported for SRAM-based devices [1]; the longer net length of designs implemented on larger devices outweighs their easier routability.
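These observations can be checked directly against the twelve entries of Table 1.  The Python sketch below is only a convenience for the reader: the dictionary layout and function names are illustrative assumptions, and the spread is computed relative to the fastest entry, which is one plausible reading of the percentage quoted above.

```python
# Critical path estimates in ns, transcribed from Table 1.
TABLE_1 = {
    ("MODIFIED", "1010"): {"A": 185.8, "B": 183.1, "C": 218.9},
    ("MODIFIED", "1020"): {"A": 186.0, "B": 194.6, "C": 223.6},
    ("ORIGINAL", "1010"): {"A": 171.1, "B": 169.6, "C": 184.5},
    ("ORIGINAL", "1020"): {"A": 198.2, "B": 192.2, "C": 214.9},
}


def spread(table):
    """Return (fastest, slowest, relative difference w.r.t. the fastest)."""
    delays = [d for row in table.values() for d in row.values()]
    fastest, slowest = min(delays), max(delays)
    return fastest, slowest, (slowest - fastest) / fastest


def larger_die_always_slower(table):
    """True if every 1020 entry exceeds the matching 1010 entry."""
    return all(
        table[(design, "1020")][pin] > table[(design, "1010")][pin]
        for design in ("MODIFIED", "ORIGINAL")
        for pin in "ABC"
    )


def row_averages(table):
    """Average delay over the three pinouts for each design/die pair."""
    return {key: sum(row.values()) / len(row) for key, row in table.items()}
```

Calling `spread(TABLE_1)` identifies the fastest and slowest points in the design space, and `larger_die_always_slower(TABLE_1)` tests the claim about device size.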

 

            Table 1 seems to describe a chaotic system, one that is extremely sensitive to initial conditions.  Perturbations in design parameters that would yield small changes in critical path length for other technologies may yield much larger perturbations for FPGAs.  The difference between the original and modified designs, for example, is a single buffer.  Two designs in the same row of Table 1 differ only by pin assignment, while two designs adjacent to one another in the same column differ only in the target device.

 

 

4.0 Routing and Performance

 

         The key to understanding Table 1 lies in the place and route phase of the design cycle.  The routing software may be viewed as a highly nonlinear function that maps between schematics and FPGA programming files.  This mapping is so irregular that accurate predictions of performance in response to design changes are essentially impossible. We note that two points in our design space differing only in a signal name had a 3% difference in maximum delay and different critical paths.

 

            Figures 6 and 7 illustrate another aspect of routing and performance.  El Gamal et al., in their description of the Actel FPGA architecture, discuss the use of both horizontal and vertical routing tracks for routability [4].  Vertical lines are connected to large numbers of antifuses, which add parasitic capacitance and slow down signals routed through them.  This suggests that designs with many vertical long lines will show substantially reduced performance, and implies that the minimization of long lines is an important task of the router.  Our examination of different gene ALU designs supports this hypothesis.  Figure 6 shows the number of horizontal lines versus the maximum delay for 34 gene ALU designs, along with a line of best fit.  Since the data points are widely scattered, the equation of the line is of little interest.  Its negative slope, however, indicates that maximum delay decreases as the number of horizontal lines increases, as expected.

 

 

Figure 6: Maximum delay versus number of horizontal lines

 

 

            Figure 7 shows a similar plot for vertical lines.  The line of best fit slopes upward, indicating the predicted increase in delay with the number of vertical lines.
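A line of best fit of this kind can be obtained by ordinary least squares, as in the Python sketch below.  The function `best_fit_line` is a hypothetical helper, and the sample points are placeholders invented purely to exercise it; they are not the 34 measured designs, whose values are not tabulated in this paper.  Only the sign of the slope matters for the argument.

```python
def best_fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# PLACEHOLDER data, invented for illustration only (not measurements).
horizontal_lines = [10, 14, 18, 22]
delay_vs_horizontal_ns = [200.0, 190.0, 195.0, 180.0]
vertical_lines = [5, 8, 11, 14]
delay_vs_vertical_ns = [178.0, 190.0, 186.0, 204.0]

slope_h, _ = best_fit_line(horizontal_lines, delay_vs_horizontal_ns)
slope_v, _ = best_fit_line(vertical_lines, delay_vs_vertical_ns)
```

With scattered data the fitted coefficients carry little meaning on their own; comparing the signs of `slope_h` and `slope_v` is the only use made of them here.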

 

 

 

Figure 7: Maximum delay versus number of vertical lines

 


5.0 Conclusions and Practical Observations

 

         Actel FPGAs are now an important part of the Thayer Rapid Prototyping Facility.  We are currently using hardware and software support for these devices, and are pleased with the results.  Designs place and route quickly;  manual intervention was never required to achieve a successful route.

 

         However, the relative difference in sophistication between FPGA devices and FPGA software described in [1] is further indicated here.  The Actel/Workview interface, for example, does not currently support the simulation of systems that mix Actel FPGAs with standard TTL parts.  Additionally, the Actel architecture is not transparent; when the timing analyzer describes critical path information, the designer cannot view the fuse map to see where the problems are.  This in turn makes selecting the appropriate design changes difficult, requiring a heuristic "bag of tricks" approach to optimize for performance.  A tool that permits the user to see how signals were routed and, if necessary, edit the fuse map, would not be difficult to write and would assist designers interested in using antifuse-based FPGAs in performance-critical designs.

           

            Our experience indicates that the standard mode of interaction between digital designer and digital design does not apply to FPGAs.  The expected linear relationship between design changes and design performance does not appear, due to the chaotic influence of design routing.  We note, however, that while the direction of performance change in response to a design change may not be known, the magnitude of the change can be bounded by the fundamental characteristics of the device.  Major design changes may thus be able to move the design into a new region of performance, but fine tuning to introduce marginal improvements seems impossible.  This suggests that the suitability of FPGAs for a given design depends on performance objectives.  For projects in which every nanosecond of performance is important, FPGAs are not an appropriate implementation technology.  The larger the region of acceptable performance, the more attractive FPGAs become.

 

            We note that the conclusions described here are based on a single design.  Other projects are currently underway at the Thayer RPF, including the design of a multiplier for large integers and an architectural subset of the DLX microprocessor [5].  Future work includes examining the FPGAs associated with these designs to see if they support the conclusions drawn here.  Additionally, the times described in this paper are estimated by vendor-supplied software.  When the gene ALU and other systems are actually built, these delays should be measured empirically.

 

            Long term plans at the RPF call for a shift of focus, using FPGAs in systems for which exacting standards of performance are less important than the ability to produce a working prototype quickly.  We plan to integrate FPGAs into board-level systems using the RPF PCB prototyper, concentrating on issues of trace routing and system testing.  We also plan to compare different families of FPGAs by implementing identical designs with different devices and studying cost/performance tradeoffs.

 

 

6.0 Acknowledgements

 

         The Thayer Rapid Prototyping Facility has received support from a number of sources.  Industrial sponsors include Viewlogic, Actel, Xilinx, Direct Imaging, and Sun Microsystems.  The gene sequence processor project is supported by a grant from the Whitaker Foundation, while the completion of the RPF was supported with a grant from the National Science Foundation, award #CDA-8921062.

 

 


7.0 References

 

 

[1] Fagin, Barry, "Using Reprogrammable Gate Arrays in Performance Critical Digital Designs", Proceedings of the 3rd Microelectronics Systems Education Conference and Exposition, Santa Clara, CA, 1990, pp 43-60.

 

[2] Fagin, Barry and Hitchcock, Charlie, "Rapid Prototyping Without MOSIS: A Minority View", Proceedings of the 2nd VLSI Education Conference, Santa Clara, 1989, pp 59-67.

 

[3] Sellers, Peter H., "On the Theory and Computation of Evolutionary Distances", SIAM Journal on Applied Mathematics, June 1974, Volume 26, No. 4, pp 787-793.

 

[4] El Gamal, Abbas et al., "An Architecture for Electrically Configurable Gate Arrays", IEEE Journal of Solid-State Circuits, Vol. 24, No. 2, April 1989.

 

[5] Patterson, David and Hennessy, John, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers Inc., San Mateo, CA, 1990.