Date of Award
2013
Document Type
Open Access Dissertation
Department
Computer Science and Engineering
First Advisor
Jason D. Bakos
Abstract
Set-wise summation is one of the most fundamental and widely used operations in scientific applications. Maintaining the accuracy of the summation is also important in these applications, because floating-point operations have inherent errors associated with them. Designing floating-point accumulators presents a unique set of challenges: double-precision addition is usually deeply pipelined, and without special micro-architectural or data scheduling techniques the resulting data hazard remains unresolved. There have been several efforts to design floating-point accumulators and accurate summation architectures on FPGAs using different algorithms, but the two problems have been dealt with separately. In this dissertation, we present a general-purpose reduction circuit architecture that addresses both the data hazard and the accuracy of set-wise floating-point summation. The reduction circuit architecture is parametrizable and can be scaled according to the depth of the adder pipeline. The dynamic scheduling logic it uses keeps its resource requirements low. We also study various methods to improve the accuracy of floating-point summation. We have implemented four designs, with the reduction circuit architecture serving as the framework for each. Two of the designs, AEC and AECSA, are based on compensated summation, while the other two, EPRC80 and EPRC128, implement set-wise floating-point accumulation in extended precision. We present and compare the trade-offs between accuracy and cost (operating frequency and resource requirements) associated with these designs. On the basis of our experiments, we find that these designs achieve significantly better accuracy. Three of the designs (AEC, EPRC80 and EPRC128) operate at around 180 MHz on a Xilinx Virtex 5 FPGA, which is comparable to the reduction circuit, while AECSA operates at a 28% lower frequency.
The increase in resource requirements ranges from 41% to 320%. We conclude that better accuracy can be achieved at the expense of more resources, while the operating frequency can be maintained.
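As a software illustration of the compensated summation idea that the abstract mentions, the sketch below uses the classic Kahan algorithm in Python. It is not the AEC or AECSA hardware design, whose details are not given here; the function names and the test data are hypothetical and chosen only to make the rounding loss visible.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def compensated_sum(values):
    """Kahan compensated summation (illustrative, not the dissertation's circuit).

    A second variable tracks the low-order bits lost by each addition
    and feeds them back into the next one.
    """
    total = 0.0
    comp = 0.0  # running estimate of the rounding error, with its sign flipped
    for v in values:
        y = v - comp            # apply the correction carried from the last step
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost
        total = t
    return total


# Assumed test data: one large term followed by many terms that are each
# too small to change 1.0 when added to it directly.
data = [1.0] + [1e-16] * 100_000

print("naive:      ", naive_sum(data))
print("compensated:", compensated_sum(data))
print("reference:  ", math.fsum(data))  # correctly rounded sum from the standard library
```

With this input the naive loop never moves off 1.0, because every small term is rounded away, while the compensated loop stays close to the `math.fsum` reference. The extra subtractions per element are a software analogue of the added cost that the abstract reports for the more accurate designs.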
Rights
© 2013, Krishna Kumar Nagar
Recommended Citation
Nagar, K. K. (2013). Accuracy, Cost and Performance Trade-Offs for Streaming Set-Wise Floating Point Accumulation on FPGAs. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/3585