Theses and Dissertations

Accuracy, Cost and Performance Trade-Offs for Streaming Set-Wise Floating Point Accumulation on FPGAs

Krishna Kumar Nagar, University of South Carolina

Date of Award

2013

Document Type

Open Access Dissertation

Department

Computer Science and Engineering

First Advisor

Jason D. Bakos

Abstract

The set-wise summation operation is perhaps one of the most fundamental and widely used operations in scientific applications. In these applications, maintaining the accuracy of the summation is also important as floating point operations have inherent errors associated with them. Designing floating-point accumulators presents a unique set of challenges: double-precision addition is usually deeply pipelined and without special micro-architectural or data scheduling techniques, the data hazard that exists. There have been several efforts to design floating point accumulators and accurate summation architecture using different algorithms on FPGAs but these problems have been dealt with separately. In this dissertation, we present a general purpose reduction circuit architecture which addresses the issues of data hazard and accuracy in set-wise floating point summation. The reduction circuit architecture is parametrizable and can be scaled according to the depth of the adder pipeline. Also, the dynamic scheduling logic we use in this makes it highly resource efficient. Further, the resource requirements for this design are low. We also study various methods to improve the accuracy of summation of floating point numbers. We have implemented four designs. The reduction circuit architecture serves as the framework for these designs. Two of the designs namely AEC and AECSA are based on compensated summation while the two designs called EPRC80 and EPRC128 implement set-wise floating point accumulation in extended precision. We present and compare the accuracy and cost- operating frequency and resource requirements- tradeoffs associated with these designs. On the basis of our experiments, we find that these designs achieve significantly better accuracy. Three of the designs– AEC, EPRC80 and EPRC128– operate at around 180MHz on Xilinx Virtex 5 FPGA which is comparable to the reduction circuit while AECSA operates at 28% less frequency. The increase in resource requirement ranges from 41% to 320%. We conclude that accuracy can be achieved at the expense of more resources but the operating frequency can be maintained.

Rights

Recommended Citation

Nagar, K. K.(2013). Accuracy, Cost and Performance Trade-Offs for Streaming Set-Wise Floating Point Accumulation on FPGAs. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/3585

Download

Included in

Computer Engineering Commons

COinS

Theses and Dissertations

Accuracy, Cost and Performance Trade-Offs for Streaming Set-Wise Floating Point Accumulation on FPGAs

Date of Award

Document Type

Department

First Advisor

Abstract

Rights

Recommended Citation

Included in

Search

Browse

Submissions

Links

Theses and Dissertations

Accuracy, Cost and Performance Trade-Offs for Streaming Set-Wise Floating Point Accumulation on FPGAs

Author

Date of Award

Document Type

Department

First Advisor

Abstract

Rights

Recommended Citation

Included in

Share

Search

Browse

Submissions

Links