Computer Science and Engineering
In this paper we describe a new approach for accelerating the Conjugate Gradient (CG) method using an FPGA co-processor. As in previous approaches, our co-processor performs double-precision sparse matrix-vector multiplication. However, our implementation doubles the amount of computation per unit of input data by exploiting the symmetry of the input matrix and computing the contributions of its upper and lower triangles in parallel. Using a Virtex-2 Pro 100 FPGA, we achieved an observed computational throughput of 1155 MFLOPS.
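The symmetry idea in the abstract can be illustrated in software. The following is a minimal sketch, not the paper's FPGA datapath: it assumes the matrix is stored as its upper triangle (diagonal included) in compressed sparse row form, and the function name `symmetric_spmv` and the array names are hypothetical. Each stored off-diagonal value is read once and used for two multiply-accumulates, one for the upper-triangle entry and one for its mirrored lower-triangle entry, which is where the doubling of computation per unit of input data comes from.

```python
def symmetric_spmv(n, row_ptr, col_idx, values, x):
    """Compute y = A @ x for a symmetric A given only its upper triangle in CSR.

    Assumed layout (an illustration, not taken from the paper):
    row i owns entries values[row_ptr[i]:row_ptr[i + 1]], each with
    column index col_idx[k] >= i.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            # Upper-triangle contribution: A[i][j] * x[j] goes to y[i].
            y[i] += a * x[j]
            # Mirrored lower-triangle contribution: A[j][i] * x[i] goes to y[j].
            # The diagonal is skipped here so it is not counted twice.
            if j != i:
                y[j] += a * x[i]
    return y


# Example: A = [[4, 1, 0],
#               [1, 3, 2],
#               [0, 2, 5]], stored as its upper triangle only.
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = symmetric_spmv(3, row_ptr, col_idx, values, x)
print(y)  # [6.0, 13.0, 19.0]
```

In this sequential sketch the two updates happen one after the other; in hardware the two accumulations for each stored value can be issued side by side, since they write to different elements of the result vector whenever the entry is off the diagonal.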
Published in the 17th IEEE Symposium on Field Programmable Custom Computing Machines, 2009, pages 223-226.
© 2009 by the Institute of Electrical and Electronics Engineers (IEEE)