Date of Award

8-20-2024

Document Type

Open Access Dissertation

Department

Computer Science and Engineering

First Advisor

Ramtin Zand

Abstract

Machine learning (ML) has become ubiquitous, integrating into numerous real-life applications. However, meeting the computational demands of ML systems is challenging, as existing computing platforms are constrained by memory bandwidth, and technology scaling no longer yields substantial improvements in system performance. This work introduces novel hardware architectures to accelerate ML workloads, addressing both compute and memory challenges. In the compute domain, we explore various approximate computing techniques to assess their efficacy in accelerating ML computations. Subsequently, we propose the Approximate Tensor Processing Unit (APTPU), a hardware accelerator that utilizes approximate processing elements in place of direct quantization of inputs and weights in ML models, thereby improving performance, power efficiency, and area utilization. The APTPU achieves significant throughput gains while maintaining comparable accuracy. In the memory domain, we present the In-Memory Analog Computing (IMAC) architecture as an effective solution to the data movement issues faced by von Neumann architectures. IMAC employs memristive devices in crossbars and analog activation functions to efficiently execute matrix-vector multiplication (MVM) and non-linear vector operations in ML workloads. Finally, we integrate IMAC with TPU and APTPU architectures to capitalize on their combined strengths in accelerating MVM and matrix-matrix multiplication operations across various ML workloads. This integration leads to noteworthy performance enhancements and reduced memory requirements. This work provides design baselines and automation tools for architects to seamlessly incorporate the proposed compute and memory solutions into their custom systems.
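To illustrate the general idea behind crossbar-based in-memory MVM described in the abstract, the following minimal Python sketch models an idealized memristive crossbar: weights are mapped linearly onto device conductances, inputs are applied as voltages, and column currents accumulate the dot products, followed by an analog-style activation. All names, conductance ranges, and the linear weight-to-conductance mapping are illustrative assumptions, not the dissertation's actual IMAC implementation.

```python
import numpy as np

def crossbar_mvm(weights, inputs, g_min=1e-6, g_max=1e-4):
    """Idealized memristive-crossbar MVM (illustrative, not the IMAC design):
    weights are mapped to device conductances, inputs act as row voltages,
    and each column's output current sums the products (Kirchhoff's current law)."""
    w_min, w_max = weights.min(), weights.max()
    # Linear mapping of weights onto the assumed device conductance range.
    conductances = g_min + (weights - w_min) * (g_max - g_min) / (w_max - w_min + 1e-12)
    # Analog accumulation along each column: currents = V @ G.
    return inputs @ conductances

def analog_sigmoid(currents, gain=1e4):
    """Stand-in for an analog activation circuit applied to the column currents."""
    return 1.0 / (1.0 + np.exp(-gain * currents))

# Toy example: one fully connected layer evaluated "in memory".
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # 8 inputs x 4 outputs
x = rng.standard_normal(8)        # normalized input voltages
y = analog_sigmoid(crossbar_mvm(W, x))
print(y.shape)  # (4,)
```

Because the multiply-accumulate happens where the weights are stored, no weight movement between memory and a compute unit is needed, which is the data-movement advantage the abstract attributes to IMAC relative to von Neumann architectures.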

Rights

© 2024, Mohammed Essa Fawzy Essa
