Wenqi Jiang gave a talk at Xilinx

02.05.2022

Wenqi Jiang gave the following talk at Xilinx:

Title: Efficient Recommendation Inference on FPGAs

Abstract:

Deep neural networks are widely used in personalized recommendation systems. Recommendation inference is largely memory-bound because of the random memory accesses needed to look up the embedding tables.
This talk will introduce MicroRec, a high-performance FPGA inference engine for recommendation systems that tackles the memory bottleneck. The design is then extended into two high-performance recommendation inference clusters: one using FPGAs and the other combining FPGAs and GPUs.
Experiments on three production models show that our cluster-based solutions outperform the CPU baseline by more than one order of magnitude while achieving significantly lower latency.