I am a third-year PhD candidate in the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology, affiliated with the AI Chip Center for Emerging Smart Systems and the Vision and System Design Lab (VSDL), under the supervision of Prof. Kwang-Ting CHENG. A concise version of my experience is available in my CV.

Before joining HKUST, I received my B.E. in Microelectronics from the School of Microelectronics, Southern University of Science and Technology, where I worked with Prof. Fengwei An.

My research interests include software-hardware co-design, AI accelerators, LLM/VLM systems, and 3D processing.

News

  • 2026.01: One paper was accepted by CICC 2026.
  • 2025.10: One paper was accepted by ISSCC 2026.
  • 2025.02: One paper was accepted by DAC 2025.
  • 2024.10: One paper was accepted by ISSCC 2025.
  • 2024.02: One paper and one poster were accepted by DAC 2024.
  • 2022.08: The second paper from our SLAM accelerator project was accepted by Sensors.
  • 2022.08: The first paper from our SLAM accelerator project, and my first paper overall, was accepted by TCAS-II.
  • 2021.12: Our team won first prize in the 2021 National College Students FPGA Innovation Design Competition.
  • 2021.10: Our team won first prize in the 2021 International Competition of Autonomous Running Robots.

Publications

A 5nm 91.43 TOPS/W 4-Chiplet Generalizable-Rendering-Processor with UCIe-Enabled Cross-Die-Cache and Balance-Aware Progressive Multi-Level Sparsity

Yonghao Tan*, Songchen Ma*, Pingcheng Dong, Peng Luo, Zhiyuan Lei, Wencai Lu, Guangxi Ying, Man-To Yung, Haibo Zhao, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Yu Liu, Li Li, Luhong Liang, Mao Liu, Kwang-Ting Cheng

* Equal contribution.

A 14.08-to-135.69 Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding

Pingcheng Dong, Yonghao Tan, Xuejiao Liu, Peng Luo, Yu Liu, Di Pang, Songchen Ma, Xijie Huang, Shih-Yang Liu, Dong Zhang, Luhong Liang, Chi-Ying Tsui, Fengbin Tu, Liang Zhao, Kwang-Ting Cheng

APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design

Yonghao Tan*, Pingcheng Dong*, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng

* Equal contribution.

A 28nm 0.22μJ/Token Memory-Compute-Intensity-Aware CNN-Transformer Accelerator with Hybrid-Attention-Based Layer-Fusion and Cascaded Pruning for Semantic Segmentation

Pingcheng Dong*, Yonghao Tan*, Xuejiao Liu, Peng Luo, Yu Liu, Luhong Liang, Yitong Zhou, Di Pang, Manto Yung, Dong Zhang, Xijie Huang, Shih-Yang Liu, Yongkun Wu, Fengshi Tian, Chi-Ying Tsui, Fengbin Tu, Kwang-Ting Cheng

* Equal contribution.

Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

Pingcheng Dong*, Yonghao Tan*, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng

* Equal contribution.

A Reconfigurable Coprocessor for Simultaneous Localization and Mapping Algorithms in FPGA

Yonghao Tan*, Huanshihong Deng*, Mengying Sun, Minghao Zhou, Yifei Chen, Lei Chen, Chao Wang, Fengwei An

* Equal contribution.

A Reconfigurable Visual-Inertial Odometry Accelerator with High Area and Energy Efficiency for Autonomous Mobile Robots

Yonghao Tan*, Mengying Sun*, Huanshihong Deng, Haihan Wu, Minghao Zhou, Yifei Chen, Zhuo Yu, Qinghan Zeng, Ping Li, Lei Chen, Fengwei An

* Equal contribution.

Research Projects

  • Hybrid-bonding-based co-designed AI accelerator (AC-RHB)
    • Algorithm-hardware co-design optimization for LLMs.
    • Implemented the AI core and ReRAM with 55nm die-on-wafer stacking via a bumping process.
  • Transformer-based co-designed AI accelerator (AC-Transformer)
    • Hardware-software collaborative optimization of Transformer-based architectures.
    • Implemented an energy-efficient Transformer accelerator for semantic segmentation in 28nm ASIC technology.
  • ASIC design of a SLAM accelerator in 28nm CMOS technology
    • Proposed a reconfigurable visual-inertial odometry accelerator, implemented on FPGA, that outputs trajectories in real time at 160MHz and 110fps.
    • Optimized the hardware architecture and completed the back-end design for ASIC development.

Honors and Awards

Education

  • 2023.09 - present, Doctor of Philosophy, Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
  • 2019.09 - 2023.06, Bachelor of Engineering, Microelectronics, Experimental Class, School of Microelectronics, Southern University of Science and Technology, Shenzhen, China. GPA: 3.77.
  • 2016.09 - 2019.06, Shimen Middle School, Foshan, China.

Teaching Assistant

  • ELEC2350: Introduction to Computer Organization and Design (2025 Fall)
  • ELEC3400: Introduction to Integrated Circuits and Systems (2024 Spring)
  • ELEC6910H: Advanced AI Chip and System (2024 Fall)
