I am a third-year PhD candidate in the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology, affiliated with the AI Chip Center for Emerging Smart Systems and the Vision and System Design Lab (VSDL), under the supervision of Prof. Kwang-Ting CHENG. A concise version of my experience is available in my CV.
Before joining HKUST, I received my B.E. in Microelectronics from the School of Microelectronics, Southern University of Science and Technology, where I worked with Prof. Fengwei An.
My research interests include software-hardware co-design, AI accelerators, LLM/VLM systems, and 3D processing.
News
- 2026.01: One paper is accepted by CICC 2026.
- 2025.10: One paper is accepted by ISSCC 2026.
- 2025.02: One paper is accepted by DAC 2025.
- 2024.10: One paper is accepted by ISSCC 2025.
- 2024.02: One paper and one poster are accepted by DAC 2024.
- 2022.08: Our second paper of the SLAM accelerator project is accepted by Sensors.
- 2022.08: Our first paper of the SLAM accelerator project, my first paper, is accepted by TCAS-II.
- 2021.12: Our team won the first prize in the 2021 National College Students FPGA Innovation Design Competition.
- 2021.10: Our team won the first prize in the 2021 International Competition of Autonomous Running Robots.
Research Projects
- 5nm UCIe-Enabled Multi-Chiplet Generalizable Rendering Processor (Mar. 2024 - Sept. 2025)
- Architected a 5nm 4-chiplet GeNeRF processor for generalizable rendering to address the heavy external-memory traffic induced by multi-view feature fetching.
- Introduced a UCIe-enabled cross-die unified cache, distributed source-view management, and balance-aware scheduling to maximize source-view reuse while reducing off-chip and cross-die data movement.
- Silicon results reached 91.43 TOPS/W, 55.43 FPS real-time rendering, and 0.29 uJ/pixel through hierarchical sparsity and hybrid NeRF-SR execution in a 45 mm x 45 mm MCM package footprint.
- 55nm ReRAM-on-Logic Stacked LLM Accelerator for Speculative Decoding (Apr. 2024 - Aug. 2025)
- Architected a 55nm edge LLM accelerator whose logic die is stacked with four ReRAM dies via face-to-face bump bonding to support in-stack storage of draft-model codebooks.
- Developed block-clustered weight compression, local-rotation-based outlier-free W4A8 quantization, and adaptive parallel speculative decoding to reduce both target-model EMA and rejected-draft overhead.
- The prototype delivered 14.08 to 135.69 token/s and a 4.46x to 7.17x throughput speedup over a BF16 speculative-decoding baseline on a 55.98 mm² logic die.
- 28nm CNN-Transformer Accelerator for Semantic Segmentation (Nov. 2021 - Sept. 2024)
- Architected a 28nm memory-compute-intensity-aware accelerator for high-resolution ConvFormer and SegFormer semantic-segmentation workloads.
- Combined hybrid-attention processing, data-reuse-oriented layer fusion, and cascaded feature-map pruning to reduce attention-side EMA and eliminate redundant KV and weight movement across fused blocks.
- Silicon results achieved 0.22 uJ/token on SegFormer-B0 and up to 52.90 TOPS/W peak efficiency in a 13.93 mm² chip.
Publications
A 5nm 91.43 TOPS/W 4-Chiplet Generalizable-Rendering-Processor with UCIe-Enabled Cross-Die-Cache and Balance-Aware Progressive Multi-Level Sparsity
Yonghao Tan*, Songchen Ma*, Pingcheng Dong, Peng Luo, Zhiyuan Lei, Wencai Lu, Guangxi Ying, Man-To Yung, Haibo Zhao, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Yu Liu, Li Li, Luhong Liang, Mao Liu, Kwang-Ting Cheng
Equal contribution.
A 14.08-to-135.69 Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding
Pingcheng Dong, Yonghao Tan, Xuejiao Liu, Peng Luo, Yu Liu, Di Pang, Songchen Ma, Xijie Huang, Shih-Yang Liu, Dong Zhang, Luhong Liang, Chi-Ying Tsui, Fengbin Tu, Liang Zhao, Kwang-Ting Cheng
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
Yonghao Tan*, Pingcheng Dong*, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xiejie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng
Equal contribution.
Pingcheng Dong*, Yonghao Tan*, Xuejiao Liu, Peng Luo, Yu Liu, Luhong Liang, Yitong Zhou, Di Pang, Manto Yung, Dong Zhang, Xijie Huang, Shih-Yang Liu, Yongkun Wu, Fengshi Tian, Chi-Ying Tsui, Fengbin Tu, Kwang-Ting Cheng
Equal contribution.
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Pingcheng Dong*, Yonghao Tan*, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng
Equal contribution.
A Reconfigurable Coprocessor for Simultaneous Localization and Mapping Algorithms in FPGA
Yonghao Tan*, Huanshihong Deng*, Mengying Sun, Minghao Zhou, Yifei Chen, Lei Chen, Chao Wang, Fengwei An
Equal contribution.
Honors and Awards
- 2025.08 Best Teaching Assistant Award, Department of Electronic and Computer Engineering, HKUST
- 2023.05 Outstanding Graduate (School Level), SUSTech
- 2022.09 First-Class Outstanding Students Scholarship with the highest score
- 2022.04 Undergraduate Innovation and Entrepreneurship Training Program
- 2021.12 Shenzhen Longsys Electronics Company Award (Top 2% in the School of Microelectronics)
- 2021.12 First Prize, 2021 National College Students FPGA Innovation Design Competition (Top 22 out of 1,341 teams)
- 2021.10 First Prize, 2021 International Competition of Autonomous Running Robots (1st place out of 34 finalist teams)
Education
- 2023.09 - present, Doctor of Philosophy, Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
- 2019.09 - 2023.06, Bachelor of Engineering, Microelectronics, Experimental Class, School of Microelectronics, Southern University of Science and Technology, Shenzhen, China. GPA: 3.77.
- 2016.09 - 2019.06, Shimen Middle School, Foshan, China.
Teaching Assistant
- ELEC2350: Introduction to Computer Organization and Design (2025 Fall)
- ELEC3400: Introduction to Integrated Circuits and Systems (2024 Spring)
- ELEC6910H: Advanced AI Chip and System (2024 Fall)





