Yonghao Tan
Ph.D. Candidate in Electronic and Computer Engineering, The Hong Kong University of Science and Technology (HKUST)HKUST, Clear Water Bay, Hong Kong, China +852 53946864 ytanaz@connect.ust.hk
yonghaot1017@gmail.com yonghao-tan.github.io ORCID 0000-0001-5372-5863
Education
- Ph.D. in Electronic and Computer Engineering
- Sept. 2023 - Present
- The Hong Kong University of Science and Technology (HKUST)
- Hong Kong, China
Supervisor: Prof. Tim Kwang-Ting Cheng
- B.E. in Microelectronics
- Sept. 2019 - Jun. 2023
- Southern University of Science and Technology (SUSTech)
- Shenzhen, Guangdong, China
Supervisor: Prof. Fengwei An
Research Experience
- 5nm UCIe-Enabled Multi-Chiplet Generalizable Rendering Processor
- AI Chip Center for Emerging Smart Systems (ACCESS), Hong Kong, China
- Mar. 2024 - Sept. 2025
- Designed the architecture and execution scheduler for a 5nm four-chiplet GeNeRF processor, and built an early-stage simulation framework to evaluate buffer management, cross-die source-view caching, dense/sparse D2D transfer, multi-level sparsity control, and hybrid GeNeRF-SR dataflow.
- Developed algorithm-hardware co-design simulation models for projection-area-driven source-view placement, dynamic patch grouping, dense/sparse D2D modes, source-view pruning, tile-level fine-stage sharing, and patch-grained SR routing; used them to validate accuracy-sensitive sparsity and SR decisions and guide hardware-aware end-to-end quantization for silicon deployment.
- Contributed to the implementation of inter-chiplet transfer, chip-level control, and nonlinear computation modules; participated in chip testing and system validation of the fabricated 5nm MCM processor, which integrates the four-chiplet GeNeRF engine in a 45 mm x 45 mm package and achieves 91.43 TOPS/W and 55.43 FPS throughput.
- 55nm ReRAM-on-Logic Stacked LLM Accelerator
- AI Chip Center for Emerging Smart Systems (ACCESS), Hong Kong, China
- Apr. 2024 - Aug. 2025
- Contributed to algorithm-hardware co-design optimizations for a 55nm ReRAM-on-logic stacked edge LLM accelerator, targeting decoding-stage memory and scheduling bottlenecks with local-rotation-based W4A8 quantization, ReRAM-resident block-clustered codebook reconstruction, and adaptive speculative decoding.
- Implemented and validated the end-to-end algorithm flow, including layer-wise quantized LLM evaluation, codebook-based draft-model weight reconstruction, acceptance/rejection-aware speculative decoding analysis, and hardware-mapping studies that balance target-model EMA reduction against rejected-draft overhead.
- Designed RTL for nonlinear computation modules and related control/datapath logic; assisted chip testing and validation of the 55nm logic die stacked with four ReRAM dies via face-to-face bumps, achieving 14.08 to 135.69 token/s on a 55.98 mm² logic die.
- 28nm CNN-Transformer Accelerator for Semantic Segmentation
- AI Chip Center for Emerging Smart Systems (ACCESS), Hong Kong, China
- Nov. 2021 - Sept. 2024
- Contributed to the development of algorithm-hardware co-design optimizations for a 28nm CNN-Transformer semantic-segmentation accelerator, including hybrid attention processing, data-reuse-oriented layer fusion, and cascaded feature-map pruning for high-resolution ConvFormer workloads.
- Built a hardware energy simulation framework to quantify optimization impact and guide architecture and tape-out decisions; implemented the algorithm validation flow for VA/LA hybrid attention, KV/weight reuse scheduling, non-overlap layer fusion, and mask-based cascaded pruning.
- Designed RTL for attention/layer-fusion control and pruning-related datapath logic; assisted silicon testing and validation of the 13.93 mm² 28nm chip, achieving 0.22 uJ/token and up to 52.90 TOPS/W peak efficiency.
Publications
- A 5nm 91.43 TOPS/W 4-Chiplet Generalizable-Rendering-Processor with UCIe-Enabled Cross-Die-Cache and Balance-Aware Progressive Multi-Level Sparsity
- Tan, Y.*, Ma, S.*, Dong, P., Luo, P., Lei, Z., Lu, W., Ying, G., ... & Cheng, K. T.
- 2026 IEEE Custom Integrated Circuits Conference (CICC), IEEE.
- A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding
- Dong, P., Tan, Y., Liu, X., Luo, P., Liu, Y., Pang, D., Ma, S., ... & Cheng, K. T.
- 2026 IEEE International Solid-State Circuits Conference (ISSCC), IEEE.
- A 28nm 0.22uJ/Token Memory-Compute-Intensity-Aware CNN-Transformer Accelerator with Hybrid-Attention-Based Layer-Fusion and Cascaded Pruning for Semantic-Segmentation
- Dong, P.*, Tan, Y.*, Liu, X., Luo, P., Liu, Y., Liang, L., ... & Cheng, K. T.
- 2025 IEEE International Solid-State Circuits Conference (ISSCC), IEEE.
- APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
- Tan, Y.*, Dong, P.*, Wu, Y., Liu, Y., Liu, X., Luo, P., Liu, S. Y., Huang, X., Zhang, D., Liang, L., & Cheng, K. T.
- 2025 62nd ACM/IEEE Design Automation Conference (DAC), IEEE.
- Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
- Dong, P.*, Tan, Y.*, Zhang, D., Ni, T., Liu, X., Liu, Y., ... & Cheng, K. T.
- 2024 61st ACM/IEEE Design Automation Conference (DAC), IEEE.
- A Reconfigurable Coprocessor for Simultaneous Localization and Mapping Algorithms in FPGA
- Tan, Y.*, Deng, H.*, Sun, M., Zhou, M., Chen, Y., Chen, L., ... & An, F.
- IEEE Transactions on Circuits and Systems II: Express Briefs, 70(1), 286-290, 2022.
- A Reconfigurable Visual-Inertial Odometry Accelerator with High Area and Energy Efficiency for Autonomous Mobile Robots
- Tan, Y.*, Sun, M.*, Deng, H., Wu, H., Zhou, M., Chen, Y., ... & An, F.
- Sensors, 22(19), 7669, 2022.
* Authors marked with an asterisk contributed equally to the corresponding work.
Honors and Awards
- Postgraduate Studentship (PGS) Award in HKUST
- Sept. 2023 - Present
- Best Teaching Assistant Award, Department of Electronic and Computer Engineering, HKUST
- Aug. 2025
- Outstanding Graduate (School Level), SUSTech
- May 2023
- First-Class Outstanding Students Scholarship with the highest score
- Sept. 2022
- Undergraduate Innovation and Entrepreneurship Training Program
- Apr. 2022
- Shenzhen Longsys Electronics Company Award (Top 2% in the School of Microelectronics)
- Dec. 2021
- First Prize, 2021 National College Students FPGA Innovation Design Competition (Top 22 out of 1,341 teams)
- Dec. 2021
- First Prize, 2021 International Competition of Autonomous Running Robots (1st place out of 34 finalist teams)
- Oct. 2021
Skills
Research Interests: Software/Hardware Co-Design, Model Compression, 3D Processing
Programming Languages: C, C++, Java, Python, SystemVerilog, Verilog HDL, VHDL
Professional Software: AutoCAD, Cadence, Design Compiler, IC Compiler II, MATLAB, Multisim, Silvaco
Languages: English (fluent), Mandarin (native), Cantonese (native)