Zhe Feng

I'm currently a graduate student at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). I am interested in Deep Learning, with a special focus on Computer Vision and Autonomous Driving. My recent work focuses on Efficient Transformer Theory and Vision-based 3D representation learning.

Email  /  GitHub  /  Zhihu

profile photo

Research

project image

GeoROS++: A Georeferenced Real-Time Stitching Method for Sparse Aerial Orthophotos


Jiaming Gu*, Zhe Feng*, Weiliang Meng, Guangze Gao, Jianwei Guo, Jiguang Zhang, Xiaopeng Zhang
ACM SIGGRAPH Asia Conference, 2025

*Equal contribution.

project image

AccidentX: A Large-Scale Multimodal BEV Dataset for Traffic Accident Analysis and Prevention


Muyang Zhang, Zhe Feng, JinMing Yang, Mingda Jia, Weiliang Meng, Wenxuan Wu, Jiguang Zhang, Xiaopeng Zhang
IROS, 2025 (Oral Presentation)

We introduce AccidentX, a large-scale multimodal dataset for traffic accident analysis and prevention in autonomous driving. It contains over 10,000 BEV videos from CARLA with rich annotations, offering seven times more frames than nuScenes. Leveraging VLMs and GPT-4o, AccidentX enables comprehensive scene understanding and establishes benchmarks for advanced MLLMs. The dataset will be fully open-sourced to support research in driving safety.




Projects

project image

Radar-Camera Fusion Object Detection System


code

This project implements a real-time radar-camera fusion system for robust object detection in complex environments. It leverages YOLOv8 for visual inference, with CUDA-accelerated pre- and post-processing and TensorRT-based deployment for high-performance execution.

project image

Let's build a neural network from scratch together.


code

A simple neural network written in C++17 with the Eigen library, supporting both forward and backward propagation.

project image

A White Paper on Neural Network Deployment


code

This open-source project aims to bridge the gap between deep learning theory and practical deployment, focusing on deploying neural network models efficiently on NVIDIA hardware platforms.

project image

Build CUDA Neural Network From Scratch


code

This project is a CUDA-based neural network implementation, developed from scratch with performance optimizations and modifications.

project image

TorchStat2: PyTorch Model Analyzer


code

TorchStat2 is a comprehensive and lightweight neural network analyzer for PyTorch models. It reports detailed statistics for a network, including computational complexity, memory usage, and performance.




Open Source Contributions

I actively contribute to open-source projects in the deep learning, CUDA programming, and model deployment communities.

CUDA-Learn-Notes


repository
⭐ 8.3k   🍴 820

A comprehensive collection of 200+ Tensor Core and CUDA Core kernels, featuring flash-attention-mma and hgemm implementations with WMMA, MMA, and CuTe that reach 98% to 100% of the TFLOPS of cuBLAS/FA2.

Efficient-AI-Backbones


repository
⭐ 4.3k   🍴 733

A collection of efficient AI backbone networks from Huawei Noah’s Ark Lab. These lightweight architectures are designed for mobile and edge deployment scenarios with state-of-the-art performance.

tensorrt_starter


repository
⭐ 286   🍴 75

A comprehensive guide to learning CUDA and TensorRT from scratch.

tensorrtx


repository
⭐ 7.5k   🍴 1.9k

Implementations of popular deep learning networks using the TensorRT network definition API. Provides optimized inference solutions for various architectures, enabling high-performance deployment on NVIDIA hardware.




Internships

2025.02 - present, Hirain Academia Sinica, Beijing, China.

2024.09 - 2025.01, Tsinghua Lion Team, Tsinghua University.