Zhe Feng

I'm currently a graduate student at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). I am interested in Deep Learning, with a special focus on Computer Vision and Autonomous Driving. My recent work focuses on Efficient Transformer Theory and Vision-based 3D representation learning.

Email / GitHub / Zhihu

Research

GeoROS++: A Georeferenced Real-Time Stitching Method for Sparse Aerial Orthophotos

Jiaming Gu^*, Zhe Feng^*, Weiliang Meng, Guangze Gao, Jianwei Guo, Jiguang Zhang, Xiaopeng Zhang
ACM SIGGRAPH Asia Conference, 2025

^*Equal contribution.

AccidentX: A Large-Scale Multimodal BEV Dataset for Traffic Accident Analysis and Prevention

Muyang Zhang, Zhe Feng, JinMing Yang, Mingda Jia, Weiliang Meng, Wenxuan Wu, Jiguang Zhang, Xiaopeng Zhang
IROS, 2025 (Oral Presentation)

We introduce AccidentX, a large-scale multimodal dataset for traffic accident analysis and prevention in autonomous driving. It contains over 10,000 BEV videos from CARLA with rich annotations, offering seven times more frames than nuScenes. Leveraging VLMs and GPT-4o, AccidentX enables comprehensive scene understanding and establishes benchmarks for advanced MLLMs. The dataset will be fully open-sourced to support research in driving safety.

Projects

	Radar-Camera Fusion Object Detection System (code). This project implements a real-time radar-camera fusion system for robust object detection in complex environments. It uses YOLOv8 for visual inference, with CUDA-accelerated pre- and post-processing and TensorRT-based deployment for high-performance execution.
	Let's build a neural network from scratch together (code). A simple neural network written in C++17 with the Eigen library, supporting both forward and backward propagation.
	A White Paper on Neural Network Deployment (code). This open-source project aims to bridge the gap between deep learning theory and practical deployment, focusing on deploying neural network models efficiently on NVIDIA hardware platforms.
	Build CUDA Neural Network From Scratch (code). A CUDA-based neural network implementation, developed from scratch with performance optimizations and modifications.
	TorchStat2: PyTorch Model Analyzer (code). TorchStat2 is a comprehensive and lightweight neural network analyzer for PyTorch models. It provides detailed statistics about your neural networks, including computational complexity, memory usage, and performance analysis.

Open Source Contributions

I actively contribute to open source projects in deep learning, CUDA programming, and model deployment communities.

CUDA-Learn-Notes

repository
⭐ 8.3k 🍴 820

A comprehensive collection of 200+ Tensor/CUDA Cores kernels, featuring flash-attention-mma and hgemm with WMMA, MMA, and CuTe implementations that reach 98% to 100% of the TFLOPS of cuBLAS/FA2.

Efficient-AI-Backbones

repository
⭐ 4.3k 🍴 733

A collection of efficient AI backbone networks from Huawei Noah’s Ark Lab. These lightweight architectures are designed for mobile and edge deployment scenarios with state-of-the-art performance.

tensorrt_starter

repository
⭐ 286 🍴 75

A comprehensive guide repository for learning CUDA and TensorRT from the beginning.

tensorrtx

repository
⭐ 7.5k 🍴 1.9k

Implementation of popular deep learning networks with TensorRT network definition API. Provides optimized inference solutions for various architectures, enabling high-performance deployment on NVIDIA hardware.

Internships

2025.02 - present, Hirain Academia Sinica, Beijing, China.

2024.09 - 2025.01, Tsinghua Lion Team, Tsinghua University.
