|
GeoROS++: A Georeferenced Real-Time Stitching Method for Sparse Aerial Orthophotos
Jiaming Gu*, Zhe Feng*, Weiliang Meng, Guangze Gao, Jianwei Guo, Jiguang Zhang, Xiaopeng Zhang
ACM SIGGRAPH Asia Conference, 2025
*Equal contribution.
|
|
AccidentX: A Large-Scale Multimodal BEV Dataset for Traffic Accident Analysis and Prevention
Muyang Zhang, Zhe Feng, JinMing Yang, Mingda Jia, Weiliang Meng, Wenxuan Wu, Jiguang Zhang, Xiaopeng Zhang
IROS, 2025
(Oral Presentation)
We introduce AccidentX, a large-scale multimodal dataset for traffic accident analysis and prevention in autonomous driving. It contains over 10,000 richly annotated BEV videos generated in the CARLA simulator, with seven times as many frames as nuScenes. Leveraging vision-language models (VLMs) and GPT-4o, AccidentX supports comprehensive scene understanding and establishes benchmarks for advanced multimodal large language models (MLLMs). The dataset will be fully open-sourced to support research on driving safety.
|
|
Radar-Camera Fusion Object Detection System
code
This project implements a real-time radar-camera fusion system for robust object detection in complex environments. It leverages YOLOv8 for visual inference, with CUDA-accelerated pre- and post-processing and TensorRT-based deployment for high-performance execution.
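For readers unfamiliar with the post-processing stage, the sketch below shows greedy, class-aware non-maximum suppression (NMS) as a plain C++ CPU reference. It is not the project's CUDA kernel. The `Box` struct, the `nms` function, and the thresholds are hypothetical, and the sketch assumes boxes have already been decoded from the YOLOv8 output tensor into corner coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical detection: (x1, y1) top-left, (x2, y2) bottom-right, in pixels.
struct Box {
    float x1, y1, x2, y2;
    float score;
    int cls;
};

// Intersection-over-union of two boxes; 0 when they do not overlap.
float iou(const Box& a, const Box& b) {
    float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = iw * ih;
    float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drop boxes below confThresh, then repeatedly keep the highest-scoring box and
// suppress same-class boxes whose IoU with it exceeds iouThresh.
std::vector<Box> nms(std::vector<Box> boxes, float confThresh, float iouThresh) {
    assert(iouThresh >= 0.0f && iouThresh <= 1.0f);
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [&](const Box& b) { return b.score < confThresh; }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    std::vector<bool> removed(boxes.size(), false);
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (removed[i]) continue;
        kept.push_back(boxes[i]);
        for (std::size_t j = i + 1; j < boxes.size(); ++j) {
            if (!removed[j] && boxes[j].cls == boxes[i].cls &&
                iou(boxes[i], boxes[j]) > iouThresh) {
                removed[j] = true;
            }
        }
    }
    return kept;
}
```

The inner loop is quadratic in the number of candidates, which is one reason GPU pipelines often move this step (or a bitmask variant of it) onto the device.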
|
|
Let's build a neural network from scratch together.
code
A simple neural network written to the C++17 standard with the Eigen library, supporting both forward and backward propagation.
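As a rough illustration of what forward and backward propagation involve, here is a minimal fully connected layer in C++17. It is a hypothetical sketch, not code from the project: it uses `std::vector` instead of Eigen so it has no dependencies, and the names `Dense`, `mseGrad`, and `sgdStep` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fully connected layer y = W x + b (no activation).
struct Dense {
    int in, out;
    std::vector<double> W;   // row-major, out x in
    std::vector<double> b;   // out
    std::vector<double> x;   // input cached by forward() for use in backward()
    std::vector<double> dW, db;

    Dense(int in_, int out_)
        : in(in_), out(out_), W(out_ * in_, 0.0), b(out_, 0.0), dW(out_ * in_, 0.0), db(out_, 0.0) {}

    std::vector<double> forward(const std::vector<double>& input) {
        assert(static_cast<int>(input.size()) == in);
        x = input;
        std::vector<double> y(out);
        for (int o = 0; o < out; ++o) {
            double s = b[o];
            for (int i = 0; i < in; ++i) s += W[o * in + i] * x[i];
            y[o] = s;
        }
        return y;
    }

    // Given dL/dy, store dL/dW = dy * x^T and dL/db = dy, and return dL/dx = W^T dy.
    std::vector<double> backward(const std::vector<double>& dy) {
        std::vector<double> dx(in, 0.0);
        for (int o = 0; o < out; ++o) {
            db[o] = dy[o];
            for (int i = 0; i < in; ++i) {
                dW[o * in + i] = dy[o] * x[i];
                dx[i] += W[o * in + i] * dy[o];
            }
        }
        return dx;
    }

    // Plain stochastic gradient descent update.
    void sgdStep(double lr) {
        for (std::size_t k = 0; k < W.size(); ++k) W[k] -= lr * dW[k];
        for (std::size_t k = 0; k < b.size(); ++k) b[k] -= lr * db[k];
    }
};

// Gradient dL/dy of the mean squared error L = (1/n) * sum_k (y_k - t_k)^2.
std::vector<double> mseGrad(const std::vector<double>& y, const std::vector<double>& t) {
    std::vector<double> g(y.size());
    for (std::size_t k = 0; k < y.size(); ++k) g[k] = 2.0 * (y[k] - t[k]) / y.size();
    return g;
}
```

Caching the input in `forward` is what lets `backward` form the weight gradient; with Eigen, the same loops collapse into matrix-vector products.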
|
|
A White Paper on Neural Network Deployment
code
This open-source project aims to bridge the gap between deep learning theory and practical deployment, focusing on deploying neural network models efficiently on NVIDIA hardware platforms.
|
|
Build CUDA Neural Network From Scratch
code
This project is a CUDA-based neural network implementation, developed from scratch with performance optimizations and modifications.
|
|
TorchStat2: PyTorch Model Analyzer
code
TorchStat2 is a comprehensive and lightweight neural network analyzer for PyTorch models. It provides detailed statistics about your neural networks, including computational complexity, memory usage, and performance analysis.
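Complexity statistics of this kind usually come from closed-form per-layer formulas. A minimal sketch for a 2-D convolution follows. The struct and function names are hypothetical and are not TorchStat2's API, and conventions differ between tools (for example, whether bias additions are counted, or whether FLOPs are reported as 2 x MACs).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a 2-D convolution layer.
struct Conv2dSpec {
    int64_t inC, outC;    // input / output channels
    int64_t kH, kW;       // kernel height / width
    int64_t outH, outW;   // output spatial size
    int64_t groups = 1;
    bool bias = true;
};

// Learnable parameters: outC * (inC / groups) * kH * kW weights, plus outC biases.
int64_t conv2dParams(const Conv2dSpec& c) {
    assert(c.inC % c.groups == 0 && c.outC % c.groups == 0);
    int64_t weights = c.outC * (c.inC / c.groups) * c.kH * c.kW;
    return weights + (c.bias ? c.outC : 0);
}

// Multiply-accumulates: each output element needs (inC / groups) * kH * kW MACs.
// Bias additions are ignored in this count.
int64_t conv2dMacs(const Conv2dSpec& c) {
    return c.outC * c.outH * c.outW * (c.inC / c.groups) * c.kH * c.kW;
}
```

For a 3x3, 64-to-64-channel convolution without bias on a 56x56 output, this gives 36,864 parameters and 115,605,504 MACs; setting `groups = 64` (depthwise) cuts the parameters to 576.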
|
Open Source Contributions
I actively contribute to open source projects in deep learning, CUDA programming, and model deployment communities.
|
CUDA-Learn-Notes
repository
⭐ 8.3k
🍴 820
A comprehensive collection of 200+ Tensor Core and CUDA Core kernels, featuring flash-attention-mma and hgemm implemented with WMMA, MMA, and CuTe, reaching 98%–100% of the TFLOPS of cuBLAS/FA2.
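Percent-of-cuBLAS figures like these are typically derived from measured throughput. The sketch assumes the standard operation count for an M x N x K GEMM, 2·M·N·K floating-point operations (one multiply and one add per term); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Achieved TFLOPS for an M x N x K GEMM that took `seconds` to run.
double gemmTflops(int64_t M, int64_t N, int64_t K, double seconds) {
    assert(seconds > 0.0);
    double flops = 2.0 * static_cast<double>(M) * N * K;
    return flops / seconds / 1e12;
}

// Efficiency of a custom kernel relative to a reference (e.g. cuBLAS) on the
// same shape; equal to the ratio of their TFLOPS because the FLOP count cancels.
double relativeEfficiency(double kernelSeconds, double referenceSeconds) {
    assert(kernelSeconds > 0.0);
    return referenceSeconds / kernelSeconds;
}
```

For example, a 1000 x 1000 x 1000 GEMM finishing in 1 ms achieves 2 TFLOPS, and a kernel taking 1.02 ms where the reference takes 1.00 ms runs at about 98% of the reference.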
|
Efficient-AI-Backbones
repository
⭐ 4.3k
🍴 733
A collection of efficient AI backbone networks from Huawei Noah’s Ark Lab. These lightweight architectures are designed for mobile and edge deployment scenarios with state-of-the-art performance.
|
tensorrt_starter
repository
⭐ 286
🍴 75
A comprehensive guide to learning CUDA and TensorRT from the ground up.
|
tensorrtx
repository
⭐ 7.5k
🍴 1.9k
Implementation of popular deep learning networks with TensorRT network definition API. Provides optimized inference solutions for various architectures, enabling high-performance deployment on NVIDIA hardware.
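Networks built with the TensorRT network definition API need their weights supplied separately, and tensorrtx exports them from PyTorch into a plain-text `.wts` file. The parser below assumes the layout used by the repository's weight-export scripts: a tensor count, then one line per tensor giving its name, its element count, and each float32 value as a hexadecimal bit pattern. `loadWts` and `loadWtsFromString` are hypothetical names, not the repository's own loader.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

// Assumed .wts layout:
//   <numTensors>
//   <name> <count> <hex> <hex> ...   (one line per tensor)
std::map<std::string, std::vector<float>> loadWts(std::istream& in) {
    std::map<std::string, std::vector<float>> weights;
    int64_t numTensors = 0;
    in >> std::dec >> numTensors;
    for (int64_t t = 0; t < numTensors && in; ++t) {
        std::string name;
        int64_t count = 0;
        in >> name >> std::dec >> count;  // count is decimal
        std::vector<float> values(static_cast<std::size_t>(count));
        for (int64_t i = 0; i < count; ++i) {
            uint32_t bits = 0;
            in >> std::hex >> bits;                          // values are hex
            std::memcpy(&values[i], &bits, sizeof(float));   // reinterpret bits as float32
        }
        weights[name] = std::move(values);
    }
    return weights;
}

// Convenience overload for in-memory text, e.g. in unit tests.
std::map<std::string, std::vector<float>> loadWtsFromString(const std::string& text) {
    std::istringstream ss(text);
    return loadWts(ss);
}
```

Storing raw bit patterns avoids any precision loss from decimal formatting; `memcpy` is used rather than a pointer cast to stay within C++ aliasing rules.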
|
Internships
2025.02 - present, Hirain Academia Sinica, Beijing, China.
2024.09 - 2025.01, Tsinghua Lion Team, Tsinghua University.
|
|