Junfan Zhu junfanz1

Junfan Zhu 👋

🤗 Founder & Principal Curator of Saturday Robotics, Silicon Valley’s high-signal Robotics & World Models community, connecting frontier researchers, founders, and investors across embodied intelligence.

Physical AI researcher working on World-Action Models, sim-to-real transfer, and cross-embodiment policy learning.

Building evaluation-centric embodied AI systems spanning world models, agentic reasoning, and real-world deployment.

Master’s in CS from Georgia Tech and in Mathematics from UChicago; studied part-time at Stanford GSB. Previously a Machine Learning Quant Researcher in Chicago.

A long-term thinker, resilient collaborator, and builder of high-impact AI systems.

X: https://x.com/junfanzhu98

GitHub (1.6k⭐️): https://github.com/junfanz1/

📄 Publications

Agents Last Exam (ALE): Benchmarking Long-Horizon AI Agents [NeurIPS 2026] · Contributed to the large-scale AI evaluation infrastructure behind a benchmark of 1K+ tasks led by 300+ domain experts.
🚗 IEDD: An Interactive Enhanced Driving Dataset for Autonomous Driving [Scientific Data 2026]
🌲 As autonomous driving evolves toward vision-language-action (VLA) models, sparse interactive scenarios and weak multimodal alignment remain critical bottlenecks. Existing datasets are heavily biased toward straight-line cruising and severely under-represent long-tail interactive events (cut-in, merging, pedestrian crossing, head-on avoidance). IEDD introduces a physics-aware, interaction-dense dataset (plus the IEDD-VQA multimodal extension) mined from 7.31M ego-centric scenes across Waymo, nuPlan, Lyft, INTERACTION, and SIND, with 91% multi-agent interactions, dual Intensity–Efficiency metrics, pixel-level BEV-video alignment, rule-based hallucination-free language, and hierarchical L1–L4 VLM benchmarking. 🌍 It lays a scalable, causality-grounded foundation for evolving general-purpose VLMs into capable autonomous driving experts. 🤗 HuggingFace, LinkedIn, X.
📊 QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models [ACL 2026]
🧪 Evaluation and domain knowledge are the core bottlenecks of Quant + AI. Without expert-level, strong verifiers for evaluation, models cannot reliably assess performance in multi-step strategy generation, risk control, or real-world trading effectiveness. QuantEval is proposed in this context, providing a reproducible benchmark framework that goes beyond static question answering and shifts toward evaluation grounded in realistic trading details. It represents an initial exploration of evaluating financial “World Models.” 🌍

🏆 Awards

Finalist & Track Winner, 🏆 Y Combinator Hackathon 2025
Inspired by Isaac Asimov’s Foundation, PsychoHistory is a probabilistic forecasting system that maps the branching futures of human events—combining history, data, and AI to model the flow of possibility. 🧠 Our approach blended SFT+RL, training the model not just what to predict but how to reason across alternative futures—like a psychohistorian trained on uncertainty itself.
Meritorious Winner, Mathematical Contest in Modeling.
Finalist, Asia Supercomputer Challenge.
Top 10 Algo Trader, Rotman International Trading Competition.
Outstanding Thesis (1%).

Professional Services

Invited Reviewer, ACM Conf (AI Agentic Systems), 2026. Nominated by committee for research contributions.
Program-Committee-Equivalent Curator, Saturday Robotics Reading Club—top Bay Area robotics ecosystem.

🚀 AI Engineering Portfolio

My portfolio features projects in MoE and attention mechanisms for scalable LLMs, reflective multi-agent orchestration, and full-stack GenAI applications.

1. Awesome-AI-Engineer-Review
In-depth review of industry trends in AI, LLMs, Machine Learning, Computer Science, and Quantitative Finance.
- 2025 NVIDIA GTC Conference − Technical & Industrial Insight
- 2025 Agentic AI Summit Berkeley − Technical & Industrial Insight
2. Agentic RL: GRPO Reinforcement Learning for Agentic Search in LLMs
Search-R1 leverages Group Relative Policy Optimization (GRPO) to fine-tune LLMs at the token level, enabling stable reinforcement learning over multi-step search–reasoning trajectories. The model learns adaptive retrieval policies, deciding when to trigger searches and integrating results into its reasoning context for more precise answers.
3. MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention
Memory-efficient multi-head latent attention in PyTorch that uses low-rank approximation and decoupled rotary positional embeddings to compress key–value representations, reducing inference memory while maintaining high performance in long-context language models.
4. DeepSeek-MoE-Mixture-of-Experts-in-PyTorch
Implemented a scalable 8-expert MoE model with top-k routing, expert load balancing, and capacity-aware gating; supports parallel sparse activation and DeepSeek-R1-style distributed training.
5. MCP-MultiServer-Interoperable-Agent2Agent-LangGraph-AI-System
A decoupled real-time agent architecture connecting LangGraph agents to remote tools served by custom MCP servers via SSE and STDIO, enabling a scalable multi-agent system for LLM workflows. The design supports flexible multi-server connectivity and lays the groundwork for an Agent2Agent protocol, fostering seamless, cloud-deployable interoperability across diverse AI systems.
6. LangGraph-Reflection-Researcher
Engineered a LangGraph-based multi-agent system with self-reflection and retrieval-grounded alignment; integrated LangSmith tracing for reasoning introspection, cutting hallucinations by 40% with iterative expert routing.
7. Cognito-LangGraph-RAG-Chatbot
An advanced Retrieval-Augmented Generation (RAG) chatbot that uses LangGraph to improve answer accuracy and reduce hallucinations in LLM outputs.
8. Cursor-FullStack-AI-App
Cursor Vibe Engineering: Full-stack micro SaaS AI application that processes GitHub URLs to generate insightful JSON reports powered by AI analytics.
9. Cryptocurrency-Blockchain-FullStack
Comprehensive decentralized blockchain platform demonstrating practical applications of core blockchain concepts through a modular, full-stack approach.
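
For readers new to GRPO (project 2 above), the group-relative advantage behind it can be sketched in a few lines. This is an illustration only, not code from the repository: the function name `group_relative_advantages`, the `eps` value, the use of population standard deviation, and the sample rewards are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled completion relative to its own group.

    GRPO samples several completions for one prompt and replaces a learned
    value baseline with the group's mean reward, scaled by the group's
    standard deviation. Population std is an assumption here; some
    implementations use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt,
# e.g. 1.0 when the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.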

Favorite project integrating Generative AI, Humanoid Robotics (RLHF), and Low-Altitude Economy.
