About Ravi Netravali

Associate professor of computer science at Princeton. Works on systems for ML, LLM serving, video systems, and networked systems. Runs the SNS (Systems and Networking Systems) Group. Has published numerous PLDI/OSDI/NSDI papers.

Ravi Netravali is a Princeton CS associate professor leading the SNS Group, specializing in systems for ML, LLM serving, video systems, and networked systems with numerous PLDI/OSDI/NSDI papers. His thinking emphasizes application-aware, data-driven optimizations that bridge domain-specific insights with low-level systems design to overcome inefficiencies in distributed, resource-constrained environments. Recurring motifs include hardware-software co-design, speculative/passive learning techniques, and dynamic adaptation to variability in workloads, networks, and user behaviors.

Biography

Ravi Netravali is an associate professor of Computer Science at Princeton University, directing the Systems and Networking Systems (SNS) Group. His research spans systems for machine learning (ML), large language model (LLM) serving, video systems, and networked systems, with contributions to top venues like PLDI, OSDI, and NSDI. [bio]

LLM Systems and Inference Optimization

Netravali's work on LLM infrastructure tackles efficiency bottlenecks in serving, inference, and agentic workflows. FailFast uses diffusion LLMs for speculative decoding, dynamically adjusting speculation lengths to balance speed and quality.[5] Aragog and its dynamic just-in-time model routing decouple configuration selection for scalable agentic workflows, improving throughput under fluctuating loads.[6][7] LessIsMore introduces training-free sparse attention via cross-head token aggregation, delivering a 1.13× end-to-end reasoning speedup without accuracy loss.[8][9] SpecReason accelerates large reasoning models (LRMs) with semantic speculative reasoning on thinking tokens.[10][13] METIS optimizes retrieval-augmented generation (RAG) through adaptive scheduling and configuration.[18][19] Marconi improves prefix caching for hybrid LLMs with reuse-aware policies.[20][21]
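To make the idea of speculative decoding with an adjustable speculation length concrete, here is a minimal sketch of a generic draft-then-verify loop. It is not FailFast's algorithm: the `target_next` and `draft_next` callables, the halve-on-miss and grow-on-hit rule, and the `k_init`/`k_min`/`k_max` parameters are all assumptions made for illustration. A real system would verify all drafted tokens in one batched forward pass rather than one call per token.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens,
                       k_init=4, k_min=1, k_max=8):
    """Greedy draft-then-verify decoding with an adaptive draft length.

    target_next / draft_next are hypothetical callables that map a list of
    tokens to the next token (expensive model vs. cheap drafter).
    """
    tokens = list(prompt)
    k = k_init
    produced = 0
    while produced < max_new_tokens:
        # Draft up to k tokens with the cheap model.
        budget = min(k, max_new_tokens - produced)
        draft = []
        for _ in range(budget):
            draft.append(draft_next(tokens + draft))

        # Verify: keep the longest prefix the target model agrees with.
        accepted = 0
        for i, tok in enumerate(draft):
            if target_next(tokens + draft[:i]) == tok:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])
        produced += accepted

        if accepted < len(draft):
            # Mismatch: emit the target's own token so progress is guaranteed,
            # then speculate less aggressively (assumed policy).
            tokens.append(target_next(tokens))
            produced += 1
            k = max(k_min, k // 2)
        else:
            # Every drafted token was accepted: speculate a little further.
            k = min(k_max, k + 1)
    return tokens
```

Because every emitted token is either confirmed or produced by the target model, the output matches plain greedy decoding with the target; the drafter only changes how many target calls can be batched together.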

Video Systems and Conferencing

A major focus is real-time video, from rate control to hardware acceleration. Mowgli applies offline RL to production telemetry for rate control, boosting bitrates by 15-39% over GCC (Google Congestion Control) without online exploration.[2][22][23] Scallop offloads SFU (selective forwarding unit) media operations to Tofino ASICs in an SDN-style design, scaling 210× and cutting latency 26× relative to commodity servers.[14][15] Other work includes Dashlet, which handles swipe uncertainty to improve short-video QoE,[30] MadEye for PTZ camera orientation,[26] and Legilimens for continuous learning on edge SoCs.[4][11]

Networked Systems and Hardware Offload

Netravali reframes networks as active compute resources. SmartNICs are positioned for AI inference offloading due to packet-processing alignment.[16][17] ABC enables precise congestion control on wireless links with single-bit feedback.[40][41] Application-centric design integrates app insights into networking.[24] Scallop exemplifies SDN-inspired hardware for video SFUs.[14][15]
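As a rough illustration of how a single feedback bit per packet can steer a sender, the sketch below has a router mark each packet "accelerate" or "brake" and a sender that grows or shrinks its window by one packet per ACK. The function names, the accelerate/brake naming, and the marking rule (half the ratio of target rate to current dequeue rate, capped at 1) are assumptions for this example and are not taken from this page.

```python
def mark_fraction(target_rate, dequeue_rate):
    """Fraction of packets the router marks 'accelerate' (assumed rule).

    A fraction of 0.5 holds the sending rate steady, 1.0 lets it double
    in one round trip, and values below 0.5 shrink it.
    """
    if dequeue_rate <= 0:
        return 1.0  # link is idle, so let the sender ramp up
    return min(1.0, 0.5 * target_rate / dequeue_rate)


def update_window(cwnd, accelerate, min_cwnd=1):
    """Sender reaction to one ACK that echoes the router's bit."""
    if accelerate:
        return cwnd + 1            # one ACK releases two packets
    return max(min_cwnd, cwnd - 1)  # one ACK releases none


# Usage: an even mix of accelerate and brake marks leaves the window unchanged.
window = 10
for i in range(10):
    window = update_window(window, accelerate=(i % 2 == 0))
```

The appeal of this design is that the router never has to send a multi-bit rate to the sender: the proportion of marked packets carries the information, which lets the sender track a link whose capacity changes quickly.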

Distributed Systems Debugging and Observability

Snicket provides query-driven tracing for microservices,[1] Lumos uses provenance-guided debugging,[3] and Revelio generates ML-assisted debugging queries.[34]

Edge and Resource-Constrained ML

These systems address memory and compute limits: GEMEL merges models for edge video analytics,[33] Apparate uses early exits for ML inference,[25] Bamboo fills pipeline bubbles for preemptible DNN training,[31] and Legilimens enables on-device continuous learning.[4][11]
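The early-exit idea mentioned above can be sketched in a few lines: run a model stage by stage and stop as soon as an intermediate classifier is confident enough. This is a generic illustration, not Apparate's design; `stages`, `heads`, and the fixed `threshold` are hypothetical names and choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, stages, heads, threshold=0.9):
    """Return (predicted_class, stages_run).

    stages: list of callables, each transforming the hidden value.
    heads:  list of callables, one per stage, mapping the hidden value to logits.
    Exits at the first stage whose top probability reaches the threshold;
    the final stage always answers. Assumes at least one stage.
    """
    h = x
    for depth, (stage, head) in enumerate(zip(stages, heads), start=1):
        h = stage(h)
        probs = softmax(head(h))
        confidence = max(probs)
        if confidence >= threshold or depth == len(stages):
            return probs.index(confidence), depth
```

Easy inputs leave after the first stage and skip the remaining compute, while hard inputs pay for the full model; the threshold trades accuracy against latency.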

Privacy, Security, and Other Systems

Privid introduces duration-based differential privacy for video analytics.[29][35] Guillotine proposes hardware-software co-design for AI containment.[12] MARVOLO provides semantics-preserving data augmentation for ML malware detection,[28] Canvas isolates swap paths in remote memory systems,[32] Dorylus scales GNN training on distributed CPUs,[37] and Gringotts incentivizes peer participation in P2P video delivery with monetary rewards and cryptocurrency-enabled proofs.[42]

Web and Miscellaneous

Jawa, a JavaScript-aware crawler, cuts web archive storage by 41%.[27] Khameleon uses progressive prefetching to achieve sub-100 ms latencies in network-bottlenecked data visualization and exploration (DVE) apps.[39] Boggart accelerates general video analytics with imprecise pre-filtering.[36] SAMU offloads GC tracing to memory servers in disaggregated datacenters.[38]

Dynamic Adaptation and Optimization

Systems that adjust configurations, routing, or computations in real-time based on workload variability, resource constraints, or telemetry.

  • Aragog decouples routing for agentic workflows [6][7]

  • METIS adapts RAG configs for latency-quality [18][19]

  • Scallop SDN-style offload for video [14][15]

Speculative and Passive Learning Techniques

Avoiding costly online training via speculation, offline RL from logs, or passive telemetry to boost performance safely.

  • FailFast speculative decoding with dLLMs [5]

  • Mowgli offline RL for video rate control [2][22][23]

  • SpecReason semantic speculation for LRMs [10][13]

Hardware-Software Co-Design and Offloading

Leveraging SmartNICs, ASICs, and disaggregated memory to offload compute from general-purpose CPUs/GPUs.

  • SmartNICs for AI pipelines [16][17]

  • Scallop Tofino SFUs [14][15]

  • SAMU GC offload to memory servers [38]

Application-Centric Systems Design

Integrating app-level insights (e.g., video swipes, reasoning tokens) into low-level optimizations.

  • Application-centric networking talk [24]

  • Dashlet swipe pre-buffering [30]

  • LessIsMore cross-head attention [8][9]

Efficiency in Resource-Constrained Environments

Memory and compute optimizations for edge, distributed training, and serving under constraints.

  • Legilimens edge continual learning [4][11]

  • GEMEL model merging [33]

  • Bamboo preemptible training [31]

Observability and Debugging

Query-driven, provenance-guided tools for production distributed systems.

  • Snicket query-driven tracing [1]

  • Lumos provenance debugging [3]

  • Revelio ML query generation [34]

Video Analytics and Privacy

Optimizing accuracy and privacy in live and retrospective video processing.

  • MadEye PTZ optimization [26]

  • Privid duration-DP [29][35]

  • Boggart imprecise pre-filtering [36]

Every entry that fed the multi-agent compile above. Inline citation markers in the wiki text (like [1], [2]) are not yet individually linked to specific sources — this is the full set of sources the compile considered.

  1. Snicket: A Query-Driven Distributed Tracing System for Microservices · youtube · 2026-04-06
  2. Mowgli: Offline RL for Real-time Video Rate Control · youtube · 2026-04-06
  3. Lumos: Provenance-Guided Debugging for Distributed Systems · paper · 2026-03-30
  4. Legilimens: Continuous Learning for Mobile Edge Video Analytics · blog · 2026-01-01
  5. FailFast: Optimizing Speculative Decoding with Diffusion LLMs for Enhanced LLM Acceleration · paper · 2025-12-23
  6. Dynamic Just-in-Time Model Routing for Scalable Agentic Workflows · blog · 2025-11-30
  7. Aragog: Dynamic LLM Configuration for Agentic Workflows · paper · 2025-11-26
  8. LessIsMore: Training-Free Sparse Attention for Efficient LLM Reasoning · blog · 2025-08-31
  9. Training-Free Sparse Attention via Cross-Head Token Aggregation Cuts Reasoning Latency Without Accuracy Loss · paper · 2025-08-09
  10. Accelerating LRM Inference via Semantic Speculative Reasoning · blog · 2025-04-30
  11. Legilimens: Continuous Learning for On-Device Video Analytics on Mobile Edge SoCs · paper · 2025-04-29
  12. Guillotine: Hardware-Software Co-Design for Existential AI Containment · paper · 2025-04-22
  13. Speculative Reasoning for Faster LRM Inference · paper · 2025-04-10
  14. SDN-Inspired Hardware-Offloaded SFUs Cut Video Conferencing Latency 26x and Scale 210x Over Commodity Servers · blog · 2025-03-31
  15. Scallop: Hardware-Accelerated SFUs for Scalable Video Conferencing · paper · 2025-03-14
  16. SmartNICs as First-Class Compute in AI Inference Pipelines: A Case for Network-Side Offloading · blog · 2025-02-28
  17. SmartNICs as First-Class Compute in AI Inference Pipelines: A Case for Network-Side Offloading · paper · 2025-01-22
  18. METIS: Optimizing RAG for Latency and Quality · blog · 2024-12-31
  19. METIS: Optimizing RAG System Performance through Adaptive Configuration and Scheduling · paper · 2024-12-13
  20. Marconi: Optimized Prefix Caching for Hybrid LLMs · blog · 2024-11-30
  21. Marconi: Optimizing Prefix Caching for Hybrid LLMs · paper · 2024-11-28
  22. Mowgli: passively learned rate control for real-time video · blog · 2024-10-31
  23. Mowgli Enables Production-Ready Data-Driven Rate Control via Passive Telemetry Learning · paper · 2024-10-04
  24. Application-Centric Network Design: Bridging the Gap for Enhanced Performance · youtube · 2023-12-22
  25. Apparate Harnesses Early Exits to Decouple Latency from Throughput in ML Inference · paper · 2023-12-08
  26. MadEye Dynamically Optimizes PTZ Camera Orientations to Maximize Live Video Analytics Accuracy · paper · 2023-04-04
  27. JavaScript-Aware Crawler Jawa Cuts Web Archive Storage by 41% While Eliminating Fidelity Loss · youtube · 2022-09-22
  28. MARVOLO Enables Efficient Semantics-Preserving Augmentation for ML Malware Detection · paper · 2022-06-07
  29. Privid Enables Privacy-Preserving Video Analytics via Rho Event Duration Privacy · youtube · 2022-05-19
  30. Dashlet Tames Swipe Uncertainty with Statistical Pre-Buffering for Superior Short Video QoE · paper · 2022-04-27
  31. Bamboo Enables Resilient Preemptible Training of Large DNNs by Filling Pipeline Bubbles with Redundant Computation · paper · 2022-04-26
  32. Canvas Isolates Swap Paths to Eliminate Multi-App Interference in Remote Memory Systems · paper · 2022-03-17
  33. Model Merging Overcomes Edge GPU Memory Limits for Concurrent Video Analytics · paper · 2022-01-19
  34. Revelio ML Assistant Generates Effective Debugging Queries for Distributed Systems · paper · 2021-06-28
  35. Privid Introduces Duration-Based Differential Privacy for Robust Video Analytics · paper · 2021-06-22
  36. Boggart Enables General-Purpose Video Analytics Acceleration with Indexed Imprecise Pre-Filtering · paper · 2021-06-21
  37. Dorylus Enables Scalable GNN Training on Distributed CPUs with Serverless Threads for Billion-Edge Graphs · paper · 2021-05-24
  38. SAMU: Offloading GC Tracing to Memory Servers Boosts Managed Workloads in Disaggregated Datacenters · youtube · 2020-11-18
  39. Khameleon Achieves Sub-100ms Latencies in Network-Bottlenecked DVE Apps via Progressive Prefetching and Greedy Scheduling · paper · 2020-07-15
  40. ABC: Single-Bit Feedback Enables Precise Congestion Control for Time-Varying Wireless Links · youtube · 2020-03-25
  41. ABC: Simple Explicit Congestion Control Excels on Wireless Networks · paper · 2019-05-09
  42. Monetary Incentives and Cryptocurrency-Enabled Proofs Unlock Peer Participation in P2P Video Delivery · paper · 2018-08-02