absorb.md

About Together AI

Open-source AI cloud and inference infrastructure. Co-founded by Vipul Ved Prakash. $305M Series B. Major player in open-model inference.

Together AI operates an open-source inference platform for large language models, backed by $305M in Series B funding. It shapes infrastructure discussions around model efficiency and multimodal inference through frequent technical announcements aimed at builders.

What Together talks about (last 47 posts)

llm-inference: 23%
together-ai: 21%
model-efficiency: 17%
ai-agents: 13%
long-context-reasoning: 11%
text-to-video: 9%
speech-to-text: 9%
query-optimization: 6%

Vibe

Provocative: 0
Announcing: 85
Devil's Advocate: 0
Humorous: 0
Troll: 0

Together AI is an open-source AI cloud and inference infrastructure provider co-founded by Vipul Ved Prakash and backed by a $305M Series B. It positions itself as a major player in AI infrastructure, inference, open-source models, and GPU cloud services. The company emphasizes production-ready deployment of cutting-edge open models such as DeepSeek V4 Pro and Kimi K2.6 with a 99.9% SLA, alongside research on model efficiency, long-context handling, and kernel optimization, including FlashAttention-4 and Parcae looped transformers. Named to the Forbes AI 50, it focuses on an AI-native cloud that covers the full lifecycle from research to scalable production, with integrations for multimodal, agentic, and voice AI.

Company Overview

Together AI provides an AI-native cloud platform for open-source AI inference, fine-tuning, and deployment. It was co-founded by Vipul Ved Prakash and has raised a $305M Series B.[bio] It was named to the Forbes AI 50 for supporting the complete AI lifecycle, including fast inference and large-scale fine-tuning.[21] Topics: ai-infra, inference, open-source, gpu-cloud.[bio]

Model Deployments and Benchmarks

Together AI hosts production-ready open models with a 99.9% SLA, in serverless or dedicated deployments.[1][2][9] DeepSeek V4 Pro reports state-of-the-art coding results (93.5% on LiveCodeBench, a 3206 Codeforces rating, 80.6% on SWE-Bench Verified). It uses hybrid attention (27% lower FLOPs and a 10% reduced KV cache relative to V3.2) and offers three reasoning modes: Non-think, Think High, and Think Max.[1][2][3] Caveats: the benchmarks are self-reported without independent verification; SWE-Bench scores are sensitive to the agent framework used; the efficiency gains are measured only against the prior model; and the SLA is uptime-focused.[structured claims & counter-claims]

Kimi K2.6 (Moonshot AI) offers multimodal agentic AI with 300-sub-agent swarms (80.2% on SWE-Bench, 89.6% on LiveCodeBench, 79.4% on MMMU-Pro).[9][10][11] Other integrations include Alibaba Cloud's Wan 2.7 for video generation and editing,[30][31][32][37] Deepgram speech-to-text and text-to-speech for voice agents,[33][34][35][38] and NVIDIA Nemotron 3 Super, a hybrid mixture-of-experts (MoE) model with 1M-token context.[46]

Research Contributions

At ICLR 2025, Together AI presents work on model efficiency, long-context reasoning, next-generation attention, and decoding.[4][5][6][7][8] Caveat: announcements of "new work" do not by themselves confirm accepted papers, and the listed topics may describe a general research focus.[counter-claims]

Parcae enables stable looped transformers that match 1.3B-parameter quality with 770M parameters via spectral radius control (e.g., at 370M parameters, a Core score of 20.00 versus 17.46 for a standard Transformer). It uses power laws to scale recurrence and data for FLOP-efficient inference.[14-20] Mamba-3 is a state space model (SSM) optimized for inference, with better prefill and decode performance than Mamba-2 and Transformers.[44] Aurora applies online reinforcement learning (RL) to adaptive speculative decoding.[41] A divide-and-conquer approach improves long-context performance, with small models outperforming GPT-4o.[42] DBPlanBench has LLMs patch database query plans to obtain speedups.[28][29][36]
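
Parcae's exact parameterization is not described here. As a rough, generic illustration of why the spectral radius matters for a looped model, the sketch below treats one loop iteration as a toy linear update h <- A h + b, estimates the spectral radius of A by power iteration, and rescales A so the radius stays below 1, which keeps repeated application bounded. The matrix, the 0.9 target, and the helper names (matvec, spectral_radius, constrain, loop) are hypothetical; this is not Parcae's method.

```python
import math

def matvec(A, x):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))

def spectral_radius(A, iters=200):
    """Estimate the spectral radius of A by power iteration.

    Assumes A has a single dominant eigenvalue, as the symmetric example
    below does; power iteration is not reliable for arbitrary matrices.
    """
    x = [1.0] * len(A)
    est = 0.0
    for _ in range(iters):
        y = matvec(A, x)
        ny = norm(y)
        est = ny / norm(x)       # growth factor of this step
        x = [v / ny for v in y]  # renormalize to avoid overflow
    return est

def constrain(A, target=0.9):
    """Hypothetical control: rescale A so its spectral radius is at most `target`."""
    rho = spectral_radius(A)
    if rho <= target:
        return A
    return [[a * target / rho for a in row] for row in A]

def loop(A, b, h, steps):
    """Apply the toy looped update h <- A h + b `steps` times."""
    for _ in range(steps):
        h = [v + bi for v, bi in zip(matvec(A, h), b)]
    return h

A = [[1.2, 0.3], [0.3, 0.8]]  # toy loop weights, spectral radius about 1.36
b, h0 = [0.1, 0.1], [1.0, 1.0]
print(norm(loop(A, b, h0, 50)))             # large: the state blows up
print(norm(loop(constrain(A), b, h0, 50)))  # small: settles near a fixed point
```

With a radius above 1 the state grows geometrically with loop depth; after rescaling, extra loops converge toward a fixed point instead of diverging.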

Infrastructure and Optimizations

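Together AI offers an AI-native cloud for GPU-intensive workloads, with a fast path from research to production.[26] Its kernel team, building on the FlashAttention lineage, optimizes LLM performance; for example, FlashAttention-4 reaches 1605 TFLOPs/s on NVIDIA Blackwell GPUs.[39][40][47] Fine-tuning supports models of 100B+ parameters, tool calling, and vision-language models (VLMs).[43] Partnerships include NVIDIA (Dynamo, Nemotron)[45][46] and OpenClaw agents.[27] In EinsteinArena, AI agents tackle open math problems, for example raising the best known lower bound on the kissing number in 11 dimensions from 593 to 604.[12][13]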
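
A kissing configuration in d dimensions can be described by unit vectors pointing to the centres of non-overlapping unit spheres that all touch a central one; the condition is that every pairwise angle is at least 60 degrees, i.e. every pairwise dot product is at most 1/2. Improving the 11-dimensional bound to 604 means exhibiting 604 such vectors. The sketch below is a generic floating-point checker for that condition, shown in the plane where the kissing number is 6; it is not EinsteinArena's verification code, and the tolerance is an assumption (rigorous certificates typically need exact or interval arithmetic).

```python
import math

def is_kissing_configuration(vectors, tol=1e-9):
    """True if all vectors have unit length and all pairwise dot products are <= 1/2."""
    for v in vectors:
        if abs(math.sqrt(sum(x * x for x in v)) - 1.0) > tol:
            return False
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            if dot > 0.5 + tol:  # angle below 60 degrees: spheres would overlap
                return False
    return True

def ring(n):
    """n evenly spaced unit vectors in the plane."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

print(is_kissing_configuration(ring(6)))  # True: hexagonal arrangement
print(is_kissing_configuration(ring(7)))  # False: neighbours closer than 60 degrees
```

The same check applies unchanged to 11-dimensional vectors; only the input grows.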

Blogs and Tools

Production-Ready Inference

Emphasis on 99.9% SLA cloud for open models like DeepSeek V4 Pro, Kimi K2.6, with multimodal/agentic support.

  • DeepSeek V4 Pro production-ready on Together AI with 99.9% SLA [1]

  • Kimi K2.6 deployable with 99.9% SLA [9]
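
To make the 99.9% figure concrete: an uptime SLA bounds the fraction of time a service may be unavailable, so 99.9% leaves a 0.1% downtime budget. The hypothetical helper below converts an SLA percentage into allowed downtime over a period; the 30-day month is an assumption, since measurement windows and exclusions vary by contract.

```python
def downtime_budget_minutes(sla_percent, period_days=30.0):
    """Maximum downtime (in minutes) permitted by an uptime SLA over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

print(round(downtime_budget_minutes(99.9), 1))       # 43.2 minutes per 30-day month
print(round(downtime_budget_minutes(99.9, 365), 1))  # 525.6 minutes (about 8.8 hours) per year
```

An uptime budget says nothing about latency or throughput, which is why an uptime-focused SLA is a narrower guarantee than it may sound.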

Model Efficiency & Architectures

Architectural innovations such as Parcae looped transformers, the Mamba-3 SSM, and hybrid attention lower FLOPs and memory use.

  • Parcae: 1.3B quality at 770M params [14]

  • DeepSeek hybrid attention: 27% fewer FLOPs [1]

  • Mamba-3 inference optimized [44]

Coding & Agentic Benchmarks

SOTA claims on LiveCodeBench, SWE-Bench, and Codeforces; agent swarms; EinsteinArena math solvers.

  • DeepSeek: 93.5% LiveCodeBench [1]

  • Kimi: 80.2% SWE-Bench [9]

  • Kissing number breakthrough [12]

Kernel & Hardware Optimizations

FlashAttention-4, Aurora (online RL for adaptive speculative decoding), and maximizing Blackwell GPU utilization.

  • FlashAttention-4: 1605 TFLOPs/s [47]

  • Kernels team focus [39][40]
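
Speculative decoding lets a small draft model propose tokens that the target model then verifies. Aurora's contribution is adapting this loop with online RL, and its details are not given here. The sketch below shows only the standard acceptance rule from the speculative sampling literature: accept a drafted token x with probability min(1, p(x)/q(x)); otherwise resample from the normalized residual max(0, p - q). This rule keeps the emitted tokens distributed exactly as the target distribution p. The toy distributions and function names are hypothetical.

```python
import random

def accept_or_resample(token, p, q, rng):
    """Verify one drafted token against the target model.

    p and q map each vocabulary token to its probability under the target
    and draft models. Returns (emitted_token, accepted).
    """
    # Accept with probability min(1, p(x) / q(x)); q[token] > 0 because the draft sampled it.
    if rng.random() < min(1.0, p[token] / q[token]):
        return token, True
    # On rejection, resample from the residual max(0, p - q); choices() normalizes the weights.
    vocab = list(p)
    residual = [max(0.0, p[t] - q[t]) for t in vocab]
    return rng.choices(vocab, weights=residual)[0], False

def simulate(p, q, n, seed=0):
    """Draft n single tokens from q, verify each, and return emitted-token frequencies."""
    rng = random.Random(seed)
    counts = {t: 0 for t in p}
    for _ in range(n):
        drafted = rng.choices(list(q), weights=list(q.values()))[0]
        token, _ = accept_or_resample(drafted, p, q, rng)
        counts[token] += 1
    return {t: c / n for t, c in counts.items()}

p = {"a": 0.6, "b": 0.3, "c": 0.1}  # toy target distribution
q = {"a": 0.2, "b": 0.5, "c": 0.3}  # toy draft distribution
print(simulate(p, q, 100_000))  # frequencies should be close to p
```

The closer q is to p, the more drafted tokens are accepted, which is the quantity an adaptive scheme would try to improve.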

AI-Native Cloud & Integrations

Full lifecycle platform with video (Wan 2.7), voice (Deepgram), NVIDIA partnerships.

  • AI-native cloud imperative [26]

  • Deepgram integration [33]

  • NVIDIA Nemotron [46]

Research at Conferences

ICLR papers on efficiency, long-context, attention/decoding.

  • ICLR 2025 presentations [4]

  • Parcae, etc. [14-20]

Other thinkers in the absorb network who most often quote, reply to, or cite Together in their compiled entries (last 90 days weighted 2x). Honest signal: no follower graph required.

Dwarkesh Patel (@dwarkesh) · rank 0/100 · 1 recent

Every entry that fed the multi-agent compile above. Inline citation markers in the wiki text (like [1], [2]) are not yet individually linked to specific sources — this is the full set of sources the compile considered.

  1. DeepSeek V4 Pro Delivers SOTA Coding with Efficient Long-Context Hybrid Attention · tweet · 2026-04-25
  2. DeepSeek V4 Pro Delivers SOTA Coding with Hybrid Attention and Multi-Mode Reasoning on Together AI · tweet · 2026-04-25
  3. DeepSeek V4 Pro Achieves SOTA Coding with Hybrid Attention and Multi-Mode Reasoning for Long-Context Efficiency · tweet · 2026-04-25
  4. Together AI Unveils Efficiency and Reasoning Advances for ICLR 2025 · tweet · 2026-04-23
  5. Together AI Showcases Efficiency, Long-Context, and Attention Innovations at ICLR · tweet · 2026-04-23
  6. Together AI Showcases Efficiency, Long-Context, and Next-Gen Attention Advances at ICLR · tweet · 2026-04-23
  7. Together AI Showcases Efficiency, Long-Context, and Next-Gen Attention Innovations at ICLR · tweet · 2026-04-23
  8. Together AI Unveils ICLR Papers on Model Efficiency, Long-Context Reasoning, and Advanced Attention-Decoding · tweet · 2026-04-23
  9. Kimi K2.6 Delivers Production-Ready Multimodal Agentic AI with 80%+ SWE-Bench and Swarm Scaling · tweet · 2026-04-23
  10. Kimi K2.6 Delivers Top Agentic Performance with 300-Sub-Agent Swarm and 80%+ Coding Benchmarks · tweet · 2026-04-23
  11. Kimi K2.6 Delivers Production-Ready Multimodal Agentic AI with 80%+ SWE-Bench and Swarm Scaling · tweet · 2026-04-23
  12. AI Agents in EinsteinArena Achieve Breakthrough on Newton's 300-Year-Old Kissing Number Problem · tweet · 2026-04-20
  13. AI Agents in EinsteinArena Achieve Breakthroughs on Century-Old Math Problems, Improving Kissing Number in 11D from 593 to 604 · tweet · 2026-04-20
  14. Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
  15. Parcae Enables Stable Looped Transformers with 1.3B-Quality from 770M Parameters via Spectral Radius Control · tweet · 2026-04-20
  16. Parcae Enables Stable Looped Transformers with 1.3B-Quality Performance from 770M Parameters · tweet · 2026-04-20
  17. Parcae Enables Stable Looped Transformers with Superior Scaling and Efficiency · tweet · 2026-04-20
  18. Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
  19. Parcae Enables Stable Looped Transformers with Superior Performance at Sub-Billion Scales · tweet · 2026-04-20
  20. Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
  21. Together AI Secures Repeat Spot on Forbes AI 50 for AI-Native Cloud Platform · tweet · 2026-04-20
  22. Causal Fairness Analysis Reveals ADHD Penalty on STEM GPA is Largely Direct and Varies by Race · paper · 2026-04-17
  23. Geometric Classification of Steklov Eigenvalues on Trees with Diameter Constraints Completed · paper · 2026-04-17
  24. Observation of the Exotic Meson π1(1600) at BESIII · paper · 2026-04-17
  25. Pinching-Antenna Systems (PASS): A Novel Solution for High-Capacity Rail Transit Communications · paper · 2026-04-17
  26. The Imperative for an AI-Native Cloud Infrastructure · blog · 2026-04-07
  27. OpenClaw Integrates with Together AI for Enhanced Agentic AI Capabilities · tweet · 2026-04-05
  28. LLMs Enhance Query Plan Optimization for Databases · tweet · 2026-04-03
  29. LLM-Driven Query Plan Patching for Database Optimization · tweet · 2026-04-03
  30. Together AI Integrates Alibaba Cloud's Wan 2.7 for Enhanced Video Generation · tweet · 2026-04-03
  31. Together AI Integrates Alibaba Cloud's Wan 2.7 for Enhanced Video Generation · tweet · 2026-04-03
  32. Together AI Integrates Alibaba Cloud's Wan 2.7 Video Generation Model · tweet · 2026-04-03
  33. Deepgram Speech Models Integrated into Together AI for Real-time Voice Agents · tweet · 2026-04-03
  34. Together AI Integrates Deepgram for Real-time Voice AI · tweet · 2026-04-03
  35. Together AI Integrates Deepgram for Real-time Voice AI · tweet · 2026-04-03
  36. Correcting Database Optimizer Failures via LLM-Driven Semantic Plan Patching · blog · 2026-04-03
  37. Together AI Launches Wan 2.7 for Enhanced Video Generation and Editing · blog · 2026-04-03
  38. Deepgram Speech-to-Text and Voice Models Now Available on Together AI · blog · 2026-04-02
  39. Together AI Kernel Optimization Initiatives · tweet · 2026-04-01
  40. Kernel Optimization is Key to AI Performance and Efficiency · blog · 2026-04-01
  41. Aurora: Closing the Loop with Online RL for Adaptive Speculative Decoding · blog · 2026-03-31
  42. Divide and Conquer Strategy Improves LLM Long-Context Performance · blog · 2026-03-26
  43. Together AI Enhances Fine-Tuning with Advanced Capabilities for LLMs and VLMs · blog · 2026-03-18
  44. Mamba-3: State Space Model Optimized for Inference Efficiency · blog · 2026-03-17
  45. Together AI and NVIDIA Collaborate on Open, Agentic, and Production-Ready AI Systems · blog · 2026-03-16
  46. Nemotron 3 Super: A Hybrid MoE for Agentic AI on Together AI · blog · 2026-03-11
  47. FlashAttention-4: Maximizing Blackwell GPU Utilization Through Algorithmic and Kernel Co-design for Attention · blog · 2026-03-05