
About Together AI
Open-source AI cloud and inference infrastructure. Co-founded by Vipul Ved Prakash. $305M Series B. Major player in open-model inference.
Together AI operates an open-source inference platform for large language models, backed by $305M in Series B funding. They shape infrastructure discussions around model efficiency and multi-modal inference through frequent technical announcements targeting builders.
What Together talks about (last 47 posts)
Vibe
Together AI is an open-source AI cloud and inference infrastructure provider co-founded by Vipul Ved Prakash, backed by a $305M Series B, positioning itself as a major player in AI infra, inference, open-source models, and GPU cloud services. They emphasize production-ready deployment of cutting-edge open models like DeepSeek V4 Pro and Kimi K2.6 with 99.9% SLA, alongside pioneering research in model efficiency, long-context handling, and kernel optimizations such as FlashAttention-4 and Parcae looped transformers. Recognized on Forbes AI 50, they focus on AI-native cloud for the full lifecycle from research to scalable production, with integrations for multimodal, agentic, and voice AI.
Company Overview
Together AI provides an AI-native cloud platform for open-source AI inference, fine-tuning, and deployment. It was co-founded by Vipul Ved Prakash and is backed by a $305M Series B.[bio] It was named to the Forbes AI 50 for supporting the complete AI lifecycle, including fast inference and large-scale fine-tuning.[21] Topics: ai-infra, inference, open-source, gpu-cloud.[bio]
Model Deployments and Benchmarks
Together AI hosts production-ready open models under a 99.9% SLA, in serverless or dedicated modes.[1][2][9]
DeepSeek V4 Pro is presented as state of the art for coding (93.5% on LiveCodeBench, 3206 on Codeforces, 80.6% on SWE-Bench Verified). It uses hybrid attention, reported as 27% lower FLOPs and a 10% reduced KV cache relative to V3.2, and offers three modes: Non-think, Think High, and Think Max.[1][2][3] Challenges: the benchmarks are self-reported without independent verification; SWE-Bench results are sensitive to the agent framework used; the efficiency figures compare only against the prior model; and the SLA is uptime-focused.[structured claims & counter-claims]
Kimi K2.6 (Moonshot AI) offers multimodal agentic AI with swarms of 300 sub-agents (80.2% SWE-Bench, 89.6% LiveCodeBench, 79.4% MMMU-Pro).[9][10][11]
Other integrations: Alibaba Wan 2.7 for video generation and editing,[30][31][32][37] Deepgram speech-to-text and text-to-speech for voice agents,[33][34][35][38] and NVIDIA Nemotron 3 Super (hybrid MoE, 1M context).[46]
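The observation that the SLA is uptime-focused can be made concrete with simple arithmetic. The sketch below computes how much downtime a 99.9% uptime commitment still permits. The measurement windows (a 30-day month and a 365-day year) and the function name are assumptions for illustration; the sources do not state how the SLA window is defined.

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Maximum downtime, in minutes, that an uptime SLA permits over a period."""
    unavailable_fraction = 1.0 - sla_percent / 100.0
    return unavailable_fraction * period_hours * 60.0


# Assumed measurement windows (not stated in the sources).
monthly = allowed_downtime_minutes(99.9, 30 * 24)    # 30-day month
yearly = allowed_downtime_minutes(99.9, 365 * 24)    # 365-day year

print(f"99.9% over 30 days permits about {monthly:.1f} minutes of downtime")
print(f"99.9% over 365 days permits about {yearly / 60:.2f} hours of downtime")
```

An uptime figure of this kind says nothing about latency or throughput during the time the service is up, which is why the counter-claim treats it separately from the benchmark numbers.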
Research Contributions
At ICLR 2025, Together AI presents work on model efficiency, long-context reasoning, next-generation attention, and decoding.[4][5][6][7][8] Challenges: "new work" may not mean accepted papers, and the listed topics could describe a general focus rather than specific results.[counter-claims]
Parcae enables stable looped transformers that match 1.3B-parameter quality at 770M parameters through spectral radius control (for example, at 370M parameters: 20.00 Core versus 17.46 for a Transformer). It scales recurrence and data according to power laws for FLOP-efficient inference.[14-20]
Mamba-3 is a state space model (SSM) optimized for inference, with better prefill and decode than Mamba-2 and Transformers.[44] Aurora applies online RL to adaptive speculative decoding.[41] Divide & Conquer improves long-context performance, with small models outperforming GPT-4o.[42] DBPlanBench: LLMs patch database query plans for speedups.[28][29][36]
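To illustrate why the spectral radius matters when the same block is applied repeatedly, the toy sketch below loops a single 2x2 linear map 50 times, once as given and once rescaled so that its spectral radius is 0.9. This is not Parcae's method, which the sources do not describe in detail, and a real looped transformer is nonlinear. The matrix, the target radius, and all function names are assumptions chosen for illustration.

```python
import cmath


def spectral_radius_2x2(a):
    """Largest eigenvalue magnitude of a 2x2 matrix [[p, q], [r, s]]."""
    (p, q), (r, s) = a
    trace, det = p + s, p * s - q * r
    disc = cmath.sqrt(trace * trace - 4 * det)  # complex-safe square root
    return max(abs((trace + disc) / 2), abs((trace - disc) / 2))


def rescale(a, target):
    """Scale the matrix so that its spectral radius equals `target`."""
    factor = target / spectral_radius_2x2(a)
    return [[x * factor for x in row] for row in a]


def loop(a, h, steps):
    """Apply the same linear block `steps` times: h <- A h."""
    for _ in range(steps):
        h = [a[0][0] * h[0] + a[0][1] * h[1],
             a[1][0] * h[0] + a[1][1] * h[1]]
    return h


def norm(h):
    return (h[0] ** 2 + h[1] ** 2) ** 0.5


A = [[1.2, 0.5], [-0.3, 1.1]]   # hypothetical recurrent weights, radius > 1
h0 = [1.0, 1.0]

unstable = norm(loop(A, h0, 50))               # state norm grows geometrically
stable = norm(loop(rescale(A, 0.9), h0, 50))   # state norm decays toward zero

print(f"spectral radius of A: {spectral_radius_2x2(A):.3f}")
print(f"state norm after 50 loops, unscaled: {unstable:.3e}")
print(f"state norm after 50 loops, radius 0.9: {stable:.3e}")
```

In the linear case the state norm grows or shrinks roughly like the spectral radius raised to the number of loops, so keeping the radius below 1 is what prevents the repeated application from blowing up.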
Infrastructure and Optimizations
Together AI positions its AI-native cloud for GPU-intensive workloads and a rapid path from research to production.[26] The kernel team, which has FlashAttention heritage, optimizes LLM performance; for example, FlashAttention-4 reaches 1605 TFLOPs/s on Blackwell.[39][40][47] Fine-tuning supports 100B+ models, tool calling, and VLMs.[43] Partnerships: NVIDIA (Dynamo, Nemotron)[45][46] and OpenClaw agents.[27] EinsteinArena: AI agents solve math problems (for example, improving the kissing number in 11D from 593 to 604).[12][13]
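The FlashAttention family is built around computing softmax attention block by block, keeping a running maximum and a running normalizer so that the full score row never has to be held at once. The sketch below shows only that online-softmax idea for a single query in plain Python. It says nothing about FlashAttention-4's Blackwell-specific kernel design, which the sources do not detail; the function names, the block size, the sample data, and the omission of the usual 1/sqrt(d) score scaling are simplifying assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    """Softmax attention for one query, computed over all keys at once."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_online(q, keys, values, block=2):
    """Same result, but keys and values are consumed one block at a time."""
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        scores = [dot(q, k) for k in keys[start:start + block]]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        # On the first block this factor is exp(-inf) == 0.0.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + block]):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


# Hypothetical sample data: one query, five key/value pairs of dimension 2.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [-1.5, 0.5], [0.3, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

ref = attention_reference(q, keys, values)
online = attention_online(q, keys, values, block=2)
print(ref, online)
```

The two functions agree up to floating-point rounding; the blockwise version is the form that lets a GPU kernel keep each tile in fast on-chip memory.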
Blogs and Tools
Production-Ready Inference
Emphasis on 99.9% SLA cloud for open models like DeepSeek V4 Pro, Kimi K2.6, with multimodal/agentic support.
Model Efficiency & Architectures
Innovations like Parcae looped transformers, Mamba-3 SSM, hybrid attention for lower FLOPs/memory.
Coding & Agentic Benchmarks
SOTA claims on LiveCodeBench, SWE-Bench, Codeforces; agent swarms, EinsteinArena math solvers.
Kernel & Hardware Optimizations
FlashAttention-4, Aurora's RL-based speculative decoding, and maximizing Blackwell GPU utilization.
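For readers unfamiliar with speculative decoding, the basic draft-and-verify loop can be sketched as follows. This toy example covers only the greedy form of the general technique: a cheap draft model proposes several tokens, and the target model keeps the longest prefix it agrees with plus one token of its own. It does not show Aurora's online RL adaptation, which the sources do not detail. Both "models" are hypothetical deterministic functions, and all names and parameters are assumptions.

```python
def target_next(prefix):
    """Toy stand-in for the large target model (deterministic next token)."""
    return (sum(prefix) * 31 + len(prefix)) % 7


def draft_next(prefix):
    """Toy stand-in for the small draft model: deliberately wrong at some positions."""
    guess = (sum(prefix) * 31 + len(prefix)) % 7
    return 0 if len(prefix) % 3 == 0 else guess


def greedy_decode(prompt, n_new):
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(prompt, n_new, k=4):
    """Draft k tokens, then verify them against the target model."""
    out = list(prompt)
    verify_rounds = 0
    while len(out) - len(prompt) < n_new:
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # A real system scores all k positions in one batched target pass,
        # so each round is counted as a single target call here.
        verify_rounds += 1
        accepted = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch
                break
        else:
            # Every draft token matched: take one extra target token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + n_new], verify_rounds


prompt = [1, 2]
n_new = 12
baseline = greedy_decode(prompt, n_new)
spec_out, rounds = speculative_decode(prompt, n_new, k=4)
print(baseline == spec_out, rounds)
```

The output is identical to plain greedy decoding with the target model; the saving comes from needing fewer target passes than generated tokens whenever the draft model agrees often enough.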
AI-Native Cloud & Integrations
Full lifecycle platform with video (Wan 2.7), voice (Deepgram), NVIDIA partnerships.
Research at Conferences
ICLR papers on efficiency, long-context, attention/decoding.
ICLR 2025 presentations [4]
Parcae, etc. [14-20]
Every entry that fed the multi-agent compile above. Inline citation markers in the wiki text (like [1], [2]) are not yet individually linked to specific sources — this is the full set of sources the compile considered.
- DeepSeek V4 Pro Delivers SOTA Coding with Efficient Long-Context Hybrid Attention · tweet · 2026-04-25
- DeepSeek V4 Pro Delivers SOTA Coding with Hybrid Attention and Multi-Mode Reasoning on Together AI · tweet · 2026-04-25
- DeepSeek V4 Pro Achieves SOTA Coding with Hybrid Attention and Multi-Mode Reasoning for Long-Context Efficiency · tweet · 2026-04-25
- Together AI Unveils Efficiency and Reasoning Advances for ICLR 2025 · tweet · 2026-04-23
- Together AI Showcases Efficiency, Long-Context, and Attention Innovations at ICLR · tweet · 2026-04-23
- Together AI Showcases Efficiency, Long-Context, and Next-Gen Attention Advances at ICLR · tweet · 2026-04-23
- Together AI Showcases Efficiency, Long-Context, and Next-Gen Attention Innovations at ICLR · tweet · 2026-04-23
- Together AI Unveils ICLR Papers on Model Efficiency, Long-Context Reasoning, and Advanced Attention-Decoding · tweet · 2026-04-23
- Kimi K2.6 Delivers Production-Ready Multimodal Agentic AI with 80%+ SWE-Bench and Swarm Scaling · tweet · 2026-04-23
- Kimi K2.6 Delivers Top Agentic Performance with 300-Sub-Agent Swarm and 80%+ Coding Benchmarks · tweet · 2026-04-23
- Kimi K2.6 Delivers Production-Ready Multimodal Agentic AI with 80%+ SWE-Bench and Swarm Scaling · tweet · 2026-04-23
- AI Agents in EinsteinArena Achieve Breakthrough on Newton's 300-Year-Old Kissing Number Problem · tweet · 2026-04-20
- AI Agents in EinsteinArena Achieve Breakthroughs on Century-Old Math Problems, Improving Kissing Number in 11D from 593 to 604 · tweet · 2026-04-20
- Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
- Parcae Enables Stable Looped Transformers with 1.3B-Quality from 770M Parameters via Spectral Radius Control · tweet · 2026-04-20
- Parcae Enables Stable Looped Transformers with 1.3B-Quality Performance from 770M Parameters · tweet · 2026-04-20
- Parcae Enables Stable Looped Transformers with Superior Scaling and Efficiency · tweet · 2026-04-20
- Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
- Parcae Enables Stable Looped Transformers with Superior Performance at Sub-Billion Scales · tweet · 2026-04-20
- Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
- Together AI Secures Repeat Spot on Forbes AI 50 for AI-Native Cloud Platform · tweet · 2026-04-20
- Causal Fairness Analysis Reveals ADHD Penalty on STEM GPA is Largely Direct and Varies by Race · paper · 2026-04-17
- Geometric Classification of Steklov Eigenvalues on Trees with Diameter Constraints Completed · paper · 2026-04-17
- Observation of the Exotic Meson π1(1600) at BESIII · paper · 2026-04-17
- Pinching-Antenna Systems (PASS): A Novel Solution for High-Capacity Rail Transit Communications · paper · 2026-04-17
- The Imperative for an AI-Native Cloud Infrastructure · blog · 2026-04-07
- OpenClaw Integrates with Together AI for Enhanced Agentic AI Capabilities · tweet · 2026-04-05
- LLMs Enhance Query Plan Optimization for Databases · tweet · 2026-04-03
- LLM-Driven Query Plan Patching for Database Optimization · tweet · 2026-04-03
- Together AI Integrates Alibaba Cloud's Wan 2.7 for Enhanced Video Generation · tweet · 2026-04-03
- Together AI Integrates Alibaba Cloud's Wan 2.7 for Enhanced Video Generation · tweet · 2026-04-03
- Together AI Integrates Alibaba Cloud's Wan 2.7 Video Generation Model · tweet · 2026-04-03
- Deepgram Speech Models Integrated into Together AI for Real-time Voice Agents · tweet · 2026-04-03
- Together AI Integrates Deepgram for Real-time Voice AI · tweet · 2026-04-03
- Together AI Integrates Deepgram for Real-time Voice AI · tweet · 2026-04-03
- Correcting Database Optimizer Failures via LLM-Driven Semantic Plan Patching · blog · 2026-04-03
- Together AI Launches Wan 2.7 for Enhanced Video Generation and Editing · blog · 2026-04-03
- Deepgram Speech-to-Text and Voice Models Now Available on Together AI · blog · 2026-04-02
- Together AI Kernel Optimization Initiatives · tweet · 2026-04-01
- Kernel Optimization is Key to AI Performance and Efficiency · blog · 2026-04-01
- Aurora: Closing the Loop with Online RL for Adaptive Speculative Decoding · blog · 2026-03-31
- Divide and Conquer Strategy Improves LLM Long-Context Performance · blog · 2026-03-26
- Together AI Enhances Fine-Tuning with Advanced Capabilities for LLMs and VLMs · blog · 2026-03-18
- Mamba-3: State Space Model Optimized for Inference Efficiency · blog · 2026-03-17
- Together AI and NVIDIA Collaborate on Open, Agentic, and Production-Ready AI Systems · blog · 2026-03-16
- Nemotron 3 Super: A Hybrid MoE for Agentic AI on Together AI · blog · 2026-03-11
- FlashAttention-4: Maximizing Blackwell GPU Utilization Through Algorithmic and Kernel Co-design for Attention · blog · 2026-03-05