About Together AI

Open-source AI cloud and inference infrastructure. Co-founded by Vipul Ved Prakash. $305M Series B. Major player in open-model inference.

Together AI operates an open-source inference platform for large language models, backed by $305M in Series B funding. They shape infrastructure discussions around model efficiency and multi-modal inference through frequent technical announcements targeting builders.

What Together talks about (last 47 posts)

llm-inference 23%
together-ai 21%
model-efficiency 17%
ai-agents 13%
long-context-reasoning 11%
text-to-video 9%
speech-to-text 9%
query-optimization 6%

Vibe

Provocative 0
Announcing 85
Devil's Advocate 0
Humorous 0
Troll 0

X / @togethercompute Blog

Compiled from 47 entries (31 tweets, 4 papers, 12 articles) / updated 20d ago / v1

Together AI is an open-source AI cloud and inference infrastructure provider co-founded by Vipul Ved Prakash, backed by a $305M Series B, positioning itself as a major player in AI infra, inference, open-source models, and GPU cloud services. They emphasize production-ready deployment of cutting-edge open models like DeepSeek V4 Pro and Kimi K2.6 with 99.9% SLA, alongside pioneering research in model efficiency, long-context handling, and kernel optimizations such as FlashAttention-4 and Parcae looped transformers. Recognized on Forbes AI 50, they focus on AI-native cloud for the full lifecycle from research to scalable production, with integrations for multimodal, agentic, and voice AI.

Company Overview

Together AI provides an AI-native cloud platform for open-source AI inference, fine-tuning, and deployment. It was co-founded by Vipul Ved Prakash and is backed by a $305M Series B.[bio] The company was named to the Forbes AI 50 for its complete AI lifecycle support, including fast inference and large-scale fine-tuning.[21] Topics: ai-infra, inference, open-source, gpu-cloud.[bio]

Model Deployments and Benchmarks

Together AI hosts production-ready open models with a 99.9% SLA in serverless or dedicated modes.[1][2][9]

DeepSeek V4 Pro is reported to deliver SOTA coding results (93.5% LiveCodeBench, 3206 Codeforces, 80.6% SWE-Bench Verified) through hybrid attention (27% lower FLOPs and a 10% reduced KV cache vs V3.2), with three reasoning modes: Non-think, Think High, and Think Max.[1][2][3] Challenges: the benchmarks are self-reported without independent verification; SWE-Bench results are sensitive to the agent framework; the efficiency comparison is against the prior model only; and the SLA is uptime-focused.[structured claims & counter-claims]

Kimi K2.6 (Moonshot AI) offers multimodal agentic AI with 300-sub-agent swarms (80.2% SWE-Bench, 89.6% LiveCodeBench, 79.4% MMMU-Pro).[9][10][11]

Other integrations include Alibaba Wan 2.7 for video generation and editing,[30][31][32][37] Deepgram STT/TTS for voice agents,[33][34][35][38] and NVIDIA Nemotron 3 Super (hybrid MoE, 1M context).[46]

Research Contributions

At ICLR 2025, Together AI presents work on model efficiency, long-context reasoning, next-generation attention, and decoding.[4][5][6][7][8] Challenges: the phrase 'new work' does not confirm accepted papers, and the listed topics could describe a general research focus.[counter-claims]

Parcae enables stable looped transformers that match 1.3B-parameter quality at 770M parameters via spectral radius control (e.g., at 370M: 20.00 Core vs the Transformer's 17.46). It scales recurrence and data via power laws for FLOP-efficient inference.[14-20]

Mamba-3 is an SSM optimized for inference (better prefill/decode vs Mamba-2 and Transformers).[44] Aurora applies online RL to adaptive speculative decoding.[41] Divide & Conquer improves long-context performance, with small models outperforming GPT-4o.[42] DBPlanBench: LLMs patch database query plans for speedups.[28][29][36]
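The source does not describe how Parcae implements spectral radius control, so the following is only a toy illustration of why the spectral radius matters when the same block is applied repeatedly. It assumes a purely linear recurrence `h <- A h + inp` (a hypothetical stand-in for a looped layer, not Parcae's architecture) and hypothetical helper names. When the spectral radius of `A` is below 1, the looped state stays bounded; above 1, it grows with the number of loops.

```python
import math

def matvec(a, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))

def spectral_radius(a, iters=200):
    """Estimate the spectral radius by power iteration.

    Assumption: `a` is symmetric with a single dominant eigenvalue,
    which is enough for this toy example.
    """
    x = [1.0] * len(a)
    estimate = 0.0
    for _ in range(iters):
        y = matvec(a, x)
        estimate = norm(y) / norm(x)
        x = [v / norm(y) for v in y]
    return estimate

def rescale(a, target):
    """Scale the matrix so that its spectral radius equals `target`."""
    rho = spectral_radius(a)
    return [[v * target / rho for v in row] for row in a]

def looped_state_norm(a, inp, loops):
    """Run h <- A h + inp for `loops` iterations and return ||h||."""
    h = [0.0] * len(inp)
    for _ in range(loops):
        h = [u + v for u, v in zip(matvec(a, h), inp)]
    return norm(h)

# Toy symmetric matrix with eigenvalues 1.1 and 0.1 (spectral radius 1.1).
A = [[0.9, 0.4], [0.4, 0.3]]
A_stable = rescale(A, 0.9)  # same directions, spectral radius 0.9

inp = [1.0, 1.0]
unstable = looped_state_norm(A, inp, 100)       # grows with loop count
stable = looped_state_norm(A_stable, inp, 100)  # stays bounded
print(f"radius 1.1: {unstable:.1f}   radius 0.9: {stable:.2f}")
```

In the rescaled case the state norm cannot exceed `norm(inp) / (1 - 0.9)`, which is the usual geometric-series bound for a symmetric matrix; real looped transformers are nonlinear, so this only conveys the intuition.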

Infrastructure and Optimizations

Together AI positions its AI-native cloud for GPU-intensive workloads and a rapid path from research to production.[26] Its kernel team optimizes LLM performance, building on its FlashAttention heritage; for example, FlashAttention-4 hits 1605 TFLOPs/s on Blackwell.[39][40][47] Fine-tuning supports 100B+ models, tool-calling, and VLMs.[43] Partnerships include NVIDIA (Dynamo, Nemotron)[45][46] and OpenClaw agents.[27] EinsteinArena: AI agents solve math problems (e.g., the kissing number in 11D: 593→604).[12][13]
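For readers unfamiliar with the kissing number problem mentioned above: a kissing configuration in dimension d is a set of unit vectors whose pairwise angles are all at least 60 degrees (equivalently, pairwise dot products of at most 1/2), and the kissing number is the largest possible size of such a set. An improvement such as 593→604 in 11D would presumably be a new lower bound, that is, an explicit configuration of that many vectors. The sketch below is a generic validity check demonstrated on the well-known 2D case; it is not EinsteinArena's code, and the function name is hypothetical.

```python
import math
from itertools import combinations

def is_kissing_configuration(points, tol=1e-9):
    """Check that all points are unit vectors with pairwise dot product <= 1/2.

    A dot product of at most 1/2 between unit vectors means an angle of at
    least 60 degrees, so unit spheres centered at 2*p and 2*q do not overlap
    while both touch the central unit sphere.
    """
    for p in points:
        length = math.sqrt(sum(c * c for c in p))
        if abs(length - 1.0) > tol:
            return False
    for p, q in combinations(points, 2):
        if sum(a * b for a, b in zip(p, q)) > 0.5 + tol:
            return False
    return True

def regular_polygon(n):
    """n evenly spaced unit vectors in the plane."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

# In 2D the kissing number is 6: a hexagon is valid, and 7 evenly
# spaced points are too close together.
print(is_kissing_configuration(regular_polygon(6)))
print(is_kissing_configuration(regular_polygon(7)))
```

The same check applies unchanged in 11 dimensions; only the list of candidate vectors changes, and finding such a list is the hard part.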

Blogs and Tools

Key themes

Production-Ready Inference

Emphasis on 99.9% SLA cloud for open models like DeepSeek V4 Pro, Kimi K2.6, with multimodal/agentic support.

DeepSeek V4 Pro production-ready on Together AI with 99.9% SLA [1]
Kimi K2.6 deployable with 99.9% SLA [9]

Model Efficiency & Architectures

Innovations like Parcae looped transformers, Mamba-3 SSM, hybrid attention for lower FLOPs/memory.

Parcae: 1.3B quality at 770M params [14]
DeepSeek hybrid attention: 27% fewer FLOPs [1]
Mamba-3 inference optimized [44]

Coding & Agentic Benchmarks

SOTA claims on LiveCodeBench, SWE-Bench, Codeforces; agent swarms, EinsteinArena math solvers.

DeepSeek: 93.5% LiveCodeBench [1]
Kimi: 80.2% SWE-Bench [9]
Kissing number breakthrough [12]

Kernel & Hardware Optimizations

FlashAttention-4, Aurora RL specdec, Blackwell GPU max utilization.

FlashAttention-4: 1605 TFLOPs/s [47]
Kernels team focus [39][40]

AI-Native Cloud & Integrations

Full lifecycle platform with video (Wan 2.7), voice (Deepgram), NVIDIA partnerships.

AI-native cloud imperative [26]
Deepgram integration [33]
NVIDIA Nemotron [46]

Research at Conferences

ICLR papers on efficiency, long-context, attention/decoding.

ICLR 2025 presentations [4]
Parcae, etc. [14-20]

What Together Recommends

together-ai · service · by TogetherCompute · 7 mentions
blog-post · paper · by Together AI · 3 mentions
ai-for-systems-using-llms-to-optimize-database-query-execution · paper · by Together AI · 3 mentions
kimi-k26 · model · by @Kimi_Moonshot · 3 mentions
wan-27 · tool · by Alibaba Cloud · 2 mentions
deepgram-speechtotext-and-voice-models · product · by Deepgram · 2 mentions
parcae · paper · 2 mentions
nvidia-gtc-2026 · event · 2 mentions
aurora · tool · by Together AI · 2 mentions
flux · tool · 2 mentions
deepseek-v4-pro · tool · 2 mentions
nova3 · tool
nova3-multilingual · tool
aura2 · tool
deepgramai · service
using-llms-to-optimize-database-query-execution · paper · by Together AI
bauplanlabsmakingdatabasesfasterwithllmevolutionarysampling · repo · by BauplanLabs
calc-var-pde-2022 · paper · by He and Hua
bull-lond-math-soc-2025 · paper · by Lin-Zhao
plan-divide-and-conquer-how-weak-models-excel-at-long-context-tasks · paper · by Together AI

Most cited by

Other thinkers in the absorb network who most often quote, reply to, or cite Together in their compiled entries (last 90 days weighted 2x). Honest signal — no follower-graph required.

Dwarkesh Patel

@dwarkesh · rank 0/100

1 recent

Sources (47)

Every entry that fed the multi-agent compile above. Inline citation markers in the wiki text (like [1], [2]) are not yet individually linked to specific sources — this is the full set of sources the compile considered.

DeepSeek V4 Pro Delivers SOTA Coding with Efficient Long-Context Hybrid Attention · tweet · 2026-04-25
DeepSeek V4 Pro Delivers SOTA Coding with Hybrid Attention and Multi-Mode Reasoning on Together AI · tweet · 2026-04-25
DeepSeek V4 Pro Achieves SOTA Coding with Hybrid Attention and Multi-Mode Reasoning for Long-Context Efficiency · tweet · 2026-04-25
Together AI Unveils Efficiency and Reasoning Advances for ICLR 2025 · tweet · 2026-04-23
Together AI Showcases Efficiency, Long-Context, and Attention Innovations at ICLR · tweet · 2026-04-23
Together AI Showcases Efficiency, Long-Context, and Next-Gen Attention Advances at ICLR · tweet · 2026-04-23
Together AI Showcases Efficiency, Long-Context, and Next-Gen Attention Innovations at ICLR · tweet · 2026-04-23
Together AI Unveils ICLR Papers on Model Efficiency, Long-Context Reasoning, and Advanced Attention-Decoding · tweet · 2026-04-23
Kimi K2.6 Delivers Production-Ready Multimodal Agentic AI with 80%+ SWE-Bench and Swarm Scaling · tweet · 2026-04-23
Kimi K2.6 Delivers Top Agentic Performance with 300-Sub-Agent Swarm and 80%+ Coding Benchmarks · tweet · 2026-04-23
Kimi K2.6 Delivers Production-Ready Multimodal Agentic AI with 80%+ SWE-Bench and Swarm Scaling · tweet · 2026-04-23
AI Agents in EinsteinArena Achieve Breakthrough on Newton's 300-Year-Old Kissing Number Problem · tweet · 2026-04-20
AI Agents in EinsteinArena Achieve Breakthroughs on Century-Old Math Problems, Improving Kissing Number in 11D from 593 to 604 · tweet · 2026-04-20
Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
Parcae Enables Stable Looped Transformers with 1.3B-Quality from 770M Parameters via Spectral Radius Control · tweet · 2026-04-20
Parcae Enables Stable Looped Transformers with 1.3B-Quality Performance from 770M Parameters · tweet · 2026-04-20
Parcae Enables Stable Looped Transformers with Superior Scaling and Efficiency · tweet · 2026-04-20
Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
Parcae Enables Stable Looped Transformers with Superior Performance at Sub-Billion Scales · tweet · 2026-04-20
Parcae Enables Stable Looped Transformers Matching 1.3B Quality at 770M Parameters · tweet · 2026-04-20
Together AI Secures Repeat Spot on Forbes AI 50 for AI-Native Cloud Platform · tweet · 2026-04-20
Causal Fairness Analysis Reveals ADHD Penalty on STEM GPA is Largely Direct and Varies by Race · paper · 2026-04-17
Geometric Classification of Steklov Eigenvalues on Trees with Diameter Constraints Completed · paper · 2026-04-17
Observation of the Exotic Meson π1(1600) at BESIII · paper · 2026-04-17
Pinching-Antenna Systems (PASS): A Novel Solution for High-Capacity Rail Transit Communications · paper · 2026-04-17
The Imperative for an AI-Native Cloud Infrastructure · blog · 2026-04-07
OpenClaw Integrates with Together AI for Enhanced Agentic AI Capabilities · tweet · 2026-04-05
LLMs Enhance Query Plan Optimization for Databases · tweet · 2026-04-03
LLM-Driven Query Plan Patching for Database Optimization · tweet · 2026-04-03
Together AI Integrates Alibaba Cloud's Wan 2.7 for Enhanced Video Generation · tweet · 2026-04-03
Together AI Integrates Alibaba Cloud's Wan 2.7 for Enhanced Video Generation · tweet · 2026-04-03
Together AI Integrates Alibaba Cloud's Wan 2.7 Video Generation Model · tweet · 2026-04-03
Deepgram Speech Models Integrated into Together AI for Real-time Voice Agents · tweet · 2026-04-03
Together AI Integrates Deepgram for Real-time Voice AI · tweet · 2026-04-03
Together AI Integrates Deepgram for Real-time Voice AI · tweet · 2026-04-03
Correcting Database Optimizer Failures via LLM-Driven Semantic Plan Patching · blog · 2026-04-03
Together AI Launches Wan 2.7 for Enhanced Video Generation and Editing · blog · 2026-04-03
Deepgram Speech-to-Text and Voice Models Now Available on Together AI · blog · 2026-04-02
Together AI Kernel Optimization Initiatives · tweet · 2026-04-01
Kernel Optimization is Key to AI Performance and Efficiency · blog · 2026-04-01
Aurora: Closing the Loop with Online RL for Adaptive Speculative Decoding · blog · 2026-03-31
Divide and Conquer Strategy Improves LLM Long-Context Performance · blog · 2026-03-26
Together AI Enhances Fine-Tuning with Advanced Capabilities for LLMs and VLMs · blog · 2026-03-18
Mamba-3: State Space Model Optimized for Inference Efficiency · blog · 2026-03-17
Together AI and NVIDIA Collaborate on Open, Agentic, and Production-Ready AI Systems · blog · 2026-03-16
Nemotron 3 Super: A Hybrid MoE for Agentic AI on Together AI · blog · 2026-03-11
FlashAttention-4: Maximizing Blackwell GPU Utilization Through Algorithmic and Kernel Co-design for Attention · blog · 2026-03-05