Chronological feed of everything captured from Together AI.
DeepSeek V4 Pro introduces hybrid attention that delivers 27% lower FLOPs and a 10% smaller KV cache than V3.2 in long-context inference. It reports state-of-the-art coding results, including 93.5% on LiveCodeBench, 3206 on Codeforces, and 80.6% on SWE-Bench Verified. The model supports three reasoning modes (Non-think, Think High, and Think Max) and is production-ready on Together AI with a 99.9% SLA for agentic workflows.
DeepSeek V4 Pro introduces hybrid attention that delivers 27% lower FLOPs and a 10% smaller KV cache than V3.2 in long-context inference. It reports state-of-the-art coding results, including 93.5% on LiveCodeBench, 3206 on Codeforces, and 80.6% on SWE-Bench Verified. The model supports three reasoning modes (Non-think, Think High, and Think Max) and is production-ready on Together AI with a 99.9% SLA.
DeepSeek V4 Pro introduces hybrid attention that reduces FLOPs by 27% and KV cache size by 10% compared to V3.2 for long-context inference. It reports state-of-the-art coding results, including 93.5% on LiveCodeBench, 3206 on Codeforces, and 80.6% on SWE-Bench Verified. The model supports three reasoning modes (Non-think, Think High, and Think Max) and is production-ready on Together AI with a 99.9% SLA.
Together AI researchers present new work at ICLR on model efficiency, long-context reasoning, next-generation attention mechanisms, and decoding techniques. The announcement highlights ongoing developments in these areas via paper previews. This positions Together AI as advancing core AI model capabilities for practical deployment.
Together AI researchers present new papers at ICLR on model efficiency, long-context reasoning, next-generation attention mechanisms, and advanced decoding techniques. These works highlight ongoing advancements in core AI model architectures and inference optimizations. Technical details are shared via a four-post X thread with accompanying images.
Together AI researchers are presenting new work at ICLR on model efficiency, long-context reasoning, next-generation attention mechanisms, and advanced decoding techniques. The thread previews these contributions with links to papers and projects. This highlights ongoing innovations in AI model architectures and inference optimization.
Together AI researchers are presenting new work at ICLR on model efficiency, long-context reasoning, next-generation attention mechanisms, and advanced decoding techniques. The announcement highlights ongoing advancements in these core AI areas via linked resources. This positions Together AI as actively contributing to foundational model improvements.
Together AI researchers are presenting new work at ICLR focused on model efficiency, long-context reasoning, next-generation attention mechanisms, and decoding techniques. The announcement highlights ongoing advancements in these core AI areas via linked resources. This positions Together AI as actively contributing to foundational improvements in large-scale model performance.
Kimi K2.6 is a multimodal agentic model from Moonshot AI, now available on Together AI, featuring Agent Swarm scaling to 300 sub-agents and up to 4,000 coordinated steps for long-horizon tasks. It achieves 80.2% on SWE-Bench Verified, 89.6% on LiveCodeBench v6, and 79.4% on MMMU-Pro across text, image, and video inputs. It is deployable on the AI Native Cloud with a 99.9% SLA, in serverless or dedicated modes, for reliable production inference.
Kimi K2.6 is a multimodal agentic model from Moonshot AI that scales to 300 sub-agents via Agent Swarm, enabling up to 4,000 coordinated steps with long-horizon coding stability. It achieves 80.2% on SWE-Bench Verified, 89.6% on LiveCodeBench v6, and 79.4% on MMMU-Pro across text, image, and video inputs. It is now available on Together AI's cloud with a 99.9% SLA for production-scale inference.
Kimi K2.6 is a multimodal agentic model from Moonshot AI, accessible via Together AI, featuring Agent Swarm scaling to 300 sub-agents and up to 4,000 coordinated steps for long-horizon coding stability. It achieves 80.2% on SWE-Bench Verified, 89.6% on LiveCodeBench v6, and 79.4% on MMMU-Pro across text, image, and video inputs. It is deployable on the AI Native Cloud with a 99.9% SLA, in serverless or dedicated modes, for production-scale autonomous workflows.
EinsteinArena enables collaborative AI agents to tackle open science problems, recently improving the kissing number in 11 dimensions from 593 to 604 spheres through iterative optimization. Agents refined an initial overlapping sphere construction using LSQR to reduce the overlap loss from 1e-13 to 1e-50, then snapped the coordinates to integers to obtain a verified valid solution. The platform has produced 11 new state-of-the-art results on problems such as Erdős minimum overlap and the Tammes problem (n=50), demonstrating real-time agent collaboration.
EinsteinArena enables collaborative AI agents to tackle open science problems in real time, yielding rapid advances such as raising the kissing number in dimension 11 from 593 to 604 spheres. Agents iteratively refined an initial overlapping construction using LSQR optimization to reduce the overlap loss from 1e-13 to 1e-50, followed by integer snapping for validation. By April 11, the platform had set 11 new state-of-the-art results across problems including Erdős minimum overlap, the second autocorrelation inequality, Tammes (n=50), and circles in a rectangle (n=21). The open-source system invites contributions via live leaderboards.
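The value of integer snapping is that validity can then be checked with exact arithmetic rather than floating-point tolerances. The sketch below illustrates that idea on the well-known 12-sphere arrangement in 3 dimensions, not on the 11-dimensional result. It assumes the standard geometric condition: if every center has the same squared norm N, the spheres do not overlap exactly when 2(u·v) ≤ N for every pair of centers. The function names are hypothetical and are not taken from EinsteinArena.

```python
from itertools import combinations, permutations, product


def is_valid_kissing_configuration(centers):
    """Exact integer check of a kissing configuration.

    All centers must share one squared norm N, and every pair must satisfy
    2 * (u . v) <= N, which is equivalent to |u - v|^2 >= N (no overlap).
    """
    norms = {sum(c * c for c in v) for v in centers}
    if len(norms) != 1:
        return False
    n = norms.pop()
    if n == 0:
        return False
    return all(
        2 * sum(a * b for a, b in zip(u, v)) <= n
        for u, v in combinations(centers, 2)
    )


def fcc_centers():
    """All distinct permutations of (+-1, +-1, 0): 12 integer points in 3D."""
    points = set()
    for s1, s2 in product((1, -1), repeat=2):
        points.update(permutations((s1, s2, 0)))
    return sorted(points)


if __name__ == "__main__":
    centers = fcc_centers()
    print(len(centers), is_valid_kissing_configuration(centers))
```

Because every quantity is an integer, a passing check is a proof of validity for that configuration, which is presumably why a snapped solution can be called verified even when the preceding optimization ran in floating point.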
Parcae introduces stable looped architectures by modeling recurrence as a discrete LTI system and constraining the spectral radius below 1 with a learned negative diagonal matrix, enabling training at LR 1e-3 versus 4e-4 for unconstrained loops. Across 140M-1.3B scales, Parcae outperforms parameter-matched Transformers; e.g., the 370M model scores 20.00 on Core vs 17.46 for the Transformer (+14.5%), and it shows 6.3% lower validation perplexity than prior looped models. It establishes scaling laws in which recurrence depth and data scale in tandem as power laws, optimizing quality under fixed FLOP budgets while reducing inference memory by looping deeper rather than building wider models.
Parcae stabilizes looped Transformer architectures by modeling recurrence as a discrete LTI system and constraining the spectral radius below 1 with a learned negative diagonal matrix, enabling training at LR 1e-3 versus 4e-4 for unconstrained loops. It outperforms parameter-matched Transformers across scales up to 1.3B; e.g., the 370M Parcae model achieves a Core score of 20.00 vs the Transformer's 17.46 (+14.5%), and it shows 6.3% lower validation perplexity than prior looped models. The work derives scaling laws showing that recurrence and data must scale together as power laws, allowing FLOP-budgeted tradeoffs that favor deeper looping over wider models and reduce memory for edge inference.
Parcae introduces stable looped architectures by modeling recurrence as a discrete LTI system and constraining the spectral radius below 1 using a learned negative diagonal matrix, enabling training at LR 1e-3 versus 4e-4 for unconstrained loops. It outperforms parameter-matched Transformers across scales up to 1.3B; e.g., the 370M model achieves a Core score of 20.00 vs the Transformer's 17.46 (+14.5%), and it shows 6.3% lower validation perplexity than prior looped models. The work establishes scaling laws showing that recurrence and data must scale together via power laws, allowing quality gains by looping deeper under fixed FLOP budgets while reducing memory costs for edge inference.
Parcae introduces a novel looped architecture that passes activations through the same layers multiple times, achieving stable training by modeling recurrence as a discrete LTI system and constraining the spectral radius below 1 with a learned negative diagonal matrix. This allows learning rates up to 1e-3, matching Transformer convergence, and outperforms parameter-matched Transformers across scales from 140M to 1.3B parameters. It establishes scaling laws where recurrence depth and data scale via power laws in tandem, enabling quality gains under fixed FLOP budgets by trading parameters for loops, with inference benefits from reduced memory usage.
Parcae introduces stable looped Transformer architectures by modeling recurrence as a discrete LTI system and constraining the spectral radius below 1 with a learned negative diagonal matrix, enabling training at LR 1e-3. It outperforms parameter-matched Transformers across scales; e.g., the 370M Parcae model scores 20.00 on Core vs the Transformer's 17.46 (+14.5%), with 6.3% lower validation perplexity than prior looped models. The work derives scaling laws showing that recurrence and data must scale together as power laws, allowing quality gains via deeper looping under fixed FLOP budgets while easing memory constraints for inference.
Parcae introduces stable looped Transformer architectures by modeling recurrence as a discrete LTI system and constraining the spectral radius below 1 with a learned negative diagonal matrix, enabling training at LR 1e-3 versus 4e-4 for unconstrained loops. It outperforms parameter-matched Transformers across 140M to 1.3B scales; e.g., the 370M model achieves a Core score of 20.00 vs 17.46 (+14.5%), and it shows 6.3% lower validation perplexity than prior looped models. The work derives scaling laws showing that recurrence and data must scale together as power laws, allowing FLOP-budgeted tradeoffs that improve quality via deeper looping at fixed memory.
Parcae introduces stable looped Transformer architectures by modeling recurrence as a discrete LTI dynamical system and constraining the spectral radius below 1 with a learned negative diagonal matrix, enabling training at learning rates up to 1e-3. Across model sizes from 140M to 1.3B parameters, Parcae outperforms parameter- and data-matched Transformers, with the 370M model achieving a Core score of 20.00 versus the Transformer's 17.46 (+14.5%) and 6.3% lower validation perplexity than prior looped models. The work establishes scaling laws showing that recurrence and data must scale together via power laws, allowing quality gains by looping deeper under fixed FLOP budgets while reducing memory costs for edge inference.
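To make the stability argument concrete, here is a minimal sketch of a diagonal LTI recurrence whose spectral radius is below 1 by construction. The parameterization is an assumption for illustration, not Parcae's confirmed implementation: an unconstrained parameter is mapped to a strictly negative value with a softplus, then discretized with an exponential, so each diagonal entry of the state matrix lies in (0, 1). The names `stable_decay` and `run_loop` and the `step` parameter are hypothetical.

```python
import math


def softplus(x):
    """Smooth positive function: log(1 + exp(x))."""
    return math.log1p(math.exp(x))


def stable_decay(raw_params, step=1.0):
    """Map unconstrained parameters to diagonal entries in (0, 1).

    Assumed parameterization: a = -softplus(raw) is strictly negative,
    so exp(step * a) < 1 and the diagonal matrix has spectral radius < 1.
    """
    return [math.exp(-step * softplus(r)) for r in raw_params]


def run_loop(decay, injected, num_loops):
    """Iterate h <- decay * h + injected (elementwise), starting from h = 0."""
    h = [0.0] * len(decay)
    for _ in range(num_loops):
        h = [d * hi + x for d, hi, x in zip(decay, h, injected)]
    return h


if __name__ == "__main__":
    decay = stable_decay([-2.0, 0.0, 3.0])
    injected = [1.0, 1.0, 1.0]
    # Constrained loop: the state settles near injected / (1 - decay).
    print(run_loop(decay, injected, 500))
    # Unconstrained contrast: an entry above 1 makes the state grow with depth.
    print(run_loop([1.05], [1.0], 200))
```

For a diagonal matrix the spectral radius is simply the largest absolute entry, so bounding every entry below 1 bounds the state no matter how many times the loop is applied. The contrast case shows how a single entry above 1 makes the state grow geometrically with loop depth.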
Together AI has been named again to the Forbes AI 50 list, recognizing its AI Native Cloud designed for the complete AI lifecycle. The platform supports fast inference, open models, and large-scale fine-tuning. This accolade underscores its leadership in AI infrastructure.