LangChain

Chronological feed of everything captured from LangChain.

LangChain Introduces Dual Agent Authorization Models

LangChain's new Fleet offering distinguishes between two agent authorization models: "on-behalf-of" (Assistants) and fixed credentials (Claws). This differentiation addresses varying security and access control needs for AI agents interacting with external tools. The choice of model impacts data access, sharing capabilities, and the necessity for human-in-the-loop guardrails, particularly for sensitive actions.

LangChain at Google Cloud Next 2026: Agent Development and Deployment Focus

LangChain will have a significant presence at Google Cloud Next 2026, focusing on agent development and deployment. They will showcase updates to the LangChain ecosystem, including LangSmith for observability, evaluation, and deployment, and participate in breakout sessions covering secure, high-velocity runtimes for AI agents and frictionless developer experiences. LangChain will also host networking events and announce LangSmith's availability on the Google Cloud Marketplace.

Moda: AI-Powered Design Platform Leveraging Deep Agents and LangSmith for Production-Grade Visual Design

Moda is an AI-native design platform catering to non-designers, enabling the creation of professional-grade visual content through a multi-agent system. This system, built with Deep Agents and observed via LangSmith, addresses the challenge of AI in visual design by employing a custom Domain Specific Language (DSL) for layout representation, contextual engineering, and a collaborative user experience. The platform prioritizes iterative design and efficient resource utilization, demonstrating a strong product-market fit in enterprise sales for pitch deck generation.

LangSmith Fleet introduces shareable skills for enhanced agent functionality

LangSmith Fleet now supports shareable skills, enabling agents to be equipped with specialized knowledge for specific tasks. These skills codify domain expertise, improving agent utility by providing essential context that basic reasoning alone lacks. This functionality addresses the challenge of knowledge silos within teams and facilitates consistent agent performance across an organization by making crucial information uniformly accessible.

LangChain's Agent Middleware for Customizable LLM Agent Harnesses

LangChain introduces "Agent Middleware" to enable deep customization of LLM agent harnesses, moving beyond basic prompt and tool adjustments. This system provides distinct hooks (e.g., `before_model`, `wrap_tool_call`) to inject custom logic at various stages of the agent's operational loop. This architecture allows for the implementation of complex features like PII redaction, dynamic tool selection, and robust error handling, which are critical for production-grade AI applications while keeping core agent logic decoupled.
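The hook pattern described above can be sketched framework-agnostically. The hook names below (`before_model`, `wrap_tool_call`) mirror the ones named in the post, but the runner and middleware classes are a hypothetical illustration of the pattern, not LangChain's implementation:

```python
from typing import Any, Callable

class Middleware:
    """Base class: override hooks to inject logic into the agent loop."""
    def before_model(self, messages: list[dict]) -> list[dict]:
        return messages  # default: pass messages through unchanged
    def wrap_tool_call(self, call: Callable[[str], Any], arg: str) -> Any:
        return call(arg)  # default: invoke the tool directly

class RedactPII(Middleware):
    """Scrub a (toy) PII pattern before the model ever sees it."""
    def before_model(self, messages):
        return [{**m, "content": m["content"].replace("555-0100", "[REDACTED]")}
                for m in messages]

class RetryToolCall(Middleware):
    """Robust error handling: retry a flaky tool once before propagating."""
    def wrap_tool_call(self, call, arg):
        try:
            return call(arg)
        except RuntimeError:
            return call(arg)  # one retry, then let the error surface

def run_turn(middlewares, messages, tool, tool_arg):
    # Apply before_model hooks in order, then wrap the tool call inside-out
    # so the first middleware in the list is the outermost wrapper.
    for mw in middlewares:
        messages = mw.before_model(messages)
    call = tool
    for mw in reversed(middlewares):
        call = (lambda a, mw=mw, inner=call: mw.wrap_tool_call(inner, a))
    return messages, call(tool_arg)
```

The point of the architecture is that `RedactPII` and `RetryToolCall` know nothing about each other or about the core loop; each feature lives in its own middleware.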

LangChain Deep Agents: Practical Evaluation Strategies for Agentic Systems

LangChain emphasizes targeted, behavior-driven evaluations for their Deep Agents framework, aiming to improve accuracy and reliability in production environments. Their methodology prioritizes curating specific evals based on observed agent behavior and desired outcomes, rather than relying on broad benchmarks. This approach focuses on optimizing for both correctness and efficiency, using metrics like step ratio, tool call ratio, and solve rate.
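The efficiency metrics named above can be computed directly from traced trajectories. The trace schema here (a list of steps, each flagged as a tool call, plus a reference step count) is a hypothetical sketch, not LangChain's eval format:

```python
def trajectory_metrics(trajectory, reference_steps, solved):
    """Score one agent run for correctness and efficiency.

    trajectory: list of step dicts, each with an 'is_tool_call' bool.
    reference_steps: step count of a known-good solution for the same task.
    """
    steps = len(trajectory)
    tool_calls = sum(1 for s in trajectory if s["is_tool_call"])
    return {
        "step_ratio": steps / reference_steps,        # >1.0 means extra work vs. reference
        "tool_call_ratio": tool_calls / max(steps, 1),
        "solved": solved,
    }

def solve_rate(runs):
    """Fraction of scored runs that solved their task."""
    return sum(1 for r in runs if r["solved"]) / len(runs)
```

Tracking step ratio alongside solve rate catches agents that reach the right answer but burn far more steps than necessary.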

A Comprehensive Checklist for Robust AI Agent Evaluation

This post outlines a systematic, step-by-step methodology for evaluating AI agents, emphasizing practical considerations from initial setup to production deployment. It differentiates between various evaluation levels, dataset construction strategies, and grader designs. The core insight revolves around building a comprehensive evaluation framework that integrates manual review, automated checks, and continuous feedback loops to ensure agent reliability and performance.

The Evolution of LLM Agent Development from Scaffolds to Long-Horizon Agent Harnesses

The conversation explores the paradigm shift from traditional software development to building AI agents, highlighting the increased complexity introduced by non-deterministic LLM behavior. It emphasizes the critical role of "context engineering" and "agent harnesses" in navigating these complexities, particularly for long-horizon agents. The discussion also touches upon the use of tracing for debugging and collaboration, and the growing importance of human feedback and iterative development in refining agent performance.

The Shift to AI-Native Infrastructure: From Deterministic Code to Agentic Orchestration

The industry is transitioning from 'cloud-native' to 'AI-native' infrastructure, shifting the developer's role from writing deterministic code to orchestrating non-deterministic systems. This new stack relies on the Model Context Protocol (MCP) for tool integration and requires specialized observability and secure sandboxing to mitigate risks like prompt injection and non-deterministic failures. For enterprise viability, this infrastructure must move beyond local developer environments into secure, VPC-deployed runtimes with rigorous audit logging.

LangChain’s Role in Orchestrating the Agentic AI Paradigm

LangChain, initially known for its chain-based LLM applications, is evolving to become a crucial orchestration layer for AI agents. The company is actively developing frameworks like LangGraph to enable the creation of custom, controllable, and persistent agents that operate within the nuanced spectrum between rigid chains and fully autonomous AI. This strategy addresses the current limitations of both extremes, focusing on practical, production-ready agent deployments.

Deep Agents v0.5: Decoupling Agent Orchestration via Async Subagents

LangChain has introduced async subagents to Deep Agents, shifting from blocking inline execution to a non-blocking, task-based delegation model. This architecture utilizes the Agent Protocol to support stateful, remote, and heterogeneous agent deployments, allowing supervisor agents to manage multiple concurrent long-running tasks without stalling the user interaction loop.
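The shift from blocking to task-based delegation can be illustrated with plain asyncio: the supervisor launches subagent work as tasks and keeps its own loop ticking while they run. This is an illustrative sketch of the concurrency model, not the Deep Agents or Agent Protocol API:

```python
import asyncio

async def subagent(name: str, seconds: float) -> str:
    """Stand-in for a long-running (possibly remote) subagent."""
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def supervisor() -> list[str]:
    # Delegation returns immediately with task handles, so the supervisor
    # is never stalled waiting on any single subagent.
    tasks = {
        asyncio.create_task(subagent("research", 0.02)),
        asyncio.create_task(subagent("drafting", 0.01)),
    }
    results = []
    while tasks:
        # The short timeout keeps the user-interaction loop responsive;
        # completed subagent results are collected as they land.
        done, tasks = await asyncio.wait(tasks, timeout=0.005)
        results.extend(t.result() for t in done)
    return sorted(results)
```

In the inline model, `supervisor` would instead `await` each subagent in turn, blocking the whole loop on the slowest one.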

LangChain Fleet Integrates Arcade.dev for Enhanced Agent Tooling

LangChain Fleet has partnered with Arcade.dev to provide agents with secure and reliable access to over 7,500 optimized tools. This integration addresses the complexities of managing multiple API connections by centralizing tool access through Arcade's MCP gateway. The partnership focuses on delivering agent-specific tools, improving tool selection and reducing issues like hallucinated parameters, while also offering robust authentication and authorization mechanisms for agent actions.

Beyond Model Weights: Continual Learning Across AI Agent Architectures

Continual learning in AI agents extends beyond mere model weight updates, encompassing three distinct layers: the model itself, the operational harness, and external context. Understanding these layers is crucial for developing AI systems that exhibit sustained improvement. This architecture enables more granular and flexible learning strategies, moving beyond the limitations of single-layer continuous learning.

Open-Weight Models Achieve Feature Parity with Frontier Models for Agentic Workloads

Recent evaluations by LangChain demonstrate that leading open-weight models like GLM-5 and MiniMax M2.7 perform comparably to closed frontier models on core agent tasks including file operations, tool use, and instruction following. This parity is achieved with significantly reduced costs and improved latency. These advancements enable more viable real-world agent deployments in production environments.

Open Models Achieve Performance Parity with Frontier Models in Agentic Tasks

Deep Agent harness evaluations reveal that open models like GLM-5 and MiniMax M2.7 now perform comparably to closed frontier models on core agent tasks, including file operations, tool use, and instruction following. This parity is achieved at significantly lower costs and latency, making open models viable alternatives or complements in production deployments. The Deep Agents SDK and CLI simplify integration by abstracting model-specific complexities.

LangChain Deepens Enterprise AI Capabilities with NVIDIA Partnership and Enhanced Agent Management Tools

LangChain is expanding its enterprise AI offerings through a strategic partnership with NVIDIA and significant upgrades to its agent management platform, LangSmith. Key developments include rebranding Agent Builder to LangSmith Fleet, introducing advanced access controls and audit logging, and releasing new open-source tooling. These updates aim to enhance the security, scalability, and functionality of AI agents for enterprise deployment.

LangChain and MongoDB Partner to Simplify AI Agent Development and Deployment

LangChain and MongoDB have partnered to integrate MongoDB Atlas as a comprehensive backend for AI agents, addressing the complexities of moving agent prototypes to production. This collaboration provides a unified platform for retrieval, persistent memory, operational data access, and observability, eliminating the need for fragmented infrastructure. The integration aims to leverage MongoDB's existing enterprise presence to streamline the development and deployment of reliable AI agents.

The AI Agent Harness: A Deep Dive with LangChain’s Harrison Chase

The conversation with Harrison Chase, co-founder of LangChain, explores the rapid evolution of AI agents, emphasizing the critical role of "harnesses" in enabling LLMs to perform complex tasks. These harnesses, comprising components like system prompts, planning tools, sub-agents, and file systems, are more crucial than the underlying models themselves for achieving reliable and predictable agent behavior. The discussion differentiates between conversational and long-horizon agents, highlighting the increasing importance of coding agents due to their versatility and LLM training data. LangChain's journey reflects this evolution, moving from basic abstractions to sophisticated agent runtimes like LangGraph, with a strong focus on observability and continuous improvement for agent engineering.

Harness Engineering: The Foundation of Effective AI Agents

Harness engineering is critical for transforming raw AI models into functional and useful agents. It encompasses all the infrastructure, logic, and tools surrounding a model that enable it to perform complex tasks, maintain state, interact with external environments, and overcome inherent model limitations. This engineering discipline focuses on designing systems that extend and enhance model intelligence rather than solely relying on innate model capabilities.

LangChain Deep Agents Drive Sales Efficiency and Pipeline Growth via GTM Agent

LangChain developed a GTM agent using Deep Agents to automate and optimize sales workflows, from lead qualification and personalized outreach to account intelligence. This agent significantly improved conversion rates and sales rep efficiency by integrating with existing systems like Salesforce and Gong, providing a human-in-the-loop mechanism for review and continuous learning from rep interactions. The system also expanded beyond sales, demonstrating utility for other teams like engineering and customer success due to its comprehensive data access.

LangChain Deep Agent Drives 2.5x Conversion Rate & 40 Hours Saved Per Rep

LangChain implemented a Deep Agent to automate their Go-To-Market (GTM) processes, integrating with existing systems like Salesforce and Gong. This agent significantly improved lead conversion rates and sales rep efficiency by automating lead research, personalized outreach drafting, and account intelligence, maintaining human-in-the-loop oversight and continuous learning from rep interactions.

Evaluating Skills for Coding Agents: A LangChain Perspective

This article outlines LangChain's methodology for evaluating coding agent skills, emphasizing the need for structured evaluation to ensure performance gains. The process involves setting up clean testing environments, defining constrained tasks with clear metrics, and iteratively refining skills. Key to this approach is leveraging tools like LangSmith for observability and performance comparison.

Evaluating Skills for Coding Agents

Evaluating skills is crucial for enhancing coding agent performance. Skills are dynamically loaded prompts that improve agent capabilities in specialized domains. A robust evaluation pipeline involves setting up clean testing environments, defining constrained tasks with clear metrics, and strategically organizing skill content. LangSmith provides tools for experiment tracking and analysis.

LangSmith CLI and Skills Revolutionize Agent Development

LangChain has released a new CLI and "Skills" framework for LangSmith, designed to empower AI coding agents. This allows agents to perform complex tasks within the LangSmith ecosystem, such as adding tracing, building test sets, and evaluating performance. This integration dramatically improves agent performance; for instance, Claude Code's pass rate on specific tasks jumped from 17% to 92%. The approach emphasizes agent-driven improvement loops, utilizing a terminal-first methodology.

LangSmith CLI and Skills Revolutionize AI Agent Development

LangChain's new LangSmith CLI and "Skills" paradigm enable AI coding agents to autonomously navigate and optimize within the LangSmith ecosystem. This integration dramatically improves agent performance by providing curated instructions and scripts for tasks like tracing, dataset curation, and evaluation. The approach promotes a virtuous cycle of agent-driven improvement, enhancing development workflows without overwhelming agents with excessive tools.

LangChain Deepens Enterprise AI Support with NVIDIA Partnership and Enhanced Agent Management Tools

LangChain's recent updates focus on bolstering enterprise-grade AI agent development and deployment. Key initiatives include a strategic partnership with NVIDIA for a full-stack agent platform, advancements in LangSmith for secure agent fleet management with new features like "Skills" and "Sandboxes," and enhancements to open-source libraries like `langgraph` and `deepagents`. These updates aim to provide robust infrastructure for building, deploying, and managing AI agents in production environments, particularly for enterprise use cases requiring security, control, and scalability.

Context Engineering for LLM Agents: Key Techniques and Emerging Trends

Context engineering is crucial for optimizing LLM agent performance, cost, and latency. Key techniques involve managing the agent's context window by offloading information to file systems, progressively disclosing tools and skills, and using sub-agents for isolation. Emerging trends include the development of models that can learn to manage their own context and the increasing use of agents for personal life management and bioscience applications.
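Offloading to a file system can be sketched simply: large tool outputs are written to disk and replaced in the context window with a short pointer the agent can dereference later. The function name, threshold, and pointer format here are hypothetical:

```python
import tempfile
from pathlib import Path

OFFLOAD_THRESHOLD = 200  # chars; a real system would budget in tokens

def offload_if_large(content: str, workdir: Path) -> str:
    """Return content inline if small, else write it out and return a pointer."""
    if len(content) <= OFFLOAD_THRESHOLD:
        return content
    path = workdir / f"obs_{abs(hash(content))}.txt"
    path.write_text(content)
    # Only the pointer enters the context window; the agent can read the
    # file later if (and only if) the task actually needs the full output.
    return f"[output ({len(content)} chars) saved to {path.name}; read it if needed]"
```

The same idea underlies progressive disclosure of tools and skills: keep a cheap pointer in context and pay the token cost only on demand.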

The Evolution of AI Agents: From Simple LLM Calls to Autonomous Deep Agents

The AI agent landscape is rapidly evolving, moving beyond single LLM calls to complex, autonomous "deep agents." These agents leverage improved models, sophisticated harnesses, and file systems for state management. While early agents struggled with reliability, current deep agents, often designed for asynchronous, "first draft" work, demonstrate increased capabilities in complex tasks like research and coding, demanding new approaches to UX and evaluation.

LangSmith Enhances Agent Monitoring with Production-Focused Insights and Multi-turn Evaluation

LangSmith introduces new capabilities to monitor AI agents in production, addressing the limitations of traditional observability. The platform now treats multi-turn interactions as first-party "threads," enabling more comprehensive analysis. Key features include an Insights Agent for automated usage pattern categorization and Multi-turn Evals for assessing complete conversational trajectories, providing vital feedback for agent improvement.

Building Enterprise-Grade Agents: Reliability, Human-in-the-Loop, and the Shift to Ambient Architectures

Enterprise agent adoption hinges on a simple expected-value equation: maximize the probability of success × value delivered, while minimizing the cost of failure. The most effective levers are making agent behavior more deterministic (workflows + agents, not workflows vs. agents), reducing perceived risk through observability tools, and designing UX patterns—reversible changes, human approval gates, and "first draft" outputs—that bound downside exposure. The next architectural frontier is "ambient agents" triggered by events rather than humans, enabling one-to-many scale, but these must retain human-in-the-loop checkpoints to remain deployable in enterprise contexts.

LangGraph Adds Node Caching, Deferred Execution, and Agent Hooks to Tighten Agentic Workflow Control

LangGraph's latest release week delivers a set of primitives targeting efficiency and control in agentic workflows: node-level caching reduces redundant computation during development, deferred nodes enable clean map-reduce and multi-agent coordination patterns, and pre/post model hooks give developers lifecycle control over ReAct agent message flow. On the JS side, LangGraph v0.3 ships resumable streams, full type-safety on `.stream()`, and ergonomic graph construction APIs. Together, these features push LangGraph closer to a production-grade orchestration layer for both single and multi-agent systems.

Trellix Leverages LangChain for Cybersecurity Automation and Efficiency

Trellix implemented LangChain, LangGraph, and LangSmith to develop "Sidekick," an internal application addressing cybersecurity integration and log parsing backlogs. This initiative significantly reduced manual log parsing from days to minutes and accelerated plugin development. The project highlights how LLMOps tools can streamline development workflows, improve customer satisfaction, and provide clear communication of AI processes to diverse stakeholders.

LangGraph's Three-Layer Memory Architecture for Adaptive AI Agents

Harrison Chase (LangChain CEO) presents a practical framework for embedding persistent memory into agentic systems using LangGraph, mapping three human memory types—semantic (facts), episodic (past experiences), and procedural (instructions)—to concrete agent components. Semantic memory is implemented via LLM-managed key-value stores with semantic search; episodic memory as few-shot classification examples retrieved at triage time; and procedural memory as dynamically updatable system prompts optimized by an LLM reflecting on user feedback trajectories. Memory updates can run either in-band (hot path, immediate but adds latency and cognitive load to the agent) or out-of-band (background, lower latency impact but less transparent). The most underrated challenge is not retrieval but using LLMs to accurately reflect on interactions and decide when and how to update the knowledge store.
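The semantic layer, an LLM-managed key-value store with search, can be sketched as follows; naive substring matching stands in for embedding-based semantic search, and this is an illustrative toy, not the LangGraph store API:

```python
class SemanticMemory:
    """Key-value fact store for the 'semantic' memory layer."""

    def __init__(self):
        self.facts: dict[str, str] = {}

    def upsert(self, key: str, value: str) -> None:
        # In the talk's design, an LLM reflecting on the interaction decides
        # when to call this and what to write -- the hard part, per the talk,
        # is that reflection step, not the storage itself.
        self.facts[key] = value

    def search(self, query: str) -> list[tuple[str, str]]:
        # Substring matching as a stand-in for semantic search over embeddings.
        q = query.lower()
        return [(k, v) for k, v in self.facts.items()
                if q in k.lower() or q in v.lower()]
```

Running `upsert` on the hot path gives the agent immediate recall at the cost of latency; running it out-of-band defers the write to a background reflection pass.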

The Enterprise Shift: From Model Experimentation to Action-Oriented AI Value

Enterprise AI is transitioning from a 'build vs. buy' mentality toward a hybrid approach that favors transparent, integrable full-stack systems over black-box APIs. Technical implementation is moving away from simple chatbots toward 'Action AI' and complex workflows, requiring a shift from generic benchmarks to business-value metrics (e.g., time-to-process claims). To achieve the 99%+ accuracy required for production, the focus is shifting toward self-evolving architectures that use self-reflection to optimize multi-step tool calling.

LangSmith Enables Rapid AI-Native App Development and Scaling at Lovable

Lovable, a no-code AI platform, leveraged LangSmith to achieve significant operational efficiencies and accelerate growth, reaching $25M ARR in four months. LangSmith's observability and debugging capabilities were critical for understanding agentic interactions, facilitating rapid iteration, and resolving issues in their AI-powered application development process.

LangGraph 0.3 Decouples Core Primitives from High-Level Agent Abstractions

LangGraph 0.3 decouples low-level framework primitives from high-level agent abstractions by moving prebuilt components into a dedicated `langgraph-prebuilt` package. This architecture maintains production-grade control (no hidden prompts) while providing rapid entry points for common patterns like swarm and supervisor multi-agent systems. The strategy aims to catalyze a community-led registry of modular, modifiable agent implementations.

Decagon's Five-Layer AI Agent Engine: Architecture and Lessons from Production Customer Support

Decagon builds AI agents for enterprise customer support, structured around five interlocking components: a core agent brain, routing, agent assist (human co-pilot), admin dashboard, and QA interface — forming a flywheel of continuous improvement. The core brain ingests unstructured knowledge, SOPs, and callable tools (e.g., issue refund, check order status) and applies consistent logic across chat, email, SMS, and voice modalities. Unlike pre-GenAI decision-tree systems, the agent can handle open-ended interactions and fuzzy criteria (e.g., customer sentiment, churn risk), with configurable guardrails per action. Testing is split into pre-deployment coverage sets (~hundreds of conversations per workflow) and ongoing distribution-drift monitoring, with LLM-as-judge scoring against defined evaluation criteria.

LangMem SDK: Enabling Adaptive AI Agents with Long-Term Memory

LangChain has released the LangMem SDK, a library designed to equip AI agents with long-term memory capabilities. This SDK provides tools for extracting conversational information, optimizing agent behavior via prompt updates, and maintaining persistent memory across behaviors, facts, and events. It integrates with LangGraph and offers a managed service, aiming to facilitate the development of smarter, more personalized AI experiences.

ReAct Agent Performance Collapses Under Context and Tool Overload — Model Choice Matters

LangChain benchmarked five LLMs (o1, o3-mini, claude-3.5-sonnet, gpt-4o, llama-3.3-70B) on a ReAct agent tasked with calendar scheduling and customer support, progressively increasing the number of instruction domains and bound tools. Performance degraded across all models as context grew, with the rate of degradation strongly correlated to required trajectory length. o1 and claude-3.5-sonnet showed the most stability under context expansion, while o3-mini degraded sharply despite strong baseline performance. gpt-4o and llama-3.3-70B were clearly outclassed, with llama failing basic tool-call sequencing even under minimal context.

How Infor Rebuilt Its Enterprise AI Platform on LangGraph for Multi-Agent, Multi-Industry Scale

Infor migrated its legacy AWS Lex chatbot (Coleman DA) to a LangChain/LangGraph-powered multi-agent platform embedded across its industry-specific cloud suites, built on AWS Bedrock. The architecture spans three components: embedded LLM experiences via API gateway, a RAG-based Knowledge Hub using AWS OpenSearch, and a multi-agent assistant with real-time data access and enforced data governance. LangGraph's stateful, cyclical agent execution and LangSmith's tracing layer address Infor's specific SaaS requirements around compliance, observability, and model-switching across regulated industries. The initiative positions Infor to expose customizable AI agents to enterprise customers across verticals like Healthcare, Aerospace, and Manufacturing.

LangGraph's Functional API Brings Graph-Level Features to Standard Python Functions

LangGraph's new Functional API introduces two decorators — `@entrypoint` and `@task` — that expose core LangGraph capabilities (human-in-the-loop, persistence, streaming) without requiring developers to define an explicit graph structure. This lowers the adoption barrier for existing codebases by allowing standard Python control flow (loops, conditionals) instead of graph topology. The Functional API shares the same underlying runtime as the Graph API (StateGraph), making the two interoperable and mixable within the same project. The tradeoff is reduced granularity in checkpointing and no built-in workflow visualization, both of which the Graph API handles better.
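The appeal of the decorator style can be illustrated with a stripped-down `task` that schedules work and returns futures, so ordinary Python control flow composes concurrent steps. This is a toy analogy for intuition only, not LangGraph's actual `@entrypoint`/`@task` runtime (which also adds persistence, streaming, and human-in-the-loop):

```python
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def task(fn):
    """Toy @task: calling the decorated function schedules it on a pool
    and returns a Future instead of running it inline."""
    def submit(*args, **kwargs) -> Future:
        return _pool.submit(fn, *args, **kwargs)
    return submit

@task
def measure(text: str) -> int:
    return len(text)

def workflow(items: list[str]) -> int:
    # A plain loop fans out the tasks -- no graph topology declared anywhere.
    futures = [measure(s) for s in items]
    return sum(f.result() for f in futures)
```

The tradeoff noted above follows from this shape: because control flow is opaque Python rather than an explicit graph, there is nothing to visualize and less structure to checkpoint against.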

LLM-driven Prompt Optimization: Benchmarking Methods and Model Performance

This analysis benchmarks five prompt optimization techniques across three LLMs (GPT-4o, Claude-Sonnet, O1) on five diverse datasets. Key findings indicate that prompt optimization is most effective when the underlying model lacks domain knowledge, leading to up to a 200% accuracy increase. Claude-Sonnet is identified as the most reliable optimizer model due to consistent performance and lower variance compared to O1 and GPT-4o.

LangSmith Enables Scalable AI Audience Segmentation at Acxiom

Acxiom, a leader in customer intelligence, leveraged LangSmith to overcome significant challenges in scaling their AI-driven audience segmentation platform. The integration of LangSmith provided robust observability, streamlined debugging, and supported a hybrid model ecosystem, enabling Acxiom to build a scalable and user-friendly generative AI application for precise marketing audience creation. This allowed them to enhance their data-driven marketing strategies and optimize customer acquisition and retention.

Character.AI: Scaling LLMs and Conquering Engineering Challenges

Character.AI, founded by ex-Google Brain researchers, offers an open-domain dialogue platform driven by user-generated prompt-based characters. The company scaled from 300 to over 30,000 generations per second in 18 months, becoming one of the most used GenAI applications. Their scaling journey involved addressing unique challenges in data volume, cost optimization through custom model architecture, and managing persistent open connections for low-latency responses.