absorb.md

LlamaIndex

Chronological feed of everything captured from LlamaIndex.

Multimodal RAG Achieves Near-Perfect Scores in PDF QA

LlamaIndex and LanceDB developed a structure-aware PDF QA pipeline that significantly improves agentic search. This pipeline addresses the challenge of processing visually rich documents by integrating multimodal data storage and retrieval. The combined approach of robust parsing with LiteParse and multimodal storage in LanceDB enables agents to achieve high accuracy in complex reasoning tasks involving PDFs.

LlamaIndex Workshop: LLM-Ready Data from Financial Documents with Agentic OCR

LlamaIndex is hosting an in-person workshop in NYC on May 13th for fintech leaders. The workshop will focus on practical applications of agentic OCR to transform complex financial documents into LLM-ready data, including insights from a top-tier PE firm's production agent. Attendees are expected to bring their own laptops to build real pipelines.

LlamaIndex Community Event in San Francisco

LlamaIndex hosted a community gathering at their new San Francisco office, attracting over 100 developers. The event served as a networking session for AI builders and coincided with local festivities.

LlamaIndex Introduces Extract v2 for Enhanced Document Data Extraction

LlamaIndex has launched Extract v2, a significant upgrade to its document extraction tool. This new version offers simplified operation through intuitive tiers, pre-saved extraction configurations for efficiency, and configurable document parsing for greater control and improved results. Extract v1 will remain available for a limited transition period.

LlamaIndex Sponsors Stanford FutureLaw 2026, Highlighting AI in Legal Sector Education and Underserved Commercial Legal Needs

LlamaIndex is sponsoring Stanford FutureLaw Week 2026, an event focused on the intersection of AI and law, featuring bootcamps, hackathons, and a conference. This initiative aims to train future legal professionals in AI. However, a significant need remains for AI legal tools supporting commercial teams in small to mid-sized companies that lack dedicated legal support.

LlamaIndex Recognized as a Leading Enterprise Tech Innovator

LlamaIndex has been named to the 2026 Enterprise Tech 30, securing the #3 spot in the Early Stage category. The ranking is based on votes from over 90 leading investors and corporate development leaders, and it highlights LlamaIndex's potential to influence the future of enterprise technology.

LlamaIndex Office Warming Event

LlamaIndex is hosting an office warming party on April 2nd at their new "AI Waterfront" location on 2nd Street. The event will offer networking opportunities, food, and drinks. Due to limited space, early RSVP is encouraged.

Architecting Local-First RAG Pipelines with LiteSearch

LiteSearch serves as a reference implementation for high-performance, fully local document ingestion and retrieval. The stack combines LiteParse for parsing, Chonkie for chunking, and a Rust-based Qdrant edge shard for vector storage, all running on the Bun runtime.
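
To make the shape of such a pipeline concrete, here is a minimal sketch of the parse, chunk, embed, store, and search stages using only the Python standard library. It is not LiteSearch code and does not call LiteParse, Chonkie, or Qdrant: every function and class name below (`parse`, `chunk`, `embed`, `LocalIndex`) is a hypothetical stand-in, and the hashed bag-of-words embedding is a toy substitute for a real embedding model.

```python
import hashlib
import math
import re
from collections import Counter

DIM = 256  # size of the toy embedding space (an arbitrary choice)


def parse(raw: str) -> str:
    """Stand-in for the parsing stage: collapse whitespace in raw text."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Stand-in for the chunking stage: fixed-size word windows with overlap."""
    words = text.split()
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts if words[i:i + size]]


def embed(text: str) -> list[float]:
    """Toy embedding: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class LocalIndex:
    """In-memory stand-in for the vector store; nothing leaves the process."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, int, str, list[float]]] = []

    def add(self, doc_id: str, raw: str) -> None:
        for n, piece in enumerate(chunk(parse(raw))):
            self.rows.append((doc_id, n, piece, embed(piece)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id, piece)
            for doc_id, _, piece, vec in self.rows
        ]
        return sorted(scored, key=lambda r: r[0], reverse=True)[:k]


if __name__ == "__main__":
    index = LocalIndex()
    index.add("a", "Quarterly revenue grew in the northern region.")
    index.add("b", "The parser handles scanned invoices and receipts.")
    for score, doc_id, piece in index.search("scanned invoices", k=1):
        print(f"{score:.2f} {doc_id}: {piece}")
```

Keeping each stage behind its own function is what lets a local-first stack swap in a real parser, chunker, or vector store without touching the rest of the pipeline.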

Advanced Table Extraction for Structured Data

Modern OCR solutions for tables go beyond basic text recognition by reconstructing spatial relationships, preserving header hierarchies, and ensuring data integrity. This deep dive explains the three core phases of table extraction: detection, structure recognition, and data extraction with validation. The applications are wide-ranging, from financial services to healthcare, enabling the conversion of complex tabular data into structured formats like JSON for seamless integration.
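
The three phases can be sketched in a few dozen lines of Python. This is an illustrative simplification, not how LlamaParse or any particular OCR product works: it assumes the OCR step already produced text tokens with positions, and the `Token` type, the tolerances, and all function names are hypothetical. Detection here is a crude "lines with at least two cells" rule, structure recognition snaps tokens to shared column anchors, and validation only flags rows with missing cells.

```python
import json
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x: float  # left edge of the OCR box
    y: float  # top edge of the OCR box


def detect_rows(tokens, y_tol=5.0, min_cells=2):
    """Phase 1 (detection): group tokens into lines, keep lines that look tabular."""
    rows = []
    for tok in sorted(tokens, key=lambda t: (t.y, t.x)):
        if rows and abs(rows[-1][0].y - tok.y) <= y_tol:
            rows[-1].append(tok)
        else:
            rows.append([tok])
    return [sorted(r, key=lambda t: t.x) for r in rows if len(r) >= min_cells]


def recognize_structure(rows, x_tol=15.0):
    """Phase 2 (structure recognition): snap tokens to shared column anchors."""
    anchors = []
    for tok in sorted((t for r in rows for t in r), key=lambda t: t.x):
        if not anchors or tok.x - anchors[-1] > x_tol:
            anchors.append(tok.x)
    grid = []
    for row in rows:
        cells = [""] * len(anchors)
        for tok in row:
            col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - tok.x))
            cells[col] = tok.text
        grid.append(cells)
    return grid


def to_number(text):
    """Convert numeric-looking cells to float; leave everything else as text."""
    try:
        return float(text.replace(",", "").replace("$", ""))
    except ValueError:
        return text


def extract_records(grid, header_rows=1):
    """Phase 3 (extraction and validation): flatten headers, type values, flag gaps."""
    headers = [[] for _ in grid[0]]
    for depth, row in enumerate(grid[:header_rows]):
        last = ""
        for col, cell in enumerate(row):
            # Parent header levels span the columns to their right; the leaf level does not.
            if depth < header_rows - 1:
                last = cell or last
                cell = last
            if cell:
                headers[col].append(cell)
    names = [".".join(parts) for parts in headers]
    records, errors = [], []
    for n, row in enumerate(grid[header_rows:]):
        if any(cell == "" for cell in row):
            errors.append(f"row {n}: missing cell")
            continue
        records.append({name: to_number(cell) for name, cell in zip(names, row)})
    return records, errors


if __name__ == "__main__":
    tokens = [
        Token("Region", 10, 0), Token("Revenue", 100, 0),
        Token("Q1", 100, 20), Token("Q2", 200, 20),
        Token("North", 10, 40), Token("1,200", 100, 40), Token("1,350", 200, 40),
        Token("South", 10, 60), Token("980", 100, 60),  # Q2 value is missing
    ]
    grid = recognize_structure(detect_rows(tokens))
    records, errors = extract_records(grid, header_rows=2)
    print(json.dumps({"records": records, "errors": errors}, indent=2))
```

In the sample data, the two-level header is flattened into dotted names such as `Revenue.Q1`, and the incomplete "South" row is reported as an error instead of being silently emitted.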

Advanced Table Extraction for Structured Data

Modern OCR solutions like LlamaParse address the challenges of extracting structured data from complex tables in PDFs. This technology reconstructs spatial relationships, preserves header hierarchies, and validates data integrity, going beyond basic OCR capabilities. It transforms visual table formats into usable structured data, crucial for various industry applications.