absorb.md — A knowledge graph of what AI thinkers are actually saying

Loan processors spend 40-60% of their time manually reconciling income across tax returns, pay stubs, W-2s, and bank statements. LlamaIndex built an end-to-end automation pipeline that uses LlamaParse for schema-driven extraction across these document types, with confidence scores and citations. The Claude Agent SDK then performs cross-document validation to detect discrepancies such as W-2/pay-stub gaps, unexplained deposits, and employer mismatches. The pipeline ends with an HTML report that assigns a COMPLETE, REVIEW, or FLAG decision.

llamaparse, claude-agent, document-automation, income-reconciliation, financial-pipeline, rpa-finance

“Loan processors spend 40–60% of their time reconciling income across tax returns, pay stubs, W-2s, and bank statements.”
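The cross-document checks described above can be sketched in Python. This is a minimal illustration, not the LlamaIndex pipeline or the Claude Agent SDK: the record types (`W2`, `PayStub`, `Deposit`), the `validate` function, and the thresholds (10% and 25% income gaps, a $1,000 deposit cutoff) are all hypothetical, and the inputs are assumed to be fields a parser has already extracted.

```python
from dataclasses import dataclass

# Hypothetical record types; a real pipeline would fill these from
# schema-driven extraction output. Field names are illustrative only.
@dataclass
class W2:
    employer: str
    annual_wages: float

@dataclass
class PayStub:
    employer: str
    ytd_gross: float        # year-to-date gross pay
    periods_elapsed: int    # pay periods covered by ytd_gross
    periods_per_year: int   # e.g. 24 for semi-monthly pay

@dataclass
class Deposit:
    amount: float
    description: str

@dataclass
class Finding:
    severity: str  # "review" or "flag"
    message: str

# Assumed thresholds, chosen for illustration only
GAP_REVIEW = 0.10
GAP_FLAG = 0.25
DEPOSIT_FLAG = 1_000.0

def normalize(name: str) -> str:
    # Lowercase and drop punctuation/spaces so "Acme, Inc." matches "ACME INC"
    return "".join(ch for ch in name.lower() if ch.isalnum())

def validate(w2: W2, stub: PayStub, deposits: list[Deposit]) -> tuple[str, list[Finding]]:
    findings: list[Finding] = []

    # 1. Employer mismatch between the W-2 and the pay stub
    if normalize(w2.employer) != normalize(stub.employer):
        findings.append(Finding("flag", f"Employer mismatch: {w2.employer!r} vs {stub.employer!r}"))

    # 2. W-2/pay-stub gap: annualize year-to-date pay and compare with W-2 wages
    if stub.periods_elapsed > 0 and w2.annual_wages > 0:
        annualized = stub.ytd_gross / stub.periods_elapsed * stub.periods_per_year
        gap = abs(annualized - w2.annual_wages) / w2.annual_wages
        if gap > GAP_FLAG:
            findings.append(Finding("flag", f"W-2/pay-stub gap of {gap:.0%}"))
        elif gap > GAP_REVIEW:
            findings.append(Finding("review", f"W-2/pay-stub gap of {gap:.0%}"))

    # 3. Large deposits whose description does not name the employer
    payroll_key = normalize(stub.employer)
    for d in deposits:
        if d.amount > DEPOSIT_FLAG and payroll_key not in normalize(d.description):
            findings.append(Finding("flag", f"Unexplained deposit of ${d.amount:,.2f}: {d.description}"))

    # Roll findings up into a single decision
    if any(f.severity == "flag" for f in findings):
        return "FLAG", findings
    if findings:
        return "REVIEW", findings
    return "COMPLETE", findings

if __name__ == "__main__":
    w2 = W2("Acme, Inc.", 96_000)
    stub = PayStub("ACME INC", ytd_gross=24_000, periods_elapsed=6, periods_per_year=24)
    deposits = [Deposit(4_000, "ACME INC PAYROLL"), Deposit(5_000, "ZELLE TRANSFER")]
    decision, findings = validate(w2, stub, deposits)
    print(decision)
    for f in findings:
        print(f"- [{f.severity}] {f.message}")
```

Any flag-level finding yields FLAG, and review-level findings alone yield REVIEW; the thresholds here are placeholders that a real deployment would tune.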

youtube / llama_index / Apr 25 / failed

Deep Dive into ChartDataPointMatch: A New Metric for Evaluating Parsing Accuracy on Charts

youtube / llama_index / Apr 25 / failed

Building an #AI #Agent from Scratch: Manning Book Walkthrough with Val Andrei Fajardo

youtube / llama_index / Apr 25 / failed

The Anatomy of an AI Agent

tweet / @llama_index / Apr 24

ParseBench Launches as Premier OCR Benchmark for AI Agent Document Parsing

ParseBench, now live on Kaggle, is the first OCR benchmark tailored for AI agents, featuring 2,000 enterprise pages and over 167,000 test rules across 5 dimensions that expose downstream agent failures. It enables direct comparison of custom parsers against 14 established methods, including GPT-5 Mini, Gemini 3, Textract, and LlamaParse. This benchmark targets real-world parsing challenges in enterprise document processing.

ai-benchmark, document-ocr, parser-evaluation, llamaindex, kaggle, ai-agents, enterprise-data

“ParseBench is the first document OCR benchmark specifically built for AI agents”

tweet / @llama_index / Apr 23 / failed

LiteParse: our open-source, layout-aware PDF parser for AI agents. The secret? Grid projection. Instead of heavy ML layout models or flat text extraction, it projects text onto a monospace grid so alignment preserves structure. Full deep dive into the grid projection algorithm behind the magic ↓ https://t.co/pXS1ZNlIE…
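A rough sketch of grid projection, assuming each text span arrives with page coordinates; this is not LiteParse's actual algorithm. The `Span` type, the `project_to_grid` function, and the default cell size (6 units wide, 12 tall) are illustrative assumptions. Each span is snapped to the nearest row and column of a monospace grid, so text that was aligned on the page stays aligned in the plain-text output.

```python
from dataclasses import dataclass

@dataclass
class Span:
    x: float     # left edge in page units (e.g. PDF points)
    y: float     # top edge, increasing down the page
    text: str

def project_to_grid(spans: list[Span], char_width: float = 6.0, line_height: float = 12.0) -> str:
    """Snap positioned text spans onto a monospace character grid."""
    if not spans:
        return ""
    # Map page coordinates to (row, column) cells, ordered left-to-right within each row
    cells = sorted((round(s.y / line_height), round(s.x / char_width), s.text) for s in spans)
    rows: dict[int, str] = {}
    for row, col, text in cells:
        line = rows.get(row, "")
        if line and col <= len(line):
            # Collision: start after the previous span, leaving one space between them
            col = len(line) + 1
        # Pad with spaces up to the target column so vertical alignment survives
        rows[row] = line.ljust(col) + text
    # Emit every row between the first and last, keeping blank lines for vertical gaps
    return "\n".join(rows.get(r, "") for r in range(min(rows), max(rows) + 1))

if __name__ == "__main__":
    page = [
        Span(0, 0, "Item"), Span(120, 0, "Price"),
        Span(0, 12, "Apples"), Span(120, 12, "1.20"),
        Span(0, 24, "Pears"), Span(120, 24, "0.95"),
    ]
    print(project_to_grid(page))
```

Because both columns of the example land on grid column 20, the prices line up under "Price" without any layout model, which is the property the tweet attributes to grid projection.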

tweet / @llama_index / Apr 22 / failed

Let's talk parsing charts 📊📈. Last week we released ParseBench, the first document OCR benchmark for AI agents. New in ParseBench: ChartDataPointMatch. Most document parsers look at a chart and OCR the caption. Agents need the actual numbers. That's the gap between "OCR'd the text around the chart" and "actually read t…
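To illustrate scoring the numbers rather than the caption, here is a hypothetical data-point matching score; the official ChartDataPointMatch definition may differ. It assumes chart ground truth is a list of (series, label, value) points, matches predictions one-to-one on normalized series and label within a 1% relative tolerance (an assumed value), and reports F1 so that both missed and invented points lower the score. `Point` and `data_point_f1` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    series: str   # e.g. "Revenue"
    label: str    # category or x-axis label, e.g. "Q1"
    value: float

def _key(p: Point) -> tuple[str, str]:
    # Case- and whitespace-insensitive identity for a data point
    return (p.series.strip().lower(), p.label.strip().lower())

def data_point_f1(truth: list[Point], predicted: list[Point], rel_tol: float = 0.01) -> float:
    """Hypothetical chart score: F1 over one-to-one matched data points."""
    if not truth and not predicted:
        return 1.0
    # Group predicted values by (series, label) so each can be consumed once
    pool: dict[tuple[str, str], list[float]] = {}
    for p in predicted:
        pool.setdefault(_key(p), []).append(p.value)
    matched = 0
    for t in truth:
        candidates = pool.get(_key(t), [])
        for i, v in enumerate(candidates):
            # Relative tolerance; a true value of 0 requires an exact 0
            if abs(v - t.value) <= rel_tol * abs(t.value):
                matched += 1
                del candidates[i]   # prevent reuse of this prediction
                break
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(truth)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    truth = [Point("Revenue", "Q1", 10.0), Point("Revenue", "Q2", 12.5), Point("Revenue", "Q3", 9.8)]
    read = [Point("revenue", "Q1", 10.0), Point("Revenue", "q2", 12.5), Point("Revenue", "Q3", 15.0)]
    print(data_point_f1(truth, read))  # two of three points match
    print(data_point_f1(truth, []))    # caption-only OCR: no points recovered
```

A parser that only OCRs the caption returns no points and scores 0.0 under this rule, however clean its text is.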

tweet / @llama_index / Apr 20

LiteParse Rapidly Gains Traction with 4.3K Stars, Integrates into LlamaIndex for High-Speed Document Parsing

LiteParse, a parser with zero cloud dependencies, gained over 4.3K GitHub stars within weeks of launch and has joined the LlamaIndex ecosystem. It supports 50+ formats, processes ~500 pages in about 2 seconds, and powers agents in Claude Code, Cursor, and production pipelines. An upcoming live workshop will demonstrate building a fintech due diligence agent with LiteParse.

llamaindex, liteparse, github-stars, oss-tool, document-parsing, ai-agents, fintech-workshop

“LiteParse reached over 4,300 GitHub stars within a few weeks of launch.”

tweet / @llama_index / Apr 20

Anthropic Opus 4.7 Boosts Document Parsing 42 Points on ParseBench but Trails LlamaParse Agentic

Anthropic's Opus 4.7 model scores 80.6% on Document Reasoning, a 23.5-point gain over 57.1%, but ParseBench shows uneven parsing improvements: Charts jump 42.3 points to 55.8%, with smaller gains in Formatting (+5.2 points), Content (+0.6), and Tables (+0.7), and a regression in Layout (-2.5). Its overall ParseBench score is 55.8% at ~1.5¢/page, behind LlamaParse Agentic's 84.9% at ~1.2¢/page. No single model dominates general document understanding.

anthropic-opus, document-parsing, parsebench, llamaparse, ai-benchmarks, agentic-parsing, llm-evaluation

“Anthropic Opus 4.7 scores 80.6% on Document Reasoning, up from 57.1%.”
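The reported figures can be cross-checked with a few lines of arithmetic: backing out the earlier scores recovers the 57.1% Document Reasoning baseline and the implied earlier Charts score, and a flat per-page price turns into a batch estimate. The helper names are hypothetical, and the prices are the approximate per-page figures from the summary, so batch costs are estimates only.

```python
def prior_score(new: float, gain: float) -> float:
    """Back out an earlier score (percent) from a new score and its point gain."""
    return round(new - gain, 1)

def batch_cost_usd(pages: int, cents_per_page: float) -> float:
    """Approximate cost in dollars of parsing `pages` at a flat per-page price."""
    return round(pages * cents_per_page / 100, 2)

if __name__ == "__main__":
    # Figures as reported above: scores in percent, prices approximate
    print("Document Reasoning before:", prior_score(80.6, 23.5))
    print("Charts before:", prior_score(55.8, 42.3))
    print("Overall ParseBench gap (points):", round(84.9 - 55.8, 1))
    print("10,000 pages, Opus 4.7 ($):", batch_cost_usd(10_000, 1.5))
    print("10,000 pages, LlamaParse Agentic ($):", batch_cost_usd(10_000, 1.2))
```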

tweet / @llama_index / Apr 20

ParseBench Sets New Standard for Faithful OCR in AI Agents

ParseBench is the first document OCR benchmark for AI agents. Its core metric, content faithfulness, checks whether a parser extracts all of the text, in order, without fabricating anything. More than 167K rule-based tests grade three failure modes: omissions (of words, sentences, or digits), hallucinations, and reading-order violations. The benchmark raises the bar from output that humans can read to output that agents can reliably act on.

parsebench, ocr-benchmark, document-parsing, ai-agents, content-faithfulness, hallucination-detection, reading-order

“ParseBench is the first document OCR benchmark specifically for AI agents.”
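The three failure modes can be expressed as simple rule checks. This sketch assumes a hypothetical rule schema (`present`, `absent`, `order`) rather than ParseBench's actual test format: a `present` rule catches omissions of words, sentences, or digits, an `absent` rule catches a known hallucination, and an `order` rule catches reading-order violations. The score is the fraction of rules that pass.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    kind: str        # "present", "absent", or "order" (hypothetical schema)
    text: str
    after: str = ""  # for "order" rules: `text` must appear before `after`

def _norm(s: str) -> str:
    # Lowercase and collapse whitespace so line wrapping does not cause false failures
    return re.sub(r"\s+", " ", s).strip().lower()

def _find(needle: str, doc: str) -> int:
    # Match on token boundaries so the digit string "12" does not match inside "125"
    m = re.search(rf"(?<!\w){re.escape(_norm(needle))}(?!\w)", doc)
    return m.start() if m else -1

def check(rule: Rule, output: str) -> bool:
    doc = _norm(output)
    if rule.kind == "present":    # omission test
        return _find(rule.text, doc) != -1
    if rule.kind == "absent":     # hallucination test
        return _find(rule.text, doc) == -1
    if rule.kind == "order":      # reading-order test
        i, j = _find(rule.text, doc), _find(rule.after, doc)
        return -1 < i < j
    raise ValueError(f"unknown rule kind: {rule.kind!r}")

def score(rules: list[Rule], output: str) -> float:
    """Fraction of rules the parser output passes."""
    if not rules:
        raise ValueError("no rules to score")
    return sum(check(r, output) for r in rules) / len(rules)

if __name__ == "__main__":
    parsed = "Revenue grew to $1,250 in Q3.\nCosts fell."
    rules = [
        Rule("present", "$1,250"),                  # digits must survive
        Rule("absent", "Q4"),                       # nothing invented
        Rule("order", "Revenue", after="Costs"),    # reading order preserved
    ]
    print(score(rules, parsed))
```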

tweet / @llama_index / Apr 20

LlamaIndex Launches AI Track and Rooftop Happy Hour at NYC FinTech Week

LlamaIndex is adding an AI track to NYC FinTech Week, aimed at builders of fintech agents, document intelligence, and agentic workflows. Next week it is co-hosting an AI Builders Rooftop Happy Hour with LinkupAPI, with cocktails, a rooftop venue, and a possible piñata battle. An RSVP link is provided.

fintech-week, ai-track, ai-builders, rooftop-happy-hour, fintech-agents, agentic-workflows, nyc-event

“NYC FinTech Week now includes an AI track”

tweet / @llama_index / Apr 7

Multimodal RAG Achieves Near-Perfect Scores in PDF QA

LlamaIndex and LanceDB built a structure-aware PDF QA pipeline that improves agentic search over PDFs. It targets visually rich documents, which are hard for traditional document processing pipelines, by combining multimodal data storage and retrieval. Pairing robust parsing with LiteParse and multimodal storage in LanceDB lets agents answer complex reasoning questions about PDFs with high accuracy.

multimodal-ai, pdf-processing, llm-agents, information-retrieval, rag-pipelines, lancedb, liteparse

“Visually rich documents pose a significant challenge for traditional document processing pipelines and AI agents.”

tweet / @llama_index / Apr 7

LlamaIndex Workshop: LLM-Ready Data from Financial Documents with Agentic OCR

LlamaIndex is hosting an in-person workshop in NYC on May 13th for fintech leaders. The workshop will focus on practical applications of agentic OCR to transform complex financial documents into LLM-ready data, including insights from a top-tier PE firm's production agent. Attendees are expected to bring their own laptops to build real pipelines.

fintech, llms, ocr, agentic-ai, data-pipelines, workshops

“LlamaIndex is hosting a workshop for fintech leaders in NYC on May 13th.”

tweet / @llama_index / Apr 3

LlamaIndex Community Event in San Francisco

LlamaIndex hosted a community gathering at their new San Francisco office, attracting over 100 developers. The event served as a networking session for AI builders, coinciding with local festivities in the city.

new-office, community-event, ai-builders, llamaindex-x-feed, networking

“LlamaIndex has opened a new office in San Francisco.”

tweet / @llama_index / Apr 2

LlamaIndex Introduces Extract V2 for Enhanced Document Data Extraction

LlamaIndex has launched Extract v2, a major upgrade to its document extraction tool. It replaces the previous modes with simpler tiers, adds saved extraction configurations that can be reused, and makes document parsing configurable for more control and better results. Extract v1 remains available for a limited transition period.

document-extraction, llm-data-processing, data-pipelines, platform-updates, llamaindex

“Extract v2 features simplified, intuitive tiers, replacing previous modes.”

tweet / @llama_index / Apr 1

LlamaIndex Sponsors Stanford FutureLaw 2026, Highlighting AI in Legal Sector Education and Underserved Commercial Legal Needs

LlamaIndex is sponsoring Stanford FutureLaw Week 2026, an event on the intersection of AI and law that includes bootcamps, hackathons, and a conference, with the aim of training future legal professionals in AI. A significant gap remains, however, in AI legal tools for commercial teams at small and mid-sized companies that lack dedicated legal support.

legal-ai, ai-applications, ai-bootcamps, legal-tech, stanford, future-law

“LlamaIndex is sponsoring Stanford FutureLaw Week 2026.”

tweet / @llama_index / Mar 31

LlamaIndex Recognized as a Leading Enterprise Tech Innovator

LlamaIndex has been named to the 2026 Enterprise Tech 30, ranking #3 in the Early Stage category. The ranking is based on votes from more than 90 leading investors and corporate development leaders and reflects their view of LlamaIndex's potential to shape the future of enterprise technology.

llamaindex-recognition, enterprise-tech, early-stage-companies, wing-vc, industry-awards, startup-ecosystem

“LlamaIndex was recognized in the 2026 Enterprise Tech 30.”

tweet / @llama_index / Mar 30

LlamaIndex Office Warming Event

LlamaIndex is hosting an office-warming party on April 2nd at its new "AI Waterfront" office on 2nd Street in San Francisco, with networking, food, and drinks. Space is limited, so early RSVP is encouraged.

new-office, networking-event, ai-community, llama-index, miami-events

“LlamaIndex has moved to a new office location.”

tweet / @llama_index / Mar 30

Architecting Local-First RAG Pipelines with LiteSearch

LiteSearch serves as a reference implementation for high-performance, fully local document ingestion and retrieval. The stack integrates LiteParse for parsing, Chonkie for chunking, and a Rust-based Qdrant edge shard for vectorized storage, executed via the Bun runtime.

open-source, local-ai, retrieval-augmented-generation, developer-tools, document-parsing

“LiteSearch is a fully local document ingestion and retrieval CLI/TUI application.”
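The ingest-and-retrieve flow can be sketched end to end with the Python standard library. Every component here is a stand-in, not the LiteSearch code: `chunk` replaces Chonkie with fixed-size word windows, `embed` replaces an embedding model with a hashed bag of words, and `LocalIndex` replaces the Qdrant edge shard with brute-force cosine search. Parsing is assumed to have already produced plain text.

```python
import math
import re
import zlib

def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Overlapping fixed-size word windows (a stand-in for a real chunker)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str, dims: int = 256) -> list[float]:
    """Unit-length hashed bag-of-words vector (a stand-in for an embedding model)."""
    vec = [0.0] * dims
    for token in re.findall(r"\w+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted str hash
        vec[zlib.crc32(token.encode()) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

class LocalIndex:
    """In-memory vector store with brute-force cosine search (a stand-in for a Qdrant shard)."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add_document(self, text: str) -> None:
        for piece in chunk(text):
            self.chunks.append(piece)
            self.vectors.append(embed(piece))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity
        scored = [(sum(a * b for a, b in zip(q, v)), c) for v, c in zip(self.vectors, self.chunks)]
        return sorted(scored, reverse=True)[:k]

if __name__ == "__main__":
    index = LocalIndex()
    index.add_document("Invoices are due within 30 days of receipt.")
    index.add_document("The warehouse ships orders every Tuesday.")
    for sim, text in index.search("when are invoices due", k=1):
        print(f"{sim:.2f}  {text}")
```

Keeping parse, chunk, embed, and store as separate steps mirrors the stack described above, so any stand-in can be swapped for the real component without touching the others.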

tweet / @llama_index / Mar 27

Advanced Table Extraction for Structured Data

Modern OCR solutions for tables go beyond basic text recognition by reconstructing spatial relationships, preserving header hierarchies, and ensuring data integrity. This deep dive explains the three core phases of table extraction: detection, structure recognition, and data extraction with validation. The applications are wide-ranging, from financial services to healthcare, enabling the conversion of complex tabular data into structured formats like JSON for seamless integration.

document-processing, intelligent-table-extraction, ocr, llama-parse, data-extraction, ai-applications

“Modern OCR for tables reconstructs spatial relationships and preserves header hierarchies.”
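The structure-recognition and validation phases can be sketched as follows, assuming a detector has already located the table and split it into a grid of cell strings (detection itself is out of scope). The function names, the convention that an empty parent-header cell continues the label to its left, and the `Total`-row check are illustrative assumptions, not LlamaParse's behavior.

```python
import json
from typing import Union

def flatten_headers(header_rows: list[list[str]]) -> list[str]:
    """Merge multi-row headers into one name per column, e.g. "Revenue / Q1"."""
    *parents, leaf = header_rows
    filled = []
    for row in parents:
        # In parent rows, an empty cell continues the spanning label to its left
        current, out = "", []
        for cell in row:
            current = cell.strip() or current
            out.append(current)
        filled.append(out)
    names = []
    for col, leaf_cell in enumerate(leaf):
        parts = [row[col] for row in filled] + [leaf_cell.strip()]
        names.append(" / ".join(p for p in parts if p))
    return names

def parse_cell(cell: str) -> Union[float, str, None]:
    """Convert "1,200" to 1200.0, blanks to None, and keep other text as-is."""
    text = cell.strip()
    if not text:
        return None
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return text

def extract_table(header_rows: list[list[str]], body_rows: list[list[str]], total_label: str = "Total") -> dict:
    columns = flatten_headers(header_rows)
    records = [dict(zip(columns, map(parse_cell, row))) for row in body_rows]
    key = columns[0]
    data = [r for r in records if r.get(key) != total_label]
    totals = [r for r in records if r.get(key) == total_label]
    errors = []
    # Validation: every numeric column of a total row must equal the column sum
    for total in totals:
        for col in columns[1:]:
            if isinstance(total.get(col), float):
                expected = sum(r[col] for r in data if isinstance(r.get(col), float))
                if abs(total[col] - expected) > 0.005:
                    errors.append(f"{col}: total {total[col]:g} != sum {expected:g}")
    return {"columns": columns, "rows": data, "errors": errors}

if __name__ == "__main__":
    header = [["Region", "Revenue", "", "Costs", ""],
              ["", "Q1", "Q2", "Q1", "Q2"]]
    body = [["North", "1,200", "1,350", "800", "820"],
            ["South", "900", "950", "600", "610"],
            ["Total", "2,100", "2,300", "1,400", "1,430"]]
    print(json.dumps(extract_table(header, body), indent=2))
```

Emitting typed numbers in the JSON means downstream systems do not have to re-parse strings like "1,200", and a failed total check surfaces as an explicit error instead of silently wrong data.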

tweet / @llama_index / Mar 27

Advanced Table Extraction for Structured Data

Modern OCR solutions like LlamaParse address the challenges of extracting structured data from complex tables in PDFs. This technology reconstructs spatial relationships, preserves header hierarchies, and validates data integrity, going beyond basic OCR capabilities. It transforms visual table formats into usable structured data, crucial for various industry applications.

document-processing, intelligent-table-extraction, ocr, llama-parse, data-extraction, pdf-processing

“Table extraction is more challenging than standard text OCR due to the importance of spatial relationships.”

LlamaIndex

Building a Financial Due Diligence Agent with LiteParse

Cloud Configuration Automation Guide: RAGformation

LlamaIndex Newsletter 5-19-26

Income Verification API: How to Automate Document-Based Income Checks at Scale

Mortgage Document Automation: Transforming Loan Processing

OCR for KYC: Why Standard Text Extraction Falls Short of Compliance Requirements

LlamaIndex Newsletter 2026-03-31

LlamaIndex Newsletter 2026-04-14

LlamaIndex Newsletter 2026-04-21

LlamaParse MCP: Agentic OCR tools for your AI agents

LiteParse Server: Self-Hostable Document Parsing

Parsing the Unreadable: How LlamaParse Handles Legal Discovery Documents

Introducing ParseBench: The First Document Parsing Benchmark for AI Agents

LlamaIndex Pipeline Automates 40-60% of Loan Processors' Manual Income Reconciliation
