Chronological feed of everything captured from LlamaIndex.
blog / llama_index / 5d ago / failed
blog / llama_index / 5d ago / failed
blog / llama_index / 5d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
blog / llama_index / 13d ago / failed
tweet / @llama_index / 18d ago / failed
May the 4th be with you! Celebrate with us, you must. Join our Start Up Party Up this Thursday before SF's free 2nd St Fest.
Calling all #AI Jedis - leave your agents at home and sign up for our next monthly meet up to find:
Outta Sight's specialty pizza
Jawa juice & green milk (iykyk)
New learnings, new …
tweet / @llama_index / 18d ago / failed
@CBinsights AI 100 2026 is out and LlamaIndex made the list.
We're proud to provide the leading document understanding API for AI agents. Congrats to all honorees in the AI Infrastructure category.
Full list here: https://www.cbinsights.com/research/report/artificial-intelligence-top-startups-2026/
tweet / @llama_index / 18d ago / failed
LlamaIndex NYC takeover, 5/13
Our CEO Jerry Liu is in town.
Two events, open to every NYC builder:
FinParse Workshop (laptops out, hands-on with @jerryjliu0): https://luma.com/updli8i6
AI Engineers on Tap (happy hour w/ @tabs): https://luma.com/tklfgwh8
tweet / @llama_index / 18d ago / failed
What if you could extract text from any photo on your phone?
We built LlamaParse Mobile, an @expo + @reactnative app for iOS & Android, powered by the LlamaParse TypeScript SDK.
Three steps, that's it:
Add your API key (securely stored on-device)
Snap a photo of anything with text
Parse it and, in under a…
tweet / @llama_index / 18d ago / failed
A few weeks ago @simonw got Claude to port LiteParse to the browser. Today, we are launching that work as a complete guide in our docs! https://developers.llamaindex.ai/liteparse/guides/browser-usage/?utm_medium=socials&utm_source=twitter
The guide itself relies on some fun hacks with vite and mocking. We expect this …
tweet / @llama_index / 25d ago / failed
Our CEO @jerryjliu0 in @VentureBeat, on what's actually changing in the LLM stack:
"We've really identified that there's a core set of data that has been locked up in all these file format containers. Ultimately, whether you use OpenAI Codex or Claude Code doesn't really matter. The thing that they all need is conte…
tweet / @llama_index / 25d ago / failed
Let's talk document formatting.
Bold. Italics. Superscripts. Strikethroughs. The visual cues humans rely on every time we read a doc, and ones existing OCR benchmarks completely ignore.
"$199" struck through next to "$149" isn't decoration. It's the meaning.
A superscript tells your agent "3" is a citation, not …
tweet / @llama_index / 25d ago / failed
Parsing documents with AI agents just got a lot more seamless.
We've rebuilt the LlamaParse MCP server to handle your document processing workflows, and you can connect it today to any MCP-compatible client at https://mcp.llamaindex.ai/mcp
Once connected, you'll be able to:
Parse documents into clean markdown…
tweet / @llama_index / 25d ago / failed
Building scalable, distributed document processing pipelines isn't easy.
That's why we teamed up with @render to build a system that:
Leverages the LlamaParse platform to parse, classify, extract, and retrieve information from documents
Uses Render Workflows to distribute tasks across nodes and accelerate backgro…
tweet / @llama_index / 25d ago / failed
Thank you AI Dev Day '26 @DeepLearningAI
@jerryjliu0 shares why SOTA LLMs can build an app but can't read a PDF
tweet / @llama_index / 28d ago
Loan processors dedicate 40-60% of their time to manually reconciling income from tax returns, pay stubs, W-2s, and bank statements. LlamaIndex developed an end-to-end automation pipeline using LlamaParse for schema-driven extraction across these document types, providing confidence scores and citations. Claude Agent SDK enables cross-document validation to detect discrepancies like W-2/pay-stub gaps, unexplained deposits, and employer mismatches, culminating in an HTML report with COMPLETE/REVIEW/FLAG decisions.
llamaparse, claude-agent, document-automation, income-reconciliation, financial-pipeline, rpa-finance
“Loan processors spend 40–60% of their time reconciling income across tax returns, pay stubs, W-2s, and bank statements.”
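The cross-document validation described in this entry can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `IncomeDocs` fields, the 5% tolerance, the deposit limit, and the rule that one issue means REVIEW while two or more mean FLAG are all hypothetical, and the sketch calls neither LlamaParse nor the Claude Agent SDK.

```python
from dataclasses import dataclass


@dataclass
class IncomeDocs:
    """Hypothetical fields already extracted from a borrower's documents."""
    w2_wages: float              # annual wages reported on the W-2
    paystub_annualized: float    # pay-stub income projected to a full year
    w2_employer: str
    paystub_employer: str
    unexplained_deposits: float  # bank deposits not matched to payroll


def decide(docs, tolerance=0.05, deposit_limit=1000.0):
    """Return (decision, issues). Thresholds are illustrative assumptions."""
    issues = []

    # W-2 / pay-stub gap: relative difference above the tolerance.
    baseline = max(docs.w2_wages, 1.0)
    if abs(docs.w2_wages - docs.paystub_annualized) / baseline > tolerance:
        issues.append("w2_paystub_gap")

    # Employer mismatch: compare names case-insensitively, ignoring padding.
    if docs.w2_employer.strip().lower() != docs.paystub_employer.strip().lower():
        issues.append("employer_mismatch")

    # Unexplained deposits above the limit.
    if docs.unexplained_deposits > deposit_limit:
        issues.append("unexplained_deposits")

    # Assumed mapping: no issues -> COMPLETE, one -> REVIEW, several -> FLAG.
    if not issues:
        return "COMPLETE", issues
    if len(issues) == 1:
        return "REVIEW", issues
    return "FLAG", issues


if __name__ == "__main__":
    docs = IncomeDocs(80000, 60000, "Acme Corp", "Acme Corp", 0)
    print(decide(docs))
```

In a real pipeline the extracted values would also carry the confidence scores and citations the entry mentions; this sketch leaves them out to keep the decision logic visible.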
youtube / llama_index / Apr 25 / failed
youtube / llama_index / Apr 25 / failed
youtube / llama_index / Apr 25 / failed
tweet / @llama_index / Apr 24
ParseBench, now live on Kaggle, is the first OCR benchmark tailored for AI agents, featuring 2,000 enterprise pages and over 167,000 test rules across 5 dimensions that expose downstream agent failures. It enables direct comparison of custom parsers against 14 established methods, including GPT-5 Mini, Gemini 3, Textract, and LlamaParse. This benchmark targets real-world parsing challenges in enterprise document processing.
ai-benchmark, document-ocr, parser-evaluation, llamaindex, kaggle, ai-agents, enterprise-data
“ParseBench is the first document OCR benchmark specifically built for AI agents”
tweet / @llama_index / Apr 23 / failed
LiteParse: our open-source, layout-aware PDF parser for AI agents.
The secret? Grid projection. Instead of heavy ML layout models or flat text extraction, it projects text onto a monospace grid so alignment preserves structure.
Full deep dive into the grid projection algorithm behind the magic:
https://t.co/pXS1ZNlIE…
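The grid projection idea in this entry can be sketched in a few lines. This is a minimal illustration of the general technique, not LiteParse's actual implementation: the `TextItem` type, the `project_to_grid` function, and the fixed `char_width` and `line_height` values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    x: float  # left edge in page units (for example, PDF points)
    y: float  # top edge in page units


def project_to_grid(items, char_width=6.0, line_height=12.0):
    """Quantize each item's coordinates to a (row, column) cell of a monospace grid."""
    rows = {}
    for item in items:
        row = round(item.y / line_height)
        col = round(item.x / char_width)
        rows.setdefault(row, []).append((col, item.text))

    lines = []
    for row in sorted(rows):
        line = ""
        for col, text in sorted(rows[row]):
            if line and len(line) >= col:
                # Earlier text already reaches this column: separate with one space.
                line += " "
            else:
                # Pad with spaces so the text starts at its projected column.
                line = line.ljust(col)
            line += text
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    page = [
        TextItem("Item", 0, 0), TextItem("Price", 120, 0),
        TextItem("Widget", 0, 12), TextItem("$149", 120, 12),
    ]
    print(project_to_grid(page))
```

Because both "Price" and "$149" share the same x coordinate, they land in the same text column, so the table's alignment survives in plain text without any layout model. Empty rows between lines are not emitted in this sketch.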
tweet / @llama_index / Apr 22 / failed
Let's talk parsing charts.
Last week we released ParseBench, the first document OCR benchmark for AI agents.
New in ParseBench: ChartDataPointMatch.
Most document parsers look at a chart and OCR the caption.
Agents need the actual numbers. That's the gap between "OCR'd the text around the chart" and "actually read t…
tweet / @llama_index / Apr 20
LiteParse, a zero-cloud-dependency parser, achieved over 4.3K GitHub stars in weeks and has joined the LlamaIndex ecosystem. It processes ~500 pages across 50+ formats in 2 seconds, powering agents in Claude Code, Cursor, and production pipelines. An upcoming live workshop will demonstrate building a fintech due diligence agent using LiteParse.
llamaindex, liteparse, github-stars, oss-tool, document-parsing, ai-agents, fintech-workshop
“LiteParse reached over 4,300 GitHub stars within a few weeks of launch.”
tweet / @llama_index / Apr 20
Anthropic's Opus 4.7 model achieves 80.6% on Document Reasoning, a 23.5-point gain from 57.1%, but ParseBench reveals uneven parsing improvements: Charts surge 42.3 points to 55.8%, minor gains in Formatting (+5.2%), Content (+0.6%), and Tables (+0.7%), with a Layout regression (-2.5%). Overall ParseBench score reaches 55.8% at ~1.5¢/page, lagging LlamaParse Agentic's 84.9% at ~1.2¢/page. No single model dominates general document understanding.
anthropic-opus, document-parsing, parsebench, llamaparse, ai-benchmarks, agentic-parsing, llm-evaluation
“Anthropic Opus 4.7 scores 80.6% on Document Reasoning, up from 57.1%.”
tweet / @llama_index / Apr 20
ParseBench is the first document OCR benchmark for AI agents, evaluating content faithfulness via a core metric that checks if parsers extract all text in order without fabrications. It uses over 167K rule-based tests to grade three failure modes: omissions (word, sentence, digit), hallucinations, and reading order violations. The benchmark raises expectations from human-readable outputs to agent-actionable reliability.
parsebench, ocr-benchmark, document-parsing, ai-agents, content-faithfulness, hallucination-detection, reading-order
“ParseBench is the first document OCR benchmark specifically for AI agents.”
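The three failure modes named in this entry (omissions, hallucinations, reading order violations) can be illustrated with simple rule-style checks. This is a hedged sketch, not ParseBench's actual rule engine: the function names and the whitespace tokenization are assumptions chosen for brevity.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization; a deliberately crude assumption."""
    return text.lower().split()


def omissions(reference, parsed):
    """Tokens present in the reference but missing from the parser output."""
    missing = Counter(tokenize(reference)) - Counter(tokenize(parsed))
    return sorted(missing.elements())


def hallucinations(reference, parsed):
    """Tokens the parser emitted that the reference does not contain."""
    extra = Counter(tokenize(parsed)) - Counter(tokenize(reference))
    return sorted(extra.elements())


def order_violation(parsed, first, second):
    """True if `second` appears before `first`, or if either snippet is missing."""
    i = parsed.find(first)
    j = parsed.find(second)
    return i == -1 or j == -1 or j < i


if __name__ == "__main__":
    reference = "Total revenue was 149 dollars in 2024"
    parsed = "Total revenue was 199 dollars"
    print(omissions(reference, parsed))
    print(hallucinations(reference, parsed))
    print(order_violation("Section 2 text. Section 1 text.", "Section 1", "Section 2"))
```

A single wrong digit shows up twice here, once as an omission ("149") and once as a hallucination ("199"), which is why digit-level checks matter for agents that act on the numbers.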
tweet / @llama_index / Apr 20
LlamaIndex is introducing an AI track to NYC FinTech Week, targeting builders of fintech agents, document intelligence, and agentic workflows. They are co-hosting an AI Builders Rooftop Happy Hour with LinkupAPI next week, featuring cocktails, a rooftop venue, and a potential piñata battle. RSVP link provided for attendance.
fintech-week, ai-track, ai-builders, rooftop-happy-hour, fintech-agents, agentic-workflows, nyc-event
“NYC FinTech Week now includes an AI track”
tweet / @llama_index / Apr 7
LlamaIndex and LanceDB developed a structure-aware PDF QA pipeline that significantly improves agentic search. This pipeline addresses the challenge of processing visually rich documents by integrating multimodal data storage and retrieval. The combined approach of robust parsing with LiteParse and multimodal storage in LanceDB enables agents to achieve high accuracy in complex reasoning tasks involving PDFs.
multimodal-ai, pdf-processing, llm-agents, information-retrieval, rag-pipelines, lancedb, liteparse
“Visually rich documents pose a significant challenge for traditional document processing pipelines and AI agents.”
tweet / @llama_index / Apr 7
LlamaIndex is hosting an in-person workshop in NYC on May 13th for fintech leaders. The workshop will focus on practical applications of agentic OCR to transform complex financial documents into LLM-ready data, including insights from a top-tier PE firm's production agent. Attendees are expected to bring their own laptops to build real pipelines.
fintech, llms, ocr, agentic-ai, data-pipelines, workshops
“LlamaIndex is hosting a workshop for fintech leaders in NYC on May 13th.”
tweet / @llama_index / Apr 3
LlamaIndex hosted a community gathering at their new San Francisco office, attracting over 100 developers. The event served as a networking session for AI builders, coinciding with local festivities in the city.
new-office, community-event, ai-builders, llamaindex-x-feed, networking
“LlamaIndex has opened a new office in San Francisco.”
tweet / @llama_index / Apr 2
LlamaIndex has launched Extract v2, a significant upgrade to its document extraction tool. This new version offers simplified operation through intuitive tiers, pre-saved extraction configurations for efficiency, and configurable document parsing for greater control and improved results. Extract v1 will remain available for a limited transition period.
document-extraction, llm-data-processing, data-pipelines, platform-updates, llamaindex
“Extract v2 features simplified, intuitive tiers, replacing previous modes.”
tweet / @llama_index / Apr 1
LlamaIndex is sponsoring Stanford FutureLaw Week 2026, an event focused on the intersection of AI and law, featuring bootcamps, hackathons, and a conference. This initiative aims to train future legal professionals in AI. However, a significant need remains for AI legal tools supporting commercial teams in small to mid-sized companies that lack dedicated legal support.
legal-ai, ai-applications, ai-bootcamps, legal-tech, stanford, future-law
“LlamaIndex is sponsoring Stanford FutureLaw Week 2026.”
tweet / @llama_index / Mar 31
LlamaIndex has been named to the 2026 Enterprise Tech 30, securing the #3 spot in the Early Stage category. This recognition, based on votes from over 90 leading investors and corporate development leaders, highlights LlamaIndex's significant potential to influence the future of enterprise technology. The award underscores the company's strong industry standing and validates its impact within the enterprise tech landscape.
llamaindex-recognition, enterprise-tech, early-stage-companies, wing-vc, industry-awards, startup-ecosystem
“LlamaIndex was recognized in the 2026 Enterprise Tech 30.”
tweet / @llama_index / Mar 30
LlamaIndex is hosting an office warming party on April 2nd at their new "AI Waterfront" location on 2nd Street. The event will offer networking opportunities, food, and drinks. Due to limited space, early RSVP is encouraged.
new-office, networking-event, ai-community, llama-index, miami-events
“LlamaIndex has moved to a new office location.”
tweet / @llama_index / Mar 30
LiteSearch serves as a reference implementation for high-performance, fully local document ingestion and retrieval. The stack integrates LiteParse for parsing, Chonkie for chunking, and a Rust-based Qdrant edge shard for vectorized storage, executed via the Bun runtime.
open-source, local-ai, retrieval-augmented-generation, developer-tools, document-parsing
“LiteSearch is a fully local document ingestion and retrieval CLI/TUI application.”
tweet / @llama_index / Mar 27
Modern OCR solutions for tables go beyond basic text recognition by reconstructing spatial relationships, preserving header hierarchies, and ensuring data integrity. This deep dive explains the three core phases of table extraction: detection, structure recognition, and data extraction with validation. The applications are wide-ranging, from financial services to healthcare, enabling the conversion of complex tabular data into structured formats like JSON for seamless integration.
document-processing, intelligent-table-extraction, ocr, llama-parse, data-extraction, ai-applications
“Modern OCR for tables reconstructs spatial relationships and preserves header hierarchies.”
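To make the last phase of this entry concrete (data extraction with validation, preserving header hierarchies, JSON output), here is a minimal sketch. It assumes detection and structure recognition have already produced header rows and data rows as plain lists; the `table_to_records` name and the convention that a merged header cell appears as a label followed by empty strings are assumptions, not LlamaParse behavior. Every column is assumed to have at least one non-empty header label.

```python
import json


def table_to_records(header_rows, data_rows):
    """Turn multi-level headers plus data rows into nested dict records."""
    # A merged header cell spans to the right, so forward-fill blanks in each row.
    filled = []
    for row in header_rows:
        last = ""
        out = []
        for cell in row:
            if cell:
                last = cell
            out.append(last)
        filled.append(out)

    # One header path per column, top level first.
    paths = list(zip(*filled))

    records = []
    for row in data_rows:
        # Validation step: every data row must match the header width.
        if len(row) != len(paths):
            raise ValueError(f"expected {len(paths)} cells, got {len(row)}")
        record = {}
        for path, value in zip(paths, row):
            keys = [key for key in path if key]
            node = record
            for key in keys[:-1]:
                node = node.setdefault(key, {})
            node[keys[-1]] = value
        records.append(record)
    return records


if __name__ == "__main__":
    headers = [["Region", "2024", ""], ["", "Q1", "Q2"]]
    rows = [["West", 10, 12]]
    print(json.dumps(table_to_records(headers, rows), indent=2))
```

Nesting the quarter values under their year keeps the header hierarchy in the JSON itself, so a downstream agent does not have to guess which year a "Q1" column belongs to.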
tweet / @llama_index / Mar 27
Modern OCR solutions like LlamaParse address the challenges of extracting structured data from complex tables in PDFs. This technology reconstructs spatial relationships, preserves header hierarchies, and validates data integrity, going beyond basic OCR capabilities. It transforms visual table formats into usable structured data, crucial for various industry applications.
document-processing, intelligent-table-extraction, ocr, llama-parse, data-extraction, pdf-processing
“Table extraction is more challenging than standard text OCR due to the importance of spatial relationships.”