absorb.md — A knowledge graph of what AI thinkers are actually saying

Natural language understanding systems struggle with low-resource languages, including many dialects of high-resource ones. Dialect-to-standard normalization attempts to tackle this issue by transforming dialectal text so that it can be used by standard-language tools downstream. In this study, we address this task by introducing a new normalization method that combines rule-based linguistically informed transformations and large language models (LLMs) with targeted few-shot prompting, without requiring any parallel data. We implement our method for Greek dialects and apply it to a dataset of regional proverbs, evaluating the outputs using human annotators. We then use this dataset to conduct downstream experiments, finding that previous results regarding these proverbs relied solely on superficial linguistic information, including orthographic artifacts, while new observations can still be made through the remaining semantics.

paper / Apr 14

From KMMLU-Redux to Pro: A Professional Korean Benchmark Suite for LLM Evaluation

The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea. We release our dataset publicly.

paper / Apr 14

Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems

With the advancement of large language models, many dialogue systems are now capable of providing reasonable and informative responses to patients' medical conditions. However, when patients consult their doctor, they may experience negative emotions due to the severity and urgency of their situation. If the model can provide appropriate comfort and empathy based on the patient's negative emotions while answering medical questions, it will likely offer a more reassuring experience during the medical consultation process. To address this issue, our paper explores the balance between knowledge sharing and emotional support in the healthcare dialogue process. We utilize a large language model to rewrite a real-world interactive medical dialogue dataset, generating patient queries with negative emotions and corresponding medical responses aimed at soothing the patient's emotions while addressing their concerns. The modified data is used to fine-tune recent large language models with various fine-tuning methods, enabling them to respond to patients' questions with both emotional reassurance and constructive suggestions. Compared to the original LLM, our experimental results demonstrate that our methodology significantly enhances the model's ability to generate emotionally supportive responses while maintaining its original capability to provide accurate knowledge-based answers.

paper / Apr 14

MNLP at PerAnsSumm: A Classifier-Refiner Architecture for Improving the Classification of Consumer Health User Responses

paper / Apr 14

Jailbreak Distillation: Renewable Safety Benchmarking

Large language models (LLMs) are rapidly deployed in critical applications, raising urgent needs for robust safety benchmarking. We propose Jailbreak Distillation (JBDistill), a novel benchmark construction framework that "distills" jailbreak attacks into high-quality and easily updatable safety benchmarks. JBDistill utilizes a small set of development models and existing jailbreak attack algorithms to create a candidate prompt pool, then employs prompt selection algorithms to identify an effective subset of prompts as safety benchmarks. JBDistill addresses challenges in existing safety evaluation: the use of consistent evaluation prompts across models ensures fair comparisons and reproducibility. It requires minimal human effort to rerun the JBDistill pipeline and produce updated benchmarks, alleviating concerns on saturation and contamination. Extensive experiments demonstrate our benchmarks generalize robustly to 13 diverse evaluation models held out from benchmark construction, including proprietary, specialized, and newer-generation LLMs, significantly outperforming existing safety benchmarks in effectiveness while maintaining high separability and diversity. Our framework thus provides an effective, sustainable, and adaptable solution for streamlining safety evaluation.

paper / Apr 14

DiplomacyAgent: Do LLMs Balance Interests and Ethical Principles in International Events?

paper / Apr 14

Bibby AI -- AI Latex Editor writing assistant for researchers vs Overleaf Alternative vs OpenAI Prism. (Bibby AI Latex Editor)

Large language models are increasingly integrated into academic writing workflows; however, the most widely used LaTeX editors remain AI-peripheral -- offering compilation and collaboration, but no native intelligence. This separation forces researchers to leave their editing environment for AI assistance, fragmenting document context and interrupting writing flow. We present Bibby AI (trybibby.com), a native, AI-first LaTeX editor that unifies the complete research writing lifecycle within a single interface. Bibby embeds an AI writing assistant, smart citation search, AI table and equation generation, an AI paper reviewer, abstract generator, literature review drafting, a deep research assistant, and real-time LaTeX error detection and auto-fix -- all natively, without plugins or copy-paste workflows. We introduce LaTeXBench-500, a benchmark of 500 real-world compilation errors across six categories. Bibby achieves 91.4% detection accuracy and 83.7% one-click fix accuracy, outperforming Overleaf's native diagnostics (61.2%) and OpenAI Prism (78.3% / 64.1%) by large margins. Bibby demonstrates that a privacy-preserving, research-first AI editor can meaningfully accelerate every stage of academic manuscript preparation. We found that Bibby AI is a far superior alternative to Overleaf and better than OpenAI Prism in functionality and AI capabilities.

github_star / karpathy / Apr 12

karpathy starred langchain-ai/langchain: The agent engineering platform

The agent engineering platform. Stars: 133282

github_star / karpathy / Apr 12

karpathy starred PrimeIntellect-ai/verifiers: Our library for RL environments + evals

Our library for RL environments + evals. Stars: 3996

github_star / karpathy / Apr 12

karpathy starred KaTeX/KaTeX: Fast math typesetting for the web.

Fast math typesetting for the web. Stars: 19964

github_star / karpathy / Apr 12

karpathy starred open-webui/open-webui: User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

User-friendly AI Interface (Supports Ollama, OpenAI API, ...). Stars: 131421

github_push / karpathy / Apr 12

karpathy pushed to karpathy/nanochat: code update

code update

github_push / karpathy / Apr 12

karpathy pushed to karpathy/nanochat: code update

code update

github_push / karpathy / Apr 12

karpathy pushed to karpathy/autoresearch: code update

code update

github_push / karpathy / Apr 12

karpathy pushed to karpathy/nanochat: code update

code update

github_push / karpathy / Apr 12

karpathy pushed to karpathy/karpathy.github.io: code update

code update

github_push / karpathy / Apr 12

karpathy pushed to karpathy/karpathy.github.io: code update

code update

github_star / karpathy / Apr 12

karpathy starred triton-lang/triton: Development repository for the Triton language and compiler

Development repository for the Triton language and compiler. Stars: 18911

github_star / karpathy / Apr 12

karpathy starred cvxpy/cvxpy: A Python-embedded modeling language for convex optimization problems.

A Python-embedded modeling language for convex optimization problems. Stars: 6172

github_star / karpathy / Apr 12

karpathy starred unslothai/unsloth: Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally. Stars: 61171

github_star / karpathy / Apr 12

karpathy starred vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs

A high-throughput and memory-efficient inference and serving engine for LLMs. Stars: 76239

github_star / karpathy / Apr 12

karpathy starred skypilot-org/skypilot: Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem). Stars: 9828

github_star / karpathy / Apr 12

karpathy starred anomalyco/opencode: The open source coding agent.

The open source coding agent. Stars: 141928

github_star / karpathy / Apr 12

karpathy starred pytorch/torchtitan: A PyTorch native platform for training generative AI models

A PyTorch native platform for training generative AI models. Stars: 5228

github_star / karpathy / Apr 12

karpathy starred pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration

Tensors and Dynamic neural networks in Python with strong GPU acceleration. Stars: 99065

youtube / karpathy / Apr 9

Navigating the Open-Source Note-Taking Ecosystem for Privacy and Efficiency

The video critiques commercial note-taking applications like Notion for their data privacy implications and bloat, advocating for open-source, non-commercial, and plain-text alternatives. It explores various tools and their trade-offs regarding features, user-friendliness, customization, and performance across different operating environments. The author ultimately lands on Neovim with specific plugins as a highly customizable yet challenging solution for plain-text note-taking, highlighting the perpetual quest for an ideal, distraction-free note-taking system versus the practicalities of productivity.

note-taking-apps, productivity-tools, open-source-software, text-editors, data-privacy, self-hosting, emacs, neovim, markdown, knowledge-management

“Commercial note-taking apps like Notion are problematic due to data privacy concerns and vendor lock-in.”

tweet / @karpathy / Apr 6

LLMs as a Tool for Knowledge Curation, Not Creation

Large Language Models (LLMs) can effectively summarize and contextualize information, reducing the need for manual writing but not replacing the critical processes of reading and analytical thought. This approach facilitates efficient knowledge integration into existing systems like wikis, by providing LLM-generated summaries and contextual analyses that augment human understanding.

llm-workflow, information-processing, knowledge-management, ai-tools, reading-comprehension

“LLMs allow users to bypass the writing process.”

tweet / @karpathy / Apr 6

LLMs as Knowledge Bases: The Compilation Thesis

Karpathy argues that LLMs are becoming the primary interface for accessing compiled human knowledge, replacing search engines and wikis. The model weights themselves function as a lossy compression of the internet's knowledge, and retrieval-augmented generation patches the gaps.

llm, knowledge-management, ai-agents, search

“LLMs are compressed knowledge bases that can be queried conversationally”