I Let Karpathy's AutoResearch Agent Run Overnight! - HackerNoon
Chronological feed of everything captured from Andrej Karpathy.
I Let Karpathy's AutoResearch Agent Run Overnight! - HackerNoon
'I Call Him Dobby The Elf Claw,' OpenAI Cofounder Andrej Karpathy Says — After Nvidia's Jensen Huang Gift - Benzinga
"I've Never Felt This Much Behind As A Programmer." - Andrej Karpathy [89937c] - Fathom Journal
Andrej Karpathy considers returning to Tesla to work on Optimus [video] - Not a Tesla App
Andrej Karpathy's new open source 'autoresearch' lets you run hundreds of AI experiments a night — with revolutionary implications - Venturebeat
An OpenAI cofounder 'vibe coded' an analysis of the U.S. labor market's exposure to AI - Fortune
Andrej Karpathy says he uses an AI agent named Dobby the Elf Claw to control his pool and track his packages - Business Insider
OpenAI cofounder says he hasn't written a line of code in months and is in a 'state of psychosis' - Fortune
Andrej Karpathy's 630-line Python script ran 50 experiments overnight without any human input - The New Stack
'The Karpathy Loop': 700 experiments, 2 days, and a glimpse of where AI is heading - Fortune
Karpathy shares 'LLM Knowledge Base' architecture that bypasses RAG with an evolving markdown library maintained by AI - Venturebeat
He coined 'vibe coding.' Now he says there's a 'growing gap' among AI users - Business Insider
Karpathy says developers have ‘AI Psychosis.’ Everyone else is next. - The New Stack
Salient Object Subitizing: Supplementary Material 1. Visualizing the CNN Subitizing Classifiers
Workshop Track - ICLR 2016: Visualizing and Understanding Recurrent Networks
Connecting images and natural language
Intelligent Mirror: Detecting Skin Cancer (Melanoma) using Convolutional Neural Network with Augmented Reality Feedback
2D Racing game using reinforcement learning and supervised learning
Challenges in Region-Specific Image Captioning: A Deep Learning Approach
Natural language understanding systems struggle with low-resource languages, including many dialects of high-resource ones. Dialect-to-standard normalization attempts to tackle this issue by transforming dialectal text so that it can be used by standard-language tools downstream. In this study, we tackle this task by introducing a new normalization method that combines rule-based linguistically informed transformations and large language models (LLMs) with targeted few-shot prompting, without requiring any parallel data. We implement our method for Greek dialects and apply it on a dataset of regional proverbs, evaluating the outputs using human annotators. We then use this dataset to conduct downstream experiments, finding that previous results regarding these proverbs relied solely on superficial linguistic information, including orthographic artifacts, while new observations can still be made through the remaining semantics.
The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea. We make our dataset publicly available.
With the advancement of large language models, many dialogue systems are now capable of providing reasonable and informative responses to patients' medical conditions. However, when patients consult their doctor, they may experience negative emotions due to the severity and urgency of their situation. If the model can provide appropriate comfort and empathy based on the patient's negative emotions while answering medical questions, it will likely offer a more reassuring experience during the medical consultation process. To address this issue, our paper explores the balance between knowledge sharing and emotional support in the healthcare dialogue process. We utilize a large language model to rewrite a real-world interactive medical dialogue dataset, generating patient queries with negative emotions and corresponding medical responses aimed at soothing the patient's emotions while addressing their concerns. The modified data serves to refine the latest large language models with various fine-tuning methods, enabling them to accurately provide sentences with both emotional reassurance and constructive suggestions in response to patients' questions. Compared to the original LLM, our experimental results demonstrate that our methodology significantly enhances the model's ability to generate emotional responses while maintaining its original capability to provide accurate knowledge-based answers.
Large language models (LLMs) are rapidly deployed in critical applications, raising urgent needs for robust safety benchmarking. We propose Jailbreak Distillation (JBDistill), a novel benchmark construction framework that "distills" jailbreak attacks into high-quality and easily-updatable safety benchmarks. JBDistill utilizes a small set of development models and existing jailbreak attack algorithms to create a candidate prompt pool, then employs prompt selection algorithms to identify an effective subset of prompts as safety benchmarks. JBDistill addresses challenges in existing safety evaluation: the use of consistent evaluation prompts across models ensures fair comparisons and reproducibility. It requires minimal human effort to rerun the JBDistill pipeline and produce updated benchmarks, alleviating concerns on saturation and contamination. Extensive experiments demonstrate our benchmarks generalize robustly to 13 diverse evaluation models held out from benchmark construction, including proprietary, specialized, and newer-generation LLMs, significantly outperforming existing safety benchmarks in effectiveness while maintaining high separability and diversity. Our framework thus provides an effective, sustainable, and adaptable solution for streamlining safety evaluation.
Large language models are increasingly integrated into academic writing workflows; however, the most widely used LaTeX editors remain AI-peripheral, offering compilation and collaboration but no native intelligence. This separation forces researchers to leave their editing environment for AI assistance, fragmenting document context and interrupting writing flow. We present Bibby AI (trybibby.com), a native, AI-first LaTeX editor that unifies the complete research writing lifecycle within a single interface. Bibby embeds an AI writing assistant, smart citation search, AI table and equation generation, an AI paper reviewer, abstract generator, literature review drafting, a deep research assistant, and real-time LaTeX error detection and auto-fix, all natively, without plugins or copy-paste workflows. We introduce LaTeXBench-500, a benchmark of 500 real-world compilation errors across six categories. Bibby achieves 91.4% detection accuracy and 83.7% one-click fix accuracy, outperforming Overleaf's native diagnostics (61.2%) and OpenAI Prism (78.3 / 64.1%) by large margins. Bibby demonstrates that a privacy-preserving, research-first AI editor can meaningfully accelerate every stage of academic manuscript preparation. We found Bibby AI to be a far superior alternative to Overleaf and better than OpenAI Prism in functionality and AI features.
Bibby AI - AI Latex Editor writing assistant for researchers vs Overleaf Alternative vs OpenAI Prism. (Bibby AI Latex Editor)
The agent engineering platform. Stars: 133282
Our library for RL environments + evals. Stars: 3996
Fast math typesetting for the web. Stars: 19964
User-friendly AI Interface (Supports Ollama, OpenAI API, ...). Stars: 131421
code update
Development repository for the Triton language and compiler. Stars: 18911
A Python-embedded modeling language for convex optimization problems. Stars: 6172
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally. Stars: 61171
A high-throughput and memory-efficient inference and serving engine for LLMs. Stars: 76239
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem). Stars: 9828
The open source coding agent. Stars: 141928
A PyTorch native platform for training generative AI models. Stars: 5228
Tensors and Dynamic neural networks in Python with strong GPU acceleration. Stars: 99065
The video critiques commercial note-taking applications like Notion for their data privacy implications and bloat, advocating for open-source, non-commercial, and plain-text alternatives. It explores various tools and their trade-offs regarding features, user-friendliness, customization, and performance across different operating environments. The author ultimately lands on Neovim with specific plugins as a highly customizable yet challenging solution for plain-text note-taking, highlighting the perpetual quest for an ideal, distraction-free note-taking system versus the practicalities of productivity.
Large Language Models (LLMs) can effectively summarize and contextualize information, reducing the need for manual writing but not replacing the critical processes of reading and analytical thought. This approach facilitates efficient knowledge integration into existing systems like wikis, by providing LLM-generated summaries and contextual analyses that augment human understanding.
Karpathy argues that LLMs are becoming the primary interface for accessing compiled human knowledge, replacing search engines and wikis. The model weights themselves function as a lossy compression of the internet's knowledge, and retrieval-augmented generation patches the gaps.