Chronological feed of everything captured from Jony Ive.
paper / jonyives / 5d ago
VolTA-3D is a self-supervised 3D Vision Transformer framework that addresses the poor transferability of existing brain MRI models by jointly aligning global class-style tokens and local patch tokens within a student-teacher paradigm while enforcing fine-grained structural reconstruction. Most existing 3D SSL models for brain MRI are specialized for either segmentation or classification, limiting cross-task and cross-dataset generalization. VolTA-3D's combined global-local alignment strategy is specifically designed to handle the limited semantic diversity and subtle anatomical variation inherent to brain MRI, a known failure mode for standard SSL approaches. Evaluations on out-of-distribution tasks (hippocampal segmentation, sex classification, Alzheimer's vs. healthy classification) show consistent improvements over randomly initialized baselines.
self-supervised-learning brain-mri medical-imaging vision-transformer 3d-volumetric segmentation alzheimers-disease
“Most existing 3D brain MRI SSL models are specialized for either segmentation or classification, limiting their generalizability across tasks, datasets, and imaging protocols.”
paper / jonyives / Apr 13
This study examines the alignment between predictive uncertainty from Bayesian Deep Learning approximations (Monte Carlo Dropout and Deep Ensembles) and linguistic uncertainty extracted from free-text radiology reports using BERT and rule-based labeling on chest radiographs. Models achieve good performance, but correlation between machine predictive uncertainty and human linguistic uncertainty remains modest. Findings indicate Bayesian methods provide useful estimates yet require refinement to capture human interpretive nuances for clinical use.
uncertainty-estimation bayesian-deep-learning chest-radiograph radiology-reports predictive-uncertainty linguistic-uncertainty deep-ensembles
“BERT-based models demonstrate good performance in chest radiograph interpretation when using Bayesian uncertainty approximations.”
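As a rough illustration of the comparison described in this entry, the sketch below turns a set of Monte Carlo Dropout outputs into a predictive-entropy score per study and correlates it with a linguistic-uncertainty score. It is not the paper's code: the function names, the toy numbers, the binary-finding setup, and the choice of Pearson correlation are all assumptions made for illustration.

```python
import math


def predictive_entropy(mc_probs):
    """Binary entropy (in nats) of the mean positive-class probability
    across stochastic forward passes (e.g. MC Dropout samples)."""
    p = sum(mc_probs) / len(mc_probs)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: each inner list holds the positive-class probabilities
# from several dropout-enabled passes over one chest radiograph.
mc_samples = [
    [0.97, 0.95, 0.98, 0.96],  # confident prediction
    [0.55, 0.40, 0.62, 0.48],  # uncertain prediction
    [0.10, 0.05, 0.12, 0.08],  # confident negative
]
# Hypothetical linguistic-uncertainty scores, e.g. a hedging score in [0, 1]
# derived from the matching report by a rule-based labeler.
linguistic = [0.1, 0.8, 0.3]

model_uncertainty = [predictive_entropy(s) for s in mc_samples]
print(pearson(model_uncertainty, linguistic))
```

A Deep Ensemble could be scored the same way by passing one probability per ensemble member instead of one per dropout pass.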
paper / jonyives / Apr 13
HopeBot, an LLM-powered voice chatbot, administers the PHQ-9 depression screening via retrieval-augmented generation with real-time clarification, showing strong score agreement (ICC=0.91; 45% identical) against self-administered versions in a 132-participant within-subject study across the UK and China. 71% of 75 feedback respondents trusted the chatbot more due to its structured clarity, interpretive guidance, and supportive tone. High usability ratings (7.4-8.4/10) and 87.1% reuse/recommendation intent indicate viability as a scalable screening adjunct, with variations by employment and prior mental health service use.
llm-chatbot phq-9-screening depression-detection healthcare-ai human-computer-interaction retrieval-augmented-generation
“HopeBot PHQ-9 scores showed strong agreement with self-administered PHQ-9 (ICC = 0.91; 45% identical scores).”
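For readers unfamiliar with the two agreement figures quoted in this entry, the sketch below computes an intraclass correlation and the share of identical scores for paired totals. The entry does not say which ICC form the study used; the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), is assumed here, and the function names and toy scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a list of rows, one row per participant and one
    column per administration mode (e.g. [self_administered, chatbot])."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: participants, modes, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def fraction_identical(scores):
    """Share of participants whose scores match exactly across all modes."""
    return sum(1 for row in scores if len(set(row)) == 1) / len(scores)


# Hypothetical PHQ-9 totals (0-27): [self-administered, chatbot-administered]
pairs = [[4, 4], [12, 13], [7, 7], [19, 17], [2, 3], [10, 10]]
print(icc_2_1(pairs), fraction_identical(pairs))
```

The absolute-agreement form penalizes a systematic offset between the two modes, which a plain correlation would not detect.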
paper / jonyives / Apr 13
Researchers developed a validated silver-standard dataset from NICE guidelines covering multiple diagnoses, using GPT to generate realistic patient scenarios and clinical questions. The dataset addresses the absence of standardized benchmarks for evaluating guideline-based clinical reasoning in healthcare LLMs. Popular LLMs were benchmarked on this dataset to demonstrate its validity and enable systematic assessment of clinical utility and guideline adherence.
clinical-llms dataset-creation healthcare-ai llm-evaluation guideline-adherence silver-standard arxiv-paper
“Standardised benchmarks for evaluating guideline-based clinical reasoning in LLMs are missing in healthcare.”
paper / jonyives / Apr 13
Researchers address gaps in LLM therapeutic dialogue by applying SFT and multi-component RL to GPT-2, restructuring inputs to incorporate context and emotions alongside user queries. A novel reward function prioritizes therapeutic logic and emotional alignment over lexical metrics. RL yields BLEU gains of 0.0111, ROUGE-1 of 0.1397, and 99.34% emotion accuracy versus baseline GPT-2's 66.96%.
therapeutic-dialogue reinforcement-learning llm-fine-tuning mental-health emotion-awareness contextual-nlp
“SFT on GPT-2 produces repetitive, context-insensitive therapeutic outputs lacking empathy balance”
paper / jonyives / Apr 13
This study evaluates LLMs on two EHR data science tasks: generating accurate Python/Pandas queries for structured data analytics and extracting information from clinical notes using a RAG pipeline. Experiments on MIMIC-III subsets employ synthetic question-answer pairs, exact-match metrics, semantic similarity, and human judgment across local and API-based LLMs. Results affirm LLMs' reliability for precise querying and semantically correct extraction in clinical workflows.
llms rag ehr clinical-data information-extraction data-querying mimic-iii
“LLMs can accurately generate Python/Pandas code for querying large structured EHR datasets”