Dec 14, 2023 | » | Introduction to LLMs
4 min; updated Dec 17, 2023
What is a Language Model? A language model (LM) is a probability distribution over sequences of tokens. Suppose we have a vocabulary \(\mathcal{V}\), a set of tokens; a language model \(p\) then assigns each sequence of tokens \(x_1, \ldots, x_L \in \mathcal{V}\) a probability. Assigning meaningful probabilities to all sequences requires both syntactic knowledge and world knowledge. Given \( \mathcal{V} = \{ \text{ate}, \text{ball}, \text{cheese}, \text{mouse}, \text{the} \} \):...
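The definition in this excerpt can be made concrete with a toy model. A minimal sketch (my own illustration, not from the post), using the five-token vocabulary above and the least informative distribution possible, uniform over all sequences of a given length:

```python
import itertools

# Toy vocabulary from the excerpt.
vocab = ["ate", "ball", "cheese", "mouse", "the"]

def toy_lm(seq):
    """Assign a probability to a token sequence, assuming a uniform
    distribution over all sequences of the same length."""
    if any(tok not in vocab for tok in seq):
        return 0.0
    return (1 / len(vocab)) ** len(seq)

# A good LM would give "the mouse ate the cheese" far more mass than
# "mouse the the cheese ate"; this uniform model cannot tell them apart.
p = toy_lm(["the", "mouse", "ate", "the", "cheese"])

# Sanity check: probabilities over all length-2 sequences sum to 1.
total = sum(toy_lm(list(s)) for s in itertools.product(vocab, repeat=2))
```

The syntactic and world knowledge the excerpt mentions is exactly what separates a real LM from this uniform baseline: it redistributes mass toward well-formed, plausible sequences.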
Jun 11, 2025 | » | Popping Bubbles Game for Computer-Use Models
2 min; updated Jun 12, 2025
.bubble { width: 50px; height: 50px; border-radius: 50%; background: radial-gradient(circle, rgba(173,216,230,1) 0%, rgba(135,206,250,1) 50%, rgba(0,191,255,1) 100%); box-shadow: 0 0 10px rgba(0,191,255,0.5), 0 0 20px rgba(0,191,255,0.3); cursor: pointer; } This page shows bubbles at random locations on the screen. To get the highest score, click on each bubble as soon as it appears. To restart the game, refresh the page. Game starts in seconds....
Jun 1, 2025 | » | Copilot in VS Code
14 min; updated Jun 8, 2025
My work is primarily in Microsoft’s ecosystem, so learning Copilot usage in VS Code is pretty important: if not for my own productivity gains, then for having knowledgeable conversations with coworkers about using LLMs as a SWE. Copilot-Powered Scenarios AI Code Completions I’ve found code completions more distracting than useful, probably because I already have an idea of what I want to type, and Copilot’s hallucinations slow me down....
Apr 6, 2025 | » | LLM Evals
4 min; updated Apr 6, 2025
Notable Benchmarks Some notable benchmarks in language modeling: MMLU: 57 tasks spanning elementary math, US history, computer science, law, and more. EleutherAI Eval: unified framework to test models in zero/few-shot settings on 200 tasks from various evals, including MMLU. HELM: evaluates LLMs across domains; tasks include Q&A, information retrieval, summarization, text classification, etc. AlpacaEval: measures how often a strong LLM judge (e.g., GPT-4) prefers the output of one model over a reference model....
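The AlpacaEval-style comparison above reduces to a win-rate computation over judge verdicts. A minimal sketch of that scoring step (my own illustration; counting ties as half a win is an assumption, not necessarily AlpacaEval's exact convention):

```python
def win_rate(judgments):
    """Fraction of prompts where the candidate model beats the reference.

    judgments: list of 'candidate' / 'reference' / 'tie' labels, one per
    prompt, as emitted by a strong LLM judge comparing two outputs.
    Ties count as half a win (assumed convention).
    """
    wins = sum(j == "candidate" for j in judgments)
    ties = sum(j == "tie" for j in judgments)
    return (wins + 0.5 * ties) / len(judgments)

# 2 wins + one half-credit tie over 4 prompts = 0.625
score = win_rate(["candidate", "reference", "candidate", "tie"])
```

A win rate near 0.5 means the candidate is roughly on par with the reference; well above 0.5 means the judge systematically prefers it.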
Feb 16, 2025 | » | UX for LLMs
4 min; updated Feb 16, 2025
tldraw.com’s Take on an LLM Canvas tldraw is a library for creating infinite canvas experiences in React. In UX history, chat has tended to precede canvas: from running computers from the command prompt to using the screen as a canvas via a mouse, and mobile phones going from keypad-oriented to a canvas where the finger controls the touchscreen. Where are the canvases for LLMs? Multi-modal models, e....
Dec 24, 2024 | » | Using LLMs to Enhance My Capabilities
6 min; updated Jun 8, 2025
Sample Use Cases LLMs are increasingly here to stay despite the reservations. How can I use them to enhance my capabilities? Look out for the Gell-Mann amnesia effect: you prompt the LLM on some subject you know well, read the response, and see the LLM has absolutely no understanding of either the facts or the issues. You read with exasperation or amusement the multiple errors in the response, then ask it about something else, and read that response as if it’s more accurate than the baloney you just read....
Mar 3, 2021 | » | LLMs: Stochastic Parrots 🦜 and How (Not) to Use Them
10 min; updated Dec 14, 2023
The paper was written in a period when NLP practitioners were producing ever-bigger (number of parameters; size of training data) language models (LMs) and pushing the top scores on benchmarks. The paper itself was controversial because it led to Gebru being fired from Google, following disagreements with her managers over the conditions (withdraw the paper, or remove Google-affiliated authors) for publishing it. A lot has changed since mid-2021, when I initially wrote this page....