Large Language Models

Dated Dec 14, 2023; last modified on Thu, 14 Dec 2023

Nov 30, 2025	»	Given Language Models, Why Learn About Large Language Models? 4 min; updated Nov 30, 2025 This part of seems pertinent to respond to “LLMs are just (auto-complete; Markov chains; [insert pre-existing LM-adjacent tech]) on steroids”. Scale LLMs are massive. From 2018 to 2022, model sizes increased 5000x. OpenAI’s GPT model from June 2018 had 110M parameters; GPT-3 from May 2020 had 175B parameters. LLM providers no longer seem to advertise their parameter counts; leaks put GPT-4 at 1.8T parameters. LLMs as Standalone Systems Unlike LMs that were used as components of larger systems (e.g., machine translation), LLMs are increasingly capable of being standalone systems. Recall that LMs are capable of conditional generation (given a prompt, generate a completion). This allows the same LLM to solve a variety of tasks by changing the prompt, e.g., ...
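A quick back-of-the-envelope check on the numbers above, assuming 16-bit (2-byte) weights and ignoring activations, KV caches, and optimizer state. The GPT to GPT-3 jump alone works out to roughly 1,600x, so the 5000x figure presumably reflects other models released in that window. The GPT-4 count is the leaked figure, not an official one.

```python
# Parameter counts as quoted in the post; GPT-4's figure is a leak, not official.
GPT_JUNE_2018 = 110e6
GPT3_MAY_2020 = 175e9
GPT4_LEAKED = 1.8e12


def growth_factor(old_params: float, new_params: float) -> float:
    """How many times larger the newer model is."""
    return new_params / old_params


def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, assuming 16-bit (2-byte) parameters."""
    return params * bytes_per_param / 1e9


print(round(growth_factor(GPT_JUNE_2018, GPT3_MAY_2020)))  # 1591
print(weight_memory_gb(GPT3_MAY_2020))  # 350.0 (GB)
print(weight_memory_gb(GPT4_LEAKED))  # 3600.0 (GB)
```

Even before serving a single token, GPT-3's weights alone would not fit on one commodity GPU under this assumption.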
Dec 14, 2023	»	Introduction to LLMs 4 min; updated Dec 17, 2023 What is a Language Model? A language model (LM) is a probability distribution over sequences of tokens. Suppose we have a vocabulary \(\mathcal{V}\), a set of tokens; then a language model \(p\) assigns each sequence of tokens \(x_1, …, x_L \in \mathcal{V} \) a probability. Assigning meaningful probabilities to all sequences requires syntactic knowledge and world knowledge. Given \( \mathcal{V} = \{ \text{ate}, \text{ball}, \text{cheese}, \text{mouse}, \text{the} \} \): ...
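To make the definition concrete, here is a toy \(p\) over that five-word vocabulary. It is a sketch that assumes a bigram model (each token conditioned only on the previous one, plus start and end markers) with invented probabilities; real LMs condition on far longer histories.

```python
VOCAB = {"ate", "ball", "cheese", "mouse", "the"}
START, END = "<s>", "</s>"

# Hypothetical bigram probabilities p(next | previous); each row sums to 1.
BIGRAM = {
    START: {"the": 1.0},
    "the": {"mouse": 0.5, "cheese": 0.3, "ball": 0.2},
    "mouse": {"ate": 0.9, END: 0.1},
    "cheese": {"ate": 0.1, END: 0.9},
    "ball": {"ate": 0.05, END: 0.95},
    "ate": {"the": 1.0},
}


def sequence_probability(tokens: list[str]) -> float:
    """p(x_1, ..., x_L) via the chain rule, truncated to one token of history."""
    if any(tok not in VOCAB for tok in tokens):
        raise ValueError("token outside the vocabulary")
    prob = 1.0
    prev = START
    for tok in tokens + [END]:
        # Unseen transitions get probability 0.
        prob *= BIGRAM.get(prev, {}).get(tok, 0.0)
        prev = tok
    return prob


print(sequence_probability("the mouse ate the cheese".split()))  # ~0.1215
print(sequence_probability("the cheese ate the mouse".split()))  # ~0.0015
print(sequence_probability("mouse the the cheese ate".split()))  # 0.0
```

Even this toy encodes a little syntax (sentences start with "the") and a little world knowledge (mice eat cheese more often than cheese eats mice).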
May 9, 2026	»	Model Context Protocol (MCP) 12 min; updated May 14, 2026 MCP is written with LLM apps as the clients, not human end-users. It provides a set of conventions on how to agnostically provide context to LLM apps. MCP 101 There are 3 key participants in the MCP architecture: MCP Host: The AI application that manages one or more MCP clients, e.g., Claude Code, Copilot, etc. MCP Client: A component that maintains a connection to an MCP server and obtains context from an MCP server for the MCP host to use. MCP Server: A program that provides context to MCP clients. So typically, the MCP server and the client are implementation details that the end user doesn’t really see. The user’s view is mediated by the MCP host. ...
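A minimal in-process sketch of how the three participants relate. It borrows the JSON-RPC 2.0 message shape and the `tools/list` / `tools/call` method names that MCP uses, but everything else is an assumption: the class names are hypothetical, there is no real transport (stdio or HTTP), no `initialize` handshake, and no tool input schemas. Real servers would be built with an official MCP SDK.

```python
import json
from typing import Any, Callable, Optional


class ToyMCPServer:
    """Hypothetical MCP server: exposes named tools via JSON-RPC 2.0 messages."""

    def __init__(self, tools: dict[str, Callable[..., str]]):
        self.tools = tools

    def handle(self, raw: str) -> str:
        req = json.loads(raw)
        method = req["method"]
        if method == "tools/list":
            result: dict[str, Any] = {"tools": [{"name": name} for name in self.tools]}
        elif method == "tools/call":
            params = req["params"]
            text = self.tools[params["name"]](**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": text}]}
        else:
            error = {"code": -32601, "message": "Method not found"}
            return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


class ToyMCPClient:
    """One connection to one server; the 'transport' here is a direct method call."""

    def __init__(self, server: ToyMCPServer):
        self.server = server
        self.next_id = 0

    def request(self, method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
        self.next_id += 1
        msg: dict[str, Any] = {"jsonrpc": "2.0", "id": self.next_id, "method": method}
        if params is not None:
            msg["params"] = params
        resp = json.loads(self.server.handle(json.dumps(msg)))
        if "error" in resp:
            raise RuntimeError(resp["error"]["message"])
        return resp["result"]


class ToyMCPHost:
    """The AI application: owns several clients and routes tool calls to the right one."""

    def __init__(self, clients: list[ToyMCPClient]):
        self.clients = clients

    def _tool_index(self) -> dict[str, ToyMCPClient]:
        index: dict[str, ToyMCPClient] = {}
        for client in self.clients:
            for tool in client.request("tools/list")["tools"]:
                index[tool["name"]] = client
        return index

    def available_tools(self) -> list[str]:
        return sorted(self._tool_index())

    def call_tool(self, name: str, **arguments: Any) -> str:
        client = self._tool_index()[name]
        result = client.request("tools/call", {"name": name, "arguments": arguments})
        return result["content"][0]["text"]


weather = ToyMCPServer({"get_forecast": lambda city: f"Sunny in {city}"})
notes = ToyMCPServer({"read_note": lambda path: f"(contents of {path})"})
host = ToyMCPHost([ToyMCPClient(weather), ToyMCPClient(notes)])
print(host.available_tools())  # ['get_forecast', 'read_note']
print(host.call_tool("get_forecast", city="Paris"))  # Sunny in Paris
```

The user only ever talks to the host; which client and server actually answer is hidden behind `call_tool`.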
May 5, 2026	»	Societal Effects of LLMs 4 min; updated May 7, 2026 Chatbot Ethics Zuckerberg: But if you think something someone is doing is bad and they think it’s really valuable, most of the time in my experience, they’re right and you’re wrong. You just haven’t come up with the framework yet for understanding why the thing they’re doing is valuable and helpful in their life. A user’s expectations on what is permissible for a bot to do. Grabbing their attention to sell them something, or guiding them out of a slump is fine. Lying, e.g., “come visit me” and using romantic overtones crosses a line. However, Meta has no restrictions against bots telling users they’re real people or proposing real-life social engagements. ...
Mar 5, 2026	»	Toy Pages (2 items) Tons of Buttons; Popping Bubbles Game for Computer-Use Models;
Jun 1, 2025	»	Copilot in VS Code 14 min; updated Jun 8, 2025 My work is primarily in Microsoft’s ecosystem, so learning Copilot usage in VS Code is pretty important. If not for my own productivity gains, then for having knowledgeable conversations with coworkers about using LLMs as a SWE. Copilot-Powered Scenarios AI Code Completions I’ve found code completions more distracting than useful. Probably because I already have an idea of what I want to type, and Copilot’s hallucinations slow me down. I’ve turned this off in my IDE. ...
Apr 6, 2025	»	Building with LLMs 3 min; updated May 8, 2026 Emerging LLM App Stack. Credits: a16z.com Design Pattern: In-Context Learning Betting on the LLM’s context window increasing doesn’t pay off. As the input approaches the limits of the context window, inference time and accuracy degrade. Instead, the typical workflow of in-context learning is: Data Pre-processing/Embedding. Compute and store embeddings of the private data in a vector database. Prompt Construction/Retrieval. On user input, compile a prompt from a hard-coded template with few-shot examples, information retrieved from external APIs, and a set of relevant documents retrieved from the vector database. Prompt Execution/Inference. Submit the compiled prompt to a pre-trained LLM for inference. Logging, caching, and validation can be added at this stage. In-context learning reduces the AI problem to a data engineering problem. ...
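The first two steps of that workflow can be sketched end to end in plain Python. This is a toy under loud assumptions: bag-of-words cosine similarity stands in for a real embedding model, a Python list stands in for the vector database, the template and few-shot example are hypothetical, and the inference call is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database: (embedding, document) pairs."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []

    def add(self, doc: str) -> None:
        # Step 1: embed the private data ahead of time.
        self.rows.append((embed(doc), doc))

    def top_k(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


# Hypothetical hard-coded template and few-shot example.
TEMPLATE = """Answer the question using only the context below.

{examples}

Context:
{context}

Q: {question}
A:"""
FEW_SHOT = "Q: Is the office open on public holidays?\nA: No, it is closed."


def build_prompt(store: ToyVectorStore, question: str, k: int = 2) -> str:
    # Step 2: retrieve relevant documents and fill the template.
    context = "\n".join(f"- {doc}" for doc in store.top_k(question, k))
    return TEMPLATE.format(examples=FEW_SHOT, context=context, question=question)


store = ToyVectorStore()
for doc in ["Employees get 20 vacation days per year.",
            "The office is closed on public holidays.",
            "Expense reports are due monthly."]:
    store.add(doc)

prompt = build_prompt(store, "How many vacation days do employees get?", k=1)
print(prompt)
# Step 3 (not shown): send `prompt` to a pre-trained LLM, adding logging/caching/validation.
```

Notice that nothing here trains a model; the quality of the answer hinges on chunking, embedding, and retrieval, which is the "data engineering problem" the paragraph refers to.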
Apr 6, 2025	»	LLM Evals 4 min; updated Apr 6, 2025 Notable Benchmarks Some notable benchmarks in language modeling: MMLU: 57 tasks spanning elementary math, US history, computer science, law, and more. EleutherAI Eval: Unified framework to test models via zero/few-shot settings on 200 tasks from various evals, including MMLU. HELM: Evaluates LLMs across domains; tasks include Q&A, information retrieval, summarization, text classification, etc. AlpacaEval: Measures how often a strong LLM (e.g., GPT-4) prefers the output of one model over a reference model. ...
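AlpacaEval's headline number is a win rate against a reference model. A simplified computation, assuming the judge's verdicts have already been collected as labels; the label names and the choice to count ties as half a win are assumptions of this sketch, and AlpacaEval's actual scoring may differ.

```python
from collections import Counter


def win_rate(judgments: list[str]) -> float:
    """Percentage of comparisons the judge decided in the candidate model's favor.

    Each judgment is "model" (candidate preferred), "reference", or "tie".
    Counting a tie as half a win is an assumption of this sketch.
    """
    if not judgments:
        raise ValueError("no judgments")
    counts = Counter(judgments)
    unknown = set(counts) - {"model", "reference", "tie"}
    if unknown:
        raise ValueError(f"unexpected judgments: {unknown}")
    wins = counts["model"] + 0.5 * counts["tie"]
    return 100 * wins / len(judgments)


print(win_rate(["model", "reference", "model", "tie"]))  # 62.5
```

Because the metric depends entirely on the judge's preferences, it inherits the judge model's biases.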
Feb 16, 2025	»	UX for LLMs 4 min; updated Feb 16, 2025 `tldraw.com`’s Take on an LLM Canvas is a library for creating infinite canvas experiences in React. In UX history, chat has tended to precede canvas: computers went from being run through the command prompt to using the screen as a canvas via a mouse, and mobile phones went from being keypad-oriented to a canvas where the finger controls the touchscreen. Where are the canvases for LLMs? Multi-modal models, e.g., GPT-4 and Gemini, can take image, video, and text inputs, and produce output. ...
Dec 24, 2024	»	Using LLMs to Enhance My Capabilities 6 min; updated Nov 30, 2025 Sample Use Cases LLMs are increasingly here to stay despite the reservations. How can I use them to enhance my capabilities? Look out for the Gell-Mann amnesia effect. You prompt the LLM on some subject you know well. You read the response and see the LLM has absolutely no understanding of either the facts or the issues. In any case, you read with exasperation or amusement the multiple errors in the response, and then ask it about something else, and read the response as if it’s more accurate than the baloney you just read. ...
Mar 3, 2021	»	LLMs: Stochastic Parrots 🦜 and How (Not) to Use Them 10 min; updated Dec 14, 2023 was written in a period when NLP practitioners were producing bigger (# of parameters; size of training data) language models (LMs), and pushing the top scores on benchmarks. The paper itself was controversial because it led to Gebru being fired from Google, following disagreements with her managers on conditions (withdraw, or remove Google-affiliated authors) for publishing the paper. A lot has changed since mid-2021, when I initially wrote this page. OpenAI’s ChatGPT took the world by storm, reaching 123m MAU less than 3 months after launch and becoming the fastest-growing consumer application in history (TikTok took 9 months to hit 100m MAU). ...