What Is a Large Language Model? A Plain-English Explanation

You have heard the term “large language model” dozens of times. You know it has something to do with ChatGPT, Gemini, and Claude. You might not know what the term actually means, how these models learn, or why they sometimes confidently say wrong things. This explanation covers all of that without requiring a computer science background.

The Basic Idea

A large language model is a type of AI system trained to understand and generate text. “Large” refers to the number of parameters – internal numerical values the model adjusts during training. Modern LLMs have hundreds of billions of parameters. “Language model” means it models language: given some text, it predicts what text should come next.

That prediction task sounds simple. It is not. To predict the next word in “The capital of France is ___”, the model needs to know geography. To predict the next line of a poem, it needs to understand rhythm and rhyme. To continue a legal argument coherently, it needs to understand legal reasoning. Training a model to predict text well across billions of examples forces it to develop broad knowledge and reasoning ability as a side effect.
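To make that concrete, here is a toy sketch of what “predicting the next word” means. The candidate words and their probabilities below are invented for illustration; a real model computes a distribution over an enormous vocabulary, but the picking-from-a-distribution step looks essentially like this.

```python
# Toy illustration of next-word prediction (not a real model).
# The probabilities are made up for the prompt "The capital of France is".
import random

next_word_probs = {
    "Paris": 0.92,
    "Lyon": 0.03,
    "a": 0.02,
    "located": 0.02,
    "unknown": 0.01,
}

words = list(next_word_probs)
weights = list(next_word_probs.values())

# Sample one continuation, weighted by probability
prediction = random.choices(words, weights=weights, k=1)[0]
print("The capital of France is", prediction)
```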

How Training Works

LLMs are trained on enormous amounts of text – books, websites, code repositories, academic papers, forums. The training process works roughly like this: the model reads a stretch of text, predicts each next word before seeing it, compares its predictions to the actual words, and adjusts its parameters to predict better next time. This process repeats billions of times across trillions of words.
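Here is a drastically simplified sketch of that idea. Real pretraining adjusts billions of parameters with gradient descent, which is nothing like simple counting, but the spirit of “read example text, learn to predict what comes next” can be shown in a few lines of Python.

```python
# A toy stand-in for pretraining: learn to predict the next word
# by counting which word follows which in a tiny example corpus.
from collections import Counter, defaultdict

corpus = "the capital of france is paris . the capital of italy is rome ."
words = corpus.split()

# "Training": tally what follows each word
follows = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

# "Prediction": the most common continuation seen during training
def predict_next(word):
    return follows[word].most_common(1)[0][0]

print(predict_next("capital"))  # -> "of"
print(predict_next("of"))       # -> "france" (ties break by what was seen first)
```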

After this initial training (called pretraining), most commercial LLMs go through additional steps. Instruction tuning trains the model to follow instructions helpfully. Reinforcement Learning from Human Feedback (RLHF) adjusts the model based on human ratings of its outputs – teaching it to give answers that people find helpful, accurate, and safe. This is why ChatGPT and Claude feel different from a raw language model: they have been shaped to be assistants, not just text predictors.

What an LLM Actually Does When You Ask It Something

When you type a message to ChatGPT or Claude, the model receives your text, processes it through its layers of parameters, and generates a response one token at a time. A token is roughly a word or part of a word – “transformers” might be a single token, while “unbelievable” might be split into two, “un” and “believable”. The model samples each next token based on probability, influenced by everything in the conversation so far.
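The sketch below shows that sampling step in miniature. The candidate tokens and their scores are made up, and a real model scores a vocabulary of many tens of thousands of tokens at every step, but the turn-scores-into-probabilities-then-sample mechanics are the same in spirit.

```python
# Sketch of producing one token: convert raw scores ("logits") into
# probabilities with a softmax, then sample. All values here are invented.
import math
import random

def sample_next_token(logits):
    """Turn raw scores into probabilities (softmax), then sample one token."""
    tokens = list(logits)
    exps = [math.exp(v) for v in logits.values()]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(tokens, weights=probs, k=1)[0]

# Hypothetical scores for the next token after "The Eiffel Tower is in"
logits = {" Paris": 6.0, " France": 4.5, " Europe": 2.0, " Tokyo": -1.0}
print(sample_next_token(logits))  # usually " Paris", occasionally something else
```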

The model does not “look up” answers in a database. It does not “know” things the way a human knows them. It generates text that is statistically likely to be correct given its training. Most of the time, statistically likely is the same as correct. Sometimes it is not – which is why LLMs confidently generate plausible-sounding false information.

Why LLMs Make Things Up

This phenomenon has a name: hallucination. LLMs hallucinate because their goal is to generate plausible text, not to access verified facts. If asked about a book that does not exist, the model generates a plausible-sounding description of a book rather than saying it does not know – because generating something plausible is what it was trained to do. Modern LLMs are much better at declining to answer uncertain questions than they were two years ago, but the tendency to generate confident-sounding wrong answers has not been fully eliminated.

Context Windows

Every LLM has a context window – the maximum amount of text it can consider at once. This includes both your input and its response. GPT-4o has a 128,000-token context window (roughly 100,000 words). Claude 3.7 Sonnet has a 200,000-token window. Within that window, the model can attend to everything that has been said. Anything beyond it is simply not seen: older text gets dropped or summarized, and each new conversation starts fresh unless the application explicitly carries information over.
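Here is a rough sketch of how an application might keep a long conversation inside the window. The four-characters-per-token estimate and the limit used here are assumptions for illustration; real applications count tokens with the model’s actual tokenizer.

```python
# Rough sketch: keep only the most recent messages that fit the window.
CONTEXT_LIMIT = 200_000  # e.g. a 200,000-token window

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic: ~4 characters per token

def fit_to_window(messages: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep the most recent messages that fit; older ones are dropped."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > limit:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```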

The Difference Between Models

Different LLMs – GPT-4o, Claude, Gemini, Llama 3 – differ in their training data, parameter count, fine-tuning approach, and architecture details. These differences produce real quality differences on specific tasks. That is why our ChatGPT vs Gemini comparison finds real gaps between them, and why Claude and ChatGPT differ on coding tasks. Under the hood, they are all doing the same thing – predicting tokens – but the training differences accumulate into meaningful capability differences.

What LLMs Cannot Do

LLMs cannot reliably do arithmetic without a calculator tool. They cannot access real-time information unless connected to a search engine. They do not retain memory between conversations unless specifically designed to. They cannot reason about truly novel problems the way a human expert can – they interpolate from training patterns, which fails on genuinely unprecedented problems. These are not fixable by making the model bigger; they are structural properties of how LLMs work.
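The calculator point is worth making concrete. In the sketch below, the application evaluates the arithmetic itself instead of trusting whatever digits the model generates; the specific code is illustrative, but this is the basic idea behind giving a model “tools.”

```python
# Sketch of a "calculator tool": the application computes the answer exactly
# rather than relying on the model's token predictions for the digits.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression like '37 * 481' exactly."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("only plain arithmetic is allowed")
    return walk(ast.parse(expr, mode="eval").body)

print(safe_eval("37 * 481"))  # 17797 -- computed, not predicted
```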

The Bottom Line

A large language model is a statistical text predictor trained on massive amounts of human-written text. It generates useful, often accurate responses because predicting human text requires learning enormous amounts about the world. It generates wrong answers sometimes because it predicts plausible text rather than accessing verified facts. Understanding this distinction – and the genuine capabilities these systems have – is the starting point for using them effectively.
