AI tools like ChatGPT can generate summaries in seconds, but the quality of those summaries depends heavily on how you design the prompt. This is where prompt engineering becomes critical.
At a technical level, prompt engineering is shaping the input space of a probabilistic model to influence token prediction patterns, output structure, and reasoning behavior. That is, your prompt directly affects how the model assigns probabilities to possible next tokens.
Large Language Models (LLMs) do not “understand” text the way humans do. Instead, they perform a series of computational actions:
- Tokenize the input string into words, subwords, or punctuation.
- Re-weight attention across those tokens.
- Predict the next token probabilistically, one token at a time.
This means every word in your prompt triggers a re-weighting of the model's internal attention mechanism. Even small changes in wording can shift the "focus" of the neural network, significantly affecting output quality.
Compare these two prompts:

- “Summarize this text” — leaves the token space wide open, increasing the risk of "hallucinations" or irrelevant fluff.
- “Summarize this text in 3 bullet points focusing only on key technical insights” — constrains the search space, forcing the model to ignore non-technical tokens and follow a rigid structural pattern.

Before an LLM processes text, it converts strings into tokens (words, subwords, or punctuation). This process defines the model's "budget" and "vision."
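The token budget can be sketched with a toy counter. Note the whitespace "tokenizer" below is purely illustrative; real LLMs use subword tokenizers (such as BPE), so actual counts will differ, and the 4096-token context window is an assumed figure.

```python
# Naive whitespace "tokenizer" for illustration only -- real LLMs use
# subword tokenizers (e.g. BPE), so actual counts will differ.
def count_tokens(text: str) -> int:
    return len(text.split())

def fits_budget(prompt: str, expected_output_tokens: int,
                context_window: int = 4096) -> bool:
    # Prompt tokens plus the expected output must fit the context window.
    return count_tokens(prompt) + expected_output_tokens <= context_window

prompt = ("Summarize this text in 3 bullet points "
          "focusing only on key technical insights")
print(count_tokens(prompt))      # 13
print(fits_budget(prompt, 200))  # True
```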
Setting max_tokens: 200, for example, forces the model to prioritize the most information-dense tokens.

From a modeling perspective, your task is to control the output's entropy (randomness). You achieve this by:
Ambiguity increases output variance. Instead of a vague request, issue a multi-layered instruction.
- Vague: "Write a summary."
- Multi-layered: "Summarize this text in 5 bullet points. Extract only technical decisions and implementation details. Use an objective, professional tone."

LLMs do not inherently know what is “important.” You must manually calibrate the relevance filter.
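Layered instructions are easy to assemble programmatically. The helper below is an illustrative sketch (not part of any SDK) that stacks explicit constraints onto a base task:

```python
# Illustrative helper (not from any SDK) that layers constraints onto a
# base task, turning a vague request into a multi-layered instruction.
def build_prompt(task: str, constraints: list[str]) -> str:
    return "\n".join([task] + [f"- {c}" for c in constraints])

prompt = build_prompt(
    "Summarize this text in 5 bullet points.",
    [
        "Extract only technical decisions and implementation details.",
        "Use an objective, professional tone.",
    ],
)
print(prompt)
```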
“Summarize focusing on product features. Ignore all marketing adjectives and promotional language.”
This instruction suppresses the probability of flowery, "salesy" tokens appearing in the output.

While not always written in the prompt text, adjusting the API parameters is a core part of the engineering process:
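A typical parameter set looks like the sketch below. The field names follow common OpenAI-style chat-completion conventions and the model name is illustrative; it is shown as a plain dict rather than a live API call.

```python
# Decoding parameters shaped like common chat-completion APIs
# (OpenAI-style names); a plain dict -- no network call is made.
request = {
    "model": "gpt-4o-mini",  # illustrative model name
    "messages": [
        {"role": "system", "content": "You are a technical summarizer."},
        {"role": "user", "content": "Summarize the attached document."},
    ],
    "temperature": 0.2,  # lower temperature -> less random token sampling
    "top_p": 0.9,        # nucleus sampling: keep top 90% of probability mass
    "max_tokens": 200,   # hard output cap forces information-dense tokens
}
```

Lowering temperature and capping max_tokens both reduce output entropy, which is exactly the control described above.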
Adding constraints is essentially output space reduction.
Example: Summarize in max 80 words. Do not repeat ideas. Use short sentences.
This forces the model to compress information rather than expand it.
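Such constraints can also be verified mechanically after generation. A minimal sketch, assuming the 80-word cap from the example and a 15-word-per-sentence threshold of our own (not stated in the original prompt):

```python
# Sketch of checking a summary against the constraints above (max 80
# words, short sentences); the 15-word sentence cap is an assumed
# threshold, not taken from the original prompt.
def constraint_violations(summary: str, max_words: int = 80,
                          max_sentence_words: int = 15) -> list[str]:
    problems = []
    if len(summary.split()) > max_words:
        problems.append("too many words")
    for sentence in summary.replace("!", ".").replace("?", ".").split("."):
        if len(sentence.split()) > max_sentence_words:
            problems.append("sentence too long")
    return problems

summary = "The service caches results. Latency dropped by 40 percent."
print(constraint_violations(summary))  # []
```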
Prompt engineering is iterative because LLM output is probabilistic.
Typical production loop:
This is similar to model evaluation in ML pipelines, except the “model” stays fixed and only the input distribution changes.
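The loop can be sketched as follows; `llm()` is a stub standing in for a real model call, and the pass criterion (a word-count check) is a simplified stand-in for real evaluation:

```python
# Sketch of the iterate-and-refine loop; llm() is a stub standing in for
# a real model call, and the pass criterion is a simple word-count check.
def llm(prompt: str) -> str:
    return "Key point one. Key point two. Key point three."  # stub output

def iterate_prompt(base_prompt: str, max_rounds: int = 3) -> tuple[str, str]:
    prompt = base_prompt
    for _ in range(max_rounds):
        output = llm(prompt)
        if len(output.split()) <= 80:        # evaluation step
            return prompt, output            # good enough -> stop
        prompt += "\nBe more concise."       # refine the input, not the model
    return prompt, output

final_prompt, summary = iterate_prompt("Summarize the document.")
```

Only the prompt changes between rounds; the model weights stay fixed, which is the point made above.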
| Technique | Action | Best For... |
|---|---|---|
| Zero-shot | Give a direct command without examples. | Simple, standard tasks. |
| Few-shot | Provide 2-3 examples of "Input -> Ideal Summary" before the actual task. | Maintaining a very specific brand voice or complex format. |
| Chain-of-Thought | Ask the model to "think step-by-step" (e.g., "First, list the 3 main arguments, then synthesize them"). | Highly complex or academic papers. |
| Prompt Chaining | Break the task into steps: 1. Extract facts -> 2. Clean facts -> 3. Format as summary. | Long-document reliability and high-stakes reporting. |
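Prompt chaining from the table can be sketched as three sequential calls, each consuming the previous step's output. `call_model()` is a stub; in production each call would hit a real LLM API:

```python
# Prompt-chaining sketch: each step is a separate model call on the
# previous step's output. call_model() is a stub, not a real API.
def call_model(instruction: str, text: str) -> str:
    return f"[{instruction}] {text}"  # stub: tag the text with its step

def chained_summary(document: str) -> str:
    facts = call_model("Extract facts", document)      # step 1
    cleaned = call_model("Clean facts", facts)         # step 2
    return call_model("Format as summary", cleaned)    # step 3

result = chained_summary("raw document text")
```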
Quantify the success using specific metrics:
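Two illustrative metrics, compression ratio and keyword coverage, can be computed directly; the exact metrics used in practice vary by application:

```python
# Two illustrative summary metrics (the exact set varies by application):
# compression ratio and keyword coverage.
def compression_ratio(source: str, summary: str) -> float:
    # Word-count ratio; lower means more compression.
    return len(summary.split()) / len(source.split())

def keyword_coverage(summary: str, keywords: list[str]) -> float:
    # Fraction of required keywords that survive into the summary.
    hits = sum(1 for kw in keywords if kw.lower() in summary.lower())
    return hits / len(keywords)

src = ("The API uses OAuth2 tokens and exposes REST endpoints "
       "for billing and reporting.")
summ = "OAuth2-secured REST API covering billing and reporting."
print(keyword_coverage(summ, ["OAuth2", "REST", "billing"]))  # 1.0
```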
Layer your instructions into a structured "System Prompt."
- You are a Technical Writer.
- Summarize the provided documentation.
- Max 5 bullets. No marketing jargon.
- Highlight API endpoints and authentication methods.

Prompt engineering is not just about writing better instructions; it is about controlling how a probabilistic language model distributes meaning across tokens.
Understanding LLM behavior, tokenization, and evaluation helps move from “getting outputs” to engineering reliable AI behavior in production systems.
In real-world applications, this is what turns prompts from simple text inputs into a core part of system design.