March 26, 2026 · 11 min read

Prompt Engineering for Developers: How to Actually Get Good Output from LLMs

A practical guide to prompt engineering. Learn system messages, few-shot prompting, chain of thought, structured output, and patterns for code generation.

prompt-engineering ai llm chatgpt developers

Most developers interact with LLMs the same way they'd Google something: type a vague question, get a vague answer, complain that AI is overhyped. The difference between mediocre AI output and genuinely useful output almost always comes down to how you ask.

Prompt engineering isn't a buzzword -- it's the skill of communicating precisely with language models. And for developers who write code with AI tools daily, it's become one of the most practical skills to develop.

This guide covers the techniques that actually matter, with examples you can use immediately.

Why Prompts Matter

An LLM doesn't "think" about your request. It predicts the most likely continuation of the text you provide. This means:

  • Vague input produces vague output (the model fills in the blanks with generic patterns)
  • Specific input produces specific output (the model has clear constraints to work within)
  • Format in the prompt influences format in the output
  • The context you provide determines the quality ceiling
This is not a bug -- it's how the technology works. Once you internalize this, prompt engineering becomes intuitive.

The Anatomy of a Prompt

Most AI APIs accept two types of messages:

System Message

Sets the persona, rules, and context for the entire conversation. The model treats this as its "instructions."

System: You are a senior Python developer who writes clean,
well-documented code. You follow PEP 8 conventions. When asked
to write code, always include type hints and docstrings. If a
request is ambiguous, ask clarifying questions before generating
code.

User Message

Your actual request -- the thing you want done.

User: Write a function that validates email addresses. It should
handle edge cases like missing @ symbol, multiple @ symbols, and
domains without TLDs.

The system message shapes every response. Without it, you get generic output. With a well-crafted one, you get output tuned to your exact needs.
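
In code, the two message types are just entries in a list. A minimal sketch of the wiring, assuming an OpenAI-compatible chat API (the `{"role": ..., "content": ...}` shape is the common format; `build_messages` is an illustrative helper, not a library function):

```python
# Pairing a fixed system message with per-request user messages.
# The dict shape below is the one used by OpenAI-compatible chat APIs.
SYSTEM_PROMPT = (
    "You are a senior Python developer who writes clean, well-documented "
    "code. You follow PEP 8 conventions. Always include type hints and "
    "docstrings."
)

def build_messages(user_request: str) -> list[dict]:
    """Pair the reusable system message with a one-off user request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]
```

The system message is written once and reused across every call, which is exactly why it pays to invest in it.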

Practical System Messages for Developers

# For code review
System: You are a code reviewer focused on correctness, security,
and maintainability. Point out bugs, security vulnerabilities,
and suggest specific improvements. Be direct and skip pleasantries.

# For documentation
System: You are a technical writer who creates clear, concise
documentation. Use short sentences. Include code examples for
every concept. Target audience: mid-level developers who know
the language but not this specific library.

# For debugging
System: You are a debugging assistant. When given code that isn't
working, first identify the exact issue, then explain why it
happens, then provide the fix. Show the corrected code with
comments marking what changed.

Technique 1: Few-Shot Prompting

Instead of describing what you want, show examples. This is remarkably effective because the model pattern-matches from your examples.

Zero-Shot (no examples)

Convert these function names to descriptive docstrings:

getUserById

Output might be generic and inconsistent.

Few-Shot (with examples)

Convert function names to descriptive docstrings.

Function: getUserById
Docstring: """Retrieve a user record from the database by their unique identifier."""

Function: validateEmail
Docstring: """Check if an email address is valid according to RFC 5322 format rules."""

Function: calculateShippingCost
Docstring:

Now the model understands your exact format, style, and level of detail. The output will match your examples closely.

When to Use Few-Shot

  • Code formatting or transformation tasks
  • Generating structured data (JSON, YAML, SQL)
  • When you want a specific style the model wouldn't default to
  • Converting between formats (API → types, SQL → ORM, etc.)
The sweet spot is 2-4 examples. More than that and you're wasting tokens without much improvement.
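
Few-shot prompts are easy to assemble programmatically, which keeps the examples in one place. A sketch reusing the docstring examples above (`few_shot_prompt` and the exact separator format are illustrative choices, not a fixed API):

```python
# Building a few-shot prompt: instruction, worked examples, then the new case.
# The examples are the two from the text above.
EXAMPLES = [
    ("getUserById",
     '"""Retrieve a user record from the database by their unique identifier."""'),
    ("validateEmail",
     '"""Check if an email address is valid according to RFC 5322 format rules."""'),
]

def few_shot_prompt(target: str) -> str:
    """Assemble the instruction, example pairs, and the unanswered case."""
    parts = ["Convert function names to descriptive docstrings.", ""]
    for name, docstring in EXAMPLES:
        parts.append(f"Function: {name}")
        parts.append(f"Docstring: {docstring}")
        parts.append("")
    parts.append(f"Function: {target}")
    parts.append("Docstring:")
    return "\n".join(parts)
```

Ending the prompt at `Docstring:` leaves the model exactly one thing to complete, which is the point of the pattern.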

Technique 2: Chain of Thought

Chain of Thought (CoT) prompting asks the model to reason step-by-step before giving a final answer. This dramatically improves accuracy on complex problems.

Without CoT

What's the time complexity of this function?

def mystery(arr):
    result = []
    for i in range(len(arr)):
        for j in range(i, len(arr)):
            if arr[j] not in result:
                result.append(arr[j])
    return result

The model might give a quick (possibly wrong) answer.

With CoT

What's the time complexity of this function? Think through it
step by step, analyzing each operation.

def mystery(arr):
    result = []
    for i in range(len(arr)):
        for j in range(i, len(arr)):
            if arr[j] not in result:
                result.append(arr[j])
    return result

Now the model will:
  1. Identify the outer loop: O(n)
  2. Identify the inner loop: O(n) in the worst case
  3. Identify the "not in result" membership check: O(n) for a list
  4. Conclude: O(n^3) worst case

The key phrase is "think step by step" (or "reason through this" or "walk me through your analysis"). It triggers more careful reasoning.

When to Use CoT

  • Algorithm complexity analysis
  • Debugging complex issues
  • Architectural decisions with tradeoffs
  • Any problem where you'd want to see the reasoning, not just the answer
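
If you call models through an API, the trigger phrase can be appended automatically to analytical questions. A tiny sketch (`with_cot` is an illustrative helper; the exact trigger wording is one of several phrasings that work):

```python
# Appending a chain-of-thought trigger to an analytical prompt.
COT_TRIGGER = "Think through this step by step before giving your final answer."

def with_cot(question: str) -> str:
    """Append the step-by-step trigger to a question."""
    return f"{question}\n\n{COT_TRIGGER}"
```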

Technique 3: Structured Output

If you need the output in a specific format, specify the format explicitly. LLMs are excellent at following structural constraints when you tell them what you need.

JSON Output

Analyze this Python function and return your analysis as JSON
with this exact structure:

{
  "function_name": "string",
  "parameters": ["list of param names"],
  "return_type": "string",
  "time_complexity": "string",
  "space_complexity": "string",
  "potential_bugs": ["list of issues"],
  "suggestions": ["list of improvements"]
}

Function:
def find_duplicates(lst):
    seen = {}
    dupes = []
    for item in lst:
        if item in seen:
            dupes.append(item)
        seen[item] = True
    return dupes
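
On the consuming side, validate that the reply actually matches the structure you asked for; models occasionally drop keys. A sketch of that check (`parse_analysis` is an illustrative helper, and `reply` stands in for the raw completion text):

```python
import json

# Validating a model's JSON reply against the structure the prompt demanded.
EXPECTED_KEYS = {
    "function_name", "parameters", "return_type", "time_complexity",
    "space_complexity", "potential_bugs", "suggestions",
}

def parse_analysis(reply: str) -> dict:
    """Parse the reply and fail loudly if required keys are missing."""
    data = json.loads(reply)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {sorted(missing)}")
    return data
```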

Markdown Tables

Compare these three approaches in a markdown table with columns:
Approach, Time Complexity, Space Complexity, Best For, Drawbacks

  1. Hash map frequency count
  2. Sorting then scanning
  3. Bit manipulation

Code with Specific Structure

Write a FastAPI endpoint with this structure:
  • Route decorator with proper HTTP method
  • Pydantic model for request body
  • Pydantic model for response
  • Input validation
  • Error handling with HTTPException
  • Docstring with example request/response
Endpoint: POST /api/users - creates a new user with name, email, and optional phone number. Returns the created user with an ID.

Technique 4: Temperature and Model Parameters

Temperature controls randomness in the output:
  • Low temperature (0.0-0.3): deterministic, focused, consistent. Use for code generation, data extraction, factual questions.
  • Medium temperature (0.4-0.7): balanced. Good default for most tasks.
  • High temperature (0.8-1.0+): creative, varied, sometimes surprising. Use for brainstorming, creative writing, exploring alternatives.
# Using the OpenAI API as an example
response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.2,  # Low temp for deterministic code output
    messages=[
        {"role": "system", "content": "Write clean Python code."},
        {"role": "user", "content": "Implement a binary search function."}
    ]
)

For code generation, always use low temperature. You want consistency, not creativity, in your function implementations.

Patterns for Code Generation

Pattern 1: Specification First

Tell the model what the code should do before asking it to write:

Requirements:
  • Function: parse_log_file(filepath: str) -> list[LogEntry]
  • LogEntry has: timestamp (datetime), level (str), message (str)
  • Log format: "2026-03-26 14:30:00 [ERROR] Connection timeout"
  • Skip blank lines and malformed entries
  • Handle files up to 1GB efficiently (don't load entire file into memory)
  • Raise FileNotFoundError if file doesn't exist
Write this function with full type hints, error handling, and a docstring including an example.

This is dramatically better than "write a log parser."

Pattern 2: Incremental Building

For complex systems, build up piece by piece instead of asking for everything at once:

Step 1: "Design the data models for a task management system.
         Users, Tasks, Projects, with relationships."

Step 2: "Now write the database layer using SQLAlchemy with
         these models. Include CRUD operations."

Step 3: "Now write FastAPI endpoints that use this database layer.
         Include authentication middleware."

Each step builds on the previous context. The model maintains coherence across the conversation.

Pattern 3: Test-Driven Prompting

Give the model the tests first, then ask for the implementation:

Here are the tests that must pass:

def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_special_chars():
    assert slugify("What's Up?") == "whats-up"

def test_slugify_multiple_spaces():
    assert slugify("too  many  spaces") == "too-many-spaces"

def test_slugify_unicode():
    assert slugify("cafe\u0301") == "cafe"

def test_slugify_empty():
    assert slugify("") == ""

Write the slugify function that passes all these tests.

This constrains the output to match exact behavior. No ambiguity.
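
For reference, one implementation that satisfies tests like those above (my sketch; a model might reasonably produce something different that also passes):

```python
import re
import unicodedata

def slugify(text: str) -> str:
    """Convert text to a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters, then drop the combining marks
    # (so "cafe\u0301" becomes "cafe").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and remove everything except letters, digits, spaces, hyphens.
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    # Collapse whitespace runs into single hyphens.
    return re.sub(r"\s+", "-", text.strip())
```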

Pattern 4: Diff Mode

When modifying existing code, ask for changes as diffs rather than full rewrites:

Here's my current function. Add connection pooling and retry
logic. Show me only the lines that change, in diff format.

[paste existing code]

This is faster to review and less likely to introduce unrelated changes.

Debugging Prompts

When an AI-generated solution doesn't work, these meta-prompts help:

"What assumptions did you make?"

This code throws a KeyError on line 23. Before suggesting a fix,
list every assumption your original implementation made about the
input data.

Forces the model to surface hidden assumptions that might be wrong.

"Find three bugs in this code"

Assume this code has bugs. Find at least three potential issues,
ordered by severity. For each bug, explain: what triggers it,
what happens, and how to fix it.

[paste code]

Asking "find bugs" is more effective than asking "is this correct?" because the model is primed to look for problems instead of confirming correctness.

"Explain this like I'll maintain it at 3 AM"

Explain what this code does in plain English. Assume I'll be
debugging this at 3 AM during an outage. What are the non-obvious
behaviors? What state could cause unexpected results?

[paste complex code]

This reframing produces more practical, operationally-relevant explanations.

Common Prompt Mistakes

1. Being Too Vague

Bad: "Write a web scraper"

Good: "Write a Python web scraper using requests and BeautifulSoup that extracts product names and prices from pagination pages at example.com/products?page=N. Handle rate limiting (1 request per second), failed requests (3 retries with exponential backoff), and output results as CSV."

2. Not Specifying Constraints

Bad: "Write a sorting algorithm"

Good: "Write an in-place sorting algorithm for a list of integers. Must be O(n log n) average case. Cannot use Python's built-in sort. Include type hints."

3. Asking for Too Much at Once

Bad: "Build me a complete e-commerce backend"

Good: "Let's build an e-commerce backend. Start with the product catalog: data models for products (name, price, description, category, stock_count), SQLAlchemy models, and CRUD endpoints with FastAPI."

4. Not Providing Context

Bad: "Why doesn't this work?" (pastes 3 lines from the middle of a function)

Good: "This FastAPI endpoint returns 422 Unprocessable Entity. Here's the full endpoint, the Pydantic model, and the request I'm sending via curl." (pastes all relevant code)

5. Ignoring the System Message

If you use the API, an empty or generic system message wastes your most powerful lever. Invest time in crafting system messages that encode your preferences, conventions, and quality bar.

Advanced: Prompt Templates for Teams

If your team uses AI tools regularly, standardize your prompts:

# prompt_templates.py

CODE_REVIEW_PROMPT = """
Review this code for:


  1. Correctness (bugs, logic errors, edge cases)

  2. Security (injection, auth, data exposure)

  3. Performance (complexity, unnecessary allocations)

  4. Maintainability (naming, structure, documentation)


For each issue found:

  • Severity: Critical / Warning / Suggestion

  • Location: file and line reference

  • Issue: what's wrong

  • Fix: how to fix it


Code to review:
{code}
"""

MIGRATION_PROMPT = """
Convert this {source_framework} code to {target_framework}.
Preserve all behavior including error handling and edge cases.
Flag any features that don't have a direct equivalent in the
target framework.

Source code:
{code}
"""

This ensures consistent output quality regardless of who on the team is writing the prompt.
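
Rendering a template is then a one-liner with str.format. A sketch (the template here is abbreviated for the example; in practice you'd import the full one from prompt_templates.py):

```python
# Rendering a shared prompt template before sending it to the model.
# CODE_REVIEW_PROMPT is abbreviated here; the full version lives in
# prompt_templates.py.
CODE_REVIEW_PROMPT = """\
Review this code for correctness, security, performance, and
maintainability. For each issue, give severity, location, and a fix.

Code to review:
{code}
"""

def render_review_prompt(code: str) -> str:
    """Fill the shared template so everyone on the team sends the same prompt."""
    return CODE_REVIEW_PROMPT.format(code=code)
```

One caveat: str.format raises if the interpolated code itself contains literal braces (dicts, f-strings), so for arbitrary code a plain `template.replace("{code}", code)` is the safer substitution.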

The Fundamental Principle

Every prompting technique boils down to one idea: reduce ambiguity.

The more precisely you specify what you want, what constraints exist, what format to use, and what quality bar to hit, the better the output. LLMs aren't mind readers -- they're completion engines. The better the input, the better the completion.

Start with clear specs. Show examples when the format matters. Ask for reasoning on complex problems. Specify the output structure. Review everything.

That's prompt engineering for developers. No magic, just clarity.

For more AI development guides, programming tutorials, and engineering best practices, check out CodeUp.
