Prompting

GPT Prompt Engineering

Two drop-in Claude skills that make Claude write GPT prompts the way OpenAI actually recommends — one for GPT-4.1, one for GPT-5.

Grab the skills

Copy or download a skill, drop it into your project's .claude/skills/ folder (or ~/.claude/skills/ for every project), and Claude will reach for it whenever it writes a prompt for that model.

GPT-4.1write-prompt-gpt-4-1.md

Write Prompt — GPT-4.1

For 4.1, 4.1 mini and nano. The literal, fast tool-callers and classifiers. Covers explicit tool-calling, the ask-don't-act escape valve, end-of-prompt weighting and when examples earn their place.

Download
GPT-5write-prompt-gpt-5.md

Write Prompt — GPT-5

For 5, 5 mini and nano. The reasoning-heavy surfaces. Covers resolving contradictions, exploration criteria, XML specs, no few-shot examples and the API-level effort controls.

Download

Why this matters

Ask Claude to write a prompt for ChatGPT and it writes in its own house style: stacked CRITICAL / MUST / NEVER markers, long negative rules, hand-held numbered sequences, and a few worked examples for good measure. That style works for Claude. It actively hurts GPT models.

OpenAI's own guidance points the other way. GPT-4.1 follows instructions literally — it needs explicit tool-calling rules and benefits from examples. GPT-5 reasons — and gets damaged by contradictions, emphasis spam, and few-shot examples that the literal model would have loved. The right move for one is the wrong move for the other.

These two skills encode that split, straight from OpenAI's GPT-4.1 and GPT-5 prompting guides. Hand the right one to Claude and it stops writing Claude-flavoured prompts and starts writing GPT-native ones.

What Claude gets wrong (and the fix)

It piles on emphasis

Claude's default

"CRITICAL: You MUST ALWAYS call the tool. NEVER answer from memory."

GPT-native

"Call the tool to gather data before answering."

It writes in the negative

Claude's default

"Don't fabricate. Don't guess. Never invent data."

GPT-native

"Use only data from tool results. If a tool fails, say so."

It hard-codes step-by-step sequences (bad for GPT-5)

Claude's default

"1. First call get_weather. 2. Then call list_events. 3. Then reply."

GPT-native

"Gather: weather, today's events, today's tasks."

It drops in few-shot examples (bad for GPT-5)

Claude's default

"Example — user: 'hi' → assistant: 'Hello! How can I help today?'"

GPT-native

"Default to a direct, friendly tone. Keep replies tight."

It leaves contradictions unresolved

Claude's default

"Always check both sources." … later … "Use only the one the user names."

GPT-native

"Check both sources. Exception: if the user names one, use only that."

4.1 vs 5 — the key splits

GPT-4.1 — the literal model

  • -Follows instructions literally — state every behaviour you want
  • -Needs explicit "use your tools" encouragement or it answers from context
  • -Benefits from examples for tricky tool selection
  • -Weights the end of the prompt — repeat key rules there
  • -Under tool-forcing, add an ask-don't-act escape valve

GPT-5 — the reasoning model

  • -Reasons across tool calls — give it goals, not numbered steps
  • -Damaged by contradictions more than any other model — resolve them
  • -No few-shot examples — they reduce reasoning quality
  • -Weights the start of the prompt — core rules first
  • -Tune reasoning_effort and verbosity at the API, not in the prompt

Both skills are built on OpenAI's published prompting guides for each model, with the source quotes kept inline so you can check the reasoning.