Iterate & Deploy
Refine through testing, then package for real users
Phase 4: Iterate and Refine
No prompt is right on the first try. Plan for at least 2-3 major iterations. Each iteration should address a specific category of improvement.
The Iteration Cycle
What to Focus on at Each Iteration
| Iteration | Focus Area | Key Questions |
|---|---|---|
| v1 → v2 | Structure & length | Is the prompt too long? Are sections redundant? Can examples be extracted? |
| v2 → v3 | Actionability | Does the output help users do something? Are recommendations specific enough? |
| v3 → v4 | Edge cases | Does it handle ambiguous inputs? Multi-role jobs? Non-standard industries? |
Refactoring Principles
Extract, don't delete
Move worked examples to separate files instead of removing them
Compress, don't lose
Condense verbose rubrics into inline scales (e.g., 1=Routine | 5=Creative)
Add actionability
If users read the output and ask "now what?", strengthen recommendations
Test the edges
Feed unusual inputs: vague titles, multi-SOC roles, niche industries
Phase 5: Package and Deploy
A GPT isn't just a prompt — it's a complete package. Here's everything you need:
| Component | Purpose |
|---|---|
| Name | Clear, searchable, describes the function |
| Description | One sentence explaining what it does |
| System Prompt | Your refined instructions (the final version) |
| Knowledge Files | Curated data files uploaded to the GPT |
| Conversation Starters | Pre-written prompts showing users how to begin |
| Capabilities | Platform features to enable (web search, code interpreter, etc.) |
| Supporting Materials | Templates, cheat sheets, setup guides for end users |
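The components in the table above can double as a pre-launch checklist. A minimal sketch, assuming illustrative component names (this is not a platform API):

```python
# Sketch of a completeness check for the GPT package described above.
# Component names mirror the table; they are illustrative, not an API.
REQUIRED_COMPONENTS = [
    "name",                   # clear, searchable
    "description",            # one-sentence summary
    "system_prompt",          # the refined instructions
    "knowledge_files",        # curated data files
    "conversation_starters",  # pre-written example prompts
    "capabilities",           # platform features to enable
    "supporting_materials",   # templates, cheat sheets, guides
]

def missing_components(package: dict) -> list:
    """Return any required components that are absent or empty."""
    return [c for c in REQUIRED_COMPONENTS if not package.get(c)]
```

Running the check before deploying catches the components that are easy to forget, like conversation starters and supporting materials.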
Conversation Starters That Work
Write 3-4 starters that demonstrate the tool's range. Each should show a different use case and prime the user to provide the right input.
Phase 6: Test with Real Users
| Test Type | What to Try | Watch For |
|---|---|---|
| Happy path | Standard job description with clear tasks | Output matches expected format? |
| Ambiguous input | Vague title like “Analyst” with no context | Does the GPT ask for clarification? |
| Edge case | Highly physical role (e.g., construction) | Correctly flags tasks as Human-led? |
| Multi-role | JD spanning 2-3 SOC codes | Acknowledges ambiguity? |
| Stress test | Very long or very short job descriptions | Output quality holds? |
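The test matrix above is easy to keep as a data-driven checklist so you can track how much of it you have actually exercised. A small sketch, with the inputs and expectations copied from the table:

```python
# Test plan mirroring the Phase 6 matrix: (type, input to try, what to watch for).
TEST_PLAN = [
    ("happy path",      "Standard job description with clear tasks",
     "output matches expected format"),
    ("ambiguous input", "Vague title like 'Analyst' with no context",
     "asks for clarification"),
    ("edge case",       "Highly physical role (e.g., construction)",
     "flags tasks as Human-led"),
    ("multi-role",      "JD spanning 2-3 SOC codes",
     "acknowledges ambiguity"),
    ("stress test",     "Very long or very short job description",
     "output quality holds"),
]

def coverage(completed: set) -> float:
    """Fraction of planned test types that have been run at least once."""
    planned = {name for name, _, _ in TEST_PLAN}
    return len(planned & completed) / len(planned)
```

Tracking coverage this way makes it obvious when a testing session has only hit the happy path.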
Feedback Questions to Ask Testers
- Was the output useful for making a decision?
- Was anything confusing or unexpected?
- What would you do differently with this information?
- What's missing?
Case Study: The Prompt Iteration Journey (v1 → v3)
This is the real evolution of the AI Task Augmentation Analyst prompt, showing what changed at each version and why.
v1: The Kitchen Sink
Characteristics:
- ~228 lines — comprehensive but long
- Full Identity & Purpose section
- Complete scoring rubrics with detailed tables
- Worked example embedded directly in the prompt
- Risk × Moat quadrant diagram included
What worked:
- Thorough — covered every edge case
- Self-contained — everything in one file
- The worked example produced consistent outputs
What didn't:
- Token-heavy — example consumed context window
- Redundant sections overlapped
- Tried to be both a reference manual and a set of instructions
Lesson: Your first version should be exhaustive. Get everything out of your head and onto the page. You'll refine later.
v2: Compress and Extract
What changed from v1:
| Change | Why |
|---|---|
| Reduced ~228 lines to ~97 lines | Less token usage, faster processing |
| Extracted worked example to separate file | Keeps prompt focused on instructions |
| Compressed rubrics to inline format | Same information, fewer tokens |
| Condensed Identity to one paragraph | Removed verbose opening |
Lesson: Separate "instructions" from "reference material." The prompt tells the AI how to think. Knowledge files provide the data and examples it references.
v3: Make It Actionable
What changed from v2:
| Change | Why |
|---|---|
| Expanded Step 8 to full Pilot Launch Plan | Users said "great analysis, but now what?" |
| Added pilot scoring criteria table | Makes pilot selection systematic, not gut-feel |
| Added structured recommendation format (6 parts) | Task, rationale, AI approach, metrics, scope, rollback |
| Added Pilot Kickoff Questions | Helps users take immediate action |
Lesson: The biggest improvement often isn't in the analysis — it's in the "so what?" Your GPT should bridge the gap between insight and action.
The Evolution at a Glance
- v1, Exhaustive: Get everything down. Be thorough. Don't self-edit yet.
- v2, Compressed: Separate instructions from data. Cut redundancy. Extract examples.
- v3, Actionable: Close the "now what?" gap. Add recommendations, next steps, follow-ups.
Iteration principle: Each version should make ONE major improvement, not rewrite everything. Preserve what works, fix what doesn't.
Common Pitfalls
Prompt too long
AI ignores later instructions. Fix: extract examples and reference data to knowledge files.
No scoring rubric
Same task gets different scores each run. Fix: define explicit 1-5 scales with criteria per level.
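An explicit scale like the inline `1=Routine | 5=Creative` example earlier can be pinned down as data, so every run scores against the same criteria. A sketch with illustrative level descriptions (yours should come from your own rubric):

```python
# Illustrative 1-5 scale with explicit criteria per level.
CREATIVITY_SCALE = {
    1: "Routine: repeatable steps, no judgment needed",
    2: "Mostly routine: minor case-by-case variation",
    3: "Mixed: follows a pattern but needs interpretation",
    4: "Mostly creative: open-ended within constraints",
    5: "Creative: novel output, human judgment central",
}

def describe(score: int) -> str:
    """Look up a score's criteria; reject out-of-range values
    instead of letting undefined scores drift into the analysis."""
    if score not in CREATIVITY_SCALE:
        raise ValueError(f"score must be 1-5, got {score}")
    return CREATIVITY_SCALE[score]
```

Embedding the same table verbatim in the prompt gives the model one fixed anchor per level, which is what keeps scores stable across runs.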
Output without action
Users say “interesting, but now what?” Fix: add a recommendation section with next steps.
No fallback chain
AI hallucinates data when files aren't loaded. Fix: build a priority chain with confidence flags.
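The shape of such a priority chain can be sketched in a few lines. The source names below are illustrative; the point is that every answer carries a confidence flag and "unknown" is an allowed outcome:

```python
# Priority order: curated knowledge files first, then whatever the
# user supplied, then the model's general knowledge (lowest trust).
PRIORITY = [
    ("knowledge_file", "high"),
    ("user_input", "medium"),
    ("general_knowledge", "low"),
]

def resolve(available: dict):
    """Return (value, confidence) from the first source that has data.

    Falls through to (None, "unknown") so downstream steps can report
    "no data" instead of inventing a value.
    """
    for source, confidence in PRIORITY:
        value = available.get(source)
        if value is not None:
            return value, confidence
    return None, "unknown"
```

In the prompt itself, the same chain is expressed as instructions ("check the uploaded files first; if absent, use the user's input and flag it; never fill gaps from memory without a low-confidence flag").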
Platform lock-in
Can't move to another tool. Fix: keep prompt and data platform-agnostic, package separately.
Source Files
These are the actual files used in the AI Task Augmentation Analyst GPT. View, copy, or download them to use as a starting point for your own custom GPT.
System Prompt (v3 — Latest)
The final prompt used in the GPT instructions field
Worked Example Analysis
Marketing Coordinator analysis — uploaded as a knowledge file
O*NET Data Priority Guide
Download these files from O*NET and upload them as knowledge files in the order below.
Tier 1 — Start here
- Occupation Data
- Task Statements
- Task Ratings
Tier 2 — Add next
- Work Activities
- Skills
- Technology Skills
Tier 3 — If space allows
- Abilities
- Knowledge
- Work Context
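The tiered upload order can be encoded directly. The names below are the O*NET dataset names from the list above; where you cut off depends on how much knowledge-file space your platform allows:

```python
# O*NET files grouped by upload tier, per the priority guide above.
ONET_TIERS = {
    1: ["Occupation Data", "Task Statements", "Task Ratings"],
    2: ["Work Activities", "Skills", "Technology Skills"],
    3: ["Abilities", "Knowledge", "Work Context"],
}

def upload_order(max_tier: int = 3) -> list:
    """Flatten tiers 1..max_tier into the recommended upload order."""
    return [f for tier in sorted(ONET_TIERS) if tier <= max_tier
            for f in ONET_TIERS[tier]]
```

For example, `upload_order(1)` yields just the three Tier 1 files, which is the minimum viable dataset for the GPT.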
Key Takeaways
- Plan for 2-3 iterations minimum — no prompt is right on the first try
- v1 = exhaustive, v2 = compressed, v3 = actionable — this is the natural evolution
- Package the full experience — name, description, starters, knowledge files, and supporting materials
- Test with real users and diverse inputs — happy path, ambiguous, edge cases, and stress tests
- Close the "now what?" gap — the biggest improvement is often in recommendations, not analysis