Prompt Engineering: Art and Science
As artificial intelligence becomes more deeply integrated into business operations, the art and science of prompt engineering is emerging as an essential skill for knowledge workers. Knowing how to get the most out of large language models is quickly becoming a competitive differentiator that gives those who master it a significant edge in the workplace, and as AI adoption accelerates, businesses will increasingly invest in and value prompt engineering expertise.
Large language models are highly probabilistic. Given the same prompt, the model might not always produce the same response, especially when randomness is introduced through parameters like temperature and top-k sampling. While this probabilistic nature helps generate diverse and creative outputs, many business use cases require consistency, reliability, and precision. Sound prompt engineering does not eliminate AI’s probabilistic nature but strategically narrows the range of outputs, making responses more predictable and valuable for structured applications.
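To make that concrete, here is a minimal sketch of how a sampling parameter narrows or widens the output distribution. It assumes the OpenAI Python SDK, an API key in the environment, and an illustrative model name; repeated calls at a low temperature tend to produce far more consistent responses than calls at a high temperature.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = "Suggest a name for a new project-management tool."

# Higher temperature: more randomness, so repeated calls vary noticeably.
creative = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,
)

# Lower temperature: the output distribution is narrowed, so repeated calls
# are much more consistent -- useful for structured business applications.
consistent = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.1,
)

print(creative.choices[0].message.content)
print(consistent.choices[0].message.content)
```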
COSTAR
The COSTAR prompt framework provides a structured approach to prompting that ensures the key information influencing an LLM’s response is supplied to the model:
- Context: Provide background information that helps the LLM understand the specific scenario.
- Objective: Clearly define the task so the LLM’s output stays focused.
- Style: Specify the writing style the response should follow.
- Tone: Specify the tone the response should take (motivational, friendly, etc.).
- Audience: Identify who will be reading or using the LLM’s output.
- Response: Specify the response format (text, JSON, etc.).
Below is an example of a system prompt for a summarization analysis assistant:
# CONTEXT
You are a precision-focused text analysis system designed to evaluate summary accuracy. You analyze both the original text and its summary to determine how well the summary captures the essential information and meaning of the source material.
# OBJECTIVE
Compare an original text with its summary to:
1. Calculate a similarity score between 0.00 and 1.00 (where 1.00 represents perfect accuracy)
2. Provide clear reasoning for the score
3. Identify specific elements that influenced the scoring
# STYLE
Clear, precise, and analytical, focusing on concrete examples from both texts to support the evaluation.
# TONE
Objective and factual, like a scientific measurement tool.
# AUDIENCE
Users who need quantitative and qualitative assessment of summary accuracy, requiring specific numerical feedback.
# RESPONSE FORMAT
Output should be structured as follows:
1. Accuracy Score: [0.00-1.00]
2. Score Explanation:
- Key factors that raised the score
- Key factors that lowered the score
- Specific examples from both texts to support the assessment
3. Brief conclusion summarizing the main reasons for the final score
**NOTE:** Always maintain score precision to two decimal places (e.g., 0.87, 0.45, 0.92)
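Here is a minimal sketch of how this COSTAR system prompt might be wired into a request, again assuming the OpenAI Python SDK and an illustrative model name: the COSTAR text goes in the system message, and the texts being compared go in the user message.

```python
from openai import OpenAI

client = OpenAI()

# The full COSTAR system prompt shown above (abbreviated here).
system_prompt = """# CONTEXT
You are a precision-focused text analysis system designed to evaluate summary accuracy.
...
# RESPONSE FORMAT
..."""

original_text = "..."  # the source document
summary_text = "..."   # the summary being evaluated

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": f"ORIGINAL TEXT:\n{original_text}\n\nSUMMARY:\n{summary_text}",
        },
    ],
)

print(response.choices[0].message.content)
```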
Structured Outputs
Our example above leaves the exact response format up to the model. This strategy works well for a text-based chatbot, but what if we want to use the API to retrieve data that our application will consume? Any break in the expected format will result in a parsing error and cause our program to throw an exception. Defining an output structure for the model provides two main advantages:
- Type safety: Responses are guaranteed to conform to the schema, so validation of response format and data types is not required.
- Simplified prompting: There is no need to spell out data formats or provide formatting examples in the prompt to ensure a properly structured response.
I created an object named `accuracy_score` with three properties, each representing one of our requested outputs:
{
  "name": "accuracy_score",
  "schema": {
    "type": "object",
    "properties": {
      "score": {
        "type": "number",
        "description": "The accuracy score as a float ranging from 0.00 to 1.00."
      },
      "score_explanation": {
        "type": "string",
        "description": "A description or explanation of the accuracy score."
      },
      "conclusion": {
        "type": "string",
        "description": "A concluding statement based on the accuracy score."
      }
    },
    "required": [
      "score",
      "score_explanation",
      "conclusion"
    ],
    "additionalProperties": false
  },
  "strict": true
}
I can easily reference my schema within my application by defining a response format that is sent with each request. Any response generated under this format is guaranteed to be correct in type and structure, so my app can always rely on accurate data when retrieving the values of the `score`, `score_explanation`, and `conclusion` properties.
response_format: { "type": "json_schema", "json_schema": … }
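Putting it all together, here is a minimal sketch of a request that attaches the schema and parses the structured reply. It again assumes the OpenAI Python SDK; the model name and the placeholder inputs are illustrative.

```python
import json
from openai import OpenAI

client = OpenAI()

# The accuracy_score schema defined above.
accuracy_score_schema = {
    "name": "accuracy_score",
    "schema": {
        "type": "object",
        "properties": {
            "score": {"type": "number"},
            "score_explanation": {"type": "string"},
            "conclusion": {"type": "string"},
        },
        "required": ["score", "score_explanation", "conclusion"],
        "additionalProperties": False,
    },
    "strict": True,
}

system_prompt = "..."     # the COSTAR system prompt shown earlier
comparison_input = "..."  # the original text and summary to evaluate

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": comparison_input},
    ],
    response_format={"type": "json_schema", "json_schema": accuracy_score_schema},
)

# Because the reply is constrained to the schema, it can be parsed directly.
result = json.loads(response.choices[0].message.content)
print(result["score"], result["score_explanation"], result["conclusion"])
```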