Prompt Engineering: Art and Science
Effective prompt engineering is an art as much as it is a science. Programmers can improve the quality and consistency of LLM output in their apps by following established prompting frameworks.
As artificial intelligence becomes more deeply integrated into business operations, the art and science of prompt engineering is emerging as an essential skill among knowledge workers. Understanding how to get the most out of large language models will quickly become a competitive differentiator that gives these employees a significant edge in the workplace. As AI adoption accelerates, businesses will increasingly invest in and value prompt engineering expertise.
Large language models are highly probabilistic. Given the same prompt, the model might not always produce the same response, especially when randomness is introduced through parameters like temperature and top-k sampling. While this probabilistic nature helps generate diverse and creative outputs, many business use cases require consistency, reliability, and precision. Sound prompt engineering does not eliminate AI’s probabilistic nature but strategically narrows the range of outputs, making responses more predictable and valuable for structured applications.
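To make this concrete, here is a toy sketch of how temperature and top-k reshape the distribution a model samples from. The vocabulary and logit values are made up; the point is only that a lower temperature concentrates probability on the highest-scoring token, which is why APIs expose these parameters for more deterministic output.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, top_k=None, rng=None):
    """Sample a token index from raw logits, optionally applying
    top-k filtering and temperature scaling (illustrative only)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    if top_k is not None:
        # Keep only the k highest-scoring tokens; mask out the rest.
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits >= cutoff, logits, -np.inf)

    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = logits / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Toy vocabulary of four "tokens" with fixed, made-up model scores.
logits = [2.0, 1.5, 0.3, -1.0]
for t in (1.0, 0.2):
    draws = [sample_with_temperature(logits, temperature=t) for _ in range(1000)]
    share = 100 * draws.count(0) / len(draws)
    print(f"temperature={t}: top token chosen {share:.0f}% of the time")
```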
COSTAR
The COSTAR prompt framework provides a structured approach to prompting, ensuring the model receives the key pieces of information that influence an LLM's response:
- Context: Provide background information that helps the LLM understand the specific scenario.
- Objective: Clearly define the task to focus the LLM's output.
- Style: Specify the writing style the response should have.
- Tone: Specify the tone the response should take (motivational, friendly, etc.).
- Audience: Identify who will be using the LLM's output.
- Response: Provide a specific response format (text, JSON, etc.).
Below is an example of a system prompt for a summarization analysis assistant:
# CONTEXT
You are a precision-focused text analysis system designed to evaluate summary accuracy. You analyze both the original text and its summary to determine how well the summary captures the essential information and meaning of the source material.
# OBJECTIVE
Compare an original text with its summary to:
1. Calculate a similarity score between 0.00 and 1.00 (where 1.00 represents perfect accuracy)
2. Provide clear reasoning for the score
3. Identify specific elements that influenced the scoring
# STYLE
Clear, precise, and analytical, focusing on concrete examples from both texts to support the evaluation.
# TONE
Objective and factual, like a scientific measurement tool.
# AUDIENCE
Users who need quantitative and qualitative assessment of summary accuracy, requiring specific numerical feedback.
# RESPONSE FORMAT
Output should be structured as follows:
1. Accuracy Score: [0.00-1.00]
2. Score Explanation:
- Key factors that raised the score
- Key factors that lowered the score
- Specific examples from both texts to support the assessment
3. Brief conclusion summarizing the main reasons for the final score
**NOTE:** Always maintain score precision to two decimal places (e.g., 0.87, 0.45, 0.92)
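Here is a minimal sketch of how a system prompt like this might be wired into an application, using the OpenAI Python SDK. The model name, file handling, and sample texts are illustrative assumptions, not part of the framework itself.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The full COSTAR prompt shown above, stored alongside the application code
# (the file name is an assumption for this sketch).
with open("costar_summary_evaluator.md") as f:
    system_prompt = f.read()

original_text = "The quarterly report shows revenue grew 12% while operating costs fell 3%."
summary = "Revenue grew 12% last quarter."

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"ORIGINAL TEXT:\n{original_text}\n\nSUMMARY:\n{summary}"},
    ],
)
print(completion.choices[0].message.content)
```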
Structured Outputs
Our example above describes the desired response format in prose but ultimately leaves compliance up to the model. This strategy works well for a text-based chatbot, but what if we want to use the API to retrieve data that our application will consume? Any break in the expected format will result in a parsing error and cause our program to throw an exception. Defining an output structure for the model provides two main advantages:
- Type safety: Responses are guaranteed to match the declared structure, so the application does not need to validate the format or data types.
- Simplified prompting: There is no need to precisely describe the data format in the prompt or provide examples to coax the model into the proper structure.
I created an object named `accuracy_score` with three properties, each representing one of our requested outputs.
{
"name": "accuracy_score",
"schema": {
"type": "object",
"properties": {
"score": {
"type": "number",
"description": "The accuracy score as a float ranging from 0.00 to 1.00."
},
"score_explanation": {
"type": "string",
"description": "A description or explanation of the accuracy score."
},
"conclusion": {
"type": "string",
"description": "A concluding statement based on the accuracy score."
}
},
"required": [
"score",
"score_explanation",
"conclusion"
],
"additionalProperties": false
},
"strict": true
}
I can easily reference my schema within my application by defining a response format sent with each request. Any request referencing my response format is now guaranteed to be correct in type and format. My app can always rely on accurate data when retrieving the values of the `score`, `score_explanation`, and `conclusion` properties.
response_format: { "type": "json_schema", "json_schema": … }
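Putting it together, a request might look like the following sketch (again using the OpenAI Python SDK; the file name, model choice, and placeholder texts are assumptions). Because the `accuracy_score` object is marked strict, the parsed response is guaranteed to contain all three properties with the declared types.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The accuracy_score object defined above, stored as a local JSON file
# (the file name is an assumption for this sketch).
with open("accuracy_score_schema.json") as f:
    accuracy_score = json.load(f)

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You evaluate how accurately a summary reflects its source text."},
        {"role": "user", "content": "ORIGINAL TEXT:\nRevenue grew 12% while costs fell 3%.\n\nSUMMARY:\nRevenue grew 12%."},
    ],
    response_format={"type": "json_schema", "json_schema": accuracy_score},
)

# Strict structured output: this always parses and always contains
# score, score_explanation, and conclusion.
result = json.loads(completion.choices[0].message.content)
print(result["score"], result["conclusion"])
```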
Apple is Missing the AI Race
Apple is failing to implement artificial intelligence in a way that plays to its greatest strengths.
Apple has two major advantages in the AI race. The first is hardware: the unified memory architecture of its ARM-based SoCs gives the GPU and Neural Engine access to far more RAM than competing chips. That matters for on-device inference because every token in the context adds key/value cache entries, so the memory footprint grows quickly as individual conversations get longer; the extra headroom lets smaller models keep performing well as the context window grows. The second, and more valuable, advantage is the platform: access to all of my personal data. Apple is not taking advantage of it.
Mark Gurman at Bloomberg reports:
“The goal is to ultimately offer a more versatile Siri that can seamlessly tap into customers’ information and communication. For instance, users will be able to ask for a file or song that they discussed with a friend over text. Siri would then automatically retrieve that item. Apple also has demonstrated the ability for Siri to quickly locate someone’s driver’s license number by reviewing their photos.”
This is Apple's competitive differentiator and where Apple should have focused its resources from the start. Why can't I ask questions about my archived email or find correlations between exercise volume and sleep quality within the Health app?
Apple’s real AI advantage isn’t just hardware — it’s the platform. A company that prides itself on tight integration across devices should be leading in AI that understands me. The ability to surface insights from my personal data, securely and privately, is where Apple could create the most compelling user experience.