Prompt Engineering for Data Analysis
Prompt engineering for data analysis focuses on using LLMs to understand datasets, clean data, write queries, generate insights, or produce Python code. Unlike creative or content prompts, data prompts depend heavily on specific instructions, explicit format requirements, and a clear description of the dataset's structure.
This guide provides frameworks and ready-to-use prompts for practical data analysis tasks.
Why Prompt Engineering Matters in Data Analysis
Data analysis involves structured steps:
- Understanding the dataset
- Cleaning and transforming data
- Asking analytical questions
- Writing SQL or Python
- Interpreting outputs
- Communicating insights
Clear prompts help LLMs follow these steps and reduce the risk of hallucination.
Example:
Weak Prompt
Analyse this dataset.
Strong Prompt
Here is a dataset with 10 columns and 500 rows. Identify missing values, suggest cleaning steps, and provide Python code using Pandas to apply those steps.
Core Structure for Data Analysis Prompts
Context → Dataset Description → Task → Constraints → Output Format
Example:
Prompt
You are a data analyst. I will give you a dataset description. Suggest EDA steps, highlight potential issues, and provide clear, step-by-step analysis.
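The five-part structure above can also be assembled in code, which keeps prompts consistent across tasks. The following is a minimal sketch, not a standard API: the function name `build_prompt`, its parameters, and the example values are all illustrative assumptions.

```python
def build_prompt(context, dataset_description, task, constraints, output_format):
    """Join the five parts in the order Context, Dataset, Task, Constraints, Output."""
    parts = [
        ("Context", context),
        ("Dataset description", dataset_description),
        ("Task", task),
        ("Constraints", constraints),
        ("Output format", output_format),
    ]
    # Skip empty parts so the prompt never contains a blank section.
    return "\n\n".join(
        f"{label}: {text.strip()}" for label, text in parts if text and text.strip()
    )


# Placeholder values for illustration only.
prompt = build_prompt(
    context="You are a data analyst.",
    dataset_description="Sales data with 10 columns and 500 rows.",
    task="Suggest EDA steps and highlight potential issues.",
    constraints="Use Pandas only; do not assume columns that are not listed.",
    output_format="Numbered steps, each followed by a short code snippet.",
)
print(prompt)
```

Keeping each part as a separate argument makes it easy to see at a glance which part of a prompt is missing.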
1. Data Cleaning Prompts
A. Identify Cleaning Requirements
Prompt
I have a dataset with the following columns: [list columns]. Identify missing values, duplicates, inconsistencies, and potential outliers. Explain the cleaning steps required in bullet points.
B. Generate Cleaning Code
Prompt
Based on the issues you identified, provide Python (Pandas) code to clean the dataset. Make the code readable and explain each step.
C. Standardise Data
Prompt
Suggest the best way to standardise and normalise the numerical columns in this dataset. Provide code and a short explanation.
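For reference, the kind of code these three prompts tend to produce might look like the sketch below. The column names and values are hypothetical, and median imputation and z-score standardisation are just one reasonable choice among several, not the only correct answer.

```python
import pandas as pd

# Hypothetical toy data; replace with your own DataFrame.
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 4],
        "age": [34.0, 41.0, 41.0, None, 51.0],
        "spend": [120.0, 80.0, 80.0, 95.0, 4000.0],
    }
)

# 1. Report issues before changing anything.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# 2. Clean: drop exact duplicate rows, then fill missing ages with the median.
cleaned = df.drop_duplicates().reset_index(drop=True)
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

# 3. Standardise numeric columns to zero mean and unit variance (z-scores).
numeric_cols = ["age", "spend"]
standardised = cleaned.copy()
standardised[numeric_cols] = (
    cleaned[numeric_cols] - cleaned[numeric_cols].mean()
) / cleaned[numeric_cols].std(ddof=0)

print(missing_per_column)
print(f"duplicate rows: {duplicate_rows}")
print(standardised)
```

Checking the model's output against a small known example like this is a quick way to confirm that the generated cleaning code does what the explanation says.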
2. Exploratory Data Analysis (EDA) Prompts
A. EDA Overview
Prompt
Perform a structured EDA based on the dataset description below.
Include:
- Column summaries
- Numeric vs categorical distribution
- Outlier detection
- Key insights
Dataset: [describe dataset]
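One way to fill the [describe dataset] placeholder is to generate the description from the data itself rather than typing it by hand. The helper below is a sketch under that assumption; `describe_dataset` is a hypothetical name, and the example DataFrame is made up.

```python
import pandas as pd


def describe_dataset(df: pd.DataFrame) -> str:
    """Return a compact, plain-text description of a DataFrame for use in a prompt."""
    lines = [f"{len(df)} rows, {len(df.columns)} columns"]
    for name in df.columns:
        col = df[name]
        kind = "numeric" if pd.api.types.is_numeric_dtype(col) else "categorical"
        lines.append(
            f"- {name}: {kind}, dtype={col.dtype}, "
            f"missing={int(col.isna().sum())}, unique={col.nunique()}"
        )
    return "\n".join(lines)


# Hypothetical example data.
example = pd.DataFrame({"region": ["north", "south", None], "sales": [10.5, 7.0, 3.2]})
description = describe_dataset(example)
print(description)
```

Sending a summary like this instead of raw rows keeps the prompt short and avoids sharing individual records.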
B. Chart Suggestions
Prompt
Suggest 5 different charts to visualise patterns in this dataset. Explain what each chart would reveal.
C. EDA Code
Prompt
Write Python code using Pandas and Matplotlib to generate basic EDA charts: histogram, bar chart, correlation matrix, and box plot.
3. Statistical Analysis Prompts
A. Summary Statistics
Prompt
Provide descriptive statistics for all numeric columns in a reader-friendly format. Explain what each statistic means for non-technical readers.
B. Correlation Insights
Prompt
Explain which variables are most correlated and why this might matter in analysis. Avoid assumptions without evidence.
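To give the model evidence to work from, the correlations can be computed first and pasted into the prompt. A minimal sketch, assuming a small all-numeric DataFrame with hypothetical column names and values:

```python
import pandas as pd

# Hypothetical data: spend is built to track visits closely; age is unrelated.
df = pd.DataFrame(
    {
        "visits": [1, 2, 3, 4, 5, 6],
        "spend": [11, 19, 32, 41, 48, 62],
        "age": [40, 22, 35, 51, 28, 33],
    }
)

corr = df.corr()  # Pearson correlation between every pair of columns

# Keep each pair once and rank by absolute correlation.
cols = list(corr.columns)
pairs = []
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        pairs.append((a, b, round(float(corr.loc[a, b]), 3)))
pairs.sort(key=lambda p: abs(p[2]), reverse=True)

for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r}")
```

The printed lines can be appended to the prompt so the explanation is tied to measured values rather than guesses.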
C. Hypothesis Testing
Prompt
Based on the dataset description, propose 2–3 hypothesis tests to run. Explain which statistical test to use and why.
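If the model proposes comparing the means of two groups, a common suggestion is Welch's t-test. The sketch below computes only the test statistic and degrees of freedom using the standard library; `welch_t` is a hypothetical helper name and the samples are invented. Obtaining a p-value additionally needs the t distribution, which a statistics library such as SciPy provides.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (sample variance / n).
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    dof = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, dof


# Invented samples for illustration.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```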
4. Machine Learning Prompts
A. Feature Engineering
Prompt
Suggest feature engineering ideas for this dataset. Include domain-specific enhancements, transformations, and encoding techniques.
B. Model Selection
Prompt
Based on the target variable type (numeric or categorical), recommend suitable machine learning models. Explain your reasoning.
C. Training Code
Prompt
Write clean and simple Python code using scikit-learn to:
- Split the dataset
- Train [model]
- Evaluate it using appropriate metrics.
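A response to this prompt would typically follow the split, train, evaluate pattern sketched below. This is an illustration only: logistic regression stands in for [model], the data is synthetic, and accuracy and F1 are assumed to be suitable metrics for a binary target.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset with a binary target.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Split the dataset, keeping class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train the model on the training portion only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out test portion.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
f1 = f1_score(y_test, predictions)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

When reviewing generated training code, check that evaluation uses only the held-out split, as it does here.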
5. SQL Query Prompts
A. Query Writing
Prompt
Write an SQL query to achieve the following: [describe task]. Use readable formatting and include comments.
B. Query Optimisation
Prompt
Suggest performance improvements for the SQL query below. Explain why each optimisation helps. Query: [paste here]
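Generated SQL and suggested optimisations are easy to check against a small in-memory database before running them on real data. The sketch below uses Python's built-in sqlite3 module; the `orders` table, its rows, and the index name are hypothetical, and other database engines report query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 30.0), ('bob', 12.5), ('alice', 20.0), ('carol', 7.5);
    """
)

query = """
-- Total spend per customer, highest first
SELECT customer, SUM(amount) AS total_spend
FROM orders
GROUP BY customer
ORDER BY total_spend DESC;
"""
totals = conn.execute(query).fetchall()
print(totals)

# Compare the query plan for a filtered lookup before and after adding an index.
lookup = "SELECT amount FROM orders WHERE customer = 'alice'"
plan_before = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + lookup)]
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + lookup)]
print(plan_before)
print(plan_after)
```

Comparing the two plans shows whether a suggested index is actually used, which is a concrete way to verify the model's explanation of an optimisation.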
6. Insight Generation Prompts
A. Natural-Language Insights
Prompt
Based on the following summary statistics, write 5 key insights in plain English. Make the explanations simple and actionable.
B. Business Interpretation
Prompt
Translate the analytical findings below into business insights that a non-technical manager can understand. Avoid jargon and focus on implications.
Visual Guide: Input vs Output
Prompt
Here is a dataset description with 12 features. Generate cleaning steps, EDA suggestions, Python code, and key insights in separate sections.
Simplified Output
Section 1: Cleaning
Section 2: EDA
Section 3: Python code
Section 4: Key insights
This illustrates how structured prompts lead to predictable, organised analytical outputs.
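Predictable section headings also make the response easy to process in code. The following sketch assumes the model labels each part as "Section N: Title" exactly as in the simplified output above; `split_sections` is a hypothetical helper, and the sample response is invented.

```python
import re


def split_sections(response: str) -> dict:
    """Map each 'Section N: Title' heading to the text that follows it."""
    pattern = re.compile(r"^Section\s+(\d+):\s*(.+)$", re.MULTILINE)
    matches = list(pattern.finditer(response))
    sections = {}
    for i, m in enumerate(matches):
        # A section's body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[m.group(2).strip()] = response[m.end():end].strip()
    return sections


# Invented sample response for illustration.
sample = """Section 1: Cleaning
Drop duplicate rows.
Section 2: EDA
Plot a histogram of each numeric column.
Section 3: Python code
df = df.drop_duplicates()
Section 4: Key insights
Spend is heavily skewed by a few large orders.
"""
parsed = split_sections(sample)
for title, body in parsed.items():
    print(f"[{title}] {body}")
```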
Conclusion
--Infinite Ripples | HK