SQL is the language of data interviews. Doesn't matter if you're applying for a Data Analyst, BI Engineer, or Data Engineer role — you will be tested on SQL. No exceptions.
The good news? Interviews aren't trying to trick you with obscure syntax. They're testing a handful of core concepts over and over. Once you understand these 10 patterns, you'll recognize them no matter how the question is dressed up.
Each question below includes the concept, why they ask it, a real example, and the SQL you need to write. Let's go.
Don't just read these — type them out. Use a free tool like SQLite Online or DB Fiddle to run each query yourself. Your hands need to know this, not just your eyes.
If you only learn one thing before a data interview, it's this. GROUP BY + aggregation functions (COUNT, SUM, AVG, MIN, MAX) is the single most tested concept in data interviews. Period.
The interviewer wants to see that you can group rows together and summarize them. Here's how it looks:
SELECT
category,
SUM(revenue) AS total_revenue,
COUNT(*) AS num_orders,
AVG(revenue) AS avg_order_value
FROM orders
GROUP BY category
ORDER BY total_revenue DESC;
Sample output:
| category | total_revenue | num_orders | avg_order_value |
|---|---|---|---|
| Electronics | $142,000 | 320 | $443.75 |
| Clothing | $89,500 | 780 | $114.74 |
| Books | $21,200 | 1,040 | $20.38 |
Every column in your SELECT that isn't inside an aggregation function (SUM, COUNT, etc.) must be in your GROUP BY. This trips up so many candidates. If you're selecting category and summing revenue, category goes in GROUP BY.
JOINs are tested in literally every data interview. They test whether you understand how tables relate to each other. The most important ones: INNER JOIN (only matching rows) and LEFT JOIN (all rows from left table, nulls where no match).
The key word here is "include customers who have never placed an order" — that's a LEFT JOIN signal. An INNER JOIN would drop those customers entirely.
SELECT
c.customer_id,
c.name,
COUNT(o.order_id) AS total_orders,
COALESCE(SUM(o.amount), 0) AS total_spent
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC;
Notice the COALESCE around SUM — it replaces NULL with 0 for customers who never ordered. Forgetting this is a common mistake. Interviewers love when you handle NULLs without being asked.
Window functions are what separate junior candidates from strong ones. If you walk into a mid-level data interview and don't know window functions, you will struggle. The key insight: window functions calculate across a set of rows without collapsing them like GROUP BY does.
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders
ORDER BY customer_id, order_date;
| customer_id | order_date | amount | running_total |
|---|---|---|---|
| 101 | 2024-01-05 | $50 | $50 |
| 101 | 2024-02-10 | $120 | $170 |
| 101 | 2024-03-22 | $80 | $250 |
| 102 | 2024-01-12 | $200 | $200 |
PARTITION BY is like GROUP BY inside the window — it resets the calculation for each customer. ORDER BY inside OVER controls the direction of the running total. These two combined are incredibly powerful and used constantly in real analytics work.
A CTE (Common Table Expression) is a way to name a subquery and refer to it later — like creating a temporary table inline. Interviewers love CTEs because they show you can write clean, structured, readable SQL.
-- Step 1: Calculate total spend per customer
WITH customer_spend AS (
SELECT
customer_id,
SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id
),
-- Step 2: Calculate the average spend across all customers
avg_spend AS (
SELECT AVG(total_spend) AS avg_total
FROM customer_spend
)
-- Step 3: Filter customers above average
SELECT
cs.customer_id,
cs.total_spend,
ROUND(a.avg_total, 2) AS avg_spend
FROM customer_spend cs
CROSS JOIN avg_spend a
WHERE cs.total_spend > a.avg_total
ORDER BY cs.total_spend DESC;
You could write this as a nested subquery, but CTEs are far cleaner and interviewers notice. In my experience at AWS, we use CTEs in almost every production query. Writing CTEs signals seniority and professionalism — even as a fresh grad.
A subquery is a SELECT nested inside another query. They come up constantly — either in the WHERE clause, the FROM clause, or the SELECT itself. You need to be comfortable reading and writing them.
SELECT name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);
Many interviewers will then ask: "Can you rewrite this as a JOIN?" The answer is yes — and you should know both versions. A JOIN is usually faster at scale. Showing you know both approaches is a great way to stand out.
CASE WHEN lets you add conditional logic directly in your SQL — like an if/else statement. It's used all the time to create categories, labels, and buckets from raw data.
SELECT
order_id,
customer_id,
amount,
CASE
WHEN amount < 50 THEN 'Small'
WHEN amount < 200 THEN 'Medium'
ELSE 'Large'
END AS order_size
FROM orders;
You can also use CASE WHEN inside aggregations — for example, COUNT(CASE WHEN status = 'completed' THEN 1 END) to count only completed orders. This is a very common pattern in real dashboards.
Almost every real dataset has dates. Interviews will test whether you can filter, extract, and manipulate timestamps. The exact functions vary slightly by database (PostgreSQL, MySQL, BigQuery, Redshift) but the concepts are the same.
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS num_orders
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2023
GROUP BY 1
ORDER BY 1;
Always ask which database you're working with before the interview if you can — BigQuery uses DATE_TRUNC too, MySQL uses DATE_FORMAT, Snowflake uses TRUNC. If you're unsure in the interview, just say which flavor you're using and the interviewer will appreciate the transparency.
This comes up in almost every SQL interview. The difference: WHERE filters rows before grouping. HAVING filters groups after aggregation. You cannot use WHERE to filter on an aggregated value like COUNT or SUM.
-- ❌ WRONG — can't filter on COUNT in WHERE
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
WHERE COUNT(*) > 5 -- this will throw an error
GROUP BY customer_id;
-- ✅ CORRECT — use HAVING to filter after aggregation
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5
ORDER BY num_orders DESC;
Interviewers sometimes ask this as a verbal question: "What's the difference between WHERE and HAVING?" The clean answer: WHERE filters rows, HAVING filters groups. Say that out loud, then back it up with a quick example.
Ranking functions are window functions that assign a rank to each row. They come up constantly in "top N per group" type questions — one of the most popular interview question formats.
WITH ranked_products AS (
SELECT
category,
product_name,
SUM(revenue) AS total_revenue,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY SUM(revenue) DESC
) AS rank
FROM sales
GROUP BY category, product_name
)
SELECT category, product_name, total_revenue
FROM ranked_products
WHERE rank <= 3
ORDER BY category, rank;
ROW_NUMBER vs RANK vs DENSE_RANK — what's the difference?
| Function | Ties? | Gap after tie? | Use when... |
|---|---|---|---|
| ROW_NUMBER() | No — always unique | — | You want exactly N rows per group |
| RANK() | Yes — same rank | Yes (1,1,3) | Ties share rank, next rank skips |
| DENSE_RANK() | Yes — same rank | No (1,1,2) | Ties share rank, no gaps |
The "top N per group" pattern using ROW_NUMBER() + CTE + WHERE rank <= N is one of the most asked SQL patterns in data interviews. Memorise this structure. You'll use it constantly in real work too.
Real data is messy. Interviewers love to ask about duplicates because it mirrors actual data quality problems you'll face on the job. There are several ways to find (and remove) duplicates — you should know all of them.
SELECT
email,
COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
WITH deduped AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC -- keep most recent
) AS rn
FROM customers
)
SELECT * FROM deduped
WHERE rn = 1; -- only keep the first row per email
At AWS I deal with duplicate data more than you'd think. Knowing both methods — finding duplicates with GROUP BY/HAVING, and removing them with ROW_NUMBER() — will make you look genuinely experienced. Bonus: explain why you'd keep the most recent row (ORDER BY created_at DESC) rather than the oldest.
THE QUICK-REFERENCE CHEAT SHEET
SQL Interview Cheat Sheet
Don't just read this post once and move on. Practice each pattern at least 3 times with different table scenarios. Great free resources: LeetCode SQL 50, Mode Analytics SQL Tutorial, and StrataScratch for real interview questions from specific companies.
WANT A MOCK SQL INTERVIEW?
I'll run you through a real practice session — same format as actual data interviews. Walk in ready, not just prepared.
Get Free Mentorship →