The 10 SQL Questions Every Data Interview Will Ask

SQL is the language of data interviews. Doesn't matter if you're applying for a Data Analyst, BI Engineer, or Data Engineer role — you will be tested on SQL. No exceptions.

The good news? Interviews aren't trying to trick you with obscure syntax. They're testing a handful of core concepts over and over. Once you understand these 10 patterns, you'll recognize them no matter how the question is dressed up.

Each question below includes the concept, why they ask it, a real example, and the SQL you need to write. Let's go.

Jump to a Question

GROUP BY + Aggregations
JOINs (all types)
Window Functions
CTEs
Subqueries
CASE WHEN
Date & Time Functions
HAVING vs WHERE
Ranking Functions
Finding Duplicates

Before You Start

Don't just read these — type them out. Use a free tool like SQLite Online or DB Fiddle to run each query yourself. Your hands need to know this, not just your eyes.

Easy

GROUP BY + Aggregations — The Foundation of Everything

If you only learn one thing before a data interview, it's this. GROUP BY + aggregation functions (COUNT, SUM, AVG, MIN, MAX) is the single most tested concept in data interviews. Period.

"Write a query to find the total revenue by product category."

The interviewer wants to see that you can group rows together and summarize them. Here's how it looks:

SQL

SELECT
  category,
  SUM(revenue) AS total_revenue,
  COUNT(*) AS num_orders,
  AVG(revenue) AS avg_order_value
FROM orders
GROUP BY category
ORDER BY total_revenue DESC;

Sample output:

category	total_revenue	num_orders	avg_order_value
Electronics	$142,000	320	$443.75
Clothing	$89,500	780	$114.74
Books	$21,200	1,040	$20.38

Anjalika's Tip

Every column in your SELECT that isn't inside an aggregation function (SUM, COUNT, etc.) must be in your GROUP BY. This trips up so many candidates. If you're selecting category and summing revenue, category goes in GROUP BY.

Easy

JOINs — INNER, LEFT, RIGHT, and When to Use Each

JOINs are tested in literally every data interview. They test whether you understand how tables relate to each other. The most important ones: INNER JOIN (only matching rows) and LEFT JOIN (all rows from left table, nulls where no match).

"Find all customers and their total orders. Include customers who have never placed an order."

The key word here is "include customers who have never placed an order" — that's a LEFT JOIN signal. An INNER JOIN would drop those customers entirely.

SQL

SELECT
  c.customer_id,
  c.name,
  COUNT(o.order_id) AS total_orders,
  COALESCE(SUM(o.amount), 0) AS total_spent
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC;

Anjalika's Tip

Notice the COALESCE around SUM — it replaces NULL with 0 for customers who never ordered. Forgetting this is a common mistake. Interviewers love when you handle NULLs without being asked.

Medium

Window Functions — The Interview Game Changer

Window functions are what separate junior candidates from strong ones. If you walk into a mid-level data interview and don't know window functions, you will struggle. The key insight: window functions calculate across a set of rows without collapsing them like GROUP BY does.

"For each customer, show every order they placed AND their running total spend over time."

SQL

SELECT
  customer_id,
  order_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM orders
ORDER BY customer_id, order_date;

customer_id	order_date	amount	running_total
101	2024-01-05	$50	$50
101	2024-02-10	$120	$170
101	2024-03-22	$80	$250
102	2024-01-12	$200	$200

Anjalika's Tip

PARTITION BY is like GROUP BY inside the window — it resets the calculation for each customer. ORDER BY inside OVER controls the direction of the running total. These two combined are incredibly powerful and used constantly in real analytics work.

Medium

CTEs — Writing SQL That's Actually Readable

A CTE (Common Table Expression) is a way to name a subquery and refer to it later — like creating a temporary table inline. Interviewers love CTEs because they show you can write clean, structured, readable SQL.

"Find all customers whose total spend is above the average customer spend."

SQL — Using a CTE

-- Step 1: Calculate total spend per customer
WITH customer_spend AS (
  SELECT
    customer_id,
    SUM(amount) AS total_spend
  FROM orders
  GROUP BY customer_id
),

-- Step 2: Calculate the average spend across all customers
avg_spend AS (
  SELECT AVG(total_spend) AS avg_total
  FROM customer_spend
)

-- Step 3: Filter customers above average
SELECT
  cs.customer_id,
  cs.total_spend,
  ROUND(a.avg_total, 2) AS avg_spend
FROM customer_spend cs
CROSS JOIN avg_spend a
WHERE cs.total_spend > a.avg_total
ORDER BY cs.total_spend DESC;

Anjalika's Tip

You could write this as a nested subquery, but CTEs are far cleaner and interviewers notice. In my experience at AWS, we use CTEs in almost every production query. Writing CTEs signals seniority and professionalism — even as a fresh grad.

Medium

Subqueries — A Query Inside a Query

A subquery is a SELECT nested inside another query. They come up constantly — either in the WHERE clause, the FROM clause, or the SELECT itself. You need to be comfortable reading and writing them.

"Find the names of customers who placed an order in the last 30 days."

SQL

SELECT name
FROM customers
WHERE customer_id IN (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);

Anjalika's Tip

Many interviewers will then ask: "Can you rewrite this as a JOIN?" The answer is yes — and you should know both versions. A JOIN is usually faster at scale. Showing you know both approaches is a great way to stand out.

Easy

CASE WHEN — SQL's If/Else Statement

CASE WHEN lets you add conditional logic directly in your SQL — like an if/else statement. It's used all the time to create categories, labels, and buckets from raw data.

"Classify each order as 'Small', 'Medium', or 'Large' based on the order amount."

SQL

SELECT
  order_id,
  customer_id,
  amount,
  CASE
    WHEN amount < 50  THEN 'Small'
    WHEN amount < 200 THEN 'Medium'
    ELSE 'Large'
  END AS order_size
FROM orders;

Anjalika's Tip

You can also use CASE WHEN inside aggregations — for example, COUNT(CASE WHEN status = 'completed' THEN 1 END) to count only completed orders. This is a very common pattern in real dashboards.

Medium

Date & Time Functions — Working With Timestamps

Almost every real dataset has dates. Interviews will test whether you can filter, extract, and manipulate timestamps. The exact functions vary slightly by database (PostgreSQL, MySQL, BigQuery, Redshift) but the concepts are the same.

"How many orders were placed each month in 2023?"

SQL (PostgreSQL / Redshift style)

SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(*) AS num_orders
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2023
GROUP BY 1
ORDER BY 1;

Anjalika's Tip

Always ask which database you're working with before the interview if you can — BigQuery uses DATE_TRUNC too, MySQL uses DATE_FORMAT, Snowflake uses TRUNC. If you're unsure in the interview, just say which flavor you're using and the interviewer will appreciate the transparency.

Easy

HAVING vs WHERE — A Classic Trick Question

This comes up in almost every SQL interview. The difference: WHERE filters rows before grouping. HAVING filters groups after aggregation. You cannot use WHERE to filter on an aggregated value like COUNT or SUM.

"Find all customers who placed more than 5 orders."

SQL — correct use of HAVING

-- ❌ WRONG — can't filter on COUNT in WHERE
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
WHERE COUNT(*) > 5  -- this will throw an error
GROUP BY customer_id;

-- ✅ CORRECT — use HAVING to filter after aggregation
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5
ORDER BY num_orders DESC;

Anjalika's Tip

Interviewers sometimes ask this as a verbal question: "What's the difference between WHERE and HAVING?" The clean answer: WHERE filters rows, HAVING filters groups. Say that out loud, then back it up with a quick example.

Hard

Ranking Functions — ROW_NUMBER, RANK, DENSE_RANK

Ranking functions are window functions that assign a rank to each row. They come up constantly in "top N per group" type questions — one of the most popular interview question formats.

"Find the top 3 best-selling products in each category."

SQL

WITH ranked_products AS (
  SELECT
    category,
    product_name,
    SUM(revenue) AS total_revenue,
    ROW_NUMBER() OVER (
      PARTITION BY category
      ORDER BY SUM(revenue) DESC
    ) AS rank
  FROM sales
  GROUP BY category, product_name
)
SELECT category, product_name, total_revenue
FROM ranked_products
WHERE rank <= 3
ORDER BY category, rank;

ROW_NUMBER vs RANK vs DENSE_RANK — what's the difference?

Function	Ties?	Gap after tie?	Use when...
ROW_NUMBER()	No — always unique	—	You want exactly N rows per group
RANK()	Yes — same rank	Yes (1,1,3)	Ties share rank, next rank skips
DENSE_RANK()	Yes — same rank	No (1,1,2)	Ties share rank, no gaps

Anjalika's Tip

The "top N per group" pattern using ROW_NUMBER() + CTE + WHERE rank <= N is one of the most asked SQL patterns in data interviews. Memorise this structure. You'll use it constantly in real work too.

Medium

Finding Duplicates — A Classic Real-World Problem

Real data is messy. Interviewers love to ask about duplicates because it mirrors actual data quality problems you'll face on the job. There are several ways to find (and remove) duplicates — you should know all of them.

"Find all customers who appear more than once in the customers table."

Method 1 — GROUP BY + HAVING

SELECT
  email,
  COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

Method 2 — ROW_NUMBER() to find and remove dupes

WITH deduped AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY email
      ORDER BY created_at DESC  -- keep most recent
    ) AS rn
  FROM customers
)
SELECT * FROM deduped
WHERE rn = 1;  -- only keep the first row per email

Anjalika's Tip

At AWS I deal with duplicate data more than you'd think. Knowing both methods — finding duplicates with GROUP BY/HAVING, and removing them with ROW_NUMBER() — will make you look genuinely experienced. Bonus: explain why you'd keep the most recent row (ORDER BY created_at DESC) rather than the oldest.

THE QUICK-REFERENCE CHEAT SHEET

SQL Interview Cheat Sheet

Summarize data by groupGROUP BY + SUM/COUNT/AVG

Combine tables (matching rows only)INNER JOIN

Keep all rows from left tableLEFT JOIN

Calculate across rows without collapsingOVER (PARTITION BY ... ORDER BY ...)

Named subquery for readabilityWITH name AS (...)

Conditional logic in SQLCASE WHEN ... THEN ... ELSE ... END

Filter after aggregationHAVING COUNT(*) > N

Top N per groupROW_NUMBER() OVER (...) + WHERE rn <= N

Truncate date to month/yearDATE_TRUNC('month', date_col)

Find duplicatesGROUP BY col HAVING COUNT(*) > 1

Replace NULL with a valueCOALESCE(col, default_value)

Running totalSUM(col) OVER (PARTITION BY ... ORDER BY ...)

🎯 How to Practice

Don't just read this post once and move on. Practice each pattern at least 3 times with different table scenarios. Great free resources: LeetCode SQL 50, Mode Analytics SQL Tutorial, and StrataScratch for real interview questions from specific companies.

WANT A MOCK SQL INTERVIEW?

I'll run you through a real practice session — same format as actual data interviews. Walk in ready, not just prepared.

Get Free Mentorship →

THE 10 SQL QUESTIONS EVERYDATA INTERVIEW WILL ASK

THE QUICK-REFERENCE CHEAT SHEET

SQL Interview Cheat Sheet

WANT A MOCK SQL INTERVIEW?

THE 10 SQL QUESTIONS EVERY
DATA INTERVIEW WILL ASK