
SQL Interview Questions for E-Commerce and Marketplace Data Science Roles

LDS Team · Let's Data Science · 15 min read

According to a 2025 StrataScratch analysis, over 70% of data science SQL interviews at large technology companies involve JOINs, CTEs, or window functions applied to transactional data. If you are interviewing at a company with an e-commerce, marketplace, or logistics business model, the SQL questions you face will test whether you can think in orders, sessions, funnels, and cohorts. This guide walks through 15 real-world SQL problems drawn from the patterns that show up most often.

The questions in this article are drawn from our research of publicly available interview prep platforms and community discussions, including DataLemur, StrataScratch, LeetCode, and forums such as r/datascience and Blind. These represent patterns that candidates have reported across interviews at large technology companies with e-commerce and marketplace operations. We do not claim these are proprietary questions from any specific company.

Customer and Order Analytics

These questions test your ability to aggregate transactional data, handle ties in rankings, and compute metrics that drive business decisions around customer retention.

Question 1: Top N Customers by Total Revenue

A major online retailer asks you to find the top five customers by total spending in 2024. If two customers are tied, both should appear.

Sample Schema

```sql
CREATE TABLE orders (
    order_id     INT PRIMARY KEY,
    customer_id  INT,
    order_date   DATE,
    total_amount DECIMAL(10,2)
);
```

| order_id | customer_id | order_date | total_amount |
|---|---|---|---|
| 1001 | 201 | 2024-03-15 | 249.99 |
| 1002 | 202 | 2024-04-01 | 89.50 |
| 1003 | 201 | 2024-06-22 | 310.00 |
| 1004 | 203 | 2024-07-10 | 559.99 |
| 1005 | 202 | 2024-09-05 | 470.50 |
| 1006 | 204 | 2024-11-28 | 559.99 |

The approach is to aggregate first, rank second, then filter. DENSE_RANK() keeps rank numbers consecutive even when ties exist, so filtering on rnk <= 5 returns everyone in the top five spending levels. Note that this can return more than five rows; confirm with the interviewer whether that matches what they mean by "top 5."

```sql
WITH customer_spend AS (
    SELECT
        customer_id,
        SUM(total_amount) AS total_revenue
    FROM orders
    WHERE order_date >= '2024-01-01'
      AND order_date < '2025-01-01'
    GROUP BY customer_id
),
ranked AS (
    SELECT
        customer_id,
        total_revenue,
        DENSE_RANK() OVER (ORDER BY total_revenue DESC) AS rnk
    FROM customer_spend
)
SELECT customer_id, total_revenue, rnk
FROM ranked
WHERE rnk <= 5
ORDER BY rnk;
```

Expected Output

| customer_id | total_revenue | rnk |
|---|---|---|
| 202 | 560.00 | 1 |
| 201 | 559.99 | 2 |
| 203 | 559.99 | 2 |
| 204 | 559.99 | 2 |

Key Insight: DENSE_RANK() returns consecutive ranks (1, 2, 3) while RANK() skips ranks after ties (1, 1, 3). When the question says "top 5," clarify which behavior is expected before writing code.

What the Interviewer Is Testing: Window function selection, date filtering with half-open intervals, and whether you ask about tie-breaking behavior.
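The tie behavior is easy to demonstrate for yourself. The sketch below (hypothetical data, using SQLite's window functions through Python's sqlite3 module) runs both ranking functions over the same tied totals:

```python
import sqlite3

# Hypothetical tied spend data to contrast RANK() and DENSE_RANK().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (customer_id INT, total REAL)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?)",
    [(1, 500.0), (2, 400.0), (3, 400.0), (4, 300.0)],
)
rows = conn.execute("""
    SELECT customer_id,
           RANK()       OVER (ORDER BY total DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY total DESC) AS dense_rnk
    FROM spend
    ORDER BY total DESC, customer_id
""").fetchall()
print(rows)
# [(1, 1, 1), (2, 2, 2), (3, 2, 2), (4, 4, 3)]
# RANK() jumps from 2 to 4 after the tie; DENSE_RANK() continues with 3.
```

Filtering on `rnk <= 3` would therefore return different row sets under the two functions, which is exactly why the clarifying question matters.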

Question 2: Repeat Purchase Rate

A large marketplace company wants to know the percentage of customers who made more than one purchase in the last 12 months.

Sample Schema

```sql
CREATE TABLE transactions (
    transaction_id INT PRIMARY KEY,
    customer_id    INT,
    purchase_date  DATE,
    amount         DECIMAL(10,2)
);
```

The key is separating the count step from the ratio step. Count distinct purchase dates per customer, classify each as single or repeat, then compute the ratio.

```sql
WITH customer_counts AS (
    SELECT
        customer_id,
        COUNT(DISTINCT purchase_date) AS purchase_days
    FROM transactions
    WHERE purchase_date >= CURRENT_DATE - INTERVAL '12 months'
    GROUP BY customer_id
)
SELECT
    ROUND(
        100.0 * COUNT(CASE WHEN purchase_days > 1 THEN 1 END)
        / COUNT(*),
        2
    ) AS repeat_purchase_rate_pct
FROM customer_counts;
```

Expected Output

| repeat_purchase_rate_pct |
|---|
| 38.24 |

Key Insight: Using COUNT(DISTINCT purchase_date) instead of COUNT(*) avoids inflating the repeat count when a customer places multiple orders on the same day. Interviewers notice this nuance.

What the Interviewer Is Testing: Conditional aggregation with CASE WHEN inside an aggregate, and whether you de-duplicate same-day orders.
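To see why the DISTINCT matters, here is a small sketch (hypothetical rows, SQLite via Python) where one customer places two orders on the same day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INT, purchase_date TEXT)")
# Customer 1 orders twice on the same day; customer 2 returns a month later.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "2024-05-01"), (1, "2024-05-01"),
     (2, "2024-05-01"), (2, "2024-06-01")],
)
rows = conn.execute("""
    SELECT customer_id,
           COUNT(*)                      AS order_rows,
           COUNT(DISTINCT purchase_date) AS purchase_days
    FROM transactions
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
print(rows)
# [(1, 2, 1), (2, 2, 2)]
# COUNT(*) would wrongly flag customer 1 as a repeat purchaser.
```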

Question 3: Customer Order History with Running Totals

A cloud services provider with e-commerce operations needs a report showing each customer's orders alongside a running cumulative spend, ordered chronologically.

Sample Schema

```sql
CREATE TABLE orders (
    order_id     INT PRIMARY KEY,
    customer_id  INT,
    order_date   DATE,
    order_total  DECIMAL(10,2)
);
```

```sql
SELECT
    customer_id,
    order_id,
    order_date,
    order_total,
    SUM(order_total) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_spend
FROM orders
ORDER BY customer_id, order_date;
```

Expected Output

| customer_id | order_id | order_date | order_total | cumulative_spend |
|---|---|---|---|---|
| 201 | 1001 | 2024-03-15 | 249.99 | 249.99 |
| 201 | 1003 | 2024-06-22 | 310.00 | 559.99 |
| 202 | 1002 | 2024-04-01 | 89.50 | 89.50 |
| 202 | 1005 | 2024-09-05 | 470.50 | 560.00 |

Key Insight: Always specify the window frame explicitly (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The default frame changes depending on whether ORDER BY is present, and relying on implicit behavior is a common source of bugs.

What the Interviewer Is Testing: Window frame syntax, PARTITION BY usage, and awareness of frame defaults.
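The frame-default pitfall shows up whenever the ORDER BY column has duplicates. In this sketch (hypothetical rows, SQLite via Python), the implicit frame is RANGE ... CURRENT ROW, so tied dates are treated as peers and share one cumulative value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE o (order_date TEXT, amt REAL)")
# Two orders share the same date; a third arrives a day later.
conn.executemany(
    "INSERT INTO o VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 20.0), ("2024-01-02", 5.0)],
)
rows = conn.execute("""
    SELECT order_date, amt,
           SUM(amt) OVER (ORDER BY order_date) AS implicit_frame,
           SUM(amt) OVER (ORDER BY order_date
                          ROWS BETWEEN UNBOUNDED PRECEDING
                               AND CURRENT ROW) AS rows_frame
    FROM o
""").fetchall()
# Both 2024-01-01 rows get implicit_frame = 30.0 (tied peers are included),
# while the explicit ROWS frame accumulates strictly row by row.
for r in rows:
    print(r)
```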

Product and Inventory Analysis

Product-level queries test your ability to work with normalized schemas (orders, order items, products) and to handle category-level aggregations.

Question 4: Highest-Grossing Products by Category

A major e-commerce platform asks you to find the top two products by total revenue within each product category for the year 2024.

Sample Schema

```sql
CREATE TABLE product_spend (
    category         VARCHAR(50),
    product          VARCHAR(100),
    user_id          INT,
    spend            DECIMAL(10,2),
    transaction_date DATE
);
```

| category | product | user_id | spend | transaction_date |
|---|---|---|---|---|
| Electronics | Headphones | 101 | 79.99 | 2024-02-14 |
| Electronics | Tablet | 102 | 349.00 | 2024-03-20 |
| Electronics | Headphones | 103 | 79.99 | 2024-05-11 |
| Appliances | Blender | 101 | 59.99 | 2024-01-08 |
| Appliances | Air Fryer | 104 | 129.99 | 2024-06-30 |
| Appliances | Blender | 105 | 59.99 | 2024-08-15 |

Aggregate spending by category and product, rank within each category, then filter.

```sql
WITH category_product_totals AS (
    SELECT
        category,
        product,
        SUM(spend) AS total_spend
    FROM product_spend
    WHERE transaction_date >= '2024-01-01'
      AND transaction_date < '2025-01-01'
    GROUP BY category, product
),
ranked AS (
    SELECT
        category,
        product,
        total_spend,
        RANK() OVER (
            PARTITION BY category
            ORDER BY total_spend DESC
        ) AS product_rank
    FROM category_product_totals
)
SELECT category, product, total_spend
FROM ranked
WHERE product_rank <= 2
ORDER BY category, product_rank;
```

Expected Output

| category | product | total_spend |
|---|---|---|
| Appliances | Air Fryer | 129.99 |
| Appliances | Blender | 119.98 |
| Electronics | Tablet | 349.00 |
| Electronics | Headphones | 159.98 |

Key Insight: This is one of the most reported SQL questions from e-commerce interviews. The pattern of aggregate, rank with a window function, then filter in an outer query appears repeatedly. Memorize this three-step CTE structure.

What the Interviewer Is Testing: PARTITION BY within RANK(), date filtering, and whether you handle the outer filter correctly (you cannot put WHERE product_rank <= 2 inside the CTE that defines the rank).

Question 5: Inventory Turnover Rate

A large online retailer wants to calculate inventory turnover for each product over the last 90 days. Inventory turnover measures how many times the available stock was sold through.

Sample Schema

```sql
CREATE TABLE daily_inventory (
    product_id    INT,
    snapshot_date DATE,
    units_on_hand INT
);

CREATE TABLE order_items (
    order_id    INT,
    product_id  INT,
    quantity    INT,
    order_date  DATE
);
```

```sql
WITH trailing_sales AS (
    SELECT
        product_id,
        SUM(quantity) AS units_sold_90d
    FROM order_items
    WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY product_id
),
avg_inventory AS (
    SELECT
        product_id,
        AVG(units_on_hand) AS avg_units_on_hand
    FROM daily_inventory
    WHERE snapshot_date >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY product_id
)
SELECT
    s.product_id,
    s.units_sold_90d,
    i.avg_units_on_hand,
    ROUND(s.units_sold_90d::DECIMAL / NULLIF(i.avg_units_on_hand, 0), 2)
        AS turnover_ratio
FROM trailing_sales s
JOIN avg_inventory i ON s.product_id = i.product_id
ORDER BY turnover_ratio DESC;
```

Expected Output

| product_id | units_sold_90d | avg_units_on_hand | turnover_ratio |
|---|---|---|---|
| 5012 | 840 | 120 | 7.00 |
| 5034 | 620 | 200 | 3.10 |
| 5001 | 150 | 300 | 0.50 |

Key Insight: NULLIF(i.avg_units_on_hand, 0) prevents division-by-zero errors for products with no inventory records. This defensive pattern shows production-level thinking.

What the Interviewer Is Testing: Multi-table joins across different time grains, NULLIF for safe division, and your understanding of what inventory turnover means as a business metric.
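The NULLIF guard is worth internalizing as a one-liner. A quick check (SQLite via Python; note that PostgreSQL raises a division-by-zero error where SQLite quietly returns NULL, so the guard matters even more in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NULLIF(x, 0) converts a zero denominator to NULL, and anything
# divided by NULL is NULL -- so the query degrades gracefully
# instead of erroring out.
zero_case = conn.execute("SELECT 10.0 / NULLIF(0, 0)").fetchone()[0]
safe_case = conn.execute("SELECT 10.0 / NULLIF(4, 0)").fetchone()[0]
print(zero_case, safe_case)
# None 2.5
```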

Question 6: Frequently Bought Together (Product Pair Analysis)

A marketplace company asks you to find the top 10 product pairs that are most often purchased in the same order.

Sample Schema

```sql
CREATE TABLE order_items (
    order_id   INT,
    product_id INT,
    quantity   INT
);
```

The trick is a self-join on order_id with a condition that prevents duplicate pairs and self-pairs.

```sql
SELECT
    a.product_id AS product_a,
    b.product_id AS product_b,
    COUNT(*) AS times_bought_together
FROM order_items a
JOIN order_items b
    ON a.order_id = b.order_id
    AND a.product_id < b.product_id
GROUP BY a.product_id, b.product_id
ORDER BY times_bought_together DESC
LIMIT 10;
```

Expected Output

| product_a | product_b | times_bought_together |
|---|---|---|
| 1001 | 1045 | 312 |
| 1001 | 1023 | 287 |
| 1023 | 1045 | 241 |

Key Insight: The a.product_id < b.product_id condition is critical. Without it, the pair (1001, 1045) and (1045, 1001) would both appear, doubling every count. Interviewers specifically check for this.

Common Mistake: Using != instead of < in the join condition. This eliminates self-joins but still produces duplicate mirrored pairs.
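The difference between the two join conditions is easy to verify on a toy dataset (hypothetical rows, SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INT, product_id INT)")
# Two orders, each containing the same pair of products.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, 1001), (1, 1045), (2, 1001), (2, 1045)],
)

def pair_rows(cond):
    # cond is "<" or "<>"; interpolation is fine for this fixed demo.
    return conn.execute(f"""
        SELECT COUNT(*)
        FROM order_items a
        JOIN order_items b
            ON a.order_id = b.order_id AND a.product_id {cond} b.product_id
    """).fetchone()[0]

print(pair_rows("<"), pair_rows("<>"))
# 2 4 -- the <> condition emits every pair twice, once mirrored.
```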

Funnel and Conversion Queries

Funnel analysis is where e-commerce SQL interviews get distinctive. These questions test conditional aggregation and your ability to reason about user sessions.

Question 7: Checkout Funnel Conversion Rates

A major e-commerce platform tracks user events through a four-step funnel: page view, add to cart, begin checkout, purchase. Calculate the conversion rate at each step.

Sample Schema

```sql
CREATE TABLE user_events (
    event_id   INT PRIMARY KEY,
    user_id    INT,
    event_type VARCHAR(20),
    event_date DATE
);
-- event_type values: 'page_view', 'add_to_cart', 'begin_checkout', 'purchase'
```

```sql
WITH funnel_counts AS (
    SELECT
        COUNT(DISTINCT CASE WHEN event_type = 'page_view'
            THEN user_id END) AS page_views,
        COUNT(DISTINCT CASE WHEN event_type = 'add_to_cart'
            THEN user_id END) AS add_to_carts,
        COUNT(DISTINCT CASE WHEN event_type = 'begin_checkout'
            THEN user_id END) AS checkouts,
        COUNT(DISTINCT CASE WHEN event_type = 'purchase'
            THEN user_id END) AS purchases
    FROM user_events
)
SELECT
    page_views,
    add_to_carts,
    ROUND(100.0 * add_to_carts / NULLIF(page_views, 0), 2)
        AS view_to_cart_pct,
    checkouts,
    ROUND(100.0 * checkouts / NULLIF(add_to_carts, 0), 2)
        AS cart_to_checkout_pct,
    purchases,
    ROUND(100.0 * purchases / NULLIF(checkouts, 0), 2)
        AS checkout_to_purchase_pct
FROM funnel_counts;
```

Expected Output

| page_views | add_to_carts | view_to_cart_pct | checkouts | cart_to_checkout_pct | purchases | checkout_to_purchase_pct |
|---|---|---|---|---|---|---|
| 10000 | 3500 | 35.00 | 2100 | 60.00 | 1680 | 80.00 |

Key Insight: COUNT(DISTINCT CASE WHEN ... THEN user_id END) is the standard conditional-aggregation pattern for funnel queries. The CASE expression sits inside the aggregate, so each stage counts unique users rather than total events.

What the Interviewer Is Testing: Conditional aggregation, CASE WHEN inside aggregate functions, and whether you compute step-to-step rates rather than only overall conversion.

Question 8: Cart Abandonment by Day of Week

A large marketplace company needs to know which days of the week have the highest cart abandonment rate. A cart is abandoned when a user has an add_to_cart event but no subsequent purchase event on the same day.

Sample Schema

```sql
CREATE TABLE user_events (
    event_id   INT PRIMARY KEY,
    user_id    INT,
    event_type VARCHAR(20),
    event_date DATE
);
```

```sql
WITH daily_carts AS (
    SELECT DISTINCT user_id, event_date
    FROM user_events
    WHERE event_type = 'add_to_cart'
),
daily_purchases AS (
    SELECT DISTINCT user_id, event_date
    FROM user_events
    WHERE event_type = 'purchase'
),
cart_status AS (
    SELECT
        c.event_date,
        c.user_id,
        CASE WHEN p.user_id IS NULL THEN 1 ELSE 0 END AS abandoned
    FROM daily_carts c
    LEFT JOIN daily_purchases p
        ON c.user_id = p.user_id
        AND c.event_date = p.event_date
)
SELECT
    TO_CHAR(event_date, 'Day') AS day_of_week,
    EXTRACT(DOW FROM event_date) AS day_num,
    COUNT(*) AS total_carts,
    SUM(abandoned) AS abandoned_carts,
    ROUND(100.0 * SUM(abandoned) / COUNT(*), 2) AS abandonment_rate_pct
FROM cart_status
GROUP BY TO_CHAR(event_date, 'Day'), EXTRACT(DOW FROM event_date)
ORDER BY day_num;
```

Expected Output

| day_of_week | day_num | total_carts | abandoned_carts | abandonment_rate_pct |
|---|---|---|---|---|
| Sunday | 0 | 1420 | 987 | 69.51 |
| Monday | 1 | 1850 | 1147 | 62.00 |
| Tuesday | 2 | 1780 | 1103 | 61.97 |

Key Insight: The LEFT JOIN between cart events and purchase events is the core of abandonment logic. Users who added to cart but never purchased will have NULL in the purchase side of the join.

What the Interviewer Is Testing: LEFT JOIN for set difference logic, date-level granularity decisions, and whether you can translate a business definition ("abandoned cart") into precise SQL.
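The NULL-on-the-right pattern can be sketched in isolation (hypothetical rows, SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carts (user_id INT, event_date TEXT)")
conn.execute("CREATE TABLE purchases (user_id INT, event_date TEXT)")
# Both users add to cart; only user 1 goes on to purchase that day.
conn.executemany("INSERT INTO carts VALUES (?, ?)",
                 [(1, "2024-05-01"), (2, "2024-05-01")])
conn.execute("INSERT INTO purchases VALUES (1, '2024-05-01')")
rows = conn.execute("""
    SELECT c.user_id,
           CASE WHEN p.user_id IS NULL THEN 1 ELSE 0 END AS abandoned
    FROM carts c
    LEFT JOIN purchases p
        ON c.user_id = p.user_id AND c.event_date = p.event_date
    ORDER BY c.user_id
""").fetchall()
print(rows)
# [(1, 0), (2, 1)] -- the unmatched LEFT JOIN row marks the abandoned cart.
```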

Question 9: Session-Based Funnel with Time Constraint

A major online retailer defines a valid conversion as a purchase that occurs within 30 minutes of the initial page view in the same session. Calculate the time-constrained conversion rate.

Sample Schema

```sql
CREATE TABLE session_events (
    event_id    INT PRIMARY KEY,
    session_id  VARCHAR(36),
    user_id     INT,
    event_type  VARCHAR(20),
    event_ts    TIMESTAMP
);
```

```sql
WITH session_starts AS (
    SELECT
        session_id,
        user_id,
        MIN(event_ts) AS session_start
    FROM session_events
    WHERE event_type = 'page_view'
    GROUP BY session_id, user_id
),
session_purchases AS (
    SELECT
        session_id,
        MIN(event_ts) AS purchase_ts
    FROM session_events
    WHERE event_type = 'purchase'
    GROUP BY session_id
)
SELECT
    COUNT(DISTINCT s.session_id) AS total_sessions,
    COUNT(DISTINCT CASE
        WHEN p.purchase_ts IS NOT NULL
         AND p.purchase_ts <= s.session_start + INTERVAL '30 minutes'
        THEN s.session_id
    END) AS converted_sessions,
    ROUND(100.0 *
        COUNT(DISTINCT CASE
            WHEN p.purchase_ts IS NOT NULL
             AND p.purchase_ts <= s.session_start + INTERVAL '30 minutes'
            THEN s.session_id
        END)
        / COUNT(DISTINCT s.session_id),
    2) AS conversion_rate_pct
FROM session_starts s
LEFT JOIN session_purchases p
    ON s.session_id = p.session_id;
```

Expected Output

| total_sessions | converted_sessions | conversion_rate_pct |
|---|---|---|
| 45000 | 2925 | 6.50 |

Key Insight: Time-constrained funnels are more realistic than simple event-presence funnels. Real purchase attribution almost always involves a time window, whether it is 30 minutes for sessions or 7 days for marketing attribution.

What the Interviewer Is Testing: Timestamp arithmetic, interval logic, and whether you apply the time constraint before computing the ratio rather than after.

Seller and Marketplace Metrics

Two-sided marketplace interviews add a second dimension: seller performance. These questions test your ability to work with data from both sides of the platform.

Question 10: Top Sellers by Fulfillment Speed

A large marketplace company asks you to rank sellers by their average order fulfillment time (days between order placement and shipment) and identify the top 10 fastest.

Sample Schema

```sql
CREATE TABLE marketplace_orders (
    order_id    INT PRIMARY KEY,
    seller_id   INT,
    buyer_id    INT,
    order_date  DATE,
    ship_date   DATE,
    order_total DECIMAL(10,2)
);
```

```sql
SELECT
    seller_id,
    COUNT(*) AS total_orders,
    ROUND(AVG(ship_date - order_date), 2) AS avg_fulfillment_days,
    ROUND(
        PERCENTILE_CONT(0.5) WITHIN GROUP (
            ORDER BY ship_date - order_date
        )::NUMERIC, 2
    ) AS median_fulfillment_days
FROM marketplace_orders
WHERE ship_date IS NOT NULL
GROUP BY seller_id
HAVING COUNT(*) >= 10
ORDER BY avg_fulfillment_days ASC
LIMIT 10;
```

Expected Output

| seller_id | total_orders | avg_fulfillment_days | median_fulfillment_days |
|---|---|---|---|
| 3021 | 487 | 1.24 | 1.00 |
| 3089 | 312 | 1.41 | 1.00 |
| 3045 | 156 | 1.58 | 2.00 |

Key Insight: The HAVING COUNT(*) >= 10 filter removes sellers with too few orders to produce a meaningful average. In real marketplace analytics, minimum order thresholds prevent low-volume sellers from appearing in top-performer lists due to statistical noise.

What the Interviewer Is Testing: Date arithmetic for fulfillment time, HAVING clause for minimum sample filtering, and the difference between mean and median for skewed distributions.

Question 11: Seller Rating Distribution

A marketplace platform wants to categorize sellers into performance tiers based on their average customer rating: "Excellent" (4.5 or above), "Good" (3.5 to 4.49), "Needs Improvement" (below 3.5).

Sample Schema

```sql
CREATE TABLE seller_reviews (
    review_id   INT PRIMARY KEY,
    seller_id   INT,
    buyer_id    INT,
    rating      DECIMAL(2,1),
    review_date DATE
);
```

```sql
WITH seller_avg AS (
    SELECT
        seller_id,
        COUNT(*) AS review_count,
        ROUND(AVG(rating), 2) AS avg_rating
    FROM seller_reviews
    GROUP BY seller_id
    HAVING COUNT(*) >= 5
)
SELECT
    CASE
        WHEN avg_rating >= 4.5 THEN 'Excellent'
        WHEN avg_rating >= 3.5 THEN 'Good'
        ELSE 'Needs Improvement'
    END AS tier,
    COUNT(*) AS seller_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_sellers
FROM seller_avg
GROUP BY
    CASE
        WHEN avg_rating >= 4.5 THEN 'Excellent'
        WHEN avg_rating >= 3.5 THEN 'Good'
        ELSE 'Needs Improvement'
    END
ORDER BY MIN(avg_rating) DESC;  -- Excellent, then Good, then Needs Improvement
```

Expected Output

| tier | seller_count | pct_of_sellers |
|---|---|---|
| Excellent | 1240 | 41.33 |
| Good | 1380 | 46.00 |
| Needs Improvement | 380 | 12.67 |

Key Insight: SUM(COUNT(*)) OVER () computes the grand total across all groups in a single pass, which is cleaner than using a subquery or CTE to get the denominator for percentage calculations.

What the Interviewer Is Testing: CASE WHEN for tiering, window functions mixed with GROUP BY, and ordering the tiers logically rather than alphabetically. Be careful here: PostgreSQL does not allow an output alias such as tier inside an ORDER BY expression (a bare ORDER BY tier works, a CASE on the alias does not), so sort by an aggregate like MIN(avg_rating) DESC instead.
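The grand-total window trick generalizes to any percent-of-total report. A minimal sketch (hypothetical tiers, SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sellers (tier TEXT)")
conn.executemany("INSERT INTO sellers VALUES (?)",
                 [("Excellent",), ("Excellent",),
                  ("Good",), ("Good",), ("Good",),
                  ("Needs Improvement",)])
rows = conn.execute("""
    SELECT tier,
           COUNT(*) AS n,
           ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct
    FROM sellers
    GROUP BY tier
""").fetchall()
pct = {tier: p for tier, n, p in rows}
print(pct)
# Window functions run after GROUP BY, so SUM(COUNT(*)) OVER () is the
# grand total across all groups: 2/6, 3/6, and 1/6 of sellers.
```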

Question 12: 7-Day Rolling Revenue per Seller

A marketplace platform tracks daily seller revenue and wants a 7-day rolling average to smooth out day-to-day fluctuations.

Sample Schema

```sql
CREATE TABLE daily_seller_revenue (
    seller_id INT,
    rev_date  DATE,
    revenue   DECIMAL(10,2)
);
```

```sql
SELECT
    seller_id,
    rev_date,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            PARTITION BY seller_id
            ORDER BY rev_date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ), 2
    ) AS rolling_7d_avg
FROM daily_seller_revenue
ORDER BY seller_id, rev_date;
```

Expected Output

| seller_id | rev_date | revenue | rolling_7d_avg |
|---|---|---|---|
| 3021 | 2024-10-01 | 1250.00 | 1250.00 |
| 3021 | 2024-10-02 | 980.00 | 1115.00 |
| 3021 | 2024-10-03 | 1340.00 | 1190.00 |
| 3021 | 2024-10-07 | 1120.00 | 1158.57 |

Key Insight: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW gives exactly 7 rows (current plus 6 prior). If your data has gaps (missing dates), a 7-row window silently spans more than 7 calendar days; either use RANGE with an interval offset (supported in PostgreSQL 11+) or fill the gaps with a date spine before applying the window.

Common Mistake: Writing ROWS BETWEEN 7 PRECEDING AND CURRENT ROW gives an 8-row window. This off-by-one error is one of the most frequent mistakes in rolling window questions.
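The off-by-one is easy to check by counting the rows each frame actually covers (hypothetical daily data, SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rev (d INT, amount REAL)")
# Ten consecutive days of revenue.
conn.executemany("INSERT INTO rev VALUES (?, ?)",
                 [(i, 100.0) for i in range(1, 11)])
rows = conn.execute("""
    SELECT d,
           COUNT(*) OVER (ORDER BY d
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS win_6p,
           COUNT(*) OVER (ORDER BY d
                          ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS win_7p
    FROM rev
    ORDER BY d
""").fetchall()
print(rows[-1])
# (10, 7, 8) -- "6 PRECEDING" covers 7 rows; "7 PRECEDING" covers 8.
```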

Time-Based Business Intelligence

Time-series questions are where e-commerce interviews differentiate strong candidates from average ones. These patterns appear in almost every data science take-home and live coding round.

Question 13: Month-over-Month Revenue Growth

A major e-commerce platform needs a report showing each month's total revenue alongside the percentage change from the prior month.

Sample Schema

```sql
CREATE TABLE orders (
    order_id     INT PRIMARY KEY,
    order_date   DATE,
    total_amount DECIMAL(10,2)
);
```

```sql
WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', order_date) AS revenue_month,
        SUM(total_amount) AS monthly_total
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
    revenue_month,
    monthly_total,
    LAG(monthly_total) OVER (ORDER BY revenue_month) AS prev_month_total,
    ROUND(
        100.0 * (monthly_total - LAG(monthly_total) OVER (ORDER BY revenue_month))
        / NULLIF(LAG(monthly_total) OVER (ORDER BY revenue_month), 0),
        2
    ) AS mom_growth_pct
FROM monthly_revenue
ORDER BY revenue_month;
```

Expected Output

| revenue_month | monthly_total | prev_month_total | mom_growth_pct |
|---|---|---|---|
| 2024-01-01 | 1240000.00 | NULL | NULL |
| 2024-02-01 | 1185000.00 | 1240000.00 | -4.44 |
| 2024-03-01 | 1367000.00 | 1185000.00 | 15.36 |

Key Insight: LAG() with no offset parameter defaults to 1, which gives you the immediately preceding row. For year-over-year comparisons, you would use LAG(monthly_total, 12) to reach back 12 months.

What the Interviewer Is Testing: LAG() window function, DATE_TRUNC for period aggregation, and handling the NULL first row gracefully rather than showing 0% or an error.
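The offset argument is the only thing that changes between MoM and YoY comparisons. A sketch with 14 hypothetical months (SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m (month_num INT, total REAL)")
# Fourteen months of steadily growing totals.
conn.executemany("INSERT INTO m VALUES (?, ?)",
                 [(i, i * 100.0) for i in range(1, 15)])
rows = conn.execute("""
    SELECT month_num,
           LAG(total)     OVER (ORDER BY month_num) AS prev_month,  -- offset defaults to 1
           LAG(total, 12) OVER (ORDER BY month_num) AS year_ago     -- explicit offset 12
    FROM m
    ORDER BY month_num
""").fetchall()
print(rows[13])
# (14, 1300.0, 200.0) -- month 14 reaches back to months 13 and 2.
```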

Question 14: Year-over-Year Product Spend Growth

A large e-commerce platform asks you to calculate the year-over-year growth rate for total spending on each product.

Sample Schema

```sql
CREATE TABLE user_transactions (
    transaction_id   INT PRIMARY KEY,
    product_id       INT,
    spend            DECIMAL(10,2),
    transaction_date DATE
);
```

```sql
WITH yearly_spend AS (
    SELECT
        EXTRACT(YEAR FROM transaction_date) AS txn_year,
        product_id,
        SUM(spend) AS annual_spend
    FROM user_transactions
    GROUP BY EXTRACT(YEAR FROM transaction_date), product_id
)
SELECT
    txn_year,
    product_id,
    annual_spend AS curr_year_spend,
    LAG(annual_spend) OVER (
        PARTITION BY product_id ORDER BY txn_year
    ) AS prev_year_spend,
    ROUND(
        100.0 * (annual_spend - LAG(annual_spend) OVER (
            PARTITION BY product_id ORDER BY txn_year
        ))
        / NULLIF(LAG(annual_spend) OVER (
            PARTITION BY product_id ORDER BY txn_year
        ), 0),
        2
    ) AS yoy_growth_pct
FROM yearly_spend
ORDER BY product_id, txn_year;
```

Expected Output

| txn_year | product_id | curr_year_spend | prev_year_spend | yoy_growth_pct |
|---|---|---|---|---|
| 2022 | 1001 | 15200.00 | NULL | NULL |
| 2023 | 1001 | 18400.00 | 15200.00 | 21.05 |
| 2024 | 1001 | 16100.00 | 18400.00 | -12.50 |

Key Insight: This question is one of the most commonly reported from interviews at e-commerce companies, per DataLemur. The pattern is identical to month-over-month except that LAG() is partitioned by product and ordered by year. Once you understand the MoM pattern, YoY is a direct adaptation.

What the Interviewer Is Testing: LAG() with PARTITION BY, EXTRACT(YEAR FROM ...) for year-level aggregation, and safe division with NULLIF.

Question 15: First-Purchase Cohort Retention

A major online retailer asks you to build a cohort table showing what percentage of customers who made their first purchase in a given month returned to purchase in each of the following three months.

Sample Schema

```sql
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    order_date  DATE
);
```

```sql
WITH first_purchase AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM orders
    GROUP BY customer_id
),
subsequent_orders AS (
    SELECT
        fp.cohort_month,
        fp.customer_id,
        DATE_TRUNC('month', o.order_date) AS order_month,
        (EXTRACT(YEAR FROM DATE_TRUNC('month', o.order_date))
         - EXTRACT(YEAR FROM fp.cohort_month)) * 12
        + (EXTRACT(MONTH FROM DATE_TRUNC('month', o.order_date))
         - EXTRACT(MONTH FROM fp.cohort_month)) AS months_since_first
    FROM first_purchase fp
    JOIN orders o ON fp.customer_id = o.customer_id
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(DISTINCT customer_id) AS cohort_size
    FROM first_purchase
    GROUP BY cohort_month
)
SELECT
    cs.cohort_month,
    cs.cohort_size,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN so.months_since_first = 1
        THEN so.customer_id END) / cs.cohort_size, 2) AS month_1_retention,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN so.months_since_first = 2
        THEN so.customer_id END) / cs.cohort_size, 2) AS month_2_retention,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN so.months_since_first = 3
        THEN so.customer_id END) / cs.cohort_size, 2) AS month_3_retention
FROM cohort_sizes cs
LEFT JOIN subsequent_orders so
    ON cs.cohort_month = so.cohort_month
    AND so.months_since_first BETWEEN 1 AND 3
GROUP BY cs.cohort_month, cs.cohort_size
ORDER BY cs.cohort_month;
```

Expected Output

| cohort_month | cohort_size | month_1_retention | month_2_retention | month_3_retention |
|---|---|---|---|---|
| 2024-01-01 | 2340 | 28.42 | 19.15 | 15.60 |
| 2024-02-01 | 2180 | 31.10 | 21.33 | 16.88 |
| 2024-03-01 | 2510 | 26.77 | 18.45 | NULL |

Key Insight: Cohort retention is among the most frequently reported advanced SQL questions at e-commerce companies. The structure is always the same: identify each customer's first event, calculate the time delta to subsequent events, then pivot into a retention grid. Practice this pattern until you can write it from memory.

What the Interviewer Is Testing: CTEs for multi-step logic, date arithmetic for month deltas, conditional aggregation with COUNT(DISTINCT CASE WHEN ...), and whether you handle incomplete cohorts correctly. Note that the query as written returns 0.00 for a month that has not yet elapsed; producing the NULL shown in the sample output for recent cohorts requires an additional check against the current date, and pointing this out is a strong signal in an interview.

Conclusion

The 15 questions in this guide cover the patterns that appear most frequently in SQL interviews at companies with e-commerce and marketplace business models. Three themes dominate: window functions (RANK, LAG, SUM OVER), conditional aggregation (CASE WHEN inside COUNT and SUM), and multi-step CTE pipelines that break complex business logic into readable stages.

The questions that separate candidates are not the hardest technically. They are the ones that require translating a business concept into precise SQL. "Cart abandonment" becomes a LEFT JOIN where the purchase side is NULL. "Cohort retention" becomes a self-join with month-delta arithmetic. If you can make that translation quickly, you will handle whatever variation the interviewer creates.

Practice each question by writing the SQL from scratch before looking at the solution. Then modify it: change "top 2 per category" to "top 3 per region," swap month-over-month for quarter-over-quarter, add a filter for minimum order count. These small modifications build the fluency that lets you adapt in a live interview.

For more SQL and data science interview preparation, explore the practice problems and career guides at letsdatascience.com.

Career Q&A

How much of a typical data science interview at an e-commerce company focuses on SQL? At most large technology companies with e-commerce operations, SQL accounts for at least one full interview round and often appears in the take-home assignment as well. StrataScratch reports that over 70% of questions at these companies involve JOINs, CTEs, or window functions. Expect SQL to be a gating criterion rather than a bonus skill.

Should I practice on a specific SQL dialect for marketplace interviews? Most interview platforms and live coding environments use PostgreSQL syntax. If the company's job posting mentions Redshift, BigQuery, or Spark SQL, practice with that dialect's date functions and string handling, which differ across platforms. For standard window functions and aggregations, the syntax is nearly identical across all major dialects.

What is the most common SQL mistake that costs candidates an offer? Writing queries that produce incorrect results due to grain confusion. When an orders table has one row per line item rather than one row per order, a simple COUNT(*) overstates the order count. Always check the grain of each table before writing your GROUP BY. Ask the interviewer explicitly if the schema is ambiguous.

How should I structure my approach when given a SQL question I have not seen before? Start by restating the question in your own words and confirming the expected output with the interviewer. Then identify which tables you need and at what grain. Write the query in stages, using CTEs to build intermediate results you can verify. Name your CTEs descriptively. This structured approach shows clear thinking even if your final query needs debugging.

Do e-commerce SQL interviews ever include database design questions? Yes, particularly at companies building new data platforms. You may be asked to design a schema for tracking orders, returns, and inventory, or to normalize a denormalized reporting table. Understanding star schema design and the tradeoffs between normalization and query performance is worth reviewing before your interview.

Is it worth memorizing specific query patterns, or should I focus on understanding fundamentals? Both. The cohort retention pattern, the aggregate-rank-filter pattern, and the LAG-based period-over-period pattern appear so frequently that writing them from memory saves you time and mental energy during the interview. But the fundamentals (joins, group by, window frames) are what let you adapt those patterns to new variations.

How do marketplace SQL interviews differ from standard e-commerce SQL interviews? Marketplace interviews add a second entity: the seller. You will see questions about seller performance tiers, fulfillment speed rankings, and cross-side metrics like "what percentage of buyers purchase from more than one seller." The SQL is similar, but you need to keep track of which side of the marketplace each metric belongs to.

What should I do if I get stuck during a live SQL coding round? Talk through your thought process. Explain what you are trying to compute, what intermediate result you need, and where you are stuck. Interviewers routinely give hints to candidates who communicate well but may not help candidates who go silent. Partial credit is real: a correct CTE with a flawed final SELECT is better than a blank screen.
