#226
Deduplicate Page Impressions
MEDIUMAggregationPOSTGRESQL
Problem Description
You are building a **page impression deduplication pipeline** for a web analytics platform. The impression logging system has a known bug that emits duplicate events — multiple rows can share the same (`visitor_id`, `page_name`, `referrer_page_name`, `ts`) combination.
Your task is to return a de-duplicated result set from the `page_impression` table.
**Key Rules:**
- A duplicate is defined as rows sharing the same `visitor_id`, `page_name`, `referrer_page_name`, and `ts`
- `referrer_page_name` can be `NULL` — treat `NULL` values as equal when grouping
- For each unique combination, keep the row with the **smallest `id`** value
**Output:** Return `visitor_id`, `page_name`, `referrer_page_name`, `ts`, and `id` (the surviving row's id) for all unique impressions. Order by `id` ascending.
Topics
aggregation
Asked at Companies
airbnb
Solve This Problem
Sign up to access the interactive code editor, run test cases, view the editorial, and get AI-powered feedback on your solution.
Start Solving →