#226

Deduplicate Page Impressions

MEDIUMAggregationPOSTGRESQL

Problem Description

You are building a **page impression deduplication pipeline** for a web analytics platform. The impression logging system has a known bug that emits duplicate events — multiple rows can share the same (`visitor_id`, `page_name`, `referrer_page_name`, `ts`) combination. Your task is to return a de-duplicated result set from the `page_impression` table. **Key Rules:** - A duplicate is defined as rows sharing the same `visitor_id`, `page_name`, `referrer_page_name`, and `ts` - `referrer_page_name` can be `NULL` — treat `NULL` values as equal when grouping - For each unique combination, keep the row with the **smallest `id`** value **Output:** Return `visitor_id`, `page_name`, `referrer_page_name`, `ts`, and `id` (the surviving row's id) for all unique impressions. Order by `id` ascending.

Topics

aggregation

Asked at Companies

airbnb

Solve This Problem

Sign up to access the interactive code editor, run test cases, view the editorial, and get AI-powered feedback on your solution.

Start Solving →