#464

E-Commerce Marketplace Order Analytics

HARDData Warehouse Designdatamodelling

Problem Description

## E-Commerce Marketplace Order Analytics ### Context A large e-commerce marketplace platform connects millions of buyers with thousands of third-party sellers offering products across hundreds of categories. The business wants to understand buyer purchase behavior, seller performance, and supply chain health through a centralized analytics platform. Analysts need to slice revenue, order volume, and fulfillment metrics across time, category, geography, and seller. The platform also processes a significant volume of returns and refunds daily, which must be accurately reflected in all revenue reporting. ### Requirements - Track purchase behavior including order volume, revenue, average order value, and repeat purchase rate, segmentable by buyer geography, product category, and time. - Assess seller performance including units sold, revenue generated, cancellation rate, and late shipment rate per seller per period. - Monitor order fulfillment health: time-to-ship, carrier performance, on-time delivery rate, and estimated vs. actual delivery window. - Measure return and refund volumes and amounts by product category, seller, and geography — revenue figures must remain accurate even when orders are only partially returned. - Support historical accuracy: dimension attributes such as product price, seller status, and buyer address may change over time and historical orders must reflect the state at the time of purchase. ### Constraints - ~300 million active buyers and ~2 million active sellers on the platform. - ~15 million orders placed per day; each order contains between 1 and 50 line items on average. - Order and line-item fact data retained at event granularity for 3 years; older data summarized at weekly grain. - Dimension snapshots refreshed daily; partition by date. - Return and refund events must be modeled at the line-item level to avoid partial-return double-counting. ### Follow-Up Questions 1. An order contains multiple line items, each representing a distinct product, quantity, price, and fulfillment status. How does your model support per-item analytics such as revenue by product, units returned per SKU, and fulfillment status breakdown? 2. A product can belong to multiple levels of a category hierarchy (e.g., Electronics > Phones > Smartphones). How does your model allow analysts to filter and aggregate metrics at any level of that hierarchy? 3. Multiple sellers can list the same product at different prices, and the winning seller for each line item is selected at checkout. How would you track seller-level competitive pricing history and attribute performance metrics to the correct seller at the time of sale? ### Notes - All timestamps are stored in UTC. - An order transitions through statuses: `placed → confirmed → shipped → delivered → (optionally) return_requested → returned`. - Refund amounts may differ from the original item price (e.g., partial refunds, restocking fees). - Candidates should assume seller attributes include category specialization, fulfillment type (self-fulfilled vs. platform-fulfilled), and tier rating.

Topics

dimensional modelingslowly changing data

Asked at Companies

amazon

Solve This Problem

Sign up to access the interactive code editor, run test cases, view the editorial, and get AI-powered feedback on your solution.

Start Solving →