#475

Professional Network Connection Analytics

HARDData Warehouse Designdatamodelling

Problem Description

## Professional Network Connection Analytics ### Context A large professional social platform hosts nearly a billion members who form mutual connections with one another. The analytics team needs to measure network growth, understand how connections form (via search, recommendations, or imports), and assess the quality and depth of each member's professional network. Connections between members are inherently mutual — when two members connect, both gain access to each other's updates and extended network. The data platform must support accurate network-wide aggregation, cohort analysis, and per-member reach metrics across several years of connection history. ### Requirements - Report the total number of unique connections in the network, globally and segmented by industry, seniority level, geography, and connection source (e.g., search, suggestion, imported contact) - Track daily, weekly, and monthly new connection volume and net connection growth (accounting for removals) - Identify members with the highest connection counts and analyze the distribution of network size across member segments - Enable accurate deduplication so that each mutual connection is counted exactly once when computing network-wide totals or per-member degree counts - Support cohort analysis: for members who joined in a given month, how does their network size grow over the following 6, 12, and 24 months? - Measure connection source effectiveness: what fraction of connections originated from platform-generated suggestions versus member-initiated search? ### Constraints - The platform has approximately 950 million active members - The connection graph contains roughly 30 billion unique mutual connections - Approximately 50 million new connections are formed each day - Connection history must be retained for a minimum of 3 years - Query latency for aggregated network metrics must support interactive BI dashboards (sub-30-second response on pre-aggregated tables) - Member profile attributes (industry, seniority, location) change over time and must be snapshotted to support point-in-time analysis ### Follow-Up Questions 1. A new "follow" feature lets members follow other members without establishing a mutual connection — the relationship is directed and one-way. How would you modify the model to support both mutual connections and one-way follows, and how would you report on each type separately? 2. Members can now disconnect and later reconnect with each other. How do you preserve the full history of connection state changes, including when a connection was removed and subsequently re-established? 3. The business wants to analyze second-degree network reach — for any given member, how many unique people can they reach through their direct connections' connections? Describe what analytical structure (pre-aggregation layer or denormalized table) you would add to make this queryable efficiently at scale for hundreds of millions of members. ### Notes - All timestamps are stored in UTC - `connection_source` may be null for connections migrated from legacy systems prior to source tracking - For members with no connections, they appear in `dim_member` but have no rows in the connection bridge - Member attribute snapshots in `dim_member` are taken daily; use the snapshot date closest to (but not exceeding) the analysis date for point-in-time joins

Topics

cardinalityslowly changing data

Asked at Companies

linkedin

Solve This Problem

Sign up to access the interactive code editor, run test cases, view the editorial, and get AI-powered feedback on your solution.

Start Solving →