This figure shows our data collection pipeline and the network topology. Shared graduate-student apartments have one Ethernet port per bedroom, while other apartments have one Ethernet port per apartment. The ports connect to a switch in each residential building, which connects to an aggregation switch and then to the Internet via the campus network and a few upstream providers. The aggregation switch mirrors traffic in both directions (to and from our residential buildings) over dedicated 2×10 Gbps fiber to a server in our nearby lab. Because Columbia has not deployed IPv6 in these buildings, we study only IPv4 traffic.
Our data collection and anonymization pipeline followed established practices and was approved by Columbia’s IT. Our Institutional Review Board (IRB) formally reviewed it and declared it exempt, as it is not human-subjects research. The pipeline anonymizes privacy-sensitive fields and discards personally identifiable information. We do not identify any individual, and we do not study network usage below the building level.
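The text does not specify how fields are anonymized. Purely as an illustration of enforcing building-level granularity, the sketch below drops a set of fields assumed to be personally identifiable and replaces the local address with a coarse building label. The subnet-to-building map (RFC 5737 documentation ranges, not real campus addresses), the field names, and both helper functions are hypothetical.

```python
import ipaddress

# Hypothetical building subnets, using RFC 5737 documentation ranges
# rather than any real campus addresses.
BUILDING_SUBNETS = {
    "building_A": ipaddress.ip_network("192.0.2.0/25"),
    "building_B": ipaddress.ip_network("192.0.2.128/25"),
}

# Hypothetical set of fields treated as personally identifiable.
PII_FIELDS = {"local_ip", "mac_address", "http_user_agent"}


def building_of(local_ip: str) -> str:
    """Map a residential IPv4 address to a coarse building label."""
    addr = ipaddress.ip_address(local_ip)
    for name, subnet in BUILDING_SUBNETS.items():
        if addr in subnet:
            return name
    return "unknown"


def anonymize_flow(flow: dict) -> dict:
    """Return a copy of a flow record with PII fields dropped and the
    local endpoint coarsened to building granularity."""
    out = {k: v for k, v in flow.items() if k not in PII_FIELDS}
    out["building"] = building_of(flow["local_ip"])
    return out


record = {"local_ip": "192.0.2.10", "mac_address": "aa:bb:cc:dd:ee:ff",
          "remote_ip": "203.0.113.5", "bytes": 1200}
print(anonymize_flow(record))
# {'remote_ip': '203.0.113.5', 'bytes': 1200, 'building': 'building_A'}
```

Coarsening at ingestion time, rather than filtering later, means per-device records never exist downstream; whether the real pipeline works this way is not stated.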
We associate each flow with a service (e.g., Netflix, YouTube) using a combination of domain keyword matching, unsupervised clustering, and transport-layer heuristics.
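The three techniques act as successive fallbacks: keyword matching first, then clustering for domain pairs the keywords miss, then transport-layer rules for flows with no DNS or SNI data. A minimal sketch of that ordering follows. The stage functions are toy stand-ins (one keyword, one hypothetical cluster label, one port rule) rather than the full rule sets, and the flow-record field names are assumptions.

```python
from typing import Callable, Optional, Sequence

# A stage maps a flow record to a service name, or None if it cannot decide.
Stage = Callable[[dict], Optional[str]]

# Hypothetical output of the clustering step: (dns, sni) pair -> service.
CLUSTER_LABELS = {("media.example.net", "media.example.net"): "exampleservice"}


def keyword_stage(flow: dict) -> Optional[str]:
    """Toy keyword stage using a single keyword from the table."""
    domains = " ".join(d for d in (flow.get("dns"), flow.get("sni")) if d)
    return "netflix" if "nflx" in domains else None


def cluster_stage(flow: dict) -> Optional[str]:
    """Look up a label inherited from a <DNS, SNI> cluster."""
    return CLUSTER_LABELS.get((flow.get("dns"), flow.get("sni")))


def transport_stage(flow: dict) -> Optional[str]:
    """Toy transport-layer stage, applied only to flows without domain data."""
    if flow.get("dns") or flow.get("sni"):
        return None
    port = flow.get("port") or 0
    return "bittorrent" if 6881 <= port <= 6889 else None


def label_flow(flow: dict, stages: Sequence[Stage]) -> str:
    """Apply stages in order; the first stage that returns a label wins."""
    for stage in stages:
        service = stage(flow)
        if service is not None:
            return service
    return "unlabeled"


STAGES = [keyword_stage, cluster_stage, transport_stage]
```

For example, `label_flow({"dns": None, "sni": None, "port": 6881}, STAGES)` falls through the first two stages and is labeled by the port rule.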
We match ⟨DNS, SNI⟩ domain pairs against a curated list of ~200 service-related keywords (e.g., domains containing "nflx" are mapped to Netflix). This rule-based mapping, built upon the public nDPI keyword set, accounts for 73% of traffic by volume. The list of keywords is provided below:
| Keyword | Service |
|---|---|
| nflx | netflix |
| hbomax | hbomax |
| apple-dns | icloud |
| icloud | icloud |
| tv.apple | appletv |
| itunes | applestore |
| aapl | icloud |
| blizzard | blizzard |
| hulu | hulu |
| steam | steam |
| outlook | microsoftcloud |
| twimg | |
| googlevideo | youtube |
| tiktok | tiktok |
| cdn-apple | icloud |
| espn | espn |
| movetv | movetv |
| redd.it | |
| spotify | spotify |
| zoom | zoom |
| slack | slack |
| peacock | peacock |
| gmail | gmail |
| ytimg | youtube |
| fitbit | fitbit |
| stripe | stripe |
| bestbuy | bestbuy |
| roblox | roblox |
| youtube | youtube |
| ookla | ookla |
| sling.com | sling |
| cdn-apple | applecdn |
| apple | icloud |
| fbcdn | |
| facetime.apple | facetime |
| messenger.com | facebookmessenger |
| ttvnw | twitch |
| photosdata-pa.googleapis | googlephotos |
| uploadgig | uploadgig |
| idrive | idrive |
| wireguard | wireguardvpn |
| megaphone | spotify |
| cbsivideo | cbsvideo |
| pbs.org | pbs |
| dropbox | dropbox |
| hbo | hbomax |
| roku | roku |
| warner | warner |
| spectrum | spectrum |
| xbox | xbox |
| cbsaavideo | cbsvideo |
| msggo | msggo |
| nvidiagrid | nvidiagrid |
| oneclient | microsoftcloud |
| skype | msteams |
| plutotv | plutotv |
| pluto.tv | plutotv |
| taobao | taobao |
| shopify | shopify |
| github | github |
| grammarly | grammarly |
| nyt | newyorktimes |
| tidal | tidal |
| twitchcdn | twitch |
| dssott | disneyplus |
| microsoft | microsoft |
| teams.microsoft | msteams |
| adobe | adobe |
| bilivideo | bilivideo |
| line-scdn | line |
| scdn | spotify |
| telegram | telegram |
| qooqlevideo | qooqlevideo |
| wattpad | wattpad |
| riotcdn | riotcdn |
| cbsnews | cbsnews |
| pandora | pandora |
| siriusxm | pandora |
| torproject.org | tor |
| echo | amazonecho |
| office | microsoftcloud |
| stitcher | stitcher |
| fbsbx | |
| redgifs | |
| playstation | playstation |
| wikimedia | wikipedia |
| metmuseum | metmuseum |
| courseworks | courseworks |
| wordpress | wordpress |
| discord | discord |
| zillow | zillow |
| windows | microsoft |
| onlyfans | onlyfans |
| tumblr | tumblr |
| xvideos | xvideos |
| llnwd | limelight |
| xhcdn | xhcdn |
| igcdn | |
| phncdn | phcdn |
| bumble.com | bumble |
| edgesuite | microsoftcloud |
| tinder | tinder |
| pv-cdn | primevideo |
| gaijin | gaijin |
| wetransfer | wetransfer |
| eporner | eporner |
| wsj | wallstreetjournal |
| mushroomtrack | mushroomtrack |
| comcast | comcast |
| epicgames | epicgames |
| pornez | pornez |
| squarespace | squarespace |
| paramountplus | paramountplus |
| mmcdn | mmcdn |
| gopuff | gopuff |
| photos.google | googlephotos |
| messages.google | googlemessages |
| video.google | youtube |
| groups.google | googlegroups |
| play.google | googleplay |
| drive.google | googledrive |
| calendar.google | googlecalendar |
| spreadsheets.google | googledrive |
| chat.google | googlemessages |
| googleusercontent | googleusercontent |
| webex | webex |
| cisco | cisco |
| foxitsoftware | foxitsoftware |
| campusgroups | campusgroups |
| conda | conda |
| columbia | columbia |
| eerospeedtests | eerospeedtests |
| dailymotion | dailymotion |
| samsung | samsung |
| notion | notion |
| sndcdn | soundcloud |
| soundcloud | soundcloud |
| groupme | groupme |
| mail.google | gmail |
| docs.google | gdocs |
| nintendo | nintendo |
| ubisoft | ubisoft |
| showtime | showtime |
| crunchyroll | crunchyroll |
| baddiehub | baddiehub |
| dssedge | disneyplus |
| sc-cdn | snapchat |
| onedrive | microsoftcloud |
| live-video | twitch |
| aiv-cdn | primevideo |
| olemovienews | olemovienews |
| afreecatv | afreecatv |
| cdn-videos.lpsg | lpsg |
| v.vrv | crunchyroll |
| fubo | fubo |
| bittorrent | bittorrent |
| publicbt | bittorrent |
| android.googleapis | playstore |
| torrents | bittorrent |
| watchliveformula1 | watchliveformula1 |
| storage.live | microsoftcloud |
| ott-video-cf.formula1 | ott-video-cf.formula1 |
| kakao | kakaotalk |
| cwtv | cartoonnetwork |
| vimeo | vimeo |
| disney | disneyplus |
| licdn | |
| kanopy | kanopy |
| bdsmlr | bdsmlr |
| sharepoint | microsoftcloud |
| tubi | tubi |
| n.shifen | baidu |
| primevideo | primevideo |
| amazonvideo | primevideo |
| video.a2z | primevideo |
| clients.google.com | playstore |
| clients6.google.com | playstore |
| snapchat | snapchat |
| inbox.google.com | gmail |
| meet.google | googlemeet |
| netflix | netflix |
| hoyoverse | hoyoverse |
| hinge | hinge |
| theleague | theleague |
| porntn | porntn |
| xnxx-cdn | xnxx-cdn |
| patreon | patreon |
| mcafee | mcafee |
| cloudfront | primevideo |
| max.com | hbomax |
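The text maps domains that contain a keyword to its service, i.e., substring matching. How overlapping keywords are resolved is not stated. The sketch below assumes longest-match-wins, so a specific keyword such as `facetime.apple` takes precedence over the generic `apple`, and `line-scdn` over `scdn`. The keyword list is a small subset of the table, and `match_keyword` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple

# Small subset of the keyword table: (keyword, service).
KEYWORDS = [
    ("nflx", "netflix"),
    ("icloud", "icloud"),
    ("tv.apple", "appletv"),
    ("apple", "icloud"),
    ("facetime.apple", "facetime"),
    ("googlevideo", "youtube"),
    ("scdn", "spotify"),
    ("line-scdn", "line"),
    ("mail.google", "gmail"),
]


def match_keyword(dns: Optional[str], sni: Optional[str],
                  keywords: Sequence[Tuple[str, str]] = KEYWORDS) -> Optional[str]:
    """Return the service of the longest keyword contained in either
    domain of the <DNS, SNI> pair, or None if nothing matches."""
    domains = [d.lower() for d in (dns, sni) if d]
    best: Optional[Tuple[str, str]] = None
    for keyword, service in keywords:
        if any(keyword in d for d in domains):
            # Assumed tie-break: a longer (more specific) keyword wins;
            # among equal lengths, the earlier table entry is kept.
            if best is None or len(keyword) > len(best[0]):
                best = (keyword, service)
    return best[1] if best else None


print(match_keyword(None, "facetime.apple.com"))  # facetime, not icloud
print(match_keyword("obs.line-scdn.net", None))   # line, not spotify
```

A first-match rule in table order would not work here, because `apple` appears above `facetime.apple` and would always win.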
For domain pairs not covered by keywords, we use an unsupervised approach that clusters related ⟨DNS, SNI⟩ pairs by temporal correlation, i.e., how often they appear close together in time. We apply the Louvain community-detection algorithm and assign each cluster to the service that dominates its traffic: if a single service accounts for ≥ 60% of a cluster's traffic, we label the entire cluster with that service. This adds another 6.4% of traffic to our service mapping.
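A minimal sketch of the two steps around the clustering, under stated assumptions. First, temporal correlation is approximated by counting how often two ⟨DNS, SNI⟩ pairs occur within a fixed window of each other. The 2-second window is a hypothetical parameter because the text gives none. Second, each cluster is labeled with its dominant service if that service carries at least 60% of the cluster's bytes. Pairs already labeled by keywords supply the per-service volumes, and unlabeled pairs count toward the total; that choice of denominator is our assumption. The Louvain step itself is not reimplemented here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # hypothetical (dns, sni) key


def cooccurrence_weights(events: List[Tuple[float, Pair]],
                         window: float = 2.0) -> Counter:
    """Count how often two distinct pairs appear within `window` seconds.

    Returns a Counter keyed by frozenset({pair_a, pair_b}) that can serve
    as edge weights for community detection.
    """
    events = sorted(events, key=lambda e: e[0])  # order by timestamp only
    weights: Counter = Counter()
    start = 0
    for i, (t, pair) in enumerate(events):
        # Slide the left edge so events[start:i] lie within the window.
        while events[start][0] < t - window:
            start += 1
        for _, other in events[start:i]:
            if other != pair:
                weights[frozenset((pair, other))] += 1
    return weights


def label_clusters(clusters: Iterable[Set[Pair]],
                   pair_bytes: Dict[Pair, int],
                   known_service: Dict[Pair, str],
                   threshold: float = 0.6) -> Dict[Pair, str]:
    """Propagate a dominant service to the unlabeled pairs of each cluster."""
    new_labels: Dict[Pair, str] = {}
    for cluster in clusters:
        total = sum(pair_bytes.get(p, 0) for p in cluster)
        per_service: Counter = Counter()
        for p in cluster:
            if p in known_service:
                per_service[known_service[p]] += pair_bytes.get(p, 0)
        if total == 0 or not per_service:
            continue  # nothing to propagate from
        service, volume = per_service.most_common(1)[0]
        if volume / total >= threshold:
            for p in cluster:
                if p not in known_service:
                    new_labels[p] = service
    return new_labels
```

To obtain `clusters`, the weights could be loaded into a networkx `Graph` (one `add_edge(a, b, weight=w)` per entry) and partitioned with `networkx.community.louvain_communities(G, weight="weight", seed=0)`, which returns a list of node sets.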
For flows with no DNS or SNI data, we apply manual rules based on known transport-layer signatures (e.g., destination AS + port + protocol). We classify traffic from Apple's ASN (714) on ports 16393–16402 or 3478–3497 as FaceTime. Port 51820 identifies WireGuard VPN, and port 3480 traffic from Microsoft's ASN (8075) indicates Microsoft Teams. We detect Google Meet via port 3478 traffic from Google's ASN (15169), Twitch by its ASN (46489), and Facebook Messenger by ASN 32934 combined with port 3478. We also label traffic from Ubisoft (ASN 49544) and PlayStation (ASN 33353). Finally, we infer BitTorrent traffic from activity on ports 6881–6889.
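The transport-layer rules can be written as an ordered rule table. In this sketch, a rule matches when both its ASN (if any) and its inclusive port range (if any) match the flow's remote endpoint. Protocol is ignored because the text does not give one per rule. Evaluating ASN-plus-port rules before port-only and ASN-only rules is our assumption about precedence, as is reading the FaceTime rule as requiring ASN 714 for both port ranges.

```python
from typing import List, Optional, Tuple

# (service, required ASN or None, inclusive (low, high) port range or None).
# Order encodes an assumed precedence: ASN+port, then port-only, then ASN-only.
RULES: List[Tuple[str, Optional[int], Optional[Tuple[int, int]]]] = [
    ("facetime", 714, (16393, 16402)),
    ("facetime", 714, (3478, 3497)),
    ("msteams", 8075, (3480, 3480)),
    ("googlemeet", 15169, (3478, 3478)),
    ("facebookmessenger", 32934, (3478, 3478)),
    ("wireguardvpn", None, (51820, 51820)),
    ("bittorrent", None, (6881, 6889)),
    ("twitch", 46489, None),
    ("ubisoft", 49544, None),
    ("playstation", 33353, None),
]


def classify_transport(asn: int, port: int) -> Optional[str]:
    """Return the first service whose rule matches (asn, port), else None."""
    for service, rule_asn, ports in RULES:
        if rule_asn is not None and rule_asn != asn:
            continue  # ASN constraint not met
        if ports is not None and not (ports[0] <= port <= ports[1]):
            continue  # port outside the rule's range
        return service
    return None


print(classify_transport(714, 3480))   # facetime
print(classify_transport(8075, 3480))  # msteams
print(classify_transport(15169, 443))  # None: no rule for Google on 443
```

Keeping the rules as data rather than nested conditionals makes the precedence explicit and easy to audit: reordering the list is the only way to change which rule wins.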