Our pipeline (upgraded in May 2025) is built around an Intel Tofino programmable switch running ONTAS, a scalable online network traffic anonymization system [Kim et al., SIGCOMM 2019 Workshop on Network Meets AI & ML]. ONTAS anonymizes IP and MAC addresses and removes packet payloads on a per-protocol basis, while preserving key information such as DNS A records and the TLS Server Name Indication (SNI).
| Frame Time | Protocol | Source IP | Destination IP | Source MAC Organization Unique Identifier | Destination MAC Organization Unique Identifier | DNS A | DNS CNAME | DNS Response TTL | DNS Query Name | DNS Query Type | DNS Flag Response |
|---|---|---|---|---|---|---|---|---|---|---|---|
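
To give a feel for what address anonymization of this kind can look like, here is a minimal Python sketch. It is not the ONTAS implementation, which runs in the switch data plane and whose exact scheme is not described here. The sketch assumes a keyed hash (HMAC-SHA256) as the pseudonymization function, and it assumes that the vendor prefix (OUI) of a MAC address is kept in the clear while the device-specific part is replaced. The key and the function names `anonymize_ipv4` and `anonymize_mac` are hypothetical.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical secret key; a real deployment would keep this private.
SECRET_KEY = b"example-key-for-illustration-only"


def anonymize_ipv4(addr: str, key: bytes = SECRET_KEY) -> str:
    """Map an IPv4 address to a pseudonymous IPv4 address using a keyed hash."""
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Use the first 4 bytes of the digest as the pseudonymous address.
    return str(ipaddress.IPv4Address(digest[:4]))


def anonymize_mac(mac: str, key: bytes = SECRET_KEY) -> str:
    """Keep the 3-byte OUI and replace the device-specific 3 bytes with a keyed hash."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    oui = octets[:3]
    # Hash the full address so equal suffixes under different OUIs differ.
    digest = hmac.new(key, octets, hashlib.sha256).digest()
    return ":".join(f"{b:02x}" for b in oui + digest[:3])


if __name__ == "__main__":
    print(anonymize_ipv4("192.0.2.1"))
    print(anonymize_mac("00:1A:2B:3C:4D:5E"))
```

The same input always maps to the same output under a given key, so flows can still be correlated within a dataset. Because the hash is truncated, the mapping is not guaranteed to be collision-free, and this sketch makes no attempt to preserve IP prefixes.
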
We are authorized by Columbia University’s IT and Institutional Review Board (IRB) to share our datasets with external researchers.
To ensure user privacy and prevent any risk of de-anonymization, we will share the dataset only with researchers who
Please complete the Research Data Request Form to submit your IRB approval (or local equivalent) and request access.