Data Description

Our pipeline (upgraded in May 2025) is built around an Intel Tofino programmable switch running ONTAS, a scalable online network traffic anonymization system [Kim et al., SIGCOMM 2019 Workshop on Network Meets AI & ML]. ONTAS anonymizes IP and MAC addresses and removes packet payloads on a per-protocol basis, while preserving key information such as DNS A records and the TLS Server Name Indication (SNI).

DNS traffic metadata

We also provide CSV files containing useful information extracted from DNS packets. Metadata example:

Frame Time, Protocol, Source IP, Destination IP, Source MAC Organization Unique Identifier, Destination MAC Organization Unique Identifier, DNS A, DNS CNAME, DNS Response TTL, DNS Query Name, DNS Query Type, DNS Flag Response
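
As a rough illustration of how these files might be consumed, the sketch below reads a metadata CSV with Python's standard csv module and counts how often each DNS query name appears. It rests on several assumptions: that the files are comma-delimited, that the header row spells the column names exactly as listed above, and that the function name count_query_names is our own. The sample rows are made-up placeholders, not values from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the metadata example. The exact header
# spelling in the released files is an assumption.
COLUMNS = [
    "Frame Time", "Protocol", "Source IP", "Destination IP",
    "Source MAC Organization Unique Identifier",
    "Destination MAC Organization Unique Identifier",
    "DNS A", "DNS CNAME", "DNS Response TTL",
    "DNS Query Name", "DNS Query Type", "DNS Flag Response",
]


def count_query_names(csv_file):
    """Count non-empty DNS query names in an open metadata CSV file."""
    reader = csv.DictReader(csv_file)
    # Fail early if the header does not match the assumed column list.
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    counts = Counter()
    for row in reader:
        name = (row["DNS Query Name"] or "").strip()
        if name:
            counts[name] += 1
    return counts


# Made-up placeholder rows, for illustration only (not real dataset values).
SAMPLE = "\n".join([
    ",".join(COLUMNS),
    "t0,DNS,ip-a,ip-b,oui-a,oui-b,,,,example.com,1,0",
    "t1,DNS,ip-b,ip-a,oui-b,oui-a,addr-x,,300,example.com,1,1",
    "t2,DNS,ip-a,ip-b,oui-a,oui-b,,,,example.org,28,0",
])

counts = count_query_names(io.StringIO(SAMPLE))
for name, n in counts.most_common():
    print(name, n)
```

To run the same function on a file from disk, open it with open(path, newline="") as the csv module documentation recommends, and pass the file object in place of the in-memory sample.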

Access Request Terms

We are authorized by Columbia University’s IT and Institutional Review Board (IRB) to share our datasets with external researchers.

To ensure user privacy and prevent any risk of de-anonymization, we will share the dataset only with researchers who


Access Request

Please complete the Research Data Request Form to submit your IRB approval (or local equivalent) and request access.