Anonymized netflow and security scan data

Version 1.1

Rigó, Ernő, 2024, "Anonymized netflow and security scan data", https://hdl.handle.net/21.15109/ARP/FBIIOZ, ARP, V1

Dataset Metrics

28 Downloads

Description	This dataset contains network traffic and vulnerability scan reports for networks with different characteristics:

- vlan11 is a public network with low traffic and ~30 hosts
- cloud is a public network with moderate traffic and ~100 hosts from a cloud environment
- vlan23 is a private network with high traffic and ~200 hosts

Data formats

- netflow data is presented in CSV, JSON and RAW formats for a 30-day period
- security scan reports are presented in CSV, filtered CSV, HTML and XML formats

Data is compressed in many cases to save repository space and network bandwidth. Uncompress with `xz`.

Anonymization

The anonymized dataset comprises a collection of network traffic and domain-related information derived from the described environments. The source information includes sensitive IPv4 addresses and domain hostnames, which are vital for network analysis, vulnerability assessments, and security research. Because of the sensitive nature of the data, anonymization is employed to protect personal and organizational privacy.

Anonymization Methodology

The main objective is to maintain the utility of network patterns and relationships while masking specific addresses to prevent any form of trace-back to individual devices or networks. To ensure privacy while retaining the dataset's analytical value, the following anonymization techniques are applied.

IPv4 Address Anonymization

Each IPv4 address in the dataset has its first two octets anonymized, using a consistent mapping system that replaces these octets with random, uniquely assigned numbers. This transformation is deterministic: the same original address segments always map to the same anonymized segments, which preserves the relationships and patterns critical for analysis.

Domain Name Anonymization

The hostnames within domain names are anonymized by substituting them with a randomly generated string. The new hostnames follow a structured anonymized format: <randomname>.random.xyz. As with IP anonymization, the mapping is consistent across the dataset, so each original hostname is always replaced with the same anonymized version.

Privacy Considerations

- Consistency: The anonymization process employs a reproducible mapping system, ensuring that every occurrence of a unique IP address segment or domain hostname is anonymized identically across the dataset. This consistency allows for meaningful analysis of trends and repeated interactions without exposing raw data.
- Data Integrity: Because the anonymization focuses on specific segments of IP addresses and hostnames, the overall structure of the data remains intact. This integrity is crucial for operations such as network flow analysis and anomaly detection, which rely on the continuity of data patterns.
- Data Minimization: Alongside anonymizing critical fields, the dataset also undergoes column removal, in which non-essential fields that might contain sensitive information are excluded. This further reduces the risk of unintended information exposure.

(2023-06-01)
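
The consistent mapping described above can be illustrated with a minimal sketch. This is not the tool used to produce the dataset: the class name `PrefixAnonymizer`, its methods, and the choice to map the first two octets jointly as a pair are assumptions made for illustration only. The description states only that the first two octets are replaced with random, uniquely assigned numbers and that hostnames become `<randomname>.random.xyz`.

```python
import secrets


class PrefixAnonymizer:
    """Hypothetical sketch of a consistent (repeatable within one run) mapping.

    Assumption: the first two IPv4 octets are mapped together as a pair;
    the real dataset may map them differently.
    """

    def __init__(self):
        self._prefix_map = {}     # (octet1, octet2) -> (new1, new2)
        self._used_prefixes = set()
        self._host_map = {}       # original hostname -> anonymized hostname
        self._used_hosts = set()

    def anonymize_ip(self, ip):
        a, b, c, d = ip.split(".")
        key = (int(a), int(b))
        if key not in self._prefix_map:
            # Draw random prefixes until one is found that is not yet assigned,
            # so distinct original prefixes never collide.
            while True:
                candidate = (secrets.randbelow(256), secrets.randbelow(256))
                if candidate not in self._used_prefixes:
                    break
            self._used_prefixes.add(candidate)
            self._prefix_map[key] = candidate
        new_a, new_b = self._prefix_map[key]
        # The last two octets are kept, preserving host-level structure.
        return f"{new_a}.{new_b}.{c}.{d}"

    def anonymize_host(self, hostname):
        if hostname not in self._host_map:
            while True:
                candidate = f"{secrets.token_hex(6)}.random.xyz"
                if candidate not in self._used_hosts:
                    break
            self._used_hosts.add(candidate)
            self._host_map[hostname] = candidate
        return self._host_map[hostname]
```

With such a mapping, two addresses that share their first two octets keep a shared (but different) prefix after anonymization, which is why flow relationships between hosts on the same network remain analyzable.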
Subject	Computer and Information Science
Notes	For additional information see README.md
License/Data Use Agreement	CC BY-NC 4.0

Files (7)

File	Type	Size	Published	Downloads	MD5	Description
netflow-aggr-cloud.csv.xz	XZ Archive	546.5 KB	Dec 6, 2024	4	30646322ecdd3a9f402ead40b1f360e2	aggregated anonymized netflow data for cloud vlan
netflow-aggr-vlan11.csv.xz	XZ Archive	14.5 MB	Dec 6, 2024	3	f5484240762ef633514aa9912f6271a4	aggregated anonymized netflow data for vlan11
netflow-aggr-vlan23-filtered.csv.xz	XZ Archive	35.4 KB	Dec 6, 2024	3	4e1c7d87008c71e24d0d8c1134dece83	aggregated anonymized netflow data for vlan23
README.md	Markdown Text	3.1 KB	Dec 6, 2024	7	06cc03198a6b30df611d81c583369abc	additional description of the datasets and anonymization methods
scan-report-cloud.csv.xz	XZ Archive	30.9 KB	Dec 6, 2024	5	d3e16353da13b2b9e7d5bbc3ef6ddfdf	anonymized security scan data for cloud vlan
scan-report-vlan11.csv.xz	XZ Archive	29.3 KB	Dec 6, 2024	3	a06e462543454cdd8e62823c5f310412	anonymized security scan data for vlan11
scan-report-vlan23.csv.xz	XZ Archive	47.2 KB	Dec 6, 2024	3	b4dedd49cc89a94bdf633756dd5723df	anonymized security scan data for vlan23
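
After downloading, a file can be checked against its listed MD5 value and read without unpacking it to disk first. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `read_xz_csv_rows` are hypothetical, and the assumptions that the CSV files are UTF-8 encoded and comma-delimited are not confirmed by the listing; README.md should be consulted for the actual column layout.

```python
import csv
import hashlib
import lzma


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_xz_csv_rows(path, limit=5):
    """Return the first `limit` rows of an xz-compressed CSV file.

    Assumption: UTF-8 text with comma delimiters.
    """
    rows = []
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        for i, row in enumerate(csv.reader(fh)):
            if i >= limit:
                break
            rows.append(row)
    return rows
```

For example, `md5_of_file("netflow-aggr-cloud.csv.xz")` would be expected to equal `30646322ecdd3a9f402ead40b1f360e2` for an intact download, since the listed checksum refers to the compressed file as published.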


This dataset has been configured to use English as the language for all metadata entries.

Citation Metadata

Persistent Identifier	hdl:21.15109/ARP/FBIIOZ
Publication Date	2024-12-06
Title	Anonymized netflow and security scan data
Author	Rigó, Ernő (SZTAKI staff) - ORCID: 0000-0003-1044-7167
Point of Contact	Rigó, Ernő (SZTAKI staff)
Depositor	Rigó, Ernő
Deposit Date	2024-12-06

Dataset Terms

License/Data Use Agreement

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation shown on the dataset page.

CC BY-NC 4.0
