Abstract: A vulnerability is a defect in the design, implementation, or deployment of an information system. When malicious entities exploit such defects, they can compromise the security of the information system and cause heavy losses to users, society, and the state. This study collects vulnerability data published on popular vulnerability platforms from 1999 to 2018, using a combination of automated programs and manual acquisition. The nearly 20 years of collected data are sliced and formatted to ensure readability and consistency, yielding a complete vulnerability dataset. The dataset is divided into several parts according to the vulnerability platform from which each record originates. Based on the attributes of the vulnerability data, we count the total number of vulnerability entries in the dataset, the number of entries that carry a CVE identifier, and the number of entries for each vulnerability type. The dataset plays an essential role in scientific research, security early warning, and security incident handling. Researchers can use it to conduct security research, and software developers can query it to find vulnerabilities in their software.
Keywords: vulnerability; vulnerability dataset; data collection; vulnerability platform