Abstract: Identifying and counting the detrital components of river sand and sediment is a key step in provenance analysis. Traditional manual identification and counting is time-consuming and laborious, and the resulting data follow inconsistent standards and vary in quality, so data from different laboratories are poorly comparable. Automatic identification of sand components by machine learning can relieve geologists of this tedious work. To achieve this goal, professional geologists must capture and label a large number of microscope images as training data. However, many computer scientists who want to work on this problem cannot find such datasets. In the spirit of open data sharing, the author has published a labeled image dataset that took considerable time and effort to produce. The dataset consists of 8,734 labeled clastic particle images and coordinate files, 1,536 sand microscope images, 120 numbered base maps, and two sand composition identification tables. It provides a substantial data foundation for automatic identification of sand components using machine learning, and can also serve as a reference standard for identifying the detrital components of other river sands.
Keywords: photomicrograph of sand grains; dataset of labeled fragments; machine learning; modern river sand of Yarlung Tsangpo