Data Paper Zone II Versions EN3 Vol 2 (3) 2017
A dataset of land cover classification for 25 port cities and their surrounding areas along the Belt and Road (2015)
： 2017 - 03 - 18
： 2017 - 08 - 25
Abstract & Keywords
Abstract: Based on Landsat 8 OLI satellite images captured in 2015, and following the classification criteria and technical specifications of the GlobeLand30 dataset, we used the Support Vector Machine (SVM) classifier provided by ENVI 5.2 to produce a land cover classification dataset for 25 port cities and their surrounding areas along the Belt and Road. The results were rigorously verified and corrected against verification samples collected from high-resolution remote sensing images such as Google Maps. The dataset clearly reflects the ecological environment of the 25 port cities and their surrounding areas, and supports research on land use/cover change, environmental change, and related topics.
Keywords: the Belt and Road; port cities; land cover; ecology and environment
Dataset Profile
| Item | Description |
| --- | --- |
| Chinese title | 2015年“一带一路”25个港口城市及其周边区域土地覆盖分类数据集 |
| English title | A dataset of land cover classification for 25 port cities and their surrounding areas along the Belt and Road (2015) |
| Corresponding author | Hou Xiyong (xyhou@yic.ac.cn) |
| Data authors | Song Yang, Hou Xiyong |
| Time range | 2015 |
| Geographical scope | The following 25 port cities and their surrounding areas: Shanghai, Busan, Singapore, Jakarta, Kyaukpyu, Bangkok, Kuantan, Mumbai, Calcutta, Chittagong, Gwadar, Colombo, Doha, Abbas, Dubai, Lisbon, Jeddah, St Petersburg, Djibouti, Port Sudan, Piraeus, Istanbul, Sydney, Alexander, and Darwin. |
| Spatial resolution | 30 m |
| Data volume | 28.7 MB |
| Data format | *.tif, *.tfw, *.dbf, *.ovr, *.xml |
| Data service system | |
| Source of funding | National Natural Science Foundation of China (31461143032) |
| Dataset composition | This dataset consists of the land cover data of 25 port cities and their surrounding areas. The data of each city are stored in a single folder in TIF format, and all 25 cities are compressed into one file. The compressed file takes up 28.7 MB of disk space. |
1.   Introduction
In response to the Belt and Road (B&R) Initiative, the National Remote Sensing Center of China (NRSCC) of the Ministry of Science and Technology focused its Global Ecosystems and Environment Observation: Annual Report from China (GEOARC-2015) on the ecological environment of the B&R. Because ports are key targets of infrastructure construction in coastal cities, monitoring the spatial distribution and exploitation of coastline resources along the 21st-Century Maritime Silk Road is of great significance. Twenty-five port cities were chosen for this study based on the geographical distribution of their maritime economic activities, location advantages, economic radiation, transportation roles and development status, as well as their role in the B&R Initiative. Using remote sensing and GIS techniques, we built a land cover classification dataset for these 25 port cities and their surrounding areas. The methods and outcomes of this research have been included in GEOARC-2015.
2.   Data collection and processing
2.1   Data collection
Landsat 8 OLI images acquired in 2015, with a spatial resolution of 30 m, were selected as the source data; they can be downloaded from the United States Geological Survey (http://glovis.usgs.gov/). Considering the coastal location of these port cities, cloudless images for spring and autumn were downloaded (Table 1).
Table 1   Landsat 8 OLI images used for land cover classification
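| Port city | Date | Path/Row |
| --- | --- | --- |
| Shanghai | 2015/8/3 | 118/38 |
| Shanghai | 2015/8/3 | 118/39 |
| Busan | 2015/6/4 | 114/35 |
| Busan | 2015/6/4 | 115/35 |
| Busan | 2015/5/27 | 115/35 |
| Busan | 2015/5/27 | 115/36 |
| Dubai | 2015/8/16 | 161/42 |
| Dubai | 2015/9/19 | 159/43 |
| Dubai | 2015/9/26 | 160/42 |
| Dubai | 2015/9/26 | 160/43 |
| Alexander | 2015/9/1 | 177/38 |
| Alexander | 2015/9/24 | 178/38 |
| Calcutta | 2015/11/19 | 138/44 |
| Calcutta | 2015/11/19 | 138/45 |
| Calcutta | 2015/10/25 | 139/44 |
| Calcutta | 2015/10/25 | 139/45 |
| Jeddah | 2015/9/25 | 169/44 |
| Jeddah | 2015/9/16 | 170/45 |
| Jeddah | 2015/9/16 | 170/46 |
| Jeddah | 2015/9/16 | 170/47 |
| Lisbon | 2015/6/26 | 204/33 |
| Doha | 2015/8/23 | 162/42 |
| Doha | 2015/8/23 | 162/43 |
| Singapore | 2015/6/1 | 125/59 |
| Sydney | 2015/2/28 | 89/83 |
| Sydney | 2015/2/28 | 89/84 |
| Colombo | 2015/1/8 | 141/55 |
| Colombo | 2015/2/25 | 141/56 |
| Kyaukpyu | 2015/11/23 | 134/47 |
| Port Sudan | 2015/7/21 | 171/46 |
| Port Sudan | 2015/7/21 | 171/47 |
| Bangkok | 2015/5/21 | 128/51 |
| Bangkok | 2015/11/4 | 129/50 |
| Bangkok | 2015/11/4 | 129/51 |
| Bangkok | 2015/3/16 | 130/50 |
| Jakarta | 2015/8/31 | 122/64 |
| Jakarta | 2015/9/23 | 123/64 |
| St Petersburg | 2015/8/24 | 185/18 |
| St Petersburg | 2015/8/24 | 185/19 |
| Abbas | 2015/8/19 | 159/42 |
| Abbas | 2015/8/26 | 160/42 |
| Istanbul | 2015/7/13 | 179/31 |
| Istanbul | 2015/6/28 | 179/32 |
| Istanbul | 2015/9/6 | 180/31 |
| Istanbul | 2015/9/6 | 180/32 |
| Istanbul | 2015/8/28 | 181/31 |
| Chittagong | 2015/11/5 | 136/44 |
| Chittagong | 2015/10/20 | 136/45 |
| Mumbai | 2015/10/8 | 148/46 |
| Mumbai | 2015/10/8 | 148/47 |
| Gwadar | 2015/9/7 | 155/43 |
| Darwin | 2015/5/11 | 106/68 |
| Darwin | 2015/5/27 | 106/69 |
| Kuantan | 2015/7/26 | 126/58 |
| Piraeus | 2015/8/19 | 182/34 |
| Piraeus | 2015/7/9 | 183/33 |
| Piraeus | 2015/7/9 | 183/34 |
| Djibouti | 2015/10/6 | 166/52 |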
2.2   Data processing
Image classification is an important application of remote sensing. However, both traditional classifiers (e.g., maximum likelihood, Mahalanobis distance, and minimum distance) and newer ones (e.g., neural networks, fuzzy clustering, and decision trees) suffer from strong dependence on training samples, low algorithmic stability, and low classification accuracy [1, 2]. The Support Vector Machine (SVM) is a machine learning algorithm built on the Vapnik-Chervonenkis (VC) dimension theory of statistical learning and on the principle of structural risk minimization, and it largely overcomes these shortcomings. The VC dimension of a function set is the largest number h of samples that the set can separate in all 2^h possible ways. The higher the VC dimension, the wider the confidence bound; that is, the more complex the learning machine, the larger the possible gap between its empirical risk and its true risk [3]. SVM keeps the VC dimension as low as possible in order to minimize the structural risk, thereby striking the best balance between model complexity and learning ability given limited sample information. It therefore handles problems that trouble conventional methods, such as small samples, nonlinearity, overfitting, high dimensionality, and local minima [4]. The SVM method was hence chosen for remote sensing image classification in this study.
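For readers who want the relationship between VC dimension and confidence bound in symbols, the bound usually quoted from Vapnik's statistical learning theory for binary classification can be written as follows. The notation is ours rather than the authors': $m$ is the number of training samples, $h < m$ is the VC dimension, $R_{emp}$ is the empirical risk, $R$ is the true (expected) risk, and $\eta$ is a confidence parameter. With probability at least $1-\eta$:

$R(f) \le R_{emp}(f) + \sqrt{\frac{h\left(\ln\frac{2m}{h}+1\right)-\ln\frac{\eta}{4}}{m}}$

The square-root term is the confidence bound. For $h$ well below $m$ it grows with $h$ and shrinks with $m$, which is why a low VC dimension matters most when few training samples are available. Structural risk minimization seeks the function that minimizes the sum of the two terms rather than the empirical risk alone.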
The land cover classification system used here (Table 2) follows the GlobeLand30 land cover dataset (http://globallandcover.com/GLC30Download/index.aspx). The SVM classifier [5-8] provided by ENVI 5.2 was used to produce the land cover classification for the 25 port cities and their surrounding areas along the B&R. Figure 1 shows the technical flow of land cover classification. The results were then rigorously verified and corrected against verification samples collected from high-resolution remote sensing images such as Google Maps and GF-2. Unlike GlobeLand30, our dataset also classifies urban greenbelts inside built-up areas, so it reflects the ecological environment of built-up areas.
Table 2   Definition of land cover types in the classification system
| Code | Type | Definition |
| --- | --- | --- |
| 10 | Cultivated land | Land used for agriculture, horticulture and gardens, including paddy fields, irrigated and dry farmland, vegetation and fruit gardens, etc. |
| 20 | Forest | Land covered with trees, with a vegetation coverage of over 30%, including deciduous and coniferous forests, and sparse woodland with a coverage of 10 – 30%. |
| 30 | Grassland | Land covered by natural grass with a coverage of over 10%. |
| 40 | Shrub land | Land covered by shrubs with a coverage of over 30%, including deciduous and evergreen shrubs, and desert steppe with a coverage of over 10%. |
| 50 | Wetland | Land covered with wetland plants and water bodies, including inland marsh, lake marsh, river floodplain wetland, forest/shrub wetland, peat bogs, mangrove, salt marsh, etc. |
| 60 | Water bodies | Rivers, lakes, reservoirs, fish ponds, ocean, etc. |
| 80 | Impervious surface | Natural and artificial surfaces that prevent water from penetrating directly into the soil, including urban transportation facilities, construction land, industrial and mining land, buildings, roofs, etc. |
| 90 | Bare land | Land with a vegetation coverage of lower than 10%, including desert, sandy fields, Gobi, bare rocks, saline and alkaline land, etc. |

Figure 1   Technical flow of land cover classification
3.   Sample description
3.1   Data description
The dataset consists of land cover data for 25 port cities and their surrounding areas. The land cover data of each city are stored in a separate folder in 8-bit TIF raster format, and the 25 folders are compressed into a single RAR file of 28.7 MB. The WGS84 geographic coordinate system is adopted.
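As a minimal sketch of how the raster values might be interpreted, the snippet below maps the class codes of Table 2 to their names and tallies the share of each class in a list of pixel codes. The function name `class_shares`, the toy pixel list, and the use of 0 as a nodata value are assumptions made for illustration; they are not part of the dataset documentation. Reading the TIF files themselves is left out, since it requires a raster library.

```python
from collections import Counter

# Class codes and names copied from Table 2.
LAND_COVER_CLASSES = {
    10: "Cultivated land",
    20: "Forest",
    30: "Grassland",
    40: "Shrub land",
    50: "Wetland",
    60: "Water bodies",
    80: "Impervious surface",
    90: "Bare land",
}


def class_shares(pixel_codes):
    """Return {class name: percentage of valid pixels} for an iterable of codes.

    Codes that do not appear in Table 2 (for example a nodata value) are ignored.
    """
    counts = Counter(code for code in pixel_codes if code in LAND_COVER_CLASSES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        LAND_COVER_CLASSES[code]: 100.0 * n / total
        for code, n in sorted(counts.items())
    }


# Hypothetical toy raster flattened to a list; 0 stands in for nodata (assumption).
toy_pixels = [20, 20, 20, 80, 10, 60, 0, 20]
shares = class_shares(toy_pixels)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Because the rasters use geographic coordinates, the ground area of a pixel varies with latitude, so pixel shares only approximate area shares. For area statistics in km², the raster would normally be reprojected to an equal-area projection first.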
3.2   Data sample
Based on SVM supervised classification, the land cover dataset for the 25 port cities was obtained after rigorous precision verification and data modification. Figure 2 shows a data sample for Busan and its surrounding areas. Forest was the major land cover type, covering 3875.08 km², or 69% of the built-up area and its 50 km buffer zone. Impervious surface, mainly distributed in the built-up area, covered 701.07 km² (12.66%), while cultivated land, mainly distributed in the estuary and river valley regions, covered 694.09 km² (12.36%). Busan had a built-up area of 253.39 km², in which impervious surface and vegetation were the major land cover types, covering 154.16 km² and 96.11 km², respectively. The other land cover types accounted for relatively small areas and were scattered across Busan and its surrounding areas.

Figure 2   Land cover map of Busan and its surrounding area, 2015
4.   Quality control and assessment
4.1   Technical flow of accuracy verification
Validation data were collected by visual interpretation of Google Earth images and used to verify the accuracy of the land cover classification for the 25 port cities and their surrounding areas. The technical flow of accuracy verification is shown in Figure 3. The specific process is as follows:
(1) Use the “Generate Tool” in ArcGIS 10.1 to build grid points in the built-up area and its surrounding area for each city.
(2) Overlay the grid points layer on the land cover map, then extract the land cover type at each grid point and record it in the attribute table of the grid points layer.
(3) Export the grid points layer to KML format and display it in Google Earth, then verify the land cover classification by comparison with the Google Earth images.
(4) Calculate the overall accuracy and Kappa coefficient from the confusion matrix, and assess the accuracy of the land cover data for the 25 port cities and their surrounding areas along the B&R.
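Steps (1) to (3) were carried out in ArcGIS, but the idea can be sketched with the Python standard library alone. The sketch below builds a regular lattice of points over a bounding box and writes them as KML placemarks that Google Earth can open. The bounding box, the grid spacing, the label values, and the function names `grid_points` and `points_to_kml` are hypothetical; in particular, the labels here are hard-coded, whereas in the real workflow they would be extracted from the land cover map.

```python
import xml.etree.ElementTree as ET


def grid_points(min_lon, min_lat, max_lon, max_lat, step_deg):
    """Return a regular lattice of (lon, lat) points covering a bounding box."""
    # The small tolerance guards against floating-point round-off in the division.
    n_cols = int((max_lon - min_lon) / step_deg + 1e-9) + 1
    n_rows = int((max_lat - min_lat) / step_deg + 1e-9) + 1
    return [
        (min_lon + col * step_deg, min_lat + row * step_deg)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


def points_to_kml(points, labels):
    """Serialize points as KML placemarks named '<index>: <label>'."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    document = ET.SubElement(kml, "Document")
    for idx, ((lon, lat), label) in enumerate(zip(points, labels), start=1):
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = f"{idx}: {label}"
        point = ET.SubElement(placemark, "Point")
        # KML coordinate order is longitude,latitude,altitude.
        ET.SubElement(point, "coordinates").text = f"{lon:.6f},{lat:.6f},0"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical bounding box and spacing, chosen only for illustration.
points = grid_points(129.0, 35.0, 129.2, 35.1, 0.1)
# Hypothetical land cover codes standing in for values read from the raster.
labels = [60, 80, 20, 60, 80, 20]
kml_text = points_to_kml(points, labels)
print(kml_text)
```

Storing the classified land cover code in each placemark name keeps the mapped class visible while the point is inspected against the imagery.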

Figure 3   Technical flow for verifying the land cover classification
4.2   Verification methods
The confusion matrix, the most commonly used tool for evaluating the accuracy of land use/cover mapping, was used to calculate the Kappa coefficient and overall accuracy for each port city. The confusion matrix consists of n rows and n columns, where n is the total number of land cover types. Columns 1 to n represent the actual land cover types, and rows 1 to n represent the land cover types assigned by the SVM classification. The sum of all elements equals the number of verification points.
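The layout described above can be illustrated with a short sketch that builds a confusion matrix from paired labels, with rows indexed by the classified type and columns by the actual type. The function name `confusion_matrix` and the six sample points are hypothetical and serve only to show the bookkeeping.

```python
def confusion_matrix(classified, reference, classes):
    """Build an n x n matrix: rows = classified type, columns = actual type."""
    index = {code: i for i, code in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for mapped, actual in zip(classified, reference):
        matrix[index[mapped]][index[actual]] += 1
    return matrix


# Hypothetical verification points using three codes from Table 2.
classes = [20, 60, 80]                  # forest, water bodies, impervious surface
classified = [20, 20, 60, 80, 80, 20]   # types read from the land cover map
reference = [20, 20, 60, 80, 20, 80]    # types interpreted from imagery
matrix = confusion_matrix(classified, reference, classes)
for row in matrix:
    print(row)
```

Correctly classified points accumulate on the diagonal, and every off-diagonal cell records one specific kind of confusion between two classes.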
The overall accuracy is the percentage of correctly classified points among all verification points; the correctly classified points lie on the diagonal of the confusion matrix. The Kappa coefficient measures the agreement between the classification and the reference data after correcting for agreement expected by chance. The overall accuracy and Kappa coefficient are calculated by the following formulae:
$P= {\sum_{i=1}^{n} x_{ii}\over N}$
(1)
$K={N\sum_{i=1}^{n} x_{ii} - \sum_{i=1}^{n} x_{i+}x_{+i}\over N^2- \sum_{i=1}^{n} x_{i+}x_{+i}}$
(2)
where $P$ and $K$ are the overall accuracy and the Kappa coefficient, respectively; $N$ is the total number of verification points; and $n$ is the number of rows (and columns) of the confusion matrix, that is, the number of land cover types. $x_{ii}$ is the number of verification points in the i-th row and i-th column, and $x_{i+}$ and $x_{+i}$ are the total numbers of verification points in the i-th row and the i-th column, respectively.
4.3   Verification accuracy
The results of accuracy verification are shown in Table 3. Among the port cities listed, all but Calcutta (88.94%) have an overall accuracy above 90%. The Kappa coefficients are above 0.8 for all but Port Sudan (0.79), and above 0.85 for most cities. These results indicate that the dataset has a high accuracy and that the land cover classification is reliable.
Table 3   Verification results of the land cover dataset for the 25 port cities
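| City | Total accuracy (%) | Kappa coefficient | Number of verification points |
| --- | --- | --- | --- |
| Colombo | 97.32 | 0.91 | 149 |
| Abbas | 96.10 | 0.88 | 154 |
| Piraeus | 90.50 | 0.90 | 180 |
| Darwin | 96.00 | 0.94 | 142 |
| Busan | 94.19 | 0.88 | 196 |
| Dubai | 95.83 | 0.90 | 210 |
| Doha | 96.63 | 0.82 | 196 |
| Kuantan | 97.41 | 0.83 | 206 |
| Djibouti | 93.00 | 0.88 | 100 |
| Jeddah | 96.23 | 0.85 | 210 |
| Chittagong | 93.15 | 0.89 | 146 |
| Kyaukpyu | 92.11 | 0.89 | 114 |
| Bangkok | 93.59 | 0.87 | 156 |
| Shanghai | 93.25 | 0.90 | 164 |
| Mumbai | 90.77 | 0.87 | 130 |
| Calcutta | 88.94 | 0.81 | 208 |
| St Petersburg | 95.23 | 0.93 | 210 |
| Port Sudan | 97.63 | 0.79 | 127 |
| Lisbon | 93.13 | 0.91 | 133 |
| Singapore | 97.06 | 0.95 | 102 |
| Sydney | 97.04 | 0.94 | 135 |
| Alexander | 96.94 | 0.95 | 98 |
| Jakarta | 91.52 | 0.88 | 165 |
| Istanbul | 97.37 | 0.96 | 190 |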
5.   Usage notes
ArcGIS software can be used to edit, analyze, manage and query this dataset. The dataset can be applied to research on, for example, land use/cover change and environmental change along the B&R, as it accurately reflects the ecological environment of the 25 port cities and their surrounding areas.
Authors and contributions
Song Yang, PhD; research area: remote sensing of land use change and shoreline change in coastal areas. Contribution: downloading and processing the Landsat 8 OLI images, delineating and classifying the shorelines.
Hou Xiyong, Professor; research area: remote sensing of coastal area, coastal management. Contribution: designing the overall plan and technical framework.
Acknowledgments
This work was completed as part of the Global Ecosystems and Environment Observation: Annual Report from China (GEOARC-2015). Special thanks go to workgroups in the National Remote Sensing Center of China (NRSCC), Ministry of Science and Technology of the People’s Republic of China. Thanks also go to Wang Yuandong, Liu Jing, Wang Junhui, Wei Liaosheng, Wang Xiaoli and Hou Wan for their contributions and valuable suggestions on the technical scheme, data collection and processing.
1. Wang Y, Shen X & Xie J. A review of remote sensing image classification method. Remote Sensing Information 36 (2006): 67 – 71.
2. Li S, Wang J, Bi Y et al. Review of methods for classification of remote sensing images. Remote Sensing for Land & Resources 17 (2005): 1 – 6.
3. Yuan Y, Chen Q & Wang H. Research about the dimension verification in support vector machine. Agriculture & Technology 26 (2006): 210 – 211.
4. Wang X, Mao M, Zhang C et al. Comparative study on classification of remote sensing image by support vector machine. Geomatics & Spatial Information Technology 36 (2013): 17 – 20, 23.
5. Liu W. Auto-identify[ing] Classification Technology for LUCC Information Based on Remote Sensing Data. Doctoral Dissertation, Northwest A&F University, 2012.
6. Fu W, Hong J & Lin M. A method of land use classification from remote sensing image based on support vector machines and spectral similarity scale. Remote Sensing Technology and Application 21 (2006): 25 – 30.
7. Jing X, Shu Q T & Liu Q. Remote sensing images land use/cover monitoring based on support vector machine. Journal of Anhui Agricultural Science 42 (2014): 7631 – 7632.
8. Yang Q & Li X. Cellular automata for simulating land use changes based on support vector machine. Journal of Remote Sensing 10 (2006): 836 – 846.
Data citation
1. Song Y & Hou X. A dataset of land cover classification for 25 port cities and their surrounding areas along the Belt and Road (2015). Science Data Bank. DOI: 10.11922/sciencedb.382