Other Data Paper Zone II Versions EN1 Vol 4 (2) 2019
Remote sensing data for urban thermal anomaly in Sanya City
： 2018 - 11 - 19
： 2019 - 03 - 25
： 2018 - 12 - 12
： 2019 - 05 - 16
Abstract & Keywords
Abstract: To address the problem of thermal anomalies in Sanya, this paper takes Landsat data from 2008 to 2017 as the main data source and extracts high-temperature anomaly areas using the radiative transfer equation method and the improved box plot method. Frequency statistics are then computed for the high-temperature anomaly areas, and areas whose frequency exceeds 60% are identified as urban thermal anomaly areas, yielding the urban thermal anomaly areas for the 10-year period. The method behind this data set can also be applied to other cities to extract urban thermal anomaly areas across multiple regions, which increases its application value.
Keywords: thermal anomaly; radiative transfer equation; modified boxplot method; frequency of the high-temperature anomaly area
Dataset Profile
Title: Remote sensing data for urban thermal anomaly in Sanya City
Data corresponding author: Meng Qingyan (mengqy@radi.ac.cn)
Data authors: Meng Qingyan, Gu Yanchun, Hao Lichun, Hu Die, Zhang Ying, Zhang Linlin
Time range: 2008–2017
Geographical scope: Sanya (18°09′34″–18°37′27″N, 108°56′30″–109°48′28″E)
Spatial resolution: Landsat 5: 30/120 m; Landsat 7: 30/60 m; Landsat 8: 30/100 m
Data volume: 4.05 MB
Data format: *.vsdx, *.jpg, *.png, *.py, *.shp
Data service system:
Sources of funding: Natural Science Foundation of Hainan Province (417219); Major Science and Technology Program of Hainan Province (ZDKJ2016021); Science and Technology Program of Sichuan Province (2018JZ0054); Major Science and Technology Program of Hainan Province (ZDKJ2017009).
Dataset composition: This data set comprises seven folders:
1. Cover Map folder: contains the cover map in *.jpg format;
2. Sanya City Boundary Map folder: contains the boundary map of Sanya City in *.jpg format;
3. Data Processing Flow Chart folder: contains the data processing flow chart in *.vsdx format;
4. Annual thermal anomaly extraction results folder: contains the annual thermal anomaly extraction results of Sanya City for 2008–2017 in *.png format;
5. Metadata folder of annual thermal anomaly extraction results: contains the metadata of the annual thermal anomaly extraction results of Sanya City for 2008–2017 in *.shp format;
6. Code folder: contains the *.py code used in the data processing;
7. Sanya Vector Boundary folder: contains the Sanya City boundary data in *.shp format.
1.   Overview
Thermal anomalies are abnormal increases in ambient temperature caused by natural factors and by heat emissions from human activities. Thermal observations of these temperature changes can be used to investigate how they fluctuate over time.[1] In recent years, the rapid advance of urbanization has increased the likelihood of thermal anomalies, that is, situations where the temperature at a location exceeds the safety threshold for equipment placed there, with serious negative effects on water resources, local climate and the living environment. According to statistics, over the past 100 years the global annual average temperature has risen by 0.7–1 °C, while the average temperature in big cities has risen by 2–3 °C. The annual average temperatures of world-class cities such as Beijing, Shanghai and Tokyo are much higher than those of their suburbs: 2 °C higher overall and 6 °C higher in summer.[2] Thermal anomalies also differ markedly from other forms of pollution. Material pollution caused by pollutants can be controlled by specific methods, and many intelligent solutions have been explored to minimize greenhouse gas emissions into the atmosphere. Thermal anomalies, however, derive primarily from heat sources and energy consumption,[3] and are therefore inevitable. In response, scholars in China and abroad have carried out extensive research. For example, Wu et al. and Chen et al. evaluated thermal anomaly intensity and its impact on the surrounding environment by monitoring the heat flux of nuclear power plants in different regions;[4,5] Liu et al. and Xia et al. extracted industrial heat sources from thermal anomaly products using different methods;[6,7] Zhang et al. used Landsat data to analyze the thermal environment before and after the relocation of Shougang;[8] and other scholars have analyzed the interaction between urban heat island intensity and temperature to assess urban thermal anomalies.[9,10] Thermal anomalies have thus been a widely studied scientific subject and a research hotspot over the last few decades. However, most of the previous literature has focused on industrial cities or places with industrial pollution, and few studies have addressed non-industrial cities. Thermal anomalies exist not only in industrial cities but also in non-industrial cities, where they have become one of the most important and frequent meteorological disasters, causing heavy damage to the local society and economy. Therefore, how to effectively extract thermal anomaly areas and predict the occurrence of thermal anomalies in non-industrial cities has become a pressing issue for meteorologists.
As the only tropical coastal tourism city in China, Sanya attaches particular importance to protecting its ecological environment. This coincides with Hainan Province's promotion of the “International Tourism Island” initiative. As the leader of Hainan's tourism development, Sanya is committed to building an international coastal tourism city of “one port and two places” to international standards. However, as urbanization in Sanya continues, the impact of urban thermal anomalies on the urban environment has become increasingly prominent. Therefore, developing urban thermal anomaly research tailored to Sanya's urban construction and its regional, environmental and vegetation characteristics is of great significance to the city's construction, planning and layout, and environmental protection.
In summary, to monitor and extract the thermal anomaly areas in Sanya, this paper uses long time-series Landsat data to retrieve land surface temperature with the radiative transfer equation method, and extracts the corresponding high-temperature anomaly areas with the improved box plot method. Urban thermal anomaly areas are then obtained through statistical analysis of the annual frequency of occurrence of the high-temperature anomaly areas. The method can extract urban thermal anomaly areas accurately and over large areas, and, compared with other methods, it effectively avoids the influence of potential thermal anomalies.
2.   Data collection and processing
2.1   Geographic range
Sanya City is located at 18°09′34″–18°37′27″N, 108°56′30″–109°48′28″E, at the southernmost tip of Hainan Province. It is the second largest city in Hainan Province, and its economy is based mainly on tourism and agriculture. As one of the most important tourist cities in the province, its urban environment has long been a focus of public attention. However, with the continuous development and expansion of the city, urban thermal anomalies are intensifying. Therefore, the Sanya municipal district is selected as the study area; a schematic of the study area is shown in Figure 1.

Figure 1   Schematic diagram of the boundary of Sanya City
2.2   Basic data preparation
This paper uses Landsat 5, Landsat 7 and Landsat 8 images acquired between 2008 and 2017 with less than 30% cloud cover as the primary data source (available at https://glovis.usgs.gov). The bands used in the temperature inversion and their spatial resolutions are shown in Table 1. Because Band 11 of Landsat 8 has operational problems, Band 10 is used for the Landsat 8 temperature inversion. Table 2 lists the acquisition dates of the images used in this paper.
Table 1   Band information used for temperature inversion
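Data source | Band used | Spatial resolution (m)
Landsat 5 | Band3 | 30
Landsat 5 | Band4 | 30
Landsat 5 | Band6 | 120
Landsat 7 | Band3 | 30
Landsat 7 | Band4 | 30
Landsat 7 | Band6 | 60
Landsat 8 | Band4 | 30
Landsat 8 | Band5 | 30
Landsat 8 | Band10 | 100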
Table 2   Image date used in the data set
Year | Image acquisition dates
2008 | 20080419, 20081028, 20081113, 20081129, 20081215
2009 | 20090820, 20091210
2010 | 20110316, 20100706, 20100714, 20101026, 20101127, 20101213
2011 | 20111021
2012 | 20120321, 20120913, 201209220121015
2013 | 20130519, 20130620, 20130924, 20131010
2014 | 20140114, 20140420, 20140924, 20141013, 20141114
2015 | 20150407
2016 | 20160308, 20160916, 20161103, 20161221
2017 | 20170106, 20170122, 20170311, 20170701, 20170802, 20171021, 20171208, 20171224
2.3   Data processing
First, the Landsat data are preprocessed by mosaicking, cropping and reprojection. Landsat 7 ETM+ images contain striping and must additionally be destriped after this preprocessing. Second, the radiative transfer equation is used to retrieve land surface temperature, and the improved box plot method is applied to extract the high-temperature anomaly areas, giving a spatial distribution map of those areas. Finally, to avoid missed detections, and following the approach Liu et al. used to extract industrial heat sources,[6] the frequency of occurrence of thermal anomalies in the high-temperature anomaly areas is computed, and areas with a frequency greater than 60% are taken as thermal anomaly areas. The total area of the urban thermal anomaly areas is also calculated. The data processing flow is shown in Figure 2.

Figure 2   Data processing flow chart
2.3.1 Land surface temperature retrieval
Based on the Landsat images, the apparent radiance is first calculated from the gain and offset values of the Landsat thermal infrared band. Then the surface specific emissivity is estimated from the vegetation coverage of the image. Finally, the surface temperature is calculated using the inverse Planck function and Landsat's preset calibration constants. The specific implementation is as follows:
(1) Apparent radiance calculation
Apparent radiance calculation converts the gray value of each image pixel to the corresponding thermal radiation intensity. The image's gray values are converted to radiance values with the Radiometric Calibration tool in ENVI software.
(2) Estimation of surface specific emissivity
The surface specific emissivity characterizes the ability of a ground object to emit electromagnetic radiation. First, based on the near-infrared and red bands of the image, the normalized difference vegetation index (NDVI) is calculated according to formula (1).[11]

 $$NDVI=\left({TM}_{\mathrm{N}\mathrm{I}\mathrm{R}}-{TM}_{\mathrm{R}\mathrm{e}\mathrm{d}}\right)/\left({TM}_{\mathrm{N}\mathrm{I}\mathrm{R}}+{TM}_{\mathrm{R}\mathrm{e}\mathrm{d}}\right)$$ （1）
Where: $${\mathrm{T}\mathrm{M}}_{\mathrm{N}\mathrm{I}\mathrm{R}}$$ is the near-infrared band reflectivity of the image; $${\mathrm{T}\mathrm{M}}_{\mathrm{R}\mathrm{e}\mathrm{d}}$$ is the red band reflectance of the image.
Second, the surface specific emissivity is estimated from the image vegetation coverage. Because surface materials differ in structure, the emissivity is calculated differently for different land types, which fall roughly into three classes: water, town, and natural surfaces. Water pixels have a high emissivity, close to that of a black body, so the water body emissivity is usually assigned a value of 0.995. The emissivity of natural surface pixels and town pixels is estimated according to formula (2) and formula (3), respectively:[12,13,14]

 $${e}_{\mathrm{Surface}}=0.9625+0.0614{P}_{\mathrm{Vege}}-0.0461{{P}_{\mathrm{Vege}}}^{2}$$ （2）
 $${e}_{\mathrm{Building}}=0.9589+0.086{P}_{\mathrm{Vege}}-0.0671{{P}_{\mathrm{Vege}}}^{2}$$ （3）
Where:$$\mathrm{ }{\mathrm{e}}_{\mathrm{S}\mathrm{u}\mathrm{r}\mathrm{f}\mathrm{a}\mathrm{c}\mathrm{e}}$$ represents the surface specific emissivity of natural surface pixels; $${\mathrm{e}}_{\mathrm{B}\mathrm{u}\mathrm{i}\mathrm{l}\mathrm{d}\mathrm{i}\mathrm{n}\mathrm{g}}$$ represents the surface specific emissivity of the town pixel; $${\mathrm{P}}_{\mathrm{V}\mathrm{e}\mathrm{g}\mathrm{e}}$$ represents the vegetation coverage, as calculated by equation (4):[15,16]

 $${P}_{\mathrm{V}\mathrm{e}\mathrm{g}\mathrm{e}}={\left[\frac{NDVI-{NDVI}_{\mathrm{S}\mathrm{o}\mathrm{i}\mathrm{l}}}{{NDVI}_{\mathrm{V}\mathrm{e}\mathrm{g}\mathrm{e}}-{NDVI}_{\mathrm{S}\mathrm{o}\mathrm{i}\mathrm{l}}}\right]}^{2}$$ （4）
Where: NDVI is the normalized difference vegetation index of the image, $${\mathrm{NDVI}}_{\mathrm{Vege}}$$ is the NDVI value of a pure vegetation pixel in the image, and $${\mathrm{NDVI}}_{\mathrm{Soil}}$$ is the NDVI value of a pure soil pixel in the image. When the NDVI of a pixel is greater than 0.70, $${\mathrm{P}}_{\mathrm{Vege}}$$ takes a value of 1; when NDVI is less than 0.05, $${\mathrm{P}}_{\mathrm{Vege}}$$ takes a value of 0; when NDVI is between 0.05 and 0.70, $${\mathrm{NDVI}}_{\mathrm{Vege}}$$ and $${\mathrm{NDVI}}_{\mathrm{Soil}}$$ take values of 0.70 and 0.05 respectively, and formula (4) is used to estimate the vegetation coverage of the image.
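Formulas (1) to (4) can be sketched per pixel as follows. This is a minimal illustration, not the code shipped in the data set: the function names and the land-type labels `"water"`, `"natural"` and `"town"` are assumptions introduced here, and the sketch assumes each pixel has already been assigned a land type by some separate classification step.

```python
NDVI_SOIL = 0.05  # NDVI of a pure soil pixel, as given in the text
NDVI_VEGE = 0.70  # NDVI of a pure vegetation pixel, as given in the text


def ndvi(nir, red):
    """Formula (1): NDVI from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


def vegetation_fraction(ndvi_value):
    """Formula (4), with the clamping rules for NDVI > 0.70 and NDVI < 0.05."""
    if ndvi_value > NDVI_VEGE:
        return 1.0
    if ndvi_value < NDVI_SOIL:
        return 0.0
    return ((ndvi_value - NDVI_SOIL) / (NDVI_VEGE - NDVI_SOIL)) ** 2


def emissivity(p_vege, land_type):
    """Formulas (2) and (3); water is assigned the constant 0.995.

    The land_type labels are hypothetical names used only in this sketch.
    """
    if land_type == "water":
        return 0.995
    if land_type == "natural":
        return 0.9625 + 0.0614 * p_vege - 0.0461 * p_vege ** 2
    if land_type == "town":
        return 0.9589 + 0.086 * p_vege - 0.0671 * p_vege ** 2
    raise ValueError(f"unknown land type: {land_type}")


# Example: one natural-surface pixel with illustrative reflectance values.
p = vegetation_fraction(ndvi(nir=0.40, red=0.10))
eps = emissivity(p, "natural")
```

A pixel whose near-infrared and red reflectances sum to zero would need separate handling, which the sketch leaves out.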
（3）Surface temperature calculation
Based on the apparent radiance in the thermal infrared band and the surface specific emissivity, the surface thermal radiance is retrieved according to the radiative transfer equation, as shown in equation (5):

 $$B\left({T}_{\mathrm{Sens}}\right)=\frac{L-{L}^{\uparrow }-\tau \left(1-\epsilon \right){L}_{\downarrow }}{\tau \epsilon }$$ （5）
Where: $$\mathrm{B}\left({\mathrm{T}}_{\mathrm{Sens}}\right)$$ is the surface thermal radiance; L is the apparent radiance in the thermal infrared band; $${\mathrm{L}}^{\uparrow }$$ and $${\mathrm{L}}_{\downarrow }$$ are the atmospheric upwelling and downwelling radiance, respectively, which are determined by the Landsat overpass time, the latitude and longitude of the center of the area of interest, the atmospheric model and the sensor type, and can be obtained directly from the NASA webpage (http://atmcorr.gsfc.nasa.gov/); τ is the atmospheric transmittance, which can be obtained from the same NASA webpage; ε is the surface specific emissivity.
The true surface temperature[17,18] is then calculated using the inverse Planck function together with Landsat's preset calibration constants $${\mathrm{K}}_{1}$$ and $${\mathrm{K}}_{2}$$, as shown in equation (6):

 $${T}_{\mathrm{S}\mathrm{u}\mathrm{r}\mathrm{f}\mathrm{a}\mathrm{c}\mathrm{e}}=\frac{{K}_{2}}{\mathrm{ln}\left(\frac{{K}_{1}}{\mathrm{B}\left({\mathrm{T}}_{\mathrm{S}\mathrm{e}\mathrm{n}\mathrm{s}}\right)}+1\right)}-273.15$$ （6）
Where: $${\mathrm{T}}_{\mathrm{S}\mathrm{u}\mathrm{r}\mathrm{f}\mathrm{a}\mathrm{c}\mathrm{e}}$$ is the true temperature of the surface; $${\mathrm{K}}_{1}$$ and $${\mathrm{K}}_{2}$$ are the preset scaling constants of Landsat, which can be obtained from the image header file.
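Equations (5) and (6) chain together as in the sketch below. The function and parameter names are assumptions made for illustration. The calibration constants shown are the Landsat 8 Band 10 values commonly listed in the image metadata; in practice they should be read from the header file of each image, as stated above. The atmospheric parameters and the radiance are placeholder numbers, not values from this data set.

```python
import math


def surface_radiance(L, L_up, L_down, tau, eps):
    """Equation (5): blackbody radiance at the surface temperature.

    L      apparent (at-sensor) thermal infrared radiance
    L_up   atmospheric upwelling radiance
    L_down atmospheric downwelling radiance
    tau    atmospheric transmittance
    eps    surface specific emissivity
    """
    return (L - L_up - tau * (1.0 - eps) * L_down) / (tau * eps)


def surface_temperature_celsius(b, k1, k2):
    """Equation (6): inverse Planck function, converted from kelvin to Celsius."""
    return k2 / math.log(k1 / b + 1.0) - 273.15


# Landsat 8 Band 10 calibration constants (read them from the header in practice).
K1, K2 = 774.8853, 1321.0789

# Placeholder inputs, chosen only to make the example run.
b = surface_radiance(L=10.0, L_up=1.5, L_down=2.5, tau=0.8, eps=0.97)
t_surface = surface_temperature_celsius(b, K1, K2)
```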
To reduce the differences between images acquired at different times, the multi-temporal surface temperature data are normalized pixel by pixel, as shown in formula (7):

 $$Image=\frac{X-{X}_{\mathrm{m}\mathrm{i}\mathrm{n}}}{{X}_{\mathrm{m}\mathrm{a}\mathrm{x}}-{X}_{\mathrm{m}\mathrm{i}\mathrm{n}}}$$ （7）
Where: Image represents the image cell value after normalization, the pixel value is between 0 and 1; X represents the image surface temperature value; $${\mathrm{X}}_{\mathrm{m}\mathrm{a}\mathrm{x}}$$ and $${\mathrm{X}}_{\mathrm{m}\mathrm{i}\mathrm{n}}$$ represent the maximum and minimum values of the image surface temperature, respectively.
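Formula (7) is a min-max rescaling of one image. A minimal sketch, assuming the surface temperatures of a single image are held in a flat list (the function name is hypothetical):

```python
def normalize(temperatures):
    """Formula (7): rescale one image's surface temperatures to the range 0 to 1."""
    x_min = min(temperatures)
    x_max = max(temperatures)
    if x_max == x_min:
        raise ValueError("all pixels have the same value; cannot normalize")
    return [(x - x_min) / (x_max - x_min) for x in temperatures]


# Example with three illustrative pixel temperatures in degrees Celsius.
scaled = normalize([20.0, 25.0, 30.0])
```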
2.3.2 Extracting thermal anomalies based on improved box plot method
The improved box plot method uses five statistics: the upper and lower quartiles, the upper and lower non-outlier intercept lines, and the median. The non-outlier intercept lines separate normal values from abnormal values: data lying within the intercept lines are normal values, and data outside them are abnormal values. The method differs from the conventional box plot in two main ways. First, it introduces the Bowley coefficient, which is used to build the skewness multiplier applied to the sample data. When the data are symmetric, for example normally distributed, the Bowley coefficient is close to 0; when the data are right-skewed it is positive and approaches 1; when the data are left-skewed it is negative and approaches −1. Second, it calculates the non-outlier intercept lines from the semi-quartile ranges, which makes it better suited to skewed data.[19] The specific implementation of the method is as follows:
(1) Get the upper and lower quartiles and the semi-quartile range
The surface temperature data are read pixel by pixel from the image with an IDL program, and the lower quartile, median and upper quartile of the data are then obtained with SPSS software, that is, the values at the 25%, 50% and 75% positions after sorting the data in ascending order.
Then, based on the quartile values, the upper and lower semi-quartile ranges are calculated in Excel, as shown in equations (8) and (9):[20]

 $${SIQR}_{\mathrm{down}}={Q}_{2}-{Q}_{1}$$ （8）
 $${SIQR}_{\mathrm{up}}={Q}_{3}-{Q}_{2}$$ （9）
Where: $${\mathrm{SIQR}}_{\mathrm{down}}$$ and $${\mathrm{SIQR}}_{\mathrm{up}}$$ are the lower and upper semi-quartile ranges of the improved box plot, that is, the lengths of the lower and upper parts of the box; $${\mathrm{Q}}_{1}$$, $${\mathrm{Q}}_{2}$$ and $${\mathrm{Q}}_{3}$$ are the lower quartile, median and upper quartile of the sample data, respectively.
（2）Calculate non-outlier intercept lines
The non-outlier intercept lines comprise a non-outlier maximum intercept line and a non-outlier minimum intercept line, which are used to determine whether a sample value is abnormal. The maximum intercept line is the basis for judging thermal abnormal values: if a sample value exceeds it, the value is classified as a thermal abnormal value, that is, a thermal anomaly may be present. The farther a sample value lies beyond the maximum intercept line, the more severe the thermal anomaly. The calculation is shown in formulas (10), (11) and (12):

 $${B}_{\mathrm{c}}=\frac{{SIQR}_{\mathrm{up}}-{SIQR}_{\mathrm{down}}}{{SIQR}_{\mathrm{up}}+{SIQR}_{\mathrm{down}}}$$ （10）
Where: $${\mathrm{B}}_{\mathrm{c}}$$ is the Bowley coefficient, which lies between −1 and 1; $${\mathrm{SIQR}}_{\mathrm{up}}$$ and $${\mathrm{SIQR}}_{\mathrm{down}}$$ are the upper and lower semi-quartile ranges of the box, respectively.

 $${f}_{\mathrm{up}}={Q}_{3}+1.5×IQR×\left(\frac{1+{B}_{\mathrm{c}}}{1-{B}_{\mathrm{c}}}\right)$$ （11）
 $${f}_{\mathrm{down}}={Q}_{1}-1.5×IQR×\left(\frac{1-{B}_{\mathrm{c}}}{1+{B}_{\mathrm{c}}}\right)$$ （12）
Where: $${\mathrm{f}}_{\mathrm{up}}$$ and $${\mathrm{f}}_{\mathrm{down}}$$ are the non-outlier maximum and minimum intercept lines of the box, respectively, and $$IQR={Q}_{3}-{Q}_{1}$$ is the interquartile range. A value in the sample data set larger than $${\mathrm{f}}_{\mathrm{up}}$$ is classified as a thermal abnormal value; a value smaller than $${\mathrm{f}}_{\mathrm{down}}$$ is classified as a cold abnormal value.
Finally, ArcGIS software is used to select, pixel by pixel, the values larger than the non-outlier maximum intercept line.
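The calculation in formulas (8) to (12) can be sketched in a few lines. This is an illustrative stand-in for the IDL, SPSS, Excel and ArcGIS steps described above, not the code in the data set. The function names are hypothetical, and the quartiles come from Python's `statistics.quantiles` with the inclusive method, whose interpolation rule may differ slightly from the one SPSS applies.

```python
import statistics


def improved_boxplot_fences(values):
    """Return (f_down, f_up), the non-outlier intercept lines of formulas (11) and (12)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    siqr_down = q2 - q1  # formula (8)
    siqr_up = q3 - q2    # formula (9)
    if siqr_down <= 0 or siqr_up <= 0:
        raise ValueError("degenerate quartiles: skewness multiplier is undefined")
    bc = (siqr_up - siqr_down) / (siqr_up + siqr_down)  # formula (10)
    iqr = q3 - q1
    f_up = q3 + 1.5 * iqr * (1 + bc) / (1 - bc)    # formula (11)
    f_down = q1 - 1.5 * iqr * (1 - bc) / (1 + bc)  # formula (12)
    return f_down, f_up


def thermal_anomaly_mask(values):
    """Flag each value that lies above the non-outlier maximum intercept line."""
    _, f_up = improved_boxplot_fences(values)
    return [v > f_up for v in values]


# Example: nine ordinary pixels and one much hotter pixel (illustrative values).
mask = thermal_anomaly_mask([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Because (1 + Bc)/(1 − Bc) equals SIQR_up/SIQR_down, the upper intercept line moves outward for right-skewed data and inward for left-skewed data, which is what distinguishes it from the fixed 1.5 × IQR rule of the conventional box plot.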
2.3.3 Extracting thermal anomaly zone based on thermal anomaly frequency
The surface temperature of a thermal anomaly zone tends to be higher than that of its surroundings, forming a local heat island, and such zones generally recur more often than thermal anomalies caused by other factors. Therefore, based on the thermal anomaly extraction results, the annual frequency of occurrence of each thermal anomaly area is computed. To avoid missed detections, a thermal anomaly zone whose frequency of occurrence is greater than 60% is identified as an urban thermal anomaly zone, as shown in formula (13):

 $${f}_{\mathrm{l}\mathrm{o}\mathrm{c}\mathrm{a}\mathrm{l}}=\frac{{i}_{\mathrm{l}\mathrm{o}\mathrm{c}\mathrm{a}\mathrm{l}}}{{n}_{\mathrm{l}\mathrm{o}\mathrm{c}\mathrm{a}\mathrm{l}}}$$ （13）
Where: $${\mathrm{f}}_{\mathrm{local}}$$ represents the frequency of occurrence of thermal anomalies in a year; $${\mathrm{i}}_{\mathrm{local}}$$ represents the number of occurrences of a local thermal anomaly in that year; $${\mathrm{n}}_{\mathrm{local}}$$ represents the total number of occurrences of the overall thermal anomaly in that year.
To further explore the distribution of urban thermal anomaly areas, the frequency of occurrence of thermal anomalies over 2008–2017 is computed, and areas with a frequency greater than 60% are taken as the urban thermal anomaly areas for the 10-year period, as shown in formula (14):

 $${f}_{\mathrm{g}\mathrm{l}\mathrm{o}\mathrm{b}\mathrm{a}\mathrm{l}}=\frac{{i}_{\mathrm{g}\mathrm{l}\mathrm{o}\mathrm{b}\mathrm{a}\mathrm{l}}}{{n}_{\mathrm{g}\mathrm{l}\mathrm{o}\mathrm{b}\mathrm{a}\mathrm{l}}}$$ （14）
Where: $${\mathrm{f}}_{\mathrm{global}}$$ represents the frequency of occurrence of thermal anomalies over the 10 years; $${\mathrm{i}}_{\mathrm{global}}$$ represents the total number of occurrences of a local thermal anomaly in the 10 years; $${\mathrm{n}}_{\mathrm{global}}$$ represents the total number of occurrences of the overall thermal anomaly in the 10 years, that is, 10.
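Formulas (13) and (14) share the same form, so a single helper can illustrate both. The sketch assumes that each extraction result is a per-pixel list of booleans and that the denominator is the number of such results being combined (the images of one year for formula (13), the ten annual results for formula (14)); this reading of the denominator, and the function name, are assumptions of the sketch.

```python
def frequent_anomaly_mask(masks, threshold=0.6):
    """Keep pixels flagged as thermally anomalous in more than `threshold` of the masks.

    masks: list of equally long lists of booleans, one list per extraction result.
    """
    n = len(masks)
    n_pixels = len(masks[0])
    result = []
    for pixel in range(n_pixels):
        i = sum(1 for m in masks if m[pixel])  # occurrences at this pixel
        result.append(i / n > threshold)       # strictly greater than 60%
    return result


# Example: five extraction results for three pixels (illustrative flags).
masks = [
    [True, True, False],
    [True, True, False],
    [True, True, False],
    [True, False, False],
    [False, False, False],
]
urban_anomaly = frequent_anomaly_mask(masks)
```

In this example the first pixel is flagged in 4 of 5 results and is kept, while the second is flagged in exactly 3 of 5 and is dropped because the rule requires a frequency strictly greater than 60%.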
3.   Sample description
The steps above were used to produce the 2008–2017 Sanya thermal anomaly spatial distribution product data set. Figure 3 shows the spatial distribution of the thermal anomaly zones in Sanya City for each of the 10 years.

Figure 3   Spatial distribution results of thermal anomaly areas in Sanya City, 2008–2017 (panels (a) to (j) show the years 2008 to 2017 in order)
4.   Quality control and assessment
This data set is quality controlled through the following aspects:
To verify the effectiveness of the method, Google Earth, which has the advantage of providing imagery from previous years, is used for indirect verification: the total number of factory areas in each year of 2008–2017 is compared with the number of factory areas identified by the improved box plot method. The procedure is as follows. First, the factory areas are outlined on Google Earth for each year and counted. Second, the thermal anomalies identified by the improved box plot method are converted to kmz format and loaded into Google Earth, and the identified factories are counted year by year. Finally, because one factory area contains one or more factories, the factories identified by the improved box plot method are assigned to factory areas and the identified factory areas are counted; the ratio of the number of identified factory areas to the number of factory areas outlined on Google Earth is the accuracy. The final accuracy is affected by factories that have stopped production or do not emit enough heat to reach the thermal anomaly range, by human activities that raise the temperature of small areas into the thermal anomaly range, and by errors in the manual interpretation of factories. The annual accuracy is shown in Table 3.
Table 3   Accuracy table of thermal anomaly extraction in Sanya City, 2008–2017
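Year | Number identified | Number of identified factory areas | Number of factory areas | Precision (%)
2008 | 3 | 5 | 6 | 83.33
2009 | 5 | 5 | 6 | 83.33
2010 | 14 | 6 | 6 | 100
2011 | 8 | 6 | 6 | 100
2012 | 5 | 5 | 6 | 83.33
2013 | 7 | 5 | 6 | 83.33
2014 | 1 | 1 | 6 | 16.67
2015 | 7 | 6 | 6 | 100
2016 | 13 | 6 | 6 | 100
2017 | 8 | 6 | 6 | 100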
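The precision column of Table 3 follows from the ratio described above: identified factory areas divided by the factory areas outlined on Google Earth. A small sketch of that arithmetic, with a hypothetical function name:

```python
def precision_percent(identified_areas, factory_areas):
    """Ratio of identified factory areas to outlined factory areas, in percent."""
    return round(100.0 * identified_areas / factory_areas, 2)


# Three of the (identified areas, factory areas) pairs that occur in Table 3.
for identified, total in [(5, 6), (1, 6), (6, 6)]:
    print(identified, total, precision_percent(identified, total))
```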
5.   Data value
In this study, analyses of high-resolution datasets and well-log data were combined to characterize the long-term dynamics of thermal anomalies in Sanya City. The long time series of Landsat data, combined with the improved box plot method for thermal anomaly extraction, provides data services for city planning, construction management and urban ecological quality assessment. Studying and evaluating the thermal characteristics of Sanya's residential areas in order to improve environmental quality is valuable for town planning, as it offers better conditions for infrastructure development and has great potential for shaping scenic landscapes in urban development.
In addition, applying remote sensing technology to monitor and evaluate urban thermal anomalies can help alleviate the pressure of urban development on the environment, promote environmental protection and governance, and provide a technical reference for the construction of Sanya International Island. The results indicate that understanding the main thermal mechanisms is beneficial in the long term for identifying common anomalies found in thermography of the city's residential areas, their relation to construction and their probable causes. The long time series of urban thermal anomalies and the techniques behind this data set can also be promoted and applied in other cities.
Acknowledgments
We thank the urban land surface environment team for technical support, and the USGS and the China Air Quality Online Monitoring and Analysis Platform for data support.
[1] Yang Xinxing, Li Shilian, Qi Peng, et al. Thermal Pollution and Its Harm in the Environment[J]. Frontier Science, 2014, 8(3): 14-26.
[2] Qu Chen. Reflection of the Severe Summer——Interpretation of Urban Thermal Pollution[J]. Environmental Guide, 2003(19): 18-18.
[3] Guo Guixiang, Chen Dongwei. Understanding the Problem of Thermal Pollution[J]. Environmental Protection, 1994(5): 35-37.
[4] Wu C, Wang Q, Yang Z, et al. Monitoring heated water pollution of the DaYaWan nuclear power plant using TM images[J]. International Journal of Remote Sensing, 2007, 28(5): 885-890.
[5] Chen C Q, Shi P, Mao Q W. Application of Remote Sensing Techniques for Monitoring the Thermal Pollution of Cooling-Water Discharge from Nuclear Power Plant[J]. Environmental Letters, 2003, 38(8): 10.
[6] Liu Y, Hu C, Zhan W, et al. Identifying industrial heat sources using time-series of the VIIRS Nightfire product with an object-oriented approach[J]. Remote Sensing of Environment, 2017: S0034425717304820.
[7] Xia H, Chen Y, Quan J. A simple method based on the thermal anomaly index to detect industrial heat sources[J]. International Journal of Applied Earth Observation & Geoinformation, 2018(73): 627-637.
[8] Zhang L, Meng Q, Sun Z, et al. Spatial and temporal analysis of the mitigating effects of industrial relocation on the surface urban heat island over China[J]. ISPRS International Journal of Geo-Information, 2017, 6(121): 121.
[9] Murata A, Sasaki H, Hanafusa M, et al. Estimation of urban heat island intensity using biases in surface air temperature simulated by a nonhydrostatic regional climate model[J]. Theoretical & Applied Climatology, 2013, 112(1-2): 351-361.
[10] Yang X, Leung L R, Zhao N, et al. Contribution of urbanization to the increase of extreme heat events in an urban agglomeration in east China[J]. Geophysical Research Letters, 2017, 44. DOI: 10.1002/2017GL074084.
[11] Zhao Yingshi. Principles and Methods of Remote Sensing Application Analysis[M]. Beijing: Science Press, 2003.
[12] Zhai Zhihao, Li Wenjuan, Xu Bin, et al. Estimation of surface specific emissivity in terrestrial satellite TM6 band[J]. Remote Sensing for Land and Resources, 2004, 16(3): 28-32.
[13] Ding Feng, Xu Hanqiu. Surface Temperature Retrieval Algorithm and Experimental Analysis of TM Thermal Band Images[J]. Journal of Earth Information Science, 2006, 8(3): 125-130.
[14] Deng Shubin. ENVI Remote Sensing Image Processing Method. 2nd Edition[M]. Beijing: Higher Education Press, 2014.
[15] Sobrino J A, Jiménez-Muñoz J C, Soria G, et al. Land surface emissivity retrieval from different VNIR and TIR sensors[J]. IEEE Transactions on Geoscience and Remote Sensing, 2008, 46(2): 316-327.
[16] Carlson T N, Ripley D A. On the relation between NDVI, fractional vegetation cover, and leaf area index[J]. Remote Sensing of Environment, 1997, 62(3): 241-252.
[17] Jiménez-Muñoz J C, Cristóbal J, Sobrino J A, et al. Revision of the single-channel algorithm for land surface temperature retrieval from Landsat thermal-infrared data[J]. IEEE Transactions on Geoscience & Remote Sensing, 2009, 47(1): 339-349.
[18] Chander G, Markham B. Revised Landsat-5 TM radiometric calibration procedures and postcalibration dynamic ranges[J]. IEEE Transactions on Geoscience and Remote Sensing, 2003, 41(11): 2674-2677.
[19] Walker M L, Dovoedo Y H, Chakraborti S, et al. An improved boxplot for univariate data[J]. American Statistician, 2018: 1-13.
[20] Hubert M, Vandervieren E. An adjusted boxplot for skewed distributions[J]. Computational Statistics & Data Analysis, 2008, 52(12): 5186-5201.
Article and author information
Meng Q, Gu Y, Hao L, et al. Remote sensing data for urban thermal anomaly in Sanya City [J/OL]. China Scientific Data 4(2019). DOI: 10.11922/csdata.2018.0077.zh.
Meng Qingyan
Mainly responsible for data set design and technical guidance work.
(1971-), male, Hedong, Dongdong, Ph.D., researcher, doctoral tutor, mainly engaged in urban land surface environment remote sensing and seismic infrared remote sensing research.
Gu Yanchun
Mainly responsible for data processing and analysis.
(1993-), female, Zhoukou, Henan, master student, the main research direction is urban thermal environment.
Hao Lichun
Mainly responsible for data processing and analysis.
(1993-), female, Shanxi Yangquan, master student, the main research direction is urban thermal environment.
Hu Die
(1994-), female, Tianjin native, doctoral student, the main research direction is urban thermal environment.
Zhang Ying
Mainly responsible for data set design and technical guidance work.
(1994-), male, Wuhan, Hubei, Ph.D. student, the main research direction is urban thermal environment.
Zhang Linlin
Mainly responsible for data set design and technical guidance work.
(1994-), female, Hengshui, Hebei, Ph.D. student, the main research direction is urban thermal environment.
Publication records
Published: May 16, 2019 (Version EN1)
Released: Dec. 12, 2018 (Version ZH1)
Published: May 16, 2019 (Version ZH2)