The data processing workflow of the dataset is shown in Figure 2, and the specific steps are as follows:

**Figure 2** Production and preparation workflow of the LST inversion dataset for the China-Pakistan Economic Corridor

**Step 1** Screen the Landsat 8 base data by data quality: remove scenes with cloud cover above 20%, then process the OLI and TIRS sensor data separately.
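The screening in Step 1 can be sketched as a simple filter on scene-level cloud cover. This is a minimal illustration, not the authors' processing code: the `Scene` record, its field names and the sample values are hypothetical, and the cloud cover percentage is assumed to come from each scene's metadata.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical stand-in for a Landsat 8 scene record."""
    scene_id: str
    cloud_cover: float  # scene-level cloud cover in percent (assumed to be read from metadata)


def screen_scenes(scenes, max_cloud_cover=20.0):
    """Keep only scenes whose cloud cover does not exceed the threshold (percent)."""
    return [s for s in scenes if s.cloud_cover <= max_cloud_cover]


# Made-up sample records for illustration only.
scenes = [Scene("scene_a", 3.2), Scene("scene_b", 20.0), Scene("scene_c", 47.5)]
kept = screen_scenes(scenes)
print([s.scene_id for s in kept])
```

Scenes that pass the filter would then be split into their OLI and TIRS bands for the subsequent steps.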

**Step 2** Calculate the spectral radiance \({L}_{\lambda }\) in TIRS bands 10 and 11 according to the Landsat 8 data handbook^{[4]}, and compute the at-sensor brightness temperature *BT* from the radiance values. Retrieve the atmospheric water vapor content *ω* with the split-window covariance-variance ratio algorithm, so as to correct for the effects of the atmosphere and of land surface emissivity. The algorithm is as follows:

\[\omega = c_{0} + c_{1} \frac{\tau_{j}}{\tau_{i}} + c_{2} \left(\frac{\tau_{j}}{\tau_{i}}\right)^{2} \qquad (1)\]

\[\frac{\tau_{j}}{\tau_{i}} = \frac{\epsilon_{i}}{\epsilon_{j}} R_{j,i}, \quad R_{j,i} = \frac{\sum_{k=1}^{N} \left(BT_{i,k} - \overline{BT}_{i}\right)\left(BT_{j,k} - \overline{BT}_{j}\right)}{\sum_{k=1}^{N} \left(BT_{i,k} - \overline{BT}_{i}\right)^{2}} \qquad (2)\]

In Equations (1) and (2), \(\tau_{i}\) and \(\tau_{j}\) are the atmospheric transmittances in band 10 and band 11, respectively; \(\epsilon_{i}\) and \(\epsilon_{j}\) are the corresponding surface emissivities; \(\overline{BT}_{i}\) and \(\overline{BT}_{j}\) are the mean brightness temperatures of band 10 and band 11 within the moving window; *k* is the index of a pixel within the window; and *N* is the number of pixels in the window. \(c_{0}\), \(c_{1}\) and \(c_{2}\) are regression coefficients obtained from simulations with the atmospheric radiative transfer model MODTRAN and the atmospheric profile database TIGR (Thermodynamic Initial Guess Retrieval).
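Step 2 can be sketched for a single moving window as follows. This is a minimal illustration under stated assumptions, not the dataset's production code. The radiance and brightness temperature conversions use the standard Landsat 8 forms \(L_{\lambda} = M_{L} Q_{cal} + A_{L}\) and \(BT = K_{2} / \ln(K_{1}/L_{\lambda} + 1)\), which the text references but does not write out. The rescaling factors and thermal constants should be read from each scene's metadata; the numbers below are illustrative. The coefficients `c0`, `c1`, `c2`, the emissivities and the window values are placeholders, not the MODTRAN/TIGR regression results.

```python
import math


def radiance(dn, mult, add):
    """Spectral radiance from a quantized pixel value: L = M_L * Q_cal + A_L."""
    return mult * dn + add


def brightness_temperature(rad, k1, k2):
    """At-sensor brightness temperature in kelvin: BT = K2 / ln(K1 / L + 1)."""
    return k2 / math.log(k1 / rad + 1.0)


def covariance_ratio(bt_i, bt_j):
    """R_{j,i} of Eq. (2): covariance of the two bands over the variance of band i."""
    n = len(bt_i)
    mean_i = sum(bt_i) / n
    mean_j = sum(bt_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(bt_i, bt_j))
    var = sum((a - mean_i) ** 2 for a in bt_i)
    if var == 0.0:
        raise ValueError("band i is constant in this window; the ratio is undefined")
    return cov / var


def water_vapor(bt_i, bt_j, eps_i, eps_j, c0, c1, c2):
    """Eq. (1), with the transmittance ratio tau_j / tau_i taken from Eq. (2)."""
    ratio = (eps_i / eps_j) * covariance_ratio(bt_i, bt_j)
    return c0 + c1 * ratio + c2 * ratio ** 2


# Illustrative band 10 constants; read the actual values from the scene metadata.
MULT_B10, ADD_B10 = 3.342e-4, 0.1
K1_B10, K2_B10 = 774.8853, 1321.0789
bt_example = brightness_temperature(radiance(25000, MULT_B10, ADD_B10), K1_B10, K2_B10)

# Placeholder window of brightness temperatures (K) and placeholder coefficients.
window_b10 = [290.0, 291.0, 292.0]
window_b11 = [289.0, 289.5, 290.0]
omega = water_vapor(window_b10, window_b11, eps_i=1.0, eps_j=1.0, c0=1.0, c1=2.0, c2=4.0)
print(bt_example, omega)
```

In practice the window would slide over the whole scene and an array library would replace the explicit loops; the list-based form is kept here only to mirror Equations (1) and (2) term by term.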

**Step 3** Surface emissivity measures how efficiently a surface emits thermal radiation, and different underlying surfaces have different emissivity values (0–1). The dataset adopts the vegetation cover weighting method^{[6]}^{[7]} to obtain the surface emissivity *ε* from bands 4 and 5 of Landsat 8 OLI.

\[\varepsilon = \epsilon_{v} FVC + \epsilon_{s} \left(1 - FVC\right) + 4 \langle d\epsilon \rangle FVC \left(1 - FVC\right) \qquad (3)\]

\[FVC = \left(\frac{NDVI - NDVI_{s}}{NDVI_{v} - NDVI_{s}}\right)^{2} \qquad (4)\]

\[NDVI = \frac{\rho_{5} - \rho_{4}}{\rho_{5} + \rho_{4}} \qquad (5)\]

In Equations (3) to (5), the vegetation component emissivity \(\epsilon_{v}\) and the background surface emissivity \(\epsilon_{s}\) are taken from a spectral database. \(\langle d\epsilon \rangle\) is the cavity effect term produced by multiple scattering between components within a pixel. *FVC* is the fractional vegetation cover. *NDVI*_{s} and *NDVI*_{v} are the values of the normalized difference vegetation index (NDVI) for bare soil and dense vegetation, respectively. \(\rho_{4}\) and \(\rho_{5}\) are the atmospherically corrected surface reflectances of the Landsat 8 red band (band 4) and near-infrared band (band 5).
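Equations (3) to (5) can be chained per pixel as in the sketch below. It is a minimal illustration rather than the authors' implementation: the values used for `ndvi_s`, `ndvi_v`, `eps_v`, `eps_s` and `d_eps` are placeholders chosen for the example, and clamping the scaled NDVI to the range 0 to 1 is an added assumption that the text does not state.

```python
def ndvi(rho_nir, rho_red):
    """Eq. (5): NDVI from band 5 (near-infrared) and band 4 (red) surface reflectance."""
    return (rho_nir - rho_red) / (rho_nir + rho_red)


def fvc(ndvi_value, ndvi_s, ndvi_v):
    """Eq. (4): fractional vegetation cover from NDVI."""
    scaled = (ndvi_value - ndvi_s) / (ndvi_v - ndvi_s)
    # Assumption (not in the text): clamp so pixels outside [ndvi_s, ndvi_v] stay in [0, 1].
    scaled = min(max(scaled, 0.0), 1.0)
    return scaled ** 2


def emissivity(fvc_value, eps_v, eps_s, d_eps):
    """Eq. (3): vegetation-cover-weighted emissivity with the cavity effect term."""
    return (eps_v * fvc_value
            + eps_s * (1.0 - fvc_value)
            + 4.0 * d_eps * fvc_value * (1.0 - fvc_value))


# Placeholder parameters for illustration only.
NDVI_S, NDVI_V = 0.2, 0.5
EPS_V, EPS_S, D_EPS = 0.99, 0.97, 0.005

pixel_ndvi = ndvi(rho_nir=0.27, rho_red=0.13)  # 0.14 / 0.40 = 0.35
pixel_fvc = fvc(pixel_ndvi, NDVI_S, NDVI_V)
pixel_eps = emissivity(pixel_fvc, EPS_V, EPS_S, D_EPS)
print(pixel_ndvi, pixel_fvc, pixel_eps)
```

The cavity term vanishes for fully bare (`FVC = 0`) and fully vegetated (`FVC = 1`) pixels, so the result reduces to \(\epsilon_{s}\) and \(\epsilon_{v}\) at those two limits.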

**Step 4** The surface thermal radiative transfer equation is the basis for retrieving land surface temperature by remote sensing. Based on the theory of surface thermal radiation, the set of LST inversion models includes the single-channel radiative transfer equation model (SC1)^{[8]}, the single-channel emissivity correction model (SC2)^{[9]}, the two-channel split-window algorithm (SW)^{[10]} and the data fusion algorithm (DF). The detailed derivations and parameters are not repeated here; readers can refer to the literature cited for each algorithm. The main formulas are as follows: