"Pool mirroring the sky and clouds. From the source run fresh currents" — Introducing China Scientific Data
Guo Huadong, Editor in Chief
Academician of the Chinese Academy of Sciences
A new era for humankind is dawning: the era of big data. As a revolutionary innovation that transforms our understanding of scientific productivity, big data not only brings new methodologies into scientific research but also creates a research paradigm that fosters a new way of thinking in the pursuit of scientific discovery. China Scientific Data is launched at the confluence of this global trend and its incorporation into national strategy.
From Kepler's formulation of the laws of planetary motion some four hundred years ago and Thomson's discovery of the first subatomic particle in the late 19th century to today's development of new medicines and of digital earth, all known scientific progress has relied on the analysis and use of data collected directly or derived from experiments, observations, surveys, measurements, and simulations. It follows that the core objective of information technology is to improve human capabilities in the collection, transmission, storage, management and analysis of data and information. Indeed, these improvements have brought about an explosion of data since the turn of the millennium, as scientific research has become increasingly data-intensive and data-centered, pushing it into the big data era. "Big data is a strategic highland in the era of knowledge-driven economy, and it is also a new type of strategic resource for all nations [...] the competition around big data will not only determine international patterns in the information industry, but it will also profoundly impact economic development, national security, scientific and technical progress, and comprehensive competitiveness" [1].
Data has become a touchstone of the value of scientific research. On the one hand, new discoveries in many disciplines are not only based on data but also aim to uncover new data and integrate them into important findings using mining and analytical tools. On the other hand, data enables scientists to reproduce experiments and validate research findings. New explorations in the biological sciences, high-energy physics, digital earth, global change, deep space exploration, and other fields have consistently revealed the inextricable connection between big data and big science. Scientific data has moved out of the cabinet and into the spotlight, becoming a key to scientific research that opens up new possibilities for data-driven discovery. As a branch of big data, scientific big data underpins data-intensive science, a paradigm that follows the experimental, theoretical and computational paradigms. The shift from model-driven to data-driven research brings about innovations in scientific methodology [2]. Scientific big data has played and will continue to play an important role in big science, and greater contributions to scientific discovery can be expected [3].
The rapid development of modern science has benefited greatly from open exploration. Since the publication of the world's first scientific journals in the 17th century, a mechanism of open exchange has gradually been established for making research findings public, which gives science great powers of self-correction and mobilization. However, constrained by earlier media and communication technologies, publications, typified by journal articles, had to reduce or even omit the data that supported their arguments. With the rapid development of information technology, the volumetric, spatial and temporal limitations of information storage and dissemination have been overcome. This has lowered the costs of storing and disseminating data and created the technical conditions for making scientific evidence public, which now seems all the more urgent and imperative. Open scientific data helps to improve the validity and credibility of research findings, thus ensuring the self-correction of science. It also provides rich foundational material for a wider range of scientific activities, expands the scope and dimensions of scientific research, and produces knowledge that industry can develop and use to generate unforeseen social and economic value. The open data initiative therefore calls for the attention of academia and all its stakeholders, including governments, funding agencies, publishers and the general public [4, 5].
To date, a number of international organizations (e.g., OECD [6], ICSU [7], GEO [8]) and countries have released policy documents on the sharing of scientific data. For instance, the US government promulgated Circular No. A-130 [9] and Open Data Policy – Managing Information as an Asset [10]. US funding agencies such as the National Science Foundation [11] and the National Institutes of Health [12] have also issued policies on data openness. China is no exception: data sharing has been incorporated into national strategy. The Action Plan to Promote the Development of Big Data [13], released by the State Council in August 2015, set a clear goal to "actively promote the gradual openness and sharing of scientific data obtained or produced from publicly funded non-profit research activities". In response to government policies, many institutions have formulated their own data management and sharing rules, such as the New York University Policy on Retention of and Access to Research Data [14] and the University of Cambridge Research Data Management Policy Framework [15]. The publishing industry also increasingly recognizes the need for scientific data sharing, and some publishers (e.g., Science [16], Nature [17], BMC [18], PLoS [19]) have revised their journal publication policies accordingly. Policies at all levels thus jointly contribute to making open data a shared norm for future scientific research [5, 20].
In practice, scientific data sharing currently takes several exploratory forms: open databases or data platforms built by large-scale international collaborative projects or organizations (e.g., ICSU World Data System, GEOSS Portal, Worldwide Protein Data Bank), government-funded national scientific data centers or systems (e.g., NASA data centers [21], NERC data centers [22], Platform of the National Science and Technology Infrastructures of China), institutional repositories for data storage and sharing (e.g., Oxford ORA-Data, Harvard Dataverse), publishers' requirements that authors submit and make public their supporting data, and general-purpose data repositories and sharing platforms (e.g., Dryad, Figshare, ScienceDB). Although encouraging progress has been made in both policy and practice, overall effectiveness has not yet reached the expected level.
The professionalization of modern science has accelerated scientific progress, and a metrics-based evaluation mechanism has been established to assess scientists' contributions and academic credibility. Today, the importance of open data is widely recognized by scientists and stakeholders, but the evaluation mechanism has remained largely unchanged, a fundamental problem that frequently undermines open data efforts [23]. In addition, the lack of open tools and platforms dedicated to rapid publication constitutes another major challenge. There is demand for a multidisciplinary publishing mode that works across multiple systems, and for an integrated platform for sharing data or for the integrated publishing of multiple kinds of data [24]. Researchers have consequently proposed the concept of data publishing, which aims to make data citable and permanently accessible while helping to incorporate data into the existing evaluation system. Data publishing is hence believed to have the potential to resolve the fundamental predicament of data sharing [25]. Its basic procedure borrows from that of traditional academic journals [26]. There are currently three forms of data publishing: data-as-attachment publication, article-as-descriptor publication, and standalone data publication [27]. As a newly emerged form, article-as-descriptor publication fits well into the current scientific evaluation system and seeks to promote data openness by aligning the interests involved.
Scientific data publishing provides a new perspective. Through the publishing medium, readers can more easily discover, access, understand, reanalyze, reuse, and cite data. Indeed, some publishers have already acted, revising their journal publication policies and introducing new requirements for publishing supporting data. Among these forms, standalone data publication opens up a new mode of exploration. It inherits the traditional system for releasing discoveries and balances the interests of stakeholders so as to make data more discoverable, citable, intelligible and reusable.
The Computer Network Information Center of the Chinese Academy of Sciences, together with the Chinese National Committee for CODATA, has long been devoted to the open data initiative. Adopting the latest data publishing mode, they jointly launched China's first professional data journal, China Scientific Data. By driving data publishing through the publication of data papers, the Journal aims to make data discoverable, intelligible, citable, reusable, evaluable and permanently accessible, in the hope of advancing scientific data sharing in China and contributing to the development of data science internationally. The Journal is among the first online serial publications in China, and its experience is expected to shed light on the country's online publishing industry at large.
Data publishing is a mobilizing force for the scientific community: it allows us to see the big data captured by publicly funded projects. Data publishing is a touchstone for the scientific community: the open access format of data papers and datasets will help to gradually expand the value of scientific data. Data publishing is a detector of scientific frontiers: it enables us to better perceive the innovative achievements made in different fields at home and abroad. "Pool mirroring the sky and clouds. From the source run fresh currents". By awakening long-dormant data, China Scientific Data hopes to inject fresh currents into the scientific community, providing data services and charting new territory in the field of data publishing.
1. Guo H, Wang L & Liang D. Big earth data from space: a new engine for Earth science. Science Bulletin 61 (2016): 505 – 513.
2. Guo H, Wang L, Chen F et al. Scientific big data and digital earth. Chinese Science Bulletin 59 (2014): 5066 – 5073.
3. Guo H. Big data, big science, big discovery: review of CODATA Workshop on Big Data for International Scientific Programmes. Bulletin of Chinese Academy of Sciences 29 (2014): 500 – 506.
4. The Royal Society. Science as an Open Enterprise, trans. He W et al. Shanghai: Shanghai Jiao Tong University Press, 2015.
5. Gu L. Studies on the Open Access Policy for Scientific Data. Beijing: Scientific and Technical Documentation Press, 2016.
6. OECD. OECD Principles and Guidelines for Access to Research Data from Public Funding. Available at: [Accessed May 27th, 2016].
7. WDS. Data Sharing Principles. Available at: [Accessed May 27th, 2016].
8. GEO-VI. Implementation Guidelines for the GEOSS Data Sharing Principles. Available at: [Accessed May 27th, 2016].
9. The White House. Circular No. A-130. Available at: [Accessed May 27th, 2016].
10. Executive Office of the President of the United States. Open Data Policy – Managing Information as an Asset. Available at: [Accessed May 27th, 2016].
11. The National Science Foundation. Proposal and Award Policies and Procedures Guide. Available at: [Accessed May 27th, 2016].
12. National Institutes of Health. NIH Data Sharing Policy and Implementation Guidance. Available at: [Accessed May 27th, 2016].
13. State Council of the People's Republic of China. Action Plan to Promote the Development of Big Data. Available at: [Accessed May 27th, 2016].
14. New York University. Policy on Retention of and Access to Research Data. Available at: [Accessed May 27th, 2016].
15. University of Cambridge. Research Data Management Policy Framework. Available at: [Accessed May 27th, 2016].
16. American Association for the Advancement of Science. Science: Editorial Policies. Available at: [Accessed May 27th, 2016].
17. Springer Nature. Availability of Data, Material and Methods. Available at: [Accessed May 27th, 2016].
18. BioMed Central. Open Data. Available at: [Accessed May 27th, 2016].
19. PLOS. Data Availability. Available at: [Accessed May 27th, 2016].
20. Hou Y & Hu L. Data policy development, in Scientific Discovery in Big Data Era, 199 – 209, ed. The Chinese National Committee for CODATA. Beijing: Science Press, 2014.
21. NASA Administrator. Data from NASA's Missions, Research, and Activities. Available at: [Accessed May 27th, 2016].
22. NERC. Data Centres. Available at: [Accessed May 27th, 2016].
23. Fecher B, Friesike S, Hebing M et al. A Reputation Economy: Results from an Empirical Survey on Academic Data Sharing (February 2015). Available at: [Accessed May 27th, 2016].
24. Guo H, Chen R, Xu Z et al. Big data in natural sciences, humanities and social sciences: review of the 6th Exploratory Round Table Conference. Bulletin of Chinese Academy of Sciences 31 (2016): 707 – 716.
25. Wu L, Wang L, Nan Z et al. Scientific data publication: a review and framework. Remote Sensing Technology and Application 28 (2013): 383 – 390.
26. Zhang X & Li X. Key theoretic and practical issues about data publication. Chinese Journal of Scientific and Technical Periodicals 26 (2015): 813 – 821.
27. Zhang X, Shen Z & Liu F. Scientific data and document interoperability, in Scientific Discovery in Big Data Era, 149 – 158, ed. The Chinese National Committee for CODATA. Beijing: Science Press, 2014.
How to cite this article: Guo H. "Pool mirroring the sky and clouds. From the source run fresh currents" — Introducing China Scientific Data. China Scientific Data 1 (2016), DOI: 10.11922/csdata.0.2016.0014