Sep 17, 2021 (by Jesse Stone)
Part of a series on PDS4
Let’s start with a story. Lunar Orbiter was a series of missions that ran from 1966-1967, in support of the Apollo missions. The data was no longer considered useful after Apollo, so the tapes containing the data were forgotten for 20 years. After 20 years, the tapes would have been discarded if not for Nancy Evans, an archivist at the Jet Propulsion Laboratory. She began the Lunar Orbiter Image Recovery Project, intended to retrieve and preserve the data from these tapes. It was difficult to retrieve the data from the tapes, since the tapes required special readers that were no longer in production. Additionally, the recovered data was not usable at first, and required additional special hardware to convert to a digital format. The first images from this project were not recovered until 2008, over 40 years after the mission first launched.
There are many lessons that can be learned from this story. While the tapes themselves were initially archived, the hardware to read them was not, and those tapes were never converted to new formats. Then, they were nearly discarded. It is only a long chain of hard work and luck that made it possible to recover the data.
All data has value beyond its intended scope. On top of this, it is irreplaceable. Even if you had the money to launch a new mission, you would never reproduce data that was lost.
This is the mission of the PDS: to prevent data from being destroyed, lost, or forgotten.
What Can Happen to Data
Without being actively archived and maintained, data is vulnerable to many risks. It can be physically destroyed or deleted, it can be made inaccessible when the tools to read it are gone, or it can become incomprehensible when the knowledge to interpret it is lost. Finally, even if it is readable, it might become unusable if the context around the observations are lost. A picture of an asteroid is nothing more than a wallpaper if you don’t know where and when it was taken.
There are some ways to protect against these risks. Physical destruction can be prevented by keeping multiple copies of the data. The protection is better if the copies are far apart. Physical obsolescence can be protected against by either keeping all of the equipment needed to read media, or regularly moving data to modern media. Logical obsolescence can be protected against by keeping copies of all of the important documentation with the data, and keeping the data in formats that are likely to be supported in the future.
How Does the PDS Protect Data?
Archiving your data in the PDS is an important and easy way to protect it. Once it is archived, PDS will take over maintenance of the data files. PDS will also take steps to physically protect the data, keep the data on readable media, and ensure that the data remains documented well enough to be usable.
PDS prefers open formats, especially those defined by a standards body like ISO or ANSI. PDS prefers to keep data formats as simple as possible. This means that the data products that are archived the most are images and data tables. The advantage to this approach is that anybody can read the data with simple programming tools. This simplicity ensures that specialized knowledge required to read in the data is not lost to time.
The data in PDS is self contained. Whenever possible, critical artifacts, such as documentation, are included with the data products. This ensures that everything needed to understand the data is readily available. Naturally, this documentation is stored in a long-lasting format, either in plain-text, which is as simple as possible, or as a PDF/A, a rigorously defined international standard designed for long-term archiving.
PDS requires that all of its archived data comes with comprehensive metadata. The metadata documents everything about a data file, including its internal structure, the chain of instruments that created the data, and time and place where the data was created. This prevents data from being misinterpreted.
PDS maintains the files themselves and ensures that they are kept on readable media. Currently, all data is stored on servers accessible over the internet via HTTPS. Data will be moved to other, more modern formats as needed. PDS also ensures that the metadata is kept up to date and readable. PDS nodes are currently hard at work migrating older data to the latest version of the PDS standard.
PDS also takes steps to ensure that data is not physically destroyed. Multiple copies of the data are kept in geographically separated regions. This is accomplished in many ways, such as mutual hosting between PDS nodes, using cloud services such as Amazon AWS, or storing data in a deep archive such as NSSDCA. This ensures that a natural disaster or cyberattack does not destroy every copy of the data. In addition, data is regularly checked for integrity. This ensures that the data has not been altered, accidentally or otherwise.
Making Data Accessible
Finally, PDS ensures that your data remains accessible to the world. It does so by providing a central repository of data along with a variety of search services. This helps, because data is only useful if it can be found. It’s also better if everyone in the community can find the same data. To make finding data easier, PDS assigns a unique identifier to every data product. This ensures each product can be easily discovered and referenced called a LID. PDS also provides DOIs for data, ensuring that other organizations can find data.
Who Can Get Data from the PDS?
Data archived in PDS is available to everyone with no limits on distribution. It is usable by all communities including researchers, students, and citizen scientists. Data in the PDS is available for use by other projects and can be referenced when writing proposals for funding.
Who Can Add Data to the PDS?
A variety of groups can archive data through the PDS as well. Scientists may archive their results in the PDS, which is analogous to publishing the data. Larger projects, such as missions and ground-based surveys, may archive their data. Regardless of who submits data to the PDS, it is subjected to a rigorous peer-review process ensuring that the data and metadata are high-quality and scientifically useful.
Spacecraft data is very valuable and often irreplaceable. It is also very vulnerable to a variety of risks that could make it inaccessible to the community and to future generations. The PDS provides an archive that will protect data from these risks and makes the data available to the scientific community and to the world at large. Anyone who produces planetary data should consider archiving their data with the PDS in order to keep it safe and accessible.