Astronomical Techniques - Data Archives

Astronomy has a long history of exploiting data originally taken by someone else or for another purpose, going back to the imperial Chinese records of nova and supernova outbursts, through the plate vaults at Lick, Mt. Wilson, and Palomar, and brought to real public availability by digital data and networking. For many astronomers, it was the IUE experience that brought home the value of a digital archive - by the end of the mission, more than 10⁵ spectra had been taken, and the archive was producing more published results than were new data. The WWW has added value to the whole concept. Now that data are digital, they can be duplicated exactly (unlike photographic plates), and the HTTP interface avoids the plethora of individual account authorizations that were briefly common in the early 1990s. Networks are fast enough that shovelling GBytes of data around is routine (although there is always a race with the trendiest datasets swamping available bandwidth, now heading for 10-Tbyte/night volumes).

A useful data archive must be filled - that is, the actual data must be routinely retrievable. It must be searchable, preferably indexed on useful quantities such as position, time, and perhaps target identifiers. The spherical nature of celestial coordinates makes efficient position searches more complex than commercial databases support off the shelf. It must be documented - the user wasn't there at the time, so every factor that might affect data quality or interpretation must be saved along with the actual data bits. This makes data from space missions easier to archive usefully than data from many Earthbound sites with changeable atmospheric conditions.
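To make the spherical-geometry point concrete, here is a minimal cone-search sketch. It assumes a hypothetical in-memory catalog of (name, RA, Dec) tuples in degrees and scans it linearly; real archives use dedicated spatial indexing instead. The declination band is a cheap prefilter that is valid everywhere on the sky, while the exact great-circle test (haversine formula) copes with the RA wraparound at 0°/360° and with the convergence of RA lines near the poles, both of which trip up a naive rectangular box in (RA, Dec).

```python
import math
from typing import List, Tuple

# Hypothetical catalog rows: (name, ra_deg, dec_deg). Toy positions only.
Source = Tuple[str, float, float]

TOY_CATALOG: List[Source] = [
    ("A", 359.9, 0.0),    # just west of RA = 0
    ("B", 0.05, 0.0),     # just east of RA = 0
    ("C", 10.0, 0.0),
    ("D", 180.0, 89.95),  # near the pole, on the "other side" in RA
]


def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog: List[Source], ra0: float, dec0: float,
                radius_deg: float) -> List[str]:
    """Return names of catalog entries within radius_deg of (ra0, dec0)."""
    dec_lo, dec_hi = dec0 - radius_deg, dec0 + radius_deg
    hits = []
    for name, ra, dec in catalog:
        # Prefilter: a declination band never has RA-wrap or pole problems.
        if not (dec_lo <= dec <= dec_hi):
            continue
        # Exact test on the sphere.
        if angular_sep_deg(ra0, dec0, ra, dec) <= radius_deg:
            hits.append(name)
    return hits
```

With this toy catalog, a 0.2° search at (0°, 0°) finds the sources on both sides of RA = 0°, and a search at Dec = 89.9° finds source D even though it lies 180° away in RA.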

The standard in archiving is set by the HST project, whose data are all uniform and of identifiable quality, and which has more resources for archive management than anyone else. The archive offers not only a range of search and sorting options, but also image previews and an outline of the field superimposed on the Digital Sky Survey. This works so well that STScI also manages the archives from the International Ultraviolet Explorer (IUE), Extreme Ultraviolet Explorer (EUVE), Far-UV Spectroscopic Explorer (FUSE), and Copernicus, all available through the same STScI archive. STScI is also the server for the VLA FIRST 21-cm sky survey.

X-ray and γ-ray data are mostly to be found at the High Energy Astrophysics Science Archive Research Center (HEASARC).

Infrared data may be found at the IPAC IRSA site. IRAS and Akari had nearly full-sky coverage with a polar ecliptic scanning scheme. 2MASS and WISE used many short exposures, stepping the telescope in between, while ISO, Herschel, and Spitzer were more traditional observatories with many small targeted fields for imaging and spectroscopy. The actual ISO archive interface is at the ESA ISO Data Centre.

For ground-based data, the most extensive archive is the Digitized Sky Survey, comprising the Palomar Sky Survey and the ESO-SRC survey of the southern sky. There are two versions available, plus a CDROM set. The WWW sources are Skyview (http://skyview.gsfc.nasa.gov/skyview.html) and the HST proposal-preparation site (http://stdatu.stsci.edu/dss/dss_form_phase2.html). The PSS had a pivotal role in the astronomy of the 1960s, and is worth knowing about in some detail. The Sloan Digital Sky Survey (SDSS, http://www.sdss.org) has had a similar impact at the start of the 21st century.

The PSS used the 1.2-m Schmidt at Palomar, deliberately built to survey potential targets for the Hale telescope. On 14-inch plates, it covered slightly more than a 6°×6° field at once, so the survey has strips centered at δ = 0°, ±6°, ±12°..., photographed in red (E) and blue (O) light. This is where the Abell and Zwicky catalogs came from (and in fact some of the plates were taken by George Abell as a graduate student, so he got the first look at many fields). The POSS was a huge advance, magnitudes deeper than the earlier photographic surveys. The limiting magnitude was about 21, though the depth you actually see depends on the generation of glass or paper copy you can find. This was complemented in the 1980s by a southern survey (originally from -15° southward, later extended to the equator), a joint effort using the 1-m ESO Schmidt (red-light F plates) and the 1.2-m UK Schmidt in Australia (blue J plates). Better corrector plates and emulsions make these data about a magnitude deeper than the original sky survey, and because the fields are on 5° centers, they overlap better. This scheme was repeated for POSS-II, whose film copies are still being distributed; for it, the Palomar Schmidt was optically upgraded (and renamed the Oschin Schmidt), and the new III-class plates were used. The original survey was reviewed by Lund and Dixon (1973 PASP 85, 230). Some catalogs still list rectangular (x,y) coordinates on the PSS prints, since celestial coordinates were not easy to use in the days before digitized images (you can still find the plastic coordinate overlays for each image). In the years just before POSS-II, a short-exposure yellow-light survey ("Quick V") was done from Palomar to produce the HST Guide Star Catalog, reducing the effects of proper motion accumulated since the late 1950s. The southern surveys were recent enough not to need such a repeat performance.
There are efforts to catalog objects from scans of the POSS plates, such as the USNO catalogs, Minnesota APS effort and the Edinburgh COSMOS group.
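Converting a measured plate position (x, y) into celestial coordinates usually goes through standard coordinates (ξ, η) on the tangent plane, followed by the inverse gnomonic projection about the plate center. The sketch below assumes the simplest possible plate model: a pure scale, with x increasing east and y increasing north, and no rotation, tilt, or distortion terms. Real reductions instead fit plate constants to reference stars. The default scale of 67.2 arcsec/mm corresponds to the Palomar Schmidt's roughly 3-m focal length; treat it, and the function names, as illustrative assumptions.

```python
import math

# Assumed plate scale in arcsec per mm (usual figure for the Palomar Schmidt),
# used here only as an illustrative default.
DEFAULT_SCALE = 67.2


def plate_to_standard(x_mm: float, y_mm: float, scale: float = DEFAULT_SCALE):
    """Plate offsets from the tangent point -> standard coordinates (radians).

    Hypothetical minimal plate model: x increases east, y increases north,
    no rotation, tilt, or optical distortion terms.
    """
    rad_per_mm = math.radians(scale / 3600.0)
    return x_mm * rad_per_mm, y_mm * rad_per_mm


def standard_to_radec(xi: float, eta: float, ra0_deg: float, dec0_deg: float):
    """Inverse gnomonic (tangent-plane) projection about (ra0, dec0), in degrees."""
    ra0, dec0 = math.radians(ra0_deg), math.radians(dec0_deg)
    denom = math.cos(dec0) - eta * math.sin(dec0)
    ra = ra0 + math.atan2(xi, denom)
    dec = math.atan2(math.sin(dec0) + eta * math.cos(dec0), math.hypot(xi, denom))
    # Wrap RA into [0, 360) degrees.
    return math.degrees(ra) % 360.0, math.degrees(dec)


def plate_to_radec(x_mm: float, y_mm: float, ra0_deg: float, dec0_deg: float,
                   scale: float = DEFAULT_SCALE):
    """Convenience wrapper: plate (x, y) in mm -> (RA, Dec) in degrees."""
    xi, eta = plate_to_standard(x_mm, y_mm, scale)
    return standard_to_radec(xi, eta, ra0_deg, dec0_deg)
```

At this assumed scale, 10 mm on the plate corresponds to 672 arcsec, or about 0.19°, so a 6° field spans a little under 9 cm.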

For much of the northern sky in particular, the leading optical survey role is now taken by the Sloan Digital Sky Survey (SDSS). This used a wide-field 2.5-m telescope to scan over 1/4 of the sky in five bands (ugriz), generating extensive photometric catalogs that are tightly cross-calibrated between great-circle scans, and then selected objects from these images for spectroscopy (over a million for the initial survey, more in later sequels). All of this is available through SQL database queries, returning either derived properties or the full 2D FITS data. Important aspects have been timed public data releases, extensive data-quality control, and the team's systematic effort to make products "better than they have to be" for the initial science goals (enabling the widest range of additional uses). Smaller-area, deeper surveys have also been done for more specialized purposes - maybe your target is in one of them!
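As a flavor of what an SQL catalog query looks like, here is a self-contained sketch using Python's built-in sqlite3 module. The table `photo` and its columns are a hypothetical toy imitation of a five-band photometric catalog, not the actual SDSS schema, and the magnitudes are made-up placeholder values. The point is only that derived properties such as colors can be computed and filtered inside the query itself.

```python
import sqlite3


def make_toy_catalog() -> sqlite3.Connection:
    """In-memory table loosely modelled on a ugriz photometric catalog (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo (objid INTEGER PRIMARY KEY, ra REAL, dec REAL,"
        " u REAL, g REAL, r REAL, i REAL, z REAL)"
    )
    rows = [  # placeholder values, not real measurements
        (1, 180.01, 0.02, 19.1, 17.5, 16.8, 16.5, 16.3),
        (2, 180.03, -0.01, 18.2, 17.9, 17.7, 17.6, 17.6),
        (3, 180.05, 0.04, 21.0, 19.8, 19.0, 18.7, 18.5),
        (4, 180.07, 0.00, 20.3, 18.6, 17.8, 17.4, 17.2),
    ]
    conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def bright_red_objects(conn: sqlite3.Connection, r_max: float, gr_min: float):
    """Objects brighter than r_max with g-r redder than gr_min, brightest first."""
    query = (
        "SELECT objid, g - r AS gr FROM photo "
        "WHERE r < ? AND (g - r) > ? ORDER BY r"
    )
    # Parameter binding (?) rather than string formatting keeps the query safe.
    return [(objid, round(gr, 2)) for objid, gr in conn.execute(query, (r_max, gr_min))]
```

For example, `bright_red_objects(make_toy_catalog(), 18.0, 0.6)` selects the two toy objects with r < 18 and g-r > 0.6.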

Traditional ground-based observatories have lagged behind in producing useful archives, both because of expense and because the vagaries of weather and multiple observers make documenting the observations and data quality a real challenge. The Isaac Newton Group at La Palma has a usable archive, as does the Canada-France-Hawaii Telescope and (recently) Keck. The Gemini Observatory has filling and maintaining an archive as a built-in requirement, as does ESO's Very Large Telescope. KPNO and CTIO run a minimal "save-the-bits" operation which could be turned into an archive, but for now retrieval really is an emergency procedure.

Additional radio surveys online include the NRAO VLA Sky Survey, a 20-cm survey with different surface-brightness sensitivity than FIRST. Raw data have been archived at the VLA since very early in its operation, for those who want to reprocess them to modern standards. Single-dish surveys exist in both skymap and catalog form, having largely been done with scanning patterns in either right ascension or declination, while the VLA surveys use a large number of slightly overlapping short pointings.

A related issue is the availability and searchability of catalogs, that is, compilations of derived quantities. Most of these are in fact available electronically. Whether planning observations or writing a paper, it's a really good idea to make sure you're current on what's already known. Data collections keyed to specific objects are SIMBAD, primarily for stars (with a US mirror site), and the NASA/IPAC Extragalactic Database (NED) for extragalactic objects (galaxies and QSOs). One generally wants to access catalog data either "horizontally" (a set of data uniformly derived for all objects in a class) or "vertically" (a wide range of data from various sources for a single object), and the collection strategies must be tuned to these needs.
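The horizontal/vertical distinction is easy to see on a toy table of measurements. In this sketch each record is (object, quantity, value, source); the object names, sources, and values are hypothetical placeholders. A horizontal query fixes one quantity from one source, so the values are uniformly derived, and returns it for every object; a vertical query fixes one object and gathers everything known about it, keeping track of where each value came from.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Measurement(NamedTuple):
    obj: str
    quantity: str
    value: float
    source: str


# Hypothetical placeholder records, not real data.
TOY_TABLE: List[Measurement] = [
    Measurement("OBJ-1", "redshift", 0.010, "survey-X"),
    Measurement("OBJ-2", "redshift", 0.020, "survey-X"),
    Measurement("OBJ-1", "redshift", 0.011, "paper-Y"),
    Measurement("OBJ-1", "B_mag", 14.2, "paper-Y"),
]


def horizontal(table: List[Measurement], quantity: str, source: str) -> Dict[str, float]:
    """One quantity from a single source, for every object that has it."""
    return {m.obj: m.value for m in table
            if m.quantity == quantity and m.source == source}


def vertical(table: List[Measurement], obj: str) -> Dict[str, List[Tuple[float, str]]]:
    """Everything recorded for one object, grouped by quantity, with sources."""
    out: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for m in table:
        if m.obj == obj:
            out[m.quantity].append((m.value, m.source))
    return dict(out)
```

Restricting the horizontal query to one source is the design choice that keeps it "uniformly derived"; the vertical query deliberately keeps conflicting values side by side so the user can judge them.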

Easy access to these data resources has changed the practice of astronomical research, and this will only continue in coming years. Many projects once requiring dedicated observations with a 1-m telescope can now be done at your desktop using SDSS and 2MASS data. Particularly for imaging and high-dispersion spectroscopy, there's a great deal of information in HST data that goes beyond the goals of the original proposal, and by now some large programs are done in a public-service mode (a classic example is the set of Hubble Deep Fields). The success of the HDF and SDSS models makes large public projects more and more attractive. In the temporal domain, the International AGN Watch includes observations by more than 100 astronomers spanning nearly 20 years.

Other rich online resources exist, beyond the strict definition of an archive, which slice and dice astronomical knowledge in different ways. The NASA Astrophysics Data System (ADS) lets you search the literature - by author, abstract words, keywords, forward and backward citation, objects mentioned... NED (concentrating on galaxies) and SIMBAD (stars) organize information by object identifier - cross-IDs, basic data, literature references. Catalog services like VizieR deliver tabular material from the literature, with the ability to filter and plot. These have growing interlinkages: the HST archive will tell you what publications came from a program with certain data, and NED will lead you to the datasets involved on an object. Beyond this, there are databases of atomic and molecular data, and simulation results are freely available. Software is increasingly released for general use as well. All these are strides toward the era of the Virtual Observatory, whose goal is to make the myriad formats and retrieval protocols of these resources as transparent as possible. Skyview could be viewed as a first toy-model VO, but the idea goes far beyond this. (One implemented example: IRAF can now accept a URL anywhere a file name is expected.) Retrieval of information from many of these repositories can also be automated and built into scripts.
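As one example of scripted retrieval, the IVOA Simple Cone Search protocol defines an HTTP GET query in which the position and search radius are given as RA, DEC, and SR in decimal degrees, with the result returned as a VOTable. The sketch below only builds such a query URL; the base URLs are hypothetical placeholders, since each service publishes its own endpoint, and `scs_url` is an illustrative helper rather than part of any library.

```python
from typing import Optional
from urllib.parse import urlencode


def scs_url(base_url: str, ra_deg: float, dec_deg: float, radius_deg: float,
            verb: Optional[int] = None) -> str:
    """Build a Simple Cone Search query URL (hypothetical helper)."""
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0):
        raise ValueError("position out of range")
    if radius_deg <= 0.0:
        raise ValueError("search radius must be positive")
    params = {"RA": f"{ra_deg:.6f}", "DEC": f"{dec_deg:.6f}", "SR": f"{radius_deg:.6f}"}
    if verb is not None:
        # Optional verbosity level for the returned columns.
        params["VERB"] = str(verb)
    # Some endpoints already carry their own query parameters.
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params)
```

Fetching the result is then a single call such as `urllib.request.urlopen(url).read()`, and the same function can be looped over a list of targets.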

You can use these resources to find the right acknowledgements for data or code you use. This is as good a random place as any to point out that the major astronomical journals have standardized abbreviations for journal names in bibliographies, and these see wide informal use as well: AJ, ApJ, A&A, MNRAS, PASP, ApJSuppl, ApJL, and so on.

Some archive and search links


Course home page | Bill Keel's Home Page | Image Usage and Copyright Info | UA Astronomy

wkeel@ua.edu
2014  © 2000-2014