NED and SIMBAD Conventions for Bibliographic Reference Coding

M. Schmitz (1), G. Helou (1), P. Dubois (2), C. LaGue (1), B. Madore (1), H. G. Corwin Jr. (1) & S. Lesteven (2)

(1) Infrared Processing and Analysis Center, Jet Propulsion Laboratory, CALTECH, Pasadena, CA 91125, USA
(2) CDS, Observatoire astronomique de Strasbourg, 11 rue de l'Université, 67000 Strasbourg, France

Published in "Information & On-line Data in Astronomy", D. Egret & M.A. Albrecht, Eds., Kluwer Acad. Publ. (1995), p. 259.


The uniform 19-character code used for bibliographic references within NED and SIMBAD was developed jointly by the two teams in consultation with Dr. H. Abt, editor of the Astrophysical Journal. The primary purpose of the ``REF_CODE'' is to provide a unique and traceable representation of a bibliographic reference within the structure of each database. However, in many cases, the code has sufficient information to be quickly deciphered by eye, and it is used frequently in the interfaces as a succinct abbreviation of a full bibliographic reference. Since its inception, it has become a standard code not only for NED and SIMBAD but also, with minor variations, for ADS and other bibliographic services. In addition, the acronyms for journals used as part of the code have become standards for some of the main astronomical journals in their own bibliographies.

Our main consideration in designing the REF_CODE was to make its definition as objective as possible. This helps to avoid having the history of data entry affect the naming system; allows automatic coding to some extent; avoids confusion, conflicts, and ambiguities in its meaning; lets different individuals or teams construct REF_CODEs without having to resort to constant consultation on the details of the code; and facilitates exchange between databases (e.g. NED and SIMBAD).

Reference Coding: Definition

The standard code is a string 19 characters long, a combination of fields, some numerical and some alphabetic, exactly predictable for journal articles, but not necessarily for books. The format is as follows, with the various fields explained below. Blank spaces within the string are replaced with periods, and no leading zeros are allowed in volume and page numbers.


YYYYJJJJJVVVVMPPPPA

YYYY: The four digits of the year of publication.
JJJJJ: Code for the publication, entered left-justified within the five spaces. Five categories are distinguished:
Periodicals (including both regularly-published periodicals and occasional publications): these codes are acronyms based on the names (as in ApJ, A&A, PASJ, MNRAS), and are reserved for all years. The codes for the journals that NED presently scans directly are given in Table 1. Codes for journals currently scanned for the SIMBAD bibliography are given in Table 2, and a sample of codes for less-frequently encountered journals is given in Table 3. A complete listing of these tables is available on the World-Wide Web, on the NED and SIMBAD servers.

Catalogs: these codes are generally built from ``standard'' abbreviations of the catalogs' names. Examples are UGC, ESO, RSA, and RC3. If the catalog is a multi-volume work, the volume number is inserted in the Volume field (see below). The codes for some often-used extragalactic catalogs are listed in Table 4.
Books (by which we mean all other monograph-length publications): the codes in this category are constructed in essentially the same way as those for periodicals and catalogs, from some or all of the initials (or following letters) of the title. While there is clearly some freedom in assigning codes to books, it is not necessary for the user to be able to identify a random book from its reference code (the database interface does the decoding as needed). Note also that the same code combined with a different year points to a different book.
Theses (primarily doctoral theses, but occasionally master's theses): these codes are acronyms based on the name of the university granting the degree (see Table 5 for examples; the complete list is available on-line). For theses, the volume number field (``VVVV'' below) contains ``.T00''. In the case of duplicate author initials, the ``.T00'' becomes ``.T01'', ``.T02'', etc.
Unpublished material: this, unfortunately, is unavoidable as a category. If the reference is to a collection of data never described in print, then this field will contain the code ``UNPUB''. Private communications to NED or SIMBAD carry the code ``PrivC''.

VVVV: Volume number, right-justified, if the reference is to a periodical; otherwise, the second character in this field is a letter that serves as a classification flag. The following flags and classes of books are presently identified:
digitized version (magnetic tape, CD-ROM, etc.)
report or conference proceeding

For multi-volume books, catalogs, and reports, the volume number is given in the last two digits.

M: This field is intended to break any remaining ambiguity after volume number, page number, and author's initial have been specified. It is used only when necessary, as in the following two classes of problems:

One class of ambiguities results when there are two or more independent page sequences within the same volume number, in which case the following codes are reserved for this field:

L               Letters sections in various journals.
p               Pink pages in MNRAS.
a, b, ..., z    Issue numbers within the same volume, each of which starts with page 1 (e.g. Physics Today).
A, B, ..., K    Issue designations used by the publisher within the same volume, where each issue starts with page 1.

Another class of ambiguities results when there are two or more articles on the same page, as in Nature. Such articles starting on the same page are numbered sequentially in their order of appearance, and a code corresponding to this order is inserted in this field. In that case, the code has values

Q, R, ..., Z    First, second, ..., tenth article on the page.

For theses, this field contains the author's first initial.

PPPP: Page number of reference, or ``...0'' when the whole book is referenced. The page number is right-justified within the four spaces available and preceded by periods to fill empty spaces.
A: The first letter of the first author's last name. This provides some redundancy in the code, which might be useful in tracking down errors. If the first author cannot be identified, or no authorship is expressed, a colon (:) appears in this field. When the REF_CODE as a whole does not follow the standard rules described above (which might happen for books), a percent sign (%) is inserted in this field. This field is case sensitive.
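Because every field has a fixed width, a REF_CODE can be taken apart purely by character position. The following is a minimal sketch in Python, not part of the NED or SIMBAD software; the names `RefCode` and `parse_ref_code` are hypothetical, and the sketch assumes the field widths given above (4, 5, 4, 1, 4, and 1 characters) with periods as fill characters.

```python
from typing import NamedTuple


class RefCode(NamedTuple):
    """The six fields of a REF_CODE, with fill periods removed (hypothetical type)."""
    year: int
    publication: str
    volume: str
    qualifier: str
    page: str
    author: str


def parse_ref_code(code: str) -> RefCode:
    """Split a 19-character REF_CODE into its fixed-width fields."""
    if len(code) != 19:
        raise ValueError(f"expected 19 characters, got {len(code)}")
    return RefCode(
        year=int(code[0:4]),                  # four-digit year
        publication=code[4:9].rstrip("."),    # left-justified publication code
        volume=code[9:13].strip("."),         # volume number, or flag such as "T00"
        qualifier=code[13].replace(".", ""),  # ambiguity-breaking character, if any
        page=code[14:18].lstrip("."),         # right-justified page number
        author=code[18],                      # first author's initial, ":" or "%"
    )
```

For instance, `parse_ref_code("1988A&A...206L..23M")` yields the publication code `A&A`, volume `206`, qualifier `L`, page `23`, and author initial `M`. For books and theses the volume field carries a flag rather than a number, so the sketch returns it as a string and leaves its interpretation to the caller.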

Here are some examples to illustrate the use of the reference code:

1983ARA&A..21..177S    Stein and Soifer 1983, Ann. Rev. Astron. Astrophys., 21, 177.
1988ApJ...324..767W    Ward et al. 1988, Astrophys. J., 324, 767.
1988ApJS...66..183J    Jura 1988, Astrophys. J. Suppl., 66, 183.
1988PASP..100..625S    Sandage 1988, Publ. Astron. Soc. Pacific, 100, 625.
1988Natur.331.6157B    Bergvall 1988, Nature, 331, 6157.
1976ApJS...31..187D    Dressel and Condon 1976, Astrophys. J. Suppl., 31, 187.
1978IAUC.3305....1K    Kowal, Lo, and Sargent 1978, IAU Circ. No. 3305.
1988A&A...206L..23M    Maurogordato et al. 1988, Astron. Astrophys., 206, L23.
1984IRSD..R....118G    Gatley 1984, in Lab. and Obs. IR Spectra of IS Dust,
                       proc. of the Hilo Workshop, July 1983, ed.
                       Wolstencroft and Greenberg, p. 118.
1909UCB...T00E....F    Fath, E. A. 1909, The Spectra of Some Spiral Nebulae
                       and Globular Star Clusters, thesis, Univ. of Calif.,
                       Berkeley.
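The journal-article examples above can be reproduced mechanically, which is what makes the code "exactly predictable" for periodicals. The sketch below is illustrative only: the function name `build_journal_ref_code` and its parameters are assumptions rather than part of any NED or SIMBAD interface, and it covers periodicals only, not books, catalogs, or theses.

```python
def build_journal_ref_code(year: int, journal: str, volume: int, page: int,
                           author_initial: str, qualifier: str = ".") -> str:
    """Assemble a REF_CODE for a journal article (hypothetical helper).

    The journal code is left-justified in five characters; volume and page
    are right-justified in four characters each; empty positions are filled
    with periods. Formatting the numbers as integers avoids leading zeros.
    """
    code = "".join((
        f"{year:04d}",
        journal.ljust(5, "."),
        str(volume).rjust(4, "."),
        qualifier,                  # "." unless an ambiguity must be broken
        str(page).rjust(4, "."),
        author_initial,
    ))
    if len(code) != 19:
        raise ValueError(f"fields do not fit the 19-character format: {code!r}")
    return code
```

Called as `build_journal_ref_code(1988, "A&A", 206, 23, "M", qualifier="L")`, the sketch returns `1988A&A...206L..23M`, matching the Maurogordato et al. entry. The length check rejects inputs that overflow a field, such as a journal code longer than five characters.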


The Bibliographic Reference Code is a domain-specific code which was designed to be sufficient for the immediate needs of astronomy in uniquely, succinctly, and informatively identifying bibliographic references. Nevertheless, the REF_CODE proved to be general enough to encompass most of the existing astronomical literature. But these REF_CODEs were not explicitly designed to be so general that they were guaranteed to automatically encompass all presently available media, nor do they necessarily fully anticipate future directions in publishing.

In combination with a descriptive reference database, the cryptic form of the REF_CODE can be (and is) attached to a more extensible information listing. For instance, while the REF_CODE carries only the first page number of a reference, the Reference Database carries the first and last page numbers of the article. Obviously, the same qualifications apply to titles and authors which are highly abbreviated in the REF_CODE, but more fully represented in the Database.

The same principles could be used to fully link a REF_CODE to data cubes, CD-ROMs, external databases, animations, simulations, time-tagged data, etc. While the Reference Code is compact, it is not yet saturated; there are still fields with room for added pointers to the new directions that the publishing of astronomical data may take in the immediate future.


We thank Helmut Abt and the rest of the NED and SIMBAD groups for their help in defining the reference codes. Table 2 has been prepared with the kind help of Suzanne Laloë at the Institut d'Astrophysique de Paris. NED is a research support program operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration (Astrophysics Division, Science Operations Branch). SIMBAD is maintained by the Centre de Données astronomiques de Strasbourg, France.