Electronic Publication and Data Distribution
for the Five College Astronomy Department

Karen M. Strom

Five College Astronomy Department,
University of Massachusetts,
Amherst, MA 01003

Abstract

The Star Formation Group at FCAD is making use of the World Wide Web to:
  1. explore the advantages of hypermedia presentation for the distribution of preprints, Ph. D. theses and observatory publications;
  2. create hyperlinked catalogs of astronomical data which enable not only the recovery of tabular data but instantaneous links to abstracts of the original references;
  3. create on-line resources of spectroscopic and image data.
We here illustrate the current status of this effort. The Herbig-Bell Catalog of Emission Line Stars (Herbig & Bell 1988) and the Catalog of Herbig-Haro Objects (Reipurth 1994) have both been translated into HTML and linked to the ADS abstract server, to each other, to HTML versions of the Star Formation Newsletter (for abstracts of newer papers) and to papers that are available on line.

As a byproduct of this work, a method for displaying subscripts (e.g. M_sun, Delta_nu_D) and superscripts (e.g. T^4, C^18O) using the minimum size and number of bitmaps has been developed. This set of bitmaps is publicly available.

Introduction

The explosion in scholarly literature has recently far outstripped the traditional, paper-based methods of scanning the literature and locating the fraction of published work relevant to the research of an individual or group. Many researchers no longer subscribe to even the most basic journals in their fields because they cannot hope to store the massive amount of paper in their offices. Libraries are confronted by increasing subscription rates, partly to compensate for the decline in individual subscriptions, and are therefore being forced to choose between journals and books. Departmental libraries are also becoming a thing of the past at many institutions, where their collections have been turned over to central libraries that have moved them ever farther from those who need them on a daily basis. Compounding this problem, the rapid growth of databases, combined with the costs of publication, has meant that an ever smaller fraction of the data associated with a given paper is made publicly available. A related phenomenon is the disappearance of catalogs of similar objects from the archival literature. A few years ago, these could be found among the Observatory publications (see the Catalog of Emission Line Stars, Herbig & Bell 1988). Today catalogs are issued in electronic form only (see A General Catalogue of Herbig-Haro Objects, Reipurth 1994), usually via anonymous FTP, perhaps in LaTeX format.

Some of these problems have been addressed in an ad hoc manner for at least the last decade and a half by the growth of a preprint culture, both to make research results more immediately available and to direct circulation to those most interested in the work. This does not address the issue of the availability of the data sets, but does make the approximately correct subset of the literature available to those working in a field in a more timely manner than does journal publication.

For the past decade or longer, authors have been preparing their own manuscripts as word-processing and formatting software has become almost universally available. Authors also prepare their own figures, using increasingly sophisticated computer drawing tools. The widespread availability of the Internet has made it possible to transmit these electronic files easily and rapidly around the world. Writing a paper with a collaborator across the country, or in another country, has been greatly facilitated by these advances.

Most of the professional journals in astronomy now accept submissions in LaTeX, but each journal has developed its own set of macros, designed at least in part to reproduce the traditional "on paper" appearance of that journal. Astronomers using electronic preprint servers (Ginsparg 1994) must collect a set of LaTeX macros for each journal of interest and update their collection as new versions are issued. This quickly becomes a burdensome task. In response, many papers are sent to the preprint servers as Postscript files, but these files can be very large, particularly when figures are included. In practice, however, most papers, whether in LaTeX or Postscript, do not contain the figures, which must be acquired separately. In neither case is much data readily accessible to the reader. At the same time, the journals regard the acceptance of LaTeX files as a big step in their transition to "electronic publication," yet the time between acceptance of a paper and its appearance in the journal is six months for The Astrophysical Journal and four months for The Astronomical Journal. The median "age" of a paper referenced in these journals is between four and five years, and many of the referenced papers are still in press. Clearly, advances in manuscript preparation have not been matched by increases in distribution speed, which should be the first and most easily achieved gain from some standardization of the format for manuscript submission.

For the past two years astronomers working in the field of star formation have been kept abreast of developments in their field by the Star Formation Newsletter, edited by Bo Reipurth of the European Southern Observatory. This newsletter is distributed monthly by email to a community of over 800 astronomers worldwide. Astronomers enter the abstracts of their recently accepted papers into a standard LaTeX form. These are then emailed to a central collection point and assembled into the monthly newsletter. Recipients can then simply strip off the email header and process the LaTeX file to read the newsletter. Abstracts of Ph.D. theses and notices of upcoming meetings and new books are also included. The newsletter helps to eliminate the previous hit-and-miss circulation of preprints, especially for younger workers. Some astronomers have also submitted their preprints to an electronic preprint distribution service, making actual retrieval of the preprint possible without sending a large amount of paper through the mail. However, this system suffers from the disadvantages stated above, as well as from the fact that many authors use their own LaTeX or TeX macro files, which they neglect to include.
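The header-stripping step that each recipient performs is easy to automate. The following is a minimal sketch in Python using the standard email module; it assumes the newsletter arrives as a single-part, plain-text message, and the function names and the output file name sfnews.tex are hypothetical.

```python
from email import message_from_string
from pathlib import Path

def newsletter_body(raw_message: str) -> str:
    """Return the LaTeX source of a newsletter, dropping the email header."""
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        # Assumption: the newsletter is sent as one plain-text part.
        raise ValueError("expected a single-part plain-text message")
    return msg.get_payload()

def save_newsletter(raw_message: str, path: str = "sfnews.tex") -> Path:
    """Write the stripped body to a .tex file ready to be run through LaTeX."""
    out = Path(path)
    out.write_text(newsletter_body(raw_message), encoding="utf-8")
    return out
```

A message encoded for transport (for example quoted-printable) would need decoding first; the sketch does not handle that case.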

There are enormous advantages to having papers available electronically with their associated data sets. They can be made available almost instantly upon the completion of the refereeing process. They are easily searchable, and, if the tabular material is in an HTML document (if small) or other easily interpretable form, the data is instantly accessible. Large data sets can be made available in standard, generally accepted formats. (Tabular data presented in Postscript format is not easily accessible in numerical form.) More sophisticated graphics are easily and cheaply included. Links can be built into the paper to references and abstracts that are on-line as well as to other non-standard materials not usually available to readers of the papers.

In order to address some of the problems outlined above, we have begun a preprint and data distribution service for the papers and data of the Star Formation group in the Five College Astronomy Department from our World Wide Web server.

Preprints

In order to circumvent some of the problems discussed above, I undertook to make the preprints from our Star Formation group available in HTML over the World Wide Web. Since over 800 astronomers had readily adapted to receiving the Star Formation Newsletter by email, it seemed possible to convince them to obtain their preprints on line in an instantly readable format. This format would allow me to pursue alternative means of presenting graphical material (for instance, color graphics), possibly to include movies, to link to other papers on line, to link the references to the ADS Abstract database, and to link to other material available on line but absent from the archival literature.

Since the astronomy journals now accept manuscripts in LaTeX, the obvious method for converting these manuscripts into HTML is Nikos Drakos' LaTeX2HTML. This Perl script translates a LaTeX manuscript into a set of linked HTML pages, rendering the mathematical expressions as transparent GIFs, which are then placed in the text as in-line images. However, the program created a new in-line image each time a math mode entry was encountered, no matter how many times the identical image had already been created in the same document. This resulted in tens of identical images for commonly used symbols and units, such as cm^2 or s^{-1}. Since all images were installed with ALIGN=BOTTOM (e.g. cm2 or s-1), the resulting text could be difficult to read. In addition, math mode expressions containing subscripts would appear to float above the rest of the text on the line, balanced on the subscript (e.g. Teff). Finally, because these expressions were inserted in the text as images, their font would rarely match that of the surrounding text.
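The duplicate-image problem can be contrasted with a converter that keys images on the expression text, so that identical expressions share a single GIF. The sketch below is a hypothetical illustration of that general idea in Python, not how LaTeX2HTML works and not the approach taken here (which, as described next, relies on a fixed library of sub- and superscript bitmaps); the class name and the imgN.gif naming scheme are assumptions.

```python
class MathImageCache:
    """Give every distinct math-mode expression exactly one image file name."""

    def __init__(self, prefix: str = "img"):
        self.prefix = prefix
        self.names: dict[str, str] = {}

    def image_for(self, expr: str) -> str:
        # Collapse runs of whitespace so trivially different spellings share an entry.
        key = " ".join(expr.split())
        if key not in self.names:
            # First occurrence: allocate a new file name (rendering itself not shown).
            self.names[key] = f"{self.prefix}{len(self.names) + 1}.gif"
        return self.names[key]


cache = MathImageCache()
for expr in ["cm^2", "s^{-1}", "cm^2", "cm^2"]:
    print(expr, "->", cache.image_for(expr))
```

Here four occurrences require only two image files: cm^2 always maps to img1.gif and s^{-1} to img2.gif.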

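In response to these practical and perceptual problems, I developed an image library of the subscripts, superscripts and mathematical symbols most commonly used in the manuscripts I had converted to HTML. This library was designed with three considerations in mind: the sub- and superscripts should blend with the text generated by the browser, the number of distinct bitmaps should be kept to a minimum, and each bitmap should be as small as possible.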

The first consideration arises because, when the entire expression is captured as an image with default font settings, as in LaTeX2HTML, the expression tends to look out of place in the surrounding browser-generated text, no matter which font is selected. If only the sub- and superscripts are generated as bitmaps, however, this problem is greatly alleviated. Often the main symbols are ordinary letters and are therefore simply part of the client-generated text. The sub- and superscripts are always rendered in another font and are much smaller, so few differences in the structure of the characters are possible or noticeable.

The remaining two considerations, keeping both the number and the size of the bitmaps to a minimum, take into account the default image cache size of most browsers and the fact that the least recently accessed image is the first to be discarded. The typical size of a 2 - 5 color image used as a figure in these preprints is 4 - 8 kilobytes, while the typical size of a bitmap used as a sub- or superscript is 0.06 kilobytes. Thus the requirement that each bitmap be small is easily met.
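The sizes quoted above can be turned into a rough cache budget. The sketch below simulates least-recently-used eviction in Python; the class name and any cache capacity you pass it are assumptions for illustration, since the actual cache sizes and eviction details of browsers are not specified here.

```python
from collections import OrderedDict

class LRUImageCache:
    """Byte-limited image cache that evicts the least recently used image first."""

    def __init__(self, capacity_kb: float):
        self.capacity_kb = capacity_kb
        self.items: "OrderedDict[str, float]" = OrderedDict()

    def used_kb(self) -> float:
        return sum(self.items.values())

    def access(self, name: str, size_kb: float) -> bool:
        """Return True on a cache hit, False if the image had to be fetched."""
        if name in self.items:
            self.items.move_to_end(name)      # now the most recently used
            return True
        self.items[name] = size_kb
        while self.used_kb() > self.capacity_kb:
            self.items.popitem(last=False)    # drop the least recently used
        return False


# One 4-8 KB figure occupies the space of roughly this many 0.06 KB bitmaps:
print(round(4 / 0.06), round(8 / 0.06))  # prints: 67 133
```

Because a subscript bitmap is reused every few lines, each use moves it back to the most recently used position. With a hypothetical 20 KB cache, a bitmap reused between three 8 KB figures is still cached after the first figure has been evicted.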

Together, these three considerations provide a clean-looking, easily readable paper whose pages load quickly. This allows browsing as well as in-depth reading, without forcing the scientific community to adopt another, temporary shorthand for complex mathematical expressions or simply to delay the onset of electronic publication of journal papers.

The library includes the Greek letter set previously made available, the numerals, the entire alphabet in both upper and lower case, and some special symbols and letter combinations in common use in astronomy. To use this set of transparent GIFs, Perl scripts were written to preprocess the manuscripts and insert the in-line images where required. LaTeX2HTML is then used to convert the manuscript to HTML pages, and a post-processing Perl script cleans up the few things that may have been affected by the conversion. The manuscript is then ready for insertion of the images, tables and references.
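The preprocessing step described above amounts to a table-driven text substitution. The scripts used here were written in Perl; the sketch below shows the same idea in Python. The LaTeX patterns and every image path except /kicons/smsun.gif (which appears in the text) are hypothetical, a real manuscript would need a much larger table, and the sketch does not address how the inserted tags survive the LaTeX2HTML pass.

```python
import re

# Hypothetical substitution table: LaTeX fragment -> (ALT text, image URL).
# Only /kicons/smsun.gif comes from the text; the other entry is made up.
SUBSTITUTIONS = {
    r"_\odot": ("_sun", "/kicons/smsun.gif"),
    r"_{\rm eff}": ("_eff", "/kicons/smeff.gif"),
}

def img_tag(alt: str, src: str) -> str:
    """Build an in-line image tag; the ALT text keeps the page readable in Lynx."""
    return f'<IMG ALIGN=MIDDLE ALT="{alt}" SRC="{src}">'

def substitute(text: str) -> str:
    """Replace every known fragment in the manuscript text with its image tag."""
    # Try longer patterns first so that overlapping entries cannot clash.
    keys = sorted(SUBSTITUTIONS, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: img_tag(*SUBSTITUTIONS[m.group(0)]), text)
```

For example, substitute(r"M_\odot") yields M followed by the IMG tag for /kicons/smsun.gif with ALT="_sun".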

As a small sidelight, I note that the display of subscripts is a simple use of the HTML tag for an in-line image. The unexpected subscript effect occurs because the image is smaller than the text height. Thus when you specify <IMG ALIGN=MIDDLE ALT="_sun" SRC="/kicons/smsun.gif">, the middle of the small image is aligned with the baseline of the text, and a subscript, M_sun, is displayed. The same method is used to place Greek letters with descenders properly on the baseline: the images are padded with enough blank pixels below the letter that setting ALIGN=MIDDLE locates the letter correctly. I also urge users of this library always to use the ALT="xxxx" attribute, so that people reading your pages with Lynx can follow them more easily. This is easy to do when the substitutions are made in the manuscript by the Perl script.
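The padding rule can be made explicit. Assuming, as described above, that ALIGN=MIDDLE puts the vertical middle of the image on the text baseline, a letter's own baseline lands on the text baseline only when the image has as many pixel rows below that baseline as above it. The function name and the example glyph dimensions below are hypothetical.

```python
def middle_align_padding(ascent: int, descent: int) -> tuple[int, int]:
    """Blank rows (top, bottom) that put a glyph's baseline at the image middle.

    ascent:  pixel rows of the glyph above its baseline
    descent: pixel rows below the baseline (e.g. the tail of a Greek letter)
    """
    if ascent >= descent:
        return 0, ascent - descent   # pad below until both halves are equal
    return descent - ascent, 0       # deep tail: pad above instead


top, bottom = middle_align_padding(9, 4)
height = top + 9 + 4 + bottom
print(top, bottom, height)  # prints: 0 5 18
```

The glyph's baseline then sits 9 rows from the top of an 18-row image, exactly at its middle. A small subscript image needs no such care: its middle on the baseline simply leaves half of it below the line.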

Although newer versions of LaTeX2HTML offer the option of converting Postscript figures, at a user-specified scale, into transparent GIFs in-lined in the text, I prefer to exercise more control over the scaling and appearance of the figures, because they were originally designed for display on paper. (See my online tutorial on using images on the World Wide Web.) I display the file using ghostview, scale the image to the desired size, and then capture it using xv. For some images, in particular gray-scale and full color Postscript files, ImageMagick may be better suited to this conversion. I have chosen to render the scales, axis labels and other labeling text in a deep blue to separate them from the body text. Although the figure on the printed page may be very small, and we are not constrained by space on the HTML page, the spatial resolution available on screen is still less than that of the printed page. I have therefore chosen to make use of color, an option available at no extra cost to the server. Where different line styles or point styles are called for, different colors are substituted using xpaint. When the desired changes are complete, the file is converted into a transparent GIF and placed in the HTML document at the appropriate location. Figure placement is much more flexible in HTML documents, since one is not forced to maneuver within fixed page sizes, so figures can be located closer to their textual references.

There are certainly some very complex figures for which this technique will not be appropriate. In that case a thumbnail or postage stamp image can be made, either by downscaling the entire image or by cropping a recognizable section out of the image for insertion into the text. This smaller image can then act as a link to the Postscript (or GIF) high resolution figure.

Tabular material presents different problems. Small tables can be placed within an HTML document by using the <PRE> tag. Larger tables are, at the moment, practical only as Postscript files linked to the text at the appropriate point (but see the Catalogs section below). HTML 3 should make it easier to introduce tabular material into the text. Very large tables can be associated with papers in other formats more appropriate to the data.
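For the small-table case, the only real work is padding every column to a fixed width so that the columns line up in the monospaced <PRE> rendering. A minimal Python sketch follows; the function name is hypothetical, and it assumes every row has the same number of cells as the header.

```python
import html

def pre_table(header: list[str], rows: list[list[str]]) -> str:
    """Lay out a small table in fixed-width columns inside a <PRE> element."""
    widths = [max(len(cell) for cell in column) for column in zip(header, *rows)]

    def line(cells: list[str]) -> str:
        # Pad before escaping: "&lt;" is longer in the source but displays as one character.
        padded = "  ".join(cell.ljust(w) for cell, w in zip(cells, widths))
        return html.escape(padded.rstrip(), quote=False)

    lines = [line(header), line(["-" * w for w in widths])]
    lines += [line(row) for row in rows]
    return "<PRE>\n" + "\n".join(lines) + "\n</PRE>"
```

Escaping after padding keeps characters such as < and & from being read as markup without disturbing the column alignment the reader sees.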

When the original manuscript is written in LaTeX using the AASTeX macros, LaTeX2HTML does not easily handle the conversion of the reference list. However, it is a trivial matter to use global substitutions to replace the macros used for journal abbreviations with the abbreviations appropriate for the references section of the paper. The greatest added value for astronomers is the possibility of linking to the ADS Abstract server. With these links, the reader has immediate access to the abstracts of the referenced papers, allowing him or her to judge the relevance of each to the information sought. This can be especially important to people located at institutions with easy access to the Internet but no library down the hall. Within the last two weeks, we have made available on the World Wide Web the first of what we plan to be many Ph. D. theses. We hope that, by undertaking this publishing venture as a department, we will encourage other departments to join us in making this material, so often neglected, more widely available.
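The global substitution for reference lists can be sketched as a lookup of macro names. The macro names below are of the kind AASTeX provides for journal abbreviations, but the mapping is illustrative and the replacement strings simply follow the style of this paper's own reference list; the function name is hypothetical.

```python
import re

# Illustrative mapping from journal-abbreviation macros to printed forms.
JOURNAL_MACROS = {
    "apj": "Ap. J.",
    "apjs": "Ap. J. Suppl.",
    "aj": "A.J.",
    "aaps": "A & A Suppl.",
}

# A control word, optionally followed by an empty argument "{}".
_MACRO_RE = re.compile(r"\\([A-Za-z]+)(?:\{\})?")

def expand_journal_macros(reference: str) -> str:
    """Replace known journal macros; leave any other control sequence untouched."""
    def repl(m):
        return JOURNAL_MACROS.get(m.group(1), m.group(0))
    return _MACRO_RE.sub(repl, reference)
```

Because the pattern consumes the whole control word, \apjs is never mistaken for \apj followed by a stray "s".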

Catalogs

Last spring I received a request from Bo Reipurth of the European Southern Observatory to examine his preliminary Catalog of Herbig-Haro Objects and to make any necessary corrections or additions before he made it available via anonymous FTP from the ESO FTP server. Herbig-Haro objects are shock-excited nebulosities associated with bipolar outflows from young stellar objects. They may be the only optically visible manifestation of star formation occurring within a dense molecular cloud, since the outflow may have punched a hole through to the cloud exterior. A comprehensive catalog of these objects is therefore a great aid in the study of the early stages of star formation. The imminent appearance of a new catalog of objects associated with the star formation process, solely in electronic form, motivated me to make an older, but still extremely useful, database, the Herbig-Bell Catalog of Emission Line Stars, available in electronic form as well. Rather than making these catalogs available solely for on-line browsing, I elected to create hypertext documents, linking them not only internally but also to the online database of abstracts of astronomical papers.

The Herbig-Bell Catalog of Emission Line Stars

The Third Catalog of Emission Line Stars (Herbig & Bell 1988, HBC) was distributed solely as a Lick Observatory publication. It is a catalog of pre-main-sequence stars whose nature has been confirmed by slit spectra, as opposed to the much larger list of objects detected by objective prism or filter techniques. (The second edition (Herbig & Rao 1972) was published in The Astrophysical Journal.) For its approximately 750 objects, the catalog includes complete coordinate information; magnitudes, colors and variability in bands from the X-ray through the radio (excluding the IRAS data); spectral types and emission line information; radial and rotational velocities; the references for all of this information; and the identification of the molecular cloud in which each object is found. Not all of this information is available for every star, but the catalog is a considerable database of detailed information bound together in an easily portable reference. While a few copies were distributed as 80-column card images on 9-track tapes, the catalog has basically been carried from observatory to observatory in briefcases for almost 8 years.

The Herbig-Bell Catalog of Emission Line Stars has always been somewhat awkward to use because the data for a single star are spread over two pages printed in landscape mode. Many columns are empty for all but the most frequently observed stars, making it difficult to follow the correct line across the page. In the electronic version, I have linked the catalog number of each object to its companion entries on the alternate page, so that the top of the window acts as a guide line across the columns. An asterisk in the notes column is linked to the note for that object. All references contained in the database of journal article abstracts used by the ADS Abstract Server are linked to those abstracts to provide more information on the contents of each reference.
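Linking a reference to the ADS Abstract Server requires identifying the paper there. One way to do this, sketched below in Python, is to assemble the 19-character ADS bibcode (year, journal abbreviation, volume, qualifier, page, first-author initial) from the reference. The function name is hypothetical, journal abbreviations must follow ADS conventions, pages above 9999 are not handled, and the URL used to query the server is deliberately left out.

```python
def ads_bibcode(year: int, journal: str, volume: int, page: int,
                first_author: str, qualifier: str = ".") -> str:
    """Assemble a 19-character bibcode of the form YYYYJJJJJVVVVMPPPPA.

    The journal field is padded with dots on the right to 5 characters;
    volume and page are padded with dots on the left to 4 characters.
    """
    initial = first_author.strip()[0].upper()
    code = (f"{year:04d}"
            f"{journal:.<5}"
            f"{str(volume):.>4}"
            f"{qualifier}"
            f"{str(page):.>4}"
            f"{initial}")
    if len(code) != 19:
        raise ValueError(f"fields do not fit the bibcode format: {code!r}")
    return code


print(ads_bibcode(1972, "ApJ", 174, 401, "Herbig"))  # prints: 1972ApJ...174..401H
```

Applied to Herbig & Rao (1972) from the reference list, this gives 1972ApJ...174..401H; Allen & Strom (1995) gives 1995AJ....109.1379A.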

Because the table is difficult to read, and a user rarely needs all of the information spread across the two pages, I also provided forms-based access to the data. The form requests the catalog number of the desired object and returns, on a neatly formatted page, the most commonly desired data for the object as well as any other data requested.
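The server side of such a form reduces to parsing the submitted query and looking up one record. The following Python sketch assumes a catalog already loaded into a dictionary; the query parameter names (number, extra), the field names and the set of "commonly desired" fields are hypothetical, and the wiring to the web server and the HTML page layout are omitted.

```python
from urllib.parse import parse_qs

# Hypothetical set of fields returned for every query.
DEFAULT_FIELDS = ["name", "ra", "dec", "sptype"]

def lookup(query_string: str, catalog: dict[int, dict[str, str]]) -> str:
    """Answer a form submission such as 'number=12&extra=vsini'."""
    params = parse_qs(query_string)
    try:
        number = int(params["number"][0])
    except (KeyError, ValueError):
        return "Please supply a valid catalog number."
    record = catalog.get(number)
    if record is None:
        return f"No entry with catalog number {number}."
    # Always report the default fields, then any extra fields the user ticked.
    extras = [f for f in params.get("extra", []) if f not in DEFAULT_FIELDS]
    lines = [f"Catalog number {number}"]
    lines += [f"{f}: {record.get(f, '(no data)')}" for f in DEFAULT_FIELDS + extras]
    return "\n".join(lines)
```

Missing values are reported as "(no data)" rather than omitted, which mirrors the many empty columns of the printed catalog.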

The Reipurth Catalog of Herbig-Haro Objects

When Reipurth released the public version of the Catalog of Herbig-Haro Objects, I began converting it into a hypertext document. The Catalog was formatted in LaTeX and consisted of a table of the basic data and extensive notes on each object, heavily referenced to the literature (published papers, preprints and papers still in preparation). The catalog contains approximately 250 objects and lists for each the catalog number, any other designations, the best available position, the source of the outflow (if known), the name of the star formation region in which it is found, and its distance. Each catalog number was linked to the notes for that object. In some instances the suspected source of the outflow was listed in the HBC; in those cases, the source was linked to its entry in the HBC. The notes for each catalog object are heavily referenced. The majority of the references in this currently very active field are available in the ADS Abstract server and have been linked to those abstracts.
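The cross-link from an outflow source to the HBC can be sketched as a name lookup. In the Python sketch below, the index from source name to HBC number, the default page name HBC.html and the #hbcN anchor scheme are all hypothetical; matching the many alternative designations of a star would need a richer index.

```python
import html

def link_source(source: str, hbc_index: dict[str, int],
                hbc_page: str = "HBC.html") -> str:
    """Render an outflow-source name, hyperlinked to its HBC entry if one exists."""
    name = " ".join(source.split())          # normalise spacing before lookup
    text = html.escape(name)
    number = hbc_index.get(name)
    if number is None:
        return text                          # not in the HBC: plain text
    return f'<A HREF="{hbc_page}#hbc{number}">{text}</A>'
```

Sources without an HBC entry are emitted unchanged, so the same routine can be run over every row of the table.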

There were two problems posed by the references in this catalog. Among the references that predated the beginning of the compilation of the abstract database by NASA were approximately 25 references with dates ranging from 1894 to 1960, references which might be difficult to obtain for people located at smaller institutions. The librarian at Kitt Peak National Observatory kindly copied either the title page and abstract when available, or just the first page of the article and forwarded these copies to me. From these copies, either the title, author, citation and abstract were entered, or else the first few paragraphs of the paper were used in place of the abstract. In a few cases, the papers in question were notes so short that the entire paper was entered.

Because this field of research is so active, a very large number of papers (25 - 30% of the references) were too recent to be found in the abstract database. To make the abstracts of these papers available, we placed the entire set of Star Formation Newsletters on line, using LaTeX2HTML and the Perl scripts, and linked the catalog references to these abstracts. This procedure was almost entirely successful, leaving unlinked only the papers still in preparation and those published between 1960 and 1975, which I felt were certainly more easily accessible than the pre-1960 papers. As the papers still in preparation appear in the Star Formation Newsletter, links to them are added to the catalog references.
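The resulting policy, an ADS abstract where one exists, otherwise a Newsletter abstract, otherwise a locally entered abstract, can be expressed as an ordered lookup. In the Python sketch below, the reference keys, the URLs and the exact order of preference are assumptions; a reference found in none of the sources (for example a paper still in preparation) stays unlinked until a later pass.

```python
from typing import Optional

def reference_link(ref_key: str, *sources: dict[str, str]) -> Optional[str]:
    """Return the first available abstract URL for a reference, or None.

    The sources are consulted in order of preference, e.g. ADS abstracts
    first, then Star Formation Newsletter abstracts, then local abstracts.
    """
    for source in sources:
        url = source.get(ref_key)
        if url is not None:
            return url
    return None
```

Re-running this lookup each month, after a new Newsletter issue is added, picks up papers that have moved out of the "in preparation" state.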

The Star Formation Newsletters

The Star Formation Newsletters were placed on line in order to have immediate, searchable access to the most recent abstracts of refereed papers. However, because of the wider purview of the Newsletter, we found that many more symbols had to be integrated into the text than were needed for the proper display of our preprints. We therefore undertook a general expansion of the image library to cover most of the cases we would encounter. We also constructed the table of contents for these newsletters in two forms:
  1. a single document containing all of the papers included thus far in the newsletters with each title linked to the abstract for the paper. This document would be large but would allow an easy search through the entire database for the desired paper.
  2. individual table of contents files for each issue of the newsletter. These documents would be much smaller but require knowledge of the publication date to be used effectively. These titles were also linked to the abstracts.
When the new issues arrive each month they are added to the available database. When papers are published, their final references are added to the newsletters to complete the cycle.
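Both forms of table of contents can be generated from a single list of entries. The Python sketch below assumes each paper is known by its issue number, title and abstract URL; the function names, the URL layout and the use of an HTML bulleted list are hypothetical choices.

```python
import html
from collections import defaultdict

def toc_item(title: str, url: str) -> str:
    return f'<LI><A HREF="{html.escape(url)}">{html.escape(title)}</A>'

def build_tocs(entries: list[tuple[int, str, str]]) -> tuple[str, dict[int, str]]:
    """Return one combined table of contents and one per newsletter issue.

    entries: (issue number, paper title, URL of the abstract) triples
    """
    per_issue_items: dict[int, list[str]] = defaultdict(list)
    for issue, title, url in entries:
        per_issue_items[issue].append(toc_item(title, url))

    combined = ["<UL>"]
    per_issue = {}
    for issue in sorted(per_issue_items):          # combined list in issue order
        items = per_issue_items[issue]
        combined.extend(items)
        per_issue[issue] = "\n".join(["<UL>", *items, "</UL>"])
    combined.append("</UL>")
    return "\n".join(combined), per_issue
```

When a new issue arrives, its entries are appended to the list and both forms are regenerated, so the combined and per-issue pages never drift apart.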

Additional Data Services

We have applied some of the ideas developed for use in electronic manuscript conversion and catalog delivery to some specific databases useful to astronomers engaged in the study of young stellar objects and red and infrared spectroscopy. I will briefly describe here a few of these services.

Pre-Main Sequence Evolutionary Tracks

While we plan to make extensive databases of our own available through our World Wide Web pages, the first data set we made available was the set of pre-main sequence evolutionary tracks computed by Francesca D'Antona and Italo Mazzitelli (1994). These authors did not have access to a high-speed connection to the Internet and agreed that we should make their models available from our server. Accompanying the data files are a hypertext README file written by Dr. D'Antona, plots, and a hypertext README file for a set of plot macros supplied as a guide to the use of the tracks. Access to these files has been welcomed by the star formation community, and references to this online resource are beginning to appear in the literature.

2µm Spectra of Standards

Spectra in the 2µm window for a set of 26 spectral standards were published by Susan Kleinmann & Donald Hall (1986). These spectra have gained in usefulness with time, as the increased sensitivity of infrared detectors has allowed us to obtain spectra in this wavelength region of much fainter stars, otherwise inaccessible to optical techniques. Infrared spectroscopy has now taken its place as a commonly used technique for exploring the character of newly accessible types of objects in the Universe. With this development, access to this previously unavailable data set became crucial for many groups of astronomers.

Susan Kleinmann made the data set available to me, and I used a utility in STSDAS to convert the ASCII files into standard IRAF image files for plotting purposes. Plots were made for all of the spectra to accompany the distribution of the data files. The spectra were written as standard FITS (Flexible Image Transport System; Wells, Greisen & Harten 1981) files and made available as a compressed tar file, as were the plot files. A hypertext README file describing how to display the plots is provided, as well as a hypertext README file on the spectra themselves, written by Susan Kleinmann.
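The conversion here was done with an STSDAS utility. As an independent illustration of what a minimal FITS file holding a one-dimensional spectrum contains, the sketch below writes one in pure Python. It is not the procedure used for these spectra: the function names and the choice of a linear wavelength scale described by CRPIX1, CRVAL1 and CDELT1 are assumptions, and BITPIX = -32 relies on the IEEE floating-point convention added to FITS after the 1981 paper.

```python
import struct

BLOCK = 2880  # FITS files are written in 2880-byte records

def card(keyword: str, value) -> bytes:
    """Format one 80-character header card, value right-justified in column 30."""
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        text = "T" if value else "F"
    elif isinstance(value, int):
        text = str(value)
    else:
        text = format(float(value), ".9E")
    return f"{keyword:<8}= {text:>20}".ljust(80).encode("ascii")

def spectrum_to_fits(flux: list[float], crval1: float, cdelt1: float) -> bytes:
    """Encode a 1-D spectrum as a minimal FITS primary array of 32-bit floats.

    crval1 is the wavelength of the first pixel and cdelt1 the step per pixel.
    """
    cards = [
        card("SIMPLE", True),
        card("BITPIX", -32),             # IEEE 32-bit floating point
        card("NAXIS", 1),
        card("NAXIS1", len(flux)),
        card("CRPIX1", 1.0),
        card("CRVAL1", crval1),
        card("CDELT1", cdelt1),
        b"END".ljust(80),
    ]
    header = b"".join(cards)
    header += b" " * (-len(header) % BLOCK)        # pad header with ASCII blanks
    data = struct.pack(f">{len(flux)}f", *flux)   # big-endian, as FITS requires
    data += b"\0" * (-len(data) % BLOCK)           # pad data with zero bytes
    return header + data
```

A three-pixel spectrum thus occupies two 2880-byte records: one for the header and one for the (mostly padding) data.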

Standard Spectra in the Red (5600 - 9000 Å) Region

In order to establish standards for spectral classification in the red region for use in our investigations of the stellar population in star-forming regions, we concentrated on obtaining a large number of spectra of stars in the Praesepe cluster (age = 70 Myr) and in M67 (age ~ 3 - 5 × 10^8 yr). We have obtained 107 spectra of stars on the Praesepe main sequence with spectral types ranging from F1 to M4. In M67 we have spectra of 16 stars on the giant branch, 26 subgiants (F5 - K0) and 104 stars on the main sequence with spectral types from B8 to G9.

The spectra of these stars are available, with the associated paper (Allen & Strom 1995), as FITS files of images in the IRAF multispec format. Plots of a selected set of spectra are also available as well as tables of the stellar identifications, positions, magnitudes and colors.

Standard Spectra in the near infrared region (1.25µm & 1.65µm)

In a related project, Michael Meyer, Suzan Edwards, Stephen Strom and Kenneth Hinkle have been carrying out a long-term observing program on the Kitt Peak National Observatory 4-meter telescope, using the Fourier Transform Spectrograph to obtain an almost complete set of 1.25µm and 1.65µm spectra of the Morgan-Keenan spectral standards. We hope to make these spectra available by November. The sample will comprise approximately 75 stars spanning the entire range of temperature and luminosity.

Acknowledgements

I wish to express my gratitude to my summer student and aide, Jessica Norman, without whose help much of the work on the Star Formation Newsletters and the Catalog of Herbig-Haro Objects would not yet be finished. I also wish to thank Cathy Van Atta, the librarian at Kitt Peak National Observatory, for her help in obtaining the older reference material.

References

Allen, L. & Strom, K.M. 1995, A.J., 109, 1379.
The red region standard spectra can be found at:
http://www-astro.phast.umass.edu/ASsp.html.
The ADS Abstract Server can be reached at:
http://adswww.harvard.edu/abs_doc/abstract_service.html.
D'Antona, F. & Mazzitelli, I. 1994, Ap. J. Suppl., 90, 861.
The model tracks and the plots can be retrieved from:
http://www-astro.phast.umass.edu/fac/tracks.html.
Drakos, N. The manual for LaTeX2HTML is available from:
http://cbl.leeds.ac.uk/nikos/tex2html/doc/manual/manual.html.
Ginsparg, P. 1994, Computers in Physics, available at:
http://publish.aps.org/EPRINT/KATHD/ginsparg.html
Herbig, G.H. & Bell, K.R. 1988, Third Catalog of Emission-Line Stars of the Orion Population, Lick Observatory Bulletin No. 1111.
The hypertext version of the catalog can be found at:
http://www-astro.phast.umass.edu/latex/HBC/HBC.html.
Herbig, G.H. & Rao, N.K. 1972, Ap.J., 174, 401.
The IRAF home page is at:
http://iraf.noao.edu/iraf-homepage.html;
IRAF Support Services can be found at:
http://iraf.noao.edu/support.html;
Manuals are available at:
ftp://iraf.noao.edu/iraf/docs.
Kleinmann, S.G. & Hall, D.N.B. 1986, Ap. J. Suppl., 62, 501.
The spectra and plots can be retrieved from:
http://www-astro.phast.umass.edu/khsp.html.
Reipurth, B. 1994, A General Catalogue of Herbig-Haro Objects.
The LaTeX files can be retrieved from:
ftp://ftp.hq.eso.org/pub/Catalogs/Herbig-Haro/.
The hypertext version can be found at:
http://www-astro.phast.umass.edu/latex/HHcat/HHcat.html.
The Star Formation Newsletter, ed. Bo Reipurth, is available as Postscript files from:
http://http.hq.eso.org/star-form-newsl/star-form-list.html,
as a searchable WAIS index from:
wais://http.hq.eso.org:2010/starform
or as complete hypertext from:
http://www-astro.phast.umass.edu/latex/sfnews/no1/toc2.html.
The STSDAS homepage is found at:
http://ra.stsci.edu/STSDAS.html;
the STSDAS Documentation home page is at:
http://ra.stsci.edu/Document.html.
Tufte, Edward R. 1983, The Visual Display of Quantitative Information,
Graphics Press, Cheshire, CT.
Wells, D. C., Greisen, E. W. & Harten, R. H., 1981, A & A Suppl., 44, 363.
The updated electronic archive can be reached at:
http://fits.nrao.edu/FITS.html.

kstrom@hanksville.phast.umass.edu