As a byproduct of this work, a method for displaying subscripts (e.g. M, ) and superscripts (e.g. T, CO) using the minimum size and number of bitmaps has been developed. This set of bitmaps is publicly available.
Some of these problems have been addressed in an ad hoc manner for at least the last decade and a half by the growth of a preprint culture, both to make research results more immediately available and to direct circulation to those most interested in the work. This does not address the issue of the availability of the data sets, but does make the approximately correct subset of the literature available to those working in a field in a more timely manner than does journal publication.
For the past decade or longer, authors have been preparing their own manuscripts as word-processing and formatting software has become almost universally available. Figures are prepared by the author using a variety of more and more sophisticated computer drawing tools. The widespread availability of the Internet has made it possible to transmit these electronic files easily and rapidly around the world. The process of writing a paper with a collaborator across the country, or in another country, has been greatly facilitated by these advances.
Most of the professional journals in astronomy are now accepting submissions in LaTeX, but each journal has developed its own set of macros which is designed, at least in part, to reproduce the traditional "on paper" appearance of that journal. Astronomers using electronic preprint servers (Ginsparg 1994) must collect a set of LaTeX macros for each journal of interest and update their collection as new versions are issued. This very shortly becomes a burdensome task. In response, many papers are sent to the preprint servers in Postscript files, but these files, particularly when figures are included, can be very large. On the other hand, most papers, whether in LaTeX or Postscript, do not contain the figures. They must be acquired separately. In neither case is much data readily accessible to the reader. At the same time, the journals consider the acceptance of the LaTeX files as a big step in their transition to "electronic publication," but the time between acceptance of a paper and the appearance of that paper in the journal is six months for The Astrophysical Journal, four months for the Astronomical Journal. The median "age" of a paper referenced in these journals is between four and five years, with many of the referenced papers being still in press. It is clear that advances in manuscript preparation have not been matched by increases in distribution speed - the first and most easily achievable gain which should result from some standardization of the format for manuscript submission.
For the past two years astronomers working in the field of star formation have been kept abreast of the developments in their field by the Star Formation Newsletter, edited by Bo Reipurth of the European Southern Observatory. This newsletter is distributed monthly by email to a community of over 800 astronomers worldwide. A standard LaTeX form is used in which astronomers enter the abstracts of their recently accepted papers. These are then emailed to a central collection point and assembled into the monthly newsletter. The recipients can then simply strip off the email header and process the LaTeX file to read the newsletter. Abstracts of Ph.D. theses, notices of upcoming meetings and new books are also included. The newsletter helps to eliminate the previous hit-and-miss circulation of preprints, especially for younger workers. Some of the astronomers have also submitted their preprints to an electronic preprint distribution service to make actual retrieval of the preprint possible without the distribution of a large amount of paper through the mail system. However, this system suffers from the disadvantages stated above, as well as from the fact that many authors have their own LaTeX or TeX macro files which they neglect to include.
There are enormous advantages to having papers available electronically with their associated data sets. They can be made available almost instantly upon the completion of the refereeing process. They are easily searchable, and, if the tabular material is in an HTML document (if small) or other easily interpretable form, the data is instantly accessible. Large data sets can be made available in standard, generally accepted formats. (Tabular data presented in Postscript format is not easily accessible in numerical form.) More sophisticated graphics are easily and cheaply included. Links can be built into the paper to references and abstracts that are on-line as well as to other non-standard materials not usually available to readers of the papers.
In order to address some of the problems outlined above, we have begun a preprint and data distribution service for the papers and data of the Star Formation group in the Five College Astronomy Department from our World Wide Web server.
Since the astronomy journals are now accepting manuscripts in LaTeX, the obvious method for conversion of these manuscripts into HTML is by use of Nikos Drakos' LaTeX2HTML. This perl script will translate a LaTeX manuscript into a set of linked HTML pages, translating the mathematical expressions into transparent GIFs. These images are then placed in the text as in-line images.
This program created a new in-line image each time a math mode entry was encountered, no matter how many times the identical image had been created before in the same document. This resulted in the creation of tens of identical images for commonly used symbols and expressions of units, such as cm or s. Since all images were installed with ALIGN=BOTTOM, e.g. cm or s, the resulting text could be difficult to read. It is also true that math mode expressions including subscripts would appear to float above the rest of the text on the line, balanced on the subscript. Of course, because these expressions were inserted in the text as images, the font would rarely match that of the rest of the text.
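The duplication problem described above can be illustrated with a minimal sketch. It assumes that each math-mode expression is keyed by its whitespace-normalized source text, so that repeated expressions share one image file. The function names, the file-naming scheme and the example expressions are hypothetical; this is neither how LaTeX2HTML works internally nor the author's perl code.

```python
import hashlib

def image_name(expr):
    """Return a stable file name for a math-mode expression (hypothetical scheme)."""
    normalized = " ".join(expr.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:10]
    return f"img_{digest}.gif"

def assign_images(expressions):
    """Map every occurrence to an image name, reusing names for identical expressions."""
    cache = {}   # normalized expression -> image file name
    names = []   # one entry per occurrence, in document order
    for expr in expressions:
        key = " ".join(expr.split())
        if key not in cache:
            cache[key] = image_name(expr)
        names.append(cache[key])
    return names, cache

# Invented example expressions; the units in the original text lost their exponents.
occurrences = ["cm^{-2}", "s^{-1}", "cm^{-2}", "cm^{-2}", "s^{-1}"]
names, cache = assign_images(occurrences)
print(len(names), "occurrences ->", len(cache), "distinct images")
```

With a cache of this kind, the number of images generated grows with the number of distinct expressions rather than with the number of occurrences.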
In response to these practical and perceptual problems, I developed an image library of the most commonly used subscripts, superscripts and mathematical symbols used in the manuscripts which I had converted to HTML. This library was designed with several considerations in mind.
The last two items take into consideration the default image cache size of most browsers, and the fact that the least recently accessed image will be the first discarded. The typical size of a 2 - 5 color image used as a figure in these preprints is 4 - 8 kilobytes. The typical size of a bitmap to be used as a sub- or superscript is 0.06 kilobytes. Thus our third consideration should easily be met.
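As a rough check on the sizes quoted above, the following sketch computes how many sub- or superscript bitmaps occupy the same space as a single figure. Only the sizes are taken from the text; the function name is illustrative, and no particular browser cache size is assumed.

```python
import math

BITMAP_KB = 0.06         # typical sub- or superscript bitmap, from the text
FIGURE_KB = (4.0, 8.0)   # typical 2 - 5 color figure, from the text

def bitmaps_per_figure(figure_kb, bitmap_kb=BITMAP_KB):
    """Number of whole bitmaps that take up the same space as one figure."""
    return math.floor(figure_kb / bitmap_kb)

for size in FIGURE_KB:
    print(f"one {size:g} KB figure ~ {bitmaps_per_figure(size)} bitmaps")
```

A single figure therefore takes up about as much cache space as several dozen to over a hundred of the small bitmaps.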
These three considerations thus work in concert to provide a clean-looking, easily readable paper, taking minimal time to load each page, thus allowing browsing as well as reading in depth without forcing the scientific community to adopt another, temporary, shorthand for complex mathematical expressions or simply delaying the onset of electronic publication of journal papers.
The library includes the Greek letter set previously made available, the numerals, the entire alphabet, both capital and lower case, and some special symbols and letter combinations of common use in astronomy. To implement this set of transparent GIFs, perl scripts were written to preprocess the manuscripts to insert the in-line images as required. LaTeX2HTML is then used to convert the manuscript to HTML pages and a post-processing perl script is used to clean up the few things that may have been affected by the LaTeX2HTML conversion process. The manuscript is then ready for insertion of the images, tables and references.
As a small sidelight, I note that the display of subscripts is a simple use of the HTML tag for displaying an in-line image. The unexpected subscript effect occurs because, in this case, the image is smaller than the text height. Thus when you specify <IMG ALIGN=MIDDLE ALT="_sun" SRC="/kicons/smsun.gif">, the middle of the small image is aligned with the base line of the text. Thus a subscript, as in M_sun, is displayed. This method is also used to place the Greek letters having trailers properly on the baseline. The images are filled out with sufficient blank pixels below the letter that setting ALIGN=MIDDLE will properly locate the letter.
I also urge users of this library to always make use of the ALT="xxxx" option so that people accessing your files with Lynx may more easily read your pages. This is easily accomplished when the substitutions are made in the manuscript by the perl script.
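The substitution step can be sketched as follows. This is a minimal illustration in Python rather than the author's perl script. It assumes subscripts appear in the manuscript in the simple form `$M_{\odot}$`. Only the image path `/kicons/smsun.gif` and its ALT text `_sun` come from the text above; the second table entry, the pattern and the function name are hypothetical. How the tag is protected during the later LaTeX2HTML pass is not described here and is not handled.

```python
import re

# Subscript body -> (image path, ALT text). The first entry is taken from the
# text; the second is invented purely for illustration.
SUBSCRIPT_IMAGES = {
    r"\odot": ("/kicons/smsun.gif", "_sun"),
    "eff": ("/kicons/smeff.gif", "_eff"),
}

# Matches simple inline math of the form $Base_{sub}$ (an assumed input form).
SUBSCRIPT_RE = re.compile(r"\$\s*([A-Za-z]+)_\{([^{}]+)\}\s*\$")

def substitute_subscripts(text):
    """Replace known subscripts with in-line images that carry ALT text."""
    def repl(match):
        base, sub = match.group(1), match.group(2).strip()
        if sub not in SUBSCRIPT_IMAGES:
            return match.group(0)   # leave unknown expressions untouched
        src, alt = SUBSCRIPT_IMAGES[sub]
        return f'{base}<IMG ALIGN=MIDDLE ALT="{alt}" SRC="{src}">'
    return SUBSCRIPT_RE.sub(repl, text)

print(substitute_subscripts(r"a star of 2 $M_{\odot}$"))
```

Because the ALT text is emitted together with the tag, a text-only browser shows `M_sun` in place of the image.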
Although the new versions of LaTeX2HTML allow the option of converting Postscript figures, at a user-specified scale, into transparent GIFs in-lined into the text, I prefer to exercise more control over the scaling and appearance of the figures because the figures were initially designed for display on paper. (See my online tutorial on using images on the World Wide Web.) I display the file using ghostview, scale the image to the desired size, and then capture the image using xv. For some images, in particular gray-scale and full-color Postscript files, ImageMagick may be better for this conversion process. I have chosen to place the scales, axis labels and other labeling text in a deep blue to separate it from the text. However, although the figure on the printed page may be very small, and we are not constrained by space considerations on the HTML page, the spatial resolution available to us is still less than that in the paper representation. Therefore I have chosen to make use of the fact that color is an option available at no extra cost to the server. When different line styles or point styles are called for, different colors are substituted by making use of xpaint. When the desired changes are completed, the file is converted into a transparent GIF and placed in the HTML document at the appropriate location. There is much more flexibility in figure placement in HTML documents, as one is not forced to maneuver within fixed page sizes. For this reason, figures may be more reasonably located with respect to their textual references.
There are certainly some very complex figures for which this technique will not be appropriate. In that case a thumbnail or postage stamp image can be made, either by downscaling the entire image or by cropping a recognizable section out of the image for insertion into the text. This smaller image can then act as a link to the Postscript (or GIF) high resolution figure.
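The thumbnail-as-link arrangement described above amounts to wrapping an in-line image in an anchor. The sketch below generates such markup. The function name and the file names are hypothetical and are not taken from the author's pages.

```python
from html import escape

def thumbnail_link(thumb_src, full_href, alt):
    """Return an anchor whose content is a small in-line image (the thumbnail)."""
    return (
        f'<A HREF="{escape(full_href, quote=True)}">'
        f'<IMG SRC="{escape(thumb_src, quote=True)}" '
        f'ALT="{escape(alt, quote=True)}"></A>'
    )

# Hypothetical file names: a small GIF in the text linking to the full figure.
print(thumbnail_link("fig3_small.gif", "fig3.ps", "Figure 3"))
```

The reader sees only the small image in the text and retrieves the larger Postscript or GIF file only on request.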
Tabular material presents different problems. Small tables can be placed within an HTML document by using the <PRE> tag. Larger tables are only practical, at the moment, as Postscript files linked to the text at the appropriate point (but see the Catalogs section below). The use of HTML 3 should allow easier introduction of tabular material into the text. Very large tables can be associated with papers in other formats more appropriate to the data.
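For small tables, the <PRE> approach relies on fixed-width alignment of the columns. The following sketch pads each column to its widest entry and wraps the result in <PRE>. The function name and the sample rows are invented for illustration and do not come from any of the papers discussed.

```python
from html import escape

def pre_table(header, rows):
    """Format a small table as fixed-width text wrapped in <PRE>."""
    cells = [[str(c) for c in row] for row in [header] + rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    lines = [
        "  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip()
        for row in cells
    ]
    # Rule under the header, one run of dashes per column.
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "<PRE>\n" + escape("\n".join(lines), quote=False) + "\n</PRE>"

# Invented placeholder data, not taken from any catalog.
print(pre_table(["Star", "V", "B-V"], [["A", 12.3, 0.85], ["BB", 9.1, 1.2]]))
```

The text is escaped before wrapping because characters such as `<` and `&` are still interpreted as markup inside <PRE>.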
With original manuscripts in LaTeX using the AASTeX macros, the conversion of the listing of references is not easily handled by LaTeX2HTML. However, it is a trivial matter to use global substitutions to replace the abbreviations used for the macros with those appropriate for the references section of the paper. The large value added available to astronomers is the possibility of linking to the ADS Abstract server. By adding these links, the reader has immediate access to the abstracts of the papers referenced, and can judge the relevance of each cited paper to the information sought. This can be especially important to people located at institutions where they may have easy access to the Internet but no library down the hall. Within the last two weeks, we have made available on the World Wide Web the first of what we plan to be many Ph.D. theses. We hope that, by undertaking this publishing venture as a department, we will encourage other departments to join us in making this material, so often neglected, more widely available.
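Returning to the reference lists: the global substitution of journal macros can be sketched as below. The macro names (`\apj`, `\apjs`, `\aj`, `\mnras`, `\pasp`) are the usual AASTeX journal abbreviations, but the expansions should be checked against the macro package actually in use. The function name and the sample reference string are invented placeholders. Linking to the ADS Abstract server is not shown, since the link format is not given in the text.

```python
import re

# Journal macro name -> abbreviation printed in the reference list.
JOURNAL_MACROS = {
    "apj": "ApJ",
    "apjs": "ApJS",
    "aj": "AJ",
    "mnras": "MNRAS",
    "pasp": "PASP",
}

# Longest names first so that \apjs is not matched as \apj followed by "s";
# the lookahead keeps unrelated macros such as \ajx untouched.
_MACRO_RE = re.compile(
    r"\\(" + "|".join(sorted(JOURNAL_MACROS, key=len, reverse=True)) + r")(?![A-Za-z])"
)

def expand_journal_macros(text):
    """Replace known journal macros with their printed abbreviations."""
    return _MACRO_RE.sub(lambda m: JOURNAL_MACROS[m.group(1)], text)

# Invented placeholder reference, for illustration only.
print(expand_journal_macros(r"Author, A. 1990, \apj, 1, 1"))
```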
The Herbig-Bell Catalog of Emission Line Stars had always been somewhat awkward to use because the data for a single star was spread over two pages which were printed in landscape mode. There were many columns empty for all but the most frequently observed stars making it difficult to follow the correct line across the page. In making the electronic version, I have linked the catalog number of the object to its companion entries on the alternate pages so that the top of the window will act as a guide line across the columns. An asterisk in the notes column is linked to the note for that object. All references contained within the database of journal article abstracts used by the ADS Abstract Server were linked to these abstracts to provide more information on the contents of the reference.
Because of the difficulty of reading the table and the unlikelihood of the need for all of the information spread across the two pages, I also provided a forms-based access to the data. This form requests the catalog number of the object desired and then returns all of the most commonly desired data for the object as well as any other data that was requested in a nicely formatted page.
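The forms-based access described above can be pictured as a lookup keyed on catalog number that returns a default set of fields plus any others requested. The sketch below is only an illustration of that idea. The field names, the default selection and the sample record are invented and do not reflect the actual catalog columns or the author's form handler.

```python
# Fields returned for every query (a hypothetical choice).
DEFAULT_FIELDS = ("name", "ra", "dec")

def lookup(catalog, number, extra_fields=()):
    """Return a formatted text block for one catalog entry."""
    record = catalog.get(number)
    if record is None:
        return f"No entry for catalog number {number}"
    fields = list(DEFAULT_FIELDS) + [
        f for f in extra_fields if f not in DEFAULT_FIELDS
    ]
    lines = [f"Catalog number {number}"]
    for field in fields:
        lines.append(f"{field:>8}: {record.get(field, '(not listed)')}")
    return "\n".join(lines)

# Invented placeholder record, not real catalog data.
catalog = {1: {"name": "Star A", "ra": "00 00 00", "dec": "+00 00 00", "vmag": "12.0"}}
print(lookup(catalog, 1, extra_fields=("vmag",)))
```

Returning only the requested fields for a single object avoids the original difficulty of following one row across two wide pages.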
There were two problems posed by the references in this catalog. Among the references that predated the beginning of the compilation of the abstract database by NASA were approximately 25 references with dates ranging from 1894 to 1960, references which might be difficult to obtain for people located at smaller institutions. The librarian at Kitt Peak National Observatory kindly copied either the title page and abstract when available, or just the first page of the article and forwarded these copies to me. From these copies, either the title, author, citation and abstract were entered, or else the first few paragraphs of the paper were used in place of the abstract. In a few cases, the papers in question were notes so short that the entire paper was entered.
Due to the extremely active state of this field of research, there were a very large number of papers (25 - 30% of the references) too recent to be found in the abstract database as yet. To make the abstracts for these papers available, we placed the entire set of the Star Formation Newsletters on line, using LaTeX2HTML and the perl scripts, and linked the catalog references to these abstracts. This procedure was almost entirely successful, leaving only those papers which are still in preparation and the papers published between 1960 and 1975, which I felt were certainly more easily accessible than the pre-1960 papers. As these papers which were still in preparation appear in the Star Formation Newsletters, the links are made to the catalog references.
README file written by Dr. D'Antona, plots and a hypertext README file for a set of plot macros supplied as a guide to the use of the tracks. Access to these files has been welcomed by the star formation community and references to this online resource are beginning to appear in the literature.
Susan Kleinmann made the data set available to me, and I used a utility in STSDAS to convert the ASCII files into standard IRAF image files for plotting purposes. Plots were made for all of the spectra to accompany the distribution of the data files. The spectra were written as standard FITS (Flexible Image Transport System; Wells, Greisen & Harten 1981) files and made available as a compressed tar file. The plot files were also made available as a compressed tar file. A hypertext README file describing how to display the plots was provided, as well as a hypertext README file, written by Susan Kleinmann, on the spectra themselves.
The spectra of these stars are available, with the associated paper (Allen & Strom 1995), as FITS files of images in the IRAF multispec format. Plots of a selected set of spectra are also available as well as tables of the stellar identifications, positions, magnitudes and colors.