Electronic Scholarly Journals

by Dr. Peter B. Boyce

A talk at Université Louis Pasteur

Strasbourg, France, 29 January, 1999


Slide 1

The American Astronomical Society (AAS) has been publishing its most prestigious journal on line in a rich, effective format since 1995. Science magazine named our electronic journal a runner-up for its most important advance of the year for 1995. Our work has generated comment and feedback which can be used to refine the approach to electronic publishing. The author was the Executive Officer of the American Astronomical Society for 16 years, and led the Society's efforts to move its journals to electronic publication.

Most of the links to references which have not been put into the text may be found at the end in Slide 27.


Slide 2

As we will discuss, electronic journals come in different formats, and with different features. Some are ugly, some are bad, and a few are very good. We'll see what sets them apart. To return to fundamentals, we have to recognize all the purposes which a scholarly journal has fulfilled. We must be sure that, as we move into the electronic realm, we continue to fulfill all those purposes. We will look at the richly interconnected world of electronic information delivery, and will come to understand that the journal is only one part of the information needed by scholars. We will see how rapidly the print version is diverging from the electronic version -- which implies that we must continue to make the electronic version accessible. Paper is not a sufficient archival medium. And, if we are to archive an electronic journal, we must plan for that in the way we produce the original publication. A well-designed electronic journal archive can be maintained for pennies. In contrast, many of today's so-called "electronic journals" can not be archived at all.


Slide 3

For those who are not familiar with astronomy, the AAS is the scientific society for professional astronomers. It was founded in 1899. Our mission is to support astronomy and closely related disciplines. The Society was originally founded to provide a venue for the communication of research results in astronomy and astrophysics.

Throughout the life of the Society, the enabling of communication among astronomers has remained the primary goal. Our mission is not simply to publish journals, and we do not depend upon journal income to finance the other programs of the Society.


Slide 4

These numbers are approximate, but serve to indicate the size of the AAS journal publishing program. It is small enough to be agile in adopting new technology, but large enough to serve as a realistic prototype for electronic publishing methods. Note that the Astrophysical Journal and the ApJ Letters come out three times per month. It was no accident that, in moving to electronic publication, we started with the ApJ Letters. It is the most prestigious of our journals, and is small enough that we could use manual intervention if necessary. It immediately attracted a lot of viewers, which hastened community acceptance of the electronic version.


Slide 5

None of the current design team has a background in day-to-day publication of paper journals, nor did any of the previous members. But we were all scholars of one sort or another and knew how we and our scholarly colleagues used the journal. Our thinking was not constrained by conventional publishing practices, which apparently made it easier for us to develop new features. During the early design phase, it became apparent that the astronomical library community had great insight into how the journals were used, and we tapped their expertise as well as the opinions of our colleagues.

We have been working with electronic communication for a number of years -- we have been using electronic delivery of the abstracts of meeting presentations since early 1991 (well before the Web). We produced our first electronic publishing plan in May of 1992. We achieved all we set out to do.


Slide 6

We must not forget that, historically, scholarly journals have served a multitude of purposes, not all of which are readily apparent. As we enter the electronic era, we must ensure that these purposes continue to be fulfilled, but not necessarily by the journal.

They have provided information about who is working on what problems, who might have moved to a different institution, and what the hot topics are today.

They have provided the latest results - an important function in today's rush to produce results which justify additional research funding, or scarce observing time, or other resources.

In many fields, such as astronomy and physics, the journals carry the corpus of our discipline's knowledge.

Acceptance of a paper by a high-quality journal confers a certain degree of status upon the author.

Journals record the progress of our knowledge and ideas for posterity.

And, finally, journals set the standard for what makes an acceptable research paper. If the Physical Review had not existed, the LANL xxx preprint server would not function. Everyone who posts to the preprint server is writing ultimately for publication in the Physical Review -- or another established journal. This keeps the quality of the preprints high, both in style and substance.


Slide 7

As we will see throughout this talk, links are everything. A paper on the Web without links is no great advance over a paper journal. Nor should a series of static pages in a proprietary format lay claim to being a true electronic journal. Electronic journals have to be accessible with the simplest browsers, and the pages have to be linked to other information.

Perhaps the worst of all is the electronic delivery of the same old paper pages in the same format that served the paper journals -- particularly PDF. PDF files are considerably larger than HTML files, taking a long time to download on today's crowded Internet, and once in your computer they are hard to read, even with the tools provided. Publishers find it hard to insert links in PDF (at least nobody is doing it adequately, yet) and PDF will probably not be readable in ten years. (Can you read old Wordstar documents today? Yet it once had a large share of the market.)

PDF is useful for making a printout at your local printer. This is what it was designed for, and this it does well. But, as the primary delivery mechanism for an electronic journal, it is inferior to a good HTML presentation.


Slide 8

Links, permanence, and information transfer -- three hallmarks of a good electronic journal. We will investigate each of these attributes separately. A good electronic journal should be linked to a multitude of other information, should be designed to be permanent (we'll see what this means later), and it should be designed to transfer information efficiently. A good electronic journal is a complex system, not just one simple document. A year of the Astrophysical Journal, for instance, has close to 250,000 files, a library of GIF images of 1,200 or so special characters in various sizes, and a multitude of scripts and programs which are used to generate the screen version which readers see. For 1997 the whole journal occupies about 60 GBytes of storage. As we will see, archiving such a journal is not a straightforward "record it and leave it" operation.


Slide 9

Our readers consistently tell us that the most important feature of our electronic journals is the links.

We link the journal for two purposes: within the document for ease of use, and outside the document to connect to relevant information. Reading on a screen is not nice. We provide tools to make it easier -- to replace the ability to flip through pages and scan quickly. We provide abundant navigational links. It is important to provide additional capabilities, beyond what can be done on paper. One such capability is the ability to load first the reference list, scan for a name, and (provided the capability is included in the electronic journal) jump into the text to see what the author says about that person. Our readers tell us this is a useful new capability. However, it requires that one use the in-line author name citation style, and not just numbered endnotes -- one example of where editorial styles should be altered to make use of the electronic tools.

But we also use links to connect to outside information. Primary, of course, is the list of references. A click brings us to the reference page, and another one brings us to the abstract of the referenced article. From there, the full text of all articles in the core literature is available. We also have the feature of linking to later papers which reference the paper being read. We call these forward citations, since they refer forward in time. Additional links connect to relevant online databases. The central point is that the journal article is part of a distributed database, linked forward and backward in time, and providing additional material and information to the user's desktop.
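As a rough illustration of the forward-citation idea (a sketch, not the journal's actual implementation), forward citations can be derived by inverting the reference lists: if we know which papers each article cites, we can compute which later articles cite a given paper. The function name and the paper identifiers below are hypothetical.

```python
from collections import defaultdict


def forward_citations(reference_lists):
    """Invert {paper: [papers it cites]} into {paper: [papers citing it]}."""
    cited_by = defaultdict(list)
    for citing_paper, references in reference_lists.items():
        for cited_paper in references:
            cited_by[cited_paper].append(citing_paper)
    # Sort each list so the display order is stable.
    return {paper: sorted(citers) for paper, citers in cited_by.items()}


# Hypothetical identifiers, for illustration only.
reference_lists = {
    "paper-1997-A": ["paper-1995-X", "paper-1996-Y"],
    "paper-1998-B": ["paper-1995-X"],
}

for paper, citers in sorted(forward_citations(reference_lists).items()):
    print(paper, "is cited by", citers)
```

Because new papers keep appearing, a table like this has to be recomputed or extended over time, which is one reason the links belong in a maintained database rather than in the static article.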


Slide 10

Astronomy is one example where we have linked the whole system. We have done this through the abstract service of the Astrophysics Data System (ADS). Medicine has the same facility, except they use the PubMed database to make the links. There are two keys: (1) common, open standards for naming digital articles, and (2) name resolution for robust linking. Our links can work at several mirror sites, and will remain valid over time.

But the central job of linking to the abstracts of references and citations is filled by the ADS abstracts system. The ADS also provides scanned page images of the important historical literature within astronomy. With NASA support, this collection is available for free. The ADS also provides the links from articles to the machine readable data tables which reside in the astronomical online databases (CDS, NED, ADC), which can be searched by astronomical object. We call this system of protocols and links Urania. It is not a collection of objects, it is the underlying, enabling protocols - an important distinction. No need to point out that Urania is entirely a Web Creature.
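To make the two keys concrete, here is a minimal sketch of name resolution, assuming a simple lookup table. The logical name format, the mirror addresses, and the file paths are all invented for illustration; they are not the naming standard or the resolver actually used in astronomy.

```python
# Hypothetical mirror base URLs, for illustration only.
MIRRORS = {
    "us": "https://mirror-us.example.org",
    "fr": "https://mirror-fr.example.org",
}

# Resolver table: logical article name -> path that is valid at every mirror.
LOCATIONS = {
    "journal/vol500/page100": "/articles/vol500/p100.html",
}


def resolve(logical_name, mirror="us"):
    """Turn a logical article name into a URL at the chosen mirror."""
    if logical_name not in LOCATIONS:
        raise KeyError(f"unknown article name: {logical_name}")
    return MIRRORS[mirror] + LOCATIONS[logical_name]


print(resolve("journal/vol500/page100", mirror="fr"))
```

The point of the design is that articles store only the logical name. If files move, only the resolver table changes, and every existing link keeps working at every mirror.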


Slide 11

The interlinked astronomical information system can be entered at many points, as illustrated by the red arrows entering from the left. One can browse the journals -- jumping to the abstracts of the references and forward citations, then reading the full text, or going to the relevant data in the online databases as shown with the green arrows.

One can search the abstract collection, get to the full text, the online journals and the data. Or knowing the reference one can go directly to the historical collection -- full page images of all the core journals. The historical collection will be complete by June 1999, back to volume 1, issue 1 for the main important journals in astronomy.

Or one can enter the databases, giving the astronomical object of interest, retrieve the published data on that object, and link into the articles where the data were originally published. One of the great tools -- particularly useful in this form for astronomy -- is the ability (now only in prototype form) to search over a huge collection of data for a list of all objects which meet certain characteristics (e.g. are in a certain region of the sky, and are brighter than a certain magnitude, but emit a large amount of X-ray energy, and have more than the expected amount of infrared radiation). This capability is changing the way astronomers do their research. The time spent on tedious literature searches can now be used in converting this information into real knowledge about the universe.
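The kind of multi-criteria search described above can be sketched as a simple filter over a catalog. This is only an illustration under assumed conventions: the record fields, units, thresholds, and sample objects are hypothetical, and a real service would search distributed databases rather than an in-memory list. Recall that a smaller magnitude means a brighter object.

```python
from dataclasses import dataclass


@dataclass
class CatalogObject:
    name: str
    ra_deg: float      # right ascension, degrees
    dec_deg: float     # declination, degrees
    magnitude: float   # smaller value = brighter object
    xray_flux: float   # arbitrary units (hypothetical)
    ir_excess: float   # observed / expected infrared; > 1 means an excess


def select_objects(objects, ra_range, dec_range, max_magnitude,
                   min_xray_flux, min_ir_excess=1.0):
    """Return objects in the sky region that pass all three cuts.

    Simplification: the region is a plain rectangle that does not wrap
    around RA = 0/360 degrees.
    """
    ra_min, ra_max = ra_range
    dec_min, dec_max = dec_range
    return [
        obj for obj in objects
        if ra_min <= obj.ra_deg <= ra_max
        and dec_min <= obj.dec_deg <= dec_max
        and obj.magnitude < max_magnitude      # brighter than the limit
        and obj.xray_flux >= min_xray_flux     # strong X-ray emitter
        and obj.ir_excess > min_ir_excess      # more infrared than expected
    ]


# Invented sample data, for illustration only.
catalog = [
    CatalogObject("obj-1", 10.0, -5.0, 8.2, 3.0, 1.4),
    CatalogObject("obj-2", 10.5, -4.0, 12.9, 3.5, 1.6),  # too faint
    CatalogObject("obj-3", 200.0, 30.0, 7.0, 4.0, 2.0),  # outside the region
    CatalogObject("obj-4", 11.0, -6.0, 9.0, 0.1, 1.2),   # weak in X-rays
]

matches = select_objects(catalog, (9.0, 12.0), (-7.0, -3.0),
                         max_magnitude=10.0, min_xray_flux=1.0)
print([obj.name for obj in matches])
```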


Slide 12

The Centre de Données astronomiques de Strasbourg (CDS), located in the Strasbourg observatory, is one of the world's primary databases for the collection, storage, and provision of information about stars and other objects, particularly in our galaxy. The Simbad service of the CDS maintains the list of all publications which deal with astronomical objects, organized by object. As of 30 January, 1999, the Vizier service includes 2088 catalogs of astronomical data of various kinds, all available online. The CDS has developed specialized tools to help astronomers extract and use this information most effectively. Their homepage provides more information about their services, which are open to all qualified users.


Slide 13

Getting back to the electronic journal attributes -- particularly the effective transfer of information. Ease of use is important. We provide several versions of the AAS journals, each designed for a different use. The HTML versions are for browsing -- either full articles, or in chunks, such as the abstract only, or the reference page, or just the tables, or just the figures. The PDF version is for local printout, and the users of our ApJ Letters seem to be using it that way, accessing the HTML versions five times as frequently as they do the PDF. The relegation of the PDF to printout only seems to come with growing community awareness of the additional features available in the HTML version.

We have eliminated superfluous graphics (something which many other journals have not done yet). Our links are all text links instead of icons. We present carefully constructed thumbnails of the figures which load rapidly, and give the reader the choice of loading the large version of the figure if he or she wants it. In short, we design for the user, particularly a user in a small or remote location.

One interesting development is the rapidity with which the paper and electronic versions are diverging. We have authors requesting color figures online, but black and white on paper (where the use of color adds $1,000 to the cost that the author pays). We have videos which can not appear on paper, and we have a considerable number of data tables in which the first ten lines are printed, and the full table is available only on line, but there it is available in a machine readable format. We have published an interactive table which displays results based upon the reader's requests. And there are many more electronic features which will come. Combined with the live links, which are only in the electronic version, these features make up a large portion of the value of the article. Paper can not convey the full content any more. Which brings us to the question of permanence.


Slide 14

Scholarly knowledge is built up through time, with succeeding generations of scholars depending upon and referring to the work that came before. Scholarly journals, by their very nature, must remain available indefinitely -- and all the features which convey the content have to continue to work, as well. Archiving is not the right word for the electronic environment. Archives connote safe places where the materials are protected from harm. In the electronic future, reading a journal does not degrade the information, and accessibility and continued functionality are the desired functions of an electronic archive. It is not enough to record the material somewhere and then forget it. To stay accessible, it has to be managed, migrated to new formats and new physical storage media, and technically updated to remain readable by the current technology.


Slide 15

PDF is not an archival format. It is widespread enough that it will probably become possible to migrate it to some new page presentation format, but it is not designed to make this task easy. At this time, the best way to assure longevity of the electronic documents is to produce a master copy in SGML, a markup language which is sufficiently detailed to be able to code all the information necessary both to derive the public documents (the HTML and PDF versions) in an automatic fashion and to translate the SGML into whatever new standard might become widely accepted in the future (such as XML).

Of course, the links have to continue to function. So references can not be to URLs, but must be made through logical name resolution. All contributors to the electronic information system have to maintain their holdings in order for an electronic archive to function. It is difficult, but not impossible, and it is absolutely necessary. In order to accomplish this at a reasonable, even minimal, cost, the electronic journal has to be produced with this need in mind. The publication process, in short, has to be re-engineered.


Slide 16

In the production of an electronic journal, all stages in the process become interdependent, and there is considerable interaction with other products, such as the abstract database. By reworking the production process to produce the electronic master copy first, using the power of automated tools, where possible, and adding links and formatting codes as early as possible in the production process, we have been able to make substantial cost savings -- savings which can more than offset the additional cost of producing the electronic version of the document.

Our partner in electronic journal development, the University of Chicago Press, has successfully made this transition to a new process. They first produce a master electronic copy of our journal in a robust SGML format, and derive both the paper version and the publicly available electronic versions from it. We now have the capability to refresh the technical appearance and features of the HTML version as browsers become more sophisticated. By using automatic scripts to produce the HTML versions from the SGML "master copy," we can accomplish this at very little cost, well within the normal operating budget of the journal.

We have rederived our whole output three times already, adding features such as HTML tables (the capability to display tables was not available when we started publishing), forward citation references, and data tables in machine readable format, among others. We see the rederiving of the public versions as a continuing requirement, part of maintaining effective access to the electronic material. And if the two major browsers continue to evolve along separate paths, we will have to include the ability to derive the document upon request, tailoring it to the user's browser capabilities.
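The principle of deriving every public version from one master copy can be sketched in a few lines. In this illustration a Python dictionary stands in for the SGML master (an assumption made purely to keep the example short), and the field names and the two output functions are hypothetical, not the scripts used in production.

```python
from html import escape

# Stand-in for the master copy. The real master is SGML; a dictionary is
# used here only to keep the sketch self-contained.
master = {
    "title": "A Sample Article",
    "authors": ["A. Author", "B. Author"],
    "abstract": "We compare x & y.",
}


def to_html(article):
    """Derive a simple HTML rendering from the master record."""
    authors = ", ".join(escape(name) for name in article["authors"])
    lines = [
        "<h1>" + escape(article["title"]) + "</h1>",
        '<p class="authors">' + authors + "</p>",
        '<p class="abstract">' + escape(article["abstract"]) + "</p>",
    ]
    return "\n".join(lines)


def to_plain_text(article):
    """Derive a plain-text rendering from the same master record."""
    return "\n".join([
        article["title"],
        ", ".join(article["authors"]),
        "",
        article["abstract"],
    ])


print(to_html(master))
print(to_plain_text(master))
```

When display technology changes, only the derivation functions are rewritten and rerun; the master copy, and therefore the content, stays untouched.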


Slide 17

Readers of our electronic journals have the same expectations of quality, permanence and quality of presentation they have always had.


Slide 18

But, they have new expectations as well. Foremost among these is the expectation of updatability. Readers expect the online version to represent the latest information. Authors expect to be able to update their work.

We particularly get requests from the authors of the abstracts of meeting presentations to update their abstract before the meeting. Even if the paper version has gone to the printer, they always ask us to update the electronic version. Our community has made the switch from accepting the paper version as the authentic one to counting upon the electronic version as being the latest authentic material.

The problem with updates is that, as scholars, we require a definitive version of someone else's work in order to cite it and build upon it for our own contribution to knowledge. Continually changing the "published" papers would undermine the long-established scholarly process. Our solution is to "freeze" the paper as accepted by the scientific editor and make links to new papers which update the material. The only exception is for typographical errors and mistakes which would prompt the publication of "errata" in the paper version. In this case we add a link to the correction or erratum in the published version. But, we require the author to stick by the version of the paper which he or she has finally approved for publication. However, as noted earlier, it is critical to keep the links operable and the public electronic versions updated technically, so they will be available to the current suite of readers using current software. I emphasize, we do not change the content, only the presentation.
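One way to picture the "freeze and link" policy is a record whose content cannot be modified, accompanied by lists of links which can grow. This is a sketch under that assumption; the class names and identifiers are hypothetical and do not describe the journal's actual data model.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPaper:
    """The content as accepted by the scientific editor; not editable."""
    identifier: str
    text: str


@dataclass
class PaperRecord:
    """A frozen paper plus the links that may be added after publication."""
    paper: FrozenPaper
    errata: list = field(default_factory=list)
    updated_by: list = field(default_factory=list)

    def add_erratum(self, erratum_id):
        self.errata.append(erratum_id)

    def add_update(self, later_paper_id):
        self.updated_by.append(later_paper_id)


record = PaperRecord(FrozenPaper("paper-001", "Text as accepted by the editor."))
record.add_erratum("erratum-001")   # link to a correction
record.add_update("paper-002")      # link forward to a later paper

try:
    record.paper.text = "Revised text"   # attempt to change the content
    content_changed = True
except FrozenInstanceError:
    content_changed = False              # the frozen dataclass rejects it

print(content_changed, record.errata, record.updated_by)
```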


Slide 19

Readers expect the Internet process to be faster. We have responded with a process which makes the papers in the ApJ Letters available one by one as they are accepted by the scientific editor. When the paper version is printed, and we have the "page numbers," we republish the paper (exactly the same content) adding the page numbers.


Slide 20

As the struggle for resources becomes ever more competitive, the need to stay aware of the status of current work grows more imperative. The electronic preprint servers are filling this need to stay abreast of the latest results.

However, at least in astronomy, there has been no reduction in the number of papers submitted to the peer reviewed journals. Our community still values the quality of the journal. The two services serve complementary purposes. The ideal solution is to develop a mechanism to track the article from the preprint stage into the publication. A reference to the preprint would return the abstract and a link to the published paper. Such a system is about to be deployed for the use of the astronomical community. We'll see if the community will adopt this system.
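The tracking mechanism described above might look something like the following sketch, in which a lookup on a preprint returns its abstract together with a link to the published paper once one exists. The registry layout, function names, and identifiers are assumptions made for illustration; the system being deployed may work quite differently.

```python
# Hypothetical preprint registry; identifiers and fields are illustrative.
preprints = {
    "preprint-0001": {
        "abstract": "We report a sample result.",
        "published_as": None,   # filled in once the paper is published
    },
}


def mark_published(preprint_id, journal_reference):
    """Record which journal article a preprint became."""
    preprints[preprint_id]["published_as"] = journal_reference


def look_up(preprint_id):
    """Return the abstract and, if available, the published reference."""
    entry = preprints[preprint_id]
    return {
        "abstract": entry["abstract"],
        "published_paper": entry["published_as"],
    }
```

A reader following an old reference to the preprint would then be led to the refereed version, so the two services complement each other rather than compete.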


Slide 21

To summarize the three years of feedback, users want a certain set of features. They may seem mundane and obvious, but they clearly represent the concern of the community that our first job should be to maintain the capability to disseminate the latest scientific results and to maintain the scientific record.


Slide 22

The subtitle says it all. We are entering an era where it is deceptively simple to put something up on the WWW for all to see. So publishers had better provide what the scholarly community wants, or they will start doing it for themselves -- more or less successfully. An example is the LANL xxx preprint server. The major innovations to date have been made by people other than the established publishers. HighWire Press at Stanford originated within the Stanford library -- not a traditional publisher. I have already said that the AAS team did not originate from within our publishing operation. The British Medical Journal is being driven by their editor, a person from the medical community.

It is also clear that the users are flocking to the LANL preprint server to fill the need for both the latest results and, secondarily for information on the community. The electronic journals will no longer fill this role. In my opinion, the good journals will provide the framework upon which the scholar will rely: the basis for effective literature searching, the stable set of links and stable logical names, and the conformity of format and descriptive data which will allow for powerful searching over many distributed information resources. Without the systematic infrastructure imposed by the journals, the true power of the interlinked Web will not be realized. However, the journal will be just one part of a widely distributed web of electronic information. The example we have developed in astronomy bears this out. And, as said earlier, the publishers of journals will be able to assemble and format articles upon demand, based upon the hardware, software, and needs of the user. This is not to say that all journals will survive, only those which can understand their role in today's changing information environment.


Slide 23

As publishers we have a number of challenges, some of which the users will participate in. We are challenged to use the new electronic tools effectively. We are challenged to reassess our role, to look at the basic mission we fulfill now, and to use new electronic tools to advance that mission. We have to learn how to work together to develop the full potential of a widely linked, interoperable, distributed information system. The community as a whole has to strive to maintain the standards of scholarship and to keep a working information system functioning as we go through the changes which are upon us. We must, as a community, insist upon having a robust archive. If the publishing is done right, maintaining the archive is brought well within our reach. Of course we must maintain quality. And, the community has to temper its enthusiasm for self publishing through the preprint servers. The electronic preprints will not be sufficient to fulfill all the purposes which, until now, have been fulfilled by the journals. But, most of all, we have an unparalleled opportunity to improve the system of scholarly communication.


Slide 24

Trends: Current awareness

The preprint servers have taken over the function of keeping the community up to date (at least for those fields in which such servers exist). They provide a very useful function, although they do add to the information glut. There is a lot to read each day, just to keep up, and this will eventually become too much of a burden upon the working scientist. Our electronic capabilities allow for updating, but, as mentioned earlier, scholarship requires publications frozen in time. The present solution is to link forward to the next "snapshot frozen in time." However, some data are best presented as a new type of entity -- a database of the best current estimate of the real value of some measurement. Prime examples can be found in the protein databases and genetic databases in biology. Researchers are receiving scholarly credit for contributing directly to the databases. Updates to the database are taking the place of publishing a paper -- which would only say, "We have added thus and so to the XYZ database." Crystallographers have been doing this for years, and I believe such living databases will become common in other fields of science.


Slide 25

Trends: Interconnection

Linking and interconnection are, without doubt, the most important features of the world of electronic scholarly information. There is truly a new paradigm and a new interlinked structure for maintaining and disseminating information. However, this means that the boundaries between information providers are becoming very indistinct. One hardly knows any more where a good (= highly interlinked) electronic journal stops and the associated databases begin. There is a desperate need for standard identifiers. None of the current ones are up to the task yet, and a number of fields have created their own, each with their accompanying capabilities.

The scholarly world needs a universal identifier, one which will serve in the creating of the links, the following of links by the reader in a seamless fashion, the discovery of the document or information chunk, as well as the establishment of the rights and ownership. The bibliographic databases can hold the key to broad interoperability, although we already see the temptation to charge too much for the service. The linkages between scholarly information have to be affordable to the scholarly community. It is no accident that disciplines, such as astronomy and medicine, which support the infrastructure for the free interlinking of information with public funds, have progressed farther than the other disciplines.


Slide 26

Trends: Journals

The good electronic journals have long since left behind the limitations of words on paper. They include many features which could only be done electronically. This trend will continue. The number and variety of electronic-only features will grow, most likely at an accelerating pace. Serving the readers' needs will become more imperative. The marketplace will see to that. The appreciation of the need to design the journal for easy maintenance into the future will grow, as will the understanding of the need to actively manage the electronic archive. But the hardest thing for many people and organizations to accept will be the need to adopt a new mindset and new publishing procedures.

Concluding with a warning to authors: "Think about the likelihood that your work will be accessible in ten years, and publish with an organization which can ensure long term access."


Slide 27
Astrophysical Journal (US) or (France).

Astronomical Journal (US) or (France).

Astrophysics Data System (US) or (France).

American Astronomical Society.

Urania.



Last update: 1/Feb/99