VOTable: A Proposed XML Format for Astronomical Tables

Roy Williams, California Institute of Technology, USA
François Ochsenbein, Observatoire Astronomique de Strasbourg, France
Clive Davenhall, University of Edinburgh, UK
Daniel Durand, Canadian Astronomy Data Centre, Canada
Pierre Fernique, Observatoire Astronomique de Strasbourg, France
David Giaretta, Rutherford Appleton Laboratory, UK
Robert Hanisch, Space Telescope Science Institute, USA
Tom McGlynn, NASA Goddard Space Flight Center, USA
Alex Szalay, Johns Hopkins University, USA
Andreas Wicenec, European Southern Observatory, Germany

Version 1.0 (15 Apr 2002)

Document repository: http://cdsweb.u-strasbg.fr/doc/VOTable/

1  Introduction

The VOTable format is a proposed XML standard for representing a table. In this context, a table is an unordered set of rows, each of a uniform format, as specified in the table metadata. Each row is a sequence of table cells, and each cell is either a primitive data type or an array of such primitives. The format is derived from the Astrores format [1], itself modeled on the FITS Table format [2]; VOTable was designed to be closer to the FITS Binary Table format.

1.1  Example

A simple example of a VOTable document is:
<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0">
  <DEFINITIONS>
  <COOSYS ID="myJ2000" equinox="2000." epoch="2000." system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE>
    <PARAM name="Observer" datatype="char" arraysize="*" value="William Herschel">
      <DESCRIPTION>This parameter is designed to store the observer's name
      </DESCRIPTION> 
    </PARAM>
    <TABLE name="Stars">
      <DESCRIPTION>Some bright stars</DESCRIPTION>
      <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
      <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" 
             datatype="float" precision="F3" width="7"/>
      <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" 
             datatype="float" precision="F3" width="7"/>
      <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
      <DATA>
        <TABLEDATA>
        <TR>
          <TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD>
          <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
        </TR>
        <TR>
          <TD>Vega</TD><TD>279.234</TD>
          <TD>38.782</TD><TD>8 7 8 6 8 6</TD>
        </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>

This table shows the positions of two stars, each with a name and two floating-point numbers as coordinates, together with a variable-length, multidimensional array called "Counts". The star names have a fixed length of 10 characters (padded by trailing blanks). The floating-point numbers (RA and Dec) are in degrees, and assumed to have three decimal digits (precision="F3"), irrespective of the number of digits presented in the data. The frame of the coordinate system is specified explicitly with the COOSYS element. Associated with the table is a parameter (PARAM), which is to be interpreted as a string, and which in this example holds the name of the observer (William Herschel).
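As an illustration only, the following sketch shows how a client might read an abridged copy of this example with the xml.etree.ElementTree module from the Python standard library. Python is not part of the VOTable proposal; the variable names are hypothetical, and the DOCTYPE line is omitted so that no DTD lookup is involved.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example document (scalar columns only).
DOC = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="Stars">
      <FIELD name="Star-Name" datatype="char" arraysize="10"/>
      <FIELD name="RA" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Dec" unit="deg" datatype="float" precision="F3" width="7"/>
      <DATA><TABLEDATA>
        <TR><TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD></TR>
        <TR><TD>Vega</TD><TD>279.234</TD><TD>38.782</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

root = ET.fromstring(DOC)
table = root.find("RESOURCE/TABLE")

# Column names come from the FIELD elements, in document order.
names = [field.get("name") for field in table.findall("FIELD")]

# Each TR is one row; each TD is one cell, kept here as raw text.
rows = [[td.text or "" for td in tr.findall("TD")]
        for tr in table.findall("DATA/TABLEDATA/TR")]

# Pair every cell with the name of its column.
stars = [dict(zip(names, row)) for row in rows]
print(stars[0]["Star-Name"], float(stars[0]["RA"]))
```

The cells are left as text here; converting them according to the datatype of each FIELD is a separate step.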

1.2  Why VOTable?

Astronomers have always been at the forefront of developments in information technology, and funding agencies across the world have recognized this by supporting the Virtual Observatory movement, in the hopes that other sciences and business can follow their lead in making online data both interoperable and scalable.

VOTable is designed as a flexible storage and exchange format for tabular data, with particular emphasis on astronomical tables.

Interoperability is encouraged in two ways: through the use of standards (XML), and by tagging physical quantities not only with units but also with a Unified Content Descriptor (UCD) that expresses the nature of the quantity (e.g. Gunn J magnitude, declination). The XML fabric allows applications to easily validate an input document, and facilitates transformations through XSLT (Extensible Stylesheet Language Transformations) engines.

Grid Computing

VOTable has built-in features for big-data and Grid computing. It allows metadata and data to be stored separately, with the remote data linked according to the Xlink model. Processes can then use metadata to 'get ready' for their input data, or to organize third-party or parallel transfers of the data. Remote data allow the metadata to be sent in email and referenced in documents without pulling the whole dataset with it: just as we are used to the idea of sending a pointer to a document (URL) in place of the document, so we can now send metadata-rich pointers to data tables in place of the data itself. The remote data is referenced with the URL syntax protocol://location, meaning that arbitrarily complex protocols are allowed.

When we are working with very large tables in a distributed-computing environment ("the Grid"), the data streams between processors, with flows being filtered, joined, and cached in different geographic locations. It would be very difficult if the number of rows of the table were required in the header: we would need to stream the whole table into a cache, compute the number of rows, then stream it again for the computation. In the Grid-data environment, the component in short supply is not the computers, but rather these very large caches! Furthermore, these remote data streams may be created dynamically by another process or cached in temporary storage: for this reason VOTable can express that remote data may not be available after a certain time (the expires attribute). Data on the net may require authentication for access, so VOTable allows expression of password or other identity information (the rights attribute).

Data Storage: Flexible and Efficient

The data part in a VOTable may be represented using one of three different formats: TABLEDATA, FITS and BINARY. TABLEDATA is a pure XML format so that small tables can be easily handled in their entirety by XML tools. The FITS binary table format is well-known to astronomers, and VOTable can be used either to encapsulate such a file, or to re-encode the metadata; unfortunately it is difficult to stream FITS, since the dataset size is required in the header (NAXIS2 keyword), and FITS requires a specification up front of the maximum size of its variable-length arrays. The BINARY format is supported for efficiency and ease of programming: no FITS library is required, and the streaming paradigm is supported.

We hope that VOTable can be used in different ways, as a data storage and transport format, and also as a way to store metadata alone (table structure only). In the latter case, we can imagine a VOTable structure being sent to a server, which can then open a high-bandwidth connection to receive the actual data, using the previously-digested structure as a way to interpret the stream of bytes from the data socket. Alternatively, the metadata can be sent alone as an implicit query to a server, which will respond with the data part of the table filled in.

VOTable can be used for small numbers of small records (pure XML tables), or for large numbers of simple records (streaming data), or it can be used for small numbers of larger objects. In the latter case, there will be software to spread large data blocks among multiple processors on the Grid. Currently the most complex structure that can be in a VOTable Cell is a multidimensional array.

Future

In future versions of the VOTable format, we expect to benefit from both experience and tool-building. Such tools include presentation and transformations of the metadata (and data too, when it is in XML), using XML transformation language and software: XSL and XSLT. We would like to migrate to the more powerful document validation provided by XSchema [8] rather than DTD – a draft version of the VOTable format in XSchema is included as Appendix.

We also expect XSchema to allow better modularization of the document schema, so that, for example, users might put whatever serialized objects they wish into the table cells. In this way, we expect to use VOTable for handling the flow of large data objects through Grid resources, objects such as FITS files or XDF [7] documents. It would also mean, for example, that the description of a table could contain arbitrary HTML instead of the plain text with paragraph markers allowed by the current version, or that an XML definition of non-standard astronomical coordinate systems could be seamlessly integrated.

VOTable is derived from Astrores, which is specified not only as a way to write a data table, but also as a way to specify how to address a request to data tables. We expect to sharpen and formalize this dichotomy with the benefit of experience, building into VOTable the ways of making sophisticated querying mechanisms and protocols.

We also expect to add features for efficiency in the future: to specify that the data stream has a particular sort order; to specify that a column in one table is a key into another table; to specify that one table is an index into another. The binary format will be extended to facilitate large-scale streaming.

2  Data Model

In this section we define the data model of a VOTable, and in the next sections its syntax when expressed as XML. The data model of VOTable can be expressed as:

 VOTable = hierarchy of Metadata + Tables
 Metadata = Parameters + Infos + Descriptions + Links + Fields
 Table = list of Fields + Data
 Data = stream of Rows
 Row = list of Cells
 Cell = Primitive
        or variable-length list of Primitives
        or multidimensional array of Primitives
 Primitive = integer, character, float, floatComplex, etc (see table below).

Metadata is divided into that which concerns the table itself (parameters), and the definitions of the fields (or column attributes) of the table. Each Field represents the metadata at the top of the column, for example, in the table above, a Field has name set to "RA". The Field can be thought of as a class definition, and the table cells below it are the instances of that class.

A parameter (PARAM) is similar to a FIELD, except that it has a value attribute. Parameters can be used for storing FITS keywords or any other information pertaining to the table itself or its environment.

An informative parameter (INFO) is a restricted form of the PARAM – it has only the name / value pair of attributes.

datatype          Meaning              FITS   Bytes
"boolean"         Logical              "L"    1
"bit"             Bit                  "X"    *
"unsignedByte"    Byte (0 to 255)      "B"    1
"short"           Short Integer        "I"    2
"int"             Integer              "J"    4
"long"            Long Integer         "K"    8
"char"            ASCII Character      "A"    1
"unicodeChar"     Unicode Character           2
"float"           Floating point       "E"    4
"double"          Double               "D"    8
"floatComplex"    Float Complex        "C"    8
"doubleComplex"   Double Complex       "M"    16

The ordered list of Fields at the top of the table thus provides a template for a Row object (also called a record). The template allows interpretation of the data in the Row. In VOTable, there is no advance specification of the number of rows in the table: this is to allow streaming of large tables, as discussed above. The record is a list of Cells, with the number of Cells the same for each Row, and the same as the number of Fields defined in the Metadata.

2.1  Primitives

Each Cell is composed from Primitives, each of which is a datatype of fixed-length binary representation, as enumerated in the accompanying table. Cells may consist of a single Primitive (this is the default); or a multidimensional array of Primitives (see next section).

Except for the Bit type, each primitive has the fixed length in bytes given in the table. Bit scalars and arrays are stored in the minimum number of bytes feasible (so that b bits take the integer part of (b+7)/8 bytes). It is this fixed size that allows efficiency in storage, so that the memory used is minimized. These primitives are described in more detail in section 7.
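The storage rule above can be sketched in a few lines of Python. This is only an illustration: the byte sizes are copied from the datatype table, and the names PRIMITIVE_BYTES and cell_bytes are hypothetical, not part of VOTable.

```python
# Byte sizes copied from the datatype table; "bit" is handled separately.
PRIMITIVE_BYTES = {
    "boolean": 1, "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}


def cell_bytes(datatype: str, count: int = 1) -> int:
    """Bytes needed to store `count` primitives of `datatype` in one cell."""
    if datatype == "bit":
        # b bits take the integer part of (b+7)/8 bytes.
        return (count + 7) // 8
    return PRIMITIVE_BYTES[datatype] * count


print(cell_bytes("bit", 9), cell_bytes("float", 3))
```

For example, 9 bits need 2 bytes, while an array of three floats needs 12.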

In VOTable, characters are Primitives, either one byte for an ASCII character or two bytes for a Unicode character. Strings can be represented as a fixed or variable length array of characters. We can represent a 1D array of strings as a 2D array of characters, but because only the last dimension of an array may be variable in length (see section 2.2), the Cell can contain a variable-length array of fixed-length strings, but not a fixed-length array of variable-length strings.

2.2  Multidimensional Arrays

A table cell can contain an array of a given primitive type. The array is specified by a sequence of dimensions, with the first dimension changing fastest; only the last dimension may be variable in length. For example, a table cell may contain a set of up to 10 images, each 64x64 bytes:

<FIELD ID="thumbs" datatype="unsignedByte" arraysize="64x64x10*"/>

The string in the arraysize attribute expresses these dimensions, with the integers separated by the x character. The last (slowest-varying) subscript of a multidimensional array may have variable length, meaning that the extent of the final subscript may be different for different rows of the table. In this case, the last dimension is written either as just an asterisk, in which case the array may be arbitrarily large, or as a number followed by an asterisk, meaning that this subscript is guaranteed not to exceed this value.

Variable-length arrays are more efficient in storage and data transfer, but less efficient computationally, because extra pointer dereferencing is required to access the data. The reason that only the slowest-varying subscript can be variable is effectively a demand that the data provider pack the data into a small number of fixed-size containers as much as is possible, rather than a large number of small containers.
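A minimal Python sketch of how a reader might interpret the arraysize string, assuming only the syntax described in this section. The function name parse_arraysize and its return convention are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_arraysize(spec: str) -> Tuple[List[int], bool, Optional[int]]:
    """Split an arraysize string into (fixed_dims, is_variable, max_last).

    fixed_dims lists the fixed dimensions, fastest-varying first.
    is_variable is True when the last dimension ends with an asterisk.
    max_last is the declared upper bound of a variable last dimension,
    or None when there is no bound (or the array is not variable).
    """
    parts = spec.split("x")
    fixed = [int(p) for p in parts[:-1]]
    last = parts[-1]
    if last.endswith("*"):
        bound = last[:-1]
        return fixed, True, (int(bound) if bound else None)
    return fixed + [int(last)], False, None


print(parse_arraysize("64x64x10*"))
print(parse_arraysize("2x3x*"))
```

With arraysize="2x3x*", a cell holding 12 values therefore has an extent of 12 / (2 * 3) = 2 along its last dimension.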

2.3  FITS Binary Tables

VOTable is closely compatible with the FITS Binary Table format. Henceforth, we shall abbreviate "FITS Binary Table and its Conventions" simply by the word "FITS". Given a FITS file that represents a binary table, the header may be converted to VOTable, with a pointer to the original file, or with the original file included directly in VOTable. Since the original file is still present, it is clear that no data has been lost. The PARAM element can be used to hold the FITS keyword, value, and comment string.

We might ask two more significant questions, about how much of the FITS header and data can be represented in VOTable. The answer is that there is considerable overlap.

What can FITS do but not VOTable?

FITS has semantics for how data is to be represented when printed, as the non-mandatory TDISP keyword: for example F12.4 means 12 characters are to be used, and 4 decimal places. This has been converted in VOTable as the attributes width and precision which, connected with datatype, are semantically identical. Note that error estimation and the number of digits to print are rather different semantically.

FITS has a complex semantics (the "Substring Array" convention) for structuring a single string as a collection of substrings, and VOTable does not support this. VOTable allows fixed and variable-length strings, as well as variable-length arrays of fixed-length strings.

What can VOTable do but not FITS?

VOTable supports the separation of data from metadata, the streaming of tables, and other ideas from modern distributed computing. It bridges two ways to express structured data: XML and FITS. It tries (through UCD) to express formally the semantic content of a parameter or field. It has the hierarchy and flexibility of XML. It can also hold Unicode (extended alphabet) characters, which FITS does not handle.

It should be noted that the transformation of FITS to VOTable is meant to be reversible: the conversion of a FITS file into VOTable does not lose any information, and a transformation back into FITS is possible. The converse does not hold in general: it will not always be possible to transform a VOTable into a FITS file without losing some information.

3  Document Structure

The VOTable document consists of a single all-containing element called VOTABLE, which may contain a DESCRIPTION, a DEFINITIONS element, a number of INFO elements, and one or more RESOURCE elements. Resource elements can contain child Resources, and they can also contain Tables (TABLE) and Parameters (PARAM).

3.1  DEFINITIONS element

This element may contain a definition of a coordinate system, stored in a COOSYS element: there are attributes for equinox and epoch, as well as a specification of the coordinate system. We expect that future versions of VOTable may have a more formal structuring of the coordinate system definition which is currently extracted from Astrores. There may also be one or more PARAM elements that may contain user-specific data. Each of these may have an ID attribute, that can be referenced with the ref attribute of a FIELD. Thus we can achieve grouping of fields (by having members of the group reference the same PARAM or COOSYS element). We can also extend the definition of a field by adding user-specific data.

3.2  RESOURCE element

There may be multiple RESOURCE elements, and each of these may contain DESCRIPTION, INFO, COOSYS and PARAM elements. There may be LINK elements to provide URL-type pointers that give further information.

The RESOURCE may also contain other RESOURCEs: the following is a complete VOTable which contains no tables, only a hierarchy of parameters.

<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0">
  <RESOURCE ID="Stars">
    <PARAM ID="Mass" datatype="float" unit="solMass" value="1"/>
    <RESOURCE ID="BigStars">
      <PARAM ID="Mass-big" datatype="float" unit="solMass" value="10"/>
    </RESOURCE>
    <RESOURCE ID="SmallStars">
      <PARAM ID="Mass-small" datatype="float" unit="solMass" value="0.2"/>
      <RESOURCE ID="VerySmallStars">
        <PARAM ID="Mass-tiny" datatype="float" unit="solMass" value="0.05"/>
      </RESOURCE>
    </RESOURCE>
  </RESOURCE>
</VOTABLE>
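As a sketch of how such a hierarchy might be traversed, the following Python fragment walks the nested RESOURCE elements of the example depth-first and collects each PARAM together with the chain of RESOURCE IDs that leads to it. The function name walk is hypothetical, and the DOCTYPE line is omitted so that no DTD lookup is involved.

```python
import xml.etree.ElementTree as ET

DOC = """<VOTABLE version="1.0">
  <RESOURCE ID="Stars">
    <PARAM ID="Mass" datatype="float" unit="solMass" value="1"/>
    <RESOURCE ID="BigStars">
      <PARAM ID="Mass-big" datatype="float" unit="solMass" value="10"/>
    </RESOURCE>
    <RESOURCE ID="SmallStars">
      <PARAM ID="Mass-small" datatype="float" unit="solMass" value="0.2"/>
      <RESOURCE ID="VerySmallStars">
        <PARAM ID="Mass-tiny" datatype="float" unit="solMass" value="0.05"/>
      </RESOURCE>
    </RESOURCE>
  </RESOURCE>
</VOTABLE>"""


def walk(resource, path=()):
    """Yield (path of RESOURCE IDs, PARAM ID, value), depth-first."""
    path = path + (resource.get("ID"),)
    for param in resource.findall("PARAM"):
        yield path, param.get("ID"), float(param.get("value"))
    for child in resource.findall("RESOURCE"):
        yield from walk(child, path)


root = ET.fromstring(DOC)
params = [item for top in root.findall("RESOURCE") for item in walk(top)]
for path, ident, value in params:
    print("/".join(path), ident, value)
```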

The main ingredient of the RESOURCE element is one or more TABLEs. These are described in section 5 of this document.

4  XML

VOTable is constructed with XML (eXtensible Markup Language), a powerful standard for structured data throughout the Internet industries. It derives from SGML, a standard used in the publishing industry and for technical documentation for many years. XML consists of elements and payload, where an element consists of a start tag (the part in angle brackets), the payload, and an end tag (with angle brackets and a slash). Elements can contain other elements. Elements can also bear attributes (keyword-value combinations), as the PARAM elements above do.

The payload may be in two forms: parsed or unparsed character data. Examples are:

<text>Fran&#231;ois</text>
<text><![CDATA[ a <= (b & c) ]]></text>

In the first example, the sequence &#231; is interpreted as part of the ISO/IEC 10646 character set, and translates to an accented character, so that the text is "François". The second example uses the special CDATA sequence so that the characters <, >, and & can be used without interpretation; in this case, any ASCII characters are allowed except the terminating sequence ]]>. For more information, see any book on XML.

4.1  Lists

Within a table cell, multiple Primitives can be formatted by separating them by whitespace, to express compound primitives (complex numbers) and multidimensional arrays. Text tokens are separated by contiguous whitespace (ASCII space 0x20, tab 0x9, carriage-return 0xD, newline 0xA, vertical tab 0xB). There are no null tokens, comments, quote characters, or separators. However, it is possible to include special characters through a character-reference escape mechanism similar to that of HTML: for example the string New York, written with a non-breaking space, would be encoded as New&#160;York.

Thus a table cell that contains an array of three complex numbers could be represented as:

<TD>1.0 0.0   -0.5 0.866   -0.5 -0.866</TD>

However, it should be noted that in a character array (a string) no space is needed to separate each element (a character).

4.2  Syntax policy

Following the general XML rule, element and attribute names are case-sensitive and have to be used with the specified capitalisation. For VOTable, we have adopted the convention that element names should be in uppercase and attribute names in lowercase (with an exception for the ID attribute). Element and attribute names are further distinguished in this paper by being in fixed-width font.

4.3  ID and name

The FIELD and PARAM elements provide both ID and name attributes: the ID is meant as a formal naming of the objects in the VOTable document, while the name is meant for presentation to humans and defaults to the ID value if not present. The IDs of the fields and parameters must be unique throughout the document (this is part of the XML specification), and we intend that eventually IDs can be used to tie together data sources and applications that read and write them.

Currently, however, the ID attribute (as defined by the Xpointer standard) is used in order to refer to other elements in the document. According to the XML standard, the attribute ID is a string beginning with a letter or underscore (_), followed by a sequence of letters, digits, or any of .-_:, and each ID must be unique in the XML document. For example ref="apple" refers to the element that contains ID="apple" in the current XML document. The ID attribute is required for the elements which have to be referenced, but in principle any element may have an ID attribute. Elements that support the ref attribute (and can point to those with ID) are: FIELD, PARAM, and TABLE.

The ID is different from the name attribute in that (a) the ID attribute is made from a restricted character set, and must be unique (or else the document is considered invalid in the XML sense), whereas names are standard XML attributes and need not be unique; and (b) there should be support in the parsing software to look up references and extract the relevant element with matching ID.
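The lookup support mentioned in point (b) might be sketched as follows in Python. This is an assumption about how a parser could do it, not a prescribed mechanism; the names by_id and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<VOTABLE version="1.0">
  <DEFINITIONS>
    <COOSYS ID="myJ2000" equinox="2000." epoch="2000." system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE><TABLE name="Stars">
    <FIELD name="RA" ref="myJ2000" unit="deg" datatype="float"/>
    <FIELD name="Dec" ref="myJ2000" unit="deg" datatype="float"/>
  </TABLE></RESOURCE>
</VOTABLE>"""

root = ET.fromstring(DOC)

# Index every element that carries an ID, rejecting duplicates,
# since each ID must be unique in the document.
by_id = {}
for elem in root.iter():
    ident = elem.get("ID")
    if ident is not None:
        if ident in by_id:
            raise ValueError(f"duplicate ID: {ident}")
        by_id[ident] = elem


def resolve(elem):
    """Return the element that `elem` points to through ref, or None."""
    ref = elem.get("ref")
    return by_id.get(ref) if ref is not None else None


ra = root.find("RESOURCE/TABLE/FIELD")
target = resolve(ra)
print(target.tag, target.get("system"))
```

Both FIELD elements resolve to the same COOSYS element, which is how the grouping of fields described in section 3.1 can be recovered.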

4.4  Xlink and STREAM

The STREAM element is used to point to remote table data, and as such it closely follows the W3C specification called "Xlink". The STREAM implements the interface defined by Xlink; in particular it is an Xlink with type="locator". However, STREAM has more attributes than Xlink allows for: a rights attribute for authentication information; an expires attribute for when the link may cease to be valid; and an encoding attribute if the data is filtered (for example by compression or binary-to-ascii filtering). Therefore we will wait until a future release to formalize the relationship between Xlink and STREAM.

4.5  Location of the DTD

A VOTable document, like all XML documents, should be well-formed, meaning that it obeys the syntax rules of XML: for example, elements should start and end properly and be properly nested. The document may be further constrained to be valid, meaning that it follows the VOTable syntax rules, as defined in the VOTable DTD: for example the datatype "float" is valid, but not the datatype "real". Access to the DTD is necessary to check validity. Valid XML documents may employ certain advanced features of XML, features that can significantly improve the usability of a document, including linking mechanisms, entities and attributes. Valid XML documents offer much more to the document process than those that are simply well-formed. Document authoring, processing, storage and display are made easier because documents exist in a structured environment. Authors create documents against a pre-defined structure and benefit from a clear document model.

There are three ways to give the document access to its DTD structure: by embedding the DTD directly into the XML file, or by referencing a local file, or by referencing a remote file. We should point out that many parsers will simply stop if the DTD reference cannot be resolved, rather than falling back to a non-validated document. Any XML book will explain the syntax of these options.

5  Table

The Table element is written in XML as DESCRIPTION and LINK elements, together with a collection of FIELD elements that describe the nature of the table columns. Finally, the data of the Table may be specified with a DATA element.

A FIELD element may have several sub-elements, including the informational DESCRIPTION, and LINK, as well as VALUES, that can express limits and ranges of the values that the corresponding cell can contain, such as minimum, maximum, or enumeration of possible values.

The FIELD must contain a datatype attribute, which expresses the nature of the data in the cells of this column of the table and determines how the data is read and stored internally.

Each table cell may contain more than one of the specified datatype, and this is specified with the arraysize attribute, as explained above, inducing a multidimensional array structure on those table cells.

Strings are not a primitive type: characters are. To represent variable-length strings, users can use variable-length arrays of characters, for example:

<FIELD ID="unboundedString" datatype="char" arraysize="*"/>

VOTables support two kinds of characters: ASCII 1-byte characters and Unicode 2-byte characters. Unicode is an alternative to ASCII for representing characters: it uses two bytes per character instead of one, it is strongly supported by XML tools, and it can handle a large variety of international alphabets. Therefore VOTable supports not only ASCII strings (datatype="char"), but also Unicode (datatype="unicodeChar").

For details of the exact meaning of all valid datatypes, please see section 7.

If the data is written as TABLEDATA, and a table cell contains an array or complex number, then it should be encoded as multiple numbers separated by white space.

A FIELD may also specify a null attribute through a VALUES element, for example null="-99". When this value is found in the corresponding data, it is assumed that no data exists for that table cell; the parser may also choose to substitute the null value when unparsable data is found. The default representation of a "null" value is an empty cell in the TABLEDATA representation (i.e. <TD></TD>). For FITS and BINARY data, the NaN (not-a-number) patterns are recommended to represent floating-point "null" values. The "null" convention is therefore only necessary for primitive types that do not have a natural "null" value, such as int, short, unsignedByte, etc.
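A hedged Python sketch of the null convention for an integer column follows. The FIELD fragment and the function name read_int_cell are illustrative assumptions; None stands for "no data" on the Python side.

```python
import xml.etree.ElementTree as ET

# A hypothetical integer column whose VALUES element declares null="-99".
FIELD = ET.fromstring(
    '<FIELD name="Counts" datatype="int"><VALUES null="-99"/></FIELD>'
)


def read_int_cell(text, field):
    """Return the integer in a TD cell, or None when the cell holds no data."""
    values = field.find("VALUES")
    null = values.get("null") if values is not None else None
    stripped = (text or "").strip()
    # An empty cell is the default "null"; the declared null value also counts.
    if stripped == "" or stripped == null:
        return None
    return int(stripped)


print(read_int_cell("-99", FIELD), read_int_cell(" 42 ", FIELD))
```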

5.1  Numerical Accuracy

The VOTable format is meant for transferring, storing, and processing tabular data; it is not intended for presentation purposes. Therefore (in contrast with Astrores) we generally avoid giving rules on presentation, such as formatting. However, we retain the width attribute of the FIELD, which is meant as a hint to the presentation system about the number of characters to use for input or output of the quantity. Inevitably at least some of the data will have to be presented, either as actual tables, or in graphs, etc. In any case, some estimate of the accuracy has to be known.

But there is a semantic difference between a number written as "5.12" and one that is written "5.1200", in that the former implies three significant digits of accuracy, and the latter five digits. Therefore the number of digits to show is not purely a presentation matter, but part of the metadata content of the number.

VOTable therefore provides the precision attribute in the FIELD element to express the number of significant digits, or equivalently, the log of the implied error estimate of the numbers in the column. More control is available through an initial character: setting this to "E" rather than the default "F" implies that the precision measures the relative error (significant figures) rather than the absolute error (decimal places). Thus precision="E5" means an implied relative error of 10^-5 (5 significant digits), and precision="5" or "F5" means an implied absolute error of 10^-5 (5 digits following the decimal point).
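The following Python sketch shows one way a presentation layer might honour the precision attribute. The function name format_value is hypothetical, and the choice of scientific notation for the "E" form is an assumption made for this example, not something the format prescribes.

```python
def format_value(value: float, precision: str) -> str:
    """Format `value` according to a VOTable precision string."""
    kind, digits = "F", precision          # "F" is the default
    if precision[:1] in ("E", "F"):
        kind, digits = precision[0], precision[1:]
    n = int(digits)
    if kind == "F":
        # n digits following the decimal point.
        return f"{value:.{n}f}"
    # n significant digits: one before the point, n - 1 after it.
    return f"{value:.{max(n - 1, 0)}e}"


print(format_value(114.827, "F3"))
print(format_value(114.827, "E5"))
```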

5.2  Units

The quantities in a column of the table may have physical units, and this is specified by the unit attribute of the FIELD. Examples are:

      unit="cm-2.s-1.keV-1"
      unit="erg.s-1"

The syntax of this string is defined in reference [3].

5.3  Unified Content Descriptors

The CDS in Strasbourg has used the metadata from thousands of astronomical tables to make a hierarchical glossary of the scientific meanings of the data in those tables [4]. Of 1600 entries in the glossary, here are a few typical examples.

"PHOT_INT-MAG_B"       Integrated total blue magnitude
"ORBIT_ECCENTRICITY"   Orbital eccentricity
"STAT_MEDIAN"          Statistics Median Value
"INST_QE"              Detector's Quantum Efficiency

The ucd attribute of the FIELD element is to hold this information.

5.4  VALUES element

The VALUES element of the FIELD is designed to hold subsidiary information about the nature of the data in the field. It may have MIN and MAX elements, and it may contain OPTION elements. Each OPTION contains name and value attributes, and may also contain more OPTION elements, so that a hierarchy of keyword-value pairs may be associated with each field.

There may also be a null attribute, as discussed above. If this is present in a table cell, it is assumed to mean that no data is present.

5.5  LINK Elements

The LINK element is to provide pointers to other documents or data servers on the Internet through a URL. In VOTable, the LINK element may be part of the RESOURCE, TABLE or FIELD elements. The href attribute of the LINK is meant to provide a URL that is at least valid syntactically, even though there need be no assurance that the link will actually connect and deliver data. It may be that a strange protocol is implied that the parser does not know about, for example "httpg://server/file". However, parsers are expected to understand at least the "file", "http" and "ftp" protocols.
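A small Python sketch of the distinction drawn above between a syntactically valid URL, a protocol the parser knows, and one it does not. The function name classify_href and its three return labels are hypothetical.

```python
from urllib.parse import urlsplit

# Protocols that parsers are expected to understand.
KNOWN_SCHEMES = {"file", "http", "ftp"}


def classify_href(href: str) -> str:
    """Classify an href by its protocol without trying to connect."""
    scheme = urlsplit(href).scheme
    if not scheme:
        return "invalid"
    return "known" if scheme in KNOWN_SCHEMES else "unknown-protocol"


print(classify_href("http://cdsweb.u-strasbg.fr/doc/VOTable/"))
print(classify_href("httpg://server/file"))
```

A parser built this way can keep a LINK whose protocol it does not recognize and simply decline to follow it.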

The gref attribute is meant for a higher-level protocol of some type, perhaps a logical name for a data resource, perhaps a GLU reference [5].

In the Astrores format, from which VOTable is derived, there are additional semantics for the LINK element: the href attribute is used as a template for creating URLs. This behavior is explained in Appendix, and it represents a further proposal, a possible extension of VOTable.

6  Data Content

While the bulk of the metadata of a VOTable document is in the FIELD elements, the data content of the table is in a single DATA element. The data is organized in "reading" order, so that the content of each row appears in the same order as the order of the FIELD tags, with each row having the same number of items as there are FIELD tags.

The data section of the VOTable document is created through a data pipeline. The abstract table is first serialized by one of several methods, then it may be encoded for compression or other reasons. The result may be embedded in the XML file (local data), or it may be remote data.

The figure shows how the abstract table is rendered into the VOTable document. First the data is serialized, either as XML, a FITS binary table, or the VOTable Binary format. This data stream may then be encoded, perhaps for compression or to convert binary to text. Finally, the data stream may be put in a remote file with a URL-type pointer in the VOTable document; or the table data may be embedded in the VOTable.
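The pipeline can be imitated with a few lines of Python. This is only a sketch of the serialize-then-encode idea: the byte layout (big-endian 8-byte doubles) and the choice of gzip followed by base64 are assumptions made for the example, and are not taken from the definition of the BINARY format.

```python
import base64
import gzip
import struct

# Two rows of (RA, Dec), each stored as two "double" cells.
rows = [(114.827, 5.227), (279.234, 38.782)]

# 1. Serialize: pack each row as two 8-byte doubles (big-endian assumed).
raw = b"".join(struct.pack(">dd", ra, dec) for ra, dec in rows)

# 2. Encode: compress, then convert the binary stream to text.
encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")

# 3. The text in `encoded` could now be embedded in the document,
#    or the stream could be stored remotely and pointed to.

# A reader reverses the pipeline: decode, then deserialize row by row.
decoded = gzip.decompress(base64.b64decode(encoded))
restored = [struct.unpack_from(">dd", decoded, offset)
            for offset in range(0, len(decoded), 16)]
print(restored == rows)
```

Because every row has the same fixed size here (16 bytes), the reader never needs to know the number of rows in advance, which is what makes streaming possible.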

6.1  Data Serialization

The serialization elements and their attributes are:

6.1.1  TABLEDATA

This element is a way to build the table in pure XML, and is the only serialization method that does not allow an encoding or a remote data stream. It contains TR elements, which in turn contain TD elements. An example:

<TABLEDATA>
  <TR>
    <TD>Procyon</TD> <TD>114.827242</TD>
    <TD>5.227506</TD>  
  </TR>
  <TR>
    <TD>Vega</TD>    <TD>279.234106</TD>
    <TD>38.782992</TD> 
  </TR>
</TABLEDATA>

The number of TD elements in each row should equal the number of FIELD elements declaring the table; when there are fewer TD elements than expected, the missing values are set to null; superfluous TD elements are ignored.
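A minimal sketch of this rule, in Python, is given here for illustration. The function name normalize_row is hypothetical, and Python's None is used as an assumed stand-in for the null value.

```python
def normalize_row(cells, n_fields, null=None):
    """Pad a short row with nulls and drop superfluous cells."""
    cells = list(cells[:n_fields])                   # superfluous TDs are ignored
    cells.extend([null] * (n_fields - len(cells)))   # missing TDs become null
    return cells

print(normalize_row(["Procyon", "114.827242"], 3))
print(normalize_row(["Vega", "279.234106", "38.782992", "extra"], 3))
```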

While this serialization has a high overhead in the number of bytes, it has the advantage that XML tools can manipulate and present the table data directly.

Each item in the TD tag is passed to a reader that is implicitly defined by the datatype attribute of the corresponding FIELD, which attempts to read the object from it. If the value read is the same as the null value declared for that field, the cell is assumed to contain no data.

The reader may not succeed, for example if we try to parse the string "36.9H9" into a float, where the alphabetic character is obviously a problem. In this case, the parser may choose to insert the null value (no data available), or it may use a NaN (not a number), or it may throw an exception. It may nevertheless be useful for the data provider to declare, via the invalid attribute, that invalid data patterns are used to designate non-existent data.
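The three behaviors permitted above can be sketched as follows. This is an illustration in Python rather than a prescribed implementation: the function read_float and its on_error parameter are hypothetical, and Python's float() accepts some spellings (such as "nan" or "1_000") that a VOTable reader might reasonably reject.

```python
import math

def read_float(text, on_error="null"):
    """Read a float cell; on failure return None, NaN, or re-raise the error."""
    try:
        return float(text)
    except ValueError:
        if on_error == "null":
            return None          # treat the cell as "no data available"
        if on_error == "nan":
            return math.nan      # keep the cell numeric but flag it
        raise                    # any other policy: throw the exception

print(read_float("5.227506"))
print(read_float("36.9H9"))
```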

If a cell contains an array or complex number, it should be encoded as multiple numbers separated by whitespace. However, in the case of character and Unicode strings, no separators are required. Here is an example of a table with two rows that has arrays in the table cells:

<TABLE>
  <FIELD ID="aString" datatype="char" arraysize="10"/>
  <FIELD ID="Floats" datatype="float" arraysize="3"/>
  <FIELD ID="varComplex" datatype="floatComplex" arraysize="*"/>
  <DATA><TABLEDATA>
  <TR>
   <TD>Apple</TD><TD>1.62 4.56 3.44</TD>
   <TD>67 1.57  4 3.14  77 -1.57</TD>
  </TR><TR>
   <TD>Orange</TD><TD>2.33 4.66 9.53</TD>
   <TD>39 0  46 3.14</TD>
  </TR>
  </TABLEDATA></DATA>
</TABLE>

The first entry is a fixed-length array of 10 characters; since the value being presented (Apple) has 5 characters, this is padded with trailing blanks. The second cell is an array of three floats. The last cell contains a variable array of complex numbers, each complex number being represented by its real part followed by at least a blank and its imaginary part – hence 6 numbers for 3 complex numbers, or 4 numbers for 2 complex numbers.
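The reading of such cells can be sketched in Python with the standard xml.etree.ElementTree module. This is a simplified illustration, not a complete reader: the helper names read_cell and read_table are hypothetical, only one-dimensional arraysize values are handled, and every non-character datatype is read as floating point.

```python
import xml.etree.ElementTree as ET

DOC = """<TABLE>
  <FIELD ID="aString" datatype="char" arraysize="10"/>
  <FIELD ID="Floats" datatype="float" arraysize="3"/>
  <FIELD ID="varComplex" datatype="floatComplex" arraysize="*"/>
  <DATA><TABLEDATA>
  <TR>
   <TD>Apple</TD><TD>1.62 4.56 3.44</TD>
   <TD>67 1.57  4 3.14  77 -1.57</TD>
  </TR><TR>
   <TD>Orange</TD><TD>2.33 4.66 9.53</TD>
   <TD>39 0  46 3.14</TD>
  </TR>
  </TABLEDATA></DATA>
</TABLE>"""

def read_cell(text, datatype, arraysize):
    text = text or ""
    if datatype == "char":
        # Fixed-length strings are padded with trailing blanks.
        return text.ljust(int(arraysize)) if arraysize.isdigit() else text
    numbers = [float(token) for token in text.split()]
    if datatype in ("floatComplex", "doubleComplex"):
        # Consecutive (real, imaginary) pairs form one complex number.
        return [complex(r, i) for r, i in zip(numbers[0::2], numbers[1::2])]
    return numbers

def read_table(xml_text):
    table = ET.fromstring(xml_text)
    fields = [(f.get("datatype"), f.get("arraysize", "1"))
              for f in table.findall("FIELD")]
    return [[read_cell(td.text, dt, size)
             for td, (dt, size) in zip(tr.findall("TD"), fields)]
            for tr in table.iter("TR")]

rows = read_table(DOC)
print(rows[0])
```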

6.1.2  FITS

The FITS format for binary tables is widely used in astronomy [2], and its structure is a major influence on the VOTable specification. Metadata is stored in a header section, followed by the data. The metadata is substantially equivalent to the metadata of the VOTable format. One important difference is that VOTable does not require specification of the number of rows in the table, an important freedom if the table is being created dynamically from a stream.

The VOTable specification does not define the behavior of parsers with respect to this doubling of the metadata. A parser may ignore the FITS metadata, compare it with the VOTable metadata for consistency, or handle it in some other way.

The following code shows a fragment that might have been created by a FITS-to-VOTable converter. Each FITS keyword has been converted to a PARAM, and the data itself is remotely stored and gzipped at an ftp site:

<RESOURCE>
<PARAM name="EPOCH" datatype="float" value="1999.987">
  <DESCRIPTION>Original Epoch of the coordinates</DESCRIPTION>
</PARAM>
<PARAM name="TELESCOP" datatype="char" arraysize="*" value="VTel" />
<INFO name="HISTORY"> The very first Virtual Telescope observation made in 2002
</INFO>
<TABLE>
<!-- insert FIELD metadata here -->
<DATA><FITS extnum="2">
<STREAM encoding="gzip" href="ftp://archive.cacr.caltech.edu/myfile.fit.gz"/>
</FITS></DATA>
</TABLE>
</RESOURCE>

The FITS file may contain many data objects (known as extensions, numbered from 1 up – the main header being numbered 0), and the extnum attribute allows the VOTable to point to one of those.

6.1.3  BINARY

The Binary format is intended to be easy to read by parsers, so that additional libraries are not required. It is just a sequence of byte strings, the length of each string corresponding to the datatype and arraysize attributes of the FIELD elements in the metadata. The binary format consists of a sequence of records, with no header bytes, no alignment considerations, no block sizes.

Table cells may contain arrays of primitive types, each of which may be of fixed or variable length. In the former case, the number of bytes is the same for each instance of the item, as specified by the arraysize attribute of the FIELD. If all the fields have a fixed arraysize, then each record of the binary format has the same length: the sum, over all fields, of arraysize times the length in bytes of the corresponding datatype.
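The fixed record length can be computed as in the following Python sketch. The byte sizes in SIZES are assumptions made for this illustration, since the table of primitive datatypes is not reproduced here; boolean and bit are left out because their packing is not described in this section. The function name record_length is hypothetical.

```python
# Assumed sizes in bytes for each primitive datatype (not taken from this text).
SIZES = {
    "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}

def record_length(fields):
    """fields is a list of (datatype, arraysize) pairs with integer arraysize."""
    return sum(SIZES[datatype] * arraysize for datatype, arraysize in fields)

# A 10-character name, 3 floats and 1 double: 10 + 12 + 8 bytes.
print(record_length([("char", 10), ("float", 3), ("double", 1)]))
```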

In the case of variable-length arrays of primitives, however, the Binary format becomes more complex. Each record begins with a part for the fixed-length fields (together with four bytes for each of the variable-length fields), followed by a section for the variable-length fields. The four bytes for each variable-length field are interpreted as a four-byte integer giving the number of items in the variable-length array, as shown in the figure. The parser can then read the data by computing appropriate offsets: it multiplies the size and number of the primitives in each table cell to get the length in bytes, then adds the lengths of the preceding variable-length sections of the record.
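The offset computation described above can be sketched in Python with the standard struct module. Several points are assumptions of this illustration and are not stated in the text: big-endian byte order, the mapping of datatypes to struct codes in CODES, the placement of each four-byte count at the position of its field within the fixed part, and the storage of the variable-length arrays in field order. The function name unpack_record is hypothetical.

```python
import struct

# Assumed mapping from datatype to struct format code (big-endian is also assumed).
CODES = {"short": "h", "int": "i", "float": "f", "double": "d"}

def unpack_record(buf, fields):
    """fields is a list of (datatype, arraysize); arraysize is an int or "*"."""
    values, pending, offset = [], [], 0
    # Fixed part: fixed-length arrays inline, a 4-byte count per variable field.
    for index, (datatype, size) in enumerate(fields):
        if size == "*":
            (count,) = struct.unpack_from(">i", buf, offset)
            offset += 4
            pending.append((index, datatype, count))
            values.append(None)
        else:
            fmt = ">%d%s" % (size, CODES[datatype])
            values.append(list(struct.unpack_from(fmt, buf, offset)))
            offset += struct.calcsize(fmt)
    # Variable part: each array starts where the previous one ended.
    for index, datatype, count in pending:
        fmt = ">%d%s" % (count, CODES[datatype])
        values[index] = list(struct.unpack_from(fmt, buf, offset))
        offset += struct.calcsize(fmt)
    return values, offset

fields = [("float", 2), ("double", "*"), ("short", 1)]
record = (struct.pack(">2f", 1.5, 2.5) + struct.pack(">i", 3)
          + struct.pack(">h", 7) + struct.pack(">3d", 1.0, 2.0, 3.0))
values, used = unpack_record(record, fields)
print(values, used)
```

The returned offset is the total length of the record, which is where the next record of the stream would begin.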

 

6.2  Data Encoding

As a result of the serialization, the table has been converted to a byte stream, either text or binary. If the TABLEDATA serialization is used, then those elements are directly in the XML document, and conventional tools can be used to encode the entire XML document. However, VOTable also provides limited encoding of its own. A VOTable document may point to a remote data resource that is compressed; rather than decompressing before sending on the wire, it can be dynamically decoded by the VOTable reader. We might also use the encoding facilities to convert a binary file to text (through base64 encoding), so that binary data can be used in the XML document.

In this version of VOTable, it is not possible to encode individual columns of the table: the whole table must be encoded in the same way.

In order to use an encoding of the data, it must be enclosed in a STREAM element, whose attributes define the nature of the encoding. The encoding attribute is a string that should indicate to the parser how to undo the encoding that has been applied. Parsers should understand and interpret at least the values "gzip" and "base64".

The parser may also respond to the string "dynamic", implying that the data is in a remote resource (see below), and the encoding will be delivered with the header of the data. This occurs with the http protocol, where the MIME header indicates the type of encoding that has been used. The default value of the encoding attribute is the null string, meaning that no encoding has been applied. In future releases, we will allow more complex strings in the encoding attribute, allowing combinations of encoding filters and a way for the parser to find the software needed for the decoding.
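A reader might undo the encoding along the following lines. This Python sketch uses only the standard base64 and gzip modules; the function name decode_stream is hypothetical, and the "dynamic" value is deliberately left unhandled because it depends on the transport header rather than on the document.

```python
import base64
import gzip

def decode_stream(payload, encoding="none"):
    """Undo the STREAM encoding and return the serialized bytes."""
    if encoding in ("none", ""):
        return payload
    if encoding == "base64":
        return base64.b64decode(payload)
    if encoding == "gzip":
        return gzip.decompress(payload)
    # "dynamic" and any future value need information this sketch does not have.
    raise ValueError("unsupported encoding: %r" % encoding)

raw = b"\x00\x01 binary table bytes"
print(decode_stream(base64.b64encode(raw), "base64") == raw)
print(decode_stream(gzip.compress(raw), "gzip") == raw)
```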

6.3  Remote Data

If the encoding of the data produces text, or if the serialization is naturally text-based, then it can be directly embedded into the XML document. However, if the data encoding produces binary, or if the data is very large, it may be preferable to keep the data separate from the metadata. The text contained in the STREAM element is then interpreted as the location of the data, rather than the data itself. The location is specified in a URL-type syntax, for example:

<STREAM href="ftp://server.com/mydata.dat"/>

<STREAM href="ftp://server.com/mydata.dat" expires="2002-02-22"/>

<STREAM href="httpg://server.com/mydata.dat" actuate="onLoad"/>

<STREAM href="file:///usr/home/me/mydata.dat"/>

The first two examples use the well-known anonymous ftp protocol; an http URL is written in the same way. The third is an example of Grid-based access to data through httpg, and the last is a reference to a local file.

There are further attributes of the STREAM element that may be useful. The expires attribute is intended for cases where the VOTable is part of a data-processing pipeline, with data being dynamically created and stored in temporary space, from which it may be deleted after a certain time limit. The attribute expresses when a remote resource may cease to be valid, and is given in Universal Time in the same way as in the FITS specification [2], itself conforming to the ISO 8601 standard, for example:

<STREAM expires="2002-01-31T12:00:00"/>
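A client could test the expires value before fetching the data, as in this Python sketch. The function name is_expired is hypothetical; the value is assumed to carry no time-zone suffix and is interpreted as Universal Time, as the text states.

```python
from datetime import datetime, timezone

def is_expired(expires, now=None):
    """True if the Universal Time stamp in `expires` is not in the future."""
    deadline = datetime.fromisoformat(expires).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now >= deadline

before = datetime(2002, 1, 31, 11, 0, tzinfo=timezone.utc)
after = datetime(2002, 2, 1, tzinfo=timezone.utc)
print(is_expired("2002-01-31T12:00:00", now=before))
print(is_expired("2002-01-31T12:00:00", now=after))
```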

The rights attribute expresses authentication information that may be necessary to access the remote resource. If the VOTable document is suitably encrypted, this attribute could be used to store a password.

The actuate attribute is borrowed from the XML XLink specification, and expresses when the remote link should be actuated. The default is "onRequest", meaning that the data is only fetched when explicitly requested (like a link on an HTML page), and the "onLoad" value means that data should be fetched as soon as possible (like an embedded image on an HTML page).

7  Definitions of Primitive Datatypes

8  Sample VOTable Document

<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0" xmlns="http://vizier.u-strasbg.fr/VOTable">
  <DESCRIPTION>This is an example VOTable document</DESCRIPTION>
  <DEFINITIONS>
    <COOSYS ID="myJ2000" equinox="2000." epoch="2000" system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE name="GSC1.2">
    <DESCRIPTION>
     This is an excerpt of the HST Guide Star Catalog, Version 1.2 (Lasker+ 1996).  
     This version was re-reduced with PPM catalogue.
    </DESCRIPTION>
    <TABLE>
      <DESCRIPTION>   Default result of GSC1.2 Server around a target</DESCRIPTION>
      <FIELD ID="_r" name="_r" ucd="POS_ANG_DIST" unit="arcmin" 
          datatype="float" width="7" precision="4">
        <DESCRIPTION>Distance from target NGC40</DESCRIPTION>
        <VALUES type="actual">
          <MIN value="0.0"/>
          <MAX value="10.0"/>
        </VALUES>
      </FIELD>
      <FIELD ID="gsc" name="GSC-Id" datatype="char" ucd="ID_MAIN" arraysize="10">
        <DESCRIPTION>The GSC-Id is made of 10 digits, 5 representing the plate number, 
             and 5 the object number on the plate.
        </DESCRIPTION>
      </FIELD>
      <FIELD ID="ra" name="RA(J2000)" ref="myJ2000" ucd="POS_EQ_RA" 
          unit="deg" datatype="double" precision="F5">
        <DESCRIPTION>Right ascension in J2000, epoch of plate</DESCRIPTION>
      </FIELD>
      <FIELD ID="dec" name="Dec(J2000)" ref="myJ2000" ucd="POS_EQ_DEC" 
          unit="deg" datatype="double" precision="F5" >
      <DESCRIPTION>Declination in J2000, epoch of plate</DESCRIPTION>
      </FIELD>
      <FIELD ID="pos_err" name="PosErr" unit="arcsec" datatype="float" 
          precision="1" ucd="ERROR">
      <DESCRIPTION>Mean error on position</DESCRIPTION>
      </FIELD>
      <FIELD ID="mag" name="Pmag" ucd="PHOT_PHG_MAG" unit="mag" 
          datatype="float" width="5" precision="2">
      <DESCRIPTION>photographic magnitude (see n_Pmag)</DESCRIPTION>
      </FIELD>
      <FIELD ID="mag_err" name="e_Pmag" ucd="ERROR" unit="mag" 
          datatype="float" width="4" precision="2">
      <DESCRIPTION>Mean error on photographic magnitude</DESCRIPTION>
      </FIELD>
      <FIELD ID="class" name="Class" ucd="CLASS_CODE" datatype="short"  width="1" >
      <DESCRIPTION>Class of object (0=star; 3=non-stellar)</DESCRIPTION>
      <VALUES type="actual">
      <OPTION name="star" value="0"/>
      <OPTION name="galaxy" value="3"/>
      </VALUES>
      </FIELD>
      <LINK content-role="doc" title="documentation" href="http://vizier.u-strasbg.fr/viz-bin/Cat?I/254"/>
      <DATA><TABLEDATA>
<TR><TD>0.0146</TD><TD>0430201297</TD><TD>4.7766</TD><TD>72.8474</TD><TD>3.6</TD><TD>8.59 </TD>
    <TD>0.20</TD><TD>0</TD></TR>
<TR><TD>0.9704</TD><TD>0430200545</TD><TD>5.4576</TD><TD>72.6528</TD><TD>0.2</TD><TD>12.18</TD>
    <TD>0.34</TD><TD>0</TD></TR>
<TR><TD>0.9730</TD><TD>0430200545</TD><TD>3.9867</TD><TD>72.9484</TD><TD>0.2</TD><TD>12.09</TD>
    <TD>0.20</TD><TD>0</TD></TR>
<TR><TD>1.5843</TD><TD>0430202363</TD><TD>8.9587</TD><TD>72.6635</TD><TD>0.2</TD><TD>14.38</TD>
    <TD>0.34</TD><TD>0</TD></TR>
<TR><TD>2.8586</TD><TD>0430200269</TD><TD>5.4847</TD><TD>72.8272</TD><TD>0.3</TD><TD>14.96</TD>
    <TD>0.20</TD><TD>3</TD></TR>
<TR><TD>2.9198</TD><TD>0430200153</TD><TD>10.4746</TD><TD>72.4542</TD><TD>0.2</TD><TD>12.89</TD>
    <TD>0.20</TD><TD>0</TD></TR>
<TR><TD>2.9215</TD><TD>0430200153</TD><TD>6.9484</TD><TD>72.1162</TD><TD>0.2</TD><TD>13.06</TD>
    <TD>0.34</TD><TD>0</TD></TR>
<TR><TD>3.0487</TD><TD>0430202336</TD><TD>4.7586</TD><TD>72.9837</TD><TD>0.2</TD><TD>14.38</TD>
    <TD>0.34</TD><TD>0</TD></TR>
<TR><TD>3.2247</TD><TD>0430200121</TD><TD>7.9585</TD><TD>72.5565</TD><TD>0.2</TD><TD>12.39</TD>
    <TD>0.21</TD><TD>0</TD></TR>
<TR><TD>3.2269</TD><TD>0430200121</TD><TD>7.9484</TD><TD>72.5874</TD><TD>0.2</TD><TD>12.50</TD>
    <TD>0.34</TD><TD>0</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
 </RESOURCE>
</VOTABLE>

9  The DTD for VOTable

Note: this DTD is also accessible at the following two sites:

http://us-vo.org/xml/VOTable.dtd
http://cdsweb.u-strasbg.fr/xml/VOTable.dtd

and a draft definition of VOTable using XML Schema terminology [8] is included as Appendix A.

<!-- DOCUMENT TYPE DEFINITION for VOTable = Virtual Observatory Tabular Format
     See History at      http://vizier.u-strasbg.fr/doc/VOTable
     See Discussions at  http://archives.us-vo.org/VOTable
     Reference DTD as    http://us-vo.org/xml/VOTable.dtd
		or at    http://cdsweb.u-strasbg.fr/xml/VOTable.dtd
     XML Schema at       http://us-vo.org/xml/VOTable.xsd
		or at    http://cdsweb.u-strasbg.fr/xml/VOTable.xsd
.Version 1.0 : 15-Apr-2002
-->

<!-- VOTABLE is the root element -->
<!ELEMENT VOTABLE (DESCRIPTION?, DEFINITIONS?, INFO*, RESOURCE*)>
<!ATTLIST VOTABLE
        ID ID #IMPLIED
        version CDATA #IMPLIED
>

<!-- RESOURCEs can contain other RESOURCES,
     together with TABLEs and other stuff -->
<!ELEMENT RESOURCE (DESCRIPTION?, INFO*, COOSYS*, PARAM*, LINK*, 
     TABLE*, RESOURCE*)>
<!ATTLIST RESOURCE
        name CDATA #IMPLIED
        ID ID #IMPLIED
        type (results | meta) "results"
>

<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT DEFINITIONS (COOSYS?, PARAM?)*>

<!-- INFO is a name-value pair -->
<!ELEMENT INFO (#PCDATA)>
<!ATTLIST INFO
        ID ID #IMPLIED
        name CDATA #IMPLIED
        value CDATA #IMPLIED
>

<!-- A PARAM is similar to a FIELD, but it also has a "value" attribute -->
<!ELEMENT PARAM (DESCRIPTION?, VALUES?, LINK*)>
<!ATTLIST PARAM
        ID ID #IMPLIED
        unit CDATA #IMPLIED
        datatype (boolean | bit | unsignedByte | short | int | long | char
	| unicodeChar | float | double | floatComplex | doubleComplex) #IMPLIED
        precision CDATA #IMPLIED
        width CDATA #IMPLIED
        ref IDREF #IMPLIED
        name CDATA #IMPLIED
        ucd CDATA #IMPLIED
        value CDATA #IMPLIED
        arraysize CDATA #IMPLIED
>

<!-- A TABLE is a sequence of FIELDS and LINKS and DESCRIPTION,
     possibly followed by a DATA section -->
<!-- ELEMENT TABLE (DESCRIPTION?, LINK*, FIELD*, DATA?) -->
<!ELEMENT TABLE (DESCRIPTION?, FIELD*, LINK*, DATA?)>
<!ATTLIST TABLE
        ID ID #IMPLIED
        name CDATA #IMPLIED
        ref IDREF #IMPLIED
>

<!-- FIELD is the definition of what is in a column of the table -->
<!-- A field may have 2 sets of VALUES: "legal" and "actual" -->
<!ELEMENT FIELD (DESCRIPTION?, VALUES*, LINK*)>
<!ATTLIST FIELD
        ID ID #IMPLIED
        unit CDATA #IMPLIED
        datatype (boolean | bit | unsignedByte | short | int | long | char
	| unicodeChar | float | double | floatComplex | doubleComplex) #IMPLIED
        precision CDATA #IMPLIED
        width CDATA #IMPLIED
        ref IDREF #IMPLIED
        name CDATA #IMPLIED
        ucd CDATA #IMPLIED
        arraysize CDATA #IMPLIED
        type (hidden | no_query | trigger) #IMPLIED
>

<!-- VALUES expresses the values that can be taken by the data in a column. -->
<!ELEMENT VALUES (MIN?, MAX?, OPTION*)>
<!ATTLIST VALUES
        ID ID #IMPLIED
        type (legal | actual) "legal"
        null CDATA #IMPLIED
        invalid (yes | no) "no"
>
<!ELEMENT MIN (#PCDATA)>
<!ATTLIST MIN
        value CDATA #REQUIRED
        inclusive (yes | no) "yes"
>
<!ELEMENT MAX (#PCDATA)>
<!ATTLIST MAX
        value CDATA #REQUIRED
        inclusive (yes | no) "yes"
>
<!ELEMENT OPTION (OPTION*)>
<!ATTLIST OPTION
        name CDATA #IMPLIED
        value CDATA #REQUIRED
>

<!-- The link is a URL (href) or some other kind of reference (gref). -->
<!ELEMENT LINK (#PCDATA)>
<!ATTLIST LINK
        ID ID #IMPLIED
        content-role (query | hints | doc) #IMPLIED
        content-type CDATA #IMPLIED
        title CDATA #IMPLIED
        value CDATA #IMPLIED
        href CDATA #IMPLIED
        gref CDATA #IMPLIED
        action CDATA #IMPLIED
>

<!-- DATA is the actual table data, in one of three formats -->
<!ELEMENT DATA (TABLEDATA | BINARY | FITS)>

<!-- Pure XML data -->
<!ELEMENT TABLEDATA (TR*)>
<!ELEMENT TR (TD+)>
<!ELEMENT TD (#PCDATA)>
<!ATTLIST TD
        ref IDREF #IMPLIED
>

<!-- FITS file, perhaps with specification of which extension to seek to -->
<!ELEMENT FITS (STREAM)>
<!ATTLIST FITS
        extnum CDATA #IMPLIED
>

<!-- Binary data format -->
<!ELEMENT BINARY (STREAM)>

<!-- Stream can be local or remote, encoded or not -->
<!ELEMENT STREAM (#PCDATA)>
<!ATTLIST STREAM
        type (locator | other) "locator"
        href CDATA #IMPLIED
        actuate (onLoad | onRequest | other | none) "onRequest"
        encoding (gzip | base64 | dynamic | none) "none"
        expires CDATA #IMPLIED
        rights CDATA #IMPLIED
>

<!-- Expresses the coordinate system we are using -->
<!ELEMENT COOSYS (#PCDATA)>
<!ATTLIST COOSYS
        ID ID #IMPLIED
        equinox CDATA #IMPLIED
        epoch CDATA #IMPLIED
        system (eq_FK4 | eq_FK5 | ICRS | ecl_FK4 | ecl_FK5 | galactic
               | supergalactic | xy | barycentric | geo_app) "eq_FK5"
>

10  Schema Diagram for VOTable


Appendices

A  A draft XML Schema defining VOTable

Note: The XML Schema introduces stricter definitions of some patterns that cannot otherwise be specified in a DTD, such as the arraysize, which can only be made of digits and x, with an optional * terminator. Like the DTD presented in section 9, this XML Schema is also accessible from

http://us-vo.org/xml/VOTable.xsd
http://cdsweb.u-strasbg.fr/xml/VOTable.xsd

<?xml version="1.0" encoding="UTF-8"?>
<!--W3C Schema for VOTable  = Virtual Observatory Tabular Format
     See History at      http://vizier.u-strasbg.fr/doc/VOTable
     See Discussions at  http://archives.us-vo.org/VOTable
  This XML schema can be referenced by
       http://us-vo.org/xml/VOTable.xsd
   or  http://cdsweb.u-strasbg.fr/xml/VOTable.xsd 
  The DTD is available from
       http://us-vo.org/xml/VOTable.dtd
   or  http://cdsweb.u-strasbg.fr/xml/VOTable.dtd 
.Version 1.0 : 15-Apr-2002
.Version 1.0a: 27-Sep-2001 in MIN MAX STREAM
.Version 1.0b: 09-Nov-2002 from Steve Lowe, slowe@head-cfa.harvard.edu:
	       in DEFINITIONS, use 'xs:sequence' rather than 'xs:all'
.Version 1.0c: 12-Nov-2002 from Steve Lowe, use xs:choice in DEFINITIONS
.Version 1.0d: 10-Dec-2002 Allow several INFO TABLE
.Version 1.0e: 01-Oct-2003 Changes in LINK 
-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

<!-- Here we define some interesting new datatypes:
     - anyTEXT   may have embedded XHTML (conforming HTML)
     - astroYear is an epoch in Besselian or Julian year, e.g. J2000
     - arrayDEF  specifies an array size e.g. 12x23x*
     - dataType  defines the acceptable datatypes
     - precType  defines the acceptable precisions
     - yesno     defines just the 2 alternatives
-->

<xs:complexType name="anyTEXT" mixed="true">
      <xs:sequence>
        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="skip"/>
      </xs:sequence>
</xs:complexType>

<xs:simpleType  name="astroYear">
  <xs:restriction base="xs:token">
    <xs:pattern  value="[JB][0-9]+([.][0-9]*)?"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType  name="arrayDEF">
  <xs:restriction base="xs:token">
    <xs:pattern  value="([0-9]+x)*[0-9]*[*]?"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="dataType">
  <xs:restriction base="xs:NMTOKEN">
    <xs:enumeration value="boolean"/>
    <xs:enumeration value="bit"/>
    <xs:enumeration value="unsignedByte"/>
    <xs:enumeration value="short"/>
    <xs:enumeration value="int"/>
    <xs:enumeration value="long"/>
    <xs:enumeration value="char"/>
    <xs:enumeration value="unicodeChar"/>
    <xs:enumeration value="float"/>
    <xs:enumeration value="double"/>
    <xs:enumeration value="floatComplex"/>
    <xs:enumeration value="doubleComplex"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="precType">
  <xs:restriction base="xs:token">
    <xs:pattern value="[EF]?[1-9][0-9]*"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="yesno">
  <xs:restriction base="xs:NMTOKEN">
    <xs:enumeration value="yes"/>
    <xs:enumeration value="no"/>
  </xs:restriction>
</xs:simpleType>

<!-- VOTable is the root element -->
  <xs:element name="VOTABLE">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DESCRIPTION" minOccurs="0"/>
        <xs:element ref="DEFINITIONS" minOccurs="0"/>
        <xs:element ref="INFO" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="RESOURCE" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="version">
        <xs:simpleType>
          <xs:restriction base="xs:NMTOKEN">
            <xs:enumeration value="1.0"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

<!-- RESOURCES can contain DESCRIPTION, (INFO|PARAM|LINK), (TABLE|RESOURCE) -->
  <xs:element name="RESOURCE">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DESCRIPTION" minOccurs="0"/>
        <xs:element ref="INFO" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="COOSYS" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="PARAM" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="LINK" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="TABLE" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="RESOURCE" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:token"/>
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="type" default="results">
        <xs:simpleType>
          <xs:restriction base="xs:NMTOKEN">
            <xs:enumeration value="results"/>
            <xs:enumeration value="meta"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <xs:element name="DESCRIPTION" type="anyTEXT" />

  <xs:element name="DEFINITIONS">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="COOSYS" />
        <xs:element ref="PARAM" />
      </xs:choice>
    </xs:complexType>
  </xs:element>

<!-- INFO is a name-value pair -->
  <xs:element name="INFO">
    <xs:complexType mixed="true"><xs:complexContent>
      <xs:extension base="anyTEXT">
        <xs:attribute name="ID" type="xs:ID"/>
        <xs:attribute name="name" type="xs:token" use="required"/>
        <xs:attribute name="value" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent></xs:complexType>
  </xs:element>

<!-- A PARAM is similar to a FIELD, but it also has a "value" attribute -->
  <xs:element name="PARAM">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DESCRIPTION" minOccurs="0"/>
        <xs:element ref="VALUES" minOccurs="0"/>
        <xs:element ref="LINK" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="unit" type="xs:token"/>
      <xs:attribute name="datatype" type="dataType"/>
      <xs:attribute name="precision" type="precType"/>
      <xs:attribute name="width" type="xs:positiveInteger"/>
      <xs:attribute name="ref" type="xs:IDREF"/>
      <xs:attribute name="name" type="xs:token" use="required"/>
      <xs:attribute name="ucd" type="xs:token"/>
      <xs:attribute name="value" type="xs:string"/>
      <xs:attribute name="arraysize" type="arrayDEF"/>
    </xs:complexType>
  </xs:element>

<!-- A TABLE is a sequence of FIELDS and LINKS and DESCRIPTION, 
     possibly followed by a DATA section 
-->
  <xs:element name="TABLE">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DESCRIPTION" minOccurs="0"/>
        <xs:element ref="FIELD" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="LINK" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="DATA" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="name" type="xs:token"/>
      <xs:attribute name="ref" type="xs:IDREF"/>
    </xs:complexType>
  </xs:element>

<!-- FIELD is the definition of what is in a column of the table -->
  <xs:element name="FIELD">
    <xs:complexType>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="DESCRIPTION" minOccurs="0"/>
        <xs:element ref="VALUES" minOccurs="0" maxOccurs="2"/>
        <xs:element ref="LINK" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="unit" type="xs:token"/>
      <xs:attribute name="datatype" type="dataType" use="required"/>
      <xs:attribute name="precision" type="precType"/>
      <xs:attribute name="width" type="xs:positiveInteger"/>
      <xs:attribute name="ref" type="xs:IDREF"/>
      <xs:attribute name="name" type="xs:token" use="required"/>
      <xs:attribute name="ucd" type="xs:string"/>
      <xs:attribute name="arraysize" type="xs:string"/>
      <xs:attribute name="type">
        <xs:simpleType>
          <xs:restriction base="xs:NMTOKEN">
            <xs:enumeration value="hidden"/>
            <xs:enumeration value="no_query"/>
            <xs:enumeration value="trigger"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

<!-- VALUES expresses the values that can be taken by the data 
     in a column or by a parameter
-->
  <xs:element name="VALUES">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="MIN" minOccurs="0"/>
        <xs:element ref="MAX" minOccurs="0"/>
        <xs:element ref="OPTION" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="type" default="legal">
        <xs:simpleType>
          <xs:restriction base="xs:NMTOKEN">
            <xs:enumeration value="legal"/>
            <xs:enumeration value="actual"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="null" type="xs:token"/>
      <xs:attribute name="invalid" type="yesno" default="no"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="MIN">
    <xs:complexType>
      <xs:attribute name="value" type="xs:string" use="required"/>
      <xs:attribute name="inclusive" type="yesno" default="yes"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="MAX">
    <xs:complexType>
      <xs:attribute name="value" type="xs:string" use="required"/>
      <xs:attribute name="inclusive" type="yesno" default="yes"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="OPTION">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="OPTION" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:token"/>
      <xs:attribute name="value" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>

<!-- The LINK is a URL (href) or some other kind of reference (gref) -->
  <xs:element name="LINK">
    <xs:complexType mixed="true">
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="content-role">
        <xs:simpleType>
          <xs:restriction base="xs:NMTOKEN">
            <xs:enumeration value="query"/>
            <xs:enumeration value="hints"/>
            <xs:enumeration value="doc"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="content-type" type="xs:token"/>
      <xs:attribute name="title" type="xs:string"/>
      <xs:attribute name="value" type="xs:string"/>
      <xs:attribute name="href" type="xs:anyURI"/>
      <xs:attribute name="gref" type="xs:token"/>
      <xs:attribute name="action" type="xs:anyURI"/>
    </xs:complexType>
  </xs:element>

<!-- DATA is the actual table data, in one of three formats -->
  <xs:element name="DATA">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="TABLEDATA"/>
        <xs:element ref="BINARY"/>
        <xs:element ref="FITS"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

<!-- Pure XML data -->
  <xs:element name="TABLEDATA">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="TR" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="TD">
    <xs:complexType mixed="true"><xs:complexContent>
      <xs:extension base="anyTEXT">
        <xs:attribute name="ref" type="xs:IDREF"/>
      </xs:extension>
    </xs:complexContent></xs:complexType>
  </xs:element>

  <xs:element name="TR">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="TD" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

<!-- FITS file, perhaps with specification of which extension to seek to -->
  <xs:element name="FITS">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="STREAM"/>
      </xs:sequence>
      <xs:attribute name="extnum" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>

<!-- BINARY data format -->
  <xs:element name="BINARY">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="STREAM"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

<!-- STREAM can be local or remote, encoded or not -->
  <xs:element name="STREAM">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="type" default="locator">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:enumeration value="locator"/>
                <xs:enumeration value="other"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="href" type="xs:anyURI"/>
          <xs:attribute name="actuate" default="onRequest">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:enumeration value="onLoad"/>
                <xs:enumeration value="onRequest"/>
                <xs:enumeration value="other"/>
                <xs:enumeration value="none"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="encoding" default="none">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:enumeration value="gzip"/>
                <xs:enumeration value="base64"/>
                <xs:enumeration value="dynamic"/>
                <xs:enumeration value="none"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="expires" type="xs:dateTime"/>
          <xs:attribute name="rights" type="xs:token"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

<!-- Expresses the coordinate system we are using -->
  <xs:element name="COOSYS">
    <xs:complexType mixed="true"><xs:complexContent>
      <xs:extension base="anyTEXT">
        <xs:attribute name="ID" type="xs:ID"/>
        <xs:attribute name="equinox" type="astroYear"/>
        <xs:attribute name="epoch" type="astroYear"/>
        <xs:attribute name="system" default="eq_FK5">
          <xs:simpleType>
            <xs:restriction base="xs:NMTOKEN">
              <xs:enumeration value="eq_FK4"/>
              <xs:enumeration value="eq_FK5"/>
              <xs:enumeration value="ICRS"/>
              <xs:enumeration value="ecl_FK4"/>
              <xs:enumeration value="ecl_FK5"/>
              <xs:enumeration value="galactic"/>
              <xs:enumeration value="supergalactic"/>
              <xs:enumeration value="xy"/>
              <xs:enumeration value="barycentric"/>
              <xs:enumeration value="geo_app"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension></xs:complexContent></xs:complexType>
  </xs:element>

</xs:schema>

B  VOTable LINK substitutions

This section summarizes a part of the Astrores format that is proposed as an extension to VOTable. The extension allows a document to represent not only a table and its structure, but also a request for a table. It defines the behavior of a LINK when it appears inside a TABLE.

When a LINK element appears within a TABLE, extra functionality is implied. The href or gref attribute need not be a simple link; it may instead be a template for a link. For example, to the table of section 1.1 we add the link:

<LINK href="http://us-vo.org/lookup?Star=${Star-Name}&amp;RA=${RA}&amp;DE=${Dec}"/>

The implication is that the text is seen in the context of a particular row of the table, and a substitution filter is applied. If the selected row of the table is the first one, the result of the substitution would be:

http://us-vo.org/lookup?Star=Procyon&RA=114.827&DE=5.227

Whenever the pattern ${...} is found in the original link, the part in the braces is compared with the set of name attributes of the fields of the table. If a match is found, then the value from that field of the selected row is used in place of the ${...}. If no match is found, no substitution is made. Thus the parser makes available to the calling application a value of the href and gref attributes that depends on which row of the table has been selected. Another way to think of it is that there is not a single link associated with the table, but rather an implicitly defined new column of the table. This mechanism can be used to connect each row of the table to further information resources.
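The substitution rule above can be sketched in a few lines of Python. This is only an illustration, not part of the format: the function name substitute_link is hypothetical, the template is taken to be the attribute value as delivered by an XML parser (entities already decoded), and the cell values are assumed to be available as already formatted text.

```python
import re

# Matches ${...}; the captured group is the text between the braces.
_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def substitute_link(template, field_names, row):
    """Expand ${name} placeholders in a LINK template for one table row.

    field_names: the name attributes of the table's FIELDs, in column order.
    row: the cell values of the selected row, in the same order (assumed
         here to be text that is already formatted for display).
    """
    values = dict(zip(field_names, row))

    def replace(match):
        name = match.group(1)
        # No FIELD with this name: leave the placeholder untouched.
        if name not in values:
            return match.group(0)
        return str(values[name])

    return _PLACEHOLDER.sub(replace, template)


# The three fields used by the template, and the first row of the table.
fields = ["Star-Name", "RA", "Dec"]
first_row = ["Procyon", "114.827", "5.227"]
template = "http://us-vo.org/lookup?Star=${Star-Name}&RA=${RA}&DE=${Dec}"

print(substitute_link(template, fields, first_row))
```

Applying the function to every row yields the implicitly defined extra column of links described above.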

The action attribute is related to the Query mechanism described in the next section.

The purpose of the link is defined by the content-role attribute. The allowed values are "query", "hints", and "doc". The first implies that string substitution is applied as defined above. The latter two imply that no substitution is needed, and that the link points either to information for use by the application ("hints") or to human-readable documentation ("doc").

The type attribute of the FIELD may carry values that express the status of the field when the enclosing table describes the components for submitting a query, rather than a data document. If the value is "noquery", then the marked field is ignored in the creation of the action query, because it does not belong to the form described by the set of FIELDs. A computed column (a value computed from other FIELDs) is a typical example of a FIELD whose content is to be ignored in the generation of a query.

If type="trigger", then the marked field contains data necessary for correct LINK generation. If, for instance, only the columns "RA" and "Dec" are requested, but a link requires the knowledge of a "RecordNumber" to be operational, the result contains the additional column "RecordNumber" flagged as a "trigger" field.
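One way a server might decide which columns to add as "trigger" fields is sketched below. It is an illustration under stated assumptions rather than a prescribed algorithm: the function name trigger_fields and the link URL are hypothetical, and link templates are assumed to use the ${...} pattern of this section.

```python
import re

# Matches ${...}; the captured group is the text between the braces.
_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def trigger_fields(link_templates, requested, available):
    """Return the FIELD names that must be added as type="trigger" columns.

    A name qualifies when a link template refers to it, the table can
    supply it, and it was not among the requested columns.
    """
    needed = []
    for template in link_templates:
        for name in _PLACEHOLDER.findall(template):
            if name in available and name not in requested and name not in needed:
                needed.append(name)
    return needed


# Hypothetical link that needs a record number to be operational.
templates = ["http://server/record?id=${RecordNumber}"]
requested = ["RA", "Dec"]
available = ["RA", "Dec", "RecordNumber"]

print(trigger_fields(templates, requested, available))
```

With these inputs the sketch reports that "RecordNumber" has to accompany the requested "RA" and "Dec" columns.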

C  VOTable Query Extension

The definitions in this section are not part of VOTable. This section is a short explanation of how Astrores defines the set of parameters and fields that can be qualified for a query, that is, what could be described as the contents of a form. VOTable currently does not define the parameters available for a query; such definitions are deferred to the next version of VOTable, and could make use of the Web Services Description Language (WSDL).

In Astrores [1], the input parameters available in queries are described by the PARAM and FIELD elements, and the syntax used to generate the actual query is described in the ASU [6] protocol: the FIELD elements are paired in the form name=value, where name is the contents of the name attribute, and value represents a constraint written with the ASU conventions (e.g. "<8", or "12.0..12.5", which denotes a range of values). Such pairs are appended to the action specified in the LINK element contained in the RESOURCE, separated by the ampersand (&) symbol, in a way similar to the HTML FORM syntax. PARAM or FIELD elements that have the attribute type="noquery" are, however, ignored and are never paired. A valid query could be:

http://server/asu-xml?-source=I/271/out&-out.max=99&-c=01:02:03-12:31:14&Rmag=12.0..12.5

and the corresponding VOTable document that generates it:

<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0">
  <DEFINITIONS>
    <COOSYS ID="J2000" equinox="2000." epoch="2000." system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE ID="GSC2.2.01" type="meta">
    <DESCRIPTION>The GSC 2.2 Catalogue (STScI, 2001, 455851237 objects)</DESCRIPTION>
    <PARAM ID="MaxRec" name="-out.max" ucd="NUMBER" datatype="int" value="50">
      <DESCRIPTION>Maximal number of retrieved records</DESCRIPTION> 
      <VALUES multiple="no" type="legal">
        <MIN value="1" inclusive="yes" /> 
        <MAX value="9999" inclusive="yes" />
      </VALUES>
    </PARAM>
    <PARAM ID="SkyLocation" name="-c" ucd="POS_EQ" datatype="char" arraysize="*">
      <DESCRIPTION>Target position in the sky according to the ASU syntax in J2000
      </DESCRIPTION> 
    </PARAM>
    <TABLE>
      <FIELD name="GSC2.2" ucd="ID_MAIN" datatype="char" arraysize="14">
        <DESCRIPTION>Identification of the object</DESCRIPTION></FIELD>
      <FIELD name="RA(ICRS)" ucd="POS_EQ_RA_MAIN" ref="J2000" 
          datatype="double" width="10" precision="6" unit="deg">
        <DESCRIPTION>Right Ascension in ICRS (J2000), at Epoch
        </DESCRIPTION>
      </FIELD>
      <FIELD name="DE(ICRS)" ucd="POS_EQ_DEC_MAIN" ref="J2000" 
          datatype="double" width="10" precision="6" unit="deg">
        <DESCRIPTION>Declination in ICRS (J2000)
        </DESCRIPTION>
      </FIELD>
      <FIELD name="Rmag" ucd="PHOT_PHG_R" datatype="float" width="5" 
          precision="2" unit="mag">
        <DESCRIPTION>?Magnitude in F photographic band (red)
        </DESCRIPTION>
      </FIELD>
      <FIELD name="e_Rmag" ucd="ERROR" datatype="float" width="5" 
          precision="2" unit="mag">
        <DESCRIPTION>? Mean error on Rmag (2)</DESCRIPTION> 
      </FIELD>
      <LINK content-role="query" action="asu-xml?-source=I/271/out&amp;" /> 
    </TABLE>
  </RESOURCE>
</VOTABLE>

Note that the RESOURCE displaying the parameters accessible for a query has the type="meta" attribute; it is also assumed that only one LINK having the content-role="query" attribute together with an action attribute exists within the current RESOURCE.
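The pairing rule described in this section can be sketched as follows. This is a minimal illustration, not a definition of the ASU protocol: the function name build_query is hypothetical, the relative action is assumed to resolve against "http://server/", values are appended without percent-encoding to mirror the example query, and treating "e_Rmag" as a type="noquery" field is purely for demonstration (the example document does not mark it so).

```python
def build_query(action_url, constraints, noquery=frozenset()):
    """Append name=value pairs to a LINK action, skipping noquery entries.

    action_url: the action attribute, already resolved to an absolute URL.
    constraints: (name, value) pairs, where name is the name attribute of a
                 PARAM or FIELD and value is an ASU-style constraint.
    noquery: names of PARAMs or FIELDs carrying type="noquery".
    """
    pairs = [f"{name}={value}" for name, value in constraints
             if name not in noquery]
    # Choose the separator between the action and the first pair.
    if action_url.endswith(("?", "&")):
        separator = ""
    elif "?" in action_url:
        separator = "&"
    else:
        separator = "?"
    return action_url + separator + "&".join(pairs)


# Action from the example LINK, after the XML parser has decoded "&amp;".
action = "http://server/" + "asu-xml?-source=I/271/out&"
constraints = [
    ("-out.max", "99"),
    ("-c", "01:02:03-12:31:14"),
    ("Rmag", "12.0..12.5"),
    ("e_Rmag", "<0.5"),  # dropped below: assumed noquery for illustration
]

print(build_query(action, constraints, noquery={"e_Rmag"}))
```

The printed URL matches the valid query shown earlier in this section.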

References

[1] Accomazzi et al., Describing Astronomical Catalogues and Query Results with XML
http://vizier.u-strasbg.fr/doc/astrores.htx

[2] FITS: Flexible Image Transport System, specifically the Binary Tables Extension
http://fits.gsfc.nasa.gov/

[3] Standards for Astronomical Catalogues: Units, CDS Strasbourg
http://vizier.u-strasbg.fr/doc/catstd-3.2.htx

[4] Unified Content Descriptors
http://vizier.u-strasbg.fr/doc/UCD.htx

[5] GLU: Générateur de Liens Uniformes, CDS Strasbourg
http://simbad.u-strasbg.fr/glu/glu.htx

[6] ASU: Astronomical Server URL, CDS Strasbourg
http://vizier.u-strasbg.fr/doc/asu.html

[7] XDF: Extensible Data Format, ADC
http://xml.gsfc.nasa.gov/XDF/XDF_home.html

[8] XML Schema: W3C Document
http://www.w3.org/XML/Schema