Notes from the Opticon Interoperability Meeting


Notes from the Interoperability (Opticon) Meeting at Strasbourg
VOTable discussion, 29 January 2002
(DRAFT) 2002-Feb-05 Francois Ochsenbein

Participants:
    Christophe Arviset, ESA-VILSPA, AVO
    Francois Bonnarel, CDS
    Clive Davenhall, ROE, AstroGrid
    Erik Deul, Leiden, ASTRO-WISE
    Sebastien Derriere
    Pierre Didelon, TERAPIX, CEA-SAP
    Markus Dolensky, ST-ECF
    Daniel Durand, CADC
    Pierre Fernique, CDS
    Francoise Genova, CDS
    Dave Giaretta, RAL
    Bob Hanisch, NVO, STScI
    Jim Lewis, Cambridge Astronomical Survey Unit, Cambridge (UK)
    Mireille Louys, CDS
    Jonathan McDowell, Chandra Science Center, Cambridge (Mass)
    Tom McGlynn, HEASARC
    Bob Mann, ROE, AstroGrid, AVO, Planck
    Alberto Micol, ST-ECF, AVO
    Francois Ochsenbein, CDS
    Clive Page, Leicester, AstroGrid
    Fabio Pasian, Trieste, Planck, TNG
    Francesco Pierfederici, AstroVirtel, ST-ECF
    Benoit Pirenne, AVO, ST-ECF, ESO
    Ray Plante, NCSA, NVO
    Philippe Prugniel, Lyon, Hypercat
    Guy Rixon, Cambridge (UK), AstroGrid, AVO
    Arnold Rots, Chandra Science Center, Cambridge (Mass)
    Andre Schaaff, CDS
    Enrique Solano, LAEFF
    Wolfgang Voges, MPE
    Marc Wenger, CDS
    Andras Wicenec, ESO, AVO

The current version (0.4) of the VOTable document can be read from http://vizier.u-strasbg.fr/doc/VOTable/ ; PDF and MSWORD versions are also available from this page. "Combined comments" about this version, prepared by Clive Davenhall, David Giaretta, Bob Mann, Clive Page and Guy Rixon, are also available at http://vizier.u-strasbg.fr/doc/VOTable/combined_0.4.htx
The relations between FITS and VOTable were extensively discussed; the
following is a tentative summary of the questions raised, followed by some
comments introduced by |. I have numbered the questions for ease of
reference; the comments also include the results of some investigations
made after the meeting.

The follow-up to the meeting should be the preparation of a version 0.5 of
the VOTable document in the coming days, the aim being a version 1.0 ready
for the Garching meeting ("Toward an International Virtual Observatory",
June 10-14, see http://www.eso.org/gen-fac/meetings/vo2002/ ).

A. Basic ("existentialist") questions about VOTable and its context:

A1 What is the problem to which VOTable is the solution?
    - a data storage format?
    - a data transport format across VO participants -- or more?
    - a format for metadata transport?
    - a query system designed for archive/catalogue repositories?
    - a resource discovery system across VO participants (portals)?
    | Note that FITS was designed, 25 years ago, as a "flexible image
    | transport system"; it is nowadays also a format widely used for
    | data archival.
    |
    | VOTable was created from Astrores, an XML structuring of tabular
    | results, with emphasis on the following:
    | - propagate the metadata together with the data in such a way that
    |   applications can interpret the data coming from different sources
    |   as accurately as possible, with their proper physical meaning,
    |   unit, and system (e.g. photometric) in which the data are expressed;
    | - be efficient in the data transfer, i.e. the metadata are not
    |   repeated for each row of results;
    | - provide LINKs in addition to the results, giving access e.g. to
    |   more detailed results or to related data (images, spectra), etc.

A2 What are the problems to which VOTable is not the solution?
    | It was felt that the VOTable document should tackle some aspects
    | of these questions (i.e. A1 and A2) in its introduction.

B. FITS and VOTable

B1 Why not just expand FITS to meet our requirements for tabular data
    exchange?
    | The following basic limitations make it difficult to work with
    | FITS (ASCII or binary) tables:
    | - FITS structures can't be generated as streams (sizes of the
    |   various elements have to be known before reading the data);
    | - dedicated tools are required to manipulate FITS files; the
    |   standard Unix or PC tools can't simply display the contents of a
    |   FITS file;
    | - structural limitations in FITS keyword length make it difficult
    |   to define e.g. HTTP links;
    | - FITS metadata are poorly defined in terms of physical meaning;
    |   the exercise of adding the necessary keywords / conventions would
    |   most likely require several years.

B2 Whatever VOTable is, we should ensure that the conversion
    FITS --> VOTable --> FITS can be done without any loss of data.
    | The VOTable <FIELD> definitions contain all FITS details describing
    | the various aspects of a table column; FITS headers also carry
    | important information via COMMENT or HISTORY keywords,
    | and VOTable should supply a structure (<INFO> or <PARAMETER> or
    | some other XML tag, TBD) to pass these details along.

B3 "Appendix B" of FITS definitions
    | Appendix B of the FITS standard NOST 100-2.0 contains three
    | extensions which are currently used in the FITS community:
    | B1 - the 'Variable Length Array' facility (covered in VOTable);
    | B2 - the 'Multidimensional Array' convention (or the 'TDIM'
    |      convention), currently widely used; a way of specifying this
    |      dimensionality could easily be defined in VOTable,
    |      e.g. arraysize="1024x512";
    | B3 - the 'Substring Array' (SSTR) convention to enable fixed- or
    |      variable-length substrings; nobody in the meeting had used
    |      that convention.

C. VOTable and the various data formats.

C1 Why does VOTable use this hybrid of XML and CSV?
    | Most people seem to dislike the mixture of XML and CSV as it
    | is in the VOTable document; the complete document should either
    | be fully XML compliant, or use the <BINARY> (with/without <STREAM>
    | -- or XLink?) constructions. 'CSV' tables could however be brought
    | in via the <STREAM> tag linking to a document with a
    | tab-separated-values MIME type -- such a possibility was not excluded.
    |
    | The conclusions of the discussion clearly suggest that the
    | <CSV> tag ought to be removed from the VOTable definition;
    | acceptable data presentations are either <TABLEDATA> with
    | an XML-compliant table, or <STREAM> (or maybe XLink?).
    | It was also suggested that <TR> and <TD> are better (and shorter)
    | names than <ROW> and <CELL>.
    |
    | The next version of the VOTable document should take these comments
    | into account.

C2 Why not define and use dedicated tags, e.g. <RA> ... </RA>? That would
    allow the standard XML tools to find the interesting parameters within
    the data at once.
    | The problem is that it is not possible to define a priori all possible
    | data types with all their variations (e.g. exact coordinate system,
    | exotic photometric systems, etc.) as dedicated tags -- new types of
    | parameters and data systems (e.g. photometric) being created regularly;
    | a way around this limitation would be an extensive usage of
    | XML Schema (see http://www.w3.org/XML/Schema). More study would be
    | necessary to assess the complete semantics of such procedures,
    | and their relation with the XDF developments (see
    | http://xml.gsfc.nasa.gov/XDF/XDF_home.html).
    |
    | It should also be added that the non-XML way of supplying the data
    | (via STREAM or XLink) obviously ignores such 'individual' tagging;
    | in this latter case the interpretation of any parameter has to rely
    | on its rank in a row. It should also be noted that the rank of
    | an element is accessible in the XPath specifications.

C3 Significance of blanks: are leading blanks significant?
    Apparently they are not in XML.
    | This point should be investigated -- it would likely mean that
    | significant leading blanks would have to be escaped as &nbsp;

C4 The <STREAM> element raised two main questions:

C4a Why not simply use XLink for remote data, or at least be compatible
    with XLink? And would it be possible, for links to FITS, to use the
    XLink capabilities to point to a specific extension of a FITS file?
    | The possible usage or the limitations/drawbacks of XLink
    | (see http://www.w3.org/TR/xlink) have to be discussed in the
    | VOTable document.

C4b Is it necessary to accept "embedded data streams"?
    | The usefulness of having 'everything in one set' -- metadata
    | AND the relevant data -- was however pointed out.

D. The FIELD definitions

D1 The 'datatype' -- why a single character? Couldn't 'A' be replaced by
    'ascii' or more self-explanatory definitions?
    | The FITS way was chosen -- this ensures the compatibility --
    | but alternative expressions could be used.

D2 Accuracy and precision: a clear distinction should be made between the
    'precision' (how many 'digits', binary or ASCII, are necessary to give
    a realistic presentation of the value) and the 'accuracy', expressed
    e.g. by a standard error. The 'accuracy' of a parameter may be either
    associated with each parameter value (i.e. a column exists which
    contains the standard error of the parameter), or 'global' to a table
    (e.g. the standard error of the parameter 'alpha' is 30mas).
    | The 'precision' attribute is defined in VOTable to match the
    | TDISP FITS keyword -- with exactly the same meaning: how many
    | figures are required in the ASCII representation of the numbers.
    | Ways of specifying 'global' accuracies of parameters (another
    | attribute?) should be investigated.

D3 Relations between UCDs and VOTable: while there was agreement about the
    usefulness of the UCD classification, no detailed criticism was
    expressed.

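Returning to the question of leading blanks raised in C3, one way to start the investigation is to look at what a given parser actually hands to the application. The following is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a made-up one-row table (the cell values are illustrative, not taken from any catalogue); it shows the behaviour of this one parser and says nothing about what other tools or later processing steps do with blanks.

```python
import xml.etree.ElementTree as ET

# Made-up one-row table; <TR>/<TD> are the names suggested under C1.
doc = "<TABLEDATA><TR><TD>  12</TD><TD>ab </TD></TR></TABLEDATA>"

# Collect the character data of each cell exactly as the parser returns it.
cells = [td.text for td in ET.fromstring(doc).iter("TD")]
print(cells)

# &nbsp; is an HTML entity; XML itself predefines only lt, gt, amp, apos, quot.
try:
    ET.fromstring("<TD>&nbsp;12</TD>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False
print("&nbsp; accepted without a DTD:", nbsp_accepted)

# A numeric character reference needs no declaration.
padded = ET.fromstring("<TD>&#160;12</TD>").text
print(repr(padded))
```

With this parser the blanks inside a cell reach the application unchanged, and &nbsp; is rejected unless a DTD declares it, so a numeric character reference such as &#160; may be the safer escape if one is needed at all.
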
D4 The 'invalid' / 'null' attributes (in the <VALUE> element):
    the 'invalid' attribute was felt to be unnecessary; 'null' values are
    represented in the XML <TABLEDATA> as empty cells.
    | The 'invalid' attribute could also be understood as a possibility
    | of accepting several representations of 'null' values.

D5 Logical values (boolean): what is their representation? FITS accepts
    uppercase 'T' (true), 'F' (false) and the hexadecimal 0x00 for 'null'.
    | The 'null' representation would better be a blank or a question
    | mark -- and true / false by their universal representations '1'
    | and '0' -- but the FITS compatibility would suffer from such a change.

D6 Sort order specifications of the data. Currently the tabular data
    (within <TABLEDATA> or via <STREAM>) is considered as an unordered set
    of rows.
    | When the data come in a known order, specifying this
    | order would be useful information for applications.

E. The Query/Answer scheme

E1 The Query mechanism: it was generally felt that the Query/Answer
    mechanism is not clearly explained in the VOTable document: how is the
    query to be formed from the set of FIELDs or other parameters?
    | The 'action' attribute of the <LINK> element is the convention used
    | in Astrores to set up a query in a form similar to the HTML <FORM>
    | definition; the ASU conventions are used to express the constraints
    | (see http://vizier.u-strasbg.fr/doc/asu.html).

E2 The Query Formulation: the acceptable query language should be specified
    somewhere; should SQL (and to which level of complexity?) be allowed?
    Why not think about XQuery?
    | The 'mechanism' acceptable to the data server should be
    | specified by some tag in the result -- e.g. a mention of
    | 'ASU-compliant' server.
    |
    | XQuery is a language for getting extracts of an XML document
    | making use of its hierarchical structure -- it requires a fully
    | XML-compliant document, and would not work with STREAM'ed data.
    | Details about XQuery are in the most recent draft at
    | http://www.w3.org/TR/2001/WD-xquery-20011220

E3 Parameters required in the Query: in addition to the columns of a table,
    a query may require knowledge of other parameters, such as a search
    radius, etc.
    | The <FIELD> is used in Astrores for this purpose; a proposal from
    | Clive Davenhall for the representation of parameters is appended.
    | This proposal can also be a solution to the preservation of the FITS
    | keywords mentioned above; the preservation of spaces, important
    | e.g. in the HISTORY contents, could be ensured with the
    | xml:space='preserve' attribute.

Appendix: Alternative XML Representations for Parameters (or keywords)
(from Clive Davenhall)

Take as an example a FITS keyword from a SuperCOSMOS catalogue:

    PLTSCALE= '67.14 ' / [arcsec/mm] plate scale

Current VOTable 0.4
-------------------

    <FIELD ID = "PLTSCALE"
      unit = "arcsec/mm"
      datatype = "E"
      precision = "2"
      name = "PLTSCALE"
      type = "no_query" >
      <DESCRIPTION>
        Plate scale of the UK Schmidt plates from which the catalogue
        was constructed.
      </DESCRIPTION>
      <VALUES>
        <OPTION value = "67.14" > </OPTION>
      </VALUES>
    </FIELD>

Suggested Parameter element
---------------------------

(Could equally well be called `KEYWORD'.)

    <!ELEMENT PARAMETER (DESCRIPTION?)>
    <!ATTLIST PARAMETER
      ID         ID     #IMPLIED
      unit       CDATA  #IMPLIED
      datatype   (L | X | B | I | J | A | U | E | F | D | C | M | K) #IMPLIED
      precision  CDATA  #IMPLIED
      width      CDATA  #IMPLIED
      arraysize  CDATA  #IMPLIED
      ref        IDREF  #IMPLIED
      name       CDATA  #IMPLIED
      ucd        CDATA  #IMPLIED
      value      CDATA  #REQUIRED
    >

For example:-

    <PARAMETER ID = "PLTSCALE"
      unit = "arcsec/mm"
      datatype = "E"
      precision = "2"
      name = "PLTSCALE"
      value = "67.14" >
      <DESCRIPTION>
        Plate scale of the UK Schmidt plates from which the catalogue
        was constructed.
      </DESCRIPTION>
    </PARAMETER>

Clive Davenhall, 25/1/02.
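
As an illustration of how a FITS keyword might be carried over into the suggested PARAMETER element, here is a minimal sketch in Python. Everything in it beyond the PLTSCALE card and the PARAMETER attribute names is an assumption made for the example: the helper name card_to_parameter is hypothetical, the card pattern is deliberately simplified (no continuation cards, no quotes embedded in strings), a leading "[unit]" in the card comment is taken to be the unit, and the datatype is supplied by the caller rather than inferred.

```python
import re
import xml.etree.ElementTree as ET

# Simplified pattern for one header card: KEYWORD= value / comment.
# Assumption: no continuation cards and no quotes embedded in strings.
CARD = re.compile(
    r"^(?P<key>[A-Z0-9_-]{1,8})\s*=\s*"
    r"(?P<value>'[^']*'|[^/]*?)\s*"
    r"(?:/\s*(?P<comment>.*))?$"
)

def card_to_parameter(card, datatype="E"):
    """Build a PARAMETER element (hypothetical helper) from one FITS card."""
    match = CARD.match(card.strip())
    if match is None:
        raise ValueError("not a KEYWORD= value card: %r" % card)
    key = match.group("key")
    # Drop the enclosing quotes and the padding blanks of string values.
    value = match.group("value").strip().strip("'").strip()
    comment = (match.group("comment") or "").strip()

    attrs = {"ID": key, "name": key, "datatype": datatype, "value": value}
    # Assumption: a leading "[...]" in the comment gives the unit.
    unit = re.match(r"\[([^\]]+)\]\s*(.*)", comment)
    if unit:
        attrs["unit"] = unit.group(1)
        comment = unit.group(2)

    param = ET.Element("PARAMETER", attrs)
    if comment:
        ET.SubElement(param, "DESCRIPTION").text = comment
    return param

param = card_to_parameter("PLTSCALE= '67.14 ' / [arcsec/mm] plate scale")
print(ET.tostring(param, encoding="unicode"))
```

The sketch strips the padding blanks of the value; a converter meant to guarantee a lossless FITS --> VOTable --> FITS round trip would need to keep them, for instance with the xml:space='preserve' attribute mentioned under E3.
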