cgiprint (1)      Convert a text into HTML     (Dec. 2011)

cgiprint [–%letter] [–bib] [–cat] [–Dmacro=definition] [–f tag_file] [–glu] [–HELP] [–html] [–ic] [–keep[^_$~%]] [–lang language] [–lis] [–LIST] [–mail] [–n name] [–nomath] [–nooutput] [–oglu] [–o output_program] [–pre] [–sec] [–tex] [–tex2] [–tag tag_prefix] [–v##] input_file...

cgiprint is a filter which translates a text (from a set of files, or from the standard input) into HTML. The generated HTML contains links (anchors) as http addresses, or as GLU tags with the –glu option. These links are constructed either from tagged entities having the generic form <tag_name:value>, or from an interpretation of the text.

If the name of this program starts with Show (e.g. after creating a link named Show-text to cgiprint), it can be called directly by the httpd(1) daemon.

A few examples of tagged entities: <Uni:unit_symbol>, a GLU entity tag (<%type value>), or a GLU action tag (<&action value>). A BibCode, a bibliographical reference made of 19 characters YYYYJJJJJvvvvMppppA, can typically be recognized automatically.

The text to convert can be in TeX form (with the –tex option), in a TeX-light form which contains only basic TeX macros and is used for the abstracts (with the –tex2 option), or in HTML (with the –html option); it can also be a mixture of LaTeX and HTML, e.g. with the text between the \begin{HTML} and \end{HTML} markers written in HTML.

cgiprint also permits the substitution of environment variables, either via the TeX macro \env{...} (in a TeX context), or with the $ sign (in an HTML context). Note that, as in a shell, the braces { } can be used to delimit the variable's name, as in ${var}1.
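
The brace rule can be illustrated with a minimal Python sketch. It is not cgiprint's implementation: the function name substitute_env and the choice of leaving undefined variables untouched are assumptions made for the illustration.

```python
import re

# Matches $NAME or ${NAME}; the braces delimit the name, as in a shell.
_VAR = re.compile(r"\$(?:\{(\w+)\}|(\w+))")

def substitute_env(text, env):
    """Replace $NAME and ${NAME} by env[NAME].

    Assumption: an undefined variable is left as written.
    """
    def repl(match):
        name = match.group(1) or match.group(2)
        return env.get(name, match.group(0))
    return _VAR.sub(repl, text)

env = {"var": "abc"}
print(substitute_env("${var}1", env))  # abc1  (the braces end the name before the 1)
print(substitute_env("$var1", env))    # $var1 (the name read here is var1, undefined)
```

Without the braces, the digit is read as part of the variable name, which is why ${var}1 and $var1 differ.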

All definitions (the format of the input text, the definition of the macros, how to convert tagged entities into GLU or HTML markers, etc.) can be placed in a tag_file, or included at the beginning of the input text if it starts with \cgidef{.


–%letter    asks to disable the automatic recognition of lines starting with %-letter. For instance, -%R does not generate an anchor for the line
%R 1996ApJS..123.1234A
when the automatic recognition of bibcodes is active.

–bib | -+bib    asks to transform the BibCodes embedded in the input text into anchors or GLU tags (the -+bib option removes this feature). Note that the BibCode has to be written in its canonical form (19 characters) to be recognized, and must be separated from the preceding and following text by a blank or by unambiguous punctuation such as a comma or a bracket. Such BibCodes are interpreted as if they were written as tagged entities <Bib:bibcode>, or <%R bibcode> with –glu.
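
The recognition rule can be sketched in Python as follows. This only illustrates the 19-character and separator constraints; the exact character classes, the separator set, and the name tag_bibcodes are assumptions, not cgiprint's actual code.

```python
import re

# Assumed separators: blanks, comma, semicolon, parentheses, square brackets.
SEP = r"\s,;()\[\]"

BIBCODE = re.compile(
    rf"(?<![^{SEP}])"                        # start of text, or a separator before
    r"(\d{4}[A-Za-z0-9.&+]{14}[A-Za-z.:])"   # 4 + 14 + 1 = 19 characters
    rf"(?![^{SEP}])"                         # end of text, or a separator after
)

def tag_bibcodes(text):
    """Rewrite each stand-alone BibCode as the tagged entity <Bib:bibcode>."""
    return BIBCODE.sub(r"<Bib:\1>", text)

print(tag_bibcodes("see 1996ApJS..123.1234A, and more"))
# see <Bib:1996ApJS..123.1234A>, and more
print(tag_bibcodes("x1996ApJS..123.1234A"))
# x1996ApJS..123.1234A   (not separated from the preceding text)
```

The period is deliberately left out of the separators, since it also occurs inside BibCodes and would be ambiguous.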

–cat | -+cat    asks to recognize the CatCodes, i.e. codes designating astronomical catalogues like III/123 or J/A+AS/112/234 (the -+cat option disables this recognition). These CatCodes are interpreted as if they were written as tagged entities <Cat:CatCode>, or <%E CatCode> with –glu.

–Dmacro=definition    adds a macro definition on the command line. Note that the definition must be attached to the –D symbol (as with preprocessor definitions passed to a C compiler); the leading backslash \ may be omitted, so that for instance -DmyURL=http://myhost and -D'\myURL=http://myhost' are equivalent.
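
As an illustration of the equivalence of the two spellings, here is a minimal Python sketch of how such an argument could be split. The function name parse_define is hypothetical and the code is not taken from cgiprint.

```python
def parse_define(arg):
    """Split a '-Dmacro=definition' argument into (macro, definition)."""
    if not arg.startswith("-D"):
        raise ValueError("not a -D option: " + arg)
    name, _, definition = arg[2:].partition("=")
    if name.startswith("\\"):
        name = name[1:]          # the leading backslash is optional
    return name, definition

print(parse_define("-DmyURL=http://myhost"))
# ('myURL', 'http://myhost')
print(parse_define("-D\\myURL=http://myhost"))
# ('myURL', 'http://myhost')
```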

–f tag_file    indicates a file which contains definitions. See the tag_file section below for details.

–HELP    gives a list of options, and the list of all TeX macros recognized.

–html | -+html    indicates that the input text contains HTML tags, for instance things like <B>bold text</B> (the -+html option disables this recognition). The list of acceptable tags is given in the tag_file (see the tag_file section), while unrecognized tags are transformed. As an example, the text
Search for quasars in range 2<z<3
is (correctly) transformed into
Search for quasars in range 2&lt;z&lt;3
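
The behaviour shown by this example can be approximated with the Python sketch below. The list of accepted tags is hypothetical (in cgiprint it comes from the tag_file), and the function name escape_unknown_tags is an assumption.

```python
import re

# Hypothetical list of accepted tags; cgiprint reads its list from the tag_file.
ALLOWED = re.compile(r"</?(?:B|I|A|PRE)\b[^<>]*>", re.IGNORECASE)

def escape_unknown_tags(text):
    """Keep accepted tags verbatim and turn every other < into &lt;."""
    out, pos = [], 0
    for match in ALLOWED.finditer(text):
        out.append(text[pos:match.start()].replace("<", "&lt;"))
        out.append(match.group(0))
        pos = match.end()
    out.append(text[pos:].replace("<", "&lt;"))
    return "".join(out)

print(escape_unknown_tags("Search for quasars in range 2<z<3"))
# Search for quasars in range 2&lt;z&lt;3
print(escape_unknown_tags("<B>bold text</B> with 2<z<3"))
# <B>bold text</B> with 2&lt;z&lt;3
```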

Note: an HTML tag is not recognized as proper HTML syntax (and hence the < is translated into &lt;) if:

–ic    asks to ignore the input comments; by default the input comments are translated into HTML comments.

–keep[^_$~%]    asks to treat the listed characters as ordinary characters; otherwise these characters introduce, respectively, the superscript, subscript, mathematical, tilde-accent, and comment modes.

–lang    specifies the language (English or French) for generated texts like Contents or Table des Matières. The default is en (English).

–lis    writes out, as HTML comments, a list of the known definitions:

Note that the macro \list{title} also generates this list when the –tex option is specified. See also the –LIST option to produce the actual list of TeX macros.

–LIST    prints the complete list of TeX macros with their equivalents, as plain text.

–mail | -+mail    asks to interpret e-mail addresses; note that the form <address[?Subject=subject]> is also recognized, as if written <Mail:e-mail_address>, or <%M e-mail> with –glu. The -+mail option disables this recognition.
Note: e-mail addresses are encrypted.

–math^    asks to interpret the character ^ (caret) as exponentiation, even outside the mathematical mode. The -+math option disables this interpretation.

–n name    assigns a name to the input stream – its default name is (stdin). This option also changes the action of the macro \thefile.

–nomath    asks to use the standard font for the mathematical mode (what is between $ ... $). The default is to write the mathematical mode in italics.

–nooutput    suppresses the output of the result until the next \enableoutput macro.

–oglu    asks to filter the output through the program /usr/local/bin/glufilter. This option also implies –glu.

–o program    asks to filter the output through the program program. The options of program, if any, are separated by an ampersand &. Typically, the –glu option can be followed by -o glu&-D&gluDic.
Note: when the –sec option is active, the program must reside in the directory $HTTPD_HOME/bin (the default value of $HTTPD_HOME is /usr/httpd).
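
The ampersand convention and the secure-mode restriction can be sketched in Python as follows. The function name filter_argv is hypothetical, and refusing a program name that contains a slash is an assumption about how the restriction might be enforced.

```python
import posixpath

def filter_argv(spec, secure=False, env=None):
    """Turn an -o argument such as 'glu&-D&gluDic' into an argv list."""
    env = {} if env is None else env
    argv = spec.split("&")            # options are separated by ampersands
    if secure:
        # With -sec, the program must reside in $HTTPD_HOME/bin.
        home = env.get("HTTPD_HOME", "/usr/httpd")
        if "/" in argv[0]:
            # Assumption: a path component is refused rather than rewritten.
            raise ValueError("program must be a bare name in secure mode")
        argv[0] = posixpath.join(home, "bin", argv[0])
    return argv

print(filter_argv("glu&-D&gluDic"))
# ['glu', '-D', 'gluDic']
print(filter_argv("glu&-D&gluDic", secure=True))
# ['/usr/httpd/bin/glu', '-D', 'gluDic']
```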

–pre    indicates that the text is already pre-formatted (blanks and newlines are meaningful). This option simply inserts the HTML tags <PRE> at the beginning and </PRE> at the end (the -+pre option disables this interpretation).

–sec    (secure) restricts the access to files starting with the 8-character string \cgidef{; any file not starting with this signature is rejected. In addition, any executable program specified in the –o option (see above) must reside in the single directory $HTTPD_HOME/bin.
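
The signature test amounts to comparing the first 8 bytes of the input, as in this minimal Python sketch (the name has_signature is hypothetical; cgiprint itself may proceed differently):

```python
import io

SIGNATURE = b"\\cgidef{"          # the 8-character signature

def has_signature(stream):
    """Return True if the binary stream starts with the signature."""
    return stream.read(len(SIGNATURE)) == SIGNATURE

print(len(SIGNATURE))                                          # 8
print(has_signature(io.BytesIO(b"\\cgidef{ tex\nSome text")))  # True
print(has_signature(io.BytesIO(b"Some text")))                 # False
```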

–tex    indicates that the input text uses the TeX conventions (the -+tex option disables this convention). See the TeX Definitions section below for the details of the TeX-like definitions.

–tex2    indicates that the input text uses a "light-TeX" format, like the one used for the abstracts: the symbols { }, %, $ keep their usual meaning, subscripts are written between underscores _, and superscripts between carets ^. Some other commonly used conventions are also recognized, like +/- which is rendered as ±. The -+tex option disables this convention.

–tag tag_prefix    specifies the prefix used for the conversion of tagged entities into actual anchors; the default value of tag_prefix is URL_.

–vversion_number    defines the HTML version to be emulated. The default is 3.0.

input_file...    indicates the files to convert. The default is the standard input.

Tagged entities
A tagged entity, of general format <tag_name:value>, is by default converted into <A href="${URL_tag_name}value">value</A>
i.e. the environment variable $URL_tag_name is substituted, and the value follows. For instance, the text <Essai:valeur> becomes: <A href="${URL_Essai}valeur">valeur</A>

More precisely, if the variable $URL_Essai is defined, either in the standard Unix environment or in the tag_file, as
then the complete translation of <Essai:valeur> is:

The tagging prefix URL_ can be replaced by any other text defined in the –tag option, or by the environment variable TAG2HTML.
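
The default conversion rule can be sketched in Python as follows. This is an illustration only: the URL http://example.org/essai? is hypothetical, the function name expand_tagged_entities is an assumption, and expanding an undefined variable to an empty string (as a shell would) is also an assumption.

```python
import re

ENTITY = re.compile(r"<(\w+):([^<>]*)>")

def expand_tagged_entities(text, env, prefix="URL_"):
    """Rewrite <tag:value> as <A href="${prefix+tag}value">value</A>."""
    def repl(match):
        tag, value = match.group(1), match.group(2)
        base = env.get(prefix + tag, "")   # assumption: undefined expands to ""
        return f'<A href="{base}{value}">{value}</A>'
    return ENTITY.sub(repl, text)

env = {"URL_Essai": "http://example.org/essai?"}   # hypothetical definition
print(expand_tagged_entities("<Essai:valeur>", env))
# <A href="http://example.org/essai?valeur">valeur</A>
```

Passing another prefix, e.g. prefix="MY_", mimics the effect of the –tag option: the variable looked up becomes $MY_Essai.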

The first character of each line of a tag_file means:

Example: the data concerning the cluster NGC 2244 of the Base de Données sur les Amas are tagged by <Cat:BDA/ngc/2244>. With the definition

the text <Cat:BDA/ngc/2244> is converted into:
	<A HREF="">BDA/ngc/2244</A>
  • [%] definition of GLU entities. As above, the entity written <%E:BDA/ngc/2244> is transformed into the anchor
    	<A HREF="">BDA/ngc/2244</A>
    assuming the definition
  • [<] definition (in the form of a regular expression, see grep(1)) of the tags acceptable with the –html option
  • [\] TeX definition. This definition is similar to a TeX macro definition. The number of arguments of the macro should be indicated with the # symbol followed by its number, followed by the definition within braces {...}. For example, a definition of the macro \titre can be done with:
    \titre   #1{\<TITLE\>#1\</TITLE\>}
    (mind the whitespace separating the name of the macro from the #)
  • [{] definition of TeX environment. Parameters are possible, as e.g.:
    {tabular} #1TABLE
  • [!] \cgidef definitions (see next section)
    \cgidef definitions
    Any line starting with \cgidef{ indicates a context change, permitting a mixture of documents written in TeX and in HTML. The exact syntax is:

    \cgidef{ [begin|plain|html|tex|end] [options]

    with the following meanings:

    TeX Definitions
    The complete list is generated by cgiprint -LIST. A very brief overview of the macros recognized by default:

  • and finally, a few special macros:

  • Another example, showing how to change background colors

    Returned Status
    cgiprint returns 0 on success, and 1 if a problem occurred.

    Environment Variables

    A few environment variables can be used:

    • $HTTPD_HOME (when the program is named Show-s, i.e. the secure option is enabled): programs are restricted to $HTTPD_HOME/bin, see the –o option above.
      Default: /usr/httpd
    • $QUERY_STRING (when the program is named Show, optionally followed by -s or -plain): the arguments are supplied in this environment variable, following the HTTP protocol (see httpd(1)).
      Default: arguments in \cgidef{ (see the \cgidef{ section)
    • $SCRIPT_NAME (when the program is named Show, optionally followed by -s or -plain): it is used to define the macro \theprogram

    See also
    glu(1) grep(1) httpd(1) latex(1) sed(1) tr(1)

    Note: A function
    char *cgitr(char *text, char *html_version)
    is available in /usr/$MACHINE/lib which interprets the input string text and returns a newly allocated interpreted result.

    Questions & Problems
    to Fox ()