Delila Program: imgalt

Documentation for the imgalt program is below, with links to related programs in the "see also" section.

{   version = 2.40; (* of imgalt.p 2017 Apr 06}

(* begin module describe.imgalt *)
(*
name

   imgalt: html image alt detection and upgrading for 508 requirements

synopsis
   imgalt(orihtml: in, imgaltp: in,
          imagereport: out, imagenames: out,
          althtml: out, altstrings: out,
          stop: out,
          output: out)

files

   orihtml:  the original html file to be analyzed.

   imgaltp:  parameters to control the program.  The file must contain the
      following parameters, one per line:

      parameterversion: The version number of the program.  This allows the
         user to be warned if an old parameter file is used.

      Second line: showconversion (character),
                   showparse (character):

         If the first character is a 'c', then each conversion name
         and alt tag is shown as it is read in.  This could be a long
         list.  They look like: 'new.gif -> "new"'.

         If the second character is an 's', then the orihtml file is
         copied to the output with parsing information displayed.
         Each line number being examined is given and then a set of
         characters shows the parsing information:

                     not        '-'
                     in comment 'C'
                     in quote   'Q'
                     in image   'I'

                     found src  's'
                     found src= '='

                     found alt  'a'
                     found alt= '='

      Third line: temporarytag (string): This string is the default
      'temporary tag' alt string to provide if the user does not
      specify one and there is no alt tag.  Suggestion:
         ALTERNATE_TEXT

      The following lines of the file contain pairs consisting of a
      file name (e.g. colorbar.gif) and an alt string in double quotes
      to use for that file name (e.g. "a color bar").
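
      The layout above can be read with a few lines of code.  The
      following is a minimal Python sketch, not the program's own
      Pascal reader.  The names parse_imgaltp and ImgaltParams are
      hypothetical, and it assumes that the version is the first token
      of line one, that the two flag characters are the first two
      characters of line two, and that line three holds only the
      temporary tag.

```python
from typing import NamedTuple


class ImgaltParams(NamedTuple):
    # Hypothetical container; field names follow the parameter names above.
    version: float
    show_conversion: bool
    show_parse: bool
    temporary_tag: str
    alts: dict  # file name -> alt string


def parse_imgaltp(text):
    """Parse the text of an imgaltp file (assumed layout, see lead-in)."""
    lines = text.splitlines()
    version = float(lines[0].split()[0])   # rest of line one is a comment
    flags = lines[1].ljust(2)              # pad so both flags can be read
    temporary_tag = lines[2].strip()
    alts = dict()
    for line in lines[3:]:
        line = line.strip()
        if not line:
            continue
        # File name first, then the alt string in double quotes.
        name, _, rest = line.partition(" ")
        alt = rest.strip()
        if len(alt) >= 2 and alt.startswith('"') and alt.endswith('"'):
            alt = alt[1:-1]
        alts.setdefault(name, alt)         # the first definition wins
    return ImgaltParams(version, flags[0] == "c", flags[1] == "s",
                        temporary_tag, alts)
```

      Keeping the first definition of a file name mirrors the priority
      rule described for altstrings.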

   imagereport:  report of the results

   imagenames:  names of images.

      The first line begins with '*' and identifies the program.

      The second line begins with '*' and identifies the columns.  The
      rest of the file has these columns:

      1. type: If the line begins with

         'alt ' then the image already has an alt and the alt is not
         listed in imgaltp.

         '--- ' then the image does not have an alt and none is listed
         in imgaltp.

         '+++ ' then the image alt will be supplied from imgaltp in
         creating althtml from orihtml.

      2. url: The image url is given as reported in the orihtml.  Note
      that this may not be a full url.

      3. Ourl: The line in the orihtml file where the image url is
      given.

      4. Oalt: The line in the orihtml file where the alternative text
      string is.

      5. Aurl: The line in the althtml file where the image url is
      given.  This is different from the lines in the orihtml file
      because extra lines are added to althtml to put each alternative
      text string on a fresh line.

      6. Aalt: The line in the althtml file where the alternative text
      string is.

   althtml: a copy of orihtml with alt strings inserted as needed.

      There are four cases to handle, depending on whether or not there
      is already an alt tag and whether or not an alt string for the
      image is supplied in imgaltp.

      --- no alt tag, alternative not supplied: fill in with temporary tag

      +++ no alt tag, alternative     supplied: use tag supplied in imgaltp

      alt    alt tag, alternative not supplied: leave alone

      +++    alt tag, alternative     supplied: use tag supplied in imgaltp

      For the first case, when a temporary tag is generated, it will
      contain the string specified by the user in imgaltp, for example
      "ALTERNATE_TEXT".

      Note that an empty alt tag is replaced with data from imgaltp
      only if there is a file name match.  This preserves empty alt
      tags.
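
      The four cases, including the empty alt rule, can be sketched as
      a single decision function.  This is an illustrative Python
      sketch only; the name choose_alt and the use of None to mean
      "absent" are assumptions, not part of imgalt.

```python
def choose_alt(existing_alt, supplied_alt, temporary_tag):
    """Return (marker, alt string) for one image.

    existing_alt: alt found in orihtml, or None if the img has no alt.
    supplied_alt: alt from imgaltp for this file name, or None if no match.
    temporary_tag: the default string from the third line of imgaltp.
    """
    if supplied_alt is not None:
        # Cases 2 and 4: a file name match in imgaltp always wins.
        return "+++", supplied_alt
    if existing_alt is not None:
        # Case 3: leave the alt alone.  An empty string is not None,
        # so an empty alt without a file name match is preserved.
        return "alt", existing_alt
    # Case 1: nothing known, fill in the temporary tag.
    return "---", temporary_tag
```

      For example, choose_alt("", None, "ALTERNATE_TEXT") returns
      ("alt", ""), so an intentionally empty alt survives.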

      In this version of imgalt, the <img ... > is rewritten in
      althtml so that src and alt are at the end of the <img ... >.
      Both the src and alt are on their own lines.  Also, the alt
      string is wrapped and the current indentation is retained.

      Trailing spaces on lines of orihtml are removed from the
      corresponding lines of althtml.  This allows comparison of the
      results.  If the imgaltp
      is not changed, then moving althtml back to orihtml and running
      imgalt should, in theory, make orihtml and althtml identical.

   altstrings: image file names and alt strings in quotes, one per line.
       These are taken from both imgaltp and orihtml and duplicates
       are removed.  Priority is given to the first file name/alt
       string found in the list so that what a user defines overrides
       values in the orihtml.

   stop: The stop file is written only if there is a program error.
       This allows other programs to know that imgalt has crashed and
       handle the situation gracefully.

   output: messages to the user

description

   The 508 requirements specify that every image (img) in an html page
   must have an alternative (alt) description tag.  This program,
   imgalt, reads an html page (orihtml), detects the images, and
   identifies the source of each image and whether it has an alt tag.
   It rewrites the page so that each image has its source followed by
   the alt tag that the user specifies in a master list (provided in
   imgaltp).  A revised master list is written to altstrings.  If
   there is a problem, the stop file is created, which can be used to
   halt recursive reading through directories (see the tree script in
   my toolbox).  Other files give information about the image tags.

   In short, the program identifies images in an html web page,
   orihtml, that do not have "alt" (alternative) descriptions.  It
   creates a revised html file, called althtml, that inserts alts
   with temporary tags that can be fixed later.  The initial list of
   alt tags to use is given in the parameter file imgaltp.

   The program outputs a list of the images, called imagenames, marked
   with whether or not they have an alt description.  This list gives
   the line numbers for the url and alt tag in both the orihtml
   and althtml files.

   NOTE: only the actual file name, following all slashes ('/') in the
   URL is used to identify the image name in orihtml, but the whole
   URL is passed to althtml.
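
   The matching rule in this note can be sketched in Python as
   follows.  The function names image_name and lookup_alt are
   hypothetical, and alts stands for a file name to alt string
   mapping built from imgaltp.

```python
def image_name(url):
    # Keep only what follows the last '/'.  A URL without any
    # slash is returned unchanged.
    return url.rsplit("/", 1)[-1]


def lookup_alt(url, alts):
    # The lookup uses the bare file name; the caller still writes
    # the whole url to althtml.  Returns None when nothing matches.
    return alts.get(image_name(url))
```

   One consequence of this rule is that two images with the same file
   name in different directories share a single alt string.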

   The program also writes a list of all the file names with their alt
   tags to the altstrings file.  The list starts with pairs given by
   imgaltp and is followed by ones found in the file.  This list can
   be appended to the end of the imgaltp parameter file.  Duplicate
   file names are removed from the altstrings list.  Duplicates that
   are further down the list are removed, so priority is given to the
   top of the list.  This allows a user to define a new alt tag for a
   given file, and this will override all cases subsequently found.
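
   The priority rule can be illustrated with a short Python sketch.
   The names merge_altstrings and format_altstrings are hypothetical,
   and the output layout (file name, a space, then the alt string in
   double quotes) is assumed from the imgaltp example format.

```python
def merge_altstrings(param_pairs, html_pairs):
    """Concatenate the two lists and drop later duplicates.

    Pairs from imgaltp come first, so a user definition overrides
    any alt string found for the same file name in the html.
    """
    seen = set()
    merged = []
    for name, alt in list(param_pairs) + list(html_pairs):
        if name in seen:
            continue            # a pair higher in the list has priority
        seen.add(name)
        merged.append((name, alt))
    return merged


def format_altstrings(pairs):
    # One pair per line: file name, then the alt string in quotes.
    return "\n".join('%s "%s"' % (name, alt) for name, alt in pairs)
```

   Because the result has the same layout as the pair lines of
   imgaltp, it can be appended to the parameter file as described.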

   The program does a single pass through the html file.  To replace
   an alt tag, the file has to be identified before the alt tag is
   given.  One approach would be to require that the src string tag be
   before the alt tag.  However, the program is smarter than that.
   When it encounters an img that is not inside a comment, it copies
   the contents but skips and remembers the alt and src.  Then when
   the end of the img is found, it prints them on their own lines with
   the same indentation as the original code.  The src is given before
   the alt.

examples

Example imgaltp file:

1.36  version of imgalt that this parameter file is designed for.
sn    's' means show the conversion lines as read in
ALTERNATE_TEXT
colorbar.gif "a color bar"
donor.gif "sequence logo for human donor splice junctions"

documentation

see also

   google search for 'html alt'
   http://www.google.com/search?hl=en&source=hp&q=html+alt

   A useful tutorial, "The Rules of ALT":
   http://html.com/images/rules-of-alt/

   A useful tutorial, "Guidelines on alt texts in img elements":
   http://www.cs.tut.fi/~jkorpela/html/alt.html

   Program that removes blanks from ends of lines:  rembla.p

   **** Related Scripts ****
   mh sets up for correcting alts in a single html file
   alt called by mh or directly to correct alts
   mkalttags uses tree to analyze all htmls in a directory
   mkalttagsfunction - called by mkalttags through the tree script
   tree general recursive directory processing
   masteralt provides the alt string for a given image

author

   Thomas Dana Schneider

bugs

technical notes

   The program imgalt reads through the original html (orihtml) and
   copies to the altered html (althtml).  When it is inside an image
   '<img' it copies until it sees a source file 'src=' and it then
   captures that.  Likewise it captures an alt string 'alt='.  Case of
   the src and alt and spacing to the equals do not matter.  These are
   captured but not sent to the output.  Instead, they are held until
   the end of the img is found at '>'.  At this point the src is
   output followed by the alt.  So if the src is after the alt, the
   order will be reversed.
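
   The hold-and-reorder step can be approximated in Python with a
   regular expression.  This is only a sketch of the idea, not the
   program's single-pass character parser: the name rewrite_img is
   hypothetical, the exact output layout is assumed, and the sketch
   does not insert missing alts, wrap long alt strings, handle tags
   ending in '/>', or guard against 'src=' appearing inside another
   attribute's value.

```python
import re

# src or alt, any case, optional spaces around '=', then a double-quoted,
# single-quoted, or bare value.  Leading whitespace is consumed so that
# removing the attribute leaves no gap behind.
ATTR = re.compile(
    r"""\s*(?<![\w-])(src|alt)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def rewrite_img(tag, indent=""):
    """Move src and alt to the end of one '<img ...>' tag."""
    held = dict()

    def hold(match):
        # Capture the attribute but send nothing to the output yet.
        held.setdefault(match.group(1).lower(), match.group(2))
        return ""

    body = tag.rstrip()[:-1]               # drop the closing '>'
    rest = ATTR.sub(hold, body).rstrip()
    lines = [indent + rest]
    for key in ("src", "alt"):             # src is always given before alt
        if key in held:
            lines.append(indent + key + "=" + held[key])
    lines[-1] += ">"                       # close the tag on the last line
    return "\n".join(lines)
```

   For instance, rewrite_img('<img alt="a bar" width=10 SRC = "colorbar.gif">', "  ")
   gives three lines: the tag opening with width=10, then the src,
   then the alt followed by '>', each with the two-space indentation.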

*)
(* end module describe.imgalt *)
{This manual page was created by makman 1.45}


{created by htmlink 1.62}