Date: Thu, 14 Oct 1999 11:09:44 -0400 (EDT)
From: Alexander Hulpke <hulpke@math.colostate.edu>
Subject: Correctness of Data Libraries

As mentioned already by Steve Linton in his forum mail the other day, there
has been an error in the library of primitive groups, which was fixed in
bugfix number 4.

I would like to take this as an opportunity to discuss the reliability of
this and the other databases built into GAP in a slightly wider context.

As the databases were obtained in different ways, this naturally splits into
two parts:

The first class consists of databases which were produced by classifying
programs, often by GAP programs themselves. These are the library of small
groups by Hans Ulrich Besche, Bettina Eick and Eamonn O'Brien, and the
library of transitive permutation groups by Butler, McKay, Royle and myself.

For these classifications we have run variations of the classification
programs and crosschecks against other data bases to check the output. The
construction processes have also been published and are open to scrutiny.
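
A crosscheck of this kind between two libraries might look like the
following minimal sketch. It is only an illustration, not the procedure
that was actually used: it assumes a current GAP installation in which the
primitive and transitive groups libraries are available, and the degree 8
is an arbitrary choice.

```gap
# Sketch: compare the primitive groups library with the primitive
# groups found inside the transitive groups library, for one degree.
deg := 8;   # arbitrary small degree, chosen only for illustration

prim  := List([1 .. NrPrimitiveGroups(deg)],  i -> PrimitiveGroup(deg, i));
trans := List([1 .. NrTransitiveGroups(deg)], i -> TransitiveGroup(deg, i));

# Every entry of the primitive library should act primitively on [1 .. deg].
Print(ForAll(prim, G -> IsPrimitive(G, [1 .. deg])), "\n");

# The primitive members of the transitive library should agree with the
# primitive library in number and in their sorted list of group orders.
primInTrans := Filtered(trans, G -> IsPrimitive(G, [1 .. deg]));
Print(Length(prim) = Length(primInTrans), "\n");
Print(SortedList(List(prim, Size)) = SortedList(List(primInTrans, Size)), "\n");
```

Agreement of counts and orders is only a necessary condition. A stricter
check would also test the groups pairwise for conjugacy in the symmetric
group.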

The second class consists of data bases which stem from published
classifications in refereed publications. This includes the data base of
character tables, which is based on the ATLAS and the modular ATLAS but has
been extended immensely and often. (For reasons of space I cannot give a
list of all contributors here; many of the tables are labelled with an
original investigator, but I'd like to mention Thomas Breuer, who maintains
and extends this library.)
Then there is the library of perfect groups, which is based on the book by
Holt and Plesken.
There also is the library of crystallographic groups (which in GAP 4 is part
of a deposited contribution by Franz Gähler, currently undergoing review
for becoming a share package) based on the book by Brown, Bülow, Neubüser,
Wondratschek and Zassenhaus.
Finally there is the aforementioned library of primitive groups. This is
based on a paper by Dixon and Mortimer, a list in the same authors' book on
permutation groups and further corrections to this in Theissen's
dissertation.

Possible error sources here are twofold. First, the original classification
might be erroneous; for example, several errors have been found in the
original ATLAS print. (Simon Norton keeps a list of errata for this; I'd
also like to mention that all these corrections are incorporated in the
current tables in GAP.)

The second source of errors is the translation process into a data base on
the computer. This involves the interpretation (and sometimes extension) of
the original publication; it might involve getting a more explicit
description of the objects than is given in the publication, and finally it
involves (often human) input into the computer. This is the type of error
we have just fixed in the primitive groups library.

Again, we try to run crosschecks on these libraries, but naturally errors
are harder to spot here and there are further potential error sources. On
the other hand, the printed versions of many of these data bases have also
contained errors, which have been corrected in newer versions of GAP. So
unless you are very careful about checking with the authors for errata
lists, the lists in GAP will automatically give you corrections you might
otherwise miss (again, this is particularly true for the library of
character tables).

Overall, I think we have managed, for both types of libraries, to reach at
least the same level of reliability as ``traditional'' printed versions of
such classifications in refereed publications have had. Neither has proved
to be always free of errors.

So, if you obtain a result that uses such a classification (and this holds
for the electronic version as well as for a computer-free one), I would
apply some common-sense care in relying on it: How long has this
classification been in existence? How complicated is it? Have there been
errors or omissions? What would happen to your result if there were an
error (is it just another case that needs checking, or could everything
become worthless)?

The last point to mention is ``silent'' use, that is, algorithms that
utilize a data library without the user noticing. To my knowledge, so far
this only happens in three cases:

- The isomorphism test for small groups identifies the groups in the small
groups library first. This is also partially used when
computing automorphisms to find characteristic subgroups.
- The library of perfect groups is used in the subgroup lattice calculation
to find all perfect subgroups of a given group.
- The classification of finite simple groups is used in the simplicity test
and subsequently in composition series calculations and everything that
builds on top of this.

So far these three underlying classifications have been error-free.
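
To make the first of these cases concrete, the library lookup that the
isomorphism test performs silently can also be requested explicitly. The
following is a minimal sketch assuming a current GAP installation with the
small groups library; the example group is an arbitrary choice.

```gap
# Sketch: explicit use of the small groups library.
G  := Group((1,2,3,4), (1,3));   # dihedral group of order 8, as an example
id := IdGroup(G);                # identification via the small groups library
H  := SmallGroup(id);            # the library's representative for that id
Print(id, "\n");

# Simple invariants computed directly from both groups should agree.
Print(Size(G) = Size(H), "\n");
Print(AbelianInvariants(G) = AbelianInvariants(H), "\n");
```

Comparing invariants in this way does not prove isomorphism; it only guards
against a gross mismatch between the given group and the library entry.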

Best wishes,

Alexander Hulpke

