In a letter of April 19 to the GAP-forum, Frank Leonard wrote:
"I have been attempting some constructions of finite groups in GAP as
defined by a presentation in abstract generators/relators. The
problem consists of applying a certain algorithm to construct a group
which I then attempt to identify. I also have access to the C.A.
package CAYLEY and have re-written the algorithm for CAYLEY. I have
found CAYLEY vastly superior (time wise) to GAP for this problem. I
assume that the fact that GAP programs/functions are interpreted would
be significant in explaining the discrepancies but I am curious if
there are any other factors which might be significant. Can you tell
me if there are any plans at present to write a compiler for GAP?"
In my reply I made no attempt to hide my disappointment about such an
unspecific but nevertheless very critical comment. In response I have
received a letter from Mr. Leonard which starts:
"The following is a personal message and is not intended for
submission to the GAP-forum."
I have to respect this. However, I do not think that I break this
confidentiality if I report that, after saying later on in his letter:
>"I have been deliberately vague about the details of my constructions
>in my submission as the material on which I have based my GAP program
>forms the basis of a paper..."
Mr. Leonard reports that he has used the GAP-function Size(g) on a
two-generator, two-relator presentation of the trivial group, which he
then gives, thus allowing us to analyze the case. He states that it
took approximately 20 sec in GAP using Size(g) and less than a second
in Cayley to find that the presentation does indeed describe the
trivial group.
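To make the setting concrete: the following sketch shows how such a
computation is set up in GAP. The syntax is that of a current GAP
version (details differed slightly in GAP 3.2), and the well-known
two-generator, two-relator presentation of the trivial group used
here is only a stand-in, not Mr. Leonard's example.

    # Illustrative stand-in: a classical presentation of the trivial group.
    f := FreeGroup( "a", "b" );;
    a := f.1;;  b := f.2;;
    relators := [ a^-1*b*a*b^-2, b^-1*a*b*a^-2 ];;
    g := f / relators;;
    t := Runtime();;
    Size( g );         # determined by a coset enumeration; returns 1
    Runtime() - t;     # CPU time spent, in milliseconds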
We have repeated the calculations and found the statement correct: on
a DECstation Size(g) took 15 sec, while Cayley took 0.5 sec. I only
wish Mr. Leonard had made this explicit statement in his first letter,
because it gives me a chance to explain the reasons and to talk about
the consequences of the observation.
Both programs use Todd-Coxeter coset enumeration methods. The one in
Cayley is implemented by George Havas in C; the one in GAP is written
by Martin Schoenert in the GAP-language, with some time-critical parts
in C in the GAP kernel. There is a standalone version of Martin
Schoenert's program, written by him entirely in C, and so the first
step is a comparison of this program with the GAP function.
The time for Martin Schoenert's C program is 4.7 sec; as stated above,
the time for the GAP function is 15 sec. This factor is rather
typical: factors of 3 - 4 have also been observed with other
presentations. It has two reasons: the fact that GAP is interpreted,
but also the overhead of the dynamic storage allocation, which
accesses data through indirections. We do not know exactly what
proportion of the factor is due to which of the two reasons, but it is
clear that even compilation would not remove the second one. It is
also clear that both reasons hit particularly hard in programs in
which rather small pieces of data have to be accessed very many times,
and Todd-Coxeter methods are among the worst cases for this
phenomenon.
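To illustrate the access pattern (a simplified, hypothetical sketch,
not the code actually used in GAP or in the C programs): the heart of
a coset enumeration is the tracing of relators through a partially
defined coset table, one tiny table lookup per letter, repeated an
enormous number of times.

    # Hypothetical sketch of the kind of inner loop that dominates a
    # coset enumeration.  'table' is a list of columns, one for each
    # generator and inverse; 'word' is a list of column indices; an
    # entry 0 means 'not yet defined'.
    TraceWord := function( table, coset, word )
        local i, c;
        c := coset;
        for i in word do
            c := table[i][c];       # one small lookup per letter
            if c = 0 then
                return fail;        # ran into an undefined entry
            fi;
        od;
        return c;
    end;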
There remains quite a substantial factor to be explained. The first
thing to note is that there are several "strategies" for the
Todd-Coxeter method, which can have an enormous, and unfortunately not
completely a priori predictable, influence on the efficiency. There
are some papers investigating these various strategies experimentally,
in particular
J.J. Cannon, L.A. Dimino, G. Havas, J.M. Watson: Implementation and
Analysis of the Todd-Coxeter Algorithm. Math. of Computation 27
(1973), 463 - 490.
G. Havas: Coset Enumeration Strategies. Proc. ISSAC '91.
and George Havas is carrying out further investigations.
Without going into details: there are two 'old' strategies, the
so-called Felsch strategy and the so-called HLT strategy (named after
Haselgrove, Leech, and Trotter), of which 'in general' (such
statements always have exceptions, and Mr. Leonard's example is one)
the Felsch method tends to define fewer redundant coset numbers and
hence to need less space, while the HLT method tends to be faster.
(Both have been improved by modifications, but this does not matter
too much here.) Now the times observed by Mr. Leonard are for the
default strategies in GAP and Cayley.
The very short time in Cayley is obtained by an HLT strategy, but, as
George Havas, whom I want to thank for his help in investigating the
case, has pointed out, also with a bit of luck: if he just cyclically
permutes the relators, his program in CAYLEY takes three times longer,
because it defines three times as many coset numbers before the coset
table collapses! (maximal table size 69909 instead of 21860, 1.570 sec
instead of 0.530 sec; times on his machine, which I believe is a SUN.)
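Readers who want to observe this kind of sensitivity on their own
examples can produce cyclically permuted relators easily, since
rotating a relator word by one letter is the same as conjugating it by
its first letter. A small sketch, using f and relators as in the
sketch above (whether this particular rotation reproduces the
permutation George Havas used is of course not claimed):

    # Cyclic rotation: for w = x1*x2*...*xn we have w^x1 = x2*...*xn*x1.
    RotateRelator := r -> r ^ Subword( r, 1, 1 );;
    rotated := List( relators, RotateRelator );;
    Size( f / rotated );   # same group, possibly a very different enumeration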
The method used in GAP is basically a Felsch method; there is no GAP
implementation of an HLT based method yet. Modifications cause rather
drastic changes in the efficiency of a Felsch method, too, as in
particular George Havas has investigated. E.g. using the relators also
as subgroup generators in a Felsch strategy mode of George Havas'
program reduces the maximal table size from 27532 (which is the same
as in the GAP program) to 17012 and the computing time from 2.29 sec
to 1.40 sec. This is because in George's program subgroup generators
are scanned a la HLT. Using in addition a 'Preferred Definition List',
a strategy developed by George Havas, reduces the maximal table size
further to 9542 and the computing time to 0.84 sec.
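In GAP one can at least imitate the first of these ideas directly,
since the library function for coset enumeration takes the subgroup
generators as a separate argument, so passing the relators there as
well is a one-line experiment. A sketch only (GAP's enumerator is not
George Havas' program, so the effect on table size and time need not
be the same), again with f and relators as above:

    # Enumerate the cosets of the subgroup generated by the relators
    # themselves (trivial in g), so that the relators are also scanned
    # as subgroup generators.
    tab := CosetTableFromGensAndRels( GeneratorsOfGroup( f ),
                                      relators, relators );;
    Length( tab[1] );    # number of cosets = order of the group here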
These figures make clear that the choice of the strategy is a delicate
matter and that the comparison made originally used defaults that were
particularly unlucky for the performance of GAP.
What are the consequences for us? For very large enumerations the
factor of 3 or 4 caused by indirection and interpretation is annoying.
We will therefore gratefully accept George Havas' offer to add his
C program, which at present probably represents the greatest
experience with the above-mentioned delicate decisions, to GAP as a
share library. George will eventually provide us with the C source
code for this purpose, but he wants to polish the code first, so we
may at first only provide executables of his program for a few
computers. Using such a share library of course has the drawback
(which is the price paid for some of its speed) that it runs in its
own piece of storage that is not administered by the GAP storage
management. Therefore we also intend to learn more from George's
experience and to try to improve the Todd-Coxeter program written in
the GAP-language, even if more or less intrinsically we will lose that
factor of 3 or 4 against a standalone C program.
This has become a long letter; allow me nevertheless to mention
another case of an efficiency comparison. Both GAP 3.2 and CAYLEY have
programs for the computation of the subgroup lattice, which we may
compare all the more freely since both have been written in Aachen.
The CAYLEY program was written by Volkmar Felsch, originally still in
Fortran, and then semiautomatically translated into C by John Cannon.
Volkmar Felsch is still helping to maintain this program; the last
such help was given only this spring. The GAP program was written by
Juergen Mnich. Of course, when he had written it, we made comparisons
and were rather content with the results: for small groups the two
performed roughly equally, while for big permutation groups (big for
this kind of program) such as the symmetric group S8 or the Mathieu
group M12 the new program was faster by a factor of over 10, which
matters since the absolute computing time of the new program for these
cases is still over one hour. This was at least partially due to the
use of new functions for permutation groups. We were rather surprised
to learn from Ralf Dentzer that the performance of the new program was
not so satisfactory for medium-size soluble groups with thousands of
classes of conjugate subgroups, e.g. certain groups of order 256 from
Eamonn O'Brien's list, for which the new program was *slower* than the
old one by about the same factor of 10. Martin Schoenert has very
briefly mentioned this fact in a reply to a question in the GAP-forum
recently and also mentioned that he has written a still newer program
in GAP which seems to avoid these deficiencies for medium-size soluble
groups while keeping the good performance for the 'big' groups. This
program will be put into GAP in the not too distant future.
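For completeness, the corresponding call in GAP, in current syntax; a
small example group is used here, since the 'big' cases like S8 or M12
take correspondingly longer:

    # Compute the lattice of subgroups of a small permutation group.
    s4  := SymmetricGroup( 4 );;
    lat := LatticeSubgroups( s4 );;
    Length( ConjugacyClassesSubgroups( s4 ) );   # classes of subgroups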
I mention the second case after my report about the first in order to
make the point that in cases such as the computation of the subgroup
lattice of a 'big' permutation group, the fact that GAP is interpreted
and that the storage management uses indirections does not play such
an important role for the efficiency, while on the other hand the
simplicity of programming in GAP makes it much more feasible to try a
mathematical variant of a method that can improve the efficiency.
Last but not least, however, I want to emphasize that we do appreciate
clear and specific reports, such as the one given to us by Ralf
Dentzer, about deficiencies in GAP, deficiencies which we shall then
try to remove as far and as soon as our restricted manpower allows,
and that the last thing we want to do is to sweep such problems under
the carpet.
Joachim Neubueser