[GAP Forum] Lookup tables in gap

Arnaldo Mandel am at ime.usp.br
Mon Jun 30 19:46:21 BST 2008


Alexander Hulpke wrote (on Jun 27, 2008):
 > Dear Arnaldo Mandel, Dear Forum,
 > 
 > > First, obvious solution: produced an ordered list L of all labels
 > > (that was fast!).  Although L is longer than N, it is true that the
 > > label of N[i] is L[i], so, given a label s, the corresponding record
 > > is N[Position(L,s)].  Also, since L is ordered, lookup should be fast.
 > >
 > > That seems fine, so I tried a traversal, which I know takes linear
 > > time on the number of links.  It took forever.  I had the routine
 > > print a progress report, and it was clearly crawling.
 > 
 > from the description given, I understand that you have a list of  
 > strings, in which you are searching. The performance problems indicate  
 > that the strings are not immutable. In this case GAP cannot store that  
 > the list is sorted, but checks it every time.
 > (The reason for this slightly disturbing behaviour is that it would be  
 > possible to change one of the strings (and thus making the list not  
 > sorted) without the list noticing. Section "Sorted Lists and Sets" in  
 > the manual has more details.
 > 
 > A workaround is easy. Simply do
 > for i in L do
 >    MakeImmutable(i);
 > od;
 > 
 > This should give you a substantial speedup.

Thanks for the tip!  Although I had already worked around the problem,
I decided to test your suggestion.  After all, I have known GAP
forever, but only recently I started to use it seriously.  So, the
least I can get from this exercise is a better understanding of GAP.

I read about MakeImmutable, and it seemed to me that the simple
statement   MakeImmutable(L);  would accomplish the same as your loop
above.  A little test confirmed it.

So, I tried this in my old traversal routine.  Instead of using the
large list L, I used only the small one, N.  Still, after a while it
was clear that it had not helped: the traversal still crawled.  Just
to recall what I said before: on the same structure, with strings
mapped already to indices, the whole traversal took less than a
minute.

(A back of the envelope calculation: Position is called once in the
innermost loop of the traversal.  Since Size(N) is just under a
million, execution of Position entails about 20 string comparisons;
this is probably 100 times longer than direct indexing, but this would
be still faster than what I observed, and would not account for the
gradual slowing down in tha algorithm)

Then I tried to see how efficient would be a record as an associative
array.  In what follows I present an interaction so that it is made
clear what I did.  The list N is called names, below:


gap> Size(names);
973438
gap> IsMutable(names);
false
gap> ForAny(names,IsMutable);
false
gap> ForAll(names,IsString);
true
gap> Maximum(List(names,Length));
141
# Note: more than half have length between 12 and 16.
gap> R:=rec();
rec(  )
gap> for i in [1..Size(names)] do
> R.(names[i]):=i;
> if i mod 128 = 0 then
>  Print("\r",i);     
> fi;od;

So, I could see the progress along i.  In the beginning, it looked
like a quick animation, one could barely read the numbers. At about
i=300000, things started to slow down visibly.  It was taking 1s for
each block of 128 indices.  Now it has reached 500000, and it takes
about 3s for each such block.  I will let it go to see whether it
finishes or goes to exponential hell.

Cheers,

am

-- 
Arnaldo Mandel                        
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il	  
am at ime.usp.br
Talvez você seja um Bright http://the-brights.net Maybe you are a Bright.



More information about the Forum mailing list