[isabelle-dev] More context on Isabelle memory consumption

Thomas Sewell Thomas.Sewell at nicta.com.au
Tue Oct 18 03:10:16 CEST 2011

My email about PolyML and memory usage has generated some discussion, 
perhaps I
should give some more context.

The L4.verified project has been stuck at Isabelle-2009-2 for years. Our 
seem to be compatible with newer Isabelle versions, but we're running out of
(virtual) memory. We've been running PolyML in 32 bit mode on most of our
machines. As images grow beyond 4GB, we must switch to 64 bit mode, which
roughly doubles memory use to > 8GB, which means replacing most of our 
Soon 16GB will be standard, I suppose.

In the meanwhile I've been having a look at Isabelle memory use. I've never
really understood why Isabelle images consume gigabytes of memory. With full
proofs off, the only real inputs should be the statements of hypotheses 
and the
definitions of types and terms. Isabelle's output is many orders of 
bigger than these inputs.

The initial thought was that the record package would be representative. The
HOL image has grown because it has more in it, but the theorems generated in
defining a large record stay roughly constant. In addition, the L4.verified
proofs require a large record.  I attach a graph of the memory cost of 
a 400-element record at various dates in Isabelle's development history. To
quote a Dilbert-like comic I read "This graph clearly demonstrates up-ness."
The cost of defining the record has nearly tripled in this time.

We've had a look at some of the bigger spikes in this graph, and learned 
little of use. What is clear is that nothing is constant: the record package
has been moved from the Typecopy package to the Typedef package, which 
has been growing in complexity, and throughout this time Florian has been
adding new hidden constants to the record package for code generation and
quickcheck. The record package uses multiple typedefs to simplify the 
types of
some terms. In 2009-2 a typedef caused a theory to be checkpointed 8 
times, in
2011-2 30 times.

This gives us a large collection of hypotheses, all of which are tedious to
investigate. Please appreciate how valuable empirical data from PolyML 
is. Up
until now we've been poking at these issues in the dark.

One point of reference is that disabling the code generation and random
extensions saves us about 30% memory consumption, (roughly doubling over
2009-2 rather than tripling) which may be a sufficient stop-gap measure.

To address Andreas' concern: this is memory usage *after* finishing a theory
and garbage collection. If you're writing tactics, don't worry about it too
much. We don't have a problem with individual proofs exhauting memory and we
aren't using full proofs. I anticipate anyone using full proofs will do 
so on a
machine with lots of memory.

Term sharing might be important for terms that are expected to end up at 
scope, e.g. statements of theorems or terms stored in various kinds of Table
for internal use. In addition, it will only matter if these terms are really
big or produced very often. The last line in the HOL statistics line was 13
million words worth of type substitutions. I guess the interesting 
question is
what terms are so large or being substituted so often.

The reason I'm sending these emails rather than proposing a patch is 
that I've
got lost trying to figure out which functions matter and which don't. I 
don't have any clear ideas.


p.s. I attach the source of the graph as well. The columns are patch 
date, patch ID (hg name) and heap size in bytes. The date here is given 
in UNIX time for obscure reasons, gnuplot can read it with timefmt '%s'.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: graph
URL: <https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/attachments/20111018/f1c09485/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graph.pdf
Type: application/x-download-application-pdf
Size: 16408 bytes
Desc: not available
URL: <https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/attachments/20111018/f1c09485/attachment-0001.bin>

More information about the isabelle-dev mailing list