[isabelle-dev] More context on Isabelle memory consumption

Makarius makarius at sketis.net
Wed Oct 19 13:24:54 CEST 2011

On Tue, 18 Oct 2011, Thomas Sewell wrote:

> My email about PolyML and memory usage has generated some discussion, 
> perhaps I should give some more context.

The continuing improvements of the Poly/ML statistics facilities add 
important new aspects to the overall performance monitoring of Isabelle, 
although such monitoring has been an ongoing effort for many years already.

> The L4.verified project has been stuck at Isabelle-2009-2 for years. Our 
> proofs seem to be compatible with newer Isabelle versions, but we're 
> running out of (virtual) memory. We've been running PolyML in 32 bit 
> mode on most of our machines. As images grow beyond 4GB, we must switch 
> to 64 bit mode, which roughly doubles memory use to > 8GB, which means 
> replacing most of our laptops. Soon 16GB will be standard, I suppose.

Probably yes.  Moore's Law for memory is still working.  5 years ago 4 GB 
was really huge, but now it is not so much: something you can assume even 
for a "small" laptop.  A normal technical workstation of 2011 could be 
expected to have 16-32 GB.

Isabelle has traditionally targeted well-equipped workstations.  In the 
past few years there was an unusual situation in which the difference 
between laptops and servers had diminished, but the gap is opening again 
for hardware and marketing reasons (think of smartphones, iPads, light 
netbooks etc. as opposed to huge "cloud" facilities).

I don't think it is realistic to follow the path of such mobile devices 
for local execution of the proof development environment.  It is 
imaginable to do it remotely, though, say via some "proof cloud service". 
We have essentially had this in the past already: in the classic times of 
the Bali project, for example, it was commonplace to run the actual proof 
process in the background on a big server.

> In the meanwhile I've been having a look at Isabelle memory use. I've 
> never really understood why Isabelle images consume gigabytes of memory. 
> With full proofs off, the only real inputs should be the statements of 
> hypotheses and the definitions of types and terms. Isabelle's output is 
> many orders of magnitude bigger than these inputs.

You probably mean a heap size of 1 GB, not the image size -- the latter is 
fully shared and usually quite compact.  Taking current hardware, 1 GB is 
not so much.  Empirically, a software system always grows to fill up the 
available space.  The question is what can be done with that, i.e. whether 
it is put to proper use or just wasted.  (I am also the one who 
periodically reminds people of Isabelle/HOL bloat, even though it builds 
faster on my machine than ever before in history: 2min 16s on 4 cores with 
maybe 4-8 GB.)

Understanding the full details of memory usage is hard, often infeasible. 
After years of Isabelle maintenance and optimization, I have myself 
developed a certain mental model that approximates this complex situation.

Counting mere types and terms of the core logic is not realistic.  The 
system is much more than the logic, even though everything happens to be 
reduced to some core calculus.  And even the logic itself has a lot of not 
immediately visible overhead compared to the rather compact source 
representation.  As a rule of thumb, the explicit type information is 10 
times bigger than the overall term structure.  You have already 
rediscovered some aspects of that in your recent measurements with 
profiling 4.
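As a toy illustration of that rule of thumb -- a Python sketch, not 
Isabelle's actual datatypes; the names Const, App, TVar, TApp and the node 
counts are purely illustrative -- consider a fully type-annotated term in 
which every constant and variable carries its complete type.  Even for a 
tiny term like "map f xs", the nodes spent on type annotations outweigh 
the bare term skeleton:

```python
# Toy model of fully type-annotated terms (illustrative only, not
# Isabelle's internal representation).
from dataclasses import dataclass

# --- types ---
@dataclass(frozen=True)
class TVar:            # type variable, e.g. 'a
    name: str

@dataclass(frozen=True)
class TApp:            # type constructor application, e.g. 'a list, 'a => 'b
    name: str
    args: tuple

def tsize(T):
    """Number of nodes in a type tree."""
    if isinstance(T, TVar):
        return 1
    return 1 + sum(tsize(a) for a in T.args)

# --- terms: every constant/variable carries its full type ---
@dataclass(frozen=True)
class Const:
    name: str
    typ: object

@dataclass(frozen=True)
class App:
    f: object
    x: object

def term_nodes(t):
    """Nodes in the bare term skeleton (ignoring type annotations)."""
    if isinstance(t, Const):
        return 1
    return 1 + term_nodes(t.f) + term_nodes(t.x)

def type_nodes(t):
    """Nodes spent on the attached type annotations."""
    if isinstance(t, Const):
        return tsize(t.typ)
    return type_nodes(t.f) + type_nodes(t.x)

# Example: "map f xs" with map :: ('a => 'b) => 'a list => 'b list
fun = lambda a, b: TApp("fun", (a, b))
lst = lambda a: TApp("list", (a,))
a, b = TVar("'a"), TVar("'b")

map_c = Const("map", fun(fun(a, b), fun(lst(a), lst(b))))
f_c   = Const("f",   fun(a, b))
xs_c  = Const("xs",  lst(a))
t = App(App(map_c, f_c), xs_c)

print(term_nodes(t), type_nodes(t))  # 5 skeleton nodes vs. 14 annotation nodes
```

Here the annotations are already almost 3 times the skeleton for a trivial 
term; with larger types (records, nested type constructors) and repeated 
instantiation the ratio grows quickly toward the 10x figure above.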

> The initial thought was that the record package would be representative. 
> The HOL image has grown because it has more in it, but the theorems 
> generated in defining a large record stay roughly constant.
> ...
> The record package uses multiple typedefs to simplify the types of some 
> terms. In 2009-2 a typedef caused a theory to be checkpointed 8 times, 
> in 2011-2 30 times.
> This gives us a large collection of hypotheses, all of which are tedious 
> to investigate. Please appreciate how valuable empirical data from 
> PolyML is. Up until now we've been poking at these issues in the dark.

One more performance figure is good, but many other things have been 
measured over the years, with an ever-improving tendency.  So profiling 4 
is just one more step.

Since a couple of years, my morning ritual is to look first at the charts 
that are generated from the isatest runs, e.g. see 

Every tiny unexplained change in performance usually leads to some 
investigation of changesets, often with a private mail exchange with the 
people involved.  Thus we have managed to keep an ever-growing behemoth 
manageable, and actually deliver some performance to end-users, who in 
turn produce larger and larger applications to fill up the available space 
and time.

Here is a list of things that can be easily contributed externally to 
continue the continuous efforts to manage the system, without postulating 
arbitrary quick changes to the incredibly complex Isabelle/Pure and main

   * Improve isatest to include additional measurements, say from profiling
     4, and the new (still experimental) Poly/ML Statistics facility.

   * Improve isatest to remove some assumptions about shared NFS home
     directories, to make it work reliably on vmbroy9 under Windows/Cygwin.
     Since that platform has more resource limitations, it will indirectly
     result in a leaner Isabelle system and library over time.

   * Improve Mira to take over the main statistics functions of old
     isatest.  So a better Mira could make isatest obsolete eventually.

   * Ensure that the big applications, which are now mostly in AFP, produce
     usable statistics.  Recently there was very little information coming
     from this source, as I have pointed out several times already.

   * Publish the L4.verified sources together with results from the
     checking process.  The closed nature of the application makes it
     much more subject to performance issues -- this then really means
     "poking in the dark".

   * Procure financial resources to have David Matthews improve the Poly/ML
     statistics facilities even further.  Not just for small mobile
     devices in 32-bit mode, but also for bigger iron with a large degree
     of parallelism.  (This is technically rather delicate, and there are
     big companies making big money just with performance monitoring tools
     for certain platforms.)

