[isabelle-dev] Grouping ISABELLE_FULL_TEST?
makarius at sketis.net
Wed Feb 3 22:22:21 CET 2016
On Wed, 3 Feb 2016, Jasmin Blanchette wrote:
> I have a slight worry concerning the characterization and naming as
> "benchmarks". From what I understand, a "benchmark" is a theory file
> that tests the performance of the system.
I have introduced this terminology many years ago. The idea is that we
always do a proper test of all tools, libraries, examples, either via
old-school "isabelle build -a" or some immediate test service (formerly
The things that are run in addition as "nightly build" are slow (and thus
getting in the way), but are not relevant for actual testing. Since the
main purpose of nightly builds is to collect reliably performance figures,
the outcome of these sessions is characterized as "benchmarks".
Since the measurements of isatest have become very inaccurate in recent
years, this key purpose the background tests has been forgotten. We are
flying blind, especially for AFP. Good performance records are important,
especially due to the trend to shrink consumer devices more and more.
I don't mind if there is a more fitting name for
instead of ~~/src/Benchmarks.
Getting good performance figures back -- stored over in the long term --
is really important, though.
> Special setup:
What is special about Sudoku? The relevant changeset is this:
date: Fri Sep 12 17:51:31 2014 +0200
enabled 'Sudoku' only with 'ISABELLE_FULL_TEST' -- Sudoku is fast enough
on modern hardware (within seconds on my MacBook), but it seems to fail on
older test machines
> SMT_Tests (requires Z3)
The relevant changeset is this:
date: Wed Jan 02 09:31:25 2013 +0100
files: src/HOL/ROOT src/HOL/SMT_Examples/SMT_Examples.thy
actually run Z3 for "SMT_Tests" when "ISABELLE_FULL_TEST" is enabled
In 2013 we did not have Z3 as default component, usable for everybody. Now
the condition on it is always true -- z3 is always enabled.
The theory takes 40s elapsed time on my machine. That could qualify it as
optional "benchmark", but I don't mind to have it by default. We have
other much bigger things all the time.
> Probably also in the slow category (the last three might have a benchmarking component):
> Slow and benchmarking:
These are the genuine examples where I understand the arrangement as "test
this continuously in the background, and record good performance figures
More information about the isabelle-dev