[isabelle-dev] counter example checking from ML

Lucas Dixon ldixon at inf.ed.ac.uk
Mon Sep 21 22:26:27 CEST 2009

Hi, a little more progress in understanding the issue...

I've made a little test program that highlights the change in behaviour 
that we noticed.

The point is to stress test counter example finding and see at what 
point a false-conjecture slips past. In particular, when performing 5 
iterations, this code measures the average number of times it takes to 
before the false conjecture is not spotted by the test_term function. 
So, the bigger the number that comes out, the better test_term is doing 
(at generating examples that show the conjecture to be false).

Here's the code:

ML {*
(* how many times until this false conjecture slips past codegen *)
fun count_until_missed i =
     (case Codegen.test_term @{context} @{term "%a (b :: nat). a + b = a 
* b"} 5
      of NONE => i
       | SOME _ => count_until_missed (i + 1));

(* average of the function's results, preformed n times *)
fun avg f n =
     let fun sumf 0 a = a
           | sumf k a = sumf (k - 1) (a + f())
     in  (Real.fromInt (sumf n 0)) / (Real.fromInt n) end;

(* avg number of times until this false conjecture slips past codegen *)
val count_avg = avg (fn () => count_until_missed 1) 50;

I don't have enough internet to download and test on 2008; but there has 
been a significant difference even between Isabelle 2009 and a recent 
dev snapshot (32589:9353798ce61f):

On Isabelle dev snapshot (32589:9353798ce61f)
val count_avg = 14.11 : real

On Isabelle 2009
val count_avg = 32.14 : real

(takes some time... good news: you've speeded up codegen a lot! In fact, 
this difference might even be compensating for the bad behaviour in 
iterations :)

The deterministic version should, I think, not terminate for this 
problem and many like it. But I haven't tried it yet.

These are of course averages and change a bit, but having repeated the 
experiment many times, I think those numbers are aprox right. At least 
they give a rough benchmarking mechanism.

This is a case where your random sampling looks to be doing the wrong 
thing - with respect to the spec I've made up in my head ;-). Since "(a 
= 0 & b = 0 | a = 2 & b = 2) = (a + b = a * b)", 5 iterations is going 
to have to be picking the same case at least three times; a quick hack 
would be to cache the examples you've already looked at. But even better 
would be for your random example chooser to automatically pick the next 
unpicked example, rather than repeat known results. This is probably 
more of a problem for conjectures with a small number of variables, but 
would still be generally a good idea.


Florian Haftmann wrote:
> Hi Lucas,
> indeed the whole quickcheck matter has grown into a complex story.
> First, we currently (e.g. following development in the hg repository)
> have a kind of fork:  there is the "old" quickcheck, residing in
> Pure/codegen.ML, and the "new" one in HOL/Quickcheck.thy; the most
> prominent difference between both is that the "new" one tries to
> internalize as much as possible into the logic, which allows to use type
> classes and avoids tinkering too much with glueing raw ML snippet
> strings together.
> Concerning determinism, the "old" quickcheck is inherently randomized.
> The "new" one uses an open state monad which passes a random seed around
> explicitly and can thus be also used to enforce determinism.  If you
> want to use this, be aware of the pitfalls which arise when using
> unofficial intermediate stages of development rather than official releases!
> The key for understanding is in HOL/Tools/quickcheck_generators.ML,
> function mk_generator_expr (export it on demand and try something like
> Quickcheck_Generators.mk_generator_expr @{theory} @{term
> "\<lambda>n::nat.  n ~= n^2"} [@{typ nat}]);  for further background
> knowledge on this machinery, refer to ยง4.4 in
> http://www4.in.tum.de/~haftmann/phd.shtml.  Don't hesitate to ask
> further questions.
> Hope this helps,
> 	Florian

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

More information about the isabelle-dev mailing list