Generation of C-tests

The problem of selecting a good bias for generating k-hard strings depends on many factors. The objective is to maintain expressiveness, to ease the problem of finding explanatory descriptions and to limit the combinatorial explosion. The final choice we present is an oversimplified abstract machine that is easily extensible to work as a Turing machine.

A Toy Memory-less Abstract Machine

Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages, such as l -calculus (or LISP). We have adapted the "toy RISC" machine of [Hernández-Orallo and Hernández-Orallo 1993] with two remarkable features inherited from its object-oriented coding in C++: it is easily tunable for our needs, and it is efficient. We have made it even more reduced, removing any operand in the instruction set, even for the loop operations. We have only three registers, which are AX (the accumulator), BX and CX. The operations Q b we have used for our experiment are in Table 1:

LOOPTOP	Decrements CX. If it is not equal to the first element jump to the program top.
LOOPS	Same as LOOPTOP but it jumps n (for the tests n=4) instructions backward.
LOOPM	Same as LOOPTOP but it jumps m (for the tests m=7) instructions backward.
SUCC	Increments the accumulator.
PRED	Decrements the accumulator.
WRITE	Writes into the output and moves fwd.
BREAD²	Moves back and reads from the output.
FREAD²	Moves fwd and reads from the output.
MOV A,B¹	Copy register BX into AX
MOV B,A¹	Copy register AX into BX
MOV A,C	Copy register CX into AX
MOV C,A	Copy register AX into CX
ROTR³	Rotates 45° to the right.
ROTL³	Rotates 45° to the left.

Table 1. Instruction Set

The operations with no superscript are present in all the subsets. Operations marked with (1) are present in the ‘professional’ version of the machine, the operations with (2) are present in the Turing-like version and those with (3) are present in the Logo version where the output is bidimensional. This sparseness of only 10 operations will be clearly justified later. We have essayed with many different alphabets but for this test we will use the professional version and a circular alphabet W b = {a,b,c,d,...,z}, i.e., incrementing ‘z’ yields ‘a’ and decrementing ‘a’ yields ‘z’. Since the first element is an inflexion point for the loops, it is presented to the subjects as "a critical element".

This configuration still produces many programs that are not robust (intensional) because programs can be often split into subprograms. The solution for these cases comes from another restriction: the programs must be comprised wholly inside a loop. This leaves a good approximation to explanatory programs. The rest to do is to avoid repetitions of patterns such as "abcabcabcabc" (for sake of gain and plausibility) and take apart the strings where an important part is explained by a shorter program (for sake of intensionality). We think that the bias is not all the expressible we would like but it allows the generation of strings of certain complexity. Also we think it is fair because it does not relate on arithmetic (such as cryptarithmetic tests) or any other preceding knowledge, except the order of the alphabet.

The Generation of k-Hard Strings

The algorithm we have used to generate a set of different k-incomprehensible strings is very similar to the one we presented in section 5.4 (of [Hernández-Orallo and Minaya-Collado 1998]). Having 10 operations, we have that usually only about a 20% of the programs of any size are explanatory. This means that trying to know whether a randomly generated program of, say, size 15, is valid, will need the checking of more than 2,222,222,222,222 programs. And this is the case if the computational cost of x* is slow, contrariwise (if x* is a costly program) we will have to check longer programs.

We have used some optimisations and heuristics in order to make the great amount of programs to check more tractable. Some examples of questions are:

Prediction style:

k9: a, d, g, j, … Answer: ‘m’

k12 a, a, z, c, y, e, x, … Answer: ‘g’

k14: c, a, b, d, b, c, c, e, c, d, … Answer: ‘d’

Abduction style: k8: a, _, a, z, a, y, a, … Answer: ‘a’

k10: a, x, _, v, w, t, u, … Answer: ‘y’

k13: a, y, w, _, w, u, w, u, s, … Answer: ‘y’

The Tests

Four tests were devised to measure prediction, abduction, g-factor and similarity. The prediction test is composed of 19 exercises generated with the following k-hardness distribution (2 k7, 1 k8, 2 k9, 3 k10, 3 k11, 3 k12, 2 k13 and 1 k14), redundancy r = 2 and the less ‘akin’ as possible. The abduction test is composed of 15 exercises using the same generator and redundancy. The distribution was (2 k7, 2 k8, 1 k9, 2 k10, 1 k11, 3 k12 and 4 k13). In these two tests, the incorrect options were generated randomly but relative near to the solution and the letters appearing in the string. The IQ test we used was the European IQ test simply because it is a culture-fair test, devised for 20 minutes, ensuring a reasonable range (75-174) of values and available on the Internet.

Return to the Computational Measurement of Intelligence Page