Quines

A quine is a program that prints itself, or some computable transformation of itself.[1] They are discussed at some length here and here. However, although both writers discuss the logic of quines, as does Hofstadter, neither seem to apply this to the actual systematic construction of quines. Hence there are long catalogues of quines in all manner of programming languages, with requests for further additions thereto.

Yet there is a simple construction of quines that is generic across all languages with a sufficient modest expressive power. As far as I can tell, the barrier to seeing this construction has been the traditional view of quines as comprising two segments, code and data, with the data being some quoted form of the code. In factit  is more fruitful to view a quine as comprising three segments, as I shall now show.

The Construction

I shall derive the construction in a self-explanatory Algol-like pseudocode.[2]

A program is a string, so in some way a quine must embody itself as a string that it can output. Clearly it cannot contain do so literally, in the sense that it contains a definition of the form

     

where self is the program itself.

Apart from the logical absurdity, self would have to be a quotable form of the program rather than the program verbatim, because, in all programming languages, some characters cannot appear as themselves within quotes but must be escaped in some way.  That is what the box denotes, and henceforth I shall use a box to denote the quotable version of a string. Note that boxing is specific to the host language.

Now consider the program to be the catenation of three strings,  s q t.  t contains the statements that output the program, say:

     

Since t is independent of s, s can contain the definition of t, thus:

     

Where can the definition of s go?  Obviously not within s, but it could be in q.  Indeed, q could be just the boxed form of s, with the rest of its definition being part of s itself.  Thus we get:

                             

Unfortunately, q cannot be defined explicitly as a program variable, because the boxed form of s cannot consistently appear either within s or t.  The way round this impasse is to compute q.

So, assume that we can write a construct [s] within the language to generate, or directly output, the boxed form of s. This capability of expressing boxing algorithmically within the language is the minimum expressive power alluded to above; given it, we are home. t becomes:

     

and the complete quine is simply:

Clearly it prints itself because it prints the strings s,  and t in sequence.

I should emphasise that this construction by no means generally yields a minimum length quine, and I leave our OCD brethren to their own devices on that one.

However, there are some obvious simplifications within the general framework, language permitting. For example, if it is possible for the printing segment to precede the definitions,[3] we can re-arrange the program as:

                             

Now t is a “primitive” string independent of any of the other segments of the program, so we can dispense with the variable t and include  textually in the print statement, thus:

In other languages, like Perl, special constructs may enable even more radical reductions.

Automating the construction

Even given the construction scheme, hand-cranking the actual construction of a quine is tedious, so some degree of automation is essential.  Here I describe how the FIFE tool achieves such automation by allowing the quine to be expressed textually as the catenation of its logical segments. Since the construction is driven by a shell script, it is natural to have each segment reside in its own file. To ring the changes, I shall illustrate the construction of a C quine.

Constructing a C quine

Recall that the quine is constructed as a 3-segment string, thus:

The FIFE tool allows the program to be written literally in this form, thus:

$$*$s.c*$$*$Qs.c*$$*$t.c*

The file s.c is itself generated by FIFE from a file s.fife, and Qs.c is the separately generated boxed form of s.c . (If it weren’t for the need to do this, the construction could be done in a oner by making a link directly to s.fife in the above.)

Below are the individual source files, the construction script, and the final constructed quine. I shall make only two comments, then leave the interested reader to mull it all over; and if further inclined, to extend the construction to produce a quine that yields some host language-definable string-to-string function of itself (but don’t tell me – I know how to do it J).

  1. The internal boxing of s is done by the for loop in main (and C being C, it outputs the boxed form directly). Thus it is not necessary to be able to embody boxing as a function in the host language, simple as an expression with a free string identifier. In contrast, the external boxing operation in the script can be done in any convenient way: I happen to have written a Java program to do it.
  2. The final quine is complex and fairly impenetrable. However, this matters no more than the complexity any compiled program. It is the source that is important, and that couldn’t really be more transparent.

quine.fife:

$$*$s.c*$$*$Qs.c*$$*$t.c*

s.fife:

$#include <stdio.h>

 

char*t = "$$*$Qt.c*";

char*s = "

t.c:

";

 

int main()

{

    printf("%s",s);

    for(char*p=s;*p!=0;p++)

    {

       if(*p=='\"'||*p=='\\'||*p=='\n'||*p=='\t')printf("%c",'\\');

       if(*p=='\n') printf("%c",'n');

       else if(*p=='\t') printf("%c",'t');

       else printf("%c",*p);

    }

    printf("%s",t);

}

The construction script:

java box t.c > Qt.c

java fife s.fife > s.c

java box s.c > Qs.c

java fife quine.fife > quine.c

quine.c:

#include <stdio.h>

 

char*t = "\";\n\nint main()\n{\n\tprintf(\"%s\",s);\n\tfor(char*p=s;*p!=0;p++)\n\t{\n\t\tif(*p=='\\\"'||*p=='\\\\'||*p=='\\n'||*p=='\\t') printf(\"%c\",'\\\\');\n\t\tif(*p=='\\n') printf(\"%c\",'n');\n\t\telse if(*p=='\\t') printf(\"%c\",'t');\n\t\telse printf(\"%c\",*p);\n\t}\n\tprintf(\"%s\",t);\n}";

char*s = "#include <stdio.h>\n\nchar*t = \"\\\";\\n\\nint main()\\n{\\n\\tprintf(\\\"%s\\\",s);\\n\\tfor(char*p=s;*p!=0;p++)\\n\\t{\\n\\t\\tif(*p=='\\\\\\\"'||*p=='\\\\\\\\'||*p=='\\\\n'||*p=='\\\\t')printf(\\\"%c\\\",'\\\\\\\\');\\n\\t\\tif(*p=='\\\\n') printf(\\\"%c\\\",'n');\\n\\t\\telse if(*p=='\\\\t') printf(\\\"%c\\\",'t');\\n\\t\\telse printf(\\\"%c\\\",*p);\\n\\t}\\n\\tprintf(\\\"%s\\\",t);\\n}\";\nchar*s = \"";

 

int main()

{

    printf("%s",s);

    for(char*p=s;*p!=0;p++)

    {

       if(*p=='\"'||*p=='\\'||*p=='\n'||*p=='\t')printf("%c",'\\');

       if(*p=='\n') printf("%c",'n');

       else if(*p=='\t') printf("%c",'t');

       else printf("%c",*p);

    }

    printf("%s",t);

}

 



[1]    The name was apparently first used by Douglas R. Hofstadter in his 1980s cult book “Godel, Escher Bach: The Eternal Golden Braid”.

[2]    It is actually a language called S-algol that was used for many years as the teaching language at St Andrews.

[3]    Which is possible in Java, for example, but not in C.