A Guile/Scheme mini-tutorial for Linux developers (1)

Published on june 30, 2019 at starynkevitch.net/Basile/guile-tutorial-1.html. this is work in progress and incomplete.

by Basile Starynkevitch (France). Send comments (with constructive suggestions or about any mistakes) by email to basile@starynkevitch.net

The audience for this is mostly software developers using Linux, knowing some programming language (like Python, PHP, JavaScript, C or C++, Rust, etc...; in particular at least some C or Java or JavaScript), notably those interested by my bismon persistent system (see also this draft technical report, but skip there the text required by European bureaucracy). It is about learning basic Scheme programming using Guile, not about embedding the Guile interpreter in your C program.

My (Basile's) personal motivation for writing this short tutorial is to popularize many concepts important to me and in my bismon, such as homoiconicity, metaprogramming, hygenic macros, reflection, introspection (both at compile-time and at run-time). So this tutorial mostly aims to be food for thought. My scariest resource is my own time spent in writing this tutorial.

You are expected to read this twice to undestand all of it. On the second read, do follow the hyperlinks given here at least to check that the terminology is familiar to you, or else to bookmark that hyperlink for later reading, and also refer to the Revised5 Report on the Algorithmic Language Scheme, called R5RS. Take in mind that I am not a native English speaker.

Refer also to the R5RS and its index, and to the Guile reference and tutorial.

Prerequisites

Install Guile 2

You need a recent Linux distribution such as Debian, Ubuntu, Arch etc. Install some Guile 2 (preferably Guile 2.2) package on your system.

I am using the guile-2.2 package on Debian/Sid, on a x86-64 Linux desktop, and my favorite editor is GNU emacs. I heard that Guile could be usable on Windows thru WSL and on MacOSX by compiling its source code, but I don't use these proprietary operating systems, because I strongly prefer free software. Notice that Guile is a nice implementation of Scheme, but there are other good free software implementations of Scheme on Linux, notably Chichen/Scheme or Bigloo (both are somehow compilers, and less focused on interactive usage) or Racket (which implements interactively several programming languages, including Scheme). But Guile is a GNU software and is easily embeddable, e.g. in GNU make.

Configuring Guile on your Linux system

You really want auto-completion and parenthesis matching during REPL interaction. For that you need to fill, with your favorite source code editor your ~/.guile configuration file in your home directory to use several common modules, e.g. to get GNU readline Guile support and have formatted output and pretty-printing then initialize the readline facility:

;; file ~/.guile
(use-modules
  (ice-9 readline)
  (ice-9 format)
  (ice-9 pretty-print))
(activate-readline)

Configuring your editor for Guile and Scheme

The usual file suffix for Scheme source code is .scm on Linux, for example you could have some hello-world.scm source file. But you would prefer using the Read-Eval-Print Loop of your guile interpreter.

The GNU emacs has a M-x scheme-mode for Scheme; see also this.
And the vim editor knows about Lisp and Scheme. If your editor don't know about Scheme choose at least some Lisp support in it. Parenthesis matching is of practical paramount importance, so some of them are shown here as ( matched with its corresponding ).

Mandatory reads

Norvig's Teach Yourself Programming in Ten Years should be read before this mini-tutorial.

Many other texts are available, including better tutorials.

Of course SICP should be read after this mini-tutorial!

Scheme is a minimal and elegant programming language which has popularized important computer science concepts, such as continuations with its call/cc, and this tutorial is not a reference. For reference about Scheme, refer to the famous R5RS (or some later standard, such as R6RS). If you are also interested about Common Lisp (see also its wikipage), refer to the Common Lisp HyperSpec. My biased opinion is that every software developer should know Scheme for the concepts it is teaching.

If you happen to code your scripts in Python, you should notice, after having read this tutorial, how Scheme is semantically close to Python or even to JavaScript or to Lua (all being dynamic programming languages), even if their syntax is widely different. I personally hate Python's syntax so I prefer GNU Guile to Python, notably because white space are semantically significant in Python, and this practically requires extra support from your source code editor, so it is becomes difficult to edit python scripts with a plain editor like vi or leafpad.

Typographical conventions

In Guile, upper or lower cases are significant, and comments start with a semi-colon ; and symbols (they are a bit like identifiers or keywords in other languages) may contain non-letter characters such as - or ? or * or ! etc.... In practice, the minus sign is almost like the underscore in many other programming languages. So if you name a variable like_this in your favorite programming language, you should name it like-this in Scheme. Hence x-y is not a substraction and a+b is not the sum of a and b, it is a symbol name!! And "predicate" like names are conventionally ending with a question mark, such as integer? which test for integerness. An example of guile code is :

;; some comment, we are inputting a number
2

Notice that Guile don't have much real keywords (they are actually predefined macros), but it is advised to avoid redefining core language operators even if in some occasions, it might be useful. And the macro system of Scheme is specially powerful because of its hygenic macros. If you are used to the C preprocessor, forget everything you know about C or C++ macros. In Scheme, you could in principle redefine the if operator but you usually don't want to (except in rare cases!).

You'll often run the simplest examples directly into your guile interpreter. The prompt of that interpreter could be scheme@(guile-user)> but we abreviate it as guile> . The result of each input is often immediate. For example:

guile> 3
3 is the result given by guile

Actually, with readline support, Guile maintains some history of input, and keep several past results. Strings are like in C or Java or C++, with backslash escapes. But Guile will show you them, in results, with appropriate escapes. So, actually you'll have:

% guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.

scheme@(guile-user)> "a\x47b\n"
$1 ="aGb\n"

Then you could type Ctrl-D for the end-of-file, or also the ,quit command, to exit from Guile.

Don't forget to use your TAB key for auto-completion! In many (but not all) cases, just typing a few characters, immediately followed by a tabulation keypress (but without the ENTER key!), should be worthwhile. Also, take advantage (with the GNU readline extension of Guile) of parenthesis matching: if your cursor is on a right parenthesis, the corresponding left parenthesis will blink.

Guile (or Scheme) overview

The left ( and right ) parenthesis, quote or apostrophe ', backquote `, dot ., and comma , characters have very special meanings. Some of these characters are actually syntactic sugar, in particular, within expressions, 'a is a shorthand for (quote a), ,a is a shorthand for (unquote a), and `a is a shorthand for (quasiquote a), and all of quote, unquote, quasiquote operators are Scheme macros. In some contexts -and this is specific to Guile- a prefix comma is starting a command, for example Guile understands ,help to provide you with on-line help, and ,quit to quit the guile interpreter. To write a character constant (rarely useful), you need to use #\a for the character a (as if in C you code 'a'), and #\newline for the newline character (in C, you code it '\n'). Integers (decimal notation) and floating-point literals are like in C or JavaScript. The boolean literal values are #f (like false in JavaScript or C++) and #t (like true); notice that Common Lisp don't have such boolean literal values: it uses instead nil and t.

major language features

The Scheme language has no statements, just expressions (with a few other core syntactic constructs, such as bindings inside let, or conditions inside cond, etc...). Every expression gives some value. The interpreter is running some Read-Eval-Print Loop (or REPL) so it is looping on:

And the above loop is repeated indefinitely, unless the interpreter terminates, e.g. with you typing the ,quit command, or some end-of-file input condition. The values are garbage collected (this GC is required by R5RS) so there is no need to explicitly release any allocated memory. The Guile interpreter uses conservative tracing garbage collection techniques, but don't expect them to be your silver bullet.

The Scheme language is homoiconic: expressions of the Scheme language can be easily manipulated as data, so code is data, and that homoiconicity permits to have some clever eval function, and to have hygienic macros.

As in most Lisp dialects -since that is the acronym of List Processing-, an important data structure of Scheme is the linked list, as shown by the following figure taken from wikipedia:

and the function to make them is naturally called list. Of course, these lists may easily represent abstract syntax trees of the Scheme language, thus enabling its homoiconicity.

However, Guile has many other data structures, including symbols, vectors, records (a bit like C struct-s, but more powerful), association lists (called a-lists), hash tables, and procedures (generally represented as closures). All these are first-class citizens (actually, first-class values) of the Scheme programming language. And many Scheme values -including lists, vectors, records, a-lists, hash tables, etc...- are in principle mutable, but your Scheme code would be cleaner and more readable if you favor immutable values and use mutability with care. By convention, most mutation operations on values have a name ending with an exclamation point, for example vector-set! to put some element in a vector, or hash-set! to add some entry into an hash-table. Even environments binding variable names (with each variable name being associated to its bound value) are first class values.

The Scheme language is minimal, so its syntax, inspired by Polish notation, is extremely simple and orthogonal, and might be percieved as scary or unfriendly. Any operator is before its (optional) operands, and enclosed in significant parenthesis. Of course the already mentioned syntactic sugar rule[s] mentioned above do apply, so '(+ 1 2) is equivalent to (quote (+ 1 2)) and will have exactly the same semantics and behavior.

simple Guile examples, explained

Guile can be used as a simple Polish-notation calculator. For example: (- 7 2) 5, (+ 3 5 10) 18 and of course (+ 3 (- 8 4) 5) 12.

Simple output can be done with the display unary operation, with newline niladic operation for newlines. Formatted output (a bit like printf in C, but with a different format syntax, where ~ is an escape character, similiar to what % is in C's printf) can be done with format. For example (format #f "ten hexa=~x; some list=~a!~%" 10 (list 12 13)) "ten hexa=a; some list=(12 13)!\n" and works a bit à la sprintf. If #f is replaced by #t we get a printf-like behavior (output to the current output port, that is to stdout). But most of the time, we would use pretty-print to nicely output one single value.

Scheme being an expression-based language, the begin operator generalizes the comma operator of C to several arguments, evaluating all of them in sequence and giving the last value. Hence (begin (pretty-print (+ 4 5)) (* 5 6) (- 7 2)) 5 but prints before 9 and heats the processor (as a side-effect) to compute the product 30. The if conditional operator is like the ?: ternary conditional operator of C, but you also have the quite useful when, cond, unless, case conditional operators.

Simple input happens with the read primitive, so (+ (read) 2) 5, but during evaluation will wait for your input, supposed here to be 3. More general input and output are done thru ports, a bit similar to C FILE* handles.

Variables (that is, global ones) can be defined with define followed by the variable name. For example: after (define some-number 2) we have (+ some-number 3) 5. Variables can be assigned with set! so after (set! some-number 5) we get (+ some-number 4) 9.

Functions are also defined with define followed by some kind of template looking like a function call. For example the recursive factorial is of course defined with :
(define (fact n) (if (< n 1) 1 (* n (fact (- n 1))))) and after that (fact 40) 815915283247897734345611269596115894272000000000. Notice that Guile uses arbitrary-precision arithmetic, also known as bignums. The body of a function can contain several expressions, à la begin, for example: (define (verbose-double x) (pretty-print x) (+ x x))

Of course, functions are procedures, so, are first-class values; hence : fact #<procedure fact (n)> and + #<procedure + (#:optional _ _ . _)>. However, if or define cannot be evaluated, since they are not procedures. Use the unary procedure? primitive to test if some value is a procedure.

Local variables are introduced with the let syntax (which has inspired the same construct of JavaScript). The let construct is followed by one or several local bindings, for example (let ( (x 1) (y 2) ) (+ x y)) 3 but the bindings (x 1) and (y 2) are lexically scoped to just the entire let expression.

Some anonymous functions can be obtained with lambda. For example (let ( (double (lambda (x) (+ x x)) ) ) (double 3) ) 6, so observe that (define (triple x) (* 3 x)) is just a short-hand (some syntactic sugar) for (define triple (lambda (x) (* 3 x))).

The previously seen let construct is not enough in all cases. It cannot define recursive local functions, because when such functions are recursively called in their body, their name is not yet bound to them. For example, if we want to locally define two mutually recursive functions for testing odd and even numbers, we need to use letrec with two local bindings and the below example is (almost) exactly from the Guile manual:

 (letrec ((even? (lambda (n)
                    (if (zero? n)
                     #t
                     (odd? (- n 1)))))
          (odd? (lambda (n)
                    (if (zero? n)
                     #f
                     (even? (- n 1))))))
 (even? 8888))

Be careful to note that the mutually recursive calls above are in tail call position, and R5RS requires that to be computed iteratively. So even if you replace 8888 by a larger number like 998877, the computation cannot blow up your call stack. Actually, such tail-recursive letrec constructs are the recommended way to code iterations in Scheme. As an exercise, code in such a way the factorial, using an internal tail-recursive function of two arguments: the number n of which you compute the factorial and some accumulator p holding the partial result. Be aware that Scheme has other iteration mechanisms.

Of course in real life, few recursive functions or mutually-recursive functions are involving just tail-recursive calls. For example, most recursive descent parser or macro-expanders involve mutually-recursive functions with some call sites which are not tail-recursive. And it is common practice to code recursive functions with both tail-recursive and full-recursive calls.

Read more about primitive recursive functions which illustrate the mathematical equivalence between iteration and recursion, and study also continuation-passing style as a way to construct that equivalence.


Thanks to Abhishek Chakravarti (Kolkota, India) for constructive feedback and help on CSS.