hyPiRion

Hello World in Swearjure

posted

The code for Hello World in Swearjure lies here.

Swearjure Bootstrapping on Clojure 1.0

I have just written a Hello, World! program in Swearjure, including the project map needed for it. It is on nothing less than 684 lines with code1, most of it just parentheses, pluses, minuses and asterisks. You can have a look at the code in the GitHub repository: hello-swearjure.

If you haven’t read my blog post about what Swearjure is, I recommend you to do so before continuing on. This post will somewhat extend that one.

Is the Minimal Amount of Needed Alphanums 9?

Swearjure development kind of halted when we weren’t able to find any way to lookup or get functions from Clojure core, so instead of asking the question “Is Swearjure Clojure-complete without alphanumerics?”, we’ve turned the question into “What is the minimal amount of alphanumerics needed to make Swearjure Clojure-complete?”.

On Clojure 1.5.1, I managed to cut it down to 9, 7 distinct ones. I thought for a long time that this was the minimal amount of characters needed, but Technomancy commented that ^ fetches metadata in Clojure 1.0. After some playing around, I managed to get it down to just 5: The single ns-map function. So yes, the application itself would only need 5, but for Leiningen 2.1.x, both meta and ns-map are needed for project.clj.

The Magic Revealed

Contrary to most magicians, I prefer to explain how I manage to do this magic: Let’s first and foremost get an explanation of why meta/^ and ns-map is so critical for Swearjure. You see, the major issue in Swearjure is the fact that we cannot call any function with alphanumeric characters in them. However, we would usually like to do this, sometimes it’s critical to get stuff done. How would you otherwise be able to define a function, without def or defn available?

So let us repeat a trick from last post: We can convert everything which is seqable into a vector of elements we can do lookups on:

user=> `[~@{1 "one", 2 "two"}]
[[1 "one"] [2 "two"]] ;; or [[2 "two"] [1 "one"]]

This is where meta/^ and ns-map comes into play: ns-map takes a namespace and returns a map with the public symbols as keys, and their vars as values. If we can get this map, then we are able to fetch every single var defined within clojure.core. And, as perhaps most people know, when you call a var, the var just sends the call down to the value which is bound to the var:

user=> (find-ns 'clojure.core)
#<Namespace clojure.core>

user=> (ns-map *1)
{sorted-map #'clojure.core/sorted-map,
 read-line #'clojure.core/read-line,
 re-pattern #'clojure.core/re-pattern,
 cond->> #'clojure.core/cond->>, ...}

user=> `[~@*1]
[[sorted-map #'clojure.core/sorted-map]
 [read-line #'clojure.core/read-line]
 [re-pattern #'clojure.core/re-pattern]
 [cond->> #'clojure.core/cond->>] ...]

user=> (#'+ 1 2 3)
6

But where are we going to get that namespace to begin with? Well, this is where meta/^ is needed:

;; Clojure 1.5.1
user=> (meta #'+)
{:arglists ([] [x] [x y] [x y & more]),
 :ns #<Namespace clojure.core>, :name +,
 :column 1, :added "1.2", :inline-arities ...}

;; Clojure 1.0
user=> ^ #'+
{:ns #<Namespace clojure.core>, :name +, :file "clojure/core.clj",
 :line 549, :arglists ([] [x] [x y] [x y & more]), ...}

The :ns part within that map is what we’re going to fetch. By utilizing these things and combine them together, we get this (relatively deswearjurized) call:

user=> (get-in `[~@(-> `[~@^ #'+]
                        (get-in [0 1]) ns-map)]
                [x 1])
#'some-var

Where #'some-var depends upon the value of x. Now we have a way of fetching a var, but which var we fetch seems rather arbitrary.

How do we ensure that the var we fetch is the one we would like to use? Do we know beforehand what the namespace map look like? Well… If we do not alter clojure.core, we know that the map is not going to change: The implementation of maps in Clojure is deterministic, and henceforth it will always produce the same sequence. This sequence only depends on the Clojure version number, which we almost always control. Therefore, we can just find the position of the vars we need before we make the program. eval, for instance, is the 529th element (0-indexed) in Clojure 1.0. So if we replace x with 529, we get the var for eval:

user=> (get-in `[~@(-> `[~@^ #'+]
                        (get-in [0 1]) ns-map)]
                [529 1])
#'clojure.core/eval
;; Swearjurized, minus numbers. (We'll get to that.)
user=> ((`[~@(ns-map ((`[~@^ #'+](+))(*)))] 529) (*))
#'clojure.core/eval
user=> ((`[~@(ns-map ((`[~@^ #'+](+))(*)))] 345) (*))
#'clojure.core/defn

Bootstrapping Swearjure

So far, so good. We can use the anonymous reader to avoid to refer to ns-map more than once. As such, if we do something within that anonymous call which makes us able to do whatever we want, we would be very happy. And indeed there is a way: If we define a function $, which returns the nth var, we would be able to do anything! Through the use of char, str, symbol, and keyword, we can create any form we want and evaluate it with eval.

For 1.0, the bootstrap could be defined as follows:

(#(eval `(~'defn $ [-#]
           ((`[~@(ns-map ((`[~@^ #'+](+))(*)))] -#) (*)))))

;; Replace eval and defn with lookups in the namespace map
(#(((`[~@(ns-map ((`[~@^ #'+](+))(*)))] 529) (*))
     `(~((`[~@(ns-map ((`[~@^ #'+](+))(*)))] 345) (+))
       $ [-#] ((`[~@(ns-map ((`[~@^ #'+](+))(*)))] -#) (*)))))

;; Drag ns-map outside (must unquote it as well)
(#(((`[~@(% ((`[~@^ #'+](+))(*)))] 529) (*))
     `(~((`[~@(% ((`[~@^ #'+](+))(*)))] 345) (+))
       $ [-#] ((`[~@(~% ((`[~@^ #'+](+))(*)))] -#) (*))))
 ns-map)

It’s hairy and ugly, and if you replace those integers with lists representing their values, you’ll most likely have problems if you want to return it back to normal. However, now we can lookup any var using $, so we can do whatever we want now.

The Integer Representation Problem

I have explicitly left the hardest part to the end, although it at first may look like the most straightforward part: We now have a function $ which returns vars when it is given integers. Now we only have to create those integers, and all is well.

Creating integers in Swearjure was the easiest part of Swearjure up until recently: If you need the number n, just use (+ (*) ... (*)), where the number of (*)s is just equal to n. That is not the problem. The problem is the size of those numbers: When you have many of them, you end up getting a nice little message, saying:

Exception in thread "main" java.lang.ClassFormatError: Invalid method
Code length 69270 in class

You see, with very many (*)s, method sizes and method counts blow up and kill your program at compile time. Therefore, one has to do some smart tricks to avoid such issues2.

Factorization

What I initially did was a very straightforward prime factorization with an additional touch, and the implementation works as follows:

  • Define M(0) = (+), M(1) = (*) and M(2) = (+ (*) (*)).
  • For all primes p, define M(p) as the minimal solution of (+ M(p-1) (*)) and (- M(p+1) (*)).
  • For all non primes n, define the M(n) as (* M(x1) M(x2) ... M(x_m)), where x1, x2, …, x_m are the prime factors of n.

As an example, consider the number 156. Its prime factors are 2, 2, 3 and 13. M(2) is (+ (*) (*)), M(3) is (+ (+ (*) (*)) (*))3, and M(13) is (+ M(12) (*)), which in turn expands into

(+ (* (+ (*) (*)) (+ (*) (*)) (+ (+ (*) (*)) (*))) (*))

One would believe that this should do well enough, but this is actually not the case: I had to shave off a couple of characters here and there to not let project.clj crash due to its immense size.

Program Generation?

So, is there a more efficient way to create numbers? Perhaps unsurprisingly, there is. If we use anonymous function literals, we can for instance convert 64 to (#(*%%%%%%) (+ (*) (*)), which is very compact compared to many other representations of 64–though most likely not the smallest one out there. (#(*%%) (- (#(*%%% (+% (*))) (+ (*) (*))) (*))) is also an incredibly small representation of a number, this time 529 (eval).

I’ve done a lot of these manually in core.clj and stopped after a while, as it really is time-consuming. The reason I didn’t automate it from the start has to do with the fact that finding such representations programmatically requires a bit of work and a pretty good interpreter. For instance, it would be completely feasible to use a Y combinator for a recursive definition of a number, and I wouldn’t be surprised if that would be the shortest representation for some number out there.

There has been done similar work before. Dan Friedman and William Byrd have designed an interpreter for a very small language in miniKanren (which they also made, together with Oleg Kiselyov), and they are able to run it backwards to generate programs which evaluate to something they’ve already defined. I really, really recommend people to watch “miniKanren - Dan Friedman and William Byrd”, as it is both educational and fun. If your time schedule is limited, at least watch some minutes from 19:11, which is the part where they run the interpreter in reverse.

The next step for integer representation would then be to create a parser for a subset of Swearjure in clojure.core.logic. I would most definitely do “some tweaks” to get the smallest representations of numbers efficiently, as I have a slight suspicion that one has to give it hints of what branches to take first, etc.

Are We Done?

Is this is? Has the quest for a working Clojure program without any alphanumerical character ended?

By all means, no: This is just the beginning. We still have ns-map in the source code, which is 5 alphanumerical characters. Work to get that removed should be done, either by going another route than what I do (through metadata) or by looking around after shorter alternatives.

We also have the particular issue of short integer representations, which is a difficult and hard problem. Curiously, if someone manages to do this relatively efficiently, this may enable other things in real-world programs: Friedman and Byrd comment on this use case later in the talk I referred to, at 21:29.

And there are of course all sorts of other small issues: I tried to compile it down to bytecode and make it possible to run as a jar, but that seems to be harder than expected. Clojure doesn’t seem too happy about evaling namespaces during compile-time, but I assume that I can do some manual emitting if that’s needed. (I’ve temporarily left that out, as I’ve not figured that out yet.)

Currently, we have to know the Clojure version up front. Can we create a Swearjure program agnostic to the Clojure version it runs on top of? I have a feeling that it isn’t impossible, though it may make the bootstrapper a bit larger.

When those things have been figured out, we can have a look at a Clojure to Swearjure converter: Converting real usable Clojure code to equivalent Swearjure code, which may be used as production code. I’m not recommending people to use Swearjure in production code, of course, I just mention that it should be possible.

Until then, work on Swearjure is not going to stop.

  1. Most of this could be crunched down to an even lower amount through more efficient integer representations, as mentioned under “The Integer Representation Problem”. 

  2. I would think doing just (#(+ %%...%%) (*)) would solve the problem because you only call three methods here. But where’s the fun in that? 

  3. Yes, I’m aware I could have done minimal changes and gotten (+ (*) (*) (*). However, this would not give us any major benefits, and wouldn’t change the fact that we have to compress numbers even further.