Cyrus Harmon home

Iwashi -- packed arrays in clojure

27 August 2010 - Bolinas

One thing that I miss from Common Lisp in clojure is efficient support for elements of arbitrary, fixed types, such as integers within a given range. Yes, clojure does have support for arrays of booleans, bytes, chars, ints, shorts, longs, floats, doubles, and objects, but there is no explicit, efficient support for 2-, 4- and 5- bit integers. Even the 8-bit type (bytes) are really signed bytes, so there's no nice, effecient, built-in storage for quantities in the range 0-255. Why would one want 2-bit integers? Well, if you want to store the 3 billion bases of the human genome in RAM, you're going to use a lot less RAM to do so if you can pack 4 bases into each byte, rather than 1, or, worse yet, 1 base per 2-bytes if you use char-arrays.

Enter iwashi (Japanese for sardine, I think -- at least that's what they call it at the sushi counter). Iwashi provides for efficient storage of 2-, 4- and 5-bit integers in clojure arrays. There are 6 main functions that one calls to use Iwashi. Three functions for creating iwashi arrays:

And three functions for using the arrays:

To use, make sure iwashi is on the classpath and do, for instance:

(require '[iwashi.array :as array])

For example, let's make a 2-bit array with 128 items in it and the first 32 items to the item's index modulo 4:

(let [q (array/make-2-bit-array (bigint 128))]
  (doall (map #(array/set-item q % (mod % 4))
              (take 32 (iterate inc 0))))
  (map #(array/item q %)
       (take 32 (iterate inc 0))))
=> (0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3)

Now for a bigger, trivial, example, let's make a 4,000,000,000 element 2-bit array and set the first 32 values of it:

(let [q (array/make-2-bit-array (bigint 4e9))]
  (doall (map #(array/set-item q (+ (bigint 3e9) %) (mod % 4))
              (take 32 (iterate inc 0))))
  (map #(array/item q (+ (bigint 3e9) %))
       (take 32 (iterate inc 0))))
=> (0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3)

In another post, we'll load some DNA and protein sequences into packed arrays just for fun.

Enjoy.

--

blog comments powered by Disqus