I recently got my hands on an NVIDIA Jetson Nano, which NVIDIA describes as “a small, powerful computer that lets you run multiple neural networks in parallel”. In practice, it resembles a souped-up Raspberry Pi, with a quad-core ARM CPU, a 128-core Maxwell-based NVIDIA GPU, 4GiB of RAM, and a power consumption of 5W. Quite slow compared to a real computer, but fast enough that you can do interesting things with it. Some people are using them for self-driving cars or automated doorbells, but I’ll probably just make it render pretty fractals on the wall display in my office. Since I long ago exceeded my tolerance for writing GPU code by hand, the first step is of course to figure out a way to run Futhark on the device. While the Jetson does not support OpenCL, the Futhark compiler now has a CUDA backend, so it should be possible. This blog post documents how to get it working.
I’ll be assuming that you have a freshly installed Jetson Nano with a working CUDA setup, meaning that you can run nvcc on the command line and compile CUDA programs. For inexplicable reasons, NVIDIA does not set the environment variables correctly out of the box, but setting the following should take care of it:
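Assuming CUDA is installed under /usr/local/cuda (the JetPack default), additions along these lines to your .profile are a reasonable sketch:

```shell
# Hedged sketch: paths assume the JetPack default install under /usr/local/cuda.
export PATH=/usr/local/cuda/bin:$PATH
export CPATH=/usr/local/cuda/include:$CPATH
export LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

After sourcing this (or logging in again), nvcc and the CUDA headers and libraries should be found without further flags.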
You will need a root partition with at least 32GiB of space.
There are two ways of running Futhark code on the Jetson:

- Run futhark cuda on some other machine, copy the generated .c file to the Jetson, and then compile it to a binary there. Since the C code generated by the Futhark compiler is not machine-specific, it can easily be moved.
- Run an ARM build of the Futhark compiler on the Jetson itself.
I’ll cover the former option first, since it is much simpler. When you run futhark cuda foo.fut, the Futhark compiler will generate a file foo.c and compile it to a binary foo. You can then move foo.c to the Jetson and compile it there with:
$ gcc foo.c -o foo -O -std=c99 -lm -lcuda -lnvrtc
Note that if your host system does not itself support CUDA, compiling the binary foo will fail on the host. However, foo.c is still generated, so you can still copy it to the Jetson and finish compilation there. It’s not pretty, but it works. If you use futhark cuda --library, which you likely will for real use, gcc is not invoked for you, so you will not see any error at all.
The Jetson uses an ARM CPU, and Futhark binary releases are currently only available for x86-64. Hence, we’ll have to recompile the Futhark compiler from scratch. This is normally a straightforward procedure, but it is a little more tricky when using an exotic architecture (ARM) and a small machine (the Jetson). Specifically, the Futhark compiler is written in Haskell, and while the Glasgow Haskell Compiler (GHC) does support ARM, it is not a so-called “tier 1 platform”, meaning that binary releases are spotty. This looks like it will change in the future, but for now, it takes some effort to get a usable Haskell infrastructure set up on the Jetson.
Ideally, we’d cross-compile an ARM build of Futhark from a beefier machine, but cross-compiling is notoriously difficult, and I could not get it to work. Instead, we’ll compile Futhark on the Jetson itself. Futhark uses the Stack build tool, which fortunately comes compiled for ARM:
$ curl -sSL https://get.haskellstack.org/ | sh
Unfortunately, Futhark’s Stack configuration specifies GHC 8.6.5, and the newest official binary release of GHC on ARM is 8.4.2. While in theory we could use GHC 8.4.2 to compile GHC 8.6.5 on the Jetson, this would take an extremely long time. Instead, we will be using the Nix package manager, which has binary releases of recent GHCs. Installing Nix is non-invasive (we will not be using all of NixOS, which would definitely be invasive):
$ curl https://nixos.org/nix/install | sh
While this saves us from compiling GHC itself, we still have to compile a lot of Haskell, and GHC always hungers for memory. First, GHC uses too much RAM-disk space, and the default cap of 10% of physical memory is not sufficient. Edit /etc/systemd/logind.conf and raise the RuntimeDirectorySize limit. Reboot after this. If you have more systemd knowledge than I, maybe you can avoid the reboot.
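For reference, the logind option governing this cap is RuntimeDirectorySize, whose default is indeed 10% of physical RAM. A sketch of the edit (the 2G value is my own guess, not a figure from the original instructions):

```
# /etc/systemd/logind.conf
[Login]
RuntimeDirectorySize=2G
```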
RAM-wise, the Jetson’s 4GiB is not enough. Therefore, set up a 4GiB swap file:
$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
This setup is transient, meaning it’ll go away on the next reboot, but you’ll have to delete /swapfile yourself to reclaim the disk space.
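If you would rather make the swap file permanent, the standard approach is an /etc/fstab entry (a hedged aside of mine; the setup above keeps it transient):

```
# /etc/fstab — activate /swapfile on every boot
/swapfile  none  swap  sw  0  0
```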
Now clone the Futhark Git repository as usual, cd into it, and run:
$ stack --nix install --fast -j1
The --nix part tells stack to fetch GHC from Nix, rather than look for a non-existent official release. --fast disables the Haskell optimiser, which saves on time and space. -j1 limits concurrency to one job, also to limit memory usage. You may be able to bump this (e.g. to -j4) to speed up compilation. If the build crashes at some point due to an out-of-memory situation, simply reduce it to -j1 and carry on. All dependencies that were successfully built should still be available.
The build need not finish in one sitting, which is good, because this will take a long time. When it’s done, you’ll have a futhark binary located in $HOME/.local/bin. To verify that it works, try running part of the Futhark test suite:
$ futhark test --backend=cuda examples
Hopefully, it should work. Congratulations! You can now compile and run Futhark programs on the Jetson. There are no other Jetson-specific considerations that I have noticed. Unfortunately, the CUDA backend is for C, not Python, although we may implement a PyCUDA backend some day. If you want to easily show some graphics, consider Lys, which will certainly also be the topic of a future blog post.
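To go beyond the test suite, here is a minimal Futhark program of my own (not from the post) that you can compile with futhark cuda and run on the device:

```
-- dotprod.fut: computes the dot product of two vectors on the GPU.
let main (xs: []f64) (ys: []f64): f64 =
  reduce (+) 0 (map2 (*) xs ys)
```

Compile it with futhark cuda dotprod.fut, then pipe input to the resulting binary, e.g. echo [1.0,2.0,3.0] [4.0,5.0,6.0] | ./dotprod, which should print the dot product of the two vectors.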