Precompilation support in elixir_make

There is a growing number of Elixir libraries that come with functions implemented in foreign languages such as C/C++, Rust, or Zig. Compiling this foreign code isn't a big issue on a powerful machine, but on less powerful devices (limited by size, thermal, power consumption, etc.) such as a Raspberry Pi, it can take a very long time.

Also, for a Livebook (.livemd file) that uses Mix.install to install dependent NIF libraries, such as:

Mix.install([
  {:nif_lib_a, "~> 0.1"}
])

nif_lib_a will be compiled and cached based on the whole deps configuration. That means if we later add another library, nif_lib_a will have to be recompiled even if the newly added one is implemented in pure Elixir, unless nif_lib_a explicitly uses a global location to cache itself.

Mix.install([
  # nif_lib_a will be compiled again
  # even if the newly added library is
  # implemented in pure elixir
  {:nif_lib_a, "~> 0.1"},
  {:pure_elixir_lib_b, "~> 0.1"}
])

There are some other reasons one might want to use precompiled artefacts besides the above:

  • when the compilation toolchain is not available (e.g., running Livebook on some embedded devices, or running in a Nerves environment, where a C compiler is not shipped);
  • so that a working C/C++, Rust, or Zig compiler is not a strict requirement for end users;
  • to save compilation time.

Although it's possible to completely leave the task of reusing compiled artefacts to each NIF library, there should be a way to make it at least slightly easier.

Therefore I'm exploring adding a behaviour to elixir_make to support using precompiled artefacts. And here is the Mix.Tasks.ElixirMake.Precompile behaviour (elixir-lang/elixir_make#55).

It's a behaviour because this allows elixir_make to use different precompilers for different NIF libraries to suit their needs. It also allows NIF library developers to change their preferred precompiler module. And lastly, as you might've guessed, anyone can write their own precompiler module if no existing precompiler quite fits their compilation pipeline.

If you'd like to write a precompiler module yourself, there are 7 required and 2 optional callbacks. I recommend implementing them in the following order:

  • all_supported_targets/0 and current_target/0.
  @typedoc """
  Target triplets
  """
  @type target :: String.t()

  @doc """
  This callback should return a list of triplets ("arch-os-abi") for all supported targets.
  """
  @callback all_supported_targets() :: [target]

  @doc """
  This callback should return the target triplet for the current node.
  """
  @callback current_target() :: {:ok, target} | {:error, String.t()}

As their names suggest, the precompiler should return the identifiers of all supported targets and the identifier of the current target. Usually, the identifier is the arch-os-abi triplet, but that is not a strict requirement because these identifiers are only used within the same precompiler module.

Note that it is possible for the precompiler module to pick up other environment variables like TARGET_ARCH and use them to override the value of the current target.
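To make this concrete, here is a minimal sketch of a `current_target/0` implementation. The module name and the fallback values are hypothetical: it derives an arch-os-abi triplet from the running VM and lets environment variables such as `TARGET_ARCH`, `TARGET_OS`, and `TARGET_ABI` override the detected values.

```elixir
defmodule MyPrecompiler.Target do
  # A sketch of `current_target/0`: derive an "arch-os-abi" triplet
  # from the running VM, honouring environment-variable overrides.
  def current_target do
    {arch, os, abi} = detect()

    triplet =
      [
        System.get_env("TARGET_ARCH", arch),
        System.get_env("TARGET_OS", os),
        System.get_env("TARGET_ABI", abi)
      ]
      |> Enum.join("-")

    {:ok, triplet}
  end

  defp detect do
    # :erlang.system_info(:system_architecture) returns a charlist
    # such as ~c"x86_64-pc-linux-gnu" or ~c"aarch64-apple-darwin21.1.0"
    case :erlang.system_info(:system_architecture)
         |> List.to_string()
         |> String.split("-") do
      [arch, _vendor, os, abi] -> {arch, os, abi}
      [arch, _vendor, os] -> {arch, os, "unknown"}
      _ -> {"unknown", "unknown", "unknown"}
    end
  end
end
```

Running `TARGET_ARCH=aarch64 mix compile` would then yield a triplet that starts with `aarch64` regardless of the host architecture.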

  • build_native/1.
  @doc """
  This callback will be invoked when the user executes the `mix compile`
  (or `mix compile.elixir_make`) command.
  The precompiler should then compile the NIF library "natively". Note that
  it is possible for the precompiler module to pick up other environment variables
  like `TARGET_ARCH=aarch64` and adjust compile arguments correspondingly.
  """
  @callback build_native(OptionParser.argv()) :: :ok | {:ok, []} | no_return

This callback corresponds to the mix compile command, where the precompiler should compile the NIF library for the current target.

After implementing this one, you can try to compile the NIF library natively with the mix compile command.
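As a rough illustration (not elixir_make's actual implementation), a `build_native/1` callback might look like the following; the module name and the `IO.puts` standing in for the real `make` invocation are assumptions:

```elixir
defmodule BuildNativeSketch do
  # A hypothetical `build_native/1`: detect the host architecture,
  # let TARGET_ARCH override it, and hand off to the build step.
  def build_native(_args) do
    detected =
      :erlang.system_info(:system_architecture)
      |> List.to_string()
      |> String.split("-")
      |> hd()

    arch = System.get_env("TARGET_ARCH", detected)

    # a real precompiler would invoke `make` (or similar) here,
    # passing arch-specific compiler flags
    IO.puts("compiling natively for #{arch}")
    :ok
  end
end
```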

  • precompile/2.
  @typedoc """
  A map that contains detailed info of a precompiled artefact.
  - `:path`, path to the archived build artefact.
  - `:checksum_algo`, name of the checksum algorithm.
  - `:checksum`, the checksum of the archived build artefact using `:checksum_algo`.
  """
  @type precompiled_artefact_detail :: %{
    :path => String.t(),
    :checksum => String.t(),
    :checksum_algo => atom
  }

  @typedoc """
  A tuple that indicates the target and the corresponding precompiled artefact detail info.
  `{target, precompiled_artefact_detail}`.
  """
  @type precompiled_artefact :: {target, precompiled_artefact_detail}

  @doc """
  This callback should precompile the library to the given target(s).
  Returns a list of `{target, archived_artefact}` tuples if successfully compiled.
  """
  @callback precompile(OptionParser.argv(), [target]) :: {:ok, [precompiled_artefact]} | no_return

There are two arguments passed to this callback. The first one is the command line arguments; for example, its value will be [arg1, arg2] if one executes mix elixir_make.precompile arg1 arg2.

The second argument passed to the callback is a list of target identifiers. The precompiler should compile the NIF library for all of these targets. Note that this list of targets is always a subset of the result returned by all_supported_targets/0, but not necessarily the same list. This is designed to avoid a foreseeable breaking change for precompiler modules if we later add a mechanism to filter out some targets.

After implementing this one, you can try the mix elixir_make.precompile command to compile for all targets.
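Here is a hypothetical sketch of `precompile/2`. The `compile_for/1` helper only writes a placeholder file; a real precompiler would cross-compile and archive the actual build products instead:

```elixir
defmodule PrecompileSketch do
  # A sketch of `precompile/2`: build for each requested target,
  # then return each artefact's path and checksum.
  def precompile(_args, targets) do
    artefacts =
      for target <- targets do
        path = compile_for(target)
        checksum = file_sha256(path)
        {target, %{path: path, checksum: checksum, checksum_algo: :sha256}}
      end

    {:ok, artefacts}
  end

  # Stand-in for a real cross-compilation + archiving step.
  defp compile_for(target) do
    path = Path.join(System.tmp_dir!(), "mynif-#{target}.so")
    File.write!(path, "placeholder shared library for #{target}")
    path
  end

  defp file_sha256(path) do
    :crypto.hash(:sha256, File.read!(path)) |> Base.encode16(case: :lower)
  end
end
```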

  • post_precompile/1.
  @doc """
  This optional callback will be invoked when all precompilation tasks are done,
  i.e., it will only be called at the end of the `mix elixir_make.precompile`
  command.
  Post actions to run after all precompilation tasks are done. For example,
  actions can be archiving all precompiled artefacts and uploading the archive
  file to an object storage server.
  """
  @callback post_precompile(context :: term()) :: :ok

This is an optional callback where the precompiler module can do something after precompile/2. The argument passed to this callback is the precompiler module context.

  • precompiler_context/1.
  @doc """
  This optional callback is designed to store the precompiler's state or context.
  The returned value will be used in the `download_or_reuse_nif_file/1` and
  `post_precompile/1` callback.
  """
  @callback precompiler_context(OptionParser.argv()) :: term()

This is an optional callback that returns a custom value, (perhaps) based on the list of command line arguments passed to it. The returned value may contain anything needed by post_precompile/1 and download_or_reuse_nif_file/1.

For example, a simple HTTP username and password can be passed as command line arguments: mix elixir_make.precompile --auth USERNAME:PASSWD or mix elixir_make.fetch --all --user hello --pass world.
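A minimal `precompiler_context/1` could parse such credentials with `OptionParser`; the module name and the option names below are illustrative:

```elixir
defmodule ContextSketch do
  # A sketch of `precompiler_context/1`: parse credentials from the
  # command line so later callbacks can use them.
  def precompiler_context(args) do
    {opts, _rest, _invalid} =
      OptionParser.parse(args, strict: [user: :string, pass: :string, all: :boolean])

    %{user: opts[:user], pass: opts[:pass], fetch_all?: opts[:all] || false}
  end
end
```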

  • available_nif_urls/0 and current_target_nif_url/0.
  @doc """
  This callback will be invoked when the user executes the following commands:
  - `mix elixir_make.fetch --all`
  - `mix elixir_make.fetch --all --print`
  The precompiler module should return all available URLs to precompiled artefacts
  of the NIF library.
  """
  @callback available_nif_urls() :: [String.t()]

  @doc """
  This callback will be invoked when the user executes the following commands:
  - `mix elixir_make.fetch --only-local`
  - `mix elixir_make.fetch --only-local --print`
  The precompiler module should return the URL to a precompiled artefact of
  the NIF library for the current target (the "native" host).
  """
  @callback current_target_nif_url() :: String.t()

The first one should return a list of URLs to the precompiled artefacts of all available targets, and the second one should return a single URL for the current target.
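As an illustration, these two callbacks could be derived from a base release URL and the supported target list. The base URL, the artefact naming scheme, and the hard-coded current target below are all assumptions:

```elixir
defmodule UrlSketch do
  # A sketch of `available_nif_urls/0` and `current_target_nif_url/0`.
  @base_url "https://example.com/releases/v0.1.0"
  @targets ["x86_64-linux-gnu", "aarch64-linux-gnu", "aarch64-apple-darwin"]

  def available_nif_urls do
    # one archive per supported target
    Enum.map(@targets, &"#{@base_url}/mynif-#{&1}.tar.gz")
  end

  def current_target_nif_url do
    # for brevity, pretend the current target is the first one;
    # a real module would call its `current_target/0`
    hd(available_nif_urls())
  end
end
```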

  • download_or_reuse_nif_file/1.
  @doc """
  This callback will be invoked when the NIF library is trying to load functions
  from its shared library.
  The precompiler should download or reuse the NIF file for the current target.
  ## Parameters
    - `context`: Precompiler context returned by the `precompiler_context/1` callback.
  """
  @callback download_or_reuse_nif_file(context :: term()) :: :ok | {:error, String.t()} | no_return

The precompiler module should either download the precompiled artefacts or reuse local caches for the current target. The argument passed to this callback is the precompiler context.
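A sketch of this callback, assuming the context carries a cache path, an expected checksum, and a download URL: reuse the cached file when its checksum matches, otherwise fall back to downloading (stubbed out here, since a real implementation would fetch the URL, e.g. with :httpc, and verify the checksum again):

```elixir
defmodule FetchSketch do
  # A hypothetical `download_or_reuse_nif_file/1`.
  def download_or_reuse_nif_file(%{cache_path: path, checksum: expected} = context) do
    if File.exists?(path) and file_sha256(path) == expected do
      # local cache is intact; reuse it
      :ok
    else
      download(context)
    end
  end

  # Stub: a real implementation would fetch `url`, write it to `path`,
  # and verify the checksum of the downloaded file.
  defp download(%{url: url, cache_path: path}) do
    {:error, "would download #{url} to #{path}"}
  end

  defp file_sha256(path) do
    :crypto.hash(:sha256, File.read!(path)) |> Base.encode16(case: :lower)
  end
end
```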

Below is the Mix.Tasks.ElixirMake.Precompile behaviour (as of 04 Aug 2022). And here is a complete demo precompiler, cocoa-xu/cc_precompiler, and an example project that uses this demo precompiler, cocoa-xu/cc_precompiler_example.

defmodule Mix.Tasks.ElixirMake.Precompile do
  use Mix.Task

  @typedoc """
  Target triplets
  """
  @type target :: String.t()

  @doc """
  This callback should return a list of triplets ("arch-os-abi") for all supported targets.
  """
  @callback all_supported_targets() :: [target]

  @doc """
  This callback should return the target triplet for the current node.
  """
  @callback current_target() :: {:ok, target} | {:error, String.t()}

  @doc """
  This callback will be invoked when the user executes the `mix compile`
  (or `mix compile.elixir_make`) command.
  The precompiler should then compile the NIF library "natively". Note that
  it is possible for the precompiler module to pick up other environment variables
  like `TARGET_ARCH=aarch64` and adjust compile arguments correspondingly.
  """
  @callback build_native(OptionParser.argv()) :: :ok | {:ok, []} | no_return

  @typedoc """
  A map that contains detailed info of a precompiled artefact.
  - `:path`, path to the archived build artefact.
  - `:checksum_algo`, name of the checksum algorithm.
  - `:checksum`, the checksum of the archived build artefact using `:checksum_algo`.
  """
  @type precompiled_artefact_detail :: %{
    :path => String.t(),
    :checksum => String.t(),
    :checksum_algo => atom
  }

  @typedoc """
  A tuple that indicates the target and the corresponding precompiled artefact detail info.
  `{target, precompiled_artefact_detail}`.
  """
  @type precompiled_artefact :: {target, precompiled_artefact_detail}

  @doc """
  This callback should precompile the library to the given target(s).
  Returns a list of `{target, archived_artefact}` tuples if successfully compiled.
  """
  @callback precompile(OptionParser.argv(), [target]) :: {:ok, [precompiled_artefact]} | no_return

  @doc """
  This callback will be invoked when the NIF library is trying to load functions
  from its shared library.
  The precompiler should download or reuse the NIF file for the current target.
  ## Parameters
    - `context`: Precompiler context returned by the `precompiler_context/1` callback.
  """
  @callback download_or_reuse_nif_file(context :: term()) :: :ok | {:error, String.t()} | no_return

  @doc """
  This callback will be invoked when the user executes the following commands:
  - `mix elixir_make.fetch --all`
  - `mix elixir_make.fetch --all --print`
  The precompiler module should return all available URLs to precompiled artefacts
  of the NIF library.
  """
  @callback available_nif_urls() :: [String.t()]

  @doc """
  This callback will be invoked when the user executes the following commands:
  - `mix elixir_make.fetch --only-local`
  - `mix elixir_make.fetch --only-local --print`
  The precompiler module should return the URL to a precompiled artefact of
  the NIF library for the current target (the "native" host).
  """
  @callback current_target_nif_url() :: String.t()

  @doc """
  This optional callback is designed to store the precompiler's state or context.
  The returned value will be used in the `download_or_reuse_nif_file/1` and
  `post_precompile/1` callback.
  """
  @callback precompiler_context(OptionParser.argv()) :: term()

  @doc """
  This optional callback will be invoked when all precompilation tasks are done,
  i.e., it will only be called at the end of the `mix elixir_make.precompile`
  command.
  Post actions to run after all precompilation tasks are done. For example,
  actions can be archiving all precompiled artefacts and uploading the archive
  file to an object storage server.
  """
  @callback post_precompile(context :: term()) :: :ok

  @optional_callbacks precompiler_context: 1, post_precompile: 1
end

Raspberry Pi Bluetooth Gamepad Setup

Setting up a Bluetooth gamepad on a Raspberry Pi may not be as straightforward as some might expect. This post documents how I set up a Bluetooth gamepad on a Raspberry Pi, and I hope it helps if you run into any difficulties doing so.

Specifically, the gamepad I'm using is an Xbox controller. If you are using a gamepad from another brand, there is a good chance this post still works for you, but as I only have an Xbox controller, your mileage may vary.

1. Change Bluetooth Settings

The first thing to look at is /etc/bluetooth/main.conf. In my setup, I changed the following option values. The first five were in the General section and the last one was in the Policy section of this .conf file.

[General]
Class = 0x000100
ControllerMode = dual
FastConnectable = true
Privacy = device
JustWorksRepairing = always

[Policy]
AutoEnable=true

Then we can restart the bluetooth service or restart the pi.

sudo systemctl restart bluetooth
# or restart the pi
# sudo reboot

2. (Optional) Test

To test if everything works, we can use bluetoothctl to manually pair and connect to the joystick.

$ bluetoothctl
[bluetooth]# scan on
Discovery started
[NEW] Device AA:BB:CC:DD:EE:FF Controller Name
[bluetooth]# pair AA:BB:CC:DD:EE:FF
[bluetooth]# trust AA:BB:CC:DD:EE:FF
[bluetooth]# connect AA:BB:CC:DD:EE:FF

If there are still issues connecting the gamepad, you can try disabling the Bluetooth ERTM feature.

cat <<EOF | sudo tee /etc/modprobe.d/bluetooth.conf
options bluetooth disable_ertm=1
EOF
sudo reboot

3. (Optional) Auto-connect Script

We can set up an auto-connect script that tests whether the input device exists and tries to connect to the gamepad if the specified device is not present.

#!/bin/sh
JS="$1"
ADDRESS="$2"
usage() {
	S="$(basename "$0")"
	echo "usage:  ${S} [input_name] [address]"
	echo "        ${S} js0 AA:BB:CC:DD:EE:FF"
	exit 1
}
if [ -z "${JS}" ] || [ -z "${ADDRESS}" ]; then
	usage
fi

if [ -e "/dev/input/${JS}" ]; then
	echo "${JS} is connected"
else
	echo "try to connect ${ADDRESS}"
	echo "connect ${ADDRESS}" | bluetoothctl
fi

A cron task can be set up to execute this script every minute. The following entry auto-connects to a gamepad at AA:BB:CC:DD:EE:FF, assuming only one gamepad is connected to the Raspberry Pi (hence testing /dev/input/js0).

* * * * * /usr/bin/autoconnect_joystick js0 AA:BB:CC:DD:EE:FF

Numerical Elixir Benchmark: CIFAR10 with 3-Layer DenseNN

TLDR:

  1. Use C libraries (via NIFs) for matrix computation when performance is the top priority; otherwise, pure Elixir is about $10^3$ times slower at matrix computation.
  2. OTP 25 introduces a JIT on ARM64, which shows a 3-4% performance improvement in matrix computation.
  3. Almost linear speedup can be achieved when a large computation task can be divided into independent smaller ones.
  4. Apple M1 Max performs much better than its x86_64 competitors (Intel Core i9 8950HK and AMD Ryzen 9 3900XT).

Benchmark code here: https://github.com/cocoa-xu/CIFAR-10-livebook

Numerical Elixir

I started to use Elixir/Erlang about 2 months ago, and I learned the existence of Numerical Elixir (Nx) from my supervisor, Lito.

Basically, Nx is to Elixir what NumPy is to Python. It implements a number of numerical operations, especially for multi-dimensional arrays. It's worth noting that Nx comes with built-in automatic differentiation, which means we don't have to write the corresponding derivative functions for back-propagation when training a neural network.

I explored Nx and wrote some benchmarks to evaluate its performance on different hardware (Raspberry Pi 4, x86_64 laptops and desktops, ARM64 laptops) and under different conditions (calls to external C libraries vs. a pure Elixir implementation). And here I finally have some numbers!

P.S. The goal of this benchmark is only to evaluate matrix computation performance, not to reach a decent (or even acceptable) CIFAR-10 prediction accuracy.

Benchmark Settings

Hardware

  • Raspberry Pi 4, 8 GB of RAM. Ubuntu 20.04 aarch64.
  • x86_64 laptop. Intel 8th Gen Core i9 8950HK, 6 Cores 12 Threads, MacBook Pro (15-inch, 2018), 32 GB RAM. macOS Big Sur 11.1 x86_64.
  • x86_64 desktop. AMD Ryzen 9 3900XT, 12 Cores 24 Threads, Desktop PC, 64 GB RAM, NVIDIA RTX 3090. Ubuntu 20.04 x86_64.
  • ARM64 laptop. Apple M1 Max, 10 Cores (8 Performance + 2 Efficiency) 10 Threads, MacBook Pro (14-inch, 2021), 64 GB RAM. macOS Monterey 12.0.1 aarch64.

Software

Dataset

CIFAR-10 binary version.

Method

  • 3-layer DenseNN.
    1. Input layer. Dense layer, size {nil, 1024, 64} + {nil, 64}, activation sigmoid.
    2. Hidden layer. Dense layer, size {nil, 64, 32} + {nil, 32}, activation sigmoid.
    3. Output layer. Dense layer, size {nil, 32, 10} + {nil, 10}, activation softmax.
  • Number of epochs: 5.
  • Batch size.
    • 300 when using Nx.BinaryBackend, single-thread
    • 250 * n_jobs when using Nx.BinaryBackend, multi-thread. n_jobs will be the number of available logical cores.
    • 300 when using Torchx.Backend.
  • Binary.
Benchmark.run(
  backend: Nx.BinaryBackend,
  batch_size: 300,
  n_jobs: 1
)
  • Binary MT.
Benchmark.run(
  backend: Nx.BinaryBackend,
  batch_size: 250 * System.schedulers_online(),
  n_jobs: System.schedulers_online()
)
  • Torch CPU/GPU.
Benchmark.run(backend: Torchx.Backend, batch_size: 300)
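The "Binary MT" configuration relies on splitting each batch into independent chunks across logical cores. A minimal sketch of that idea using plain lists (the real benchmark divides Nx tensors, not lists):

```elixir
# Square the numbers 1..1_000, dividing the work across all logical cores.
batch = Enum.to_list(1..1_000)
n_jobs = System.schedulers_online()

results =
  batch
  |> Enum.chunk_every(div(length(batch), n_jobs) + 1)
  |> Task.async_stream(fn chunk -> Enum.map(chunk, &(&1 * &1)) end,
    max_concurrency: n_jobs
  )
  |> Enum.flat_map(fn {:ok, part} -> part end)
```

Task.async_stream/3 preserves input order, so `results` comes back as the squares of 1..1_000 in order; this independence between chunks is why the multi-threaded runs scale almost linearly.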

Benchmark Results

Numbers are in seconds.

I'll fill in the empty cells when the rest of the benchmarks are done.

| Hardware | Backend   | OTP | Load Dataset | To Batched Input | Mean Epoch Time |
|----------|-----------|-----|--------------|------------------|-----------------|
| Pi 4     | Binary    | 24  |              |                  |                 |
| Pi 4     | Binary MT | 24  |              |                  |                 |
| Pi 4     | Binary    | 25  | 194.427      | 11.917           | 27336.010       |
| Pi 4     | Binary MT | 25  | 207.923      | 11.855           | 18210.347       |
| Pi 4     | Torch CPU | 24  | 15.334       | 4.880            | 17.170          |
| Pi 4     | Torch CPU | 25  | 16.372       | 4.442            | 16.207          |
| 8950HK   | Binary    | 24  | 17.994       | 3.036            | 4460.758        |
| 8950HK   | Binary MT | 24  | 17.826       | 2.934            | 1471.090        |
| 8950HK   | Torch CPU | 24  | 2.141        | 0.778            | 0.841           |
| 3900XT   | Binary    | 24  | 6.058        | 2.391            | 3670.930        |
| 3900XT   | Binary MT | 24  | 6.034        | 2.536            | 786.443         |
| 3900XT   | Torch CPU | 24  | 1.653        | 0.617            | 0.770           |
| 3900XT   | Torch GPU | 24  | 1.630        | 0.652            | 0.564           |
| M1 Max   | Binary    | 24  | 11.090       | 2.135            | 3003.321        |
| M1 Max   | Binary MT | 24  | 10.925       | 2.154            | 453.536         |
| M1 Max   | Binary    | 25  | 9.458        | 1.548            | 3257.853        |
| M1 Max   | Binary MT | 25  | 9.949       | 1.527            | 436.385         |
| M1 Max   | Torch CPU | 24  | 1.702        | 1.900            | 0.803           |
| M1 Max   | Torch CPU | 25  | 1.599        | 0.745            | 0.773           |