Takodachi! Advent Of Code 2021, Day 11

Towards world domINAtion!

#!/usr/bin/env python3

import numpy as np
from functools import reduce

input = """1224346384
5621128587
6388426546
1556247756
1451811573
1832388122
2748545647
2582877432
3185643871
2224876627"""


def flash(iter, takodachi, count, all_flash, flash_threshold=10):
    takodachi += 1
    width, height = takodachi.shape
    old = set()
    while True:
        flash_mask = np.where(np.greater_equal(takodachi, flash_threshold))
        d = list(set([(flash_mask[0][i], flash_mask[1][i]) for i in range(len(flash_mask[0]))]).difference(old))
        if len(d) > 0:
            mask = (np.array([d[i][0] for i in range(len(d))]), np.array([d[i][1] for i in range(len(d))]))
            old = old.union(set([(mask[0][i], mask[1][i]) for i in range(len(mask[0]))]))
            adjacents = []
            for x in [-1, 0, 1]:
                for y in [-1, 0, 1]:
                    if x != 0 or y != 0:
                        mx = mask[0] + x
                        my = mask[1] + y
                        # axis 0 is bounded by `width`, axis 1 by `height` (see the shape unpacking above)
                        keep = np.where((mx >= 0) & (mx < width) & (my >= 0) & (my < height))[0]
                        adjacents.append((mx[keep], my[keep]))
            for adjacents_mask in adjacents:
                takodachi[adjacents_mask] += 1
        else:
            break

    d = list(old)
    flashes = len(d)
    mask = (np.array([d[i][0] for i in range(flashes)]), np.array([d[i][1] for i in range(flashes)]))
    if flashes > 0:
        takodachi[mask] = 0
        if flashes == width * height:
            all_flash.append(iter + 1)
    return takodachi, flashes + count, all_flash


if __name__ == '__main__':
    width = len(input.split("\n", 1)[0])
    n = np.array([c for c in input.replace('\n', '')], dtype=int)
    tako = n.reshape(width, -1)
    max_step = 500
    tako, flashes, all_flashes = reduce(lambda acc, iter: flash(iter, acc[0], acc[1], acc[2]), np.arange(max_step), (tako, 0, []))
    print(tako, flashes, all_flashes[0])
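
The script above bumps neighbours one direction at a time, which keeps every index within a single `takodachi[adjacents_mask] += 1` unique; NumPy applies `+=` only once per repeated index in a fancy-indexed assignment. As a rough sketch of an alternative (not the method used above), `np.add.at` accumulates repeated indices correctly. The function name `step` and the 5x5 grid are illustrative assumptions, not part of the original script.

```python
import numpy as np


def step(grid, flash_threshold=10):
    """Advance the grid by one step in place; return how many cells flashed.

    `step` is a hypothetical helper written for this sketch only.
    """
    grid += 1
    n_rows, n_cols = grid.shape
    flashed = np.zeros(grid.shape, dtype=bool)
    while True:
        # Cells at or above the threshold that have not flashed yet this step.
        new = (grid >= flash_threshold) & ~flashed
        if not new.any():
            break
        flashed |= new
        rows, cols = np.nonzero(new)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                r, c = rows + dr, cols + dc
                keep = (r >= 0) & (r < n_rows) & (c >= 0) & (c < n_cols)
                # np.add.at is unbuffered, so repeated indices all count.
                np.add.at(grid, (r[keep], c[keep]), 1)
    # Every cell that flashed during this step resets to zero.
    grid[flashed] = 0
    return int(flashed.sum())


# Small illustrative grid: a ring of 9s around a 1.
rows_text = ["11111", "19991", "19191", "19991", "11111"]
small = np.array([[int(ch) for ch in row] for row in rows_text])
first = step(small)
print(first)
print(small)
```

On this grid the eight 9s flash first and push the centre cell over the threshold, so the first step produces nine flashes in total.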

ARM64 CPU Compilation Test – Season 1

Takeaways: (1) The M1 Max is powerful, worth its price, and outperforms some more expensive ARM64 servers. (2) The Always Free tier on Oracle Cloud provides ARM64 servers with decent performance.

Just a quick compilation test (building GCC 11.2) among Apple M1 Max, AWS c6g.2xlarge, AWS c6g.metal and Oracle Ampere (VM.Standard.A1.Flex).

Hardware-wise, I'm using a 14-inch MacBook Pro with M1 Max (10 cores: 8 performance cores + 2 efficiency cores). The build runs in an ARM64 Docker container using the arm64v8/ubuntu image. The Docker engine can use 8 cores and 14 GB of RAM. Note that allocating 8 cores to the Docker engine does not guarantee that they are all performance cores: core scheduling is handled by macOS, and recent macOS versions offer no core pinning.

The AWS c6g.2xlarge uses the stock hardware configuration: 8 cores and 16 GB of RAM. The system image on the c6g.2xlarge machine is also Ubuntu 20.04.

As for the Oracle Ampere (VM.Standard.A1.Flex), I tested three configurations:

  1. 4 CPUs, 24 GB of RAM
  2. 8 CPUs, 48 GB of RAM
  3. 16 CPUs, 96 GB of RAM

The first configuration is eligible for the Oracle Always Free tier, while the second matches the core count of the M1 Max and the AWS c6g.2xlarge. The last is the largest spec available by default (the quota can be raised by upgrading to a paid account). The OS image on these configurations is Ubuntu 20.04 as well (image build 2021.10.15-0).

The M1 Max completed the compilation in ~28 minutes, while the AWS c6g.2xlarge and c6g.metal took ~45 minutes and ~24 minutes respectively. The Oracle Ampere machines finished in ~68 minutes (4c), ~42 minutes (8c) and ~31 minutes (16c). The precise results are shown in the table below.

Machine           Cores         RAM      Cost                     Compile time (seconds)
MBP 14", M1 Max   8 @ ~3 GHz    14 GB    One time, ≥ $2,499.00    1697.344
AWS c6g.2xlarge   8 @ 2.5 GHz   16 GB    $0.272/hr (~$204/m)      2736.556
AWS c6g.metal     64 @ 2.5 GHz  128 GB   $2.176/hr (~$1632/m)     1448.384
Oracle Ampere     4 @ 3 GHz     24 GB    Free Tier                4109.323
Oracle Ampere     8 @ 3 GHz     48 GB    $0.08/hr (~$30/m)        2569.361
Oracle Ampere     16 @ 3 GHz    96 GB    $0.16/hr (~$89/m)        1906.699

GCC 11.2 compilation time on different machines.

As we can see, the M1 Max took about 37.98% less time than the c6g.2xlarge machine (1697.344 s versus 2736.556 s).
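
The comparison is plain arithmetic on the compile times in the table. The sketch below reproduces the 37.98% figure and prints the same comparison for the other machines. The helper names `time_reduction` and `speedup` are made up for illustration; only the timings come from the table.

```python
# Compile times in seconds, copied from the table above.
compile_seconds = {
    "MBP 14-inch, M1 Max (8c)": 1697.344,
    "AWS c6g.2xlarge (8c)": 2736.556,
    "AWS c6g.metal (64c)": 1448.384,
    "Oracle Ampere (4c)": 4109.323,
    "Oracle Ampere (8c)": 2569.361,
    "Oracle Ampere (16c)": 1906.699,
}


def time_reduction(candidate, baseline):
    """Fraction of the baseline's wall time that the candidate saves."""
    return (baseline - candidate) / baseline


def speedup(candidate, baseline):
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


baseline = compile_seconds["AWS c6g.2xlarge (8c)"]
m1_reduction = time_reduction(compile_seconds["MBP 14-inch, M1 Max (8c)"], baseline)

for machine, seconds in compile_seconds.items():
    print(
        f"{machine:26s} {seconds / 60:6.1f} min  "
        f"{time_reduction(seconds, baseline) * 100:7.2f}% less time, "
        f"{speedup(seconds, baseline):.2f}x vs c6g.2xlarge"
    )
```

Note that "37.98% less time" corresponds to a speedup of roughly 1.61x. The two ways of saying "faster" are not interchangeable.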

M1 Max (8c) completed in ~28 minutes
AWS c6g.2xlarge finished in ~45 minutes
AWS c6g.metal finished in ~24 minutes
Oracle Ampere (VM.Standard.A1.Flex) 16 cores, GCC 11.2 compiled in ~31 minutes.
Oracle Ampere (VM.Standard.A1.Flex) 8 cores, GCC 11.2 compiled in ~42 minutes.
Oracle Ampere (VM.Standard.A1.Flex) 4 cores, GCC 11.2 compiled in ~68 minutes.

The test script used is shown below:

#!/bin/bash

export GCC_VER=11.2.0
export GCC_SUFFIX=11.2

export sudo="$(which sudo)"
$sudo apt-get update -y
$sudo apt-get install -y make build-essential wget zlib1g-dev
wget "https://ftpmirror.gnu.org/gcc/gcc-${GCC_VER}/gcc-${GCC_VER}.tar.xz" \
  -O "gcc-${GCC_VER}.tar.xz"
tar xf "gcc-${GCC_VER}.tar.xz"
cd "gcc-${GCC_VER}"
contrib/download_prerequisites
cd .. && mkdir build && cd build

../gcc-${GCC_VER}/configure -v \
  --build=aarch64-linux-gnu \
  --host=aarch64-linux-gnu \
  --target=aarch64-linux-gnu \
  --prefix=/usr/local \
  --enable-checking=release \
  --enable-languages=c,c++,go,d,fortran,objc,obj-c++ \
  --disable-multilib \
  --program-suffix=-${GCC_SUFFIX} \
  --enable-threads=posix \
  --enable-nls \
  --enable-clocale=gnu \
  --enable-libstdcxx-debug \
  --enable-libstdcxx-time=yes \
  --with-default-libstdcxx-abi=new \
  --enable-gnu-unique-object \
  --disable-libquadmath \
  --disable-libquadmath-support \
  --enable-plugin \
  --enable-default-pie \
  --with-system-zlib \
  --with-target-system-zlib=auto \
  --enable-multiarch \
  --enable-fix-cortex-a53-843419 \
  --disable-werror

time make -j"$(nproc)"

Cocoa's Linux Package Repo

# at least you should upgrade the ca-certificates package
# to get the latest ssl root certificates
sudo apt update && sudo apt install -y ca-certificates gnupg2 curl

# add key
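curl -fsSL https://repo.uwucocoa.moe/pgp.key | gpg --dearmor | \
    sudo tee /usr/share/keyrings/uwucocoa-archive-keyring.gpg > /dev/null

# add source for arm64
# (keys under /usr/share/keyrings are only trusted when referenced via signed-by)
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/uwucocoa-archive-keyring.gpg] https://repo.uwucocoa.moe/ stable main" | \
  sudo tee /etc/apt/sources.list.d/uwucocoa.list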

# update caches
sudo apt update
# all packages from this repo have a 'uwu' suffix
sudo apt-cache search uwu

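# the repo also has some packages for amd64 (x86_64), armhf (armv7), s390x, ppc64el and riscv64
# run only the ONE command matching your architecture: each overwrites uwucocoa.list
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/uwucocoa-archive-keyring.gpg] https://repo.uwucocoa.moe/ stable main" | \
  sudo tee /etc/apt/sources.list.d/uwucocoa.list
echo "deb [arch=armhf signed-by=/usr/share/keyrings/uwucocoa-archive-keyring.gpg] https://repo.uwucocoa.moe/ stable main" | \
  sudo tee /etc/apt/sources.list.d/uwucocoa.list
echo "deb [arch=s390x signed-by=/usr/share/keyrings/uwucocoa-archive-keyring.gpg] https://repo.uwucocoa.moe/ stable main" | \
  sudo tee /etc/apt/sources.list.d/uwucocoa.list
echo "deb [arch=ppc64el signed-by=/usr/share/keyrings/uwucocoa-archive-keyring.gpg] https://repo.uwucocoa.moe/ stable main" | \
  sudo tee /etc/apt/sources.list.d/uwucocoa.list
echo "deb [arch=riscv64 signed-by=/usr/share/keyrings/uwucocoa-archive-keyring.gpg] https://repo.uwucocoa.moe/ stable main" | \
  sudo tee /etc/apt/sources.list.d/uwucocoa.list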

Available packages can be viewed at https://repo.uwucocoa.moe/pool/main/, although only a few packages are available for armhf, s390x and ppc64el.
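
Since the source line differs only in its `arch=` value, it can be generated rather than copied. The following is a minimal sketch under stated assumptions: `source_line` and `detect_arch` are hypothetical helpers, and the optional `keyring` argument adds a `signed-by=` option pointing at wherever the key was stored. `dpkg --print-architecture` is the standard way to ask a Debian-based system for its architecture.

```python
import subprocess

# Architectures listed in the instructions above.
SUPPORTED_ARCHES = ("arm64", "amd64", "armhf", "s390x", "ppc64el", "riscv64")
REPO_URL = "https://repo.uwucocoa.moe/"


def source_line(arch, keyring=None):
    """Build the apt source line for one architecture (hypothetical helper)."""
    if arch not in SUPPORTED_ARCHES:
        raise ValueError(f"no packages published for architecture {arch!r}")
    options = [f"arch={arch}"]
    if keyring is not None:
        # Assumption: the caller passes the path of the dearmored key file.
        options.append(f"signed-by={keyring}")
    return f"deb [{' '.join(options)}] {REPO_URL} stable main"


def detect_arch():
    """Ask dpkg for the native architecture (Debian/Ubuntu systems only)."""
    result = subprocess.run(
        ["dpkg", "--print-architecture"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


print(source_line("arm64"))
```

On a Debian or Ubuntu machine, `source_line(detect_arch())` would yield the line to write into /etc/apt/sources.list.d/uwucocoa.list.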