An introduction to Cloud-init

12 August, 2020

Brett Milford

Cloud-init is an open-source package widely used in cloud computing environments to initialise and configure instances when they first boot. It is designed to simplify provisioning and customising instances on various cloud platforms. Amusingly, cloud-init has only a modest number of GitHub stars and is not among the most starred repositories under Canonical’s organisation, despite underpinning so many of the cloud platforms we consume. Because of this behind-the-scenes position, it is often used without users being aware of it, so a little knowledge of cloud-init can go a long way in debugging, platform architecture, and design scenarios.

When an instance is launched, cloud-init runs during the boot process and performs several tasks, including:

  1. Metadata retrieval: Cloud-init retrieves information from the cloud platform’s metadata service. This metadata may include details such as the instance’s hostname, network configuration, SSH keys, user data, and other custom attributes.

  2. User data processing: Cloud-init processes the user data provided during the instance launch. User data can be in the form of scripts or cloud-init directives, allowing users to define custom configurations or execute specific actions during initialisation.

  3. Operating system customisation: Cloud-init configures the instance’s operating system based on the retrieved metadata and user data. It can perform tasks like setting up networking, creating users, running scripts, installing packages, configuring services, and more. This allows for the automated and consistent setup of an instance.

Cloud-init supports various cloud platforms, including Amazon EC2, Azure, Google Cloud Platform, and OpenStack, and is utilised under the hood by even more. It is flexible and extensible, enabling users to write custom cloud-init modules to perform specialised tasks during instance initialisation.

By leveraging cloud-init, platform consumers (be they humans or other applications) can automate the setup of instances, making it easier to manage and scale cloud infrastructure. It also provides a standardised way to configure instances across different cloud platforms, simplifying the deployment process and reducing manual intervention.

From the user's perspective, let's look at a couple of scenarios in which cloud-init may be useful:

  1. You have a root file system image of a legacy application which needs to be ported to run on cloud infrastructure.
  2. You need to modify some minor aspects of a cloud-based instance at boot without minting a new image each time a change is made.

In each of these cases, cloud-init is the tool that will help us achieve these goals. Thanks to its support from cloud providers and distributions, it provides a unified, cross-platform interface for deep OS customisation, ideal for tasks like these. A defining feature of Ubuntu “cloud” images [1] is that they ship with cloud-init already installed, making them immediately compatible with and deployable on all major cloud providers.

In the first scenario, you simply need to install the cloud-init package for your distribution [2] and upload the image to your cloud provider. The defaults of cloud-init are often sufficient to make your image compatible with your cloud provider’s metadata infrastructure. However, in some circumstances, there may be additional needs or requirements from your cloud provider. For instance, the OpenStack Image Guide [3] details additional considerations for an OpenStack cloud.

For the second scenario, we will discuss the basics of cloud-init and provide some examples for customising instances with cloud-init when used in conjunction with other platforms.

# Basics of Cloud-init

## Stages

Cloud-init is installed by default on Ubuntu cloud images. When installed, cloud-init integrates itself with the boot process at various stages to customise the instance.

  1. systemd generator. At this stage a systemd generator decides whether cloud-init is included in the boot. Specifically, it allows cloud-init to be disabled by the presence of /etc/cloud/cloud-init.disabled or the kernel parameter cloud-init=disabled.
  2. Local. Runs as soon as the root filesystem is mounted and performs basic initialisation, such as locating local data sources and applying network configuration.
  3. Network. Performs initialisation that requires the network to be available. Specifically, it runs the cloud_init_modules listed in /etc/cloud/cloud.cfg of the image. User data is loaded from a data source at this point. The disk_setup and mounts modules run during this stage and perform actions like formatting disks and configuring mount points. bootcmd directives are run here as well.
  4. Config. This stage runs cloud_config_modules. runcmd directives are run here.
  5. Final. This stage runs as late as possible and executes cloud_final_modules. Notably, package installation and user-scripts are run here.
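
To see how these stages map onto a particular image, it can help to inspect the image's own configuration. The following is a minimal sketch rather than anything cloud-init ships: stage_modules and cloud_init_disabled are hypothetical helper names, the awk parsing assumes the simple one-module-per-line list layout commonly seen in /etc/cloud/cloud.cfg, and the disable check only mirrors the two mechanisms described for the generator stage.

```shell
#!/bin/sh
# Hypothetical helper: list the modules configured for one boot stage.
# $1: section name (cloud_init_modules, cloud_config_modules or cloud_final_modules)
# $2: path to the config file (default: /etc/cloud/cloud.cfg)
stage_modules() {
    awk -v section="$1:" '
        $0 == section           { in_section = 1; next }
        in_section && /^[^ #-]/ { in_section = 0 }
        in_section && /^ *- /   { sub(/^ *- */, ""); print }
    ' "${2:-/etc/cloud/cloud.cfg}"
}

# Hypothetical helper: succeed if cloud-init would be disabled.
# $1: root directory to inspect (default: /)
# $2: file holding the kernel command line (default: /proc/cmdline)
cloud_init_disabled() {
    root="${1:-/}"
    cmdline_file="${2:-/proc/cmdline}"

    # Mechanism 1: marker file on disk.
    if [ -e "${root%/}/etc/cloud/cloud-init.disabled" ]; then
        return 0
    fi

    # Mechanism 2: kernel parameter, matched as a whole word.
    if [ -r "$cmdline_file" ]; then
        for param in $(cat "$cmdline_file"); do
            if [ "$param" = "cloud-init=disabled" ]; then
                return 0
            fi
        done
    fi
    return 1
}

# Example: show what the Final stage would run on this machine, if configured.
if [ -r /etc/cloud/cloud.cfg ]; then
    stage_modules cloud_final_modules
fi
```

Both helpers take their input paths as arguments so they can be pointed at a mounted image or a scratch directory instead of the running system.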

## Data sources

Cloud-init derives the configuration of an instance from a number of data sources. These are typically split into three categories:

  • Cloud
  • Vendor
  • User Data

Cloud data sources found in Canonical software include MAAS, OpenStack, and NoCloud types. Vendor data is typically provided by the entity that launches an instance (such as a cloud provider) to further customise an image for a particular environment.

User data, as the name suggests, is most commonly used to customise an image to the user’s needs and is therefore our main focus.

## User data

User data can be provided in a number of formats, including gzip-compressed, which may be useful or even necessary as user data is typically limited to 16384 bytes (this may vary depending on the cloud provider).
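
A quick way to find out whether a given file needs compressing is to compare its raw and gzip-compressed sizes against the limit. This is a minimal sketch under stated assumptions: byte_count and user_data_strategy are hypothetical helper names, and the 16384-byte value is simply the typical limit quoted above, so check the figure for your own provider.

```shell
#!/bin/sh
# Limit quoted in the text above; the real value depends on the cloud provider.
USER_DATA_LIMIT=16384

# Hypothetical helper: print the size in bytes of whatever arrives on stdin.
byte_count() {
    wc -c | tr -d ' '
}

# Hypothetical helper: report how a user data file could be submitted.
# Prints "raw" if it fits as-is, "gzip" if it only fits once compressed,
# and "too-large" (with a non-zero exit status) otherwise.
user_data_strategy() {
    raw_size=$(byte_count < "$1")
    gzip_size=$(gzip -9 -c "$1" | byte_count)
    if [ "$raw_size" -le "$USER_DATA_LIMIT" ]; then
        echo "raw"
    elif [ "$gzip_size" -le "$USER_DATA_LIMIT" ]; then
        echo "gzip"
    else
        echo "too-large"
        return 1
    fi
}
```

A file that reports gzip could then be compressed with gzip -9 -c mycloudconfig.yaml > mycloudconfig.yaml.gz before being handed to the provider's tooling.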

Those formats include shell scripts and YAML-formatted #cloud-config. Cloud-init is modular and configurable by design and supports many more options; however, those covered here are the most commonly used.

#cloud-config is by far the most commonly used of these formats and is an easy way to customise your cloud instance declaratively. A cloud-config is a YAML-formatted file whose first line is #cloud-config.

A series of configuration items targeting various modules follows this line. The configuration items for each module are collated and passed to the module when it is executed in the corresponding cloud-init stage, as outlined in the Stages section above.
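
Because the format is signalled by the first line, a file can be sanity-checked before it is submitted. The sketch below is illustrative only: user_data_format is a hypothetical helper, and it recognises just the two formats discussed here, reporting everything else as unknown.

```shell
#!/bin/sh
# Hypothetical helper: classify a user data file by its first line.
# Only the two formats discussed above are recognised.
user_data_format() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#cloud-config'*) echo "cloud-config" ;;
        '#!'*)            echo "script" ;;
        *)                echo "unknown" ;;
    esac
}
```

A common mistake this catches is a blank line or stray comment above #cloud-config, which leaves the file reported as unknown.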

## Modules

Cloud-init modules allow the detailed customisation of various parts of the OS. While some of these are generic to the OS deployed (e.g. packages), others are only applicable to a given distribution (e.g. Ubuntu Advantage, Red Hat Subscription).

A few module examples are shown below, covering:

  • Adding a PPA and configuring apt for use with a proxy
  • Installing packages
  • Adding SSH authorized keys
  • Creating a new user

Example: Use the apt module to add a PPA and configure apt for use with a proxy.

#cloud-config
apt:
  http_proxy: 'http://[[user][:pass]@]host[:port]/'
  https_proxy: 'https://[[user][:pass]@]host[:port]/'
  sources:
      myppa:
          source: 'ppa:lp-user/app'

Example: Update the OS, install the required packages, and reboot the instance.

#cloud-config
packages:
    - pwgen
    - pastebinit
    - [libpython2.7, 2.7.3-0ubuntu3.1]
package_update: true
package_upgrade: true
package_reboot_if_required: true

Example: Import a ssh key.
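
A minimal sketch of such a config, written from the shell so it can be passed straight to a launcher. It reuses the ssh_import_id key and the lp:ubuntuuser Launchpad ID that appear in the Multipass example later in this article; both are placeholders to adapt, and the temporary output location is an arbitrary choice.

```shell
#!/bin/sh
# Write a cloud-config that imports the SSH keys published for a Launchpad ID.
# lp:ubuntuuser is a placeholder; substitute your own lp:<id>.
config_file="$(mktemp -d)/ssh-import.yaml"
cat > "$config_file" <<'EOF'
#cloud-config
ssh_import_id:
  - lp:ubuntuuser
EOF
echo "wrote $config_file"
```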

Example: Create the user ‘ubuntuuser’ and add the provided pub key.

#cloud-config
users:
    - default
    - name: ubuntuuser
      ssh_authorized_keys:
          - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAGEA3FSyQwBI6Z+nCSjUU

N.B.: If the - default entry is omitted from the users list, the default user (e.g. ‘ubuntu’ on Ubuntu cloud images or ‘centos’ on CentOS images) will not be created. This may break automation software that relies on this user.

Example: Use the ubuntu-advantage module to register an instance on the first run.

#cloud-config
ubuntu_advantage:
  token: <ua_contract_token>
  enable:
  - fips
  - esm

## User scripts

Cloud-init can also run user-provided scripts. These scripts may be provided in two ways:

  1. As a user data part starting with #!
  2. As a script located in the scripts directory of the instance configuration location, which for Ubuntu cloud images is: /var/lib/cloud/scripts.

Scripts may be repeatedly run at various points in the instance life cycle by utilising the directories:

  • scripts/per-boot: Run every time the system boots.
  • scripts/per-instance: Run when the instance is first booted (configuration changes to these instances may trigger these scripts).
  • scripts/per-once: Run only once (configuration changes to the instance will not trigger these scripts).

Scripts provided by the first method or in the base directory of scripts will be run ‘per-instance’.

User scripts may be combined with #cloud-config YAML by using the write_files module.

For example:

#cloud-config
# ... some other config ...

write_files:
  - path: /var/lib/cloud/scripts/per-boot/run_me.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      echo "I was run on $(date)"

## Testing and debugging cloud-init configuration

Booting an instance on a cloud provider to test a new cloud-config can be costly and time-consuming. Multipass [4] vastly improves this situation by allowing you to quickly download and run the latest stable Ubuntu cloud images and to provide cloud-init configuration directly on the command line.

For example:

$ cat mycloudconfig.yaml
#cloud-config
---
ssh_import_id:
  - lp:ubuntuuser

$ multipass launch --cloud-init mycloudconfig.yaml

Alternatively, you may also supply cloud-config to LXD containers with a slightly altered method:

$ lxc launch ubuntu: --config user.user-data="$(cat mycloudconfig.yaml)"

In this command, LXD will use the default image from the “ubuntu” remote (similar to the behaviour of Multipass), and the cloud-config is supplied as a configuration key to the LXD container.

Cloud-init is typically configured by default to log to a number of places and this output may be found at:

  • /var/log/syslog
  • /var/log/cloud-init.log

All the output that was sent to the console can be found at:

  • /var/log/cloud-init-output.log

With Multipass and LXD, these logs can be viewed by executing a shell in the created instance. For advanced configuration, it might be necessary to confirm various points of configuration in the image. Common locations for these are:

  • /etc/cloud: configuration files, such as cloud.cfg
  • /var/lib/cloud: instance configuration and data
  • /run/cloud-init/: runtime data
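
When scanning /var/log/cloud-init.log, the entries of interest are usually the warnings and errors. A minimal sketch of a filter follows; cloud_init_problems is a hypothetical helper, and it assumes that log lines carry their level in square brackets, which may differ if logging has been reconfigured in the image.

```shell
#!/bin/sh
# Hypothetical helper: print WARNING, ERROR and CRITICAL lines from a cloud-init log.
# Assumes the level appears in square brackets, e.g. "util.py[WARNING]: ...".
# $1: log file (default: /var/log/cloud-init.log)
cloud_init_problems() {
    log_file="${1:-/var/log/cloud-init.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep exits non-zero when nothing matches, i.e. when the log looks clean.
    grep -E '\[(WARNING|ERROR|CRITICAL)\]' "$log_file"
}
```

Inside an instance this could be called as cloud_init_problems /var/log/cloud-init.log; a non-zero exit status with no output means no matching lines were found.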

We can utilise the cloud-init CLI inside the running instance to help with some of this debugging [5].

# Wait for cloud-init to finish executing
$ multipass exec <instance_name> -- cloud-init status --wait

# Collect logs
$ multipass exec <instance_name> -- cloud-init collect-logs

# Show cloud-init activity and time (run inside the instance)
$ cloud-init analyze show
$ cloud-init analyze blame

Finally, you may want to execute a specific module on its own, such as the package module [6].

$ multipass shell <instance_name>

# run the package module
$ sudo cloud-init single -n package_update_upgrade_install

# How to use Cloud-init on various platforms

We have seen how to use cloud-init to customise instances with Multipass and LXD. It is pertinent to note the data source in use when deploying an instance with cloud-init, as there can be slight differences in advanced use cases. For instance, LXD makes use of the ‘NoCloud’ data source for its instances.

## MAAS

MAAS both acts as a data source for cloud-init and supports cloud-init for customisation. This means MAAS provides metadata services for the instances it deploys. It may also customise instances during ‘deployment’ after the first boot. It is important to note that cloud-init cannot customise instances prior to first boot; to do this, MAAS needs to be configured with Curtin preseed templates [7].

MAAS does not currently support setting cloud-init user data via the GUI; to deploy MAAS instances with cloud-init user data, the MAAS CLI [8] needs to be used.

$ maas $PROFILE machine deploy $SYSTEM_ID user_data=<base-64-encoded-script>
# e.g.
$ maas admin machine deploy e8xa8m user_data="$(base64 -w0 mycloudconfig.yaml)"
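
Many base64 implementations wrap their output across several lines by default, and an unquoted command substitution would then be split into separate arguments. The sketch below shows one portable way to produce a single-line value; encode_user_data is a hypothetical helper name, not part of MAAS.

```shell
#!/bin/sh
# Hypothetical helper: base64-encode a file as a single line.
# Stripping newlines avoids depending on implementation-specific wrap options.
encode_user_data() {
    base64 < "$1" | tr -d '\n'
}
```

It would be used in place of the bare base64 call, for example user_data="$(encode_user_data mycloudconfig.yaml)", keeping the double quotes around the substitution.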

## Juju

Juju is not a cloud-init data source; however, it utilises the native metadata sources of a model’s backing cloud to provide instance customisation (e.g. via MAAS or OpenStack). If Juju is deploying applications to LXD containers, it will also customise LXD profiles to include cloud-init data.

The cloud-init data passed to instances may be modified on a per-model basis, or set for all models via ‘model-defaults’. It is pertinent to note that changes to model-defaults only apply to new models after the change, and changes to the cloud-init key of model-config only apply to new instances.

As such, model-defaults are generally decided by the operator at the time of bootstrapping a controller.

$ cat ./mymodelconfig.yaml
cloudinit-userdata: |
  #cloud-config
  packages:
      - pwgen

# for all new models
$ juju model-defaults ./mymodelconfig.yaml
# for current model (applies to new instances)
$ juju model-config ./mymodelconfig.yaml
# at bootstrap
$ juju bootstrap --model-default ./mymodelconfig.yaml <cloud> <name>
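
If the cloud-config already exists as its own file, the model config document can be generated rather than written by hand. This is a minimal sketch: wrap_for_juju is a hypothetical helper that only handles the indentation needed to nest the file under the cloudinit-userdata key shown above.

```shell
#!/bin/sh
# Hypothetical helper: print a Juju model config document that embeds an
# existing cloud-config file under the cloudinit-userdata key.
# Every line is indented by two spaces so it sits inside the YAML block scalar.
wrap_for_juju() {
    echo "cloudinit-userdata: |"
    sed 's/^/  /' "$1"
}
```

For example, wrap_for_juju mycloudconfig.yaml > mymodelconfig.yaml would produce a file in the same shape as the one listed above.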

## OpenStack

OpenStack both acts as a data source and supports cloud-init for instance customisation. OpenStack supports creating an instance with user data via the --user-data flag.

For example:

$ openstack server create --image <image> --flavor <flavor> --user-data mycloudconfig.yaml <server_name>

## Bonus: using cloud-init with libvirt and qemu

We don’t even need a cloud provider or metadata source to make use of cloud-init. We can generate a .img containing our metadata and boot directly into a cloud-init enabled image with this.

IMG="focal-server-cloudimg-amd64.img"
IMG_PATH="/var/lib/libvirt/images/base/$IMG"
INSTANCE_NAME="testinstance"
INSTANCE_DIR="/var/lib/libvirt/images/${INSTANCE_NAME}"

# Download the focal cloud image if it is not already present
[ -f "$IMG_PATH" ] || {
    sudo mkdir -p "$(dirname "$IMG_PATH")"
    sudo wget "https://cloud-images.ubuntu.com/focal/current/$IMG" -O "$IMG_PATH"
}

# Create a root disk backed by the cloud image (the instance directory must exist first)
sudo mkdir -p "$INSTANCE_DIR"
sudo qemu-img create -f qcow2 -o backing_file="$IMG_PATH",backing_fmt=qcow2 "${INSTANCE_DIR}/${INSTANCE_NAME}.img"
sudo qemu-img resize "${INSTANCE_DIR}/${INSTANCE_NAME}.img" 10G

# Create a seed image with our cloud-config (cloud-localds ships in cloud-image-utils)
sudo cloud-localds -H "${INSTANCE_NAME}" "${INSTANCE_DIR}/seed.img" cloud-config.yaml

# Define a libvirt domain from an xml spec that references the above disk images and boot
virsh define ${INSTANCE_NAME}.xml
virsh autostart ${INSTANCE_NAME}
virsh start ${INSTANCE_NAME}

# Alternatively, boot directly with qemu
qemu-system-x86_64  \
  -machine accel=kvm,type=q35 \
  -cpu host \
  -m 2G \
  -nographic \
  -device virtio-net-pci,netdev=net0 \
  -netdev user,id=net0,hostfwd=tcp::2222-:22 \
  -drive if=virtio,format=qcow2,file=/var/lib/libvirt/images/${INSTANCE_NAME}/${INSTANCE_NAME}.img \
  -drive if=virtio,format=raw,file=/var/lib/libvirt/images/${INSTANCE_NAME}/seed.img
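
The cloud-config.yaml passed to cloud-localds is not shown above. A minimal sketch of one way to generate it follows, reusing the users layout from the Modules section; write_cloud_config is a hypothetical helper, and the user name and public key are supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal cloud-config that keeps the default
# user and adds one extra user with the given SSH public key.
# $1: output path, $2: user name, $3: public key string
write_cloud_config() {
    cat > "$1" <<EOF
#cloud-config
users:
  - default
  - name: $2
    ssh_authorized_keys:
      - $3
EOF
}
```

Called as write_cloud_config cloud-config.yaml ubuntuuser "$(cat <path_to_public_key>)" before the cloud-localds step, this should leave the instance reachable over the port forward in the qemu command, for example with ssh -p 2222 ubuntuuser@localhost once cloud-init has finished.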

We’ve stepped through cloud-init use cases from as simple as “just install it and it just works” right through to the complexities of cloud-init debugging, modules, and module usage, as well as compiling cloud-config into an image for use “offline”. As demonstrated, cloud-init is a super versatile tool for instance customisation that should be near at hand for a variety of tasks.