Faster startup for containerized applications with Open Liberty InstantOn

Open Liberty InstantOn provides fast startup times for MicroProfile and Jakarta EE applications. With InstantOn, your applications can start in milliseconds, without compromising on throughput, memory, development-production parity, or Java language features.

InstantOn uses the Checkpoint/Restore In Userspace (CRIU) feature of the Linux kernel to take a checkpoint of the JVM that can be restored later. With InstantOn, the application is developed as normal, and then InstantOn makes a checkpoint of the application process when the application container image is built. When the application is restored, it runs in the same JVM, which provides complete parity between development and production. Because the checkpoint process takes only seconds, your CI/CD process is barely affected.

When you build an application container image, InstantOn creates an extra image layer that contains a checkpoint of the Open Liberty application process. When the application image starts, the Open Liberty runtime recognizes the InstantOn layer and restores the application process to the checkpoint that was created during the application container image build.

Open Liberty and OpenJ9 contain hooks that participate in the checkpoint and restore process. These hooks allow the Open Liberty runtime to prepare the process for checkpoint such that persisting the checkpoint process in the container image is safe. This preparation step ensures that the process can be restored from the same persistent state in the container image into multiple, possibly concurrent, instances of the application. For more information on the CRIU support in OpenJ9 and the IBM Semeru JVM, see the Java CRIU Support documentation.

InstantOn cannot be used outside of a container image build. An application container image provides a consistent environment, which is required to ensure a reliable restore of an Open Liberty application process. The InstantOn container layer is the last layer of the application container image. This configuration ensures that the resources in the underlying layers of the image do not change from the time the checkpoint is taken to the time the image is started with InstantOn.

The following sections describe the prerequisites and processes to build and run an InstantOn application image:

When to take a checkpoint: beforeAppStart or afterAppStart
Runtime and host build system prerequisites
Building an InstantOn application image
Running and deploying an InstantOn application image
Open Liberty InstantOn supported features

When to make a checkpoint: beforeAppStart or afterAppStart

When an InstantOn checkpoint occurs, the Open Liberty runtime starts. During this startup, the runtime processes the configuration, loads all the enabled features, and starts processing the configured application. This startup sequence varies depending on the size and complexity of the application and the number of Open Liberty features it requires. For small simple applications, this process can take less than 1 second. For more typical applications, it can take several seconds. InstantOn takes a checkpoint after the Open Liberty runtime completes startup. Two options are available to determine whether the checkpoint occurs before or after the application itself starts, either of which can be specified as part of the InstantOn RUN configuration:

afterAppStart - Performs a checkpoint after the application is started. This option provides the fastest startup, but might not be suitable for some applications.
beforeAppStart - Performs a checkpoint before any application code runs. Use this option in cases where the application code is not compatible with the afterAppStart option.

Which of these options you choose depends on the code your application must run. Jakarta EE and MicroProfile applications might contain application code that gets run as the application is started, such as the following examples:

A servlet that uses the loadOnStartup attribute
An EJB that uses the @Startup annotation
A CDI bean that uses @Observes @Initialized(ApplicationScoped.class) annotations

Sometimes, the application code that runs as the application starts might not be suited for performing an InstantOn checkpoint. For such cases, use the beforeAppStart option. For example, the following application code scenarios must be avoided before an InstantOn checkpoint, and are therefore better suited for the beforeAppStart option:

Accessing a remote resource, such as a database. The correct data source is unlikely to be available to connect to during an application container build.
Creating a transaction. Currently, transactions are prohibited before an InstantOn checkpoint.
Reading configuration that is expected to change when the application is deployed, for example configuration from MicroProfile Config.

Using the beforeAppStart option in these cases ensures that the application code is run only after the InstantOn checkpoint process is restored. This option might result in slower restore times because it must run more code before the application is ready to service any incoming requests. If the application early start code is determined to be safe and acceptable for checkpoint, then the afterAppStart checkpoint option can be used. This option provides for the fastest startup time when the application process is restored.

If an application has no code that is run as the application is started, then the beforeAppStart and afterAppStart checkpoints are equivalent. In these cases, both checkpoint options perform a checkpoint of the process before the configured ports are enabled for servicing requests. This sequence ensures that the transport protocols for the application are enabled only after the InstantOn checkpoint process is restored.

For more information about limitations with early startup code and possible workarounds, see InstantOn limitations and known issues.

Runtime and host build system prerequisites

Starting with Open Liberty version 23.0.0.6, all X86-64/AMD64 UBI Open Liberty container images include the prerequisites for InstantOn to checkpoint and restore Open Liberty application processes. Open Liberty Ubuntu container images are not enabled for InstantOn.

Currently, InstantOn is supported with IBM Semeru Java version 11.0.19+ and IBM Semeru Java version 17.0.7+. InstantOn is expected to support new versions of IBM Semeru Java as they are released. Currently, InstantOn is not supported on other Java vendor implementations.

To use InstantOn during an application container image build, the host machine must satisfy the following prerequisites:

Uses the Linux Operating System with kernel version 5.9 or greater
Uses the X86-64/AMD64 processor architecture
Allows execution of privileged container builds by using Podman or Docker

The use of privileged containers in the application container image build is required only when you build the container image with InstantOn. Running the resulting InstantOn application image does not require the use of privileged containers.

To use Docker, a version 23.0 or greater is required. Earlier versions of Docker did not support the CHECKPOINT_RESTORE Linux capability, which prevents InstantOn from doing a checkpoint or restore of a process in container.

Linux capability prerequisites for checkpoint and restore

Unprivileged (non-root) users are supported by CRIU for checkpointing and restoring application processes. To checkpoint and restore, CRIU requires different Linux capabilities.

To perform an application process checkpoint, CRIU requires the following Linux capabilities:

CHECKPOINT_RESTORE - This capability was added in Linux 5.9 to separate checkpoint/restore functions from the overloaded SYS_ADMIN capability.
SETPCAP - This capability is required for the subsequent restore.
SYS_PTRACE - CRIU uses this powerful capability to capture and record the full process state. It is necessary only when CRIU is checkpointing an application process.

To perform an application process restore, CRIU requires the following Linux capabilities:

CHECKPOINT_RESTORE - This capability was added in Linux 5.9 to separate checkpoint/restore functions from the overloaded SYS_ADMIN capability.
SETPCAP - This capability allows CRIU to drop capabilities from the restored application process. Because the CRIU binary is granted extra capabilities like CHECKPOINT_RESTORE, it needs SETPCAP so it can drop those capabilities from the final process that CRIU restores.

Building an InstantOn application image

Two options are available to build an application container image that uses InstantOn:

Add a special RUN instruction at end of a Dockerfile or Containerfile that runs the checkpoint.sh script to perform an application checkpoint at container image build time. This option requires you to use Podman.
Use a three-step process to build the application image, run the checkpoint, and commit the final result into an InstantOn application container image. This option requires you to use either Podman or Docker version 23.0 or later.

To run the checkpoint.sh script, you must use Podman to build the application container image. Currently, you cannot use Docker to build the InstantOn application container image because Docker does not provide a way to grant the container build the necessary Linux capabilities. To use Docker to build an InstantOn application container image, you must follow the three-step build process.

Building the InstantOn image with Podman and the checkpoint.sh script

You can use the checkpoint.sh script to perform the application checkpoint by adding the RUN checkpoint.sh instruction to the end of your Dockerfile or Containerfile file. The execution of the checkpoint.sh must be the last RUN instruction during your container image build. This configuration performs the application process checkpoint and stores the process data as the last layer of the application container image. Currently, this script requires you to use Podman rather than Docker because Docker cannot grant the necessary Linux capabilities.

The following image template example uses the kernel-slim-java17-openj9-ubi tag to build an image that uses the latest Open Liberty release with the IBM Semeru distribution of Java 17. This example uses the afterAppStart checkpoint option.

Dockerfile

FROM icr.io/appcafe/open-liberty:kernel-slim-java17-openj9-ubi

# Add a Liberty server configuration that includes all necessary features
COPY --chown=1001:0  server.xml /config/

# This script adds the requested XML snippets to enable Liberty features and grow the image to be fit-for-purpose.
# This option is available only in the 'kernel-slim' image type. The 'full' and 'beta' tags already include all features.
RUN features.sh

# Add interim fixes (optional)
COPY --chown=1001:0  interim-fixes /opt/ol/fixes/

# Add an application
COPY --chown=1001:0  Sample1.war /config/dropins/

# This script adds the requested server configuration, applies any interim fixes, and populates caches to optimize the runtime.
RUN configure.sh

# This script performs an InstantOn checkpoint of the application.
# The application can use beforeAppStart or afterAppStart to do the checkpoint.
# The default is beforeAppStart when not specified
RUN checkpoint.sh afterAppStart

Use the following Podman command to build the InstantOn application container image. To grant the necessary Linux capabilities to the container image build, this command must be run either as the root user or by using the sudo utility.

podman build \
   -t dev.local/liberty-app-instanton \
   --cap-add=CHECKPOINT_RESTORE \
   --cap-add=SYS_PTRACE\
   --cap-add=SETPCAP \
   --security-opt seccomp=unconfined .

The three --cap-add options grant the three Linux capabilities that CRIU requires to perform the application process checkpoint during the container image build. The --security-opt option grants access to all Linux system calls to the container image build.

Building the InstantOn image by using the three-step process with Docker or Podman

If you cannot use Podman to run the checkpoint.sh during the container image build, you can use the following three-step process to build the InstantOn application container image:

Build the application container image without the InstantOn layer
Run the application container to perform a checkpoint of the application in the running container
Commit the stopped container with the checkpoint process data into an InstantOn application container image

You can use these steps with either Podman and Docker to build an InstantOn application image. For Docker, version 23.0 or later is required. The following examples assume that you are using Docker to build an application image that is named liberty-app.

1. Build the application container image

Set the image template (Dockerfile or Containerfile) similar to the following example. This example uses the kernel-slim-java17-openj9-ubi tag to build an image that uses the latest Open Liberty release with the IBM Semeru distribution of Java 17. This template does not run the checkpoint.sh script.

Dockerfile

FROM icr.io/appcafe/open-liberty:kernel-slim-java17-openj9-ubi

# Add a Liberty server configuration that includes all necessary features
COPY --chown=1001:0  server.xml /config/

# This script adds the requested XML snippets to enable Liberty features and grow the image to be fit-for-purpose.
# This option is available only in the 'kernel-slim' image type. The 'full' and 'beta' tags already include all features.
RUN features.sh

# Add interim fixes (optional)
COPY --chown=1001:0  interim-fixes /opt/ol/fixes/

# Add an application
COPY --chown=1001:0  Sample1.war /config/dropins/

# This script adds the requested server configuration, applies any interim fixes, and populates caches to optimize the runtime.
RUN configure.sh

To build the application container image with Docker, run the following command:

docker build -t liberty-app .

The resulting application container image, which is tagged liberty-app, does not contain the InstantOn checkpoint process layer.

2. Run the application container to perform a checkpoint

Run the application container image to perform a checkpoint of the application process within the running container. The following example uses the liberty-app application image to run the checkpoint of the application process with the afterAppStart option:

docker run \
  --name liberty-app-checkpoint-container \
  --privileged \
  --env WLP_CHECKPOINT=afterAppStart \
  liberty-app

This command runs the application within a container and performs an application process checkpoint. The --env option sets a WLP_CHECKPOINT environment variable to specify the checkpoint afterAppStart option. When the application process checkpoint completes, the liberty-app-checkpoint-container application container is stopped and exits.

3. Commit the stopped container with the checkpoint process data

The stopped liberty-app-checkpoint-container container from the previous step contains the data from the InstantOn checkpoint process. Lastly, take this checkpoint process data and commit it to an application container image layer by running the following commit commands:

docker commit liberty-app-checkpoint-container liberty-app-instanton
docker rm liberty-app-checkpoint-container

You now have two application images: liberty-app and liberty-app-instanton. Starting a container with the liberty-app-instanton container image shows a faster startup time than the original liberty-app image. The liberty-app-checkpoint-container stopped container is no longer needed and can safely be removed.

Running and deploying an InstantOn application image

Special considerations are required to run an InstantOn application image locally or when it is deployed to a public cloud. The following prerequisites are required to restore the InstantOn checkpoint process.

The host that is running the container image must use Linux kernel 5.9 or greater
The Linux capabilities CHECKPOINT_RESTORE and SETPCAP must be granted to the running container
The necessary system calls must be granted to the running container
The host processor must be X86-64/AMD64

Running an InstantOn application image locally

If a host system is running the Linux kernel 5.9 or greater with the X86-64/AMD64 processor, you can run an InstantOn application image by using Podman or Docker locally. The following command runs the liberty-app-instanton InstantOn application image with Podman:

podman run \
  --rm \
  --cap-add=CHECKPOINT_RESTORE \
  --cap-add=SETPCAP \
  --security-opt seccomp=unconfined \
  -p 9080:9080 \
  liberty-app-instanton

The following command runs the liberty-app-instanton InstantOn application image with Docker:

docker run \
  --rm \
  --cap-add=CHECKPOINT_RESTORE \
  --cap-add=SETPCAP \
  --security-opt seccomp=unconfined \
  -p 9080:9080 \
  liberty-app-instanton

In both cases, the --cap-add option grants the CHECKPOINT_RESTORE and SETPCAP capabilities. The SYS_PTRACE capability is not required to run the InstantOn application container image.

Required Linux system calls

The --security-opt option grants the running container access to all Linux system calls. Depending on the defaults of the container engine, the --security-opt with the seccomp-unconfined setting might not be required. For CRIU to restore the InstantOn application process, the container must have access to clone3, ptrace, and other system calls. This requirement is true even though the elevated Linux capability of SYS_PTRACE is not required to restore the process. You can update the defaults of the container engine to include all the required system calls.

Alternatively, you can specify a file with the --security-opt seccomp option that specifies the policy for the container. Use the following command to specify a JSON policy file for seccomp:

podman run \
  --rm \
  --cap-add=CHECKPOINT_RESTORE \
  --cap-add=NET_ADMIN \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=criuRequiredSysCalls.json \
  -p 9080:9080 \
  liberty-app-instanton

The resulting criuRequiredSysCalls.json file grants access to all the Linux system calls that are required by CRIU to restore an InstantOn application process.

Recovering from a failed InstantOn restore

If restoration of the InstantOn application process fails, Open Liberty starts the server without using the InstantOn checkpoint process. In such cases, the Open Liberty application starts as if no InstantOn checkpoint process layer exists, which takes longer than a successfully restored InstantOn process. This recovery launch from a failed InstantOn restore can be disabled by setting the following environment variable:

CRIU_RESTORE_DISABLE_RECOVERY=true

After you build an InstantOn application container image, you can verify a successful restore by setting this environment variable to run locally. For example, you can run the following Podman command:

podman run \
  --rm \
  --cap-add=CHECKPOINT_RESTORE \
  --cap-add=SETPCAP \
  --security-opt seccomp=unconfined \
  --env CRIU_RESTORE_DISABLE_RECOVERY=true \
  -p 9080:9080 \
  liberty-app-instanton

To avoid cloud environments from continuously trying to restart the failed start of an application container image, the default value of the CRIU_RESTORE_DISABLE_RECOVERY variable is false.

Deploying an InstantOn application to Kubernetes services

Currently, Open Liberty InstantOn is tested and supported on the following public cloud Kubernetes services:

Other public cloud Kubernetes services might also work if they have the prerequisites to allow the InstantOn application process to restore.

When you deploy to Kubernetes, the container must be granted the CHECKPOINT_RESTORE and the SETPCAP Linux capabilities to allow the InstantOn application process to restore. You can configure these capabilities in the deployment YAML file by specifying the following securityContext for the container:

        securityContext:
          allowPrivilegeEscalation: true
          privileged: false
          runAsNonRoot: true
          capabilities:
            add:
            - CHECKPOINT_RESTORE
            - SETPCAP
            drop:
            - ALL

Open Liberty InstantOn supported features

InstantOn supports a subset of Open Liberty features. If a feature is enabled that InstantOn does not support, a failure occurs when you try to perform a checkpoint of an application process. InstantOn supports the following Jakarta EE and MicroProfile convenience features:

Jakarta EE Web Profile versions 8.0 and later
MicroProfile versions 4.1 and later

You can individually enable the Open Liberty public features that are enabled by the Jakarta EE Web Profile and MicroProfile features, depending on the needs of your application. This option avoids enabling the complete set of features that are enabled by the convenience features. However, InstantOn currently does not support standalone MicroProfile features, which are MicroProfile features that are not enabled by any of the convenience features.

In addition to the features that are enabled in the MicroProfile and Jakarta convenience features, InstantOn also supports the following features:

For more information about limitations, see InstantOn limitations and known issues.