Did you know you can repackage your cloud-native Java applications to start in milliseconds, without compromising throughput, memory, development-production parity, or Java language features? And with little to no refactoring of the application code? Here’s how…
The need for speed at startup
In serverless environments, scale-to-zero can help reduce the overall cloud costs for deployed applications by shutting down unneeded application instances when no ongoing requests exist. When activity picks up for the application, new instances can start quickly, without introducing noticeable delays for the application end user.
Despite improvements to JDK technology that make it start in less than 1 second, like class data sharing and dynamic AOT compilation, it still cannot start fast enough to support scale-to-zero. However, the JDK is important for optimizing throughput and memory, ensuring development-production parity, and enabling a full range of Java language features. So how do we improve startup time while also benefiting from running on the full JDK?
The InstantOn capability in the Open Liberty runtime uses the IBM Semeru JDK and a Linux technology called Checkpoint/Restore in Userspace (CRIU) to take a checkpoint, or a point-in-time snapshot, of the application process. This checkpoint can then be restored very quickly to bring the application process back into the state it was in when the checkpoint was taken. The application can be restored multiple times because Open Liberty and Semeru JDK preserve the uniqueness of each restored process in containers. Each restored application process runs without first having to go through the whole startup sequence, saving up to 90% of startup time (depending on your application). InstantOn requires very little modification of your Java application to make this improvement happen.
The following diagram demonstrates how the checkpoint that is taken during the container image build can be used to rapidly start multiple instances of an application in production by restoring them to the checkpointed phase of the application process.
InstantOn cannot be used outside of a container image build. An application container image provides a consistent environment, which is required to ensure a reliable restore of an Open Liberty application process. Since the InstantOn checkpoint is included in the last layer of the application container image, the resources in the underlying layers of the image do not change from the time the checkpoint is taken to the time the image is restored.
The following tutorial walks you through containerizing your application using the Open Liberty Java runtime, InstantOn, IBM Semeru JDK, and Podman container tools running on Linux. For general information about containerizing applications with Open Liberty, see the Containerizing microservices with Podman guide.
Prerequisites to checkpoint/restore a containerized application
Currently, Open Liberty version 23.0.0.6 or later supports running with InstantOn only on x86-64/amd64 architectures. All our testing was done on RHEL 9.0 and Ubuntu 22.04, but it might also be possible to run on other Linux distributions and versions if they meet the following prerequisites:
The kernel must support the Linux CAP_CHECKPOINT_RESTORE capability. This capability was introduced in kernel version 5.9.
The latest available version of Podman for the Linux distribution must be installed.
The Linux distribution must allow the execution of privileged container builds by using Podman or Docker.
For more information about the runtime and host build system prerequisites, see the Open Liberty InstantOn documentation.
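As a quick sanity check of the kernel prerequisite, the following sketch compares a kernel release string against 5.9. The function name kernel_at_least_5_9 is made up for this example, and the check is only a heuristic: it looks at the version number and does not detect a capability that a distribution has backported to an older kernel.

```shell
# Hypothetical helper: returns 0 when the kernel release string is 5.9 or newer.
# Only the leading "major.minor" part of the release string is inspected.
kernel_at_least_5_9() {
  local release="$1" major minor
  major="${release%%.*}"        # text before the first dot
  minor="${release#*.}"         # text after the first dot
  minor="${minor%%[!0-9]*}"     # keep only the leading digits
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 9 ]; }
}

if kernel_at_least_5_9 "$(uname -r)"; then
  echo "Kernel version is 5.9 or newer"
else
  echo "Kernel is older than 5.9; check whether CAP_CHECKPOINT_RESTORE is available"
fi
```

Passing the release string as an argument, rather than calling uname inside the function, makes it easy to check the version of a remote build host as well.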
Create an application WAR file
If you don’t have an application of your own handy, you can follow along with an example application from the Getting started with Open Liberty guide.
First, clone the Git repository for the guide:
git clone https://github.com/openliberty/guide-getting-started.git
cd guide-getting-started
Then, build the application, which is in the finish/ directory, and deploy it to Open Liberty:
cd finish
mvn liberty:run
When you see the following message, your Open Liberty instance is ready:
The defaultServer server is ready to run a smarter planet.
Check out the service at the http://localhost:9080/dev/system/properties URL. Stop the running Open Liberty instance by pressing CTRL+C in the command-line session where you started Open Liberty.
Lastly, build the WAR for the application:
mvn package
This command builds a target/guide-getting-started.war archive. We can now include this WAR in a container image that uses the InstantOn feature.
Testing the startup time of your application
For comparison of how long it takes your Open Liberty application container image to start both with and without InstantOn, we describe how to build the container image without InstantOn first. Then, we describe how to build with InstantOn and run the resulting containers.
Containerizing the Open Liberty application without InstantOn
Build the application container image without InstantOn:
podman build -t getting-started .
This command creates the getting-started container image without any checkpoint image.
Run this application container:
podman run --name getting-started --rm -p 9080:9080 getting-started
Note the amount of time Open Liberty takes to report that it is started, and check out the service running in the container at the http://localhost:9080/dev/system/properties URL. After you finish checking out the application, stop the running container by pressing CTRL+C in the command-line session where you ran the podman run command.
Containerizing the Open Liberty application with InstantOn
The Open Liberty container image contains the prerequisites for building an application container image with a checkpointed runtime process. Applications can use the Open Liberty image as a base to build their own application container images and from that, create their own application container image with a checkpointed process.
Build the application container image and checkpoint the application
An InstantOn checkpoint is created by starting the Open Liberty runtime during the application container image build step. During this startup, the runtime processes the configuration, loads all the enabled features, and starts processing the configured application. Depending on the needs of your application, you can choose one of two specific phases during Open Liberty startup at which to checkpoint the process. You must configure the Dockerfile to specify your chosen phase (as we show later).
The official Open Liberty images from the IBM Container Registry (ICR) include all the prerequisites that are needed for InstantOn to checkpoint an application process. For this example, the getting-started application container image uses the icr.io/appcafe/open-liberty:full-java11-openj9-ubi image from ICR as the parent image. Currently, InstantOn is supported only with the Java 11 and Java 17-based UBI images of Open Liberty.
Update the application Dockerfile by adding a RUN command for the checkpoint.sh script to the end of the file, as shown in the following example:
FROM icr.io/appcafe/open-liberty:full-java11-openj9-ubi
ARG VERSION=1.0
ARG REVISION=SNAPSHOT
LABEL \
  org.opencontainers.image.authors="Your Name" \
  org.opencontainers.image.vendor="IBM" \
  org.opencontainers.image.url="local" \
  org.opencontainers.image.source="https://github.com/OpenLiberty/guide-getting-started" \
  org.opencontainers.image.version="$VERSION" \
  org.opencontainers.image.revision="$REVISION" \
  vendor="Open Liberty" \
  name="system" \
  version="$VERSION-$REVISION" \
  summary="The system microservice from the Getting Started guide" \
  description="This image contains the system microservice running with the Open Liberty runtime."
COPY --chown=1001:0 src/main/liberty/config/ /config/
COPY --chown=1001:0 target/*.war /config/apps/
RUN configure.sh
RUN checkpoint.sh afterAppStart
This configuration adds the application process checkpoint as the last layer of the application container image. The checkpoint.sh script takes either beforeAppStart or afterAppStart as its argument to indicate the startup phase at which the process checkpoint is taken.
Two options are available to determine whether the checkpoint occurs before or after the application itself starts:
beforeAppStart: The checkpoint happens after processing the configured application metadata. If the application has any components that get run as part of the application starting, the checkpoint is taken before executing any code from the application. This option is the earliest checkpoint phase that is offered by InstantOn.
afterAppStart: This option is the latest phase where a checkpoint can happen, so it has the potential to provide the fastest startup time when restoring the application instance. The checkpoint happens after all configured applications are reported as started. This phase happens before opening any ports for listening to incoming requests for the applications.
The afterAppStart phase typically provides the quickest startup time for an application, but it also might cause some application code to run before the server process checkpoint happens. Since the getting-started application used in this tutorial does not do anything in its startup logic that would cause problems in restoring, we can use the afterAppStart phase for it.
For InstantOn to take a checkpoint of and restore a process, the CRIU binary requires additional Linux capabilities. The Open Liberty container image includes the necessary capabilities already granted to the binary. However, the container must also have these capabilities granted when it is launched.
With Podman, you can use the --cap-add and --security-opt options to grant the container build the necessary capabilities to take a checkpoint during the container build step. The user who launches the Podman container must have the authority to grant it the necessary Linux capabilities, so you must run the following command as root or with sudo:
podman build \
  -t dev.local/getting-started-instanton \
  --cap-add=CHECKPOINT_RESTORE \
  --cap-add=SYS_PTRACE \
  --cap-add=SETPCAP \
  --security-opt seccomp=unconfined .
The last instruction in the Dockerfile is to run the
checkpoint.sh script. When you execute the previous Podman build command, it launches Open Liberty to perform the checkpoint at the phase specified in the Dockerfile. After the container process data is persisted, Open Liberty stops and the container image build completes. The produced application container image contains the checkpoint process data as the last layer of the container image. The output looks something like the following example:
Performing checkpoint --at=afterAppStart
Launching defaultServer (Open Liberty 23.0.0.6/wlp-1.0.78.cl230620230612-1100) on Eclipse OpenJ9 VM, version 11.0.19+7 (en_US)
[AUDIT ] CWWKE0001I: The server defaultServer has been launched.
[AUDIT ] CWWKG0093A: Processing configuration drop-ins resource: /opt/ol/wlp/usr/servers/defaultServer/configDropins/defaults/keystore.xml
[AUDIT ] CWWKG0093A: Processing configuration drop-ins resource: /opt/ol/wlp/usr/servers/defaultServer/configDropins/defaults/open-default-port.xml
[AUDIT ] CWWKZ0058I: Monitoring dropins for applications.
[AUDIT ] CWWKZ0001I: Application guide-getting-started started in 1.886 seconds.
[AUDIT ] CWWKC0451I: A server checkpoint "afterAppStart" was requested. When the checkpoint completes, the server stops.
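If you save the build output to a file, a small filter can confirm which checkpoint phase was requested. The following is a minimal sketch: the function name checkpoint_phase and the build.log file name are made up for this example, and the pattern assumes the CWWKC0451I message keeps the wording shown above. The message only says that a checkpoint was requested, so the exit status of the build remains the real indicator of success.

```shell
# Hypothetical helper: reads build output on stdin and prints the checkpoint
# phase named in the CWWKC0451I message, or nothing if the message is absent.
checkpoint_phase() {
  sed -n 's/.*CWWKC0451I: A server checkpoint "\([^"]*\)" was requested.*/\1/p'
}

# Example usage (assumes the build output was saved to build.log):
#   checkpoint_phase < build.log
```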
Run the InstantOn application image
Run the getting-started-instanton container with the following command:
podman run \
  --rm \
  --cap-add=CHECKPOINT_RESTORE \
  --cap-add=SETPCAP \
  --security-opt seccomp=unconfined \
  -p 9080:9080 \
  dev.local/getting-started-instanton
The --cap-add options grant the container the two Linux capabilities that CRIU requires to restore the application process. When Open Liberty restores the application process, it logs the following messages:
[AUDIT ] Launching defaultServer (Open Liberty 23.0.0.6/wlp-1.0.78.cl230620230612-1100) on Eclipse OpenJ9 VM, version 11.0.19+7 (en_US)
[AUDIT ] CWWKZ0001I: Application guide-getting-started started in 0.233 seconds.
[AUDIT ] CWWKT0016I: Web application available (default_host): http://850ba43df239:9080/dev/
[AUDIT ] CWWKT0016I: Web application available (default_host): http://850ba43df239:9080/metrics/
[AUDIT ] CWWKT0016I: Web application available (default_host): http://850ba43df239:9080/health/
[AUDIT ] CWWKT0016I: Web application available (default_host): http://850ba43df239:9080/ibm/api/
[AUDIT ] CWWKC0452I: The Liberty server process resumed operation from a checkpoint in 0.283 seconds.
[AUDIT ] CWWKF0012I: The server installed the following features: [cdi-4.0, distributedMap-1.0, jndi-1.0, json-1.0, jsonb-3.0, jsonp-2.1, monitor-1.0, mpConfig-3.0, mpHealth-4.0, mpMetrics-5.0, restfulWS-3.1, restfulWSClient-3.1, ssl-1.0, transportSecurity-1.0].
[AUDIT ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 0.297 seconds.
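To pull the timing figures out of this output instead of reading them by eye, you could filter the container log. The sketch below assumes the CWWKC0452I and CWWKF0011I messages keep the wording shown above. The function names and the container name in the usage comment are made up for this example.

```shell
# Hypothetical helper: prints the seconds reported by the CWWKC0452I
# "resumed operation from a checkpoint" message read from stdin.
restore_seconds() {
  sed -n 's/.*CWWKC0452I: .* from a checkpoint in \([0-9.]*\) seconds\..*/\1/p'
}

# Hypothetical helper: prints the seconds reported by the CWWKF0011I
# "server started in" message read from stdin.
server_start_seconds() {
  sed -n 's/.*CWWKF0011I: .* server started in \([0-9.]*\) seconds\..*/\1/p'
}

# Example usage (assumes the container was started with --name instanton-demo):
#   podman logs instanton-demo | restore_seconds
#   podman logs instanton-demo | server_start_seconds
```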
If Open Liberty fails to restore the checkpoint process, it recovers by launching without the checkpoint image and logs the following message:
CWWKE0957I: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. Launching the server without using the checkpoint image.
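Because the server still starts after a failed restore, the fallback is easy to miss. One way to catch it is to scan the container log for the CWWKE0957I message code. This is a minimal sketch: the function name restore_failed and the container name in the usage comment are made up for this example.

```shell
# Hypothetical helper: returns 0 when the log on stdin contains the
# CWWKE0957I message, which means the checkpoint restore did not succeed.
restore_failed() {
  grep -q 'CWWKE0957I'
}

# Example usage (assumes the container was started with --name instanton-demo):
#   if podman logs instanton-demo 2>&1 | restore_failed; then
#     echo "Restore failed; see /logs/checkpoint/restore.log in the container"
#   fi
```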
Check how long it took for Open Liberty to start and compare this to the time it took without InstantOn.
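To put a number on the comparison, you can divide the startup time without InstantOn by the startup time with it. The following sketch does that with awk. The function name speedup is made up for this example, and the values in the usage comment are placeholders rather than measurements.

```shell
# Hypothetical helper: prints how many times faster the second startup time is
# than the first, rounded to one decimal place. Both arguments use the same unit.
speedup() {
  awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", base / fast }'
}

# Example usage with placeholder values (substitute your own measurements):
#   speedup 2 0.5
```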
InstantOn improves startup time of Open Liberty applications significantly by restoring the process from the checkpointed state. The improvement in the time to first response (that is, the time taken to serve the first request) is also impressive, although more of the application logic runs after the restore in that case. We measured both metrics for multiple applications running in containers and using the afterAppStart checkpoint phase.
These experiments show a healthy improvement in startup times for all 3 applications and the time to first response is also improved by up to 8.8x when compared with normal JVM mode without InstantOn.
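If you want a rough time-to-first-response figure for your own application, one option is to poll the service URL until it answers. The sketch below is not the method used for the measurements in this post, and its function names are made up for this example. It relies on curl, GNU date (for %N), and a sleep that accepts fractional seconds. Timing starts when polling starts, so launch the container immediately beforehand.

```shell
# Hypothetical helper: converts two nanosecond timestamps to elapsed milliseconds.
ms_between() {
  echo $(( ($2 - $1) / 1000000 ))
}

# Hypothetical helper: polls the URL until it returns HTTP 200, then prints
# the elapsed time in milliseconds since polling began.
time_to_first_response_ms() {
  url="$1"
  start_ns=$(date +%s%N)
  until [ "$(curl -s -o /dev/null -w '%{http_code}' "$url")" = "200" ]; do
    sleep 0.01
  done
  ms_between "$start_ns" "$(date +%s%N)"
}

# Example usage, run right after starting the container in the background:
#   time_to_first_response_ms http://localhost:9080/dev/system/properties
```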
This post describes how to configure your cloud-native application to start almost immediately by using the Open Liberty InstantOn feature to produce an application container image. The key value proposition of InstantOn is that you can repackage your cloud-native Java applications to start in milliseconds, without compromising on throughput, memory, development-production parity, or Java language features. This feature is now available in Open Liberty 23.0.0.6 on x86-64/amd64 platforms running in the AWS EKS and Azure AKS public cloud environments.
In the future, we are planning to broaden our platform coverage and expand to be able to run in more managed public and hybrid cloud environments. We also intend to explore supporting InstantOn with an even larger set of Open Liberty features. For more details about Open Liberty InstantOn, see the Faster startup for containerized applications with Open Liberty InstantOn documentation, which links to more elaborate discussion on known limitations and information on the Semeru JDK support for this feature.
taskset -c was used to allocate 4 cores to the Open Liberty process running in containers in each case. Startup time is measured from the time the Open Liberty server startup is initiated to the time the server is ready to accept requests, as denoted by the "The <server name> server is ready to run a smarter planet." message in messages.log. The time it takes to start the container itself is also included in the results shown. InstantOn versus normal startup times for these applications are shown here in milliseconds. Your results might vary based on your environment, hardware and software installed on your system, and other factors.