InstantOn limitations and known issues

Jakarta Transactions configuration limitations

Open Liberty Transaction Manager support for InstantOn has limitations around configuration updates when the application process is restored. The configuration attributes for the Transaction Manager must remain constant between the InstantOn checkpoint and restore. This limitation applies only to the configuration attributes that are specified directly on the transaction server configuration element, for example, recoveryGroup and recoveryIdentity. The values for these configuration attributes must not change between checkpoint and restore.

This limitation implies that transaction recovery in a cloud environment cannot work as designed because the recoveryIdentity attribute cannot be parameterized. For example, the following transaction configuration, which gives a unique recoveryIdentity value to each instance of the application, cannot be used:

<transaction
  ...
  recoveryGroup="peer-group-name"
  recoveryIdentity="${HOSTNAME}${wlp.server.name}"
  ...
/>

Jakarta Transactions before checkpoint

Open Liberty InstantOn does not allow transactions to begin before a checkpoint is performed for the application process. This scenario is possible if the application has early startup code that attempts to start a transaction. Consider the following Servlet:

@WebServlet(urlPatterns = { "/StartupServlet" }, loadOnStartup = 1)
public class StartupServlet extends HttpServlet {
    @Override
    public void init() {
        UserTransaction ut = UserTransactionFactory.getUserTransaction();
        try {
            ut.begin();
            ...
            ut.commit();
        } catch (Exception e) {
            // something went wrong
        }
    }

}

This Servlet example uses the loadOnStartup = 1 attribute. When you use this attribute with the afterAppStart option, the servlet initializes before the checkpoint. The runtime detects this conflict and logs the following message:

[WARNING ] WTRN0155W: An application began or required a transaction during the server checkpoint request. The following stack trace for this thread was captured when the transaction was created:

This warning is followed by a stack trace that helps identify the application code that is attempting to begin a transaction. The server then fails to checkpoint and the following error is logged:

WTRN0154E: The server checkpoint request failed because the transaction service is unable to begin a transaction.

You can avoid this failure by using the beforeAppStart option or by modifying the component not to use early startup code. In this example, that modification is to remove the loadOnStartup = 1 attribute.

Accessing MicroProfile Config too early

If an application has early startup code and you are using the afterAppStart option, it might get injected with a configuration value from MicroProfile Config before a checkpoint is performed for the application process. If such a configuration value changes at the time the application image container runs, the application might use the stale value that was set when the application process checkpoint was performed.

The Open Liberty runtime detects this situation and logs a warning message when the application container image is run that indicates that a configuration value is changed. The following example uses an example_config configuration key with a default value set to theDefault. When the checkpoint occurs, the environment configuration source is not available to populate MicroProfile configuration values. If this @Inject annotation of the configuration is contained in a CDI bean that is created and used before the checkpoint is performed, the value of theDefault is injected.

    @Inject
    @ConfigProperty(name = "example_config", defaultValue = "theDefault")
    String exampleConfig;

When the InstantOn application container image is run, the environment variable EXAMPLE_CONFIG can be used to provide an updated value. The runtime detects this value and logs the following message:

[WARNING ] CWWKC0651W: The MicroProfile configuration value for the key example_config has changed since the checkpoint action completed on the server. If the value of the key changes after the checkpoint action, the application might not use the updated value.

In this situation, use the beforeAppStart checkpoint option. Another option is to use a Dynamic ConfigSource. The previous example can be modified to use a dynamic ConfigSource by using the Provider<String> type for the exampleConfig variable:

    @Inject
    @ConfigProperty(name = "example_config", defaultValue = "theDefault")
    Provider<String> exampleConfig;

Each call to the get() method of the Provider<String> returns the current value of the ConfigProperty annotation. This behavior allows the application to access the updated configuration value when the application process is restored during the InstantOn application container run.
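
The difference between a value that is injected once and a value that is read through a provider can be illustrated without any MicroProfile dependencies. The following is a minimal plain-Java analogy, not Liberty or MicroProfile code. The config map stands in for the configuration sources, Supplier<String> stands in for Provider<String>, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Plain-Java analogy only: the map stands in for MicroProfile Config sources,
// and Supplier<String> stands in for jakarta.inject.Provider<String>.
final class StaleConfigSketch {
    static final Map<String, String> config = new ConcurrentHashMap<>();

    // Returns the configured value, or the default when no source defines the key.
    static String lookup(String key, String defaultValue) {
        return config.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        // Before the checkpoint: no environment value exists, so the default applies.
        String staticValue = lookup("example_config", "theDefault");
        Supplier<String> dynamicValue = () -> lookup("example_config", "theDefault");

        // At restore: the environment now supplies a value.
        config.put("example_config", "fromEnvironment");

        // The eagerly read value keeps the default; the supplier re-reads the source.
        System.out.println("static:  " + staticValue);
        System.out.println("dynamic: " + dynamicValue.get());
    }
}
```

The eagerly read variable corresponds to the String injection site, which keeps whatever value was resolved before the checkpoint. The supplier corresponds to the Provider<String> injection site, which resolves the value on each get() call.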

Injecting a DataSource too early

If an application has early startup code and you are using the afterAppStart option, it might get injected with DataSource before a checkpoint is performed for the application process. In a cloud environment, the configuration of the DataSource likely needs to change at the time the application image container is run. Consider the following Servlet example:

@WebServlet(urlPatterns = "/ExampleServlet", loadOnStartup = 1)
public class ExampleServlet extends HttpServlet {
    @Resource(shareable = false)
    private DataSource exampleDataSource;
    ...
}

This Servlet example uses the loadOnStartup = 1 attribute. When you are using the afterAppStart option, this attribute initializes the servlet before the checkpoint. The deployment information related to the DataSource might need to be configured when you deploy the application to the cloud. Consider the following Open Liberty server.xml configuration.

  <!-- These values are placeholders so that the environment does not have to be set before checkpoint -->
  <variable name="DB2_DBNAME" defaultValue="placeholder" />
  <variable name="DB2_HOSTNAME" defaultValue="placeholder" />
  <variable name="DB2_PASS" defaultValue="placeholder" />
  <variable name="DB2_PORT" defaultValue="45000" />
  <variable name="DB2_PORT_SECURE" defaultValue="45001" />
  <variable name="DB2_USER" defaultValue="placeholder" />


  <dataSource id="DefaultDataSource">
    <jdbcDriver libraryRef="DB2Lib"/>
    <properties.db2.jcc
      databaseName="${DB2_DBNAME}" serverName="${DB2_HOSTNAME}" portNumber="${DB2_PORT}"
      downgradeHoldCursorsUnderXa="true"/>
    <containerAuthData user="${DB2_USER}" password="${DB2_PASS}"/>
    <recoveryAuthData user="${DB2_USER}" password="${DB2_PASS}"/>
  </dataSource>

This configuration uses placeholder values for things like the database name, hostname, ports, user, and password. This configuration allows the values to be updated with environment variable values or other configuration mechanisms, as described in Configuring microservices running in Kubernetes. These configurations must not be hardcoded into an application image and must be able to be updated when you deploy the application to the cloud.

If an application is injected with a DataSource before the checkpoint and the configuration of the DataSource changes, the application is restarted when the InstantOn application container image is run with the updated configuration. You can avoid this scenario by using the beforeAppStart option or by modifying the component not to be early startup code. In this example, that modification is to remove the loadOnStartup = 1 attribute.

Accessing MicroProfile Config properties with no default value at checkpoint

An application injected with a configuration property that has no default value set in any configuration source might cause errors during checkpoint. This section provides solutions for common errors that are encountered.

A configuration property can be introduced into the application either statically or dynamically, and in either case, the property can be declared optional. The following example shows ways to inject static, static-optional, dynamic, and dynamic-optional configuration properties.

  @Inject
  @ConfigProperty(name = "static_config")
  String staticConfig;

  @Inject
  @ConfigProperty(name = "static_optional_config")
  Optional<String> staticOptionalConfig;

  @Inject
  @ConfigProperty(name = "dynamic_config")
  Provider<String> dynamicConfig;

  @Inject
  @ConfigProperty(name = "dynamic_optional_config")
  Provider<Optional<String>> dynamicOptionalConfig;

If no value is found in an existing configuration source during checkpoint, the injected static_config property causes an error similar to the following example:

SRCFG02000: Failed to Inject @ConfigProperty for key static_config into io.example.Example.staticConfig since the config property could not be found in any config source.

You can avoid this error by providing a default value for the configuration key in one of the following ways:

Specify the default value on the @ConfigProperty annotation
  @Inject
  @ConfigProperty(name = "static_config", defaultValue = "defaultValue")
  String staticConfig;
Specify the default value in the application META-INF/microprofile-config.properties resource
  static_config=defaultValue
Specify a default value in a variable element in the server.xml file
  <variable name="static_config" defaultValue="defaultValue" />

If no default value is set, you can still avoid the previous error by injecting configuration with the static_optional_config, dynamic_config, or dynamic_optional_config properties. However, the following error might occur if you use the checkpoint option with CDI beans that are @ApplicationScoped and the dynamic_config is accessed too early during application startup:

java.util.NoSuchElementException: SRCFG00014: The config property dynamic_config is required but it could not be found in any config source.

Similarly, accessing the static_optional_config and dynamic_optional_config too early might cause the following error:

java.util.NoSuchElementException: No value present

Therefore, to avoid these errors, it is best to set a default value for injected configuration properties, because optional and dynamic configuration can still be accessed too early during application startup. Furthermore, if the @ConfigProperty injection site does not use dynamic configuration, any default value that is injected into the application-scoped bean before checkpoint is not updated on restore. For more information, see Accessing MicroProfile Config too early.
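
The No value present error comes from calling get() on an empty java.util.Optional. The following minimal sketch uses only the Java standard library to show the failing access and one guarded alternative. The class name, the valueOrFallback helper, and the fallbackValue string are hypothetical, and the sketch does not involve MicroProfile Config itself.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

final class OptionalConfigSketch {
    // Returns the configured value, or the fallback when no source defines it.
    static String valueOrFallback(Optional<String> configured, String fallback) {
        return configured.orElse(fallback);
    }

    public static void main(String[] args) {
        // Models an optional property that no configuration source defines.
        Optional<String> missing = Optional.empty();

        try {
            missing.get(); // unguarded access to an empty Optional
        } catch (NoSuchElementException e) {
            System.out.println("unguarded access failed: " + e.getMessage());
        }

        // Guarded access never throws for a missing value.
        System.out.println(valueOrFallback(missing, "fallbackValue"));
    }
}
```

A fallback in application code only avoids the exception. Whether the value is refreshed on restore still depends on whether the injection site is dynamic.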

Using product extensions, user features, or features that are not supported by InstantOn

InstantOn supports only a subset of Open Liberty features, as described in Open Liberty InstantOn supported features. Any public features that are enabled outside of the supported set of features for InstantOn cause checkpoint to fail with an error message like the following example:

CWWKC0456E: A checkpoint cannot be taken because the following features configured in the server.xml file are not supported for checkpoint: [usr:exampleFeature-1.0]

This error occurs for any configured features that are not supported for InstantOn. This limitation includes Liberty product extension and Liberty user features.

Updating configuration with a bootstrap.properties file

When an InstantOn application container image is run, the bootstrap.properties file is not read. Values that must be configurable when you run an InstantOn application container image must come from alternative sources. For example, you might use environment variables or other configuration mechanisms, as described in Configuring microservices running in Kubernetes.

Java SecurityManager is not supported

If Open Liberty is configured to run with the SecurityManager, InstantOn detects this configuration during a checkpoint and fails with the following message:

CWWKE0958E: The server checkpoint request failed because the websphere.java.security property was set in the bootstrap.properties file. This property enables the Java Security Manager and is not valid when a server checkpoint occurs.

Updating JVM options

InstantOn does not support changing the jvm.options when you restore the InstantOn application process. Any JVM options that are required to be set for the JVM process must be defined during the InstantOn container image build.

The IBM Semeru JVM does have limited support for setting JVM options on restore with the use of the OPENJ9_RESTORE_JAVA_OPTIONS environment variable. For more information, see the Java Checkpoint/Restore In Userspace (CRIU) support documentation.

SELinux limitations

If SELinux mode is set to enforcing, SELinux might prevent InstantOn from performing a checkpoint of the application process when you use the checkpoint.sh script in the image template Dockerfile or Containerfile. If the virt_sandbox_use_netlink SELinux setting is disabled, the required netlink Linux system calls are blocked. This block prevents InstantOn from performing a checkpoint of the application process during the container image build. Open Liberty InstantOn detects this limitation and logs the following messages:

CWWKE0962E: The server checkpoint request failed. The following output is from the CRIU /logs/checkpoint/checkpoint.log file that contains details on why the checkpoint failed.
Warn  (criu/kerndat.c:1103): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Error (criu/libnetlink.c:84): Can't send request message: Permission denied
..
Error (criu/cr-dump.c:2099): Dumping FAILED.
CWWKE0963E: The server checkpoint request failed because netlink system calls were unsuccessful. If SELinux is enabled in enforcing mode, netlink system calls might be blocked by the SELinux "virt_sandbox_use_netlink" policy setting. Either disable SELinux or enable the netlink system calls with the "setsebool virt_sandbox_use_netlink 1" command.

To work around this limitation, you can either enable the virt_sandbox_use_netlink SELinux setting with the setsebool virt_sandbox_use_netlink 1 command or disable SELinux enforcing mode. Another option to work around this issue is to use the three-step process to build the InstantOn image. The three-step process requires the use of a --privileged container that grants access to the netlink system calls to the running container that performs the application process checkpoint.

Yama Linux Security Module limitations

If Yama is configured with one of the following modes, InstantOn cannot checkpoint or restore the application process in running containers:

  • 2 - admin-only attach

  • 3 - no attach

When this configuration is present, the /logs/checkpoint/restore.log contains the following error:

Error (criu/arch/x86/kerndat.c:178): 32: ptrace(PTRACE_TRACEME) failed: Operation not permitted

For InstantOn checkpoint and restore to work, Yama must be configured with one of the following modes:

  • 0 - classic ptrace permissions

  • 1 - restricted ptrace
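
The active Yama mode is exposed by the Linux kernel in the /proc/sys/kernel/yama/ptrace_scope file. The following is a minimal diagnostic sketch, not part of Open Liberty, that reads this file and reports whether the mode is one of the two that allow checkpoint and restore. The class and method names are hypothetical, and the sketch assumes that the file is readable from where the program runs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class YamaCheckSketch {
    // Modes 0 and 1 allow checkpoint and restore; modes 2 and 3 do not.
    static boolean allowsInstantOn(int ptraceScope) {
        return ptraceScope == 0 || ptraceScope == 1;
    }

    public static void main(String[] args) throws IOException {
        Path scopeFile = Paths.get("/proc/sys/kernel/yama/ptrace_scope");
        if (!Files.isReadable(scopeFile)) {
            // The file is absent when Yama is not enabled or the system is not Linux.
            System.out.println("Yama setting is not readable on this system.");
            return;
        }
        String raw = new String(Files.readAllBytes(scopeFile), StandardCharsets.UTF_8).trim();
        int scope = Integer.parseInt(raw);
        System.out.println("ptrace_scope=" + scope
                + (allowsInstantOn(scope) ? " (compatible)" : " (blocks checkpoint and restore)"));
    }
}
```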

The following supported public cloud Kubernetes services have the default for Yama set to the 1 mode, which allows InstantOn to checkpoint and restore by default:

Access to Linux system calls

As described in Required Linux system calls, CRIU requires several Linux system calls to restore the application process. This requirement might require extra configuration to grant the required system calls to the running container when you use InstantOn.

The following examples are errors that are logged to the /logs/checkpoint/restore.log file when access to specific system calls is blocked.

Blocked clone3 system call
Error (criu/kerndat.c:1377): Unexpected error from clone3: Operation not permitted
Blocked ptrace system call
Error (criu/arch/x86/kerndat.c:178): 28: ptrace(PTRACE_TRACEME) failed: Operation not permitted
Blocked vmsplice system call
Error (criu/pipes.c:184): 0x4c11a: Error splicing data: Operation not permitted

The supported public cloud Kubernetes Service environments currently allow the required system calls used by CRIU by default. No additional configuration is required when you use the following cloud providers:

  • Amazon Elastic Kubernetes Service (EKS)

  • Azure Kubernetes Service (AKS)

Running without the necessary Linux capabilities

Errors occur during checkpoint and restore if the required Linux capabilities are not granted. If the required capabilities are not granted for checkpoint, then the following error occurs during the InstantOn container image build:

Can't exec criu swrk: Operation not permitted
Can't read request: Connection reset by peer
Can't receive response: Invalid argument
[ERROR   ] CWWKC0453E: The server checkpoint request failed with the following message: Could not dump the JVM processes, err=-70

The Operation not permitted message indicates that the required Linux capabilities are not granted. If you are using the checkpoint.sh script, the following error occurs during the RUN checkpoint.sh instruction:

Error: building at STEP "RUN checkpoint.sh afterAppStart": while running runtime: exit status 74

To avoid this error, grant the container image build the CHECKPOINT_RESTORE, SYS_PTRACE, and SETPCAP Linux capabilities. If you use the three-step process to build the container image, make sure the container that is running the checkpoint step is a --privileged container.

If the required capabilities are not granted for restore, the following error occurs when you try to run the InstantOn application container image:

/opt/ol/wlp/bin/server: line 1430: /opt/criu/criu: Operation not permitted
CWWKE0961I: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. Launching the server without using the checkpoint image.

The Operation not permitted message is an indication that the required Linux capabilities are not granted for restore.
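
One way to investigate a missing capability is to inspect the capability masks that the kernel reports in the /proc/self/status file. The following is a minimal diagnostic sketch, not part of Open Liberty, that checks the capability bounding set (the CapBnd line) for the three capabilities named in this section. The class and method names are hypothetical. The bit numbers are taken from the Linux capability.h header, and the choice of the bounding set rather than another capability set is an assumption made for illustration. The sketch runs only on Linux.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class CapabilityCheckSketch {
    // Bit numbers as defined in linux/capability.h.
    static final int CAP_SETPCAP = 8;
    static final int CAP_SYS_PTRACE = 19;
    static final int CAP_CHECKPOINT_RESTORE = 40;

    // Tests whether a single capability bit is set in the mask.
    static boolean hasCapability(long mask, int bit) {
        return (mask & (1L << bit)) != 0;
    }

    // True only when all three capabilities named in this section are present.
    static boolean hasRequiredCapabilities(long mask) {
        return hasCapability(mask, CAP_CHECKPOINT_RESTORE)
                && hasCapability(mask, CAP_SYS_PTRACE)
                && hasCapability(mask, CAP_SETPCAP);
    }

    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8)) {
            if (line.startsWith("CapBnd:")) {
                // The mask is printed as a hexadecimal number after the label.
                long mask = Long.parseUnsignedLong(line.substring("CapBnd:".length()).trim(), 16);
                System.out.println("required capabilities in bounding set: " + hasRequiredCapabilities(mask));
            }
        }
    }
}
```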

Supported processors

Currently, the only supported processor architecture is x86-64/AMD64. Other processors are expected to be supported in later releases of Open Liberty InstantOn.