Limit number of records loaded by the JDBC job repository

In batch

Table of Contents

Overview
Issue Metadata
Requirements
- Hard Requirements
- Nice-to-Have Requirements
Implementation Plan
- JBeret Core
- Wildfly
Test Plan
Community Documentation
Release Note Content

Overview

The batch subsystem uses job repositories to store metadata about batch job executions. The job repositories implemented in Wildfly are the in-memory job repository and the JDBC job repository.

When the batch subsystem is configured to use the JDBC job repository to store execution metadata, and when it accumulates a large number of records, degraded performance can be observed when deploying applications which use batch jobs. This performance degradation is caused by the fact that the batch subsystem needs to load past execution records during the application deployment.

We have reports from customers and upstream users whose repositories contain hundreds of thousands of records. These users can’t prune their job repositories any further due to business requirements to keep operational records for some minimum amount of time. The reported delays during application deployments are as high as several minutes. OutOfMemory exceptions and failed deployments were also reported.

Several fixes have already been completed, which optimized processing of these large job repositories:

While these fixes improved the deployment times considerably, an upstream user reported that application deployments still take significantly longer in their environment, than they used to take with an older version of Wildfly.

Debugging performed by us and by the upstream user indicates that most time is spent on reading records from the JDBC ResultSet, not on executing the actual database query. Database latency seems to be an important factor.

Another issue noticed later on is that the Wildfly Web Console is not able to process and display such large number job executions, as the operation that loads the data fails with OutOfMemory exception. (Neither is the Web Console designed to display that amount of records.)

Hence, it is proposed to implement a way to limit the number of execution records that are loaded by the job repository implementations.

Issue Metadata

Issue

https://issues.redhat.com/browse/WFLY-15525

Dev Contacts

Tomas Hofman

QE Contacts

Testing By

Engineering
QE

Affected Projects or Components

JBeret Core,
Batch subsystem.

Other Interested Projects

Requirements

Hard Requirements

Make it possible to limit the number of job executions returned by job repository implementations. If this limitation behavior is active, the subset of records returned by the job repository must contain the youngest records, sorted from the youngest to the oldest. Older records which do not fit into the specified limit will be omitted.

If the limit is not specified, current behavior will apply and all available records will be loaded.

Nice-to-Have Requirements

If the limiting behavior comes to an effect, print a debug or a warning log message.
Add debug messages that would make it easier to identify where the JDBC job repository spends time when loading execution records.

Implementation Plan

JBeret and Wildfly code bases need to be modified.

JBeret Core

Currently, the JobRepository interface has method List<Long> getJobExecutionsByJob(String jobName), which returns a list of all job execution IDs for given jobName present in the repository.

The proposal is to add a new method List<Long> getJobExecutionsByJob(String jobName, Integer maxRecords), that would return a list with maximum size of maxRecords.

All JobRepository implementations would to implement the new method: InMemoryRepository, InfinispanRepository, JdbcRepository, MongoRepository. The original method would remain present.

Wildfly

Consume new version of JBeret Core.
Add a new attribute execution-records-limit of the type int would be added to the JdbcJobRepository and InMemoryJobRepository resources. The attribute would be nillable and not required, i.e. by default it would be undefined. If set, the minimal possible value is 1.
New "batch-jberet" schema version needs to be released, containing above attribute.
If execution-records-limit attribute was set, the number would be passed to the JobRepository#getJobExecutionsByJob(String jobName, Integer maxRecords) call, if not set the original method can be called or null can be passed as the maxRecords parameter.

Test Plan

Subsystem tests:
- Standard subsystem test case in the batch subsystem - verifies that subsystem XML config can be read and serialized again (with and without the new attribute), and the serialized XML is identical to the original one.
- Test that the new attribute can be set via a management operation. Test minimal value (1).
Integration tests:
- Create a deployment with batch job. Execute the batch job number of times. Via the JobOperator instance, load job executions for executed job name and check the number of returned executions. This should be tested with the execution-records-limit attribute undefined and then with the attribute set to a low number. When set, number of returned executions should be limited to that number.

Community Documentation

The feature will be documented in the docs/ folder of the Wildfly repository, in the "Subsystem Configuration / Jakarta Batch" chapter.

Release Note Content

New attribute execution-records-limit of type int has been added to the batch job repository resources (both in-memory and JDBC based). When set, the job repository will never return more than the specified limit of job execution instances. Setting this attribute is useful in situations when the job repository stores an unusually large number of job executions, which can cause delays during application deployments or out-of-memory errors when trying to display the list of executions in the Web Console.