Spring Batch 6 Guide Part 4: Job Launch · Scheduling · Operations — Triggers · Idempotent Parameters · Data Sources


Introduction

Through Part 3 we focused on how to write a job: read, transform, and write in chunks; handle failure with Skip and Retry; resume after a crash. Part 4 is about how to run it.

The questions cascade. Who launches the job, and when? Is one @Scheduled line enough, or is cron or K8s better? You scaled out to several servers — now the same job runs twice? Can the aggregation job hit the operational DB directly, or should it read from somewhere else?

Part 4 walks the seven branches of “operations”: the division of labor between JobLauncher and JobOperator, choosing among four schedulers, JobParameters idempotency design, restarting failed jobs, operational monitoring, and the two things that trip people up most in practice — preventing duplicate execution across instances and the five patterns for where a batch reads its data.

The target reader is a backend engineer who has built jobs through Parts 1–3. Basic concepts of scheduling and operational environments (containers, multiple instances) are assumed.


TL;DR

  • JobLauncher is the entry point where code starts a job; JobOperator is the operator’s remote control: @Scheduled calls JobLauncher, while restarting or stopping an execution by ID goes through JobOperator.
  • Pick the scheduler from four options: @Scheduled (simple, single-instance), Quartz (in-app high availability), K8s CronJob (containers), Argo/Airflow (inter-job dependency directed acyclic graphs).
  • JobParameters design = idempotency key vs new instance every run — make the business date (targetDate) the identifying key for “once a day, restartable,” or add a RunIdIncrementer for a new JobInstance each run (write idempotency then required).
  • Restart via the same identifying parameters or JobOperator.restart — control restart behavior with preventRestart · startLimit · allowStartIfComplete, and branch flow on ExitStatus.
  • Monitoring: Spring Batch Admin is gone — replace it with Actuator + Micrometer metrics (Part 6) + JobExplorer queries + failure alerts.
  • Preventing duplicate execution across instances — N app instances make @Scheduled fire N times. The JobInstance lock blocks concurrent runs with the same parameters, but with limits — guarantee single execution with ShedLock, a Quartz cluster, or CronJob.
  • Where does a batch read its data — five patterns — A. same DB / B. domain API / C. Read Replica / D. analytics Warehouse / E. change data capture. The usual evolution is A → C → D, and B is rarely used.

1. JobLauncher vs JobOperator

1.1 JobLauncher — launches the job

JobLauncher is the entry point that starts a job from code. It takes a Job and JobParameters, runs it, and returns a JobExecution. It’s exactly what Part 1’s CommandLineRunner or the @Scheduled below calls.

import org.springframework.batch.core.Job
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.batch.core.JobParametersBuilder
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component
import java.time.LocalDate

@Component
class DailySalesScheduler(
    private val jobLauncher: JobLauncher,
    private val dailySalesJob: Job,
) {
    @Scheduled(cron = "0 0 1 * * *")  // every day at 01:00
    fun launch() {
        val params = JobParametersBuilder()
            .addLocalDate("targetDate", LocalDate.now().minusDays(1))
            .toJobParameters()
        jobLauncher.run(dailySalesJob, params)
    }
}

The default JobLauncher is synchronous — it blocks the calling thread until the job finishes. If you don’t want to hold the @Scheduled thread, you can inject a TaskExecutor to launch asynchronously, but then it returns immediately and you must observe completion separately.

1.2 JobOperator — the operator’s remote control

JobOperator is the API for handling jobs during operations, by a human. If JobLauncher is “code starting a job,” JobOperator deals in job names, parameter strings, and execution IDs to start/stop/restart/abandon. It’s designed to be called from JMX, an ops CLI, or an admin endpoint.

// from an ops endpoint/CLI (by name, string params, execution ID)
val executionId: Long = jobOperator.start("dailySalesJob", properties)  // new run
jobOperator.restart(failedExecutionId)   // restart a failed run (§4)
jobOperator.stop(runningExecutionId)     // request a stop
jobOperator.abandon(stoppedExecutionId)  // abandon a stopped run

1.3 When to use which

| Aspect | JobLauncher | JobOperator |
| --- | --- | --- |
| Primary user | application code | operators / admin tools |
| Input | Job + JobParameters objects | job name + string params / execution ID |
| Typical caller | @Scheduled, CommandLineRunner | admin screen, JMX, ops CLI |
| Good at | triggering a known job | querying history, stopping, restarting after the fact |

Note: Spring Batch 6 expanded the JobOperator API, strengthening its role as the single entry point for operational tasks. The conceptual split is unchanged — “normal-flow trigger is JobLauncher, after-the-fact intervention is JobOperator.”


2. Choosing a Scheduler

Who calls JobLauncher — that is, “when does it run” — is the scheduler’s job. Part 1 §1.1 said trigger (when) and execution engine (how) are separate axes; Part 4 now picks that trigger layer in earnest.

Terms: a DAG (Directed Acyclic Graph) expresses ordering and dependencies between jobs without cycles — “B after A, D after B and C.” Backfill means re-running past periods retroactively — e.g., fixing aggregation logic and re-running the last month day by day. HA (High Availability) is a setup where the job keeps running even if one instance dies.

2.1 Four options compared

| Scheduler | Where it runs | Single execution (HA) | Inter-job dependency (DAG) | Fits |
| --- | --- | --- | --- | --- |
| @Scheduled | in-app | ❌ (per instance → §7) | | single instance, simple schedule |
| Quartz | in-app | ✅ cluster mode | limited | HA needed inside the app |
| K8s CronJob | cluster | ✅ cluster guarantees one | | containerized deployment |
| Argo · Airflow | external orchestrator | | ✅ DAG, backfill, retries | dependencies across jobs, pipelines |

2.2 Decision tree

flowchart TD
    A["A recurring batch is needed"] --> B{"Need inter-job dependencies·<br/>backfill·DAG?"}
    B -->|"Yes"| C["Argo Workflows / Airflow"]
    B -->|"No"| D{"Deployment form?"}
    D -->|"Containers · K8s"| E["K8s CronJob<br/>(cluster guarantees one run)"]
    D -->|"App process"| F{"Multiple instances?"}
    F -->|"Single instance"| G["@Scheduled<br/>(simplest)"]
    F -->|"Multiple instances"| H["@Scheduled + ShedLock<br/>or a Quartz cluster"]

Two questions decide it. Are there dependencies between jobs (A→B→C ordering, backfill, visualization)? → if so, Argo/Airflow. If not, is the deployment containerized? → CronJob; for app processes, by instance count it’s @Scheduled (single) or ShedLock/Quartz (multiple, §7).
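The two questions can be written down as a small pure function. This is only a sketch of the flowchart; the `Deployment` and `Scheduler` enums and the `chooseScheduler` name are invented for illustration and belong to no library.

```kotlin
// Hypothetical types mirroring the decision tree in this section.
enum class Deployment { KUBERNETES, APP_PROCESS }

enum class Scheduler { ORCHESTRATOR, K8S_CRONJOB, SCHEDULED, SCHEDULED_WITH_LOCK_OR_QUARTZ }

fun chooseScheduler(
    needsJobDependencies: Boolean, // DAG ordering, backfill, visualization
    deployment: Deployment,
    instanceCount: Int,
): Scheduler = when {
    needsJobDependencies -> Scheduler.ORCHESTRATOR            // Argo Workflows / Airflow
    deployment == Deployment.KUBERNETES -> Scheduler.K8S_CRONJOB
    instanceCount <= 1 -> Scheduler.SCHEDULED                 // plain @Scheduled
    else -> Scheduler.SCHEDULED_WITH_LOCK_OR_QUARTZ           // single-execution guarantee, §7
}

fun main() {
    // An app process scaled to three instances, no inter-job dependencies.
    println(chooseScheduler(needsJobDependencies = false, deployment = Deployment.APP_PROCESS, instanceCount = 3))
}
```

The order of the branches matters: the dependency question is asked first, so a containerized pipeline with a DAG still lands on the orchestrator rather than on CronJob.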

2.3 Launching a job from @Scheduled

The most common form is exactly the §1.1 code: @Scheduled calls JobLauncher.run. The cron expression sets the time, and the date to process is passed via JobParameters (§3). But @Scheduled assumes a single instance — with multiple, you need §7’s single-execution guarantee.


3. JobParameters Design

JobParameters is not just input. It defines the identity of a JobInstance (Part 1 §3). So which parameters you make identifying decides “what happens when the same job runs twice.”

3.1 Identifying vs non-identifying parameters

The IDENTIFYING flag from Part 1 works here. Same identifying parameters mean the same JobInstance — the identifying set is the JobInstance’s business key.

  • Identifying — distinguishes JobInstances. The value carrying business meaning. E.g., targetDate.
  • Non-identifying — distinguishes runs only within the same JobInstance. E.g., a debug flag, a timestamp.
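To make the distinction concrete, here is a toy model in plain Kotlin. It is not how Spring Batch computes its key internally; `Param` and `instanceKey` are hypothetical names, and the only point is that non-identifying values never reach the key.

```kotlin
// Toy model only: the framework derives its own key from the identifying parameters.
data class Param(val value: String, val identifying: Boolean = true)

// Build a stable key from the identifying parameters, ignoring the rest.
fun instanceKey(jobName: String, params: Map<String, Param>): String =
    params.filterValues { it.identifying }
        .toSortedMap()
        .entries
        .joinToString(separator = ";", prefix = "$jobName|") { (name, p) -> "$name=${p.value}" }

fun main() {
    val scheduled = mapOf(
        "targetDate" to Param("2026-05-16"),
        "triggeredBy" to Param("scheduler", identifying = false),
    )
    val manual = mapOf(
        "targetDate" to Param("2026-05-16"),
        "triggeredBy" to Param("operator", identifying = false),
    )
    val nextDay = mapOf("targetDate" to Param("2026-05-17"))

    // Same business date, different trigger: same key.
    println(instanceKey("dailySalesJob", scheduled) == instanceKey("dailySalesJob", manual))  // true
    // Different business date: different key.
    println(instanceKey("dailySalesJob", scheduled) == instanceKey("dailySalesJob", nextDay)) // false
}
```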

3.2 The idempotency key = the business date

An aggregation job’s idempotency key is the business value pointing at “what to process” — targetDate.

val params = JobParametersBuilder()
    .addLocalDate("targetDate", LocalDate.now().minusDays(1))  // identifying (default)
    .addString("triggeredBy", "scheduler", false)              // non-identifying
    .toJobParameters()

Make only targetDate identifying and “aggregate 2026-05-16” is one JobInstance. Re-run with the same date and one of two things happens: if the previous run failed, it restarts; if it succeeded, the launch is rejected with JobInstanceAlreadyCompleteException (Part 3 §4.4). This is the most common batch identity: “once a day, restartable.”

3.3 The incrementer — a new instance every run

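Conversely, if you want to “re-run the same job anytime, multiple times,” add a RunIdIncrementer. Its run.id parameter increments on each launch, making a new JobInstance every time. Note that the incrementer is only consulted when the launch path asks for the next parameters (for example JobOperator.startNextInstance); a plain JobLauncher.run with hand-built parameters does not call it.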

JobBuilder("dailySalesJob", jobRepository)
    .incrementer(RunIdIncrementer())   // run.id increments each run → new JobInstance
    .start(aggregateStep)
    .build()

The trade-off is clear. With an incrementer, “blocking re-runs of a succeeded job” disappears — running the same date twice processes it twice. So an incrementer makes write idempotency mandatory (Part 3 §5 upsert). In summary:

| Choice | JobInstance | Same-date re-run | Idempotency owner |
| --- | --- | --- | --- |
| Date identifying key only | one per date | blocked (or restart) | the framework |
| Incrementer | new every run | processed afresh each time | Writer upsert (Part 3) |
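Why the incrementer pushes idempotency onto the Writer can be seen in a small in-memory simulation. This is a sketch under stated assumptions: `DailySalesTable` is a hypothetical stand-in for the target table, and a real Writer would issue a database upsert rather than update a map.

```kotlin
import java.time.LocalDate

// Hypothetical in-memory "table"; it exists only to contrast the two write styles.
class DailySalesTable {
    private val rows = mutableMapOf<LocalDate, Long>()
    private val appendLog = mutableListOf<Pair<LocalDate, Long>>()

    // Idempotent: writing the same date twice leaves one row with the latest total.
    fun upsert(date: LocalDate, total: Long) { rows[date] = total }

    // Not idempotent: every run adds another row for the same date.
    fun append(date: LocalDate, total: Long) { appendLog += date to total }

    fun upsertedTotal(date: LocalDate): Long = rows[date] ?: 0L
    fun appendedTotal(date: LocalDate): Long =
        appendLog.filter { it.first == date }.sumOf { it.second }
}

fun main() {
    val table = DailySalesTable()
    val date = LocalDate.of(2026, 5, 16)
    repeat(2) {                        // the same date processed twice (incrementer case)
        table.upsert(date, 1_000L)
        table.append(date, 1_000L)
    }
    println(table.upsertedTotal(date)) // 1000
    println(table.appendedTotal(date)) // 2000
}
```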

3.4 Parameter validation

To catch a missing required parameter at launch time, before any step runs, add a JobParametersValidator. Declaring required and optional keys on a DefaultJobParametersValidator prevents the accident of launching without targetDate.
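The required/optional check can be modeled in a few lines of plain Kotlin. This is an approximation for illustration, not the library's implementation; `validateKeys` and `InvalidParameters` are hypothetical names, and the rule that unknown keys are rejected only when optional keys are declared is an assumption of this sketch.

```kotlin
// Hypothetical stand-in for a validation failure.
class InvalidParameters(message: String) : IllegalArgumentException(message)

// Reject a launch when a required key is missing, or when a key is neither
// required nor explicitly optional (checked only if optional keys are declared).
fun validateKeys(
    provided: Set<String>,
    required: Set<String>,
    optional: Set<String> = emptySet(),
) {
    val missing = required - provided
    if (missing.isNotEmpty()) throw InvalidParameters("missing required keys: $missing")
    if (optional.isNotEmpty()) {
        val unknown = provided - required - optional
        if (unknown.isNotEmpty()) throw InvalidParameters("unexpected keys: $unknown")
    }
}

fun main() {
    // Passes: targetDate is present and triggeredBy is declared optional.
    validateKeys(setOf("targetDate", "triggeredBy"), required = setOf("targetDate"), optional = setOf("triggeredBy"))

    // Fails: the launch forgot targetDate.
    val result = runCatching { validateKeys(setOf("triggeredBy"), required = setOf("targetDate")) }
    println(result.exceptionOrNull()?.message) // missing required keys: [targetDate]
}
```

One consequence worth noting under this model: once optional keys are declared, a job that also uses an incrementer would need run.id listed as optional, or the launch is rejected.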


4. Restarting Failed Jobs

Part 3 §4 covered the mechanism of restart (ExecutionContext preserves the position). Part 4 covers how you trigger and control it in operations.

4.1 Two ways to trigger a restart

  • Re-run with the same identifying parameters — pass the same targetDate again via JobLauncher, and the framework finds the incomplete JobInstance and continues it. The default path in scheduler-driven operations.
  • JobOperator.restart(executionId) — restart by specifying the failed execution ID directly. For human intervention from an admin/CLI.

4.2 Restart-control knobs

Jobs and steps have settings that change restart behavior.

| Setting | Where | Effect |
| --- | --- | --- |
| preventRestart() | Job | once it fails, restart is forbidden entirely |
| startLimit(n) | Step | allow the step to start at most n times |
| allowStartIfComplete(true) | Step | re-run even an already-completed step on restart |

allowStartIfComplete is especially useful. The default is “skip completed steps on restart,” but a step like validation or cleanup that must run every time turns this flag on.
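How the two step-level knobs interact on a restart can be sketched as a decision function. This is a toy model: `Decision`, `StepHistory`, `StepSettings`, and `decideOnRestart` are illustrative names, and the job-level preventRestart is deliberately left out.

```kotlin
// Toy model of the restart decision for one step; names are illustrative.
enum class Decision { RUN, SKIP, REJECT }

data class StepHistory(val completed: Boolean, val startCount: Int)

data class StepSettings(
    val startLimit: Int = Int.MAX_VALUE,
    val allowStartIfComplete: Boolean = false,
)

fun decideOnRestart(history: StepHistory, settings: StepSettings): Decision = when {
    // A completed step is skipped unless it is flagged to run every time.
    history.completed && !settings.allowStartIfComplete -> Decision.SKIP
    // The step has already used up its allowed starts.
    history.startCount >= settings.startLimit -> Decision.REJECT
    else -> Decision.RUN
}

fun main() {
    val done = StepHistory(completed = true, startCount = 1)
    val failedThreeTimes = StepHistory(completed = false, startCount = 3)

    println(decideOnRestart(done, StepSettings()))                              // SKIP
    println(decideOnRestart(done, StepSettings(allowStartIfComplete = true)))   // RUN
    println(decideOnRestart(failedThreeTimes, StepSettings(startLimit = 3)))    // REJECT
}
```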

4.3 Branching flow on ExitStatus

You can change the next flow based on a step’s result (ExitStatus). For example, “if there were skips, go to a notification step; otherwise end.”

JobBuilder("dailySalesJob", jobRepository)
    .start(aggregateStep)
        .on("COMPLETED WITH SKIPS").to(notifySkipStep)  // notify when skips occurred
        .from(aggregateStep).on("FAILED").fail()        // keep a failure as FAILED so it stays restartable
        .from(aggregateStep).on("*").end()              // otherwise end normally
    .end()
    .build()

Return a custom ExitStatus from afterStep (e.g., "COMPLETED WITH SKIPS") and the on(...) mapping above receives that value to branch.
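The two halves of that mechanism, choosing the exit code and routing on it, can be modeled as plain functions. The real wiring goes through a step listener and the flow builder; `exitCodeFor` and `nextStep` are hypothetical helpers, and the matching here only distinguishes an exact code from the `*` wildcard.

```kotlin
// Decide the exit code to report: a completed step with skips gets a custom code.
fun exitCodeFor(defaultCode: String, skipCount: Long): String =
    if (defaultCode == "COMPLETED" && skipCount > 0) "COMPLETED WITH SKIPS" else defaultCode

// Simplified transition lookup: an exact match wins, "*" is the fallback.
fun nextStep(exitCode: String, transitions: Map<String, String>): String? =
    transitions[exitCode] ?: transitions["*"]

fun main() {
    val transitions = mapOf("COMPLETED WITH SKIPS" to "notifySkipStep", "*" to "end")

    println(nextStep(exitCodeFor("COMPLETED", skipCount = 3), transitions)) // notifySkipStep
    println(nextStep(exitCodeFor("COMPLETED", skipCount = 0), transitions)) // end
}
```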


5. Operational Monitoring

5.1 Spring Batch Admin is gone

Spring Batch Admin, once the standard, was discontinued long ago. Today, instead of a dedicated UI, you observe operations with Actuator + Micrometer + metadata queries.

5.2 What to watch with

  • Metrics → Micrometer/Prometheus — expose spring.batch.job · step · item.* metrics via /actuator/prometheus and view them in Grafana. The six metrics and dashboards are covered in depth in Part 6.
  • Execution history → JobExplorer — query job/step execution history, status, and counts from code. The data source when you build your own admin endpoint.
  • Failure alerts — detect BatchStatus.FAILED in JobExecutionListener.afterJob and send to Slack/Email. The first safety net you add in operations.

Note: you can also query the six metadata tables (Part 1 §3) directly with SQL. BATCH_JOB_EXECUTION’s STATUS/EXIT_CODE and BATCH_STEP_EXECUTION’s counts are the first diagnostic. Deeper observability is deferred to Part 6.


6. Where Does a Batch Read Its Data — Five Patterns

So far every example assumed “the batch reads the operational DB directly.” But as data grows and starts colliding with operational traffic, “where to read from” governs the whole job design. There are five patterns you meet in practice.

6.1 The five patterns compared

| Pattern | Coupling | Infra cost | Load shifted onto | Data freshness | When |
| --- | --- | --- | --- | --- | --- |
| A. Same DB directly | high | none | the operational DB itself | real-time | early stage, small scale |
| B. Domain API call | medium | low | the operational service | real-time | rarely used (§6.3) |
| C. Read Replica | high | medium | a replica (isolated) | near real-time (replica lag) | need to isolate op load |
| D. Analytics Warehouse | low | high | fully separated | delayed by ETL cycle | analytics, multi-domain joins |
| E. CDC event stream | low | very high | fully separated | near real-time (minutes) | large scale, near-real-time need |

Attaching to the operational DB with a read-only account (A) is simplest; keeping a dedicated analytics store (D) is the cleanest separation. Separation and cost rise together: the more isolated the read path, the more infrastructure you pay for.

6.2 The evolution path — A → C → D

Most companies don’t jump straight to D. They walk this order naturally.

flowchart LR
    A["A. Same DB directly<br/>early stage"] --> C["C. Read Replica<br/>when op load hurts"]
    C --> D["D. Analytics Warehouse<br/>analytics·multi-domain joins"]
    A -. "rarely used" .-> B["B. Domain API"]
    D -. "large scale·near real-time" .-> E["E. CDC stream"]

At first A is enough. Once the batch starts inflating operational-DB load, you push reads onto a replica with C (Read Replica) (just swap the datasource on Part 2’s JdbcPagingItemReader). As analytics demand grows and you must join across domains, you move to D (Warehouse).

6.3 Why B (API call) is rarely used

  • APIs are for single/small responses — paging hundreds of thousands of rows carries heavy serialization overhead.
  • No transactional consistency across page boundaries — if data changes between pages, you get gaps or duplicates.
  • It shifts batch load onto the operational service — bulk reads wreck the operational API’s latency.
  • The only exceptions are third-party APIs (where DB access is impossible) or real-time reads of fewer than roughly 1,000 rows.
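The page-boundary problem is easy to reproduce with a list standing in for the API's backing data. This is a simulation under simple assumptions (offset-based paging, one row deleted between two requests); `readPage` is a hypothetical helper, not a real client call.

```kotlin
// Offset paging over a source that may change between page requests.
fun readPage(source: List<Int>, page: Int, size: Int): List<Int> =
    source.drop(page * size).take(size)

fun main() {
    val rows = mutableListOf(1, 2, 3, 4, 5, 6)
    val seen = mutableListOf<Int>()

    seen += readPage(rows, page = 0, size = 3)   // reads 1, 2, 3
    rows.removeAll { it == 2 }                   // row 2 is deleted before the next request
    seen += readPage(rows, page = 1, size = 3)   // offset 3 now starts at 5, so 4 is never read

    println(seen)                                // [1, 2, 3, 5, 6]
    println((1..6).filter { it !in seen })       // [4]
}
```

An insert between the two requests produces the opposite symptom: the rows shift right and the last row of the previous page is read again.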

6.4 When E (CDC) shows up

CDC (Change Data Capture) streams DB change logs into something like Kafka to consume near real-time. Powerful, but expensive.

  • It costs infrastructure like Debezium · Kafka · Schema Registry.
  • It’s less “batch” and more micro-batch (5-minute to 1-hour windows), which pairs well with time-bucketed aggregation.
  • The adoption threshold is high — it earns its keep only when “near-real-time analytics” is a business requirement.
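Time-bucketed aggregation, the shape that micro-batches pair well with, comes down to flooring each event timestamp to its window. A minimal sketch, assuming fixed windows aligned to the epoch; `windowStart` and `countPerWindow` are illustrative names, and a real pipeline would aggregate consumed change records rather than a list of timestamps.

```kotlin
import java.time.Duration
import java.time.Instant

// Floor a timestamp to the start of its fixed-size window (aligned to the epoch).
fun windowStart(ts: Instant, window: Duration): Instant {
    val size = window.seconds
    return Instant.ofEpochSecond(ts.epochSecond - Math.floorMod(ts.epochSecond, size))
}

// Count events per window.
fun countPerWindow(events: List<Instant>, window: Duration): Map<Instant, Int> =
    events.groupingBy { windowStart(it, window) }.eachCount()

fun main() {
    val events = listOf("2026-05-16T01:01:10Z", "2026-05-16T01:04:59Z", "2026-05-16T01:05:00Z")
        .map(Instant::parse)

    // Two events fall in the 01:00 window, one in the 01:05 window.
    println(countPerWindow(events, Duration.ofMinutes(5)))
}
```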

6.5 Connection to this series

Every example in the body of Parts 1–6 assumes A (same DB), because that keeps the learning flow simplest. The capstone picks a simplified variant of D (analytics Warehouse): operational and analytics schemas split inside one PostgreSQL instance. Just remember that the data-source decision ripples into everything from Reader choice (Part 2) to idempotency design (Part 3).


7. Preventing Duplicate Execution Across Instances

7.1 The problem — N app instances mean N runs

In operations the app usually runs as several instances. But @Scheduled fires independently on each instance. With three, the same aggregation job triggers three times at 01:00 daily. That’s exactly the “multi-instance duplication” column from Part 1 §1.1’s trigger table.

7.2 What Spring Batch covers, and its limit

The good news is Spring Batch blocks part of this. If you try to run the same JobInstance (same identifying JobParameters) concurrently, the JobRepository lock rejects the second run with JobExecutionAlreadyRunningException.

The catch is the limit. If the trigger passes different parameters each time — say LocalDateTime.now() instead of targetDate — each run becomes a different JobInstance and the lock never engages. Three instances passing slightly different times all run separately. So single execution must be guaranteed at the trigger level too.
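The difference is easy to see by counting how many distinct identifying values three near-simultaneous launches produce. This is a plain-Kotlin illustration, not framework code; `distinctKeys` is a hypothetical helper and the millisecond offsets are made up.

```kotlin
import java.time.LocalDateTime

// Count how many distinct identifying-parameter values a set of launches produces.
fun distinctKeys(fireTimes: List<LocalDateTime>, key: (LocalDateTime) -> String): Int =
    fireTimes.map(key).toSet().size

fun main() {
    val tick = LocalDateTime.of(2026, 5, 17, 1, 0, 0)
    // Three app instances fire the same cron tick a few milliseconds apart.
    val fireTimes = listOf(0L, 4L, 9L).map { tick.plusNanos(it * 1_000_000) }

    // Key derived from the business date: all three launches agree.
    println(distinctKeys(fireTimes) { "targetDate=${it.toLocalDate().minusDays(1)}" }) // 1
    // Key derived from the wall clock: every launch differs.
    println(distinctKeys(fireTimes) { "launchedAt=$it" })                               // 3
}
```

One key means the three launches contend for the same JobInstance and the framework can reject the extras; three keys mean three independent JobInstances with nothing to contend for.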

7.3 Single-execution guarantee per trigger

| Approach | Single-execution guarantee | Extra infra | Fits |
| --- | --- | --- | --- |
| @Scheduled alone | ❌ runs on every instance | none | single-instance deployment only |
| @Scheduled + ShedLock | ✅ one run via a distributed lock | lock store (DB/Redis) | keep in-app scheduling with HA |
| Quartz cluster | ✅ one run in cluster mode | Quartz schema (DB) | complex scheduling + HA |
| K8s CronJob | ✅ cluster guarantees one | K8s | containerized deployment |
| leader election | ✅ only the leader runs | a coordinator (ZK/etcd, etc.) | already have a cluster coordinator |

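  • Containerized → K8s CronJob: the cluster, not each app instance, creates the run, so no in-app lock is needed. Kubernetes describes CronJob scheduling as approximately once per tick, so set concurrencyPolicy: Forbid and keep the job idempotent anyway. Still the cleanest option.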
  • Keeping @Scheduled → ShedLock is mandatory — a distributed lock lets only one instance launch the job (Appendix A).
  • Idempotency (Part 3 §5) is the last safety net — even if trigger single-execution leaks and the job runs twice, an upsert Writer makes the result the same.

The conclusion is one line — preventing duplication = single-trigger + idempotency, doubled up. Don’t trust either alone; layer both.


Recap

The key takeaways from Part 4, one line each:

  • JobLauncher triggers, JobOperator intervenes after the fact: @Scheduled calls the Launcher; restarting a failed execution by ID is the Operator’s job.
  • Pick the scheduler by dependency and deployment form — DAGs → Argo/Airflow, containers → CronJob, app processes → @Scheduled or ShedLock/Quartz by instance count.
  • JobParameters is the JobInstance’s identity — a date identifying key means “once a day, restartable”; an incrementer means “new every run” (write idempotency required).
  • Data sources usually evolve A → C → D — start on the same DB, isolate load with a Read Replica, move to a Warehouse as analytics demand grows. B (API) is rarely used.
  • Stop multi-instance duplication with single-trigger + idempotency — the JobInstance lock only blocks concurrent same-parameter runs, so single-trigger with ShedLock/CronJob and keep upsert as the safety net.

Part 5 takes on Performance · Parallelism. So far we’ve run jobs single-threaded. Part 5 covers running the same job faster — multi-threaded Steps, partitioning, remote workers, and JDK 21 virtual threads. Note up front that this is “splitting one job to run faster,” a separate axis from this part’s §7 “preventing duplicate execution.”


Appendix

A. @Scheduled + ShedLock single execution

Triggering exactly once across multiple instances

ShedLock takes a lock in a shared store (DB/Redis) so only one of several instances runs the scheduled method.

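import net.javacrumbs.shedlock.spring.annotation.SchedulerLock
import org.springframework.batch.core.Job
import org.springframework.batch.core.JobParametersBuilder
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component
import java.time.LocalDate

// Requires @EnableSchedulerLock and a LockProvider bean (DB/Redis) configured elsewhere.
@Component
class DailySalesScheduler(
    private val jobLauncher: JobLauncher,
    private val dailySalesJob: Job,
) {
    @Scheduled(cron = "0 0 1 * * *")  // every day at 01:00, on every instance
    @SchedulerLock(name = "dailySalesJob", lockAtMostFor = "30m", lockAtLeastFor = "1m")
    fun launch() {
        // Only the instance that acquired the lock reaches this point.
        val params = JobParametersBuilder()
            .addLocalDate("targetDate", LocalDate.now().minusDays(1))
            .toJobParameters()
        jobLauncher.run(dailySalesJob, params)
    }
}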

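lockAtMostFor is the safety valve that prevents a lock from being held forever if an instance dies; set it comfortably above the job’s maximum runtime. lockAtLeastFor holds the lock for a minimum time, so a very short run combined with clock drift between instances cannot let a second instance fire for the same tick.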
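The two durations are easier to reason about with a toy lock in hand. The sketch below is an in-memory model written for this explanation, not ShedLock's implementation; `ToyLockStore` and its methods are hypothetical, and the real library keeps the equivalent state in the shared store so that every instance sees it.

```kotlin
import java.time.Duration
import java.time.Instant

// Toy in-memory model of a named lock with an expiry.
class ToyLockStore {
    private val lockedUntil = mutableMapOf<String, Instant>()

    // Acquire the lock if it is free or the previous holder's lease has expired.
    @Synchronized
    fun tryLock(name: String, now: Instant, lockAtMostFor: Duration): Boolean {
        val until = lockedUntil[name]
        if (until != null && now.isBefore(until)) return false
        lockedUntil[name] = now.plus(lockAtMostFor)
        return true
    }

    // Release early, but never before lockAtLeastFor has elapsed since acquisition.
    @Synchronized
    fun unlock(name: String, acquiredAt: Instant, now: Instant, lockAtLeastFor: Duration) {
        val earliest = acquiredAt.plus(lockAtLeastFor)
        lockedUntil[name] = if (now.isBefore(earliest)) earliest else now
    }
}

fun main() {
    val store = ToyLockStore()
    val t0 = Instant.parse("2026-05-17T01:00:00Z")

    // Three instances fire the same tick; only the first acquires the lock.
    val results = List(3) { store.tryLock("dailySalesJob", t0, Duration.ofMinutes(30)) }
    println(results) // [true, false, false]

    // The job finishes after 5 seconds, but the lock stays held for the full minute.
    store.unlock("dailySalesJob", acquiredAt = t0, now = t0.plusSeconds(5), lockAtLeastFor = Duration.ofMinutes(1))
    println(store.tryLock("dailySalesJob", t0.plusSeconds(10), Duration.ofMinutes(30))) // false
}
```

If the holder dies and never calls unlock, the entry simply expires after lockAtMostFor, which is why that value must exceed the longest expected run.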

B. Data-source pattern selection cheat sheet

Recommended pattern by situation

| Situation | Recommended pattern |
| --- | --- |
| Just starting, little data | A. Same DB (read-only account) |
| Batch is inflating operational-DB load | C. Read Replica |
| Multi-domain joins · analytics reports | D. Analytics Warehouse |
| Near-real-time analytics is a requirement | E. CDC stream |
| Third-party data with no DB access | B. API (exceptional) |
