May 20, 2026

Spring Batch 6 Guide Part 4: Job Launch · Scheduling · Operations — Triggers · Idempotent Parameters · Data Sources

Introduction

Through Part 3 we focused on how to write a job: read, transform, and write in chunks, handle failures with Skip and Retry, and resume after a crash. Part 4 is about how to run it.

The questions cascade. Who launches the job, and when? Is one @Scheduled line enough, or do you need Quartz or a K8s CronJob? You scaled out to several servers; does the same job now run several times? Can the aggregation job hit the operational DB directly, or should it read from somewhere else?

Part 4 walks through the seven branches of “operations”: the division of labor between JobLauncher and JobOperator, choosing among four schedulers, JobParameters idempotency design, restarting failed jobs, operational monitoring, and the two things that trip people up most in practice, namely preventing duplicate execution across instances and choosing among the five patterns for where a batch reads its data.

The target reader is a backend engineer who has built jobs through Parts 1–3. Basic concepts of scheduling and operational environments (containers, multiple instances) are assumed.

Part 1 — Job · Step · Metadata Identity
Part 2 — Chunk-Oriented Processing — Reader · Processor · Writer
Part 3 — Transactions · Failure Handling — Skip · Retry · Restart
Part 4 — Job Launch · Scheduling · Operations (this post)
Part 5 — Performance · Parallelism — Multi-thread · Partitioning · Remote Workers
Part 6 — Observability · Testing · Deployment
Capstone — Marketplace Analytics Pipeline

TL;DR

JobLauncher is the entry point where code starts a job; JobOperator is the operator’s remote control. @Scheduled calls JobLauncher; restarting a failed execution or stopping a running one by ID goes through JobOperator.
Pick the scheduler from four options: @Scheduled (simple, single instance), Quartz (in-app high availability), K8s CronJob (containers), Argo/Airflow (DAGs of inter-job dependencies).
JobParameters design = idempotency key vs new instance every run — make the business date (targetDate) the identifying key for “once a day, restartable,” or add a RunIdIncrementer for a new JobInstance each run (write idempotency then required).
Restart via the same identifying parameters or JobOperator.restart — control restart behavior with preventRestart · startLimit · allowStartIfComplete, and branch flow on ExitStatus.
Monitoring: Spring Batch Admin is gone — replace it with Actuator + Micrometer metrics (Part 6) + JobExplorer queries + failure alerts.
Preventing duplicate execution across instances — N app instances make @Scheduled fire N times. The JobInstance lock blocks concurrent runs with the same parameters, but with limits — guarantee single execution with ShedLock, a Quartz cluster, or CronJob.
Where does a batch read its data — five patterns — A. same DB / B. domain API / C. Read Replica / D. analytics Warehouse / E. change data capture. The usual evolution is A → C → D, and B is rarely used.

1. JobLauncher vs JobOperator

1.1 JobLauncher — launches the job

JobLauncher is the entry point that starts a job from code. It takes a Job and JobParameters, runs it, and returns a JobExecution. It’s exactly what Part 1’s CommandLineRunner or the @Scheduled below calls.

import org.springframework.batch.core.Job
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.batch.core.JobParametersBuilder
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component
import java.time.LocalDate

@Component
class DailySalesScheduler(
    private val jobLauncher: JobLauncher,
    private val dailySalesJob: Job,
) {
    @Scheduled(cron = "0 0 1 * * *")  // every day at 01:00
    fun launch() {
        val params = JobParametersBuilder()
            .addLocalDate("targetDate", LocalDate.now().minusDays(1))
            .toJobParameters()
        jobLauncher.run(dailySalesJob, params)
    }
}

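The default JobLauncher is synchronous: it blocks the calling thread until the job finishes. By default Spring Boot’s scheduler runs on a single thread, so a long synchronous job also delays every other @Scheduled method. If you don’t want to hold that thread, configure the launcher with an asynchronous TaskExecutor; run() then returns immediately, before the job completes, and you must observe completion separately.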
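A minimal sketch of the asynchronous option, assuming Spring Batch’s TaskExecutorJobLauncher (org.springframework.batch.core.launch.support) and the same package layout as the §1.1 imports; adjust the imports if your Spring Batch version relocated these types. The bean name asyncJobLauncher is hypothetical. SimpleAsyncTaskExecutor starts a new thread per launch, so run() returns while the job is still in progress.

```kotlin
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.batch.core.launch.support.TaskExecutorJobLauncher
import org.springframework.batch.core.repository.JobRepository
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.core.task.SimpleAsyncTaskExecutor

@Configuration(proxyBeanMethods = false)
class AsyncLauncherConfig {

    // A second launcher whose run() hands the job to another thread and returns right away.
    @Bean
    fun asyncJobLauncher(jobRepository: JobRepository): JobLauncher =
        TaskExecutorJobLauncher().apply {
            setJobRepository(jobRepository)
            setTaskExecutor(SimpleAsyncTaskExecutor("batch-"))  // one new thread per launch
            afterPropertiesSet()                                // validates that the repository is set
        }
}
```

If more than one JobLauncher bean exists, select this one explicitly (for example with @Qualifier("asyncJobLauncher")). The JobExecution it returns is not final yet; re-read it later, for instance with JobExplorer.getJobExecution(id), or react to the outcome in a JobExecutionListener.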

1.2 JobOperator — the operator’s remote control

JobOperator is the API for managing jobs at runtime, typically by a human operator. Where JobLauncher is “code starting a job,” JobOperator works with job names, parameter strings, and execution IDs to start, stop, restart, or abandon executions. It’s designed to be called from JMX, an ops CLI, or an admin endpoint.

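import org.springframework.batch.core.launch.JobOperator

// from an ops endpoint/CLI: jobs are addressed by name, string params, and execution ID
fun operatorExamples(jobOperator: JobOperator, failedExecutionId: Long, runningExecutionId: Long, stoppedExecutionId: Long) {
    val properties = java.util.Properties().apply { setProperty("targetDate", "2026-05-16,java.time.LocalDate") }  // "value,type" notation
    val executionId: Long = jobOperator.start("dailySalesJob", properties)  // new run
    jobOperator.restart(failedExecutionId)   // restart a failed run (§4)
    jobOperator.stop(runningExecutionId)     // request a stop
    jobOperator.abandon(stoppedExecutionId)  // abandon a stopped run
}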

1.3 When to use which

Aspect	JobLauncher	JobOperator
Primary user	application code	operators / admin tools
Input	`Job` + `JobParameters` objects	job name + string params / execution ID
Typical caller	`@Scheduled`, `CommandLineRunner`	admin screen, JMX, ops CLI
Good at	triggering a known job	querying history, stopping, restarting after the fact

Note: Spring Batch 6 expanded the JobOperator API, strengthening its role as the single entry point for operational tasks. The conceptual split is unchanged — “normal-flow trigger is JobLauncher, after-the-fact intervention is JobOperator.”

2. Choosing a Scheduler

Who calls JobLauncher — that is, “when does it run” — is the scheduler’s job. Part 1 §1.1 said trigger (when) and execution engine (how) are separate axes; Part 4 now picks that trigger layer in earnest.

Terms: a DAG (Directed Acyclic Graph) expresses ordering and dependencies between jobs without cycles, e.g. “B after A, D after B and C.” Backfill means re-running past periods retroactively, e.g. fixing aggregation logic and re-running the last month day by day. HA (High Availability) is a setup where scheduling keeps working even if one instance dies.

2.1 Four options compared

Scheduler	Where it runs	Single execution (HA)	Inter-job dependency (DAG)	Fits
`@Scheduled`	in-app	❌ (per instance → §7)	❌	single instance, simple schedule
Quartz	in-app	✅ cluster mode	limited	HA needed inside the app
K8s CronJob	cluster	✅ one Job per schedule (`concurrencyPolicy: Forbid`)	❌	containerized deployment
Argo · Airflow	external orchestrator	✅	✅ DAG, backfill, retries	dependencies across jobs, pipelines

2.2 Decision tree

flowchart TD
    A["A recurring batch is needed"] --> B{"Need inter-job dependencies·<br/>backfill·DAG?"}
    B -->|"Yes"| C["Argo Workflows / Airflow"]
    B -->|"No"| D{"Deployment form?"}
    D -->|"Containers · K8s"| E["K8s CronJob<br/>(one Job per schedule)"]
    D -->|"App process"| F{"Multiple instances?"}
    F -->|"Single instance"| G["@Scheduled<br/>(simplest)"]
    F -->|"Multiple instances"| H["@Scheduled + ShedLock<br/>or a Quartz cluster"]

Two questions decide it. Are there dependencies between jobs (A→B→C ordering, backfill, visualization)? → if so, Argo/Airflow. If not, is the deployment containerized? → CronJob; for app processes, by instance count it’s @Scheduled (single) or ShedLock/Quartz (multiple, §7).

2.3 Launching a job from @Scheduled

The most common form is exactly the §1.1 code: @Scheduled calls JobLauncher.run. The cron expression sets the time, and the date to process is passed via JobParameters (§3). Two prerequisites are easy to miss: scheduling must be switched on with @EnableScheduling on a configuration class, and @Scheduled assumes a single instance; with multiple instances you need §7’s single-execution guarantee.
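One detail worth pinning down is the time zone. A cron expression without @Scheduled’s zone attribute is resolved in the scheduler’s (by default the server’s) time zone, and LocalDate.now() also uses the JVM default, so a container running in UTC computes a different “yesterday” than one running in Asia/Seoul. A small sketch, assuming an injected java.time.Clock and a hypothetical helper named targetDateFor, that makes the business zone explicit and testable:

```kotlin
import java.time.Clock
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneId
import java.time.ZoneOffset

// Hypothetical helper: "yesterday" as seen in the business time zone, not the JVM default.
// In the scheduler, pair it with @Scheduled(cron = "0 0 1 * * *", zone = "Asia/Seoul").
fun targetDateFor(clock: Clock, businessZone: ZoneId): LocalDate =
    LocalDate.now(clock.withZone(businessZone)).minusDays(1)

// 16:30 UTC on May 16 is already 01:30 on May 17 in Seoul (UTC+9).
val fixedClock: Clock = Clock.fixed(Instant.parse("2026-05-16T16:30:00Z"), ZoneOffset.UTC)
val seoulTarget = targetDateFor(fixedClock, ZoneId.of("Asia/Seoul"))  // 2026-05-16
val utcTarget = targetDateFor(fixedClock, ZoneOffset.UTC)             // 2026-05-15
```

In production the Clock would typically be a bean such as Clock.system(ZoneId.of("Asia/Seoul")); in tests, Clock.fixed pins the moment so the computed targetDate is deterministic.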

3. JobParameters Design

JobParameters is not just input. It defines the identity of a JobInstance (Part 1 §3). So which parameters you make identifying decides “what happens when the same job runs twice.”

3.1 Identifying vs non-identifying parameters

The identifying flag from Part 1 is what matters here. Same identifying parameters mean the same JobInstance, so the identifying set is the JobInstance’s business key.

Identifying: distinguishes JobInstances; the value carrying business meaning. E.g., targetDate.
Non-identifying: recorded with each JobExecution but ignored when deciding which JobInstance a run belongs to. E.g., a debug flag, a timestamp.
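To make “identifying set = business key” concrete, here is a simplified model. Spring Batch stores a hash of the identifying parameters in the JOB_KEY column of BATCH_JOB_INSTANCE; this sketch imitates that idea with an MD5 over the sorted identifying entries, but the Param type and the string format are hypothetical and do not reproduce Spring Batch’s exact encoding.

```kotlin
import java.security.MessageDigest

// Hypothetical parameter model: a value plus the identifying flag.
data class Param(val value: String, val identifying: Boolean = true)

// Instance key = hash of identifying parameters only, in a stable (sorted) order.
fun instanceKey(params: Map<String, Param>): String {
    val canonical = params
        .filterValues { it.identifying }
        .toSortedMap()
        .entries
        .joinToString(";") { (name, param) -> "$name=${param.value}" }
    val digest = MessageDigest.getInstance("MD5").digest(canonical.toByteArray(Charsets.UTF_8))
    return digest.joinToString("") { "%02x".format(it) }
}

// Same targetDate, different non-identifying triggeredBy: same instance key.
val scheduled = instanceKey(mapOf("targetDate" to Param("2026-05-16"), "triggeredBy" to Param("scheduler", false)))
val manual = instanceKey(mapOf("targetDate" to Param("2026-05-16"), "triggeredBy" to Param("manual", false)))
// A different targetDate: a different instance key.
val nextDay = instanceKey(mapOf("targetDate" to Param("2026-05-17")))
```

Changing triggeredBy never creates a new JobInstance; changing targetDate always does. That is the whole design lever of §3.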

3.2 The idempotency key = the business date

An aggregation job’s idempotency key is the business value pointing at “what to process” — targetDate.

val params = JobParametersBuilder()
    .addLocalDate("targetDate", LocalDate.now().minusDays(1))  // identifying (default)
    .addString("triggeredBy", "scheduler", false)              // non-identifying
    .toJobParameters()

Make only targetDate identifying and “aggregate 2026-05-16” is one JobInstance. Re-run with the same date: if the earlier run failed, it restarts; if it succeeded, the launch throws JobInstanceAlreadyCompleteException (Part 3 §4.4); if it is still running, JobExecutionAlreadyRunningException. This is the most common batch identity: “once a day, restartable.”
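Because a duplicate trigger for an already-finished date surfaces as an exception, scheduler code usually decides which of those exceptions are expected. A minimal sketch, assuming the same package layout as the §1.1 imports and SLF4J on the classpath (Spring Boot includes it); the class name DailySalesLauncher is hypothetical.

```kotlin
import org.slf4j.LoggerFactory
import org.springframework.batch.core.Job
import org.springframework.batch.core.JobParametersBuilder
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException
import java.time.LocalDate

class DailySalesLauncher(
    private val jobLauncher: JobLauncher,
    private val dailySalesJob: Job,
) {
    private val log = LoggerFactory.getLogger(javaClass)

    fun launchFor(targetDate: LocalDate) {
        val params = JobParametersBuilder()
            .addLocalDate("targetDate", targetDate)  // identifying: one JobInstance per date
            .toJobParameters()
        try {
            val execution = jobLauncher.run(dailySalesJob, params)
            log.info("dailySalesJob {} finished with {}", targetDate, execution.status)
        } catch (e: JobInstanceAlreadyCompleteException) {
            // This date already succeeded: treat the duplicate trigger as a no-op.
            log.info("dailySalesJob {} already completed; skipping", targetDate)
        } catch (e: JobExecutionAlreadyRunningException) {
            // Another trigger is running this JobInstance right now.
            log.warn("dailySalesJob {} is already running; skipping", targetDate)
        }
    }
}
```

Anything else, for example JobParametersInvalidException or JobRestartException, still propagates, so genuine misconfiguration is not silently swallowed.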

3.3 The incrementer — a new instance every run

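Conversely, if you want to “re-run the same job anytime, multiple times,” add a RunIdIncrementer. It increments a run.id parameter on each run, making a new JobInstance every time. Note that the incrementer is applied only by launch paths that ask for the next parameters, such as JobOperator.startNextInstance or Spring Boot’s startup runner (JobLauncherApplicationRunner); a plain JobLauncher.run(job, params) call, like the §1.1 scheduler, does not apply it.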

JobBuilder("dailySalesJob", jobRepository)
    .incrementer(RunIdIncrementer())   // run.id increments each run → new JobInstance
    .start(aggregateStep)
    .build()

The trade-off is clear. With an incrementer, “blocking re-runs of a succeeded job” disappears — running the same date twice processes it twice. So an incrementer makes write idempotency mandatory (Part 3 §5 upsert). In summary:

Choice	JobInstance	Same-date re-run	Idempotency owner
Date identifying key only	one per date	blocked (or restart)	the framework
Incrementer	new every run	processed afresh each time	Writer upsert (Part 3)
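Since a plain JobLauncher.run(job, params) does not consult the incrementer, code that launches an incrementer-based job itself has to ask for the next parameters. A minimal sketch, assuming JobParametersBuilder’s JobExplorer-based constructor and its getNextJobParameters(job) method as found in Spring Batch 5 (the class name IncrementingLauncher is hypothetical); JobOperator.startNextInstance(jobName) is the operator-side equivalent.

```kotlin
import org.springframework.batch.core.Job
import org.springframework.batch.core.JobExecution
import org.springframework.batch.core.JobParametersBuilder
import org.springframework.batch.core.explore.JobExplorer
import org.springframework.batch.core.launch.JobLauncher
import java.time.LocalDate

class IncrementingLauncher(
    private val jobLauncher: JobLauncher,
    private val jobExplorer: JobExplorer,
) {
    fun launchNext(job: Job, targetDate: LocalDate): JobExecution {
        val params = JobParametersBuilder(jobExplorer)
            // Reads the last JobInstance's parameters and lets the job's incrementer
            // (e.g. RunIdIncrementer) produce the next run.id; expects the job to have one.
            .getNextJobParameters(job)
            .addLocalDate("targetDate", targetDate)
            .toJobParameters()
        return jobLauncher.run(job, params)
    }
}
```

Each call now yields a new JobInstance even for the same targetDate, which is exactly why the Writer has to be idempotent in this mode.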

3.4 Parameter validation

To catch a missing required parameter at launch time, before any work starts, add a JobParametersValidator. Declaring required/optional keys on a DefaultJobParametersValidator prevents the accident of launching without targetDate.
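A minimal sketch, assuming the same package layout as the §1.1 imports (the configuration class name is hypothetical). With a validator attached, a launch with a missing or unexpected key fails with JobParametersInvalidException before any JobExecution is created.

```kotlin
import org.springframework.batch.core.Job
import org.springframework.batch.core.Step
import org.springframework.batch.core.job.DefaultJobParametersValidator
import org.springframework.batch.core.job.builder.JobBuilder
import org.springframework.batch.core.repository.JobRepository
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration(proxyBeanMethods = false)
class DailySalesJobConfig {

    @Bean
    fun dailySalesJob(jobRepository: JobRepository, aggregateStep: Step): Job {
        val validator = DefaultJobParametersValidator(
            arrayOf("targetDate"),             // required keys
            arrayOf("triggeredBy", "run.id"),  // optional keys; any key outside both lists is rejected
        ).apply { afterPropertiesSet() }       // fails if a key is both required and optional

        return JobBuilder("dailySalesJob", jobRepository)
            .validator(validator)
            .start(aggregateStep)
            .build()
    }
}
```

Listing run.id as optional keeps the validator compatible with a RunIdIncrementer; drop it if the job never uses one.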

4. Restarting Failed Jobs

Part 3 §4 covered the mechanism of restart (ExecutionContext preserves the position). Part 4 covers how you trigger and control it in operations.

4.1 Two ways to trigger a restart

Re-run with the same identifying parameters — pass the same targetDate again via JobLauncher, and the framework finds the incomplete JobInstance and continues it. The default path in scheduler-driven operations.
JobOperator.restart(executionId) — restart by specifying the failed execution ID directly. For human intervention from an admin/CLI.
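For the second path, an admin endpoint usually has to find the execution ID first. A minimal sketch using JobExplorer lookups available in Spring Batch 5 (getLastJobInstance, getLastJobExecution) and JobOperator.restart; the class name is hypothetical, and it assumes the job is registered where the JobOperator can find it by name (its JobRegistry).

```kotlin
import org.springframework.batch.core.BatchStatus
import org.springframework.batch.core.explore.JobExplorer
import org.springframework.batch.core.launch.JobOperator

class FailedRunRestarter(
    private val jobExplorer: JobExplorer,
    private val jobOperator: JobOperator,
) {
    // Returns the new execution ID, or null if the latest run of jobName did not fail.
    fun restartLatestIfFailed(jobName: String): Long? {
        val instance = jobExplorer.getLastJobInstance(jobName) ?: return null
        val lastExecution = jobExplorer.getLastJobExecution(instance) ?: return null
        if (lastExecution.status != BatchStatus.FAILED) return null
        return jobOperator.restart(lastExecution.id)  // resumes from the saved ExecutionContext
    }
}
```

It only looks at the most recent JobInstance, so a failure on an older targetDate needs the execution ID from history (JobExplorer or the metadata tables) instead. A STOPPED execution is restartable too; the check here is deliberately limited to FAILED.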

4.2 Restart-control knobs

Jobs and steps have settings that change restart behavior.

Setting	Where	Effect
`preventRestart()`	Job	once it fails, restart is forbidden entirely
`startLimit(n)`	Step	allow the step to start at most n times
`allowStartIfComplete(true)`	Step	re-run even an already-completed step on restart

allowStartIfComplete is especially useful. The default is “skip completed steps on restart,” but a step like validation or cleanup that must run every time turns this flag on.
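A minimal sketch of the two step-level knobs, assuming the same package layout as the §1.1 imports; step and job names are hypothetical and the tasklet bodies are placeholders. allowStartIfComplete only matters while the JobInstance is still restartable, i.e. a later step failed; once the whole job completes, the instance cannot be restarted at all. If loadStep is started more than three times for one JobInstance, the next attempt fails with StartLimitExceededException.

```kotlin
import org.springframework.batch.core.Job
import org.springframework.batch.core.Step
import org.springframework.batch.core.job.builder.JobBuilder
import org.springframework.batch.core.repository.JobRepository
import org.springframework.batch.core.step.builder.StepBuilder
import org.springframework.batch.core.step.tasklet.Tasklet
import org.springframework.batch.repeat.RepeatStatus
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.transaction.PlatformTransactionManager

@Configuration(proxyBeanMethods = false)
class RestartControlConfig {

    @Bean
    fun restartControlledJob(
        jobRepository: JobRepository,
        transactionManager: PlatformTransactionManager,
    ): Job {
        // Placeholder work; real steps would validate input, load data, etc.
        val noop = Tasklet { _, _ -> RepeatStatus.FINISHED }

        // Runs on every start of the JobInstance, including restarts after a later step failed.
        val validateInputStep: Step = StepBuilder("validateInputStep", jobRepository)
            .tasklet(noop, transactionManager)
            .allowStartIfComplete(true)
            .build()

        // May be started at most 3 times within one JobInstance.
        val loadStep: Step = StepBuilder("loadStep", jobRepository)
            .tasklet(noop, transactionManager)
            .startLimit(3)
            .build()

        return JobBuilder("restartControlledJob", jobRepository)
            .start(validateInputStep)
            .next(loadStep)
            .build()
    }
}
```

The job-level counterpart, preventRestart(), goes on the JobBuilder before start(...).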

4.3 Branching flow on ExitStatus

You can change the next flow based on a step’s result (ExitStatus). For example, “if there were skips, go to a notification step; otherwise end.”

JobBuilder("dailySalesJob", jobRepository)
    .start(aggregateStep)
        .on("COMPLETED WITH SKIPS").to(notifySkipStep)  // notify when skips occurred
        .from(aggregateStep).on("*").end()              // otherwise end normally
    .end()
    .build()

Return a custom ExitStatus from a StepExecutionListener’s afterStep (e.g., "COMPLETED WITH SKIPS") and the on(...) mapping above receives that value to branch.
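A minimal Kotlin version of the skip-checking listener pattern shown in the Spring Batch reference documentation, assuming the same package layout as the §1.1 imports. Returning null from afterStep keeps the step’s original ExitStatus.

```kotlin
import org.springframework.batch.core.ExitStatus
import org.springframework.batch.core.StepExecution
import org.springframework.batch.core.StepExecutionListener

class SkipCheckingListener : StepExecutionListener {
    override fun afterStep(stepExecution: StepExecution): ExitStatus? {
        val exitCode = stepExecution.exitStatus.exitCode
        // Relabel only non-failed runs that skipped at least one item.
        return if (exitCode != ExitStatus.FAILED.exitCode && stepExecution.skipCount > 0) {
            ExitStatus("COMPLETED WITH SKIPS")
        } else {
            null  // keep the step's own ExitStatus
        }
    }
}
```

Attach it to aggregateStep with .listener(SkipCheckingListener()) on the step builder so the “COMPLETED WITH SKIPS” transition has something to match.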

5. Operational Monitoring

5.1 Spring Batch Admin is gone

Spring Batch Admin, once the standard, was discontinued long ago. Today, instead of a dedicated UI, you observe operations with Actuator + Micrometer + metadata queries.

5.2 What to watch with

Metrics → Micrometer/Prometheus — expose spring.batch.job · step · item.* metrics via /actuator/prometheus and view them in Grafana. The six metrics and dashboards are covered in depth in Part 6.
Execution history → JobExplorer — query job/step execution history, status, and counts from code. The data source when you build your own admin endpoint.
Failure alerts — detect BatchStatus.FAILED in JobExecutionListener.afterJob and send to Slack/Email. The first safety net you add in operations.
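A minimal sketch of that first safety net, assuming the same package layout as the §1.1 imports. AlertSender is a hypothetical interface you would back with a Slack webhook or mail client; keep it fast and non-throwing, since afterJob runs on the job’s own thread.

```kotlin
import org.springframework.batch.core.BatchStatus
import org.springframework.batch.core.JobExecution
import org.springframework.batch.core.JobExecutionListener

// Hypothetical notification port (Slack webhook, email, ...).
fun interface AlertSender {
    fun send(message: String)
}

class FailureAlertListener(private val alertSender: AlertSender) : JobExecutionListener {
    override fun afterJob(jobExecution: JobExecution) {
        if (jobExecution.status != BatchStatus.FAILED) return
        // Summarize every failure recorded on the job and its steps.
        val causes = jobExecution.allFailureExceptions
            .joinToString("; ") { it.message ?: it.javaClass.simpleName }
        alertSender.send(
            "Batch job ${jobExecution.jobInstance.jobName} (execution ${jobExecution.id}) " +
                "FAILED with ${jobExecution.jobParameters}: $causes"
        )
    }
}
```

Register it with .listener(FailureAlertListener(sender)) on the JobBuilder.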

Note: you can also query the six metadata tables (Part 1 §3) directly with SQL. BATCH_JOB_EXECUTION’s STATUS/EXIT_CODE and BATCH_STEP_EXECUTION’s counts are the first diagnostic. Deeper observability is deferred to Part 6.
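A minimal sketch of such a diagnostic query through Spring’s JdbcTemplate, assuming the standard Spring Batch schema with the default BATCH_ table prefix and a database that supports LIMIT (PostgreSQL, MySQL); FailedStepRow and BatchMetadataQueries are hypothetical names.

```kotlin
import org.springframework.jdbc.core.JdbcTemplate
import org.springframework.jdbc.core.RowMapper

// Hypothetical row type for a "recent failed steps" view.
data class FailedStepRow(
    val jobName: String,
    val jobExecutionId: Long,
    val stepName: String,
    val readCount: Long,
    val writeCount: Long,
    val skipCount: Long,
    val exitMessage: String?,
)

class BatchMetadataQueries(private val jdbcTemplate: JdbcTemplate) {

    // Failed step executions, newest first, joined back to the job name.
    fun recentFailedSteps(limit: Int): List<FailedStepRow> = jdbcTemplate.query(
        """
        SELECT ji.JOB_NAME, se.JOB_EXECUTION_ID, se.STEP_NAME, se.READ_COUNT, se.WRITE_COUNT,
               se.READ_SKIP_COUNT + se.PROCESS_SKIP_COUNT + se.WRITE_SKIP_COUNT AS SKIP_COUNT,
               se.EXIT_MESSAGE
        FROM BATCH_STEP_EXECUTION se
        JOIN BATCH_JOB_EXECUTION je ON je.JOB_EXECUTION_ID = se.JOB_EXECUTION_ID
        JOIN BATCH_JOB_INSTANCE ji ON ji.JOB_INSTANCE_ID = je.JOB_INSTANCE_ID
        WHERE se.STATUS = 'FAILED'
        ORDER BY se.START_TIME DESC
        LIMIT ?
        """.trimIndent(),
        RowMapper { rs, _ ->
            FailedStepRow(
                jobName = rs.getString("JOB_NAME"),
                jobExecutionId = rs.getLong("JOB_EXECUTION_ID"),
                stepName = rs.getString("STEP_NAME"),
                readCount = rs.getLong("READ_COUNT"),
                writeCount = rs.getLong("WRITE_COUNT"),
                skipCount = rs.getLong("SKIP_COUNT"),
                exitMessage = rs.getString("EXIT_MESSAGE"),
            )
        },
        limit,
    )
}
```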

6. Where Does a Batch Read Its Data — Five Patterns

So far every example assumed “the batch reads the operational DB directly.” But as data grows and starts colliding with operational traffic, “where to read from” governs the whole job design. There are five patterns you meet in practice.

6.1 The five patterns compared

Pattern	Coupling	Infra cost	Load shifted onto	Data freshness	When
A. Same DB directly	high	none	the operational DB itself	real-time	early stage, small scale
B. Domain API call	medium	low	the operational service	real-time	rarely used (§6.3)
C. Read Replica	high	medium	a replica (isolated)	near real-time (replica lag)	need to isolate op load
D. Analytics Warehouse	low	high	fully separated	delayed by ETL cycle	analytics, multi-domain joins
E. CDC event stream	low	very high	fully separated	near real-time (minutes)	large scale, near-real-time need

Attaching to the operational DB with a read-only account (A) is simplest; keeping a dedicated analytics store (D) is the cleanest separation. Cost and separation move in opposite directions.

6.2 The evolution path — A → C → D

Most companies don’t jump straight to D. They walk this order naturally.

flowchart LR
    A["A. Same DB directly<br/>early stage"] --> C["C. Read Replica<br/>when op load hurts"]
    C --> D["D. Analytics Warehouse<br/>analytics·multi-domain joins"]
    A -. "rarely used" .-> B["B. Domain API"]
    D -. "large scale·near real-time" .-> E["E. CDC stream"]

At first A is enough. Once the batch starts inflating operational-DB load, move reads onto a replica with C (Read Replica); for Part 2’s JdbcPagingItemReader that is just a different DataSource, while the Spring Batch metadata tables stay on a writable database. As analytics demand grows and you must join across domains, you move to D (Warehouse).
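A minimal sketch of that swap, assuming a separately defined DataSource bean named replicaDataSource that points at the replica, a hypothetical sales table and SaleRow type, and the same package layout as the §1.1 imports. Only the reader moves; the JobRepository keeps using the primary, writable database.

```kotlin
import org.springframework.batch.core.configuration.annotation.StepScope
import org.springframework.batch.item.database.JdbcPagingItemReader
import org.springframework.batch.item.database.Order
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder
import org.springframework.beans.factory.annotation.Qualifier
import org.springframework.beans.factory.annotation.Value
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.jdbc.core.DataClassRowMapper
import java.math.BigDecimal
import java.time.LocalDate
import java.time.LocalDateTime
import javax.sql.DataSource

// Hypothetical row read from the operational sales table.
data class SaleRow(val id: Long, val productId: Long, val amount: BigDecimal, val soldAt: LocalDateTime)

@Configuration(proxyBeanMethods = false)
class SalesReaderConfig {

    @Bean
    @StepScope
    fun salesReader(
        @Qualifier("replicaDataSource") replicaDataSource: DataSource,  // pattern C: read from the replica
        @Value("#{jobParameters['targetDate']}") targetDate: LocalDate,
    ): JdbcPagingItemReader<SaleRow> =
        JdbcPagingItemReaderBuilder<SaleRow>()
            .name("salesReader")
            .dataSource(replicaDataSource)
            .selectClause("SELECT id, product_id, amount, sold_at")
            .fromClause("FROM sales")
            .whereClause("WHERE sold_at >= :startAt AND sold_at < :endAt")
            .parameterValues(
                mapOf<String, Any>(
                    "startAt" to targetDate.atStartOfDay(),
                    "endAt" to targetDate.plusDays(1).atStartOfDay(),
                ),
            )
            .sortKeys(mapOf("id" to Order.ASCENDING))  // unique sort key keeps paging stable
            .rowMapper(DataClassRowMapper(SaleRow::class.java))
            .pageSize(1000)
            .build()
}
```

Because a replica can lag, a job that reads “yesterday” right after midnight may miss the last writes; scheduling with some slack (01:00 rather than 00:00) leaves room for that.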

6.3 Why B (API call) is rarely used

APIs are for single/small responses — paging hundreds of thousands of rows carries heavy serialization overhead.
No transactional consistency across page boundaries — if data changes between pages, you get gaps or duplicates.
It shifts batch load onto the operational service — bulk reads wreck the operational API’s latency.
The only exceptions are third-party APIs (where DB access is impossible) and small real-time lookups of roughly 1,000 rows or fewer.

6.4 When E (CDC) shows up

CDC (Change Data Capture) streams DB change logs into something like Kafka to consume near real-time. Powerful, but expensive.

It costs infrastructure like Debezium · Kafka · Schema Registry.
It’s less “batch” and more micro-batch (5-minute to 1-hour windows), which pairs well with time-bucketed aggregation.
The adoption threshold is high — it earns its keep only when “near-real-time analytics” is a business requirement.
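The windowing arithmetic behind such time-bucketed aggregation is small. A sketch independent of any CDC tooling (the OrderEvent type is hypothetical, and consuming from Kafka or Debezium is out of scope) that floors each event time to the start of its fixed, epoch-aligned window and sums per window:

```kotlin
import java.time.Duration
import java.time.Instant

// Start of the fixed-size, epoch-aligned window that contains `ts` (a tumbling window).
fun windowStart(ts: Instant, window: Duration): Instant {
    val millis = ts.toEpochMilli()
    return Instant.ofEpochMilli(millis - Math.floorMod(millis, window.toMillis()))
}

// Hypothetical change event decoded from the stream: an order amount at a point in time.
data class OrderEvent(val at: Instant, val amount: Long)

// Sum amounts per window; a micro-batch would apply this to the events consumed since its last commit.
fun sumByWindow(events: List<OrderEvent>, window: Duration): Map<Instant, Long> =
    events.groupBy { windowStart(it.at, window) }
        .mapValues { (_, windowEvents) -> windowEvents.sumOf { it.amount } }
        .toSortedMap()
```

Because windows are aligned to fixed boundaries, re-processing the same events yields the same keys, which is what lets an upsert Writer keep these aggregates idempotent.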

6.5 Connection to this series

Every main-text example in Parts 1–6 assumes A (same DB), because it keeps the learning flow simplest. The capstone picks a simplified variant of D (analytics Warehouse), splitting operational and analytics schemas inside one PostgreSQL instance (covered in the capstone). Just remember that the data-source decision ripples into everything from Reader choice (Part 2) to idempotency design (Part 3).

7. Preventing Duplicate Execution Across Instances

7.1 The problem — N app instances mean N runs

In operations the app usually runs as several instances. But @Scheduled fires independently on each instance. With three, the same aggregation job triggers three times at 01:00 daily. That’s exactly the “multi-instance duplication” column from Part 1 §1.1’s trigger table.

7.2 What Spring Batch covers, and its limit

The good news is that Spring Batch blocks part of this. If you try to run the same JobInstance (same identifying JobParameters) while an execution of it is still running, the JobRepository rejects the second launch with JobExecutionAlreadyRunningException.

The catch is the limit. If the trigger passes different parameters each time — say LocalDateTime.now() instead of targetDate — each run becomes a different JobInstance and the lock never engages. Three instances passing slightly different times all run separately. So single execution must be guaranteed at the trigger level too.
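The limit is easy to see in a toy model. This is not Spring Batch code: RunningInstances is a hypothetical in-memory stand-in for the repository’s “one running execution per JobInstance” rule, keyed by a string that represents the identifying parameters.

```kotlin
// Hypothetical stand-in for the JobRepository rule: a JobInstance that is already
// running cannot get a second concurrent execution.
class RunningInstances {
    private val running = mutableSetOf<String>()

    // true = launch proceeds; false = rejected (JobExecutionAlreadyRunningException in the real framework)
    fun tryLaunch(identifyingKey: String): Boolean = running.add(identifyingKey)
}

// App instances all fire at 01:00 before any run finishes; count the launches that proceed.
fun launchesThatProceed(keysFromEachInstance: List<String>): Int {
    val repository = RunningInstances()
    return keysFromEachInstance.count { repository.tryLaunch(it) }
}

// Identifying key = targetDate: every instance computes the same key.
val withTargetDate = launchesThatProceed(List(3) { "dailySalesJob|targetDate=2026-05-16" })  // 1
// Identifying key = LocalDateTime.now(): each instance's key differs by a few milliseconds.
val withTimestamp = launchesThatProceed(
    listOf("2026-05-17T01:00:00.001", "2026-05-17T01:00:00.004", "2026-05-17T01:00:00.009")
        .map { "dailySalesJob|requestedAt=$it" },
)  // 3
```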

7.3 Single-execution guarantee per trigger

Approach	Single-execution guarantee	Extra infra	Fits
`@Scheduled` alone	❌ runs on every instance	none	single-instance deployment only
`@Scheduled` + ShedLock	✅ one run via a distributed lock	lock store (DB/Redis)	keep in-app scheduling with HA
Quartz cluster	✅ one run in cluster mode	Quartz schema (DB)	complex scheduling + HA
K8s CronJob	✅ one Job per schedule (`concurrencyPolicy: Forbid`)	K8s	containerized deployment
leader election	✅ only the leader runs	a coordinator (ZK/etcd, etc.)	already have a cluster coordinator

7.4 Recommended combination

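Containerized → K8s CronJob: the CronJob controller creates one Job per scheduled time, and concurrencyPolicy: Forbid stops a new run from starting while the previous one is still active, so no application-level lock is needed. The cleanest option. Kubernetes documents that in rare cases one schedule can create two Jobs (or none), so the idempotency net in the last point still matters.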
Keeping @Scheduled → ShedLock is mandatory — a distributed lock lets only one instance launch the job (Appendix A).
Idempotency (Part 3 §5) is the last safety net — even if trigger single-execution leaks and the job runs twice, an upsert Writer makes the result the same.
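With the CronJob option, the container typically runs one job and exits, and the Pod’s exit code tells Kubernetes whether the run failed. A minimal sketch, assuming Spring Boot’s batch auto-configuration, which launches the configured job at startup and registers a JobExecutionExitCodeGenerator that turns a non-COMPLETED job status into a non-zero exit code; the application class name is hypothetical. With more than one job in the application, spring.batch.job.name selects which one runs, and JobParameters such as targetDate arrive as command-line arguments.

```kotlin
import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import kotlin.system.exitProcess

@SpringBootApplication
class DailySalesBatchApplication

fun main(args: Array<String>) {
    // Boot's startup runner launches the configured job during run(); with the default
    // synchronous launcher, the job has finished by the time the context is returned.
    val context = runApplication<DailySalesBatchApplication>(*args)
    // exit() closes the context and returns the code from the registered ExitCodeGenerators,
    // so a failed job ends the container with a non-zero status and K8s marks the Job failed.
    exitProcess(SpringApplication.exit(context))
}
```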

The conclusion is one line — preventing duplication = single-trigger + idempotency, doubled up. Don’t trust either alone; layer both.

Recap

The key takeaways from Part 4, one line each:

JobLauncher triggers, JobOperator intervenes after the fact — @Scheduled calls the Launcher; restarting a failed execution by ID is the Operator.
Pick the scheduler by dependency and deployment form — DAGs → Argo/Airflow, containers → CronJob, app processes → @Scheduled or ShedLock/Quartz by instance count.
JobParameters is the JobInstance’s identity — a date identifying key means “once a day, restartable”; an incrementer means “new every run” (write idempotency required).
Data sources usually evolve A → C → D — start on the same DB, isolate load with a Read Replica, move to a Warehouse as analytics demand grows. B (API) is rarely used.
Stop multi-instance duplication with single-trigger + idempotency. The JobInstance lock only blocks concurrent same-parameter runs, so enforce a single trigger with ShedLock or CronJob and keep an upsert Writer as the safety net.

Part 5 takes on Performance · Parallelism. So far we’ve run jobs single-threaded. Part 5 covers running the same job faster — multi-threaded Steps, partitioning, remote workers, and JDK 21 virtual threads. Note up front that this is “splitting one job to run faster,” a separate axis from this part’s §7 “preventing duplicate execution.”

Appendix

A. @Scheduled + ShedLock single execution


ShedLock takes a lock in a shared store (DB/Redis) so only one of several instances runs the scheduled method.

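import net.javacrumbs.shedlock.spring.annotation.SchedulerLock
import org.springframework.batch.core.Job
import org.springframework.batch.core.JobParametersBuilder
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component
import java.time.LocalDate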

@Component
class DailySalesScheduler(
    private val jobLauncher: JobLauncher,
    private val dailySalesJob: Job,
) {
    @Scheduled(cron = "0 0 1 * * *")
    @SchedulerLock(name = "dailySalesJob", lockAtMostFor = "30m", lockAtLeastFor = "1m")
    fun launch() {
        val params = JobParametersBuilder()
            .addLocalDate("targetDate", LocalDate.now().minusDays(1))
            .toJobParameters()
        jobLauncher.run(dailySalesJob, params)
    }
}
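The @SchedulerLock annotation does nothing until scheduling and ShedLock are switched on and a lock store is configured. A minimal sketch for a JDBC lock store, assuming the shedlock-spring and shedlock-provider-jdbc-template modules; the DDL in the comment follows ShedLock’s documented default table, and the configuration class name is hypothetical.

```kotlin
import net.javacrumbs.shedlock.core.LockProvider
import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.jdbc.core.JdbcTemplate
import org.springframework.scheduling.annotation.EnableScheduling
import javax.sql.DataSource

// Expects a shared table, e.g.:
//   CREATE TABLE shedlock(name VARCHAR(64) NOT NULL, lock_until TIMESTAMP NOT NULL,
//                         locked_at TIMESTAMP NOT NULL, locked_by VARCHAR(255) NOT NULL,
//                         PRIMARY KEY (name));
@Configuration(proxyBeanMethods = false)
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "30m")
class SchedulerLockConfig {

    @Bean
    fun lockProvider(dataSource: DataSource): LockProvider =
        JdbcTemplateLockProvider(
            JdbcTemplateLockProvider.Configuration.builder()
                .withJdbcTemplate(JdbcTemplate(dataSource))
                .usingDbTime()  // compare against the database clock, not each instance's clock
                .build()
        )
}
```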

lockAtMostFor is the safety valve that prevents a lock from being held forever if an instance dies. Set it comfortably above the job’s maximum runtime.
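lockAtLeastFor covers the opposite case: a job that finishes in seconds while another instance, whose clock runs slightly behind, fires its cron a little later. A toy model of the two bounds makes the timeline concrete; it is not ShedLock’s implementation, and LockRow, ScheduledLockModel, skewedTimeline, and afterCrash are hypothetical names.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical model of one named lock row in the shared store: free once now >= lockUntil.
class LockRow(var lockUntil: Instant = Instant.EPOCH)

// Hypothetical per-instance view of the lock with ShedLock-style bounds.
class ScheduledLockModel(
    private val row: LockRow,
    private val lockAtMostFor: Duration,
    private val lockAtLeastFor: Duration,
) {
    // Take the lock at `now` if it is free; lockAtMostFor is the bound if this instance dies.
    fun tryAcquire(now: Instant): Boolean {
        if (now.isBefore(row.lockUntil)) return false
        row.lockUntil = now.plus(lockAtMostFor)
        return true
    }

    // After the task finishes, keep the lock until at least lockedAt + lockAtLeastFor.
    fun release(lockedAt: Instant, finishedAt: Instant) {
        val earliestUnlock = lockedAt.plus(lockAtLeastFor)
        row.lockUntil = if (finishedAt.isAfter(earliestUnlock)) finishedAt else earliestUnlock
    }
}

private val firedA: Instant = Instant.parse("2026-05-17T01:00:00Z")

// A runs a 20-second job; B's cron fires 30 s after A's (clock skew). Did each one run?
fun skewedTimeline(lockAtLeastFor: Duration): Pair<Boolean, Boolean> {
    val row = LockRow()
    val a = ScheduledLockModel(row, Duration.ofMinutes(30), lockAtLeastFor)
    val b = ScheduledLockModel(row, Duration.ofMinutes(30), lockAtLeastFor)
    val aRan = a.tryAcquire(firedA)
    a.release(firedA, firedA.plusSeconds(20))
    val bRan = b.tryAcquire(firedA.plusSeconds(30))
    return aRan to bRan
}

// A takes the lock and dies without releasing it; can B run `minutesLater`?
fun afterCrash(minutesLater: Long): Boolean {
    val row = LockRow()
    ScheduledLockModel(row, Duration.ofMinutes(30), Duration.ofMinutes(1)).tryAcquire(firedA)
    return ScheduledLockModel(row, Duration.ofMinutes(30), Duration.ofMinutes(1))
        .tryAcquire(firedA.plus(Duration.ofMinutes(minutesLater)))
}
```

With lockAtLeastFor = 1 minute, skewedTimeline returns (true, false): B is still blocked. With Duration.ZERO it returns (true, true), which is the duplicate run. afterCrash(20) is false and afterCrash(31) is true, because lockAtMostFor = 30 minutes releases the abandoned lock.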

B. Data-source pattern selection cheat sheet


Situation	Recommended pattern
Just starting, little data	A. Same DB (read-only account)
Batch is inflating operational-DB load	C. Read Replica
Multi-domain joins · analytics reports	D. Analytics Warehouse
Near-real-time analytics is a requirement	E. CDC stream
Third-party data with no DB access	B. API (exceptional)

C. External references

Spring Batch — Running a Job (JobLauncher · JobOperator) — official reference for the launch/operations API
Spring Batch — Controlling Step Flow (restart · startLimit · allowStartIfComplete) — flow control and restart settings
ShedLock — distributed lock for single-execution scheduling across instances
Kubernetes — CronJob — concurrencyPolicy and single-execution guarantees

Tags #Spring Boot #Spring Batch #Kotlin #Backend #Batch