Jan 25, 2026

Spring Boot Practical Guide Part 5: Database Optimization

Previous	Current	Next
Part 4: Resilience Patterns	Part 5: DB Optimization	Part 6: Monitoring

Introduction

In most web applications, performance bottlenecks occur at the database level. Through index design, query optimization, and architectural improvements, you can dramatically improve DB performance.

Topics covered in Part 5:

Principles and design methods for indexes
Handling large-scale data with cursor-based pagination
Read/write separation using Read Replicas
N+1 problem resolution strategies

Why is Database Optimization Important?
Indexes
Pagination Optimization
Read Replica
N+1 Problem
FAQ
Summary

1. Why is Database Optimization Important?

1.1 Most Performance Issues Originate from the DB

Typical web request processing time distribution:

+-------------------------------------------------------------+
|  Network     | App Logic |      Database Query              |
|    10ms      |   20ms    |         200ms                    |
|    (4%)      |   (8%)    |         (87%)                    |
+-------------------------------------------------------------+

> Most latency comes from DB queries
> DB optimization has a major impact on overall performance

1.2 Performance Degradation with Data Growth

100 rows:        SELECT * WHERE status = 'ON_SALE'  ->  1ms
10,000 rows:     SELECT * WHERE status = 'ON_SALE'  ->  50ms
1,000,000 rows:  SELECT * WHERE status = 'ON_SALE'  ->  5,000ms

After adding an index:
1,000,000 rows:  SELECT * WHERE status = 'ON_SALE'  ->  5ms

2. Indexes

2.1 What is an Index?

An index is like the index at the back of a book.

Finding "Kafka" in a book:
+-- Without index: Read every page from start to end (Full Scan)
+-- With index: Go to "K" section -> "Kafka: p.234" -> Jump directly

Finding status = 'ON_SALE' in a DB:
+-- Without index: Scan all rows (Full Table Scan)
+-- With index: B-Tree search -> Found immediately

2.2 Index Structure (B-Tree)

Index: idx_products_status

                    +------------------+
                    |   [ON_SALE]      |
                    |    [SOLD_OUT]    |
                    +--------+---------+
                             |
            +----------------+----------------+
            v                v                v
    +--------------+ +--------------+ +--------------+
    | DELETED      | | ON_SALE      | | SOLD_OUT     |
    | row: 5,12,89 | | row: 1,3,7.. | | row: 2,4,8.. |
    +--------------+ +--------------+ +--------------+

Search: status = 'ON_SALE'
1. Locate ON_SALE position at the root node
2. Navigate to the corresponding leaf node
3. Obtain row pointer
-> O(log n) complexity

2.3 Index Design for the Project

-- V2__add_indexes.sql

-- Products table: Optimize product queries
CREATE INDEX idx_products_status ON products(status);
CREATE INDEX idx_products_category_id ON products(category_id);
CREATE INDEX idx_products_seller_id ON products(seller_id);
CREATE INDEX idx_products_sales_count ON products(sales_count DESC);
CREATE INDEX idx_products_created_at ON products(created_at DESC);

-- Composite indexes: Conditions frequently used together
CREATE INDEX idx_products_status_created_at ON products(status, created_at DESC);
CREATE INDEX idx_products_status_sales_count ON products(status, sales_count DESC);

-- Orders table
CREATE INDEX idx_orders_buyer_id ON orders(buyer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_buyer_created_at ON orders(buyer_id, created_at DESC);

-- Order Items table
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_seller_id ON order_items(seller_id);

2.4 Composite Index

The column order of an index matters!

-- Composite index: (status, created_at)
CREATE INDEX idx_products_status_created_at ON products(status, created_at DESC);

-- Index is used (leading column condition present)
SELECT * FROM products
WHERE status = 'ON_SALE' AND created_at > '2024-01-01';

SELECT * FROM products
WHERE status = 'ON_SALE';

-- Index is NOT used (leading column condition missing)
SELECT * FROM products
WHERE created_at > '2024-01-01';
-- Searching only by created_at without status condition -> Full Scan

Composite Index Design Principles:

1. Place equality (=) condition columns first
2. Place range (>, <, BETWEEN) condition columns later
3. Place columns with higher cardinality (number of unique values) first

Example: WHERE status = 'ON_SALE' AND created_at > '2024-01-01'
    Index: (status, created_at)  -- Good
    Index: (created_at, status)  -- Bad

2.5 Downsides of Indexes

+----------------------------------+----------------------------+
|  Index Advantages                |  Index Disadvantages       |
+----------------------------------+----------------------------+
|  Faster SELECT queries           |  Slower INSERT operations  |
|  Optimized WHERE conditions      |  Slower UPDATE operations  |
|  Optimized ORDER BY              |  Slower DELETE operations  |
|                                  |  Additional storage needed |
+----------------------------------+----------------------------+

Reason:
Indexes must be updated along with data changes
More indexes = more write overhead

Recommendations:
- Only index columns that are frequently queried
- Minimize indexes on write-heavy tables
- Remove unused indexes

2.6 Execution Plan (EXPLAIN)

-- Check query execution plan
EXPLAIN SELECT * FROM products WHERE status = 'ON_SALE';

-- Interpreting results
+----+-------------+----------+------+---------------------+
| id | select_type | table    | type | key                 |
+----+-------------+----------+------+---------------------+
|  1 | SIMPLE      | products | ref  | idx_products_status |
+----+-------------+----------+------+---------------------+

type values:
- ALL: Full Table Scan (worst)
- index: Full Index Scan
- range: Index range scan
- ref: Index lookup (good)
- eq_ref: Unique index lookup (optimal)
- const: Constant lookup (optimal)

3. Pagination Optimization

3.1 The Problem with Offset Pagination

-- Offset pagination
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 0;     -- Page 1
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 20;    -- Page 2
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 19980; -- Page 1000

The problem:

Page 1 (OFFSET 0):
+--------------------------------------------+
| Read 20 rows -> Return 20 rows             |
| Processing time: 5ms                       |
+--------------------------------------------+

Page 1000 (OFFSET 19980):
+--------------------------------------------+
| Read 19,980 rows -> Discard -> Return 20   |
| Processing time: 2,000ms                   |
+--------------------------------------------+

-> Performance degrades linearly as OFFSET grows
-> A serious problem with large datasets

3.2 Cursor-Based Pagination (Keyset Pagination)

-- First page
SELECT * FROM products
ORDER BY created_at DESC, id DESC
LIMIT 21;  -- limit + 1 (to check if next page exists)

-- Next page (cursor = last item's created_at, id)
SELECT * FROM products
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 21;

Advantages:

Page 1:
+--------------------------------------------+
| Read 21 rows via index -> Return 20 rows   |
| Processing time: 5ms                       |
+--------------------------------------------+

Page 1000:
+--------------------------------------------+
| Read 21 rows via index -> Return 20 rows   |
| Processing time: 5ms (same!)               |
+--------------------------------------------+

-> Consistent performance regardless of page depth

3.3 Cursor Response Implementation

// CursorPageResponse.kt
data class CursorPageResponse<T>(
    val content: List<T>,       // Actual data
    val nextCursor: String?,    // Next page cursor (Base64 encoded)
    val hasNext: Boolean,       // Whether next page exists
    val size: Int               // Current page size
) {
    companion object {
        fun <T> of(
            content: List<T>,
            limit: Int,
            cursorExtractor: (T) -> Pair<LocalDateTime, Long>
        ): CursorPageResponse<T> {
            // Query limit + 1 items to determine hasNext
            val hasNext = content.size > limit
            val resultContent = if (hasNext) content.dropLast(1) else content

            val nextCursor = if (hasNext && resultContent.isNotEmpty()) {
                val last = resultContent.last()
                val (timestamp, id) = cursorExtractor(last)
                encodeCursor(timestamp, id)
            } else {
                null
            }

            return CursorPageResponse(
                content = resultContent,
                nextCursor = nextCursor,
                hasNext = hasNext,
                size = resultContent.size
            )
        }

        // Cursor encoding: "2024-01-15T10:30:00:12345" -> Base64
        fun encodeCursor(timestamp: LocalDateTime, id: Long): String {
            val cursorString = "${timestamp}:$id"
            return Base64.getEncoder().encodeToString(cursorString.toByteArray())
        }

        // Cursor decoding: Base64 -> (timestamp, id)
        fun decodeCursor(cursor: String): Pair<LocalDateTime, Long>? {
            return try {
                val decoded = String(Base64.getDecoder().decode(cursor))
                val parts = decoded.split(":")
                val timestamp = LocalDateTime.parse(parts[0])
                val id = parts[1].toLong()
                timestamp to id
            } catch (e: Exception) {
                null
            }
        }
    }
}

3.4 Cursor Query Implementation (QueryDSL)

// ProductJpaRepositoryImpl.kt
override fun searchWithCursor(
    keyword: String?,
    categoryId: Long?,
    status: ProductStatus?,
    cursor: LocalDateTime?,
    cursorId: Long?,
    limit: Int
): List<Product> {
    val query = queryFactory
        .selectFrom(product)
        .where(
            keywordContains(keyword),
            categoryIdEq(categoryId),
            statusEq(status),
            cursorCondition(cursor, cursorId)  // Cursor condition
        )
        .orderBy(product.createdAt.desc(), product.id.desc())
        .limit(limit.toLong() + 1)  // limit + 1

    return query.fetch()
}

// Cursor condition: (created_at, id) < (cursor_time, cursor_id)
private fun cursorCondition(cursor: LocalDateTime?, cursorId: Long?): BooleanExpression? {
    if (cursor == null || cursorId == null) return null

    return product.createdAt.lt(cursor)
        .or(product.createdAt.eq(cursor).and(product.id.lt(cursorId)))
}

3.5 API Response Example

// GET /api/v1/products/cursor?limit=5

// First page
{
  "content": [
    {"id": 100, "name": "Product100", "createdAt": "2024-01-15T10:30:00"},
    {"id": 99, "name": "Product99", "createdAt": "2024-01-15T10:25:00"},
    {"id": 98, "name": "Product98", "createdAt": "2024-01-15T10:20:00"},
    {"id": 97, "name": "Product97", "createdAt": "2024-01-15T10:15:00"},
    {"id": 96, "name": "Product96", "createdAt": "2024-01-15T10:10:00"}
  ],
  "nextCursor": "MjAyNC0wMS0xNVQxMDoxMDowMDo5Ng==",
  "hasNext": true,
  "size": 5
}

// Next page: GET /api/v1/products/cursor?limit=5&cursor=MjAyNC0wMS0xNVQxMDoxMDowMDo5Ng==
{
  "content": [
    {"id": 95, "name": "Product95", "createdAt": "2024-01-15T10:05:00"},
    ...
  ],
  "nextCursor": "...",
  "hasNext": true,
  "size": 5
}

3.6 Offset vs Cursor Comparison

Criteria	Offset	Cursor
Performance	Degrades as page depth increases	Always consistent
Data Consistency	Duplicates/omissions when data is added/deleted	Consistency maintained
Jump to Specific Page	Possible (page=5)	Not possible
Implementation Complexity	Simple	Relatively complex
Use Cases	Admin pages, small datasets	Infinite scroll, large datasets

4. Read Replica

4.1 Why Read/Write Separation is Needed

Typical web service traffic pattern:

Read : Write = 9 : 1 (or even more extreme)

+---------------------------------------------------------+
|                      Single DB                          |
|                                                         |
|   [Read 90%] ----------------+                          |
|                               v                         |
|                        +--------------+                 |
|                        |   Primary    | <- Bottleneck!  |
|                        |   Database   |                 |
|                        +--------------+                 |
|                               ^                         |
|   [Write 10%] ---------------+                          |
|                                                         |
+---------------------------------------------------------+

> Read requests affect write performance
> Single DB handles all load

4.2 Read Replica Architecture

+---------------------------------------------------------+
|                  Read Replica Setup                      |
|                                                         |
|                    +--------------+                     |
|   [Write 10%] --> |   Primary    |                     |
|                    |   (Master)   |                     |
|                    +------+-------+                     |
|                           |                             |
|                    Async Replication                     |
|                           |                             |
|              +------------+------------+                |
|              v            v            v                |
|       +----------+ +----------+ +----------+           |
|       | Replica1 | | Replica2 | | Replica3 |           |
|       +----------+ +----------+ +----------+           |
|              ^            ^            ^                |
|              +------------+------------+                |
|                           |                             |
|   [Read 90%] ------------+                              |
|                                                         |
+---------------------------------------------------------+

Advantages:
- Distributed read load
- Primary focuses on writes
- Easy read scaling by adding Replicas

4.3 Routing DataSource Implementation

// DataSourceConfig.kt
@Configuration
@Profile("prod")
@ConditionalOnProperty(name = ["spring.datasource.replica.enabled"], havingValue = "true")
class DataSourceConfig {

    // Primary DataSource (for writes)
    @Bean
    @ConfigurationProperties(prefix = "spring.datasource.primary")
    fun primaryDataSource(): DataSource {
        return DataSourceBuilder.create().build()
    }

    // Replica DataSource (for reads)
    @Bean
    @ConfigurationProperties(prefix = "spring.datasource.replica")
    fun replicaDataSource(): DataSource {
        return DataSourceBuilder.create().build()
    }

    // Routing DataSource (automatic routing)
    @Bean
    fun routingDataSource(
        @Qualifier("primaryDataSource") primaryDataSource: DataSource,
        @Qualifier("replicaDataSource") replicaDataSource: DataSource
    ): DataSource {
        val routingDataSource = ReplicationRoutingDataSource()

        val dataSourceMap = mapOf<Any, Any>(
            DataSourceType.PRIMARY to primaryDataSource,
            DataSourceType.REPLICA to replicaDataSource
        )

        routingDataSource.setTargetDataSources(dataSourceMap)
        routingDataSource.setDefaultTargetDataSource(primaryDataSource)

        return routingDataSource
    }

    @Primary
    @Bean
    fun dataSource(@Qualifier("routingDataSource") routingDataSource: DataSource): DataSource {
        return LazyConnectionDataSourceProxy(routingDataSource)
    }
}

// Routing decision logic
class ReplicationRoutingDataSource : AbstractRoutingDataSource() {
    override fun determineCurrentLookupKey(): Any {
        // Route to Replica for readOnly transactions, Primary otherwise
        return if (TransactionSynchronizationManager.isCurrentTransactionReadOnly()) {
            DataSourceType.REPLICA
        } else {
            DataSourceType.PRIMARY
        }
    }
}

4.4 The Role of @Transactional(readOnly = true)

@Service
@Transactional(readOnly = true)  // Default: use Replica
class ProductService {

    // Read-only -> Routed to Replica
    fun getProduct(id: Long): ProductResponse {
        return productRepository.findById(id)
    }

    // Read-only -> Routed to Replica
    fun searchProducts(request: SearchRequest): Page<ProductResponse> {
        return productRepository.search(request)
    }

    @Transactional  // Write -> Routed to Primary
    fun createProduct(request: CreateProductRequest): ProductResponse {
        return productRepository.save(Product(...))
    }

    @Transactional  // Write -> Routed to Primary
    fun updateProduct(id: Long, request: UpdateProductRequest): ProductResponse {
        val product = productRepository.findById(id)
        product.update(request)
        return productRepository.save(product)
    }
}

4.5 Replication Lag Problem

Problem scenario:
+---------------------------------------------------------+
|                                                         |
|  T=0ms:  Order saved to Primary (INSERT)                |
|  T=5ms:  "Order complete" response -> Returned to user  |
|  T=10ms: User clicks "View my orders"                  |
|  T=15ms: Query from Replica -> Not replicated yet!      |
|                                                         |
|  User: "I just placed an order but it's not showing!"   |
|                                                         |
+---------------------------------------------------------+

Solutions:

// 1. Read from Primary immediately after writing
@Transactional  // Not readOnly -> Primary
fun createOrderAndGet(request: CreateOrderRequest): OrderResponse {
    val order = orderRepository.save(Order(...))
    // Query within the same transaction -> Reads from Primary
    return OrderResponse.from(order)
}

// 2. Session Affinity
// After a write, the same user reads from Primary for a certain period

// 3. Explicit Primary read
@Transactional(readOnly = false)  // Force Primary
fun getRecentOrder(userId: Long): OrderResponse {
    return orderRepository.findTopByUserIdOrderByCreatedAtDesc(userId)
}

// 4. Accept Eventual Consistency
// Show "Changes will be reflected shortly" in the UI

5. N+1 Problem

5.1 What is the N+1 Problem?

// Query Orders
val orders = orderRepository.findByBuyerId(buyerId)  // 1 query

// Query OrderItems for each Order
orders.forEach { order ->
    val items = order.orderItems  // N queries (Lazy Loading)
    items.forEach { println(it.productName) }
}

// Total queries: 1 + N (if there are 100 orders, that's 101 queries!)

-- Actual queries executed
SELECT * FROM orders WHERE buyer_id = 1;
SELECT * FROM order_items WHERE order_id = 1;
SELECT * FROM order_items WHERE order_id = 2;
SELECT * FROM order_items WHERE order_id = 3;
... (100 more!)

5.2 Solution: Fetch Join

// JPQL Fetch Join
@Query("SELECT o FROM Order o JOIN FETCH o.orderItems WHERE o.buyer.id = :buyerId")
fun findByBuyerIdWithItems(buyerId: Long): List<Order>

// QueryDSL Fetch Join
fun findByBuyerIdWithItems(buyerId: Long): List<Order> {
    return queryFactory
        .selectFrom(order)
        .join(order.orderItems, orderItem).fetchJoin()
        .where(order.buyer.id.eq(buyerId))
        .fetch()
}

// Total queries: 1!

-- Solved with a single query
SELECT o.*, oi.* FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.buyer_id = 1;

5.3 @EntityGraph

@EntityGraph(attributePaths = ["orderItems", "orderItems.product"])
fun findByBuyerId(buyerId: Long): List<Order>

5.4 Batch Size Configuration

# application.yml
spring:
  jpa:
    properties:
      hibernate:
        default_batch_fetch_size: 100

Before (N+1):
SELECT * FROM orders WHERE buyer_id = 1;
SELECT * FROM order_items WHERE order_id = 1;
SELECT * FROM order_items WHERE order_id = 2;
SELECT * FROM order_items WHERE order_id = 3;
... (100 times)

After (Batch):
SELECT * FROM orders WHERE buyer_id = 1;
SELECT * FROM order_items WHERE order_id IN (1,2,3,...,100);
... (2 times!)

5.5 N+1 Solution Strategy Comparison

Strategy	Advantages	Disadvantages	When to Use
Fetch Join	Solved with 1 query	Cannot paginate	Single collection
@EntityGraph	Declarative, simple	Difficult for dynamic queries	Static queries
Batch Size	Global setting	Not a complete solution	Default setting

6. FAQ (Interview Prep)

Q1. Are more indexes always better?

No. Having too many indexes means:

Indexes must be updated on INSERT, UPDATE, DELETE -> Write performance degradation
Additional storage space required for indexes
The optimizer may choose the wrong index

Recommendations:

Only create indexes on frequently queried columns
Remove unused indexes
Use composite indexes to cover multiple conditions

Q2. How do you determine the column order in a composite index?

1. Place equality (=) condition columns first
2. Place range condition columns later
3. Place columns with higher cardinality first

Example: WHERE status = 'ON_SALE' AND price > 10000 AND category_id = 5
Recommended order: (status, category_id, price)

Q3. What are the problems with offset pagination?

Performance degrades linearly as OFFSET grows (reads and discards 19,980 rows)
Duplicates/omissions can occur when data changes
Solution: Cursor-based pagination

Q4. How do you handle Replication Lag with Read Replicas?

1. Read from Primary immediately after writing (same transaction)
2. Session affinity (keep using Primary for a certain period)
3. Explicitly specify Primary for critical reads
4. Accept Eventual Consistency

Q5. What is the N+1 problem and how do you solve it?

The N+1 problem occurs when querying associated entities results in N additional queries.