Spring Boot Practical Guide Part 5: Database Optimization

Spring Boot Practical Guide Part 5: Database Optimization


Series Navigation

PreviousCurrentNext
Part 4: Resilience PatternsPart 5: DB OptimizationPart 6: Monitoring

Introduction

In most web applications, performance bottlenecks occur at the database level. Through index design, query optimization, and architectural improvements, you can dramatically improve DB performance.

Topics covered in Part 5:

  • Principles and design methods for indexes
  • Handling large-scale data with cursor-based pagination
  • Read/write separation using Read Replicas
  • N+1 problem resolution strategies

Table of Contents


1. Why is Database Optimization Important?

1.1 Most Performance Issues Originate from the DB

Typical web request processing time distribution:

+-------------------------------------------------------------+
|  Network     | App Logic |      Database Query              |
|    10ms      |   20ms    |         200ms                    |
|    (4%)      |   (8%)    |         (87%)                    |
+-------------------------------------------------------------+

> Most latency comes from DB queries
> DB optimization has a major impact on overall performance

1.2 Performance Degradation with Data Growth

100 rows:        SELECT * WHERE status = 'ON_SALE'  ->  1ms
10,000 rows:     SELECT * WHERE status = 'ON_SALE'  ->  50ms
1,000,000 rows:  SELECT * WHERE status = 'ON_SALE'  ->  5,000ms

After adding an index:
1,000,000 rows:  SELECT * WHERE status = 'ON_SALE'  ->  5ms

2. Indexes

2.1 What is an Index?

An index is like the index at the back of a book.

Finding "Kafka" in a book:
+-- Without index: Read every page from start to end (Full Scan)
+-- With index: Go to "K" section -> "Kafka: p.234" -> Jump directly

Finding status = 'ON_SALE' in a DB:
+-- Without index: Scan all rows (Full Table Scan)
+-- With index: B-Tree search -> Found immediately

2.2 Index Structure (B-Tree)

Index: idx_products_status

                    +------------------+
                    |   [ON_SALE]      |
                    |    [SOLD_OUT]    |
                    +--------+---------+
                             |
            +----------------+----------------+
            v                v                v
    +--------------+ +--------------+ +--------------+
    | DELETED      | | ON_SALE      | | SOLD_OUT     |
    | row: 5,12,89 | | row: 1,3,7.. | | row: 2,4,8.. |
    +--------------+ +--------------+ +--------------+

Search: status = 'ON_SALE'
1. Locate ON_SALE position at the root node
2. Navigate to the corresponding leaf node
3. Obtain row pointer
-> O(log n) complexity

2.3 Index Design for the Project

-- V2__add_indexes.sql

-- Products table: Optimize product queries
CREATE INDEX idx_products_status ON products(status);
CREATE INDEX idx_products_category_id ON products(category_id);
CREATE INDEX idx_products_seller_id ON products(seller_id);
CREATE INDEX idx_products_sales_count ON products(sales_count DESC);
CREATE INDEX idx_products_created_at ON products(created_at DESC);

-- Composite indexes: Conditions frequently used together
CREATE INDEX idx_products_status_created_at ON products(status, created_at DESC);
CREATE INDEX idx_products_status_sales_count ON products(status, sales_count DESC);

-- Orders table
CREATE INDEX idx_orders_buyer_id ON orders(buyer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_buyer_created_at ON orders(buyer_id, created_at DESC);

-- Order Items table
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_seller_id ON order_items(seller_id);

2.4 Composite Index

The column order of an index matters!

-- Composite index: (status, created_at)
CREATE INDEX idx_products_status_created_at ON products(status, created_at DESC);

-- Index is used (leading column condition present)
SELECT * FROM products
WHERE status = 'ON_SALE' AND created_at > '2024-01-01';

SELECT * FROM products
WHERE status = 'ON_SALE';

-- Index is NOT used (leading column condition missing)
SELECT * FROM products
WHERE created_at > '2024-01-01';
-- Searching only by created_at without status condition -> Full Scan

Composite Index Design Principles:

1. Place equality (=) condition columns first
2. Place range (>, <, BETWEEN) condition columns later
3. Place columns with higher cardinality (number of unique values) first

Example: WHERE status = 'ON_SALE' AND created_at > '2024-01-01'
    Index: (status, created_at)  -- Good
    Index: (created_at, status)  -- Bad

2.5 Downsides of Indexes

+----------------------------------+----------------------------+
|  Index Advantages                |  Index Disadvantages       |
+----------------------------------+----------------------------+
|  Faster SELECT queries           |  Slower INSERT operations  |
|  Optimized WHERE conditions      |  Slower UPDATE operations  |
|  Optimized ORDER BY              |  Slower DELETE operations  |
|                                  |  Additional storage needed |
+----------------------------------+----------------------------+

Reason:
Indexes must be updated along with data changes
More indexes = more write overhead

Recommendations:
- Only index columns that are frequently queried
- Minimize indexes on write-heavy tables
- Remove unused indexes

2.6 Execution Plan (EXPLAIN)

-- Check query execution plan
EXPLAIN SELECT * FROM products WHERE status = 'ON_SALE';

-- Interpreting results
+----+-------------+----------+------+---------------------+
| id | select_type | table    | type | key                 |
+----+-------------+----------+------+---------------------+
|  1 | SIMPLE      | products | ref  | idx_products_status |
+----+-------------+----------+------+---------------------+

type values:
- ALL: Full Table Scan (worst)
- index: Full Index Scan
- range: Index range scan
- ref: Index lookup (good)
- eq_ref: Unique index lookup (optimal)
- const: Constant lookup (optimal)

3. Pagination Optimization

3.1 The Problem with Offset Pagination

-- Offset pagination
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 0;     -- Page 1
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 20;    -- Page 2
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 19980; -- Page 1000

The problem:

Page 1 (OFFSET 0):
+--------------------------------------------+
| Read 20 rows -> Return 20 rows             |
| Processing time: 5ms                       |
+--------------------------------------------+

Page 1000 (OFFSET 19980):
+--------------------------------------------+
| Read 19,980 rows -> Discard -> Return 20   |
| Processing time: 2,000ms                   |
+--------------------------------------------+

-> Performance degrades linearly as OFFSET grows
-> A serious problem with large datasets

3.2 Cursor-Based Pagination (Keyset Pagination)

-- First page
SELECT * FROM products
ORDER BY created_at DESC, id DESC
LIMIT 21;  -- limit + 1 (to check if next page exists)

-- Next page (cursor = last item's created_at, id)
SELECT * FROM products
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 21;

Advantages:

Page 1:
+--------------------------------------------+
| Read 21 rows via index -> Return 20 rows   |
| Processing time: 5ms                       |
+--------------------------------------------+

Page 1000:
+--------------------------------------------+
| Read 21 rows via index -> Return 20 rows   |
| Processing time: 5ms (same!)               |
+--------------------------------------------+

-> Consistent performance regardless of page depth

3.3 Cursor Response Implementation

// CursorPageResponse.kt
data class CursorPageResponse<T>(
    val content: List<T>,       // Actual data
    val nextCursor: String?,    // Next page cursor (Base64 encoded)
    val hasNext: Boolean,       // Whether next page exists
    val size: Int               // Current page size
) {
    companion object {
        fun <T> of(
            content: List<T>,
            limit: Int,
            cursorExtractor: (T) -> Pair<LocalDateTime, Long>
        ): CursorPageResponse<T> {
            // Query limit + 1 items to determine hasNext
            val hasNext = content.size > limit
            val resultContent = if (hasNext) content.dropLast(1) else content

            val nextCursor = if (hasNext && resultContent.isNotEmpty()) {
                val last = resultContent.last()
                val (timestamp, id) = cursorExtractor(last)
                encodeCursor(timestamp, id)
            } else {
                null
            }

            return CursorPageResponse(
                content = resultContent,
                nextCursor = nextCursor,
                hasNext = hasNext,
                size = resultContent.size
            )
        }

        // Cursor encoding: "2024-01-15T10:30:00:12345" -> Base64
        fun encodeCursor(timestamp: LocalDateTime, id: Long): String {
            val cursorString = "${timestamp}:$id"
            return Base64.getEncoder().encodeToString(cursorString.toByteArray())
        }

        // Cursor decoding: Base64 -> (timestamp, id)
        fun decodeCursor(cursor: String): Pair<LocalDateTime, Long>? {
            return try {
                val decoded = String(Base64.getDecoder().decode(cursor))
                val parts = decoded.split(":")
                val timestamp = LocalDateTime.parse(parts[0])
                val id = parts[1].toLong()
                timestamp to id
            } catch (e: Exception) {
                null
            }
        }
    }
}

3.4 Cursor Query Implementation (QueryDSL)

// ProductJpaRepositoryImpl.kt
override fun searchWithCursor(
    keyword: String?,
    categoryId: Long?,
    status: ProductStatus?,
    cursor: LocalDateTime?,
    cursorId: Long?,
    limit: Int
): List<Product> {
    val query = queryFactory
        .selectFrom(product)
        .where(
            keywordContains(keyword),
            categoryIdEq(categoryId),
            statusEq(status),
            cursorCondition(cursor, cursorId)  // Cursor condition
        )
        .orderBy(product.createdAt.desc(), product.id.desc())
        .limit(limit.toLong() + 1)  // limit + 1

    return query.fetch()
}

// Cursor condition: (created_at, id) < (cursor_time, cursor_id)
private fun cursorCondition(cursor: LocalDateTime?, cursorId: Long?): BooleanExpression? {
    if (cursor == null || cursorId == null) return null

    return product.createdAt.lt(cursor)
        .or(product.createdAt.eq(cursor).and(product.id.lt(cursorId)))
}

3.5 API Response Example

// GET /api/v1/products/cursor?limit=5

// First page
{
  "content": [
    {"id": 100, "name": "Product100", "createdAt": "2024-01-15T10:30:00"},
    {"id": 99, "name": "Product99", "createdAt": "2024-01-15T10:25:00"},
    {"id": 98, "name": "Product98", "createdAt": "2024-01-15T10:20:00"},
    {"id": 97, "name": "Product97", "createdAt": "2024-01-15T10:15:00"},
    {"id": 96, "name": "Product96", "createdAt": "2024-01-15T10:10:00"}
  ],
  "nextCursor": "MjAyNC0wMS0xNVQxMDoxMDowMDo5Ng==",
  "hasNext": true,
  "size": 5
}

// Next page: GET /api/v1/products/cursor?limit=5&cursor=MjAyNC0wMS0xNVQxMDoxMDowMDo5Ng==
{
  "content": [
    {"id": 95, "name": "Product95", "createdAt": "2024-01-15T10:05:00"},
    ...
  ],
  "nextCursor": "...",
  "hasNext": true,
  "size": 5
}

3.6 Offset vs Cursor Comparison

CriteriaOffsetCursor
PerformanceDegrades as page depth increasesAlways consistent
Data ConsistencyDuplicates/omissions when data is added/deletedConsistency maintained
Jump to Specific PagePossible (page=5)Not possible
Implementation ComplexitySimpleRelatively complex
Use CasesAdmin pages, small datasetsInfinite scroll, large datasets

4. Read Replica

4.1 Why Read/Write Separation is Needed

Typical web service traffic pattern:

Read : Write = 9 : 1 (or even more extreme)

+---------------------------------------------------------+
|                      Single DB                          |
|                                                         |
|   [Read 90%] ----------------+                          |
|                               v                         |
|                        +--------------+                 |
|                        |   Primary    | <- Bottleneck!  |
|                        |   Database   |                 |
|                        +--------------+                 |
|                               ^                         |
|   [Write 10%] ---------------+                          |
|                                                         |
+---------------------------------------------------------+

> Read requests affect write performance
> Single DB handles all load

4.2 Read Replica Architecture

+---------------------------------------------------------+
|                  Read Replica Setup                      |
|                                                         |
|                    +--------------+                     |
|   [Write 10%] --> |   Primary    |                     |
|                    |   (Master)   |                     |
|                    +------+-------+                     |
|                           |                             |
|                    Async Replication                     |
|                           |                             |
|              +------------+------------+                |
|              v            v            v                |
|       +----------+ +----------+ +----------+           |
|       | Replica1 | | Replica2 | | Replica3 |           |
|       +----------+ +----------+ +----------+           |
|              ^            ^            ^                |
|              +------------+------------+                |
|                           |                             |
|   [Read 90%] ------------+                              |
|                                                         |
+---------------------------------------------------------+

Advantages:
- Distributed read load
- Primary focuses on writes
- Easy read scaling by adding Replicas

4.3 Routing DataSource Implementation

// DataSourceConfig.kt
@Configuration
@Profile("prod")
@ConditionalOnProperty(name = ["spring.datasource.replica.enabled"], havingValue = "true")
class DataSourceConfig {

    // Primary DataSource (for writes)
    @Bean
    @ConfigurationProperties(prefix = "spring.datasource.primary")
    fun primaryDataSource(): DataSource {
        return DataSourceBuilder.create().build()
    }

    // Replica DataSource (for reads)
    @Bean
    @ConfigurationProperties(prefix = "spring.datasource.replica")
    fun replicaDataSource(): DataSource {
        return DataSourceBuilder.create().build()
    }

    // Routing DataSource (automatic routing)
    @Bean
    fun routingDataSource(
        @Qualifier("primaryDataSource") primaryDataSource: DataSource,
        @Qualifier("replicaDataSource") replicaDataSource: DataSource
    ): DataSource {
        val routingDataSource = ReplicationRoutingDataSource()

        val dataSourceMap = mapOf<Any, Any>(
            DataSourceType.PRIMARY to primaryDataSource,
            DataSourceType.REPLICA to replicaDataSource
        )

        routingDataSource.setTargetDataSources(dataSourceMap)
        routingDataSource.setDefaultTargetDataSource(primaryDataSource)

        return routingDataSource
    }

    @Primary
    @Bean
    fun dataSource(@Qualifier("routingDataSource") routingDataSource: DataSource): DataSource {
        return LazyConnectionDataSourceProxy(routingDataSource)
    }
}

// Routing decision logic
class ReplicationRoutingDataSource : AbstractRoutingDataSource() {
    override fun determineCurrentLookupKey(): Any {
        // Route to Replica for readOnly transactions, Primary otherwise
        return if (TransactionSynchronizationManager.isCurrentTransactionReadOnly()) {
            DataSourceType.REPLICA
        } else {
            DataSourceType.PRIMARY
        }
    }
}

4.4 The Role of @Transactional(readOnly = true)

@Service
@Transactional(readOnly = true)  // Default: use Replica
class ProductService {

    // Read-only -> Routed to Replica
    fun getProduct(id: Long): ProductResponse {
        return productRepository.findById(id)
    }

    // Read-only -> Routed to Replica
    fun searchProducts(request: SearchRequest): Page<ProductResponse> {
        return productRepository.search(request)
    }

    @Transactional  // Write -> Routed to Primary
    fun createProduct(request: CreateProductRequest): ProductResponse {
        return productRepository.save(Product(...))
    }

    @Transactional  // Write -> Routed to Primary
    fun updateProduct(id: Long, request: UpdateProductRequest): ProductResponse {
        val product = productRepository.findById(id)
        product.update(request)
        return productRepository.save(product)
    }
}

4.5 Replication Lag Problem

Problem scenario:
+---------------------------------------------------------+
|                                                         |
|  T=0ms:  Order saved to Primary (INSERT)                |
|  T=5ms:  "Order complete" response -> Returned to user  |
|  T=10ms: User clicks "View my orders"                  |
|  T=15ms: Query from Replica -> Not replicated yet!      |
|                                                         |
|  User: "I just placed an order but it's not showing!"   |
|                                                         |
+---------------------------------------------------------+

Solutions:

// 1. Read from Primary immediately after writing
@Transactional  // Not readOnly -> Primary
fun createOrderAndGet(request: CreateOrderRequest): OrderResponse {
    val order = orderRepository.save(Order(...))
    // Query within the same transaction -> Reads from Primary
    return OrderResponse.from(order)
}

// 2. Session Affinity
// After a write, the same user reads from Primary for a certain period

// 3. Explicit Primary read
@Transactional(readOnly = false)  // Force Primary
fun getRecentOrder(userId: Long): OrderResponse {
    return orderRepository.findTopByUserIdOrderByCreatedAtDesc(userId)
}

// 4. Accept Eventual Consistency
// Show "Changes will be reflected shortly" in the UI

5. N+1 Problem

5.1 What is the N+1 Problem?

// Query Orders
val orders = orderRepository.findByBuyerId(buyerId)  // 1 query

// Query OrderItems for each Order
orders.forEach { order ->
    val items = order.orderItems  // N queries (Lazy Loading)
    items.forEach { println(it.productName) }
}

// Total queries: 1 + N (if there are 100 orders, that's 101 queries!)
-- Actual queries executed
SELECT * FROM orders WHERE buyer_id = 1;
SELECT * FROM order_items WHERE order_id = 1;
SELECT * FROM order_items WHERE order_id = 2;
SELECT * FROM order_items WHERE order_id = 3;
... (100 more!)

5.2 Solution: Fetch Join

// JPQL Fetch Join
@Query("SELECT o FROM Order o JOIN FETCH o.orderItems WHERE o.buyer.id = :buyerId")
fun findByBuyerIdWithItems(buyerId: Long): List<Order>

// QueryDSL Fetch Join
fun findByBuyerIdWithItems(buyerId: Long): List<Order> {
    return queryFactory
        .selectFrom(order)
        .join(order.orderItems, orderItem).fetchJoin()
        .where(order.buyer.id.eq(buyerId))
        .fetch()
}

// Total queries: 1!
-- Solved with a single query
SELECT o.*, oi.* FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.buyer_id = 1;

5.3 @EntityGraph

@EntityGraph(attributePaths = ["orderItems", "orderItems.product"])
fun findByBuyerId(buyerId: Long): List<Order>

5.4 Batch Size Configuration

# application.yml
spring:
  jpa:
    properties:
      hibernate:
        default_batch_fetch_size: 100
Before (N+1):
SELECT * FROM orders WHERE buyer_id = 1;
SELECT * FROM order_items WHERE order_id = 1;
SELECT * FROM order_items WHERE order_id = 2;
SELECT * FROM order_items WHERE order_id = 3;
... (100 times)

After (Batch):
SELECT * FROM orders WHERE buyer_id = 1;
SELECT * FROM order_items WHERE order_id IN (1,2,3,...,100);
... (2 times!)

5.5 N+1 Solution Strategy Comparison

StrategyAdvantagesDisadvantagesWhen to Use
Fetch JoinSolved with 1 queryCannot paginateSingle collection
@EntityGraphDeclarative, simpleDifficult for dynamic queriesStatic queries
Batch SizeGlobal settingNot a complete solutionDefault setting

6. FAQ (Interview Prep)

Q1. Are more indexes always better?

No. Having too many indexes means:

  • Indexes must be updated on INSERT, UPDATE, DELETE -> Write performance degradation
  • Additional storage space required for indexes
  • The optimizer may choose the wrong index

Recommendations:

  • Only create indexes on frequently queried columns
  • Remove unused indexes
  • Use composite indexes to cover multiple conditions

Q2. How do you determine the column order in a composite index?

1. Place equality (=) condition columns first
2. Place range condition columns later
3. Place columns with higher cardinality first

Example: WHERE status = 'ON_SALE' AND price > 10000 AND category_id = 5
Recommended order: (status, category_id, price)

Q3. What are the problems with offset pagination?

  • Performance degrades linearly as OFFSET grows (reads and discards 19,980 rows)
  • Duplicates/omissions can occur when data changes
  • Solution: Cursor-based pagination

Q4. How do you handle Replication Lag with Read Replicas?

1. Read from Primary immediately after writing (same transaction)
2. Session affinity (keep using Primary for a certain period)
3. Explicitly specify Primary for critical reads
4. Accept Eventual Consistency

Q5. What is the N+1 problem and how do you solve it?

The N+1 problem occurs when querying associated entities results in N additional queries.

Solutions:

  • Fetch Join: JOIN FETCH
  • @EntityGraph
  • Batch Size configuration

7. Project File Structure

marketplace/
+-- marketplace-api/
|   +-- src/main/
|       +-- kotlin/.../
|       |   +-- common/
|       |   |   +-- CursorPageResponse.kt     # Cursor response
|       |   +-- config/
|       |   |   +-- DataSourceConfig.kt       # Read Replica routing
|       |   +-- product/
|       |       +-- ProductController.kt      # Cursor API
|       |
|       +-- resources/
|           +-- db/migration/
|               +-- V2__add_indexes.sql       # Index DDL
|
+-- marketplace-infra/
    +-- src/main/kotlin/.../
        +-- ProductJpaRepositoryImpl.kt       # Cursor query implementation

8. Hands-On Practice

# 1. Check query execution plan
# H2 Console (http://localhost:8080/h2-console)
EXPLAIN SELECT * FROM products WHERE status = 'ON_SALE';

# 2. Test cursor pagination
# First page
curl "http://localhost:8080/api/v1/products/cursor?limit=5"

# Next page (use the nextCursor value)
curl "http://localhost:8080/api/v1/products/cursor?limit=5&cursor=<NEXT_CURSOR>"

# 3. Check index list (MySQL)
SHOW INDEX FROM products;

Summary

Comparison by Technique

TechniquePurposeWhen to Apply
IndexImprove query performanceFrequently searched columns
Composite IndexOptimize multi-condition queriesMultiple WHERE conditions
Cursor PaginationHandle large-scale listsInfinite scroll, large datasets
Read ReplicaDistribute read loadHigh read-to-write ratio
Fetch JoinSolve N+1 problemWhen loading associated entities

N+1 Resolution Strategies

StrategyAdvantagesDisadvantagesWhen to Use
Fetch JoinSolved with 1 queryCannot paginateSingle collection
@EntityGraphDeclarative, simpleDifficult for dynamic queriesStatic queries
Batch SizeGlobal settingNot a complete solutionDefault setting

Quick Checklist

  • Are frequently queried columns indexed?
  • Is the column order in composite indexes correct? (equality -> range)
  • Is cursor-based pagination used for large-scale lists?
  • Is readOnly = true set on read-only transactions?
  • Are there any queries causing N+1 problems?
  • Are there any unused indexes?
  • Have you verified query execution plans with EXPLAIN?

The next part covers Monitoring and Observability.

Next: Part 6 - Monitoring

This post is part of the Coupang Partners program, and a commission is earned from qualifying purchases.