Redis — In-Memory Middleware

⚡

One line — Toward the end of the KT DS cloud-service work, I brought Redis in as the in-memory middleware layer: caching, session store and distributed lock on the client (Spring Boot) side, and an operation model — replication · sharding · Sentinel / Cluster — on the server side. This page distills the engineering manual I wrote so the team could adopt it.

Period: 2023 · KT DS — Cloud service
Role: Backend & middleware engineering · Spring–Redis integration
Scope: Server-side operation + client-side Spring Boot integration
Deliverable: Internal Redis manual (Redis 7.2), used as the team reference

Why Redis — and where it fits

Redis (REmote DIctionary Server) is an open-source, in-memory key–value NoSQL store. Keeping data in memory makes it extremely fast (~100K ops/s), but it also makes the data volatile: anything not persisted or replicated is lost when the process dies, so clustering and backup are non-negotiable for production. That speed-vs-volatility trade-off is exactly why it shines for short-lived, frequently touched data: caching, sessions, message queues (Streams), Pub/Sub, and distributed locks. I integrated it at the application level through Spring Boot rather than driving it by hand with redis-cli.

Two integration patterns

Look-aside (check the cache first, fall back to the database on a miss) vs Write-back (buffer writes, then batch-insert): one big insert beats a thousand small ones
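The two patterns can be sketched in plain Java. This is a minimal model, not the manual's implementation: a HashMap stands in for Redis, a function stands in for the database read, and every name (CachePatterns, read, write, flush, batches) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model: the HashMap plays Redis, the Function plays a slow database read.
final class CachePatterns {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> database;
    private final List<String> writeBuffer = new ArrayList<>();
    final List<List<String>> batches = new ArrayList<>();   // each element models one bulk insert
    int databaseReads = 0;

    CachePatterns(Function<String, String> database) {
        this.database = database;
    }

    // Look-aside: try the cache, fall back to the database on a miss, then populate the cache.
    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String loaded = database.apply(key);
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    // Write-back: buffer the row and flush one batch once batchSize rows have accumulated.
    void write(String row, int batchSize) {
        writeBuffer.add(row);
        if (writeBuffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (writeBuffer.isEmpty()) {
            return;
        }
        batches.add(new ArrayList<>(writeBuffer));
        writeBuffer.clear();
    }
}
```

With a batch size of 500, a thousand write calls reach the database as two bulk inserts. The cost is that buffered rows are lost if the buffer disappears before a flush.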

Server-side — operating Redis

Speed comes from living in RAM, so the whole operations story is really about protecting that memory: bounding it, replicating it for failover, and splitting it across servers when one box runs out.

⊙ Memory management

Redis is fast precisely because it reads/writes memory — but if physical memory fills up and the OS starts swapping, every access turns into disk I/O and throughput collapses. So I treat memory as the primary operational signal:

Set maxmemory deliberately, based on the service. Past the limit, Redis evicts according to its eviction policy (below).
Keep each instance ≤ ½ of server RAM — this leaves headroom for the copy-on-write spike during replication (explained next).
Memory is managed in pages/frames, so storing wildly different value sizes causes fragmentation — actual footprint exceeds the stored data. Keeping similarly-sized values together helps.
Watch it continuously (top, plus the INFO metrics further down).
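The headroom rule is easy to check with arithmetic. The sketch below assumes the worst case, where every page of an instance is rewritten while its forked child is alive, and it assumes forks are staggered so that only concurrentForks instances fork at once. The class and method names are hypothetical.

```java
// Worst-case copy-on-write arithmetic; all sizes in GB.
final class CowHeadroom {
    // Steady-state footprint plus one extra full copy per instance that is forking.
    static long peakGb(long instanceGb, int instances, int concurrentForks) {
        long steadyState = instanceGb * instances;
        long cowSpike = instanceGb * concurrentForks;
        return steadyState + cowSpike;
    }

    // The "at most half of host RAM" rule for a single instance.
    static boolean fitsHalfRule(long instanceGb, long hostRamGb) {
        return instanceGb * 2 <= hostRamGb;
    }
}
```

Under these assumptions, two 32 GB instances peak at 96 GB when one forks, while eight 8 GB instances holding the same 64 GB peak at 72 GB.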

maxmemory-policy	What it evicts when full
`noeviction`	Nothing — returns an OOM error on writes
`allkeys-lru`	Least-recently-used key, across all keys
`volatile-lru`	LRU key, only among keys with a TTL
`volatile-ttl`	Shortest-TTL key first, among keys with a TTL
`allkeys-random`	A random key, across all keys
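`volatile-random`	A random key, among keys with a TTL
`allkeys-lfu`	Least-frequently-used key, across all keys (Redis 4.0+)
`volatile-lfu`	LFU key, only among keys with a TTL (Redis 4.0+)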
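To make allkeys-lru concrete, here is a small exact-LRU model built on LinkedHashMap. It is only an illustration: Redis approximates LRU by sampling a few keys rather than keeping a strict ordering, and the limit here counts entries, not bytes. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Exact-LRU model of allkeys-lru; maxEntries stands in for maxmemory.
final class LruEvictionSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruEvictionSketch(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder = true keeps least-recently-used entries first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;   // evict the least-recently-used entry once over the limit
    }
}
```

With a limit of two entries, putting a and b, reading a, then putting c evicts b, because a was touched more recently.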

⊙ Replication — HA / disaster recovery

Redis is a process; it can die at any time. Replication keeps synchronized copies — an Origin (Master) plus Replicas (Slaves) forming a Replica Set. It uses asynchronous replication: the Origin fork()s a child process so it can keep serving while the child ships data to replicas.

Full synchronization — fork a child, snapshot to RDB, ship to replicas, then replay buffered commands

Full sync — used on first set-up or when replicas drift too far: fork → RDB snapshot → transfer → replicas load it; commands arriving meanwhile are buffered and replayed.
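Partial resync: for brief disconnects. Each side tracks a replication ID (run_id before Redis 4.0) plus a replication offset, and the Origin replays only the missing range from its replication backlog (sized by repl-backlog-size). If the replica's offset has already fallen out of the backlog, or the IDs no longer match (typically after a restart), it falls back to a full sync.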
The COW gotcha: fork() shares parent pages copy-on-write; writes during sync allocate fresh pages, so a 32 GB instance can momentarily need up to ~64 GB. This is why I prefer many small instances (8 GB × 8) over a few big ones (32 GB × 2), and cap each at ≤ ½ of host RAM.
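The choice between partial and full resync can be modeled as a small decision function. This is a simplified sketch of the rule described above, not Redis source code: the ID is whatever identifier the Origin advertises, the backlog is treated as a window of the most recent bytes of the replication stream, and all names are hypothetical.

```java
// Simplified model of the resync decision made when a replica reconnects.
final class ResyncDecision {
    enum Sync { PARTIAL, FULL }

    static Sync decide(String originId, long originOffset, long backlogSize,
                       String replicaKnownId, long replicaOffset) {
        // The backlog only retains the last backlogSize bytes of the stream.
        long backlogStart = Math.max(0, originOffset - backlogSize);
        boolean sameHistory = originId.equals(replicaKnownId);
        boolean stillInBacklog = replicaOffset >= backlogStart && replicaOffset <= originOffset;
        return (sameHistory && stillInBacklog) ? Sync.PARTIAL : Sync.FULL;
    }
}
```

A replica that is 200 bytes behind a 500-byte backlog gets a partial resync. One that is 600 bytes behind, or that remembers a different ID, gets a full sync.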

⊙ Sharding — scaling out

Replication is about availability, not capacity. One instance has finite RAM; when you can't scale up, you scale out by distributing data across servers — horizontal partitioning, called sharding in the NoSQL world. (Unlike DBMS partitioning, which divides within one server, sharding divides at the server boundary.) The real question is how you split the keys:

Strategy	How	Trade-off
Range	Split by key range (e.g. dates)	Simple, but uneven load → hot shards
Modular	`hash(key) % N` servers	Even, but adding a node remaps most keys; doubling N each time limits the move to about half of them
Index	A central index server places/locates data	Balanced, but the index is an extra failure point
Consistent hashing	Hash both keys and servers onto a ring	Only the affected slice rebalances on add/remove

Consistent hashing localizes rebalancing — if S2 leaves, only its keys move and the rest stay put
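A minimal sketch of the two hash-based strategies follows. It assumes CRC32 as the hash and 100 virtual nodes per server; both are illustrative choices rather than anything the manual prescribes, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class ShardingSketch {
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Modular: the server index depends on N, so changing N remaps most keys.
    static int modular(String key, int servers) {
        return (int) (hash(key) % servers);
    }

    // Consistent hashing: place each server on the ring at several virtual points.
    static TreeMap<Long, String> ring(List<String> servers, int virtualNodes) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String server : servers) {
            for (int v = 0; v < virtualNodes; v++) {
                ring.put(hash(server + "#" + v), server);
            }
        }
        return ring;
    }

    // A key belongs to the first server point at or after its hash, wrapping around the ring.
    static String locate(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Building the ring once with S1, S2, S3 and once with only S1 and S3 shows the localization: the only keys whose owner changes are those that previously lived on S2.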

⊙ Operation modes — Sentinel & Cluster

Rather than wiring replication and failover by hand, Redis ships two managed modes that package these mechanisms for operations.

Sentinel adds automatic failover to replication; Cluster adds sharding on top via a 16,384-slot hash space

Sentinel (HA): Sentinel processes run alongside the Redis nodes and monitor them; on master failure they hold a quorum vote and promote a replica, and they can also alert via Pub/Sub or shell scripts. Run at least 3 Sentinels so the vote can still reach a majority when one of them is down. A recovered master rejoins as a replica.
Cluster (HA + sharding): slots 0–16383 are divided across primaries; each key's slot is CRC16(key) % 16384, and the slot determines the primary. If a command reaches a node that does not own the slot, that node replies -MOVED with the owner's address and the client redirects. Secondaries auto-promote on failure (a different mechanism from Sentinel). Cost: the client needs slot-aware logic and the cluster carries slot-management overhead.
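The slot calculation is small enough to write out. The sketch below uses the CRC16 variant with polynomial 0x1021 and a zero initial value (XMODEM), which is what Redis Cluster specifies. It also handles hash tags, the {...} part of a key that Redis hashes instead of the whole key; the text above does not cover them, but a slot function is incomplete without them. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ClusterSlot {
    // CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        // Hash tag: if the key contains a non-empty {...}, only that part is hashed.
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

The key foo maps to slot 12182. Two keys that share a hash tag, such as foo{hash_tag} and bar{hash_tag}, land in the same slot, which is what makes multi-key commands possible in Cluster mode.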

Mode	High availability	Sharding
Stand-alone	—	—
Sentinel	✓ (auto-failover)	—
Cluster	✓	✓

⊙ Monitoring

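Redis executes commands on a single thread (Redis 6+ can offload network I/O to extra threads, but command execution stays serialized), so for CPU you scale per-core speed, not core count. The headline signals from INFO: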

# Memory
used_memory_human:916.98K        # what Redis uses
used_memory_rss_human:6.63M      # RSS — physical memory, OS view
maxmemory_human:0B
maxmemory_policy:noeviction
# CPU  (single-thread → raise core speed, not count)
used_cpu_sys:853.71
used_cpu_user:923.68
# Replication
role:master
connected_slaves:0
master_repl_offset:0
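For automated checks, the INFO text can be parsed into a map. The sketch below assumes raw INFO output (field:value lines with # section headers, without the inline annotations shown above) and derives a fragmentation ratio from the byte-valued used_memory and used_memory_rss fields. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class InfoParser {
    // Turns raw INFO text into field -> value, skipping blank lines and "# Section" headers.
    static Map<String, String> parse(String info) {
        Map<String, String> fields = new HashMap<>();
        for (String line : info.split("\\R")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon), line.substring(colon + 1));
            }
        }
        return fields;
    }

    // RSS divided by what Redis believes it uses; values well above 1 suggest fragmentation.
    static double fragmentationRatio(Map<String, String> fields) {
        double rss = Double.parseDouble(fields.get("used_memory_rss"));
        double used = Double.parseDouble(fields.get("used_memory"));
        return rss / used;
    }
}
```

A ratio below 1 is the more urgent signal: part of the dataset is no longer resident in physical memory, which usually means swapping.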

Client-side — Spring Boot integration

On the application side Redis becomes a client-interface problem. With spring-boot-starter-data-redis the connection auto-configures (Lettuce by default, Jedis as the built-in alternative; Redisson is a common third-party client), and a small YAML block wires host / port / password and the Lettuce pool. From there I used three layers: caching, session, distributed lock.

dependencies {
  implementation 'org.springframework.boot:spring-boot-starter-data-redis'
  implementation 'org.springframework.session:spring-session-data-redis'
  implementation 'org.springframework.boot:spring-boot-starter-cache'
}

spring:
  redis:                                  # Spring Boot 2.x prefix; Boot 3.x uses spring.data.redis
    host: <redis-server-ip>
    port: <redis-server-port>
    password: <redis-server-password>
    lettuce:
      pool: { max-active: 5, max-idle: 5, min-idle: 2 }   # pooling needs commons-pool2 on the classpath

⊙ Caching — why, and three ways

If each WAS caches in its own JVM heap, the instances have to sync their caches with each other. Put the cache in Redis and that disappears — every instance just looks at one place.

Moving the cache out of per-WAS heaps into Redis removes the cross-instance sync cost

1 · Spring Data Repository — a CrudRepository over a @RedisHash domain object; objects map to Redis hashes, @Id/@Indexed become lookup keys, and @TimeToLive sets expiry. Feels just like JPA.

@RedisHash(value = "people", timeToLive = 1800)
public class Person implements Serializable {
    @Id private Long id;
    @Indexed private String name;
    private Integer age;
}

public interface PersonRedisRepository extends CrudRepository<Person, Long> {
    Optional<Person> findPersonByName(String name);   // save / findById / deleteById come for free
}

2 · Caching annotations — wrap a service method in a caching proxy: @Cacheable on reads, @CachePut on updates, @CacheEvict on deletes. You register a CacheManager with a serializer and a default TTL (else redis-cli shows opaque serialized bytes). Note: an updating method must return the value so the proxy can cache it.

@Cacheable(key = "#name", value = "personCache")
public Person findPersonByName(String name) { return repo.getByName(name); }

@CachePut(key = "#person.name", value = "personCache")
public Person update(Person person) { return repo.updateByName(person); }   // must return!

@CacheEvict(key = "#name", value = "personCache")
public void delete(String name) { repo.deleteByName(name); }
// → key in redis: "personCache::kim"
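The CacheManager registration mentioned above could look roughly like this. It is a sketch under assumptions, not the manual's actual configuration: the 30-minute TTL and the choice of JSON serialization are illustrative, and the class name RedisCacheConfig is hypothetical.

```java
import java.time.Duration;

import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext.SerializationPair;
import org.springframework.data.redis.serializer.StringRedisSerializer;

@Configuration
@EnableCaching
public class RedisCacheConfig {

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        RedisCacheConfiguration defaults = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(30))   // default TTL (assumed value)
                // Plain-string keys and JSON values stay readable in redis-cli.
                .serializeKeysWith(SerializationPair.fromSerializer(new StringRedisSerializer()))
                .serializeValuesWith(SerializationPair.fromSerializer(new GenericJackson2JsonRedisSerializer()));

        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(defaults)
                .build();
    }
}
```

With JSON values, cached classes no longer need to implement Serializable, though the serializer then stores type information inside the JSON payload.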

3 · RedisTemplate: the low-level operator for working with a specific data structure directly: opsForValue / opsForList / opsForSet / opsForZSet / opsForHash, plus expire, keys, rename, delete. Two catches I flagged for the team. First, even with List/Hash/Set you still address everything by the Redis key, so don't confuse a hash's inner field with the Redis key. Second, rename on a missing key fails with RedisCommandExecutionException: ERR no such key, so these calls need explicit error handling.
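Both catches can be shown in a few lines. This sketch uses StringRedisTemplate and assumes the Lettuce error reaches the caller as a Spring DataAccessException; the helper class, key layout ("person:<id>") and field name are hypothetical.

```java
import org.springframework.dao.DataAccessException;
import org.springframework.data.redis.core.StringRedisTemplate;

// Hypothetical helper; key and field names are illustrative.
final class PersonHashOps {
    private final StringRedisTemplate redis;

    PersonHashOps(StringRedisTemplate redis) {
        this.redis = redis;
    }

    void saveName(long id, String name) {
        // "person:<id>" is the Redis key; "name" is a field inside that hash.
        redis.opsForHash().put("person:" + id, "name", name);
    }

    String readName(long id) {
        Object value = redis.opsForHash().get("person:" + id, "name");
        return value == null ? null : value.toString();
    }

    // RENAME fails when the source key is missing, so check first and still guard the call.
    boolean renameIfPresent(String from, String to) {
        if (!Boolean.TRUE.equals(redis.hasKey(from))) {
            return false;
        }
        try {
            redis.rename(from, to);
            return true;
        } catch (DataAccessException e) {
            // The key can expire between hasKey and rename; a real service would also log this.
            return false;
        }
    }
}
```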

⊙ Session — Spring Session

Same story as caching: WAS-local sessions force sticky sessions or session sync across the cluster. Backing Spring Session with Redis centralizes it. Config goes in YAML — and one trap I documented: adding @EnableRedisHttpSession makes the annotation's attributes win and your YAML session settings stop taking effect.

server: { servlet: { session: { timeout: 60 } } }   # seconds
spring:
  session:
    store-type: redis
    redis:
      namespace: spring:session:admin   # key prefix
      flush-mode: on_save                # on_save vs immediate
      cleanup-cron: 0/5 * * * * *        # sweep expired remnants

@Bean   // store sessions as JSON instead of JDK-serialized bytes that are unreadable in redis-cli
RedisSerializer<Object> springSessionDefaultRedisSerializer() {
    return new GenericJackson2JsonRedisSerializer();
}

flush-mode is the subtle one: on_save writes the session to Redis only when the request finishes and the response is about to be committed, while immediate writes as soon as the session changes (for example on setAttribute). The difference only shows up if you deliberately Thread.sleep before the response returns and inspect Redis in the meantime.

⊙ Distributed lock

To stop two processes/threads from touching a shared resource at once, I implemented a spin lock on Redis with setIfAbsent (atomic "set if not exists") plus a TTL so a crashed holder can't deadlock the resource; unlock is just a delete.

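public boolean lock(Long key) {
    // SET key "lock" NX PX 10000: atomic set-if-absent with a 10 s TTL
    Boolean acquired = redisTemplate.opsForValue()
        .setIfAbsent(String.valueOf(key), "lock", Duration.ofMillis(10_000L));
    return Boolean.TRUE.equals(acquired);   // a null reply counts as "not acquired"
}
public boolean unlock(Long key) {
    return Boolean.TRUE.equals(redisTemplate.delete(String.valueOf(key)));
}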

// usage — spin until acquired, always release in finally
while (!redisLockRepository.lock(id)) { Thread.sleep(300); }   // sleep cushions Redis load
try   { redisCRUDService.create(p1); }
finally { redisLockRepository.unlock(id); }
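One caveat with a plain delete: if the holder outlives the TTL, its late unlock can remove a lock that another caller has since acquired. A common remedy is to store a unique owner token and release only when the token still matches. The in-memory model below illustrates that idea under stated assumptions: the map stands in for Redis, the clock is passed in so expiry can be shown without sleeping, and all names are hypothetical. On a real Redis, the compare-and-delete has to run atomically, typically as a short Lua script.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// In-memory model of a lock key with a TTL and an owner token.
final class TokenLockModel {
    private static final class Entry {
        final String token;
        final long expiresAtMillis;

        Entry(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Models SET key token NX PX ttl: returns the owner token, or null while the key is held.
    synchronized String tryLock(String key, long ttlMillis, long nowMillis) {
        Entry current = store.get(key);
        if (current != null && current.expiresAtMillis > nowMillis) {
            return null;
        }
        String token = UUID.randomUUID().toString();
        store.put(key, new Entry(token, nowMillis + ttlMillis));
        return token;
    }

    // Compare-and-delete: release only if the stored token is still ours and has not expired.
    synchronized boolean unlock(String key, String token, long nowMillis) {
        Entry current = store.get(key);
        if (current == null || current.expiresAtMillis <= nowMillis) {
            store.remove(key);
            return false;
        }
        if (!current.token.equals(token)) {
            return false;
        }
        store.remove(key);
        return true;
    }
}
```

In the model, holder A's unlock after its TTL has expired returns false and leaves holder B's lock in place.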

The spin lock's weakness is the constant re-polling load on Redis; the Thread.sleep(300) is the deliberate buffer that softens it.
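A fixed sleep can be taken one step further with a bounded, backing-off spin. The following is a sketch rather than the manual's code: the helper takes any "try once" function, doubles the sleep up to a cap, adds jitter so competing callers do not poll in lockstep, and gives up after a deadline. All names and parameters are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

final class BackoffSpin {
    // Polls tryLock until it succeeds or maxWaitMillis has elapsed.
    static boolean acquire(BooleanSupplier tryLock, long firstSleepMillis,
                           long maxSleepMillis, long maxWaitMillis) {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        long sleep = firstSleepMillis;
        while (true) {
            if (tryLock.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;   // give up instead of spinning forever
            }
            long jitter = ThreadLocalRandom.current().nextLong(sleep / 2 + 1);
            try {
                Thread.sleep(Math.min(remaining, sleep + jitter));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt for the caller
                return false;
            }
            sleep = Math.min(maxSleepMillis, sleep * 2);
        }
    }
}
```

A call such as BackoffSpin.acquire(() -> redisLockRepository.lock(id), 50, 800, 5_000) would replace the while loop above; the caller then has to handle a false return instead of waiting indefinitely.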

What I took away

Redis is a memory-operations problem first. Most production risk traces back to memory — eviction policy, the copy-on-write spike on fork, and fragmentation — so I size instances to ≤ ½ host RAM and prefer many small ones.
Pick the mode for the need: Sentinel when you only want HA, Cluster when you also need to shard — and remember Cluster pushes slot-awareness onto the client.
On the app side, the right abstraction matters: repository / annotations for caching, Spring Session for session, a TTL-guarded setIfAbsent for locking — each with its own serializer and its own subtle config trap.

⚡

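One-line summary: Toward the end of the KT DS cloud-service work, I introduced Redis as the in-memory middleware layer. On the client (Spring Boot) side that meant caching, a session store and distributed locks; on the server side, an operation model (replication · sharding · Sentinel / Cluster). This page condenses the engineering manual I wrote so the team could adopt it.

Period: 2023 · KT DS, Cloud service
Role: Backend & middleware engineering · Spring–Redis integration
Scope: Server-side operation + client-side Spring Boot integration
Deliverable: Internal Redis manual (Redis 7.2), used as the team reference

Why Redis, and where it fits

Redis (REmote DIctionary Server) is an open-source, in-memory key–value NoSQL store. Keeping data in memory makes it very fast (roughly 100K operations per second), but for the same reason the data is volatile, so clustering and backup are mandatory in production. This speed-vs-volatility trade-off defines where Redis is useful: data that is accessed briefly and often, such as caching, sessions, message queues (Streams), Pub/Sub, and distributed locks. Rather than handling it directly with redis-cli, I integrated it at the application level through Spring Boot.

Two integration patterns

Look-aside (check the cache first, fall back to the database on a miss) vs Write-back (collect writes, then batch-insert): one big insert beats a thousand small ones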

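Server-side: operating Redis

Since the speed comes from living in RAM, operations ultimately come down to protecting that memory: bounding it, replicating it against failure, and splitting it across servers when one machine is not enough.

⊙ Memory management

Redis is fast because it reads and writes memory, but once physical memory fills up and the OS starts swapping, every access becomes disk I/O and throughput drops sharply. So I treat memory as the first operational metric:

Set maxmemory deliberately, according to the service's characteristics. Past the limit, Redis deletes data according to the eviction policy (below).
Keep each instance at ½ of server RAM or less, to leave room for the copy-on-write memory spike during replication (next section).
Memory is managed in pages/frames, so mixing values of very different sizes worsens fragmentation (more memory is occupied than the actual data). Storing similarly sized values together helps.
Monitor continuously with top plus the INFO metrics below.

maxmemory-policy	What it evicts when full
`noeviction`	Nothing; returns an OOM error on writes
`allkeys-lru`	LRU (least recently used) key, across all keys
`volatile-lru`	LRU key, among keys with a TTL
`volatile-ttl`	Shortest-TTL key first, among keys with a TTL
`allkeys-random`	A random key, across all keys
`volatile-random`	A random key, among keys with a TTL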

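⊙ Replication: high availability / disaster recovery

Redis is a process too, so it can die at any time. Replication maintains synchronized copies: an Origin (Master) plus Replicas (Slaves) form a Replica Set. Replication is asynchronous: the Origin fork()s a child process and sends data to the replicas without stopping service.

Full synchronization: fork a child → RDB snapshot → transfer to replicas → load, then replay buffered commands

Full sync: used on initial set-up or when the sync gap is large. fork → RDB snapshot → transfer → replica loads it. Commands that arrive in the meantime are buffered and then replayed.
Partial resync: for brief disconnects. Both sides track a replication ID (run_id before Redis 4.0) plus a replication offset, and the Origin resends only the missing range from its replication backlog. If the backlog overflows or one side restarts, it falls back to a full sync.
The COW gotcha: fork() shares the parent's pages copy-on-write. Writes during synchronization allocate new pages, so a 32 GB instance can momentarily occupy up to ~64 GB. This is why I prefer many small instances (8 GB × 8) over a few large ones (32 GB × 2), and cap each at ½ of host RAM or less.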

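⊙ Sharding: scaling out

Replication is about availability, not capacity. One instance has finite RAM, so when scaling up is not feasible you scale out by adding servers and distributing the data: horizontal partitioning, called sharding in the NoSQL world. (Unlike DBMS partitioning, which divides within one server, sharding divides at the server level.) The key question is how to split the keys:

Strategy	How	Trade-off
Range	Split by key range (e.g. dates)	Simple, but uneven load → hot shards
Modular	`hash(key) % N` servers	Even, but adding nodes triggers large-scale rebalancing; doubling N each time limits the move to about half the keys
Index	A central index server places/locates data	Well balanced, but the index is an extra failure point
Consistent hashing	Hash both keys and servers onto the same ring	Only the affected range rebalances on add/remove

Consistent hashing localizes rebalancing: if S2 leaves, only S2's keys move and the rest stay put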

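⊙ Operation modes: Sentinel & Cluster

Instead of wiring replication and failover by hand, Redis provides two modes that bundle these mechanisms into operational units.

Sentinel adds automatic failover to replication; Cluster adds sharding on top through a 16,384-slot hash space

Sentinel (HA): Sentinel processes run next to each Redis and watch the nodes; on master failure they promote a replica through a quorum vote. They can also alert via Pub/Sub or shell scripts. At least 3 are needed for a majority vote. A recovered master rejoins as a replica.
Cluster (HA + sharding): slots 0–16383 are divided among the primaries, and each incoming key is hashed with CRC16 and then % 16384 to pick its slot and primary. If a key reaches the wrong node, the node replies -MOVED and the client resends to the right one. Secondaries are promoted automatically on failure (a different mechanism from Sentinel). Cost: slot-aware logic in the client and slot-management overhead.

Mode	High availability	Sharding
Stand-alone	—	—
Sentinel	✓ (auto-failover)	—
Cluster	✓	✓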

⊙ Monitoring

Redis executes commands on a single thread, so for CPU you raise per-core performance rather than core count. The key metrics from INFO:

# Memory
used_memory_human:916.98K        # what Redis uses
used_memory_rss_human:6.63M      # RSS: physical memory, OS view
maxmemory_human:0B
maxmemory_policy:noeviction
# CPU  (single thread → core speed, not core count)
used_cpu_sys:853.71
used_cpu_user:923.68
# Replication
role:master
connected_slaves:0
master_repl_offset:0

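Client-side: Spring Boot integration

On the application side, Redis becomes a client-interface problem. Adding spring-boot-starter-data-redis is enough for the connection to be auto-configured (Lettuce by default, Jedis as the built-in alternative; Redisson is a common third-party client), and a small YAML block sets host / port / password and the Lettuce pool. On top of that I built three layers: caching, session, and distributed lock.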

dependencies {
  implementation 'org.springframework.boot:spring-boot-starter-data-redis'
  implementation 'org.springframework.session:spring-session-data-redis'
  implementation 'org.springframework.boot:spring-boot-starter-cache'
}

spring:
  redis:                                  # Spring Boot 2.x prefix; Boot 3.x uses spring.data.redis
    host: <redis-server-ip>
    port: <redis-server-port>
    password: <redis-server-password>
    lettuce:
      pool: { max-active: 5, max-idle: 5, min-idle: 2 }   # pooling needs commons-pool2 on the classpath

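⊙ Caching: why, and three ways

If each WAS caches in its own JVM heap, the instances have to synchronize their caches with one another. Put the cache in Redis and that cost disappears: every instance looks at one place.

Moving the cache from per-WAS heaps into Redis removes the cross-instance synchronization cost

1 · Spring Data Repository: a CrudRepository over a @RedisHash domain object. Objects map to Redis hashes, @Id/@Indexed become lookup keys, and @TimeToLive sets expiry. It feels almost identical to JPA.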

@RedisHash(value = "people", timeToLive = 1800)
public class Person implements Serializable {
    @Id private Long id;
    @Indexed private String name;
    private Integer age;
}

public interface PersonRedisRepository extends CrudRepository<Person, Long> {
    Optional<Person> findPersonByName(String name);   // save / findById / deleteById come for free
}

2 · Caching annotations: wrap a service method in a caching proxy, with @Cacheable on reads, @CachePut on updates, and @CacheEvict on deletes. You have to register a serializer and a default TTL on the CacheManager (otherwise redis-cli shows only serialized bytes). Note: an updating method must return the value so the proxy can cache it.

@Cacheable(key = "#name", value = "personCache")
public Person findPersonByName(String name) { return repo.getByName(name); }

@CachePut(key = "#person.name", value = "personCache")
public Person update(Person person) { return repo.updateByName(person); }   // must return!

@CacheEvict(key = "#name", value = "personCache")
public void delete(String name) { repo.deleteByName(name); }
// → key in redis: "personCache::kim"

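3 · RedisTemplate: the low-level operator for handling a specific data structure directly: opsForValue / opsForList / opsForSet / opsForZSet / opsForHash, plus expire, keys, rename, delete. The pitfalls I stressed to the team: even with List/Hash/Set, every access goes through the Redis key, so do not confuse a hash's inner field with the Redis key. Also, rename readily fails with RedisCommandExecutionException: ERR no such key, so error handling needs to be solid.

⊙ Session: Spring Session

Same story as caching. WAS-local sessions force sticky sessions or session synchronization across the cluster. Backing Spring Session with Redis centralizes them. Configuration goes in YAML, and one trap I documented: adding @EnableRedisHttpSession makes the annotation's attributes take precedence, and the YAML session settings stop taking effect.

server: { servlet: { session: { timeout: 60 } } }   # seconds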
spring:
  session:
    store-type: redis
    redis:
      namespace: spring:session:admin   # key prefix
      flush-mode: on_save                # on_save vs immediate
      cleanup-cron: 0/5 * * * * *        # how often expired remnants are swept

@Bean   // store sessions as JSON instead of unreadable serialized bytes
RedisSerializer<Object> springSessionDefaultRedisSerializer() {
    return new GenericJackson2JsonRedisSerializer();
}

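flush-mode is subtle: on_save stores the session when the response is being produced, while immediate stores it as soon as the session changes (for example on setAttribute). You can only catch the difference by testing with a Thread.sleep before the response returns.

⊙ Distributed lock

To keep two processes/threads from touching a shared resource at the same time, I implemented a spin lock on Redis: setIfAbsent (an atomic "set if not exists") plus a TTL, so that a holder crashing mid-hold cannot deadlock the resource. unlock is simply a delete.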

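public boolean lock(Long key) {
    // SET key "lock" NX PX 10000: atomic set-if-absent with a 10 s TTL
    Boolean acquired = redisTemplate.opsForValue()
        .setIfAbsent(String.valueOf(key), "lock", Duration.ofMillis(10_000L));
    return Boolean.TRUE.equals(acquired);   // a null reply counts as "not acquired"
}
public boolean unlock(Long key) {
    return Boolean.TRUE.equals(redisTemplate.delete(String.valueOf(key)));
}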

// usage: spin until acquired, always release in finally
while (!redisLockRepository.lock(id)) { Thread.sleep(300); }   // sleep cushions the load on Redis
try   { redisCRUDService.create(p1); }
finally { redisLockRepository.unlock(id); }

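The spin lock's weakness is the load from constantly re-requesting Redis. The Thread.sleep(300) is the deliberate buffer that softens it.

What I learned

Redis is a memory-operations problem first. Most production risk converges on memory (eviction policy, the copy-on-write spike on fork, fragmentation), so I size instances at ½ of host RAM or less and prefer many small ones.
Pick the mode that fits the need: Sentinel when you only need HA, Cluster when you also need sharding. Keep in mind that Cluster pushes slot-aware logic onto the client.
On the app side, the right abstraction is what matters: repository / annotations for caching, Spring Session for sessions, a TTL-guarded setIfAbsent for locking. Each comes with its own serializer and its own subtle configuration trap.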