Reinforced Security Agent (AI Agent)

One line — A self-reinforcing security agent (LLM + ReAct) that replaces rule-of-thumb WAF: it analyzes request context, decides whether it’s an attack, blocks it, and autonomously updates its own defense policy — a closed loop of detection → blocking → policy update → re-detection.

Period: April 1, 2025 – July 10, 2025
Role: System planning, design, and full implementation
Org: SK Inc. AX, Manufacturing Development Platform Team
Type: In-house R&D PoC (Proof of Concept)

Background & planning

Recent hacking incidents at organizations such as SKT and the Korea Employment Information Service made cybersecurity a major social issue. In our team’s AWS Cloud environment, security checks ran on a WAF with a rule-of-thumb approach, which was weak against newly emerging attack patterns. So I started an R&D PoC for a self-reinforcing security agent that reads the context of incoming requests, judges potential attacks, and autonomously strengthens defense policies.

Figure 01. Reinforced Security Agent replaces WAF — Gateway, Monitoring, and LLM (ReAct) form a closed self-reinforcing loop

Impact

Real-time anomaly detection — adapts to unpredictable new threats while improving response time and resource efficiency.
Self-reinforcing security loop — detection → blocking → policy update → re-detection.
Automated policy updates — threat assessments are immediately reflected in Gateway policies.
Integrated admin notifications + auto-blocking — better operational efficiency and reliability.

Tech stack

SecurityAgentRAGReActLog PythonFastAPILangGraphLangChain Qdrant (VectorDB)

Key roles & achievements

Designed and implemented a fully automated end-to-end security system (Feedback System) by integrating Secure Monitoring and Secure Gateway agents into an LLM-based ReAct framework — covering log-based detection, real-time blocking and policy reinforcement.

⊙ Monitoring–Gateway integration architecture

Designed a self-reinforcing loop: initial blocking → log collection → anomaly detection → automated response.
Implemented a bi-directional feedback mechanism between the Gateway and Monitoring agents.
Developed a ReAct-based reasoning & policy-reinforcement flow powered by LLMs.

⊙ LLM-based threat detection & response

Designed LLM prompts & tools for log summarization and threat assessment.
Enabled rapid response via Qdrant vector-DB caching.
Built suspicious-request classification (IP, URL, User-Agent) with few-shot techniques.
Implemented automated admin notifications and Gateway policy auto-update.

Troubleshooting 1 — Agent memory (token-based message management)

Unlike typical chat systems that continuously exchange messages with a user, here the server delivers only the initial System and Human messages at startup, then runs in a repeated loop. As messages accumulate, the initial System/Human context can eventually drop out. To prevent this, messages are segmented into 10,000-token units, and every new message must explicitly include both the System and Human messages — keeping context consistent over long-term operation.

Figure 02. To prevent early-context loss in long-running operation, messages are split into 10,000-token chunks and every chunk explicitly includes the System/Human messages

Troubleshooting 2 — Log pipeline (long context to the LLM)

Sending the entire raw log to the LLM risks exceeding token limits; conversely, dumping raw logs into a Vector DB and querying them loses the flow/continuity of the logs. Vector storage also explodes — embedding just 13.3 MB of raw log with text-embedding-3-large expands to 914.2 MB.

Embedding model	Location	Dim	Raw (log)	Vectorized	Time
text-embedding-3-large	External	3072	13.3 MB	914.2 MB	41m 30.6s
text-embedding-3-small	External	1536	13.3 MB	754.3 MB	20m 44.0s
all-MiniLM-L6-v2	Local	384	13.3 MB	625.6 MB	22m 53.0s

[Table] Storage expansion by embedding model

Figure 03. Message Broker-based pipeline — the agent pulls only as much log data as needed via Tool APIs and passes it to the LLM

Solution: deliver logs through a Message Broker–based pipeline. The agent polls data from the broker only as needed; polling is implemented as a Tool API; the agent invokes the tool and autonomously decides how many entries to request — ensuring efficient context handling.

Troubleshooting 3 — Limited reasoning (Inference Tool)

A ReAct agent mostly focuses on tool invocation with limited autonomous reasoning. I added an Inference Tool (think_aloud) so the agent can reason independently at intermediate steps. Tools can also conflict (Tool 1 says OK, Tool 2 says Not OK; only the last result may be applied) — resolved via prompt engineering so the agent weighs multiple tool outputs appropriately.

Troubleshooting 4 — Reducing external-API latency

Local embedding — a SentenceTransformer-based embedding layer is deployed locally, minimizing reliance on external API calls.
Optimized initial judgment — if the LLM can make a first-level decision immediately, the system handles it directly (fast path) without invoking the full ReAct graph; only ambiguous cases go to the ReAct tools.

한 줄 요약 — Rule-of-Thumb WAF 를 대체하는 자기강화 보안 에이전트(LLM + ReAct). 요청의 맥락을 분석해 공격 여부를 판단하고 차단한 뒤, 스스로 방어 정책을 갱신 한다 — 탐지 → 차단 → 정책 업데이트 → 재탐지의 닫힌 루프.

기간: 2025년 4월 1일 – 7월 10일
역할: 시스템 기획, 설계, 전체 구현
소속: SK Inc. AX, Manufacturing Development Platform Team
성격: 사내 R&D PoC (개인 단위 연구과제)

배경 & 기획

최근 SKT, 한국고용정보원 등의 해킹 사고로 사이버 보안이 큰 사회적 이슈가 되었다. 우리 팀 AWS Cloud 환경에서는 Rule-of-Thumb 방식의 WAF 로 보안 점검을 했는데, 새롭게 등장하는 공격 패턴에 취약했다. 이 한계를 해결하고자, 들어오는 요청의 맥락을 분석해 공격을 판단하고 스스로 방어 정책을 강화하는 자기강화 보안 에이전트 R&D 과제를 시작했다.

그림 01. WAF 를 대체하는 Reinforced Security Agent — Gateway, Monitoring, LLM(ReAct)이 닫힌 자기강화 루프를 이룬다

성과

실시간 이상 탐지 — 예측 불가능한 신규 위협에 적응하면서 응답 시간과 자원 효율 개선.
자기강화 보안 루프 — 탐지 → 차단 → 정책 업데이트 → 재탐지.
자동 정책 업데이트 — 위협 판단을 즉시 Gateway 정책에 반영.
관리자 알림 + 자동 차단 통합 — 운영 효율과 보안 신뢰성 향상.

기술 스택

SecurityAgentRAGReActLog PythonFastAPILangGraphLangChain Qdrant (VectorDB)

핵심 역할 & 성과

Secure Monitoring 과 Secure Gateway 에이전트를 LLM 기반 ReAct 프레임워크로 통합하여, 로그 기반 탐지부터 실시간 차단과 정책 강화까지 아우르는 완전 자동화 end-to-end 보안 시스템(Feedback System) 을 설계하고 구현했다.

⊙ Monitoring–Gateway 통합 아키텍처

자기강화 루프 설계: 초기 차단 → 로그 수집 → 이상 탐지 → 자동 대응.
Gateway–Monitoring 에이전트 간 양방향 피드백 메커니즘 구현.
LLM 기반 ReAct 추론과 정책 강화 흐름 개발.

⊙ LLM 기반 위협 탐지 및 대응

로그 요약과 위협 판단을 위한 LLM 프롬프트와 툴 설계.
Qdrant 벡터 DB 캐싱 으로 빠른 대응.
의심 요청 분류(IP, URL, User-Agent)를 few-shot 로 구축.
관리자 자동 알림 + Gateway 정책 자동 업데이트 구현.

트러블슈팅 1 — 에이전트 메모리(토큰 기반 메시지 관리)

사용자와 계속 메시지를 주고받는 일반 대화 시스템과 달리, 여기서는 서버가 시작 시 System/Human 메시지만 에이전트에 전달한 뒤 반복 루프로 동작한다. 이 과정에서 메시지가 쌓이면 초기 System/Human 컨텍스트가 결국 사라질 수 있다. 이를 막기 위해 메시지를 10,000-token 단위 로 분할하고, 모든 새 메시지가 System/Human 메시지를 명시적으로 포함하도록 하여 장기 운영 내내 컨텍스트를 일관되게 유지했다.

그림 02. 장기 운영 시 초기 컨텍스트 소실을 막기 위해 10,000-token 단위로 분할하고 매 메시지에 System/Human 을 포함

트러블슈팅 2 — 로그 파이프라인(LLM 에 긴 컨텍스트 전달)

전체 raw 로그를 그대로 LLM 에 보내면 토큰 한도를 넘길 위험이 있고, 반대로 raw 로그를 Vector DB 에 넣고 쿼리하면 로그의 흐름과 연속성 을 놓쳐 맥락이 끊긴다. 게다가 벡터 저장소 용량이 폭증한다 — text-embedding-3-large 로 13.3 MB 의 raw 로그를 임베딩하면 Vector DB 에서 914.2 MB 로 늘어난다.

Embedding model	위치	차원	Raw (log)	Vectorized	시간
text-embedding-3-large	External	3072	13.3 MB	914.2 MB	41m 30.6s
text-embedding-3-small	External	1536	13.3 MB	754.3 MB	20m 44.0s
all-MiniLM-L6-v2	Local	384	13.3 MB	625.6 MB	22m 53.0s

[표] 임베딩 모델별 저장 용량 팽창

그림 03. Message Broker 기반 파이프라인 — agent 가 Tool API 로 필요한 만큼만 로그를 가져와 LLM 에 전달

해결: 로그를 Message Broker 기반 파이프라인 으로 전달한다. 에이전트가 broker 에서 필요한 만큼만 polling 하고, polling 은 Tool API 로 구현했으며, 에이전트가 tool 을 호출하면서 요청 건수를 스스로 결정 한다 — 효율적인 컨텍스트 처리가 가능해졌다.

트러블슈팅 3 — 제한된 추론(Inference Tool)

ReAct 에이전트는 주로 tool 호출 에 초점이 맞춰져 자율적 추론 능력이 제한적이다. 중간 단계의 추론을 보강하기 위해 Inference Tool(think_aloud) 을 추가하여 에이전트가 독립적으로 사고할 수 있게 했다. 또한 tool 간 충돌(예: Tool 1 은 OK, Tool 2 는 Not OK 인데 마지막 tool 결과만 적용)이 생길 수 있는데, 이는 프롬프트 엔지니어링 으로 여러 tool 결과를 적절히 고려하도록 해결했다.

트러블슈팅 4 — 외부 API 지연 감소

로컬 임베딩 — SentenceTransformer 기반 임베딩 레이어를 로컬에 두어 외부 API 호출 의존을 최소화.
초기 판단 최적화 — LLM 이 1차 판단을 즉시 내릴 수 있으면 ReAct Graph 의 tool 을 거치지 않고 빠르게 직접 처리; 모호한 경우만 ReAct tool 로 넘긴다.