AI Readable Data Platform (ARDP)

In one line: AI Readable Data Platform (ARDP) turns a company's databases and documents into AI-readable data, in the form of an ontology and Virtual Knowledge Graph (VKG), and answers natural-language questions over it. As application architect and backend engineer, I designed the cross-module data flow and built the structured-build, deploy, and inference components. (Ongoing; currently applied to the in-house HR system.)

Duration: March 2026 – Present (ongoing)
Role: Application Architect & Backend Engineer
Org: SK Inc. AX, AI Innovation Lab
Client: In-house systems first, then external expansion
Tech: FastAPI, PostgreSQL, Ontop (VKG), Trino, Neo4j, SPARQL, LangChain / LangGraph, Azure OpenAI, AWS, Docker / Kubernetes, React, Claude Code (AI pair programming)
Scope: 5 of 6 modules (data_management, structured, inference, deploy-hooks, frontend); the unstructured module is outside my scope. The frontend is being developed with Claude Code (AI-assisted coding).

Overview

ARDP converts structured (RDB/CSV) and unstructured (policy/SI documents) data into an ontology so an LLM can query it reliably. Structured DBs are exposed as a Virtual Knowledge Graph via Ontop (no data duplication) over a Trino federation of five heterogeneous DBs. Users ask questions in plain language, and the platform translates them to SPARQL. The system is built as microservices around a single source of truth and is currently applied to the in-house HR system (HR / SI workforce domain).

Figure 01. Single source of truth feeds the build pipelines; deploy-hooks publish artifacts & bring up Ontop/Trino; inference answers questions over the graph.
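
To make the query path concrete, here is a minimal sketch of how a client could send SPARQL to a VKG endpoint and read the answer. It assumes the endpoint follows the standard SPARQL 1.1 Protocol and returns SPARQL JSON results. The endpoint URL, the hr: vocabulary, and the department name are hypothetical placeholders, not values from the project.

```python
import json
import urllib.request

# Hypothetical endpoint and vocabulary; neither comes from the project.
ENDPOINT = "http://localhost:8080/sparql"
QUERY = """
PREFIX hr: <http://example.org/hr#>
SELECT ?name WHERE {
  ?emp a hr:Employee ;
       hr:name ?name ;
       hr:worksIn ?dept .
  ?dept hr:deptName "Platform Engineering" .
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol request (query sent directly in the POST body)."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one plain dict per solution."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Against a live endpoint this would be:
#   with urllib.request.urlopen(build_request(ENDPOINT, QUERY)) as resp:
#       print(rows(resp.read().decode("utf-8")))
```

The request is built separately from the network call so the sketch can be read and tested without a running endpoint.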

My role & key contributions

① Application architecture: the data flow between modules. As application architect, I designed the end-to-end pipeline: how data_management ingests data and serves it as the single source of truth, how the structured service completes an ontology build, and how deploy-hooks is then triggered to publish the artifacts and bring up Ontop / Trino so the new graph is immediately queryable.

Figure 02. When the structured build finishes, deploy-hooks automatically publishes the artifacts and starts/refreshes Ontop/Trino — I designed this trigger & flow.
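
The trigger in Figure 02 can be sketched as a small event handler. This is an illustration under my own assumptions, not the project's code: the event shape, the class and method names, and the artifact file names are hypothetical, and the idempotency check (ignoring an event for a build that was already deployed) is a design choice I added for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BuildFinished:
    """Event emitted when a structured ontology build completes (hypothetical shape)."""
    build_id: str
    artifacts: dict[str, str]  # artifact name -> content


@dataclass
class DeployHooks:
    publish: Callable[[str, str], None]     # stores one artifact
    refresh_serving: Callable[[str], None]  # starts or reloads Ontop/Trino for a build
    deployed: list[str] = field(default_factory=list)

    def on_build_finished(self, event: BuildFinished) -> bool:
        """Publish every artifact, then refresh serving. Returns False for a repeat event."""
        if event.build_id in self.deployed:
            return False  # assumption: a re-delivered event must not redeploy
        for name, content in event.artifacts.items():
            self.publish(name, content)
        # Serving is refreshed only after all artifacts are in place.
        self.refresh_serving(event.build_id)
        self.deployed.append(event.build_id)
        return True


# In-memory stand-ins for the artifact store and the serving layer.
artifact_store: dict[str, str] = {}
refreshed_builds: list[str] = []
hooks = DeployHooks(publish=artifact_store.__setitem__,
                    refresh_serving=refreshed_builds.append)
hooks.on_build_finished(
    BuildFinished("build-001", {"mapping.obda": "...", "ontology.ttl": "..."})
)
```

Passing the publish and refresh steps in as callables keeps the ordering logic separate from the real storage and container operations, which would replace the in-memory stand-ins.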

② Ontop + Trino structure. Designed the serving layer: Trino federates five heterogeneous DBs (PostgreSQL, MySQL, MSSQL, Oracle, MongoDB) into one SQL surface, and Ontop exposes that surface as a SPARQL-queryable VKG.
③ AWS infrastructure architecture. Defined the cloud architecture in coordination with the infrastructure team.
④ RDB → graph modeling methodology. Developed a PK/FK-based table-clustering method that decides how relational tables map to a property graph. It classifies each table (Entity / Junction / …) from its primary keys, foreign keys, and cardinality, combining rule-based classification with LLM correction.
⑤ Semantic inference (catalog-grounded NL → SPARQL). Built the inference modes. Graph-Explorer and Graph-Hop are my own designs, and Graph-Explorer in particular performs well.
⑥ HR-domain golden set. Built, tested, and tuned the golden answer set that the platform is evaluated and improved against.
⑦ Unified datasource management. Built a structure that connects and manages heterogeneous datasources dynamically, without service restarts, and led inter-service API integration and infrastructure setup.
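
As an illustration of the rule-based half of item ④, the sketch below classifies tables from their keys alone. It is not the project's method: the single rule used here (a composite primary key made up entirely of foreign-key columns marks a Junction, everything else is an Entity) is a common textbook heuristic I am assuming, the table definitions are invented, and the LLM correction step is left out.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    primary_key: set[str]
    foreign_keys: dict[str, str]  # column -> referenced table


def classify(table: Table) -> str:
    """Return "Junction" or "Entity" using only PK/FK structure (assumed heuristic)."""
    fk_columns = set(table.foreign_keys)
    is_composite = len(table.primary_key) >= 2
    # A junction table's key consists only of references to other rows.
    if is_composite and table.primary_key <= fk_columns:
        return "Junction"
    return "Entity"


# Invented HR-style schema for the example.
tables = [
    Table("department", {"dept_id"}, {}),
    Table("employee", {"emp_id"}, {"dept_id": "department"}),
    Table("project", {"project_id"}, {}),
    Table("assignment", {"emp_id", "project_id"},
          {"emp_id": "employee", "project_id": "project"}),
]
for t in tables:
    print(f"{t.name}: {classify(t)}")
```

In a property graph, a table classified this way as an Entity would typically become a node label and a Junction an edge between the two referenced nodes. A real classifier would need more categories and cardinality checks, which is where the "…" in the list above and the LLM correction come in.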

Inference mode	How	Best for
One-Shot	NL → 1 SPARQL → answer (schema + value catalog grounding)	simple lookups
N-Shot	decompose into sub-questions → multiple SPARQL → synthesize	complex / multi-condition
Graph-Explorer ★	ReAct-style LLM agent (LangGraph) explores the knowledge graph and queries it directly	relationship-following questions; strong accuracy
Graph-Hop	anchor + hop-expansion over very large schemas	large schemas that don't fit one prompt
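
One way to read "anchor + hop-expansion" in the Graph-Hop row is as a bounded breadth-first search over the schema graph: start from the classes the question mentions, then pull in only the classes within a few hops, so the prompt carries a sub-schema instead of the whole ontology. The sketch below shows that reading. The BFS formulation, the function name, and the example schema are my assumptions for illustration, not the platform's implementation.

```python
from collections import deque


def hop_expand(schema: dict[str, set[str]], anchors: set[str], max_hops: int) -> set[str]:
    """Return every class reachable from any anchor within max_hops edges."""
    seen = set(anchors)
    queue = deque((anchor, 0) for anchor in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop budget
        for neighbour in schema.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


# Invented schema graph: class -> directly related classes (undirected).
schema = {
    "Employee": {"Department", "Assignment"},
    "Department": {"Employee", "Site"},
    "Assignment": {"Employee", "Project"},
    "Project": {"Assignment", "Client"},
    "Site": {"Department"},
    "Client": {"Project"},
}

print(sorted(hop_expand(schema, {"Employee"}, 1)))
print(sorted(hop_expand(schema, {"Employee"}, 2)))
```

With one hop from Employee the sub-schema holds three classes; with two hops it grows to five and still leaves out Client, which is three hops away. The hop budget is the knob that trades prompt size against coverage.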

Why a Virtual Knowledge Graph? With Text-to-SQL, the LLM must read raw, cryptic DB schemas. With RAG, data must be pre-embedded and cannot reflect live changes. A VKG instead puts a semantic layer and a value catalog between the DB and the LLM: the LLM writes SPARQL against clean business concepts, and Ontop translates it to SQL that runs on the live, federated database.
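
The role of the value catalog can be shown with a small grounding sketch: before a literal goes into SPARQL, the user's wording is matched against values that actually exist for that property. Everything here is assumed for illustration, including the catalog contents, the hr: vocabulary, and the use of fuzzy string matching from Python's difflib. The source does not say how the platform matches values.

```python
import difflib
from typing import Optional

# Hypothetical value catalog: property -> distinct values found in the source DBs.
CATALOG = {
    "hr:deptName": ["Platform Engineering", "People Operations", "Finance"],
    "hr:jobTitle": ["Backend Engineer", "Data Scientist"],
}


def ground(term: str, prop: str) -> Optional[str]:
    """Map the user's wording to a catalog value for prop, or None if nothing is close."""
    by_lower = {value.lower(): value for value in CATALOG.get(prop, [])}
    match = difflib.get_close_matches(term.lower(), list(by_lower), n=1, cutoff=0.6)
    return by_lower[match[0]] if match else None


def department_members_query(dept_term: str) -> str:
    """Build a SPARQL query whose literal is a grounded catalog value."""
    dept = ground(dept_term, "hr:deptName")
    if dept is None:
        raise ValueError(f"no catalog value close to {dept_term!r}")
    literal = dept.replace("\\", "\\\\").replace('"', '\\"')  # escape for a SPARQL string
    return (
        "PREFIX hr: <http://example.org/hr#>\n"
        "SELECT ?name WHERE {\n"
        "  ?emp a hr:Employee ; hr:name ?name ; hr:worksIn ?dept .\n"
        f'  ?dept hr:deptName "{literal}" .\n'
        "}"
    )


# The misspelled input is resolved to the stored value before query generation.
print(department_members_query("platform enginering"))
```

Because the literal comes from the catalog rather than from the user's text, a misspelling or casing difference does not silently produce an empty result, and an unknown value fails early instead of reaching the database.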

Results

Unified cross-system data into a knowledge graph, eliminating data silos, and delivered natural-language graph querying (Graph-Explorer).
Passed 116 of 117 real-world validation cases (99%) — earning a "ready to open" assessment from the business-side PM.
Confirmed for adoption as a chat-style service on the HR-information and project systems, with phased expansion to company-wide systems.

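This is an internal project (AI Readable Data Platform) carried out while employed at SK Inc. AX. For security reasons, internal data, credentials, and customer information are omitted; the write-up focuses on the architecture and my role.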