
AI Readable Data Platform (ARDP)

🧠

One line
AI Readable Data Platform (ARDP) turns a company's databases and documents into data an AI can read, in the form of an ontology / Virtual Knowledge Graph, and answers natural-language questions over it. As application architect and backend engineer, I designed the cross-module data flow and built the structured-build, deploy, and inference pieces. (Ongoing; applied to the AI-SCM domain.)

Duration
2026 – Present (ongoing)
Role
Application Architect & Backend Engineer
Org
SK Inc. AX · AI Innovation Lab
Tech
FastAPI · PostgreSQL · Ontop (VKG) · Trino · Neo4j · SPARQL · LangChain / LangGraph · Azure OpenAI · AWS · Docker / Kubernetes · React
Scope
5 of 6 modules — data_management · structured · inference · deploy-hooks · frontend (not unstructured)

Overview

ARDP converts structured (RDB/CSV) and unstructured (policy/SI documents) data into an ontology so an LLM can query it reliably. Structured DBs are exposed as a Virtual Knowledge Graph via Ontop (no data duplication) over a Trino federation of 5 heterogeneous DBs. Users ask questions in plain language, and each question is translated to SPARQL. The platform is built as microservices around a single source of truth, and is applied to the AI-SCM (HR / SI workforce) domain.

[Figure: Platform architecture (microservices). Components: Frontend; data_management (SoT · datasources · files · artifacts); structured (ontology build); unstructured (docs); deploy-hooks (artifacts → restart); Ontop + Trino (VKG · 5-DB federation); inference (NL → SPARQL / Cypher, semantic Q&A over the knowledge graph).]
Single source of truth feeds the build pipelines; deploy-hooks publish artifacts & bring up Ontop/Trino; inference answers questions over the graph.

My role & key contributions

① Application architecture — the data flow between modules. I designed the end-to-end pipeline as application architect: how data_management ingests and serves data as the single source of truth, how the structured service finishes an ontology build, and how deploy-hooks then fires to publish the artifacts and bring up Ontop / Trino so the new graph is immediately queryable.

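[Figure: Build → deploy → serve (the flow I designed). data_management (SoT, serves data) → structured (ontology build completes) → deploy-hooks (deploys artifacts) → Ontop / Trino (start / refresh) → queryable.]
When the structured build finishes, deploy-hooks automatically deploys the artifacts and starts or refreshes Ontop/Trino. I designed this trigger and flow.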
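The build-completion trigger described above can be sketched roughly as follows. This is a minimal illustration, not the platform's code: the class names (`BuildResult`, `DeployHooks`, `StructuredService`), the artifact file names, and the Trino-before-Ontop refresh order are all assumptions, and the hook only records steps instead of calling real services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BuildResult:
    """Hypothetical record emitted when a structured ontology build finishes."""
    build_id: str
    artifacts: List[str]


@dataclass
class DeployHooks:
    """Stand-in for deploy-hooks: records each step instead of touching real services."""
    log: List[str] = field(default_factory=list)

    def on_build_finished(self, result: BuildResult) -> None:
        # 1) Publish every artifact produced by the build.
        for name in result.artifacts:
            self.log.append(f"publish:{name}")
        # 2) Bring the serving layer up to date (this order is an assumption).
        self.log.append("refresh:trino")
        self.log.append("refresh:ontop")
        # 3) The new graph can now be queried.
        self.log.append(f"queryable:{result.build_id}")


class StructuredService:
    """Stand-in for the structured module: builds, then notifies subscribers."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[BuildResult], None]] = []

    def subscribe(self, listener: Callable[[BuildResult], None]) -> None:
        self._listeners.append(listener)

    def build(self, build_id: str) -> BuildResult:
        # Artifact names are illustrative placeholders.
        result = BuildResult(
            build_id,
            [f"{build_id}/ontology.ttl", f"{build_id}/mapping.obda"],
        )
        for listener in self._listeners:
            listener(result)  # build completion is the trigger
        return result


hooks = DeployHooks()
service = StructuredService()
service.subscribe(hooks.on_build_finished)
service.build("build-001")
print(hooks.log)
```

Keeping the trigger as a subscription means the structured service does not need to know what deployment involves; it only announces that a build has finished.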
  • ② Ontop + Trino structure. Designed the serving layer — Trino federates 5 heterogeneous DBs (PostgreSQL · MySQL · MSSQL · Oracle · MongoDB) into one SQL surface, and Ontop exposes it as a SPARQL-queryable VKG.
  • ③ AWS infrastructure architecture — defined the cloud architecture in coordination with the infrastructure team.
  • ④ RDB → graph modeling methodology. Developed the PK/FK-based table-clustering method that decides how relational tables map to a property graph — classifying each table (Entity / Junction / …) from its primary/foreign keys & cardinality, combining rule-based classification with LLM correction.
  • ⑤ Semantic inference (catalog-grounded NL → SPARQL). Built the inference modes; Graph-Explorer and Graph-Hop are my own ideas — and Graph-Explorer performs notably well.
  • ⑥ AI-SCM golden set — building, testing, and tuning the evaluation/golden answer set that the platform is measured and improved against.
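The rule-based half of the PK/FK classification in ④ might look something like the sketch below. It is only an illustration under stated assumptions: the `Table` structure, the specific rule (a primary key made entirely of foreign keys to two or more tables marks a junction), and the `Unclassified` fallback are hypothetical, and the LLM correction step is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Table:
    """Hypothetical, simplified view of a relational table's keys."""
    name: str
    primary_key: Set[str]
    # Maps a foreign-key column to the table it references.
    foreign_keys: Dict[str, str] = field(default_factory=dict)


def classify(table: Table) -> str:
    """Classify a table from its primary and foreign keys (assumed rules)."""
    if not table.primary_key:
        # Fallback label invented for this sketch.
        return "Unclassified"
    fk_columns = set(table.foreign_keys)
    if table.primary_key <= fk_columns:
        # Every PK column is also an FK: count the distinct referenced tables.
        targets = {table.foreign_keys[col] for col in table.primary_key}
        if len(targets) >= 2:
            return "Junction"  # would map to a relationship in a property graph
    return "Entity"  # would map to a node in a property graph


employee = Table("employee", {"emp_id"}, {"dept_id": "department"})
assignment = Table(
    "assignment",
    {"emp_id", "project_id"},
    {"emp_id": "employee", "project_id": "project"},
)

for t in (employee, assignment):
    print(t.name, "->", classify(t))
```

In this sketch a junction table becomes an edge between the two tables it references, while an entity table becomes a node type; cardinality checks and the other table classes are left out.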
Inference mode | How | Best for
One-Shot | NL → 1 SPARQL → answer (schema + value catalog grounding) | simple lookups
N-Shot | decompose into sub-questions → multiple SPARQL → synthesize | complex / multi-condition
Graph-Explorer | LLM agent (LangGraph) explores the KnowledgeGraph and queries directly | relationship-following · strong accuracy
Graph-Hop | anchor + hop-expansion over very large schemas | large schemas that don't fit one prompt
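One way to read the Graph-Hop row (anchor + hop-expansion) is as a bounded breadth-first search over the schema. The sketch below is a guess at the general idea, not the actual implementation: the keyword-based anchor matching, the adjacency-dict schema, and the class names are all assumptions made for illustration.

```python
import re
from collections import deque
from typing import Dict, List, Set

# Hypothetical schema graph: class name -> directly related classes.
SCHEMA: Dict[str, List[str]] = {
    "Employee": ["Department", "Assignment"],
    "Department": ["Employee"],
    "Assignment": ["Employee", "Project"],
    "Project": ["Assignment", "Client"],
    "Client": ["Project", "Invoice"],
    "Invoice": ["Client"],
}


def find_anchors(question: str, schema: Dict[str, List[str]]) -> List[str]:
    """Pick classes whose name appears as a word in the question (naive matching)."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return sorted(name for name in schema if name.lower() in words)


def expand(schema: Dict[str, List[str]], anchors: List[str], max_hops: int) -> Set[str]:
    """Collect every class within max_hops of any anchor (breadth-first)."""
    seen: Set[str] = set(anchors)
    queue = deque((anchor, 0) for anchor in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbour in schema.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


question = "Which project is each employee assigned to?"
anchors = find_anchors(question, SCHEMA)
sub_schema = expand(SCHEMA, anchors, max_hops=1)
print(anchors, sorted(sub_schema))
```

Only the classes in `sub_schema` would then be placed in the prompt, which is how a schema too large for one prompt could still be queried.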
💡

Why a Virtual Knowledge Graph? Unlike Text-to-SQL (the LLM must read raw, cryptic DB schemas) or RAG (data must be pre-embedded and can't reflect live changes), the VKG puts a semantic layer + value catalog between the DB and the LLM, so the LLM writes SPARQL against clean business concepts while Ontop translates it to SQL on the live, federated database.
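To make the "semantic layer + value catalog" grounding concrete, here is a minimal sketch of how a prompt for the LLM might be assembled. The function name `build_prompt`, the prompt wording, and the example classes and values are hypothetical; the real prompts and catalog format are not described in this write-up.

```python
from typing import Dict, List


def build_prompt(
    question: str,
    classes: Dict[str, List[str]],
    value_catalog: Dict[str, List[str]],
) -> str:
    """Assemble a grounded prompt: business concepts plus known literal values."""
    lines = [
        "Write one SPARQL query that answers the question.",
        "Use only the classes, properties and values listed below.",
        "",
        "Classes and properties:",
    ]
    for cls in sorted(classes):
        lines.append(f"- {cls}: {', '.join(classes[cls])}")
    lines.append("")
    lines.append("Known values:")
    for prop in sorted(value_catalog):
        lines.append(f"- {prop}: {', '.join(value_catalog[prop])}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Illustrative concepts and values only.
prompt = build_prompt(
    "How many senior employees work in the Data department?",
    classes={"Employee": ["name", "grade", "worksIn"], "Department": ["name"]},
    value_catalog={"Department.name": ["Data", "Cloud"], "Employee.grade": ["junior", "senior"]},
)
print(prompt)
```

Listing the allowed values lets the model write filters against literals that actually exist in the data, rather than guessing spellings from a raw schema.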


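🏢 This is an internal project (AI Readable Data Platform) carried out while employed at SK Inc. AX · For security reasons, internal data, credentials, and customer information are excluded, and the write-up focuses on architecture and my role.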