Stephen Kim

I build

Five years taking machine learning and generative AI from idea to production — evaluation frameworks with Microsoft Research, forecasting that saved a manufacturer A$1.2M, RAG applications used daily by 500+ people, and agentic pipelines that create video content end-to-end.

SYS.STATUS ONLINE · PAPERS 2 · SAVED A$1.2M · UPTIME 99.9%
FIG. 01 — LIVE · systems designed & shipped to production

AgentEval

GenAI agent evaluation
w/ Microsoft Research · +20% vs G-Eval

TFT Forecaster

Steel-price prediction · GVTL
~A$1.2M saved over 5 yrs

RAG Text-to-SQL

LangChain · Azure
−40% query time · 500+ users

Video Pipeline

Agentic AI · LLM scripts,
voice synthesis, animation

PRODUCTION

Monitored · evaluated ·
used in the real world
FIG. 02 — SELECTED IMPACT

Numbers that held up in the real world

A$1.2M
Cost savings · GVTL

Inventory pipeline + Temporal Fusion Transformer steel-price forecasting for a Vietnamese manufacturer, across five years of live operation.

+20%
vs Google G-Eval

AgentEval, co-developed with Microsoft Research at Telstra — a production evaluation framework for non-deterministic AI agents.

−40%
Query time · 500+ users

Production RAG Text-to-SQL on Azure with LangChain, prompt engineering and ADK agent orchestration.

A$20K
First Prize · CSIRO

Won a CSIRO innovation program with an AI-powered app supporting plant-based meat and sustainable consumption.

2
First-author papers

Peer-reviewed publications on AI evaluation and AI-driven productivity (Springer CCIS; CS&IT).

FIG. 03 — EXPERIENCE

Where the work happened

MAR 2024 — SEP 2024

Data and AI Specialist

Telstra · Brisbane
  • Co-developed AgentEval with Microsoft Research — beat Google's G-Eval baseline by 20% on agent reliability metrics.
  • Architected a production RAG Text-to-SQL app (LangChain, prompt engineering, ADK) on Azure for 500+ enterprise users.
  • Designed the PGI data-quality framework (35% accuracy gains); production monitoring & observability; co-authored 2 papers.
2022 — 2026

Machine Learning Engineer / Data Consultant

GVTL · Vietnam (remote)
  • Built the internal pipeline managing stainless-steel inventory — stock tracking, demand planning, procurement support.
  • Developed a Temporal Fusion Transformer model forecasting steel prices — ~A$1.2M saved across five years.
  • Owned the full lifecycle in Python: ingestion, features, training, deployment, monitoring of a live operational system.
JUL 2022 — PRESENT

Casual Lecturer · Subject Coordinator (DATA5000)

Kaplan Business School · QUT · JCU · CQU
  • Designed curriculum, assessments and rubrics for a postgraduate AI programming unit as Subject Coordinator.
  • Lectured 13+ units across ML, deep learning, NLP, statistics and data communication; completed the 2024 QUT Teaching Advantage Program.
JAN 2023 — MAY 2023

Machine Learning Engineer

Prooftec · Sydney (remote)
  • Production ML pipelines on AWS with dbt SQL transformations; Docker, CI/CD, automated retraining and monitoring.
OCT 2021 — OCT 2022

Data Scientist

Orefox · Brisbane
  • Predictive models on geospatial data (+20% viable site identification); NLP search engine over regulatory documents (−30% retrieval time).
NOV 2021 — SEP 2022

Junior Software Developer

Explorate · Brisbane
  • Production REST APIs on AWS — 10K+ daily requests at 99.9% uptime; document OCR automation with Textract.
NOV 2021 — AUG 2022

Research Assistant (×2 contracts)

QUT · Brisbane
  • Reinforcement learning in recommender systems; deep matrix factorisation for social-media text classification.
FIG. 04 — CAPABILITIES

Tools of the trade

AI / GenAI

LLMs · RAG · Agentic AI · LangChain · Pydantic · ADK · Prompt engineering · AI evaluation · Hallucination mitigation · Responsible AI

Machine Learning

PyTorch · TensorFlow · Scikit-learn · TFT / time-series · Gradient boosting · NLP · Deep learning · Recommender systems

Engineering & Cloud

Python · SQL · R · C/C++ · Azure · AWS · Docker · CI/CD · MLflow · dbt · PySpark

Data & Communication

PostgreSQL · MongoDB · Power BI · Tableau · Dimensional modelling · Teaching & storytelling
FIG. 05 — PERSONAL PROJECT

The machine that makes the videos

Founder & Creator · 2025 — present

AI-Powered YouTube Content Channel

An agentic AI pipeline that produces video content end-to-end — no camera crew, no editor, no studio. One orchestrated system, from idea to upload.

  • LLM script generation — story and narration written by language models with structured prompting.
  • Voice synthesis — natural narration generated programmatically.
  • Automated animation — visuals rendered by code and composited automatically.
  • Tool orchestration — every stage wired together in Python as a single production pipeline.
FIG. 06 — PUBLICATIONS (FIRST AUTHOR)

On the record

SPRINGER CCIS VOL. 2325 · AUSDM 2024 · DOI: 10.1007/978-981-95-6888-8_1

Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content

Vu, T., Nayak, R., & Balasubramaniam, T. (2026) — can AI agents be trusted to judge AI output? A framework and evidence.

CS & IT VOL. 14, NO. 24 · DOI: 10.5121/csit.2024.142402

Improved Productivity with AI Models for SQL Tasks: A Case Study

Vu, T., Keretna, S., Nayak, R., & Balasubramaniam, T. (2024) — measuring real productivity gains when AI assists enterprise SQL work.

FIG. 07 — EDUCATION & RECOGNITION

Foundations

APR 2022 — 2026 (LODGED)

PhD, Computer Science — QUT

QUT RPA Stipend Scholarship (competitive merit-based). Thesis on responsible AI and recommender systems — five PyTorch architectures including an agentic LLM with hallucination mitigation. Supervisor: Prof. Richi Nayak.

FEB 2020 — OCT 2021

Master of Information Technology — QUT

GPA 6.5 / 7.0 (High Distinction) · Dean's List 2020 & 2021.

First Prize & A$20,000 — CSIRO innovation program, AI app for plant-based meat & sustainable consumption.

2024 QUT Teaching Advantage Program — professional teaching certification.