Production ML Systems
I build ML and GenAI systems that ship, hold up in production, and give operators something they can actually act on.
Production ML and GenAI systems built for evaluation, observability, reliable handoff, and real-world decision support.
Hi, I'm Tarek Masryo,
an AI/ML Engineer building practical machine learning and GenAI systems that connect data, models, software, and user-facing workflows into reliable products.
I design APIs, RAG evaluation workflows, telemetry views, internal tools, and decision-support dashboards that make model and GenAI outputs easier to validate, monitor, and operationalize.
What I deliver
Selected production AI systems, command centers, ML pipelines, datasets, apps, and decision-support projects.
Production-ready AI systems built for reliability, evaluation, observability, and decision support.
Risk scoring, classification, and forecasting systems built with calibrated outputs, reliable evaluation, and deployment-ready workflows.
Grounded GenAI, RAG, and agentic systems with retrieval evaluation, structured outputs, tool orchestration, and practical safeguards for quality, latency, and failure handling.
Deployment-ready APIs with strict schemas, typed contracts, versioned artifacts, and observable runtime behavior (a minimal schema sketch follows this list).
Interactive interfaces that turn model outputs into KPIs, review workflows, threshold controls, and clearer operational decisions.
Leakage-safe evaluation, threshold tuning, error analysis, and acceptance criteria for ML and GenAI systems.
Structured data preparation, validation rules, and reproducible pipelines that reduce silent failures and support safer iteration.
Telemetry across quality, latency, cost, drift, and failures to support ongoing review and reliable production behavior.
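As a concrete illustration of the API item above, here is a minimal sketch of what "strict schemas and typed contracts" means in practice, using FastAPI and Pydantic. The route, field names, and version string are hypothetical, not taken from a specific project.

```python
# Minimal sketch: a typed, versioned scoring endpoint with strict request and
# response schemas. Route and field names are illustrative placeholders.
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="risk-scoring-api", version="1.0.0")

class ScoreRequest(BaseModel):
    model_config = {"extra": "forbid"}  # reject unknown fields instead of silently ignoring them
    customer_id: str
    amount: float = Field(gt=0)

class ScoreResponse(BaseModel):
    customer_id: str
    risk_score: float = Field(ge=0.0, le=1.0)  # calibrated probability, explicitly bounded
    model_version: str

@app.post("/score", response_model=ScoreResponse)
def score(req: ScoreRequest) -> ScoreResponse:
    # Placeholder for a real model call; the contract stays stable across model versions.
    return ScoreResponse(customer_id=req.customer_id, risk_score=0.42, model_version="1.0.0")
```

Rejecting unknown fields and bounding the score in the response model are small choices that surface contract drift early instead of letting it fail silently downstream.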
Core stack, grouped by area.
Systems + foundations
Wrangling, validation, and visualization
Modeling, training, and deployment
Retrieval, orchestration, and evaluation
Storage, caching, and indexing
Serving, validation, and jobs
Interactive delivery
Packaging, tracking, and monitoring
Validation and tooling
Frequently asked questions about delivery, evaluation, and handoff.
I build production-minded AI/ML and GenAI systems: data pipelines, model evaluation workflows, API services, RAG systems, internal tools, automation workflows, and decision-support products.
I start by clarifying the decision or workflow the system needs to support, the available data, the risks, the users, and what “good enough to ship” means. Then I design a simple, testable workflow before adding complexity.
Depending on the project: cleaned datasets, validation rules, analysis notebooks, trained models, evaluation reports, API services, monitoring views, Streamlit/Gradio apps, Docker setup, tests, documentation, and deployment-ready project structure. When a dashboard is useful, I build it as an operator-facing layer for decisions, not just charts.
I mainly build Python-based AI systems using FastAPI, PyTorch, LangChain/LlamaIndex/LangGraph, vector stores, Docker, SQL, Streamlit/Gradio, MLflow, and CI-ready project structure. I choose tools based on reliability, maintainability, and deployment needs.
I use clear project structure, pinned dependencies where needed, versioned artifacts, saved metrics, validation rules, configuration files, and tests. The goal is to make results easier to rerun, review, compare, and hand off.
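As a small illustration of that reproducibility setup, here is a sketch of saving a run's config and metrics as reviewable artifacts; the paths and fields are hypothetical.

```python
# Minimal sketch: persist the run's config and metrics next to the model so
# results can be rerun and compared later. Paths and fields are illustrative.
import json
import hashlib
from pathlib import Path

def save_run(run_dir: str, config: dict, metrics: dict) -> None:
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2, sort_keys=True))
    # A content hash of the config makes silently changed settings easy to spot.
    digest = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()
    (out / "config.sha256").write_text(digest)

save_run("runs/2024-01-baseline", {"model": "xgboost", "seed": 42}, {"auc": 0.91})
```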
Reliable evaluation, leakage-safe validation, reproducible artifacts, clear thresholds, structured outputs, monitoring signals, and maintainable delivery. I care more about useful systems than impressive demos.
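For example, a leakage-safe baseline keeps all preprocessing inside the cross-validation loop, so nothing is fit on data the model will later be scored on. A minimal scikit-learn sketch on synthetic data:

```python
# Minimal sketch of leakage-safe evaluation: the scaler is fit inside each CV
# training fold via a Pipeline, never on the full dataset. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, random_state=42)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```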
Yes. I can review an existing notebook, dashboard, API, prototype, or AI workflow, identify weak points, clean the structure, improve reliability, and turn it into something easier to run, test, extend, and hand off.
I focus on grounded outputs, retrieval evaluation, source attribution, schema validation, guardrails, failure analysis, latency/cost tracking, and review workflows. The goal is to make quality visible instead of relying on vague demos.
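As one concrete example of schema validation in that workflow, here is a minimal Pydantic sketch that checks an LLM's JSON output before it reaches downstream code; the field names and failure handling are illustrative assumptions.

```python
# Minimal sketch: validate an LLM's JSON output against a strict schema so bad
# responses are caught at the boundary. Fields and handling are illustrative.
from pydantic import BaseModel, Field, ValidationError

class Answer(BaseModel):
    model_config = {"extra": "forbid"}
    answer: str
    sources: list[str] = Field(min_length=1)  # require at least one cited source
    confidence: float = Field(ge=0.0, le=1.0)

raw = '{"answer": "Paris", "sources": ["doc_12"], "confidence": 0.88}'
try:
    parsed = Answer.model_validate_json(raw)
except ValidationError as err:
    # Route to a retry or human-review queue instead of failing silently.
    print(err)
```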
I aim to leave the project easy to run, understand, and extend: documented setup steps, clear assumptions, testable components, exported artifacts, and notes on known limits or next improvements.
I build for the operator, not just the model. The goal is a system someone can inspect, trust, and act on — not just a prototype that looks good in isolation.
Let's connect on LinkedIn. I'm open to production AI/ML and GenAI roles, consulting, and high-impact freelance engagements.