Production ML Systems
I build production AI systems: APIs, dashboards, and monitoring that improve operations, reduce risk, and scale reliably, with evaluation and observability built in.
Hi, I’m Tarek Masryo,
an AI/ML Engineer building decision-ready, evaluation-first systems teams can rely on.
I ship APIs, dashboards, and monitoring that deliver measurable impact.
What I deliver
Decision-ready AI systems shipped as deployable artifacts and built for reliability, evaluation, and observability.
Prediction and risk-scoring systems shipped as deployable APIs with calibrated thresholds, operating policies, and success criteria (see the threshold sketch after this list).
Grounded RAG and tool-calling workflows with retrieval evaluation, schema-validated outputs, and pragmatic safeguards (timeouts, retries, fallbacks), also sketched below.
Decision-ready dashboards that translate model outputs into actions, KPIs, and human review workflows.
Robust data ingestion with schema contracts and validation to prevent silent failures and regressions.
Incident-ready telemetry covering quality signals, drift indicators, latency/cost, and alerting to catch issues early; a drift-scoring sketch closes out the examples below.
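As a minimal sketch of what calibrated thresholds mean in practice, the snippet below calibrates validation probabilities and then picks the operating threshold that minimizes expected cost. The model choice, synthetic data, and cost values are illustrative assumptions, not a prescription.

```python
# Hypothetical example: calibrate probabilities, then choose the
# operating threshold that minimizes expected cost on validation data.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real risk dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Isotonic calibration so predicted probabilities track observed rates.
model = CalibratedClassifierCV(GradientBoostingClassifier(), method="isotonic", cv=3)
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_val)[:, 1]

# Assumed business costs: a missed positive hurts 8x more than a false alarm.
COST_FP, COST_FN = 1.0, 8.0
thresholds = np.linspace(0.01, 0.99, 99)
costs = [
    COST_FP * np.sum((proba >= t) & (y_val == 0))
    + COST_FN * np.sum((proba < t) & (y_val == 1))
    for t in thresholds
]
print(f"operating threshold: {thresholds[int(np.argmin(costs))]:.2f}")
```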
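The tool-calling safeguards look like this in miniature: a timeout, a bounded retry budget, schema validation on the output, and a deterministic fallback. `call_llm_tool`, the `RiskVerdict` schema, and the limits are hypothetical placeholders, with pydantic assumed for validation.

```python
# Hypothetical sketch: guard a tool call with timeout, retries,
# schema validation, and a safe fallback.
import concurrent.futures
from pydantic import BaseModel, ValidationError

class RiskVerdict(BaseModel):  # assumed output contract
    label: str
    score: float

def call_llm_tool(payload: str) -> str:
    # Placeholder for a real model/tool invocation that returns JSON text.
    return '{"label": "review", "score": 0.73}'

def guarded_call(payload: str, retries: int = 2, timeout_s: float = 5.0) -> RiskVerdict:
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(retries + 1):
            try:
                raw = pool.submit(call_llm_tool, payload).result(timeout=timeout_s)
                return RiskVerdict.model_validate_json(raw)  # schema gate
            except (concurrent.futures.TimeoutError, ValidationError):
                continue  # a real client would also log and cancel the task
    return RiskVerdict(label="fallback_manual_review", score=0.0)

print(guarded_call("example payload"))
```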
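And one concrete drift indicator from the telemetry bullet: the population stability index (PSI) between a reference window and live traffic. The bin count and the 0.25 alert cutoff are common conventions, assumed here rather than fixed rules.

```python
# Hypothetical sketch: PSI between a reference and a live feature distribution.
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    ref_pct = np.clip(ref_pct, 1e-6, None)    # avoid log(0) on empty bins
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)   # training-time distribution
today = rng.normal(0.4, 1.0, 10_000)      # shifted live distribution

print(f"PSI = {psi(baseline, today):.3f}")  # > 0.25 is a common alert cutoff
```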
Core stack
Systems + Scripting
Wrangling + Notebooks
Modeling + Training
Retrieval + Tool Use
Storage + Retrieval
Serving + Validation
Interactive Delivery
Packaging + Release
Testing + Tooling
Frequently asked questions about delivery, evaluation, and support.
What kinds of systems do you build?
Production ML and GenAI systems for real-world decision-making: risk scoring, forecasting, NLP/CV, and grounded RAG/agent workflows, chosen to fit practical constraints.
When do you use LLMs versus classical ML?
I pick the simplest approach that meets the target: classical ML, deep learning, or LLM/RAG only when it adds measurable value under cost, latency, and reliability constraints.
Can you deliver end to end?
Yes. I handle end-to-end delivery: data pipelines, modeling, evaluation, backend APIs, deployment, and handoff, built to run reliably, not as a demo.
How do you evaluate models?
Leakage-safe evaluation, calibration and thresholding tied to cost/risk, plus error slicing and automated regression tests to catch failure modes before and after release. A minimal slicing sketch follows.
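A minimal sketch of error slicing, with hypothetical columns and data: break the headline metric down by segment so a weak slice cannot hide behind the aggregate, then let a regression test assert per-slice minimums.

```python
# Hypothetical sketch: per-segment accuracy so weak slices are visible.
import pandas as pd

results = pd.DataFrame({
    "segment": ["retail", "retail", "corporate", "corporate", "corporate"],
    "y_true":  [1, 0, 1, 1, 0],
    "y_pred":  [1, 0, 0, 1, 1],
})

by_slice = (
    results.assign(correct=results.y_true == results.y_pred)
    .groupby("segment")["correct"]
    .agg(accuracy="mean", n="size")
)
print(by_slice)  # a CI regression test can assert accuracy floors per slice
```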
What keeps a system reliable after launch?
Reliability and reproducibility: data contracts, schema validation, typed interfaces, versioned artifacts, and monitoring signals that surface drift and performance decay. A minimal contract sketch follows.
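A minimal sketch of a data contract at the ingestion boundary, assuming pandera; the column names, dtypes, and bounds are illustrative.

```python
# Hypothetical sketch: a schema contract that fails loudly on bad batches.
import pandas as pd
import pandera as pa

contract = pa.DataFrameSchema(
    {
        "account_id": pa.Column(str, nullable=False),
        "amount": pa.Column(float, pa.Check.ge(0)),
        "segment": pa.Column(str, pa.Check.isin(["retail", "corporate"])),
    },
    strict=True,  # reject unexpected columns instead of passing them through
)

batch = pd.DataFrame({
    "account_id": ["a1", "a2"],
    "amount": [10.0, -5.0],               # -5.0 violates the contract
    "segment": ["retail", "retail"],
})

try:
    contract.validate(batch, lazy=True)   # collect every violation, not just the first
except pa.errors.SchemaErrors as err:
    print(err.failure_cases)              # surface violations instead of silent ingestion
```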
What does the delivered code look like?
Clean, maintainable code with clear structure, tests where it matters, and documentation that captures assumptions, decisions, and how to run or extend the system.
What does your delivery process look like?
Scope alignment → a written plan (risks, milestones, success criteria) → iterative delivery in focused sprints with validation checkpoints → release and stabilization.
Do you offer support after delivery?
Yes. Optional post-delivery support focused on stability, performance, monitoring, and safe iteration as requirements and data evolve.
Let’s connect on LinkedIn. Open to production AI roles and high-impact freelance engagements.