Tarek Masryo

AI/ML Engineer

I build ML and GenAI systems that ship, hold up in production, and give operators something they can actually act on.

About Me

Production ML and GenAI systems built for evaluation, observability, reliable handoff, and real-world decision support.

Hi, I'm Tarek Masryo,

an AI/ML Engineer building practical machine learning and GenAI systems that connect data, models, software, and user-facing workflows into reliable products.

I design APIs, RAG evaluation workflows, telemetry views, internal tools, and decision-support dashboards that make model and GenAI outputs easier to validate, monitor, and operationalize.

What I deliver

  • End-to-end delivery: data → modeling → evaluation → deployment → monitoring
  • Reliability & observability by design: data contracts, schema validation, typed interfaces, telemetry, and failure-aware workflows (see the schema-validation sketch after this list)
  • Evaluation-first workflows: metrics, calibration, thresholds, acceptance criteria, and grounded review loops
  • Production safeguards: data quality checks, drift signals, alerting, fallback paths, and incident-ready diagnostics
  • Practical GenAI, RAG, and agentic workflows: retrieval, reranking, context pruning, tool orchestration, and grounded evaluation
  • Performance-aware systems: explicit cost/latency budgets, clear failure modes, and graceful fallbacks
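
To make the data-contract and schema-validation bullet concrete, here is a minimal sketch using Pydantic v2 (listed in the core stack below). The model and field names are hypothetical, not taken from any specific project.

    # Minimal sketch: a Pydantic v2 model as a data contract at a service boundary.
    from pydantic import BaseModel, Field, ValidationError

    class ScoringRequest(BaseModel):
        customer_id: str
        amount: float = Field(gt=0)                          # reject non-positive amounts
        channel: str = Field(pattern=r"^(web|mobile|pos)$")   # closed set of channels

    try:
        ScoringRequest(customer_id="c-123", amount=-5.0, channel="web")
    except ValidationError as exc:
        # Bad input fails loudly at the boundary instead of reaching the model.
        print(exc.errors())

The same model can double as the request schema of an API endpoint, so the contract is enforced at runtime rather than documented and forgotten.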

Featured Work

Selected production AI systems, command centers, ML pipelines, datasets, apps, and decision-support projects.

Services

Production-ready AI systems built for reliability, evaluation, observability, and decision support.

Production ML Systems

Risk scoring, classification, and forecasting systems built with calibrated outputs, reliable evaluation, and deployment-ready workflows.
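
As a hedged illustration of what "calibrated outputs" means here, the sketch below checks calibration with scikit-learn's calibration_curve. The scores and labels are synthetic, constructed only so the example runs.

    # Minimal sketch: reliability check with scikit-learn's calibration_curve.
    import numpy as np
    from sklearn.calibration import calibration_curve

    rng = np.random.default_rng(0)
    y_prob = rng.uniform(0, 1, size=1000)                         # model scores
    y_true = (rng.uniform(0, 1, size=1000) < y_prob).astype(int)  # calibrated by construction

    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
    for predicted, observed in zip(prob_pred, prob_true):
        # For a calibrated model, observed frequency tracks predicted probability.
        print(f"predicted={predicted:.2f}  observed={observed:.2f}")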

GenAI / RAG / Agentic Workflows

Grounded GenAI, RAG, and agentic systems with retrieval evaluation, structured outputs, tool orchestration, and practical safeguards for quality, latency, and failure handling.

API & Inference Services

Deployment-ready APIs with strict schemas, typed contracts, versioned artifacts, and observable runtime behavior.
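
A minimal sketch of what "strict schemas and typed contracts" can look like with FastAPI and Pydantic, both from the core stack. The path, models, and scoring logic are illustrative placeholders, not a real service.

    # Minimal sketch: a typed FastAPI endpoint with explicit request/response schemas.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class PredictIn(BaseModel):
        features: list[float]

    class PredictOut(BaseModel):
        score: float
        version: str

    @app.post("/predict", response_model=PredictOut)
    def predict(payload: PredictIn) -> PredictOut:
        # Placeholder scoring; a real service would load a versioned model artifact.
        score = sum(payload.features) / max(len(payload.features), 1)
        return PredictOut(score=score, version="v1")

Because both sides of the endpoint are typed, malformed requests are rejected with a 422 before any model code runs, and the response shape is enforced on the way out.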

Decision Apps & Analytics Interfaces

Interactive interfaces that turn model outputs into KPIs, review workflows, threshold controls, and clearer operational decisions.

Evaluation & Validation

Leakage-safe evaluation, threshold tuning, error analysis, and acceptance criteria for ML and GenAI systems.
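
One hedged sketch of what "leakage-safe" means in practice: fit preprocessing inside each cross-validation fold via a scikit-learn Pipeline, instead of scaling the full dataset before splitting. The dataset here is synthetic.

    # Minimal sketch: leakage-safe evaluation with a Pipeline inside CV.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, random_state=0)
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

    # The scaler is re-fit on each training fold only, so no test-fold
    # statistics leak into preprocessing.
    print(cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc").mean())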

Data Pipelines & Quality

Structured data preparation, validation rules, and reproducible pipelines that reduce silent failures and support safer iteration.
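
As a small sketch of explicit validation rules, assuming pandas; the column names are invented for the example.

    # Minimal sketch: fail fast on rule violations instead of passing bad rows downstream.
    import pandas as pd

    df = pd.DataFrame({"user_id": ["a1", "a2", None], "amount": [10.0, -3.0, 7.5]})

    problems = []
    if df["user_id"].isna().any():
        problems.append("user_id contains nulls")
    if (df["amount"] <= 0).any():
        problems.append("amount contains non-positive values")

    if problems:
        # Report every violated rule at once rather than only the first.
        raise ValueError("; ".join(problems))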

Monitoring & Model Health

Telemetry across quality, latency, cost, drift, and failures to support ongoing review and reliable production behavior.
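
One of the simpler drift signals behind that telemetry is the Population Stability Index. A minimal sketch, assuming NumPy and synthetic data; the bin count and alert threshold are common conventions, not fixed rules.

    # Minimal sketch: PSI compares a feature's live distribution to a reference.
    import numpy as np

    def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
        edges = np.histogram_bin_edges(reference, bins=bins)
        ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
        live_frac = np.histogram(live, bins=edges)[0] / len(live)
        # Clipping avoids log(0) on empty bins.
        ref_frac = np.clip(ref_frac, 1e-6, None)
        live_frac = np.clip(live_frac, 1e-6, None)
        return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

    rng = np.random.default_rng(0)
    baseline = rng.normal(0.0, 1.0, 10_000)
    drifted = rng.normal(0.5, 1.0, 10_000)  # simulated mean shift
    print(psi(baseline, drifted))           # a common rule of thumb flags PSI > 0.2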

Tech Stack

The core tools I work with most often.

Core Stack

Python
SQL
NumPy · Pandas
PostgreSQL · pgvector
scikit-learn
PyTorch
Hugging Face Transformers
FastAPI · Pydantic v2
Docker
Linux · Git
Streamlit · Plotly
MLflow
GitHub Actions
pytest · ruff

Working With Me

Frequently asked questions about delivery, evaluation, and handoff.

What kind of work do you usually take on?

I build production-minded AI/ML and GenAI systems: data pipelines, model evaluation workflows, API services, RAG systems, internal tools, automation workflows, and decision-support products.

How do you approach a new project?

I start by clarifying the decision or workflow the system needs to support, the available data, the risks, the users, and what “good enough to ship” means. Then I design a simple, testable workflow before adding complexity.

What do your deliverables usually look like?

Depending on the project: cleaned datasets, validation rules, analysis notebooks, trained models, evaluation reports, API services, monitoring views, Streamlit/Gradio apps, Docker setup, tests, documentation, and deployment-ready project structure. When a dashboard is useful, I build it as an operator-facing layer for decisions, not just charts.

What stack do you usually work with?

I mainly build Python-based AI systems using FastAPI, PyTorch, LangChain/LlamaIndex/LangGraph, vector stores, Docker, SQL, Streamlit/Gradio, MLflow, and CI-ready project structure. I choose tools based on reliability, maintainability, and deployment needs.

How do you keep AI/ML work reproducible?

I use clear project structure, pinned dependencies where needed, versioned artifacts, saved metrics, validation rules, configuration files, and tests. The goal is to make results easier to rerun, review, compare, and hand off.
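
A minimal sketch of that habit, with invented paths and placeholder metric values:

    # Minimal sketch: pin the seed, record the config, save metrics beside the run.
    import json
    import random
    from pathlib import Path

    import numpy as np

    config = {"seed": 42, "model": "logreg", "test_size": 0.2}
    random.seed(config["seed"])
    np.random.seed(config["seed"])

    # ... train and evaluate here ...
    metrics = {"roc_auc": 0.91, "threshold": 0.35}  # placeholder values

    run_dir = Path("runs/example-run")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))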

What do you care about most in AI/ML projects?

Reliable evaluation, leakage-safe validation, reproducible artifacts, clear thresholds, structured outputs, monitoring signals, and maintainable delivery. I care more about useful systems than impressive demos.

Can you work with existing code or messy projects?

Yes. I can review an existing notebook, dashboard, API, prototype, or AI workflow, identify weak points, clean the structure, improve reliability, and turn it into something easier to run, test, extend, and hand off.

How do you handle GenAI and RAG quality?

I focus on grounded outputs, retrieval evaluation, source attribution, schema validation, guardrails, failure analysis, latency/cost tracking, and review workflows. The goal is to make quality visible instead of relying on vague demos.
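
One concrete retrieval-evaluation signal from that list is recall@k over a labeled set of (query, relevant documents) pairs. A minimal sketch; the document IDs are invented.

    # Minimal sketch: recall@k is the fraction of relevant docs in the top-k results.
    def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
        hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
        return hits / max(len(relevant_ids), 1)

    # d1 and d3 are in the top 5, so recall@5 = 2/3.
    print(recall_at_k(["d1", "d7", "d3", "d9", "d2"], {"d1", "d3", "d8"}, k=5))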

What happens after delivery?

I aim to leave the project easy to run, understand, and extend: documented setup steps, clear assumptions, testable components, exported artifacts, and notes on known limits or next improvements.

What makes your work different?

I build for the operator, not just the model. The goal is a system someone can inspect, trust, and act on — not just a prototype that looks good in isolation.

Contact

Let's connect on LinkedIn. I'm open to production AI/ML and GenAI roles, consulting, and high-impact freelance engagements.