Production ML Systems
I build ML and GenAI systems that ship, hold up in production, and give operators something they can actually act on.
Production ML and GenAI systems built for evaluation, observability, reliable handoff, and real-world decision support.
Hi, I'm Tarek Masryo,
an AI/ML Engineer building practical machine learning and GenAI systems that connect data, models, software, and user-facing workflows into reliable products.
I design APIs, RAG evaluation workflows, telemetry views, internal tools, and decision-support dashboards that make model and GenAI outputs easier to validate, monitor, and operationalize.
What I deliver
Selected production AI systems, command centers, ML pipelines, datasets, apps, and decision-support projects.
Production-ready AI systems built for reliability, evaluation, observability, and decision support.
Risk scoring, classification, and forecasting systems built with calibrated outputs, reliable evaluation, and deployment-ready workflows.
Grounded GenAI, RAG, and agentic systems with retrieval evaluation, structured outputs, tool orchestration, and practical safeguards for quality, latency, and failure handling.
Deployment-ready APIs with strict schemas, typed contracts, versioned artifacts, and observable runtime behavior (a minimal schema sketch follows this list).
Interactive interfaces that turn model outputs into KPIs, review workflows, threshold controls, and clearer operational decisions.
Leakage-safe evaluation, threshold tuning, error analysis, and acceptance criteria for ML and GenAI systems.
Structured data preparation, validation rules, and reproducible pipelines that reduce silent failures and support safer iteration.
Telemetry across quality, latency, cost, drift, and failures to support ongoing review and reliable production behavior.
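As a concrete illustration of the API item above, here is a minimal sketch of what "strict schemas and typed contracts" means in practice, using FastAPI and Pydantic. The route, field names, and version string are hypothetical, not taken from a specific project.

```python
# Minimal sketch: a typed, versioned scoring endpoint with strict request and
# response schemas. Route and field names are illustrative placeholders.
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="risk-scoring-api", version="1.0.0")

class ScoreRequest(BaseModel):
    model_config = {"extra": "forbid"}  # reject unknown fields instead of silently ignoring them
    customer_id: str
    amount: float = Field(gt=0)

class ScoreResponse(BaseModel):
    customer_id: str
    risk_score: float = Field(ge=0.0, le=1.0)  # calibrated probability, explicitly bounded
    model_version: str

@app.post("/score", response_model=ScoreResponse)
def score(req: ScoreRequest) -> ScoreResponse:
    # Placeholder for a real model call; the contract stays stable across model versions.
    return ScoreResponse(customer_id=req.customer_id, risk_score=0.42, model_version="1.0.0")
```

Rejecting unknown fields and bounding the score in the response model are small choices that surface contract drift early instead of letting it fail silently downstream.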
Core stack, grouped by area.
Systems + foundations
Wrangling, validation, and visualization
Modeling, training, and deployment
Retrieval, orchestration, and evaluation
Storage, caching, and indexing
Serving, validation, and jobs
Interactive delivery
Packaging, tracking, and monitoring
Validation and tooling
Frequently asked questions about delivery, evaluation, and handoff.
I build production-minded AI/ML and GenAI systems: data pipelines, model evaluation workflows, API services, RAG systems, internal tools, automation workflows, and decision-support products.
I start by clarifying the decision or workflow the system needs to support, the available data, the risks, the users, and what “good enough to ship” means. Then I design a simple, testable workflow before adding complexity.
Depending on the project: cleaned datasets, validation rules, analysis notebooks, trained models, evaluation reports, API services, monitoring views, Streamlit/Gradio apps, Docker setup, tests, documentation, and deployment-ready project structure. When a dashboard is useful, I build it as an operator-facing layer for decisions, not just charts.
I mainly build Python-based AI systems using FastAPI, PyTorch, LangChain/LlamaIndex/LangGraph, vector stores, Docker, SQL, Streamlit/Gradio, MLflow, and CI-ready project structure. I choose tools based on reliability, maintainability, and deployment needs.
I use clear project structure, pinned dependencies where needed, versioned artifacts, saved metrics, validation rules, configuration files, and tests. The goal is to make results easier to rerun, review, compare, and hand off.
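As a small illustration of that reproducibility setup, here is a sketch of saving a run's config and metrics as reviewable artifacts; the paths and fields are hypothetical.

```python
# Minimal sketch: persist the run's config and metrics next to the model so
# results can be rerun and compared later. Paths and fields are illustrative.
import json
import hashlib
from pathlib import Path

def save_run(run_dir: str, config: dict, metrics: dict) -> None:
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2, sort_keys=True))
    # A content hash of the config makes silently changed settings easy to spot.
    digest = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()
    (out / "config.sha256").write_text(digest)

save_run("runs/2024-01-baseline", {"model": "xgboost", "seed": 42}, {"auc": 0.91})
```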
Reliable evaluation, leakage-safe validation, reproducible artifacts, clear thresholds, structured outputs, monitoring signals, and maintainable delivery. I care more about useful systems than impressive demos.
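For example, a leakage-safe baseline keeps all preprocessing inside the cross-validation loop, so nothing is fit on data the model will later be scored on. A minimal scikit-learn sketch on synthetic data:

```python
# Minimal sketch of leakage-safe evaluation: the scaler is fit inside each CV
# training fold via a Pipeline, never on the full dataset. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, random_state=42)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```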
Yes. I can review an existing notebook, dashboard, API, prototype, or AI workflow, identify weak points, clean the structure, improve reliability, and turn it into something easier to run, test, extend, and hand off.
I focus on grounded outputs, retrieval evaluation, source attribution, schema validation, guardrails, failure analysis, latency/cost tracking, and review workflows. The goal is to make quality visible instead of relying on vague demos.
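As one concrete example of schema validation in that workflow, here is a minimal Pydantic sketch that checks an LLM's JSON output before it reaches downstream code; the field names and failure handling are illustrative assumptions.

```python
# Minimal sketch: validate an LLM's JSON output against a strict schema so bad
# responses are caught at the boundary. Fields and handling are illustrative.
from pydantic import BaseModel, Field, ValidationError

class Answer(BaseModel):
    model_config = {"extra": "forbid"}
    answer: str
    sources: list[str] = Field(min_length=1)  # require at least one cited source
    confidence: float = Field(ge=0.0, le=1.0)

raw = '{"answer": "Paris", "sources": ["doc_12"], "confidence": 0.88}'
try:
    parsed = Answer.model_validate_json(raw)
except ValidationError as err:
    # Route to a retry or human-review queue instead of failing silently.
    print(err)
```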
I aim to leave the project easy to run, understand, and extend: documented setup steps, clear assumptions, testable components, exported artifacts, and notes on known limits or next improvements.
I build for the operator, not just the model. The goal is a system someone can inspect, trust, and act on — not just a prototype that looks good in isolation.
Let's connect on LinkedIn. I'm open to production AI/ML and GenAI roles, consulting, and high-impact freelance engagements.