Available for consulting

Zac Peterson

DevOps & Site Reliability Engineer | AI/ML Infrastructure Architect

Helping startups scale from zero to millions of users

Staff DevOps consultant specializing in AI platform engineering, HIPAA-compliant healthcare systems, and production ML pipelines. Building startup infrastructure, GPU clusters, and enterprise cloud architecture across AWS, GCP, and Azure.

San Francisco Bay Area
(858) 922-5573
Technical Expertise

Core Competencies

Deep expertise across the modern infrastructure stack, specializing in AI/ML platforms, healthcare cloud systems, and startup scalability

AI Infrastructure & MLOps

Designing production ML pipelines with GPU cluster management, LLM deployment, and model serving at scale for AI companies.

NVIDIA A100/H100, vLLM, Ray Serve, Triton, Kubeflow, MLflow

Kubernetes & Container Orchestration

Certified Kubernetes Administrator (CKA) with expertise in cloud-native application deployment and service mesh architecture.

K8s/EKS/GKE/AKS, Helm, Istio, ArgoCD, Kustomize, Operators

Infrastructure as Code

HashiCorp Terraform Associate certified. Building reproducible, version-controlled infrastructure for rapid startup scaling.

Terraform, Pulumi, CloudFormation, Ansible, Crossplane

Cloud Architecture

AWS Solutions Architect Professional and Google Cloud Professional Cloud Architect certified. Multi-cloud architecture for startups from seed stage to Series C.

AWS, GCP, Azure, SageMaker, Vertex AI, Hybrid Cloud

DevOps & Platform Engineering

Building developer experience platforms with GitOps practices for zero-to-production deployment velocity.

GitHub Actions, GitLab CI, Jenkins, Flux, Spinnaker

Site Reliability Engineering

Ensuring 99.99% uptime through SLOs, error budgets, chaos engineering, and incident management best practices.

SRE, Chaos Engineering, Load Testing, Capacity Planning

Observability & Monitoring

Full-stack observability with metrics, logs, and distributed tracing for complex microservices architectures.

Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack

Healthcare & HIPAA Compliance

Building secure, HIPAA-compliant infrastructure for digital health platforms with PHI protection, BAA compliance, and FHIR/HL7 integration.

HIPAA BAA, PHI Security, SOC2, FHIR API, HL7, Audit Logging
Industry Recognized

Professional Certifications

Validated expertise in the most sought-after cloud and DevOps technologies

CKA

Certified Kubernetes Administrator

Cloud Native Computing Foundation (CNCF)

Expert-level certification validating skills in Kubernetes cluster administration, troubleshooting, and security.

CKS

Certified Kubernetes Security Specialist

Cloud Native Computing Foundation (CNCF)

Advanced certification demonstrating expertise in securing Kubernetes clusters and container-based applications.

AWS SA Pro

AWS Solutions Architect Professional

Amazon Web Services

Professional-level certification for designing distributed systems and complex AWS architectures.

GCP Architect

Google Cloud Professional Cloud Architect

Google Cloud

Professional certification for designing, developing, and managing Google Cloud solutions.

Terraform

HashiCorp Terraform Associate

HashiCorp

Certification validating Infrastructure as Code skills using Terraform across multi-cloud environments.

Technology Arsenal

Tools & Technologies

Production-tested technologies for building scalable infrastructure across AI platforms, healthcare systems, and high-growth startups

Kubernetes - Container Orchestration
AWS - Cloud Provider
GCP - Cloud Provider
Azure - Cloud Provider
Terraform - Infrastructure as Code
Pulumi - Infrastructure as Code
PostgreSQL - Database
Redis - In-Memory Cache
Prometheus - Observability
Grafana - Visualization
Datadog - Observability
ArgoCD - Continuous Delivery
Vault - Secrets Management
Istio - Service Mesh
NVIDIA GPU - AI/ML Acceleration
Docker - Containerization
SageMaker - MLOps Platform
Vertex AI - MLOps Platform
vLLM - LLM Serving
HIPAA - Healthcare Security

99.99% Uptime SLA Achieved
500+ Daily Deployments
10M+ Users Served
Career Journey

Professional Experience

8+ years building and scaling enterprise cloud infrastructure across healthcare startups, AI companies, Fortune 500 enterprises, and high-growth DTC brands

Staff DevOps Engineer - AI/ML Infrastructure

Homeward Health

Healthcare Startup | Digital Health Platform

Jan 2025 - Present
  • Reduced deployment cycles by 70% by engineering automated CI/CD pipelines with AWS CodePipeline and CodeBuild for containerized Python, Go, and SQL microservices
  • Designed HIPAA-compliant Kubernetes infrastructure on AWS EKS, with secure ML pipelines on SageMaker and encryption at rest and in transit to protect PHI
  • Architected FHIR-compliant data ingestion and transformation workflows using AWS Kinesis, Lambda, and Redshift for processing HL7 messages at scale
  • Achieved 45% reduction in cloud costs through intelligent resource scaling, spot instance utilization, and comprehensive CloudWatch monitoring
  • Established model versioning, experiment tracking, and automated retraining workflows using SageMaker and MLflow for production ML pipelines
AWS, EKS, SageMaker, HIPAA, FHIR, HL7, Python, Go

Staff DevOps Architect - AI/ML Infrastructure

BP

Fortune 500 | Enterprise AI Platform

Jan 2024 - Dec 2024
  • Reduced infrastructure costs by 40% by architecting GKE and AKS clusters with GPU acceleration for 500+ concurrent AI/ML workloads
  • Cut model deployment time from weeks to hours by building automated MLOps pipelines with Vertex AI and custom model serving infrastructure
  • Achieved 99.9% uptime by establishing observability with Prometheus, Grafana, and the ELK Stack alongside GitOps workflows
  • Enabled processing of 10TB+ daily data by designing transformation pipelines with Apache Spark and Databricks, managing 1000+ ML models in production
  • Reduced MTTR from 4 hours to 15 minutes and ensured SOC2 compliance by leading Site Reliability Engineering practices
GKE, AKS, GPU, Vertex AI, Prometheus, Grafana, Spark, Databricks

Senior Cloud Architect

Vuori

DTC E-commerce | High-Traffic Retail Platform

Feb 2023 - Jan 2024
  • Reduced cloud costs by 60% by restructuring and automating resources across GCP and Azure using Terraform Infrastructure as Code
  • Architected and deployed Kubernetes clusters on GCP GKE and Azure AKS, utilizing Vertex AI for scalable ML workflows and demand forecasting
  • Designed hybrid networking between on-premises and cloud, building ETL pipelines for data migration supporting Black Friday traffic scaling
  • Containerized legacy applications and migrated workloads to GCP and Azure with automated deployment pipelines for rapid iteration
Terraform, GCP, Azure, Kubernetes, Vertex AI, Python, Go

Senior DevOps Engineer

Wickfire LLC

Legal Tech Startup | SaaS Platform

Jan 2021 - Feb 2023
  • Reduced infrastructure downtime by 20% by designing and scaling backend IT infrastructure with comprehensive SLAs and SLOs
  • Accelerated software delivery by 30% by developing automated GitLab CI/CD pipelines on GCP for rapid startup iteration
  • Authored reusable Terraform modules for automated provisioning of GCP, Azure, and Vertex AI resources enabling zero-to-production deployment
  • Implemented advanced monitoring and alerting with GCP Operations Suite and Azure Monitor for full-stack observability
GitLab CI, Terraform, GCP, Azure, Python, Go, Stackdriver

Managed Services Practice Lead

Redapt Inc.

Cloud Consulting | Enterprise Solutions

Jan 2019 - Jan 2021
  • Reduced cloud expenditure by 60% by restructuring and optimizing GCP and Azure resources through Terraform and Infrastructure as Code
  • Architected GKE and AKS clusters for startups and enterprises, containerizing applications and automating deployment pipelines
  • Built CI/CD pipelines with GitLab and GCP Cloud Build, orchestrating deployments with Python, Go, and Terraform
  • Maintained ISO-, HIPAA-, and FedRAMP-compliant infrastructure for healthcare and government clients with sensitive data requirements
GKE, AKS, Terraform, HIPAA, FedRAMP, GitLab, Cloud Build

Solutions Architect

Rapid Domains Inc.

Web Hosting | Multi-Cloud Infrastructure

Jan 2016 - Dec 2018
  • Improved cloud infrastructure reliability for 200+ clients by designing scalable solutions in hybrid Azure and GCP environments
  • Reduced manual operations workload by 40% by developing automation scripts in Python and Go for startup clients
  • Enhanced database performance for mission-critical workloads by optimizing SQL queries and implementing automated monitoring
Azure, GCP, Python, Go, SQL, Automation
Get in Touch

Let's Build Something Great

Looking to improve system reliability, performance, or scalability? I offer free technical consultations to help optimize your infrastructure and deployment processes.

Contact Information

Email

zac@zacp.xyz

Phone

(858) 922-5573

Location

San Francisco, CA

Availability

Free consultations available
