Rated "Great" on "Implement LRU Cache Algorithm" (Difficulty: Medium)
Rated "Great" on "Design a High-Availability, Compliant EKS Deployment" (Difficulty: Medium)
Rated "Great" on "Design a Scalable and Compliant SaaS Platform Architecture" (Difficulty: Hard)
Reduced GCP networking costs by $1,655/month for an education app by diagnosing Tel Aviv anycast misrouting and migrating to Bunny CDN with a validated staging cutover.
Architected staging and production environments on GCP using Cloud Armor (WAF), Google Cloud Load Balancer, and Managed Instance Groups for autoscaling and high availability.
Managed AWS environments (EC2, EKS, S3, RDS, SQS, Lambda) including stateful Kubernetes workloads with persistent volume claims and StatefulSets.
Built multi-stage GitHub Actions CI/CD pipelines (dev → staging → prod) using IAP tunnels for secure deployment to private VPC resources.
Deployed a self-hosted LLM inference stack (Ollama + Docker Compose, NVIDIA L4) with a systemd idle monitor that automatically shuts down GPU VMs after 15 minutes of inactivity, reducing idle compute to near zero.
Deployed Python scripts as Cloud Run scheduled jobs for cron-like automation tasks.
Built and operated a hybrid observability stack: self-hosted Prometheus + Grafana + Loki on GCP with Slack alerting, alongside Datadog APM for application performance and ServiceNow integration for ITSM workflows.
Authored and maintained modular Terraform configurations for reproducible infrastructure provisioning across AWS and GCP environments.
Ensured compliance with SOC 2 and HIPAA through access control audits, encryption enforcement, secrets management, and policy reviews; implemented resource tagging strategies for cost attribution and FinOps governance.
Maintained high availability and reliability of production Linux-based systems (Apollo, JBoss) across 24/7 operations in collaboration with cross-functional ops teams.
Designed, deployed, and managed Kubernetes (EKS) clusters for stateful and stateless workloads; provisioned infrastructure using Terraform for reproducible, modular IaC deployments.
Built and maintained a Prometheus and Grafana observability stack (custom alerting rules, runbooks, and dashboards), reducing incident response time by 25% via structured escalation.
Managed PostgreSQL and MySQL databases on RDS/Aurora including backup strategies, performance tuning, high-availability configuration, and disaster recovery planning.
Designed and implemented a three-tier AWS architecture using VPC, public/private subnets, NAT gateways, and security group chaining, following CIS benchmark hardening guidelines.
Leveraged Datadog for infrastructure monitoring and APM; integrated ServiceNow for ITSM ticket management and incident lifecycle tracking.
Applied FinOps principles with cloud resource tagging policies to monitor and govern cloud spend across AWS environments.
Developed and maintained CI/CD pipelines in Jenkins and GitLab CI for production-grade SaaS environments, integrating SonarQube, OWASP Dependency-Check, and Trivy into a full DevSecOps workflow.
Architected scalable, high-availability cloud infrastructure using Docker, Kubernetes, and AWS, ensuring uptime and platform reliability for stateful services.
Automated infrastructure provisioning with Terraform, reducing deployment errors by 40% and enabling repeatable, modular IaC across environments.
Deployed and managed Prometheus and Grafana monitoring stacks; administered PostgreSQL databases with backup, recovery, and performance tuning procedures.
Enforced resource tagging, cost governance, and Linux OS hardening (via Ansible) across all managed infrastructure.
Architected and managed Kubernetes clusters using Helm for stateful and stateless deployments; implemented GitOps-based continuous delivery via ArgoCD.
Automated Linux-based infrastructure provisioning and OS hardening using Ansible and Terraform; maintained system reliability and performance under high compute loads.
Integrated Prometheus/Grafana monitoring with Slack alerts and ServiceNow incident workflows, reducing MTTR by 35%.
Administered Kafka messaging infrastructure for event-driven workloads; conducted load and reliability testing to validate distributed system performance.
Managed PostgreSQL instances including schema maintenance, backup validation, and performance profiling.
Designed and provisioned production-ready Amazon EKS infrastructure using Terraform with reusable modules, remote state management, IAM roles, VPC networking, managed node groups, and autoscaling, enabling consistent and automated Kubernetes deployments.
Implemented a complete observability stack using Prometheus, Grafana, Loki, OpenTelemetry, and Jaeger for infrastructure, applications, and databases. Configured dashboards, log aggregation, distributed tracing, alerting, and Slack and email notifications to improve system visibility and incident response.
Built an end-to-end CI/CD pipeline for Development, Staging, and Production environments using GitHub Actions and Google Cloud Build. Integrated AI-based failure detection, automated testing, security scanning, deployment approvals, and rollback strategies to improve deployment reliability.
Reduced cloud infrastructure costs by up to 90% through networking and architecture optimization, including CDN redesign, storage optimization, caching improvements, and traffic routing enhancements while maintaining application performance and availability.
Led the migration of a monolithic application to a Kubernetes-based microservices architecture using Docker and Amazon EKS. Designed the complete platform including CI/CD with Jenkins, security scanning using Trivy, SonarQube, and OWASP Dependency-Check, GitOps deployment with ArgoCD, and end-to-end monitoring using Grafana, Prometheus, and Loki with Slack and email alerting.
Architected and deployed a highly available three-tier application on AWS using public/private subnets, Application Load Balancer (ALB), Auto Scaling Groups (ASG), NAT Gateway, AWS WAF, VPN connectivity, and secure networking following AWS best practices.
Migrated enterprise applications from on-premises infrastructure to a private cloud environment using a lift-and-shift approach. Planned the migration strategy, minimized downtime, validated workloads, and optimized infrastructure for improved scalability and operational efficiency.