Featured Project

Silver Jems — E-Commerce Infrastructure on AWS

I designed and own the production AWS infrastructure for a family jewellery business: a containerised multi-service architecture, an automated CI/CD pipeline, and full-stack observability with dynamic alerting across every component.

AWS ECS · AWS ECR · AWS EC2 · AWS Lambda · CloudFront · Route 53 · VPC · Cloudflare R2 · Docker · GitHub Actions · Grafana · Zabbix · PostgreSQL · Redis

The Problem

A family jewellery business needed an online store that could handle real traffic, real orders, and real consequences if it went down. The ask wasn't just to deploy an application — it was to own the infrastructure so the business never had to think about it. That meant designing for isolation, automating every release, and building observability before the first customer ever landed on the site.

Infrastructure Design

Service Isolation by Default

Every component of the application runs as an independent container on AWS ECS — storefront, backend API, database, and email service are fully decoupled. This was a deliberate reliability decision: a database issue cannot take down the storefront, and an API deployment cannot affect the email service. Blast radius is contained at the container boundary.

Network Architecture

All services are deployed inside a private VPC with controlled ingress. Public traffic enters only through CloudFront, which handles SSL termination, edge caching, and DDoS mitigation. Route 53 manages DNS routing with health checks — if an origin becomes unhealthy, traffic fails over automatically. No service is directly exposed to the public internet.
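
The failover decision described above can be pictured with a small model. This is a simplified sketch, not Route 53's API: the `Origin` type, the origin names, and the choice to fall back to the primary when neither origin is healthy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Origin:
    name: str       # hypothetical label for an origin
    healthy: bool   # result of the most recent health check


def resolve(primary: Origin, secondary: Origin) -> Origin:
    """Pick the origin that should receive traffic.

    Traffic goes to the primary while its health check passes and moves
    to the secondary when the primary fails. If neither is healthy, this
    sketch returns the primary rather than returning nothing.
    """
    if primary.healthy or not secondary.healthy:
        return primary
    return secondary


print(resolve(Origin("primary", False), Origin("standby", True)).name)
```

The point of the model is that the switch is driven by health check state alone, so no person has to act for traffic to move.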

Storage Decoupled from Compute

Product images and static assets are stored in AWS S3 and served through CloudFront. Storage scales independently of the application layer, so no infrastructure changes are needed as asset volume grows. CloudFront's global CDN delivers assets from the edge in under 100 ms. Because the frontend only references the CDN, replacing the origin bucket never requires a frontend change.
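
One way to keep the frontend independent of the bucket is to build every asset URL from a CDN base address and a storage key. The sketch below assumes a hypothetical CDN hostname (`cdn.example.com`) and a hypothetical helper name (`asset_url`); neither comes from the actual project.

```python
from urllib.parse import quote, urljoin

# Hypothetical CDN hostname; the real CloudFront domain is not stated here.
CDN_BASE_URL = "https://cdn.example.com/"


def asset_url(key: str, base_url: str = CDN_BASE_URL) -> str:
    """Build a public URL for an asset from its storage key.

    The bucket name never appears in the result, so changing the origin
    bucket behind the CDN does not change any URL the frontend renders.
    """
    # Drop a leading slash so the key is appended to the base path, and
    # percent-encode characters that are unsafe in a URL path.
    return urljoin(base_url, quote(key.lstrip("/")))


print(asset_url("products/ring 01.jpg"))
```

With this shape, moving assets to a different bucket is a change to the CDN origin setting only.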

CI/CD Pipeline — Zero Manual Deploys

The Deployment Problem

Manual deployments introduce human error into every release. For a business where a failed deployment means lost orders, that risk is unacceptable. The goal was to make deployment a process owned by the pipeline, not a person.

Pipeline Design

  • Developer pushes to dev branch

  • Promotion to test branch triggers deployment to a dedicated test server

  • Test server validates the build in an environment identical to production

  • Merge to main triggers GitHub Actions automatically

  • Actions builds the Docker image and pushes it to AWS ECR

  • ECS pulls the new image and rolls out the update — zero SSH, zero manual steps
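
The last two steps above can be sketched as the ordered commands a CI job might run. This is an illustration under stated assumptions, not the project's actual workflow: the account ID, region, repository, cluster, and service names are placeholders, registry login is omitted, and `--force-new-deployment` only picks up a new image when the task definition references the tag being pushed.

```python
import shlex
from typing import List


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return the ECR image URI for a repository and tag."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def release_commands(image_uri: str, cluster: str, service: str) -> List[List[str]]:
    """List the commands for one release, in the order they would run."""
    return [
        # Build the image from the repository root and tag it for ECR.
        ["docker", "build", "-t", image_uri, "."],
        # Push the tagged image to the registry.
        ["docker", "push", image_uri],
        # Ask ECS to start a new deployment of the service.
        ["aws", "ecs", "update-service", "--cluster", cluster,
         "--service", service, "--force-new-deployment"],
    ]


# Placeholder values, for illustration only.
uri = ecr_image_uri("123456789012", "ap-south-1", "storefront", "latest")
for command in release_commands(uri, "example-cluster", "storefront"):
    print(shlex.join(command))
```

Keeping the commands as data makes the release sequence easy to review and to log, which fits the goal of a deployment owned by the pipeline rather than a person.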

Outcome

Every production release is traceable, repeatable, and reversible. The deployment history lives in Git. Rollback is a revert and a push.

Security Posture

  • Admin panel protected by Multi-Factor Authentication — privileged access requires a second factor, always

  • End users authenticate via Google OAuth or standard login — managed at the API layer, not the UI

  • Every API route enforces server-side session validation — the API rejects unauthorised requests regardless of origin

  • All services run inside a private VPC — no direct public internet exposure

  • CloudFront acts as the sole public entry point — origin is shielded behind the CDN layer
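
Server-side session validation, as listed above, can take several forms. The sketch below assumes one of them, an HMAC-signed token carrying a user ID and an expiry time; the project may equally use a session store. The secret, the token layout, and the function names are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret; a real deployment would load this from a secret store.
SECRET_KEY = b"replace-me"


def _signature(payload: str, key: bytes) -> str:
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def sign_session(user_id: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Issue a token of the form user_id.expires_at.signature."""
    payload = f"{user_id}.{expires_at}"
    return f"{payload}.{_signature(payload, key)}"


def validate_session(token: str, now: Optional[int] = None,
                     key: bytes = SECRET_KEY) -> Optional[str]:
    """Return the user ID if the token is authentic and unexpired, else None."""
    try:
        user_id, expires_raw, signature = token.rsplit(".", 2)
        expires_at = int(expires_raw)
    except ValueError:
        return None  # malformed token
    expected = _signature(f"{user_id}.{expires_raw}", key)
    # compare_digest avoids leaking the signature through timing differences.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = int(time.time()) if now is None else now
    return user_id if current < expires_at else None
```

An API route would call `validate_session` before doing any work and reject the request when it returns `None`, which is what makes the check independent of whatever the UI does.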

Observability Stack

Philosophy

Observability was instrumented before the platform went live. The first incident was caught by an alert, not a customer complaint. Every monitored component has a meaningful signal tied to it, not a generic CPU threshold.

What Is Monitored

  • ECS container state — all services watched for restarts, crashes, and unhealthy states

  • API response time and error rate — latency tracked via Grafana

  • PostgreSQL slow queries, active connections, and replication lag — Zabbix agent on DB host

  • Redis memory usage and cache hit ratio — early warning on eviction pressure

  • Email service delivery events — failed sends trigger alerts before users notice

  • SSL certificate expiry — automated watch, alert fires 30 days before expiry

  • CloudFront cache hit ratio — drop in ratio signals origin pressure or misconfiguration

  • Host-level metrics on EC2 — CPU, memory, disk I/O, network via Zabbix
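
As one example from the list above, the certificate expiry check reduces to a date comparison. The sketch below is a standalone illustration rather than the actual Zabbix or Grafana configuration; the function names are hypothetical, and the timestamp format is the one Python's `ssl` module reports for a certificate's `notAfter` field.

```python
import ssl
from datetime import datetime, timezone

ALERT_WINDOW_DAYS = 30  # the 30-day window mentioned in the list above


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format found in SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def should_alert(not_after: str, now: datetime) -> bool:
    """True once the certificate is inside the alert window."""
    return days_until_expiry(not_after, now) <= ALERT_WINDOW_DAYS


now = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(should_alert("Jan 21 00:00:00 2030 GMT", now))
```

Passing `now` in explicitly keeps the check deterministic, which makes it easy to test without waiting for a real certificate to approach expiry.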

Alerting Design

All alerts are dynamic — thresholds are calculated relative to baseline behaviour, not fixed numbers. A spike in API error rate that lasts 30 seconds is noise. One that sustains for 3 minutes is an incident. The alerting layer knows the difference.
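
The two ideas above, a threshold derived from baseline behaviour and a requirement that a breach be sustained, can be sketched as follows. The rule used here (baseline mean plus three standard deviations) and the 30-second sampling interval are assumptions for illustration; how the baseline is actually computed is not specified.

```python
from statistics import mean, pstdev
from typing import List


def dynamic_threshold(baseline: List[float], sigmas: float = 3.0) -> float:
    """Threshold as the baseline mean plus a multiple of its standard deviation."""
    return mean(baseline) + sigmas * pstdev(baseline)


def is_incident(samples: List[float], threshold: float,
                interval_s: int = 30, sustain_s: int = 180) -> bool:
    """True if samples stay above the threshold for at least sustain_s in a row.

    `samples` are readings taken every `interval_s` seconds, oldest first.
    """
    needed = sustain_s // interval_s  # consecutive breaching samples required
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= needed:
            return True
    return False


baseline = [1.0, 1.2, 0.8, 1.0, 1.1, 0.9]   # hypothetical error-rate history
limit = dynamic_threshold(baseline)
print(is_incident([1.0, 5.0, 1.0, 1.0, 1.0, 1.0], limit))   # single spike
print(is_incident([5.0] * 6, limit))                        # sustained breach
```

With these defaults a single 30-second spike resets the counter and is ignored, while six consecutive breaching samples (3 minutes) count as an incident.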

SRE Principles Applied

Toil Elimination

Manual deployments, manual image builds, and manual server configuration were all eliminated. GitHub Actions owns the build. ECS owns the rollout. The engineer owns the pipeline design — not the execution.

Blast Radius Containment

Service isolation at the container level means failures stay local. An email service crash does not affect order processing. A cache eviction event does not bring down the API. Each component fails independently and recovers independently.

Observability Before Incidents

Monitoring was not added after something broke. Every component was instrumented before go-live. The system was observable before it was public.

Infrastructure as a Reliability Contract

The business does not think about infrastructure. That is the goal. Uptime, deployments, and alerts are owned by the SRE layer — not delegated back to the product team.

Kartik Patel
Ahmedabad, India · Open to Remote & Global Opportunities
© 2026 Kartik Patel · Built with intention, not just code.
