AWS VPC Edge Routing Guide Part 4: DNS Decisions and Route 53 — Hosted Zones, the Six Routing Policies, Alias vs CNAME, and Health Checks
Introduction
Part 1 picked the entry point that fronts a VPC (ALB, NLB, API Gateway, CloudFront, GA). But before traffic ever reaches that entry point, another decision always happens first — DNS resolution. Type https://api.example.com, and the browser fires off a DNS query to learn an IP; the IP / CNAME returned at that moment decides which entry point, region, or instance the traffic actually reaches.
The layer that handles that decision is Route 53. Part 4 unpacks Route 53’s core decisions — Hosted Zone selection, record types, Routing Policies, Health Checks. (The series synthesis and standard patterns close in Part 5.)
- Part 0 — Primer: network and AWS fundamentals
- Part 1 — Picking the entry point: ALB / NLB / API Gateway / CloudFront / Global Accelerator
- Part 2 — VPC-to-VPC and on-prem connectivity: VPC Endpoint / PrivateLink / Peering / Transit Gateway / VPN / Direct Connect
- Part 3 — Inside the VPC: IGW / NAT GW / Route Tables / Security Group vs NACL
- Part 4 — DNS decisions and Route 53 (this post)
- Part 5 — Four standard patterns: from decision tree to first sketch
Same target reader as the rest of the series — backend or infrastructure engineers who’ve used Route 53 in the console but aren’t sure how the six Routing Policies differ, why Alias is preferable to CNAME, or how a Private Hosted Zone interacts with VPC Endpoint. After this post, the goal is that even DNS decisions resolve through a single decision tree.
TL;DR
- Route 53 always runs before the entry points in Parts 1–3 — the order is user → DNS resolution → entry point (ALB / CloudFront / etc.) → VPC.
- Hosted Zones split into Public vs Private — Public is internet-visible, Private only resolves inside attached VPCs.
- Alias records are the standard way to map AWS resources (ALB, CloudFront, S3, API Gateway, etc.) to a domain root. CNAME can’t sit at the root, and it incurs an extra DNS query (with cost).
- The six Routing Policies are traffic-distribution strategies layered on top of the default Simple policy: Weighted (canary), Latency (closest region), Geolocation (per country), Geoproximity (geo + bias), Multi-value (random + health), Failover (primary + secondary).
- Health Check + Failover routing gives you DNS-layer auto-failover. But TTL caching means minute-level latency — for second-level failover, reach for Global Accelerator or ALB target health checks instead.
1. Where Route 53 sits in the traffic flow
Trace the actual path from “user types https://api.example.com” to “backend EC2 receives the request” and you’ll see Route 53 sits at the very front of everything.
flowchart LR
User([User]) -->|"1. DNS query"| Resolver[Recursive Resolver<br/>ISP / Cloudflare 1.1.1.1 / Google 8.8.8.8]
Resolver -->|"2. .com delegation"| Root[Root + .com TLD]
Root -->|"NS record"| R53[Route 53<br/>example.com Hosted Zone]
R53 -->|"3. A/AAAA/Alias response"| Resolver
Resolver -->|"4. Returned IP"| User
User -->|"5. HTTPS to that IP"| ALB[ALB / CloudFront / NLB]
ALB --> VPC[VPC]
Three key facts:
- Route 53 is an authoritative DNS server. Once your domain’s NS record delegates to Route 53, all DNS queries for that domain land on Route 53.
- Responses are cached. Clients (browsers, OSes), recursive resolvers (ISPs, Cloudflare 1.1.1.1), and CDN edges all cache the response for the TTL. This is the defining trait of DNS-based routing — DNS changes don’t propagate instantly.
- Route 53 makes routing decisions. It’s not just “name to IP” — based on the Routing Policy, it picks which IP to return given the requester’s location, configured weights, health states, etc. That’s why Route 53 is more than a DNS server; it’s a routing tool.
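To make the caching point concrete, here is a minimal sketch of how a resolver-side cache behaves. The `TtlCache` class, the fake clock, and the `192.0.2.1` address are hypothetical illustration only; this is not Route 53 or resolver code.

```python
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy resolver cache: an answer is reused until its TTL expires."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def put(self, name: str, ip: str, ttl_seconds: int) -> None:
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: the caller has to ask the authoritative server again.
            del self._entries[name]
            return None
        return ip


now = [0.0]  # fake clock so the example is deterministic
cache = TtlCache(lambda: now[0])
cache.put("api.example.com", "192.0.2.1", ttl_seconds=60)

now[0] = 59.0
still_cached = cache.get("api.example.com")  # the old answer is still served

now[0] = 60.0
after_expiry = cache.get("api.example.com")  # None: time to re-query
```

Whatever Route 53 decides in the meantime, a cache like this keeps handing out the old answer until the TTL runs out.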
2. Hosted Zone — Public vs Private
A Hosted Zone is a settings container in Route 53 for a single domain. All records (A, CNAME, MX, etc.) for example.com live inside its Hosted Zone.
There are two kinds: Public and Private.
2.1 Public Hosted Zone
- A normal domain that can be resolved from anywhere on the internet.
- Becomes active once your domain registrar (Route 53 Domains, GoDaddy, Cloudflare, etc.) delegates the NS records to Route 53.
- Cost: $0.50 per hosted zone per month + $0.40 per million queries.
2.2 Private Hosted Zone
- A domain that only resolves inside specific attached VPCs.
- Example: `internal.example.com` configured as a Private Hosted Zone attached to VPCs A and B → only resolves inside those VPCs; invisible from the internet.
- Use cases: internal microservice DNS, friendly aliases for RDS endpoints, service discovery.
- The same domain can exist as both Public and Private simultaneously — VPC clients see the Private answer, external clients see the Public answer (split-horizon DNS).
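Split-horizon is easier to see as a lookup rule. The sketch below is a toy model, not the Route 53 Resolver: the zone contents, the VPC IDs, and the `resolve` function are all made up for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical zone contents for the same name in both zones.
PUBLIC_ZONE: Dict[str, str] = {"app.example.com": "203.0.113.10"}
PRIVATE_ZONE: Dict[str, str] = {"app.example.com": "10.0.1.25"}
PRIVATE_ZONE_VPCS: Set[str] = {"vpc-a", "vpc-b"}  # VPCs associated with the private zone


def resolve(name: str, source_vpc: Optional[str]) -> Optional[str]:
    """Return the answer a client would see; source_vpc is None for internet clients."""
    if source_vpc in PRIVATE_ZONE_VPCS:
        # Inside an associated VPC the private zone answers for the name.
        return PRIVATE_ZONE.get(name)
    return PUBLIC_ZONE.get(name)


inside_answer = resolve("app.example.com", "vpc-a")
outside_answer = resolve("app.example.com", None)
unassociated_answer = resolve("app.example.com", "vpc-c")
```

Note that `vpc-c` gets the public answer: a VPC that was never associated with the private zone behaves like any other external client.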
2.3 When to pick which
| Situation | Pick |
|---|---|
| General web service domain (external users) | Public Hosted Zone |
| Internal microservice traffic | Private Hosted Zone |
| Marketing externally, different backend internally | Both (split-horizon) |
| RDS / ElastiCache managed endpoints | Private possible (default DNS also works) |
Note on the relationship with VPC Endpoint Private DNS: Part 2 says “Interface Endpoint attaches Private DNS so the service’s official domain resolves to the ENI’s private IP.” That Private DNS uses the same Route 53 Private Hosted Zone mechanism: when Private DNS is enabled on the Endpoint, AWS creates and manages a Private Hosted Zone for it behind the scenes.
3. Record types and Alias vs CNAME
DNS maps a domain to some other piece of information; the kind of mapped information is the record type.
3.1 Common record types
| Type | Maps to | Common use |
|---|---|---|
| A | Domain → IPv4 | Most basic — api.example.com → 54.x.x.x |
| AAAA | Domain → IPv6 | IPv6 dual stack |
| CNAME | Domain → another domain | www.example.com → cdn.cloudfront.net |
| MX | Domain → mail server | Email routing |
| TXT | Domain → string | SPF, DKIM, domain ownership verification |
| NS | Domain → authoritative nameserver | Domain delegation |
| Alias | Domain → AWS resource | AWS-only. The key one |
3.2 Alias — Route 53’s secret weapon
Alias is a non-standard record that lets Route 53 map AWS resources (ALB, CloudFront, S3, API Gateway, Global Accelerator, Elastic Beanstalk, etc.) directly to a domain.
It looks like CNAME on the surface, but the differences are decisive.
3.3 Alias vs CNAME
| | CNAME | Alias |
|---|---|---|
| Maps to | Any domain | AWS resources only (ALB, CloudFront, S3, API Gateway, GA, etc.) |
| Works at the root domain (example.com) | No | Yes |
| DNS query count | Two (CNAME → A lookup of that domain) | One (Route 53 internally returns the IP) |
| Cost | Standard query billing | Free (Alias queries aren’t billed) |
| Tracks AWS-resource IP changes | No (the CNAME stays put) | Yes (auto-followed) |
Two key points:
- You can’t put a CNAME at the root: that’s the DNS standard (RFC 1034). To map `example.com → ALB DNS` you can’t use CNAME. Alias is AWS’s workaround.
- Alias adds no extra query and is free. CNAME forces clients to do a second A lookup; Alias has Route 53 resolve the actual IP internally and respond directly.
Practical trap: trying to map `example.com` to an ALB with CNAME and getting confused when it doesn’t work is one of the most common DNS gotchas. The answer is always Alias: for AWS-resource mappings, root or subdomain, Alias is almost always the right choice.
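For a feel of what an apex Alias looks like outside the console, here is a rough sketch that builds the change batch as a plain dict. The field names follow the Route 53 `ChangeResourceRecordSets` request shape; the ALB DNS name and the `ZEXAMPLEALBZONE` zone ID are placeholders, not real values.

```python
import json
from typing import Any, Dict


def alias_change_batch(record_name: str, target_dns_name: str,
                       target_zone_id: str) -> Dict[str, Any]:
    """Build an UPSERT for an A-type Alias record (no TTL, no ResourceRecords)."""
    return {
        "Comment": "Apex alias to a load balancer (illustrative)",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    # Hosted zone ID of the *target* (the ALB), not of example.com.
                    "HostedZoneId": target_zone_id,
                    "DNSName": target_dns_name,
                    "EvaluateTargetHealth": True,
                },
            },
        }],
    }


batch = alias_change_batch(
    record_name="example.com.",
    target_dns_name="my-alb-123456.us-east-1.elb.amazonaws.com.",  # placeholder
    target_zone_id="ZEXAMPLEALBZONE",                              # placeholder
)
print(json.dumps(batch, indent=2))
```

With boto3, a dict like this would be passed as `ChangeBatch` to `change_resource_record_sets`, together with the `HostedZoneId` of your own zone. The record carries no TTL of its own, which is part of why it tracks the target’s IP changes.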
4. The six Routing Policies — Route 53’s core decision
This is what makes Route 53 a routing tool, not just a DNS server. Multiple records can sit on the same domain, and a policy decides which response to return.
4.1 Simple — single fixed response
- One record = one response. No routing decision.
- The simplest case: `api.example.com → ALB DNS` is a 1:1 mapping.
4.2 Weighted — distribute by weight
- Multiple records on the same domain, each with a weight.
- Example: `api.example.com → 10.0.0.1 (weight 90)`, `→ 10.0.0.2 (weight 10)` → 90% to the first, 10% to the second.
- Use cases: canary deploys (route only 10% to a new version), A/B testing.
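The 90/10 split above can be simulated in a few lines. This is a sketch of the arithmetic only (each record is chosen with probability weight / sum of weights); `pick_weighted` is a made-up helper, not how Route 53 is implemented.

```python
import random
from collections import Counter
from typing import List, Tuple

# Same numbers as the example above: (answer, weight).
RECORDS: List[Tuple[str, int]] = [("10.0.0.1", 90), ("10.0.0.2", 10)]


def pick_weighted(records: List[Tuple[str, int]], rng: random.Random) -> str:
    """Choose one answer with probability weight / total weight."""
    values = [value for value, _ in records]
    weights = [weight for _, weight in records]
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so repeated runs give the same tallies
SAMPLES = 10_000
counts = Counter(pick_weighted(RECORDS, rng) for _ in range(SAMPLES))
canary_share = counts["10.0.0.2"] / SAMPLES  # should hover near 0.10
```

Keep in mind the split applies to DNS responses, not requests: one cached answer at a busy resolver can carry far more than its “share” of traffic.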
4.3 Latency-based — closest region
- Multiple regions running the same service; respond with the IP of the lowest-latency region for the requester.
- Use cases: global multi-region services. US users hit us-east-1, EU hits eu-west-1, Asia hits ap-northeast-2.
- Similar to Global Accelerator’s routing, but Route 53 decides at the DNS layer (GA decides at the anycast IP layer).
4.4 Geolocation — by country / continent
- Different responses based on the requester’s geographic location (country, continent, US state).
- Use cases: country-specific compliance (EU GDPR data must stay in EU), region-specific content (China users get the China region).
- Differs from Latency-based by being political/legal boundary-driven, not distance-driven.
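One way to picture Geolocation matching is “most specific location wins, with a default as the catch-all.” The sketch below assumes that precedence (US state, then country, then continent, then default); the record keys and endpoint labels are invented for the example.

```python
from typing import Dict, List, Optional

# Hypothetical geolocation records for one domain.
GEO_RECORDS: Dict[str, str] = {
    "country:DE": "eu-central-1 endpoint",
    "continent:EU": "eu-west-1 endpoint",
    "default": "us-east-1 endpoint",
}


def pick_geolocation_record(records: Dict[str, str], country: str,
                            continent: str,
                            us_state: Optional[str] = None) -> Optional[str]:
    """Return the record for the most specific location that has one."""
    candidates: List[str] = []
    if us_state is not None and country == "US":
        candidates.append(f"state:US-{us_state}")
    candidates += [f"country:{country}", f"continent:{continent}", "default"]
    for key in candidates:
        if key in records:
            return records[key]
    return None  # no match and no default record: the query gets no answer


german_user = pick_geolocation_record(GEO_RECORDS, "DE", "EU")
french_user = pick_geolocation_record(GEO_RECORDS, "FR", "EU")
japanese_user = pick_geolocation_record(GEO_RECORDS, "JP", "AS")
```

The `default` entry matters in practice: without it, users from unmapped locations would get no answer at all.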
4.5 Geoproximity — geo distance + bias
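- A distance-based cousin of Geolocation. Distance is computed from lat/long, and each resource gets a bias (an integer from -99 to +99) that expands or shrinks the area it serves.
- Use cases: shifting a boundary rather than splitting by percentage, such as enlarging the area served by us-west so that more mid-country users (Texas, say) land there instead of on us-east.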
- The most complex; rarely used in everyday cases.
4.6 Multi-value Answer — random response + health check
- Multiple records on the same domain; return up to 8 IPs in random order per query (only healthy ones).
- Use cases: managed lightweight load balancing — when an ALB is overkill.
- Most clients fall back to the next IP if the first fails (typical browser behavior), though that retry is up to the client, not Route 53.
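The Multi-value behavior described above (filter to healthy records, then return up to 8 in random order) can be sketched like this. The fleet of `192.0.2.x` addresses and the health flags are invented for the example.

```python
import random
from typing import Dict, List

MAX_ANSWERS = 8  # per the text: at most 8 records per response


def multivalue_answer(health_by_ip: Dict[str, bool],
                      rng: random.Random) -> List[str]:
    """Return up to MAX_ANSWERS healthy IPs in random order."""
    healthy = [ip for ip, is_healthy in health_by_ip.items() if is_healthy]
    return rng.sample(healthy, k=min(MAX_ANSWERS, len(healthy)))


# Ten hypothetical records, one of them failing its health check.
fleet = {f"192.0.2.{i}": True for i in range(1, 11)}
fleet["192.0.2.3"] = False

answer = multivalue_answer(fleet, random.Random(7))
```

Nine records are healthy here, so each response carries eight of them and a different one is left out from query to query.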
4.7 Failover — primary + secondary
- Primary record + Secondary record + Health Check.
- If Primary’s health check fails, responses automatically come from Secondary.
- Use cases: disaster recovery (DR), multi-region active-passive.
4.8 Quick-pick table
| Policy | Distribution basis | Common use |
|---|---|---|
| Simple | Single response | Basic 1:1 mapping |
| Weighted | Percentage by weight | Canary / A/B tests |
| Latency | Closest region | Global multi-region |
| Geolocation | Country / continent | Compliance / region-specific content |
| Geoproximity | Lat/long + bias | Precise geo distribution (rare) |
| Multi-value | Random + health | Managed lightweight LB |
| Failover | Primary + secondary | DR / active-passive |
5. Health Check — auto-failover at the DNS layer
A Route 53 Health Check periodically checks whether an endpoint (or a calculated combination of other checks) is alive and removes failing answers from DNS responses.
5.1 Three kinds of Health Check
| Kind | What it does |
|---|---|
| Endpoint | Sends HTTP / HTTPS / TCP to a specific IP / domain on a port every 30s (or 10s), checks status code and string |
| Calculated | Combines multiple other health checks with AND/OR — “OK if 2 of A/B/C pass” |
| CloudWatch Alarm | Tied to a CloudWatch alarm state — health based on CPU or custom metrics |
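The “OK if 2 of A/B/C pass” rule for a Calculated check boils down to a threshold over child results. A minimal sketch, with hypothetical child names and statuses:

```python
from typing import Dict


def calculated_health(children: Dict[str, bool], healthy_threshold: int) -> bool:
    """Healthy when at least healthy_threshold child checks are healthy."""
    return sum(children.values()) >= healthy_threshold


children = {"A": True, "B": False, "C": True}

two_of_three = calculated_health(children, 2)            # the "2 of A/B/C" rule
all_required = calculated_health(children, len(children))  # behaves like AND
any_required = calculated_health(children, 1)            # behaves like OR
```

AND and OR fall out as the two extremes of the same threshold, which is why a single “N of M” setting covers all three cases.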
5.2 Combined with Failover routing
flowchart TB
Q[DNS query: api.example.com] --> RT{Primary health<br/>OK?}
RT -->|"OK"| P[Primary IP response<br/>54.x.x.1]
RT -->|"Fail"| S[Secondary IP response<br/>54.x.x.2]
- If Primary’s health check is failing, queries automatically fail over to Secondary.
- The DNS response itself changes, so clients reconnect to the new IP starting with their next query.
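As a rough sketch, the Primary/Secondary pair in the diagram could be expressed as two record sets in one change batch. Field names follow the Route 53 `ChangeResourceRecordSets` request shape; the IPs, set identifiers, and the health check ID are placeholders (real health check IDs are generated by Route 53).

```python
from typing import Any, Dict, Optional


def failover_record(name: str, ip: str, role: str, set_id: str,
                    health_check_id: Optional[str] = None,
                    ttl: int = 60) -> Dict[str, Any]:
    """Build one failover record set; role is 'PRIMARY' or 'SECONDARY'."""
    record: Dict[str, Any] = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,   # distinguishes records that share a name
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return record


primary = failover_record("api.example.com.", "192.0.2.1", "PRIMARY",
                          "primary", health_check_id="placeholder-health-check-id")
secondary = failover_record("api.example.com.", "192.0.2.2", "SECONDARY",
                            "secondary")

change_batch = {
    "Changes": [{"Action": "UPSERT", "ResourceRecordSet": r}
                for r in (primary, secondary)]
}
```

The health check hangs off the Primary record; that is the signal Route 53 uses to decide when to start answering with Secondary.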
5.3 The catch — DNS TTL latency
The weakness of DNS-based failover is TTL caching: changes don’t propagate instantly. A 60-second TTL means clients keep using the old IP for up to a minute. So:
- Minute-level failover → DNS-based is fine
- Second-level failover required → use Global Accelerator (anycast IP, no TTL impact) or ALB target health checks (drop unhealthy targets at L7)
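A back-of-the-envelope estimate shows why DNS failover lands at minute level. The sketch assumes the endpoint must fail a number of consecutive checks before it counts as unhealthy (3 is used here as an assumed threshold) and ignores any extra propagation delay, so treat the numbers as a lower-bound illustration.

```python
def worst_case_failover_seconds(check_interval_s: int,
                                failure_threshold: int,
                                ttl_s: int) -> int:
    """Time to detect the failure plus time for cached answers to expire."""
    detection_s = check_interval_s * failure_threshold
    return detection_s + ttl_s


# 30s checks, 3 consecutive failures (assumed), 60s TTL.
standard = worst_case_failover_seconds(30, 3, 60)

# Same setup with the faster 10s check interval.
fast = worst_case_failover_seconds(10, 3, 60)

print(f"standard interval: ~{standard}s, fast interval: ~{fast}s")
```

Under these assumptions the standard setup needs about 150 seconds and the fast one about 90, so even the fast interval stays well above “seconds.”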
6. The DNS decision tree
Combine the variables above and you get the DNS decision tree.
flowchart TD
Start([Creating or changing a domain]) --> Q1{External vs internal?}
Q1 -->|External users| Q2{Need root-domain mapping?}
Q1 -->|VPC-only| Private[Private Hosted Zone]
Q2 -->|"Yes, to an AWS resource"| Alias[Alias record]
Q2 -->|No, only subdomains| Q3{Mapping target an AWS resource?}
Q3 -->|Yes| Alias
Q3 -->|No, external domain| CNAME[CNAME record]
Q2 -->|"To a specific IP"| A[A / AAAA record]
Alias --> Q4{Distribute across multiple endpoints?}
A --> Q4
Q4 -->|No| Simple[Simple Routing]
Q4 -->|Yes| Q5{Distribution criterion?}
Q5 -->|"Weighted (canary, A/B)"| Weighted[Weighted]
Q5 -->|"Closest region (latency)"| Latency[Latency-based]
Q5 -->|"Country / continent (compliance)"| Geo[Geolocation]
Q5 -->|"Primary + backup"| FO[Failover + Health Check]
Q5 -->|"Random + health"| MV[Multi-value]
Each branch in one line:
- Q1: External users → Public Hosted Zone. VPC-only → Private.
- Q2-Q3: Mapping to an AWS resource → Alias almost always. External domain → CNAME. Specific IP → A.
- Q4-Q5: Single endpoint → Simple. Multiple endpoints → pick a Routing Policy by distribution criterion: Weighted / Latency / Geolocation / Failover / Multi-value.
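The same tree can be written as three small functions, which is a handy way to check that every branch has an answer. The function names and string labels (`"aws_resource"`, `"weight"`, and so on) are hypothetical; only the outcomes come from the tree above.

```python
from typing import Optional


def pick_hosted_zone(external_users: bool) -> str:
    """Q1: who needs to resolve the name?"""
    return "Public Hosted Zone" if external_users else "Private Hosted Zone"


def pick_record(target: str, at_root: bool = False) -> str:
    """Q2-Q3: target is 'aws_resource', 'external_domain', or 'ip'."""
    if target == "aws_resource":
        return "Alias"
    if target == "external_domain":
        if at_root:
            raise ValueError("CNAME is not allowed at the root domain")
        return "CNAME"
    if target == "ip":
        return "A / AAAA"
    raise ValueError(f"unknown target: {target}")


def pick_policy(criterion: Optional[str]) -> str:
    """Q4-Q5: criterion is None for a single endpoint."""
    policies = {
        None: "Simple",
        "weight": "Weighted",
        "latency": "Latency-based",
        "geography": "Geolocation",
        "primary_backup": "Failover + Health Check",
        "random_healthy": "Multi-value",
    }
    return policies[criterion]


decision = (pick_hosted_zone(True), pick_record("aws_resource", at_root=True),
            pick_policy("primary_backup"))
```

For a public API at the apex with a DR region, `decision` comes out as Public Hosted Zone, Alias, and Failover + Health Check.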
7. Route 53 vs Global Accelerator vs CloudFront — same neighborhood?
All three “route global users to the right endpoint,” so their decision spaces overlap. But they operate at different layers and fit different scenarios.
| | Route 53 (Latency Routing) | Global Accelerator | CloudFront |
|---|---|---|---|
| Layer | DNS (different IP responses) | Anycast IP (network layer) | Edge caching (HTTP layer) |
| Failover speed | Minute-level (DNS TTL) | Second-level (anycast auto-rerouting) | Second-level (edge health) |
| Static IP | No (DNS response varies) | Yes (two permanent anycast IPs) | No |
| Caching | No | No | Yes |
| Uses AWS backbone | (not directly) | Yes (entire path) | Yes (on cache miss) |
| Pricing | Hosted zone + per query | ~$18/month fixed fee ($0.025/hour) + data transfer | Data transfer + per request |
| Best fit | General multi-region routing | Static IP / second-level failover / UDP / gaming | Static assets / HTTP caching |
One-liner picks
| Situation | Pick |
|---|---|
| Simple global routing, cost-sensitive | Route 53 Latency-based |
| Static IP allowlist, UDP, gaming | Global Accelerator |
| Heavy static assets, HTTP caching effective | CloudFront |
| All three combined | DNS (Route 53) → CloudFront → ALB is a common pattern |
The three aren’t really alternatives — they often stack at different layers. Route 53 alias-points to CloudFront, CloudFront’s origin is an ALB, and the ALB sits in front of EC2.
8. Five common anti-patterns
8.1 Trying to CNAME the root domain
Trying to CNAME example.com (the root) to an ALB. The DNS standard forbids CNAME at the apex (RFC 1034). For an AWS resource, the answer is Alias. It’s the very first DNS gotcha most people hit on AWS.
8.2 TTL set absurdly low
“For fast failover, I’ll set TTL to 5 seconds.” Now every caching resolver comes back to Route 53 every few seconds, so query volume climbs sharply. Route 53 charges per query, and users pay the lookup latency far more often. A reasonable TTL is 60–300 seconds; if you really need fast failover, switch to Global Accelerator.
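A quick estimate shows how TTL drives the bill. This sketch assumes a flat $0.40 per million queries (the rate quoted in section 2.1), 1,000 caching resolvers that each re-query exactly once per TTL, and a 30-day month; the resolver count is made up, and real traffic is burstier than this.

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60
PRICE_PER_MILLION_QUERIES_USD = 0.40  # flat rate assumed for simplicity


def monthly_queries(resolver_count: int, ttl_s: int) -> int:
    """Queries per month if every resolver re-queries once per TTL."""
    return resolver_count * SECONDS_PER_30_DAY_MONTH // ttl_s


def monthly_cost_usd(queries: int) -> float:
    return queries / 1_000_000 * PRICE_PER_MILLION_QUERIES_USD


RESOLVERS = 1_000  # hypothetical number of distinct caching resolvers
costs = {ttl: monthly_cost_usd(monthly_queries(RESOLVERS, ttl))
         for ttl in (5, 60, 300)}

for ttl, cost in costs.items():
    print(f"TTL {ttl:>3}s -> ${cost:,.2f} per month")
```

Under these assumptions a 5-second TTL costs roughly twelve times what a 60-second TTL does, since query volume scales inversely with TTL.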
8.3 Failover routing without an attached health check
Configuring Failover routing with a Primary and Secondary but no Health Check: Primary failure won’t trigger failover. Without health information, Route 53 has no basis to declare Primary “failing.” Always pair the Primary record with a Health Check (or, for Alias records, turn on Evaluate Target Health).
8.4 Forgetting CloudFront / ALB Alias’s Hosted Zone ID
When creating an Alias record, CloudFront, ALB, and S3 each have their own Hosted Zone ID. CloudFront is always Z2FDTNDATAQYW2 (global, fixed); ALB differs by region. The console handles this for you, but Terraform / CloudFormation requires it explicitly. Pick the wrong one and the record won’t work, so take the value from the resource’s own attributes rather than typing it by hand.
8.5 Forgetting to attach the Private Hosted Zone
Creating a Private Hosted Zone but skipping the VPC association: the Hosted Zone exists, but no VPC actually resolves it. Sharing a zone across multiple accounts or regions also requires an explicit association for each VPC. Adding a new VPC and forgetting to associate it is a common slip.
Recap
What this post covered:
- Route 53 always runs before the entry points in Parts 1–3. The DNS response decides which IP, region, or instance traffic ends up on.
- Hosted Zones split into Public (internet) vs Private (VPC-internal). They can both serve the same domain (split-horizon).
- Mapping AWS resources is almost always Alias. CNAME can’t sit at the apex and adds extra queries / cost.
- Beyond the default Simple policy, the six Routing Policies are Weighted (canary, A/B) / Latency (global region) / Geolocation (country) / Geoproximity (geo + bias) / Multi-value (random + health) / Failover (primary + secondary). Pick by distribution criterion.
- Health Check delivers DNS-layer auto-failover. But TTL keeps it minute-level — for second-level failover, GA or ALB is the better fit.
Part 4’s goal was to make DNS decisions resolvable through one decision tree. Walk Hosted Zone type → record type → routing policy in order, and almost every case lands on a clear answer.
Series retrospective
This series unpacks AWS network ingress and routing through the lens of “what decision problem does this solve?”, across six parts.
- Part 0 — Primer: network and AWS fundamentals, gathered into one post.
- Part 1 — Picking the entry point that fronts a VPC (ALB / NLB / API Gateway / CloudFront / Global Accelerator). Four decision variables and a decision tree.
- Part 2 — Connecting a VPC to other VPCs, AWS services, and on-prem (VPC Endpoint / PrivateLink / Peering / Transit Gateway / VPN / Direct Connect). The first split is destination type.
- Part 3 — How packets actually flow inside (IGW / NAT GW / Route Tables / SG vs NACL). Less about choosing, more about understanding mechanics.
- Part 4 — DNS decisions and Route 53. The decision that runs before all the entry points.
- Part 5 — Four standard patterns. The closing post that takes Parts 0–4’s decision trees and recombines them into a “where do I start drawing?” layer.
Together, the six posts give you a decision-tree-driven path through “DNS → external entry point → VPC → inside → other systems,” plus four standard patterns to start from on day one. Parts 0–4 do the decomposition; Part 5 does the synthesis. Holding both at once is the starting point for infrastructure design.
Worthwhile follow-ups: security (WAF / Shield / SG / NACL / Network Firewall / GuardDuty / VPC Lattice), cost optimization (VPC traffic-cost patterns), observability (VPC Flow Logs, Reachability Analyzer, Route 53 Resolver Query Logs), multi-account (AWS Organizations + Resource Access Manager + domain delegation). Security gets its own series — AWS VPC Security Guide — because the decision area and narrative are different enough that bundling them here would make the series too heavy.
Appendix. One-page summary
A. Hosted Zone choice
| Situation | Pick |
|---|---|
| Domain for external users | Public Hosted Zone |
| VPC-internal-only domain | Private Hosted Zone |
| Split-horizon (different answers external vs internal) | Both |
B. Record selection
| Mapping target | Use |
|---|---|
| AWS resource (ALB / CloudFront / S3 / API Gateway / GA) | Alias (root or subdomain) |
| External domain (subdomain) | CNAME |
| External domain (root) | Not allowed — find another way |
| Specific IP | A (IPv4) / AAAA (IPv6) |
| Mail server | MX |
| Domain verification / SPF | TXT |
C. Routing Policy in one line
| Policy | Headline |
|---|---|
| Simple | 1:1 mapping, no routing |
| Weighted | Weighted percentage distribution (canary) |
| Latency | Closest region |
| Geolocation | Country / continent |
| Geoproximity | Lat/long + bias (rare) |
| Multi-value | Up to 8 random healthy records |
| Failover | Primary + Secondary + health |
D. Official AWS docs
- Route 53: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/
- Routing Policy comparison: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html
- Health Check: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html
- Hosted Zone: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/hosted-zones-working-with.html
E. Acronyms
| Acronym | Meaning |
|---|---|
| DNS | Domain Name System. Domain-to-IP resolution |
| TTL | Time To Live. How long a DNS response is cached |
| Hosted Zone | A Route 53 settings container for a single domain |
| Alias | An AWS-only record that maps directly to AWS resources |
| Health Check | The mechanism Route 53 uses to verify endpoint health |
| TLD | Top Level Domain (.com, .kr, etc.) |
| NS | Name Server. The record that delegates authority |