AWS VPC Edge Routing Guide Part 2: Connecting a VPC to Other VPCs, AWS Services, and On-Prem — A Decision Tree for VPC Endpoint, PrivateLink, Peering, Transit Gateway, and Direct Connect
Introduction
In the previous post we covered picking the entry point that fronts a VPC. This post handles the next decision — once traffic is inside (or already lives inside) a VPC, how does it reach another VPC, an AWS-managed service, or on-prem (on-premises — your own datacenter or office server room, infrastructure you operate outside the public cloud)?
This decision goes wrong far more often than Part 1’s. There are six candidates, each works on a fundamentally different mechanism, and “looks similar but actually can’t do X” comes up everywhere. Building an N×N mesh of VPC Peerings only to rip it out for Transit Gateway a year later, or routing S3 traffic through a NAT Gateway and quietly burning hundreds of dollars a month — both are common.
- Part 0 — Primer: network and AWS fundamentals
- Part 1 — Picking the entry point: ALB / NLB / API Gateway / CloudFront / Global Accelerator
- Part 2 — VPC-to-VPC and on-prem connectivity: VPC Endpoint / PrivateLink / Peering / Transit Gateway / VPN / Direct Connect (this post)
- Part 3 — Inside the VPC: IGW / NAT GW / Route Tables / Security Group vs NACL
- Part 4 — DNS decisions and Route 53: Hosted Zone / Routing Policy / Alias vs CNAME / Health Check
- Part 5 — Four standard patterns: from decision tree to first sketch
Same target reader as Part 1: backend or infrastructure engineers who’ve built a VPC but can’t explain “what’s the difference between option A and option B” in one line. The goal is that, after this post, “connect this VPC to something” is a 30-second decision.
TL;DR
- The first split is “where does the destination live” — AWS managed service / another VPC / a different organization’s service / on-prem. The candidate set barely overlaps across these four.
- For S3 and DynamoDB, Gateway Endpoint (free) is almost always the answer. Routing S3 traffic through NAT Gateway charges per-GB data processing.
- VPC Peering is 1:1, Transit Gateway is N:N. Past three or four VPCs, a Peering mesh stops being operationally feasible — switch to Transit Gateway.
- To privately expose a service from another organization (or VPC), use PrivateLink. It sidesteps IP-overlap problems and exposes only one service in one direction.
- On-prem starts with VPN and graduates to Direct Connect (dedicated link). The two aren’t competitors — production patterns usually run Direct Connect as primary with a VPN backup.
1. Why this decision is hard
Unlike Part 1’s entry-point candidates, the six VPC-connectivity options do not compete on a single axis. They fall into four buckets, each answering a different decision problem. So the same word “connection” pulls in completely different candidate sets depending on what you’re connecting.
flowchart TB
VPC[Inside a VPC,<br/>need to reach outside]
VPC --> Q1{What are you<br/>connecting to?}
Q1 --> S1[AWS managed services<br/>S3, DynamoDB, KMS, ECR, SSM, ...]
Q1 --> S2[Another VPC<br/>same account/org]
Q1 --> S3[A different org's service<br/>SaaS, partner, sister BU]
Q1 --> S4[On-prem<br/>own DC / office]
S1 -.-> A1[VPC Endpoint]
S2 -.-> A2["VPC Peering or<br/>Transit Gateway"]
S3 -.-> A3[PrivateLink]
S4 -.-> A4["Site-to-Site VPN or<br/>Direct Connect"]
“Connect VPC to X” partitions almost completely by where X lives. So this guide’s decision tree branches on destination type first, then asks about two or three more variables within each bucket.
In one table:
| Destination | Decision variables | Candidates |
|---|---|---|
| AWS managed service | S3/DynamoDB or other? | Gateway Endpoint / Interface Endpoint |
| Same-org VPC | 1:1 or N:N? | VPC Peering / Transit Gateway |
| Other-org service | One-way exposure / IP isolation needed | PrivateLink |
| On-prem | Internet OK / dedicated line needed | Site-to-Site VPN / Direct Connect / both |
Sections 2–5 cover the candidates for each destination type, with their mechanisms and selection criteria. §6 has the full decision tree, §7 the anti-patterns.
2. Reaching AWS managed services — VPC Endpoint
By default, when an EC2 instance inside a VPC talks to an AWS managed service like S3, DynamoDB, KMS, or ECR, traffic leaves the VPC through an internet-facing path: the IGW from a Public Subnet, or a NAT Gateway from a Private one. AWS documents that traffic to its public service endpoints stays on the AWS network, but the round-trip “leave the VPC, come back in” still hurts cost and security.
VPC Endpoint removes that detour. “Talk to AWS services directly from inside the VPC, without traversing the internet” is the one-line definition.
2.1 Gateway Endpoint vs Interface Endpoint
Endpoints come in two flavors, and the choice is automatic based on which service you’re targeting.
| | Gateway Endpoint | Interface Endpoint |
|---|---|---|
| Supported services | S3, DynamoDB only | Most others (KMS, ECR, SSM, CloudWatch, Lambda, …) |
| Mechanism | Route Table + prefix list | ENI in the VPC + Private DNS |
| Cost | Free | $0.01/hour/AZ + $0.01/GB |
| Reachable from another VPC / on-prem | No | Yes (via PrivateLink) |
The rule is simple — S3 and DynamoDB → Gateway Endpoint; everything else → Interface Endpoint.
2.2 Why Gateway Endpoint is free, and the pitfalls
Gateway Endpoint is just “add a prefix-list route for S3/DynamoDB to the Route Table.” Packets exit through that route instead of the IGW or NAT Gateway, and AWS handles the path on its own backbone — no extra infrastructure (no ENI, no traffic processing). That’s why it’s free.
flowchart LR
EC2 -->|"default: internet detour"| NAT[NAT Gateway]
NAT -->|$0.045/GB| S3a[S3 public IP]
EC2 -->|"Gateway Endpoint<br/>route"| S3b[S3 direct]
style S3b stroke:#48cae4,stroke-width:2px
Two pitfalls:
- Same-region only — Gateway Endpoint reaches S3 buckets in the same region only. Cross-region S3 still goes through NAT.
- Useless without a Route Table entry — creating the Endpoint isn’t enough; the prefix list has to be added to the Route Table of the EC2’s Subnet for routing to actually change. 99% of “I created it but nothing changed” comes from this miss.
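The second pitfall is easy to check mechanically. The sketch below is a minimal, offline illustration: it assumes route tables shaped like the entries the EC2 DescribeRouteTables API returns (a `Routes` list whose gateway-endpoint route carries a `DestinationPrefixListId` and a `GatewayId` starting with `vpce-`). The IDs and the helper name `has_gateway_endpoint_route` are hypothetical.

```python
def has_gateway_endpoint_route(route_table: dict) -> bool:
    """Return True if any route sends a prefix list to a gateway endpoint."""
    for route in route_table.get("Routes", []):
        prefix_list = route.get("DestinationPrefixListId", "")
        target = route.get("GatewayId", "")
        # A gateway endpoint route pairs a prefix list (pl-...) with a vpce-... target.
        if prefix_list.startswith("pl-") and target.startswith("vpce-"):
            return True
    return False


# Hypothetical route table of a private subnet: local route plus NAT default route only.
private_rt_missing = {
    "RouteTableId": "rtb-0aaa",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0bbb"},
    ],
}

# Same table after the endpoint has been associated with it.
private_rt_fixed = {
    "RouteTableId": "rtb-0aaa",
    "Routes": private_rt_missing["Routes"]
    + [{"DestinationPrefixListId": "pl-0ccc", "GatewayId": "vpce-0ddd"}],
}

print(has_gateway_endpoint_route(private_rt_missing))  # False
print(has_gateway_endpoint_route(private_rt_fixed))    # True
```

In a real audit the dictionaries would come from the API response for each subnet's associated route table; the check itself stays the same.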
2.3 Interface Endpoint — an ENI inside the VPC
Interface Endpoint works completely differently. It puts an ENI in a Subnet of your VPC and attaches Private DNS so that the service’s official domain (e.g., kms.ap-northeast-2.amazonaws.com) resolves to the ENI’s private IP. The full ENI primer is in Part 0 §3.1 — the one-line takeaway is that an ENI is a virtual NIC living in the VPC with a private IP, and that private IPs, Security Groups, and EIPs all bind to the ENI rather than to the instance directly.
Two implications:
- It costs money: $0.01/hour/AZ + $0.01/GB. Three AZs gives you ~$21/month fixed. Small numbers, but creating Interface Endpoints for every service in a small environment can quietly add up to $100~200/month.
- Endpoint policies for access control: IAM-policy-style rules on which resources can be reached through this Endpoint. Common in compliance-driven environments.
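The fixed-cost arithmetic can be sketched as follows. The rates are the ones quoted in this post; the 730-hour month and the function name `interface_endpoint_monthly` are assumptions for illustration, and actual prices vary by region.

```python
from decimal import Decimal

HOURLY_PER_AZ = Decimal("0.01")  # USD per endpoint per AZ per hour (rate quoted in this post)
PER_GB = Decimal("0.01")         # USD per GB processed (rate quoted in this post)
HOURS_PER_MONTH = 730            # assumption: average month length in hours


def interface_endpoint_monthly(endpoints: int, azs: int, gb: int = 0) -> Decimal:
    """Estimate monthly cost: hourly fee per endpoint per AZ, plus per-GB processing."""
    fixed = HOURLY_PER_AZ * HOURS_PER_MONTH * azs * endpoints
    return fixed + PER_GB * gb


print(interface_endpoint_monthly(endpoints=1, azs=3))  # 21.90
print(interface_endpoint_monthly(endpoints=6, azs=3))  # 131.40
```

Six services across three AZs already lands in the $100~200/month range before any data is processed, which is the "quietly adds up" effect described above.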
Note: Interface Endpoint and PrivateLink are the same mechanism. AWS managed services exposed via PrivateLink → Interface Endpoint; user services exposed the same way → PrivateLink (§4). Same box, different label.
2.4 Aside: how do you test locally with VPC Endpoint?
A frequent practical question. The principle in one line — VPC Endpoint only affects traffic originating inside the VPC. Your laptop sits outside the VPC and reaches AWS over the public internet, so usually plain IAM access keys work for local testing as if Endpoints don’t exist. Endpoint configuration is just a routing change inside the production VPC — it doesn’t alter the access controls of the AWS services themselves.
Where it does break: when bucket or Endpoint policies enforce aws:SourceVpce (“only allow access through this Endpoint”), no credential from outside reaches them. Two patterns cover most cases:
| Pattern | How it works |
|---|---|
| Separate dev account (most common) | Production account locks down with Endpoint-only policies; dev/staging accounts allow public access. Local development uses dev-account credentials. Code stays identical, only the IAM and resource policies differ. |
| LocalStack (offline / CI) | Docker-emulate S3, DynamoDB, SQS locally; point AWS SDK endpoint URL to http://localhost:4566. Zero real AWS calls — best for CI determinism. |
When these aren’t enough — when you need to validate the Endpoint policies themselves, or company policy forces the local path to mirror production — fall back to SSM Session Manager port forwarding, AWS Client VPN to join the VPC from your laptop, or running integration tests inside a VPC-attached CodeBuild project.
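For reference, the kind of `aws:SourceVpce` restriction that breaks local access usually looks like a Deny statement with a `StringNotEquals` condition. The sketch below only builds and prints such a bucket policy; the bucket name, the Endpoint ID, and the helper name `vpce_only_bucket_policy` are hypothetical.

```python
import json


def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Build a bucket policy that denies S3 access unless it arrives via one VPC Endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessThroughEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Requests that do not come through this Endpoint match the Deny.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Hypothetical names, for illustration only.
policy = vpce_only_bucket_policy("example-prod-bucket", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A policy like this also blocks console and laptop access to the bucket, which is exactly why the separate dev account pattern exists.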
3. Same-org VPCs — Peering vs Transit Gateway
Anyone running more than one VPC hits this fork. Splitting dev/staging/prod, separating per team, or building in another region — all create the moment when “these two should be able to talk to each other.”
3.1 VPC Peering — 1:1 direct
VPC Peering is the simplest way: connect two VPCs at L3 by adding routes on each side. Both Route Tables get a route for the other VPC’s CIDR, both sides accept the peering, done.
flowchart LR
VPCa[VPC A<br/>10.0.0.0/16]
VPCb[VPC B<br/>10.1.0.0/16]
VPCa <-->|Peering| VPCb
Properties and limits:
- Almost zero cost — Peering itself is free; only data transfer is billed.
- Non-transitive — A↔B and B↔C don’t automatically give you A↔C. You need a separate Peering for that.
- CIDRs cannot overlap — overlapping IP ranges make Peering impossible.
- Mesh explodes with VPC count — N VPCs full-mesh = N(N-1)/2 peerings. 5 → 10. 10 → 45.
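Two of these limits are easy to check before drawing any diagram. The sketch below computes the full-mesh peering count and flags overlapping CIDRs using Python's standard `ipaddress` module. The VPC names and ranges are made up for illustration.

```python
import ipaddress


def full_mesh_peerings(vpc_count: int) -> int:
    """Number of peering connections needed for a full mesh: N(N-1)/2."""
    return vpc_count * (vpc_count - 1) // 2


def overlapping_pairs(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VPC names whose CIDR ranges overlap."""
    names = sorted(cidrs)
    nets = {name: ipaddress.ip_network(cidrs[name]) for name in names}
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


print(full_mesh_peerings(5))   # 10
print(full_mesh_peerings(10))  # 45

# Hypothetical VPCs: vpc-c sits inside vpc-a's range, so those two cannot be peered.
vpcs = {"vpc-a": "10.0.0.0/16", "vpc-b": "10.1.0.0/16", "vpc-c": "10.0.128.0/17"}
print(overlapping_pairs(vpcs))  # [('vpc-a', 'vpc-c')]
```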
3.2 Transit Gateway — N:N hub
Transit Gateway (TGW) is a hub model where all VPCs and on-prem connections attach to a single Transit Gateway. Each attachment routes through TGW Route Tables. Adding a new VPC means attaching it to TGW — none of the existing VPCs need to change.
flowchart TB
TGW[Transit Gateway<br/>hub]
VPCa[VPC A]
VPCb[VPC B]
VPCc[VPC C]
VPCd[VPC D]
DC[Direct Connect /<br/>VPN]
VPCa --- TGW
VPCb --- TGW
VPCc --- TGW
VPCd --- TGW
DC --- TGW
Differences that matter:
- Transitive — A→B→C routing is just a TGW Route Table setting away.
- On-prem / DX / VPN integration — attach DX/VPN to TGW once, all VPCs share it.
- Multi-AZ HA happens per attachment — the TGW itself is region-scoped and managed (auto-HA), but each VPC attachment should be configured with ENIs across AZs (a single-AZ failure then only impacts that AZ’s VPC traffic).
- It costs — $0.05/hour per attachment + $0.02/GB. Five VPCs + one DX = ~$200+/month.
3.3 Where they split
| Variable | VPC Peering | Transit Gateway |
|---|---|---|
| Connection count | 1:1 (or 2~3 full mesh) | 4+ |
| Transitive | No | Yes |
| Cost | $0 (data only) | $0.05/hour per attachment |
| On-prem integration | Configured separately | One TGW attach, all VPCs share |
| CIDR overlap | Not allowed | Overlapping VPCs can attach but cannot route to each other; keep them apart with separate TGW Route Tables |
| Operational complexity | Explodes with VPC count | Centralized |
Practical crossover: around 3~4 VPCs. Below that, Peering’s zero cost wins. Above, mesh management eats your operations and TGW’s fee earns its keep.
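To make the crossover concrete, the sketch below puts the number of peerings to manage next to the TGW bill for the same VPC count. The rates are the ones quoted in this post; the 730-hour month and the function names are assumptions, and real prices vary by region.

```python
from decimal import Decimal

TGW_ATTACHMENT_HOURLY = Decimal("0.05")  # USD per attachment per hour (rate quoted in this post)
TGW_PER_GB = Decimal("0.02")             # USD per GB processed (rate quoted in this post)
HOURS_PER_MONTH = 730                    # assumption: average month length in hours


def full_mesh_peerings(vpc_count: int) -> int:
    """Peering connections needed for a full mesh of N VPCs."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_monthly(attachments: int, gb: int = 0) -> Decimal:
    """Estimate monthly TGW cost: per-attachment hourly fee plus data processing."""
    return TGW_ATTACHMENT_HOURLY * HOURS_PER_MONTH * attachments + TGW_PER_GB * gb


for vpcs in (3, 5, 10):
    print(f"{vpcs} VPCs: {full_mesh_peerings(vpcs)} peerings vs TGW fixed ${tgw_monthly(vpcs):.2f}/month")

# Five VPCs plus one DX attachment, as in section 3.2.
print(f"${tgw_monthly(6):.2f}/month")  # $219.00/month
```

The TGW fee grows linearly with VPC count while the peering count grows quadratically, which is why the fee starts to look cheap once the mesh passes a handful of VPCs.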
4. Other-org services — PrivateLink
Section 3 covers within-org VPC connections. But there’s a separate scenario — privately calling a service from another organization (a SaaS vendor, a partner, or even another business unit at the same company) from your own VPC. PrivateLink is built for that decision problem.
4.1 What PrivateLink solves
flowchart LR
subgraph CON["Consumer VPC (caller)"]
EC2[EC2]
IE[Interface Endpoint<br/>ENI]
end
subgraph PROV["Provider VPC (service owner)"]
NLB[NLB]
SVC[Service]
end
EC2 -->|"private DNS"| IE
IE -.PrivateLink.-> NLB
NLB --> SVC
The mechanism is straightforward. The Provider defines a “Service” in front of an NLB; the Consumer creates an Interface Endpoint in their VPC connected to that Service. The Consumer talks to a domain that resolves to an ENI in their own VPC; the Provider only exposes their NLB.
PrivateLink’s wins versus Peering / TGW:
- IP overlap is irrelevant — it works even when Consumer/Provider VPC CIDRs collide. The Consumer just talks to an ENI in its own VPC.
- One-way and service-scoped — Provider exposes only the service behind the specified NLB; the rest of the VPC stays private. Peering exposes the entire VPC.
- Consumer-side SG control — attach an SG to the Endpoint ENI to restrict which Consumer-side EC2s can call.
4.2 PrivateLink vs Peering — when each wins
Even within the same organization, PrivateLink is sometimes the right call.
| Situation | Pick |
|---|---|
| Two VPCs need full bidirectional VPC-wide access | Peering / TGW |
| One VPC calls a single service in another VPC | PrivateLink |
| CIDRs overlap | PrivateLink (or TGW + NAT) |
| Calling a service in another AWS account/org | PrivateLink (essentially the only answer) |
| Privately calling a SaaS vendor’s service | PrivateLink (if they support it) |
Summary: “VPC-wide” → Peering/TGW. “One service” → PrivateLink.
5. On-prem — VPN and Direct Connect
Connecting on-prem to a VPC narrows to two options: Site-to-Site VPN and Direct Connect. They’re not competitors — they’re a “low-barrier vs high-performance” progression.
5.1 Site-to-Site VPN — IPsec tunnel over the internet
Site-to-Site VPN sets up IPsec (IP Security — a network-layer protocol that encrypts packets and verifies their integrity) tunnels over the public internet between AWS-side Virtual Private Gateway (VGW) or TGW and your on-prem router. Two IPsec tunnels come up automatically; both sides exchange routes via BGP or static routing.
- Up in days — no hardware orders, just router configuration.
- Bandwidth is bounded by your internet circuit and capped at roughly 1.25 Gbps per tunnel.
- Latency exposed to internet routing volatility — fine most days, but ISP issues show through.
- Pricing: $0.05/hour per VPN connection. Each connection includes both tunnels, so the built-in redundancy does not double the hourly fee.
5.2 Direct Connect — dedicated fiber to AWS
Direct Connect (DX) is actual fiber laid to an AWS facility (or DX Location) — a dedicated circuit.
- 1, 10, or 100 Gbps options — bandwidth VPN can’t match.
- Stable latency — no internet routing in the picture.
- Different pricing model — port-hour fee plus data transfer. But AWS-to-on-prem outbound on DX is much cheaper than over the internet, so high-volume traffic ends up cheaper than VPN.
- Takes time — circuit ordering and physical install takes weeks to months.
5.3 They complement each other — DX primary + VPN backup
The standard production pattern is “DX primary, VPN backup.”
flowchart LR
subgraph AWS
TGW[Transit Gateway]
end
subgraph OnPrem[On-prem DC]
Router[Router]
end
Router ===|DX primary| TGW
Router -.VPN backup.-> TGW
Reasons:
- DX is a single physical circuit — one cable cut and you’re down. With VPN configured, BGP fails over automatically.
- VPN is your fast start — bring up VPN first while DX is being provisioned, promote DX to primary later.
- BGP picks the path automatically — no manual ops involvement; availability is built in.
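The failover behavior can be pictured with a toy model. This is not BGP: real best-path selection weighs prefix length, local preference, AS path length, and more. The sketch only shows the outcome described above, using a single hypothetical `preference` number per path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    name: str
    preference: int  # higher wins; a stand-in for real BGP attributes
    up: bool         # whether the session on this path is currently established


def active_path(paths: list[Path]) -> Optional[str]:
    """Pick the most preferred path that is still up, or None if all are down."""
    candidates = [p for p in paths if p.up]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.preference).name


normal = [Path("dx", preference=200, up=True), Path("vpn", preference=100, up=True)]
fiber_cut = [Path("dx", preference=200, up=False), Path("vpn", preference=100, up=True)]

print(active_path(normal))     # dx
print(active_path(fiber_cut))  # vpn
```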
| | Site-to-Site VPN | Direct Connect |
|---|---|---|
| Medium | Internet + IPsec | Dedicated fiber |
| Bandwidth | ~1.25 Gbps per tunnel | 1 / 10 / 100 Gbps |
| Latency | Subject to internet routing | Stable |
| Build time | Days | Weeks to months |
| Pricing | $0.05/hour per connection (two tunnels) | Port-hour + data (cheaper at scale) |
| When to pick | PoC, mid-scale, backup | Production primary |
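A quick way to feel the bandwidth row is to compute how long a bulk transfer takes at each line rate. The sketch below assumes ideal, fully utilized links and decimal units (1 TB = 10^12 bytes); real throughput will be lower. The function name `transfer_hours` is hypothetical.

```python
def transfer_hours(terabytes: float, gbps: float) -> float:
    """Hours to move a dataset at a given line rate, ignoring protocol overhead."""
    bits = terabytes * 1e12 * 8       # decimal terabytes to bits
    seconds = bits / (gbps * 1e9)     # gigabits per second to bits per second
    return seconds / 3600


# 10 TB over one VPN tunnel (~1.25 Gbps cap) vs a 10 Gbps DX port.
print(f"{transfer_hours(10, 1.25):.1f} h")  # 17.8 h
print(f"{transfer_hours(10, 10):.1f} h")    # 2.2 h
```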
6. The decision tree
The four per-destination decisions combine into:
flowchart TD
Start([Need to communicate from inside a VPC]) --> Q1{Connecting to what?}
Q1 -->|AWS managed service| QA{S3 or DynamoDB?}
QA -->|Yes| GE[Gateway Endpoint<br/>free]
QA -->|No| IE[Interface Endpoint]
Q1 -->|Same-org other VPC| QB{One service only?<br/>Overlapping CIDRs?}
QB -->|Yes| PL1[PrivateLink]
QB -->|No| QC{4+ VPCs?<br/>On-prem integration?}
QC -->|Yes| TGW[Transit Gateway]
QC -->|No| Peer[VPC Peering]
Q1 -->|Other-org service| PL2[PrivateLink]
Q1 -->|On-prem| QD{Dedicated line + high bandwidth?}
QD -->|Yes| QE{Production primary?}
QE -->|Yes| DXVPN[Direct Connect<br/>+ VPN backup]
QE -->|No| DX[Direct Connect]
QD -->|No| VPN[Site-to-Site VPN]
Each branch in one line:
- QA (S3/DynamoDB): always Gateway Endpoint. Routing through NAT Gateway just leaks data-processing fees.
- QB (single service / CIDR overlap): either condition → PrivateLink wins decisively.
- QC (VPC count, on-prem integration): under 4 + no on-prem → Peering. Otherwise TGW.
- QD (bandwidth/latency demands): internet circuit suffices → VPN. Need dedicated line → DX.
- QE (production primary?): production primary should be DX + VPN backup, not DX alone.
Key: The first-level branch is destination type, not cost or features. Once that’s set, the candidate set drops to one or two immediately.
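The tree is small enough to write down as a function. The sketch below is one possible encoding of branches QA through QE. The `Need` fields and the `choose` function are hypothetical names, and the thresholds (4+ VPCs, on-prem integration) are the rules of thumb from this post rather than hard limits.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Destination(Enum):
    AWS_SERVICE = auto()
    SAME_ORG_VPC = auto()
    OTHER_ORG_SERVICE = auto()
    ON_PREM = auto()


@dataclass(frozen=True)
class Need:
    destination: Destination
    service: str = ""                   # e.g. "s3", "kms" (AWS_SERVICE only)
    single_service_only: bool = False   # QB
    cidrs_overlap: bool = False         # QB
    vpc_count: int = 2                  # QC
    onprem_integration: bool = False    # QC
    needs_dedicated_line: bool = False  # QD
    production_primary: bool = False    # QE


def choose(need: Need) -> str:
    """Walk the decision tree: destination type first, then the per-bucket questions."""
    if need.destination is Destination.AWS_SERVICE:
        # QA: S3 and DynamoDB get the free Gateway Endpoint.
        if need.service.lower() in {"s3", "dynamodb"}:
            return "Gateway Endpoint"
        return "Interface Endpoint"
    if need.destination is Destination.SAME_ORG_VPC:
        # QB: a single service or overlapping CIDRs points to PrivateLink.
        if need.single_service_only or need.cidrs_overlap:
            return "PrivateLink"
        # QC: many VPCs or shared on-prem connectivity points to Transit Gateway.
        if need.vpc_count >= 4 or need.onprem_integration:
            return "Transit Gateway"
        return "VPC Peering"
    if need.destination is Destination.OTHER_ORG_SERVICE:
        return "PrivateLink"
    # ON_PREM. QD: internet circuit is enough, so VPN.
    if not need.needs_dedicated_line:
        return "Site-to-Site VPN"
    # QE: a production primary link should not run on DX alone.
    return "Direct Connect + VPN backup" if need.production_primary else "Direct Connect"


print(choose(Need(Destination.AWS_SERVICE, service="s3")))    # Gateway Endpoint
print(choose(Need(Destination.SAME_ORG_VPC, vpc_count=6)))    # Transit Gateway
print(choose(Need(Destination.SAME_ORG_VPC, cidrs_overlap=True)))  # PrivateLink
```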
7. Five common anti-patterns
7.1 NAT Gateway for S3 access
The most common and most expensive leak. S3 traffic through a NAT Gateway costs $0.045/GB, and analytics, log shipping, and image upload workloads run hundreds of GB to TBs monthly — that just gets added to the bill. A Gateway Endpoint plus one prefix-list line in the Route Table is a five-minute change that saves hundreds of dollars a month. (For a side-by-side cost comparison of Gateway Endpoint vs. NAT GW vs. IGW vs. Peering / TGW, see Part 0 Appendix I.)
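The size of the leak is simple arithmetic. The sketch below uses the $0.045/GB processing rate quoted above and ignores the NAT Gateway's hourly charge. The monthly volumes are made-up examples.

```python
from decimal import Decimal

NAT_PROCESSING_PER_GB = Decimal("0.045")  # USD per GB through NAT Gateway (rate quoted in this post)


def nat_processing_monthly(gb: int) -> Decimal:
    """Data-processing fee for pushing S3 traffic through a NAT Gateway."""
    return NAT_PROCESSING_PER_GB * gb


# Hypothetical monthly S3 volumes. Via a Gateway Endpoint the same traffic costs $0.
for gb in (500, 5000, 20000):
    print(f"{gb} GB/month via NAT: ${nat_processing_monthly(gb):.2f}")
# 500 GB/month via NAT: $22.50
# 5000 GB/month via NAT: $225.00
# 20000 GB/month via NAT: $900.00
```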
7.2 N:N mesh of VPC Peerings
Five-plus VPCs all needing to talk to each other, drawn as a full Peering mesh. Route Tables explode and every new VPC means touching every existing VPC’s Route Table. Move to TGW — yes, attachments cost, but the operational complexity reduction more than pays for itself.
7.3 Direct Connect alone
Running only one DX circuit and going dark when the fiber is cut. A single DX connection is one physical circuit with no built-in redundancy, so on its own it is a single point of failure. Pair it with at least a VPN backup; for higher availability, run two DX circuits over separate physical paths or use a second DX Location.
7.4 Interface Endpoints in every AZ
“For AZ separation” — putting an Interface Endpoint in every AZ. $0.01/hour/AZ adds up; small services can leak tens of dollars a month doing this. Single-AZ environments (dev, staging) or low-traffic services usually only need one or two AZs. Even in production, look at traffic patterns first.
7.5 Solving a PrivateLink problem with Peering
Calling a single service from another org (or BU) but solving it with VPC Peering. Peering exposes the entire VPC — overkill for security, and impossible if the two orgs’ CIDRs collide. Anything that fits PrivateLink and is built on Peering will get torn out the moment a security review lands.
Recap
What this post covered:
- The first split is where the destination lives: AWS managed service / same-org VPC / other-org service / on-prem. The four buckets barely share candidates.
- S3/DynamoDB → Gateway Endpoint, everything else → Interface Endpoint. Gateway is free and almost always right; Interface costs per AZ, so be deliberate.
- VPC Peering vs Transit Gateway is “1:1 vs N:N.” Past 3~4 VPCs, TGW is operationally non-negotiable.
- Other-org services, CIDR collisions, single-service exposure → PrivateLink. It solves problems Peering and TGW can’t.
- On-prem starts with VPN, graduates to DX, but production runs both — DX primary + VPN backup.
Part 2’s goal was to make “VPC needs to talk to something outside” a 30-second decision. The decision tree narrows to one or two candidates in step one, and lands on exactly one by step three.
Part 3 covers routing inside the VPC — how IGW and NAT Gateway actually work, the priority order Route Tables evaluate in, and where stateful Security Groups and stateless NACLs split in practice. After ingress and external connectivity are settled, how packets actually flow inside the VPC is what’s left.
Note — series flow: Every connectivity decision here also runs after a DNS step, which is covered in Part 4 (DNS and Route 53). §2.3 mentions that Interface Endpoint relies on “Private DNS” — that Private DNS is in fact a Route 53 Private Hosted Zone (Part 4 §2.2). Pinning that connection makes the whole series fit together.
Appendix. One-page summary
A. One-line decision per destination type
| Destination | First pick | If first pick doesn’t fit |
|---|---|---|
| S3 / DynamoDB | Gateway Endpoint | (no alternative — always pick it) |
| Other AWS managed services | Interface Endpoint | Internet path (NAT GW) |
| Same-org VPC, 1:1 | VPC Peering | TGW |
| Same-org VPC, N:N | Transit Gateway | (mesh Peering is an anti-pattern) |
| Other-org service | PrivateLink | (essentially no alternative) |
| On-prem PoC / mid-scale | Site-to-Site VPN | DX |
| On-prem production primary | DX + VPN backup | (DX alone is an anti-pattern) |
B. Pricing in one line
| Candidate | Idle cost | Data cost |
|---|---|---|
| Gateway Endpoint | $0 | $0 (same region) |
| Interface Endpoint | $0.01/hour/AZ | $0.01/GB |
| VPC Peering | $0 | $0.01/GB cross-AZ, free same-AZ |
| Transit Gateway | $0.05/hour per attachment | $0.02/GB |
| PrivateLink (Provider) | NLB cost + Endpoint Service | NLB data |
| Site-to-Site VPN | $0.05/hour per connection (two tunnels) | Standard outbound |
| Direct Connect | Port-hour (capacity-based) | DX outbound (cheaper than internet) |
C. Official AWS docs
- VPC Endpoint: https://docs.aws.amazon.com/vpc/latest/privatelink/
- VPC Peering: https://docs.aws.amazon.com/vpc/latest/peering/
- Transit Gateway: https://docs.aws.amazon.com/vpc/latest/tgw/
- Direct Connect: https://docs.aws.amazon.com/directconnect/latest/UserGuide/
- Site-to-Site VPN: https://docs.aws.amazon.com/vpn/latest/s2svpn/
D. Acronyms
AWS services and components
| Acronym | Meaning |
|---|---|
| VPC | Virtual Private Cloud. An isolated virtual network inside AWS |
| EC2 | Elastic Compute Cloud. AWS virtual servers |
| RDS | Relational Database Service. AWS-managed RDB |
| Lambda | AWS serverless compute |
| S3 | Simple Storage Service. AWS object storage |
| DynamoDB | AWS-managed NoSQL key-value database |
| KMS | Key Management Service. AWS-managed encryption keys |
| ECR | Elastic Container Registry. AWS container image registry |
| SSM | AWS Systems Manager. Unified EC2 ops (Session Manager etc.) |
| ALB / NLB | Application / Network Load Balancer (L7 / L4) |
Connectivity and routing
| Acronym | Meaning |
|---|---|
| VPC Endpoint | A path inside the VPC to AWS services without going through the internet (Gateway / Interface variants) |
| PrivateLink | A one-way connection that exposes a service behind an NLB and is consumed via an ENI in the consumer’s VPC |
| TGW | Transit Gateway. N:N VPC and on-prem hub |
| DX | Direct Connect. Dedicated fiber to AWS |
| VPN | Virtual Private Network. Here, Site-to-Site VPN |
| VGW | Virtual Private Gateway. AWS-side VPN endpoint |
| IGW | Internet Gateway. The bidirectional gateway between VPC and the internet |
| NAT | Network Address Translation. Private-IP-to-public-IP translation |
| NACL | Network Access Control List. Subnet-level stateless firewall |
| SG | Security Group. ENI-level stateful firewall |
Network basics
| Acronym | Meaning |
|---|---|
| ENI | Elastic Network Interface. Virtual NIC with a private IP inside a VPC |
| NIC | Network Interface Card. A network adapter (physical or virtual) |
| CIDR | Classless Inter-Domain Routing. IP-range notation startIP/prefix-length (e.g., 10.0.0.0/16) |
| BGP | Border Gateway Protocol. Dynamic routing protocol |
| IPsec | IP Security. Network-layer protocol for packet encryption and integrity |
| L3 | OSI network layer (IP) |
| AZ | Availability Zone. Datacenter unit within a region |
General
| Acronym | Meaning |
|---|---|
| SaaS | Software as a Service. Managed software services (Salesforce, Datadog, etc.) |
| PoC | Proof of Concept. A small-scale implementation to validate feasibility |
| On-prem / On-premises | Your own datacenter or office server room — infrastructure you operate outside a public cloud like AWS |