AWS VPC Edge Routing Guide Part 2: Connecting a VPC to Other VPCs, AWS Services, and On-Prem — A Decision Tree for VPC Endpoint, PrivateLink, Peering, Transit Gateway, and Direct Connect


Introduction

In the previous post we covered picking the entry point that fronts a VPC. This post handles the next decision — once traffic is inside (or already lives inside) a VPC, how does it reach another VPC, an AWS-managed service, or on-prem (on-premises — your own datacenter or office server room, infrastructure you operate outside the public cloud)?

This decision goes wrong far more often than Part 1’s. There are six candidates, each works on a fundamentally different mechanism, and “looks similar but actually can’t do X” comes up everywhere. Building an N×N mesh of VPC Peerings only to rip it out for Transit Gateway a year later, or routing S3 traffic through a NAT Gateway and quietly burning hundreds of dollars a month — both are common.

  • Part 0 — Primer: network and AWS fundamentals
  • Part 1 — Picking the entry point: ALB / NLB / API Gateway / CloudFront / Global Accelerator
  • Part 2 — VPC-to-VPC and on-prem connectivity: VPC Endpoint / PrivateLink / Peering / Transit Gateway / VPN / Direct Connect (this post)
  • Part 3 — Inside the VPC: IGW / NAT GW / Route Tables / Security Group vs NACL
  • Part 4 — DNS decisions and Route 53: Hosted Zone / Routing Policy / Alias vs CNAME / Health Check
  • Part 5 — Four standard patterns: from decision tree to first sketch

Same target reader as Part 1 — backend or infrastructure engineers who’ve built a VPC but can’t explain “what’s the difference between option A and option B” in one line. After this post, the goal is that “connect this VPC to something” is a 30-second decision.


TL;DR

  • The first split is “where does the destination live” — AWS managed service / another VPC / a different organization’s service / on-prem. The candidate set barely overlaps across these four.
  • For S3 and DynamoDB, Gateway Endpoint (free) is almost always the answer. Routing S3 traffic through NAT Gateway charges per-GB data processing.
  • VPC Peering is 1:1, Transit Gateway is N:N. Past three or four VPCs, a Peering mesh stops being operationally feasible — switch to Transit Gateway.
  • To privately expose a service from another organization (or VPC), use PrivateLink. It sidesteps IP-overlap problems and exposes only one service in one direction.
  • On-prem starts with VPN and graduates to Direct Connect (dedicated link). The two aren’t competitors — production patterns usually run Direct Connect as primary with a VPN backup.

1. Why this decision is hard

Unlike Part 1’s entry-point candidates, the six VPC-connectivity options fall into four buckets, one per underlying decision problem. So the same word “connection” pulls in completely different candidate sets depending on what you’re connecting.

flowchart TB
    VPC[Inside a VPC,<br/>need to reach outside]
    VPC --> Q1{What are you<br/>connecting to?}
    Q1 --> S1[AWS managed services<br/>S3, DynamoDB, KMS, ECR, SSM, ...]
    Q1 --> S2[Another VPC<br/>same account/org]
    Q1 --> S3[A different org's service<br/>SaaS, partner, sister BU]
    Q1 --> S4[On-prem<br/>own DC / office]
    S1 -.-> A1[VPC Endpoint]
    S2 -.-> A2["VPC Peering or<br/>Transit Gateway"]
    S3 -.-> A3[PrivateLink]
    S4 -.-> A4["Site-to-Site VPN or<br/>Direct Connect"]

“Connect VPC to X” partitions almost completely by where X lives. So this guide’s decision tree puts destination type as the first branch, then asks two or three more questions inside each bucket.

In one table:

| Destination | Decision variables | Candidates |
|---|---|---|
| AWS managed service | S3/DynamoDB or other? | Gateway Endpoint / Interface Endpoint |
| Same-org VPC | 1:1 or N:N? | VPC Peering / Transit Gateway |
| Other-org service | One-way exposure / IP isolation needed | PrivateLink |
| On-prem | Internet OK / dedicated line needed | Site-to-Site VPN / Direct Connect / both |

Sections 2–5 cover the candidates for each destination bucket with their mechanisms and selection criteria. §6 has the full decision tree, §7 the anti-patterns.


2. Reaching AWS managed services — VPC Endpoint

By default, when an EC2 inside a VPC talks to an AWS managed service like S3, DynamoDB, KMS, or ECR, traffic goes out to the internet — through IGW from a Public Subnet, or through NAT Gateway from a Private one. Both eventually return to the AWS backbone, but the round-trip “leave the VPC, come back in” hurts cost and security.

VPC Endpoint removes that detour. “Talk to AWS services directly from inside the VPC, without traversing the internet” is the one-line definition.

2.1 Gateway Endpoint vs Interface Endpoint

Endpoints come in two flavors, and the choice is automatic based on which service you’re targeting.

| | Gateway Endpoint | Interface Endpoint |
|---|---|---|
| Supported services | S3, DynamoDB only | Most others (KMS, ECR, SSM, CloudWatch, Lambda, …) |
| Mechanism | Route Table + prefix list | ENI in the VPC + Private DNS |
| Cost | Free | $0.01/hour/AZ + $0.01/GB |
| Reachable from another VPC / on-prem | No | Yes (via PrivateLink) |

The rule is simple — S3 and DynamoDB → Gateway Endpoint; everything else → Interface Endpoint.

2.2 Why Gateway Endpoint is free, and the pitfalls

Gateway Endpoint is just “add a prefix-list route for S3/DynamoDB to the Route Table.” Packets exit through that route instead of the IGW or NAT Gateway, and AWS handles the path on its own backbone — no extra infrastructure (no ENI, no traffic processing). That’s why it’s free.

flowchart LR
    EC2 -->|"default: internet detour"| NAT[NAT Gateway]
    NAT -->|$0.045/GB| S3a[S3 public IP]
    EC2 -->|"Gateway Endpoint<br/>route"| S3b[S3 direct]
    style S3b stroke:#48cae4,stroke-width:2px

Two pitfalls:

  • Same-region only — Gateway Endpoint reaches S3 buckets in the same region only. Cross-region S3 still goes through NAT.
  • Useless without a Route Table entry — creating the Endpoint isn’t enough; it has to be associated with the Route Table of the EC2’s Subnet, which is what adds the prefix-list route and actually changes routing. Most “I created it but nothing changed” reports come from this miss.
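The Route Table pitfall can be guarded against at creation time. Below is a minimal sketch, assuming boto3's EC2 `create_vpc_endpoint` call is used; the helper names, the VPC ID, and the route table ID are hypothetical placeholders. The client is passed in as an argument so the sketch carries no hard dependency on boto3.

```python
from typing import Any


def gateway_endpoint_request(vpc_id: str, region: str,
                             route_table_ids: list[str],
                             service: str = "s3") -> dict[str, Any]:
    """Build keyword arguments for EC2 CreateVpcEndpoint (Gateway type)."""
    if not route_table_ids:
        # The API accepts an endpoint with no route tables, but then no
        # routing changes. Rejecting it here is this sketch's own choice.
        raise ValueError("associate at least one route table")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "RouteTableIds": list(route_table_ids),
    }


def create_gateway_endpoint(ec2_client: Any, **kwargs: Any) -> Any:
    """ec2_client is assumed to be boto3.client("ec2") or a test double."""
    return ec2_client.create_vpc_endpoint(**gateway_endpoint_request(**kwargs))


# Hypothetical IDs, for illustration only.
request = gateway_endpoint_request(
    vpc_id="vpc-0example",
    region="ap-northeast-2",
    route_table_ids=["rtb-0example"],
)
print(request["ServiceName"])
```

Passing the private Subnets' route tables in `RouteTableIds` is what makes AWS add the prefix-list route; the same-region limit still applies.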

2.3 Interface Endpoint — an ENI inside the VPC

Interface Endpoint works completely differently. It puts an ENI in a Subnet of your VPC and attaches Private DNS so that the service’s official domain (e.g., kms.ap-northeast-2.amazonaws.com) resolves to the ENI’s private IP. The full ENI primer is in Part 0 §3.1 — the one-line takeaway is that an ENI is a virtual NIC living in the VPC with a private IP, and that private IPs, Security Groups, and EIPs all bind to the ENI rather than to the instance directly.

Two implications:

  • It costs money — $0.01/hour/AZ + $0.01/GB. Three AZs gives you about $22/month fixed per Endpoint. Small numbers, but creating Interface Endpoints for every service in a small environment can quietly add up to $100 to $200/month.
  • Endpoint policies for access control — IAM-policy-style rules on which resources can be reached through this Endpoint. Common in compliance-driven environments.
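The fixed-cost arithmetic is easy to check. A minimal sketch, assuming the per-AZ and per-GB rates quoted above and a 730-hour month (both vary by region and calendar month); the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def interface_endpoint_monthly_cost(endpoints: int, azs: int, gb_processed: float,
                                    hourly_per_az: float = 0.01,
                                    per_gb: float = 0.01) -> float:
    """Fixed ENI-hours plus per-GB data processing, in USD."""
    fixed = endpoints * azs * hourly_per_az * HOURS_PER_MONTH
    return fixed + gb_processed * per_gb


# One endpoint across three AZs, no traffic.
print(round(interface_endpoint_monthly_cost(1, 3, 0), 2))
# Seven services across three AZs, no traffic.
print(round(interface_endpoint_monthly_cost(7, 3, 0), 2))
```

The fixed portion scales with endpoints times AZs, which is why a handful of services in three AZs reaches the low hundreds of dollars before any traffic flows.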

Note: Interface Endpoint and PrivateLink are the same mechanism. AWS managed services exposed via PrivateLink → Interface Endpoint; user services exposed the same way → PrivateLink (§4). Same box, different label.

2.4 Aside: how do you test locally with VPC Endpoint?

A frequent practical question. The principle in one line — VPC Endpoint only affects traffic originating inside the VPC. Your laptop sits outside the VPC and reaches AWS over the public internet, so usually plain IAM access keys work for local testing as if Endpoints don’t exist. Endpoint configuration is just a routing change inside the production VPC — it doesn’t alter the access controls of the AWS services themselves.

Where it does break: when bucket or Endpoint policies enforce aws:SourceVpce (“only allow access through this Endpoint”), no credential from outside reaches them. Two patterns cover most cases:

| Pattern | How it works |
|---|---|
| Separate dev account (most common) | Production account locks down with Endpoint-only policies; dev/staging accounts allow public access. Local development uses dev-account credentials. Code stays identical, only the IAM and resource policies differ. |
| LocalStack (offline / CI) | Docker-emulate S3, DynamoDB, SQS locally; point AWS SDK endpoint URL to http://localhost:4566. Zero real AWS calls — best for CI determinism. |

When these aren’t enough — when you need to validate the Endpoint policies themselves, or company policy forces the local path to mirror production — fall back to SSM Session Manager port forwarding, AWS Client VPN to join the VPC from your laptop, or running integration tests inside a VPC-attached CodeBuild project.
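For reference, the aws:SourceVpce restriction mentioned above usually takes the shape of a Deny statement in the bucket policy. The sketch below builds such a policy as a plain dictionary; the bucket name and Endpoint ID are hypothetical, and a real policy would likely need exceptions for administrative access.

```python
import json


def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Deny every S3 action on the bucket unless it arrives via one Endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessThroughEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            # Requests not coming through this Endpoint match the Deny.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }


# Hypothetical names, for illustration only.
print(json.dumps(vpce_only_bucket_policy("example-bucket", "vpce-0example"), indent=2))
```

A blanket Deny like this also blocks console and laptop access to the bucket, which is exactly the situation where the dev-account or in-VPC testing patterns become necessary.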


3. Same-org VPCs — Peering vs Transit Gateway

Anyone running more than one VPC hits this fork. Splitting dev/staging/prod, separating per team, or building in another region — all create the moment when “these two should be able to talk to each other.”

3.1 VPC Peering — 1:1 direct

VPC Peering is the simplest way: connect two VPCs at L3 by adding routes on each side. One side requests the peering, the other accepts it, both Route Tables get a route for the other VPC’s CIDR, done.

flowchart LR
    VPCa[VPC A<br/>10.0.0.0/16]
    VPCb[VPC B<br/>10.1.0.0/16]
    VPCa <-->|Peering| VPCb

Properties and limits:

  • Almost zero cost — Peering itself is free; only data transfer is billed.
  • Non-transitive — A↔B and B↔C don’t automatically give you A↔C. You need a separate Peering for that.
  • CIDRs cannot overlap — overlapping IP ranges make Peering impossible.
  • Mesh explodes with VPC count — N VPCs full-mesh = N(N-1)/2 peerings. 5 → 10. 10 → 45.
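Two of these limits are mechanical enough to check before drawing anything. A small sketch using only the standard library; the VPC names and CIDRs are made up for illustration.

```python
import ipaddress
from itertools import combinations


def full_mesh_peerings(vpc_count: int) -> int:
    """Number of peering connections needed for a full mesh of N VPCs."""
    return vpc_count * (vpc_count - 1) // 2


def overlapping_pairs(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return the VPC pairs whose CIDRs overlap and therefore cannot be peered."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


print(full_mesh_peerings(5))   # 10
print(full_mesh_peerings(10))  # 45

# Hypothetical VPCs: "legacy" sits inside the range used by "prod".
vpcs = {"prod": "10.0.0.0/16", "staging": "10.1.0.0/16", "legacy": "10.0.128.0/17"}
print(overlapping_pairs(vpcs))  # [('prod', 'legacy')]
```

Running the overlap check over every planned VPC early is cheap insurance, since re-addressing a VPC after the fact is far more painful than picking a different range up front.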

3.2 Transit Gateway — N:N hub

Transit Gateway (TGW) is a hub model where all VPCs and on-prem connections attach to a single Transit Gateway. Each attachment routes through TGW Route Tables. Adding a new VPC means attaching it to TGW — none of the existing VPCs need to change.

flowchart TB
    TGW[Transit Gateway<br/>hub]
    VPCa[VPC A]
    VPCb[VPC B]
    VPCc[VPC C]
    VPCd[VPC D]
    DC[Direct Connect /<br/>VPN]
    VPCa --- TGW
    VPCb --- TGW
    VPCc --- TGW
    VPCd --- TGW
    DC --- TGW

Differences that matter:

  • Transitive — A→B→C routing is just a TGW Route Table setting away.
  • On-prem / DX / VPN integration — attach DX/VPN to TGW once, all VPCs share it.
  • Multi-AZ HA happens per attachment — the TGW itself is region-scoped and managed (auto-HA), but each VPC attachment should be configured with ENIs across AZs (a single-AZ failure then only impacts that AZ’s VPC traffic).
  • It costs — $0.05/hour per attachment + $0.02/GB. Five VPCs + one DX = ~$200+/month.
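The monthly figure follows directly from the attachment count. A minimal sketch, assuming the rates quoted above and a 730-hour month; actual rates differ by region.

```python
def tgw_monthly_cost(attachments: int, gb_processed: float,
                     hourly_per_attachment: float = 0.05,
                     per_gb: float = 0.02,
                     hours: int = 730) -> float:
    """Attachment-hours plus per-GB data processing, in USD."""
    return attachments * hourly_per_attachment * hours + gb_processed * per_gb


# Five VPC attachments plus one DX attachment, before any traffic.
print(round(tgw_monthly_cost(6, 0), 2))
# The same setup with 2 TB processed per month.
print(round(tgw_monthly_cost(6, 2000), 2))
```

The fixed part grows linearly with attachments, while a Peering mesh grows quadratically in connections to manage; that difference is what the crossover in §3.3 is trading off.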

3.3 Where they split

| Variable | VPC Peering | Transit Gateway |
|---|---|---|
| Connection count | 1:1 (or 2~3 full mesh) | 4+ |
| Transitive | No | Yes |
| Cost | $0 (data only) | $0.05/hour per attachment |
| On-prem integration | Configured separately | One TGW attach, all VPCs share |
| CIDR overlap | Not allowed | Can be split via TGW Route Tables |
| Operational complexity | Explodes with VPC count | Centralized |

Practical crossover: around 3~4 VPCs. Below that, Peering’s zero cost wins. Above, mesh management eats your operations and TGW’s fee earns its keep.


4. Other-org services — PrivateLink

Section 3 covers within-org VPC connections. But there’s a separate scenario — privately calling a service from another organization (a SaaS vendor, a partner, or even another business unit at the same company) from your own VPC. PrivateLink is built for that decision problem.

flowchart LR
    subgraph CON["Consumer VPC (caller)"]
        EC2[EC2]
        IE[Interface Endpoint<br/>ENI]
    end
    subgraph PROV["Provider VPC (service owner)"]
        NLB[NLB]
        SVC[Service]
    end
    EC2 -->|"private DNS"| IE
    IE -.PrivateLink.-> NLB
    NLB --> SVC

The mechanism is straightforward. The Provider defines a “Service” in front of an NLB; the Consumer creates an Interface Endpoint in their VPC connected to that Service. The Consumer talks to a domain that resolves to an ENI in their own VPC; the Provider only exposes their NLB.

PrivateLink’s wins versus Peering / TGW:

  • IP overlap is irrelevant — it works even when Consumer/Provider VPC CIDRs collide. The Consumer just talks to an ENI in its own VPC.
  • One-way and service-scoped — Provider exposes only the service behind the specified NLB; the rest of the VPC stays private. Peering exposes the entire VPC.
  • Consumer-side SG control — attach an SG to the Endpoint ENI to restrict which Consumer-side EC2s can call.
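The Provider/Consumer split maps onto two EC2 API calls. The sketch below only builds the request parameters, assuming boto3's `create_vpc_endpoint_service_configuration` (Provider side) and `create_vpc_endpoint` (Consumer side); every ARN, ID, and service name is a hypothetical placeholder.

```python
from typing import Any


def provider_service_request(nlb_arn: str,
                             acceptance_required: bool = True) -> dict[str, Any]:
    """Provider side: expose exactly one NLB as an Endpoint Service."""
    return {
        "NetworkLoadBalancerArns": [nlb_arn],
        # With acceptance required, the Provider approves each Consumer.
        "AcceptanceRequired": acceptance_required,
    }


def consumer_endpoint_request(vpc_id: str, service_name: str,
                              subnet_ids: list[str],
                              security_group_ids: list[str]) -> dict[str, Any]:
    """Consumer side: an Interface Endpoint (ENIs) in the caller's own VPC."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),
        # The SG on the Endpoint ENI decides which local callers get through.
        "SecurityGroupIds": list(security_group_ids),
        # Assumption: the Provider has not set up a private DNS name.
        "PrivateDnsEnabled": False,
    }


# With real clients (not executed here), the calls would look like:
#   provider_ec2.create_vpc_endpoint_service_configuration(**provider_service_request(arn))
#   consumer_ec2.create_vpc_endpoint(**consumer_endpoint_request(...))
consumer = consumer_endpoint_request(
    "vpc-0consumer", "com.amazonaws.vpce.ap-northeast-2.vpce-svc-0example",
    ["subnet-0a", "subnet-0b"], ["sg-0callers"],
)
print(consumer["VpcEndpointType"])
```

Note that neither request mentions the other side's CIDR, which is why overlapping ranges are not a problem here.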

Even within the same organization, PrivateLink is sometimes the right call.

| Situation | Pick |
|---|---|
| Two VPCs need full bidirectional VPC-wide access | Peering / TGW |
| One VPC calls a single service in another VPC | PrivateLink |
| CIDRs overlap | PrivateLink (or TGW + NAT) |
| Calling a service in another AWS account/org | PrivateLink (essentially the only answer) |
| Privately calling a SaaS vendor’s service | PrivateLink (if they support it) |

Summary: “VPC-wide” → Peering/TGW. “One service” → PrivateLink.


5. On-prem — VPN and Direct Connect

Connecting on-prem to a VPC narrows to two options: Site-to-Site VPN and Direct Connect. They’re not competitors — they’re a “low-barrier vs high-performance” progression.

5.1 Site-to-Site VPN — IPsec tunnel over the internet

Site-to-Site VPN sets up IPsec (IP Security — a network-layer protocol that encrypts packets and verifies their integrity) tunnels over the public internet between AWS-side Virtual Private Gateway (VGW) or TGW and your on-prem router. Two IPsec tunnels come up automatically; both sides exchange routes via BGP or static routing.

  • Up in days — no hardware orders, just router configuration.
  • Bandwidth bound by your internet circuit — roughly 1.25 Gbps per tunnel cap.
  • Latency exposed to internet routing volatility — fine most days, but ISP issues show through.
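  • Pricing — about $0.05/hour per VPN connection, plus standard outbound data transfer. Each connection includes the two tunnels AWS creates automatically; the hourly fee covers both.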

5.2 Direct Connect — dedicated fiber to AWS

Direct Connect (DX) is actual fiber laid to an AWS facility (or DX Location) — a dedicated circuit.

  • 1, 10, or 100 Gbps options — bandwidth VPN can’t match.
  • Stable latency — no internet routing in the picture.
  • Different pricing model — port-hour fee plus data transfer. But AWS-to-on-prem outbound on DX is much cheaper than over the internet, so high-volume traffic ends up cheaper than VPN.
  • Takes time — circuit ordering and physical install takes weeks to months.

5.3 They complement each other — DX primary + VPN backup

The standard production pattern is “DX primary, VPN backup.”

flowchart LR
    subgraph AWS
        TGW[Transit Gateway]
    end
    subgraph OnPrem[On-prem DC]
        Router[Router]
    end
    Router ===|DX primary| TGW
    Router -.VPN backup.-> TGW

Reasons:

  • DX is a single physical circuit — one cable cut and you’re down. With VPN configured, BGP fails over automatically.
  • VPN is your fast start — bring up VPN first while DX is being provisioned, promote DX to primary later.
  • BGP picks the path automatically — no manual ops involvement; availability is built in.

| | Site-to-Site VPN | Direct Connect |
|---|---|---|
| Medium | Internet + IPsec | Dedicated fiber |
| Bandwidth | ~1.25 Gbps per tunnel | 1 / 10 / 100 Gbps |
| Latency | Subject to internet routing | Stable |
| Build time | Days | Weeks to months |
| Pricing | $0.05/hour per VPN connection (two tunnels) | Port-hour + data (cheaper at scale) |
| When to pick | PoC, mid-scale, backup | Production primary |
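The “BGP picks the path automatically” behaviour can be pictured with a toy model. This is not a BGP implementation; it only mimics one early step of best-path selection (prefer the highest local preference among paths that are still up). The `Path` class and the preference values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    name: str
    local_pref: int  # higher wins, as with BGP LOCAL_PREF
    up: bool         # whether the BGP session over this link is established


def best_path(paths: list[Path]) -> Optional[Path]:
    """Pick the preferred path among those that are currently up."""
    candidates = [p for p in paths if p.up]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.local_pref)


dx = Path("Direct Connect", local_pref=200, up=True)
vpn = Path("Site-to-Site VPN", local_pref=100, up=True)
print(best_path([dx, vpn]).name)  # Direct Connect

# Fiber cut: the DX session drops and traffic moves to the VPN.
dx_down = Path("Direct Connect", local_pref=200, up=False)
print(best_path([dx_down, vpn]).name)  # Site-to-Site VPN
```

In a real deployment, failover time depends on how quickly the dead session is detected, so timer tuning matters as much as the preference values.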

6. The decision tree

The four per-destination decisions combine into:

flowchart TD
    Start([Need to communicate from inside a VPC]) --> Q1{Connecting to what?}
    Q1 -->|AWS managed service| QA{S3 or DynamoDB?}
    QA -->|Yes| GE[Gateway Endpoint<br/>free]
    QA -->|No| IE[Interface Endpoint]
    Q1 -->|Same-org other VPC| QB{One service only?<br/>Overlapping CIDRs?}
    QB -->|Yes| PL1[PrivateLink]
    QB -->|No| QC{4+ VPCs?<br/>On-prem integration?}
    QC -->|Yes| TGW[Transit Gateway]
    QC -->|No| Peer[VPC Peering]
    Q1 -->|Other-org service| PL2[PrivateLink]
    Q1 -->|On-prem| QD{Dedicated line + high bandwidth?}
    QD -->|Yes| QE{Production primary?}
    QE -->|Yes| DXVPN[Direct Connect<br/>+ VPN backup]
    QE -->|No| DX[Direct Connect]
    QD -->|No| VPN[Site-to-Site VPN]

Each branch in one line:

  • QA (S3/DynamoDB): always Gateway Endpoint. Routing through NAT Gateway just leaks data-processing fees.
  • QB (single service / CIDR overlap): either condition → PrivateLink wins decisively.
  • QC (VPC count, on-prem integration): under 4 + no on-prem → Peering. Otherwise TGW.
  • QD (bandwidth/latency demands): internet circuit suffices → VPN. Need dedicated line → DX.
  • QE (production primary?): production primary should be DX + VPN backup, not DX alone.

Key: The first-level branch is destination type, not cost or features. Once that’s set, the candidate set drops to one or two immediately.
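The tree is small enough to write down as a function. The sketch below mirrors the QA to QE branches; the function name, the parameter names, and the destination strings are all hypothetical labels chosen for illustration.

```python
def pick_connectivity(destination: str, *, service: str = "",
                      single_service: bool = False, cidr_overlap: bool = False,
                      vpc_count: int = 2, onprem_integration: bool = False,
                      needs_dedicated_line: bool = False,
                      production_primary: bool = False) -> str:
    """Walk the decision tree: destination type first, then branch questions."""
    if destination == "aws_service":
        # QA: only S3 and DynamoDB have Gateway Endpoints.
        if service.lower() in {"s3", "dynamodb"}:
            return "Gateway Endpoint"
        return "Interface Endpoint"
    if destination == "same_org_vpc":
        # QB: one service only, or overlapping CIDRs.
        if single_service or cidr_overlap:
            return "PrivateLink"
        # QC: VPC count and on-prem integration.
        if vpc_count >= 4 or onprem_integration:
            return "Transit Gateway"
        return "VPC Peering"
    if destination == "other_org_service":
        return "PrivateLink"
    if destination == "on_prem":
        # QD, then QE.
        if not needs_dedicated_line:
            return "Site-to-Site VPN"
        if production_primary:
            return "Direct Connect + VPN backup"
        return "Direct Connect"
    raise ValueError(f"unknown destination type: {destination!r}")


print(pick_connectivity("aws_service", service="S3"))
print(pick_connectivity("same_org_vpc", vpc_count=6))
print(pick_connectivity("on_prem", needs_dedicated_line=True, production_primary=True))
```

Every call has to state the destination type before anything else, which is the same constraint the tree imposes.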


7. Five common anti-patterns

7.1 NAT Gateway for S3 access

The most common and most expensive leak. S3 traffic through a NAT Gateway costs $0.045/GB, and analytics, log shipping, and image upload workloads run hundreds of GB to TBs monthly — that just gets added to the bill. A Gateway Endpoint plus one prefix-list line in the Route Table is a five-minute change that saves hundreds of dollars a month. (For a side-by-side cost comparison of Gateway Endpoint vs. NAT GW vs. IGW vs. Peering / TGW, see Part 0 Appendix I.)
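A quick back-of-the-envelope check, assuming the $0.045/GB processing rate quoted above (it differs by region) and ignoring the NAT Gateway's hourly charge, which is paid either way if the NAT is still needed for other traffic:

```python
def nat_processing_cost(gb_per_month: float, per_gb: float = 0.045) -> float:
    """NAT Gateway data-processing charge for traffic routed through it, in USD."""
    return gb_per_month * per_gb


for gb in (500, 5_000, 20_000):
    # The Gateway Endpoint path has no per-GB processing charge.
    print(f"{gb:>6} GB/month via NAT: ${nat_processing_cost(gb):,.2f}  via Gateway Endpoint: $0.00")
```

At 5 TB a month the processing charge alone is a few hundred dollars, all of which disappears once same-region S3 traffic takes the Endpoint route.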

7.2 N:N mesh of VPC Peerings

Five-plus VPCs all needing to talk to each other, drawn as a full Peering mesh. Route Tables explode and every new VPC means touching every existing VPC’s Route Table. Move to TGW — yes, attachments cost, but the operational complexity reduction more than pays for itself.

7.3 Direct Connect alone

Running only one DX circuit and going dark when the fiber is cut. DX is a single physical circuit; on its own its SLA isn’t materially better than a regular internet circuit. AWS’s standard pattern is always DX + VPN backup; for higher availability, run two DX circuits over separate physical paths or use a second DX Location.

7.4 Interface Endpoints in every AZ

“For AZ separation” — putting an Interface Endpoint in every AZ. $0.01/hour/AZ adds up; small services can leak tens of dollars a month doing this. Single-AZ environments (dev, staging) or low-traffic services usually only need one or two AZs. Even in production, look at traffic patterns first.

7.5 VPC Peering where PrivateLink fits

Calling a single service from another org (or BU) but solving it with VPC Peering. Peering exposes the entire VPC — overkill for security, and impossible if the two orgs’ CIDRs collide. Anything that fits PrivateLink and is built on Peering will get torn out the moment a security review lands.


Recap

What this post covered:

  1. The first split is where the destination lives: AWS managed service / same-org VPC / other-org service / on-prem. The four buckets barely share candidates.
  2. S3/DynamoDB → Gateway Endpoint, everything else → Interface Endpoint. Gateway is free and almost always right; Interface costs per AZ, so be deliberate.
  3. VPC Peering vs Transit Gateway is “1:1 vs N:N.” Past 3~4 VPCs, TGW is operationally non-negotiable.
  4. Other-org services, CIDR collisions, single-service exposure → PrivateLink. It solves problems Peering and TGW can’t.
  5. On-prem starts with VPN, graduates to DX, but production runs both — DX primary + VPN backup.

Part 2’s goal was to make “VPC needs to talk to something outside” a 30-second decision. The decision tree narrows to one or two candidates in step one, and lands on exactly one by step three.

Part 3 covers routing inside the VPC — how IGW and NAT Gateway actually work, the priority order Route Tables evaluate in, and where stateful Security Groups and stateless NACLs split in practice. After ingress and external connectivity are settled, how packets actually flow inside the VPC is what’s left.

Note — series flow: Every connectivity decision here also runs after a DNS step, which is covered in Part 4 (DNS and Route 53). §2.3 mentions that Interface Endpoint relies on “Private DNS” — that Private DNS is in fact a Route 53 Private Hosted Zone (Part 4 §2.2). Pinning that connection makes the whole series fit together.


Appendix. One-page summary

A. One-line decision per destination

| Destination | First pick | If first pick doesn’t fit |
|---|---|---|
| S3 / DynamoDB | Gateway Endpoint | (no alternative — always pick it) |
| Other AWS managed services | Interface Endpoint | Internet path (NAT GW) |
| Same-org VPC, 1:1 | VPC Peering | TGW |
| Same-org VPC, N:N | Transit Gateway | (mesh Peering is an anti-pattern) |
| Other-org service | PrivateLink | (essentially no alternative) |
| On-prem PoC / mid-scale | Site-to-Site VPN | DX |
| On-prem production primary | DX + VPN backup | (DX alone is an anti-pattern) |

B. Pricing in one line

| Candidate | Idle cost | Data cost |
|---|---|---|
| Gateway Endpoint | $0 | $0 (same region) |
| Interface Endpoint | $0.01/hour/AZ | $0.01/GB |
| VPC Peering | $0 | $0.01/GB cross-AZ, free same-AZ |
| Transit Gateway | $0.05/hour per attachment | $0.02/GB |
| PrivateLink (Provider) | NLB cost + Endpoint Service | NLB data |
| Site-to-Site VPN | $0.05/hour per VPN connection (two tunnels) | Standard outbound |
| Direct Connect | Port-hour (capacity-based) | DX outbound (cheaper than internet) |

C. Official AWS docs

D. Acronyms

AWS services and components

| Acronym | Meaning |
|---|---|
| VPC | Virtual Private Cloud. An isolated virtual network inside AWS |
| EC2 | Elastic Compute Cloud. AWS virtual servers |
| RDS | Relational Database Service. AWS-managed RDB |
| Lambda | AWS serverless compute |
| S3 | Simple Storage Service. AWS object storage |
| DynamoDB | AWS-managed NoSQL key-value database |
| KMS | Key Management Service. AWS-managed encryption keys |
| ECR | Elastic Container Registry. AWS container image registry |
| SSM | AWS Systems Manager. Unified EC2 ops (Session Manager etc.) |
| ALB / NLB | Application / Network Load Balancer (L7 / L4) |

Connectivity and routing

| Acronym | Meaning |
|---|---|
| VPC Endpoint | A path inside the VPC to AWS services without going through the internet (Gateway / Interface variants) |
| PrivateLink | A one-way connection that exposes a service behind an NLB and is consumed via an ENI in the consumer’s VPC |
| TGW | Transit Gateway. N:N VPC and on-prem hub |
| DX | Direct Connect. Dedicated fiber to AWS |
| VPN | Virtual Private Network. Here, Site-to-Site VPN |
| VGW | Virtual Private Gateway. AWS-side VPN endpoint |
| IGW | Internet Gateway. The bidirectional gateway between VPC and the internet |
| NAT | Network Address Translation. Private-IP-to-public-IP translation |
| NACL | Network Access Control List. Subnet-level stateless firewall |
| SG | Security Group. ENI-level stateful firewall |

Network basics

| Acronym | Meaning |
|---|---|
| ENI | Elastic Network Interface. Virtual NIC with a private IP inside a VPC |
| NIC | Network Interface Card. A network adapter (physical or virtual) |
| CIDR | Classless Inter-Domain Routing. IP-range notation startIP/prefix-length (e.g., 10.0.0.0/16) |
| BGP | Border Gateway Protocol. Dynamic routing protocol |
| IPsec | IP Security. Network-layer protocol for packet encryption and integrity |
| L3 | OSI network layer (IP) |
| AZ | Availability Zone. Datacenter unit within a region |

General

| Acronym | Meaning |
|---|---|
| SaaS | Software as a Service. Managed software services (Salesforce, Datadog, etc.) |
| PoC | Proof of Concept. A small-scale implementation to validate feasibility |
| On-prem / On-premises | Your own datacenter or office server room — infrastructure you operate outside a public cloud like AWS |