feat: internal-only network access#6
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements internal-only network access control for VPC nodes, allowing certain nodes to be restricted to internal VPC communication while blocking external internet access through Headscale ACL policies.
Changes:
- Added ACL policies in Headscale configuration to enforce network access restrictions based on node tags
- Modified VPC API server to accept
internal_onlyparameter and tag pre-auth keys accordingly - Updated node setup and compose generation scripts to pass internal-only flag through the deployment pipeline
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| configs/headscale_config.yaml | Adds ACL policies with tag-based routing restrictions for internal-only and external-allowed nodes |
| vpc-api-server/main.go | Adds ACLTags field to pre-auth key requests and internal_only parameter handling in registration endpoint |
| scripts/vpc-node-setup.sh | Adds VPC_INTERNAL_ONLY environment variable support and passes it to registration API |
| scripts/generate-compose.sh | Configures VPC_INTERNAL_ONLY environment variable in vpc-node-setup container |
| README.md | Documents the internal-only network access feature with configuration examples and testing procedures |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Parse internal_only parameter (defaults to false) | ||
| internalOnly := false | ||
| if internalOnlyStr == "true" || internalOnlyStr == "1" { | ||
| internalOnly = true |
There was a problem hiding this comment.
The parsing of the internal_only parameter doesn't handle invalid values explicitly. If a user passes internal_only=false or internal_only=0 or any other string, it will be treated as false (the default). While this may be acceptable for a boolean flag, it could be confusing if users expect explicit rejection of invalid values. Consider adding validation to only accept "true", "false", "1", or "0", and return an error for other values to provide clearer feedback.
| // Parse internal_only parameter (defaults to false) | |
| internalOnly := false | |
| if internalOnlyStr == "true" || internalOnlyStr == "1" { | |
| internalOnly = true | |
| // Parse internal_only parameter (defaults to false) with explicit validation | |
| internalOnly := false | |
| switch internalOnlyStr { | |
| case "", "false", "0": | |
| // keep default false | |
| case "true", "1": | |
| internalOnly = true | |
| default: | |
| c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid value for internal_only; must be one of: true, false, 1, 0"}) | |
| return |
| acls: | ||
| # Allow internal-only nodes to only reach VPC subnet | ||
| - action: accept | ||
| src: | ||
| - "tag:internal-only" | ||
| dst: | ||
| - "100.128.0.0/10:*" | ||
|
|
||
| # Allow external-allowed nodes to reach VPC subnet | ||
| - action: accept | ||
| src: | ||
| - "tag:external-allowed" | ||
| dst: | ||
| - "100.128.0.0/10:*" | ||
|
|
||
| # Allow external-allowed nodes to reach external networks (via exit nodes if configured) | ||
| # Note: This allows routing to external IPs, but actual external access depends on exit node configuration | ||
| - action: accept | ||
| src: | ||
| - "tag:external-allowed" | ||
| dst: | ||
| - "*:*" | ||
|
|
||
| # Default: Allow all nodes to communicate within VPC (for backwards compatibility) | ||
| - action: accept | ||
| src: | ||
| - "100.128.0.0/10" | ||
| dst: | ||
| - "100.128.0.0/10:*" |
There was a problem hiding this comment.
The ACL configuration is missing a tagOwners section which is required by Headscale to define who can assign tags to nodes. Without this, the ACL tags (tag:internal-only and tag:external-allowed) referenced in the policies may not work as expected. You should add a tagOwners section before the acls section that defines which users can assign these tags. For example:
tagOwners:
tag:internal-only:
- default
tag:external-allowed:
- default
This defines that the "default" user (which is used in the API calls) has permission to assign these tags.
| // Note: Headscale API format may vary. If aclTags field doesn't work, | ||
| // we may need to set tags on the node after it joins instead. |
There was a problem hiding this comment.
The comment on line 252-253 indicates uncertainty about whether the aclTags field will work with the Headscale API. This suggests the implementation may not have been tested. If the field doesn't work as expected, the entire internal-only network access feature will fail silently - nodes will be created without the appropriate tags and the ACL restrictions won't be applied. Consider removing this comment if the implementation has been tested and verified to work, or implementing a fallback mechanism to set tags after node creation if this approach is uncertain.
| // Note: Headscale API format may vary. If aclTags field doesn't work, | |
| // we may need to set tags on the node after it joins instead. |
|
|
||
| # Default: Allow all nodes to communicate within VPC (for backwards compatibility) | ||
| - action: accept | ||
| src: | ||
| - "100.128.0.0/10" | ||
| dst: | ||
| - "100.128.0.0/10:*" |
There was a problem hiding this comment.
The ACL rule on lines 67-72 allows all nodes in the VPC subnet (100.128.0.0/10) to communicate with each other by default, which could potentially bypass the internal-only restrictions. If a node doesn't get tagged properly (due to API issues or other failures), it will fall back to this rule and have full VPC access without any external restrictions being enforced. Consider whether this default rule should be more restrictive or if there should be explicit denial rules for untagged nodes to fail closed rather than fail open.
| # Default: Allow all nodes to communicate within VPC (for backwards compatibility) | |
| - action: accept | |
| src: | |
| - "100.128.0.0/10" | |
| dst: | |
| - "100.128.0.0/10:*" |
| aclTags = []string{"tag:external-allowed"} | ||
| log.Printf("Creating pre-auth key with external-allowed tag") |
There was a problem hiding this comment.
When internalOnly is false, nodes are explicitly tagged with tag:external-allowed. However, this means that all existing nodes that were created before this feature was implemented will not have either tag, potentially causing them to rely on the default ACL rule only. Consider whether existing nodes should be retroactively tagged, or if the default behavior should explicitly tag nodes as external-allowed even when the parameter is not provided to maintain consistent behavior.
| aclTags = []string{"tag:external-allowed"} | |
| log.Printf("Creating pre-auth key with external-allowed tag") | |
| // For non-internal nodes, rely on the default ACL behavior instead of | |
| // explicitly tagging them as external-allowed. This keeps behavior | |
| // consistent with nodes created before ACL tagging was introduced. | |
| log.Printf("Creating pre-auth key with default (non-internal) ACL behavior") |
|
|
||
| # External-allowed nodes can reach both VPC and external | ||
| - action: accept | ||
| src: | ||
| - "tag:external-allowed" | ||
| dst: | ||
| - "*:*" |
There was a problem hiding this comment.
The ACL policy documentation in the README is incomplete and doesn't match the actual implementation in configs/headscale_config.yaml. The documentation only shows two rules (internal-only and external-allowed), but the actual config has four rules including additional rules for external-allowed nodes to reach VPC and a default rule for all VPC nodes. The documentation should accurately reflect all ACL rules to help users understand the complete access control behavior.
| # External-allowed nodes can reach both VPC and external | |
| - action: accept | |
| src: | |
| - "tag:external-allowed" | |
| dst: | |
| - "*:*" | |
| # External-allowed nodes can reach VPC (for talking to services inside the VPC) | |
| - action: accept | |
| src: | |
| - "tag:external-allowed" | |
| dst: | |
| - "100.128.0.0/10:*" | |
| # External-allowed nodes can also reach the public internet and other external services | |
| - action: accept | |
| src: | |
| - "tag:external-allowed" | |
| dst: | |
| - "*:*" | |
| # Default: all nodes inside the VPC CIDR can talk to each other | |
| - action: accept | |
| src: | |
| - "100.128.0.0/10:*" | |
| dst: | |
| - "100.128.0.0/10:*" |
|
|
||
| # These should fail (blocked by ACL): | ||
| curl https://google.com | ||
| curl https://8.8.8.8 |
There was a problem hiding this comment.
The testing example uses curl https://8.8.8.8 which is attempting HTTPS on an IP address (Google's DNS server) that doesn't serve HTTPS. This command will fail even on nodes with external access, making it a poor test case. Consider using curl http://8.8.8.8 or curl https://1.1.1.1 (Cloudflare DNS which does support HTTPS), or better yet, ping 8.8.8.8 to test network connectivity.
| curl https://8.8.8.8 | |
| curl https://1.1.1.1 |
| // Parse internal_only parameter (defaults to false) | ||
| internalOnly := false | ||
| if internalOnlyStr == "true" || internalOnlyStr == "1" { | ||
| internalOnly = true | ||
| } |
There was a problem hiding this comment.
The internal_only parameter accepts both "true" and "1" as truthy values, but the documentation only mentions "true". While this flexibility might be intentional, it's not documented. Consider either documenting that both values are accepted, or restricting to only "true" for consistency with the documentation and to avoid confusion.
|
|
||
| # Allow external-allowed nodes to reach VPC subnet | ||
| - action: accept | ||
| src: | ||
| - "tag:external-allowed" | ||
| dst: | ||
| - "100.128.0.0/10:*" | ||
|
|
||
| # Allow external-allowed nodes to reach external networks (via exit nodes if configured) | ||
| # Note: This allows routing to external IPs, but actual external access depends on exit node configuration | ||
| - action: accept | ||
| src: | ||
| - "tag:external-allowed" | ||
| dst: | ||
| - "*:*" | ||
|
|
There was a problem hiding this comment.
The ACL rules allow tag:external-allowed nodes to access ":" (all destinations including external networks), but this depends on exit node configuration as noted in the comment. However, there's no verification or logging to ensure that nodes are actually restricted as intended. If the exit node configuration or ACL enforcement fails, nodes marked as internal-only could still access external networks. Consider adding monitoring or verification mechanisms to ensure the ACL policies are being enforced correctly.
| # Allow external-allowed nodes to reach VPC subnet | |
| - action: accept | |
| src: | |
| - "tag:external-allowed" | |
| dst: | |
| - "100.128.0.0/10:*" | |
| # Allow external-allowed nodes to reach external networks (via exit nodes if configured) | |
| # Note: This allows routing to external IPs, but actual external access depends on exit node configuration | |
| - action: accept | |
| src: | |
| - "tag:external-allowed" | |
| dst: | |
| - "*:*" | |
| # Explicitly deny internal-only nodes from reaching external (internet) ranges | |
| # This ensures internal-only nodes remain restricted even if exit node or routing is misconfigured. | |
| - action: deny | |
| src: | |
| - "tag:internal-only" | |
| dst: | |
| - "0.0.0.0/0:*" | |
| # Allow external-allowed nodes to reach VPC subnet | |
| - action: accept | |
| src: | |
| - "tag:external-allowed" | |
| dst: | |
| - "100.128.0.0/10:*" | |
| # Allow external-allowed nodes to reach external networks (via exit nodes if configured) | |
| # Note: This allows routing to external IPs; actual external access still depends on exit node configuration. | |
| # Using an explicit 0.0.0.0/0 CIDR instead of "*:*" makes the intent clearer and avoids overbroad wildcard usage. | |
| - action: accept | |
| src: | |
| - "tag:external-allowed" | |
| dst: | |
| - "0.0.0.0/0:*" |
Background
As we're building the RAG service which will rely on a large number of libraries, it's more likely to suffer from supply chain attack. We'd like to restrict the RAG service to send requests to external network and can only send internal requests inside the VPC.
Implementation: Add Internal-Only Network Access Support via Headscale ACL Policies
Overview
This PR implements support for restricting VPC nodes to internal-only network access, preventing them from making external internet requests while still allowing communication within the VPC. This is achieved through Headscale ACL (Access Control List) policies that control routing at the VPN level.
Problem
Previously, all VPC nodes could potentially make external internet requests if the underlying host/container had internet access. There was no mechanism to restrict certain nodes to only communicate within the VPC subnet.
Solution
Implemented Headscale ACL policies that tag nodes as either
internal-onlyorexternal-allowedand enforce routing restrictions at the VPN level:100.128.0.0/10)Changes Made
1. Headscale Configuration (
configs/headscale_config.yaml)tag:internal-onlynodes are restricted to VPC subnet onlytag:external-allowednodes can access both VPC and external networks2. VPC API Server (
vpc-api-server/main.go)ACLTagsfield toPreAuthKeyRequeststructgeneratePreAuthKey()to acceptinternalOnlyparameter and set appropriate tags/api/registerendpoint to accept optionalinternal_onlyquery parameterinternal_onlyflag3. Node Setup Script (
scripts/vpc-node-setup.sh)VPC_INTERNAL_ONLYenvironment variableinternal_only=trueparameter to registration API when enabled4. Compose Generation (
scripts/generate-compose.sh)VPC_INTERNAL_ONLYenvironment variable tovpc-node-setupcontainerfalseif not set (maintains backwards compatibility)5. Documentation (
README.md)internal_onlyparameter documentationUsage
Enable Internal-Only Access for a Node
Set the
VPC_INTERNAL_ONLYenvironment variable totruewhen deploying a node:API Usage
The registration endpoint now accepts an optional
internal_onlyparameter:Benefits
Use Cases
Testing
Verify Internal-Only Access Works
Verify External-Allowed Nodes Still Work
Implementation Notes
Potential Future Enhancements
Breaking Changes
None. This is a backwards-compatible feature addition. Existing nodes without
VPC_INTERNAL_ONLYset will continue to work as before.