
OCPBUGS-86303: e2e/ote-ccm-aws: enhance tests to run hybrid in hypershift HC#464

Open
mtulio wants to merge 2 commits into openshift:main from mtulio:OCPBUGS-83399-v2


@mtulio
Contributor

@mtulio mtulio commented May 20, 2026

Summary

OCPBUGS-86303

The [cloud-provider-aws-e2e-openshift] AWSServiceLBNetworkSecurityGroup e2e tests fail on HyperShift hosted clusters due to three distinct issues:

  • (1) the AWS SDK resolves the ELB/EC2 API endpoint to a VPC private endpoint DNS name (elasticloadbalancing.<vpc-endpoint-id>.amazonaws.com) that is unreachable from the CI test pod running on the management cluster;
  • (2) the AWS_REGION environment variable is set to a CI lease UUID (from LEASED_RESOURCE) instead of a valid AWS region, producing invalid endpoint URLs even with the public endpoint override; and
  • (3) the cloud-config validation test looks for ConfigMap cloud-conf in namespace openshift-cloud-controller-manager on the guest cluster, but in HyperShift the CCM configuration lives in ConfigMap aws-cloud-config in the hosted control plane namespace on the management cluster.
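To make issue (3) concrete, the ConfigMap lookup can be reduced to a small pure function. This is a minimal sketch, not the PR's GetCloudConfig: it assumes the topology string is read from the Infrastructure resource's status.controlPlaneTopology (which reports External for HyperShift hosted clusters), and the names configMapRef and cloudConfigLocation are hypothetical. The environment variables are the ones the CI conformance step sets.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// configMapRef identifies where the CCM cloud config lives (hypothetical type).
type configMapRef struct {
	Kubeconfig string // empty means "use the kubeconfig of the cluster under test"
	Namespace  string
	Name       string
}

// cloudConfigLocation picks the ConfigMap holding the AWS CCM config.
// topology is assumed to be Infrastructure status.controlPlaneTopology.
func cloudConfigLocation(topology string, getenv func(string) string) (configMapRef, error) {
	if topology != "External" {
		// Standalone: cloud-conf in openshift-cloud-controller-manager on the guest cluster.
		return configMapRef{Namespace: "openshift-cloud-controller-manager", Name: "cloud-conf"}, nil
	}
	// HyperShift: aws-cloud-config in the HCP namespace on the management cluster.
	kc := getenv("HYPERSHIFT_MANAGEMENT_CLUSTER_KUBECONFIG")
	ns := getenv("HYPERSHIFT_MANAGEMENT_CLUSTER_NAMESPACE")
	if kc == "" || ns == "" {
		return configMapRef{}, errors.New("external topology requires HYPERSHIFT_MANAGEMENT_CLUSTER_KUBECONFIG and HYPERSHIFT_MANAGEMENT_CLUSTER_NAMESPACE")
	}
	return configMapRef{Kubeconfig: kc, Namespace: ns, Name: "aws-cloud-config"}, nil
}

func main() {
	ref, err := cloudConfigLocation("External", os.Getenv)
	fmt.Printf("%+v %v\n", ref, err)
}
```

Returning an error instead of skipping keeps the helper free of Ginkgo control flow; the caller decides whether a missing variable is a failure or a skip.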

Proposed Solution

This PR:

  • forces the ELBv2 and EC2 clients to use public regional AWS endpoints via BaseEndpoint;
  • validates the SDK-resolved region against a pattern covering all five AWS partitions (standard, China, GovCloud, European Sovereign Cloud, and ISO/ISOB), falling back to status.platformStatus.aws.region from infrastructure/cluster;
  • makes the cloud-config test topology-aware: when it detects an External control plane topology, it reads the config from the HCP namespace on the management cluster, located through the environment variables HYPERSHIFT_MANAGEMENT_CLUSTER_KUBECONFIG and HYPERSHIFT_MANAGEMENT_CLUSTER_NAMESPACE set by the CI conformance step;
  • adds a SKIP_MANAGEMENT_CLUSTER_TESTS opt-in flag for environments without management cluster access;
  • extracts reusable helpers (GetCloudConfig, IsExternalTopology, IsConfigPresentCloudConfig, IsNLBSecurityGroupModeManaged) into common/helper.go, without Ginkgo control-flow dependencies; and
  • replaces the stale developer documentation with corrected paths, HyperShift setup instructions, and batch test execution examples.
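The region check with fallback could look roughly like the following. The regular expression is an assumed, illustrative pattern (the PR's actual regex is not shown here): it accepts region names from the standard, China, GovCloud, European Sovereign Cloud, and ISO/ISOB partitions and rejects a UUID-shaped lease ID like the one that leaked into AWS_REGION. resolveRegion is a hypothetical helper, and its infraRegion argument stands in for status.platformStatus.aws.region read from infrastructure/cluster.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// regionPattern is an assumed, illustrative pattern, not the PR's regex.
// It matches e.g. us-east-1, cn-northwest-1, us-gov-west-1, eusc-de-east-1,
// us-isob-east-1, and rejects UUIDs such as CI lease IDs.
var regionPattern = regexp.MustCompile(`^[a-z]{2,4}(-[a-z]+)+-\d+$`)

// resolveRegion prefers the SDK-resolved region and falls back to the
// Infrastructure region when the SDK value does not look like a region.
func resolveRegion(sdkRegion, infraRegion string) (string, error) {
	if regionPattern.MatchString(sdkRegion) {
		return sdkRegion, nil
	}
	if regionPattern.MatchString(infraRegion) {
		return infraRegion, nil
	}
	return "", errors.New("no valid AWS region from SDK config or Infrastructure status")
}

func main() {
	// A lease UUID in AWS_REGION is rejected, so the Infrastructure region is used.
	r, err := resolveRegion("0c1a2b3c-4d5e-6f70-8192-a3b4c5d6e7f8", "us-east-2")
	fmt.Println(r, err)
}
```

A shape-based pattern avoids maintaining a hard-coded region list, at the cost of accepting any well-formed but nonexistent name; the Infrastructure fallback covers the CI case described in issue (2).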

Test Requirements

These changes must be validated by the following jobs:

Standalone:

  • regular presubmit jobs

HCP / Hosted Cluster:

  • /payload-job periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guide for running AWS CCM e2e tests via the OTE test binary, with prerequisites, build/run examples, test listing, batching, and HyperShift usage details.
  • New Features

    • HyperShift topology detection and environment-variable support to opt out of management-cluster tests and select the correct cloud-config source.
  • Bug Fixes

    • Stricter AWS region validation and improved regional endpoint selection for AWS clients.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 20, 2026
@openshift-ci-robot

@mtulio: This pull request references Jira Issue OCPBUGS-86303, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (mrbraga@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR adds topology-aware cloud-config retrieval for HyperShift, AWS region validation and regional endpoint configuration, centralized common test helpers (including a management-cluster skip toggle), updates NLB tests to use those helpers, promotes an openshift/api dependency, and adds OTE runner documentation.

Changes

AWS CCM E2E Test Refactoring with HyperShift Support

  • E2E test documentation guide (docs/dev/e2e-ote-ccm-aws.md): Comprehensive guide for running AWS CCM e2e tests via the OpenShift Tests Extension binary, including prerequisites, build instructions, test discovery and execution patterns, and HyperShift-specific configuration with management-cluster skip toggle.
  • Common test helpers: topology-aware cloud-config and skip logic (openshift-tests/ccm-aws-tests/e2e/common/helper.go): Introduces IsExternalTopology() to detect HyperShift hosted control planes, getHCPCloudConfig() to retrieve cloud-config from the HCP namespace, and SkipIfManagementClusterTestsDisabled() to skip management-cluster-dependent tests; updates GetCloudConfig() to branch on topology; refactors IsFeatureEnabled() to reuse the kube client helper.
  • AWS client refactoring: region validation and regional endpoints (openshift-tests/ccm-aws-tests/e2e/aws/helper.go): Adds loadAWSConfig() to validate AWS region via regex and fall back to region from cluster Infrastructure; updates createAWSClientLoadBalancer() and createAWSClientEC2() to use the new config loading and conditionally set public regional endpoints via BaseEndpoint when region is non-empty.
  • Load balancer test refactoring: use centralized common helpers (openshift-tests/ccm-aws-tests/e2e/aws/loadbalancer.go): Updates NLB security group mode test to call SkipIfManagementClusterTestsDisabled(), retrieve cloud-config via common.GetCloudConfig(), and validate via common.IsNLBSecurityGroupModeManaged(); removes local ConfigMap namespace/name/key constants.
  • Dependency: promote openshift/api to direct requirement (openshift-tests/ccm-aws-tests/go.mod): Moves github.com/openshift/api from the indirect require block to the primary direct require block to support typed Infrastructure resource queries.
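The management-cluster skip toggle can stay free of Ginkgo control flow by returning a boolean and letting the test itself call Ginkgo's Skip. A minimal sketch, assuming SKIP_MANAGEMENT_CLUSTER_TESTS is parsed with strconv.ParseBool (the values the PR actually accepts may differ); managementClusterTestsDisabled is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// managementClusterTestsDisabled reports whether SKIP_MANAGEMENT_CLUSTER_TESTS
// opts out of tests that need management-cluster access.
// Assumption: values follow strconv.ParseBool ("true", "1", "false", ...);
// an unset or unparsable value means the tests run.
func managementClusterTestsDisabled(getenv func(string) string) bool {
	v, err := strconv.ParseBool(getenv("SKIP_MANAGEMENT_CLUSTER_TESTS"))
	return err == nil && v
}

func main() {
	// In a Ginkgo spec the caller would do: if disabled { Skip("...") }
	fmt.Println(managementClusterTestsDisabled(os.Getenv))
}
```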

Sequence Diagram(s)

sequenceDiagram
  participant TestBinary
  participant common
  participant loadAWSConfig
  participant InfrastructureAPI
  participant AWS_SDK
  TestBinary->>common: GetCloudConfig(ctx, cs)
  common->>InfrastructureAPI: IsExternalTopology / Infrastructure query
  alt external topology
    common->>ManagementCluster: fetch aws-cloud-config using HYPERSHIFT_MANAGEMENT_CLUSTER_KUBECONFIG
  else standalone
    common->>KubeAPI: fetch cloud-conf ConfigMap
  end
  TestBinary->>loadAWSConfig: loadAWSConfig(ctx)
  loadAWSConfig->>InfrastructureAPI: getRegionFromInfrastructure (when needed)
  loadAWSConfig->>AWS_SDK: return cfg (with optional BaseEndpoint)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • racheljpg
  • theobarberbany
  • mfbonfigli
🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (3 warnings)

  • Test Structure And Quality (⚠️ Warning): Typo 'creatomg' appears 5 times instead of 'creating' in By() statements. One test lacks cleanup assertions. Tests have proper timeouts and assertion messages otherwise. Resolution: Fix typo 'creatomg' to 'creating' in all 5 By() statements. Ensure all tests have explicit cleanup or DeferCleanup patterns.
  • Microshift Test Compatibility (⚠️ Warning): Tests use config.openshift.io APIs (FeatureGate, Infrastructure) not available on MicroShift, with no protection labels or guards. Resolution: Add [apigroup:config.openshift.io] tag to test name or guard with exutil.IsMicroShiftCluster() check, as config.openshift.io is unavailable on MicroShift.
  • Topology-Aware Scheduling Compatibility (⚠️ Warning): New deployment manifest uses nodeSelector for master nodes, which fails on HyperShift where no control-plane nodes exist in hosted clusters. Resolution: Add topology-aware logic to detect HyperShift/External topology and conditionally apply nodeSelector only on non-HyperShift clusters, or use node affinity with fallback behavior.
✅ Passed checks (9 passed)
  • Docstring Coverage (✅ Passed): Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names (✅ Passed): All Ginkgo test names are stable and deterministic. The Describe block uses fmt.Sprintf with static constants, and all It blocks contain only static strings with no dynamic values.
  • Single Node Openshift (Sno) Test Compatibility (✅ Passed): The 6 new Ginkgo tests validate LoadBalancer Service and security group configuration without creating Deployments, assuming multiple nodes, or using anti-affinity. Compatible with SNO.
  • Ote Binary Stdout Contract (✅ Passed): No stdout violations found. Modified code uses only fmt.Errorf, framework.Logf, and fmt.Sprintf; none print to stdout. main.go unchanged, preserving logrus stderr default.
  • Ipv6 And Disconnected Network Test Compatibility (✅ Passed): No new Ginkgo tests added; PR modifies existing helpers only. No IPv4 hardcoded addresses, IPv4-specific parsing, or external connectivity detected.
  • Title check (✅ Passed): The title clearly and specifically identifies the main change: enhancing OTE CCM AWS tests to support hybrid execution in HyperShift hosted control planes, which aligns with the substantial code changes across documentation, test helpers, and configurations.
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.


@mtulio
Contributor Author

mtulio commented May 20, 2026

/payload-job periodic-ci-openshift-hypershift-release-5.0-e2e-aws-ovn-conformance-ccm-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@mtulio: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@openshift-ci-robot

@mtulio: This pull request references Jira Issue OCPBUGS-86303, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (mrbraga@redhat.com), skipping review request.

Details

In response to this:

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guide for running AWS Cloud Controller Manager e2e tests via OpenShift Tests Extension (OTE).
  • New Features

    • Enhanced AWS NLB test suite to support HyperShift cluster topologies.
    • Added ability to conditionally skip management cluster tests in HyperShift deployments.
    • Improved AWS client configuration and endpoint handling.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
openshift-tests/ccm-aws-tests/e2e/aws/helper.go (1)

128-130: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Same BaseEndpoint conditional logic issue as ELBv2 client.

The condition cfg.Region != "" will always evaluate to true after successful loadAWSConfig, resulting in unconditional BaseEndpoint forcing. See the comment on lines 50-52 for the same issue in createAWSClientLoadBalancer.
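Because loadAWSConfig only returns with a non-empty region, the override needs a different guard. The sketch below takes the topology-based option raised in the inline review comment: it returns an endpoint only for External topology and otherwise leaves SDK endpoint resolution alone. Assumptions: publicEndpoint and baseEndpointOverride are hypothetical helpers, and only the amazonaws.com and amazonaws.com.cn DNS suffixes are handled, so ISO, ISOB, and European Sovereign Cloud regions would need their own suffixes. With aws-sdk-go-v2, a non-empty result would be assigned in the client options, e.g. o.BaseEndpoint = aws.String(url).

```go
package main

import (
	"fmt"
	"strings"
)

// publicEndpoint builds a public regional endpoint URL for an AWS service.
// Assumption: China regions use amazonaws.com.cn; GovCloud shares
// amazonaws.com; ISO/ISOB/EUSC suffixes are NOT handled by this sketch.
func publicEndpoint(service, region string) string {
	suffix := "amazonaws.com"
	if strings.HasPrefix(region, "cn-") {
		suffix = "amazonaws.com.cn"
	}
	return fmt.Sprintf("https://%s.%s.%s", service, region, suffix)
}

// baseEndpointOverride returns the endpoint to force, or "" to keep the
// SDK-resolved endpoint. It is gated on topology rather than on region.
func baseEndpointOverride(externalTopology bool, service, region string) string {
	if !externalTopology || region == "" {
		return ""
	}
	return publicEndpoint(service, region)
}

func main() {
	fmt.Println(baseEndpointOverride(true, "elasticloadbalancing", "us-east-1"))
	fmt.Println(baseEndpointOverride(false, "ec2", "us-east-1") == "")
}
```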

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openshift-tests/ccm-aws-tests/e2e/aws/helper.go` around lines 128 - 130, The
code unconditionally overwrites o.BaseEndpoint because cfg.Region is always
non-empty after loadAWSConfig; change the condition to only set o.BaseEndpoint
when it is not already provided (e.g., check o.BaseEndpoint == nil or empty) in
addition to cfg.Region being non-empty so the endpoint is not forced; adjust the
conditional around the assignment to reference o.BaseEndpoint and cfg.Region
(the symbols in question) similar to the fix applied in
createAWSClientLoadBalancer.
🧹 Nitpick comments (1)
docs/dev/e2e-ote-ccm-aws.md (1)

125-125: ⚡ Quick win

Use hyphenated compound modifier for consistency.

Consider changing “hosted cluster” to “hosted-cluster” in these adjective positions for grammar consistency in the doc.

Also applies to: 161-161

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dev/e2e-ote-ccm-aws.md` at line 125, Change the adjectival phrase
"hosted cluster" to the hyphenated form "hosted-cluster" for consistency in
modifier usage; update each occurrence (e.g., the phrase found in the sentence
starting "When running against a HyperShift hosted cluster, KUBECONFIG must
point to the" and the similar occurrence around line 161) to "hosted-cluster" so
the compound modifier is hyphenated wherever it precedes a noun.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/dev/e2e-ote-ccm-aws.md`:
- Around line 97-98: Replace the placeholder link token "[OCPBUGS-TBD]" in the
docs text (the line referencing OTE framework limitation) with the real issue
identifier or a full URL to the bug tracker; if no final issue exists, remove
the bracketed reference entirely and adjust the sentence to avoid a dead
placeholder so the doc reads correctly.
- Around line 27-38: The fenced code blocks showing the directory tree and the
CI job name lack a language tag; update both fenced blocks (the one containing
the directory tree starting with "openshift-tests/ccm-aws-tests/" and the other
showing the CI job name mentioned around lines 176-178) to declare a language
(e.g., use ```text) so markdownlint passes and rendering is consistent.

In `@openshift-tests/ccm-aws-tests/e2e/aws/helper.go`:
- Around line 50-52: The BaseEndpoint override is currently guarded by
cfg.Region != "" which is always true after loadAWSConfig, so update the
conditional in createAWSClientLoadBalancer and createAWSClientEC2 to reflect the
intended behavior: if you want the override only for HyperShift/external
topology, replace the cfg.Region check with common.IsExternalTopology() (use
that predicate to set o.BaseEndpoint); otherwise, if you intend to always use
the regional public endpoint, remove the conditional entirely and always assign
o.BaseEndpoint using cfg.Region. Ensure you update both functions
(createAWSClientLoadBalancer and createAWSClientEC2) and keep the BaseEndpoint
construction using fmt.Sprintf("https://elasticloadbalancing.%s.amazonaws.com",
cfg.Region).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1be81588-e297-4557-b9d1-4cac9b4f6369

📥 Commits

Reviewing files that changed from the base of the PR and between 7f6aa93 and a4795c8.

📒 Files selected for processing (6)
  • docs/dev/e2e-ote-ccm-aws.md
  • docs/dev/ote-ccm-aws.md
  • openshift-tests/ccm-aws-tests/e2e/aws/helper.go
  • openshift-tests/ccm-aws-tests/e2e/aws/loadbalancer.go
  • openshift-tests/ccm-aws-tests/e2e/common/helper.go
  • openshift-tests/ccm-aws-tests/go.mod
💤 Files with no reviewable changes (1)
  • docs/dev/ote-ccm-aws.md

@mtulio
Contributor Author

mtulio commented May 20, 2026

Local tests targeting a HyperShift hosted cluster are passing:

$ env | grep ^HYPERSHIFT
HYPERSHIFT_MANAGEMENT_CLUSTER_KUBECONFIG=/home/mtulio/openshift/OCPSTRAT-1553/hypershift-deploy/install-dir-v32/auth/kubeconfig
HYPERSHIFT_MANAGEMENT_CLUSTER_NAMESPACE=clusters-mrb-v32-hc4
me@localhost ~/openshift/OCPSTRAT-1553/cluster-cloud-controller-manager-operator (OCPBUGS-83399-v2)
$ grep -E '("name":|"result")' e2e-ote.log
    "name": "[cloud-provider-aws-e2e-openshift] loadbalancer NLB [OCPFeatureGate:AWSServiceLBNetworkSecurityGroup] should have NLBSecurityGroupMode with 'Managed value in cloud-config [Suite:openshift/conformance/parallel]",
    "result": "passed",
    "name": "[cloud-provider-aws-e2e-openshift] loadbalancer NLB [OCPFeatureGate:AWSServiceLBNetworkSecurityGroup] should create NLB service with security group attached [Suite:openshift/conformance/parallel]",
    "result": "passed",
    "name": "[cloud-provider-aws-e2e-openshift] loadbalancer NLB [OCPFeatureGate:AWSServiceLBNetworkSecurityGroup] should have security groups attached to default ingress controller NLB [Suite:openshift/conformance/parallel]",
    "result": "skipped",
    "name": "[cloud-provider-aws-e2e-openshift] loadbalancer NLB [OCPFeatureGate:AWSServiceLBNetworkSecurityGroup] should update security group rules when service is updated [Suite:openshift/conformance/parallel]",
    "result": "passed",
    "name": "[cloud-provider-aws-e2e-openshift] loadbalancer NLB [OCPFeatureGate:AWSServiceLBNetworkSecurityGroup] should cleanup security groups when service is deleted [Suite:openshift/conformance/parallel]",
    "result": "passed",
    "name": "[cloud-provider-aws-e2e-openshift] loadbalancer NLB [OCPFeatureGate:AWSServiceLBNetworkSecurityGroup] should have correct security group rules for service ports [Suite:openshift/conformance/parallel]",
    "result": "passed",

All tests pass except the ingress check, which is skipped: I am still figuring out how to create a cluster with that flag enabled so that the routers use an NLB at install time.
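To tally outcomes without grep, the results can be decoded directly. This sketch assumes e2e-ote.log holds a plain JSON array of result objects with name and result fields, as the grep output above suggests; any other fields are ignored, and testResult and countResults are hypothetical names. In practice the bytes would come from os.ReadFile("e2e-ote.log").

```go
package main

import (
	"encoding/json"
	"fmt"
)

// testResult mirrors the two fields visible in the grep output;
// json.Unmarshal ignores any other fields in each object.
type testResult struct {
	Name   string `json:"name"`
	Result string `json:"result"`
}

// countResults tallies outcomes (passed, skipped, failed, ...) from a JSON array.
func countResults(data []byte) (map[string]int, error) {
	var results []testResult
	if err := json.Unmarshal(data, &results); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, r := range results {
		counts[r.Result]++
	}
	return counts, nil
}

func main() {
	sample := []byte(`[{"name":"t1","result":"passed"},{"name":"t2","result":"skipped"}]`)
	counts, err := countResults(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(counts)
}
```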

@mtulio
Contributor Author

mtulio commented May 20, 2026

/payload-job periodic-ci-openshift-hypershift-release-5.0-e2e-aws-ovn-conformance-ccm-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@mtulio: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@mtulio
Contributor Author

mtulio commented May 20, 2026

/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d7770d80-5495-11f1-8ea0-eafa9b9db0f9-0

@mtulio
Contributor Author

mtulio commented May 20, 2026

/payload-job periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-aws-ovn-conformance

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-aws-ovn-conformance

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f0ca3500-5495-11f1-90e5-249a448fb62c-0

@mtulio
Contributor Author

mtulio commented May 20, 2026

/payload-job periodic-ci-openshift-hypershift-release-5.0-e2e-aws-ovn-conformance-ccm-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@mtulio: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@mtulio
Contributor Author

mtulio commented May 20, 2026

/payload-job periodic-ci-openshift-hypershift-release-5.0-e2e-aws-ovn-conformance-ccm-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@mtulio: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@mtulio
Contributor Author

mtulio commented May 20, 2026

/payload-job periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-ovn-conformance

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-ovn-conformance

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0dfea370-5498-11f1-8ee7-71af62bf6546-0

@mtulio
Contributor Author

mtulio commented May 21, 2026

/payload-job periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-ovn-conformance-ccm-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@mtulio: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@mtulio
Contributor Author

mtulio commented May 21, 2026

/payload-job pull-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@mtulio: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@mtulio
Contributor Author

mtulio commented May 21, 2026

/testwith openshift/hypershift/main/e2e-aws-ovn-conformance-ccm-techpreview

@mtulio
Contributor Author

mtulio commented May 21, 2026

/payload-job periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/51495bb0-54b2-11f1-9071-d22d799e67a5-0

@mtulio
Contributor Author

mtulio commented May 21, 2026

/test all

@mtulio
Contributor Author

mtulio commented May 21, 2026

All required changes should now be addressed. Re-testing on a hosted cluster and on standalone before converting to ready:

/payload-job periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview

/test e2e-aws-ovn

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e6c59230-5548-11f1-9885-d4b892a16acb-0

@mtulio mtulio marked this pull request as ready for review May 21, 2026 19:21
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2026
@openshift-ci openshift-ci Bot requested review from RadekManak and nrb May 21, 2026 19:21
@mtulio mtulio changed the title OCPBUGS-86303: enhance tests to run hybrid in hypershift HC OCPBUGS-86303: e2e/ote-ccm-aws: enhance tests to run hybrid in hypershift HC May 21, 2026
@mtulio
Contributor Author

mtulio commented May 21, 2026

Seeing e2e-aws-ovn failures caused by unrelated changes. Asked DPTP in the related (BYOIP) thread.

@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented May 21, 2026

These changes have been reviewed locally and in previous CI payload jobs; awaiting one more run with the cleaned-up branch.

cc @mfbonfigli @nrb

@mtulio
Contributor Author

mtulio commented May 21, 2026

/test e2e-aws-ovn

@mtulio
Contributor Author

mtulio commented May 21, 2026

Confirmed CI infra issues in standalone profiles, following the Slack thread before triggering again.

@openshift-ci-robot

@mtulio: This pull request references Jira Issue OCPBUGS-86303, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (mrbraga@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.


@mtulio
Contributor Author

mtulio commented May 22, 2026

/test all

@mtulio
Contributor Author

mtulio commented May 22, 2026

/test all

@mtulio
Contributor Author

mtulio commented May 25, 2026

/test unit e2e-aws-ovn

@mtulio
Contributor Author

mtulio commented May 25, 2026

Analysis:

In the HyperShift TPNU payload jobs, 5/6 tests with prefix [cloud-provider-aws-e2e-openshift] pass and one is skipped (as expected); all 5/5 tests with prefix [cloud-provider-aws-e2e] also pass:

In the e2e-aws-ovn presubmit, all 6/6 tests with prefix [cloud-provider-aws-e2e-openshift] pass, as do all 7/7 tests with prefix [cloud-provider-aws-e2e]:

Note 1: HyperShift skips the Ingress test: [cloud-provider-aws-e2e-openshift] loadbalancer NLB [OCPFeatureGate:AWSServiceLBNetworkSecurityGroup] should have security groups attached to default ingress controller NLB [Suite:openshift/conformance/parallel]
Note 2: The HyperShift job skips two upstream tests at the job level, so only 5 tests are configured to run. The local tests [1] show that these should work; we are first fixing the issues reported in this bug and will remove those skips afterwards. WIP at openshift/release#79439

[1] Local tests targeting a development environment with a HyperShift setup were also validated: #464 (comment)

The results show that the new hybrid AWS client configuration works on both self-managed and hosted-cluster installations.

/verified by @mtulio CI and local tests.

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 25, 2026
@mtulio
Contributor Author

mtulio commented May 25, 2026

/pipeline-required

@openshift-ci-robot

@mtulio: This PR has been marked as verified by @mtulio CI and local tests.


@openshift-ci-robot

@mtulio: This pull request references Jira Issue OCPBUGS-86303, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (mrbraga@redhat.com), skipping review request.


@openshift-ci
Contributor

openshift-ci Bot commented May 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mfbonfigli
Once this PR has been reviewed and has the lgtm label, please assign damdo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Address Chai-bot feedback by updating error wrapping, and improve the
const documentation explaining the decision to use regional endpoints
instead of VPC-provided endpoints, which do not work outside the VPC
(where the client/OTE runs).
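The commit above prefers public regional endpoints over VPC endpoint DNS names, which are unreachable from a test pod outside the VPC. A minimal, stdlib-only sketch of building such an endpoint follows. `regionalEndpoint` and `dnsSuffix` are hypothetical helpers. The per-partition DNS suffixes are assumptions and should be checked against the AWS SDK partition metadata.

```go
package main

import (
	"fmt"
	"strings"
)

// dnsSuffix (hypothetical) maps a region to its partition's DNS suffix.
// The suffixes are assumptions taken from public AWS partition metadata;
// verify them against the SDK before relying on them.
func dnsSuffix(region string) string {
	switch {
	case strings.HasPrefix(region, "cn-"):
		return "amazonaws.com.cn"
	case strings.HasPrefix(region, "us-isob-"):
		return "sc2s.sgov.gov"
	case strings.HasPrefix(region, "us-iso-"):
		return "c2s.ic.gov"
	case strings.HasPrefix(region, "eusc-"):
		return "amazonaws.eu"
	default:
		// Standard and GovCloud regions share amazonaws.com.
		return "amazonaws.com"
	}
}

// regionalEndpoint (hypothetical) builds the public regional endpoint for a
// service such as "elasticloadbalancing" or "ec2". It is used in place of a
// VPC endpoint DNS name.
func regionalEndpoint(service, region string) string {
	return fmt.Sprintf("https://%s.%s.%s", service, region, dnsSuffix(region))
}

func main() {
	fmt.Println(regionalEndpoint("elasticloadbalancing", "us-east-1"))
	fmt.Println(regionalEndpoint("ec2", "cn-north-1"))
}
```

With aws-sdk-go-v2, a URL like this could be passed through the service client's `BaseEndpoint` option. For example: `elasticloadbalancingv2.NewFromConfig(cfg, func(o *elasticloadbalancingv2.Options) { o.BaseEndpoint = aws.String(endpoint) })`.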
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label May 25, 2026
@mtulio
Contributor Author

mtulio commented May 25, 2026

Chai-bot feedback addressed.

@mtulio
Contributor Author

mtulio commented May 25, 2026

These changes should not affect job results (they only touch error handling). Re-testing to collect more stability data points while waiting for review.

/payload-job periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview
/test e2e-aws-ovn

@openshift-ci
Contributor

openshift-ci Bot commented May 25, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-aws-ovn-conformance-ccm-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5284e010-586f-11f1-98dd-c6c6dbea694c-0

@openshift-ci
Contributor

openshift-ci Bot commented May 25, 2026

@mtulio: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants