dry run changes #7525

Draft
wants to merge 1 commit into
base: main

Conversation

edibble21
Contributor

Fixes #N/A

Description

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify bot commented Dec 13, 2024

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit 53c553f
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/675b936872eba200085d1bc9

@@ -25,6 +25,7 @@ const (
	ConditionTypeAMIsReady             = "AMIsReady"
	ConditionTypeInstanceProfileReady  = "InstanceProfileReady"
	ConditionTypeValidationSucceeded   = "ValidationSucceeded"
	ConditionTypeNotDegraded           = "NodeclassNotDegraded"

Given that an authorization failure is a hard stop (as in, we definitely can't create any resources if we don't have proper permissions), I see this as a validation failure, not a Degraded case.

In general, I would think about the classification as:

  1. Degraded: Something where we suspect that there is an issue with the launch config but aren't 100% sure -- this informs a user to look at the NodePool but we will keep launching nodes to keep trying
  2. ValidationFailed: Something where we know that there is an issue with the launch config -- this informs a user to look at the NodePool and we will stop launching nodes with this NodePool

		return reconcile.Result{}, nil
	}
	nodeClaims := &karpv1.NodeClaimList{}
	if err := d.kubeClient.List(ctx, nodeClaims, nodeclaimutils.ForNodeClass(nodeClass)); err != nil {

I don't think we should get real NodeClaims for this -- we shouldn't require actual nodes to have been launched to determine that there is an issue with our authorization -- we should be able to "mock" a NodeClaim for the launch and then use that to execute the dry-run
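
A minimal sketch of the "mock" idea, using pared-down stand-in types rather than the real `EC2NodeClass` and `karpv1.NodeClaim`; the field names, the `-dry-run` suffix and the label key are all hypothetical. The point is that the claim is built in memory from the NodeClass alone, so the check does not depend on any node having been launched (and never has to index into a possibly empty list).

```go
package main

// NodeClass and NodeClaim are simplified stand-ins for the real API types.
type NodeClass struct {
	Name string
}

type NodeClaim struct {
	Name          string
	NodeClassName string
	Labels        map[string]string
}

// syntheticNodeClaim builds an in-memory claim for a dry-run launch.
// It is never written to the API server.
func syntheticNodeClaim(nc NodeClass) NodeClaim {
	return NodeClaim{
		Name:          nc.Name + "-dry-run",
		NodeClassName: nc.Name,
		// Hypothetical label that marks the claim as synthetic.
		Labels: map[string]string{"example.com/dry-run": "true"},
	}
}
```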

@@ -242,8 +242,13 @@ func (p *DefaultProvider) launchInstance(ctx context.Context, nodeClass *v1.EC2N
	} else {
		createFleetInput.OnDemandOptions = &ec2types.OnDemandOptionsRequest{AllocationStrategy: ec2types.FleetOnDemandAllocationStrategyLowestPrice}
	}

	createFleetInput.DryRun = lo.ToPtr(true)

This updates the existing provider -- I don't think you want to update this call for everything -- you just want to pass an option to the create that ensures that it succeeds under dry-run.
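
One way to pass such an option is the functional-options pattern. This is only a sketch: `CreateOption`, `WithDryRun`, `buildFleetInput` and the `fleetInput` stand-in are hypothetical names, not part of the provider today. The only detail taken from the diff is that the fleet request has a `DryRun` pointer field.

```go
package main

// createOptions holds per-call settings (hypothetical).
type createOptions struct {
	dryRun bool
}

// CreateOption adjusts the settings for a single Create call.
type CreateOption func(*createOptions)

// WithDryRun asks for the request to be validated without launching anything.
func WithDryRun() CreateOption {
	return func(o *createOptions) { o.dryRun = true }
}

// fleetInput stands in for the real CreateFleet input and its DryRun field.
type fleetInput struct {
	DryRun *bool
}

// buildFleetInput sets DryRun only when the caller opted in, so regular
// launches go through exactly as before.
func buildFleetInput(opts ...CreateOption) fleetInput {
	var o createOptions
	for _, opt := range opts {
		opt(&o)
	}
	in := fleetInput{}
	if o.dryRun {
		in.DryRun = &o.dryRun
	}
	return in
}
```

The existing launch path would call `buildFleetInput()` with no options, and only the dry-run check would call `buildFleetInput(WithDryRun())`.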

	if err := d.kubeClient.List(ctx, nodeClaims, nodeclaimutils.ForNodeClass(nodeClass)); err != nil {
		return reconcile.Result{}, fmt.Errorf("listing nodeclaims that are using nodeclass, %w", err)
	}
	_, err := d.instanceProvider.Create(ctx, nodeClass, &nodeClaims.Items[0], nodeClass.Spec.Tags, lo.Must(d.resolveInstanceTypes(ctx, &nodeClaims.Items[0], nodeClass)))

This would require an update to the CloudProvider interface, so I'm not 100% convinced that it's a good idea, but it would be kinda nice if we could just run a cloudProvider.Create() under dry-run and execute all of the underlying config just the same. I guess what this would really take is injecting an EC2API that always executes everything under dry-run. That would make sure that we run through the full "launch sequence" without launching an actual instance.

	return reconcile.Result{}, nil
}

func (d *Degraded) resolveInstanceTypes(ctx context.Context, nodeClaim *karpv1.NodeClaim, nodeClass *v1.EC2NodeClass) ([]*corecloudprovider.InstanceType, error) {

It's not ideal that we have to rewrite this function again with this change

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

3 participants