🐛 Recreate bootstrap token if it was cleaned up #11520

Open
wants to merge 1 commit into base: main

Conversation

AndiDog
Contributor

@AndiDog AndiDog commented Dec 2, 2024

What this PR does / why we need it:

If Kubernetes cleaned up the join token before CAPI was able to refresh its expiry, CAPI currently fails. This happened to us in a few cases with machine pools and required manual intervention to unset the token so that CAPI would recreate it. This change auto-heals the situation by recreating the token.
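The recovery path described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the controller code from this PR: `tokenStore`, `errNotFound`, and `refreshOrRecreate` are hypothetical stand-ins for the workload cluster's token secrets, the `apierrors.IsNotFound` check, and the refresh logic.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the "secret not found" condition
// that the real controller detects with apierrors.IsNotFound.
var errNotFound = errors.New("secret not found")

// tokenStore is a hypothetical in-memory model of the token secrets
// that live in the workload cluster.
type tokenStore struct {
	secrets map[string]bool
	next    int
}

// get reports an error wrapping errNotFound if the token secret is missing.
func (s *tokenStore) get(token string) error {
	if !s.secrets[token] {
		return fmt.Errorf("get %q: %w", token, errNotFound)
	}
	return nil
}

// create stores a fresh token secret and returns its name.
func (s *tokenStore) create() string {
	s.next++
	token := fmt.Sprintf("token-%d", s.next)
	s.secrets[token] = true
	return token
}

// refreshOrRecreate returns the token the config should reference afterwards.
// If the current secret is gone, a new token is created instead of failing.
func refreshOrRecreate(s *tokenStore, current string) (string, error) {
	err := s.get(current)
	if err == nil {
		// Secret still exists; the real code would extend its expiry here.
		return current, nil
	}
	if errors.Is(err, errNotFound) {
		return s.create(), nil
	}
	return "", err
}

func main() {
	s := &tokenStore{secrets: map[string]bool{}}
	first := s.create()
	delete(s.secrets, first) // simulate cleanup of the expired token secret

	token, err := refreshOrRecreate(s, first)
	fmt.Println(token, err)
}
```

Before this change, the not-found case would fall through to the generic error return, which is why manual intervention was needed.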

Which issue(s) this PR fixes:

Seems to be the same issue as #11034.

/area bootstrap

@k8s-ci-robot k8s-ci-robot added area/bootstrap Issues or PRs related to bootstrap providers cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 2, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 2, 2024
Contributor

@g-gaston g-gaston left a comment

looks good, just some nits

@@ -371,6 +371,11 @@ func (r *KubeadmConfigReconciler) refreshBootstrapTokenIfNeeded(ctx context.Cont

secret, err := getToken(ctx, remoteClient, token)
if err != nil {
if apierrors.IsNotFound(err) {
log.Error(err, "token secret is gone, triggering creation of new token")
Contributor

Suggested change
log.Error(err, "token secret is gone, triggering creation of new token")
log.Error(err, "Bootstrap token secret is gone, triggering creation of new token")

Also, big nit, but should this really be an error log? Even if this is not part of the normal flow, if we have a way to recover from it, should we surface it as an error? My concern is that this would be confusing for someone looking over the logs.

Contributor Author

Agree, and done. I had initially made it an error since it's rare, but that's admittedly not a good reason.

return ctrl.Result{}, errors.Wrapf(err, "failed to refresh bootstrap token")
}
return ctrl.Result{
RequeueAfter: r.tokenCheckRefreshOrRotationInterval(),
}, nil
}

func (r *KubeadmConfigReconciler) createNewBootstrapToken(ctx context.Context, config *bootstrapv1.KubeadmConfig, scope *Scope, remoteClient client.Client) (ctrl.Result, error) {
Contributor

Maybe? Given the message you log, looks like this method is not for the first token creation, but for when the token needs to be recreated.

Suggested change
func (r *KubeadmConfigReconciler) createNewBootstrapToken(ctx context.Context, config *bootstrapv1.KubeadmConfig, scope *Scope, remoteClient client.Client) (ctrl.Result, error) {
func (r *KubeadmConfigReconciler) recreateNewBootstrapToken(ctx context.Context, config *bootstrapv1.KubeadmConfig, scope *Scope, remoteClient client.Client) (ctrl.Result, error) {

Contributor Author

Fair point. But, talking about language nits 😆, I'm removing "new" since it would otherwise duplicate the "re" prefix.

@g-gaston
Contributor

g-gaston commented Dec 2, 2024

@AndiDog is this a dup of #11037 ?

@fabriziopandini
Member

fabriziopandini commented Dec 3, 2024

Yes, it seems a dup of #11037, but it has the big advantage that this PR comes with unit tests.

It would be great if @AndiDog and @archerwu9425 can join forces and come up with one PR preserving test coverage and addressing this comment (re-create behaviour should be limited to MP owned bootstrap tokens).

@AndiDog AndiDog force-pushed the recreate-bootstrap-token-if-cleaned branch from 5350450 to 8716e12 Compare December 3, 2024 15:33
@AndiDog
Contributor Author

AndiDog commented Dec 3, 2024

Sorry, I wasn't aware there's already a PR. Please feel free to choose where to continue with reviews. I covered all places where the secret could be seen as not found – the Get and the Update of the token.

@AndiDog AndiDog force-pushed the recreate-bootstrap-token-if-cleaned branch from 8716e12 to 9d22a92 Compare December 3, 2024 15:52
@AndiDog
Contributor Author

AndiDog commented Dec 3, 2024

I limited the new behavior to machine pools

Member

@fabriziopandini fabriziopandini left a comment

Last two nits otherwise lgtm pending fix for linter errors

@@ -371,6 +371,11 @@ func (r *KubeadmConfigReconciler) refreshBootstrapTokenIfNeeded(ctx context.Cont

secret, err := getToken(ctx, remoteClient, token)
if err != nil {
if apierrors.IsNotFound(err) && scope.ConfigOwner.IsMachinePool() {
log.Info("Bootstrap token secret is gone, triggering creation of new token")
Member

nit

Suggested change
log.Info("Bootstrap token secret is gone, triggering creation of new token")
log.Info("Bootstrap token secret is not found, triggering creation of new token")

same in L410

Member

Might be better to move this log line and the following line blanking out the Token into recreateBootstrapToken

q: can we drop blanking out the bootstrap token entirely, given that we are setting it to a new value inside recreateBootstrapToken?

Contributor Author

Reworded.

Semantically, the function name recreateBootstrapToken doesn't denote that the secret wasn't found, so I'd rather leave the log line here where we determined that the secret wasn't found and therefore need to call the recreation functionality.

We could drop setting the field to "", but then if token creation fails, CAPI tries to read the non-existing secret again. If we set an empty string, the next reconciliation run will try to create the token. Your call here – I have no strong opinion which way to go.
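To make the trade-off concrete, here is a small standard-library-only sketch that counts how often the missing secret is read until recovery, with and without blanking the field. All names (`config`, `cluster`, `reconcile`, `readsUntilRecovered`) are hypothetical and only model the behavior discussed; this is not the controller code.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical model of the object holding the token reference.
type config struct{ Token string }

// cluster is a hypothetical model of the workload cluster's token secrets.
type cluster struct {
	secrets    map[string]bool
	failCreate bool // when true, token creation fails
	reads      int  // number of secret lookups performed
}

var errCreate = errors.New("token creation failed")

func (c *cluster) exists(token string) bool {
	c.reads++
	return c.secrets[token]
}

func (c *cluster) create(token string) error {
	if c.failCreate {
		return errCreate
	}
	c.secrets[token] = true
	return nil
}

// reconcile models one reconciliation run. If blank is true, the token
// field is cleared as soon as the secret is found to be missing.
func reconcile(cfg *config, c *cluster, blank bool) error {
	if cfg.Token != "" {
		if c.exists(cfg.Token) {
			return nil // secret present, nothing to recreate
		}
		if blank {
			cfg.Token = ""
		}
	}
	const newToken = "recreated"
	if err := c.create(newToken); err != nil {
		return err // with blanking, the field stays empty for the next run
	}
	cfg.Token = newToken
	return nil
}

// readsUntilRecovered simulates a failed creation followed by a successful
// one and returns how many secret lookups were needed in total.
func readsUntilRecovered(blank bool) int {
	c := &cluster{secrets: map[string]bool{}, failCreate: true}
	cfg := &config{Token: "stale"}
	_ = reconcile(cfg, c, blank) // first run: secret missing, creation fails
	c.failCreate = false
	_ = reconcile(cfg, c, blank) // second run: creation succeeds
	return c.reads
}

func main() {
	fmt.Println("reads with blanking:", readsUntilRecovered(true))
	fmt.Println("reads without blanking:", readsUntilRecovered(false))
}
```

In this model both variants recover on the second run; blanking only saves the repeated lookup of the secret that is already known to be missing.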

@@ -1441,6 +1441,154 @@ func TestBootstrapTokenRotationMachinePool(t *testing.T) {
g.Expect(foundNew).To(BeTrue())
}

func TestBootstrapTokenRefreshIfTokenSecretCleanedMachine(t *testing.T) {
Member

@fabriziopandini fabriziopandini Dec 6, 2024

Can we have a single test with two test runs? (so we can have a better test description)

Suggested change
func TestBootstrapTokenRefreshIfTokenSecretCleanedMachine(t *testing.T) {
func TestBootstrapTokenRefresh(t *testing.T) {
t.Run("should not recreate the token for Machines", func(t *testing.T) {
...
})
t.Run("should recreate the token for MachinePools", func(t *testing.T) {
...
})
}

Contributor Author

done

Labels
area/bootstrap Issues or PRs related to bootstrap providers cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants