Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 [WIP] supervisor: create ClusterModule per MachineDeployment and re-reconcile VirtualMachineResourceSetPolicy to update VirtualMachineSetResourcePolicy #3287

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

chrischdi
Copy link
Member

What this PR does / why we need it:

  • CAPV currently creates one single cluster module which is used for all machinedeployments of a cluster.
  • This was added like this because there always was only a single machinedeployment used.
  • This PR:
    • adds a clustermodule per machinedeployment
    • updates the clustermodule's
    • watches machinedeployments now, to trigger updates for clustermodules
    • removes the old cluster-module name used before
    • verifies that the cluster module got created by vm-operator before creating a virtual machine

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 9, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from chrischdi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chrischdi
Copy link
Member Author

/test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main

@chrischdi chrischdi changed the title 🐛 supervisor: create ClusterModule per MachineDeployment and re-reconcile VirtualMachineResourceSetPolicy to update VirtualMachineSetResourcePolicy 🐛 [WIP] supervisor: create ClusterModule per MachineDeployment and re-reconcile VirtualMachineResourceSetPolicy to update VirtualMachineSetResourcePolicy Dec 9, 2024
@chrischdi
Copy link
Member Author

/hold

failed to patch VirtualMachineSetResourcePolicy clusterclass-rollout-i3y0x6/clusterclass-rollout-jxgupz:
      admission webhook "default.validating.virtualmachinesetresourcepolicy.v1alpha2.vmoperator.vmware.com"
      denied the request: spec.clustermodules: Invalid value: []string{"clusterclass-rollout-jxgupz-md-md-0-wpd29",
      "control-plane-group"}: field is immutable

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 9, 2024
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass

@@ -59,6 +59,36 @@ func GetVSphereClusterFromVMwareMachine(ctx context.Context, c client.Client, ma
return vsphereCluster, err
}

// GetVMwareVSphereClusterFromMachineDeployment gets the vmware.infrastructure.cluster.x-k8s.io.VSphereCluster resource for the given MachineDeployment$.
func GetVMwareVSphereClusterFromMachineDeployment(ctx context.Context, c client.Client, machineDeployment *clusterv1.MachineDeployment) (*vmwarev1.VSphereCluster, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
If this func is going only be used to enqueue requests

Suggested change
func GetVMwareVSphereClusterFromMachineDeployment(ctx context.Context, c client.Client, machineDeployment *clusterv1.MachineDeployment) (*vmwarev1.VSphereCluster, error) {
func MachineDeploymentToVMwareVSphereCluster(ctx context.Context, c client.Client, machineDeployment *clusterv1.MachineDeployment) (*vmwarev1.VSphereCluster, error) {
  • I will drop the final get and simply return the namespaced name

return nil, errors.Wrapf(err, "failed to list MachineDeployment objects")
}
for _, md := range mdList.Items {
if md.DeletionTimestamp.IsZero() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure we should drop deleted MD, there could still be machines for them.

return nil, err
}

clusterModuleGroups := append([]string{ControlPlaneVMClusterModuleGroupName}, machineDeploymentNames...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be we should add also fmt.Sprintf("%s-workers-0", cluster.Name) if not already there, at least for the transition time

}

// Ensure .spec.clusterModuleGroups is up to date.
helper, err := patch.NewHelper(resourcePolicy, s.Client)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I prefer this approach (using the patch helper), but let's consider also using createOrUpdate for consistency with the rest of the codebase

}
}

return errors.Errorf("VirtualMachineSetResourcePolicy's .status.clusterModules does not yet contain %s", clusterModuleGroupName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: when we look into v1beta2 conditions for CAPV, we should consider if/how to surface when machine creation is stuck due to corresponding module missing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants