Frequent Reconciliation Caused by Minor Changes in HPA Status from Prometheus Scalers #6442
Comments
@Park-Jiyeonn I faced this issue last year; we debugged it and fixed it. The fix has been available since KEDA version 2.13.0.
Thank you for your response! I've confirmed that the issue was indeed caused by the excessive number of Reconcile events. However, we are also hitting another problem: each Reconcile operation is very slow, and our API server traffic is consistently capped at 20 QPS. After some investigation, we realized this was due to the controller-runtime client's default QPS limit of 20, so we adjusted the client's QPS settings to remove the bottleneck. We are currently on KEDA version 2.8 because of compatibility constraints with our cluster version. If you are aware of any other known performance issues or best practices for this version of KEDA, I would greatly appreciate it if you could share them so we can avoid potential problems in advance. Thanks again for your help!
We also hit this rate limit at the API server and resolved it by increasing the limit.
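For reference, raising the client-side limits in a controller-runtime operator is typically done on the rest.Config before the manager is built. A minimal sketch (the QPS/Burst values are illustrative, and this is generic controller-runtime wiring rather than KEDA's own startup code):

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// GetConfigOrDie returns a *rest.Config. controller-runtime defaults the
	// client to QPS=20 / Burst=30, which can bottleneck an operator that
	// manages many ScaledObjects.
	cfg := ctrl.GetConfigOrDie()
	cfg.QPS = 100  // illustrative value; tune for your cluster and API server capacity
	cfg.Burst = 200 // illustrative value

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
	if err != nil {
		panic(err)
	}
	_ = mgr // register controllers against mgr as usual, then mgr.Start(...)
}
```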
Report
I am experiencing an issue where the status of HPAs created by KEDA frequently changes due to minor differences in the Prometheus query results used by my scalers. For example, the status value for my CPU utilization query might change from 55.001 to 55.002, causing the KEDA operator to repeatedly trigger the Reconcile process.
Currently, I have a large number of Prometheus-based scalers in my cluster, and this frequent Reconciliation significantly impacts the operator's performance. When a new ScaledObject is created, it can take 10 minutes or more for its Reconciliation to be processed, because the operator is constantly handling updates triggered by these small status changes.
Is there a way to avoid such frequent Reconciliation for minor or insignificant status changes? Or is there a best practice for handling this situation when using Prometheus-based scalers?
Expected Behavior
The operator should not trigger Reconciliation for minor or insignificant changes in HPA status. Alternatively, there should be a way to configure a tolerance or threshold for such changes to avoid excessive Reconcile events.
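One way controller-runtime lets a controller ignore these events is a watch predicate: status-only writes never bump metadata.generation, so GenerationChangedPredicate drops them. The sketch below is illustrative wiring only, not KEDA's actual controller setup or its 2.13.0 fix; the function name and import paths are assumptions:

```go
package controllers

import (
	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// SetupScaledObjectController is a hypothetical setup function. The
// GenerationChangedPredicate on the owned HPA filters out update events
// where metadata.generation is unchanged, i.e. HPA writes that only touch
// status, so they no longer enqueue a Reconcile for the ScaledObject.
func SetupScaledObjectController(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kedav1alpha1.ScaledObject{}).
		Owns(&autoscalingv2.HorizontalPodAutoscaler{},
			builder.WithPredicates(predicate.GenerationChangedPredicate{})).
		Complete(r)
}
```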
Actual Behavior
The operator is repeatedly triggered for Reconciliation due to minor changes in HPA status, significantly impacting performance and delaying the processing of new ScaledObjects.
Steps to Reproduce the Problem
Logs from KEDA operator
A large number of similar logs:
KEDA Version
< 2.12.0
Kubernetes Version
< 1.28
Platform
Any
Scaler Details
Prometheus
Anything else?
If this issue is considered a bug and there is agreement on a potential solution, I am willing to contribute code to fix it. I would appreciate any guidance or suggestions from the maintainers on how to approach the fix, such as which parts of the codebase to modify or any specific design considerations to keep in mind.