Kafka scaler: excludePersistentLag not working for partitions with invalid offset (that is -1) #5274
Comments
@dttung2905 What do you think about this?
@zroubalik I think it's a good point that @rakesh-ism brings up, and IMO we can implement the logic. I do agree that we sometimes get negative offset lag for Kafka consumers (quite common in my experience managing Kafka, actually 😞). I will create a PR tomorrow morning for this (it's late in my timezone now, haha).
@rakesh-ism, could you help send over a few things?
@dttung2905, thanks for taking it up. In my case, offsetResetPolicy is "earliest", so the above code is bypassed and the code branch at line 684 is executed.
Thanks for sharing the config file with us.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Got busy. Need some more time to reproduce.
@rakesh-ism sure. Let me know if you manage to reproduce it 🙏
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.
Hi. Because of this, the lag comes out as 156.
@dttung2905 PTAL 🙏 :)
Any updates, please?
TL;DR: update to at least KEDA 2.15.x, enable scaleToZeroOnInvalidOffset: "true", and add a cron trigger that starts the consumer at least once a day/hour (depending on your requirements). The longer-term fix is to force at least one message into every partition. An example manifest is sketched below.

I also had this problem and found that this option fixes it: scaleToZeroOnInvalidOffset: "true". With this option, the lag is treated as zero when it is below zero or non-existent, which fixes the problem. The option is safe if your minimum replica count is at least one; but if you scale to zero, you may need to start the consumer (for example via a cron trigger) every few hours/days to make sure you don't have any stale messages in partitions without a lag metric. Once all partitions have a lag metric, you can disable this option.

A negative Kafka lag is usually a brief corner case for an active consumer; it is normally not a major issue, and the default scaleToZeroOnInvalidOffset: "false" already takes care of it. Please make sure you use the latest KEDA version: older versions may not have this option, and version 2.15.0 added a fix for offsetResetPolicy: "earliest" not applying the scaleToZeroOnInvalidOffset workaround.

So on my side, I updated KEDA, enabled that option, and it solved my issue. I then enabled a cron trigger to start the consumer once a day, just in case, for topics with very little activity.
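For reference, here is a minimal sketch of the workaround described above: a Kafka trigger with scaleToZeroOnInvalidOffset plus a daily cron window to wake the consumer. The deployment name, broker address, consumer group, topic, thresholds, and schedule are placeholders, not values from this issue.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-consumer-scaler            # hypothetical name
spec:
  scaleTargetRef:
    name: my-consumer                 # hypothetical consumer Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    # Kafka trigger: treat invalid (-1) or negative lag as zero
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092          # placeholder broker address
        consumerGroup: my-consumer-group      # placeholder consumer group
        topic: my-topic                       # placeholder topic
        lagThreshold: "10"
        offsetResetPolicy: earliest
        scaleToZeroOnInvalidOffset: "true"
    # Cron trigger: run one consumer once a day so every partition
    # eventually commits an offset and gets a real lag metric
    - type: cron
      metadata:
        timezone: Etc/UTC
        start: 0 6 * * *
        end: 30 6 * * *
        desiredReplicas: "1"
```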
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.
Report
When I set the excludePersistentLag flag to true, it does not exclude persistent lag for partitions with an invalid offset (that is, -1).
Expected Behavior
If the lag for these partitions with an invalid offset is persistent, it should be ignored in the custom metric.
Actual Behavior
In my test setup, this resulted in the number of Kafka consumer pods being scaled to the maximum.
Steps to Reproduce the Problem
Use the Kafka scaler to scale the consumer deployment; a minimal trigger configuration is sketched below.
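As an illustration only (not the reporter's actual manifest), a trigger configuration along these lines should exhibit the reported behavior when some partitions have never had an offset committed; the broker, consumer group, and topic names are placeholders.

```yaml
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092        # placeholder broker address
      consumerGroup: my-consumer-group    # placeholder consumer group
      topic: my-topic                     # placeholder topic
      lagThreshold: "10"
      offsetResetPolicy: earliest
      # Expected to exclude partitions whose lag never changes,
      # but partitions with an invalid offset (-1) are still counted
      excludePersistentLag: "true"
```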
I know this might not be a real production setup. I had this setup in my sandbox landscape for …. And it took me a long time to find out what the problem was.
Logs from KEDA operator
KEDA Version
2.12.1
Kubernetes Version
None
Platform
Other
Scaler Details
Kafka
Anything else?
No response