Handle Extremely High Throughput by holding back requests to etcd until the throughput decreases. #16837
Comments
Yes, there is, at least in Kubernetes. https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
What you are describing is an issue with write throughput, is that correct? I haven't heard of real-world cases where writes alone could topple etcd. Write throughput depends mostly on disk performance and is not as resource intensive on memory or CPU (that would need a test to confirm). etcd also limits the number of pending proposals, so at some point no new proposals should be accepted. Are you sure there are no other accompanying requests, beyond the writes themselves, that could be the cause of the problem? For example, the cost of writes scales with the number of watchers, so if you have many watches established this would make more sense.

I think we need more concrete data points than just saying etcd becomes a problem: what exact traffic goes into etcd, plus performance metrics and profiles, so we can answer what the problem is in your case.

Overall I think this is a scalability problem, and in such cases there is no single fix that would let us scale to 1000 qps; fixing one bottleneck will just surface the next one. The solution is defining the exact scenario we want to improve, picking a success metric, and making progressive improvements towards that goal, such as #16467.
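To make that concrete, here is a minimal sketch of a client-side write-throughput probe built on clientv3; the endpoint, payload size, and concurrency are assumptions for illustration, and the numbers it produces only mean something alongside server-side metrics and profiles (the `benchmark` tool in the etcd repository is another option).

```go
// Minimal write-throughput probe against an etcd endpoint.
// Endpoint, value size, and concurrency are illustrative assumptions.
package main

import (
	"context"
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	const workers = 32
	value := strings.Repeat("x", 20*1024) // assumed ~20 kB payload

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	var total int64
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; ctx.Err() == nil; i++ {
				key := fmt.Sprintf("/bench/%d/%d", id, i)
				if _, err := cli.Put(ctx, key, value); err != nil {
					return // deadline reached or server pushed back
				}
				atomic.AddInt64(&total, 1)
			}
		}(w)
	}
	wg.Wait()
	elapsed := time.Since(start)
	fmt.Printf("%d writes in %s (%.0f writes/sec)\n",
		total, elapsed.Round(time.Second), float64(total)/elapsed.Seconds())
}
```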
I remembered one case where testing high throughput affected etcd, but it only caused high memory usage due to the increased number of allocations required by the PrevKey watch option used by Kubernetes (#16839). The issue was easily mitigated by making the GC more aggressive.
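For context on that mitigation: a Go process's GC aggressiveness is governed by the GOGC target, which for a deployed etcd binary is usually changed by setting the GOGC environment variable on the process; programmatically the same knob is runtime/debug.SetGCPercent. A tiny illustration of the knob itself (the value 50 is only an example, not a recommendation):

```go
// Illustration only: lowering the GC target makes the collector run more
// often, trading CPU for a smaller heap. For etcd itself this is normally
// done via the GOGC environment variable rather than in code.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	old := debug.SetGCPercent(50) // assumed example value
	fmt.Printf("GC target changed from %d%% to 50%%\n", old)
}
```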
Really appreciate you getting back to me. This gives me lots of research and ideas to throw around with my team. If we still have questions or issues, I will try to raise them more formally like you suggested. Thanks!
Hey @serathius, after talking with people more, it seems that we suffer from very high pod churn with pods that have very large manifests (20-50 kB), so we are constantly having to defragment etcd. OpenShift has an operator that handles this most of the time (https://docs.openshift.com/container-platform/4.14/scalability_and_performance/recommended-performance-scale-practices/recommended-etcd-practices.html#manual-defrag-etcd-data_recommended-etcd-practices), but it still needs some manual work. Is there a way of doing this natively with an operator that etcd provides? Or is that something that could be created? Thanks again.
Hey @Sharpz7 - one of the etcd maintainers, @ahrtr, has put together an etcd defrag helper utility: https://github.com/ahrtr/etcd-defrag. This can be run via a Kubernetes CronJob, with rules applied to ensure defrag is only run if actually required. It might be a helpful approach; however, please bear in mind that this is not an official etcd subproject at this point.
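As a sketch of the "only run it if actually required" idea, a conditional defrag can also be expressed directly against the clientv3 maintenance API. The endpoint and the 30% / 100 MB thresholds below are assumptions for illustration, not the rules etcd-defrag ships with:

```go
// Sketch: defragment an etcd endpoint only when enough space is reclaimable.
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoint := "localhost:2379" // assumed endpoint
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	status, err := cli.Status(ctx, endpoint)
	if err != nil {
		panic(err)
	}

	reclaimable := status.DbSize - status.DbSizeInUse
	// Assumed rule: only defrag when >30% of the backend file and >100 MB are reclaimable.
	if reclaimable > status.DbSize*30/100 && reclaimable > 100*1024*1024 {
		if _, err := cli.Defragment(ctx, endpoint); err != nil {
			panic(err)
		}
		fmt.Printf("defragmented %s, up to %d bytes reclaimable\n", endpoint, reclaimable)
	} else {
		fmt.Printf("skipping defrag, only %d bytes reclaimable\n", reclaimable)
	}
}
```

Keep in mind that defragmentation blocks operations on the member while it runs, which is why tooling typically processes one member at a time.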
Appreciate you getting back to me - this is really cool! Thanks
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
What would you like to be added?
Find the original k8s ticket here: kubernetes/kubernetes#120781.
Essentially, what the title says. If someone tries to write an incredibly high number of key-value pairs all at once, there should be a way to hold those requests back.
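For illustration only, a purely client-side form of "holding requests back" can be sketched with a token-bucket limiter in front of etcd writes; the endpoint, rate, and burst values are assumptions, and this is not a proposal for how etcd itself should behave server-side:

```go
// Sketch: client-side back-pressure for etcd writes using a token bucket.
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"golang.org/x/time/rate"
)

// limitedPut blocks until the limiter grants a token, then issues the Put.
func limitedPut(ctx context.Context, cli *clientv3.Client, lim *rate.Limiter, key, val string) error {
	if err := lim.Wait(ctx); err != nil { // hold the request back
		return err
	}
	_, err := cli.Put(ctx, key, val)
	return err
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // assumed endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Assumed limits: at most 500 writes/sec with bursts of 100.
	lim := rate.NewLimiter(rate.Limit(500), 100)

	ctx := context.Background()
	for i := 0; i < 1000; i++ {
		key := fmt.Sprintf("/jobs/%d", i)
		if err := limitedPut(ctx, cli, lim, key, "payload"); err != nil {
			fmt.Println("put failed:", err)
			return
		}
	}
}
```

Any real solution to this issue would of course need to live server-side in etcd; this only shows the shape of the back-pressure from the client's perspective.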
As I said in my last comment in the original ticket in k8s, I am not convinced this belongs here. But, it is something I am very, very interested in, and would be happy to pivot to whatever is needed and work on this personally.
Thanks!
Why is this needed?
For people dealing with extremely high-throughput batch work (thousands of jobs per second, each lasting 1-2 minutes), etcd starts to become a real problem.
Links to back up this point:
https://etcd.io/docs/v3.5/op-guide/performance/
https://github.com/armadaproject/armada: A scheduling solution partially designed around this problem.
In the original ticket (kubernetes/kubernetes#120781) it was agreed: