Setting up Llamadeploy for multiagent deployment on k8s #357
Comments
@masci thanks a lot for looking into my question above. We are somewhat blocked, and there is some urgency in completing the PoC for agentic workflows using LlamaIndex. We would greatly appreciate any guidance you can provide on the request above.
@hz6yc3 while it might not be totally clear from the docs/examples, it's fairly straightforward. You'd need to use the lower-level API. Basically, you can set up a Docker image that deploys the core, then another Docker image from there that deploys a workflow service (or several, depending on how you want to manage scaling). Once you have it running in Docker, it's fairly transferable to launching those Docker images in a k8s cluster. This example walks through all of this, including k8s. We are working on updates to make this easier, though, using a simpler top-level YAML file rather than writing code for all the deployments. In lieu of that, the above is the best approach.
@logan-markewich thanks a lot! Let me read through the documents. We were not sure about the guidance for centrally deploying the core components because, based on the architecture in the documentation, it seemed like we have to deploy the core components (control plane, message queue) separately for each deployment. At our company, every application is deployed within its own namespace on the cluster, so we weren't sure how to set up the deployment pattern using Llama Deploy.
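For the namespace-per-application setup described above, a workflow service in one namespace can usually reach a centrally deployed control plane in another namespace through the standard Kubernetes Service DNS name, `<service>.<namespace>.svc.<cluster-domain>`. The sketch below only builds such URLs. The service name `control-plane`, the namespace `llama-core`, and port `8000` are hypothetical placeholders, and it assumes the default `cluster.local` cluster domain and no NetworkPolicy blocking cross-namespace traffic.

```python
def service_dns_name(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, namespace: str, port: int, scheme: str = "http") -> str:
    """Build a URL that a pod in any namespace can use to reach the Service."""
    return f"{scheme}://{service_dns_name(service, namespace)}:{port}"


# Hypothetical names: a central "llama-core" namespace hosting the control plane.
control_plane_url = service_url("control-plane", "llama-core", 8000)
print(control_plane_url)
```

A workflow deployment in its own application namespace could then be given this URL through its configuration instead of running its own copy of the core components.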
Yeah, https://www.llamaindex.ai/blog/introducing-llama-deploy-a-microservice-based-way-to-deploy-llamaindex-workflows is somewhat misleading about these:
If you use the API Server / |
@rehevkor5 first of all, thanks for the feedback! What you read in the article is still true, but it dates back to before we introduced the apiserver (see how we changed the architecture diagram here), so I can see how this can be misleading. A quick recap to clarify the situation:
Why is the apiserver monolithic, then? The apiserver is a key component of what we want Llama Deploy to become in terms of user experience. We wanted to quickly validate the concept of "deployments" and their YAML definition with our users and get feedback as soon as possible, so we optimized the current "backend" of the apiserver for running in a single-process/single-container environment that was easy to set up. We're already planning a truly scalable implementation of the apiserver backend, though. Currently we're leaning towards building on top of existing container orchestrators to move faster and avoid reinventing the wheel. I'll expand the docs to include these considerations and call out that the apiserver is a work in progress. Let me know if you have any questions!
@masci it sounds like your suggested approach, for now, is manual orchestration of the individual components until a scalable solution using the apiserver is developed. Based on the updated architecture diagram you shared, it sounds like we have to create separate "deployments", each with its own control plane and message queue config, for deploying the associated workflows. Is that right?
@masci I have been working on a PoC to set up Llama Deploy workflows using the manual orchestration approach. I managed to set it up with docker-compose using a custom Docker image, with both the simple message queue and Redis.

As the next step of the PoC, I tried deploying the services to k8s. The setup I was going for is a centralized deployment of the control plane and message queue (with multiple replicas), with workflows deployed as a separate deployment (with multiple replicas) and registered to the central control plane.

Is there a way to share the service metadata across the control plane replicas and to register a single instance of the workflow? In the meantime, I can scale down my replicas to 1 to mitigate the issue, but I am curious whether a fix is already available.

Edit: A quick fix could be to allow passing in a KV store URI that is a separate service for the control plane to use here via env var like
Yes, I believe that would be the solution. We already have a bunch of stores that can run on separate services (https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/storage/kvstore), so for example you could use the Redis implementation. I'll look into it; I'm tracking the feature with #370.
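To illustrate why an external KV store would help here, the following is a minimal, self-contained sketch. It does not use the Llama Deploy API: `InMemoryKV` and `ControlPlaneReplica` are hypothetical stand-ins, and the shared `InMemoryKV` instance plays the role that a separate store such as Redis would play in a real cluster.

```python
from typing import Dict, Optional


class InMemoryKV:
    """Hypothetical stand-in for a key-value store."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class ControlPlaneReplica:
    """Hypothetical control-plane replica that keeps service metadata in a KV store."""

    def __init__(self, store: InMemoryKV) -> None:
        self.store = store

    def register_service(self, name: str, url: str) -> None:
        self.store.put(f"service:{name}", {"url": url})

    def lookup_service(self, name: str) -> Optional[dict]:
        return self.store.get(f"service:{name}")


# Case 1: each replica has its own private store, so a registration that
# lands on replica A is invisible to replica B.
replica_a = ControlPlaneReplica(InMemoryKV())
replica_b = ControlPlaneReplica(InMemoryKV())
replica_a.register_service("my_workflow", "http://workflow:8002")
isolated_lookup = replica_b.lookup_service("my_workflow")

# Case 2: both replicas point at one shared store (e.g. an external Redis),
# so the registration is visible no matter which replica serves the request.
shared_store = InMemoryKV()
replica_c = ControlPlaneReplica(shared_store)
replica_d = ControlPlaneReplica(shared_store)
replica_c.register_service("my_workflow", "http://workflow:8002")
shared_lookup = replica_d.lookup_service("my_workflow")

print(isolated_lookup, shared_lookup)
```

With per-replica state, the result depends on which replica the load balancer happens to pick; with a shared store, every replica sees the same registrations.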
@masci @logan-markewich I have set up the control-plane, message-queue, and workflow services using manual orchestration, and the setup is as follows:
I am referencing the services using the k8s service URL:

When I interact with this deployment using the LlamaDeployClient, I notice that all replicas of the workflow service consume the same message from Redis and run the workflow. As a result, the client receives duplicated responses because multiple workflows act on the same message. Once the control plane receives a final_result from one of the workflow replicas, it stops consuming messages for the same task_id, which is expected.

How do I ensure that only one pod replica of the workflow consumes a given message from the message queue and processes the request, instead of all replicas of the workflow?

Edit: I wonder whether issue 363 will resolve this problem, though maybe not completely. I tried using the simple message queue with 1 replica (because consumers and queues are managed in memory), with 2 replicas of the control plane using the Redis KV store and 3 replicas of the workflow service. Because the message queue service is now responsible for publishing and consuming messages, each request is processed only once. But if I scale the message queue, each replica will need to register the consumers and publishers to work as expected (a problem that could perhaps be solved with a separate KV store, as in the control plane?).
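The duplication described above is what one would expect from fan-out (publish/subscribe) delivery, where every subscriber receives every message, as opposed to a work queue with competing consumers, where each message goes to exactly one consumer. The sketch below contrasts the two patterns in plain Python. `FanOutTopic`, `WorkQueue`, and the replica names are hypothetical and are not part of Llama Deploy or Redis; whether a given message queue integration behaves one way or the other is an assumption to verify against its implementation.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[Dict[str, str]], None]


class FanOutTopic:
    """Pub/sub style delivery: every subscriber gets every message."""

    def __init__(self) -> None:
        self.subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self.subscribers.append(handler)

    def publish(self, message: Dict[str, str]) -> None:
        for handler in self.subscribers:
            handler(message)


class WorkQueue:
    """Competing consumers: each message goes to exactly one consumer (round robin)."""

    def __init__(self) -> None:
        self.consumers: List[Handler] = []
        self._next = 0

    def subscribe(self, handler: Handler) -> None:
        self.consumers.append(handler)

    def publish(self, message: Dict[str, str]) -> None:
        handler = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        handler(message)


def make_replica(name: str, log: List[Tuple[str, str]]) -> Handler:
    """Return a handler that records which replica processed which task."""

    def handle(message: Dict[str, str]) -> None:
        log.append((name, message["task_id"]))

    return handle


def run(broker) -> List[Tuple[str, str]]:
    """Attach three workflow replicas to the broker and publish one task."""
    log: List[Tuple[str, str]] = []
    for i in range(3):
        broker.subscribe(make_replica(f"workflow-{i}", log))
    broker.publish({"task_id": "t1"})
    return log


fan_out_log = run(FanOutTopic())  # all three replicas process task t1
queue_log = run(WorkQueue())      # only one replica processes task t1
print(len(fan_out_log), len(queue_log))
```

In a real deployment, the competing-consumer behavior would come from the broker itself, for example consumer groups or a shared queue that all replicas of one service read from, rather than from application code like this.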
There is no documentation that provides guidance on how to set up Llama Deploy (control plane, message queue, and service deployment) on Kubernetes. The example provided in the code is a little confusing, and our company badly needs guidance on setting up Llama Deploy for an enterprise deployment. Any relevant documentation or sample configuration that someone can share would be really helpful.