-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694
Comments
@PawelPeczek-Roboflow Can I have a go at it ? And can you tell me what is expected, are we talking about complete integration end to end or breaking down this issue into sub issues which can be tackled. |
Hi @AHB102, thanks for engaging into the issue. Sure, you can pick up the task - so the point is we would like to: |
First step would definitely be agreeing on API that host llama but was not investigating all of the options, which would be good to do. I would try to find cheap and reliable third party |
@PawelPeczek-Roboflow I looked into hosted Llama 3.2 Vision APIs and found a few options: Together.ai (https://api.together.xyz/models) , Google Vertex AI (https://cloud.google.com/blog/products/ai-machine-learning/llama-3-2-metas-new-generation-models-vertex-ai) , Azure (https://techcommunity.microsoft.com/blog/machinelearningblog/meta%E2%80%99s-new-llama-3-2-slms-and-image-reasoning-models-now-available-on-azure-ai-m/4255167) and AWS Bedrock(https://aws.amazon.com/blogs/machine-learning/vision-use-cases-with-llama-3-2-11b-and-90b-models-from-meta/).Except Together.ai all of the other options have massive scale , it would be reliable and cheap. Hugging face also has a offering for inferencing. I checked out OpenRouter's API limits. The Llama 3.2 11B model is currently free, and the usage rates are pretty good. I think 20 requests per minute should be plenty for most things Any thoughts ? |
I do not have particular bias towards any of the vendor - I even see that the decision which is most handy for people to use is strictly related to individual preferences of the consumer. I see the construction of the block in the following way:
|
That sounds great! This approach provides a solid foundation for future scalability and flexibility. By not committing to a single vendor upfront, we can adapt to evolving needs and avoid potential vendor lock-in. To start, I suggest we explore OpenRouter. It offers free API usage for Llama 2.3 11B, making it ideal for initial testing and development. Additionally, its compatibility with familiar libraries like requests and openai can streamline the integration process and minimize security risks. Once we have a robust core structure in place, we can easily pivot to other providers. wdyt ? |
yeah, that sounds right |
I've been diving into the Workflow Block (https://github.com/roboflow/inference/blob/main/inference/core/workflows/core_steps/models/foundation/openai/v2.py) and feel comfortable with the workings of the OpenAIBlockV2 class. I'm about to start writing code. Any advice for getting the most out of it? When modifying the inference core, I understand that I need to include test cases, right? |
yeah, tests are recommended. Here is our block creation guide: https://inference.roboflow.com/workflows/create_workflow_block/ |
you can find information how to run development smoothly to test remote apis - we usually create unit-tests agains mocks - and place some integration test skipped if API key not provided |
@PawelPeczek-Roboflow Let's get v1.py for Llama working, I'll be focusing on getting it functional before tackling test cases. I'll definitely ask for your input and help along the way, and I'll keep you updated on how it's going. Thanks for the docs 😁 |
hi there :) anything I can help you with? |
@PawelPeczek-Roboflow Hi, sorry for the late reply.
|
Do not worry to much if the API for all VLM blocks cannot be identical, we strive for similar experience regarding blocks integration, not 100% the same config parameters. I do not see the list of all params that open-router APIs support, they name it recommended, and use openai client |
For now I'll dive into the details of max_tokens and top_p to keep our responses concise and cost-effective. We can also explore other tricks like choosing the right model and batching requests. I will update you once I have something, and concurrently keep working on the manifest block. 😁 |
👍 |
@PawelPeczek-Roboflow I experimented with the tiktoken library to determine the maximum token count and top-p values used by OpenRouter Llama 3.2 Vision. For tasks like object detection, image captioning, and OCR, I found that the maximum number of tokens rarely exceeds 200. I also tested different top-p values, ranging from 0.7 to the default of 1.0, and observed a decrease in the number of tokens required as the top-p value increased. |
I guess so - in this case (contrary to popular approach) I suggest just copy-pasting the functions into your block module. We do not follow DRY rule for blocks, in practice it's easier to manage changes for each blocks separately |
@PawelPeczek-Roboflow Hi 🖐️, I’ve nearly completed the workflow block, which now consists of approximately 600 lines of code. To ensure thorough understanding, I’ve been analyzing the function of each individual component, which has slowed progress. I’ve constructed the block by referencing Anthropic and OpenAI workflow implementations. However, I haven’t tested anything it yet. What should be the next steps toward testing and integration? PS: I haven’t worked on a codebase of this size before, so I’ve learned a lot in the process. 😅 |
cool - submit the pr even if not 100% ready, we will figure out the way forward |
@PawelPeczek-Roboflow Nice !, I will make a PR 😁 |
Llama 3.2 Vision in Workflows
Are you ready to make a difference this Hacktoberfest? We’re excited to invite you to contribute by integrating LLama 3.2 Vision into our Workflows ecosystem! This new block for image generation will be a fantastic addition, broadening the horizons of what our platform can achieve.
Join us in enhancing our capabilities and empowering users to harness the power of vision technology. Whether you're a seasoned developer or just starting your journey in open source, your contributions will play a vital role in shaping the future of our ecosystem. Let’s collaborate and bring this innovative functionality to life!
Task description
requests
library - we've found that OpenRouter provides REST API access (see this) - but if you find a better option - feel free to discussCheatsheet
The text was updated successfully, but these errors were encountered: