Cannot quantize Qwen2-VL-72B-Instruct on a single gpu card #389

Closed
WeiweiZhang1 opened this issue Dec 17, 2024 · 3 comments
Comments

@WeiweiZhang1 (Collaborator)

When running the shell command below, a device-related error is reported, even though quantization of the 72B model is expected to run on a single GPU card.

```shell
CUDA_VISIBLE_DEVICES=$device \
python3 -m auto_round --mllm \
    --model ${model_dir}/Qwen2-VL-72B-Instruct \
    --group_size 64 \
    --bits 2 \
    --iters 2000 \
    --nsample 1024 \
    --low_gpu_mem_usage \
    --seqlen 2048 \
    --model_dtype "float16" \
    --format 'auto_gptq,auto_round'
```

(screenshot: device-related error traceback)

@wenhuach21 (Contributor)

@n1ck-guo Following the LLM path, please disable auto device mapping when only a single card is specified.
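
A minimal sketch of what that fix could look like, assuming the multimodal loader currently passes `device_map="auto"` unconditionally; the function name and parameters here are illustrative, not auto-round's actual internals:

```python
# Hypothetical loader logic; names are illustrative, not auto-round's real API.
import torch
from transformers import Qwen2VLForConditionalGeneration

def load_mllm(model_name: str, multi_card: bool = False):
    # With a single visible card, avoid device_map="auto": accelerate's
    # auto-mapping may try to shard the 72B model across devices that are
    # not visible and raise device-placement errors.
    device_map = "auto" if multi_card else None
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map=device_map,
    )
    if device_map is None:
        # Leave the model on CPU and let the quantizer move blocks to the
        # target card one at a time (the low_gpu_mem_usage-style flow).
        model = model.eval()
    return model
```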

@wenhuach21 (Contributor)

Besides, please add a GPU unit test for 70B-scale models.
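
For reference, a rough pytest-style sketch of such a GPU test; the flags mirror the command in this issue, but the test name, marker, and tiny settings are assumptions for illustration, not the project's real test suite:

```python
# Illustrative GPU smoke test; flags mirror the command in this issue.
import subprocess
import sys

import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a GPU")
def test_quantize_70b_single_card(tmp_path):
    # Use a tiny iteration count so the run finishes quickly; the point is
    # to exercise single-card device placement, not to produce a usable
    # quantized model.
    cmd = [
        sys.executable, "-m", "auto_round", "--mllm",
        "--model", "Qwen/Qwen2-VL-72B-Instruct",
        "--bits", "2",
        "--group_size", "64",
        "--iters", "2",
        "--nsample", "2",
        "--seqlen", "32",
        "--low_gpu_mem_usage",
        "--output_dir", str(tmp_path),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    assert result.returncode == 0, result.stderr
```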

@wenhuach21 (Contributor)

#395
