When running the shell command below, a device-related error is reported; quantization of the 72B model is expected to run on a single GPU card.
CUDA_VISIBLE_DEVICES=$device python3 -m auto_round --mllm \
    --model ${model_dir}/Qwen2-VL-72B-Instruct \
    --group_size 64 \
    --bits 2 \
    --iters 2000 \
    --nsample 1024 \
    --low_gpu_mem_usage \
    --seqlen 2048 \
    --model_dtype "float16" \
    --format 'auto_gptq,auto_round'
@n1ck-guo Following the LLM code path, disable auto device mapping when a single card is used.
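For reference, a minimal sketch of what "disable auto-mapping for a single card" could look like; `resolve_device_map` is a hypothetical helper for illustration, not auto-round's actual loading code:

```python
import torch

def resolve_device_map():
    # Hypothetical helper mirroring the LLM code path: when only one GPU
    # is visible (e.g. CUDA_VISIBLE_DEVICES=0), pin the whole model to it
    # instead of letting accelerate shard it with device_map="auto".
    if torch.cuda.device_count() <= 1:
        return {"": "cuda:0"}  # place every module on the single visible card
    return "auto"              # multi-card runs keep automatic sharding

# e.g. from_pretrained(model_dir, torch_dtype=torch.float16,
#                      device_map=resolve_device_map())
```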
Besides, please add a GPU unit test for 70B-scale models.
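A rough sketch of what such a test could look like, assuming pytest and a `MODEL_DIR` environment variable pointing at the local checkpoint; the reduced `--iters`/`--nsample` values are stand-ins to keep the run tractable:

```python
import os
import shlex
import subprocess

import pytest
import torch

@pytest.mark.skipif(torch.cuda.device_count() < 1, reason="needs a GPU")
def test_qwen2_vl_72b_single_card():
    # Regression test for this issue: the mllm entry point should finish on
    # a single visible card without raising a device-mapping error.
    model_dir = os.environ.get("MODEL_DIR", "/models")
    cmd = (
        f"python3 -m auto_round --mllm --model {model_dir}/Qwen2-VL-72B-Instruct "
        "--group_size 64 --bits 2 --iters 2 --nsample 2 --low_gpu_mem_usage "
        "--seqlen 2048 --model_dtype float16 --format auto_round"
    )
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
    result = subprocess.run(shlex.split(cmd), env=env, capture_output=True, text=True)
    assert result.returncode == 0, result.stderr[-2000:]
```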
#395