
[Question]: different eval results compared to the results in paper #99

Open
unicorneeee opened this issue Dec 19, 2024 · 9 comments

@unicorneeee

Describe the issue

Hello, I am using the following environment:
transformers 4.47.0
torch 2.3.0
triton 2.1.0
flash_attn 2.5.8
MInference from the support_tp branch
However, when I set attn_type="hf" to evaluate InfiniteBench, the results differ from those reported in the paper.
Self-evaluated results (using Llama-3-8B-Instruct-262k):
Llama-3-8B-Instruct-262k_hf,code_debug,24.62
Llama-3-8B-Instruct-262k_hf,math_find,18.00
Llama-3-8B-Instruct-262k_hf,longdialogue_qa_eng,0.50
Could you please share the environment requirements you used when testing InfiniteBench? Thank you!

unicorneeee added the question (Further information is requested) label on Dec 19, 2024
iofu728 self-assigned this on Dec 20, 2024
@iofu728 (Contributor) commented Dec 20, 2024

Hi @unicorneeee, thanks for your question.

Could you share your scripts with us?
In our experiments, we used "minference_with_dense" to avoid OOM issues in long-context scenarios. You can give that a try as well.

Let me know if it helps!
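
For reference, here is a minimal sketch of how this is typically wired up on the Python side, assuming the MInference(attn_type, model_name) entry point from the project README (the exact signature may differ on the support_tp branch):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick up the dtype recorded in config.json
    device_map="auto",
)

# "minference_with_dense" keeps attention dense but routes it through the
# FlashAttention-based kernel, which is what avoids OOM at ~160k-token contexts.
minference_patch = MInference("minference_with_dense", model_name)
model = minference_patch(model)

This is roughly what run_infinitebench.py sets up when --attn_type minference_with_dense is passed.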

@unicorneeee (Author)

I use run_infinitebench.sh to run InfiniteBench. The command is as follows, and I use run_infinitebench.py directly, without any changes:
CUDA_VISIBLE_DEVICES=4,5,6,7 python "$SCRIPT_DIR/run_infinitebench.py" \
    --task $task \
    --model_name_or_path gradientai/Llama-3-8B-Instruct-262k \
    --data_dir InfiniteBench/data/infiniteBench \
    --output_dir ./result_hf \
    --max_seq_length 160000 \
    --rewrite \
    --num_eval_examples -1 --topk 1 --starting_layer 0 --attn_type hf
Are any of the parameters different from yours?

@iofu728 (Contributor) commented Dec 20, 2024

Hi @unicorneeee, we use attn_type=minference_with_dense to avoid OOM issues. You can try it again and let us know if it works.

@unicorneeee (Author)

Thanks for your answer! I will try it again!

@unicorneeee (Author)

Hello!
When I evaluate Llama-3-8B-262k in float32, using minference_with_dense requires casting all of the Q, K, and V matrices to torch.bfloat16; otherwise it raises an error because flash_attn only supports bf16 or fp16. So in the forward code, just before the flash_attn call, I cast the matrices to bfloat16:
def dense(q, k, v, vertical_size=None, slash_size=None):
    # cast Q/K/V to bf16 because flash_attn only accepts fp16/bf16 inputs
    return flash_attn_func(
        q.transpose(1, 2).to(torch.bfloat16),
        k.transpose(1, 2).to(torch.bfloat16),
        v.transpose(1, 2).to(torch.bfloat16),
        0.0,
        softmax_scale=None,
        causal=q_len != 1,
    ).view(bsz, 1, q_len, self.head_dim)
Have you ever encountered this issue? Could it be the reason for the results differing from the paper?
Thank you!

@iofu728 (Contributor) commented Dec 23, 2024

Hi @unicorneeee,

I haven’t encountered this issue before; Llama-3 checkpoints are already bf16 by default: https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k/blob/main/config.json#L24

By the way, are you using flash_attn or our custom triton ops?
You might want to fix this issue and give it another try.
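
For anyone who hits the same dtype error: a quick check of what the checkpoint declares, plus loading in bf16 up front, avoids the cast inside dense(). A minimal sketch using standard transformers arguments (torch_dtype is a transformers option, nothing MInference-specific):

import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "gradientai/Llama-3-8B-Instruct-262k"  # or your local copy

config = AutoConfig.from_pretrained(model_path)
print(config.torch_dtype)  # the hub config records bfloat16; float32 here would explain the flash_attn error

# Loading with an explicit dtype keeps Q/K/V in bf16, so no cast is needed
# right before the flash_attn call.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)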

@unicorneeee (Author)

Thanks for your answer! The model evaluated in the paper is Llama-3-8B-262k, so why does this link point to the 1048k model? Also, I used the link in the paper to download the 262k model and found that it is fp32, but that link is unavailable now.

@iofu728 (Contributor) commented Dec 23, 2024

Hi @unicorneeee, apologies for the confusion—I just shared a random model as an example. You can refer to this configuration instead: https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k/blob/main/config.json#L24.

@unicorneeee (Author)

Hello! I have used the model downloaded from that link, and the results still differ from those in the paper (especially on the code_debug and longdialogue_qa_eng tasks):
llama3-8B-262K_minference_with_dense,code_debug,24.62
llama3-8B-262K_minference_with_dense,math_find,18.00
llama3-8B-262K_minference_with_dense,longdialogue_qa_eng,0.50
My script is as follows:
TASKS=("code_debug" "math_find" "longdialogue_qa_eng")

export TOKENIZERS_PARALLELISM=false
SCRIPT_DIR=$(dirname "$0")
for task in "${TASKS[@]}"; do
    echo $task
    CUDA_VISIBLE_DEVICES=4,5,6,7 python "$SCRIPT_DIR/run_infinitebench.py" \
        --task $task \
        --model_name_or_path models/llama3-8B-262K \
        --data_dir data \
        --output_dir ./result_hf \
        --max_seq_length 160000 \
        --rewrite \
        --num_eval_examples -1 --topk 1 --starting_layer 0 --attn_type minference_with_dense
done
