Describe the issue

Hi,
Thanks again for your help. I encountered an error while reproducing the needle_in_a_haystack results by running bash experiments/needle_in_a_haystack/run_needle.sh, and would appreciate any insights:
[ 1000 72357 143714 215071 286429 357786 429143 500500 571857
643214 714571 785929 857286 928643 1000000]
[ 286429 357786 429143 500500 571857 643214 714571 785929 857286
928643 1000000]
# ... (long log output omitted)
File "/home/far/MInference/minference/modules/minference_forward.py", line 656, in forward
part_o = self.gather_last_q_vertical_slash_topk_v4(part_q, part_k, part_v, head)
File "/home/far/MInference/minference/modules/minference_forward.py", line 463, in gather_last_q_vertical_slash_topk_v4
return fc(q, k, v, vertical_size, slash_size)
File "/home/far/MInference/minference/modules/minference_forward.py", line 383, in vertical_and_slash_kernel
slash = sum_all_diagonal_matrix(qk)[...,:-last_q + 1]
File "/home/far/MInference/minference/modules/minference_forward.py", line 103, in sum_all_diagonal_matrix
zero_mat = torch.zeros((b, h, n, n)).to(mat.device) # Zero matrix used for padding
File "/home/far/MInference/minference/modules/minference_forward.py", line 103, in sum_all_diagonal_matrix
zero_mat = torch.zeros((b, h, n, n)).to(mat.device) # Zero matrix used for padding
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
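(Side note: since CUDA errors are reported asynchronously, the frame shown above may not be the one that actually faulted. A minimal debugging sketch, not MInference code, for forcing synchronous kernel launches so the trace lands on the real offending call, assuming a CUDA-capable machine:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch  # imported after the env var so every kernel launch is synchronous

x = torch.randn(4, 4, device="cuda")  # a small GPU workload stands in for the real run here
y = x @ x
torch.cuda.synchronize()              # surface any pending errors at a known point
print(y.sum().item())

Running the failing job with this setting, or exporting CUDA_LAUNCH_BLOCKING=1 in the shell before launching the script, usually moves the error to the kernel that actually faulted rather than a later call such as the torch.zeros above.)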
I noticed the error only occurs when starting from job 4 with the --kv_cache_cpu argument. Jobs in the range [0-4) work fine. Any suggestions on this?
Additionally, I found that the vllm module is required to run the needle_in_a_haystack experiment. In my opinion, vllm isn't necessary for MInference. Is there a specific reason for this, or is there something I might have missed?
Looking forward to your response!
It doesn't seem to be related to vLLM. It might be due to GPU memory not being fully reclaimed yet. Could you try running the Python command separately or upgrading Triton?
For the NIAH experiments, we used a single A100 GPU with 216 GB of CPU memory for inputs up to 800K tokens, while 900K and 1M tokens were tested on a single A100 GPU with 1 TB of CPU memory.
Could you try setting specific job ranges like “5-6” or “6-7”? Let me know if you encounter any issues!
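For reference, a hedged sketch of how a job range presumably maps onto the context-length grid printed in the log above. The exact flag name and slicing logic live in run_needle.sh and the needle script, so the names and the half-open range semantics here are assumptions reconstructed from the two arrays in the log:

import numpy as np

# Hypothetical reconstruction: 15 evenly spaced context lengths from 1K to 1M,
# rounded to integers, reproduces the first array in the log.
context_lengths = np.round(np.linspace(1000, 1_000_000, num=15)).astype(int)
print(context_lengths)

# Starting from job 4 keeps only the suffix of the grid, which matches the
# second array in the log (286429 ... 1000000). A range like "5-6" would
# presumably run a single context length in isolation.
start_job, end_job = 4, 15
print(context_lengths[start_job:end_job])

Running one job at a time this way keeps each length in its own process, so GPU memory from a previous length cannot carry over into the next run.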