[Question]: When searching for the best sparse attention type, why is the score calculated using only the 2500: columns? #92

Open
unicorneeee opened this issue Dec 11, 2024 · 2 comments
Assignees
Labels
question Further information is requested

Comments

@unicorneeee

Describe the issue

I hope this message finds you well!
def stream_llm(vertical_size, slash_size):
    q_len = q.shape[2]

    mask = torch.triu(torch.tril(torch.ones(q_len, q_len), 0), -slash_size).to(q)
    mask[:,:vertical_size] = 1
    mask = mask.unsqueeze(0).unsqueeze(1)

    est_attn = torch.tril(mask)
    attn_weights_x = attn_weights * est_attn
    res3 = attn_weights_x[:,:,2500:].sum(-1).mean(-1).squeeze().float().detach().cpu().numpy()
    return res3

Why does this only pick the 2500: columns? If the head is A-shape, could this cause it to be misclassified as the V-S type?
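To make the question concrete, here is a minimal pure-Python sketch of what the scoring line appears to compute for a single head. It assumes attn_weights has the layout (batch, heads, q_len, k_len), in which case the slice at index 2 selects query rows from 2500 onward rather than key columns. The helper names a_shape_mask and masked_score, the toy sizes, and the uniform attention pattern are hypothetical and are not taken from MInference.

```python
def a_shape_mask(q_len, vertical_size, slash_size):
    """Causal mask keeping the first `vertical_size` columns plus a local window."""
    return [
        [1.0 if col <= row and (col < vertical_size or col >= row - slash_size) else 0.0
         for col in range(q_len)]
        for row in range(q_len)
    ]


def masked_score(attn, mask, start_row):
    """Mean over query rows >= start_row of the attention mass the mask retains."""
    row_mass = [
        sum(a * m for a, m in zip(attn_row, mask_row))  # mirrors .sum(-1)
        for attn_row, mask_row in zip(attn[start_row:], mask[start_row:])  # mirrors [:, :, start_row:]
    ]
    return sum(row_mass) / len(row_mass)  # mirrors .mean(-1)


# Toy head with uniform causal attention: row i spreads 1/(i+1) over columns 0..i.
# All sizes here are illustrative assumptions.
q_len = 8
attn = [[1.0 / (row + 1) if col <= row else 0.0 for col in range(q_len)]
        for row in range(q_len)]
mask = a_shape_mask(q_len, vertical_size=1, slash_size=1)

print(masked_score(attn, mask, start_row=0))  # includes the early, fully covered rows
print(masked_score(attn, mask, start_row=4))  # skips them
```

In this toy case the score is higher when the early rows are included, because the mask covers those rows completely no matter how the head distributes its attention.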

@unicorneeee unicorneeee added the question Further information is requested label Dec 11, 2024
@iofu728 iofu728 self-assigned this Dec 12, 2024
@iofu728
Contributor

iofu728 commented Dec 12, 2024

Hi @unicorneeee, thanks for your interest in MInference.

The value 2500 here is used to exclude the dense upper triangular part of the lower triangle, which doesn’t affect any results.
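One possible reading of this reply, worked out for the mask in the snippet above (this derivation is an illustration and not something stated by the maintainers): write $v$ for vertical_size and $s$ for slash_size, and count how many causal positions the mask keeps in query row $i$ (0-indexed).

```latex
% Column j <= i is kept when j < v (sink columns) or j >= i - s (local window).
\[
\mathrm{kept}(i) =
\begin{cases}
i + 1, & i \le v + s,\\
v + s + 1, & i > v + s,
\end{cases}
\qquad
\mathrm{fraction}(i) = \frac{\mathrm{kept}(i)}{i + 1}.
\]
```

Rows with $i \le v + s$ are fully covered, so every head retains all of its attention mass there and those rows cannot distinguish one pattern from another. Skipping a prefix of rows removes that region from the average. The expression by itself does not explain why the cutoff is exactly 2500.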

@unicorneeee
Author

Thanks for your answer! I wonder why 2500 was picked. I plotted attn_weights for the first 30000 tokens and noticed that no dense part shows up in the first 2500; the attention weights are distributed very discretely. Is there any experiment showing that 2500 is the best value?
