[Question]: When searching for the best sparse attention type, why is the score calculated using only the columns from 2500 onward? #92

unicorneeee · 2024-12-11T10:54:08Z

Describe the issue

I hope this message finds you well!
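def stream_llm(vertical_size, slash_size):
    # q and attn_weights are taken from the enclosing scope
    q_len = q.shape[2]

    # Causal band: each query keeps itself and the slash_size keys before it
    mask = torch.triu(torch.tril(torch.ones(q_len, q_len), 0), -slash_size).to(q)
    # Always keep the first vertical_size key columns
    mask[:, :vertical_size] = 1
    mask = mask.unsqueeze(0).unsqueeze(1)

    est_attn = torch.tril(mask)
    # Attention mass retained by the pattern, skipping the first 2500 positions along dim 2
    attn_weights_x = attn_weights * est_attn
    res3 = attn_weights_x[:, :, 2500:].sum(-1).mean(-1).squeeze().float().detach().cpu().numpy()
    return res3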

Why does this only pick the columns from 2500 onward? If the head is A-shape, could this lead to it being misclassified as the V-S type?

iofu728 · 2024-12-12T18:44:46Z

Hi @unicorneeee, thanks for your interest in MInference.

The value 2500 here is used to exclude the dense upper triangular part of the lower triangle, which doesn’t affect any results.
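
One possible reading of this reply, sketched on a toy example rather than taken from the MInference code: under a causal mask, the earliest query positions have so few visible keys that the estimated pattern covers all of them, so those positions score 1.0 for any head and only inflate the average. The sketch assumes `attn_weights` has shape `(batch, heads, q_len, k_len)`, in which case the slice `[:, :, 2500:]` indexes query positions (rows) rather than key columns. The helper names `streaming_mask` and `captured_mass`, the toy sizes, and `skip_rows = 6` (standing in for 2500) are all illustrative assumptions.

```python
import torch


def streaming_mask(q_len, vertical_size, slash_size):
    # Same construction as in the snippet above: a causal band of recent keys
    # plus the first vertical_size key columns, restricted to the lower triangle.
    band = torch.triu(torch.tril(torch.ones(q_len, q_len), 0), -slash_size)
    band[:, :vertical_size] = 1
    return torch.tril(band)


def captured_mass(attn_weights, mask, skip_rows):
    # Attention mass retained by the mask, averaged over query rows >= skip_rows.
    kept = attn_weights * mask  # (q, k) mask broadcasts over (b, h, q, k)
    return kept[:, :, skip_rows:].sum(-1).mean(-1)


torch.manual_seed(0)
q_len, vertical_size, slash_size, skip_rows = 16, 2, 3, 6  # toy sizes (assumed)

# Random causal attention for 1 batch and 2 heads.
scores = torch.randn(1, 2, q_len, q_len)
causal = torch.tril(torch.ones(q_len, q_len)).bool()
attn_weights = torch.softmax(scores.masked_fill(~causal, float("-inf")), dim=-1)

mask = streaming_mask(q_len, vertical_size, slash_size)
per_row = (attn_weights * mask).sum(-1)  # retained mass per query row

print(per_row[0, 0])                                   # early rows are fully covered
print(captured_mass(attn_weights, mask, 0))            # average over all rows
print(captured_mass(attn_weights, mask, skip_rows))    # average without early rows
```

In this toy setup, rows 0 to 5 are fully covered by the mask whatever the attention distribution looks like, so including them raises the score without saying anything about the head. Whether 2500 is the best cutoff for real sequence lengths is not something this sketch can answer.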

unicorneeee · 2024-12-13T08:06:59Z

Thanks for your answer! I wonder why 2500 was picked. I plotted the attn_weights for the first 30000 tokens, and I noticed that there is no dense part within the first 2500; the attn_weights are distributed very discretely. Is there any experiment showing that 2500 is the best number?

unicorneeee added the "question" (Further information is requested) label on Dec 11, 2024

iofu728 self-assigned this on Dec 12, 2024
