Thanks for your answer! I wonder why 2500 was chosen. I plotted the attn_weights for the first 30000 tokens and noticed that the dense part does not show up within the first 2500; the attn_weights there are very sparsely distributed. Is there any experiment showing that 2500 is the best value?
Describe the issue
I hope this message finds you well!
def stream_llm(vertical_size, slash_size):
    q_len = q.shape[2]
Why does this only pick the 2500: columns? If the head is A-shaped, could this lead to it being misclassified as the V-S (vertical-slash) type?
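To make the concern concrete, here is a minimal toy sketch in plain Python. It is not the repository's code: the helper names (a_shape_mask, local_mask, toy_a_shape_attention, recall), the tiny sizes, and the offset of 16 standing in for 2500 are all assumptions for illustration. It assumes the pattern score is the fraction of attention mass a mask keeps, and that the score is measured only on columns from an offset onward.

```python
def a_shape_mask(n, sink, window):
    """Causal mask keeping the first `sink` columns plus a local window."""
    return [[1 if j <= i and (j < sink or i - j < window) else 0
             for j in range(n)] for i in range(n)]

def local_mask(n, window):
    """Causal mask keeping only a local (slash-like) window, no sink columns."""
    return [[1 if j <= i and i - j < window else 0
             for j in range(n)] for i in range(n)]

def toy_a_shape_attention(n, sink, window):
    """Hypothetical A-shaped head: each row is uniform over the A-shape support."""
    mask = a_shape_mask(n, sink, window)
    return [[m / sum(row) for m in row] for row in mask]

def recall(attn, mask, col_start=0):
    """Fraction of the attention mass in columns >= col_start kept by `mask`."""
    kept = total = 0.0
    for a_row, m_row in zip(attn, mask):
        for a, m in zip(a_row[col_start:], m_row[col_start:]):
            total += a
            kept += a * m
    return kept / total

# Toy sizes (assumptions): 64 tokens, 4 sink tokens, window of 8, offset 16.
n, sink, window, offset = 64, 4, 8, 16
attn = toy_a_shape_attention(n, sink, window)
a_mask = a_shape_mask(n, sink, window)
l_mask = local_mask(n, window)

print("all columns :", recall(attn, a_mask), recall(attn, l_mask))
print("offset slice:", recall(attn, a_mask, offset), recall(attn, l_mask, offset))
```

In this toy setting, scoring over all columns separates the two masks, because the local-window mask misses the mass on the initial tokens. Scoring only on columns from the offset onward hides the initial-token columns, so the A-shape mask and the local-window mask get the same score. Whether the actual search code behaves this way depends on how the 2500 slice is applied there.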