Issues: microsoft/MInference
#103 [Question]: The evaluation code of SCBench does not match the provided dataset. (question) - opened Dec 26, 2024 by rainstorm12
#99 [Question]: Different eval results compared to the results in the paper (question) - opened Dec 19, 2024 by unicorneeee
#98 [Question]: What are the definitions of the different stages? (question) - opened Dec 19, 2024 by crazyofapple
#95 [Question]: How to apply MInference on multiple A100 GPUs? (question) - opened Dec 13, 2024 by XiongxiaoL
#94 [Question]: How to understand dense_decoding? (question) - opened Dec 13, 2024 by lemyx
#92 [Question]: When searching for the best sparse attention type, why does the score calculation pick only 2500 columns? (question) - opened Dec 11, 2024 by unicorneeee
#91 [Question]: Code-related question: Is the search performed only on the first batch of the dataset? (question) - opened Dec 9, 2024 by unicorneeee
#90 [Question]: vllm-tp generation can't stop (question) - opened Dec 4, 2024 by unicorneeee
#88 [Question]: RuntimeError encountered when trying to reproduce Needle In A Haystack results (question) - opened Nov 26, 2024 by lepangdan
#87 [Question]: How can I reproduce the FullAttention results on the RULER dataset? (question) - opened Nov 25, 2024 by LfieLike
#85 [Feature Request]: Is it possible to get the returned logsumexp from the streamingllm forward? (feature request) - opened Nov 17, 2024 by 311dada
#84 [Question]: Discrepancy in pre-filling time and memory consumption on a single A100 (question) - opened Nov 15, 2024 by lepangdan
#83 [Question]: Am I using MInference correctly? (question) - opened Oct 30, 2024 by YLGH
#82 [Question]: Analysis of attention scores (too sparse) (question) - opened Oct 19, 2024 by wiluen
#78 [Question]: Sparsity of MInference (question) - opened Sep 23, 2024 by susu1210
#77 [Bug]: Torch not found: can't install with pip install (Python 3.12, CUDA 12.6 Update 1, PyTorch 2.4.1) (bug) - opened Sep 20, 2024 by atemerev
#76 [Question]: Could you provide more examples of other attention usages, e.g., dilated1, streaming, snapkv? (question) - opened Sep 18, 2024 by gaow0007
#75 [Bug]: loc("Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands (bug) - opened Sep 18, 2024 by leoyuppieqnew
#74 [Feature Request]: Support the LLaVA model / low generation speed (feature request) - opened Sep 18, 2024 by ThisisBillhe
#73 [Question]: What is the attention kernel speedup of the current implementation? (question) - opened Sep 10, 2024 by foreverpiano
#71 Performance degradation when using MInference with the Qwen2-7B-Instruct model (question) - opened Aug 26, 2024 by yumingfan-0219
#67 [Bug]: vllm executor.driver_worker: 'RayWorkerWrapper' object has no attribute 'model_runner' (bug) - opened Aug 8, 2024 by TPLink32
#64 [Question]: Confusion about the optimal search pattern configuration (question) - opened Aug 6, 2024 by Dianaia
#62 [Question]: It seems that MInference does not currently support tensor parallelism under vLLM: in a multi-GPU environment, the head_id here is incorrect compared to a single GPU (feature request, question) - opened Aug 4, 2024 by zh2333
#57 [Question]: Why is every head config saved with "vertical_and_slash"? (question) - opened Jul 29, 2024 by fmmoret