Issues: microsoft/MInference
#103 [Question]: The evaluation code of SCBench does not match the provided dataset. (question) - opened Dec 26, 2024 by rainstorm12
#99 [Question]: Different eval results compared to the results in the paper (question) - opened Dec 19, 2024 by unicorneeee
#98 [Question]: What are the definitions of the different stages? (question) - opened Dec 19, 2024 by crazyofapple
#95 [Question]: How to apply MInference on multiple A100 GPUs? (question) - opened Dec 13, 2024 by XiongxiaoL
#94 [Question]: How to understand dense_decoding? (question) - opened Dec 13, 2024 by lemyx
#92 [Question]: When searching for the best sparse attention type, why does the score calculation pick only 2500 columns? (question) - opened Dec 11, 2024 by unicorneeee
#91 [Question]: Code-related question: Is the search performed only on the first batch of the dataset? (question) - opened Dec 9, 2024 by unicorneeee
#90 [Question]: vllm-tp generation can't stop (question) - opened Dec 4, 2024 by unicorneeee
#88 [Question]: RuntimeError encountered when trying to reproduce Needle In A Haystack results (question) - opened Nov 26, 2024 by lepangdan
#87 [Question]: How can I reproduce the FullAttention results on the RULER dataset? (question) - opened Nov 25, 2024 by LfieLike
#85 [Feature Request]: Is it possible to get the returned logsumexp from the streamingllm forward? (feature request) - opened Nov 17, 2024 by 311dada
#84 [Question]: Discrepancy in pre-filling time and memory consumption on a single A100 (question) - opened Nov 15, 2024 by lepangdan
#83 [Question]: Am I using MInference correctly? (question) - opened Oct 30, 2024 by YLGH
#82 [Question]: Analysis of attention scores (too sparse) (question) - opened Oct 19, 2024 by wiluen
#78 [Question]: Sparsity of MInference (question) - opened Sep 23, 2024 by susu1210
#77 [Bug]: Torch not found: can't install with pip install (Python 3.12, CUDA 12.6 Update 1, PyTorch 2.4.1) (bug) - opened Sep 20, 2024 by atemerev
#76 [Question]: Could you provide more examples of other attention usages, e.g., dilated1, streaming, snapkv? (question) - opened Sep 18, 2024 by gaow0007
#75 [Bug]: loc("Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands (bug) - opened Sep 18, 2024 by leoyuppieqnew
#74 [Feature Request]: Support the LLaVA model / low generation speed (feature request) - opened Sep 18, 2024 by ThisisBillhe
#73 [Question]: What is the attention kernel speedup of the current implementation? (question) - opened Sep 10, 2024 by foreverpiano
#71 Performance degradation when using MInference with the Qwen2-7B-Instruct model (question) - opened Aug 26, 2024 by yumingfan-0219
#67 [Bug]: vllm executor.driver_worker: 'RayWorkerWrapper' object has no attribute 'model_runner' (bug) - opened Aug 8, 2024 by TPLink32
#64 [Question]: Confusion about the optimal search pattern configuration (question) - opened Aug 6, 2024 by Dianaia
#62 [Question]: It seems that MInference does not currently support tensor parallelism under vLLM: in a multi-GPU environment, the head_id here is incorrect compared to a single GPU (feature request, question) - opened Aug 4, 2024 by zh2333
#57 [Question]: Why is every head config saved with "vertical_and_slash"? (question) - opened Jul 29, 2024 by fmmoret