Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a set for TargetPython.get_tags for performance #12204

Merged
merged 10 commits into from
Aug 6, 2023

Conversation

hauntsaninja
Copy link
Contributor

@hauntsaninja hauntsaninja commented Aug 4, 2023

This makes a representative pip-compile about 1.6x faster

# Before
λ hyperfine -w 1 -M 3 -p 'rm req.txt || true' 'python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking'
Benchmark 1: python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking
  Time (mean ± σ):     28.552 s ±  2.552 s    [User: 20.477 s, System: 1.547 s]
  Range (min … max):   26.446 s … 31.390 s    3 runs

# After
λ hyperfine -w 1 -M 3 -p 'rm req.txt || true' 'python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking'
Benchmark 1: python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking
  Time (mean ± σ):     17.570 s ±  0.450 s    [User: 11.537 s, System: 1.345 s]
  Range (min … max):   17.095 s … 17.989 s    3 runs

This makes the set.isdisjoint operation done over here much cheaper: https://github.com/pypa/pip/blob/593b85f4a/src/pip/_internal/models/wheel.py#L92

The reason for this is because a Python usually has a lot of supported tags. The Python I used above has 2000 supported tags! Whereas a wheel usually only has one or two file tags.

The CPython code will unfortunately iterate over all 2000 tags to check if there's a match. Only if the other collection is a set will CPython think to swap the operands to iterate over the shorter collection: https://github.com/python/cpython/blob/35963da40f/Objects/setobject.c#L1352

This speeds up a representative pip-compile by about 40%

```
λ hyperfine -w 1 -M 3 -p 'rm req.txt || true' 'python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking'
Benchmark 1: python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking
  Time (mean ± σ):     28.552 s ±  2.552 s    [User: 20.477 s, System: 1.547 s]
  Range (min … max):   26.446 s … 31.390 s    3 runs

λ hyperfine -w 1 -M 3 -p 'rm req.txt || true' 'python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking'
Benchmark 1: python -m piptools compile req.in --output-file req.txt --strip-extras --resolver=backtracking
  Time (mean ± σ):     17.570 s ±  0.450 s    [User: 11.537 s, System: 1.345 s]
  Range (min … max):   17.095 s … 17.989 s    3 runs
```

This makes the set.isdisjoint operation done over here much cheaper:
https://github.com/pypa/pip/blob/593b85f4a/src/pip/_internal/models/wheel.py#L92

The reason for this is because a Python usually has a lot of supported
tags. The Python I used above has 2000 supported tags! Whereas a wheel
usually only has one or two file tags.

The CPython code will unfortunately iterate over all 2000 tags to check
if there's a match. Only if the other collection is a set will CPython
think to swap the operands to iterate over the shorter collection:
https://github.com/python/cpython/blob/35963da40f/Objects/setobject.c#L1352
@hauntsaninja hauntsaninja changed the title Use a set for TargetPython.get_tags Use a set for TargetPython.get_tags for performance Aug 4, 2023
@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Aug 4, 2023

Oh, also note I tried narrowing the type on Wheel.supported to only take Set[Tag] instead of Iterable[Tag]. It's pretty doable, let me know if I should pursue this (could also be a follow-up PR).

@uranusjr
Copy link
Member

uranusjr commented Aug 4, 2023

I don’t see why we need to narrow it though. Iterable seems just fine to me.

@pradyunsg pradyunsg merged commit 901db9c into pypa:main Aug 6, 2023
@hauntsaninja hauntsaninja deleted the set-tags branch August 6, 2023 18:29
hauntsaninja added a commit to hauntsaninja/pip that referenced this pull request Aug 6, 2023
This would have caught an issue I had when writing
pypa#12204
cosmicexplorer pushed a commit to cosmicexplorer/pip that referenced this pull request Aug 9, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 22, 2023
@ichard26 ichard26 added the type: performance Commands take too long to run label Dec 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants