CI: use a Dev Drive to improve Windows I/O performance #13123

Closed
wants to merge 1 commit into from

ichard26
Member

I was reading actions/runner-images#8755 when I saw that the uv project has switched to using a Dev Drive for better Windows CI performance: astral-sh/uv#3522

It turns out that there is a community action (https://github.com/samypr100/setup-dev-drive) that can set up a Dev Drive for us, but the PowerShell script needed for our simple needs is really not that bad. I'd prefer keeping the implementation local instead of depending on yet another third-party action.

@ichard26 ichard26 added skip news Does not need a NEWS file entry (eg: trivial changes) C: automation Automated checks, CI etc labels Dec 22, 2024
@ichard26 ichard26 force-pushed the devdrive branch 3 times, most recently from 3326d1c to be02015 Compare December 22, 2024 01:39
@notatallshaw
Member

notatallshaw commented Dec 22, 2024

Related issue (but not identical): #12055

@ichard26 ichard26 force-pushed the devdrive branch 10 times, most recently from cad5bd9 to 6ee4068 Compare December 22, 2024 02:41
Dev Drives are a modern Windows feature based on the ReFS file system
which offers significantly better performance for developer-focused
workloads. This is perfect for pip's Windows CI which is still slower
than the Unix jobs.

Most of the implementation was borrowed from the uv project which also
uses a Dev Drive to improve their Windows CI times. There is a community
action (samypr100/setup-dev-drive) that can set up a Dev Drive for us,
but the PowerShell script needed for our simple needs is really not that
bad. The small maintenance burden of doing it ourselves is preferable
to the risks of using yet another third-party action.

NB: We used to use a RAM disk to improve I/O performance, but creating
the RAM disk started to fail intermittently, annoying everyone, and the
constant retries wiped out any speedup gained.

See also: https://learn.microsoft.com/en-us/windows/dev-drive/
@ichard26
Member Author

Sigh, it doesn't seem like the Dev Drive is giving us any sort of speedup. The durations of the four Windows jobs are on par with the runtimes we've gotten lately 🙁

@zooba am I missing anything obvious? I thought pip's test suite had massive gains on a ReFS disk / Dev Drive in your testing. Maybe I should use the 3rd party action as that should be a known good solution, and then work backwards to fix my custom setup.

@ichard26
Member Author

> Maybe I should use the 3rd party action as that should be a known good solution, and then work backwards to fix my custom setup.

So I tried that and it didn't do anything to improve the times: https://github.com/ichard26/pip/actions/runs/12450861807

I think this idea is dead.

@ichard26
Member Author

I also tried using the Windows 2025 image, as it supports Dev Drives (and not just ReFS), but that also didn't help. This is frustrating, but it's time to move on.

@ichard26 ichard26 closed this Dec 22, 2024
@zooba
Contributor

zooba commented Dec 24, 2024

It may have been hurt by putting it on the C:\ drive instead of in your working directory. I'm not familiar with the internals of GitHub Actions' VMs, but I know they're basically the same as Azure Pipelines, and over there they use a slow C: drive for the OS (read-optimized) and a fast D: drive for working space. Many builds have been drastically improved just by moving from C: to D:.

A mounted VHDX shouldn't suffer that badly, but it's possible that there's something slower about the virtual disks than the physical disks we usually test on.

Having a quick glance at your tests, I don't see any evidence that they're actually using the new drive. Do you have any way to confirm that's the case? Any possibility that a different setting is taking precedence and keeping the tests on the slow drive?
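
One way to check this is to print the drive of the temp directory that the test process actually sees. The following is a minimal Python sketch, not something from this PR: the helper names `drive_of` and `is_on_drive` are hypothetical, and it assumes the tests create their scratch files under the directory returned by `tempfile.gettempdir()`. It uses `ntpath` so that Windows-style paths are parsed the same way on any platform.

```python
import ntpath
import tempfile


def drive_of(path: str) -> str:
    """Return the upper-cased drive of a Windows-style path (e.g. 'D:'), or '' if none."""
    drive, _ = ntpath.splitdrive(path)
    return drive.upper()


def is_on_drive(path: str, expected_drive: str) -> bool:
    """True if `path` lives on `expected_drive`, compared case-insensitively."""
    return drive_of(path) == expected_drive.upper()


if __name__ == "__main__":
    # Report where temporary files would land for this process.
    tmp = tempfile.gettempdir()
    print(f"temp dir: {tmp} (drive: {drive_of(tmp) or 'n/a'})")
```

Running this as an early CI step, or calling `is_on_drive` from a session-level test fixture, would show whether another setting is keeping the tests on the slow drive.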

@ichard26
Member Author

Thanks @zooba for the additional information! I was able to improve Dev Drive's performance uplift with your help... but it turns out that simply moving TEMP to the D: drive nets an even bigger improvement, heh. #13129
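
For illustration, moving the temp directory in a GitHub Actions job comes down to exporting `TEMP` for later steps by appending to the file named by the `GITHUB_ENV` environment variable. The sketch below is an assumption about how that could be done from Python, not the change made in #13129: `redirect_temp` is a hypothetical helper, `D:\Temp` is a hypothetical location, and setting `TMP` alongside `TEMP` is an extra precaution (Python's `tempfile` consults `TMPDIR`, `TEMP`, and `TMP`).

```python
import os


def redirect_temp(temp_dir: str, github_env_file: str) -> list[str]:
    """Append TEMP/TMP assignments to the file that GITHUB_ENV points at.

    Later workflow steps then see the new values in their environment.
    """
    lines = [f"TEMP={temp_dir}", f"TMP={temp_dir}"]
    with open(github_env_file, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return lines


if __name__ == "__main__":
    env_file = os.environ.get("GITHUB_ENV")
    # Only act on a Windows runner that exposes GITHUB_ENV.
    if env_file and os.name == "nt":
        target = r"D:\Temp"  # hypothetical directory on the fast drive
        os.makedirs(target, exist_ok=True)
        redirect_temp(target, env_file)
```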

@zooba
Contributor

zooba commented Dec 27, 2024

Yeah, as I said, the move to a Dev Drive gives you:

  • not on the OS drive (biggest benefit)
  • bypassed/optimised malware scans (big benefit)
  • optimised ReFS (some benefit)

Except on GHA, there are no malware scans anyway, so you don't get the second benefit. The third may well be outweighed by the time taken to set it up, though we also drastically improved things with better stat() and copy() implementations (for both NTFS and ReFS), so pip may not be seeing such a big jump after those improvements.

Generally I think we've seen a consistent 5-10% penalty for being on a virtual drive (nested, in this case), and the ReFS benefit is highly workload dependent. So it's worth trying, but may not be of great benefit. One day it should just be the default anyway - maybe that's today?
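
Since the benefit is workload dependent, a rough way to try it is to time the same small-file workload on each candidate drive and compare. The following is a minimal Python sketch under stated assumptions: `time_small_file_churn` and `relative_change` are hypothetical helpers, and the create/stat/read/delete loop is only a stand-in for a real workload such as pip's test suite, so the numbers it produces are indicative at best.

```python
import os
import shutil
import tempfile
import time


def time_small_file_churn(parent_dir: str, count: int = 200) -> float:
    """Create, stat, and read `count` small files under `parent_dir`.

    Returns the elapsed wall-clock seconds, including cleanup.
    """
    workdir = tempfile.mkdtemp(prefix="io-bench-", dir=parent_dir)
    start = time.perf_counter()
    try:
        for i in range(count):
            path = os.path.join(workdir, f"file-{i}.txt")
            with open(path, "w", encoding="utf-8") as f:
                f.write("x" * 1024)
            os.stat(path)
            with open(path, encoding="utf-8") as f:
                f.read()
    finally:
        # Always remove the scratch directory, even if the loop fails.
        shutil.rmtree(workdir, ignore_errors=True)
    return time.perf_counter() - start


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of `candidate` versus `baseline`; negative means faster."""
    return (candidate - baseline) / baseline
```

For example, `relative_change(time_small_file_churn("C:\\"), time_small_file_churn("D:\\"))` would give a first impression of the gap between two drives on a runner, with the drive letters being placeholders for whichever locations are under comparison. Repeating the measurement several times helps separate a real difference from noise.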

@ichard26
Member Author

> So it's worth trying, but may not be of great benefit.

Seems like that was the case :)
