Repeated timeouts in GitHub Actions fetching wheel for large packages #1912

Closed
adamtheturtle opened this issue Feb 23, 2024 · 18 comments
Labels
bug Something isn't working

Comments

@adamtheturtle
Contributor

In the last few days, since switching to uv, I have seen errors that I never saw with pip.

I see:

error: Failed to download distributions
  Caused by: Failed to fetch wheel: torch==2.2.1
  Caused by: Failed to extract source distribution
  Caused by: request or response body error: operation timed out
  Caused by: operation timed out
Error: Process completed with exit code 2.

I see this on the CI for vws-python-mock, which requires installing 150 packages:

uv pip install --upgrade --editable .[dev]
...
Resolved 150 packages in 1.65s
Downloaded 141 packages in 21.41s
Installed 150 packages in 283ms

I do this in parallel across many jobs on GitHub Actions, mostly on ubuntu-latest.

This happened with torch 2.2.0 before the recent release of torch 2.2.1.
It has not happened with any other dependencies.
The wheels for torch are pretty huge: https://pypi.org/project/torch/#files.

uv is always at the latest version, as I install it with curl -LsSf https://astral.sh/uv/install.sh | sh. In the most recent example, this was uv 0.1.9.

Failures:

@adamtheturtle
Contributor Author

Perhaps I just need to use UV_HTTP_TIMEOUT, and I will (sketched below), but I thought this might be worth pointing out:

  • If so, the error message could helpfully point to UV_HTTP_TIMEOUT.
  • Perhaps the default is too small, given that a popular package times out on GitHub Actions.
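
For reference, a minimal sketch of the workaround, assuming the value is read as seconds and that 600 gives enough headroom for the torch wheel on a GitHub Actions runner (the exact number is a guess to tune per environment):

# Raise uv's HTTP timeout before installing (seconds; 600 is an arbitrary bump).
export UV_HTTP_TIMEOUT=600
uv pip install --upgrade --editable .[dev]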

@zanieb
Member

zanieb commented Feb 23, 2024

Thanks for the feedback; I've opened issues for your requests.

@adamtheturtle
Contributor Author

Thank you @zanieb! I don't know whether there's much value in keeping this issue open, but I'll leave it to you to close if desired.

@zanieb
Member

zanieb commented Feb 23, 2024

In #1921 my co-worker noted that this might be a bug in the way we're specifying the timeout, so I'll recategorize this one and leave it open.

@zanieb added the bug label and removed the question label Feb 23, 2024
@konstin self-assigned this Feb 28, 2024
@konstin
Member

konstin commented Feb 28, 2024

Looking at the Actions runs, all the passing runs take ~30s, while the failing ones error after 5 min, which is our default timeout, so this looks like a network failure (in either GitHub Actions or on the Rust side).

@konstin
Member

konstin commented Mar 1, 2024

I'm not seeing any timeouts anymore with the two most recent versions (https://github.com/konstin/vws-python-mock/actions). Could you check whether this is now solved?

@adamtheturtle
Contributor Author

I have not seen this issue since posting. Thank you for looking into this.

@konstin
Member

konstin commented Mar 1, 2024

I'll close it for now; please feel free to reopen should it recur.

@konstin closed this as completed Mar 1, 2024
@adamtheturtle
Contributor Author

@konstin I do not have permissions to re-open this issue. I can create a new one, but it is probably easier if you re-open this.

This failure has reoccurred:

@konstin reopened this Mar 4, 2024
@konstin removed their assignment Mar 4, 2024
@hmc-cs-mdrissi

hmc-cs-mdrissi commented Mar 5, 2024

I'm seeing a very similar error message for a non-PyTorch package that's also pretty large. It's a ~400 MB wheel and it consistently gives me:

(bento_uv2) pa-loaner@C02DVAQNMD6R training-platform % uv pip install --index-url=$REGISTRY_INDEX data-mesh-cli==0.0.66
error: Failed to download: data-mesh-cli==0.0.66
  Caused by: The wheel data_mesh_cli-0.0.66-py3-none-any.whl is not a valid zip file
  Caused by: an upstream reader returned an error: request or response body error: operation timed out
  Caused by: request or response body error: operation timed out
  Caused by: operation timed out

The package is a company-internal one, but I think the only notable thing about it is its very large size (it vendors Spark/Java stuff).

edit: PyTorch, weirdly, installs fine for me, and pretty fast.

@adamtheturtle changed the title from "Repeated timeouts in GitHub Actions fetching wheel for torch" to "Repeated timeouts in GitHub Actions fetching wheel for large packages" Mar 13, 2024
@adamtheturtle
Contributor Author

I have changed the title of this issue so that it no longer references torch. It recently happened with nvidia-cudnn-cu12, another large download.

As another example, https://github.com/VWS-Python/vws-python-mock/actions/runs/8262236134 has 7 failures in one run.

@astrojuanlu

It can happen on Read the Docs as well, not only GHA: https://beta.readthedocs.org/projects/kedro-datasets/builds/23790543/

@astrojuanlu

Spotted it locally today inside a Docker image running under QEMU:

error: Failed to download distributions
  Caused by: Failed to fetch wheel: nvidia-cublas-cu12==12.1.3.1
  Caused by: Failed to extract archive
  Caused by: Failed to download distribution due to network timeout. Try increasing UV_HTTP_TIMEOUT (current value: 300s).

eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Apr 2, 2024
Reverts c59f0ca (#13)

Too many CI test timeouts from installing torch/nvidia packages with uv:
astral-sh/uv#1912
eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Apr 3, 2024
Reverts c59f0ca (#13)

Too many CI test timeouts from installing torch/nvidia packages with uv:
astral-sh/uv#1912
charliermarsh added a commit that referenced this issue Apr 19, 2024
…3144)

## Summary

This leverages the new `read_timeout` property, which ensures that (like
pip) our timeout is not applied to the _entire_ request, but rather, to
each individual read operation.

Closes: #1921.

See: #1912.
@njzjz

njzjz commented Apr 19, 2024

I encountered the problem when I used either uv or pip to download large wheels (for pip, the issue is pypa/pip#4796 and pypa/pip#11153), so I think the root cause is the network. However, I am wondering if uv can be smarter to retry automatically, like something in pypa/pip#11180.
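
In the meantime, a sketch of a manual workaround, reusing the install command from the top of this issue (this is not a built-in uv feature, and the attempt count and delay are arbitrary; wheels that did finish downloading should be served from uv's cache on the retry):

# Re-run the install on failure, giving up after three attempts.
n=0
until uv pip install --upgrade --editable .[dev]; do
  n=$((n+1))
  if [ "$n" -ge 3 ]; then
    echo "uv install failed after $n attempts" >&2
    exit 1
  fi
  echo "uv install failed (attempt $n), retrying in 10s..." >&2
  sleep 10
done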

@astrojuanlu

Worth trying 0.1.35, which includes #3144

@zanieb
Member

zanieb commented Apr 21, 2024

It seems likely that this is resolved by #3144
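
For context, the change means the timeout now applies to each read rather than to the whole request. It's roughly the difference between these two curl invocations (an analogy only, with a placeholder URL; not how uv is implemented):

# A cap on the whole transfer: fails if the full download takes longer than 300 s,
# no matter how steadily it progresses.
curl --max-time 300 -fLO https://example.com/very-large-wheel.whl

# A stall-based cutoff, closer to a per-read timeout: fails only if throughput
# stays below 1 byte/s for a 300 s window, i.e. the connection has effectively stopped.
curl --speed-limit 1 --speed-time 300 -fLO https://example.com/very-large-wheel.whl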

@OneCyrus

I encountered the problem when I used either uv or pip to download large wheels (for pip, the issue is pypa/pip#4796 and pypa/pip#11153), so I think the root cause is the network. However, I am wondering if uv can be smarter to retry automatically, like something in pypa/pip#11180.

That would be a great feature. We have our dev environments behind TLS inspection, and some packages often run into a timeout due to slow inspection. We can reproduce this with a browser: the download gets stuck until it times out, but we can just click resume and the browser reconnects and downloads the remaining part. With uv we don't have a retry with resume, so it starts from scratch and gets stuck again.

+1 for retry with resume
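
To make the request concrete, here is roughly the behaviour being asked for, sketched with curl against a placeholder URL (curl can pick up from the bytes already on disk, which uv does not do today):

# Illustration only: retry a stalled download and resume from the existing partial file.
url=https://example.com/very-large-wheel.whl
for attempt in 1 2 3 4 5; do
  curl -fL -C - -o wheel.whl "$url" && break   # -C - resumes at the current size of wheel.whl
  echo "download interrupted (attempt $attempt), resuming..." >&2
  sleep 5
done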

@charliermarsh
Member

Going to close for now, but we can re-open if this comes up again now that the timeout semantics have changed.
