Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazaar: Use lightweight checkouts rather than a full branch clone #11264

Merged
merged 1 commit into from
Oct 7, 2022

Conversation

jelmer
Copy link
Contributor

@jelmer jelmer commented Jul 15, 2022

Fixes #5444

For Bazaar itself, this changes the clone time from 2 minutes to 11 seconds for me.

@jelmer jelmer force-pushed the bazaar-lightweight-perf branch from 20d290e to a88ec53 Compare July 15, 2022 21:01
@sbidoul
Copy link
Member

sbidoul commented Jul 16, 2022

Hi,

I'm personally not familiar with bazaar at all anymore, but I'll note that when these kind of solutions where showing up for git, they were rejected because some use cases rely on having the history available in the cloned repo (e.g. for tools such as setuptools_scm).

Could this be an issue for bazaar users too ?

@sbidoul sbidoul added the C: vcs pip's interaction with version control systems like git, svn and bzr label Jul 16, 2022
@jelmer
Copy link
Contributor Author

jelmer commented Jul 16, 2022

Hi,

I'm personally not familiar with bazaar at all anymore, but I'll note that when these kind of solutions where showing up for git, they were rejected because some use cases rely on having the history available in the cloned repo (e.g. for tools such as setuptools_scm).

Could this be an issue for bazaar users too ?

That's not the case here - you can still access the full history in a lightweight checkout, but it might trigger network activity to retrieve the missing data from the main copy. Even if that is necessary, that's probably going to be faster for things like setuptools-scm since they just need the revision graph - not the contents.

@sbidoul
Copy link
Member

sbidoul commented Jul 16, 2022

Ah, that should be ok then, and similar to what we have done for git with --filter=blob:none. Thanks for the clarification.

@sbidoul
Copy link
Member

sbidoul commented Jul 20, 2022

And, are the difference between branch and checkout significant? And between update and pull?
Are there any backward compatibility concerns? What if (for instance) there is an existing editable directory that was obtained with a previous pip version with bzr branch. Will a bzr update work on it?

@jelmer
Copy link
Contributor Author

jelmer commented Jul 20, 2022

This is not backwards compatible, "bzr up" is a noop for "bzr branch" type clones. That's easy to address though, by adding a "bzr bind" command to convert from a branch to a checkout. I'll update this branch.

@jelmer jelmer force-pushed the bazaar-lightweight-perf branch from a88ec53 to 67d2fe6 Compare August 17, 2022 20:05
@jelmer
Copy link
Contributor Author

jelmer commented Aug 17, 2022

This is not backwards compatible, "bzr up" is a noop for "bzr branch" type clones. That's easy to address though, by adding a "bzr bind" command to convert from a branch to a checkout. I'll update this branch.

Now updated to call "bzr bind" to make it compatible with older branches (and cope with location changes).

@jelmer
Copy link
Contributor Author

jelmer commented Sep 3, 2022

Hi @sbidoul , would you perhaps be able to help review this branch after the last changes or suggest a course of action? Thanks!

@sbidoul
Copy link
Member

sbidoul commented Sep 4, 2022

would you perhaps be able to help review this branch after the last changes or suggest a course of action? Thanks!

Hi, I've had a look at your latest version. It's hard for me to tell as I know next to nothing about bzr. I'm still a bit worried about backward compatibility, though. As checkout and branch don't have the same effect, would it be possible to do something just after checkout so it becomes more or less equivalent to a bzr branch?

@jelmer
Copy link
Contributor Author

jelmer commented Sep 4, 2022

would you perhaps be able to help review this branch after the last changes or suggest a course of action? Thanks!

Hi, I've had a look at your latest version. It's hard for me to tell as I know next to nothing about bzr. I'm still a bit worried about backward compatibility, though. As checkout and branch don't have the same effect, would it be possible to do something just after checkout so it becomes more or less equivalent to a bzr branch?

It seems hard to find somebody in pypa who is fairly with bzr, so I appreciate you taking a look. :-) If it helps, I'm one of the upstream developers for Bazaar.

My changes are backwards compatible. If there is currently an "unbound" branch (i.e. something created by "bzr branch") present, then it will convert it to a bound branch (the same thing "bzr checkout" will create) by calling "bzr bind" first before running "bzr update".

% bzr bind --help
Purpose: Convert the current branch into a checkout of the supplied branch.
Usage:   brz bind [LOCATION]

...

@sbidoul
Copy link
Member

sbidoul commented Sep 4, 2022

My changes are backwards compatible now

I'm not sure. If you look at the obtain method in versioncontrol.py, it does fetch_new if the directory does not exist yet, and update if it does. With this PR fetch_new now does a bzr checkout where it did a bzr branch before, so the result of the first obtain will be in a different state than before.

That's why I'm asking if something can be done in fetch_new to convert the checkout into a branch.

Also, if I read the help of bzr bind I'm confused since it says Convert the current branch into a checkout of the supplied branch. I would think we rather need to convert the checkout into a branch. But again I don't use bzr and I'm probably missing something.

@jelmer
Copy link
Contributor Author

jelmer commented Sep 4, 2022

My changes are backwards compatible now

I'm not sure. If you look at the obtain method in versioncontrol.py, it does fetch_new if the directory does not exist yet, and update if it does. With this PR fetch_new now does a bzr checkout where it did a bzr branch before, so the result of the first obtain will be in a different state than before.

Yup, that's fine. You'll end up with a checkout after fetch_new now. And if there's already a branch rather than a checkout there (as created by fetch_new in older version of pip), then a subsequent call to update() will convert the standalone branch to a checkout because it's calling "bzr bind".

That's why I'm asking if something can be done in fetch_new to convert the checkout into a branch.

Also, if I read the help of bzr bind I'm confused since it says Convert the current branch into a checkout of the supplied branch. I would think we rather need to convert the checkout into a branch. But again I don't use bzr and I'm probably missing something.

We specifically want to end up with a "checkout" rather than a standalone branch (created by "bzr branch"), since a standalone branch carries full history and is thus slow to create. A checkout is a local thing that is meant to merely mirror a remote branch (tying it to that remote branch is what "bzr bind" does); it will fetch things from its "master branch" when it needs to and therefor it doesn't need to have a full copy of history itself.

Or running these as bzr commands:

# Clone hydrazine, obtaining the second to last revision (so we have something to update to later)
# This is similar to what we'd be left by an older version of pip
/tmp % bzr branch -2 lp:hydrazine
Branched 106 revisions.                                                                                                                                                                    
# It's a standalone branch:
/tmp % bzr info hydrazine
Standalone tree (format: 2a)
Location:
  branch root: hydrazine

Related branches:
  parent branch: bzr+ssh://bazaar.launchpad.net/+branch/hydrazine/

# Bind it to the upstream branch
/tmp % cd hydrazine
/tmp/hydrazine % bzr bind lp:hydrazine
# It's now become a checkout
/tmp/hydrazine % bzr info 
Checkout (format: 2a)
Location:
       checkout root: .
  checkout of branch: bzr+ssh://bazaar.launchpad.net/+branch/hydrazine/
# update to the latest revision:
/tmp/hydrazine % bzr up 
 M  README                                                                                                                                                                                 
 M  bugclient
 M  capture-bug-counts
 M  check-membership
 M  feed-pqm
 M  lp-attach
 M  lp-delete-ppa-packages
 M  lp-promote-ppa
 M  scan-merge-proposals
 M  setup.py
All changes applied successfully.                                                                                                                                                          
Updated to revision 107 of branch bzr+ssh://bazaar.launchpad.net/+branch/hydrazine

@jelmer jelmer force-pushed the bazaar-lightweight-perf branch from 67d2fe6 to 4241bc8 Compare September 4, 2022 22:11
@jelmer
Copy link
Contributor Author

jelmer commented Sep 4, 2022

I've added a comment in the code explaining why it's calling "bzr bind".

@jelmer jelmer force-pushed the bazaar-lightweight-perf branch 2 times, most recently from fc9779b to cca101d Compare September 4, 2022 22:15
Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on the basis that we don't have anyone with real expertise in Bazaar, but equally I don't think we have many users relying on it, so the risk of major issues is low.

If we get a problem, we can simply revert the PR.

@pfmoore
Copy link
Member

pfmoore commented Oct 7, 2022

I propose we merge this. @sbidoul, as you previously commented here, are you OK with that?

@jelmer can you look into the CI failure?

@sbidoul
Copy link
Member

sbidoul commented Oct 7, 2022

Yup, sorry I had lost track of this.

I still think there might be a minor backward incompatibility concern with pip install -e bzr+https://.... IIUC, what people will get in their src directory will not be the same as before and that might break some workflows.

But that's probably not worth worrying about too much. So 👍 for merging.

@sbidoul
Copy link
Member

sbidoul commented Oct 7, 2022

The CI failure is our flaky test_timeout, so unrelated.

@sbidoul sbidoul added this to the 22.3 milestone Oct 7, 2022
@pfmoore
Copy link
Member

pfmoore commented Oct 7, 2022

The CI failure is our flaky test_timeout, so unrelated.

Sorry, I misread because github actions highlighted these lines as they have "ERROR" in them.

Error: ib/test_lib.py::TestPipTestEnvironment::test_run__allow_stderr_error[ERROR] 
Error: 36m [ 94%] PASSED tests/lib/test_lib.py::TestPipTestEnvironment::test_run__allow_stderr_error[ERROR]

@pradyunsg pradyunsg merged commit d37034c into pypa:main Oct 7, 2022
@pradyunsg
Copy link
Member

I've force-merged, since reverts are easy-enough. :)

@pfmoore
Copy link
Member

pfmoore commented Oct 7, 2022

Beat me to it, @pradyunsg! 🚤

@pradyunsg
Copy link
Member

Heh, good point @pfmoore -- I'll revert the force-merge if it causes main to break.

@pfmoore
Copy link
Member

pfmoore commented Oct 7, 2022

The CI failure is our flaky test_timeout, so unrelated.

Off-topic for here, but is anyone looking at test_timeout, or do we have an issue tracking the problem? Presumably it's this one?

Checking the history, it looks like we originally skipped the test deliberately because it was flaky, but it was "fixed" in 426691d by an over-enthusiastic response to a mypy message (skipif used without an argument).

I propose we revert the marker change, making it skipif(True) to placate mypy, but leaving the test skipped.

@pradyunsg
Copy link
Member

I think we should just remove the test -- it's flaky enough that it's basically rendering our Windows CI useless. :)

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 23, 2022
@ichard26 ichard26 added the type: performance Commands take too long to run label Dec 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bot:chronographer:provided C: vcs pip's interaction with version control systems like git, svn and bzr type: performance Commands take too long to run
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checking out Bazaar branch makes full clone
5 participants