Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve packaging; split python bits from lib, add run-export; make -dev count #154

Closed
h-vetinari opened this issue Apr 7, 2024 · 6 comments · Fixed by #158
Closed

Improve packaging; split python bits from lib, add run-export; make -dev count #154

h-vetinari opened this issue Apr 7, 2024 · 6 comments · Fixed by #158

Comments

@h-vetinari
Copy link
Member

Currently all the C++ libraries are duplicated between various python versions, because rdkit is a monolithic build per python version.

A better setup would be to have librdkit as the C++ bits, with a proper run-export (that we can also add to the pinning), and upon which rdkit can then depend.

Also, the current rdkit-dev is useless, as it brings in 1:1 the same content as rdkit itself (i.e. it's an content-less package depending on rdkit)

@greglandrum
Copy link
Contributor

This sounds reasonable to me, and if it's possible to be better citizens in the conda-forge ecosystem, it would great to do so.
@jaimergp @pstjohn @mcs07 : I don't know how to do what @h-vetinari is requesting. If any of you do know how to do this and can take it on, please let me know. Otherwise I will try to make a block of time to figure it out.

@h-vetinari : if this is something you can easily do, a PR would be welcome. :-)

@mcs07
Copy link
Contributor

mcs07 commented Apr 9, 2024

Sounds reasonable to me too, but I'm very out of the loop on current best practices here. I guess the libboost/libboost-python split in the boost feedstock might be a useful example to follow. However, correct me if I'm wrong, but my understanding is there's no RDKit cmake configuration to just compile the python bindings by pointing to an existing C++-only RDKit installation? So to achieve this, the conda build for each python might still have to do the full RDKit compile then some hacks to only install the python-specific parts (maybe just a post-install cleanup of lib/libRDKit*.so).

My recollection with rdkit-dev is that it does in fact differ on windows (or at least it used to). I can't remember if it was intentionally added for all platforms for consistency or this happened accidentally. Regardless, is there any harm in preserving it now for backwards compatibility? Looks like the boost feedstock similarly has a bunch of packages like this due to past renaming.

@greglandrum
Copy link
Contributor

greglandrum commented Apr 9, 2024

However, correct me if I'm wrong, but my understanding is there's no RDKit cmake configuration to just compile the python bindings by pointing to an existing C++-only RDKit installation? So to achieve this, the conda build for each python might still have to do the full RDKit compile then some hacks to only install the python-specific parts (maybe just a post-install cleanup of lib/libRDKit*.so).

Yeah, the only way to get the Python build is to do the C++ build as well. But I believe there's some way in the conda-forge infrastructure to construct multiple install artifacts from a single build, so I don't think there should be wasted work

@jaimergp
Copy link
Member

You can create multiple outputs from a single "build script", let's say, so you can do "Run CMake on this repo" followed by "this output gets the .so files" and "this other output gets the .py files".

However, I don't know if we can have a single job build all the python versions. Usually we have one Python version per job. So we would have to build the .so files in all builds, but only package the librdkit bits in one of them.

@h-vetinari
Copy link
Member Author

However, I don't know if we can have a single job build all the python versions.

Yes, that's very possible. It's how boost/arrow/grpc etc. do it. As long as not all outputs (including the "global" build stage, i.e. what is before outputs:; where build.sh / bld.bat is run) contain a python dependency, the CI jobs will be collapsed into one per platform, and only the python-dependent outputs get executed multiple times (one per python version).

However, that presupposes that we can build the two things (C++ lib & python bindings) separately. Otherwise things get messy.

@h-vetinari
Copy link
Member Author

Yeah, the only way to get the Python build is to do the C++ build as well.

It's possible that these are the current assumptions of the upstream build scripts, but it's usually not such a big deal to patch this out and point to an existing C++ lib (built from the same tarball, so we don't have to worry about consistency).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants