Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New query style for obsdb which enables to use external database #856

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

tterasaki
Copy link
Contributor

new query style for obsdb which enables to use external database.

@tterasaki tterasaki marked this pull request as ready for review May 27, 2024 04:30
@tterasaki tterasaki requested a review from mhasself May 27, 2024 04:31
@tterasaki
Copy link
Contributor Author

I think it is ready to go.

Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat! I want to experiment with it a bit more. The docstring details are appreciated but can you add some basic tests?

@tterasaki
Copy link
Contributor Author

tterasaki commented Jun 7, 2024

Thank you, Matthew.
If you want to test or play around, you can use:
computeXX: /so/metadata/satp1/manifests/pwv_class/satp1_pwv_class_240516m/pwv_class.sqlite

Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me so long to get to this. This is a really useful capability to have!

My requests for now are:

  • deal with too-long lines (marked inline)
  • squash and rebase

In the longer term I think we should update the interface so that:

  • Extend ObsDb so that 'attach' can occur on the base ObsDb object, allowing access to merged fields in any request (including .get), without passing subdb_info to each call
  • Extent Context so it can attach obsdb extensions automatically.
  • Include more description of this feature in the docs.

But we don't need to deal with that stuff now.

cursor = self.conn.cursor()

try:
extra_fields_main, joins_main, query_text_main = _generate_query_components_from_tags(query_text=query_text, tags=tags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line too long; use \

aliases.append(alias)
attach = f"ATTACH DATABASE '{filepath}' AS '{alias}'"
cursor = cursor.execute(attach)
_extra_fields_sub, _join_sub, _query_sub = _generate_query_components_from_subdb(filepath=filepath,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reduce line lengths here.

You could pass **subdb_info instead of all these .get(...)?

If you do not know the parameters in the sub-database, you can view the params
as long as it is a ManifestDb like below::

from sotolib.core import metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sotodlib.core

@davidvng
Copy link
Contributor

In combination with #850, I added handling for querying an external ManifestDB in which a field, in this case coverage has a string value that is a list comma-separated strings in SQLite format. I added the translation of the query '<substring> in coverage' to become 'coverage LIKE "%<substring>%"' to be understood by SQL as "any string that contains this substring"
I added a test to the test script and also tested on a real ManifestDB I created from #850
In that database there is the following entry:

[{'_id': 1,
  'filename': 'preprocess_obs_sourcetags_archive_v2.h5',
  'obs:obs_id': 'obs_1721381147_satp1_1111111',
  'dataset': 'obs_1721381147_satp1_1111111',
  'coverage': 'saturn:ws0,saturn:ws4,saturn:ws5,neptune:ws4,neptune:ws5,QSO_J2253+1608:ws5'}]

I can successfully retrieve that entry with the following query:

r0 = obsdb.query(
                    subdbs_info_list = [{'filepath': extdb_path,
                                         'query_list': ['saturn:ws0 in coverage'],
                                         'params_list': ['coverage']}]
                   )

@davidvng davidvng mentioned this pull request Sep 13, 2024
@mhasself
Copy link
Member

Hi @davidvng and @tterasaki -- who is going to deal with my requested changes to the original PR?

@davidvng On the "coverage" changes -- those seem fine but you need to clean up the testing function (remove prints, restore mpi check).

@davidvng davidvng requested a review from mhasself October 7, 2024 18:31
@davidvng
Copy link
Contributor

Added method to query source distance tag. Example:

[{'_id': 1,
  'filename': 'preprocess_obs_sourcetags_archive_v3.h5',
  'obs:obs_id': 'obs_1722593318_satp1_1111111',
  'dataset': 'obs_1722593318_satp1_1111111',
  'coverage': 'mars:ws0,mars:ws1,mars:ws2,jupiter:ws1,jupiter:ws2,uranus:ws2',
  'source_distance': 'mars:10.6,jupiter:12.1,uranus:15.3'}]

I can successfully retrieve that entry with the following query:

r0 = obsdb.query(
                    subdbs_info_list = [{'filepath': extdb_path,
                                         'query_list': ['jupiter:ws1 in coverage', 'jupiter < 13'],
                                         'params_list': ['coverage', 'source_distance']}]
                   )

Here both coverage and source distance are used, but source distance can be used by itself as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants