feat(doris): add catalog support for Apache Doris #31580
base: master
I've completed my review and didn't find any issues... but I did find this chicken.
\\
(o>
<_ )
^^
Files scanned
File Path | Reviewed |
---|---|
superset/db_engine_specs/doris.py | ✅ |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #31580 +/- ##
===========================================
+ Coverage 60.48% 83.77% +23.28%
===========================================
Files 1931 538 -1393
Lines 76236 39142 -37094
Branches 8568 0 -8568
===========================================
- Hits 46114 32790 -13324
+ Misses 28017 6352 -21665
+ Partials 2105 0 -2105
@mistercrunch Any idea about the docker image error?
@betodealmeida is definitely the subject matter expert on catalog support lately!
This PR is missing a proper description, making it difficult to understand key details in the featured changes. Also, a few general comments that caught my eye while reviewing.
Thanks for the improvements @liujiwen-up! One last comment, after that I feel this is good to go 👍
superset/db_engine_specs/doris.py
Outdated
    if uri.database and "." in uri.database:
        current_catalog, _ = uri.database.split(".", 1)
    else:
        current_catalog = "internal"

    # In Apache Doris, each catalog has an information_schema for BI tool
    # compatibility. See: https://github.com/apache/doris/pull/28919
    adjusted_database = ".".join(
        [catalog or current_catalog or "", "information_schema"]
    ).rstrip(".")
This may sound like a total nit, but I actually had some issues following what's going on here, especially the `catalog or current_catalog or ""` logic. As `current_catalog` is unnecessary if `catalog` is defined, I would have maybe just reused the latter variable for all these uses. Something like:

    if catalog:
        pass
    elif uri.database and "." in uri.database:
        catalog, _ = uri.database.split(".", 1) or ""  # notice how I also moved the `or ""` part here
    else:
        catalog = "internal"

Then later just

    adjusted_database = ".".join([catalog, "information_schema"])

Also, why is `.rstrip(".")` needed? I don't see how we can ever hit that, as `adjusted_database` will always end with `.information_schema`, right?
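For readers following the thread, here is a minimal standalone sketch of the catalog fallback order discussed in this comment: explicit catalog first, then the prefix of the URI database, then `internal`. The helper names `resolve_catalog` and `information_schema_database` are hypothetical and work on plain strings; this is not the code in `superset/db_engine_specs/doris.py`.

```python
from typing import Optional

DEFAULT_CATALOG = "internal"  # default catalog named in the PR summary


def resolve_catalog(catalog: Optional[str], uri_database: Optional[str]) -> str:
    """Hypothetical helper: explicit catalog, then URI prefix, then the default."""
    if catalog:
        return catalog
    if uri_database and "." in uri_database:
        # "hive.sales" -> "hive"; split at the first dot only
        return uri_database.split(".", 1)[0]
    return DEFAULT_CATALOG


def information_schema_database(
    catalog: Optional[str], uri_database: Optional[str]
) -> str:
    """Hypothetical helper: build "<catalog>.information_schema"."""
    return ".".join([resolve_catalog(catalog, uri_database), "information_schema"])


print(information_schema_database(None, "hive.sales"))
print(information_schema_database(None, None))
```

Because the joined string always ends in `information_schema`, a trailing `.rstrip(".")` has nothing to strip, which is the point raised in the question above.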
@villebro Thanks for your advice. After in-depth testing with Doris, we found that there is still a problem. The previous test only covered connecting a data source. SQL Lab also goes through this function, so `information_schema` cannot be hard-coded: when a schema value is present, the user-provided schema should be used for the query. The current implementation has the correct behavior:
- When connecting a data source, the schema is empty and `information_schema` is used.
- When the schema has a value, the user-provided schema is used.
@villebro Please help me move this forward. Thank you.
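The two cases described in this comment can be sketched as follows. This is an illustration only: `choose_database` is a hypothetical name, the catalog is assumed to be already resolved, and the real change operates on a SQLAlchemy URL rather than on plain strings.

```python
from typing import Optional


def choose_database(catalog: str, schema: Optional[str]) -> str:
    """Hypothetical helper mirroring the two cases in the comment."""
    if schema:
        # Schema provided by the user (the SQL Lab case in the comment)
        return f"{catalog}.{schema}"
    # No schema (the data source connection case in the comment)
    return f"{catalog}.information_schema"


print(choose_database("internal", None))
print(choose_database("hive", "sales"))
```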
One more round to simplify the code (it'll be easier for maintainers to carry this code forward with less duplication and ambiguity). We should be able to get this into 5.0 as long as we can get this merged before the release cut (no rush yet, but let's try to finish this as soon as possible).
One more thing that comes to mind: is it really necessary to assign `information_schema` to the connection string if no schema is selected? Typically we just leave it unspecified (if someone wants to access tables in the `information_schema`, they can just choose that schema explicitly).

    # In Apache Doris, each catalog has an information_schema for BI tool
    # compatibility. See: https://github.com/apache/doris/pull/28919
    if schema:
        adjusted_database = ".".join([catalog or "", schema])
    else:
        adjusted_database = ".".join([catalog or "", "information_schema"])
    uri = uri.set(database=adjusted_database)
Schema assignment can be simplified, and I'd prefer to just rename `adjusted_database` to `database`:

      # In Apache Doris, each catalog has an information_schema for BI tool
      # compatibility. See: https://github.com/apache/doris/pull/28919
    - if schema:
    -     adjusted_database = ".".join([catalog or "", schema])
    - else:
    -     adjusted_database = ".".join([catalog or "", "information_schema"])
    - uri = uri.set(database=adjusted_database)
    + schema = schema or "information_schema"
    + database = ".".join([catalog or "", schema])
    + uri = uri.set(database=database)
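As a quick sanity check, the two forms in this suggestion can be compared on plain strings. The sketch below is an assumption-laden illustration: the function names `database_if_else` and `database_simplified` are hypothetical, and the `uri.set(...)` call is left out so that the snippet runs without SQLAlchemy.

```python
from itertools import product
from typing import Optional


def database_if_else(catalog: Optional[str], schema: Optional[str]) -> str:
    """The original branching form, without the URI update."""
    if schema:
        return ".".join([catalog or "", schema])
    return ".".join([catalog or "", "information_schema"])


def database_simplified(catalog: Optional[str], schema: Optional[str]) -> str:
    """The suggested form, without the URI update."""
    schema = schema or "information_schema"
    return ".".join([catalog or "", schema])


CATALOGS = [None, "", "internal", "hive"]
SCHEMAS = [None, "", "sales"]

# Collect every input pair on which the two forms disagree
mismatches = [
    (c, s)
    for c, s in product(CATALOGS, SCHEMAS)
    if database_if_else(c, s) != database_simplified(c, s)
]
print(mismatches)  # → []
print(database_simplified(None, None))  # → .information_schema
```

Both forms agree on these inputs. Note that with an empty catalog, `catalog or ""` produces a database name with a leading dot, so both versions rely on the catalog having been resolved earlier.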
SUMMARY
Add catalog support for Apache Doris.
In Apache Doris, each catalog has its own information_schema for compatibility with different BI tools. When no catalog is specified, the default catalog is internal. This feature corresponds to the Doris PR apache/doris#28919.