(Potential Bug) Partition field names are not URL-encoded in file locations #1458

Open
smaheshwar-pltr opened this issue Dec 20, 2024 · 2 comments · May be fixed by #1457

Comments


smaheshwar-pltr commented Dec 20, 2024

Potential Bug / Improvement

Unlike the Java implementation, PyIceberg does not URL-encode partition field names in data file locations; only the partition values are encoded.

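from pyiceberg.catalog import load_catalog
from pyiceberg.transforms import IdentityTransform

# Assumes a configured catalog named "default" and a table 'ns.tbl' with a "data" column.
catalog = load_catalog("default")

t = catalog.load_table('ns.tbl').scan().to_arrow()
table = catalog.create_table('ns.tbl2', t.schema)
with table.update_spec() as update:
    # The partition field name deliberately contains a special character.
    update.add_field("data", IdentityTransform(), "da#a")

table.append(t)
assert t == table.scan().to_arrow()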

Not sure if this is a significant bug, but the code above fails when using MinIO because its partition field name contains a special character: the returned data is corrupted. In the MinIO local object store explorer, there's a stray da object (the # cuts the path off) where there should be a directory of data files.

I've confirmed that this is fixed by URL-encoding partition field names (#1457). I discovered this when working on #1452.
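To illustrate the failure mode, here is a minimal sketch using only the Python standard library. The helper `partition_path` and the `s3://bucket/...` locations are hypothetical and are not PyIceberg's actual code; the sketch only assumes that a location containing a raw `#` may be parsed as a URL somewhere along the write path, in which case everything after the `#` is treated as a fragment.

```python
from urllib.parse import quote_plus, urlparse


def partition_path(fields, encode_names):
    """Hypothetical helper: build a Hive-style partition path from (name, value) pairs.

    Values are always URL-encoded; names are encoded only if encode_names is True.
    """
    parts = []
    for name, value in fields:
        encoded_name = quote_plus(name, safe="") if encode_names else name
        parts.append(f"{encoded_name}={quote_plus(value, safe='')}")
    return "/".join(parts)


# Same partition field as in the report, with an illustrative value "x".
unencoded = partition_path([("da#a", "x")], encode_names=False)
encoded = partition_path([("da#a", "x")], encode_names=True)

# Parse both as object-store locations (bucket and prefix are made up).
broken = urlparse(f"s3://bucket/ns/tbl2/data/{unencoded}/f.parquet")
fixed = urlparse(f"s3://bucket/ns/tbl2/data/{encoded}/f.parquet")

print(broken.path)      # path stops at the '#'
print(broken.fragment)  # the rest of the location ends up here
print(fixed.path)       # the full path survives once the name is encoded
```

With the name left unencoded, the parsed path ends at `da`, which matches the stray `da` object described above; with the name encoded as `da%23a`, the whole location is kept as the path.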

Author

smaheshwar-pltr commented Dec 20, 2024

@kevinjqliu, could you please take a look if you have a moment? Thanks 😄

@kevinjqliu
Contributor

Great catch. I think this is a bug and #175 might be related.

On the Java side, it looks like this was added recently in apache/iceberg#10329: https://github.com/apache/iceberg/blame/dea2fd1d9debfd23aeda9403ed3eb81c6aebf30f/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L218

Could you add a test case in the PR?

@kevinjqliu kevinjqliu added this to the PyIceberg 0.9.0 release milestone Dec 21, 2024