Dataset.to_zarr append_dim behaviour is inconsistent with xr.concat for new dimensions. #9892

Open · 5 tasks done

owenlittlejohns opened this issue Dec 14, 2024 · 0 comments

Labels: bug, needs triage (Issue that has not been reviewed by xarray team member)

@owenlittlejohns (Contributor)
What happened?

This issue relates to #9858, and captures the Dataset.to_zarr append_dim behaviour noted by @TomNicholas in this comment.

What did you expect to happen?

Dataset.to_zarr with append_dim should be consistent with xr.concat, which does not raise an error when asked to concatenate along a dimension that the inputs do not yet have. For empty Dataset objects, this gives the result Tom showed: no time dimension, because no variables use that dimension. For Dataset objects with variables, writing with append_dim should introduce a new, single-element dimension on the contained variables, and the output variables should be concatenated along that new dimension, as sketched below.
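For reference, a minimal sketch of the xr.concat behaviour being described (ds_a and ds_b are hypothetical stand-ins for datasets that have no time dimension):

import numpy as np
import xarray as xr

# Two small datasets with no "time" dimension
ds_a = xr.Dataset(
    data_vars={"temp": (["lat"], np.array([270, 273]))},
    coords={"lat": [10, 20]},
)
ds_b = xr.Dataset(
    data_vars={"temp": (["lat"], np.array([271, 274]))},
    coords={"lat": [10, 20]},
)

# xr.concat accepts the new dimension: each input becomes one slice along "time"
combined = xr.concat([ds_a, ds_b], dim="time")
print(combined.sizes["time"])  # 2
print(combined["temp"].dims)   # ('time', 'lat')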

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

# Create two Datasets, neither of which has a "time" dimension
ds_one = xr.Dataset(
    data_vars={"temp": (["lat", "lon"], np.array([[270, 271, 270], [273, 272, 272]]))},
    coords={"lat": [10, 20], "lon": [-20, -10, 0]},
)

ds_two = xr.Dataset(
    data_vars={"temp": (["lat", "lon"], np.array([[271, 272, 271], [274, 273, 273]]))},
    coords={"lat": [10, 20], "lon": [-20, -10, 0]},
)

# Write the first Dataset, then try to append the second along a new "time" dimension
ds_one.to_zarr("ds.zarr")
ds_two.to_zarr("ds.zarr", append_dim="time")  # raises ValueError (traceback below)
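
For comparison, a hedged sketch of the pattern that to_zarr does accept today: explicitly adding the length-1 "time" dimension with expand_dims before the initial write and before each append (ds_one and ds_two are the Datasets from the example above, and the store path is illustrative):

# Workaround sketch: adding the new dimension up front makes the append succeed
ds_one.expand_dims("time").to_zarr("ds_expanded.zarr")
ds_two.expand_dims("time").to_zarr("ds_expanded.zarr", append_dim="time")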

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

File ~/Documents/git/pydata/xarray/xarray/core/dataset.py:2622, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, zarr_format, write_empty_chunks, chunkmanager_store_kwargs)
   2454 """Write dataset contents to a zarr group.
   2455
   2456 Zarr chunks are determined in the following way:
   (...)
   2618     The I/O user guide, with more details and examples.
   2619 """
   2620 from xarray.backends.api import to_zarr
-> 2622 return to_zarr(  # type: ignore[call-overload,misc]
   2623     self,
   2624     store=store,
   2625     chunk_store=chunk_store,
   2626     storage_options=storage_options,
   2627     mode=mode,
   2628     synchronizer=synchronizer,
   2629     group=group,
   2630     encoding=encoding,
   2631     compute=compute,
   2632     consolidated=consolidated,
   2633     append_dim=append_dim,
   2634     region=region,
   2635     safe_chunks=safe_chunks,
   2636     zarr_version=zarr_version,
   2637     zarr_format=zarr_format,
   2638     write_empty_chunks=write_empty_chunks,
   2639     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2640 )

File ~/Documents/git/pydata/xarray/xarray/backends/api.py:2184, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, zarr_format, write_empty_chunks, chunkmanager_store_kwargs)
   2182 writer = ArrayWriter()
   2183 # TODO: figure out how to properly handle unlimited_dims
-> 2184 dump_to_store(dataset, zstore, writer, encoding=encoding)
   2185 writes = writer.sync(
   2186     compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs
   2187 )
   2189 if compute:

File ~/Documents/git/pydata/xarray/xarray/backends/api.py:1920, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1917 if encoder:
   1918     variables, attrs = encoder(variables, attrs)
-> 1920 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/Documents/git/pydata/xarray/xarray/backends/zarr.py:907, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    905     existing_dims = self.get_dimensions()
    906     if self._append_dim not in existing_dims:
--> 907         raise ValueError(
    908             f"append_dim={self._append_dim!r} does not match any existing "
    909             f"dataset dimensions {existing_dims}"
    910         )
    912 variables_encoded, attributes = self.encode(
    913     {vn: variables[vn] for vn in new_variable_names}, attributes
    914 )
    916 if existing_variable_names:
    917     # We make sure that values to be appended are encoded *exactly*
    918     # as the current values in the store.
    919     # To do so, we decode variables directly to access the proper encoding,
    920     # without going via xarray.Dataset to avoid needing to load
    921     # index variables into memory.

ValueError: append_dim='time' does not match any existing dataset dimensions {'lat': 2, 'lon': 3}

Anything else we need to know?

No response

Environment

<function xarray.util.print_versions.show_versions(file=<_io.TextIOWrapper name='' mode='w' encoding='utf-8'>)>

@owenlittlejohns added the bug and needs triage (Issue that has not been reviewed by xarray team member) labels on Dec 14, 2024