Very simple question, but I can't find the answer online. I have a Dataset and I just want to add a named DataArray to it, something like dataset.add({"new_array": new_data_array}). I know about merge, update and concatenate, but my understanding is that merge is for merging two or more Datasets and concatenate is for concatenating two or more DataArrays into another DataArray, and I haven't fully understood update yet. I've tried dataset.update({"new_array": new_data_array}) but I get the following error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

I've also tried dataset["new_array"] = new_data_array and I get the same error.

Update

I've now found out that the problem is that some of my coordinates have duplicate values, which I didn't know about. Coordinates are used as indexes, so Xarray (understandably) cannot align the shared coordinates when it tries to combine the arrays. Below is an example that works.

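import numpy
import xarray

names = ["joaquin", "manolo", "xavier"]
# Name the dimension explicitly rather than relying on it being inferred from the coords dict
n = xarray.DataArray([23, 98, 23], dims=["name"], coords={"name": names})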
print(n)
print("======")
m = numpy.random.randint(0, 256, (3, 4, 4)).astype(numpy.uint8)
mm = xarray.DataArray(m, dims=["name", "row", "column"], coords=[names, range(4), range(4)])
print(mm)
print("======")
n_dataset = n.rename("number").to_dataset()
n_dataset["mm"] = mm
print(n_dataset)

Output:

<xarray.DataArray (name: 3)>
array([23, 98, 23])
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'xavier'
======
<xarray.DataArray (name: 3, row: 4, column: 4)>
array([[[ 55,  63, 250, 211],
        [204, 151, 164, 237],
        [182,  24, 211,  12],
        [183, 220,  35,  78]],

       [[208,   7,  91, 114],
        [195,  30, 108, 130],
        [ 61, 224, 105, 125],
        [ 65,   1, 132, 137]],

       [[ 52, 137,  62, 206],
        [188, 160, 156, 126],
        [145, 223, 103, 240],
        [141,  38,  43,  68]]], dtype=uint8)
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'xavier'
  * row      (row) int64 0 1 2 3
  * column   (column) int64 0 1 2 3
======
<xarray.Dataset>
Dimensions:  (column: 4, name: 3, row: 4)
Coordinates:
  * name     (name) object 'joaquin' 'manolo' 'xavier'
  * row      (row) int64 0 1 2 3
  * column   (column) int64 0 1 2 3
Data variables:
    number   (name) int64 23 98 23
    mm       (name, row, column) uint8 55 63 250 211 204 151 164 237 182 24 ...

The above code uses names as the index. If I change the code a little bit, so that names has a duplicate, say names = ["joaquin", "manolo", "joaquin"], then I get an InvalidIndexError.

Code:

names = ["joaquin", "manolo", "joaquin"]
n = xarray.DataArray([23, 98, 23], dims=["name"], coords={"name": names})
print(n)
print("======")
m = numpy.random.randint(0, 256, (3, 4, 4)).astype(numpy.uint8)
mm = xarray.DataArray(m, dims=["name", "row", "column"], coords=[names, range(4), range(4)])
print(mm)
print("======")
n_dataset = n.rename("number").to_dataset()
n_dataset["mm"] = mm
print(n_dataset)

Output:

<xarray.DataArray (name: 3)>
array([23, 98, 23])
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'joaquin'
======
<xarray.DataArray (name: 3, row: 4, column: 4)>
array([[[247,   3,  20, 141],
        [ 54, 111, 224,  56],
        [144, 117, 131, 192],
        [230,  44, 174,  14]],

       [[225, 184, 170, 248],
        [ 57, 105, 165,  70],
        [220, 228, 238,  17],
        [ 90, 118,  87,  30]],

       [[158, 211,  31, 212],
        [ 63, 172, 190, 254],
        [165, 163, 184,  22],
        [ 49, 224, 196, 244]]], dtype=uint8)
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'joaquin'
  * row      (row) int64 0 1 2 3
  * column   (column) int64 0 1 2 3
======
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-12-50863379cefe> in <module>()
      8 print("======")
      9 n_dataset = n.rename("number").to_dataset()
---> 10 n_dataset["mm"] = mm
     11 print(n_dataset)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in __setitem__(self, key, value)
    536             raise NotImplementedError('cannot yet use a dictionary as a key '
    537                                       'to set Dataset values')
--> 538         self.update({key: value})
    539 
    540     def __delitem__(self, key):

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in update(self, other, inplace)
   1434             dataset.
   1435         """
-> 1436         variables, coord_names, dims = dataset_update_method(self, other)
   1437 
   1438         return self._replace_vars_and_dims(variables, coord_names, dims,

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/merge.py in dataset_update_method(dataset, other)
    492     priority_arg = 1
    493     indexes = dataset.indexes
--> 494     return merge_core(objs, priority_arg=priority_arg, indexes=indexes)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
    373     coerced = coerce_pandas_values(objs)
    374     aligned = deep_align(coerced, join=join, copy=False, indexes=indexes,
--> 375                          skip_single_target=True)
    376     expanded = expand_variable_dicts(aligned)
    377 

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in deep_align(list_of_variable_maps, join, copy, indexes, skip_single_target)
    162 
    163     aligned = partial_align(*targets, join=join, copy=copy, indexes=indexes,
--> 164                             skip_single_target=skip_single_target)
    165 
    166     for key, aligned_obj in zip(keys, aligned):

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in partial_align(*objects, **kwargs)
    122         valid_indexers = dict((k, v) for k, v in joined_indexes.items()
    123                               if k in obj.dims)
--> 124         result.append(obj.reindex(copy=copy, **valid_indexers))
    125 
    126     return tuple(result)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in reindex(self, indexers, method, tolerance, copy, **kw_indexers)
   1216 
   1217         variables = alignment.reindex_variables(
-> 1218             self.variables, self.indexes, indexers, method, tolerance, copy=copy)
   1219         return self._replace_vars_and_dims(variables)
   1220 

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in reindex_variables(variables, indexes, indexers, method, tolerance, copy)
    234             target = utils.safe_cast_to_index(indexers[name])
    235             indexer = index.get_indexer(target, method=method,
--> 236                                         **get_indexer_kwargs)
    237 
    238             to_shape[name] = len(target)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2080 
   2081         if not self.is_unique:
-> 2082             raise InvalidIndexError('Reindexing only valid with uniquely'
   2083                                     ' valued Index objects')
   2084 

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

So it's not a bug in Xarray as such. Nevertheless, I wasted many hours trying to track this down, and I wish the error message were more informative. I hope the Xarray collaborators will fix this soon, for example by putting in a uniqueness check on the coordinates before attempting to merge.
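A uniqueness check like the one suggested above can also be run by hand before combining arrays. The following is only a minimal sketch: duplicated_index_labels is a hypothetical helper name (not part of xarray), and it assumes the coordinates in question are ordinary pandas-backed indexes exposed through the indexes property.

```python
import xarray


def duplicated_index_labels(obj):
    """Map each indexed dimension of obj to the labels that occur more than once.

    Hypothetical helper for illustration; works on a DataArray or a Dataset.
    """
    report = {}
    for dim, index in obj.indexes.items():
        if not index.is_unique:
            # Index.duplicated() marks every occurrence after the first one.
            report[dim] = sorted(set(index[index.duplicated()]))
    return report


names = ["joaquin", "manolo", "joaquin"]
n = xarray.DataArray([23, 98, 23], dims=["name"], coords={"name": names})

# An empty dict means every index is unique; otherwise the offending
# dimensions and labels are listed.
print(duplicated_index_labels(n))
```

Running this on both objects before dataset["new_array"] = new_data_array shows which dimension would trip up the alignment step.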

In any case, the method provided by my answer below still works.

  • Can you share a more complete example? There was a small regression that caused a similar error (now corrected). But it's not clear what's going on without seeing the source. Commented Aug 8, 2016 at 15:16
  • Are you using xarray v0.8.0? It had a bug that caused this error erroneously, now fixed in v0.8.1. Commented Aug 8, 2016 at 16:02
  • I've now updated to 0.8.1 and still getting the same error. I've tried the example code in @jetesdal's answer, and that works, but for some reason it still doesn't work in my own code. My own code is too huge (and too confidential) to paste on here; it'll take me a good while to write a short example. Commented Aug 8, 2016 at 16:17
  • Thanks, a bug report would be much appreciated. It appears that at least one of your arrays has duplicate values along one of its axes, which means it cannot be automatically aligned, which is part of what happens in merging/assignment as well as many other xarray operations. There may be tweaks we can do to make this work, though. Commented Aug 8, 2016 at 16:27
  • @Stephan that's a great guess on your part! One of my DataArrays indeed has many duplicates along its (only) axis, but that's intentional. The array I'm trying to add has 3 dimensions, one of which is shared with the array that has duplicate values. I will try to produce a reproducible error now. Commented Aug 8, 2016 at 16:31

3 Answers

9

You need to make sure that the dimensions of your new DataArray are the same as in your dataset. Then the following should work:

dataset['new_array_name'] = new_array

Here is a complete example to try it out:

import numpy as np
import xarray as xr

# Create some dimensions
x = np.linspace(-10, 10, 10)
y = np.linspace(-20, 20, 20)
(yy, xx) = np.meshgrid(y, x)  # both grids have shape (len(x), len(y))

# Make two different DataArrays with equal dimensions
var1 = xr.DataArray(np.random.randn(len(x), len(y)), coords=[x, y], dims=['x', 'y'])
var2 = xr.DataArray(-xx**2 + yy**2, coords=[x, y], dims=['x', 'y'])

# Save one DataArray as dataset
ds = var1.to_dataset(name='var1')

# Add second DataArray to existing dataset (ds)
ds['var2'] = var2

2 Comments

This is the right solution, but note that you don't need to ensure anything about the dimensions of your new DataArray or the labels. Xarray will align it automatically when you insert it.
Thank you @jetesdal but my code still doesn't work. Yours does (when I copy-paste it). Very strange.
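
The automatic alignment mentioned in the first comment can be illustrated with a small sketch. The variable names and values here are made up for the example, and it assumes unique index labels. The new array covers only some of the dataset's labels, in a different order, and is aligned to the dataset's existing index on insertion.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"var1": ("x", [1.0, 2.0, 3.0])}, coords={"x": [10, 20, 30]})

# The new array has only two of the three "x" labels, and in reverse order.
partial = xr.DataArray([200.0, 100.0], dims=["x"], coords={"x": [20, 10]})

# Assignment aligns the new array to the dataset's own "x" index;
# labels missing from the new array are filled with NaN.
ds["var2"] = partial

print(ds["var2"].values)
```

The dataset keeps its original "x" labels; only the inserted variable is reordered and padded.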
9

Thanks to your detailed report, this issue has now been fixed in the latest release of xarray (v0.8.2).

We fixed the behavior in two ways:

  1. Alignment operations between xarray objects now succeed even with non-unique indexes, as long as the non-unique indexes take on identical values on all objects.

  2. If you attempt to align objects with non-unique indexes that are not identical, you now get an informative error message reporting the name of the index with duplicate values, e.g., ValueError: cannot reindex or align along dimension 'x' because the index has duplicate values.
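
As a rough illustration of the first point, the sketch below assumes xarray v0.8.2 or later and reuses the duplicate-name setup from the question with placeholder data. It also shows a (dims, values) tuple assignment, which is not mentioned in this answer; it is offered here only as a possible way to add raw values without any coordinate alignment.

```python
import numpy as np
import xarray as xr

names = ["joaquin", "manolo", "joaquin"]  # duplicate label on purpose
ds = xr.Dataset({"number": ("name", [23, 98, 23])}, coords={"name": names})
m = np.zeros((3, 4, 4), dtype=np.uint8)  # placeholder data

# The new array carries the same non-unique "name" index as the dataset,
# so no reindexing is needed and the assignment is expected to succeed.
mm = xr.DataArray(m, dims=["name", "row", "column"], coords={"name": names})
ds["mm"] = mm

# Assigning a (dims, values) tuple attaches no coordinates to the new
# variable, so there is nothing to align against the existing index.
ds["mm_raw"] = (("name", "row", "column"), m)

print(ds)
```

The tuple form relies purely on positional order along "name", so it is only appropriate when the rows are already known to line up.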

1

OK I found one way to do it but I don't know if this is the canonical way or the best way, so please criticise and advise. It doesn't feel like a good way of doing it.

dataset = xarray.merge([dataset, new_data_array.rename("new_array")])
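
For comparison, here is a small sketch of the merge approach next to two alternatives, using made-up data with matching unique coordinates. Dataset.assign is not mentioned elsewhere in this thread; it is included as another option that, like merge, returns a new Dataset rather than modifying the existing one.

```python
import xarray as xr

dataset = xr.Dataset({"var1": ("x", [1, 2, 3])}, coords={"x": [0, 1, 2]})
new_data_array = xr.DataArray([10, 20, 30], dims=["x"], coords={"x": [0, 1, 2]})

# merge needs a named DataArray, hence the rename; it returns a new Dataset.
merged = xr.merge([dataset, new_data_array.rename("new_array")])

# assign also returns a new Dataset and takes the name as a keyword.
assigned = dataset.assign(new_array=new_data_array)

# Item assignment modifies the existing Dataset in place.
dataset["new_array"] = new_data_array

print(merged.equals(assigned))
```

With identical coordinates all three give the same variables; they can differ when the coordinates do not match, since merge combines the indexes while item assignment keeps the dataset's own.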
