   Unnamed: 0       index    datetime  ...   cVI     Reg       average_temp
0           0  2000-01-01  2000-01-01  ...   NaN  Central           -5.883996
1           1  2000-01-02  2000-01-02  ...   NaN  Central           -6.715087
2           2  2000-01-03  2000-01-03  ...   NaN  Central           -6.074254
3           3  2000-01-04  2000-01-04  ...   NaN  Central           -4.222387
4           4  2000-01-05  2000-01-05  ...   NaN  Central           -0.994825

I want to convert the dataframe above to an xarray dataset, with datetime as the index. I do this:

ds = xr.Dataset.from_dataframe(df)

but I am not able to get the datetime column as index. How do I do that?

  • It's not really clear to me what is not covered by Michael's answer, as it provides an xarray dataset with datetime as index that can be used to, well, index the dataset. Maybe you can share the expected result or elaborate? Commented Mar 8, 2022 at 16:57
  • Is the problem solved by the below answers? Commented Mar 12, 2022 at 23:38

2 Answers


xarray will treat the index in a dataframe as the dimensions of the resulting dataset. A MultiIndex will be unstacked such that each level will form a new orthogonal dimension in the result.
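To illustrate the MultiIndex behaviour, here is a minimal sketch. The frame `df_multi` and its values are made up for the example (they are not the asker's data); it only assumes pandas and xarray are installed.

```python
import pandas as pd
import xarray as xr

# Hypothetical two-level index: 2 dates x 2 regions (illustrative values only).
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2000-01-01", "2000-01-02"]), ["Central", "North"]],
    names=["datetime", "Reg"],
)
df_multi = pd.DataFrame({"average_temp": [-5.9, -3.1, -6.7, -2.8]}, index=idx)

# Each index level becomes its own dimension in the Dataset.
ds_multi = df_multi.to_xarray()
dims = dict(ds_multi.sizes)
print(dims)
```

Here `average_temp` ends up as a 2-D variable over `datetime` and `Reg`, rather than a flat column of four rows.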

To convert your data to xarray, first set the datetime as index in pandas, with df.set_index('datetime').

ds = df.set_index('datetime').to_xarray()

Alternatively, you could promote the column to a coordinate afterwards with ds.set_coords('datetime'), and then make it the indexing dimension with ds.swap_dims. Both methods return a new Dataset, so assign the result:

ds = df.to_xarray()
ds = ds.set_coords('datetime').swap_dims({'index': 'datetime'})

I'd recommend the first option, but the second also works if you already have your data as a Dataset and want to swap the index.
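As a rough check that the two routes end up in the same place, here is a small sketch. The frame `df_small` is a stand-in with made-up contents, and dropping the leftover `index` coordinate at the end is my own addition so the two results can be compared directly.

```python
import pandas as pd
import xarray as xr

# Stand-in frame with a default integer index (not the asker's full data).
df_small = pd.DataFrame({
    "datetime": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
    "average_temp": [-5.883996, -6.715087, -6.074254],
})

# Option 1: set the index in pandas, then convert.
ds_a = df_small.set_index("datetime").to_xarray()

# Option 2: convert first (the unnamed index becomes the "index" dimension),
# then promote "datetime" to a coordinate and swap the dimension.
ds_b = df_small.to_xarray()
ds_b = ds_b.set_coords("datetime").swap_dims({"index": "datetime"})

# The old integer "index" survives as a non-dimension coordinate; drop it.
ds_b = ds_b.drop_vars("index")

same = ds_a.equals(ds_b)
```

The second route leaves the old integer index behind as an extra coordinate, which is one more reason to prefer setting the index in pandas first.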


2 Comments

Thanks @Michael. However, even after doing this, the datetime column type stays as object. Is there a way to get its type to be datetime?
You can convert the column in the dataframe first with pd.to_datetime, e.g. df['datetime'] = pd.to_datetime(df['datetime'])

First use pd.to_datetime and df.set_index on the desired column:

df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')

#             cVI      Reg  average_temp
# datetime                              
# 2000-01-01  NaN  Central     -5.883996
# 2000-01-02  NaN  Central     -6.715087
# 2000-01-03  NaN  Central     -6.074254
# 2000-01-04  NaN  Central     -4.222387
# 2000-01-05  NaN  Central     -0.994825

Then your xr.Dataset.from_dataframe code will work as expected:

ds = xr.Dataset.from_dataframe(df)

# <xarray.Dataset>
# Dimensions:       (datetime: 5)
# Coordinates:
#   * datetime      (datetime) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-05
# Data variables:
#     cVI           (datetime) float64 nan nan nan nan nan
#     Reg           (datetime) object 'Central' 'Central' ... 'Central' 'Central'
#     average_temp  (datetime) float64 -5.884 -6.715 -6.074 -4.222 -0.9948

Or as Michael said, convert it from the pandas side using df.to_xarray:

ds = df.to_xarray()

# <xarray.Dataset>
# Dimensions:       (datetime: 5)
# Coordinates:
#   * datetime      (datetime) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-05
# Data variables:
#     cVI           (datetime) float64 nan nan nan nan nan
#     Reg           (datetime) object 'Central' 'Central' ... 'Central' 'Central'
#     average_temp  (datetime) float64 -5.884 -6.715 -6.074 -4.222 -0.9948
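Once datetime is a real datetime64 index, it can be used for label-based selection. The following is a minimal sketch on a cut-down copy of the data (only `average_temp` is kept, and the name `df_demo` is just for this example):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Cut-down copy of the question's data; dates start out as plain strings.
df_demo = pd.DataFrame({
    "datetime": ["2000-01-01", "2000-01-02", "2000-01-03",
                 "2000-01-04", "2000-01-05"],
    "average_temp": [-5.883996, -6.715087, -6.074254, -4.222387, -0.994825],
})

# Parse the strings, set the index, then convert.
df_demo["datetime"] = pd.to_datetime(df_demo["datetime"])
ds_demo = xr.Dataset.from_dataframe(df_demo.set_index("datetime"))

# The coordinate now has a datetime64 dtype rather than object.
is_datetime = np.issubdtype(ds_demo["datetime"].dtype, np.datetime64)

# Label-based slices are inclusive on both ends: 2000-01-02 through 2000-01-04.
subset = ds_demo.sel(datetime=slice("2000-01-02", "2000-01-04"))
```

If `pd.to_datetime` is skipped, the coordinate holds plain strings and date-aware operations such as resampling are not available on it.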
