
I have a NumPy array that I want to add as a column to an existing dask DataFrame.

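from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
nparr = enc.fit_transform(X['url'])  # LabelEncoder expects 1-D input; returns one integer code per row of X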
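A minimal sketch of the encoding step on its own, assuming X is a pandas DataFrame with a string column url (the sample values below are made up for illustration). LabelEncoder is meant for 1-D targets, so selecting the column as a Series with X['url'] rather than as a one-column DataFrame with X[['url']] gives a flat 1-D integer array, which is the shape the dask conversion later needs:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data standing in for the asker's X
X = pd.DataFrame({"url": ["b.com", "a.com", "b.com", "c.com"]})

enc = LabelEncoder()
# Select the column as a Series (1-D), not as a one-column DataFrame (2-D)
nparr = enc.fit_transform(X["url"])

print(nparr.ndim)          # 1
print(nparr)               # [1 0 1 2]
print(list(enc.classes_))  # ['a.com', 'b.com', 'c.com']
```

LabelEncoder assigns codes in sorted order of the distinct values, so a.com becomes 0, b.com becomes 1 and c.com becomes 2.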

ddf is an existing dask DataFrame, and I want to do something like:

ddf['nurl'] = nparr  # ???

Is there an elegant way to achieve this?

The question "Python PANDAS: Converting from pandas/numpy to dask dataframe/array" does not solve my issue, because I want to add a NumPy array to an existing dask DataFrame.


1 Answer


You can convert the NumPy array to a dask Series, then merge it into the dataframe. You will need to call the Series' .to_frame() method first, since dask only supports merging DataFrames with other DataFrames.

import dask.dataframe as dd
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(30), 'y': range(0,300, 10)})
arr = np.random.randint(0, 100, size=30)

# create dask frame and series
ddf = dd.from_pandas(df, npartitions=5)
darr = dd.from_array(arr)
# give it a name to use as a column head
darr.name = 'z'

ddf2 = ddf.merge(darr.to_frame())

ddf2
# returns:
Dask DataFrame Structure:
                   x      y      z
npartitions=5
0              int64  int64  int32
6                ...    ...    ...
...              ...    ...    ...
24               ...    ...    ...
29               ...    ...    ...
Dask Name: join-indexed, 33 tasks

1 Comment

It throws AttributeError: 'DataFrame' object has no attribute 'to_frame' when I try it on my data. Could you help me with that?
