I created two random variables (x and y) with certain properties. Now, I want to create a dataframe from scratch out of these two variables. Unfortunately, what I type seems to be wrong. How can I do this correctly?

# creating variable x with Bernoulli distribution
import pandas as pd
from scipy.stats import bernoulli, norm
x = bernoulli.rvs(size=100,p=0.6)

# form a column vector (n, 1)
x = x.reshape(-1, 1)
print(x)
# creating variable y with normal distribution
y = norm.rvs(size=100,loc=0,scale=1)

# form a column vector (n, 1)
y = y.reshape(-1, 1)
print(y)
# creating a dataframe from scratch and assigning x and y to it
df = pd.DataFrame()  
df.assign(y = y,  x = x)
df

1 Answer

There are a lot of ways to go about this.

According to the documentation, pd.DataFrame accepts an ndarray (structured or homogeneous), Iterable, dict, or DataFrame. Your issue is that x and y are each a 2D NumPy array

>>> x.shape
(100, 1)

where it expects either one 1d array per column or a single 2d array.
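To see the difference concretely, here is a minimal sketch that compares the column vector with a 1D view of it. It assumes NumPy's `default_rng` as a stand-in for the `scipy.stats` draws in the question (only the shapes matter here), and the name `col` is made up for illustration.

```python
import numpy as np

# Stand-in for bernoulli.rvs(size=100, p=0.6) from the question (assumption).
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.6, size=100).reshape(-1, 1)

print(x.ndim, x.shape)      # 2D column vector, shape (100, 1)

# Selecting the single column gives the 1D array a DataFrame column needs.
col = x[:, 0]
print(col.ndim, col.shape)  # 1D, shape (100,)
```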

One way would be to stack the two arrays into a single 2D array before calling the DataFrame constructor

>>> pd.DataFrame(np.hstack([x,y]))
      0         1
0   0.0  0.764109
1   1.0  0.204747
2   1.0 -0.706516
3   1.0 -1.359307
4   1.0  0.789217
..  ...       ...
95  1.0  0.227911
96  0.0 -0.238646
97  0.0 -1.468681
98  0.0  1.202132
99  0.0  0.348248
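Note that the stacked version has no column names and that x has become a float, because `np.hstack` returns one array with a single dtype. A possible way around both, sketched under the assumption that NumPy's `default_rng` stands in for the `scipy.stats` draws (the variable `stacked` is an illustrative name):

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's x and y (assumption: same shapes, different RNG).
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.6, size=100).reshape(-1, 1)
y = rng.normal(loc=0, scale=1, size=100).reshape(-1, 1)

# hstack upcasts the integer column so both columns share float64.
stacked = np.hstack([x, y])

# Name the columns explicitly, then cast x back to an integer dtype.
df = pd.DataFrame(stacked, columns=["x", "y"])
df = df.astype({"x": "int64"})
print(df.dtypes)
```

If the dtypes matter, building the frame from a dict of 1D arrays avoids the upcast altogether.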

The alternatives mostly revolve around calling np.ndarray.flatten(), e.g. to construct a dict

>>> pd.DataFrame({'x': x.flatten(), 'y': y.flatten()})
    x         y
0   0  0.764109
1   1  0.204747
2   1 -0.706516
3   1 -1.359307
4   1  0.789217
.. ..       ...
95  1  0.227911
96  0 -0.238646
97  0 -1.468681
98  0  1.202132
99  0  0.348248
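The `df.assign` approach from the question can probably be made to work as well, with two changes: pass 1D arrays, and keep the return value, since `assign` returns a new DataFrame and leaves the original untouched. A minimal sketch, again assuming NumPy's `default_rng` in place of the `scipy.stats` draws; `empty` is an illustrative name:

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's column vectors (assumption).
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.6, size=100).reshape(-1, 1)
y = rng.normal(loc=0, scale=1, size=100).reshape(-1, 1)

empty = pd.DataFrame()

# assign does not modify `empty`; the result has to be captured.
df = empty.assign(x=x.ravel(), y=y.ravel())

print(df.shape)     # (100, 2)
print(empty.shape)  # (0, 0)
```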