
I want to duplicate the rows of the DataFrame "this" according to the values in two of its columns, and save the result as a new DataFrame named "newThis":

import numpy as np
import pandas as pd

this = pd.DataFrame(columns=['a','b','c'], index=[1,2,3])
this.a = [1, 2, 0]
this.b = [5, 0, 4]
this.c = [2, 3, 2]

newThis = []

for i in range(len(this)):

    # repeat row i by the value in column 'b' (position 1); keep one copy if it is 0
    if int(this.iloc[i, 1]) != 0:
        that = np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))
    else:
        that = np.array([this.iloc[i,:]])

    # repeat row i by the value in column 'c' (position 2); keep one copy if it is 0
    if int(this.iloc[i, 2]) != 0:
        those = np.array([this.iloc[i,:]] * int(this.iloc[i, 2]))
    else:
        those = np.array([this.iloc[i,:]])

    newThis.append(that)
    newThis.append(those)

I want one big array of concatenated rows, but instead I get this list of separate arrays:

[array([[1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[2, 0, 3]], dtype=int64), array([[2, 0, 3],
        [2, 0, 3],
        [2, 0, 3]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2],
        [0, 4, 2],
        [0, 4, 2]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2]], dtype=int64)]
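For reference, a list of 2-D arrays like the one above can be collapsed into a single 2-D array with np.vstack, which concatenates the arrays row-wise in list order. The sketch below rebuilds the question's loop (using max(n, 1) in place of the if/else, which gives the same counts) so it runs on its own; the names "stacked" and "newThisDF" are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
this = pd.DataFrame({'a': [1, 2, 0], 'b': [5, 0, 4], 'c': [2, 3, 2]}, index=[1, 2, 3])

newThis = []
for i in range(len(this)):
    row = this.iloc[i, :]
    # max(n, 1): a count of 0 still keeps one copy, as in the question's if/else
    that = np.array([row] * max(int(this.iloc[i, 1]), 1))
    those = np.array([row] * max(int(this.iloc[i, 2]), 1))
    newThis.append(that)
    newThis.append(those)

# Stack the list of (n, 3) arrays into one array, rows concatenated in order
stacked = np.vstack(newThis)
newThisDF = pd.DataFrame(stacked, columns=this.columns)
print(newThisDF)
```

np.concatenate(newThis, axis=0) would work equally well here, since every array in the list is already 2-D with the same number of columns.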

Thanks

  • Please post your desired data set. (Commented Dec 10, 2017 at 13:09)

2 Answers


IIUC (if I understand correctly):

Source DF:

In [213]: this
Out[213]:
   a  b  c
1  1  5  2
2  2  0  3
3  0  4  2

Solution:

In [211]: newThis = pd.DataFrame(np.repeat(this.values, 
                                           this['b'].replace(0,1).tolist(), 
                                           axis=0),
                                 columns=this.columns)

In [212]: newThis
Out[212]:
   a  b  c
0  1  5  2
1  1  5  2
2  1  5  2
3  1  5  2
4  1  5  2
5  2  0  3
6  0  4  2
7  0  4  2
8  0  4  2
9  0  4  2
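A possible pandas-only alternative (not the approach used in this answer) is to repeat the index labels with Index.repeat and select the rows with .loc. Because it never goes through this.values, each column keeps its own dtype, which matters if the real frame mixes types. The name "counts" is a hypothetical helper; like the answer, it treats a count of 0 as one copy.

```python
import pandas as pd

this = pd.DataFrame({'a': [1, 2, 0], 'b': [5, 0, 4], 'c': [2, 3, 2]}, index=[1, 2, 3])

# Copies per row: the value in 'b', with 0 treated as 1 (same rule as the answer)
counts = this['b'].replace(0, 1)

# Repeat each index label counts[label] times, select those rows, renumber 0..n-1
newThis = this.loc[this.index.repeat(counts.to_numpy())].reset_index(drop=True)
print(newThis)
```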

2 Comments

Wow, this is so elegant, thanks. It turns out my original big data set has some NaNs in it, and I get this error: "cannot convert float NaN to integer". Any ideas?
@Sharonio, please post a small reproducible data set (with NaNs) and your desired data set...
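A NaN in the count column cannot be used as a repeat count, so one option is to fill it before calling np.repeat. The sketch below assumes a missing count should mean "keep one copy", the same rule the answer applies to 0; that is an assumption, since the desired output was never posted. The extra row with index 4 is hypothetical data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'b' contains a NaN, so pandas stores that column as float64
this = pd.DataFrame({'a': [1, 2, 0, 7], 'b': [5, 0, 4, np.nan], 'c': [2, 3, 2, 1]},
                    index=[1, 2, 3, 4])

# Assumption: NaN and 0 both mean "one copy"; cast to int so np.repeat accepts it
counts = this['b'].fillna(1).replace(0, 1).astype(int)

newThis = pd.DataFrame(np.repeat(this.values, counts.tolist(), axis=0),
                       columns=this.columns)
print(newThis)
```

Note that this.values is float64 here because column 'b' is float, so the output columns are float as well.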

It looks like you're confusing multiplying a list by an integer with multiplying an np.array by an integer.

Remember:

 [np.int32(1)] * 2 == [np.int32(1), np.int32(1)]

But:

 np.array([1]) * 2 == np.array([2])
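A small runnable check of the difference, using one row of the question's data: list multiplication repeats the elements, while array multiplication scales every value.

```python
import numpy as np

row = [1, 5, 2]

# Multiplying a list by an int repeats its elements: three copies of the row
as_array = np.array([row] * 3)   # shape (3, 3)

# Multiplying an ndarray by an int multiplies every element
scaled = np.array([row]) * 3     # shape (1, 3)

print(as_array)
print(scaled)
```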

You probably need to change this:

np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))

to this:

np.array([this.iloc[i,:]]) * int(this.iloc[i, 1])

1 Comment

Thanks, though this isn't what I'm looking for. I want to create a longer DataFrame in which each row repeats itself n times according to the values in the second and third columns.
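If "according to the second and third columns" means what the question's loop does (one block of copies for 'b' followed by one for 'c', with 0 counted as one copy), the two counts can simply be added and passed to the same np.repeat call the accepted answer uses. This interpretation is an assumption; "counts" is a hypothetical name.

```python
import numpy as np
import pandas as pd

this = pd.DataFrame({'a': [1, 2, 0], 'b': [5, 0, 4], 'c': [2, 3, 2]}, index=[1, 2, 3])

# Assumed interpretation (mirrors the question's loop): copies for 'b' plus copies
# for 'c', where a count of 0 still yields one copy
counts = this['b'].replace(0, 1) + this['c'].replace(0, 1)

newThis = pd.DataFrame(np.repeat(this.values, counts.tolist(), axis=0),
                       columns=this.columns)
print(newThis)
```

Since all copies of a row are identical, adding the counts gives the same rows in the same order as stacking the question's "that" and "those" arrays.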
