python pandas data-frame - duplicate rows according to a column value

Question

I want to duplicate the rows of the dataframe "this" according to the values in two of its columns and save the result as a new dataframe named "newThis":

import numpy as np
import pandas as pd

this = pd.DataFrame(columns=['a','b','c'], index=[1,2,3])
this.a = [1, 2, 0]
this.b = [5, 0, 4]
this.c = [2, 3, 2]

newThis = []

for i in range(len(this)):

    if int(this.iloc[i, 1]) != 0:
        that = np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))
    elif int(this.iloc[i, 1]) == 0:
        that = np.array([this.iloc[i,:]])              

    if int(this.iloc[i, 2]) != 0:
        those = np.array([this.iloc[i,:]] * int(this.iloc[i, 2]))
    elif int(this.iloc[i, 2]) == 0:
        those = np.array([this.iloc[i,:]])

    newThis.append(that)
    newThis.append(those)

I want one big array of concatenated rows, but instead I get this mess:

[array([[1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[2, 0, 3]], dtype=int64), array([[2, 0, 3],
        [2, 0, 3],
        [2, 0, 3]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2],
        [0, 4, 2],
        [0, 4, 2]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2]], dtype=int64)]

Thanks

please post your desired data set

– MaxU - stand with Ukraine, Commented Dec 10, 2017 at 13:09

MaxU - stand with Ukraine · Accepted Answer · 2017-12-10 13:21:13Z

3

If I understand correctly:

Source DF:

In [213]: this
Out[213]:
   a  b  c
1  1  5  2
2  2  0  3
3  0  4  2

Solution:

In [211]: newThis = pd.DataFrame(np.repeat(this.values, 
                                           this['b'].replace(0,1).tolist(), 
                                           axis=0),
                                 columns=this.columns)

In [212]: newThis
Out[212]:
   a  b  c
0  1  5  2
1  1  5  2
2  1  5  2
3  1  5  2
4  1  5  2
5  2  0  3
6  0  4  2
7  0  4  2
8  0  4  2
9  0  4  2
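
For anyone who wants to run this outside an IPython session, here is a minimal standalone sketch of the same idea. The second variant, using `Index.repeat` with `.loc`, is not part of the answer above; it is an alternative that may be preferable when the columns have different dtypes, because `np.repeat` on `.values` operates on a single array with one common dtype. The names `counts`, `new_this` and `new_this_alt` are only illustrative.

```python
import numpy as np
import pandas as pd

this = pd.DataFrame({"a": [1, 2, 0], "b": [5, 0, 4], "c": [2, 3, 2]},
                    index=[1, 2, 3])

# Per-row repeat counts from column b; a count of 0 is replaced by 1
# so that such rows are kept once, as in the answer above.
counts = this["b"].replace(0, 1).to_numpy()

# Variant 1: the answer's approach, repeating rows of the underlying array.
new_this = pd.DataFrame(np.repeat(this.values, counts, axis=0),
                        columns=this.columns)

# Variant 2 (an alternative, not from the answer): repeat the index labels
# and select them with .loc, which keeps each column's own dtype.
new_this_alt = this.loc[this.index.repeat(counts)].reset_index(drop=True)

print(new_this)
print(new_this_alt)
```

Both variants should give the same ten rows for this sample frame.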


2 Comments

Sharonio Over a year ago

Wow, this is so elegant, thanks. It turns out my original big data set has some NaN values in it and I get this error: "cannot convert float NaN to integer". Any ideas?

MaxU - stand with Ukraine Over a year ago

@Sharonio, please post a small reproducible data set (with NaN's) and your desired data set...
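
No NaN sample was posted in this thread, so the following is only a sketch of one possible way to handle missing counts. It assumes (and this is an assumption, not something the asker confirmed) that a row with NaN in column b should be kept exactly once, the same way a 0 is treated. The sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: same shape as the question's data, with one NaN in b.
this = pd.DataFrame({"a": [1, 2, 0], "b": [5, np.nan, 4], "c": [2, 3, 2]})

# Assumption: NaN means "keep the row once", like 0 does.
# fillna(1) removes the NaN, replace(0, 1) handles zeros, and astype(int)
# gives the integer counts that repeat() expects.
counts = this["b"].fillna(1).replace(0, 1).astype(int).to_numpy()

new_this = this.loc[this.index.repeat(counts)].reset_index(drop=True)
print(new_this)
```

If rows with a missing count should be dropped instead, `this.dropna(subset=["b"])` before computing the counts would be the corresponding change.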

cwallenpoole · Answer · 2017-12-10 13:11:02Z

0

It looks like you're confusing multiplying an np.array with multiplying a list.

Remember:

 [np.int32(1)] * 2 == [np.int32(1), np.int32(1)]

But:

 np.array([1]) * 2 == np.array([2])

You probably need to change this:

np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))

to this:

np.array([this.iloc[i,:]]) * int(this.iloc[i, 1])
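
To make the difference between the two forms concrete, here is a small sketch using a plain list in place of `this.iloc[i,:]` (the name `row` is just for illustration). Multiplying the list before converting repeats the row; multiplying after converting scales the values. Which one is wanted depends on whether the goal is duplication or arithmetic.

```python
import numpy as np

row = [1, 5, 2]

# List multiplication happens first: the row is repeated twice,
# then the nested list is converted to an array of shape (2, 3).
repeated = np.array([row] * 2)

# Array multiplication is elementwise: the shape stays (1, 3)
# and every value is doubled.
scaled = np.array([row]) * 2

print(repeated.shape, scaled.shape)
print(scaled)
```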


1 Comment

Sharonio Over a year ago

Thanks, though this isn't what I'm looking for. I want to create a longer data-frame in which each row repeats itself n times according to the values in the second and third columns.
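
One possible reading of this requirement, sketched below, mirrors what the loop in the question does: each row appears once per unit of b and once per unit of c, with a 0 in either column counted as 1. That interpretation is an assumption, since no desired output was posted; the names `counts` and `new_this` are illustrative.

```python
import pandas as pd

this = pd.DataFrame({"a": [1, 2, 0], "b": [5, 0, 4], "c": [2, 3, 2]},
                    index=[1, 2, 3])

# Assumed rule: total repeats per row = b + c, with 0 treated as 1
# in each column (the same convention as the loop in the question).
counts = (this["b"].replace(0, 1) + this["c"].replace(0, 1)).to_numpy()

# Repeat each index label counts[i] times and select the rows with .loc.
new_this = this.loc[this.index.repeat(counts)].reset_index(drop=True)
print(new_this)
```

If the intended rule is different (for example b times c copies), only the line computing `counts` would need to change.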
