0

I have Dataframe:enter image description here

On the 16th row there is array with multiple items, how to put one item to next row and delete from previous? It should look like this:

16  76561198316667197  MW  0.13028361  11715671189
17  76561198315155019  MW  0.13529198  11715512049
18  76561198338763319  MW  0.14147347  11712963469
5
  • 1
    In these situations, you should ask yourself, "how did I get here"? Can you prevent this from happening in the first place? Generally you want a single value for each row for each column. Commented Sep 1, 2017 at 16:00
  • I didn't understand you... What do you mean? Do you have any ideas? Commented Sep 1, 2017 at 16:08
  • How did you read in df? Can you do that differently to prevent row 16 from having duplicate data? Commented Sep 1, 2017 at 16:09
  • It was Json object. Commented Sep 1, 2017 at 16:12
  • 1
    Try this: stackoverflow.com/a/40449726/4893407 Commented Sep 1, 2017 at 16:27

3 Answers 3

1

Try the explode function by MaxU:

def explode(df, lst_cols, fill_value=''):
    # make sure `lst_cols` is a list
    if lst_cols and not isinstance(lst_cols, list):
        lst_cols = [lst_cols]
    # all columns except `lst_cols`
    idx_cols = df.columns.difference(lst_cols)

    # calculate lengths of lists
    lens = df[lst_cols[0]].str.len()

    if (lens > 0).all():
        # ALL lists in cells aren't empty
        return pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols}) \
          .loc[:, df.columns]
    else:
        # at least one list in cells is empty
        return pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols}) \
          .append(df.loc[lens==0, idx_cols]).fillna(fill_value) \
          .loc[:, df.columns]
Sign up to request clarification or add additional context in comments.

Comments

0

I break down the steps.

df = pd.DataFrame({'col1': [[5, 3], [1]], 'col2': [[7, 5], [2]], 'col3': ['MW','MW']})#sample data
df1=df[df.col1.apply(len)>1]
df2=df[df.col1.apply(len)==1]
df1=df1.set_index('col3').stack().apply(pd.Series).stack().unstack(-2).reset_index().drop('level_1',1)
df2=df2.apply(lambda x : x.values[0])
pd.concat([df1,df2],axis=0).reset_index(drop=True)
Out[533]: 
   col1  col2 col3
0     5     7   MW
1     3     5   MW
2     1     2   MW

Comments

0

I am pretty sure this is not the most efficient way to do it, but it works. Assuming your First_Column consist of lists of two.

a = df['First_Column'].values

b = [a[i][0] for i in range(0, len(a))]
c = [a[i][1] for i in range(0, len(a))]

d = {'col1': b, 'col2': c}

new_df = pd.DataFrame(d)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.