
I have a pandas DataFrame that I read from a Google Sheet. I then added the tag column using:

df['tag'] = df.filter(like = 'Subject', axis = 1).apply(lambda x: np.where(x == 'Y', x.name,'')).values.tolist()
df['tag'] = df['tag'].apply(lambda x: [i for i in x if i!= ''])

Resultant sample DataFrame:

    Id  Name    Subject-A   Subject-B   Total   tag
0   1   A       Y                       100     [Subject-A]
1   2   B                   Y           98      [Subject-B]
2   3   C       Y           Y           191     [Subject-A, Subject-B]
3   4   D                   Y           100     [Subject-B]
4   5   E                   Y           95      [Subject-B]

Then I export the DataFrame to a MySQL database after converting the tag column into a comma-separated string:

df['tag'] = df['tag'].map(lambda x : ', '.join(str(i) for i in x)).str.replace('Subject-','')
df

    Id  Name    Subject-A   Subject-B   Total   tag
0   1   A       Y                       100     A
1   2   B                   Y           98      B
2   3   C       Y           Y           191     A, B
3   4   D                   Y           100     B
4   5   E                   Y           95      B

df.to_sql(name = 'table_name', con = conn, if_exists = 'replace', index = False)

But in the MySQL database the tag column is:

A,
,B
A,B
,B
,B

My actual data has many such "Subject" columns so the result looks like:

, , , D
A, ,C,
...
...

Could someone please explain why the tag column shows the expected output in pandas but looks different once I save the DataFrame to Cloud SQL? The expected output in the MySQL database is the same as how the tag column appears in pandas.

  • It seems that, for some reason, the line df['tag'] = df['tag'].apply(lambda x: [i for i in x if i!= '']) was removed from your solution. Commented Nov 2, 2021 at 8:12
  • @jezrael, yes, I entered x!='' in my code by mistake. Commented Nov 2, 2021 at 8:21
  • But thanks for your answer; I can use it when there's a data problem. Commented Nov 2, 2021 at 8:26
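
The typo mentioned in the comments can be reproduced in isolation. The sketch below assumes a hypothetical list x shaped like one row of the intermediate tag column (it is not taken from the real data). With x != '' the whole list is compared to a string, which is always true, so the empty entries survive and reappear as stray separators after joining:

```python
# Hypothetical row of the intermediate 'tag' column (assumed for illustration).
x = ['', 'Subject-B']

# Typo: compares the whole list x to '', which is always True, so nothing is dropped.
buggy = [i for i in x if x != '']

# Intended filter: compares each element i to ''.
fixed = [i for i in x if i != '']

buggy_tag = ', '.join(buggy).replace('Subject-', '')
fixed_tag = ', '.join(fixed).replace('Subject-', '')

print(repr(buggy_tag))  # ', B'
print(repr(fixed_tag))  # 'B'
```

This probably explains the leading and trailing commas seen in the database, although the exact spacing there may differ.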

1 Answer

Here is an alternative solution; the difference looks like a data-related problem.

First, filter the Subject columns and remove the Subject- prefix from their names. Then use DataFrame.dot with the column names plus a separator, and finally strip the separator from the right side:

df1 = df.filter(like = 'Subject').rename(columns=lambda x: x.replace('Subject-',''))
print (df1)
     A    B
0    Y  NaN
1  NaN    Y
2    Y    Y
3  NaN    Y
4  NaN    Y

df['tag'] = df1.eq('Y').dot(df1.columns  + ', ').str.rstrip(', ')
print (df)
   Id Name Subject-A Subject-B  Total   tag
0   1    A         Y       NaN    100     A
1   2    B       NaN         Y     98     B
2   3    C         Y         Y    191  A, B
3   4    D       NaN         Y    100     B
4   5    E       NaN         Y     95     B
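
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and it assumes that empty cells arrive as NaN, which may not match what the Google Sheet reader actually returns:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; NaN for empty cells is an assumption.
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'Name': list('ABCDE'),
    'Subject-A': ['Y', np.nan, 'Y', np.nan, np.nan],
    'Subject-B': [np.nan, 'Y', 'Y', 'Y', 'Y'],
    'Total': [100, 98, 191, 100, 95],
})

# Keep only the Subject columns and drop the 'Subject-' prefix from their names.
df1 = df.filter(like='Subject').rename(columns=lambda c: c.replace('Subject-', ''))

# Boolean mask times "name + separator" concatenates the names of matching columns;
# rstrip then removes the trailing separator characters.
df['tag'] = df1.eq('Y').dot(df1.columns + ', ').str.rstrip(', ')

print(df['tag'].tolist())  # ['A', 'B', 'A, B', 'B', 'B']
```

Because the tag is built directly from the boolean mask, no empty strings are created along the way, so there is nothing left over to produce stray commas.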