Create multiple DataFrames from a single DataFrame based on conditions by columns

Question

New to pandas and Python, so thank you in advance. I have a table:

import pandas as pd

# Create DataFrame
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53},
        {'analyte': 'sample2'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2, 'CAS2': 1, 'Value2': 7.88},
        {'analyte': 'money', 'CAS1': 3},
        {'analyte': 'shoe', 'CAS1': 4, 'CAS2': 3, 'Value2': 15.5},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7}]
df = pd.DataFrame(data)

Before writing the pandas DataFrame into a MySQL database table, I need to split df into separate tables and then write each table to MySQL.

How can I split df by columns? Something like: if a column name contains the string "cas1", then split df.

# Pseudocode for what I am trying to do
for col in df.columns:
    if "cas1" in col:
        dfCas1 = df.split
        # add a unique index to identify which row each record belongs to
    if "cas2" in col:
        dfCas2 = df.split
        # add a unique index to identify which row each record belongs to
    if col in {"analyte", "id"}:  # etc.; main table
        dfMain = df.split

dfMain.to_sql("Main", dbConnection, if_exists='fail')
dfCas1.to_sql("cas1", dbConnection, if_exists='fail')
dfCas2.to_sql("cas2", dbConnection, if_exists='fail')

expected

Lucvv · Accepted Answer · 2020-11-25 18:44:52Z

1

I'm not completely sure what you want to achieve, but I feel like you want to do something like splitting this:

+---------+----+------+--------+------+--------+
| Analyte | id | CAS1 | value1 | Cas2 | Value2 |
+---------+----+------+--------+------+--------+
|         |    |      |        |      |        |
+---------+----+------+--------+------+--------+

to this:

+---------+----+  +------+--------+  +------+--------+
| Analyte | id |  | CAS1 | value1 |  | Cas2 | Value2 |
+---------+----+  +------+--------+  +------+--------+
|         |    |  |      |        |  |      |        |
+---------+----+  +------+--------+  +------+--------+

The first one is obtained by calling e.g. df.loc[:, ['Analyte', 'id']]. For the other ones, adjust the column names.

Now for the unique index mentioned in your code comments: df.loc[:] keeps the index of the original table. You can use df.reset_index() to replace it with a fresh integer index; by default the old index is kept as a column. If you also want to drop empty rows in one of your subtables before parsing, have a look at df.dropna().
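A minimal sketch of the column split described above, using a small sample in the shape of the question's data (lowercase analyte, no id column). The row_id name is an assumption chosen for illustration; it is not part of the original table.

```python
import pandas as pd

# Small sample in the shape of the question's data
data = [{'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53}]
df = pd.DataFrame(data)

# Name the index so it becomes a key column after reset_index().
# 'row_id' is a hypothetical name used only in this sketch.
df.index.name = 'row_id'

# Column subsets; .loc keeps the original index on every sub-table
df_main = df.loc[:, ['analyte']].reset_index()
df_cas1 = df.loc[:, ['CAS1']].reset_index()

# Drop rows with no CAS2 data *before* resetting the index,
# so the surviving rows keep their original row_id
df_cas2 = df.loc[:, ['CAS2', 'Value2']].dropna(how='all').reset_index()

print(df_cas2)
```

Because row_id is now an ordinary column in each sub-table, the tables can probably be written with to_sql(..., index=False) and joined back on row_id later.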



10 Comments

Dmitrij Holkin Over a year ago

Yes, but each table must contain one more column, "UniqIndex".

Lucvv Over a year ago

Is this UniqIndex currently the index of your dataframe? If so, just call df.reset_index(drop=False). If not, call it twice! Depending on how the new column is named, you could subsequently call df.rename(columns={'oldname': 'UniqIndex'})

Dmitrij Holkin Over a year ago

Yes and no. Ideally it would be a hash of each analyte, md5(Analyte). I have updated the question.

Lucvv Over a year ago

I guess this would work:

import hashlib
df.loc[:, 'Uniq'] = df.loc[:, 'Analyte'].str.encode('utf8').apply(lambda x: hashlib.md5(x).hexdigest())

Lucvv Over a year ago

I would just add an inverse selection: inv_selection = [col for col in df.columns if col not in selection] and then use inv_selection in .loc
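The two comments above (an MD5 key column and an inverse selection) can be sketched together as follows. This is only an illustration on a tiny made-up frame; the column name Uniq follows the comment, and the choice of which columns go into selection is an assumption.

```python
import hashlib

import pandas as pd

df = pd.DataFrame({'analyte': ['bacon', 'eggs', 'money'],
                   'CAS1': [1, 2, 3],
                   'CAS2': [None, None, 1.0]})

# MD5 hex digest of each analyte name, stored as a key column.
# Identical names always produce identical hashes.
df['Uniq'] = df['analyte'].map(
    lambda name: hashlib.md5(name.encode('utf8')).hexdigest())

# Columns for one sub-table, including the key
selection = ['Uniq', 'CAS1', 'CAS2']
# Inverse selection: every column not in `selection`, with the key added back
inv_selection = ['Uniq'] + [col for col in df.columns if col not in selection]

df_cas = df.loc[:, selection]
df_main = df.loc[:, inv_selection]
```

Note that in the question's data the same analyte names appear under both sample1 and sample2, so a hash of the analyte alone would repeat across samples; it identifies the analyte, not the row.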


exokamen · Accepted Answer · 2020-11-25 18:23:59Z

1

I am not 100% sure if this is what you mean, but:

# 'col' is a placeholder: replace it with the name of your string column
dfCas1 = df[df.col.str.contains('cas1')]
dfCas2 = df[df.col.str.contains('cas2')]
# Rows that contain neither 'cas1' nor 'cas2'
dfMain = df[~(df.col.str.contains('cas1') | df.col.str.contains('cas2'))]

The ~ operator negates the selection, so dfMain holds all rows whose value in that column contains neither cas1 nor cas2. I hope this makes sense.
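As a small sketch of this kind of mask negation on the question's own data, the analyte column can be used to separate the "sample" header rows from the measurement rows. Note that this filters rows by cell content, not columns by name; the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'analyte': ['sample1', 'bacon', 'eggs', 'sample2', 'bacon'],
                   'CAS1': [None, 1, 2, None, 1]})

# Boolean mask: True where the analyte text contains 'sample'.
# na=False treats missing values as "no match" instead of propagating NaN.
is_header = df['analyte'].str.contains('sample', na=False)

df_headers = df[is_header]   # rows that match
df_values = df[~is_header]   # ~ inverts the mask: rows that do not match
```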


3 Comments

Dmitrij Holkin Over a year ago

But what if I need multiple contains values for dfCas1? And how do I add a unique index to each df so that I can verify which row a field in another df came from? Say analyte is a product and the other columns are that product's attributes; I must know where to find each product's attributes.

Dmitrij Holkin Over a year ago

AttributeError: 'DataFrame' object has no attribute 'col'

exokamen Over a year ago

col here should be replaced by your column name. But I see you have found your answer.
