0

New to pandas and python so thank you in advance. I have a table

LAB DATA

# Create DataFrame
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53},
        {'analyte': 'sample2'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2, 'CAS2': 1, 'Value2': 7.88},
        {'analyte': 'money', 'CAS1': 3},
        {'analyte': 'shoe', 'CAS1': 4, 'CAS2': 3, 'Value2': 15.5},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7}]
df = pd.DataFrame(data)

Before Write Pandas DataFrame into a MySQL Database Table, i need to split df to separate tables, and then write each table to Mysql

How to split df by columns, somethink like, if column name contains string "cas1" then split df

for col in df.columns:
    if "cas1" in col:
       dfCas1 = df.split
       #add uniq index to indetify to which row belongs to
    if "cas2" in col:
       dfCas2 = df.split
       #add uniq index to indetify to which row belongs to
    if {"analyte","id" .etc } in col: # main table
       dfMain = df.split

dfMain.to_sql("Main", dbConnection, if_exists='fail')
dfCas1.to_sql("cas1", dbConnection, if_exists='fail')
dfCas2.to_sql("cas2", dbConnection, if_exists='fail')

expected enter image description here

2 Answers 2

1

I'm not completely sure what you want to achieve, but I feel like you want to do something like splitting this:

+---------+----+------+--------+------+--------+
| Analyte | id | CAS1 | value1 | Cas2 | Value2 |
+---------+----+------+--------+------+--------+
|         |    |      |        |      |        |
+---------+----+------+--------+------+--------+

to this:

+---------+----+  +------+--------+  +------+--------+
| Analyte | id |  | CAS1 | value1 |  | Cas2 | Value2 |
+---------+----+  +------+--------+  +------+--------+
|         |    |  |      |        |  |      |        |
+---------+----+  +------+--------+  +------+--------+

The first one is obtained by calling e.g. df.loc[:, ['Analyte', 'id']]. For the other ones, adjust the column names.

Now for the uniq index that is within your code comments, df.loc[:] keeps the index of the original table. You can use df.reset_index() to reset it to a unique integer index. If you also want to drop empty rows in one of your subtables before parsing, have a look at df.dropna().

Sign up to request clarification or add additional context in comments.

10 Comments

yes, but each table must contains one more columns "UniqIndex"
Is this UniqIndex currently the index of your dataframe? If so, just call df.reset_index(drop=False). If not, call it twice! Depending on how the new column is named, you could subsequently call df.rename(index={'oldname':'UniqIndex'})
yes and no, it must be best, hash of each md5(Analyte), i update question
I guess this would work: import hashlib df.loc[:, 'Uniq'] = df.loc[:, 'Analyte'].str.encode('utf8').apply(lambda x: (hashlib.md5(x).hexdigest()))
I would just add an inverse selection: inv_selection = [col for col in df.columns if col not in selection] and then use inv_selection in .loc
|
1

I am not 100% sure if this is what you mean, but:

dfCas1 = df[df.col.str.contains('cas1')]
dfCas2 = df[df.col.str.contains('cas2')]
dfMain = df[~((df.col.str.contains('cas2')) & df.col.str.contains('cas1'))]

The ~ sign negates the selection and means all rows where the columns do not contain cas1 and cas2. I hope this makes sense.

3 Comments

But if i need multiple contains value for dfCas1? and how to add uniq index to each df so then i can verify vrom which rows come field in other df, say analyte is Product, and others columns this product attributes, so i must know where to find each products it attribute
AttributeError: 'DataFrame' object has no attribute 'col'
col here should be replaced by your column name. But I see you have found your answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.