How to create DataFrame output with some conditions?

Question

I have two lists of strings containing pitch values:

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4', 'E4', 'F4']
pitch_dataset = ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4', 'E4', 'F4', 'F#4 / Gb4', 'G4']

I need to display them on an HTML page later. Right now I use the following code, and it works without problems:

  import pandas as pd

  df = pd.DataFrame(data = [pitch_dataset, pitch_detected])
  df.index = ['Pitch from dataset', 'Pitch detected']
  df = df.T
  table_result = df.to_html()

This is the current HTML output created from dataframe df.
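For reference, here is a self-contained sketch of the construction above, assuming that `pitch_class` in the original snippet refers to `pitch_detected`. Because the two lists have different lengths, pandas pads the shorter row with missing values, so the bottom cells of the Pitch detected column are empty. Some of the answers below rely on this when they drop empty rows. `n_missing` is a name introduced here only for illustration.

```python
import pandas as pd

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4', 'E4', 'F4']
pitch_dataset = ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4',
                 'D#4 / Eb4', 'E4', 'F4', 'F#4 / Gb4', 'G4']

# Each list becomes one row; the shorter row is padded with missing values.
df = pd.DataFrame(data=[pitch_dataset, pitch_detected])
df.index = ['Pitch from dataset', 'Pitch detected']
# Transpose so that each list becomes a column instead of a row.
df = df.T

# Count the padded (missing) cells at the bottom of the shorter column.
n_missing = int(df['Pitch detected'].isna().sum())
print(n_missing)  # 3

table_result = df.to_html()
```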

Although the output looks fine, I need to present it according to some conditions. I think this can be achieved by manipulating the DataFrame first, before converting it to HTML with to_html(). Here are the conditions:

1. If pitch X is present in both pitch_dataset and pitch_detected, it is shown side by side in both columns.
2. If pitch X is present in pitch_dataset but not in pitch_detected, the corresponding cell in the Pitch detected column is filled with -.
3. If pitch X is not present in pitch_dataset but is present in pitch_detected, it is shown in the Pitch detected column, and the extra cells in the Pitch from dataset column are filled with -.

Is there any way to achieve these conditions with Python?

Below is a sample of the desired output, made in Microsoft Excel, with comments on each condition. Conditions 1 and 2 use the same data as the output table above, while condition 3 was added manually for illustration, with the pitch_detected list being:

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4', 'E4', 'F4', 'A4', 'B4']

In the real output, no highlighting is needed.

Edit: since I received a notification that this may be a duplicate question, I want to point out that my problem is different from the suggested duplicate.

Does this answer your question? conditional column output for pandas dataframe – nyedidikeke, Commented Apr 17, 2021 at 10:56
Unfortunately not; it is different from my problem, where I need to do some formatting. – Dionisius Pratama, Commented Apr 17, 2021 at 11:04
Would the detected pitch in the Excel example be represented as pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4', 'E4', 'F4', 'A4', 'B4']? I have a solution for conditions 1 and 2, and am waiting for clarification on condition 3. – Guillaume Ansanay-Alex, Commented Apr 17, 2021 at 12:44
See my edit with simplified code. Feel free to use whichever version you prefer. – SeaBean, Commented Apr 17, 2021 at 17:36

SeaBean · Accepted Answer · 2021-04-17 17:34:14Z

You can merge df with itself so that matching pitches line up side by side in two of the resulting columns, extract those two columns, and then replace NaN values with -, as follows:

df2 = df.merge(df, left_on='Pitch detected', right_on='Pitch from dataset', how='outer', indicator=True)[['Pitch from dataset_y', 'Pitch detected_x']]

df3 = df2.dropna(how='all').fillna('-').rename(columns={'Pitch from dataset_y': 'Pitch from dataset', 'Pitch detected_x': 'Pitch detected'})

print(df3)

   Pitch from dataset Pitch detected
0           A#3 / Bb3      A#3 / Bb3
1                  B3             B3
2                  C4             C4
3           C#4 / Db4      C#4 / Db4
4                  D4             D4
5           D#4 / Eb4      D#4 / Eb4
6                  E4             E4
7                  F4             F4
8                   -             A4
9                   -             B4
11          G#3 / Ab3              -
12          F#4 / Gb4              -
13                 G4              -

Here is a breakdown of the steps for your reference:

With just the merge step, we get:

df.merge(df, left_on='Pitch detected', right_on='Pitch from dataset', how='outer', indicator=True)

   Pitch from dataset_x Pitch detected_x Pitch from dataset_y Pitch detected_y      _merge
0             G#3 / Ab3        A#3 / Bb3            A#3 / Bb3               B3        both
1             A#3 / Bb3               B3                   B3               C4        both
2                    B3               C4                   C4        C#4 / Db4        both
3                    C4        C#4 / Db4            C#4 / Db4               D4        both
4             C#4 / Db4               D4                   D4        D#4 / Eb4        both
5                    D4        D#4 / Eb4            D#4 / Eb4               E4        both
6             D#4 / Eb4               E4                   E4               F4        both
7                    E4               F4                   F4               A4        both
8                    F4               A4                  NaN              NaN   left_only
9             F#4 / Gb4               B4                  NaN              NaN   left_only
10                   G4             None                  NaN              NaN   left_only
11                  NaN              NaN            G#3 / Ab3        A#3 / Bb3  right_only
12                  NaN              NaN            F#4 / Gb4               B4  right_only
13                  NaN              NaN                   G4             None  right_only

The middle two columns (Pitch detected_x and Pitch from dataset_y) already contain what we want.

The remaining steps just extract the middle two columns and reformat them into your desired format:

df2 extracts the middle two columns from the merge result.

Then, we drop the rows in which both of these columns are None / NaN.

Finally, we replace the remaining NaN values with - and rename the columns to remove the _x and _y suffixes.
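The steps above can be collected into one self-contained sketch. It assumes df is built as in the question, using the longer pitch_detected list from the condition 3 example; the intermediate names merged, df2 and df3 are only illustrative. The row order is not printed here because it can depend on the pandas version.

```python
import pandas as pd

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4',
                  'E4', 'F4', 'A4', 'B4']
pitch_dataset = ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4',
                 'D#4 / Eb4', 'E4', 'F4', 'F#4 / Gb4', 'G4']

# Same construction as in the question (the shorter column is padded with None).
df = pd.DataFrame(data=[pitch_dataset, pitch_detected],
                  index=['Pitch from dataset', 'Pitch detected']).T

# Step 1: self-merge, matching detected pitches against dataset pitches.
merged = df.merge(df, left_on='Pitch detected', right_on='Pitch from dataset',
                  how='outer')

# Step 2: keep only the two aligned middle columns.
df2 = merged[['Pitch from dataset_y', 'Pitch detected_x']]

# Step 3: drop rows where both cells are missing (e.g. the padded None row),
# fill the remaining gaps with '-', and strip the _x/_y suffixes.
df3 = (df2.dropna(how='all')
          .fillna('-')
          .rename(columns={'Pitch from dataset_y': 'Pitch from dataset',
                           'Pitch detected_x': 'Pitch detected'}))
print(df3)
```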

Edit

The code can be simplified further, and the output then resembles the desired output more closely:

(df[['Pitch from dataset']].merge(
    df[['Pitch detected']], 
    left_on='Pitch from dataset', 
    right_on='Pitch detected', 
    how='outer')
).dropna(how='all').fillna('-')

Note that we use double square brackets [[...]] to select each column for .merge(). In effect, we are merging two DataFrames with one column each. Since their column names differ, no _x/_y suffixes are added, which saves us from renaming the merged columns afterwards.

Result:

   Pitch from dataset Pitch detected
0           G#3 / Ab3              -
1           A#3 / Bb3      A#3 / Bb3
2                  B3             B3
3                  C4             C4
4           C#4 / Db4      C#4 / Db4
5                  D4             D4
6           D#4 / Eb4      D#4 / Eb4
7                  E4             E4
8                  F4             F4
9           F#4 / Gb4              -
10                 G4              -
11                  -             A4
12                  -             B4
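A self-contained variant of the simplified merge, building the two single-column DataFrames directly from the lists instead of from the padded df (an assumption of this sketch). Because nothing is padded, the dropna step is not needed here. to_html(index=False) leaves the row index out of the HTML table. Depending on your pandas version, how='outer' may return the rows sorted lexicographically by key rather than in the order shown above, so sort explicitly afterwards if the order matters.

```python
import pandas as pd

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4',
                  'E4', 'F4', 'A4', 'B4']
pitch_dataset = ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4',
                 'D#4 / Eb4', 'E4', 'F4', 'F#4 / Gb4', 'G4']

# One single-column DataFrame per list; the column names differ,
# so the merge adds no _x/_y suffixes.
dataset = pd.DataFrame({'Pitch from dataset': pitch_dataset})
detected = pd.DataFrame({'Pitch detected': pitch_detected})

# Outer merge keeps matched pitches plus pitches found in only one list.
result = (dataset.merge(detected,
                        left_on='Pitch from dataset',
                        right_on='Pitch detected',
                        how='outer')
                 .fillna('-'))

print(result.to_html(index=False))
```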

@DionisiusPratama Yes, df.merge() is a handy function to explore and use. With its indicator=True option, you can easily see which DataFrame each merged row comes from. Just make good use of it.
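As an illustration of indicator=True, a minimal sketch that maps the values of the _merge column onto the three conditions from the question: 'both' is condition 1, 'left_only' (dataset only) is condition 2 and 'right_only' (detected only) is condition 3. The condition column and the mapping dict are hypothetical names added for this example.

```python
import pandas as pd

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4',
                  'E4', 'F4', 'A4', 'B4']
pitch_dataset = ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4',
                 'D#4 / Eb4', 'E4', 'F4', 'F#4 / Gb4', 'G4']

dataset = pd.DataFrame({'Pitch from dataset': pitch_dataset})
detected = pd.DataFrame({'Pitch detected': pitch_detected})

# indicator=True adds a categorical '_merge' column: 'both', 'left_only', 'right_only'.
merged = dataset.merge(detected, left_on='Pitch from dataset',
                       right_on='Pitch detected', how='outer', indicator=True)

# Hypothetical mapping from the indicator values to the question's conditions.
condition = {'both': 1, 'left_only': 2, 'right_only': 3}
merged['condition'] = merged['_merge'].astype(str).map(condition)

print(merged['condition'].value_counts().sort_index())
```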

above_c_level · Accepted Answer · 2021-04-17 12:58:39Z


One solution is to use plain Python data structures, since your conditions are not well suited to pandas' vectorized table logic.

    import pandas as pd

    # Pitches present in both lists (built from sets, so its order is arbitrary).
    intersection = list(set(pitch_detected) & set(pitch_dataset))
    fillvalue = '-'
    # Condition 1: present in both lists.
    intersected_data = [(val, val) for val in intersection]
    # Condition 2: only in pitch_dataset.
    dataset_data = [(val, fillvalue) for val in pitch_dataset if val not in intersection]
    # Condition 3: only in pitch_detected.
    detected_data = [(fillvalue, val) for val in pitch_detected if val not in intersection]
    data = intersected_data + dataset_data + detected_data
    df = pd.DataFrame(data, columns=['Pitch from dataset', 'Pitch detected'])

First, you build the intersection of the data as described in this answer.

Then you build three lists:

list of all intersected items
list of items only present in pitch_dataset
list of items only present in pitch_detected

Finally, you join all lists and build your dataframe.

My result looks like this:

   Pitch from dataset Pitch detected
0                  F4             F4
1           D#4 / Eb4      D#4 / Eb4
2           A#3 / Bb3      A#3 / Bb3
3           C#4 / Db4      C#4 / Db4
4                  D4             D4
5                  E4             E4
6                  B3             B3
7                  C4             C4
8           G#3 / Ab3              -
9           F#4 / Gb4              -
10                 G4              -
11                  -             A4
12                  -             B4
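Because the intersection comes from a set, the matched rows above appear in arbitrary order. A minimal order-preserving variant, assuming you want pitch_dataset's own order for conditions 1 and 2 and pitch_detected's order for the condition 3 rows appended at the end, uses the sets only for membership tests. Note that this places dataset-only pitches in their original positions instead of grouping them after the matches.

```python
import pandas as pd

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4',
                  'E4', 'F4', 'A4', 'B4']
pitch_dataset = ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4',
                 'D#4 / Eb4', 'E4', 'F4', 'F#4 / Gb4', 'G4']

fillvalue = '-'
# Sets are used only for fast membership tests, never for ordering.
detected_set = set(pitch_detected)
dataset_set = set(pitch_dataset)

# Conditions 1 and 2: walk pitch_dataset in its original order.
data = [(val, val if val in detected_set else fillvalue) for val in pitch_dataset]
# Condition 3: append detected pitches that are missing from the dataset.
data += [(fillvalue, val) for val in pitch_detected if val not in dataset_set]

df = pd.DataFrame(data, columns=['Pitch from dataset', 'Pitch detected'])
print(df)
```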



Dionisius Pratama Over a year ago

Thank you! Your solution worked perfectly. If I may add something: because the pitch octave (the 3 and 4 in the example) is quite important, I added intersected_data.sort(), dataset_data.sort() and detected_data.sort() to your code.
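Note that list.sort() compares the pitch strings lexicographically, so for example 'C#4 / Db4' sorts before 'C4' and 'G#3 / Ab3' after 'F4'. If musical order (by octave, then by semitone) matters, a sort key can parse the first spelling of each pitch. This is a sketch: pitch_key, SEMITONES and PITCH_RE are hypothetical names, and the parsing assumes every pitch starts with a note letter, an optional '#' and an octave number, as in the sample data.

```python
import re

# Semitone offset of each natural note within an octave (C = 0).
SEMITONES = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}
# Assumed format: note letter, optional '#', octave digits, e.g. 'C#4 / Db4'.
PITCH_RE = re.compile(r'([A-G])(#?)(\d+)')


def pitch_key(pitch):
    """Return (octave, semitone) for the first spelling of a pitch string."""
    match = PITCH_RE.match(pitch)
    if match is None:
        raise ValueError(f'unrecognised pitch: {pitch!r}')
    letter, sharp, octave = match.groups()
    return int(octave), SEMITONES[letter] + (1 if sharp else 0)


pitches = ['C4', 'G#3 / Ab3', 'C#4 / Db4', 'B3', 'A#3 / Bb3']
print(sorted(pitches, key=pitch_key))
# ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4']
```

For the tuple lists in the answer, something like intersected_data.sort(key=lambda row: pitch_key(row[0])) would work, using row[1] for detected_data, whose first element is the fill value.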

Corralien · Accepted Answer · 2021-04-17 15:33:27Z


Another method:

import pandas as pd

pitch_detected = ['A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4', 'E4', 'F4', 'A4', 'B4']
pitch_dataset = ['G#3 / Ab3', 'A#3 / Bb3', 'B3', 'C4', 'C#4 / Db4', 'D4', 'D#4 / Eb4', 'E4', 'F4', 'F#4 / Gb4', 'G4']

sr1 = pd.Series(pitch_dataset, name="Pitch from dataset").sort_values()
sr2 = pd.Series(pitch_detected, name="Pitch detected").sort_values()

# Pitches present in both lists, paired row by row (both Series are sorted).
df = pd.concat([sr1.loc[sr1.isin(sr2)].reset_index(drop=True),
                sr2.loc[sr2.isin(sr1)].reset_index(drop=True)], axis="columns")

# Append dataset-only and detected-only pitches below the matches.
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)
df = pd.concat([df,
                sr1.loc[~sr1.isin(sr2)].reset_index(drop=True).to_frame(),
                sr2.loc[~sr2.isin(sr1)].reset_index(drop=True).to_frame()])

df = df.reset_index(drop=True).fillna("-")  # <- replace NaN by "-"

Output (sort_values() sorts the strings lexicographically, so for example C#4 / Db4 comes before C4):

>>> df 
   Pitch from dataset Pitch detected
0           A#3 / Bb3      A#3 / Bb3
1                  B3             B3
2           C#4 / Db4      C#4 / Db4
3                  C4             C4
4           D#4 / Eb4      D#4 / Eb4
5                  D4             D4
6                  E4             E4
7                  F4             F4
8           F#4 / Gb4              -
9           G#3 / Ab3              -
10                 G4              -
11                  -             A4
12                  -             B4
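The pairing step works because pd.concat(axis="columns") lines rows up by index label, not by value, so matching pitches only end up side by side if both Series are in the same order. A toy example with two hypothetical three-pitch Series shows the difference sorting makes:

```python
import pandas as pd

a = pd.Series(['C4', 'B3', 'D4'], name='Pitch from dataset')
b = pd.Series(['B3', 'C4', 'D4'], name='Pitch detected')

# pd.concat(axis='columns') pairs rows by index label, not by value.
unsorted = pd.concat([a, b], axis='columns')
print(unsorted.values.tolist())  # [['C4', 'B3'], ['B3', 'C4'], ['D4', 'D4']]

# After sort_values() + reset_index(drop=True) both Series share the same order,
# so equal values end up on the same row.
aligned = pd.concat([a.sort_values().reset_index(drop=True),
                     b.sort_values().reset_index(drop=True)], axis='columns')
print(aligned.values.tolist())  # [['B3', 'B3'], ['C4', 'C4'], ['D4', 'D4']]
```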

edited Apr 17, 2021 at 15:33

answered Apr 17, 2021 at 13:10



Dionisius Pratama Over a year ago

Where should I use fillna('-') to replace NaN values?

Dionisius Pratama Over a year ago

I tried adding df.fillna('-') after df.reset_index, but it still returns NaN values.

SeaBean Over a year ago

This solution is also nice! But it requires the matching items in the two lists to be in the same order, which may not hold in general, since the sample lists are not kept in sorted order. Otherwise, the pd.concat() result won't show the matching items side by side.

Corralien Over a year ago

@SeaBean, you are absolutely right! I shuffled the two lists and my solution no longer gave consistent results. I have to sort both Series by value.

SeaBean Over a year ago

Good! Frankly speaking, I like your solution too! I hope you don't mind; I'm used to reviewing other people's solutions so that I can learn from them. :-)
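On the fillna question in the comments: fillna returns a new DataFrame by default, so calling df.fillna('-') on its own line leaves df unchanged. This is likely why NaN values remained. A minimal sketch with a small hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Pitch from dataset': ['B3', np.nan],
                   'Pitch detected': [np.nan, 'A4']})

df.fillna('-')                # returns a new DataFrame; df itself is unchanged
print(df.isna().sum().sum())  # 2

df = df.fillna('-')           # keep the result by assigning it back
print(df.isna().sum().sum())  # 0
```

This is why the answer's last line assigns the result back: df = df.reset_index(drop=True).fillna("-").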
