1
import io
import pandas as pd

csv_data = '''App_name,pre-prod,prod,stage
matching-image,nginx,nginx,nginx
mismatching-image,nginx,nginx,nginx:1.23.3-alpine'''

df = pd.read_csv(io.StringIO(csv_data), sep=",")
html_table = df.tohtml()

Is there a way to compare the values of columns in dataframe and use it in conditional formatting ? I want compare if the 'prod','pre-prod' and 'stage' values are mismatching, if yes then then its bg-color should be red. I have tired the following methods present in pandas but none of them works.

df.style.apply()

df.style.apply_index()

df.style.applymap()

Current Output:

enter image description here

Desired output:

enter image description here

2
  • Have you checked this? pandas.pydata.org/docs/reference/api/… Commented Dec 29, 2022 at 8:19
  • Yes. But it has no conditional formatting options. Commented Dec 29, 2022 at 9:06

3 Answers 3

2

You can add style conditionally by applying style to a subset of your dataframe like:

import io
import pandas as pd

csv_data = '''App_name,pre-prod,prod,stage
matching-image,nginx,nginx,nginx
mismatching-image,nginx,nginx,nginx:1.23.3-alpine'''

def add_color(row):
    return ['background-color: red'] * len(row)

df = pd.read_csv(io.StringIO(csv_data), sep=",")
df.loc[(df["pre-prod"] == df["prod"]) & (df["prod"] == df["stage"])].style.apply(add_color, axis=1)
Sign up to request clarification or add additional context in comments.

1 Comment

Its working. But its only printing the matching value and ignoring the remaining values.
1
import io
import pandas as pd

csv_data = '''
App_name,pre-prod,prod,stage
matching-image,nginx,nginx,nginx
matching-image,nginx,nginx,nginx
mismatching-image,nginx,nginx,nginx:1.23.3-alpine
mismatching-image,nginx,nginx,nginx:1.23.3-alpine
'''
df = pd.read_csv(io.StringIO(csv_data), sep=",")
def match_checker(row):
    if row['prod'] == row['pre-prod'] == row['stage']:
        return [''] * len(row)
    else:
        return ['background-color: red'] * len(row)
        
df = df.style.apply(match_checker, axis=1)

html_table = df.to_html()

with open('testpandas.html','w+') as html_file:
    html_file.write(html_table)
    html_file.close()

Updated @PeterSmith answer.

Comments

1

It's also possible to style the entire DataFrame in one go by passing axis=None to apply.

We can identify rows which have differing values in the specified columns by comparing the first column (column 0) with the remaining columns (column 1-2) and identifying where there are unequal values using ne on axis=0.

df[['prod', 'stage']].ne(df['pre-prod'], axis=0)

#     prod  stage
# 0  False  False
# 1  False   True

Then we can check across rows for any rows which have any True values (meaning there is something that's not equal in the row).

df[['prod', 'stage']].ne(df['pre-prod'], axis=0).any(axis=1)

# 0    False
# 1     True
# dtype: bool

We can then simply apply the styles anywhere there's a True value in the resulting Series.


Altogether this could look something like:

def colour_rows_that_dont_match(df_: pd.DataFrame, comparison_cols: List[str]):
    # Sanity check that comparison_cols is what we expect
    assert isinstance(comparison_cols, list) and len(comparison_cols) > 1, \
        'Must be a list and provide at least 2 column to compare'

    # Create an Empty DataFrame to hold styles of the same shape as the original df
    styles_df = pd.DataFrame('', index=df_.index, columns=df_.columns)
    # Compare the first column's (col 0) values to the remaining columns. 
    # Find rows where any values are not equal (ne)
    rows_that_dont_match = df[comparison_cols[1:]].ne(df[comparison_cols[0]], axis=0).any(axis=1)
    # Apply styles to rows which meet the above criteria
    styles_df.loc[rows_that_dont_match, :] = 'background-color: red'
    return styles_df


df.style.apply(
    colour_rows_that_dont_match,
    # This gets passed to the function
    comparison_cols=['pre-prod', 'prod', 'stage'],
    # Apply to the entire DataFrame at once
    axis=None
).to_html(buf='test_df.html')

Which produces the following:
HTML table with second row highlighted due to differing values in the 'prod' and 'stage' columns


Setup, version, and imports:

from typing import List

import pandas as pd  # version 1.5.2

df = pd.DataFrame({
    'App_name': ['matching-image', 'mismatching-image'],
    'pre-prod': ['nginx', 'nginx'],
    'prod': ['nginx', 'nginx'],
    'stage': ['nginx', 'nginx:1.23.3-alpine']
})

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.