concatenate values in dataframe if a column has specific values and None or Null values

Question

I have a dataframe that holds a name plus either an address (City) or an email, depending on a Type column. Based on the type, I want to concatenate name+address or name+email into a new column (concat_name) within the dataframe. Some of the types are null and are causing ambiguity errors. Identifying the nulls correctly in place is where I'm having trouble.

import numpy as np
import pandas as pd

NULL = None
data = {
    'Type': [NULL, 'MasterCard', 'Visa', 'Amex'],
    'Name': ['Chris', 'John', 'Jill', 'Mary'],
    'City': ['Tustin', 'Cleveland', NULL, NULL],
    'Email': [NULL, NULL, '[email protected]', '[email protected]']
}

df_data = pd.DataFrame(data)

# Expected resulting df column:
df_data['concat_name'] = ['ChrisTustin', 'JohnCleveland', '[email protected]', '[email protected]']

Attempt one using booleans

if df_data['Type'].isnull() | (df_data['Type'] == 'Mastercard'):
   df_data['concat_name'] = df_data['Name']+df_data['City']
if (df_data['Type'] == 'Visa') | (df_data['Type'] == 'Amex'):
   df_data['concat_name'] = df_data['Name']+df_data['Email']
else:
   df_data['concat_name'] = 'Error'

Error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Attempt two using np.where

df_data['concat_name'] = np.where((df_data['Type'].isna())|(df_data['Type']=='MasterCard'),df_data['Name']+df_data['City'],
np.where((df_data['Type']=="Visa")|(df_data['Type']=="Amex"),df_data['Name']+df_data['Email'], 'Error'))

Error

ValueError: Length of values(2) does not match length of index(12000)

I changed that portion but am still getting the ambiguity error.
– hSin, Commented Sep 8, 2022 at 17:56

Ingwersen_erik · Accepted Answer · 2022-09-08 19:49:44Z

Does the following code solve your use case?

# == Imports needed ===========================
import pandas as pd
import numpy as np


# == Example Dataframe =========================
df_data = pd.DataFrame(
    {
        "Type": [None, "MasterCard", "Visa", "Amex"],
        "Name": ["Chris", "John", "Jill", "Mary"],
        "City": ["Tustin", "Cleveland", None, None],
        "Email": [None, None, "[email protected]", "[email protected]"],
        # Expected output:
        "concat_name": [
            "ChrisTustin",
            "JohnCleveland",
            "[email protected]",
            "[email protected]",
        ],
    }
)

# == Solution Implementation ====================
df_data["concat_name2"] = np.where(
    (df_data["Type"].isin(["MasterCard", pd.NA, None])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    np.where(
        (df_data["Type"].isin(["Visa", "Amex"])),
        df_data["Name"].astype(str).replace("None", "")
        + df_data["Email"].astype(str).replace("None", ""),
        "Error",
    ),
)
# == Expected Output ============================
print(df_data)
# Prints:
#          Type   Name       City           Email         concat_name          concat_name2
# 0        None  Chris     Tustin            None         ChrisTustin           ChrisTustin
# 1  MasterCard   John  Cleveland            None       JohnCleveland         JohnCleveland
# 2        Visa   Jill       None  [email protected]  [email protected]    [email protected]
# 3        Amex   Mary       None    [email protected]    [email protected]      [email protected]
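
As an alternative to nesting np.where, the same branching can probably be written with np.select, which takes a list of conditions and a matching list of choices. This is only a sketch, assuming a missing Type should be treated like MasterCard. The names use_city and use_email and the example.com addresses are made up for illustration, and isna() stands in for listing null values inside isin:

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question; the email addresses are placeholders.
df_data = pd.DataFrame(
    {
        "Type": [None, "MasterCard", "Visa", "Amex"],
        "Name": ["Chris", "John", "Jill", "Mary"],
        "City": ["Tustin", "Cleveland", None, None],
        "Email": [None, None, "jill@example.com", "mary@example.com"],
    }
)

# One boolean mask per branch (hypothetical names).
use_city = df_data["Type"].isna() | df_data["Type"].eq("MasterCard")
use_email = df_data["Type"].isin(["Visa", "Amex"])

# Blank out missing pieces so string concatenation never yields NaN.
name = df_data["Name"].fillna("")
df_data["concat_name2"] = np.select(
    [use_city, use_email],
    [name + df_data["City"].fillna(""), name + df_data["Email"].fillna("")],
    default="Error",  # rows matching neither mask
)
print(df_data["concat_name2"].tolist())
```

The first matching condition wins, so adding another card type means appending one mask and one choice rather than nesting another np.where.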

Notes

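You might also consider simplifying the problem by replacing the first condition (Type is 'MasterCard' or null) with the negation of your second condition (Type is 'Visa' or 'Amex'). Note that this drops the "Error" branch: every row whose Type is not 'Visa' or 'Amex' gets Name + City: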

df_data["concat_name2"] = np.where(
    (~df_data["Type"].isin(["Visa", "Amex"])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["Email"].astype(str).replace("None", "")
)
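
One caveat about the .astype(str).replace("None", "") idiom used in these snippets: it only blanks out the literal text "None". If a missing value is stored as np.nan instead, pandas 1.x and 2.x turn it into the text "nan", which the replace does not catch (newer versions may behave differently). A small sketch with made-up values, comparing it to fillna(""), which works on the missing values directly:

```python
import numpy as np
import pandas as pd

# Made-up column mixing both kinds of missing value.
city = pd.Series(["Tustin", None, np.nan], dtype=object)

# Idiom from the answer: stringify first, then blank out the text "None".
via_astype = city.astype(str).replace("None", "")

# Alternative: replace the missing values themselves with an empty string.
via_fillna = city.fillna("")

print(via_astype.tolist())
print(via_fillna.tolist())
```

If the real data holds np.nan rather than None, swapping in fillna("") before concatenating may be the safer choice.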

Additionally, if you are dealing with messy data, you can normalize the Type column to lowercase (or uppercase) before comparing. That way the code also matches values like "mastercard" or "Mastercard". Since astype(str) turns None into the text "None", the lowercase list below includes "none" as well:

df_data["concat_name2"] = np.where(
    (df_data["Type"].astype(str).str.lower().isin(["mastercard", pd.NA, None, "none"])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    np.where(
        (df_data["Type"].astype(str).str.lower().isin(["visa", "amex"])),
        df_data["Name"].astype(str).replace("None", "")
        + df_data["Email"].astype(str).replace("None", ""),
        "Error",
    ),
)
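
On the ValueError from the first attempt in the question: if needs a single True or False, but a comparison such as df_data["Type"] == "Visa" returns one boolean per row, so pandas refuses to pick one. A sketch of the row-wise equivalent, assigning through boolean masks with .loc. The mask names and the example.com addresses are illustrative only:

```python
import pandas as pd

# Sample frame modelled on the question; the email addresses are placeholders.
df_data = pd.DataFrame(
    {
        "Type": [None, "MasterCard", "Visa", "Amex"],
        "Name": ["Chris", "John", "Jill", "Mary"],
        "City": ["Tustin", "Cleveland", None, None],
        "Email": [None, None, "jill@example.com", "mary@example.com"],
    }
)

is_city = df_data["Type"].isna() | (df_data["Type"] == "MasterCard")
is_email = df_data["Type"].isin(["Visa", "Amex"])

# Using a whole Series as an `if` condition reproduces the error.
raised = False
try:
    if is_city:
        pass
except ValueError as exc:
    raised = True
    print(exc)

# Row-wise version: start from the fallback, then overwrite the matching rows.
df_data["concat_name"] = "Error"
df_data.loc[is_city, "concat_name"] = df_data["Name"] + df_data["City"]
df_data.loc[is_email, "concat_name"] = df_data["Name"] + df_data["Email"]
print(df_data["concat_name"].tolist())
```

Each .loc assignment only touches rows where its mask is True, which is what the if/else chain was trying to express.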

Thank you so much. This worked, with just one change. I had to replace pd.NA with np.NaN since that's what my data contained. Thanks again.
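
As that comment suggests, matching nulls through isin can depend on which null object the data actually holds. One way to sidestep this, sketched here with made-up values, is isna(), which flags None, np.nan and pd.NA alike:

```python
import numpy as np
import pandas as pd

# Made-up column holding all three kinds of null plus two real values.
card_type = pd.Series([None, np.nan, pd.NA, "MasterCard", "Visa"], dtype=object)

# isna() does not care which null object is stored.
is_missing = card_type.isna()
print(is_missing.tolist())  # [True, True, True, False, False]

# Combine with an ordinary membership test for the non-null values.
needs_city = is_missing | card_type.isin(["MasterCard"])
print(needs_city.tolist())  # [True, True, True, True, False]
```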
