
I have a DataFrame that holds a name plus either an address (city) or an email, depending on a Type column. Based on Type, I want to concatenate Name + City or Name + Email into a new column (concat_name) in the same DataFrame. Some of the Type values are null, and those rows are causing ambiguity errors. Identifying the nulls correctly in place is where I'm having trouble.

import numpy as np
import pandas as pd

NULL = None
data = {
    'Type': [NULL, 'MasterCard', 'Visa', 'Amex'],
    'Name': ['Chris', 'John', 'Jill', 'Mary'],
    'City': ['Tustin', 'Cleveland', NULL, NULL],
    'Email': [NULL, NULL, '[email protected]', '[email protected]']
}

df_data = pd.DataFrame(data)
# Expected resulting df column:
df_data['concat_name'] = ['ChrisTustin', 'JohnCleveland', '[email protected]', '[email protected]']

Attempt one using booleans

if df_data['Type'].isnull() | (df_data['Type'] == 'MasterCard'):
   df_data['concat_name'] = df_data['Name']+df_data['City']
if (df_data['Type'] == 'Visa') | (df_data['Type'] == 'Amex'):
   df_data['concat_name'] = df_data['Name']+df_data['Email']
else:
   df_data['concat_name'] = 'Error'

Error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Attempt two using np.where

df_data['concat_name'] = np.where((df_data['Type'].isna()|(df_data['Type']=='MasterCard'),df_data['Name']+df_data['City'],
np.where((df_data['Type']=="Visa")|(df_data['Type]=="Amex"),df_data['Name']+df_data['Email'], 'Error'

Error

ValueError: Length of values(2) does not match length of index(12000)
  • You cannot use | operator in if loop. Commented Sep 8, 2022 at 17:54
  • I changed that portion but am still getting the error of ambiguity. Commented Sep 8, 2022 at 17:56
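The ambiguity discussed in these comments can be reproduced on a small frame. The sketch below is an illustration only, not the asker's code: the three-row frame and the `label` column are made up for the example. It shows that `|` between two Series produces one boolean per row, that `if` cannot reduce that to a single value, and that the same mask can drive a row-wise assignment through `.loc` instead.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
df = pd.DataFrame({
    "Type": [None, "MasterCard", "Visa"],
    "Name": ["Chris", "John", "Jill"],
})

# Element-wise OR: the result is a boolean Series with one entry per row.
mask = df["Type"].isnull() | (df["Type"] == "MasterCard")
print(mask.tolist())  # [True, True, False]

# `if` needs a single True/False, so a multi-row Series raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# The mask is used as a row selector rather than as an `if` condition.
df.loc[mask, "label"] = "name+city"
df.loc[~mask, "label"] = "name+email"
print(df["label"].tolist())  # ['name+city', 'name+city', 'name+email']
```

Wrapping each comparison in its own parentheses matters here, because `|` binds more tightly than `==` in Python.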

1 Answer


Does the following code solve your use case?

# == Imports needed ===========================
import pandas as pd
import numpy as np


# == Example Dataframe =========================
df_data = pd.DataFrame(
    {
        "Type": [None, "MasterCard", "Visa", "Amex"],
        "Name": ["Chris", "John", "Jill", "Mary"],
        "City": ["Tustin", "Cleveland", None, None],
        "Email": [None, None, "[email protected]", "[email protected]"],
        # Expected output:
        "concat_name": [
            "ChrisTustin",
            "JohnCleveland",
            "[email protected]",
            "[email protected]",
        ],
    }
)

# == Solution Implementation ====================
df_data["concat_name2"] = np.where(
    (df_data["Type"].isin(["MasterCard", pd.NA, None])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    np.where(
        (df_data["Type"].isin(["Visa", "Amex"])),
        df_data["Name"].astype(str).replace("None", "")
        + df_data["Email"].astype(str).replace("None", ""),
        "Error",
    ),
)
# == Expected Output ============================
print(df_data)
# Prints:
#          Type   Name       City           Email         concat_name          concat_name2
# 0        None  Chris     Tustin            None         ChrisTustin           ChrisTustin
# 1  MasterCard   John  Cleveland            None       JohnCleveland         JohnCleveland
# 2        Visa   Jill       None  [email protected]  [email protected]    [email protected]
# 3        Amex   Mary       None    [email protected]    [email protected]      [email protected]

Notes

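You might also consider simplifying the problem by replacing the first condition (Type is 'MasterCard' or null) with the negation of your second condition (Type is 'Visa' or 'Amex'). Note that this drops the 'Error' branch: every Type other than Visa or Amex, including unexpected values, falls through to Name + City: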

df_data["concat_name2"] = np.where(
    (~df_data["Type"].isin(["Visa", "Amex"])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["Email"].astype(str).replace("None", "")
)

Additionally, if you are dealing with messy data, you can make the implementation more robust by converting the Type column to lowercase (or uppercase) before comparing. The code then also handles values like "mastercard" or "Mastercard". Because astype(str) turns None into the string "None", the lowercase list also includes "none":

df_data["concat_name2"] = np.where(
    (df_data["Type"].astype(str).str.lower().isin(["mastercard", pd.NA, None, "none"])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    np.where(
        (df_data["Type"].astype(str).str.lower().isin(["visa", "amex"])),
        df_data["Name"].astype(str).replace("None", "")
        + df_data["Email"].astype(str).replace("None", ""),
        "Error",
    ),
)
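As a flatter alternative to nesting `np.where`, the same branching can be sketched with `np.select`, which takes a list of conditions, a matching list of results, and a default. This is a swapped-in technique rather than the code above, and it rests on some assumptions: the frame below is made up for the example (the `example.com` addresses and the `Discover` row are hypothetical), and missing values are handled with `isna()` and `fillna("")` instead of the `astype(str)` round trip.

```python
import numpy as np
import pandas as pd

# Hypothetical data: mixed-case types, one unexpected type, made-up emails.
df = pd.DataFrame({
    "Type": [None, "mastercard", "VISA", "Amex", "Discover"],
    "Name": ["Chris", "John", "Jill", "Mary", "Pat"],
    "City": ["Tustin", "Cleveland", None, None, "Austin"],
    "Email": [None, None, "jill@example.com", "mary@example.com", None],
})

card = df["Type"].str.lower()   # missing values stay missing
name = df["Name"].fillna("")

# Conditions are checked in order; the first match decides the row.
conditions = [
    card.isna() | card.eq("mastercard"),
    card.isin(["visa", "amex"]),
]
choices = [
    name + df["City"].fillna(""),
    name + df["Email"].fillna(""),
]

# Rows matching no condition get the default value.
df["concat_name"] = np.select(conditions, choices, default="Error")
print(df["concat_name"].tolist())
# ['ChrisTustin', 'JohnCleveland', 'Jilljill@example.com', 'Marymary@example.com', 'Error']
```

Adding a third card group then means appending one entry to each list rather than nesting another call.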


1 Comment

Thank you so much. This worked, with just one change. I had to replace pd.NA with np.NaN since that's what my data contained. Thanks again.
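As this comment suggests, the right entry for the `isin` list depends on which missing-value marker the data actually contains. One way to sidestep that, sketched here on a made-up Series, is to test for missing values with `isna()`, which treats `None`, `np.nan`, and `pd.NA` alike. The sketch spells the constant `np.nan`, since the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical Type column holding all three missing-value markers.
types = pd.Series([None, np.nan, pd.NA, "MasterCard", "Visa"], dtype="object")

# isna() flags every marker, so none of them has to be listed explicitly.
missing = types.isna()
print(missing.tolist())   # [True, True, True, False, False]

# Combine with the ordinary value check for the Name + City branch.
use_city = missing | types.isin(["MasterCard"])
print(use_city.tolist())  # [True, True, True, True, False]
```

The resulting `use_city` mask could stand in for the first condition of the `np.where` call in the answer.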
