Adding newly created variables into existing dataframe in Python Pandas

Question

I would like to create a SplitName() function that 1) converts all letters to lower case, 2) splits the name entry by space (i.e. "John Snow" into "John" and "Snow") and 3) creates a data frame in Pandas that takes the split name entities and puts them in new columns (one as "first name" and another as "last name").

I am able to create a new Series from the data frame, convert the names to lower case and split them by space. But I don't know how to create an overall data frame that holds the original data frame's information as well as the new "lower-cased" and "split" variables.

def SplitName():
    data = pd.read_csv("C:\data.csv")
    frame2 = DataFrame(data)
    frame2.columns = ["Name", "Ethnicity", "Event_Place", "Birth_Place"]
    name_lower = frame2["Name"].str.lower() # make names lower case
    name_split = name_lower.str.split() # split string element by space
    name_split_smallList = name_split[0:10] # small set to easily handle
    #print name_split_smallList
    '''for lastName in name_split_smallList:
        print lastName[0] + " " + lastName[-1]'''

    name_lower_list = name_lower.tolist()
    frame_all = frame2 + name_lower_list
    print frame_all[0:10]

Woody Pride · Accepted Answer · 2014-11-18 07:31:26Z

To create new columns in a data frame, you can assign a Series to a new column label with an equals sign, the same way you would assign a value to a variable name.
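As a minimal sketch of that assignment, assuming a small hand-made frame in place of the CSV from the question (the sample names and the `name_lower` column label are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({"Name": ["John Snow", "Arya Stark"]})

# Assigning a Series to a new column label adds that column to the frame;
# the original "Name" column is left untouched.
df["name_lower"] = df["Name"].str.lower()

print(df)
```

The new column is aligned on the frame's index, so the original columns and the derived one end up side by side in the same data frame.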

The function below assumes that the CSV file has a header called 'Name' and that each Name splits into exactly two parts, i.e. there are no middle names. This might be something you want to check first. The function creates a data frame by reading the CSV file, then creates two Series objects of lower-cased strings using list comprehensions. The 'first_name' series takes the lower-cased string at index position 0 of each Name split by whitespace, and the 'second_name' series takes the lower-cased string at index position 1.

    import pandas as pd

    def SplitName():
        DF = pd.read_csv(r"C:\data.csv")  # read_csv already returns a DataFrame
        # Lower-case each name, split on whitespace, keep the first part
        DF['first_name'] = pd.Series([Name.lower().split()[0] for Name in DF['Name']], index=DF.index)
        # Same split, keeping the second part
        DF['second_name'] = pd.Series([Name.lower().split()[1] for Name in DF['Name']], index=DF.index)
        return DF
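If some names might not have exactly two parts, one possible variation is to use the vectorized `.str` accessor and take the first and last tokens instead of positions 0 and 1. This is only a sketch: the `split_name` helper, the sample data, and the choice to treat the last token as the second name are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

def split_name(df):
    """Return a copy of df with lower-cased first_name and second_name columns."""
    out = df.copy()
    # Lower-case every name, then split each one on whitespace into a list.
    parts = out["Name"].str.lower().str.split()
    # .str[0] and .str[-1] pick the first and last token of each list,
    # so a middle name is skipped rather than mistaken for the last name.
    out["first_name"] = parts.str[0]
    out["second_name"] = parts.str[-1]
    return out

# Hypothetical sample data, including one name with a middle name.
sample = pd.DataFrame({"Name": ["John Snow", "Mary Ann Smith"]})
result = split_name(sample)

# Rows whose name does not split into exactly two parts, for manual review.
odd = sample[sample["Name"].str.split().str.len() != 2]
```

Note that with this approach a single-word name ends up in both columns, so inspecting `odd` before relying on the result is still worthwhile.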
