How to add rows to a DataFrame inside a for loop?

Question

I want to add a row to an existing data frame wherever there is no value matching a regex. For example,

import pandas as pd
import numpy as np
import re

lst = ['Sarah Kim', 'Added on January 21']

df = pd.DataFrame(lst)

df.columns = ['Info']

name_pat = r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
date_pat = r"\b(\w*Added on\w*)\b"
title_pat = r"\b(\w*at\w*)\b"

for index, row in df.iterrows():
    if re.findall(name_pat, str(row['Info'])):
        print("Name matched")
    elif re.findall(title_pat, str(row['Info'])):
        print("Title matched")
        if re.findall(title_pat, str(row['Info'])) == None:
            # Add a row here in the dataframe
            pass
    elif re.findall(date_pat, str(row['Info'])):
        print("Date matched")
        if re.findall(date_pat, str(row['Info'])) == None:
            # Add a row here in the dataframe
            pass

So here in my dataframe df, I do not have a title, just a name and a date. While looping over df, I want to add an empty row as a placeholder for the title.

The output is:

  Info
0 Sarah Kim
1 Added on January 21

My expected output is:

  Info
0 Sarah Kim
1 None
2 Added on January 21

Is there any way that I can add an empty column, or is there a better way?

Edit: The dataset I'm working with is a single column with many rows. The rows follow a repeating structure of "name, title, date". For example,

  Info
0 Sarah Kim
1 Added on January 21
2 Jesus A. Moore
3 Marketer
4 Added on May 30
5 Bobbie J. Garcia
6 CEO
7 Anita Jobe
8 Designer
9 Added on January 3
...
998 Michael B. Reedy
999 Salesman
1000 Added on December 13

I have sliced the data frame so that I can extract a section that looks like this:

  Info
0 Sarah Kim
1 Added on January 21

I'm trying to loop over each section and, if a date or title is missing, fill the gap with an empty row, so that in the end I will have:

  Info
0 Sarah Kim
1 **NULL**
2 Added on January 21
3 Jesus A. Moore
4 Marketer
5 Added on May 30
6 Bobbie J. Garcia
7 CEO
8 **NULL**
9 Anita Jobe
10 Designer
11 Added on January 3
...
998 Michael B. Reedy
999 Salesman
1000 Added on December 13

"Is there any way that I can add an empty column" Yes, have you tried that? The best would be to use vectorized operations for this, you should read the Pandas docs.
– AMC, Commented Feb 13, 2020 at 4:10
In any case, there are plenty of resources on the subject, can you clarify what the issue is here?
– AMC, Commented Feb 13, 2020 at 4:11
@AMC Can you at least give me the resources on what to research? I don't need an entire code to solve the problem, but more I'm having issues approaching the problem. And yes I tried to add an empty column but none worked.
– Sarah, Commented Feb 13, 2020 at 4:15

jawsem · Accepted Answer · 2020-02-13 17:00:36Z

I see you have a long dataframe where each set of information is different. I think your goal is possibly to have a dataframe with three columns:

Name, Title and Date

Here is how I would approach this problem, with some code samples. I would take advantage of the df.shift method to tie each name to the values that follow it, and use your existing dataframe to create a new one.
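As a quick illustration of what shift does here, the following minimal sketch (the variable names `s` and `demo` are only for illustration) places each row next to the one or two values that follow it. A negative period pulls later rows upward, and positions that run past the end are filled with NaN.

```python
import pandas as pd

# Three rows taken from the sample data in the question
s = pd.Series(['Sarah Kim', 'Added on January 21', 'Jesus A. Moore'], name='Info')

# shift(-1) aligns each row with the value one row below it,
# shift(-2) with the value two rows below it
demo = pd.DataFrame({
    'Info': s,
    'Next_val1': s.shift(-1),
    'Next_val2': s.shift(-2),
})
print(demo)
```

With the values lined up like this, a single row holding a name also carries the candidates for its title and date, which is what the steps in this answer rely on.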

I am also making some assumptions based on what you have listed above. First, I will assume that only the Title and Date fields can be missing. Second, I will assume that the order of the fields is Name, Title and Date, as you mentioned above.

import re
import pandas as pd

# first step: create test data
test_list = ['Sarah Kim','Added on January 21','Jesus A. Moore','Marketer','Added on May 30','Bobbie J. Garcia','CEO','Anita Jobe','Designer','Added on January 3']
test_df = pd.DataFrame(test_list, columns=['Info'])

# second step use your regex to get what type of column each info value is

name_pat = r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
date_pat = r"\b(\w*Added on\w*)\b"
title_pat = r"\b(\w*at\w*)\b"

test_df['Col'] = test_df['Info'].apply(lambda x: 'Name' if re.findall(name_pat, x) else ('Date' if re.findall(date_pat,x) else 'Title'))

# third step is to get the next values from our dataframe using df.shift
test_df['Next_col'] = test_df['Col'].shift(-1)
test_df['Next_col2'] = test_df['Col'].shift(-2)
test_df['Next_val1'] = test_df['Info'].shift(-1)
test_df['Next_val2'] = test_df['Info'].shift(-2)

# Now filter to only the names and apply a function to get our name, title and date
new_df = test_df[test_df['Col']=='Name']

def apply_func(row):
    name = row['Info']
    title = None
    date = None
    if row['Next_col']=='Title':
        title = row['Next_val1']
        # a date two rows down belongs to this name only when a title sits in between
        if row['Next_col2']=='Date':
            date = row['Next_val2']
    elif row['Next_col']=='Date':
        date = row['Next_val1']
    row['Name'] = name
    row['Title'] = title
    row['date'] = date
    return row

final_df = new_df.apply(apply_func,axis=1)[['Name','Title','date']].reset_index(drop=True)
print(final_df)

               Name     Title                 date
0  Sarah Kim         None      Added on January 21
1  Jesus A. Moore    Marketer  Added on May 30    
2  Bobbie J. Garcia  CEO       None               
3  Anita Jobe        Designer  Added on January 3

There is probably a way to do this in fewer lines of code. I welcome anyone who can make this more efficient, but I believe this should work. If you also want to flatten this back into a single column, you can do the following:

flattened_df = pd.DataFrame(final_df.values.flatten(),columns=['Info'])
print(flattened_df)

                   Info
0   Sarah Kim          
1   None               
2   Added on January 21
3   Jesus A. Moore     
4   Marketer           
5   Added on May 30    
6   Bobbie J. Garcia   
7   CEO                
8   None               
9   Anita Jobe         
10  Designer           
11  Added on January 3
