I want to add a row to an existing data frame wherever an expected value has no regex match. For example,

    import pandas as pd
    import numpy as np
    import re

    lst = ['Sarah Kim', 'Added on January 21']
    df = pd.DataFrame(lst)
    df.columns = ['Info']

    name_pat = r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
    date_pat = r"\b(\w*Added on\w*)\b"
    title_pat = r"\b(\w*at\w*)\b"

    for index, row in df.iterrows():
        info = str(row['Info'])
        if re.findall(name_pat, info):
            print("Name matched")
        elif re.findall(title_pat, info):
            print("Title matched")
            # re.findall returns an empty list, never None, when nothing matches
            if not re.findall(title_pat, info):
                pass  # Add a row here in the dataframe
        elif re.findall(date_pat, info):
            print("Date matched")
            if not re.findall(date_pat, info):
                pass  # Add a row here in the dataframe

So here my data frame df has no title, just a name and a date. While looping over df, I want to add an empty row where the title should be.
The output is:

       Info
    0  Sarah Kim
    1  Added on January 21

My expected output is:

       Info
    0  Sarah Kim
    1  None
    2  Added on January 21

Is there any way that I can add an empty row like this, or is there a better way?
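
For the small example above, a minimal sketch of one possible way to get the expected output is to rebuild the column with a placeholder inserted at a known position, rather than changing the frame while `iterrows()` is running (modifying something you are iterating over is generally discouraged). The helper name `insert_row` and its parameters are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Info': ['Sarah Kim', 'Added on January 21']})


def insert_row(frame, position, value=None):
    """Return a new single-column frame with `value` inserted at `position`.

    Hypothetical helper: it assumes the frame has one column named 'Info'.
    """
    values = frame['Info'].tolist()
    values.insert(position, value)  # plain list insert, shifts later items down
    return pd.DataFrame({'Info': values})


# Put an empty placeholder where the title would normally be (position 1).
df = insert_row(df, 1)
print(df)
```

This only covers the case where the missing position is already known; detecting which field is missing is a separate step.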
+++ The dataset I'm working with is just one column with many rows. The rows follow a repeating structure of "name, title, date". For example,

          Info
    0     Sarah Kim
    1     Added on January 21
    2     Jesus A. Moore
    3     Marketer
    4     Added on May 30
    5     Bobbie J. Garcia
    6     CEO
    7     Anita Jobe
    8     Designer
    9     Added on January 3
    ...
    998   Michael B. Reedy
    999   Salesman
    1000  Added on December 13

I have sliced the data frame into sections, so each extracted section looks like this:

       Info
    0  Sarah Kim
    1  Added on January 21

I'm trying to run a loop over each section and, if a date or title is missing, fill its place with an empty row, so that in the end I will have:

          Info
    0     Sarah Kim
    1     **NULL**
    2     Added on January 21
    3     Jesus A. Moore
    4     Marketer
    5     Added on May 30
    6     Bobbie J. Garcia
    7     CEO
    8     **NULL**
    9     Anita Jobe
    10    Designer
    11    Added on January 3
    ...
    998   Michael B. Reedy
    999   Salesman
    1000  Added on December 13

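
The filling step described above could be sketched roughly as follows. This is only one possible approach, under some loud assumptions: a date always starts with "Added on", a name matches the name regex from the question, and anything else is treated as a title (so a title that looks like a name would be misclassified). The function names `slot_of` and `fill_missing` are hypothetical. Instead of adding rows during iteration, the sketch walks the values once, tracks which slot (name, title, date) is expected next, and emits `None` for every slot that gets skipped.

```python
import re

import pandas as pd

NAME_PAT = re.compile(r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+")
DATE_PAT = re.compile(r"^Added on\b")  # assumption: dates start with "Added on"


def slot_of(text):
    """Return 0 for a name, 1 for a title, 2 for a date."""
    if DATE_PAT.search(text):
        return 2
    if NAME_PAT.search(text):
        return 0
    return 1  # assumption: anything that is not a name or date is a title


def fill_missing(values):
    """Return the values with None inserted for each missing slot."""
    filled = []
    expected = 0  # slot we expect next: 0 name, 1 title, 2 date
    for value in values:
        slot = slot_of(str(value))
        # Emit placeholders until the expected slot lines up with this value.
        while expected != slot:
            filled.append(None)
            expected = (expected + 1) % 3
        filled.append(value)
        expected = (slot + 1) % 3
    # Pad the last record if it ended before its date.
    while expected != 0:
        filled.append(None)
        expected = (expected + 1) % 3
    return filled


df = pd.DataFrame({'Info': [
    'Sarah Kim', 'Added on January 21',
    'Jesus A. Moore', 'Marketer', 'Added on May 30',
    'Bobbie J. Garcia', 'CEO',
    'Anita Jobe', 'Designer', 'Added on January 3',
]})

filled = fill_missing(df['Info'])
result = pd.DataFrame({'Info': filled})
print(result)
```

On the ten sample rows this yields twelve rows, with placeholders at positions 1 and 8. Because every record then has exactly three rows, the column could afterwards be reshaped into name, title and date columns if that is more convenient.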