Suppose I have a list of strings:
lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1','H1 K1 W1']
I want to build a table by searching each string for tokens that start with 'A', 'B', 'C' or 'D' (if present) and using them to populate a pandas DataFrame.
Like so:
         'A'  'B'  'C'  'D'
string1  A1   B1   C1   NaN
string2  A2   B2   NaN  D1
string3  A3   NaN  NaN  NaN
string4  A4   B3   NaN  NaN
string5  NaN  NaN  NaN  NaN
To search within each string, I split each one into a list of tokens, which gives me a nested list, and then run a for loop over the tokens of each string. My RegEx skills are not strong, but I suspect this could be done neatly with a good handle on RegEx.
My current code:
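For reference, a RegEx-based version might look roughly like the sketch below. It assumes every wanted token is a single letter from A to D followed by digits, and that each letter appears at most once per string; the names `pattern` and `records` are only illustrative.

```python
import re
import pandas as pd

lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1', 'H1 K1 W1']
cols = ['A', 'B', 'C', 'D']

# Assumption: a wanted token is one of the letters A-D followed by digits.
pattern = re.compile(r'\b[A-D]\d+\b')

# Build one dict per string, keyed by the token's leading letter.
records = []
for text in lst1:
    tokens = pattern.findall(text)
    records.append({token[0]: token for token in tokens})

# Keys missing from a dict become NaN in the resulting DataFrame.
index = [f'string{i}' for i in range(1, len(lst1) + 1)]
df = pd.DataFrame(records, index=index, columns=cols)
print(df)
```

Building the rows as dicts first and creating the DataFrame once avoids assigning into the frame cell by cell.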
import pandas as pd

lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1', 'H1 K1 W1']
modlst1 = []
for each in lst1:
    modlst1.append(each.split())

rows = range(len(modlst1))  ### rows for each string
cols = ['A', 'B', 'C', 'D']  ### cols for each string
df = pd.DataFrame(index=rows, columns=cols)
df = df.fillna(0)

### Populating values
for each in rows:
    for stuff in modlst1[each]:
        if stuff.startswith('A'):
            df['A'] = stuff
        elif stuff.startswith('B'):
            df['B'] = stuff
        elif stuff.startswith('C'):
            df['C'] = stuff
        elif stuff.startswith('D'):
            df['D'] = stuff
I'm very new to Python, so I am still learning string manipulation and searching. I am sure there is a better way to do this. My solution does not work: the same values keep populating every row when I write them into the dataframe. But when I do:
        if stuff.startswith('A'):
            print(stuff)
the loop runs fine and prints the different values for 'A', 'B', 'C' and 'D'. For example, this is the kind of dataframe I end up with, which is NOT what I want:
         'A'  'B'  'C'  'D'
string1  A1   B1   C1   NaN
string2  A1   B1   C1   D1
string3  A1   B1   C1   D1
string4  A1   B1   C1   D1
string5  A1   B1   C1   D1
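The repeated values are consistent with how pandas handles `df['A'] = stuff`: assigning a scalar to a column sets every row of that column, so each match overwrites the whole column. A minimal sketch of a per-row assignment, assuming `df.loc[row, column]` is used to target a single cell and that the token's first letter doubles as the column name:

```python
import pandas as pd

lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1', 'H1 K1 W1']
cols = ['A', 'B', 'C', 'D']

# An empty frame with these labels starts out filled with NaN.
df = pd.DataFrame(index=range(len(lst1)), columns=cols)

for row, text in enumerate(lst1):
    for token in text.split():
        # Assumption: the first character of a token is its column label.
        if token[0] in cols:
            df.loc[row, token[0]] = token  # sets one cell, not the whole column

print(df)
```

Skipping `fillna(0)` leaves unmatched cells as NaN, which matches the desired table above.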