Python List to Dataframe with conditions

Question

I have a long list (sample below):

df_list = ['Joe',
 'UK',
 'Buyout',
 '10083',
 '4323',
 'http://info2.com',
 'Linda',
 'US',
 'Liquidate',
 '97656',
 '1223',
 'http://global.com',
 '[email protected]'           
          ]

As you can see, the list contains information about individuals (Joe and Linda). The problem is that some observations (Joe in this example) are missing the 7th element, the entity's email address, while others (Linda) have it populated.

I want to turn this list into a dataframe with 7 columns (below). For observations that do not have a valid email address (the 7th element does not contain "@"), I want the EMAIL column to hold a null/empty value rather than the next element, which would be the NAME of the next observation.

cols = ['NAME'
,'COUNTRY'
,'STRATEGIES'
,'TOTAL FUNDS'
,'ESTIMATED PAYOFF'
,'WEBSITE'
,'EMAIL']

So far, this is where I am:

big_list = []  #intention is to append N (number of unique entity) small_lists into a big_list and call pd.DataFrame(big_list)
small_list = [] #intention is to create a small_list for each observation/entity, containing 7 values, including email or null if empty
for element in df_list:
    small_list.append(element)
if ("@" not in small_list):
    small_list[-1] = None

Any help would be highly appreciated! Thanks

Are you missing only the 7th element, or some other element as well, in the items in the list? – Shubham Sharma, commented Mar 17, 2020 at 5:45
@ShubhamSharma Thanks for the reply. It actually sometimes misses WEBSITE values as well as EMAIL. I guess for emails we could use the constraint that the value must contain '@', and for websites that it must contain at least one '.'. – Si_CPyR, commented Mar 17, 2020 at 5:48

kederrac · Accepted Answer · 2020-03-17 07:02:02Z

1

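You could use a generator:

import numpy as np
import pandas as pd

def gen_batch(df_list):
    # i points at the slot where the 7th field (EMAIL) would be
    i = 6
    while i <= len(df_list):
        if i < len(df_list) and '@' in df_list[i]:
            # Email present: emit all 7 fields
            yield df_list[i-6: i+1]
            i += 7
        else:
            # Email missing: pad with NaN and consume only 6 fields
            yield df_list[i-6: i] + [np.nan]
            i += 6

pd.DataFrame(gen_batch(df_list), columns=cols)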
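
To see how the generator walks the list, here is a small self-contained run on the question's sample data. This is only a sketch: the generator is restated so the block runs on its own, the name `sample` is introduced here for illustration, and `linda@example.com` is a made-up placeholder because the real address is masked in the question.

```python
import numpy as np
import pandas as pd

cols = ['NAME', 'COUNTRY', 'STRATEGIES', 'TOTAL FUNDS',
        'ESTIMATED PAYOFF', 'WEBSITE', 'EMAIL']

# Sample data from the question; the email is a hypothetical placeholder
sample = ['Joe', 'UK', 'Buyout', '10083', '4323', 'http://info2.com',
          'Linda', 'US', 'Liquidate', '97656', '1223', 'http://global.com',
          'linda@example.com']

def gen_batch(items):
    # i is the index where the EMAIL field of the current record would sit
    i = 6
    while i <= len(items):
        if i < len(items) and '@' in items[i]:
            yield items[i-6:i+1]            # record with email: 7 fields
            i += 7
        else:
            yield items[i-6:i] + [np.nan]   # record without email: pad
            i += 6

# list() makes the rows explicit before handing them to pandas
df = pd.DataFrame(list(gen_batch(sample)), columns=cols)
print(df)
```

With this input, Joe's row gets NaN in EMAIL (the element after his website is 'Linda', which has no '@'), and Linda's row keeps her address.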


1 Comment

Si_CPyR Over a year ago

Thank you! I like your solution because it's relatively straightforward for me to enforce additional rules, such as doing a similar operation when another column contains missing values.
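
As a rough illustration of the kind of extra rule mentioned here, the sketch below treats both WEBSITE and EMAIL as optional, using the heuristics proposed in the question's comments (an email contains '@', a website contains at least one '.'). The function names, the `n_required` parameter, and the sample values are all hypothetical, and the sketch assumes the first five fields are always present. It would misfire if a NAME itself contained a '.'.

```python
import pandas as pd

cols = ['NAME', 'COUNTRY', 'STRATEGIES', 'TOTAL FUNDS',
        'ESTIMATED PAYOFF', 'WEBSITE', 'EMAIL']

def looks_like_website(value):
    # Heuristic from the comments: has a '.', and is not an email
    return '.' in value and '@' not in value

def looks_like_email(value):
    # Heuristic from the question: a valid email contains '@'
    return '@' in value

def gen_records(items, n_required=5):
    """Yield 7-field rows; WEBSITE and EMAIL become None when absent."""
    i = 0
    while i < len(items):
        # The first n_required fields are assumed to always be present
        row = items[i:i + n_required]
        i += n_required
        # Optional fields are checked in column order: WEBSITE, then EMAIL
        for is_valid in (looks_like_website, looks_like_email):
            if i < len(items) and is_valid(items[i]):
                row.append(items[i])
                i += 1
            else:
                row.append(None)
        yield row

# Made-up sample: Joe lacks an email, Linda lacks a website, Ann lacks both
sample = ['Joe', 'UK', 'Buyout', '10083', '4323', 'http://info2.com',
          'Linda', 'US', 'Liquidate', '97656', '1223', 'linda@example.com',
          'Ann', 'FR', 'Growth', '500', '20']

rows = list(gen_records(sample))
df = pd.DataFrame(rows, columns=cols)
print(df)
```

Keeping each rule in its own small predicate means a stricter check (for example, requiring 'http' in a website) only touches one function.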

Sociopath · Answer · 2020-03-17 05:53:15Z

1

IIUC you need:

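import pandas as pd

new_list = []
counter = 0
while counter < len(df_list):
    # Bounds check first, so a final record without an email is not dropped
    has_email = counter + 6 < len(df_list) and "@" in df_list[counter + 6]
    if has_email:
        new_list.append(df_list[counter:counter + 7])
        counter += 7
    else:
        # No email for this record: pad the EMAIL column with None
        new_list.append(df_list[counter:counter + 6] + [None])
        counter += 6

df = pd.DataFrame(new_list, columns=cols)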

print(df)

Output:

    NAME COUNTRY STRATEGIES TOTAL FUNDS ESTIMATED PAYOFF            WEBSITE  \
0    Joe      UK     Buyout       10083             4323   http://info2.com   
1  Linda      US  Liquidate       97656             1223  http://global.com   

              EMAIL  
0              None  
1  [email protected]


1 Comment

Si_CPyR Over a year ago

Thank you, Sociopath! This works like magic too, but I went with the other answer because I was able to add more rules to it.
