I have a long list (sample below)

df_list = ['Joe',
 'UK',
 'Buyout',
 '10083',
 '4323',
 'http://info2.com',
 'Linda',
 'US',
 'Liquidate',
 '97656',
 '1223',
 'http://global.com',
 '[email protected]'           
          ]

As you can see, the list contains information about individuals (Joe and Linda). The problem is that some observations (Joe in this example) are missing the 7th element, the entity's email address, while others (Linda) have it populated.

I want to turn this list into a dataframe with 7 columns (below). For observations that do not have a valid email address (the value does not contain "@"), I want the EMAIL column to hold a null/empty value rather than the next element, which is actually the next observation's NAME.

cols = ['NAME'
,'COUNTRY'
,'STRATEGIES'
,'TOTAL FUNDS'
,'ESTIMATED PAYOFF'
,'WEBSITE'
,'EMAIL']

So far, this is where I am:

big_list = []  #intention is to append N (number of unique entity) small_lists into a big_list and call pd.DataFrame(big_list)
small_list = [] #intention is to create a small_list for each observation/entity, containing 7 values, including email or null if empty
for element in df_list:
    small_list.append(element)
if ("@" not in small_list):
    small_list[-1] = None

Any help would be highly appreciated! Thanks

  • Are you missing only the 7th element (or some other element too) in the items in the list? Commented Mar 17, 2020 at 5:45
  • @ShubhamSharma Thanks for the reply. WEBSITE values are sometimes missing as well as EMAIL. I guess for emails we could use the constraint that the value must contain '@', and for websites that it must contain at least one '.'. Commented Mar 17, 2020 at 5:48
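
The two rules proposed in the comment above can be sketched as small predicate functions. This is only an illustration: the function names are hypothetical, and the extra "no '@'" condition on websites is an added assumption, needed because an email address also contains a '.' and would otherwise pass the website rule.

```python
def looks_like_email(value):
    # Rule from the comment: an email must contain '@'.
    return '@' in value

def looks_like_website(value):
    # Rule from the comment: a website contains at least one '.'.
    # Added assumption: it must not contain '@', otherwise every
    # email address would also count as a website.
    return '.' in value and '@' not in value

# 'someone@example.com' is a placeholder address, not data from the question.
for candidate in ['http://info2.com', 'someone@example.com', 'Linda']:
    print(candidate, looks_like_website(candidate), looks_like_email(candidate))
```

These checks are deliberately loose. A NAME such as "J. Smith" would pass the website rule, so real data may need something stricter, for example requiring an "http" prefix.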

2 Answers

You could use a generator:

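import numpy as np
import pandas as pd

def gen_batch(df_list):
    # i always points at the slot where the current record's email would be
    i = 6
    while i <= len(df_list):
        if i < len(df_list) and '@' in df_list[i]:
            # email present: the record spans 7 items
            yield df_list[i-6: i+1]
            i += 7
        else:
            # email missing: pad the 6 items with NaN
            yield df_list[i-6: i] + [np.nan]
            i += 6

pd.DataFrame(gen_batch(df_list), columns=cols)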

1 Comment

Thank you! I like your solution because it is relatively straightforward for me to enforce additional rules, such as handling another column with missing values in a similar way.
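
As a rough sketch of the kind of additional rule mentioned here, the generator idea can be extended so that both WEBSITE and EMAIL are optional. This assumes the first five fields of every record are always present, that a website contains '.' but no '@', and that an email contains '@'. The name `gen_rows`, the record for 'Ana', and the example.com addresses are made up for illustration.

```python
COLS = ['NAME', 'COUNTRY', 'STRATEGIES', 'TOTAL FUNDS',
        'ESTIMATED PAYOFF', 'WEBSITE', 'EMAIL']

def gen_rows(items):
    """Yield one 7-item row per record; a missing WEBSITE or EMAIL becomes None."""
    i = 0
    # Assumption: the first five fields of every record are always present.
    # Any trailing leftover of fewer than five items is ignored.
    while i + 5 <= len(items):
        row = items[i:i + 5]
        i += 5
        # WEBSITE rule: contains '.' but no '@'.
        if i < len(items) and '.' in items[i] and '@' not in items[i]:
            row.append(items[i])
            i += 1
        else:
            row.append(None)
        # EMAIL rule: contains '@'.
        if i < len(items) and '@' in items[i]:
            row.append(items[i])
            i += 1
        else:
            row.append(None)
        yield row

# Made-up sample: the email addresses are placeholders.
sample = [
    'Joe', 'UK', 'Buyout', '10083', '4323', 'http://info2.com',
    'Linda', 'US', 'Liquidate', '97656', '1223', 'http://global.com', 'linda@example.com',
    'Ana', 'BR', 'Growth', '500', '20', 'ana@example.com',
]
rows = list(gen_rows(sample))
for row in rows:
    print(row)
```

Passing the result to `pd.DataFrame(rows, columns=COLS)` should then give a frame with None in the gaps. One caveat: if a record lacks both WEBSITE and EMAIL and the next record's NAME happens to contain a '.', that name would be misread as a website under these rules.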

IIUC you need:

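import pandas as pd

new_list = []
counter = 0
while counter < len(df_list):
    # the record has an email only if a 7th item exists and contains "@"
    has_email = counter + 6 < len(df_list) and "@" in df_list[counter + 6]
    size = 7 if has_email else 6
    row = df_list[counter:counter + size]
    # pad to 7 items so EMAIL is None when it is missing
    new_list.append(row + [None] * (7 - len(row)))
    counter += size

df = pd.DataFrame(new_list, columns=cols)

print(df)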

Output:

    NAME COUNTRY STRATEGIES TOTAL FUNDS ESTIMATED PAYOFF            WEBSITE  \
0    Joe      UK     Buyout       10083             4323   http://info2.com   
1  Linda      US  Liquidate       97656             1223  http://global.com   

              EMAIL  
0              None  
1  [email protected]

1 Comment

Thank you, Sociopath! This works like magic too, but I went with the other answer because I was able to add more rules to it.
