
CONTEXT

I am trying to create a DataFrame and fill in its columns based on whether or not the inserted lists contain values for those columns.

Example Data:
Name    Height  Hair Color  Eye Color
Bob     72      Blonde      Blue
George  64                  Green
John            Brown       Brown

The DataFrame's columns would cover all the variables I want to record, but if a person does not have information for every column, I'd like to fill in whatever I can.

Sample Data / Code

import pandas as pd

name = ['Name', 'Bob']              # each list holds the column name and the value
height = ['Height', '72']           # can I look up height[0] in the columns and put height[1] there?
eye_color = ['Eye Color', 'Brown']

person = [name, height, eye_color]
columns = ['Name', 'Height', 'Hair Color', 'Eye Color']

df = pd.DataFrame(person, columns=columns)  # fails: each row has 2 values but 4 column names are given

Expected Outcome

Name    Height  Hair Color  Eye Color
Bob     72                  Brown

PROBLEM

I want to pass in a person, fill in the columns for which that person has information, and leave the remaining columns blank. I also want to append more people to the DataFrame the same way. Is this possible?

Please let me know if any additional details would help in answering this question!

  • Will there always be a name? Commented Oct 7, 2020 at 23:48
  • @wwii Yep, sorry should have mentioned that. Commented Oct 7, 2020 at 23:52
  • And I should have asked if all of the data (all the various rows) are in one structure or does it get added to the DataFrame one at a time? Commented Oct 7, 2020 at 23:55
  • @wwii All the data is contained in an object. It is slightly more complex than the data provided: I have a person_list of "person" objects, where person = [name, [variable_list]]. This holds the person's name and a list of variable_name/value pairs. Ideally I would use a for loop to go through each person and append them to the DataFrame. Please let me know if I need to clarify anything more! Thanks Commented Oct 8, 2020 at 0:02

3 Answers


You can make an empty DataFrame and just specify the columns.

In [21]: df = pd.DataFrame(columns=['name','a','b','c'])

In [22]: df
Out[22]: 
Empty DataFrame
Columns: [name, a, b, c]
Index: []

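Then you can append rows to it. (Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the session below only runs on older versions; on current pandas, use pd.concat instead.)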

In [23]: df = df.append({'name':'bob','c':0},ignore_index=True)

In [24]: df
Out[24]: 
  name    a    b  c
0  bob  NaN  NaN  0

In [25]: df = df.append({'name':'geo','b':'foo'},ignore_index=True)

In [26]: df
Out[26]: 
  name    a    b    c
0  bob  NaN  NaN    0
1  geo  NaN  foo  NaN

Multiple rows:

In [32]: more = [{'name':'qq','b':'apples'},
                 {'name':'wildbill','a':'nickels'},
                 {'name':'lastone','b':'potatoes','c':16}]

In [33]: df = df.append(more,ignore_index=True)

In [34]: df
Out[34]: 
       name        a         b    c
0       bob      NaN       NaN    0
1       geo      NaN       foo  NaN
2        qq      NaN    apples  NaN
3  wildbill  nickels       NaN  NaN
4   lastone      NaN  potatoes   16
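For readers on pandas 2.0 or later, where DataFrame.append no longer exists, here is a rough sketch of the same appends using pd.concat. It is not the answer's original code; it assumes the same column names and rows as the session above.

```python
import pandas as pd

columns = ['name', 'a', 'b', 'c']

# First rows: columns= fixes the column order and creates columns no dict mentions (as NaN)
df = pd.DataFrame([{'name': 'bob', 'c': 0},
                   {'name': 'geo', 'b': 'foo'}], columns=columns)

more = [{'name': 'qq', 'b': 'apples'},
        {'name': 'wildbill', 'a': 'nickels'},
        {'name': 'lastone', 'b': 'potatoes', 'c': 16}]

# Instead of df.append(more, ignore_index=True): build a frame with the same
# columns and concatenate; ignore_index=True renumbers the rows 0..n-1
df = pd.concat([df, pd.DataFrame(more, columns=columns)], ignore_index=True)
print(df)
```

Collecting rows in a list and concatenating once is generally cheaper than concatenating inside a loop, since each concat copies the data.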

Or, if every column you need appears in at least one of the dicts, you can pass the list straight to the constructor:

In [36]: more
Out[36]: 
[{'b': 'apples', 'name': 'qq'},
 {'a': 'nickels', 'name': 'wildbill'},
 {'b': 'potatoes', 'c': 16, 'name': 'lastone'}]

In [37]: pd.DataFrame(more)
Out[37]: 
         a         b     c      name
0      NaN    apples   NaN        qq
1  nickels       NaN   NaN  wildbill
2      NaN  potatoes  16.0   lastone
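If some column might not appear in any of the dicts, or you want a fixed column order, a sketch assuming you pass the column list explicitly: columns= both orders the result and creates any listed column that no dict mentions. Keys that are not listed in columns= are dropped. The extra column 'd' below is hypothetical, added only to show this.

```python
import pandas as pd

more = [{'name': 'qq', 'b': 'apples'},
        {'name': 'wildbill', 'a': 'nickels'},
        {'name': 'lastone', 'b': 'potatoes', 'c': 16}]

# 'd' is not a key in any dict, but listing it in columns= still creates it (all NaN);
# the order given in columns= is also the column order of the result
df = pd.DataFrame(more, columns=['name', 'a', 'b', 'c', 'd'])
print(df)
```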

The DataFrame constructor will also consume a generator:

In [3]: more
Out[3]: 
[{'b': 'apples', 'name': 'qq'},
 {'a': 'nickels', 'name': 'wildbill'},
 {'b': 'potatoes', 'c': 16, 'name': 'lastone'}]

In [4]: def f():
   ...:     for d in more:
   ...:         yield d
   ...:         

In [5]: pd.DataFrame(f())
Out[5]: 
         a         b     c      name
0      NaN    apples   NaN        qq
1  nickels       NaN   NaN  wildbill
2      NaN  potatoes  16.0   lastone

There is probably a better way.
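Applied to the structure described in the question's comments, a generator can build each row on the fly. This is a sketch, assuming each person is [name, [[column, value], ...]]; person_list and person_rows are hypothetical names, and the data is the example table from the question.

```python
import pandas as pd

columns = ['Name', 'Height', 'Hair Color', 'Eye Color']

# Hypothetical data in the shape described in the comments:
# each person is [name, [[column, value], ...]]
person_list = [
    ['Bob', [['Height', '72'], ['Hair Color', 'Blonde'], ['Eye Color', 'Blue']]],
    ['George', [['Height', '64'], ['Eye Color', 'Green']]],
    ['John', [['Hair Color', 'Brown'], ['Eye Color', 'Brown']]],
]

def person_rows(people):
    """Yield one dict per person, keyed by column name."""
    for name, variables in people:
        row = {'Name': name}
        row.update(dict(variables))  # [[col, value], ...] -> {col: value, ...}
        yield row

# columns= fixes the order; values a person lacks come out as NaN
df = pd.DataFrame(person_rows(person_list), columns=columns)
print(df)
```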


3 Comments

Is it possible to use the column name associated with the variable (in this case, height[0]) as the key? I'd like to loop through a list of Person objects and dynamically fill in whichever columns each person has, like a for loop over the variables in the list inside that append call.
I guess the better question is: if I converted it to a dictionary of key/value pairs, could I just replace everything between the { } with that dictionary?
@Yahtzee - see my last edit - it appends three dictionaries at once.

Are you open to rethinking what a person object is? If so, consider using a dict for each person, as below. It makes your life much easier.

import pandas as pd

columns = ['Name', 'Height', 'Hair Color', 'Eye Color']

person = {'Name': ['Bob'], 'Height': ['72'], 'Eye Color': ['Brown']}
person2 = {'Name': ['Sue'], 'Height': ['48'], 'Eye Color': ['Blue'], 'Hair Color': ['Blonde']}
person3 = {'Name': ['Hank'], 'Height': ['74'], 'Hair Color': ['Black']}

# DataFrame.append was removed in pandas 2.0, so concatenate one-row frames instead
df = pd.concat([pd.DataFrame(p) for p in (person, person2, person3)])
# reindex puts the columns in the desired order (any column nobody has becomes NaN)
df = df.reindex(columns=columns)
print(df)

   Name Height Hair Color Eye Color
0   Bob     72        NaN     Brown
0   Sue     48     Blonde      Blue
0  Hank     74      Black       NaN

If you don't want to change person, you can also write a simple function to convert it:

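def person_to_dict(person):
    # [['Name', 'Bob'], ...] -> {'Name': ['Bob'], ...}
    # Wrapping each value in a list lets pd.DataFrame build a one-row frame from it
    person_dict = {}
    for attr in person:
        person_dict[attr[0]] = [attr[1]]
    return person_dict

person = person_to_dict(person)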
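Putting the pieces together for several people in the question's list-of-pairs format, here is a sketch that converts each person and concatenates the one-row frames. The people list is hypothetical (it reuses names from this answer), and the conversion is the same as person_to_dict, written as a dict comprehension.

```python
import pandas as pd

columns = ['Name', 'Height', 'Hair Color', 'Eye Color']

def person_to_dict(person):
    # Same conversion as above: [['Name', 'Bob'], ...] -> {'Name': ['Bob'], ...}
    return {attr[0]: [attr[1]] for attr in person}

# Hypothetical people in the question's [[column, value], ...] format
people = [
    [['Name', 'Bob'], ['Height', '72'], ['Eye Color', 'Brown']],
    [['Name', 'Sue'], ['Height', '48'], ['Eye Color', 'Blue'], ['Hair Color', 'Blonde']],
]

frames = [pd.DataFrame(person_to_dict(p)) for p in people]
# Concatenate once, renumber rows, then fix the column order
df = pd.concat(frames, ignore_index=True).reindex(columns=columns)
print(df)
```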


Here is a dynamic list comprehension method using the lists you have created in this example:

import pandas as pd

name = ['Name', 'Bob']
height = ['Height', '72']
eye_color = ['Eye Color', 'Brown']

person = [name, height, eye_color]
columns = ['Name', 'Height', 'Hair Color', 'Eye Color']

# One single-key dict per [column, value] pair whose column is in `columns`
df = pd.DataFrame([{i: j} for (i, j) in zip([name[0], height[0], eye_color[0]],
                                            [name[1], height[1], eye_color[1]])
                   for col in columns if i == col], columns=columns)
# Each column holds at most one value here; drop the NaNs to collapse into a single row
df = df.apply(lambda x: pd.Series(x.dropna().values))
df

    Name    Height  Hair Color  Eye Color
0    Bob        72         NaN      Brown
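A shorter variant, assuming every item of person is a [column, value] pair: dict(person) already maps column names to values, so a single dict gives a single row and the dropna collapse is not needed. Note that any key not listed in columns would be silently dropped.

```python
import pandas as pd

name = ['Name', 'Bob']
height = ['Height', '72']
eye_color = ['Eye Color', 'Brown']

person = [name, height, eye_color]
columns = ['Name', 'Height', 'Hair Color', 'Eye Color']

# dict() turns [[key, value], ...] into {key: value, ...}; one dict = one row.
# columns= orders the columns and leaves 'Hair Color' as NaN.
df = pd.DataFrame([dict(person)], columns=columns)
print(df)
```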
