I am in need of help!
I have an Excel file that contains data I am trying to get into a dataframe, but the data is in a table form that is not easy to work with. Example:
I want to ultimately get it into a dataframe of this form:
Meal Food Calories
Breakfast English Muffins 120
Breakfast Peanut Butter Spread 190
Morning Snack Banana 90
Morning Snack Nectarine 59
... ... ...
And a separate dataframe for the Daily Totals in this form (ignore the 'Date' column for now):
Date Calories Fat Fiber Carbs Sodium Protein Water
2017-07-01 699 26 16 93 612 32 0
2017-07-02 ... ... ... ... ... ... ...
I am struggling with getting this into a dataframe. Looking at the screenshot of the dataset, it made sense to first store the data into a dictionary, but that leaves me with a bunch of NaN values, due to all the blank cells.
My thoughts on getting the 'Meal' column the way I want it to look is to do a forward fill, but that would mean I would have to use a Series or Dataframe, which I haven't gotten to yet.
This is what I currently have:
df = pd.read_excel('filename.xls', 'Foods')
# create a list to store the dictionaries
food_logs = []
# this is code to reformat the string values in a certain column
# to get the name of the sheets I need to use in the Excel. This can be ignored
for day in df.values:
if day[1] != '0':
foodLogSheetName = 'Food Log ' + day[0].replace('-', '')
food_logs.append(foodLogSheetName)
# 'foods' is now a list of nested dictionaries (think of everything in the
# first screenshot as the outer dictionary, and each of the column as the
# inner dictionary)
foods = [xls.parse(food_log).to_dict() for food_log in food_logs]
This is what 'foods' currently is, if I printed it out with a line between each of the outer dictionaries:

I have the option of working with a CSV file instead, but instead of multiple sheets, I would have multiple 'tables' stacked vertically, if that makes sense
I would greatly appreciate any tips anyone can provide, please!
