4

this is my first ever post here so go easy! :) I am attempting to convert data from Excel to JSON using the Python Pandas library.

I have data in Excel that looks like the table below, the columns detailed as "Unnamed: x" are blank, I used these headers as that's how they are output when converting to JSON. There are around 20 tests formatted like the sample below:

Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
Test 1 Menu Setting Value
Menu1 Setting1 Value1
Test 2 A B C
1 2 3

I would like to put these in to JSON to look something like this:

{
    "Test 1": [ 
    "Menu":"Menu1",
    "Setting":"Setting1",
    "Value":"Value1",
    ]
}

And so on...

I can convert the current code to JSON (but not the format detailed above, and I have been experimenting with creating different Pandas dataframes in Python. At the moment the JSON data I get looks something like this:

"3":[
        {
            "Unnamed: 0":"Test1",
            "Unnamed: 1":"Menu",
            "Unnamed: 2":"Setting",
            "Unnamed: 2":"Value"
        }
"4":[
        {
            "Unnamed: 1":"Menu1",
            "Unnamed: 2":"Setting1",
            "Unnamed: 2":"Value1"
        }

So I am doing some manual work (copying and pasting) to set it up in the desired format.

Here is my current code:

import pandas

# Pointing to file location and specifying the sheet name to convert
excel_data_fragment = pandas.read_excel('C:\\Users\\user_name\\tests\\data.xls', sheet_name='Tests')

# Converting to data frame
df = pandas.DataFrame(excel_data_fragment)

# This will get the values in Column A and removes empty values
test_titles = df['Unnamed: 0'].dropna(how="all")

# This is the first set of test values
columnB = df['Unnamed: 1'].dropna(how="all")

# Saving original data in df and removing rows which contain all NaN values to mod_df
mod_df = df.dropna(how="all")

# Converting data frame with NaN values removed to json
df_json = mod_df.apply(lambda x: [x.dropna()], axis=1).to_json()

print(mod_df)
2
  • Could you give an example of the output you would like to achieve? Commented May 20, 2021 at 12:09
  • Thanks for the reply Linden, the output I'd like to achieve is the JSON format: { "Test 1": [ "Menu":"Menu1", "Setting":"Setting1", "Value":"Value1", ] } Commented May 20, 2021 at 12:15

1 Answer 1

3

Your Excel sheet is basically composed of several distinct subtables put together (one for each test). The way I would go to process them in pandas would be to use groupby and then process each group as a table. DataFrame.to_dict will be your friend here to output JSON-able objects.

First here is some sample data that ressembles what you have provided:

import pandas as pd


rows = [
    [],
    [],
    ["Test 1", "Menu", "Setting", "Value"],
    [None, "Menu1", "Setting1", "Value1"],
    [None, "Menu2", "Setting2", "Value2"],
    [],
    [],
    ["Test 2", "A", "B", "C"],
    [None, 1, 2, 3],
    [None, 4, 5, 6],
]

df = pd.DataFrame(rows, columns=[f"Unnamed: {i}" for i in range(1, 5)])

df looks like:

  Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
0       None       None       None       None
1       None       None       None       None
2     Test 1       Menu    Setting      Value
3       None      Menu1   Setting1     Value1
4       None      Menu2   Setting2     Value2
5       None       None       None       None
6       None       None       None       None
7     Test 2          A          B          C
8       None          1          2          3
9       None          4          5          6

Then use the following snippet, which cleans up all the missing values in df and turns each subtable into a dict.

# Remove entirely empty rows
df = df.dropna(how="all")

# Fill missing values in column 1
df["Unnamed: 1"] = df["Unnamed: 1"].fillna(method="ffill")


def process_group(g):
    # Drop first column
    g = g.drop("Unnamed: 1", axis=1)
    # Use first row as column names
    g = g.rename(columns=g.iloc[0])
    # Drop first row
    g = g.drop(g.index[0])
    # Convert to dict
    return g.to_dict(orient="records")


output = df.groupby("Unnamed: 1").apply(process_group).to_dict()

In the end, output is equal to:

{
  "Test 1": [
    {
      "Menu": "Menu1",
      "Setting": "Setting1",
      "Value": "Value1"
    },
    {
      "Menu": "Menu2",
      "Setting": "Setting2",
      "Value": "Value2"
    }
  ],
  "Test 2": [
    {
      "A": 1,
      "B": 2,
      "C": 3
    },
    {
      "A": 4,
      "B": 5,
      "C": 6
    }
  ]
}

You can finally get the JSON string by simply using:

import json

output_str = json.dumps(output)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.