
Imagine that I have the following dict:

configs = {
    'CONFIG1': [
        {
            "server": "SERVER_1",
            "description": "Testing server 1.",
        },
        {
            "server": "SERVER_2",
            "description": "Testing server 2.",
        }
    ],
    'CONFIG2': [
        {
            "server": "SERVER_3",
            "description": "Testing server 3.",
        },
        {
            "server": "SERVER_4",
            "description": "Testing server 4.",
        }
    ],
    'CONFIG3': [
        
    ]
}

I want to organize this config into a dataframe so that it is like this:

server    description        config_name
SERVER_1  Testing server 1.  CONFIG1
SERVER_2  Testing server 2.  CONFIG1
SERVER_3  Testing server 3.  CONFIG2
SERVER_4  Testing server 4.  CONFIG2

I also want to prevent empty configuration keys such as CONFIG3 from being added to the dataframe.

I've tried doing it like this:

import pandas as pd

df = pd.DataFrame()

for config in configs:
    if configs[config]:
        df = df.append(configs[config], ignore_index=True)
        df['config_name'] = config
    

print(df)

But the configuration name is not right. The output is:

server    description        config_name
SERVER_1  Testing server 1.  CONFIG2
SERVER_2  Testing server 2.  CONFIG2
SERVER_3  Testing server 3.  CONFIG2
SERVER_4  Testing server 4.  CONFIG2
    Every time you do df['config_name'] = config you are setting the value for the entire column. Commented Mar 10, 2021 at 17:29
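
The behaviour described in this comment can be reproduced in isolation. The following is a minimal sketch, assuming a small made-up frame (the two-step build is only for illustration); it uses pd.concat rather than DataFrame.append, since append was removed in pandas 2.0.

```python
import pandas as pd

# First batch of rows, tagged with its config name.
df = pd.DataFrame({"server": ["SERVER_1", "SERVER_2"]})
df["config_name"] = "CONFIG1"

# Second batch of rows is added afterwards.
second = pd.DataFrame({"server": ["SERVER_3"]})
df = pd.concat([df, second], ignore_index=True)

# Assigning a scalar broadcasts it to EVERY row, not only the new one,
# so the earlier "CONFIG1" labels are overwritten.
df["config_name"] = "CONFIG2"
print(df)
```

After the last assignment all three rows carry CONFIG2, which is the same effect seen in the question's output.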

4 Answers

2

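Do not repeatedly append to a DataFrame: every append copies all the existing rows into a new object. Building the pieces first and calling concat once is almost always a better choice:

pd.concat([pd.DataFrame(d).assign(config_name=k)
           for k, d in configs.items()])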

Output:

     server        description config_name
0  SERVER_1  Testing server 1.     CONFIG1
1  SERVER_2  Testing server 2.     CONFIG1
0  SERVER_3  Testing server 3.     CONFIG2
1  SERVER_4  Testing server 4.     CONFIG2
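
A possible variation on the concat approach, shown here as a sketch rather than the answer's own code: it skips empty configs explicitly and passes ignore_index=True so the result gets a continuous 0..3 index instead of the repeated 0, 1 shown above. The variable names frames, name and entries are chosen for illustration.

```python
import pandas as pd

configs = {
    "CONFIG1": [
        {"server": "SERVER_1", "description": "Testing server 1."},
        {"server": "SERVER_2", "description": "Testing server 2."},
    ],
    "CONFIG2": [
        {"server": "SERVER_3", "description": "Testing server 3."},
        {"server": "SERVER_4", "description": "Testing server 4."},
    ],
    "CONFIG3": [],
}

# One small frame per non-empty config, each tagged with its own name.
frames = [
    pd.DataFrame(entries).assign(config_name=name)
    for name, entries in configs.items()
    if entries  # skip empty configs such as CONFIG3
]

# Concatenate once; ignore_index=True renumbers the rows 0..n-1.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Note that pd.concat raises a ValueError when given an empty list, so this sketch assumes at least one config is non-empty.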

0

You can also use explode:

out = pd.Series(configs).explode().dropna().apply(pd.Series)

Output (the config name ends up in the index, not in a config_name column):

           server        description
CONFIG1  SERVER_1  Testing server 1.
CONFIG1  SERVER_2  Testing server 2.
CONFIG2  SERVER_3  Testing server 3.
CONFIG2  SERVER_4  Testing server 4.
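
To get the layout the question asks for, the index can be turned into a config_name column. This is a sketch that extends the explode approach; the rename_axis / reset_index step and the final column reordering are additions for illustration, not part of the answer.

```python
import pandas as pd

configs = {
    "CONFIG1": [
        {"server": "SERVER_1", "description": "Testing server 1."},
        {"server": "SERVER_2", "description": "Testing server 2."},
    ],
    "CONFIG2": [
        {"server": "SERVER_3", "description": "Testing server 3."},
        {"server": "SERVER_4", "description": "Testing server 4."},
    ],
    "CONFIG3": [],
}

out = (
    pd.Series(configs)
    .explode()                    # one row per dict; an empty list becomes NaN
    .dropna()                     # drops the NaN row produced by CONFIG3
    .apply(pd.Series)             # expand each dict into columns
    .rename_axis("config_name")   # name the index so it becomes a named column
    .reset_index()
)

# Put the columns in the order requested in the question.
out = out[["server", "description", "config_name"]]
print(out)
```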

0

df['config_name'] = config assigns the value to every row in df, not just the rows you just added.

Instead, add the config name as an entry in each dictionary before appending to df.

for name, dicts in configs.items():
    if dicts:
        for d in dicts:
            d['config_name'] = name
        df = df.append(dicts, ignore_index=True)

3 Comments

This will be slow for large data. @Barmar You could append to a list instead
Not significantly slower than the original code. This just fixes the bug, it's not the optimal way to do it.
Yes. I agree with that.
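
The suggestion in these comments, appending to a list and building the DataFrame once, might look like the sketch below. It is an illustration, not the answerer's code: the names rows, entries and entry are made up here, and each dict is copied so the original configs are left unmodified. It also avoids DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

configs = {
    "CONFIG1": [
        {"server": "SERVER_1", "description": "Testing server 1."},
        {"server": "SERVER_2", "description": "Testing server 2."},
    ],
    "CONFIG2": [
        {"server": "SERVER_3", "description": "Testing server 3."},
        {"server": "SERVER_4", "description": "Testing server 4."},
    ],
    "CONFIG3": [],
}

rows = []
for name, entries in configs.items():
    for entry in entries:  # an empty list such as CONFIG3 contributes no rows
        # Copy the dict and add the config name, leaving configs untouched.
        rows.append({**entry, "config_name": name})

# Build the DataFrame once from the accumulated list of row dicts.
df = pd.DataFrame(rows)
print(df)
```

Appending to a Python list is cheap, so the cost of constructing the frame is paid only once at the end.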
0

A one-liner using a list comprehension:

df = pd.DataFrame([{**d, 'config_name': k} for k, v in configs.items() for d in v])

Output:

     server        description config_name
0  SERVER_1  Testing server 1.     CONFIG1
1  SERVER_2  Testing server 2.     CONFIG1
2  SERVER_3  Testing server 3.     CONFIG2
3  SERVER_4  Testing server 4.     CONFIG2
