1

I have a problem that I cannot find a solution for - so here comes the request for assistance.

I receive an export from a DB that looks like this (of course, more than one line in reality):

"created_at","country","query_success","query_result"
"2022-08-18 08:38:38","Germany",True,"{""servers"": {""windows"": 0, ""linux"": 0}, ""workstations"": {""windows"": 0, ""mac"": 0}}"

I import it into Pandas in this way:

df = pd.read_csv('data.csv', index_col='created_at', parse_dates=True)

Which turns it into this:

created_at          country query_success   query_result
2022-08-18 08:38:38 Germany True            {"servers": {"windows": 0, "linux": 0}, "workstations": {"windows": 0, "mac": 0}}

The problem I'm trying to resolve is the json/dictionary that populates the query_result column.

What I'd like to do would be to create and populate four new columns based on this data.

server_windows
server_linux
workstation_windows
workstation_mac

I've done quite some googling and have seen some solutions that uses the ast module but can't seem to get it right. It could potenially be due to the it being two nested dictionaries/json structures?

Thankful for any help/assistance.

0

1 Answer 1

3

Try:

import json

dfs = pd.concat([pd.json_normalize(json.loads(d)) for d in df["query_result"]])
dfs = pd.DataFrame(dfs.values, columns=dfs.columns, index=df.index)

df = pd.concat([df, dfs], axis=1)
df.pop("query_result")
print(df.to_markdown())

Prints:

created_at country query_success servers.windows servers.linux workstations.windows workstations.mac
2022-08-18 08:38:38 Germany True 0 0 0 0
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.