I am using Azure Databricks and Python 3.
I have a data frame (df1) with a column called 'BodyJson' which is of 'string' data type.
'BodyJson' holds a complex JSON structure. An example from one row of df1 is shown below.
Column 'BodyJson' from df1:
{
  "Timestamp": 3690414400,
  "Sender": "10.99.45.6:32768:wifivm0002EF",
  "Type": "1.3.6.1.4.1.9.9.599.0.8",
  "CaptureTime": 637616722902708244,
  "Variables": [
    {
      "Key": "1.3.6.1.4.1.9.9.513.1.2.1.1.1.0",
      "Value": "1"
    },
    {
      "Key": "1.3.6.1.4.1.9.9.513.1.1.1.1.5.200.249.249.41.0.128",
      "Value": {
        "Hex": "66696E7362792D7761703033",
        "String": "123456-wap03"
      }
    },
    {
      "Key": "1.3.6.1.4.1.9.9.599.1.3.2.1.2.0",
      "Value": 1
    },
    {
      "Key": "1.3.6.1.4.1.9.9.599.1.3.2.1.3.0",
      "Value": {
        "Hex": "0A9603F4",
        "String": "\n?\u0003?"
      }
    },
    {
      "Key": "1.3.6.1.4.1.9.9.599.1.3.1.1.27.114.154.56.22.154.160",
      "Value": {
        "Hex": "766D6564776966692F646965676F33756B407961686F6F2E636F6D",
        "String": "vmedwifi/diego3uk@yahoo.com"
      }
    },
    {
      "Key": "1.3.6.1.4.1.9.9.599.1.3.1.1.28.114.154.56.22.154.160",
      "Value": {
        "Hex": "56697267696E204D65646961",
        "String": "Virgin Media"
      }
    },
    {
      "Key": "1.3.6.1.4.1.9.9.599.1.3.1.1.38.114.154.56.22.154.160",
      "Value": {
        "Hex": "36306562663133322F37323A39613A33383A31363A39613A61302F3931323639363136",
        "String": "60ebf132/72:9a:38:16:9a:a0/91269616"
      }
    },
    {
      "Key": "1.3.6.1.4.1.9.9.599.1.3.1.1.8.114.154.56.22.154.160",
      "Value": {
        "Hex": "C8F9F9290080",
        "String": "???)\u0000?"
      }
    }
  ]
}
The only part of 'BodyJson' I am interested in is "Variables", which holds an array of JSON objects. These objects come in two forms. Each form is shown below with example values:
Form-1
{
  "Key": "1.3.6.1.4.1.9.9.513.1.2.1.1.1.0",
  "Value": "1"
}
Form-2
{
  "Key": "1.3.6.1.4.1.9.9.513.1.1.1.1.5.200.249.249.41.0.128",
  "Value": {
    "Hex": "66696E7362792D7761703033",
    "String": "123456-wap03"
  }
}
I would like to create two new data frames, one holding only Form-1 rows and one holding only Form-2 rows. For example, the columns would be:
New data frame holding only Form-1 rows:
- Key(string) = "1.3.6.1.4.1.9.9.513.1.2.1.1.1.0"
- Value(string) = "1"
New data frame holding only Form-2 rows:
- Key(string) = "1.3.6.1.4.1.9.9.513.1.1.1.1.5.200.249.249.41.0.128"
- Value(string) = "123456-wap03" (populated from "Value"."String". NB: I am not interested in the values from "Value"."Hex")
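To make the intended result concrete, here is a minimal plain-Python sketch of the split for a single 'BodyJson' string. It is not Spark code and does not touch df1. The function name split_variables and the shortened sample are my own illustration, and I am assuming that a bare numeric value such as 1 should be cast to a string in the Form-1 output.

```python
import json

# Shortened, illustrative 'BodyJson' value: two Form-1 entries and one Form-2 entry.
body_json = """
{
  "Variables": [
    {"Key": "1.3.6.1.4.1.9.9.513.1.2.1.1.1.0", "Value": "1"},
    {"Key": "1.3.6.1.4.1.9.9.513.1.1.1.1.5.200.249.249.41.0.128",
     "Value": {"Hex": "66696E7362792D7761703033", "String": "123456-wap03"}},
    {"Key": "1.3.6.1.4.1.9.9.599.1.3.2.1.2.0", "Value": 1}
  ]
}
"""


def split_variables(body):
    """Return (form1_rows, form2_rows) as lists of (Key, Value) pairs."""
    form1, form2 = [], []
    for var in json.loads(body).get("Variables", []):
        value = var["Value"]
        if isinstance(value, dict):
            # Form-2: keep only "String"; "Hex" is ignored.
            form2.append((var["Key"], value.get("String")))
        else:
            # Form-1: scalar value, cast to string (assumption).
            form1.append((var["Key"], str(value)))
    return form1, form2


form1_rows, form2_rows = split_variables(body_json)
print(form1_rows)
print(form2_rows)
```

What I am after is the equivalent of form1_rows and form2_rows as two Spark data frames, built from every row of df1.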
How do I extract this data from the 'BodyJson' column and create the two new data frames?

