0

Problem Statement : Remove/Rename Special characters (#,$, Back slash, & etc.) from json keys and replace in the main json file.

Approach :

  1. I am trying to get all the keys of deeply nested json first.
  2. Check for special characters in each key then rename/replace and write back to the json file.

Issue :

  1. Json I have is very deep nested so the logic i have written works for simple json but not deep nested json.

Code :

import json
import base64

def getKeys(object, prev_key = None, keys = []):
    if type(object) != type({}):
        keys.append(prev_key)
        return keys
    new_keys = []
    for k, v in object.items():
        if prev_key != None:
            new_key = "{}.{}".format(prev_key, k)
        else:
            new_key = k
        new_keys.extend(getKeys(v, new_key, []))
    return new_keys

Above code works for below json : It print all the json keys

json_string= '{"Relate:0/name": "securityhub-ec2-instance-managed-by-ssm-dc0c9f18","RelatedAWSResources:0/type": "AWS::Config::ConfigRule","aws/securityhub/ProductName": "Security Hub","aws/securityhub/CompanyName": "AWS"}'

Output :

['Relate:0/name', 'RelatedAWSResources:0/type', 'aws/securityhub/ProductName', 'aws/securityhub/CompanyName']

But it does not work for below json :

{
  "version": "0",
  "id": "ffd8a756-9fe6-fa54-af4e-cf85fa3d2896",
  "detail-type": "Security Hub Findings - Imported",
  "source": "aws.securityhub",
  "account": "220307202362",
  "time": "2021-10-17T14:26:25Z",
  "region": "us-west-2",
  "resources": [
    "arn:aws:securityhub:us-west-2::product/aws/securityhub/arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f"
  ],
  "detail": {
    "findings": [
      {
        "ProductArn": "arn:aws:securityhub:us-west-2::product/aws/securityhub",
        "Types": [
          "Software and Configuration Checks/Industry and Regulatory Standards/PCI-DSS"
        ],
        "Description": "This control checks for the CloudWatch metric filters using the following pattern { $.userIdentity.type = \"Root\" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != \"AwsServiceEvent\" } It checks that the log group name is configured for use with active multi-region CloudTrail, that there is at least one Event Selector for a Trail with IncludeManagementEvents set to true and ReadWriteType set to All, and that there is at least one active subscriber to an SNS topic associated with the alarm.",
        "Compliance": {
          "Status": "FAILED",
          "StatusReasons": [
            {
              "Description": "Multi region CloudTrail with the required configuration does not exist in the account",
              "ReasonCode": "CLOUDTRAIL_MULTI_REGION_NOT_PRESENT"
            }
          ],
          "RelatedRequirements": [
            "PCI DSS 7.2.1"
          ]
        },
        "ProductName": "Security Hub",
        "FirstObservedAt": "2021-10-17T14:26:18.383Z",
        "CreatedAt": "2021-10-17T14:26:18.383Z",
        "LastObservedAt": "2021-10-17T14:26:21.346Z",
        "CompanyName": "AWS",
        "FindingProviderFields": {
          "Types": [
            "Software and Configuration Checks/Industry and Regulatory Standards/PCI-DSS"
          ],
          "Severity": {
            "Normalized": 40,
            "Label": "MEDIUM",
            "Product": 40,
            "Original": "MEDIUM"
          }
        },
        "ProductFields": {
          "StandardsArn": "arn:aws:securityhub:::standards/pci-dss/v/3.2.1",
          "StandardsSubscriptionArn": "arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1",
          "ControlId": "PCI.CW.1",
          "RecommendationUrl": "https://docs.aws.amazon.com/console/securityhub/PCI.CW.1/remediation",
          "StandardsControlArn": "arn:aws:securityhub:us-west-2:220307202362:control/pci-dss/v/3.2.1/PCI.CW.1",
          "aws/securityhub/ProductName": "Security Hub",
          "aws/securityhub/CompanyName": "AWS",
          "aws/securityhub/annotation": "Multi region CloudTrail with the required configuration does not exist in the account",
          "Resources:0/Id": "arn:aws:iam::220307202362:root",
          "aws/securityhub/FindingId": "arn:aws:securityhub:us-west-2::product/aws/securityhub/arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f"
        },
        "Remediation": {
          "Recommendation": {
            "Text": "For directions on how to fix this issue, consult the AWS Security Hub PCI DSS documentation.",
            "Url": "https://docs.aws.amazon.com/console/securityhub/PCI.CW.1/remediation"
          }
        },
        "SchemaVersion": "2018-10-08",
        "GeneratorId": "pci-dss/v/3.2.1/PCI.CW.1",
        "RecordState": "ACTIVE",
        "Title": "PCI.CW.1 A log metric filter and alarm should exist for usage of the \"root\" user",
        "Workflow": {
          "Status": "NEW"
        },
        "Severity": {
          "Normalized": 40,
          "Label": "MEDIUM",
          "Product": 40,
          "Original": "MEDIUM"
        },
        "UpdatedAt": "2021-10-17T14:26:18.383Z",
        "WorkflowState": "NEW",
        "AwsAccountId": "220307202362",
        "Region": "us-west-2",
        "Id": "arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f",
        "Resources": [
          {
            "Partition": "aws",
            "Type": "AwsAccount",
            "Region": "us-west-2",
            "Id": "AWS::::Account:220307202362"
          }
        ]
      }
    ]
  }
} 

Strip punctuation function :

import string
from typing import Optional, Iterable, Union


delete_dict = {sp_character: '' for sp_character in string.punctuation}

PUNCT_TABLE = str.maketrans(delete_dict)


def strip_punctuation(s: str,
                      exclude_chars: Optional[Union[str, Iterable]] = None) -> str:
    """
    Remove punctuation and spaces from a string.

    If `exclude_chars` is passed, certain characters will not be removed
    from the string.

    """
    punct_table = PUNCT_TABLE.copy()
    if exclude_chars:
        for char in exclude_chars:
            punct_table.pop(ord(char), None)

    # Next, remove the desired punctuation from the string
    return s.translate(punct_table) 

Usage:

cleaned_keys = {json data}
for key, expected_key in cleaned_keys.items():
    actual_key = strip_punctuation(key)
4
  • you have quite a bit of code that could be improved. for example the type({}) is somewhat wasteful, as it creates a new dict object each time. Commented Oct 22, 2021 at 3:45
  • "But it does not work for below json" Well, what is the result you want for that JSON? What is the rule that tells you what to do with it? What exactly does "all the keys" mean - everything that is a dict key at any level of the structure? Or just what? Commented Oct 22, 2021 at 3:46
  • 2
    Please only use tags that are relevant to the question you are asking. I have removed the aws-lambda tag, because the mere fact that you got the data from there has nothing to do with what you are actually doing with the data. Commented Oct 22, 2021 at 3:48
  • @KarlKnechtel : "All the keys" means it should output all the json keys at any level OR output all the json keys which has special characters like "aws/securityhub/ProductName" Commented Oct 22, 2021 at 3:52

1 Answer 1

1

Problem Statement : Remove/Rename Special characters (#,$, Back slash, & etc.) from json keys and replace in the main json file.

If I'm understanding you correctly, you don't need to create your own function (for example a recursive function) that iterates over the JSON data.

The good news is that it's possible to achieve this when loading the JSON string to a Python object itself, through the use of the object_pairs_hook parameter. When you define a callable for this parameter, it will be passed in a list of tuples, where each tuple is a key-value pair from the JSON data. So you will only need to replace all keys in the input data that you receive.

Here is a somewhat contrived example, that wraps all JSON keys (nested or otherwise) with exclamation marks !! around them:

import json


json_string = r"""
{
  "version": "0",
  "id": "ffd8a756-9fe6-fa54-af4e-cf85fa3d2896",
  "detail-type": "Security Hub Findings - Imported",
  "source": "aws.securityhub",
  "account": "220307202362",
  "time": "2021-10-17T14:26:25Z",
  "region": "us-west-2",
  "resources": [
    "arn:aws:securityhub:us-west-2::product/aws/securityhub/arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f"
  ],
  "detail": {
    "findings": [
      {
        "ProductArn": "arn:aws:securityhub:us-west-2::product/aws/securityhub",
        "Types": [
          "Software and Configuration Checks/Industry and Regulatory Standards/PCI-DSS"
        ],
        "Description": "This control checks for the CloudWatch metric filters using the following pattern { $.userIdentity.type = \"Root\" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != \"AwsServiceEvent\" } It checks that the log group name is configured for use with active multi-region CloudTrail, that there is at least one Event Selector for a Trail with IncludeManagementEvents set to true and ReadWriteType set to All, and that there is at least one active subscriber to an SNS topic associated with the alarm.",
        "Compliance": {
          "Status": "FAILED",
          "StatusReasons": [
            {
              "Description": "Multi region CloudTrail with the required configuration does not exist in the account",
              "ReasonCode": "CLOUDTRAIL_MULTI_REGION_NOT_PRESENT"
            }
          ],
          "RelatedRequirements": [
            "PCI DSS 7.2.1"
          ]
        },
        "ProductName": "Security Hub",
        "FirstObservedAt": "2021-10-17T14:26:18.383Z",
        "CreatedAt": "2021-10-17T14:26:18.383Z",
        "LastObservedAt": "2021-10-17T14:26:21.346Z",
        "CompanyName": "AWS",
        "FindingProviderFields": {
          "Types": [
            "Software and Configuration Checks/Industry and Regulatory Standards/PCI-DSS"
          ],
          "Severity": {
            "Normalized": 40,
            "Label": "MEDIUM",
            "Product": 40,
            "Original": "MEDIUM"
          }
        },
        "ProductFields": {
          "StandardsArn": "arn:aws:securityhub:::standards/pci-dss/v/3.2.1",
          "StandardsSubscriptionArn": "arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1",
          "ControlId": "PCI.CW.1",
          "RecommendationUrl": "https://docs.aws.amazon.com/console/securityhub/PCI.CW.1/remediation",
          "StandardsControlArn": "arn:aws:securityhub:us-west-2:220307202362:control/pci-dss/v/3.2.1/PCI.CW.1",
          "aws/securityhub/ProductName": "Security Hub",
          "aws/securityhub/CompanyName": "AWS",
          "aws/securityhub/annotation": "Multi region CloudTrail with the required configuration does not exist in the account",
          "Resources:0/Id": "arn:aws:iam::220307202362:root",
          "aws/securityhub/FindingId": "arn:aws:securityhub:us-west-2::product/aws/securityhub/arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f"
        },
        "Remediation": {
          "Recommendation": {
            "Text": "For directions on how to fix this issue, consult the AWS Security Hub PCI DSS documentation.",
            "Url": "https://docs.aws.amazon.com/console/securityhub/PCI.CW.1/remediation"
          }
        },
        "SchemaVersion": "2018-10-08",
        "GeneratorId": "pci-dss/v/3.2.1/PCI.CW.1",
        "RecordState": "ACTIVE",
        "Title": "PCI.CW.1 A log metric filter and alarm should exist for usage of the \"root\" user",
        "Workflow": {
          "Status": "NEW"
        },
        "Severity": {
          "Normalized": 40,
          "Label": "MEDIUM",
          "Product": 40,
          "Original": "MEDIUM"
        },
        "UpdatedAt": "2021-10-17T14:26:18.383Z",
        "WorkflowState": "NEW",
        "AwsAccountId": "220307202362",
        "Region": "us-west-2",
        "Id": "arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f",
        "Resources": [
          {
            "Partition": "aws",
            "Type": "AwsAccount",
            "Region": "us-west-2",
            "Id": "AWS::::Account:220307202362"
          }
        ]
      }
    ]
  }
}
"""


def clean_keys(o):
    return {f'!!{k}!!': v for k, v in o}


r = json.loads(json_string, object_pairs_hook=clean_keys)
print(r)

Result object:

{'!!version!!': '0', '!!id!!': 'ffd8a756-9fe6-fa54-af4e-cf85fa3d2896', '!!detail-type!!': 'Security Hub Findings - Imported', '!!source!!': 'aws.securityhub', '!!account!!': '220307202362', '!!time!!': '2021-10-17T14:26:25Z', '!!region!!': 'us-west-2', '!!resources!!': ['arn:aws:securityhub:us-west-2::product/aws/securityhub/arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f'], '!!detail!!': {'!!findings!!': [{'!!ProductArn!!': 'arn:aws:securityhub:us-west-2::product/aws/securityhub', '!!Types!!': ['Software and Configuration Checks/Industry and Regulatory Standards/PCI-DSS'], '!!Description!!': 'This control checks for the CloudWatch metric filters using the following pattern { $.userIdentity.type = "Root" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != "AwsServiceEvent" } It checks that the log group name is configured for use with active multi-region CloudTrail, that there is at least one Event Selector for a Trail with IncludeManagementEvents set to true and ReadWriteType set to All, and that there is at least one active subscriber to an SNS topic associated with the alarm.', '!!Compliance!!': {'!!Status!!': 'FAILED', '!!StatusReasons!!': [{'!!Description!!': 'Multi region CloudTrail with the required configuration does not exist in the account', '!!ReasonCode!!': 'CLOUDTRAIL_MULTI_REGION_NOT_PRESENT'}], '!!RelatedRequirements!!': ['PCI DSS 7.2.1']}, '!!ProductName!!': 'Security Hub', '!!FirstObservedAt!!': '2021-10-17T14:26:18.383Z', '!!CreatedAt!!': '2021-10-17T14:26:18.383Z', '!!LastObservedAt!!': '2021-10-17T14:26:21.346Z', '!!CompanyName!!': 'AWS', '!!FindingProviderFields!!': {'!!Types!!': ['Software and Configuration Checks/Industry and Regulatory Standards/PCI-DSS'], '!!Severity!!': {'!!Normalized!!': 40, '!!Label!!': 'MEDIUM', '!!Product!!': 40, '!!Original!!': 'MEDIUM'}}, '!!ProductFields!!': {'!!StandardsArn!!': 'arn:aws:securityhub:::standards/pci-dss/v/3.2.1', '!!StandardsSubscriptionArn!!': 'arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1', '!!ControlId!!': 'PCI.CW.1', '!!RecommendationUrl!!': 'https://docs.aws.amazon.com/console/securityhub/PCI.CW.1/remediation', '!!StandardsControlArn!!': 'arn:aws:securityhub:us-west-2:220307202362:control/pci-dss/v/3.2.1/PCI.CW.1', '!!aws/securityhub/ProductName!!': 'Security Hub', '!!aws/securityhub/CompanyName!!': 'AWS', '!!aws/securityhub/annotation!!': 'Multi region CloudTrail with the required configuration does not exist in the account', '!!Resources:0/Id!!': 'arn:aws:iam::220307202362:root', '!!aws/securityhub/FindingId!!': 'arn:aws:securityhub:us-west-2::product/aws/securityhub/arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f'}, '!!Remediation!!': {'!!Recommendation!!': {'!!Text!!': 'For directions on how to fix this issue, consult the AWS Security Hub PCI DSS documentation.', '!!Url!!': 'https://docs.aws.amazon.com/console/securityhub/PCI.CW.1/remediation'}}, '!!SchemaVersion!!': '2018-10-08', '!!GeneratorId!!': 'pci-dss/v/3.2.1/PCI.CW.1', '!!RecordState!!': 'ACTIVE', '!!Title!!': 'PCI.CW.1 A log metric filter and alarm should exist for usage of the "root" user', '!!Workflow!!': {'!!Status!!': 'NEW'}, '!!Severity!!': {'!!Normalized!!': 40, '!!Label!!': 'MEDIUM', '!!Product!!': 40, '!!Original!!': 'MEDIUM'}, '!!UpdatedAt!!': '2021-10-17T14:26:18.383Z', '!!WorkflowState!!': 'NEW', '!!AwsAccountId!!': '220307202362', '!!Region!!': 'us-west-2', '!!Id!!': 'arn:aws:securityhub:us-west-2:220307202362:subscription/pci-dss/v/3.2.1/PCI.CW.1/finding/b5a325b7-eab1-439f-b14d-1dc52c3a423f', '!!Resources!!': [{'!!Partition!!': 'aws', '!!Type!!': 'AwsAccount', '!!Region!!': 'us-west-2', '!!Id!!': 'AWS::::Account:220307202362'}]}]}}

Edit: Using the strip_punctuation function provided in the question, the clean_keys function would then be defined as follows:

def clean_keys(o):
    return {strip_punctuation(k): v for k, v in o}
Sign up to request clarification or add additional context in comments.

7 Comments

This is great. Can you throw more light on clean_keys(o). I have some keys in json which needs to be renamed. Those keys are "aws/securityhub/ProductName" , "Resources:0/Id" which should become like "awssecurityhubProductName" , "Resources0Id"
Right, I think rather than renaming in this case, it makes sense to strip out all the punctuation from all the keys, I believe that should still give the desired result.
I am new to python but i have such strip function , please let me know how i can use the same in above example. Posting above for your reference
Assuming you have such a function - for example, a function named strip_key that takes a string as a parameter and returns the "cleaned" string with the punctuation removed - In the above example where I have it as f'!!{k}!!', you would just need to replace it like strip_key(k).
Please see my function above and help me how to put it together
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.