1

I'm trying to create a step in a MongoDB aggregation pipeline which will replace the value of a full Social Security Number (like "123-45-6789") with a masked value, preserving the last four digits (like "XXX-XX-6789").

I see $replaceAll and $replaceOne but it looks like those only accept string literals, not regex expressions. I see there is a regex find that looks like it can very clumsily fit into a replace stage:

"maskedSsn" : {
    "$replaceOne" : { 
        "input" : "$ssn",
        "find" : { 
            "$regexFind" : { 
                "input" : "$ssn", 
                "regex" : "\\d{3}-\\d{2}\\-"
            }
        },
        "replacement" : "XXX-XX-"
    }
}

But with this I get the error "$replaceOne requires that 'find' be a string, found: {match: \"000-00-\", idx: 0, captures: []} (the SSN is mocked, random data here). I'm not sure how to just get the "match" out of the resultant object as part of this step. I could add this regex match as a different field, then get the match in this stage of my aggregation for a simple string replacement, but I would prefer to keep all the logic in one step if possible.

It's hard for me to believe that this isn't possible in one step, with something like a regex replace (which is a feature of every language I've used). Is there a functionality that I'm missing?

1
  • This might seem tedious, but it works for your case by splitting with - and masking on all tokens except the last one. After the masking, the result is rejoined to form the masked SSN Commented Oct 30, 2023 at 22:26

1 Answer 1

3

There is no generalized regexReplace operator in MongoDB.

You could kind of get where you want using $regexFind, like

{$regexFind: {
   input:"$ssn",
   regex:"^\\d{3}-\\d{2}-(\\d{4})$"
}}

That would return an object similar to:

{ 
  match: "123-45-6789",
  idx: 0
  captures: ["6789"]
}

You could use $getField, $arrayElemAt and $concat to extract the captures part, then do what you need with that.

db.collection.aggregate([
  {$addFields: {
      maskedssn: {
        $concat: [
          "XXX-XX-",
          {$arrayElemAt: [
              {$getField: {
                  field: "captures",
                  input: {$regexFind: {
                      input: "$ssn",
                      regex: "^\\d{3}-\\d{2}-(\\d{4})$"
                    }}
              }},
              0
          ]}
        ]
      }
 }}
])

Playground

Sign up to request clarification or add additional context in comments.

3 Comments

Thank you for clarifying, I was really hoping I was just missing something but your answer is a great way to do this. Rather than create the field through multiple aggregation steps, I realized (because of your great answer) that I can just concat a string with a substring which is much simpler for my case. Thanks so much!
But is there actually a way to use the regex in a replaceAll as the question asks. This should be a simple operation if possible in mongodb. Lets say I have some fields that may have some non-numeric characters and I want to group documents by a field key that may have special characters. in this case I need to clean the value to alphanumeric then use it to group by. Each document has this field, a find is not going to work here, a simple regex would be a one liner instead of a workaround.
replaceAll is covered here: stackoverflow.com/questions/72685713/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.