I've got an XML file that includes an email address as part of each record. I'd like to obscure the email addresses (for privacy) but keep their "uniqueness", so that records (purchases, in this case) can still be combined when more than one comes from the same email address.

I figured there might be a way to use a regex to replace the characters before and after the "@" with * or similar. My thinking is that masking 3 or 4 characters on each side preserves privacy and (for the most part) keeps the "uniqueness".

Any suggestions on the best way to do this, including completely different options from what I'm thinking?

Thanks.

  • You're probably planning on doing this in a particular language. You might want to specify. Commented Oct 30, 2013 at 2:01
  • I'm fairly open, but good point. I can just use a text editor, but I'm willing to give it a shot in JavaScript or PHP. Commented Oct 30, 2013 at 2:06

1 Answer

The regex would look something like this: ([^@]{1,4})@(.{1,4}), which captures up to 4 characters before and up to 4 characters after the @.
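If you end up scripting it, a minimal sketch in Python could look like the one below. It is only an illustration: the function name mask_email_edges and the sample record are made up, and the pattern is tightened slightly from the one above (it also excludes <, > and whitespace) on the assumption that the addresses sit directly inside XML tags.

```python
import re

# Up to 4 characters on each side of the "@". "<", ">" and whitespace are
# excluded so a match cannot spill into surrounding XML tags (an assumption
# about how the file is laid out).
EMAIL_EDGE = re.compile(r"([^@<>\s]{1,4})@([^@<>\s]{1,4})")

def mask_email_edges(text):
    """Replace the captured characters around each "@" with asterisks."""
    def _mask(match):
        before, after = match.group(1), match.group(2)
        return "*" * len(before) + "@" + "*" * len(after)
    return EMAIL_EDGE.sub(_mask, text)

# Hypothetical record, for illustration only.
sample = "<email>john.smith@example.com</email>"
print(mask_email_edges(sample))
# prints: <email>john.s****@****ple.com</email>
```

Because the replacement is built from the length of each captured group, addresses with fewer than 4 characters before the @ are masked as far as they go, in a single pass.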

How you do the replacements depends on your language and on how you are loading the file. If you are just doing this once in a text editor like UltraEdit, rather than in the middle of a program, then I would do something like this:

Replace all [^@>]@[^<] with *@*
Replace all [^@>]{2}@[^<]{2} with **@**
Replace all [^@>]{3}@[^<]{3} with ***@***
Replace all [^@>]{4}@[^<]{4} with ****@****

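Run in that order, each pass widens the mask by one character on each side, so short email addresses still get partially masked. (The character classes exclude > and < so the matches don't run into your XML tags.)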
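The same four passes can be tried out in a script before running them on the real file. This is a rough sketch in Python. The name mask_in_passes and the sample records are hypothetical, and the patterns are copied from the list above.

```python
import re

# The four editor replacements, applied in order. Each pass matches the
# asterisks left by the previous one plus one more character on each side.
PASSES = [
    (r"[^@>]@[^<]", "*@*"),
    (r"[^@>]{2}@[^<]{2}", "**@**"),
    (r"[^@>]{3}@[^<]{3}", "***@***"),
    (r"[^@>]{4}@[^<]{4}", "****@****"),
]

def mask_in_passes(text):
    """Apply each replacement pass to the whole text, one after another."""
    for pattern, replacement in PASSES:
        text = re.sub(pattern, replacement, text)
    return text

# Hypothetical records, for illustration only.
print(mask_in_passes("<email>john.smith@example.com</email>"))
# prints: <email>john.s****@****ple.com</email>
print(mask_in_passes("<email>ab@cd.com</email>"))
# prints: <email>**@**.com</email>
```

One thing to be aware of: each pass needs the same number of characters on both sides of the @, so an address with a one-character local part, such as a@example.com, comes out as *@*xample.com, with only one character of the domain masked.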
