Using Notepad++ Regex to format phone numbers

Question

I'm trying to format phone numbers in a large CSV directory. The file changes over time, so I will need to re-format it periodically; a one-off solution won't do. I have used Notepad++'s regex replace feature successfully in the past and would like to use this tool if possible. However, I'm open to better/faster methods, including scripting in PowerShell, which I am familiar with.

Sample of number formats in the database:
XXX-XXXX
XXXXXXX
XXXXXXXXXX
1XXXXXXXXXX
(XXX) XXX-XXXX
1(XXX) XXX-XXXX
(1XXX) XXX-XXXX
XXX-XXX-XXXX

The last format is what I want every phone number to look like in the final output. For numbers lacking an area code, I would add a default value. For numbers with a leading country code, I would need to strip it.

Here are some of the regex searches I've used:
FIND: 1-(\d{3})-(\d{3})-(\d{4})
REPLACE: \1-\2-\3
This works!

FIND: 1\((\d{3})\)\s(\d{3})-(\d{4})
REPLACE: \1-\2-\3
This works!

FIND: (\d{11})
REPLACE: ???
This finds the correct string, but I don't know how to format the output.

FIND: (\d{3})-(\d{4})
REPLACE: XXX-\1-\2 (here the XXX is my standard area code that I will add)
This matches XXX-XXXX, but it also matches the tail of XXX-XXX-XXXX and ZIP codes with +4 appended (XXXXX-XXXX). I need to match only XXX-XXXX with nothing preceding it, and only in phone numbers. Because this is a CSV file, the character before each field is a comma.

My problem is twofold. 1) I don't know how to break up a matched string into the parts I need for the replacement. I need to take blocks of digits (7, 10 and 11 digits) and format them to fit the pattern XXX-XXX-XXXX. 2) I don't know how to match only the string I'm searching for (i.e. only XXX-XXXX).

Wiktor Stribiżew · Accepted Answer · 2016-09-14 10:31:17Z

4

Provided you have a sample list of numbers like

Current             Expected
---------------------------------
123-1234            XXX-123-1234
1234567             XXX-123-4567
1234567890          123-456-7890
10123456789         012-345-6789
(123) 456-1234      123-456-1234
1(123) 123-1234     123-123-1234
1-123-123-1234      123-123-1234
(1999) 999-1234     999-999-1234
123-123-1234        123-123-1234

You may use

Find What: ^(?:1-?)?(?|\(1?(\d{3})\)|(\d{3}))[-\s]?(\d{3})[-\s]?(\d{4})$|^(\d{3})[-\s]?(\d{4})$
Replace With: (?1$1-$2-$3:XXX-$4-$5)

Details:

^ - start of line
(?:1-?)? - an optional 1 followed by an optional -
(?|\(1?(\d{3})\)|(\d{3})) - a branch reset group (syntax is (?|...); groups inside its alternative branches share the same IDs) matching either:
- \(1?(\d{3})\) - ( + an optional 1 + Group 1 capturing 3 digits + )
- | - or
- (\d{3}) - Group 1 (still, because of the branch reset group) capturing 3 digits
[-\s]? - an optional - or whitespace
(\d{3}) - Group 2 capturing 3 digits
[-\s]? - an optional - or whitespace
(\d{4}) - Group 3 capturing 4 digits
$ - end of line
| - OR
^ - start of line
(\d{3}) - Group 4 capturing 3 digits
[-\s]? - an optional - or whitespace
(\d{4}) - Group 5 capturing 4 digits
$ - end of line

The replacement pattern:

(?1 - If Group 1 matched, then use
- $1-$2-$3 - Backreference to Group 1, 2 and 3 with hyphens in between
: - or else
XXX-$4-$5 - XXX (or whatever the default area code is), and Group 4 and 5 separated with a hyphen.
) - end of the if-then block.
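
For anyone scripting this instead, here is a minimal Python sketch of the same idea. It is not a drop-in copy of the Notepad++ pattern: Python's re module supports neither branch reset groups nor conditional replacement strings, so named groups and a small function stand in for them. The names PHONE and normalize, and the default area code 555, are assumptions made for illustration.

```python
import re

# Named groups replace the branch reset group; the if/else in normalize()
# replaces the conditional replacement string (?1...:...).
PHONE = re.compile(
    r"""
    ^(?:1-?)?                        # optional leading 1 and optional hyphen
    (?:\(1?(?P<area_paren>\d{3})\)   # area code in parentheses, optional 1 inside
      |(?P<area_plain>\d{3}))        # or a bare 3-digit area code
    [-\s]?(?P<exchange>\d{3})        # optional separator, then 3 digits
    [-\s]?(?P<line>\d{4})$           # optional separator, then 4 digits
    |                                # OR
    ^(?P<exchange7>\d{3})[-\s]?(?P<line7>\d{4})$   # a 7-digit number without area code
    """,
    re.VERBOSE,
)

DEFAULT_AREA = "555"  # hypothetical default area code; substitute your own


def normalize(number: str, default_area: str = DEFAULT_AREA) -> str:
    """Return number as XXX-XXX-XXXX, or unchanged if it is not recognized."""
    m = PHONE.match(number.strip())
    if m is None:
        return number
    if m.group("exchange7"):
        # Second alternative matched: prepend the default area code.
        return f"{default_area}-{m['exchange7']}-{m['line7']}"
    area = m["area_paren"] or m["area_plain"]
    return f"{area}-{m['exchange']}-{m['line']}"


for raw in ["123-1234", "10123456789", "(1999) 999-1234", "1-123-123-1234"]:
    print(raw, "->", normalize(raw))
```

Values that match neither alternative, such as a ZIP+4 code, are returned unchanged.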

edited Sep 14, 2016 at 10:31

answered Sep 14, 2016 at 10:25

Wiktor Stribiżew


1 Comment

Elliott Over a year ago

To change the format to (123) 456-7890 use this replace string (?1\($1\) $2-$3:XXX-$4-$5)

Aarjav · Accepted Answer · 2016-09-14 00:19:39Z

1

I'm not familiar with PowerShell, but yes, it would be a good idea to write a small script to do this for you.

For the Notepad++ approach, though, I'd try running the replace twice:

FIND: (?:^|,)(\d{3})[ -]?(\d{4})(?:,|$)

REPLACE: XXX-\1-\2 where the XXX is your input area code

FIND: \(?1?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})

REPLACE: \1-\2-\3

I don't think the order matters. Try it out in a test file first.

I'm not sure what you mean by your second question. Are the regexes selecting numbers from the wrong column in the CSV? (If so, that's another reason why a script would be better.)
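
As a rough illustration of the script route, here is a Python sketch that rewrites only one named column, so ZIP codes and other fields are never touched and the comma delimiters stay intact. It assumes the CSV has a header row and a column called phone; those names, the helper functions and the default area code 555 are all hypothetical. Instead of matching each format separately, it strips everything but digits and decides by length (7, 10 or 11 digits).

```python
import csv
import io
import re

DEFAULT_AREA = "555"  # hypothetical default area code; substitute your own


def normalize_phone(value: str, default_area: str = DEFAULT_AREA) -> str:
    """Format a 7-, 10- or 11-digit phone number as XXX-XXX-XXXX."""
    digits = re.sub(r"\D", "", value)        # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop the leading country code
    if len(digits) == 7:
        digits = default_area + digits       # add the default area code
    if len(digits) != 10:
        return value                         # unrecognized: leave unchanged
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_column(src, dst, column: str) -> None:
    """Copy CSV rows from src to dst, reformatting only the given column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row.get(column):
            row[column] = normalize_phone(row[column])
        writer.writerow(row)


# In-memory sample; with real files, pass objects opened with newline="".
sample = io.StringIO(
    "name,zip,phone\n"
    "Ann,12345-6789,1(123) 123-1234\n"
    "Bob,54321,123-1234\n"
)
out = io.StringIO()
normalize_column(sample, out, "phone")
print(out.getvalue())
```

Because the script can be re-run whenever the directory changes, it also covers the need to repeat the clean-up periodically.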

answered Sep 14, 2016 at 0:19

Aarjav
