
I have many names in various formats.

Input:

Depsai P.R.N.
Dênis De Castro
John D.J. 
Andrew E.
D.J. JOHN 
JOHN Mical D.J.

I need output like this:

D. P.R.N.
D. C.
J. D.J. 
A. E.
D.J. J.
J. M. D.J.

If the name is like Dênis De Castro, I need the output D. C. That is, if a name contains one of these particles (De|Di|Le|La|Van|Der) between other words, no initial should be captured for that particle.
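A minimal sketch of the requested transformation, written word by word rather than as a single regex. Several points are assumptions, not part of the question: words are separated by whitespace; a word that already contains a dot (such as P.R.N. or D.J.) is kept unchanged; a particle is dropped only when it is neither the first nor the last word; and the numeric entities in the input (such as &#x00EA;) are decoded with Python's html.unescape first. The function name initials is hypothetical.

```python
import html

# Particles to skip when they sit between other words (list from the question)
PARTICLES = {"De", "Di", "Le", "La", "Van", "Der"}


def initials(name: str) -> str:
    """Abbreviate each word to its first letter plus a dot (sketch)."""
    # Decode entities such as &#x00EA; so "D&#x00EA;nis" becomes "Dênis"
    words = html.unescape(name).split()
    out = []
    for i, word in enumerate(words):
        inner = 0 < i < len(words) - 1  # assumption: "in between" = not first/last
        if inner and word in PARTICLES:
            continue  # e.g. skip "De" in "Dênis De Castro"
        if "." in word:
            out.append(word)  # already initials, e.g. "P.R.N." or "D.J."
        else:
            out.append(word[0] + ".")  # "JOHN" -> "J.", "Castro" -> "C."
    return " ".join(out)


for n in ["Depsai P.R.N.", "D&#x00EA;nis De Castro", "John D.J.",
          "Andrew E.", "D.J. JOHN", "JOHN Mical D.J."]:
    print(initials(n))
```

On the six sample names this prints the six lines of the expected output (without trailing spaces). What should happen with several inner words, such as a name like Dênis John La Castro, is not settled by the question, so treat the "inner" rule as a placeholder.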

    use strict;
    use warnings;

    my $gn = qq(<name>Depsai P.R.N.</name>
                <name>D&#x00EA;nis De Castro</name>
                <name>Andrew E.</name>
                <name>John D.J.</name>
                <name>D.J. John</name>
                <name>John Mical D.J.</name>);

    # Collect the text of every <name> element
    my @int = $gn =~ m{<name>(.*?)</name>}ig;

    foreach my $initial (@int) {
        my $ini = '';    # reset the initials for each name
        # Strip one word at a time, keeping its first capital letter plus a dot
        $ini .= "$1. " while $initial =~ s/(?:^|[ .,;]+)([A-Z])\w*(\b|$)//s;
        $ini =~ s/ $//;  # drop the trailing space
        print "$ini\n";
    }
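To see why this attempt does not keep compound initials together, here is a rough Python transcription of its substitution loop. It assumes Python's re treats this pattern the same way Perl does (it uses only basic constructs), and the names WORD and attempt are hypothetical. Because the dot is in the separator class [ .,;], P.R.N. is consumed one letter at a time.

```python
import re

# Same pattern as the Perl attempt: start of string or a run of separators,
# then a capital letter, then the rest of the word
WORD = re.compile(r'(?:^|[ .,;]+)([A-Z])\w*(\b|$)')


def attempt(name):
    ini = ''
    while True:
        m = WORD.search(name)
        if m is None:
            break
        ini += m.group(1) + '. '
        # Like Perl's s/// without /g, remove only the first match each pass
        name = name[:m.start()] + name[m.end():]
    # Like s/ $//: drop a single trailing space
    return ini[:-1] if ini.endswith(' ') else ini


print(attempt("Depsai P.R.N."))  # D. P. R. N.
print(attempt("D.J. John"))      # D. J. J.
```

Both results split the existing initials apart, which is the gap between this attempt and the expected output.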

  Please suggest a regex pattern.
  Thanks in advance.
    Removing the lowercase letters will give you the desired output. Commented Nov 4, 2014 at 4:43
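A quick way to check the suggestion in the comment above, reading "removing" as replacing each run of lowercase letters with a single dot (the same replacement suggested later in the thread). This is a sketch in Python's re; the function name drop_lowercase is hypothetical.

```python
import re


def drop_lowercase(name: str) -> str:
    # Replace each run of ASCII lowercase letters with one dot
    return re.sub(r"[a-z]+", ".", name)


for n in ["Depsai P.R.N.", "John D.J.", "Andrew E.", "D.J. JOHN", "JOHN Mical D.J."]:
    print(drop_lowercase(n))
```

This covers names in normal capitalization, but an all-caps word such as JOHN has no lowercase letters and passes through unchanged, so D.J. JOHN and JOHN Mical D.J. still need extra handling (as do particles and non-ASCII letters).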

2 Answers


You can try the one-liner below:

InputFile:

<name>Depsai P.R.N.</name>
<name>D&#x00EA;nis De Castro</name>
<name>John D.J.</name> 
<name>Andrew E.</name>
<name>D.J. JOHN</name> 
<name>JOHN Mical D.J.</name>
<name>Roc&#x00ED;o</name>

At the Windows cmd prompt:

perl -lne "if($_ =~ /<name(>.*?<)\/name>/) {$result = $1; $result =~ s/(\s)(De|Di|Le|La|Van|Der)(\s)/$1$3/g; $result =~ s/((?:>|\s)[A-Z])[^\.]/$1\./g; $result =~ s/.*?(\s*[A-Z]\.\s*).*?/$1/g;$result =~ s/([a-z]|[A-Z][A-Z]).*?<//g;$result =~ s/<//g;print $result;}" InputFile

On Unix:

perl -lne 'if($_ =~ /<name(>.*?<)\/name>/) {$result = $1; $result =~ s/(\s)(De|Di|Le|La|Van|Der)(\s)/$1$3/g; $result =~ s/((?:>|\s)[A-Z])[^\.]/$1\./g; $result =~ s/.*?(\s*[A-Z]\.\s*).*?/$1/g;$result =~ s/([a-z]|[A-Z][A-Z]).*?<//g;$result =~ s/<//g;print $result;}' InputFile
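To step through what the one-liner does, here is a Python transcription of its five substitutions, one per line. It assumes Python's re handles these particular patterns the same way Perl does (they use only basic constructs), and the function name one_liner is hypothetical. One detail this port shows: on the Dênis line, step 1 keeps the whitespace on both sides of the removed particle, so the result is D. followed by two spaces and C.

```python
import re


def one_liner(line):
    """Apply the one-liner's substitutions to a single input line."""
    m = re.search(r'<name(>.*?<)/name>', line)
    if m is None:
        return None
    result = m.group(1)  # keeps the '>' and '<' around the name
    # 1. Drop a particle that sits between two whitespace characters
    result = re.sub(r'(\s)(De|Di|Le|La|Van|Der)(\s)', r'\1\3', result)
    # 2. After '>' or whitespace, turn "Capital + non-dot char" into "Capital."
    result = re.sub(r'((?:>|\s)[A-Z])[^.]', r'\1.', result)
    # 3. Replace each stretch ending in an "X." initial with just that initial
    result = re.sub(r'.*?(\s*[A-Z]\.\s*).*?', r'\1', result)
    # 4. Delete leftovers: from a lowercase letter or two capitals up to '<'
    result = re.sub(r'([a-z]|[A-Z][A-Z]).*?<', '', result)
    # 5. Remove any remaining '<'
    return result.replace('<', '')


for line in ["<name>Depsai P.R.N.</name>", "<name>D.J. JOHN</name>",
             "<name>JOHN Mical D.J.</name>", "<name>Roc&#x00ED;o</name>"]:
    print(one_liner(line))
```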

Output:

D. P.R.N.
D. C.
J. D.J. 
A. E.
D.J. J.
J. M. D.J.
R.

2 Comments

You said the particles (De|Di|Le|La|Van|Der) should not be captured when they appear in between. What is the expected output for D&#x00EA;nis Van Castro, and for D&#x00EA;nis John La Castro?
@Praveen This case is not working: <name>Roc&#x00ED;o</name> should come out as <name>R.</name>.
(?<=[a-zA-Z])[a-zA-Z]+

You can try this. Replace the matches with a dot (.). See the demo:

http://regex101.com/r/bB8jY7/12

import re

# A run of letters that directly follows another letter (Python 3 syntax)
p = re.compile(r'(?<=[a-zA-Z])[a-zA-Z]+')
test_str = "Depsai P.R.N. \nJohn D.J. \nAndrew E."
subst = "."

result = re.sub(p, subst, test_str)
print(result)

6 Comments

I did not downvote. I have edited the question now. I need a space between initials only where the original name has one; otherwise no space is needed. I am working in Perl, and the lookbehind regex does not work for me: it gives an error saying lookbehind is not supported.
@depsai Try now. See the demo.
Your code works on regex101.com and in RegexBuddy, but in my Perl program the lookbehind regex does not work. Please give a regex that does not use ?<=.
@depsai Try [a-z]+ and replace it with a dot (.).
Thanks, it works. I used ([A-Z])([a-zA-Z]+) and replaced it with "$1.".
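The lookbehind-free pattern from the last comment, transcribed to Python as a sketch (\1 in the replacement plays the role of Perl's $1; the function name abbreviate is hypothetical). Instead of asserting that a letter precedes the match, it captures the capital letter and writes it back followed by a dot.

```python
import re


def abbreviate(name: str) -> str:
    # A capital letter followed by one or more ASCII letters -> capital + "."
    return re.sub(r"([A-Z])([a-zA-Z]+)", r"\1.", name)


for n in ["Depsai P.R.N.", "John D.J.", "Andrew E.", "D.J. JOHN", "JOHN Mical D.J."]:
    print(abbreviate(n))

print(abbreviate("Dênis De Castro"))  # limitation: particle and non-ASCII letter
```

Unlike the lowercase-only replacement, this also shortens all-caps words such as JOHN. It still abbreviates the particle De instead of skipping it, and it leaves Dênis untouched because ê is not in [a-zA-Z].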