How to remove a character within a certain identifiable string using perl regex?

Question

First I need to identify a particular string that looks like this:

my $removeU8374 = 'test A dkdkd荴kdklsl skldsls荴lksdkdk skdkd荴kdkls';

I then want to remove a particular character (U+8374) from the string.

So far I've tried this:

$removeU8374 = ~ s/^test A (.*[^\N U+8374])//g;

But it's not working...

There cannot be a space in the middle of =~. It's one operator.
– ikegami, commented Dec 1, 2013 at 19:49

ikegami · Accepted Answer · 2013-12-01 19:44:59Z

3

All you need is

$removeU8374 =~ s/\N{U+8374}//g;

or

$removeU8374 =~ s/\x{8374}//g;

If that doesn't work, it's because $removeU8374 doesn't actually contain U+8374. You can see what it actually contains using

use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper($removeU8374));
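
One common way for the string to lack U+8374 is that it still holds undecoded UTF-8 bytes, in which case the character is stored as the three bytes E8 8D B4. The following is a minimal sketch under that assumption; the sample data and the variable names ($bytes, $chars, $cleaned) are illustrative and not from the question.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the data arrived as raw UTF-8 bytes (for example, read from a
# file without a decoding layer), so U+8374 is stored as E8 8D B4.
my $bytes = "skdkd\xE8\x8D\xB4kdkls";

# On undecoded bytes the substitution finds nothing to remove.
(my $unchanged = $bytes) =~ s/\N{U+8374}//g;

# After decoding, U+8374 is a single character and the substitution matches.
my $chars = decode('UTF-8', $bytes);
(my $cleaned = $chars) =~ s/\N{U+8374}//g;

# Lengths before decoding, after decoding, and after the removal.
print join(' ', length($bytes), length($chars), length($cleaned)), "\n";
```

If the three lengths differ as expected (bytes longer than characters, and the cleaned string one character shorter), the original problem was decoding rather than the regex.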

Demonstration:

use utf8;                               # Source file is encoded using UTF-8
use open ':std', ':encoding(UTF-8)';    # Terminal expects UTF-8.

my $removeU8374 = "test A dkdkd荴kdklsl skldsls荴lksdkdk skdkd荴kdkls";
$removeU8374 =~ s/\N{U+8374}//g;
print("$removeU8374\n");

answered Dec 1, 2013 at 19:44


Olaf Dietsche · Answer · 2013-12-01 19:17Z (edited by ikegami 19:48:53Z)

1

To remove a character with a regular expression, you can capture the parts before and after it and put them back together. According to the documentation on escape sequences, a Unicode character is matched by code point with \N{U+8374}:

$removeU8374 =~ s/^(test A .*)\N{U+8374}(.*)/$1$2/;

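This captures test A ... up to U+8374 and everything after it, then concatenates the two captures. Because .* is greedy and there is no /g modifier, only the last occurrence of U+8374 in the string is removed.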
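
If the goal is to strip every occurrence, but only from strings that start with test A, one option is to check the prefix separately and then run the global substitution. This is a sketch, not the answer's own method; the @lines array and the "test B" entry are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: one string with the identifying prefix, one without.
my @lines = (
    "test A dkdkd\x{8374}kdklsl skldsls\x{8374}lksdkdk",
    "test B dkdkd\x{8374}kdklsl",
);

for my $line (@lines) {
    # Only touch strings that start with the identifying prefix.
    next unless $line =~ /^test A /;

    # /g removes every occurrence of U+8374, not just one.
    $line =~ s/\N{U+8374}//g;
}

# Report how many U+8374 characters remain in each string.
for my $line (@lines) {
    my $left = () = $line =~ /\x{8374}/g;
    print "$left remaining\n";
}
```

Splitting the test from the substitution keeps each regex simple and avoids relying on backtracking to find the character.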

edited Dec 1, 2013 at 19:48

ikegami


answered Dec 1, 2013 at 19:17

Olaf Dietsche


2 Comments

ikegami Over a year ago

Better: $removeU8374 =~ s/^(test A .*)\N{U+8374}/$1/;

ikegami Over a year ago

Even better: $removeU8374 =~ s/^test A .*\K\N{U+8374}//;
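
The three variants in this answer and its comments should behave the same way on the sample data. The sketch below runs them side by side on a shortened, illustrative string; the variable names are assumptions, and \K requires Perl 5.10 or later.

```perl
use strict;
use warnings;

my $original = "test A dkdkd\x{8374}kdklsl skldsls\x{8374}lksdkdk";

# Capture both sides and rejoin them.
(my $two_captures = $original) =~ s/^(test A .*)\N{U+8374}(.*)/$1$2/;

# Capture only the left side; the unmatched tail is left in place anyway.
(my $one_capture = $original) =~ s/^(test A .*)\N{U+8374}/$1/;

# \K keeps everything matched so far, so only U+8374 is replaced.
(my $keep_left = $original) =~ s/^test A .*\K\N{U+8374}//;

# Each variant removes only the last occurrence, because .* is greedy.
my $expected = "test A dkdkd\x{8374}kdklsl skldslslksdkdk";
print "all three agree\n"
    if  $two_captures eq $expected
    and $one_capture  eq $expected
    and $keep_left    eq $expected;
```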
