
First I need to identify a particular string that looks like this:

my $removeU8374 = 'test A dkdkd荴kdklsl skldsls荴lksdkdk skdkd荴kdkls';

I then want to remove a particular character (U+8374) from the string.

So far I've tried this:

$removeU8374 = ~ s/^test A (.*[^\N U+8374])//g;

But it's not working...

Comment: There cannot be a space in the middle of =~; it's one operator. (Commented Dec 1, 2013 at 19:49)
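To see why the space matters, here is a minimal sketch; the variable name $text and the sample values are made up for illustration. With the space, Perl reads "$x = ~ s///" as a plain assignment of ~ (bitwise NOT) applied to the result of an s/// that runs against $_, not against $x.

```perl
use strict;
use warnings;

my $text = "test A dkdkd";   # the string we meant to edit (hypothetical)
$_ = "zzz";                  # s/// with no =~ binds to $_ instead

# With a space, "= ~" is two operators: assignment and bitwise NOT.
# s/z//g runs on $_ and returns the substitution count (3);
# ~3 is then assigned to $text, overwriting it.
$text = ~ s/z//g;

print "text is now: $text\n";   # a large integer, not the edited string
print "\$_ is now: '$_'\n";     # the z's were removed from $_
```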

2 Answers


All you need is

$removeU8374 =~ s/\N{U+8374}//g;

or

$removeU8374 =~ s/\x{8374}//g;
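A small self-contained sketch showing that the two forms behave the same on the question's sample string (built here with \x{8374} escapes so the source file's encoding doesn't matter). The tr///d line is an alternative that the answer does not mention; it also deletes every occurrence and returns how many characters it removed.

```perl
use strict;
use warnings;

# Sample string from the question, written with escapes.
my $orig = "test A dkdkd\x{8374}kdklsl skldsls\x{8374}lksdkdk skdkd\x{8374}kdkls";

(my $by_name = $orig) =~ s/\N{U+8374}//g;   # \N{U+...} names the code point
(my $by_hex  = $orig) =~ s/\x{8374}//g;     # \x{...} gives the same character

# Alternative (not from the answer): transliteration with /d deletes
# every listed character and returns the number of characters removed.
my $by_tr   = $orig;
my $removed = ($by_tr =~ tr/\x{8374}//d);

print "$by_name\n";                   # test A dkdkdkdklsl skldslslksdkdk skdkdkdkls
print "removed $removed characters\n";
```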

If that doesn't work, it's because $removeU8374 doesn't actually contain U+8374. You can see what it actually contains using

use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper($removeU8374));
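One common way the string can fail to contain U+8374 is that it holds the character's UTF-8 encoding (the three bytes E8 8D B4) instead of the decoded character, for example when the source file lacks use utf8 or the input was read without a decoding layer. A minimal sketch of that case, assuming the data is UTF-8; Encode::decode turns the bytes into characters:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Data::Dumper;
local $Data::Dumper::Useqq = 1;

# U+8374 encoded as UTF-8 bytes: what an undecoded string actually holds.
my $bytes = "abc\xE8\x8D\xB4def";

(my $still_there = $bytes) =~ s/\x{8374}//g;   # no match: there is no such character
print Dumper($still_there);                     # the three raw bytes are still there

my $chars = decode('UTF-8', $bytes);            # now a single character, U+8374
(my $fixed = $chars) =~ s/\x{8374}//g;
print Dumper($fixed);                           # the character is gone
```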

Demonstration:

use utf8;                               # Source file is encoded using UTF-8
use open ':std', ':encoding(UTF-8)';    # Terminal expects UTF-8.

my $removeU8374 = "test A dkdkd荴kdklsl skldsls荴lksdkdk skdkd荴kdkls";
$removeU8374 =~ s/\N{U+8374}//g;
print("$removeU8374\n");

To remove a character with a regular expression, you can capture the parts before and after it and join them. According to ESCAPE SEQUENCES, a Unicode character is matched with \N{U+8374}:

$removeU8374 =~ s/^(test A .*)\N{U+8374}(.*)/$1$2/;

This captures everything from test A up to a U+8374, plus everything after it, and concatenates the two parts.
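A quick sketch to check this on the question's sample string (written with \x{8374} escapes). Because the first .* is greedy and the pattern is anchored with ^, the single substitution removes only the last U+8374 and leaves the earlier ones in place:

```perl
use strict;
use warnings;

my $s = "test A dkdkd\x{8374}kdklsl skldsls\x{8374}lksdkdk skdkd\x{8374}kdkls";

# The answer's substitution: greedy (.*) backtracks to the LAST U+8374.
$s =~ s/^(test A .*)\N{U+8374}(.*)/$1$2/;

my $left = () = $s =~ /\x{8374}/g;   # count remaining occurrences
print "remaining: $left\n";          # prints "remaining: 2"
```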

2 Comments

Better: $removeU8374 =~ s/^(test A .*)\N{U+8374}/$1/;
Even better: $removeU8374 =~ s/^test A .*\K\N{U+8374}//;
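If the goal is to remove every U+8374, but only from strings that start with test A (the "identify a particular string" part of the question), one option is to test the prefix first and then substitute globally. A hedged sketch; the helper name strip_u8374 is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: delete all U+8374 characters, but only when the
# string begins with "test A"; other strings are returned unchanged.
sub strip_u8374 {
    my ($str) = @_;
    $str =~ s/\N{U+8374}//g if $str =~ /^test A /;
    return $str;
}

my $hit  = strip_u8374("test A dkdkd\x{8374}kdklsl skldsls\x{8374}lksdkdk");
my $miss = strip_u8374("other dkdkd\x{8374}kdklsl");

print "$hit\n";    # test A dkdkdkdklsl skldslslksdkdk
```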
