I am parsing a CSV file and encounter special characters like á.

String line = scanner.nextLine();

Can anyone help me remove á and other corrupted characters from the string line? I tried the following:

line.replaceAll("[^a-zA-Z0-9]+","");

but it also removes symbols such as :, /, [ and ].

 inputStream = filePart.getInputStream();
 Scanner scanner = new Scanner(inputStream);
 while (scanner.hasNextLine()) {
     String line = scanner.nextLine();
     System.out.println("Line : " + line.trim());
     // Split on commas that fall outside double-quoted fields
     String[] fields = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
     for (int i = fields.length - 1; i >= 0; i--) {
         System.out.println(i + " " + fields[i].replaceAll("[á]", ""));
     }
 }
  • What do you mean by "special characters"? You need to list all the special characters in order to replace them. I don't think there is an easy way to do so. Commented Oct 25, 2017 at 4:53
  • Did you try writing and testing the regex against the input before using it in the program? Commented Oct 25, 2017 at 4:53
  • Show your input string. And what do you mean by "corrupted"? In decomposed (NFD) normalized Unicode, a diacritical is represented in its own code point, separate from the code point of the character it modifies. For historical reasons, Unicode adopted a few dozen characters in code points from 128 to 255 (Latin-1 Supplement) that combine a character plus diacritical into a single code point. The a with acute accent is one of those, at 225 (U+00E1). These two different ways to represent that character plus diacritical may be your issue (just a guess). Commented Oct 25, 2017 at 5:47
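
To illustrate the last comment, here is a minimal sketch of the two representations of á and one way to handle both. The class name AccentDemo and the helper stripAccents are made up for illustration. Note that this approach keeps the base letter (á becomes a) rather than deleting the character outright, which may or may not be what the question wants.

```java
import java.text.Normalizer;

class AccentDemo {
    // Hypothetical helper: decompose to NFD, then drop combining marks (Unicode category M).
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String precomposed = "habl\u00e1";  // á as the single code point U+00E1
        String decomposed = "habla\u0301";  // a followed by combining acute accent U+0301

        // The two strings render the same but are not equal as Java strings.
        System.out.println(precomposed.equals(decomposed)); // false

        // A character class containing only U+00E1 does not touch the decomposed form.
        System.out.println(decomposed.replaceAll("[\\u00e1]", "").length()); // 6

        // Normalizing first handles both representations.
        System.out.println(stripAccents(precomposed)); // habla
        System.out.println(stripAccents(decomposed));  // habla
    }
}
```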

3 Answers

2

Why not just replace a positive character class containing the accented character(s):

String input = "hablá";
input = input.replaceAll("[á]", "");
System.out.println(input);

Or

input = input.replaceAll("[\\u00e1]", "");

Output:

habl

3 Comments

Thanks for the quick reply. I tried this, but it is not working.
@Krishna I'm surprised that didn't work. You can also try replacing using the Unicode character sequence for accented letters.
Hi Tim, when I try input = input.replaceAll("[^\\p{ASCII}]", ""); it works.
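
One possible reason the literal [á] fails while [^\\p{ASCII}] succeeds is a charset mismatch when the file is decoded. This is only a guess about the asker's setup. The sketch below assumes the file bytes are ISO-8859-1 while the Scanner decodes them as UTF-8. The class name CharsetDemo and both helpers are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class CharsetDemo {
    // Hypothetical helper: read the first line of the bytes using the named charset.
    static String readFirstLine(byte[] bytes, String charsetName) {
        InputStream in = new ByteArrayInputStream(bytes);
        try (Scanner scanner = new Scanner(in, charsetName)) {
            return scanner.nextLine();
        }
    }

    // Remove every character outside the ASCII range, as in the comment above.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Assumption: the CSV was saved as ISO-8859-1, so á is the single byte 0xE1.
        byte[] latin1 = "aids:\u00e1assessment".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = readFirstLine(latin1, "UTF-8");      // byte 0xE1 is not valid UTF-8 here
        String right = readFirstLine(latin1, "ISO-8859-1"); // decoded with the matching charset

        System.out.println(wrong.contains("\u00e1")); // false: no á to match, so [á] removes nothing
        System.out.println(right.contains("\u00e1")); // true
        System.out.println(stripNonAscii(wrong));     // aids:assessment
        System.out.println(stripNonAscii(right));     // aids:assessment
    }
}
```

If this is the cause, passing the file's real charset to the Scanner constructor would preserve the accented characters instead of requiring them to be stripped.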
0

Add the characters you don't want stripped out to your regex pattern match.

e.g.

[^a-zA-Z0-9$\/\]\[\:\,]+

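This matches any run of characters other than a-z, A-Z, 0-9, $, /, ], [, : and ,, so replacing the matches with an empty string keeps those characters and strips everything else. Don't forget to escape special characters in the pattern with a \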

Also you can use https://regex101.com/ to check the validity of any regex you create.
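
As a rough sketch of how that pattern might look in Java, where each backslash must be doubled inside a string literal (the class name KeepListDemo and the constant NOT_ALLOWED are made up for illustration):

```java
class KeepListDemo {
    // Negated class: matches runs of anything that is NOT a letter, digit, or one of $ / ] [ : ,
    static final String NOT_ALLOWED = "[^a-zA-Z0-9$/\\]\\[:,]+";

    // Remove every character that is not on the keep list.
    static String clean(String s) {
        return s.replaceAll(NOT_ALLOWED, "");
    }

    public static void main(String[] args) {
        System.out.println(clean("aids:\u00e1assessment,[a/b]")); // aids:assessment,[a/b]
        // Anything not listed is stripped too, including spaces and quotes.
        System.out.println(clean("x y \"z\""));                   // xyz
    }
}
```

Because spaces, quotes, hyphens and dots are not on the keep list, they are removed as well. For CSV data they would probably need to be added to the class.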

0

You can use the replace method as shown below:

line = line.replace("á","");

4 Comments

Tried this, but it is still printing aids:áassessment. Thank you.
It is working fine. Note that you need to store the result back in the line variable.
Use the replaceAll method to replace all occurrences: replaceAll("á","")
Please try the link for the program. The replace method works; here is the link: rextester.com/WOJE93801
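
A small sketch of the two points raised in these comments: Java strings are immutable, so the result has to be assigned, and replace already removes every occurrence, while replaceAll treats its first argument as a regex. The class name ReplaceDemo and the helper removeLiteral are hypothetical.

```java
class ReplaceDemo {
    // Hypothetical helper: remove every literal occurrence of target from line.
    static String removeLiteral(String line, String target) {
        return line.replace(target, "");
    }

    public static void main(String[] args) {
        String line = "\u00e1a\u00e1b"; // "áaáb"

        line.replace("\u00e1", "");        // result discarded: line itself is unchanged
        System.out.println(line.length()); // 4

        line = removeLiteral(line, "\u00e1"); // assign the result back
        System.out.println(line);             // ab

        // replace is literal; replaceAll interprets the first argument as a regex.
        System.out.println("a.b.c".replace(".", ""));              // abc
        System.out.println("a.b.c".replaceAll(".", "").isEmpty()); // true: "." matches any character
    }
}
```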
