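import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.text.Normalizer.Form;

String text = "un’accogliente villa del."; // Unicode text; ’ is U+2019
text = Normalizer.normalize(text, Form.NFC); // Normalize to composed form
byte[] bytes = text.getBytes(StandardCharsets.UTF_8); // UTF-8, 1-byte units: "accogliente" starts at index 5
char[] chars = text.toCharArray(); // UTF-16, 2-byte units: index 3 (what indexOf returns)
int[] codePoints = text.codePoints().toArray(); // UTF-32, 4-byte units: index 3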
int charIndex = text.indexOf("accogliente");
int codePointIndex = (int) text.substring(0, charIndex).codePoints().count();
int byteIndex = text.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
UTF-32 corresponds directly to the Unicode code points: the numbering of all symbols, written U+XXXX, where there may be more than 4 hexadecimal digits (up to 6).
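As a small illustration of code points versus UTF-16 chars, here is a sketch using a character outside the Basic Multilingual Plane. The class name CodePointDemo and the helpers uPlus and codePointIndex are made up for this example; the sample string is also an assumption, chosen only because its first symbol needs two chars.

```java
class CodePointDemo {
    /** Formats a code point in U+XXXX notation (at least 4 hex digits). */
    static String uPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    /** Converts a UTF-16 char index into a code point index. */
    static int codePointIndex(String text, int charIndex) {
        return text.codePointCount(0, charIndex);
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is stored as a surrogate pair, i.e. two chars.
        String text = "\uD834\uDD1E clef";
        int charIndex = text.indexOf("clef");

        System.out.println(charIndex);                       // 3
        System.out.println(codePointIndex(text, charIndex)); // 2
        System.out.println(uPlus(text.codePointAt(0)));      // U+1D11E
        System.out.println(uPlus('c'));                      // U+0063
    }
}
```

For text that stays inside the Basic Multilingual Plane, such as the Italian example, the char index and the code point index coincide.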
Text normalisation is needed because é can be one code point (U+00E9), or two code points: an e followed by a zero-width combining ´ (U+0301).
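A minimal sketch of why this matters: the two spellings of é are different strings until they are normalised to the same form. The class name NormalizationDemo and the helper codePointCount are hypothetical names for this example only.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class NormalizationDemo {
    /** Number of code points in s after normalizing it to the given form. */
    static int codePointCount(String s, Form form) {
        String normalized = Normalizer.normalize(s, form);
        return normalized.codePointCount(0, normalized.length());
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // é as one precomposed code point
        String decomposed = "e\u0301"; // e followed by a combining acute accent

        // Without normalization the two spellings do not compare equal.
        System.out.println(composed.equals(decomposed)); // false

        // NFC composes the pair into the single code point.
        System.out.println(Normalizer.normalize(decomposed, Form.NFC).equals(composed)); // true

        // NFD goes the other way and yields two code points.
        System.out.println(codePointCount(composed, Form.NFD)); // 2
    }
}
```

Normalising before computing any index keeps the indexes comparable between strings that look the same on screen.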
To convert a UTF-8 byte index to a UTF-16 char index:
int charIndex = new String(text.getBytes(StandardCharsets.UTF_8),
0, byteIndex, StandardCharsets.UTF_8).length();
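The conversion can be wrapped in a pair of helpers and checked as a round trip on the example string. This is only a sketch: IndexConversion, byteToCharIndex and charToByteIndex are names invented here, and both methods assume the index falls on a character boundary (not inside a multi-byte UTF-8 sequence or a surrogate pair).

```java
import java.nio.charset.StandardCharsets;

class IndexConversion {
    /** UTF-8 byte offset to UTF-16 char index; byteIndex must be on a character boundary. */
    static int byteToCharIndex(String text, int byteIndex) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Decode only the prefix and count its chars.
        return new String(utf8, 0, byteIndex, StandardCharsets.UTF_8).length();
    }

    /** UTF-16 char index to UTF-8 byte offset; charIndex must not split a surrogate pair. */
    static int charToByteIndex(String text, int charIndex) {
        // Encode only the prefix and count its bytes.
        return text.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "un\u2019accogliente villa del."; // \u2019 is the ’ from the example
        int charIndex = text.indexOf("accogliente");
        int byteIndex = charToByteIndex(text, charIndex);

        System.out.println(charIndex);                        // 3
        System.out.println(byteIndex);                        // 5 (’ takes 3 bytes in UTF-8)
        System.out.println(byteToCharIndex(text, byteIndex)); // 3
    }
}
```

Both helpers re-encode or re-decode the prefix on every call, which is fine for occasional lookups; for many conversions on a long text, a single pass that tracks both counters would avoid the repeated copying.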
For the example, indexOf correctly gives the char index 3: "un’accogliente villa del.".indexOf("accogliente") == 3, while the same position has UTF-8 byte index 5.