2

I'm developing an app with a feature that searches for a text (searchString) in Arabic inside a .txt file (the .txt file is also in Arabic).

Since Android doesn't support Arabic 100%, String.indexOf() doesn't work properly. So I thought I would put the searchString into a char[] array and, instead of comparing the whole word, compare it against the String character by character.

Is there code available anywhere that checks whether the sequence in the char[] array occurs in a String?

example:

char[]={t,e,s,t}  String1{qqwtestq} String2{qwqtqwe}  -> String1:TRUE   String2:FALSE

Thanks

4 Answers

1

indexOf and contains don't use a character encoding of any sort. You can even search for characters that don't exist in your character encoding, i.e. the encoding is ignored by these functions.

All String.indexOf() and contains() do is compare character by character. I am not sure what behaviour you are expecting for 100% Arabic support. Here is a simplified version of what indexOf()/contains() does:

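public static int indexOf(String string, char[] chars) {
    // Try every start position, including the last one where chars still fits.
    LOOP:
    for (int i = 0; i <= string.length() - chars.length; i++) {
        // Compare char by char; on the first mismatch, move to the next start.
        for (int j = 0; j < chars.length; j++)
            if (string.charAt(i + j) != chars[j])
                continue LOOP;
        return i;
    }
    return -1;
}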

// Needs: import java.util.Arrays;
public static void main(String[] args) {
    char[] chars = "test".toCharArray();
    String one = "qqwtestq";
    String two = "qwqtqwe";
    String str = new String(chars);
    System.out.println("indexOf(" + one+", " + Arrays.toString(chars) + ") = " + indexOf(one, chars));
    System.out.println(one + ".indexOf(" + str + ") = " + one.indexOf(str));
    System.out.println("indexOf(" + two+", " + Arrays.toString(chars) + ") = " + indexOf(two, chars));
    System.out.println(two + ".indexOf(" + str + ") = " + two.indexOf(str));

    char[] chars2 = { '\uffff', '\uFeFF' };
    String test = "qqw\uffff\uFeFFq";
    String str2 = new String(chars2);
    System.out.println("indexOf(" + test+", " + Arrays.toString(chars2) + ") = " + indexOf(test, chars2));
    System.out.println(test + ".indexOf(" + str2 + ") = " + test.indexOf(str2));
}

Prints

indexOf(qqwtestq, [t, e, s, t]) = 3
qqwtestq.indexOf(test) = 3
indexOf(qwqtqwe, [t, e, s, t]) = -1
qwqtqwe.indexOf(test) = -1
indexOf(qqw??q, [?, ?]) = 3
qqw??q.indexOf(??) = 3

Can you provide an example where this method doesn't work?

EDIT: This test checks every possible char value to confirm that indexOf behaves as expected, i.e. the same way for every one of them.

for(int i=Character.MIN_VALUE;i<= Character.MAX_VALUE;i++) {
    String find = new String(new char[] {(char) i});
    String str = new String(new char[] {(char) (i+1), (char) i});
    String str1 = new String(new char[] {(char) (i+1)});

    int test1 = str.indexOf(find);
    if (test1 != 1)
        throw new AssertionError("test1 failed i="+i);

    int test2 = str1.indexOf(find);
    if (test2 != -1)
        throw new AssertionError("test2 failed i="+i);
}

It finds no discrepancies.


8 Comments

This is the worst time complexity you can get from a string searching algorithm. I also don't think searching for a char[] will solve the character encoding problem.
@Ioan, it's basically the same as the one Oracle's Java uses. Can you suggest another, more efficient approach?
Well, indexOf() worked in the emulator, which doesn't support Arabic properly (Arabic is made of characters that connect to each other; Android shows the characters but doesn't connect them), while it didn't work on my phone, so I thought there was a connection between indexOf and the lack of Arabic support.
This may not work if a single character is turned into multiple chars. However, I would suggest you either a) use an encoding where each character is one char, or b) use a comparison which understands the rules of how each character is turned into multiple chars.
Can you please explain "a) use an encoding where each character is one char"? How can I specify which encoding to use? I would use UTF-8.
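To illustrate the comment above about one visible character being stored as several chars (or as a different code point), here is a rough sketch. It assumes the mismatch comes from diacritics (tashkeel) or Arabic presentation forms in the file, which this thread never actually confirms. The class and helper names (ArabicSearchSketch, normalizeForSearch, containsNormalized) are made up for this example; java.text.Normalizer is the standard Java class.

```java
import java.text.Normalizer;

class ArabicSearchSketch {

    // Hypothetical helper: NFKC folds compatibility characters (for example
    // Arabic presentation forms) to their base letters; the loop then drops
    // non-spacing marks such as Arabic diacritics.
    static String normalizeForSearch(String s) {
        String folded = Normalizer.normalize(s, Normalizer.Form.NFKC);
        StringBuilder sb = new StringBuilder(folded.length());
        for (int i = 0; i < folded.length(); i++) {
            char c = folded.charAt(i);
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Normalizes both sides, then falls back to the ordinary char-by-char search.
    static boolean containsNormalized(String text, String query) {
        return normalizeForSearch(text).contains(normalizeForSearch(query));
    }

    public static void main(String[] args) {
        String plain = "\u0643\u062A\u0628";                  // kaf, teh, beh
        String withMarks = "\u0643\u064E\u062A\u064E\u0628";  // same letters plus fatha marks
        System.out.println(withMarks.contains(plain));            // false
        System.out.println(containsNormalized(withMarks, plain)); // true
    }
}
```

Stripping every non-spacing mark is a blunt choice: it makes the search ignore vowel marks entirely, which may or may not be what you want.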
1

Implement KMP!
http://en.m.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
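For reference, a minimal sketch of KMP over a char[] pattern, using the same test strings as the question. The class and method names (KmpSketch, buildFailureTable) are just illustrative. Like indexOf, it still compares raw char values, so it changes the running time, not the outcome of the match.

```java
class KmpSketch {

    // fail[j] = length of the longest proper prefix of pattern[0..j]
    // that is also a suffix of pattern[0..j].
    static int[] buildFailureTable(char[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int j = 1; j < pattern.length; j++) {
            while (k > 0 && pattern[j] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[j] == pattern[k]) {
                k++;
            }
            fail[j] = k;
        }
        return fail;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, char[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        int[] fail = buildFailureTable(pattern);
        int k = 0; // number of pattern chars currently matched
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, fall back in the pattern instead of rewinding the text.
            while (k > 0 && text.charAt(i) != pattern[k]) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return i - pattern.length + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] chars = "test".toCharArray();
        System.out.println(indexOf("qqwtestq", chars)); // 3
        System.out.println(indexOf("qwqtqwe", chars));  // -1
    }
}
```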

EDIT
Sorry, I did not know about Arabic on Android. Some suggestions point to Cyanogen, and say that only Android 3.0 supports Arabic.

1 Comment

As the others said, I think the problem isn't in the searching but in the encoding :S
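If the encoding is indeed the problem, one thing worth checking is how the file is decoded when it is read. A sketch, assuming the .txt file is saved as UTF-8 (the thread doesn't say how it is stored); the class and method names (Utf8FileSearchSketch, fileContains) and the path parameter are placeholders:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class Utf8FileSearchSketch {

    // Decodes the file explicitly as UTF-8 (an assumption about the file)
    // instead of relying on the platform default charset, then searches
    // line by line with the ordinary String.contains().
    static boolean fileContains(String path, String searchString) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), Charset.forName("UTF-8")))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(searchString)) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

Note that this searches one line at a time, so a searchString that spans a line break would not be found.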
0

Try the StringUtils contains method.

Comments

0

How about this?

    char[] ch = { 't', 'e', 's', 't' };

    String string1 = "qqwtestq";
    if (string1.contains((new StringBuffer()).append(ch)))
        System.out.println("true");
    else
        System.out.println("false");

3 Comments

+0: contains just calls indexOf()
As I said above, I don't want to use indexOf since (I GUESS) it doesn't go well with Arabic words because of the lack of proper support. What I want is to compare each character on its own. I know that would require more resources, but I can't do anything about it.
@Omar, indexOf/contains compares each character alone. See my answer.
