4

I have a two dimensional string array look like this: enter image description here
The first column contains characters of many strings, other columns are extra data for character.
I want to search a string (maybe change to array character) in this array to get all match indexes (start - end). For example, when I search with key "next", the result should be [5 - 8], [13 - 16] (the highlight parts in image above).
Shortly, I need a method look like this:

  public static List<Interval> search(String searchText, String[][] data, int columnsCount, int rowCount){
      // Convert search text to String array
      String[] searchArr = getStringArray(searchText);
      // then search in data

  }
  
  // where Interval is:
  public class Interval{
       public int start;
       public int end;
  }   

Is there any fast way to search like this,cause my data is very large?
Thanks in advance!

3
  • One of the most effective search is BinarySearch. Red Black Tree or AVL Tree can be implemented for more effective search. Commented Nov 29, 2013 at 6:50
  • 2
    There are a whole bunch of String searching algorithms. Commented Nov 29, 2013 at 6:51
  • 1
    Also, note that if your array contains [letter1, data, data, data..., letter2, data...], you will get a bad cache hit rate, which is essential for performance when working with large data sets. Try to re-arrange your data as [letter1, letter2, ... letterN, data, data, ...]. Here is why. (don't mind that he is talking about C++, this applies to all languages). Commented Nov 29, 2013 at 6:54

4 Answers 4

3

I would recommend to adapt the String[][] to a CharSequence. Then you are free to do everything you can do with a CharSequence and this also means that you can use java.util.regex.Matcher to search for the string and you don't need to implement an own search algorithm.

For example:

public class Main {
    public static void main(String[] args) {
        String[][] array2d = createArray();

        int charSeqColumn = 0;
        CharSequence charSequnce = new Array2DColumnCharSequnce(array2d, charSeqColumn);

        System.out.println(charSequnce.toString());

        Pattern patttern = Pattern.compile("ext");
        Matcher matcher = patttern.matcher(charSequnce);

        while (matcher.find()) {
            String matchGroup = matcher.group();
            int start = matcher.start();
            int end = matcher.end() - 1;

            String msg = MessageFormat.format("{0} matched at: [{1}] - [{2}]", matchGroup, start, end);
            System.out.println(msg);
        }
    }

    private static String[][] createArray() {
        String[][] array2d = new String[2][10];
        array2d[0][0] = "N";
        array2d[0][1] = "e";
        array2d[0][2] = "x";
        array2d[0][3] = "t";
        array2d[0][4] = " ";
        array2d[0][5] = "N";
        array2d[0][6] = "e";
        array2d[0][7] = "x";
        array2d[0][8] = "t";
        array2d[0][9] = " ";

        array2d[1][0] = "H";
        array2d[1][1] = "e";
        array2d[1][2] = "l";
        array2d[1][3] = "l";
        array2d[1][4] = "o";
        array2d[1][5] = "W";
        array2d[1][6] = "o";
        array2d[1][7] = "r";
        array2d[1][8] = "l";
        array2d[1][9] = "d";
        return array2d;
    }
}

will output

Next Next 
ext matched at: [1] - [3]
ext matched at: [6] - [8]

I would implement the CharSequence adaption like this

class Array2DColumnCharSequnce implements CharSequence {

    private int column;
    private String[][] array2d;
    private int endIndex;
    private int startIndex;

    public Array2DColumnCharSequnce(String[][] array2d, int column) {
        this(array2d, column, 0, array2d[column].length);
        this.array2d = array2d;
        this.column = column;
    }

    public Array2DColumnCharSequnce(String[][] array2d, int column,
            int startIndex, int endIndex) {
        this.array2d = array2d;
        this.column = column;
        this.startIndex = startIndex;
        this.endIndex = endIndex;
    }

    public int length() {
        return endIndex - startIndex;
    }

    public char charAt(int index) {
        String charString = array2d[column][startIndex + index];
        return charString.charAt(0);
    }

    public CharSequence subSequence(int start, int end) {
        Array2DColumnCharSequnce array2dColumnCharSequnce = new Array2DColumnCharSequnce(
                array2d, column, start, end);
        return array2dColumnCharSequnce;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(this);
        return sb.toString();
    }
}

Note: The Array2DColumnCharSequnce is just a quick implementation and it does not address exception handling yet nor it addresses what happens when there are more than one char in a string column.

Why to use a CharSequence decorator

The difference with adapting the array to a CharSequence to other approaches is that you use a standard java interface that can be re-used with many other classes and thus is very flexible.

Some often used standard java classes that take a CharSequence as parameter

See the full list here.

Use the code above and try this to see how flexibe the decorator is.

public static void main(String[] args) {
    String[][] array2d = createArray();

    CharSequence charSequnce = new Array2DColumnCharSequnce(array2d, 0);

    boolean contentEquals = "Next Next ".contentEquals(charSequnce);
    System.out.println(contentEquals);

    CharSequence column1CharSequnce = new Array2DColumnCharSequnce(array2d, 1);
    String replaced = "I want to say Next Next ".replace(charSequnce, column1CharSequnce);
    System.out.println(replaced);
}

will output

true
I want to say HelloWorld

Finally everyone has to decide what he/she wants and what fits the situation. I prefer implementations that give me more options if I can get them "almost" for free.

Sign up to request clarification or add additional context in comments.

4 Comments

How can I get more than one match strings by this way?
Thanks, I wonder do we need a CharSequence? I just export my first column to a string, then use Matcher to find, just like @Trying said, it is similar to search a subString in a String. For example, I search next in abcdnextponexnextpour by using Matcher and regex and got the same result. Do you think there is something wrong with this way?
@R4j The short answer: no we don't need a CharSequence. I wanted to show an object oriented way of implementing your issue. My approach is a decorator pattern on the 2d array. The only advantage of this is that I don't need to create a new string object and copy the chars. But like I said before... I just wanted to show an oo way. I think that there is nothing wrong with your way.
How to detect full location of character? Like "e matched at [0][5]" or "f matched at [1][8]"?
1

It is similar to search a subString in a String.

e.g.

A B C D N E X T J H  J  N   E   N   E   X   T   O 

0 1 2 3 4 5 6 7 8 9 10  11  12  13  14  15  16  17

So the answer should be [4-7] and [13-16].

public static List<Integer> findIndexes(String source, String toFind){
    List<Integer> list = new LinkedList<Integer>();//it will return the starting indexes of the found substring, we can easily find the end e=index by adding the length of the other. 
    int start = 0;
    while(start < source.length()){
        if(source.charAt(start)==toFind.charAt(0)){//if the char is same then find whether the whole toFind string is present or not.
            if(isMatch(source, toFind, start)){//if it is found than increment the source pointer to the end after the toFind string
                list.add(start);
                start = start+toFind.length();
                continue;
            }
        }
        start++;
    }
    return list;
}
private static boolean isMatch(String s1, String s2, int srcIndex){
    int desIndex = 0;
    while(desIndex<s2.length() && s1.charAt(srcIndex)==s2.charAt(desIndex)){
        srcIndex++;
        desIndex++;
    }
    if(desIndex==s2.length()){
        return true;
    }
    return false;
}

And sample driver program:

public static void main(String[] args) {    
        String s1="abcdnextponexnextpour";
        String s2 = "next";
        List<Integer> list = findIndexes(s1, s2);
        for(int i : list){
            System.out.println(i);
        }
    }

It will output the indexes:

4
13

i.e. you can add the length of the toFind String to calculate the last index.

Comments

0

This is your solution:

   void main(String a[][],String k){
   String m="";
   for(int i=0;i<a.length;i++)
   m+=a[i][0];
   int n=0,x;
   while(n<m.length()){
   n=m.indexOf(k,n);
   x=n+k.length();
   System.out.println(n+"-"+x);
   n=x;
   }
   }
   void main(String a[][],char k){
   for(int i=0;i <a.length;i++)
   if(a[i][0]==k)System.out.println(i);
   }

it extracts the first strings of the dda and searches it. you may generate the value n and x as class interval and include it in list.

1 Comment

This way not work for all case, for example, I search only 1 character, it run infinite loop
0

I would implement search as follows -

public static List<Interval> search(
    String searchText, String[][] data) {
  List<Interval> al = new ArrayList<>();
  if (searchText != null) {
    searchText = searchText.trim().toUpperCase();
    char[] toMatch = searchText.toCharArray();
    for (int i = 0; i < data.length; i++) {
      if (data[i] != null && data.length > i
          && data[i].length > 0
          && data[i][0].charAt(0) == toMatch[0]) {
        boolean matched = true;
        for (int t = 1; t < toMatch.length; t++) {
          if (i + t > data.length
              || data[i + t][0].charAt(0) != toMatch[t]) {
            i += (t - 1);
            matched = false;
            break;
          }
        }
        if (matched) {
          Interval interval = new Interval();
          interval.start = i - 1;
          interval.end = interval.start + (toMatch.length - 1);
          al.add(interval);
        }
      }
    }
  }
  return al;
}

And, I would modify Interval to add a toString() like this

public String toString() {
  return String.valueOf(start) + "-" + end;
}

Finally, to test it I would use this main method.

public static void main(String[] args) {
  String[][] test = { { "N" }, { "A" }, { "N" },
      { "A" }, { "T" }, { "A" }, { "N" }, { "E" },
      { "X" }, { "T" }, { "E" }, { "R" }, { "N" },
      { "B" }, { "N" }, { "E" }, { "X" }, { "T" } };
  List<Interval> al = search("next", test);
  for (Interval i : al) {
    System.out.println(i);
  }
}

And I did receive this output -

5-8
13-16

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.