
I have looked through Stack Overflow, but none of the examples work in my case (from what I have tried).

I want to count how many times a word occurs in an array. This is done by splitting up an input String, such as "Henry and Harry went out", and counting the distinct substrings (n-grams) of a given length (in the following example the length is 2). Please forgive me if my style is bad; it's my first project...

He = 1

en = 2

nr = 1

ry = 2

a = 1

an = 1

etc. Here is my code for the constructor:

    public NgramAnalyser(int n, String inp)
    {
        boolean processed = false;
        ngram = new HashMap<>(); // used to store the ngram strings and count
        alphabetSize = 0;
        ngramSize = n;
        ArrayList<String> tempList = new ArrayList<String>();
        System.out.println("inp length: " + inp.length());
        System.out.println();
        int finalIndex = 0;

        for (int i = 0; i < inp.length() - (ngramSize - 1); i++)
        {
            tempList.add(inp.substring(i, i + ngramSize));
            alphabetSize++;
            // if i (the index) has reached the boundary limit (before it gets an error), then...
            if (i == (inp.length() - ngramSize))
            {
                processed = true;
                finalIndex = i;
                break;
            }
        }

        if (processed == true)
        {
            for (int i = 1; i < ngramSize; i++)
            {
                String startString = inp.substring(finalIndex + i, inp.length());
                String endString = inp.substring(0, i);
                tempList.add(startString + endString);
            }
        }

        for (String item : tempList)
        {
            System.out.println(item);
        }
    }
    // code for counting the ngrams and sorting them
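The `processed` block in the constructor adds the n-grams that wrap from the end of the input back to its start. A minimal sketch of the same idea, assuming circular n-grams are really what is wanted: append the first `n - 1` characters to the input, slide a window over it and count each n-gram as it is produced. The class and method names (`CircularNgrams`, `countCircular`) are hypothetical, and it assumes `1 <= n <= inp.length()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CircularNgrams {

    // Hypothetical helper: counts the n-grams of inp, including the ones that
    // wrap from the end of the string back to its start (like the constructor's
    // "processed" block). Assumes 1 <= n <= inp.length().
    static Map<String, Integer> countCircular(String inp, int n) {
        // Appending the first n-1 characters lets a plain sliding window
        // produce the wrap-around n-grams, e.g. "t" + "H" -> "tH" for n = 2.
        String extended = inp + inp.substring(0, n - 1);
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (int i = 0; i < inp.length(); i++) {
            String gram = extended.substring(i, i + n);
            counts.merge(gram, 1, Integer::sum); // 1 if new, otherwise old + 1
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countCircular("Henry and Harry went out", 2);
        System.out.println(counts);
    }
}
```

With wrap-around there is exactly one n-gram per input character, so the counts for a 24-character input add up to 24.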
  • It's not clear where ngramSize comes from. Commented May 22, 2017 at 8:32
  • You can take a look at the StringUtils class from Apache Commons Lang. The class has many useful methods for this. You can use split(String, char) to split the strings and then use countMatches(String, String) to find how many times a string occurs. Commented May 22, 2017 at 8:36
  • Sorry, I forgot to add the signature. Commented May 22, 2017 at 8:36
  • ngramSize is a parameter. Commented May 22, 2017 at 8:36
  • That's more than a constructor, man! That's the whole kit and kaboodle! Commented May 22, 2017 at 8:37

3 Answers


A simple solution is to use the Map<String, Integer> ngram: while iterating over your list of n-grams, for each key (the String) found in your input, update its counter (the Integer).
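A minimal sketch of that suggestion, assuming the n-grams have already been collected into a list such as the question's `tempList`. `Map.merge(key, 1, Integer::sum)` stores 1 for a key that is not in the map yet and otherwise adds 1 to the existing count. The names `NgramCounter` and `countAll` are illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NgramCounter {

    // Counts how often each n-gram string appears in the given list.
    static Map<String, Integer> countAll(List<String> ngrams) {
        Map<String, Integer> counts = new HashMap<>();
        for (String gram : ngrams) {
            // put 1 if the key is absent, otherwise replace it with old + 1
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the question's tempList
        List<String> tempList = Arrays.asList("He", "en", "nr", "ry", "en");
        System.out.println(countAll(tempList).get("en")); // prints 2
    }
}
```

In the question's constructor, the same loop could fill the existing `ngram` field directly instead of a new local map.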



This method creates a HashMap whose keys are the different items and whose values are the item counts. I think the code is pretty easy to understand, but ask if something isn't clear or might be wrong.

public Map<String, Integer> ngram(String inp, int n)
{
    Map<String, Integer> nGram = new HashMap<>();
    // i <= inp.length() - n so that the last n-gram is included as well
    for(int i = 0; i <= inp.length() - n; i++)
    {
        String item = inp.substring(i, i+n);
        int itemCount = nGram.getOrDefault(item, 0); // 0 if not seen yet
        nGram.put(item, itemCount+1);
    }
    return nGram;
}
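Since the question also mentions sorting, here is a self-contained sketch in the same style that counts every window (loop bound `i <= inp.length() - n`, so the final n-gram is not skipped) and then orders the entries by descending count, breaking ties alphabetically. The class name `SortedNgrams` and the helper `sortedByCount` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedNgrams {

    // Counts every substring of length n, including the last window.
    static Map<String, Integer> ngram(String inp, int n) {
        Map<String, Integer> nGram = new HashMap<>();
        for (int i = 0; i <= inp.length() - n; i++) {
            nGram.merge(inp.substring(i, i + n), 1, Integer::sum);
        }
        return nGram;
    }

    // Returns the entries ordered by descending count; ties are broken by key.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = ngram("Henry and Harry went out", 2);
        for (Map.Entry<String, Integer> e : sortedByCount(counts)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A string of length L has L - n + 1 windows of length n, so the 24-character example yields 23 bigrams in total.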


This code takes the string, converts it to a single case, removes the spaces and turns it into an array. It then inserts each value one by one: if it already exists, its count is incremented by one; otherwise its count is set to one. Good luck!

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

        // take a sample string, remove the whitespace, convert it to one case (lower or upper), then turn it into a character array
        char[] charArray = "This is an example text".replaceAll("\\s","").toLowerCase().toCharArray();
        System.out.println(Arrays.toString(charArray));
        Map<Character, Integer> charCount = new HashMap<>();
        for (char c : charArray){
            // if the key doesn't exist, put it with a count of 1
            if(!charCount.containsKey(c)){
                charCount.put(c, 1);
            }else{
                // if the key exists, increment its value by 1
                charCount.put(c, charCount.get(c) + 1);
            }
        }

        System.out.println(charCount.toString());

output:

[t, h, i, s, i, s, a, n, e, x, a, m, p, l, e, t, e, x, t]
{p=1, a=2, s=2, t=3, e=3, h=1, x=2, i=2, l=1, m=1, n=1}
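Because this answer stores the counts in a `HashMap`, the printed order follows the hash buckets rather than the letters. A sketch of the same character count using the standard streams API, collecting into a `TreeMap` so the keys come out in alphabetical order. The counts are `Long` because `Collectors.counting()` produces `Long`; the class and method names are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CharCount {

    // Counts the characters of text, ignoring whitespace and case;
    // the TreeMap keeps the keys sorted by character.
    static Map<Character, Long> countChars(String text) {
        return text.replaceAll("\\s", "")
                .toLowerCase()
                .chars()                       // IntStream of char values
                .mapToObj(c -> (char) c)       // box each one as a Character
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countChars("This is an example text"));
        // prints {a=2, e=3, h=1, i=2, l=1, m=1, n=1, p=1, s=2, t=3, x=2}
    }
}
```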
