Good Morning

I wrote a function that calculates the frequency of a term:

public static int tfCalculator(String[] totalterms, String termToCheck) {
    int count = 0;  //to count the overall occurrence of the term termToCheck
    for (String s : totalterms) {
        if (s.equalsIgnoreCase(termToCheck)) {
            count++; 
        }
    } 
    return count;
}

After that, I use it in the code below to calculate the frequency of every word in a String[] words:

for(String word:words){
    int freq = tfCalculator(words, word);

    System.out.println(word + "|" + freq);
    mm+=word + "|" + freq+"\n";
}

The problem is that repeated words are printed more than once. For example, here is the result:

  • cytoskeletal|2
  • network|1
  • enable|1
  • equal|1
  • spindle|1
  • cytoskeletal|2
  • ...
  • ...

Can someone help me remove the repeated words and get a result like this:

  • cytoskeletal|2
  • network|1
  • enable|1
  • equal|1
  • spindle|1
  • ...
  • ...

Thank you very much!

  • Put the array in a Set and the duplicates are gone. Commented Mar 10, 2016 at 13:41
  • @KevinEsche That is not good for calculating the frequency, though. I'd use a Map<String, Integer> to map the words to their frequencies. Then again, there are quite a few better ways to calculate the frequency itself, but that's another story. Commented Mar 10, 2016 at 13:45
  • Can you please post the full program? Commented Mar 10, 2016 at 13:48
  • Side note on code quality: some of your variable names (like mm) and method names are pretty bad. If you give your method a name that says what it really does, things become much clearer. For example: int countOccurrencesOfTerm(String term, String[] stringsToCheck) or something similar. Commented Mar 10, 2016 at 13:54
  • @Mena Thank you for the answer. Can you explain how I'd work with a Map<String, Integer>, or another way to calculate the term frequency? Commented Mar 10, 2016 at 14:21

6 Answers

2

Java 8 solution

words = Arrays.stream(words).distinct().toArray(String[]::new);

The distinct method removes duplicates, and words is replaced with a new array that contains each word once.
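
If the goal is the frequencies themselves rather than only the de-duplicated array, the same stream can do the counting in one pass. This is a minimal sketch, assuming Java 8 or later; the class name StreamTermFrequency and the method countTerms are made up for illustration. Note that the comparison here is case-sensitive, unlike the equalsIgnoreCase used in the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original question or answer.
class StreamTermFrequency {

    // Groups equal words together and counts each group.
    // LinkedHashMap keeps the words in first-occurrence order.
    static Map<String, Long> countTerms(String[] words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        LinkedHashMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "equal", "spindle", "cytoskeletal"};
        // Prints one "word|count" line per distinct word.
        countTerms(words).forEach((word, freq) -> System.out.println(word + "|" + freq));
    }
}
```

Each distinct word appears as a key exactly once, so no separate de-duplication step is needed.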

0

I think you want to print the frequency of each string in the array totalterms. Using a Map is an easier solution, because a single traversal of the array stores the frequency of every string. Check the following implementation.

import java.util.HashMap;
import java.util.Map;

public static void printFrequency(String[] totalterms)
{
    // Maps each term to the number of times it has been seen so far
    Map<String, Integer> frequencyMap = new HashMap<>();

    for (String string : totalterms) {
        if (frequencyMap.containsKey(string))
        {
            // Term seen before: increment its count
            Integer count = frequencyMap.get(string);
            frequencyMap.put(string, count + 1);
        }
        else
        {
            // First occurrence of this term
            frequencyMap.put(string, 1);
        }
    }

    // Each term is a key exactly once, so nothing is printed twice
    for (Map.Entry<String, Integer> entry : frequencyMap.entrySet()) {
        System.out.println(entry.getKey() + "|" + entry.getValue());
    }
}
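
The containsKey/get/put sequence above can also be collapsed with Map.merge, which exists since Java 8. This is only a sketch of the same idea; the class name MergeTermFrequency and the method countTerms are hypothetical, and a LinkedHashMap is swapped in so the output follows the order in which words first appear (a plain HashMap gives no ordering guarantee).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class illustrating the Map-based counting above.
class MergeTermFrequency {

    static Map<String, Integer> countTerms(String[] totalterms) {
        // LinkedHashMap preserves first-occurrence order of the terms
        Map<String, Integer> frequencyMap = new LinkedHashMap<>();
        for (String term : totalterms) {
            // Inserts 1 for a new term, otherwise adds 1 to the stored count
            frequencyMap.merge(term, 1, Integer::sum);
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "equal", "spindle", "cytoskeletal"};
        for (Map.Entry<String, Integer> entry : countTerms(words).entrySet()) {
            System.out.println(entry.getKey() + "|" + entry.getValue());
        }
    }
}
```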

2 Comments

Thank you @Saurav, this works well for me. I really appreciate it.
This is extremely verbose and unnecessary... Better check some of the other solutions @HaKiM's
0

You can just use a HashSet and that should take care of the duplicates issue:

words = new HashSet<String>(Arrays.asList(words)).toArray(new String[0]);

This will take your array, convert it to a List, feed that to the constructor of HashSet<String>, and then convert it back to an array for you.
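
One caveat: a HashSet makes no promise about iteration order, so the words may come back in a different order than they appeared. If the order matters, a LinkedHashSet can be swapped in. A minimal sketch, where the class name OrderedDistinctWords and the method distinctInOrder are invented for illustration:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, not from the original answer.
class OrderedDistinctWords {

    // Removes duplicates while keeping the first occurrence of each word in place.
    static String[] distinctInOrder(String[] words) {
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(words));
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "equal", "spindle", "cytoskeletal"};
        System.out.println(Arrays.toString(distinctInOrder(words)));
    }
}
```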

0

Sort the array, then you can just count equal adjacent elements:

Arrays.sort(totalterms);
int i = 0;
while (i < totalterms.length) {
  int start = i;
  while (i < totalterms.length && totalterms[i].equals(totalterms[start])) {
    ++i;
  }
  System.out.println(totalterms[start] + "|" + (i - start));
}
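
Two things to keep in mind with the snippet above: Arrays.sort reorders totalterms in place, and equals is case-sensitive while the question's tfCalculator uses equalsIgnoreCase. The following sketch works on a copy and compares case-insensitively; the class name SortedRunCounter and the method countRuns are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class showing the sort-and-count idea on a copy of the array.
class SortedRunCounter {

    static List<String> countRuns(String[] totalterms) {
        // Sort a copy so the caller's array keeps its original order
        String[] sorted = totalterms.clone();
        Arrays.sort(sorted, String.CASE_INSENSITIVE_ORDER);

        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int start = i;
            // Advance past every element equal (ignoring case) to the run's first element
            while (i < sorted.length && sorted[i].equalsIgnoreCase(sorted[start])) {
                i++;
            }
            lines.add(sorted[start] + "|" + (i - start));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "equal", "spindle", "cytoskeletal"};
        countRuns(words).forEach(System.out::println);
    }
}
```

The result is in alphabetical order rather than in order of first appearance, which is the trade-off of the sorting approach.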

0

In two lines:

String s = "cytoskeletal|2 - network|1 - enable|1 - equal|1 - spindle|1 - cytoskeletal|2";
System.out.println(new LinkedHashSet<>(Arrays.asList(s.split(" - "))).toString().replaceAll("(^\\[|\\]$)", "").replace(", ", " - "));

0

Your code is fine, you just need to keep track of which words were encountered already. For that you can keep a running set:

Set<String> prevWords = new HashSet<>();
for(String word:words){
    // proceed if word is new to the set, otherwise skip
    if (prevWords.add(word)) {
        int freq = tfCalculator(words, word);

        System.out.println(word + "|" + freq);
        mm+=word + "|" + freq+"\n";
    }
}
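
Because tfCalculator compares with equalsIgnoreCase, "Network" and "network" would each be new to the set above and both be printed with the same count. One way around that, sketched here under the assumption that lower-casing with Locale.ROOT is an acceptable stand-in for equalsIgnoreCase on your data (the two can differ for a few unusual characters), is to store a normalized key in the set. The class name CaseInsensitiveTermReport and the method report are hypothetical; tfCalculator is copied from the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class combining the running set with a case-insensitive key.
class CaseInsensitiveTermReport {

    // Same counting logic as in the question
    static int tfCalculator(String[] totalterms, String termToCheck) {
        int count = 0;
        for (String s : totalterms) {
            if (s.equalsIgnoreCase(termToCheck)) {
                count++;
            }
        }
        return count;
    }

    static List<String> report(String[] words) {
        Set<String> seen = new HashSet<>();
        List<String> lines = new ArrayList<>();
        for (String word : words) {
            // add() returns false if the lower-cased word is already in the set
            if (seen.add(word.toLowerCase(Locale.ROOT))) {
                lines.add(word + "|" + tfCalculator(words, word));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "equal", "spindle", "Cytoskeletal"};
        report(words).forEach(System.out::println);
    }
}
```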
