0

I have code that reads a txt file and counts every word in it. However, I only want each word to be counted once per line, so I'm trying to use a HashSet, but I'm having trouble converting an ArrayList to a HashSet. Here's my code:

try {
    List<String> list = new ArrayList<String>();
    int totalWords = 0;
    int uniqueWords = 0;
    File fr = new File("filename.txt");
    Scanner sc = new Scanner(fr);
    while (sc.hasNext()) {
        String words = sc.next();
        String[] space = words.split(" ");
        Set<String> set = new HashSet<String>(Arrays.asList(space));
        for (int i = 0; i < set.length; i++) {
            list.add(set[i]);
        }
        totalWords++;
    }
    System.out.println("Words with their frequency..");
    Set<String> uniqueSet = new HashSet<String>(list);
    for (String word : uniqueSet) {
        System.out.println(word + ": " + Collections.frequency(list,word));
    }
} catch (Exception e) {

    System.out.println("File not found");

  }  

Could anyone explain why length "cannot be resolved or is not a field", and why I get an error on "set[i]" telling me it must be resolved to a String? Thank you.

8
  • 2
    Remember Java doesn't support operator overloading. You can't use [] on any non-array object. Commented Jan 24, 2018 at 11:03
  • 1
    Use a for-each loop to iterate over each element of the set. Commented Jan 24, 2018 at 11:03
  • If the file contains the same word several times on different lines, how often should it be counted? Commented Jan 24, 2018 at 11:13
  • @XtremeBaumer For example if the word "dog" is on line 1 twice, and line 2 once, it should be counted only twice, for it is present on two lines. Commented Jan 24, 2018 at 11:17
  • So you don't care about the 3rd occurrence at all and just ignore it (not counting it anywhere)? Commented Jan 24, 2018 at 11:19

2 Answers

1

As noted in the comments, you cannot use [] or length on a Set, because a Set is a Collection, not an array. Iterate over it with an enhanced for loop instead.

You could try it this way:

try {
    List<String> list = new ArrayList<String>();
    int totalWords = 0;
    int uniqueWords = 0;
    File fr = new File("filename.txt");
    Scanner sc = new Scanner(fr);
    while (sc.hasNext()) {
         String words = sc.next();
         String[] space = words.split(" ");
         Set<String> set = new HashSet<String>(Arrays.asList(space));
         for(String element : set){
              list.add(element);
         }
         totalWords++;
    }
    System.out.println("Words with their frequency..");
    Set<String> uniqueSet = new HashSet<String>(list);
    for (String word : uniqueSet) {
         System.out.println(word + ": " + Collections.frequency(list,word));
    }
} catch (Exception e) {
    System.out.println("File not found");
} 
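
To see the set-versus-array difference in isolation, here is a minimal sketch. The class name SetIterationDemo and the helper distinct are made up for this illustration and are not part of the original code; only standard java.util calls are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not from the question.
public class SetIterationDemo {

    // Copies the distinct elements of an array into a list by going through a set.
    static List<String> distinct(String[] words) {
        Set<String> set = new HashSet<>(Arrays.asList(words));
        List<String> result = new ArrayList<>();
        // A Set has no index operator, so the enhanced for loop is used instead of set[i].
        for (String element : set) {
            result.add(element);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"dog", "cat", "dog"};
        Set<String> set = new HashSet<>(Arrays.asList(words));

        // size() plays the role that .length plays for arrays.
        System.out.println("distinct words: " + set.size());

        // If indexing is really needed, copy the set into an array first.
        String[] asArray = set.toArray(new String[0]);
        System.out.println("array length: " + asArray.length);
    }
}
```

Note that a HashSet does not guarantee any iteration order, so the order of the elements in the resulting list or array may differ from the input.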

2 Comments

Hey, thanks for your reply. This method still counts every occurrence of a word in the document rather than the number of lines it occurs on. I've been trying to use a HashSet because it removes duplicates, but something is wrong in this instance.
I did not review the algorithm itself, only the compiler complaints.
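
A likely reason the count does not change: with the default delimiter, Scanner.next() returns one whitespace-delimited token at a time, so words.split(" ") produces a one-element array and the HashSet has nothing to deduplicate. Reading with nextLine() gives a whole line to split. The sketch below only illustrates that difference; the class and method names are invented for the example, and it reads from a String rather than a file.

```java
import java.util.Scanner;

// Hypothetical demo class, not from the question.
public class ScannerTokenDemo {

    // Counts how many values next() returns: one per whitespace-separated token.
    static int countTokens(String text) {
        int count = 0;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                sc.next();
                count++;
            }
        }
        return count;
    }

    // Counts how many values nextLine() returns: one per line.
    static int countLines(String text) {
        int count = 0;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                sc.nextLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "dog dog cat\ndog";
        System.out.println("tokens: " + countTokens(text));
        System.out.println("lines:  " + countLines(text));
    }
}
```

For the two-line input above, next() is called four times but nextLine() only twice, which is why per-line deduplication needs the line-based loop.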
0

I used a map to store and update the words and their frequencies.

As per your requirement, each word should be counted only once per line, even if it appears several times on that line.

Iterate over each line:

 Store all the words of the line in a set.

 Then iterate over this set and update the map.

At the end, the value stored for each word in the map is the required frequency, i.e. the number of lines the word appears on.

You can look at my code below:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;

public class sol {
    public static void main(String[] args) {
        // Maps each word to the number of lines it appears on.
        Map<String, Integer> word_frequency = new HashMap<String, Integer>();

        // try-with-resources closes the Scanner (and the underlying file) when done.
        try (Scanner sc = new Scanner(new File("file.txt"))) {
            while (sc.hasNextLine()) {
                // Read one line and split it on runs of whitespace.
                String line = sc.nextLine().trim();
                if (line.isEmpty()) {
                    continue;   // a blank line contains no words
                }
                String[] words = line.split("\\s+");

                // Collect the distinct words of this line; the set drops duplicates.
                Set<String> unique_words = new HashSet<String>(Arrays.asList(words));

                // Each distinct word counts once for this line.
                for (String word : unique_words) {
                    if (word_frequency.containsKey(word)) {
                        // Word already seen on an earlier line: increment its count.
                        word_frequency.put(word, word_frequency.get(word) + 1);
                    } else {
                        // First line containing this word.
                        word_frequency.put(word, 1);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            System.out.println("File not found: " + e.getMessage());
            return;
        }

        // Print every word with the number of lines it appears on.
        for (Map.Entry<String, Integer> entry : word_frequency.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
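
The same idea can be written more compactly and without touching the file system, which makes it easier to test. This is only a sketch under a few assumptions: the class name LineWordCount and the method countLinesPerWord are invented here, the lines are passed in as a List instead of being read from a file, and a TreeMap is used purely so the output order is predictable. Map.merge (available since Java 8) replaces the containsKey/put branch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class, not part of the original answer.
public class LineWordCount {

    // Returns, for each word, the number of lines that contain it at least once.
    static Map<String, Integer> countLinesPerWord(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, chosen for stable output
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // a blank line contributes no words
            }
            // The set keeps a single copy of each word found on this line.
            Set<String> unique = new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
            for (String word : unique) {
                // Stores 1 for a new word, otherwise adds 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("dog dog cat", "dog");
        System.out.println(countLinesPerWord(lines)); // prints {cat=1, dog=2}
    }
}
```

With the example from the comments ("dog" twice on line 1 and once on line 2), "dog" is counted twice, once per line it appears on.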
