
I started with the Weka GUI application to work out how I want to build my text classifier, and I successfully built and saved a model there.

Now I want to implement the classifier in Java, but I can't figure out how to set the stopwords and tokenizer settings of the StringToWordVector filter in code the way I did in the GUI (see the screenshot).

[Screenshot: StringToWordVector filter settings in the Weka GUI]

(Of course, I mean with the stopwords handler set to something other than Null.)

I know I can load the model I created and saved in the GUI into my code, but I need to set up the filter itself in Java.

I tried the code from the question "Different results in Weka GUI and Weka via Java code", mainly this part (after changing the path, of course):

 String opt = "-W -P 0 -M 5.0 -norm 1.0 -lnorm 2.0 -lowercase -stoplist -stopwords C:\\Users\\Fernando\\workspace\\GPCommentsAnalyzer\\pt-br_stopwords.dat -tokenizer \"weka.core.tokenizers.NGramTokenizer -delimiters ' \\r\\n\\t.,;:\\\'\\\"()?!\' -max 2 -min 1\" -stemmer weka.core.stemmers.NullStemmer";

But it still doesn't work.

I can't find any documentation about this topic anywhere. Any help would be much appreciated!

(I am using Weka version 3.7.12)
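One way to avoid the outer layer of quoting in the single option string quoted in the question is to pass the options as a String[], so the nested tokenizer spec becomes one array element of its own. The sketch below only builds that array and prints it, so it runs with just the JDK; the filter.setOptions(options) call is left as a comment because it needs Weka on the classpath. It covers only -tokenizer and -stemmer: the other flags from the original string are left out because I have not checked which of them 3.7.12 accepts, and the GUI's copied configuration is a safer source for those. The quoting inside the -delimiters value assumes that Weka's option parser groups double-quoted values and un-escapes \r, \n, \t, \' and \", which is my understanding of Utils.splitOptions rather than something confirmed in this thread.

```java
import java.util.Arrays;

public class TokenizerOptions {

    // Builds StringToWordVector options as an array. Because each flag value
    // is its own array element, the nested tokenizer spec needs no outer quotes.
    public static String[] buildOptions() {
        // Inside the spec, the -delimiters value is still double-quoted and uses
        // backslash escapes so the option parser can read it back as one token.
        String tokenizerSpec = "weka.core.tokenizers.NGramTokenizer"
                + " -delimiters \" \\r\\n\\t.,;:\\'\\\"()?!\""
                + " -max 2 -min 1";
        return new String[] {
            "-tokenizer", tokenizerSpec,
            "-stemmer", "weka.core.stemmers.NullStemmer"
        };
    }

    public static void main(String[] args) {
        String[] options = buildOptions();
        System.out.println(Arrays.toString(options));
        // With Weka on the classpath (not needed to run this sketch):
        // StringToWordVector filter = new StringToWordVector();
        // filter.setOptions(options);
    }
}
```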

Answer


Set up your configuration in the GUI, then use the "copy configuration to clipboard" option in the context menu.

[Screenshot: copying the filter configuration to the clipboard from the context menu]
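As noted in the comments, the copied configuration starts with the filter's class name, which should not be part of the string passed to setOptions once the filter object already exists in your code. A small JDK-only helper can strip that first token before the rest is split into options. The helper name and the sample string are illustrative (not from the thread), and the final Weka call is only shown as a comment.

```java
// Hypothetical helper: drops the leading class name from a configuration
// string copied from the Weka GUI, keeping only the options.
public class CopiedConfig {

    public static String stripClassName(String copied) {
        String s = copied.trim();
        // Already just options (or empty): nothing to strip.
        if (s.isEmpty() || s.startsWith("-")) {
            return s;
        }
        // Split off the first whitespace-delimited token (the class name);
        // the rest, including any quoted nested specs, is kept verbatim.
        String[] parts = s.split("\\s+", 2);
        return parts.length < 2 ? "" : parts[1];
    }

    public static void main(String[] args) {
        // Illustrative copied string, not taken from the thread.
        String copied = "weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000";
        String options = stripClassName(copied);
        System.out.println(options); // prints -R first-last -W 1000
        // With Weka on the classpath, the result could then be applied with:
        // filter.setOptions(weka.core.Utils.splitOptions(options));
    }
}
```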


Comments

I tried your suggestion, but got the errors shown in the attached image (imgur.com/VVhCisZ). What am I doing wrong?
@user1910524 I can't see your image.
@user1910524 I think you should use something like String str = " -R first-last ...". That is, your option string should not contain the filter name, since you have already set the filter in your code.
I will try your suggestion and let you know how it goes. Thank you, Atilla.
Hello again, Atilla. I have been working on the classifier and applied your advice. The classifier works, but only under certain conditions. I want to set the stopwordsHandler to a file containing my own stopwords (instead of the Rainbow stopwords). When I use stopwords from my own file, it doesn't work (see the image here: imgur.com/culPw1Z,PF7EfGn). When I use the Rainbow stopwords, the classifier works and classifies successfully (see the image here: imgur.com/culPw1Z,PF7EfGn#1). Why is this?
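The thread does not show why the custom stopwords file fails, so what follows is only a way to check the file itself, not a diagnosis. The sketch is not Weka's own file-based stopwords handler; it is a hypothetical JDK-only loader that assumes one stopword per line, UTF-8 encoding (relevant for Portuguese words such as "não"), lines starting with # treated as comments, and case-insensitive matching. If the file does not load the way you expect here, that is worth ruling out before looking at the Weka side.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stopwords loader for checking a custom stopwords file.
// Assumptions: one word per line, UTF-8, '#' lines are comments,
// matching ignores case. This is not Weka's own handler.
public class FileStopwords {
    private final Set<String> words = new HashSet<>();

    public FileStopwords(List<String> lines) {
        for (String line : lines) {
            String word = line.trim();
            // Skip blank lines and comment lines.
            if (word.isEmpty() || word.startsWith("#")) {
                continue;
            }
            words.add(word.toLowerCase(Locale.ROOT));
        }
    }

    // Reads the file explicitly as UTF-8 rather than the platform default.
    public static FileStopwords fromFile(Path path) throws IOException {
        return new FileStopwords(Files.readAllLines(path, StandardCharsets.UTF_8));
    }

    public boolean isStopword(String token) {
        return words.contains(token.toLowerCase(Locale.ROOT));
    }

    public int size() {
        return words.size();
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny sample file with an accented word and load it back.
        Path file = Files.createTempFile("stopwords", ".txt");
        Files.write(file, Arrays.asList("# sample", "de", "n\u00e3o", "", "que"),
                StandardCharsets.UTF_8);
        FileStopwords stopwords = fromFile(file);
        System.out.println(stopwords.size());                 // 3
        System.out.println(stopwords.isStopword("N\u00e3o")); // true
        System.out.println(stopwords.isStopword("casa"));     // false
        Files.delete(file);
    }
}
```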
