Why is Weka GUI output different from Java code?

Question

Why is the result of running the StringToWordVector filter in the Weka GUI different from that of the equivalent Java code? I use the same settings as in the GUI, but the tokenizer in Java doesn't seem to do its job properly. A Ph.D. student told me this is common but gave no further explanation.

Please help. My project is stalled.

Here is my code:

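import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NumericToBinary;
import weka.filters.unsupervised.attribute.StringToWordVector;

// Load the dataset and binarize the numeric attributes.
DataSource tempSource = new DataSource("/home/r_omio/Dataset.arff");
Instances temp = tempSource.getDataSet();
NumericToBinary nbTemp = new NumericToBinary();
nbTemp.setInputFormat(temp);
temp = Filter.useFilter(temp, nbTemp);

// Configure StringToWordVector with an option string meant to mirror the GUI settings.
StringToWordVector stringFilterTemp = new StringToWordVector(2500);
stringFilterTemp.setOptions(
    weka.core.Utils.splitOptions("-R 1,2,3,4 -W 2500 -prune-rate -1.0 -N 1 -stemmer weka.core.stemmers.NullStemmer -M 1 -tokenizer weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?![]_\"")
);

stringFilterTemp.setInputFormat(temp);
temp = Filter.useFilter(temp, stringFilterTemp);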

What are you expecting, and what does it do differently?

– Brian Roach, Commented Apr 22, 2011 at 2:34

michaeltwofish · Accepted Answer · 2011-04-22 05:40:15Z

I suspect your delimiters are incorrectly escaped. Try using the default delimiters in the GUI and leaving the tokenizer option out of the Java code, so that the default tokenizer is used, and see whether the two outputs match.
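
To illustrate why a delimiter mismatch alone would change the output, here is a minimal sketch in plain Java, with no Weka dependency, that tokenizes the same text with two delimiter sets. It assumes that WordTokenizer splits text the way java.util.StringTokenizer does and that BASE_DELIMS matches Weka's default delimiter set. The class name, the constants, and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of Weka.
class DelimiterDemo {
    // Assumed to match Weka's default delimiters. These are the decoded
    // characters (real CR, LF, TAB), not an escaped option string.
    static final String BASE_DELIMS = " \r\n\t.,;:'\"()?!";

    // The set from the question additionally contains '[', ']' and '_'.
    static final String EXTENDED_DELIMS = BASE_DELIMS + "[]_";

    // Split text on any character contained in delimiters.
    static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "user_name [draft] ok";
        System.out.println(tokenize(text, BASE_DELIMS));     // [user_name, [draft], ok]
        System.out.println(tokenize(text, EXTENDED_DELIMS)); // [user, name, draft, ok]
    }
}
```

If the extra delimiters never reach the tokenizer on the Java side, tokens such as user_name survive intact there while the GUI splits them, and the resulting word vectors differ. One possible cause worth checking (an assumption, not verified against the asker's Weka version): the GUI normally shows the tokenizer specification as a single quoted argument to -tokenizer, with -delimiters nested inside it, whereas the code in the question passes -delimiters as a separate top-level option. Setting the delimiters with WordTokenizer.setDelimiters(...) and passing that tokenizer to StringToWordVector.setTokenizer(...) avoids option-string escaping entirely.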
