1

Why is that the result from running the filter StringToWordVector in Weka GUI is different from the equivalent java code? I use the same attributes as I used in the gui but the tokenizer in java doesn't seem to do a proper job! I was told by a Ph.D student that it is common and no further answer from him.

Please help. My project is stalled.

Here is my code:

DataSource tempSource = new DataSource("/home/r_omio/Dataset.arff");
Instances temp = tempSource.getDataSet();
NumericToBinary nbTemp = new NumericToBinary();
nbTemp.setInputFormat(temp);
temp = Filter.useFilter(temp, nbTemp);
StringToWordVector stringFilterTemp = new StringToWordVector(2500);

stringFilterTemp.setOptions( 
    weka.core.Utils.splitOptions("-R 1,2,3,4 -W 2500 -prune-rate -1.0 <br>-N 1 -stemmer weka.core.stemmers.NullStemmer -M 1 -tokenizer weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?![]_\"")
 );


stringFilterTemp.setInputFormat(temp);
temp = Filter.useFilter(temp, stringFilterTemp);
1
  • What are you expecting, and what does it do differently? Commented Apr 22, 2011 at 2:34

1 Answer 1

1

I suspect your delimiters are incorrectly escaped. Try using the default delimiters in the GUI and leaving the tokenizer out in Java, which will use the default, and see if you get the same value.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.