Currently I write the pandas DataFrame to a CSV file and then load it as a Weka dataset with the CSV loader. Is there a way to load a pandas DataFrame directly into a Weka dataset, without creating an intermediate CSV file?

import os
import pandas as pd
from weka.core.converters import Loader
from weka.filters import Filter

learn_df = pd.DataFrame.from_records([s.to_dict() for s in learnList])
header = ["reviewId","word","type","positive_sentiment","negative_sentiment","number_of_noun","sentence","hasNeg","overallSentiment","sentiment"]
# write the DataFrame to an intermediate CSV file
diabetes_file = helper.get_data_dir() + os.sep + "resultTest.csv"
learn_df.to_csv(diabetes_file, index=False, header=True, columns=header)
helper.print_info("Loading dataset: " + diabetes_file)
# read the CSV file back in as a Weka dataset
loader = Loader("weka.core.converters.CSVLoader")
diabetes_data = loader.load_file(diabetes_file)
# drop attributes 1, 2 and 7 (reviewId, word, sentence)
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "1,2,7"])
remove.inputformat(diabetes_data)
filtered = remove.filter(diabetes_data)
# code to classify instances here

Converting to CSV and loading it back every time I want to classify makes this a costly process. Is there a way to avoid it?

1 Answer

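@Manish You can convert the pandas DataFrame into either a list of lists or a NumPy matrix, and then use the python-weka-wrapper functions create_instances_from_lists() or create_instances_from_matrices() (both in the weka.core.dataset module) to build the Weka dataset in memory.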

For more details you can look into the weka examples at http://fracpete.github.io/python-weka-wrapper/examples.html
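As a rough sketch of the list-based route (not tested against the asker's data), something like the following could work. The helper names records_to_lists and build_instances are made up for illustration, and the sketch assumes every selected column holds numeric values, since that is the simplest case for create_instances_from_lists().

```python
def records_to_lists(records, feature_cols, label_col):
    """Split a list of dicts into numeric feature rows (x) and labels (y).

    Hypothetical helper; assumes every selected value can be cast to float.
    """
    x = [[float(rec[col]) for col in feature_cols] for rec in records]
    y = [float(rec[label_col]) for rec in records]
    return x, y


def build_instances(records, feature_cols, label_col, name="data"):
    """Build a Weka dataset in memory, without an intermediate CSV file.

    Hypothetical helper; requires python-weka-wrapper and a started JVM.
    """
    # Imported lazily so records_to_lists() stays usable without Weka installed.
    from weka.core.dataset import create_instances_from_lists

    x, y = records_to_lists(records, feature_cols, label_col)
    # x becomes the feature attributes, y the final attribute of the dataset.
    return create_instances_from_lists(x, y, name=name)
```

With the question's data this might be called, after weka.core.jvm.start(), as build_instances(learn_df.to_dict("records"), ["positive_sentiment", "negative_sentiment", "number_of_noun"], "sentiment"), assuming those columns are numeric. Leaving out reviewId, word and sentence here also makes the separate Remove filter unnecessary.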

Regarding setting the last column to nominal type instead of numeric, as asked in the comment by @Pedro Pablo Severin Honorato: you can use a Weka filter for that.

For example:

from weka.filters import Filter

# NumericToNominal (not StringToNominal) is the filter that converts numeric attributes
num_to_nom = Filter(classname="weka.filters.unsupervised.attribute.NumericToNominal", options=["-R", "last"])
num_to_nom.inputformat(data)         # data is the Weka dataset whose last column is numeric
newData = num_to_nom.filter(data)    # newData is the Weka dataset whose last column is nominal

Hope this helps!

1 Comment

When using these methods, how can the last column be set to nominal type instead of numeric, assuming that the last column is the label we are trying to classify?
