How to Load Weka data set from pandas dataframe in python

Question

Currently I write the pandas DataFrame to a CSV file and then load that file as a Weka dataset with the CSV loader. Is there a way to load a pandas DataFrame directly into a Weka dataset without creating an intermediate CSV file?

import os
import pandas as pd
from weka.core.converters import Loader
from weka.filters import Filter

# Build the DataFrame and write it out as an intermediate CSV file
learn_df = pd.DataFrame.from_records([s.to_dict() for s in learnList])
header = ["reviewId","word","type","positive_sentiment","negative_sentiment","number_of_noun","sentence","hasNeg","overallSentiment","sentiment"]
learn_df.to_csv(helper.get_data_dir() + os.sep + "resultTest.csv", index=False, header=True, columns=header)
diabetes_file = helper.get_data_dir() + os.sep + "resultTest.csv"
helper.print_info("Loading dataset: " + diabetes_file)
loader = Loader(classname="weka.core.converters.CSVLoader")

# Load the CSV back in as a Weka dataset and drop columns 1, 2 and 7
diabetes_data = loader.load_file(diabetes_file)
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "1,2,7"])
remove.inputformat(diabetes_data)
filtered = remove.filter(diabetes_data)
# code to classify instances here

Converting to CSV and loading it back every time I classify makes this a costly process. Is there a mechanism to avoid it?

Mahima · Accepted Answer · 2021-04-27 05:23:09Z

3

@Manish You can convert the pandas DataFrame into either a list of lists or a numpy matrix and then use the python-weka-wrapper functions create_instances_from_lists() or create_instances_from_matrices() (both in weka.core.dataset) to build the Weka dataset in memory.

For more details, see the python-weka-wrapper examples at http://fracpete.github.io/python-weka-wrapper/examples.html
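A minimal sketch of the matrix route, assuming python-weka-wrapper is installed and the JVM has already been started. The helper names to_numeric, frame_to_matrices and frame_to_instances are hypothetical and only illustrate the idea; non-numeric columns are encoded as integer category codes here, which is an assumption about how you want to treat them.

```python
import pandas as pd


def to_numeric(series):
    """Return the column as floats; non-numeric columns become category codes."""
    if pd.api.types.is_numeric_dtype(series):
        return series.astype(float)
    # Assumption: categories are numbered in sorted order; missing values become -1.
    return series.astype("category").cat.codes.astype(float)


def frame_to_matrices(df, label_column):
    """Split a DataFrame into a float feature matrix x and a label vector y."""
    x = df.drop(columns=[label_column]).apply(to_numeric).to_numpy()
    y = to_numeric(df[label_column]).to_numpy()
    return x, y


def frame_to_instances(df, label_column, name="from_pandas"):
    """Build a Weka dataset in memory, without writing a CSV file."""
    # Imported lazily so frame_to_matrices can be used without python-weka-wrapper.
    from weka.core.dataset import create_instances_from_matrices

    x, y = frame_to_matrices(df, label_column)
    data = create_instances_from_matrices(x, y, name=name)
    data.class_is_last()  # y is appended as the last attribute
    return data
```

With the question's data, something like frame_to_instances(learn_df[header], "sentiment") between jvm.start() and jvm.stop() (from weka.core.jvm) would replace the to_csv / load_file round trip. Every attribute in the resulting dataset is numeric, so a class label still has to be converted to nominal before classification.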

To set the last column to nominal instead of numeric, as @Pedro Pablo Severin Honorato asks in the comments, you can use a Weka filter.

For example:

from weka.filters import Filter

num_to_nom = Filter(classname="weka.filters.unsupervised.attribute.NumericToNominal", options=["-R", "last"])
num_to_nom.inputformat(data)        # data is the Weka dataset whose last column is numeric
newData = num_to_nom.filter(data)   # newData is the Weka dataset whose last column is nominal

Hope this helps!

edited Apr 27, 2021 at 5:23

answered Apr 9, 2020 at 5:52

Mahima


1 Comment

Pedro Pablo Severin Honorato Over a year ago

When using these methods, how can the last column be set to nominal type instead of numeric, assuming that the last column is the label we are trying to classify?
