I trained an SVC classifier in Python using scikit-learn and other libraries, by building a Pipeline (sklearn). I dump the trained model to a pickle file, and I wrote another Python script that loads the pickle file, takes input from the command line, and makes a prediction. I can call this Python script from Java and it works fine. The only issue is that it takes a lot of time, because the script imports nltk, numpy and pandas, which are required for preprocessing the input argument. I call this Python script multiple times, and that adds up. How can I work around this issue?
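For illustration, the per-call prediction script described above might look roughly like this (a minimal sketch; file names and paths are hypothetical, and the nltk/pandas preprocessing is omitted). Every invocation pays the full cost of the imports and of unpickling the model:

# predict_once.py -- hypothetical sketch of the script invoked from Java on every call
import sys
import pickle

import pandas as pd   # heavy imports repeated on every invocation
import nltk           # only needed for preprocessing

def main():
    # load the pickled pipeline on every invocation
    with open('model.pkl', 'rb') as f:
        pipeline = pickle.load(f)

    # the sentence to classify is passed on the command line from Java
    sentence = sys.argv[1]
    X = pd.DataFrame({'Sentence': [sentence]})

    print(pipeline.predict(X)[0])

if __name__ == '__main__':
    main()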

That's how my pipeline looks:

pipeline = Pipeline([

    # Use FeatureUnion to combine the features from the dataset
    ('union', FeatureUnion(
        transformer_list=[

            # Pipeline for word n-grams from the 'Sentence' column
            ('ngrams', Pipeline([
                ('selector', ItemSelector(key='Sentence')),
                ('vect', CountVectorizer(analyzer='word')),
                ('tfidf', TfidfTransformer()),
            ])),

        ],

        # weight components in FeatureUnion
        transformer_weights={
            'ngrams': 0.7,
        },
    )),

    # Use an SVC classifier on the combined features
    ('clf', LinearSVC()),
])
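Fitting and pickling this pipeline, as described above, could look roughly like this (a sketch; the training file and label column name are assumptions, while the 'Sentence' column comes from the ItemSelector in the pipeline):

# train_and_dump.py -- hypothetical sketch of training and serializing the pipeline
import joblib
import pandas as pd

# assumed training data: a DataFrame with a 'Sentence' column and a 'label' column
df = pd.read_csv('training_data.csv')

pipeline.fit(df, df['label'])

# persist the fitted pipeline; joblib handles the large numpy arrays inside the model well
joblib.dump(pipeline, 'model/model.pkl')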
  • Optimizing the feature extraction process is not trivial, but my guess is that you're spending most of the time loading the libraries and the model into memory on every call to the script. Consider using a long-running process and communicating with it over HTTP or another IPC method. Commented May 21, 2018 at 18:48
  • Do you have any example of how I would do that using HTTP? Commented May 21, 2018 at 19:04
  • You can also use github.com/jpmml/jpmml-sklearn to convert your model to PMML and then load it directly in Java. Commented May 22, 2018 at 1:17

1 Answer


Here's an example of setting up a simple Flask REST API that serves a scikit-learn model. The payload handling in predict() is a sketch; adapt it to whatever JSON your Java side sends.

import sys
import traceback

import pandas as pd
from flask import Flask, request, jsonify
import joblib  # on older scikit-learn versions: from sklearn.externals import joblib

app = Flask(__name__)

model_directory = 'model'
model_file_name = '%s/model.pkl' % model_directory

# Populated at startup, once the pickled pipeline has been loaded
clf = None


@app.route('/predict', methods=['POST'])
def predict():
    if clf:
        try:
            json_ = request.json
            # Build the model input from the JSON payload; here it is assumed to be
            # a list of records such as [{"Sentence": "..."}], matching the pipeline above.
            query = pd.DataFrame(json_)
            prediction = list(clf.predict(query))

            return jsonify({'prediction': prediction})

        except Exception as e:
            return jsonify({'error': str(e), 'trace': traceback.format_exc()})
    else:
        return 'no model here'


if __name__ == '__main__':
    try:
        port = int(sys.argv[1])
    except (IndexError, ValueError):
        port = 80

    try:
        clf = joblib.load(model_file_name)
        print('model loaded')
    except Exception as e:
        print('could not load model: %s' % e)

    app.run(host='0.0.0.0', port=port, debug=True)
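Once the server is running (e.g. python serve.py 5000, file name assumed), the libraries and the model stay loaded in the Flask process, and each prediction is a single lightweight HTTP call. A quick client sketch in Python is below; from Java, the equivalent POST can be made with any HTTP client, so you no longer need to spawn the Python script per request:

# client.py -- hypothetical sketch of calling the /predict endpoint
import requests

# payload shape matches what predict() above expects: a list of records
payload = [{'Sentence': 'This is the text to classify'}]

resp = requests.post('http://localhost:5000/predict', json=payload)
print(resp.json())   # e.g. {'prediction': [...]}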