I trained an SVC classifier in Python using scikit-learn and other libraries, by building a Pipeline (sklearn). I dump the trained model to a pickle file, and I wrote another Python script that loads the pickle file, takes input from the command line, and makes a prediction. I can call this Python script from Java and it works fine. The only issue is that it takes a lot of time, because the script imports nltk, numpy and pandas, which are required for preprocessing the input argument. I call this Python script multiple times, and that adds up. How can I work around this issue?
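For illustration, the per-call prediction script described above might look roughly like this (a minimal sketch; file names and paths are hypothetical, and the nltk/pandas preprocessing is omitted). Every invocation pays the full cost of the imports and of unpickling the model:

# predict_once.py -- hypothetical sketch of the script invoked from Java on every call
import sys
import pickle

import pandas as pd   # heavy imports repeated on every invocation
import nltk           # only needed for preprocessing

def main():
    # load the pickled pipeline on every invocation
    with open('model.pkl', 'rb') as f:
        pipeline = pickle.load(f)

    # the sentence to classify is passed on the command line from Java
    sentence = sys.argv[1]
    X = pd.DataFrame({'Sentence': [sentence]})

    print(pipeline.predict(X)[0])

if __name__ == '__main__':
    main()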

That's how my pipeline looks:

pipeline = Pipeline([

    # Use FeatureUnion to combine the features from the dataset
    ('union', FeatureUnion(
        transformer_list=[

            # Pipeline for word n-grams from the 'Sentence' column
            ('ngrams', Pipeline([
                ('selector', ItemSelector(key='Sentence')),
                ('vect', CountVectorizer(analyzer='word')),
                ('tfidf', TfidfTransformer()),
            ])),

        ],

        # weight components in FeatureUnion
        transformer_weights={
            'ngrams': 0.7,
        },
    )),

    # Use an SVC classifier on the combined features
    ('clf', LinearSVC()),
])
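Fitting and pickling this pipeline, as described above, could look roughly like this (a sketch; the training file and label column name are assumptions, while the 'Sentence' column comes from the ItemSelector in the pipeline):

# train_and_dump.py -- hypothetical sketch of training and serializing the pipeline
import joblib
import pandas as pd

# assumed training data: a DataFrame with a 'Sentence' column and a 'label' column
df = pd.read_csv('training_data.csv')

pipeline.fit(df, df['label'])

# persist the fitted pipeline; joblib handles the large numpy arrays inside the model well
joblib.dump(pipeline, 'model/model.pkl')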
  • Optimizing the feature extraction process is not trivial, but my guess is that you're spending most of the time loading the libraries and the model into memory on every call to the script. Consider using a long-running process and communicating with it over HTTP or another IPC method. Commented May 21, 2018 at 18:48
  • Do you have any example of how I would do that using HTTP? Commented May 21, 2018 at 19:04
  • You can also use github.com/jpmml/jpmml-sklearn to convert your model to PMML and then load it directly in Java. Commented May 22, 2018 at 1:17

1 Answer


Here's an example of setting up a simple Flask REST API that serves a scikit-learn model. The payload handling in predict() is a sketch; adapt it to whatever JSON your Java side sends.

import sys
import traceback

import pandas as pd
from flask import Flask, request, jsonify
import joblib  # on older scikit-learn versions: from sklearn.externals import joblib

app = Flask(__name__)

model_directory = 'model'
model_file_name = '%s/model.pkl' % model_directory

# Populated at startup, once the pickled pipeline has been loaded
clf = None


@app.route('/predict', methods=['POST'])
def predict():
    if clf:
        try:
            json_ = request.json
            # Build the model input from the JSON payload; here it is assumed to be
            # a list of records such as [{"Sentence": "..."}], matching the pipeline above.
            query = pd.DataFrame(json_)
            prediction = list(clf.predict(query))

            return jsonify({'prediction': prediction})

        except Exception as e:
            return jsonify({'error': str(e), 'trace': traceback.format_exc()})
    else:
        return 'no model here'


if __name__ == '__main__':
    try:
        port = int(sys.argv[1])
    except (IndexError, ValueError):
        port = 80

    try:
        clf = joblib.load(model_file_name)
        print('model loaded')
    except Exception as e:
        print('could not load model: %s' % e)

    app.run(host='0.0.0.0', port=port, debug=True)
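Once the server is running (e.g. python serve.py 5000, file name assumed), the libraries and the model stay loaded in the Flask process, and each prediction is a single lightweight HTTP call. A quick client sketch in Python is below; from Java, the equivalent POST can be made with any HTTP client, so you no longer need to spawn the Python script per request:

# client.py -- hypothetical sketch of calling the /predict endpoint
import requests

# payload shape matches what predict() above expects: a list of records
payload = [{'Sentence': 'This is the text to classify'}]

resp = requests.post('http://localhost:5000/predict', json=payload)
print(resp.json())   # e.g. {'prediction': [...]}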