Okay, so I have a simple interface, built with the Django framework, that takes natural language input from a user and stores it in a table.

Additionally, I have a pipeline built in Java with the cTAKES library that does named entity recognition, i.e. it takes the text input submitted by the user and annotates it with the relevant UMLS tags.

What I want to do is take the user's input and, once it's submitted, send it through my Java/cTAKES pipeline, then feed the annotated output back into the database.

I am pretty new to the web development side of this and can't find anything on integrating scripts in this way. If someone could point me to a useful resource, or just in the right general direction, that would be extremely helpful.

========================= UPDATE:

Okay, so I have figured out that subprocess is the module I want to use in this context. I have tried implementing some simple code based on the documentation, but I am getting this error:

Exception Type: OSError
Exception Value: [Errno 2] No such file or directory
Exception Location: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py in _execute_child, line 1335.  

A brief overview of what I'm trying to do:

This is the code I have in views. Its intent is to take text input from the model form, POST that to the DB, and then pass that input into my script, which produces an XML file that is stored in another column in the DB. I'm very new to Django, so I'm sorry if this is a simple fix, but I couldn't find any helpful documentation on using subprocess with Django.

def queries_create(request):
    if not request.user.is_authenticated():
       return render(request, 'login_error.html')


    form = QueryForm(request.POST or None)
    if form.is_valid():
      instance = form.save(commit=False)
      instance.save()
      p=subprocess.Popen([request.POST['post'], './path/to/run_pipeline.sh'])
      p.save()

    context = {

      "title":"Create",
      "form": form,

    }
    return render(request, "query_form.html", context) 

Model code snippet:

class Query(models.Model):
  problem/intervention = models.TextField()

  updated = models.DateTimeField(auto_now=True, auto_now_add=False)
  timestamp = models.DateTimeField(auto_now=False, auto_now_add=True)

UPDATE 2: Okay, so the code no longer breaks after changing the subprocess call as below:

def queries_create(request):
    if not request.user.is_authenticated():
       return render(request, 'login_error.html')


    form = QueryForm(request.POST or None)
    if form.is_valid():
      instance = form.save(commit=False)
      instance.save()
      p = subprocess.Popen(['path/to/run_pipeline.sh'], stdin=subprocess.PIPE,     
      stdout=subprocess.PIPE)
      (stdoutdata, stderrdata) = p.communicate()
      instance.processed_data = stdoutdata
      instance.save()

    context = {

      "title":"Create",
      "form": form,

    }
    return render(request, "query_form.html", context) 

However, I am now getting a "Could not find or load main class pipeline.CtakesPipeline" error that I don't understand, since the script runs fine from the shell in this working directory. This is the script I am trying to call with subprocess:

#!/bin/bash

INPUT=$1
OUTPUT=$2
CTAKES_HOME="full/path/to/CtakesClinicalPipeline/apache-ctakes-3.2.2"
UMLS_USER="####"
UMLS_PASS="####"
CLINICAL_PIPELINE_JAR="full/path/to/CtakesClinicalPipeline/target/CtakesClinicalPipeline-0.0.1-SNAPSHOT.jar"

[[ $CTAKES_HOME == "" ]] && CTAKES_HOME=/usr/local/apache-ctakes-3.2.2

CTAKES_JARS=""
for jar in $(find ${CTAKES_HOME}/lib -iname "*.jar" -type f)
do
  CTAKES_JARS+=$jar
  CTAKES_JARS+=":"
done

current_dir=$PWD
cd $CTAKES_HOME

java -Dctakes.umlsuser=${UMLS_USER} -Dctakes.umlspw=${UMLS_PASS} \
  -cp ${CTAKES_HOME}/desc/:${CTAKES_HOME}/resources/:${CTAKES_JARS%?}:${current_dir}/${CLINICAL_PIPELINE_JAR} \
  -Dlog4j.configuration=file:${CTAKES_HOME}/config/log4j.xml -Xms512M -Xmx3g \
  pipeline.CtakesPipeline $INPUT $OUTPUT

cd $current_dir

I'm not sure how to go about fixing this error so any help is appreciated.

  • I am not at all familiar with cTAKES, so I apologize if this is an ignorant question: are you already running this Java service on an existing machine and looking to pipe data to it from your web app, or are you looking to deploy both the web app and the Java app? Commented Mar 21, 2016 at 15:00
  • I'm looking to deploy the pipeline as part of the web app. I want to use the Java script internally. Commented Mar 21, 2016 at 16:14

1 Answer

If I understand you correctly, you want to pipe the value of request.POST['post'] to the program run_pipeline.sh and store the output in a field of your instance.

  1. You are calling subprocess.Popen incorrectly. It should be:

    p = subprocess.Popen(['/path/to/run_pipeline.sh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

  2. Then pass in the input and read the output

    (stdoutdata, stderrdata) = p.communicate(input=request.POST['post'])

  3. Then save the data, e.g. in a field of your instance

    instance.processed_data = stdoutdata
    instance.save()

I suggest you first make sure to get the call to the subprocess working in a Python shell and then integrate it in your Django app.
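
As a rough illustration, the three steps above could be tried outside Django as a standalone function. This is only a sketch under some assumptions: `run_pipeline` is a hypothetical helper name, it is written for Python 3 (where stdin and stdout carry bytes), and it assumes the program reads its text from stdin and writes its result to stdout.

```python
import subprocess


def run_pipeline(command, text):
    """Send `text` to the command's stdin and return its stdout as a string.

    `command` is a list such as ['/path/to/run_pipeline.sh'] (hypothetical).
    """
    p = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the input, closes stdin, and waits for the process to exit.
    stdoutdata, stderrdata = p.communicate(input=text.encode("utf-8"))
    if p.returncode != 0:
        # Surface the program's own error output instead of saving an empty result.
        raise RuntimeError(stderrdata.decode("utf-8", "replace"))
    return stdoutdata.decode("utf-8")
```

Once a call like `run_pipeline(['/path/to/run_pipeline.sh'], 'some text')` behaves as expected in a Python shell, the same call can be moved into the view. Capturing stderr as well makes errors from the script visible on the Python side.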

Please note that creating a (potentially long-running) subprocess inside a request is really bad practice and can lead to a lot of problems. The best practice is to delegate long-running tasks to a job queue. For Django, Celery is probably the most commonly used one. There is a bit of setup involved, though.
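
Celery needs a broker and a separate worker process, so it is not shown here. To illustrate only the general shape of "submit now, collect the result later", here is a sketch that swaps in the standard library's `ThreadPoolExecutor` as a stand-in. It is not Celery and not a real job queue: work submitted this way is lost if the web process restarts. The names `annotate` and `submit_annotation` are hypothetical, and the program is assumed to read from stdin and write to stdout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# A single worker, so only one pipeline run happens at a time (an assumption).
executor = ThreadPoolExecutor(max_workers=1)


def annotate(command, text):
    """Run the command with `text` on stdin and return its decoded stdout."""
    result = subprocess.run(
        command,
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8")


def submit_annotation(command, text):
    """Queue the work and return immediately with a Future."""
    return executor.submit(annotate, command, text)
```

With a real queue such as Celery, the view would similarly save the instance, hand off the job, and return right away, while the worker writes the annotated output back to the database when it finishes.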

5 Comments

Hey, thanks for the help! I am trying to get this working now and it's working much better. However, the code I'm trying to run through the bash script is Java and I'm getting a "Could not find or load main class" error. Is this something to do with the subprocess parameters?
That sounds like an issue with your class path. Can you run the exact command you use to call the script in the same working directory you are running your Python script in?
Yeah, I just tried that and it's working fine. It's only when I try to run it through the Django application that it can't find the main class.
I think I found the issue in my bash script, but I'm not sure how to fix it. For the jar I want to run through my bash script, I specify CLINICAL_PIPELINE_JAR="target/CtakesClinicalPipeline-0.0.1-SNAPSHOT.jar", which runs fine from the standard shell. When I change it to the full path, however, I get the same error as when running it through Django. Would I specify the whole path in the Django-based call then?
Hi, I tried a few things to no avail, but I posted the full bash script above in my edits.
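
One thing that may be worth checking, going by the script posted in the question: it reads its input and output from `$1` and `$2`, and it builds the jar's class path entry as `${current_dir}/${CLINICAL_PIPELINE_JAR}`, so the result depends on the directory the script is started from. A process started by Django does not necessarily have the same working directory as an interactive shell. A minimal sketch, assuming that is the cause, is to pass the arguments and the working directory explicitly. `run_script` and `workdir` are hypothetical names.

```python
import os
import subprocess


def run_script(script_path, args, workdir):
    """Run a script with positional arguments from a fixed working directory."""
    p = subprocess.Popen(
        # Resolve the script path first, since cwd changes what a relative path means.
        [os.path.abspath(script_path)] + list(args),
        cwd=workdir,  # relative paths inside the script resolve against this directory
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdoutdata, stderrdata = p.communicate()
    return p.returncode, stdoutdata, stderrdata
```

A call might look like `run_script('path/to/run_pipeline.sh', [input_path, output_path], project_dir)`, where `input_path`, `output_path`, and `project_dir` are placeholders for the real locations. Printing `stderrdata` should show the exact Java error if the class path is still wrong.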
