
I have set up a test Cassandra + Spark cluster. I am able to successfully query Cassandra from Spark if I do the following:

import org.apache.spark.sql.cassandra.CassandraSQLContext
import sqlContext.implicits._

// sc is the SparkContext provided implicitly by the Spark shell
val cc = new CassandraSQLContext(sc)
val dataframe = cc.sql("select * from my_cassandra_table")
dataframe.first

I would now like to query data from a Python web app. All the docs on the web seem to show how to use Spark's Python shell (where the context, 'sc', is implicitly provided).

I need to be able to run Spark SQL from an independent Python script, perhaps one which serves web pages.

I haven't found any docs on this, and got no help on the apache-spark IRC channel. Am I just thinking about this wrong? Are there other tools which provide Spark SQL to less technical users? I'm completely new to Spark.

  • How about the Quick Start documentation? :) Commented Jan 6, 2016 at 19:46
  • @zero323 The quick start docs show how to write a Python script, then 'submit' it to pyspark. I want something similar to the way someone might use a pgsql or mysql driver for a run-of-the-mill Python web app: the script starts with a 'main' method, imports all the libraries, and every once in a while executes Spark SQL queries. Commented Jan 6, 2016 at 22:24
  • spark-submit is just a convenience wrapper. As long as all settings are correct it is not really required. What you see in the docs is a valid standalone application. Commented Jan 6, 2016 at 22:36
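A minimal sketch of what that last comment describes, assuming Spark is installed on the machine running the script; the findspark package is an optional third-party helper (an assumption, not something mentioned in this thread) that makes pyspark importable from a plain python process:

import findspark
findspark.init()  # reads SPARK_HOME and adds Spark's Python libraries to sys.path

from pyspark import SparkConf, SparkContext

# Placeholder app name; local[*] runs Spark in-process for a quick test.
conf = SparkConf().setAppName("standalone-test").setMaster("local[*]")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # sanity check: should print 45
sc.stop()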

1 Answer


From the Spark Programming Guide:

The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

The appName parameter is a name for your application to show on the cluster UI. master is a Spark, Mesos or YARN cluster URL, or a special “local” string to run in local mode. In practice, when running on a cluster, you will not want to hardcode master in the program, but rather launch the application with spark-submit and receive it there. However, for local testing and unit tests, you can pass “local” to run Spark in-process.
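Applied to the Cassandra setup in the question, a standalone script might look roughly like the sketch below. It assumes the spark-cassandra-connector is available to the application; the master URL, Cassandra host, keyspace and table names are placeholders, not values from the original post.

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# All connection values below are placeholders for your own cluster.
conf = (SparkConf()
        .setAppName("cassandra-sql-app")
        .setMaster("spark://spark-master:7077")
        .set("spark.cassandra.connection.host", "cassandra-host"))

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the Cassandra table through the connector's DataFrame source,
# register it as a temporary table, then query it with Spark SQL.
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="my_keyspace", table="my_cassandra_table")
      .load())
df.registerTempTable("my_cassandra_table")

result = sqlContext.sql("select * from my_cassandra_table")
print(result.first())

sc.stop()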


You can then test your program with spark-submit.
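For example (the script name is a placeholder, and the connector coordinates are an assumption; pick the version that matches your Spark and Scala build):

spark-submit \
  --master spark://spark-master:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.1 \
  my_app.py

A web application would normally create the SparkContext once at startup and reuse it across requests, rather than building a new one per query.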
