
I have set up a test Cassandra + Spark cluster. I am able to successfully query Cassandra from Spark if I do the following:

import org.apache.spark.sql.cassandra.CassandraSQLContext
import sqlContext.implicits._

// sc is the SparkContext provided implicitly by the Spark shell
val cc = new CassandraSQLContext(sc)
val dataframe = cc.sql("select * from my_cassandra_table")
dataframe.first

I would now like to query data from a Python web app. All the docs on the web seem to show how to use Spark's Python shell (where the context, 'sc', is implicitly provided).

I need to be able to run Spark SQL from an independent Python script, perhaps one which serves web pages.

I haven't found any docs on this, and got no help on the apache-spark IRC channel. Am I just thinking about this wrong? Are there other tools which provide Spark SQL to less technical users? I'm completely new to Spark.

  • How about the Quick Start documentation? :) Commented Jan 6, 2016 at 19:46
  • @zero323 The quick start docs show how to write a Python script, then 'submit' it to pyspark. I want something similar to the way someone might use a pgsql or mysql driver for a run-of-the-mill Python web app: the script starts with a 'main' method, imports all the libraries, and every once in a while executes Spark SQL queries. Commented Jan 6, 2016 at 22:24
  • spark-submit is just a convenience wrapper. As long as all settings are correct it is not really required. What you see in the docs is a valid standalone application. Commented Jan 6, 2016 at 22:36
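A minimal sketch of what that last comment describes, assuming Spark is installed on the machine running the script; the findspark package is an optional third-party helper (an assumption, not something mentioned in this thread) that makes pyspark importable from a plain python process:

import findspark
findspark.init()  # reads SPARK_HOME and adds Spark's Python libraries to sys.path

from pyspark import SparkConf, SparkContext

# Placeholder app name; local[*] runs Spark in-process for a quick test.
conf = SparkConf().setAppName("standalone-test").setMaster("local[*]")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # sanity check: should print 45
sc.stop()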

1 Answer


From the Spark Programming Guide:

The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

The appName parameter is a name for your application to show on the cluster UI. master is a Spark, Mesos or YARN cluster URL, or a special “local” string to run in local mode. In practice, when running on a cluster, you will not want to hardcode master in the program, but rather launch the application with spark-submit and receive it there. However, for local testing and unit tests, you can pass “local” to run Spark in-process.
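Applied to the Cassandra setup in the question, a standalone script might look roughly like the sketch below. It assumes the spark-cassandra-connector is available to the application; the master URL, Cassandra host, keyspace and table names are placeholders, not values from the original post.

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# All connection values below are placeholders for your own cluster.
conf = (SparkConf()
        .setAppName("cassandra-sql-app")
        .setMaster("spark://spark-master:7077")
        .set("spark.cassandra.connection.host", "cassandra-host"))

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the Cassandra table through the connector's DataFrame source,
# register it as a temporary table, then query it with Spark SQL.
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="my_keyspace", table="my_cassandra_table")
      .load())
df.registerTempTable("my_cassandra_table")

result = sqlContext.sql("select * from my_cassandra_table")
print(result.first())

sc.stop()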


You can then test your program with spark-submit.
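For example (the script name is a placeholder, and the connector coordinates are an assumption; pick the version that matches your Spark and Scala build):

spark-submit \
  --master spark://spark-master:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.1 \
  my_app.py

A web application would normally create the SparkContext once at startup and reuse it across requests, rather than building a new one per query.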
