The following SO question, How to run script in Pyspark and drop into IPython shell when done?, shows how to launch a pyspark script:

 %run -d myscript.py

But how do we access the existing Spark context?

Just creating a new one does not work:

 ---->  sc = SparkContext("local", 1)

 ValueError: Cannot run multiple SparkContexts at once; existing 
 SparkContext(app=PySparkShell, master=local) created by <module> at 
 /Library/Python/2.7/site-packages/IPython/utils/py3compat.py:204

But trying to use an existing one... well, which existing one?

In [50]: for s in filter(lambda x: 'SparkContext' in repr(x[1]) and len(repr(x[1])) < 150, locals().iteritems()):
    print s
('SparkContext', <class 'pyspark.context.SparkContext'>)

That is, there is no variable holding a SparkContext instance.

  • What happens when you run this first: from pyspark import SparkContext? Commented May 4, 2015 at 13:59
  • With Spark 2.0.0 onwards, the SparkSession, which you can create without a clash, has a sparkContext property for accessing the original context. Commented Jan 25, 2017 at 12:20

4 Answers


Include the following:

from pyspark.context import SparkContext

and then invoke the static method getOrCreate() on SparkContext:

sc = SparkContext.getOrCreate()
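
For example, running this inside the IPython session from the question returns the shell's live context instead of raising ValueError (a small sketch; the appName comment assumes the stock PySpark shell):

from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()            # returns the existing context, if any
print(SparkContext.getOrCreate() is sc)    # True - the same instance every time
print(sc.appName)                          # "PySparkShell" inside bin/pyspark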

3 Comments

Add some explanation to this answer of how it helps the OP fix the current issue.
sc is the existing SparkContext the OP is looking for. Earlier there was no way to obtain an existing SparkContext, but the static method getOrCreate() was added to get an existing context, or create a new one if none exists.
It works for me, thanks! But can you explain it, please?

If you have already created a SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("StreamKafka_Test") \
    .getOrCreate()

Then you can access the "existing" SparkContext like this:

sc = spark.sparkContext
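
A quick way to confirm you got the existing context and not a new one (a small sketch, assuming the session created above):

sc = spark.sparkContext
print(sc is spark.sparkContext)   # True - the property always returns the
                                  # session's one underlying SparkContext
print(sc.appName)                 # "StreamKafka_Test" in a fresh application;
                                  # inside bin/pyspark the shell's existing
                                  # "PySparkShell" context is reused instead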



Standalone Python script for word count: write a reusable Spark context using a context manager.

"""SimpleApp.py"""
from contextlib import contextmanager
from pyspark import SparkContext
from pyspark import SparkConf


SPARK_MASTER='local'
SPARK_APP_NAME='Word Count'
SPARK_EXECUTOR_MEMORY='200m'

@contextmanager
def spark_manager():
    conf = SparkConf().setMaster(SPARK_MASTER) \
                      .setAppName(SPARK_APP_NAME) \
                      .set("spark.executor.memory", SPARK_EXECUTOR_MEMORY)
    spark_context = SparkContext(conf=conf)

    try:
        yield spark_context           # hand the live context to the with-block
    finally:
        spark_context.stop()          # always stop it, even if the job raises

with spark_manager() as context:
    input_file = "/home/ramisetty/sparkex/README.md"  # should be some file on your system
    text_file_rdd = context.textFile(input_file)
    word_counts = text_file_rdd.flatMap(lambda line: line.split()) \
                               .map(lambda word: (word, 1)) \
                               .reduceByKey(lambda a, b: a + b)
    word_counts.saveAsTextFile("output")

print("WordCount - Done")

To launch:

/bin/spark-submit SimpleApp.py
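
Because the finally clause stops the context when the block exits, the same helper can safely be reused for a second job later in the script (a sketch reusing the spark_manager defined above):

# a second, independent job - valid because the previous context
# was stopped when the first `with` block exited
with spark_manager() as context:
    lines = context.textFile("/home/ramisetty/sparkex/README.md")
    print("line count: %d" % lines.count())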



When you type pyspark at the terminal, Python automatically creates the Spark context sc.
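
For example, inside the shell (a sketch; the exact values depend on your Spark version and launch options):

$ pyspark
>>> sc.master      # sc is already bound by the shell's startup code
'local[*]'
>>> sc.appName
'PySparkShell'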

2 Comments

That's the bin/pyspark program, not a standalone pyspark script.
And the sc variable is not created by Python itself; the pyspark shell creates the SparkContext instance and binds it to sc.
