34

I have a timestamp dataset in the format shown below.

I have written a UDF in PySpark to process this dataset and return a map of key/value pairs, but I am getting the error message below.

Dataset: df_ts_list

+--------------------+
|             ts_list|
+--------------------+
|[1477411200, 1477...|
|[1477238400, 1477...|
|[1477022400, 1477...|
|[1477224000, 1477...|
|[1477256400, 1477...|
|[1477346400, 1476...|
|[1476986400, 1477...|
|[1477321200, 1477...|
|[1477306800, 1477...|
|[1477062000, 1477...|
|[1477249200, 1477...|
|[1477040400, 1477...|
|[1477090800, 1477...|
+--------------------+

PySpark UDF:

>>> def on_time(ts_list):
...     import sys
...     import os
...     sys.path.append('/usr/lib/python2.7/dist-packages')
...     os.system("sudo apt-get install python-numpy -y")
...     import numpy as np
...     import datetime
...     import time
...     from datetime import timedelta
...     ts = np.array(ts_list)
...     if ts.size == 0:
...             count = 0
...             duration = 0
...             st = time.mktime(datetime.now())
...             ymd = str(datetime.fromtimestamp(st).date())
...     else:
...             ts.sort()
...             one_tag = []
...             start = float(ts[0])
...             for i in range(len(ts)):
...                     if i == (len(ts)) - 1:
...                             end = float(ts[i])
...                             a_round = [start, end]
...                             one_tag.append(a_round)
...                     else:
...                             diff = (datetime.datetime.fromtimestamp(float(ts[i+1])) - datetime.datetime.fromtimestamp(float(ts[i])))
...                             if abs(diff.total_seconds()) > 3600:
...                                     end = float(ts[i])
...                                     a_round = [start, end]
...                                     one_tag.append(a_round)
...                                     start = float(ts[i+1])
...             one_tag = [u for u in one_tag if u[1] - u[0] > 300]
...             count = int(len(one_tag))
...             duration = int(np.diff(one_tag).sum())
...             ymd = str(datetime.datetime.fromtimestamp(time.time()).date())
...     return {'count':count,'duration':duration, 'ymd':ymd}

PySpark code:

>>> on_time=udf(on_time, MapType(StringType(),StringType()))
>>> df_ts_list.withColumn("one_tag", on_time("ts_list")).select("one_tag").show()

Error:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/worker.py", line 172, in main
    process()
  File "/usr/lib/spark/python/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/lib/spark/python/pyspark/worker.py", line 106, in <lambda>
    func = lambda _, it: map(mapper, it)
  File "/usr/lib/spark/python/pyspark/worker.py", line 92, in <lambda>
    mapper = lambda a: udf(*a)
  File "/usr/lib/spark/python/pyspark/worker.py", line 70, in <lambda>
    return lambda *a: f(*a)
  File "<stdin>", line 27, in on_time
  File "/usr/lib/spark/python/pyspark/sql/functions.py", line 39, in _
    jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
AttributeError: 'NoneType' object has no attribute '_jvm'

Any help would be appreciated!

8 Answers

73

Mariusz's answer didn't really help me. So if, like me, you found this because it's the only result on Google and you're new to PySpark (and Spark in general), here's what worked for me.

In my case, I was getting that error because I was trying to execute PySpark code before the PySpark environment had been set up.

Making sure that PySpark was available and set up before making any calls that depend on pyspark.sql.functions fixed the issue for me.
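
For example, a rough sketch of that ordering (reusing on_time and df_ts_list from the question; the app name is just a placeholder):

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import MapType, StringType

# 1. Get the Spark session up first, so pyspark.sql.functions has a live context behind it.
spark = SparkSession.builder.appName("ts-example").getOrCreate()

# 2. Only then wrap the Python function as a UDF and call it on the DataFrame.
on_time_udf = udf(on_time, MapType(StringType(), StringType()))
df_ts_list.withColumn("one_tag", on_time_udf("ts_list")).select("one_tag").show()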


7 Comments

As an addition for others: I hit this error when my Spark session had not been set up and I had defined a PySpark UDF using a decorator to add the schema. I normally set up the Spark session in my main, but in this case, when passing a complex schema, I needed to set it up at the top of the script. Thanks for the quick hint!
To add on to this, I got this error when using a Spark function as a default value for a function parameter, since those are evaluated at import time, not call time, e.g. def func(is_test=lit(False)).
Or, for others as stupid as me, you can encounter this error if you write PySpark code inside a pandas_udf (which is supposed to receive pandas code...).
@Mari all I can advise is that you cannot use PySpark functions before the Spark context is initialized. In my case I was using them as a default argument value, but those are evaluated at import time, not runtime, so the Spark context is not yet initialized. So I just changed the default to None and checked inside the function.
@Mari I ran into this recently. If you want to keep this construction, instead of assigning it to a variable, return it via a function, e.g. def is_not_empty(): return (col('var') != lit('')). Then use it as a function instead of a variable. This allows it to be instantiated when called (after the Spark context is initialized) rather than when the module is loaded; see the sketch just below these comments.
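
A small sketch of the default-argument pitfall and the lazy fix described in the last two comments (the names is_not_empty and var are made up):

from pyspark.sql.functions import col, lit

# Pitfall: a default value is evaluated at import time, typically before any Spark
# context exists, so merely importing the module raises the '_jvm' AttributeError:
# def filter_rows(condition=(col("var") != lit(""))): ...

# Fix: build the Column lazily inside a function, so it is only created when called,
# after the Spark context has been initialized.
def is_not_empty():
    return col("var") != lit("")
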
25

The error message says that in the 27th line of the UDF you are calling some PySpark SQL function. It is the line with abs(), so I suppose that somewhere above you call from pyspark.sql.functions import * and it overrides Python's built-in abs() function.
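
For illustration, a minimal sketch of that shadowing (not the OP's exact session) and the two ways around it mentioned in the comments below:

abs(-5)                               # 5 -- Python's built-in abs

from pyspark.sql.functions import *   # rebinds abs to pyspark.sql.functions.abs, which needs a
                                      # live SparkContext -- absent on a UDF worker, hence the error

import builtins                       # __builtin__ on Python 2
builtins.abs(-5)                      # 5 -- explicitly the Python built-in

import pyspark.sql.functions as F     # or avoid the star import entirely
# F.abs(df["ts"])                     # PySpark's abs, clearly namespaced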

7 Comments

Is there a way to use the original abs() function without deleting the line from pyspark.sql.functions import *?
@mufmuf Sure, you can use __builtin__.abs as a pointer to Python's function.
Or you can import pyspark.sql.functions as F and use F.function_name to call PySpark functions.
Thank you so much; instead of abs, I had round.
I used math.fabs() instead.
12

Just to be clear, the problem a lot of people are having stems from a single bad programming style: from blah import *.

When you do

from pyspark.sql.functions import *

you overwrite a lot of Python built-in functions. I strongly recommend importing functions like this:

import pyspark.sql.functions as f
# or 
import pyspark.sql.functions as pyf
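
For example, with an aliased import the PySpark function and the Python built-in stay clearly separated (a small sketch, assuming an existing SparkSession named spark and a made-up column x):

import pyspark.sql.functions as f

df = spark.createDataFrame([(-2,), (3,)], ["x"])
df.select(f.abs(f.col("x")).alias("abs_x")).show()   # PySpark's column-wise abs
print(abs(-2))                                       # Python's built-in abs is untouched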

1 Comment

This advice helped me correct my bad habit of using '*' when importing. Hope others correct this too.
4

Make sure that you are initializing the Spark context. For example:

spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("...") \
    .getOrCreate()
sqlContext = SQLContext(spark)
productData = sqlContext.read.format("com.mongodb.spark.sql").load()

Or as in

spark = SparkSession.builder.appName('company').getOrCreate()
sqlContext = SQLContext(spark)
productData = sqlContext.read.format("csv").option("delimiter", ",") \
    .option("quote", "\"").option("escape", "\"") \
    .option("header", "true").option("inferSchema", "true") \
    .load("/path/thecsv.csv")

1 Comment

You can use the SparkSession to get a DataFrame reader; you don't need the SQLContext.
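
For instance, a sketch of the same CSV read using only the SparkSession (options copied from the answer above; the path is still a placeholder):

productData = spark.read \
    .option("delimiter", ",") \
    .option("quote", "\"") \
    .option("escape", "\"") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .csv("/path/thecsv.csv")
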
4

This exception also arises when the UDF cannot handle None values. For example, the following code results in the same exception:

from pyspark.sql.functions import udf, to_timestamp
from pyspark.sql.types import DateType
get_datetime = udf(lambda ts: to_timestamp(ts), DateType())
df = df.withColumn("datetime", get_datetime("ts"))

However this one does not:

get_datetime = udf(lambda ts: to_timestamp(ts) if ts is not None else None, DateType())
df = df.withColumn("datetime", get_datetime("ts"))

Comments

0

I faced the same issue when I had Python's round() function in my code, and as @Mariusz said, Python's round() function got overridden.

The workaround for this was to use __builtin__.round() instead of round(), as @Mariusz mentions in the comments on his answer.
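
In Python 3 the equivalent module is builtins; a tiny sketch of the workaround, with the star import shown only to reproduce the shadowing:

from pyspark.sql.functions import *   # shadows Python's round(), abs(), sum(), ...

import builtins                       # use __builtin__ instead on Python 2
builtins.round(2.675, 2)              # explicitly Python's own round()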

1 Comment

Or you rename whatever other round function you've defined or imported.
0

I found this error in my Jupyter notebook. I added the commands below

import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext(appName="<add-your-name-here>")

and it worked.

It's the same problem of the Spark context not being ready or having been stopped.

1 Comment

You should be using a SparkSession, though. You can get the context from that, if needed.
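
For example, a minimal sketch of the SparkSession-first approach this comment suggests (the app name is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("my-app").getOrCreate()
sc = spark.sparkContext   # the underlying SparkContext, if you still need one
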
0

In all probability, this error occurs due to the absence of a Spark session. So, a Spark session should be created:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master('yarn') \
    .appName('a') \
    .getOrCreate()

This should resolve the issue.

Comments
