
I launch PySpark applications from PyCharm on my own workstation against an 8-node cluster. The cluster also has settings encoded in spark-defaults.conf and spark-env.sh.

This is how I obtain my Spark context variable:

spark = SparkSession \
    .builder \
    .master("spark://stcpgrnlp06p.options-it.com:7087") \
    .appName(__SPARK_APP_NAME__) \
    .config("spark.executor.memory", "50g") \
    .config("spark.eventLog.enabled", "true") \
    .config("spark.eventLog.dir", r"/net/share/grid/bin/spark/UAT/SparkLogs/") \
    .config("spark.cores.max", 128) \
    .config("spark.sql.crossJoin.enabled", "true") \
    .config("spark.executor.extraLibraryPath", "/net/share/grid/bin/spark/UAT/bin/vertica-jdbc-8.0.0-0.jar") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.logConf", "true") \
    .getOrCreate()

sc = spark.sparkContext
sc.setLogLevel("INFO")

I want to see the effective config that is being used in my log. This line

        .config("spark.logConf", "true") \

should cause the Spark API to log its effective config at INFO level, but the default log level is set to WARN, so I don't see any of those messages.

Adding this line

sc.setLogLevel("INFO")

shows INFO messages from that point onward, but it's too late by then.

How can I set the default logging level that spark starts with?


3 Answers


http://spark.apache.org/docs/latest/configuration.html#configuring-logging

Configuring Logging

Spark uses log4j for logging. You can configure it by adding a log4j.properties file in the conf directory. One way to start is to copy the existing log4j.properties.template located there.


The blog post "How to Log in Apache Spark" (https://www.mapr.com/blog/how-log-apache-spark) suggests a way to configure log4j and provides suggestions, including directing INFO-level logs into a file.
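As a rough illustration of that approach, a log4j.properties along these lines would send INFO-level output to a file; the appender name and the log file path are placeholders you would adapt to a directory your Spark user can write to:

# Root logger: INFO and above, sent to a file appender (path is a placeholder)
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/spark-app.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n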


4 Comments

Ok, so is it this setting? log4j.logger.org.apache.spark.repl.Main=INFO
@ThatDataGuy - added info on how to configure log4j (and tested that the output file indeed holds INFO-level logs). Note that the sample configuration points to /var/log; you'll need to direct the log to a directory that is writable by the user running Spark.
where do I create conf directory? Next to src? Here src/main/resources/conf/log4j.properties? This is confusing
@GeoffLangenderfer in my Dockerfile I'm using the following command when creating the spark docker image: COPY ./src/main/resources/log4j.properties /configuration

You can also update the log level programmatically, as below. Get hold of the log4j object from the JVM through the Spark session and use it as follows (note that this snippet is a method on a class that keeps the SparkSession in self.spark):

    def update_spark_log_level(self, log_level='info'):
        # change the log level of the already-running SparkContext
        self.spark.sparkContext.setLogLevel(log_level)
        # reach the JVM-side log4j through the py4j gateway
        log4j = self.spark._jvm.org.apache.log4j
        logger = log4j.LogManager.getLogger("my custom Log Level")
        return logger


use:

logger = self.update_spark_log_level('debug')
logger.info('your log message')

Feel free to comment if you need more details.
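If you are not working inside a class, a minimal standalone sketch of the same idea looks like the following; it assumes an existing SparkSession named spark and uses the internal _jvm gateway (as the snippet above does) to reach log4j:

from pyspark.sql import SparkSession

def update_spark_log_level(spark, log_level='info'):
    # change the log level of the running SparkContext
    spark.sparkContext.setLogLevel(log_level)
    # py4j handle to the JVM-side log4j
    log4j = spark._jvm.org.apache.log4j
    return log4j.LogManager.getLogger("my custom Log Level")

spark = SparkSession.builder.getOrCreate()   # or reuse the session you already created
logger = update_spark_log_level(spark, 'debug')
logger.info('your log message')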

1 Comment

This code doesn't seem to work, what is self here?

You need to edit your $SPARK_HOME/conf/log4j.properties file (create it if you don't have one). Now if you submit your code via spark-submit, then you want this line:

log4j.rootCategory=INFO, console

If you want INFO-level logs in your pyspark console, then you need this line:

log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO
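For reference, a freshly created log4j.properties also needs the console appender that the rootCategory line refers to. A minimal sketch, following the lines found in Spark's log4j.properties.template, would be:

# Root logger: INFO and above, written to the console appender defined below
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Make INFO-level messages reach the pyspark console as well
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO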

1 Comment

I have a package that has spark as a dependency. I jar it up and send it to s3. Where is spark home in this case? src/main/resources/log4j.properties?
