I'm using ydata-profiling to generate profiling reports from a large PySpark DataFrame without converting it to Pandas (to avoid memory issues on large datasets). Some columns contain the string "UNKNOWN", which I replace with None:
df = df.na.replace("UNKNOWN", None)
This works as expected in PySpark: both df.selectExpr("count(*)", "count_if(col_name IS NULL)").show() and df.filter(col("col_name").isNull()).count() report the correct number of missing values. The problem: when I run ydata-profiling directly on the PySpark DataFrame:
from ydata_profiling import ProfileReport
report = ProfileReport(df, minimal=True)
report.to_file("report.html")
... the report only shows missing values for the categorical columns. For numerical columns I see mean = NaN but missing = 0, which is contradictory: a NaN mean implies there are missing values that should have been counted.
How can I ensure that ydata-profiling correctly detects missing values in PySpark DataFrames, especially in numerical columns, without having to call .toPandas()?