I am trying to read a huge CSV through spark.sql. I created a DataFrame from the CSV, and the DataFrame seems to be created correctly: I can read the schema and perform select and filter operations on it. I would like to create a temp view so I can run the same queries in SQL, which I am more comfortable with, but the temp view seems to be built on the CSV header only. Where am I making the mistake? Thanks

>>> df = spark.read.options(header=True,inferSchema=True,delimiter=";").csv("./elenco_dm_tutti_csv_formato_opendata_UltimaVersione.csv")
>>> df.printSchema()
root
 |-- TIPO: integer (nullable = true)
 |-- PROGRESSIVO_DM_ASS: integer (nullable = true)
 |-- DATA_PRIMA_PUBBLICAZIONE: string (nullable = true)
 |-- DM_RIFERIMENTO: integer (nullable = true)
 |-- GRUPPO_DM_SIMILI: integer (nullable = true)
 |-- ISCRIZIONE_REPERTORIO: string (nullable = true)
 |-- INIZIO_VALIDITA: string (nullable = true)
 |-- FINE_VALIDITA: string (nullable = true)
 |-- FABBRICANTE_ASSEMBLATORE: string (nullable = true)
 |-- CODICE_FISCALE: string (nullable = true)
 |-- PARTITA_IVA_VATNUMBER: string (nullable = true)
 |-- CODICE_CATALOGO_FABBR_ASS: string (nullable = true)
 |-- DENOMINAZIONE_COMMERCIALE: string (nullable = true)
 |-- CLASSIFICAZIONE_CND: string (nullable = true)
 |-- DESCRIZIONE_CND: string (nullable = true)
 |-- DATAFINE_COMMERCIO: string (nullable = true)

>>> df.count()
1653697
>>> df.createOrReplaceTempView("mask")
>>> spark.sql("select count(*) from mask")
DataFrame[count(1): bigint]

1 Answer

Spark evaluates lazily: spark.sql() only builds a query plan and returns a DataFrame, which is why the REPL prints DataFrame[count(1): bigint] instead of a number. Your temp view is fine; nothing has been computed yet. Call an action such as .show() or .collect() on the result to actually run the query.
