
I want to create a Hive table with partitions.

The schema for the table is:

import org.apache.spark.sql.types._

val schema = StructType(Seq(StructField("name", StringType, true), StructField("age", IntegerType, true)))

I can do this with Spark-SQL using:

val query = "CREATE TABLE some_new_table (name string, age integer) USING org.apache.spark.sql.parquet OPTIONS (path '<some_path>') PARTITIONED BY (age)"

spark.sql(query)

When I try to do this with the Spark API (using Scala), the table is filled with data. I only want to create an empty table and define the partitions. This is what I am doing; what am I doing wrong?

import org.apache.spark.sql.Row

val df = spark.createDataFrame(sc.emptyRDD[Row], schema)

val options = Map("path" -> "<some_path>", "partitionBy" -> "age")

df.sqlContext.createExternalTable("some_new_table", "org.apache.spark.sql.parquet", schema, options)

I am using Spark-2.1.1.

1 Answer


If you can skip partitioning, you can try saveAsTable:

spark.createDataFrame(sc.emptyRDD[Row], schema)
  .write
  .format("parquet")
  //.partitionBy("age")
  .saveAsTable("some_new_table")
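
This creates the table with the right schema and no rows. As a quick sanity check (a sketch, reusing the same table name):

spark.table("some_new_table").printSchema()
spark.table("some_new_table").count() // 0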

Spark partitioning and Hive partitioning are not compatible, so if you want access from Hive you have to use SQL: https://issues.apache.org/jira/browse/SPARK-14927
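
If the partitions do need to be visible to Hive, a minimal sketch of the SQL route is simply the question's CREATE TABLE statement wrapped in spark.sql (keeping the asker's <some_path> placeholder):

// Define the empty, partitioned table through SQL so the partition columns
// are recorded in the metastore rather than only in Spark's own metadata.
val createPartitioned =
  """CREATE TABLE some_new_table (name string, age integer)
    |USING org.apache.spark.sql.parquet
    |OPTIONS (path '<some_path>')
    |PARTITIONED BY (age)""".stripMargin

spark.sql(createPartitioned)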
