
Data source org.apache.spark.sql.cassandra does not support streamed reading

val spark = SparkSession
  .builder()
  .appName("SparkCassandraApp")
  .config("spark.cassandra.connection.host", "localhost")
  .config("spark.cassandra.connection.port", "9042")
  .config("spark.cassandra.auth.username", "xxxxx")
  .config("spark.cassandra.auth.password", "yyyyy")
  .master("local[*]")
  .getOrCreate()

val tableDf3 = spark.readStream
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb"))
  .load()
  .filter("deviceid='XYZ'")

tableDf3.show(10)

1 Answer


That's correct: the Spark Cassandra Connector can be used only as a streaming sink, not as a streaming source.
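As a sketch of the sink direction: a streaming DataFrame can be written to Cassandra through `foreachBatch`, doing a normal batch write per micro-batch. The `rate` source and the keyspace/table names here are placeholders for illustration; this assumes the connector is on the classpath and a `SparkSession` like the one in the question:

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical streaming input: the built-in "rate" test source,
// which emits (timestamp, value) rows at a fixed rate.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()

// Write each micro-batch to Cassandra with an ordinary batch write.
val query = events.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb")) // placeholder names
      .mode("append")
      .save()
  }
  .start()
```

The target table's schema would need to match the streamed columns; `foreachBatch` gives at-least-once delivery, which is usually acceptable here because Cassandra writes are idempotent upserts by primary key.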

If you want to get changes out of Cassandra, it's quite a complex task that depends on the version of Cassandra (whether or not it implements CDC) and other factors.

In Spark, you can approximate streaming by periodically re-reading the data and using a timestamp column to filter out rows you have already read. You can find more information about that approach in the following answer.
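A minimal sketch of that periodic re-read, assuming the table has a timestamp column (here a hypothetical `updated_at`; the keyspace/table names are the question's placeholders). Each poll reads only rows whose timestamp falls between the previous cutoff and now:

```scala
import java.time.Instant
import org.apache.spark.sql.functions.col

// High-water mark: everything up to this instant has already been read.
var lastRead: Instant = Instant.EPOCH

def readNewRows(): org.apache.spark.sql.DataFrame = {
  val cutoff = Instant.now()
  val batch = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb")) // placeholder names
    .load()
    // "updated_at" is an assumed column; adjust to your schema.
    .filter(
      col("updated_at") > java.sql.Timestamp.from(lastRead) &&
      col("updated_at") <= java.sql.Timestamp.from(cutoff)
    )
  lastRead = cutoff
  batch
}
```

Note the caveat: unless the timestamp is part of the partition/clustering key, this filter cannot be pushed down and each poll becomes a full table scan, so this pattern is only practical for modest table sizes or suitably keyed tables.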
