
Data source org.apache.spark.sql.cassandra does not support streamed reading

val spark = SparkSession
  .builder()
  .appName("SparkCassandraApp")
  .config("spark.cassandra.connection.host", "localhost")
  .config("spark.cassandra.connection.port", "9042")
  .config("spark.cassandra.auth.username", "xxxxx")
  .config("spark.cassandra.auth.password", "yyyyy")
  .master("local[*]")
  .getOrCreate()

val tableDf3 = spark.readStream
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb"))
  .load()
  .filter("deviceid='XYZ'")

tableDf3.show(10)

1 Answer


That's correct: the Spark Cassandra Connector can be used only as a streaming sink, not as a streaming source.
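As a sketch of the sink direction: a streaming DataFrame can be written to Cassandra through `foreachBatch`, doing a normal batch write per micro-batch. The `rate` source and the keyspace/table names here are placeholders for illustration; this assumes the connector is on the classpath and a `SparkSession` like the one in the question:

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical streaming input: the built-in "rate" test source,
// which emits (timestamp, value) rows at a fixed rate.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()

// Write each micro-batch to Cassandra with an ordinary batch write.
val query = events.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb")) // placeholder names
      .mode("append")
      .save()
  }
  .start()
```

The target table's schema would need to match the streamed columns; `foreachBatch` gives at-least-once delivery, which is usually acceptable here because Cassandra writes are idempotent upserts by primary key.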

If you want to get changes out of Cassandra, it's quite a complex task that depends on the version of Cassandra (whether or not it implements CDC) and other factors.

In Spark, you can approximate streaming by periodically re-reading the data and using a timestamp column to filter out rows you have already read. You can find more information about that approach in the following answer.
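A minimal sketch of that periodic re-read, assuming the table has a timestamp column (here a hypothetical `updated_at`; the keyspace/table names are the question's placeholders). Each poll reads only rows whose timestamp falls between the previous cutoff and now:

```scala
import java.time.Instant
import org.apache.spark.sql.functions.col

// High-water mark: everything up to this instant has already been read.
var lastRead: Instant = Instant.EPOCH

def readNewRows(): org.apache.spark.sql.DataFrame = {
  val cutoff = Instant.now()
  val batch = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb")) // placeholder names
    .load()
    // "updated_at" is an assumed column; adjust to your schema.
    .filter(
      col("updated_at") > java.sql.Timestamp.from(lastRead) &&
      col("updated_at") <= java.sql.Timestamp.from(cutoff)
    )
  lastRead = cutoff
  batch
}
```

Note the caveat: unless the timestamp is part of the partition/clustering key, this filter cannot be pushed down and each poll becomes a full table scan, so this pattern is only practical for modest table sizes or suitably keyed tables.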
