Skip to main content
We’ve updated our Terms of Service. A new AI Addendum clarifies how Stack Overflow utilizes AI interactions.
Filter by
Sorted by
Tagged with
0 votes
2 answers
89 views

Is it safe to mutate, emit, and snapshot the same operator-state instance in Apache Flink?

I'm building a single global Topology object in a non-keyed ProcessFunction with parallelism = 1. I keep it as a local mutable object and update it for every input event using topology.apply(GnmiEvent)...
Kvn's user avatar
  • 1
-1 votes
1 answer
40 views

Does Flink throughput decrease proportionally with the number of side outputs?

I have a Flink job with multiple downstream operators I want to route tuples to based on a condition. Side outputs are advertised for this use case in the Flink documentation. However, when sending ...
keezar's user avatar
  • 111
0 votes
0 answers
58 views

Azure Event Hubs +Apache Flink + Cassandra – how to handle downtime without losing events (one Replay Hub vs multiple Replay Hubs)?

We use Azure Event Hubs (Kafka API) with Apache Flink consumers, and a shared Cassandra DB as the sink. There are 7 Event Hubs (one per application) → each has its own Flink consumer writing to the ...
Sadhanala Akhil Kumar's user avatar
0 votes
1 answer
57 views

How to handle multiple aggregations on the same Kafka event stream when each needs a different partitioning key?

I'm working on a system that processes events from Kafka, and I'm running into a design problem related to scaling aggregations. The events contain fields like this (purchased SKUs in an order): { &...
Aleksey Usatov's user avatar
0 votes
1 answer
35 views

How does Kafka Streams' KTable.groupBy + aggregate guarantee correct order of the re-partitioned messages?

If I have a kafka input topic with multiple partitions and then in Kafka Streams I use kStream.map to change the key of each record and write that to an output topic, I will face the problem, that ...
selbstereg's user avatar
0 votes
0 answers
49 views

Average aggregation of data stream in bytewax

I want to aggregate the values of my DataStream in tumbling windows of 10 seconds. Unfortunately is the documentation in Bytewax very limited and I also don't find any other source where an average of ...
LeXXan's user avatar
  • 27
1 vote
1 answer
77 views

Docker connection between Kafka and Bytewax refused

I want to consume a stream from Kafka using Bytewax to perform aggregations. Unfortunately I'm not able to connect to Kafka and the connection is always refused. I assume something with the port setup ...
LeXXan's user avatar
  • 27
0 votes
1 answer
123 views

Apache Flink Branching

Problem Overview: I am working on a Flink application that allows users to design dataflows dynamically. The core engine is built around stages, where a DataStream is passed sequentially through these ...
Mohamed Sallam's user avatar
0 votes
1 answer
32 views

Timestamps of result elements of window function in Apache Flink

The Flink documentation says "The only relevant information that is set on the result elements is the element timestamp [...] which is [set to] end timestamp - 1 [...]" In case I have to ...
keezar's user avatar
  • 111
0 votes
1 answer
48 views

Flink State Processor API with Windowed CoGroup and Unaligned Checkpoints

How can I use the Flink State Processor API to process the state of a windowed CoGroup (or Join) function? The documentation does not give such an example. Is there a way to use the Flink State ...
keezar's user avatar
  • 111
1 vote
0 answers
40 views

ksqldb explode multiple list elements

I have data like this: One Kafka Message: [{"source": "858256_6052+571", "numericValue": null, "created": 1725969039288, "textValue": "mytestData&...
Max Muster's user avatar
0 votes
1 answer
345 views

Kafka Streams Processor has no access to StateStore null the store is not connected to the processor

I have the code in A. After we call builtTopology = builder.build, the call to new org.apache.kafka.streams.TopologyTestDriver(builtTopology, properties) gives me the error in B. I've combed through ...
Nutritioustim's user avatar
1 vote
1 answer
118 views

Is there any chance to limit database sessions using jdbc sinks with apache flink?

We are using jdbc sinks (apache flink) with which we are hitting the database maximal session count, especially when we increase parallelism. Our tests showed that if we increase our default ...
user26565994's user avatar
0 votes
1 answer
45 views

Flink GlobalWindow Trigger only process the trigger event

I have datastream keyby by an event property, that is then passed to a globalwindow, trigged when a specific event comes in, the issue is that when the window is trigged to process the events, it only ...
car_dev's user avatar
0 votes
1 answer
139 views

How to correlate HTTP request to its corresponding response with Apache Flink?

I'm looking for the best practice to correlate requests and their corresponding responses in an Apache Flink stream processing job. The key attributes of the problem are: Conditions: Each request and ...
Cauchy H's user avatar
0 votes
2 answers
101 views

Slowly Updating Side Inputs & Session Windows - Transform node AppliedPTransform was not replaced as expected

In my apache beam streaming pipeline, I have an unbounded pub/sub source which I use with session windows. There is some bounded configuration data which I need to pass into some of the DoFns of the ...
Thomas W.'s user avatar
  • 560
2 votes
1 answer
72 views

Why does PyFlink give me a time in the past

I have a Kafka topic in which I produce an entry every 2-3 seconds Then I have PyFlink job that will format the entries and send them to a db here's my Flink env setup env = StreamExecutionEnvironment....
GeorgeSedhom's user avatar
0 votes
1 answer
145 views

Order Preserving Union/Merge Executor in Flink

I want to merge two (multiple) streams using Flink. Both streams are themselves ordered, I want merged result to be ordered also. As an example [1,2,4,5,7,8, ...] and [2,3,6,7, ..] should produce ...
akurmustafa's user avatar
2 votes
0 answers
379 views

Apache Beam portable runner to run golang based job on flink

I am trying to learn Apache Beam and trying to create a sample project to learn stream processing. For now, I want to read from a Kafka topic "word" and print the data on console. I have ...
Swapnil1456's user avatar
1 vote
1 answer
1k views

Are checkpoints needed for a stream processed in databricks job running via continuous trigger?

We have a requirement to process a data stream in a databricks notebook job and load to a delta table. I noted that there was a new "Continuous" trigger available for databricks jobs and we ...
azuresnowflake1's user avatar
0 votes
2 answers
3k views

Apache flink job not printing anything to stdout and not sending to Kafka sink topic

I am experimenting with Apache Flink for a personal project and I have been struggling to make the resulting stream output to StdOut and to send it to a Kafka topic orders-output. My goal is to ...
 Annis99's user avatar
0 votes
1 answer
46 views

How to access the results from ProcessFunction?

I am running a query against two topics and calculating the results. In the main class: tableEnv.createTemporaryView("tbl1", stream1); tableEnv.createTemporaryView("tbl2", stream2);...
newbie5050's user avatar
2 votes
2 answers
1k views

Good way to parallel process elements of 'stream' while keeping output in order

I have an app that receives a stream of XML events from Kafka. These events have to be deserialized/parsed and otherwise converted, before being handed in-order to some business logic. (This logic ...
Jannick's user avatar
  • 57
0 votes
1 answer
78 views

Unable to perform 'maxBy' on a Flink windowed stream of type <SampleClass> by using the SampleClass's field as a parameter does not work

Lets say this is my sample stream like so: SingleOutputStreamOperator<Tuple2<String, SampleClass>> sampleStream = previousStream .keyBy(value -> ...
FelixNavidad's user avatar
2 votes
1 answer
2k views

Is there a suitable python library for doing stream processing with Kafka topics? [duplicate]

I am trying to find a suitable Python library to do stream processing with streams Kafka topics, Kafka streams. Specifically, I am looking for libraries that support the following operations. KStream-...
Eagle1992's user avatar
0 votes
1 answer
1k views

In Kafka Streams, how do you parallelize complex operations (or sub-topologies) using multiple topics and partitions?

I am currently trying to understand how Kafka Streams achieves parallelism. My main concern boils down to three questions: Can multiple sub-topologies read from the same partition? How can you ...
donare's user avatar
  • 1
0 votes
0 answers
179 views

Is it possable to stream images to a PDF using tesseract?

I need to stream images from a scanner to a PDF document. Ultimately I want to OCR the images and save the text to the PDF as well, but I'm more concerned with getting the streaming working first. ...
tlum's user avatar
  • 933
1 vote
0 answers
214 views

Pipelined vs blocking data exchange in a Flink job

I've been reading about pipelined region scheduling in Flink and am a bit confused about what they mean. My understanding of it is that a Streaming job is always pipelined whereas a Batch job can ...
sunny's user avatar
  • 39
1 vote
0 answers
138 views

Event-Driven-API with Real Time Streaming Analytics from Datastream? (Kappa-Architecture, IoT)

I've recently read up common Big Data architectures (Lambda and Kappa) and I'm trying to put it into practice in the context of an IoT Application. As of right now, events are produced, ingested into ...
QUE's user avatar
  • 11
1 vote
0 answers
243 views

Upgrade Apache Storm from 1.2.3. to 2.2.0

We are seeing below error when we migrate to Apache Storm 2.2.0 from 1.2.3. We do not see any permission issues as well on the file which it is unable to find. Also, the file exists but just the /data ...
Deepti's user avatar
  • 138
0 votes
1 answer
195 views

How to configure logback-based logging to handle log masking with Apache Storm?

I am trying to configure logback based log masking for Apache Storm topologies. When I try to replace logback.xml file inside Apache Storm log4j2- directory and update worker.xml and cluster.xml file, ...
Rajan Kasodariya's user avatar
4 votes
1 answer
864 views

One consumer to multiple tables or many consumers per table

I have a kafka topic with millions of sale events. I have a consumer which on every message will insert the data into 4 table: 1 for the raw sales, 1 for the sales sum by date by product category (...
friartuck's user avatar
  • 3,161
0 votes
2 answers
78 views

Is it possible to implement a timer in a Apache Storm bolt?

A Bolt's code is triggered when data arrives (an input tuple). How can we program code inside a Bolt to run even in the case of missing input data? I mean, if no tuple arrives how can we force an ...
ellav's user avatar
  • 1
0 votes
1 answer
328 views

Azure Stream Analytics - JSON Array, windowing and previous value

I receive containers of senor data as an input for ASA. Containers look like this. { "data": [ { "sensor_id": 55, "timestamp": 1663075725000, "value&...
mananana's user avatar
  • 413
2 votes
2 answers
3k views

Apache flink vs Apache Beam (With flink runner)

I am considering using Flink or Apache Beam (with the flink runner) for different stream processing applications. I am trying to compare the two options and make the better choice. Here are the ...
Guillaume Delmas-Frenette's user avatar
2 votes
1 answer
660 views

How to apply an offset to Tumbling Window, in order to delay the starting of Windows<TimeWindow> in Kafka Streams

I'm calculating a simple mean on a dataset with values for May 2022, using different windows sizes. Using 1 hour windows there are no problems, while using 1 week and 1 month windows, records are not ...
sixpain's user avatar
  • 354
2 votes
3 answers
865 views

Flink AggregateFunction in TumblingWindow is automatically splitted in two windows for big window size

I'm calculating a simple mean on some records, using different windows sizes. Using 1 hour and 1 week windows there are no problems, and the results are computed correctly. var keyed = src ....
sixpain's user avatar
  • 354
1 vote
1 answer
69 views

Sink for user activity data stream to build Online ML model

I am writing a consumer that consumes (user activity data, (activityid, userid, timestamp, cta, duration) from Google Pub/Sub and I want to create a sink for this such that I can train my ML model in ...
amor.fati95's user avatar
2 votes
1 answer
511 views

How to drain the window after a Flink join using coGroup()?

I'd like to join data coming in from two Kafka topics ("left" and "right"). Matching records are to be joined using an ID, but if a "left" or a "right" record ...
Beryllium's user avatar
  • 13k
3 votes
3 answers
1k views

How to do stream processing with Redpanda?

Redpanda seems easy to work with, but how would one process streams in real-time? We have a few thousand IoT devices that send us data every second. We would like to get the running average of the ...
NorwegianClassic's user avatar
1 vote
0 answers
772 views

Use Redis streams for MQTT queue

Redis has changed a lot in the later years, and it's difficult to keep up with the latest features. We have a few thousand IoT devices that all send MQTT messages every second. We want different ...
NorwegianClassic's user avatar
3 votes
4 answers
270 views

Stream processing alternatives

We have a few thousand IoT devices that send us their temperature every second. The input source can be MQTT or JSON (or a queue if needed). Our goal is to near continuously process data for each of ...
NorwegianClassic's user avatar
2 votes
1 answer
368 views

How to inject delay between the window and sink operator?

Context - Application We have an Apache Flink application which processes events The application uses event time characteristics The application shards (keyBy) events based on the sessionId field The ...
Peter Csala's user avatar
  • 23.7k
1 vote
1 answer
382 views

Python program to use Elasticsearch as sink in Apache Flink

I am trying to read data from a kafka topic do some processing and dump the data into elasticsearch. But I could not find example in python ti use Elastisearch as sink. Can anyone help me with a ...
Madhuri Desai's user avatar
4 votes
3 answers
5k views

Flink Python Datastream API Kafka Consumer

Im new to pyflink. Im tryig to write a python program to read data from kafka topic and prints data to stdout. I followed the link Flink Python Datastream API Kafka Producer Sink Serializaion. But i ...
Madhuri Desai's user avatar
0 votes
2 answers
57 views

Are Hazelcast Jet Reliable Topic Sinks idempotent? (Hazelcast fault-tolerance of a websocket source)

I cannot find this in the Hazelcast Jet 5.0 (or 4.x) documentation, so I hope someone can answer this here - can a reliable topic be used as an idempotent sink, for example to de-duplicate events ...
siddhadev's user avatar
  • 16.6k
0 votes
1 answer
513 views

Easiest stream processing framework for python?

I'm working on a project that will determine whether or not I score an internship. The project focuses on stream processing and is due in 2 weeks. It's pretty simple, just deriving some statistics ...
geisha-and-guis's user avatar
0 votes
1 answer
436 views

How to scale flink on an un-keyed stream

I have a relatively basic use case. My data lives in a few 100 kafka partitions and I need to pass the events through a map operator before I send them to a custom HTTP sink. For performance reasons ...
jlunavtgrad's user avatar
  • 1,015
3 votes
0 answers
356 views

using Java Stream without the filter() operation to block the stream

I'm working on a service that needs to make some stream processing for products. Given a Company we can use getProducts(Company company) to get List<Product>. The next thing I'd like to do is to ...
IsaacLevon's user avatar
  • 2,630
0 votes
1 answer
553 views

Why does my flink window trigger when I have set watermark to be a high number?

I would expect windows to trigger only after we wait until the maximum possible time as defined by the max lateness for watermark. .assignTimestampsAndWatermarks( WatermarkStrategy....
user1099123's user avatar
  • 6,723

1
2 3 4 5 6