Indexing realtime data to Elasticsearch

pksinghal · December 12, 2020, 6:17pm

Hi,

I am building a new user profiling system and using Elasticsearch to store user data for fast search.
User profiles contain various kinds of data, such as registration data, comment-added data, reply data, etc.
Whenever a user performs an action, we put the user data on a Kafka queue, and a Kafka consumer processes it and indexes it into Elasticsearch one packet at a time.
We create monthly indices and use the user's last activity time to decide which monthly index the user belongs in.
Because the index is chosen by last activity time, the user document has to be deleted from the previous monthly index and added to the new one.
So every time we update user data, we first have to fetch the user by querying all monthly indices; then, if the user's last activity month has changed, we delete the user document from the previous monthly index and add it to the new one.
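
The move logic described above can be sketched roughly as follows. This is only an illustration of the decision, not the actual consumer: the `users` index prefix, the function names, and the action dictionaries are all assumptions made up for this example, and no Elasticsearch client is called.

```python
from datetime import datetime
from typing import List, Optional

INDEX_PREFIX = "users"  # hypothetical prefix, not taken from the post


def monthly_index(last_activity: datetime) -> str:
    """Name of the monthly index for a given last activity time."""
    return f"{INDEX_PREFIX}-{last_activity:%Y-%m}"


def plan_update(user_id: str, current_index: Optional[str],
                last_activity: datetime) -> List[dict]:
    """Return the write actions needed to keep the user in the right index.

    current_index is the index where the user document was found by the
    lookup across all monthly indices (None if the user is new).
    """
    target = monthly_index(last_activity)
    actions = []
    if current_index is not None and current_index != target:
        # The activity month changed: remove the stale copy first.
        actions.append({"delete": {"_index": current_index, "_id": user_id}})
    actions.append({"index": {"_index": target, "_id": user_id}})
    return actions
```

The delete step only happens when the lookup found the document in a different month, which is why the lookup has to see the most recent write.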

Elasticsearch has a refresh interval (1 second by default), and we process one packet from Kafka at a time. When two packets for the same user arrive, we process the first one and insert it into Elasticsearch. While processing the second one, we need to check whether the user document already exists in Elasticsearch (to decide whether it has to move to a new monthly index), but because both packets are processed within the same second, the search does not find the data from the first packet. So I would need to sleep for 1 second every time I process new data.
I don't think this is a good approach.
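
To make the visibility gap concrete, here is a toy model of it. It is not Elasticsearch and does not use its client; `ToyIndex` and its methods are invented for this sketch and only mimic the idea that a write becomes searchable after a refresh.

```python
class ToyIndex:
    """Toy model of near-real-time search (an assumption-laden sketch)."""

    def __init__(self):
        self._buffer = {}      # written, but not yet visible to search
        self._searchable = {}  # visible to search after a refresh

    def index(self, doc_id, doc):
        self._buffer[doc_id] = doc

    def refresh(self):
        # Make everything written so far visible to search.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search_by_id(self, doc_id):
        return self._searchable.get(doc_id)


idx = ToyIndex()
idx.index("user-1", {"last_activity": "2020-11-30"})  # first packet
first_lookup = idx.search_by_id("user-1")   # second packet arrives too early
idx.refresh()
second_lookup = idx.search_by_id("user-1")  # visible only after the refresh
```

In this model the second packet's lookup misses the document written by the first packet, which is the situation the sleep is meant to work around.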

What I am thinking is:
pick a chunk of data from Kafka, process it, sleep for 1 second, then process the next chunk, sleep for 1 second again, and so on.
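
A rough sketch of that chunked loop is below. The packet shape (a dict with a `user_id` key), the function names, and the merge rule are assumptions for illustration only. Note that a chunk can itself contain two packets for the same user, so this sketch merges them before writing; whether a simple field merge is right depends on what the packets actually carry.

```python
import time
from typing import Callable, Iterable, List


def collapse_chunk(packets: Iterable[dict]) -> List[dict]:
    """Merge packets for the same user so each user is written once per chunk."""
    merged = {}
    for packet in packets:
        # Later packets overwrite fields from earlier ones (assumed merge rule).
        merged.setdefault(packet["user_id"], {}).update(packet)
    return list(merged.values())


def run(chunks: Iterable[List[dict]], process: Callable[[dict], None],
        pause: Callable[[float], None] = time.sleep,
        interval: float = 1.0) -> None:
    """Process each chunk, then pause so the writes can become searchable."""
    for chunk in chunks:
        for packet in collapse_chunk(chunk):
            process(packet)
        pause(interval)
```

Passing `pause` as a parameter keeps the loop testable without actually sleeping.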

Is this a good approach or not?

Or is there any other solution for this?

warkolm · December 13, 2020, 11:35pm

This seems way too complex. Why do you think you need to take this approach?

pksinghal · December 14, 2020, 5:16am

Then could you help with some other solution?

warkolm · December 14, 2020, 8:50pm

Why not just store each event with the user details in it as well?
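
One possible reading of this suggestion, sketched with made-up names: each action becomes its own document with a copy of the user details embedded, and the index is chosen by the event's own timestamp. The `user-events` prefix, the field names, and `build_event` are assumptions for the example, not something stated in the thread.

```python
from datetime import datetime
from typing import Tuple


def build_event(user: dict, event_type: str, occurred_at: datetime,
                payload: dict) -> Tuple[str, dict]:
    """Return (index name, document) for a single user action."""
    # The event time never changes, so the document never has to move.
    index = f"user-events-{occurred_at:%Y-%m}"
    doc = {
        "event_type": event_type,
        "occurred_at": occurred_at.isoformat(),
        "user": dict(user),       # user details copied into every event
        "payload": dict(payload),
    }
    return index, doc


index, doc = build_event(
    {"id": "u1", "name": "example"},
    "comment-added",
    datetime(2020, 12, 14, 5, 16),
    {"text": "hello"},
)
```

Under this layout the consumer only appends documents, so it would not need to look up or delete an earlier copy before writing.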

system · January 11, 2021, 8:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
