Long delay between indexing and search availability


(HPGANGA) #1

I am using Elasticsearch 6.1 and NEST 6.1, with C# on .NET Framework 4.5.2, for development. I receive large volumes of data from different applications at short intervals (50,000 documents/sec) and store all the documents in Elasticsearch. All documents are of the same type, and all fields are basic core types (int, long, string). The fields are not dynamic, and no new types will be added at any point. I also stream the data from another application and show the documents in a live viewer, since Elasticsearch supports near-real-time search and very good text analysis.

I am running Elasticsearch on a single node. Since I do not need the data for long, I purge it at regular intervals (every 7 days), so a single-node cluster is more than enough for my requirements.

I also use only one index for the entire work.

Below is my index mapping:

    _elasticClient.CreateIndex(_defaultIndex, index => index
        .Mappings(ms => ms.Map(m => m
            .AllField(al => al.Enabled(false))
            .SourceField(sou => sou.Enabled())
            .Properties(ps => ps
                .Number(n => n.Name(e => e.Prop1).Type(NumberType.Integer))
                .Number(n => n.Name(nam => nam.Prop2).Type(NumberType.Byte))
                .Number(n => n.Name(nam => nam.Prop3).Type(NumberType.Byte))
                .Keyword(t => t.Name(n => n.Prop4).Norms(false).IndexOptions(IndexOptions.Docs))
                .Keyword(t => t.Name(n => n.Prop5).Norms(false).IndexOptions(IndexOptions.Docs))
                .Keyword(t => t.Name(n => n.Prop6).Norms(false).IndexOptions(IndexOptions.Docs))
                .Keyword(t => t.Name(n => n.Prop7).Norms(false).IndexOptions(IndexOptions.Docs))
                .Keyword(t => t.Name(n => n.Prop8).Norms(false).IndexOptions(IndexOptions.Docs))
                .Text(t => t.Name(n => n.Prop9).Norms(false))
                .Text(t => t.Name(n => n.Prop10).Norms(false))
                .Number(n => n.Name(nam => nam.Prop11).Type(NumberType.Integer))
                .Number(n => n.Name(nam => nam.Prop12).Type(NumberType.Long))
                .Number(n => n.Name(nam => nam.Prop13).Index(false))
                .Object(o => o.Name(nam => nam.Object1).Enabled(false))
            )))
        // Both settings are set in a single .Settings(...) call, because a later
        // .Settings(...) call overwrites the settings from an earlier one.
        .Settings(i => i
            .NumberOfReplicas(0)
            .RefreshInterval(TimeSpan.FromMilliseconds(100))));

I buffer all the docs for 200 milliseconds and use the code below to bulk index them into Elasticsearch:

    // Keep the observable returned by BulkAll so that it can be subscribed to below.
    var bulkAll = _elasticClient.BulkAll(docs, b => b
        .MaxDegreeOfParallelism(Environment.ProcessorCount)
        .BackOffRetries(1)
        .BackOffTime(TimeSpan.FromSeconds(1))
        //.RefreshOnCompleted()
    );

    bulkAll.Subscribe(new BulkAllObserver(
        onError: (e) =>
        {
            _logger.LogError(e.Message);
            throw e;
        },
        onCompleted: () => { _logger.LogInfo("Sending bulk documents completed"); }
    ));

For refreshing, if I use "RefreshOnCompleted", the application crashes within a very short time (less than 5 minutes) with the exception "Refreshing after all documents have indexed failed".
So I added the refresh interval setting to the index settings instead.

From the live streaming application, I get all the records using the query below:

    var response = _elasticClient.Search(s => s
        .From(0)
        .Size(howMuch)
        .Query(q => q
            .Bool(b => b
                .Must(
                    DocsSatisfyFiltersIncludes(filters),
                    DocsSatisfyRangeIncludeBoundaries(timeLimit))
                .MustNot(
                    DocsSatisfyFiltersExcludes(filters))))
        .Sort(so => so
            .Ascending(a => a.Seconds)
            .Ascending(ac => ac.Nanos))
        .SearchAfter(afterThisTime.Seconds, afterThisTime.Nanos));

What I observed is that when I run the above search query, I do not get all the documents that were indexed.
But when I run the same query 4 to 5 seconds later, I get all the documents.

Please help me reduce this long delay between refresh and search availability.

Thanks and Regards,
HPG


(David Turner) #2

I think the delay is to be expected given the setup you describe. Refreshing is what makes the indexed documents visible, so if you are only refreshing periodically you must wait for at least one full refresh interval before expecting the documents to be visible to searches.

The crash with RefreshOnCompleted seems unexpected. Can you share the full exception message and the stack trace from the server log?

It's not relevant here, I think, but there are better ways to deal with an indexing exception than crashing your application. Occasional failures are a fact of life in a distributed system.


(HPGANGA) #3

Hi David Turner,
Thanks a lot for the detailed explanation.
I don't understand your first point. What did you mean by refreshing periodically and a full refresh interval? Could you please explain?
I set the refresh interval to 200 milliseconds in the index settings. Why does it take 4 seconds for the documents to become available for search?
If there is any other way to overcome this issue, please advise. Throughput is our first priority when deciding on the software stack.

I will share the stack trace details for the exception on Monday.


(HPGANGA) #4

The exception is thrown for every "RefreshOnCompleted" call after 5 minutes of bulk indexing. Also, Elasticsearch stops returning documents for the search query. I will send all the details with the stack trace as soon as possible.


(David Turner) #5

Ah you didn't say that the refresh interval was 200ms - I interpreted your post as suggesting that the refresh interval was ~4 seconds, hence the wait. It's possible that a refresh of 10k documents (50k docs/s * 200ms) itself takes a few seconds. This shouldn't affect throughput much - you can carry on indexing while the refresh is taking place.

If you require the documents to be visible then you really need to explicitly refresh and wait for a response, either with the indexing request or as a separate call later.


(HPGANGA) #6

Can a refresh be done every 50 milliseconds? What is the optimal value for good performance? In my case, I also noticed that some newer documents were available while some older ones were missing. It felt as if the refresh was persisting documents in random order. How does it work? Could you please explain that area in more detail?


(David Turner) #7

Technically you can set the refresh interval that short, but I think it will be horrible for performance.

Let me turn the question around. What is your goal for latency, from starting to send a document to Elasticsearch until it becomes visible in a search?


(Christian Dahlqvist) #8

Refreshing is a quite expensive operation, which is why it by default is done only once every second. In order to improve indexing performance, it is generally recommended to increase the refresh interval rather than reduce it.
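
As a rough sketch of the advice above, assuming the NEST 6.x client used in this thread, the refresh interval of an existing index can be changed through the update-settings API. The class and method names are hypothetical, and any interval you pass in (for example one second, the default mentioned above) is only illustrative, not a tuned recommendation.

```csharp
using System;
using Nest;

public static class RefreshIntervalExample
{
    // Hypothetical helper: changes the refresh interval of an existing index.
    // Returns true when Elasticsearch accepted the settings update.
    public static bool SetRefreshInterval(IElasticClient client, string indexName, TimeSpan interval)
    {
        var response = client.UpdateIndexSettings(indexName, u => u
            .IndexSettings(s => s.RefreshInterval(interval)));

        if (!response.IsValid)
        {
            // DebugInformation contains the request audit trail and any server error.
            Console.Error.WriteLine(response.DebugInformation);
        }

        return response.IsValid;
    }
}
```

A call such as `RefreshIntervalExample.SetRefreshInterval(client, "myindex", TimeSpan.FromSeconds(1))` would restore the default cadence. Whether that meets a sub-second visibility goal depends on the workload.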


(HPGANGA) #9

I actually buffer documents for 200 milliseconds and then bulk index them into Elasticsearch. The bulk indexing call completes quickly, in very few milliseconds. I have also set the refresh interval to 200 milliseconds.

The actual expectation is that if all the documents were available within 1 second of bulk indexing, the data would be perceived as live.


(David Turner) #10

In reality those few milliseconds are not doing the whole indexing operation: Elasticsearch returns as soon as it can guarantee that the documents will eventually be indexed. The index is only built by a subsequent refresh, which is why that can be quite expensive.


(HPGANGA) #11

I was quite misled by the OnCompleted event of the BulkAll observable. How do I know whether indexing and refreshing have completed successfully? Is there any event or callback available?

I am really pleased with the quick responses. Thanks a lot, guys.


(David Turner) #12

Either ask for a refresh along with your bulk indexing request, or else perform a separate explicit refresh. In either case the response only comes back once the refresh is complete.
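
The two options above could look roughly like this with NEST 6.x. This is a sketch under assumptions: the class and method names are hypothetical, and a plain `Bulk` call is used instead of `BulkAll` to keep the example short. `Refresh.WaitFor` makes the bulk response wait until a refresh has made the documents searchable, while `Refresh.True` would force a refresh immediately.

```csharp
using System;
using System.Collections.Generic;
using Nest;

public static class VisibleIndexingExample
{
    // Option 1: ask for a refresh along with the bulk request.
    // The call returns only once the indexed documents are visible to search.
    public static bool IndexAndWaitUntilSearchable<T>(
        IElasticClient client, string indexName, IEnumerable<T> docs) where T : class
    {
        var response = client.Bulk(b => b
            .Index(indexName)
            .IndexMany(docs)
            .Refresh(Elasticsearch.Net.Refresh.WaitFor));

        // IsValid covers transport-level failures; Errors flags per-item failures.
        if (!response.IsValid || response.Errors)
        {
            Console.Error.WriteLine(response.DebugInformation);
            return false;
        }

        return true;
    }

    // Option 2: perform a separate explicit refresh after indexing.
    public static bool RefreshNow(IElasticClient client, string indexName)
    {
        var response = client.Refresh(indexName);
        return response.IsValid;
    }
}
```

Either way, the success of the returned response is the signal that the documents are searchable. No separate event or callback is involved.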


(HPGANGA) #13

Exception Details:
Message = "Refreshing after all documents have indexed failed"
Response = {Unsuccessful low level call on POST: /myindex/_refresh}
DebugInformation = "Unsuccessful low level call on POST: /myindex/_refresh\r\n# Audit trail of this API call:\r\n - [1] BadResponse: Node: http://localhost:9200/ Took: 00:01:00.2204042\r\n - [2] MaxTimeoutReached: Took: -737019.11:16:48.7862369\r\n#
Message = "The operation has timed out"

This happens for all the RefreshOnCompleted calls after 5 minutes of bulk indexing of 50,000 docs/sec.


(David Turner) #14

I think this error is a client-side timeout and not coming all the way from Elasticsearch. Could you configure your client not to time out on this call?

It does, however, raise the question of why a refresh could take over a minute to complete. Are your nodes doing a lot of GC, for instance?


(HPGANGA) #15

Is there any setting to increase the timeout? I am using the NEST C# client for development. Can it be done via NEST or elasticsearch.yml?
As I mentioned earlier in this thread, I am using a single-node cluster. When does GC trigger? How do I avoid frequent GC?
Please enlighten me.


(David Turner) #16

It seems that you can: https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/request-timeout.html

GC triggers when the JVM needs to reclaim some allocated memory. That was just a guess. You'll need to look at the server logs in order to start to work out why this refresh is taking so long.
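
For the timeout, a minimal sketch with NEST 6.x might look like the following. The URL and index name are taken from the exception details earlier in this thread. The class and method names and the timeout values are assumptions for illustration only. The one-minute duration in the audit trail matches the client's default request timeout of 60 seconds.

```csharp
using System;
using Nest;

public static class TimeoutExample
{
    // Raises the request timeout for every call made through this client.
    public static IElasticClient CreateClient()
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("myindex")
            .RequestTimeout(TimeSpan.FromMinutes(2)); // illustrative value

        return new ElasticClient(settings);
    }

    // Alternatively, raises the timeout for a single refresh call only.
    public static IRefreshResponse RefreshWithLongTimeout(IElasticClient client)
    {
        return client.Refresh("myindex", r => r
            .RequestConfiguration(rc => rc.RequestTimeout(TimeSpan.FromMinutes(5))));
    }
}
```

A longer timeout only stops the client from giving up early. It does not explain why the refresh itself takes so long, which still needs to be investigated on the server side.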


(HPGANGA) #17

@DavidTurner Could you please suggest how much memory I should allocate to the JVM, considering my requirements?


(David Turner) #18

It very much depends on your specific workload, and it's still not clear that heap size is the problem, but the reference manual gives some guidance: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/heap-size.html