I am using Elasticsearch 6.1, NEST 6.1, and C# (.NET 4.5.2) for development. I receive large volumes of data from different applications at short intervals (50,000 documents/sec) and store all of the documents in Elasticsearch. All data is of the same type, and all fields are basic core types (int, long, string). The fields are not dynamic, and no new types will be added at any point. I also stream the data from another application and show those documents in a live viewer, since Elasticsearch supports near-real-time search and very good text analysis.
I am running Elasticsearch on a single node. Since I do not need the data for long, I purge it at regular intervals (every 7 days), so a single-node cluster is more than enough for my requirements.
I also use only one index for all of the data.
Below is my index mapping:
_elasticClient.CreateIndex(_defaultIndex, index => index
    .Mappings(ms => ms.Map(m => m
        .AllField(al => al.Enabled(false))
        .SourceField(sou => sou.Enabled())
        .Properties(ps => ps
            .Number(n => n.Name(e => e.Prop1).Type(NumberType.Integer))
            .Number(n => n.Name(nam => nam.Prop2).Type(NumberType.Byte))
            .Number(n => n.Name(nam => nam.Prop3).Type(NumberType.Byte))
            .Keyword(t => t.Name(n => n.Prop4).Norms(false).IndexOptions(IndexOptions.Docs))
            .Keyword(t => t.Name(n => n.Prop5).Norms(false).IndexOptions(IndexOptions.Docs))
            .Keyword(t => t.Name(n => n.Prop6).Norms(false).IndexOptions(IndexOptions.Docs))
            .Keyword(t => t.Name(n => n.Prop7).Norms(false).IndexOptions(IndexOptions.Docs))
            .Keyword(t => t.Name(n => n.Prop8).Norms(false).IndexOptions(IndexOptions.Docs))
            .Text(t => t.Name(n => n.Prop9).Norms(false))
            .Text(t => t.Name(n => n.Prop10).Norms(false))
            .Number(n => n.Name(nam => nam.Prop11).Type(NumberType.Integer))
            .Number(n => n.Name(nam => nam.Prop12).Type(NumberType.Long))
            .Number(n => n.Name(nam => nam.Prop13).Index(false))
            .Object(o => o.Name(nam => nam.Object1).Enabled(false))
        )))
    // Both settings in a single Settings call, so neither overwrites the other.
    .Settings(i => i
        .NumberOfReplicas(0)
        .RefreshInterval(TimeSpan.FromMilliseconds(100))));
I buffer all the documents for 200 milliseconds and then index them into Elasticsearch with the bulk indexing code below:
// Keep the observable returned by BulkAll so it can be subscribed to below.
var bulkAll = _elasticClient.BulkAll(docs, b => b
    .MaxDegreeOfParallelism(Environment.ProcessorCount)
    .BackOffRetries(1)
    .BackOffTime(TimeSpan.FromSeconds(1))
    //.RefreshOnCompleted()
);
bulkAll.Subscribe(new BulkAllObserver(
    onError: (e) =>
    {
        _logger.LogError(e.Message);
        throw e;
    },
    onCompleted: () => { _logger.LogInfo("Sending bulk documents completed"); }
));
For refreshing, if I use RefreshOnCompleted, the application crashes within a very short time (less than 5 minutes) with the exception "Refreshing after all documents have indexed failed".
So, I added the refresh interval setting to the index creation call above instead.
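To confirm that the 100 ms refresh interval is really in effect on the live index, the settings can be read back from the cluster. The following is only a minimal sketch, not code from my application: the class and method names are hypothetical, and it assumes the NEST 6.x `GetIndexSettings` call and its `Indices` response dictionary behave as I understand them.

```csharp
using System;
using Nest;

public static class RefreshIntervalCheck
{
    // Hypothetical helper: prints the refresh interval and replica count
    // that Elasticsearch reports for the given index.
    public static void PrintRefreshInterval(IElasticClient client, string indexName)
    {
        var response = client.GetIndexSettings(g => g.Index(indexName));
        if (!response.IsValid)
        {
            Console.WriteLine("Could not read settings: " + response.DebugInformation);
            return;
        }

        foreach (var index in response.Indices)
        {
            var settings = index.Value.Settings;
            // RefreshInterval is null when the index uses the server default.
            Console.WriteLine(string.Format(
                "{0}: refresh_interval={1}, number_of_replicas={2}",
                index.Key, settings.RefreshInterval, settings.NumberOfReplicas));
        }
    }
}
```

Calling `RefreshIntervalCheck.PrintRefreshInterval(_elasticClient, _defaultIndex)` once after index creation should be enough to see whether the values match what was requested.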
From the live streaming application, I get all the records using the query below:
var response = _elasticClient.Search(s => s
    .From(0)
    .Size(howMuch)
    .Query(q => q
        .Bool(b => b
            .Must(
                DocsSatisfyFiltersIncludes(filters),
                DocsSatisfyRangeIncludeBoundaries(timeLimit)
            )
            .MustNot(
                DocsSatisfyFiltersExcludes(filters))))
    .Sort(so => so
        .Ascending(a => a.Seconds)
        .Ascending(ac => ac.Nanos))
    .SearchAfter(afterThisTime.Seconds, afterThisTime.Nanos));
What I observed is that when I run the above search query, I do not get all of the documents that have been indexed.
But when I run the same query 4 to 5 seconds later, I get all of them.
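The 200 ms buffer plus the 100 ms refresh interval would only account for roughly 300 ms plus the bulk request time, which is well short of the 4 to 5 seconds I see. One check that might narrow this down is to force a refresh right before searching and time it. This is a rough diagnostic sketch only, not something to run on every search: the helper name is hypothetical, and it assumes the NEST 6.x `Refresh` call accepts an index name as shown.

```csharp
using System;
using System.Diagnostics;
using Nest;

public static class RefreshDiagnostics
{
    // Hypothetical helper: forces a refresh of one index and reports
    // how long the call took and whether it succeeded.
    public static bool RefreshAndTime(IElasticClient client, string indexName)
    {
        var stopwatch = Stopwatch.StartNew();
        var response = client.Refresh(indexName);
        stopwatch.Stop();

        Console.WriteLine(string.Format(
            "Refresh of {0} took {1} ms, valid: {2}",
            indexName, stopwatch.ElapsedMilliseconds, response.IsValid));

        if (!response.IsValid)
        {
            Console.WriteLine(response.DebugInformation);
        }
        return response.IsValid;
    }
}
```

If the search returns every document right after this call, that would suggest the delay is on the refresh side; if documents are still missing, they may not have reached Elasticsearch yet, which would point at the buffering or bulk indexing path.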
Please help me reduce this long delay before indexed documents become available to search.
Thanks and Regards,
HPG