Newbie ES Questions regarding batch commits, performance, etc


#1

Hi all,

I am planning to use ES (coming from Lucene) to index and query a good volume of data. The use case is indexing 10-20 documents per second (roughly 40-50 fields each) while running queries in parallel.

-- In Lucene I can add a document directly to the IndexWriter and commit in the background (a custom thread, every n minutes). I was using ControlledRealTimeReopenThread and TrackingIndexWriter for queries, so that documents are immediately available for search (even though they are not committed yet). How can I achieve this in ES? I'll be using the REST client for indexing and querying.

-- Also, do I need to keep track of which documents were committed, for book-keeping? What if a document was added but the node went down before the commit happened (assume a single-node deployment)? When the node comes back, would the document be re-indexed automatically (does ES use JMS or embedded queues)? If not, how can I get the list of documents that were committed?

Can someone point me to a resource/wiki/example that covers the above?

Regards.


#2

Hi experts, can anyone help answer the above questions?

Regards.


(Lee Hinman) #3

[quote="lukes, post:1, topic:72546"]
-- In Lucene I can add a document directly to the IndexWriter and commit in the
background (a custom thread, every n minutes). I was using
ControlledRealTimeReopenThread and TrackingIndexWriter for queries, so that
documents are immediately available for search (even though they are not
committed yet). How can I achieve this in ES? I'll be using the REST client
for indexing and querying. [/quote]

By default, Elasticsearch automatically refreshes (a Lucene flush) and re-opens
the searcher every second, which makes newly indexed documents visible to
search. You can change this interval dynamically through the
index.refresh_interval setting to make it longer, or disable automatic refresh
entirely (you can also refresh explicitly).
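
As a rough sketch of what those two options look like over REST, the Python snippet below only builds the requests (method, path, JSON body) and does not send anything. The helper names and the index name "events" are made up for illustration; the paths follow the index settings and refresh endpoints, and you would pass the result to whatever HTTP client you use.

```python
import json


def set_refresh_interval(index, interval):
    """Build the request that changes an index's refresh interval.

    interval is a time value such as "1s" or "30s"; "-1" disables
    automatic refresh. Returns (method, path, body); nothing is sent.
    """
    body = json.dumps({"index": {"refresh_interval": interval}})
    return ("PUT", f"/{index}/_settings", body)


def explicit_refresh(index):
    """Build the request that refreshes one index on demand."""
    return ("POST", f"/{index}/_refresh", None)


if __name__ == "__main__":
    # "events" is a hypothetical index name.
    for method, path, body in (
        set_refresh_interval("events", "30s"),  # refresh less often
        set_refresh_interval("events", "-1"),   # no automatic refresh
        explicit_refresh("events"),             # make recent docs searchable now
    ):
        print(method, path, body or "")
```

A longer interval usually suits heavy indexing, with an explicit refresh at the points where you really need recent documents to be searchable.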

[quote="lukes, post:1, topic:72546"]
-- Also, do I need to keep track of which documents were committed, for
book-keeping? What if a document was added but the node went down before the
commit happened (assume a single-node deployment)? When the node comes back,
would the document be re-indexed automatically (does ES use JMS or embedded
queues)? If not, how can I get the list of documents that were committed? [/quote]

No. Elasticsearch writes documents that are acknowledged but not yet committed
to its translog, which by default is fsynced to disk after each request. If ES
dies during this period, the translog is replayed when Elasticsearch starts
back up, so documents that were not yet committed are not lost.
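
To make the idea concrete, here is a toy model of the mechanism in Python. It is not how Elasticsearch is implemented: the ToyShard class and its methods are invented for illustration, and the "durable" log is just a list that the simulated crash leaves untouched. It only shows why an acknowledged document survives a crash that happens before the next commit.

```python
class ToyShard:
    """Toy write-ahead-log model; hypothetical, not Elasticsearch code."""

    def __init__(self):
        self.committed = {}  # documents in the last durable commit
        self.buffer = {}     # indexed since the last commit; lost on crash
        self.translog = []   # stands in for the fsynced operation log

    def index(self, doc_id, doc):
        # The operation is logged before the client gets its acknowledgement.
        self.translog.append((doc_id, doc))
        self.buffer[doc_id] = doc
        return True

    def commit(self):
        # A commit makes buffered documents durable, so the log can be trimmed.
        self.committed.update(self.buffer)
        self.buffer.clear()
        self.translog.clear()

    def crash_and_recover(self):
        # In-memory state disappears; the log is replayed on startup.
        self.buffer.clear()
        for doc_id, doc in self.translog:
            self.buffer[doc_id] = doc

    def get(self, doc_id):
        return self.buffer.get(doc_id, self.committed.get(doc_id))


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("1", {"msg": "committed before the crash"})
    shard.commit()
    shard.index("2", {"msg": "acknowledged, not yet committed"})
    shard.crash_and_recover()
    print(shard.get("1"), shard.get("2"))
```

In this model the client never has to track which documents were committed: anything that was acknowledged is either in the last commit or in the log that gets replayed.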


#4

Thanks a lot @dakrone ...

Regards.

