NRT search with elastic search


(rcch) #1

Hi all,

  1. I'd like to get some feedback and hear about experiences from people
    who've used elasticsearch for near-real-time search applications.

  2. How does elasticsearch compare to using near-real-time search in
    Lucene?

  3. I believe there's a 1 second delay before documents can be made
    searchable (due to how the index gets updated asynchronously). Is this
    a parameter that can be tuned?

Our application needs to ingest content at a rate of ~1000s of
documents/second, and the documents need to be searchable right away.

Thanks for your input.

Cheers,

Vijay


(Shay Banon) #2

On Thu, Aug 11, 2011 at 5:44 AM, rcch <vijay.cmu@gmail.com> wrote:

  2. How does elastic search compare to using near-real time search in
    lucene ?

elasticsearch uses Lucene NRT.

  3. I believe there's a 1 second delay before documents can be made
    searchable (due to how the index gets updated asynchronously). Is this
    a parameter that can be tuned ?

The index is not updated asynchronously. When you index data, it's applied
to the shard and its replicas synchronously, including writing to a
transaction log. A "fresh" view of the content, which is what searches see,
is opened every 1 second by default (that's NRT).
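To make the distinction between "indexed" and "searchable" concrete, here is a toy Python model. It is not elasticsearch's actual implementation, and the class and method names are hypothetical: writes land in a buffer and a transaction log immediately, but searches only consult the view opened by the last refresh, which by default happens every second.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of near-real-time visibility (hypothetical, not ES code)."""

    refresh_interval: float = 1.0                       # seconds; mirrors the 1s default
    translog: list = field(default_factory=list)        # every write, recorded immediately
    buffer: dict = field(default_factory=dict)          # indexed but not yet searchable
    searchable: dict = field(default_factory=dict)      # view opened by the last refresh
    last_refresh: float = 0.0

    def index(self, doc_id, text):
        # The write is complete once it is in the buffer and the translog;
        # there is no background queue in this model.
        self.translog.append((doc_id, text))
        self.buffer[doc_id] = text

    def refresh(self, now):
        # Open a fresh view: everything buffered becomes searchable.
        self.searchable.update(self.buffer)
        self.buffer.clear()
        self.last_refresh = now

    def maybe_refresh(self, now):
        # Periodic refresh, as a scheduler would trigger every refresh_interval.
        if now - self.last_refresh >= self.refresh_interval:
            self.refresh(now)

    def search(self, term):
        # Searches only see the last refreshed view, never the buffer.
        return sorted(i for i, t in self.searchable.items() if term in t)


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("1", "hello world")
    print(shard.search("hello"))   # [] : indexed, but not refreshed yet
    shard.maybe_refresh(now=0.5)   # too early, 0.5s < 1.0s interval
    print(shard.search("hello"))   # []
    shard.maybe_refresh(now=1.0)   # interval elapsed, new view opened
    print(shard.search("hello"))   # ['1']
```

The point of the model is that the "delay" is a visibility window between refreshes, not a lag in applying the write.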



(Pavel Penchev) #3

Hi

On 11.08.2011 11:25, Shay Banon wrote:

Our application requires an ingestion of content of ~1000s of
documents/second, but they need to be searchable right-away..

In case you only need them to be "get-able" (fetched by ID) right away, ES
has supported this since 0.17.0; see https://github.com/elasticsearch/elasticsearch/issues/1060
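A minimal sketch of the difference between a "get" and a search, assuming a recent elasticsearch REST API on a local node at http://localhost:9200. The `/{index}/_doc/{id}` path is the 7.x+ form (0.17-era clusters used `/{index}/{type}/{id}`), and the helper names and `docs` index are hypothetical. The functions only build `urllib.request.Request` objects; pass one to `urllib.request.urlopen` to actually send it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:9200"  # assumed local node; adjust for your cluster


def realtime_get(index, doc_id, realtime=True):
    # GET by ID. With realtime=true (the default) this can return a document
    # that has been indexed but not yet made visible by a refresh.
    qs = urlencode({"realtime": str(realtime).lower()})
    return Request(f"{BASE}/{index}/_doc/{doc_id}?{qs}", method="GET")


def search_by_id(index, doc_id):
    # A search for the same ID only sees documents as of the last refresh.
    body = json.dumps({"query": {"ids": {"values": [doc_id]}}}).encode()
    return Request(
        f"{BASE}/{index}/_search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Right after indexing document `42`, the first request would be expected to find it while the second may not until the next refresh.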

Regards,
Pavel


(rcch) #4

Thanks for your replies, Shay, Pavel.

Can you please tell me

  1. Is it true that there will always be a 1 second delay before
    content can be searchable?

  2. How does ES compare to Zoie (from LinkedIn) for real-time search
    applications?

  3. What kind of ingestion rates can ES support comfortably? 10s/second?
    100s/second? 1000s/second? Can you suggest where you think the
    breakdown is?

Thanks,

Cheers,

Vijay

On Aug 11, 8:13 am, Pavel Penchev pavel.penc...@gmail.com wrote:

In case you only need them "get-able" right-away - ES has this since
0.17.0, see https://github.com/elasticsearch/elasticsearch/issues/1060

(Berkay Mollamustafaoglu-2) #5

On Fri, Aug 12, 2011 at 3:02 PM, rcch <vijay.cmu@gmail.com> wrote:

Thanks for your replies, Shay, Pavel.

Can you please tell me

  1. Is it true that there will always be a 1 second delay before
    content can be searchable?

By default, refresh is called every second, so there can be a delay of up
to 1 second. However, the refresh interval is configurable, and a refresh
can also be triggered programmatically. Technically, you could refresh after
indexing every document, but this would have a significant performance impact.
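The knobs mentioned above, sketched as raw REST requests in Python: the `index.refresh_interval` setting, the `_refresh` endpoint, and the `refresh=true` parameter on an index request. This assumes a recent elasticsearch version on localhost:9200 (the `_doc` path is the 7.x+ form); the helper names and the `docs` index are hypothetical. As before, the functions only build `urllib.request.Request` objects for `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # assumed local node; adjust for your cluster
JSON = {"Content-Type": "application/json"}


def set_refresh_interval(index, interval):
    # e.g. "1s" (the default), "30s" for heavy ingestion,
    # or "-1" to disable periodic refresh entirely.
    body = json.dumps({"index": {"refresh_interval": interval}}).encode()
    return Request(f"{BASE}/{index}/_settings", data=body, method="PUT", headers=JSON)


def refresh_now(index):
    # Explicit refresh: opens a new searchable view immediately.
    return Request(f"{BASE}/{index}/_refresh", method="POST")


def index_and_refresh(index, doc_id, doc):
    # Per-request refresh: the document is searchable as soon as this returns,
    # but doing it for every document is the expensive option noted above.
    body = json.dumps(doc).encode()
    return Request(
        f"{BASE}/{index}/_doc/{doc_id}?refresh=true",
        data=body,
        method="PUT",
        headers=JSON,
    )
```

A common pattern is to set `refresh_interval` to `"-1"` during a large bulk load and restore it afterwards; for a steady 1000s/sec stream, a slightly longer interval than 1s trades a little freshness for fewer, larger refreshes.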

  2. How does ES compare to Zoie (from Linkedin) for real-time-search
    applications ?

  3. What kind of ingestion rates can ES support comfortably ? Is 10s/
    second ? 100s/second ? or 1000s/second ? Can you suggest where you
    think the breakdown is?

Hard to tell. What is the document size? How many servers (CPU, memory,
disks) are in the cluster? ES is horizontally scalable, so even 1000s/sec
is achievable with multiple servers. There is no better way to find out
than to test with your own docs.
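One way to run that test is a small timing harness like the following sketch. `send_batch` is a hypothetical hook you would wire to your own client (for example, one bulk request per batch); the harness only times how long your real documents take to go through it. It measures a single sequential client, so cluster capacity still depends on document size, hardware, replicas, and the refresh interval.

```python
import time
from typing import Callable, List


def measure_ingest_rate(
    docs: List[dict],
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 500,
) -> float:
    """Time send_batch over docs in fixed-size batches; return docs/second.

    send_batch is a hypothetical hook: connect it to your own client and
    feed it real documents so the number reflects your workload.
    """
    if not docs:
        return 0.0
    start = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        send_batch(docs[i:i + batch_size])
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a trivially fast sender.
    return len(docs) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    sample = [{"id": n, "body": "text " * 50} for n in range(2000)]
    rate = measure_ingest_rate(sample, send_batch=lambda batch: None)
    print(f"{rate:,.0f} docs/sec (no-op sender, so this only measures overhead)")
```

Running it with different batch sizes and refresh intervals against a test cluster shows where throughput starts to level off for your documents.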
