Percolate Performance


(Adam Georgiou) #1

Hey Guys,

Another investigative question here -- trying to figure out if
elasticsearch is right for a certain project my team is working on:

Does anyone have any hard metrics on how the percolate feature holds up
against different sized datasets, and how it fares with real-time query
indexing/percolating? I've seen relatively small setups
discussed anecdotally, but am wondering if anyone has any specifics.

In our current architecture we have ~500,000 queries stored and percolate
(we call it profile) around 10 documents a minute through them, expecting
responses in under (or around) half a second. This doesn't seem like it
would be too hard a problem for a properly configured elasticsearch setup,
but we also add/remove about 50 queries/minute from our
index while documents are being profiled. Would ElasticSearch block
in such cases?

Wondering...

-Adam

--


(Otis Gospodnetić) #2

Hi,

Depends on the hardware and query complexity... :)

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Wednesday, October 17, 2012 3:59:44 PM UTC-4, Adam Georgiou wrote:


--


(Karel Minarik) #3

Hi, I think everybody would be interested in a properly executed and
repeatable benchmark of that...
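A rough sketch of what such a repeatable harness could look like, in Python. Everything here is an assumption for illustration: `run_benchmark`, `ToyPercolator` and `VOCAB` are hypothetical names, the toy class is an in-memory stand-in and not elasticsearch, and the churn is interleaved with percolation rather than truly concurrent. The 5 changes per document come from the numbers in the question (about 50 query changes and 10 documents a minute). To measure a real cluster you would swap the three callables for functions that talk to it.

```python
import random
import statistics
import time

VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon"]  # hypothetical query terms


class ToyPercolator:
    """In-memory stand-in: each stored query is a single term."""

    def __init__(self):
        self.queries = {}

    def add_query(self, query_id, term):
        self.queries[query_id] = term

    def remove_query(self, query_id):
        del self.queries[query_id]

    def percolate(self, doc):
        # Return the ids of all stored queries whose term occurs in the doc.
        words = set(doc.split())
        return [qid for qid, term in self.queries.items() if term in words]


def run_benchmark(percolate, add_query, remove_query, docs, churn_per_doc, terms, seed=0):
    """Percolate each doc after some query churn; return latencies in seconds."""
    rng = random.Random(seed)  # fixed seed keeps the churn pattern repeatable
    latencies = []
    live_ids = []
    next_id = 0
    for doc in docs:
        for _ in range(churn_per_doc):
            # Randomly either remove a live query or register a new one.
            if live_ids and rng.random() < 0.5:
                remove_query(live_ids.pop(rng.randrange(len(live_ids))))
            else:
                add_query(next_id, rng.choice(terms))
                live_ids.append(next_id)
                next_id += 1
        start = time.perf_counter()
        percolate(doc)
        latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    toy = ToyPercolator()
    docs = ["alpha beta", "gamma", "delta epsilon alpha"] * 10
    lat = run_benchmark(toy.percolate, toy.add_query, toy.remove_query,
                        docs, churn_per_doc=5, terms=VOCAB)
    print(f"docs={len(lat)} median={statistics.median(lat):.6f}s max={max(lat):.6f}s")
```

The timings from the toy class mean nothing by themselves; the point is only that the workload (document rate, churn rate, seed) is written down so that two runs can be compared.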

For what it's worth, in one of my past contracts, we had ~2,500
registered queries, tens of millions of docs, and used the percolator for
"alert" use cases (every document being indexed ran through the
percolator, though the indexing rate was not too high), as well as a "document
clustering" use case (when exporting, tell me what "topics" this document
is about, by running every exported document against multiple percolator
queries).

It performed very well -- we never noticed any performance issues, and it
wasn't slowing down the throughput.

"Would ElasticSearch block in such cases?"

Certainly not -- after all, percolator queries are just "special" documents
in a "special" index.
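To make the "special documents in a special index" point concrete, here is a minimal Python sketch that only builds the HTTP requests and does not send them. The endpoint shapes (`/_percolator/{index}/{name}` and `/{index}/{type}/_percolate`) are how I remember the percolator API of this era and should be treated as an assumption to check against the docs for your version; `BASE_URL`, the helper names, and the `articles` / `alert-1` values are all hypothetical.

```python
import json

# Assumption: a local node; endpoint shapes are recalled from the
# 0.19/0.20-era percolator API and may differ in other versions.
BASE_URL = "http://localhost:9200"


def register_query_request(index, name, query):
    """Registering a query is a document write into the `_percolator` index."""
    url = f"{BASE_URL}/_percolator/{index}/{name}"
    return "PUT", url, json.dumps({"query": query})


def unregister_query_request(index, name):
    """Removing a registered query is an ordinary document delete."""
    return "DELETE", f"{BASE_URL}/_percolator/{index}/{name}", None


def percolate_request(index, doc_type, doc):
    """Percolating sends one document to be matched against stored queries."""
    url = f"{BASE_URL}/{index}/{doc_type}/_percolate"
    return "GET", url, json.dumps({"doc": doc})


if __name__ == "__main__":
    for request in (
        register_query_request("articles", "alert-1", {"term": {"body": "outage"}}),
        percolate_request("articles", "article", {"body": "outage"}),
        unregister_query_request("articles", "alert-1"),
    ):
        print(request)
```

Seen this way, adding or removing 50 queries a minute is just a low rate of small index and delete operations running alongside the percolate calls.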

Karel


--

