Real-time Response Time

Behrad_Zari · July 24, 2012, 5:20pm

Howdy,
We have a CouchDB database of 10 million documents (6 GB in size), on which
we do real-time search using map/reduce views.
I installed ES with the CouchDB river to see if I can achieve real-time response
times (<500 ms) when searching our dataset.
With the default configuration, the indexes grow to 7 GB in ES, and the response
time of a query like f1:v1* AND f2:v2* was more than
1.5 seconds, and around 2 to 3 seconds in some cases!
Is there a configuration I can use, or a way to tweak my indexes
(ignore unnecessary fields, or optimize for a subset of searches),
to achieve a response time under 0.5 seconds?

dadoonet · July 24, 2012, 5:42pm

Wildcard searches are not the most performant ones.
You should think of building an efficient mapping for your documents.

Also, you should ask yourself what your users are looking for.
If you need full-text search (like Google), you probably don't want to use wildcards.

Could you post a sample of your documents?

David
Twitter : @dadoonet / @elasticsearchfr

Behrad_Zari · July 24, 2012, 10:40pm

David,
Consider a business directory where the queries are patterns (wildcards are
crucial to us) on 2 or 3 fields that can be AND-ed together. The fields are
strings such as names, addresses, phones, tags, ... in a simple structured
JSON document.

I also have an off-topic question: can we change the default Lucene indexer
to handle homophonic characters in Arabic/Persian (to support misspelled
words)? For example, searching for صلام should also return results like سلام!
(In English, this would be like searching for foto and getting results for photo...)

Clinton_Gormley · July 25, 2012, 8:04am

Hi Behrad

> Consider a business directory which queries are patterns (wildcards
> are crucial to us)

You can achieve the same effect as wildcards but without using wildcard
queries, which are inefficient. You need to analyze the strings you
want to be searchable as wildcards using either the edge ngram
tokenizer, or the ngram tokenizer (depending on whether you want to
match from the beginning of the word, or anywhere within a word)
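
To make the difference between the two tokenizers concrete, here is a small sketch in plain Python that mimics which terms each one would put in the index, followed by index settings that could define such an analyzer. The helper functions are only an emulation, not Elasticsearch code, and the names prefix_tokenizer, prefix_index and prefix_search are made up for illustration. The type name "edge_ngram" is the spelling used by recent Elasticsearch versions; older releases may spell it differently, so check the documentation for your version.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Prefixes of `token` from min_gram to max_gram characters long."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


def ngrams(token, min_gram=2, max_gram=3):
    """All substrings of `token` from min_gram to max_gram characters long."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Hypothetical index settings (all analyzer/tokenizer names are assumptions).
# Index time: store every prefix of each word.
# Search time: do NOT ngram the query, just lowercase it.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "prefix_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "prefix_index": {
                    "type": "custom",
                    "tokenizer": "prefix_tokenizer",
                    "filter": ["lowercase"],
                },
                "prefix_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                },
            },
        }
    }
}

print(edge_ngrams("pizza", 1, 3))  # prefixes only
print(ngrams("pizza", 2, 3))       # substrings anywhere in the word
```

With the prefixes already stored as terms, a search for piz becomes an exact term lookup instead of a scan over every term that starts with piz, which is where the speed-up comes from. The price is a larger index.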

See this post for an example of using edge ngrams:

> on 2, 3 fields that can be AND-ed together! fields are Strings like,
> names, addresses, phones, tags, ... in a simple structured json.

Doing AND queries is fine, either by changing the default_operator for a
query string query or a text query to 'AND', or using bool or dismax
query clauses to join multiple queries.
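
As a rough sketch of both options, the Python below builds the two request bodies as plain dictionaries. The field names f1 and f2 come from the query in the original post; the helper names are made up. The bool version uses term queries, which are not analyzed, so it assumes the fields were indexed with something like edge ngrams and lowercases the prefixes itself.

```python
import json


def and_prefix_query(prefixes):
    """Bool query: every field must contain its (lowercased) prefix as a term."""
    must = [
        {"term": {field: prefix.lower()}}
        for field, prefix in prefixes.items()
    ]
    return {"query": {"bool": {"must": must}}}


def and_query_string(text):
    """Query string query where all clauses are required (default_operator AND)."""
    return {
        "query": {
            "query_string": {"query": text, "default_operator": "AND"}
        }
    }


print(json.dumps(and_prefix_query({"f1": "V1", "f2": "v2"}), indent=2))
print(json.dumps(and_query_string("f1:v1 f2:v2"), indent=2))
```

Either body would be sent to the index's _search endpoint. The bool form is usually easier to build from application code, since each field gets its own clause and no query syntax has to be escaped.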

> I also have an off-topic question: Can we change the default lucene
> indexer to enable homophonic characters in Arabic/Persian (to support
> mis-spelling of words)? For example searching for صلام also contains
> results like سلام!
> (In English it means searching foto results in photo...)

I'm not sure what is available in Lucene for Arabic homophones, but the
mapping character filter and the synonyms token filter may be useful:

Possibly also the pattern replace filter:
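
To illustrate the mapping character filter idea, here is a minimal Python sketch that folds homophonic spellings onto one canonical form before indexing and searching, using only the two pairs mentioned in this thread. The normalize function is an emulation rather than Elasticsearch code: it applies the rules one after another, which is simpler than what the real filter does. The names homophones and homophone_text are made up, and a real rule list for Persian would need to be written by someone who knows the language.

```python
# Rules in the "source => target" form accepted by the mapping char filter.
RULES = ["ص => س", "ph => f"]


def parse_rules(rules):
    """Split each 'source => target' rule into a (source, target) pair."""
    pairs = []
    for rule in rules:
        source, target = rule.split("=>")
        pairs.append((source.strip(), target.strip()))
    return pairs


def normalize(text, rules=RULES):
    """Replace every source sequence with its target, rule by rule."""
    for source, target in parse_rules(rules):
        text = text.replace(source, target)
    return text


# Hypothetical index settings wiring the same rules into an analyzer.
char_filter_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "homophones": {"type": "mapping", "mappings": RULES}
            },
            "analyzer": {
                "homophone_text": {
                    "type": "custom",
                    "char_filter": ["homophones"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

# Both spellings collapse to the same indexed form.
print(normalize("صلام") == normalize("سلام"))
print(normalize("photo"))
```

Because the same analyzer runs at index time and at search time, a misspelled query and the correctly spelled document end up as the same term.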

clint
