Real-time Response Time

Howdy,
We have a CouchDB database of 10 million documents (6 GB in size), on which
we do real-time search using map/reduce views.
I installed ES with the CouchDB river to see whether I can achieve real-time response
times (<500 ms) when searching our dataset.
With the default configuration, the index grows to 7 GB in ES, and the response time
of a query like f1:v1* AND f2:v2* was more than
1.5 s, and around 2 to 3 s in some cases!
I want to know whether I can use a configuration setting, or tweak my indexes
(ignore unnecessary fields or optimize for a subset of searches),
to achieve a response time under 0.5 s.

Wildcard queries are among the least efficient kinds of query.
You should think about building an efficient mapping for your documents.
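Here is a simplified model of why a query like v1* is costly. Before it can match any documents, a wildcard query has to find every indexed term that fits the pattern. A trailing wildcard can at least seek into the sorted term dictionary. A leading wildcard has to check every term. This is only a sketch: the term list is hypothetical, and Lucene's real implementation is more sophisticated than a sorted Python list.

```python
import bisect

def expand_prefix(sorted_terms, prefix):
    """Collect every term starting with `prefix` (what v1* must do first).

    Binary search finds the first candidate. We then walk forward until the
    prefix stops matching, so the cost grows with the number of matching terms.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

def expand_suffix(sorted_terms, suffix):
    """A leading wildcard (*suffix) cannot use the sort order: every term is checked."""
    return [t for t in sorted_terms if t.endswith(suffix)]

# Hypothetical term dictionary for a single field.
terms = sorted(["bakery", "bank", "banner", "bar", "barber", "cafe"])
print(expand_prefix(terms, "ba"))   # five terms to look up, not one
print(expand_suffix(terms, "er"))
```

With millions of distinct names and addresses, a short prefix can expand to thousands of terms per clause. This helps explain multi-second response times.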

Also, ask yourself what your users are actually looking for.
If you need full-text search (like Google), you probably don't want to use wildcards.

Could you post a sample of your documents?

David ;-)
Twitter : @dadoonet / @elasticsearchfr

On 24 Jul 2012 at 19:20, Behrad Zari behradz@gmail.com wrote:


David,
Consider a business directory whose queries are patterns (wildcards are
crucial to us) on 2 or 3 fields that can be AND-ed together. The fields are
strings such as names, addresses, phone numbers, tags, ... in a simple structured
JSON document.

I also have an off-topic question: can we change the default Lucene analysis
to treat homophonic characters in Arabic/Persian as equivalent (to support misspelled
words)? For example, searching for صلام should also return results containing سلام.
(In English, this would be like a search for "foto" returning results for "photo".)

On Tuesday, July 24, 2012 9:50:51 PM UTC+4:30, Behrad Zari wrote:

Hi Behrad

Consider a business directory whose queries are patterns (wildcards
are crucial to us)

You can achieve the same effect as wildcards without using wildcard
queries, which are inefficient. Analyze the strings you want to be
searchable as wildcards using either the edge ngram tokenizer or the
ngram tokenizer, depending on whether you want to match from the
beginning of a word or anywhere within a word.

See this post for an example of using edge ngrams:

on 2 or 3 fields that can be AND-ed together. The fields are strings such as
names, addresses, phone numbers, tags, ... in a simple structured JSON document.

Doing AND queries is fine. You can set default_operator to 'AND' on a
query string query (or the operator on a text query), or you can combine
multiple queries as clauses of a bool or dis_max query.
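As a hedged sketch, here are two request bodies for an AND across fields, built as Python dicts. The first is a query_string query with default_operator set to AND. The second is a bool query with one required (must) clause per field. The field names and prefixes are hypothetical. The term clauses assume the fields were indexed with edge n-grams and lowercased, so the prefix can be matched as one exact term without a wildcard.

```python
import json

def query_string_and(clauses):
    """query_string query in which every field:value pair must match."""
    text = " ".join(f"{field}:{value}" for field, value in clauses)
    return {"query": {"query_string": {"query": text, "default_operator": "AND"}}}

def bool_must(clauses):
    """bool query: one term clause per field, all required (logical AND).

    term queries are not analyzed, so the value is lowercased here to match
    an index that was assumed to be lowercased at index time.
    """
    return {"query": {"bool": {"must": [
        {"term": {field: value.lower()}} for field, value in clauses
    ]}}}

clauses = [("name", "bak"), ("address", "vali")]  # hypothetical fields/prefixes
print(json.dumps(query_string_and(clauses), indent=2))
print(json.dumps(bool_must(clauses), indent=2))
```

The bool form keeps each field's clause separate, which makes it easier to add or drop fields programmatically.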

I also have an off-topic question: can we change the default Lucene
analysis to treat homophonic characters in Arabic/Persian as equivalent
(to support misspelled words)? For example, searching for صلام should also
return results containing سلام.
(In English, this would be like a search for "foto" returning results for "photo".)

I'm not sure what is available in Lucene for Arabic homophones, but the
mapping character filter and the synonym token filter may be useful:

Possibly also the pattern replace filter:
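One way to use a mapping character filter here is to fold letters that sound alike onto a single form. The same analyzer is applied at both index and search time, so both spellings produce the same terms. The sketch below simulates this in Python with a longest-match-first replacement and builds a matching analysis-settings fragment. The homophone table (ص and ث folded to س, and "ph" to "f" for the English example) and the names `homophones` and `folded` are hypothetical. A real table for Persian would need more letter groups. Rules that need regular expressions are what the pattern replace filter is for.

```python
def apply_char_mappings(text, mappings):
    """Rewrite `text` using a key -> replacement table.

    At each position the longest matching key wins, similar in spirit to a
    mapping char filter. Unmatched characters are copied unchanged.
    """
    keys = sorted(mappings, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

# Hypothetical homophone tables.
PERSIAN_S = {"ص": "س", "ث": "س"}
ENGLISH_F = {"ph": "f"}
TABLE = {**PERSIAN_S, **ENGLISH_F}

# Both spellings normalize to the same form, so they index to the same term.
print(apply_char_mappings("صلام", TABLE) == apply_char_mappings("سلام", TABLE))
print(apply_char_mappings("photo", TABLE))

# Assumed analysis-settings fragment using a mapping char filter built from TABLE.
settings = {"analysis": {
    "char_filter": {"homophones": {
        "type": "mapping",
        "mappings": [f"{src}=>{dst}" for src, dst in TABLE.items()],
    }},
    "analyzer": {"folded": {
        "type": "custom", "tokenizer": "standard", "char_filter": ["homophones"],
    }},
}}
```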

clint