ES search optimization question


(Jaydanielian) #1

I know it's dangerous to ask for general answers about performance, as the
answer is usually "it depends", but I am going to try anyway :slight_smile: My
question is: as a general rule of thumb, is it better to store a list of items
in an array field, so the query only has to issue a single matching term?
Or to store a single value per document, generate the various terms
client-side, and pass that array of generated terms in the query?

My example use case is this: I am trying to find contacts by name and
email. Emails usually fall into several common patterns (first.last@domain,
first_last@domain, firstinitial_last@domain, etc.), so I want to be able to
search against all of those possible combinations when trying to find a
contact in our index. The queries are all term filters, no wildcards, and
the fields are all not_analyzed, so it's essentially an exact term match
that I am looking for. So I can either store the extra possible combinations
in the document, and have the query pass in only one term (since the stored
field is an array); or I can pass the multiple combinations as a terms array
in the query, and search against the single email we have stored in the
index.
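
To make the two options concrete, here is a minimal sketch (index, type, and field names are hypothetical; the filtered-query syntax matches the ES 1.x style used elsewhere in this thread):

Option A: store the generated variants in an array field, query a single term
PUT contacts/contact/1
{ "name": "John Smith", "email_variants": ["john.smith@acme.com", "john_smith@acme.com", "j_smith@acme.com"] }

GET contacts/_search
{ "query": { "filtered": { "filter": { "term": { "email_variants": "john_smith@acme.com" } } } } }

Option B: store the single email, generate the variants at query time
GET contacts/_search
{ "query": { "filtered": { "filter": { "terms": { "email": ["john.smith@acme.com", "john_smith@acme.com", "j_smith@acme.com"] } } } } }

Both produce the same matches; the trade-off is index size and reindexing cost (Option A) versus query size and client-side generation logic (Option B).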

I know there is never a perfect answer, but even a general rule-of-thumb
response from someone with deep internal knowledge of Lucene/ES would be
appreciated.

Thanks!

J



(suresh) #2

I have data loaded in ES using Logstash, and I am using elasticsearch.js in my app to query and fetch the data. I am looking for an optimal setup in which the search is quick and the response size is reduced.

With the present setup: took 647 ms, hits total = 45806, data/file size 14.1 MB, browser download time 6.85 s. Originally: took 3153 ms, hits total = 45806, data/file size 31.9 MB, browser download time 14.77 s.

I have tried to optimize the JSON search request as shown below. I would appreciate suggestions if there is a better option than Ver1.3. I suspect the problem in my app is on the client side, with the file-download option, where the data/file size is huge.

Ver1.0
GET k00125_car/_search
{
  "query": {
    "filtered": {
      "query": { "query_string": { "analyze_wildcard": true, "query": "" } },
      "filter": {
        "bool": {
          "must": [ { "range": { "@timestamp": { "gte": 1286952143643 } } } ],
          "must_not": []
        }
      }
    }
  },
  "highlight": {
    "pre_tags": ["@kibana-highlighted-field@"],
    "post_tags": ["@/kibana-highlighted-field@"],
    "fields": { "": {} },
    "fragment_size": 2147483647
  },
  "size": 1000000,
  "sort": [ { "focus_tier": { "order": "desc", "unmapped_type": "boolean" } } ],
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1M",
        "pre_zone": "+05:30",
        "pre_zone_adjust_large_interval": true,
        "min_doc_count": 0,
        "extended_bounds": { "min": 1286952143643, "max": 1444718543643 }
      }
    }
  },
  "fields": ["*", "_source"],
  "script_fields": {},
  "fielddata_fields": ["@timestamp"]
}

Ver1.1
GET k00125_car/_search
{
  "query": { "match_all": {} },
  "size": 1000000,
  "_source": ["bunit", "company_code", "customer_number", "focus_tier", "name", "contact_phone", "service_address", "sum_svchrg"]
}

Ver1.2
GET k00125_car/_search
{
"size":1000000
}

Ver1.3
GET k00125_car/_search
{
  "fields": ["bunit", "company_code", "customer_number", "focus_tier", "name", "contact_phone", "service_address", "sum_svchrg"],
  "size": 1000000
}
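
If the main goal is to shrink the response payload, one further option (a suggestion, not something tested against this index) is response filtering with the filter_path URL parameter, available since Elasticsearch 1.4. It strips everything from the response except the paths the app actually reads, which can cut the download size considerably on top of Ver1.3:

GET k00125_car/_search?filter_path=hits.hits.fields
{
  "fields": ["bunit", "company_code", "customer_number", "focus_tier", "name", "contact_phone", "service_address", "sum_svchrg"],
  "size": 1000000
}

Note that a size of 1000000 for ~45k hits works, but for result sets that large the scroll API is generally the recommended way to fetch all documents.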


(Christian Dahlqvist) #3

This thread is 9 months old. Please create a new thread for your question.


(suresh) #4

Done


(system) #5