Hi,
is there a difference in performance / CPU usage if I search for the complete string of a field or if I search on the .keyword field?
example:
type: myLog1
versus
type.keyword: mylog1
thanks, Andreas
Hi Andreas,
I'm not sure I can confidently answer this question, but I will try. My assumption is that searching on the field without .keyword has to run the query through Elasticsearch's analyzer and then match against the parsed-out terms, which will take some extra CPU. But I don't know the internals of ES well enough to know for sure. I will keep an eye on your question and get back to you.
Thanks
Rashmi
thanks for the update
Hi Asp
As I understand your question better, here is what I think. The keyword field will be faster since it only needs to find one term in the inverted index, whereas the text field will need to search for each individual term. But in your example there won't be much difference, since for both fields there is only one term to search. If the example were type:"This is myLog1 file" and type.keyword:"This is myLog1 file", then the keyword search would be faster.
In terms of how fast it can be, it really depends on a lot of different factors, but in my mind I simplify it down to how many individual entries in the inverted index you are asking the query to visit. In the case of a keyword field it is always 1, since the search is for a single term, which is the entire text. In the case of the text field, the number of terms that need to be visited is determined by the search text and also the analyzer that's applied to the field, so it could be any number of terms n. If you consider each visit of a term to require a fixed amount of time/CPU t (this is an oversimplification, since there are some optimizations in multi-term searching), then the keyword search will always require 1 x t but the text field will require n x t. Hope this explanation helps.
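To make the 1 x t versus n x t idea concrete, here is a small toy sketch in Python. It is not how Elasticsearch or Lucene is implemented: `analyze`, `keyword`, `build_index` and `lookups` are hypothetical helpers, and the analyzer is only a rough stand-in (lowercase, split on non-alphanumeric characters) for whatever analyzer the text field really uses.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy text analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def keyword(text):
    """Toy keyword 'analyzer': the whole value is kept as one unmodified term."""
    return [text]


def build_index(docs, tokenizer):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in tokenizer(value):
            index[term].add(doc_id)
    return index


def lookups(query, tokenizer):
    """Number of inverted-index entries a query has to visit (the n above)."""
    return len(tokenizer(query))


# Two made-up documents holding the values used in this thread.
docs = {1: "This is myLog1 file", 2: "myLog1"}
text_index = build_index(docs, analyze)
keyword_index = build_index(docs, keyword)

print(lookups("This is myLog1 file", analyze))  # terms visited on the text field
print(lookups("This is myLog1 file", keyword))  # terms visited on the keyword field
print(lookups("myLog1", analyze), lookups("myLog1", keyword))
```

In this sketch the four-word phrase costs four term visits on the text field and one on the keyword field, while the single word costs one visit on both, which is why the original example should show little difference. Note also that the toy keyword index stores `myLog1` exactly as written, so a lowercase `mylog1` lookup would not find it there.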
Thanks
Rashmi
Ok, thanks a lot. I also thought in this direction, but I wanted to be sure
Then I have one additional question which extends the original one.
I opened a different thread for it, but I assume I can answer it with the details you provided:
So if my event has multiple fields (message, payload, field 1 ... field n), then it should be faster to name the fields I am searching in, and even faster if I can filter on the keyword field. Correct? Because there is one inverted index per field?
Thanks, Andreas
I tested a bit:
result fetches 32k events.
searching for type: xyz takes about 500ms
searching for type.keyword: xyz takes about 3s.
why?
It's reproducible. No caching issue.
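One possible way to see where the time goes would be to send both searches with Elasticsearch's `profile` option enabled and compare the per-query timings in the responses. The sketch below only builds the two request bodies in Python. It assumes the searches map to a `match` query on `type` and a `term` query on `type.keyword`, which may not be exactly what the search UI generates; the index name and the client call are left out because they are not given in this thread.

```python
import json


def profiled_body(query):
    """Wrap a query clause in a search body with profiling switched on."""
    return {"profile": True, "query": query}


# Assumed equivalent of searching `type: xyz` (analyzed text field).
text_body = profiled_body({"match": {"type": "xyz"}})

# Assumed equivalent of searching `type.keyword: xyz` (exact term).
keyword_body = profiled_body({"term": {"type.keyword": "xyz"}})

# Print the JSON that would be sent as the body of a _search request.
print(json.dumps(text_body, indent=2))
print(json.dumps(keyword_body, indent=2))
```

Each response should then contain a `profile` section alongside the usual `took` value, which may help show whether the extra time is spent in the query itself or elsewhere.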
Hi Asp
I might need more help from the ES team to answer this question for you, so I have moved this discussion thread to Elasticsearch discuss to get a better answer for the performance comparison.
Thanks
Rashmi