Hi,
Pretty big documents are stored in the index, but not many of their fields are indexed; let's say only 5% of the fields are indexed, and the rest are just stored in the _source field.
As mentioned, the documents are quite big, and when no source filtering is set it takes a long time to get results (purely because of IO, not search). To get records back in a reasonable time we decided to use a source filter, but I am not sure how expensive it is for Elasticsearch (Lucene) to apply that projection. Please let me know whether it could become a performance issue, or whether we should break the document down across more than one index.
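For reference, this is roughly the kind of request we run (the index and field names here are just placeholders for illustration):

```json
GET my-index/_search
{
  "_source": ["customer.id", "order.total"],
  "query": {
    "match": { "order.status": "shipped" }
  }
}
```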
“Source” is stored as a blob of JSON in the underlying Lucene storage. It can be filtered but not without incurring the cost of reading the full JSON from Lucene.
At index time you can choose to extract selected fields for storage in Lucene, where they can be retrieved at query time individually without needing to parse the full JSON. See https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-store.html
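For example, something along these lines (hypothetical index and field names): mark the field as stored in the mapping, then request it via `stored_fields` at search time so the `_source` blob never has to be read and parsed.

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword",
        "store": true
      }
    }
  }
}

GET my-index/_search
{
  "_source": false,
  "stored_fields": ["title"],
  "query": { "match_all": {} }
}
```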
@Mark_Harwood thanks for the reply.
I just wanted to get some insight into document sizes. We have documents of around 400KB (~1M 3 of them). I know it depends on the underlying hardware, IO, etc., but are these numbers normal, or are our documents too big?
One more thing: I took a look at the Explain and Profile APIs, which I think are more for investigating indexing/search issues (if I understood them correctly). Is there any other way to assess IO times and counts?
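For context, this is how I have been using the Profile API so far (a minimal example; the index and field names are placeholders):

```json
GET my-index/_search
{
  "profile": true,
  "query": {
    "match": { "title": "foo" }
  }
}
```

As far as I can tell this breaks down time spent in the query and collector phases, but it doesn't seem to surface raw disk IO numbers, which is what I'm really after.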