Index full document or subset?

Rvs · January 8, 2020, 2:06am

I have a newbie question and would appreciate some advice.

Should I index my entire document, or only the subset of data within the document that I know will be searched on?

What is the philosophy of Elasticsearch? Should we index only the subset of the document that will be searched, to improve performance?

If a new field needs to be searched in the future, I would then have to reprocess the document to include that new field in the index.

So does indexing the entire document future-proof all searches, or would performance degrade with that approach?

I am trying to see which approach Elasticsearch recommends.

Thank you

dadoonet · January 8, 2020, 8:49am

Welcome!

I'd index the full document, especially if you need some of the data to display results even though you don't search on those fields.

If you don't want the fields to be searchable, you can change the mapping and define "index": false (see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html).
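As a minimal sketch of the "index": false setting mentioned above, the Python snippet below builds the JSON body for a create-index request. The field names `title` and `internal_notes` are hypothetical, chosen only for illustration, and the snippet only builds the body; sending it to a cluster is left out.

```python
import json

# Hypothetical field names, used only for illustration.
index_body = {
    "mappings": {
        "properties": {
            # Regular full-text field: analyzed and searchable.
            "title": {"type": "text"},
            # Still part of _source (so it can be displayed with results),
            # but not added to the inverted index.
            "internal_notes": {"type": "text", "index": False},
        }
    }
}

# JSON body that would accompany a create-index request (PUT /<index-name>).
request_json = json.dumps(index_body, indent=2)
print(request_json)
```

Because the whole document is still sent, `internal_notes` comes back in the `_source` of each hit even though queries cannot match on it.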

If you change your mind, you can always use the reindex API to reindex the documents with a new strategy for those fields. As you have the full _source, it will make your life easier.
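To illustrate the reindex step, here is a rough sketch of the request body for the reindex API, built in Python. The index names `docs-v1` and `docs-v2` are assumptions for the example; `docs-v2` is assumed to have been created beforehand with the new mapping.

```python
import json

# Hypothetical index names: docs-v1 holds the existing documents,
# docs-v2 was created beforehand with the updated mapping.
reindex_body = {
    "source": {"index": "docs-v1"},
    "dest": {"index": "docs-v2"},
}

# JSON body that would accompany POST /_reindex.
reindex_json = json.dumps(reindex_body, indent=2)
print(reindex_json)
```

This only works because the original documents are kept in `_source`: the reindex API reads each stored document from the source index and indexes it again under the destination mapping.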

Long story short: send the whole document to Elasticsearch.

Rvs · January 8, 2020, 11:28am

Thank you for the reply. I really appreciate it.
Would the size of the document be an issue from a cost or performance perspective?

Also, do I need to map all the fields in my document explicitly?
I understand Elasticsearch will index all fields by default if an explicit mapping is not provided (dynamic mapping?). Is relying on that not recommended?

Thank you again

dadoonet · January 8, 2020, 11:54am

Yes. If you are storing things like BASE64 blobs, then it will have an impact (on both disk size and performance). It depends. What would a typical document look like? Could you share one?

I do not recommend relying on dynamic mapping. It's very good for getting started with Elasticsearch, but in the end you want more control over what is happening. You can define a dynamic template: see "Dynamic templates" in the Elasticsearch Guide [8.11].
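For illustration, a dynamic template might look like the sketch below, written as a Python dict that serializes to the mapping JSON. The template name `strings_as_keywords` is arbitrary, and mapping every new string field to `keyword` is just one possible rule, not a recommendation from this thread.

```python
import json

# A minimal sketch: any newly seen string field is mapped as a keyword
# instead of the default text field with a keyword sub-field.
# The template name is arbitrary.
template_body = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

# JSON body that would accompany a create-index request.
template_json = json.dumps(template_body, indent=2)
print(template_json)
```

Fields that have an explicit entry under `properties` are unaffected; the template only applies to fields that would otherwise be mapped dynamically.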

Rvs · January 8, 2020, 7:49pm

It's not just the size of the document (which might not be a big deal for now).

The variation in the document is large too (which would make mapping all fields cumbersome).

My document is an XML document that follows the commercial insurance ACORD model.

It has lots of nested tags and lots of optional data tags, so we'd have to map a ton of fields and mark a lot of them as NOT indexed.

dadoonet · January 8, 2020, 8:19pm

If all the optional fields sit under another field, such as foo and bar under meta (meta.foo and meta.bar), then you can just disable the meta object field, and everything under it will be ignored.

See https://www.elastic.co/guide/en/elasticsearch/reference/7.5/enabled.html
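A rough sketch of that idea, using the `meta` object from the reply: the mapping below sets "enabled": false on `meta`, so nothing under it is parsed or indexed. The `policy_number` field is a hypothetical searchable field added only to make the example complete.

```python
import json

disabled_body = {
    "mappings": {
        "properties": {
            # Hypothetical field that should stay searchable.
            "policy_number": {"type": "keyword"},
            # Everything under meta (meta.foo, meta.bar, ...) is kept in
            # _source but is neither parsed nor indexed.
            "meta": {"type": "object", "enabled": False},
        }
    }
}

# JSON body that would accompany a create-index request.
disabled_json = json.dumps(disabled_body, indent=2)
print(disabled_json)
```

With this mapping, the optional tags under `meta` need no individual field definitions at all.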

Rvs · January 9, 2020, 1:18am

Thank you for your patience. I do have one last question.

Can we map the fields we want and configure Elasticsearch to ignore unmapped fields by default, instead of applying dynamic mapping?

Thank you

dadoonet · January 9, 2020, 2:53am

Yes. Dynamic templates will help for this.
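The reply points to dynamic templates. A related option, sketched here as an assumption about what fits this use case rather than something stated in the thread, is the `dynamic` mapping parameter: setting "dynamic": false makes Elasticsearch leave unmapped fields unindexed while still keeping them in `_source`. The field names below are hypothetical.

```python
import json

strict_subset_body = {
    "mappings": {
        # New, unmapped fields are not added to the mapping and are not
        # indexed; they still appear in _source.
        "dynamic": False,
        "properties": {
            # Hypothetical fields that should be searchable.
            "policy_number": {"type": "keyword"},
            "insured_name": {"type": "text"},
        },
    }
}

# JSON body that would accompany a create-index request.
strict_subset_json = json.dumps(strict_subset_body, indent=2)
print(strict_subset_json)
```

If one of the ignored fields later needs to be searchable, it can be added to the mapping and the documents reindexed from `_source`.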

system · February 6, 2020, 2:53am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
