Index full document or subset?

I have a newbie question and would appreciate some advice.

Should I index my entire document, or only the subset of data within it that I know will be searched on?

What is the philosophy of Elasticsearch? Should we index only the subset of the document that will be searched, to improve performance?

If a new field needs to be searched in the future, I would then have to reprocess the document to include that field in the index.

So does indexing the entire document future-proof all searches, or would performance degrade with that approach?

I am trying to see which approach Elasticsearch recommends.

Thank you

Welcome!

I'd index the full document, especially if you need some of the data to display results even though you don't search on those fields.

If you don't want the fields to be searchable, you can change the mapping and define "index": false (see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html).
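
As a rough sketch of what such a mapping could look like, the snippet below builds a create-index request body in Python where one field stays searchable and another is kept in _source but not indexed. The index name `policies` and the field names `insured_name` and `raw_notes` are made up for illustration; they are not from the original question.

```python
import json

# Hypothetical field names, for illustration only.
index_false_body = {
    "mappings": {
        "properties": {
            # Regular full-text field: indexed (and searchable) by default.
            "insured_name": {"type": "text"},
            # Still returned in _source for display, but not indexed.
            "raw_notes": {"type": "text", "index": False},
        }
    }
}

# JSON that could be sent as the body of: PUT /policies
print(json.dumps(index_false_body, indent=2))
```

The whole document is still sent to Elasticsearch; only the per-field `index` setting decides which parts of it are indexed for search.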

If you change your mind, you can always use the reindex API to reindex the documents with a new strategy for those fields. As you have the full _source, it will make your life easier.
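
For reference, a minimal sketch of a reindex request body, assuming a new index has already been created with the updated mapping. The index names `policies_v1` and `policies_v2` and the helper `build_reindex_body` are hypothetical.

```python
import json


def build_reindex_body(source_index: str, dest_index: str) -> dict:
    """Build a request body for POST /_reindex that copies documents between indices."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


# Hypothetical index names: policies_v2 would be created first with the new mapping.
reindex_body = build_reindex_body("policies_v1", "policies_v2")

# JSON that could be sent as the body of: POST /_reindex
print(json.dumps(reindex_body, indent=2))
```

Because reindexing reads from the stored _source, the original XML does not need to be reprocessed as long as the full document was sent in the first place.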

Long story short: send the whole document to Elasticsearch.

Thank you for the reply. I really appreciate it.
Would the size of the document be an issue from a cost or performance perspective?

Also, do I need to map all the fields in my document explicitly?
I understand Elasticsearch will index all fields by default if an explicit mapping is not provided (dynamic mapping?). Is relying on that not recommended?

Thank you again

Yes, document size can matter. If you are storing things like BASE64 blobs, it will have an impact (on both disk size and performance). It depends. What would a typical document look like? Could you share one?

I do not recommend relying on dynamic mapping. It's very good for getting started with Elasticsearch, but in the end you want more control over what is happening. You can define a dynamic template: see "Dynamic templates" in the Elasticsearch Guide [8.11].
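
To give an idea of what a dynamic template looks like, here is a small sketch that maps every dynamically detected string field as `keyword` (by default, dynamic mapping would make it a `text` field with a `keyword` sub-field). The template name `strings_as_keywords` and the explicitly mapped field `insured_name` are illustrative choices, not something from your data.

```python
import json

# Illustrative mapping: one explicit field plus a rule for unknown string fields.
dynamic_template_body = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keywords": {
                    # Applies to any new field whose JSON value is a string.
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        "properties": {
            # Explicitly mapped fields are not affected by dynamic templates.
            "insured_name": {"type": "text"},
        },
    }
}

print(json.dumps(dynamic_template_body, indent=2))
```

This way you keep the convenience of not listing every field, while still deciding how unknown fields are mapped.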

It's not just the size of the document (which might not be a big deal for now).

The variation in the documents is large too (which would make mapping all fields cumbersome).

My document is an XML document that follows the ACORD model for commercial insurance.

It has lots of nested tags and lots of optional data tags, so we'd have to map a ton of fields and mark many of them as NOT indexed.

If all the optional fields sit under a common parent field, such as foo and bar under meta (meta.foo and meta.bar), then you can just disable the meta object field and everything under it will be ignored.

See https://www.elastic.co/guide/en/elasticsearch/reference/7.5/enabled.html
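
A minimal sketch of that idea, using the `meta` object from the example above. The `policy_number` field is a hypothetical stand-in for a field you do want to search on.

```python
import json

# "meta" comes from the example above; "policy_number" is hypothetical.
enabled_false_body = {
    "mappings": {
        "properties": {
            "policy_number": {"type": "keyword"},
            # Nothing under meta (meta.foo, meta.bar, ...) is parsed or indexed,
            # but the whole object is still kept in _source.
            "meta": {"type": "object", "enabled": False},
        }
    }
}

print(json.dumps(enabled_false_body, indent=2))
```

With this in place, the sub-fields of `meta` do not have to be listed in the mapping one by one.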

Thank you for your patience. I do have one last question.

Can we map only the fields we want and configure Elasticsearch to ignore unmapped fields by default, instead of applying dynamic mapping?

Thank you

Yes. Dynamic templates will help with this.
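
One way this could be sketched: map the fields you care about explicitly, and add one dynamic template per detected datatype so that every other field gets `"index": false`. The helper `unindexed_templates`, the template names and the `policy_number` field are assumptions made for this example.

```python
import json

# Datatype detected by dynamic mapping -> field type to assign (illustrative choice).
DETECTED_TO_FIELD_TYPE = {
    "string": "text",
    "long": "long",
    "double": "double",
    "boolean": "boolean",
}


def unindexed_templates(type_map: dict) -> list:
    """Build one dynamic template per detected datatype, each with index disabled."""
    return [
        {
            f"unindexed_{detected}": {
                "match_mapping_type": detected,
                "mapping": {"type": field_type, "index": False},
            }
        }
        for detected, field_type in type_map.items()
    ]


ignore_unmapped_body = {
    "mappings": {
        "dynamic_templates": unindexed_templates(DETECTED_TO_FIELD_TYPE),
        # Only the fields you want to search on are mapped explicitly.
        "properties": {"policy_number": {"type": "keyword"}},
    }
}

print(json.dumps(ignore_unmapped_body, indent=2))
```

Separately from dynamic templates, the mapping-level `dynamic` parameter can be set to `false`; new fields are then left out of the mapping and not indexed, but still appear in _source.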
