ES 6 sparse docs and index.mapping.total_fields.limit


(Felix Barnsteiner) #1

I'm currently trying to understand how to map HTTP request parameters into Elasticsearch.

Ideally, I'd have one field per query parameter:

parameters:
  foo: "bar"
  baz: "qux"
  q: "search query"

This would allow me to easily analyze the search queries by performing a terms aggregation on parameters.q.
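To make that concrete, here is a minimal sketch of the search body such a terms aggregation might use. It only builds the JSON; sending it to a cluster is left out. The `.keyword` suffix assumes the parameter values were dynamically mapped as text with a keyword sub-field, and the function name and the `top_values` aggregation name are made up for illustration.

```python
import json


def terms_agg_on_parameter(name, size=10):
    """Build a search body that counts the most frequent values of one parameter."""
    # Assumption: dynamic mapping created a `.keyword` sub-field for the value.
    field = "parameters.{}.keyword".format(name)
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "aggs": {
            "top_values": {"terms": {"field": field, "size": size}}
        },
    }


body = terms_agg_on_parameter("q")
print(json.dumps(body, indent=2))
```

If the field is mapped explicitly as `keyword`, the suffix would be dropped.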

But as I've learned, this leads to sparse documents, which are discouraged. Mapping parameters that way can also hit the default limit of 1000 total fields (index.mapping.total_fields.limit).

But as Lucene 7/Elasticsearch 6 will come with better support for sparse documents (https://www.elastic.co/blog/elasticsearch-6-0-0-alpha1-released#sparse-doc-values), does the recommendation against sparse documents still hold true? Will the field limit of 1000 still remain in ES 6? How would you map request parameters to Elasticsearch now and in ES 6?


(Felix Barnsteiner) #2

@warkolm Hey bud, could you help me out here?


(Felix Barnsteiner) #3

@spinscale do you have any information on this?


(Alexander Reelsen) #4

Hey,

While the sparse-documents issue has improved dramatically, you still run into the problem of an exploding field mapping and a huge cluster state. Have you thought of changing your model to a nested type with elements like

parameters: [
  { "key": "foo", "value": "bar" },
  { "key": "spam", "value": "eggs" }
]

to prevent mapping explosion?
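A rough sketch of what that remodelling could look like on the client side, assuming both `key` and `value` are mapped as `keyword` (the thread does not specify the types). The helper name is hypothetical, and the mapping is only the `properties` fragment; how it is wrapped in a create-index request depends on the Elasticsearch version.

```python
def to_nested_parameters(params):
    """Turn {"foo": "bar"} into [{"key": "foo", "value": "bar"}]."""
    # Sorted only to make the output deterministic.
    return [{"key": k, "value": v} for k, v in sorted(params.items())]


# Only two fields exist in the mapping, however many distinct
# parameter names show up in the requests.
NESTED_MAPPING = {
    "properties": {
        "parameters": {
            "type": "nested",
            "properties": {
                "key": {"type": "keyword"},    # assumed type
                "value": {"type": "keyword"},  # assumed type
            },
        }
    }
}

doc = {"parameters": to_nested_parameters({"foo": "bar", "spam": "eggs"})}
```

The point of the design is that new parameter names become new values of `parameters.key` rather than new fields, so the mapping and cluster state stay the same size.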


(Amit Ripshtos) #5

As someone with almost the same problem (each user has its own "schema"), I find the nested trick doesn't help much: searches and aggregations become wrong because of it, and Kibana is a lot nicer to use without this schema (for example, in Kibana you need to show the keys "foo" and "spam" in the same graph).


(Felix Barnsteiner) #6

That's important information for me, thanks. It seems the nested approach is the only way to go performance-wise.

But the problem is that I can't do a terms aggregation on the foo parameter, right? Or is there a workaround for that?
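One possible workaround, sketched here as a plain request body, is to wrap the terms aggregation in a nested aggregation plus a filter on the key. This assumes the nested `parameters` field has `keyword` sub-fields named `key` and `value`; the function name and the aggregation names (`params`, `only_key`, `values`) are made up for illustration.

```python
def nested_terms_agg(key, path="parameters", size=10):
    """Build a search body that counts the values of one nested parameter key."""
    return {
        "size": 0,  # buckets only, no hits
        "aggs": {
            "params": {
                # Step into the nested documents.
                "nested": {"path": path},
                "aggs": {
                    "only_key": {
                        # Keep only nested documents whose key matches.
                        "filter": {"term": {path + ".key": key}},
                        "aggs": {
                            # Count the values of the remaining documents.
                            "values": {
                                "terms": {"field": path + ".value", "size": size}
                            }
                        },
                    }
                },
            }
        },
    }


body = nested_terms_agg("foo")
```

It is more verbose than a terms aggregation on a flat field, which is part of why it is awkward to use from Kibana.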

Maybe I'll add a configuration option for an explicit whitelist of parameters which should be converted to the non-nested version and use nested parameters by default...
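A minimal sketch of how that split might work, assuming the whitelisted parameters go into a flat object and everything else into a separate nested list. The function name and the field names `parameters` and `parameters_nested` are hypothetical; two names are used because one field cannot be mapped as both an object and a nested type.

```python
def split_parameters(params, whitelist):
    """Route whitelisted parameters to flat fields, the rest to key/value pairs."""
    flat = {k: v for k, v in params.items() if k in whitelist}
    nested = [
        {"key": k, "value": v}
        for k, v in sorted(params.items())  # sorted for deterministic output
        if k not in whitelist
    ]
    return {"parameters": flat, "parameters_nested": nested}


doc = split_parameters(
    {"foo": "bar", "baz": "qux", "q": "search query"},
    whitelist={"q"},
)
```

With this layout, only the whitelisted names add fields to the mapping, so the field count stays bounded by the size of the whitelist.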

