Hello,
I am new to ES and have just started exploring it. One question I have
concerns _all field indexing. Is there a way to influence the position
offset gap between the properties of my JSON object when they are rolled
into the _all field, similar to how the position_offset_gap setting works
for arrays? I am asking because I would like a hit on multiple terms
within a single field to score higher than a hit spread across multiple
fields, the same way a match within one array element should score higher
than a match spread across several elements.
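To illustrate the behaviour I am after, here is a toy model in plain Python. It is not Elasticsearch code, just my understanding of how a position gap between values keeps a phrase from matching across a value boundary. The function names, the sample values and the simplified slop handling (slop is applied per pair of adjacent terms) are all my own assumptions.

```python
# Toy model of token positions; NOT Elasticsearch/Lucene code.
# `gap` plays the role of a position offset gap between values.

def index_positions(values, gap):
    """Assign a position to every token, adding `gap` between consecutive values."""
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # extra distance between two values (fields or array elements)
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions


def phrase_matches(positions, phrase, slop=0):
    """True if the terms occur in order, each within 1 + slop positions of the previous one."""
    terms = phrase.lower().split()
    if any(term not in positions for term in terms):
        return False
    for start in positions[terms[0]]:
        prev = start
        matched = True
        for term in terms[1:]:
            candidates = [p for p in positions[term] if prev < p <= prev + 1 + slop]
            if not candidates:
                matched = False
                break
            prev = min(candidates)
        if matched:
            return True
    return False


# Hypothetical values of two different properties rolled into one field.
values = ["Acme Holdings", "North Region"]
no_gap = index_positions(values, gap=0)
with_gap = index_positions(values, gap=100)
```

With no gap, the phrase "holdings north" matches even though the two words come from different properties. With a gap of 100 it no longer does, even with a slop of 10, while "acme holdings" still matches.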
I am trying to achieve this because I am exploring different approaches
to searching a highly structured JSON document whose individual fields are
fairly small (a few words each) but which has a very large number of fields
and deeply nested 1-N relationships. Essentially, it is the JSON
representation of a major business object in our system, with a couple of
hundred fields, many of them 1-N, and a graph that is sometimes 4-6 levels
deep. (It is attractive to index it as is, so that we can use the data
stored in ES, but I am not sure that is feasible; we may be better off with
a special representation optimized for searching.)
I should say that the system is for unstructured searches (structured ones
will be supported as well, but those are relatively simple once you know
which field(s) you are searching on), where the user does not know exactly
which field may contain the data and wants not only to find the data but
also to see which fields were matched. One complication is a large amount
of repetitive data coming from N-1 relations (i.e. an Organization can be
referenced several times within our business object for different purposes,
and orgs are hierarchical, so the top portion of the org hierarchy is
highly repetitive across business objects). This tells me that phrase
auto-suggest will be crucial, especially if I can break it down by the
field in which the phrase can occur.
If I take the approach of searching _all, I need to inject as much
"structure" into it as I can, and as far as I understand the only way to do
that is to use offset gaps and then phrase/span queries.
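A rough sketch of the kind of request body I have in mind for the _all approach, built as a Python dict. It assumes the match_phrase query with a slop parameter is available in the ES version in use. The helper name and the sample text are made up, and the slop would have to stay smaller than whatever gap separates properties inside _all, which is exactly the part I do not know how to configure.

```python
import json


def all_phrase_query(text, slop):
    """Build a phrase query body against the _all field (hypothetical helper)."""
    return {
        "query": {
            "match_phrase": {
                "_all": {
                    "query": text,
                    "slop": slop,  # must stay below the gap between properties
                }
            }
        }
    }


body = all_phrase_query("acme holdings", slop=5)
print(json.dumps(body, indent=2))
```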
If I take the approach of searching across all fields, it might quickly
become very slow (I have not tried this approach at all yet).
- Does anyone have experience running complex queries against a couple of
hundred fields?
- Will the score calculation be meaningful?
- And in any case, what kinds of queries can I run against a couple of
hundred small text fields? AND? SPAN? PHRASE? DIS_MAX?
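To make the per-field option concrete, this is roughly the request body I imagine, again built as a Python dict. It is only a sketch: the field names are hypothetical, the tie_breaker and slop values are arbitrary, and I have not measured how this behaves with a couple of hundred fields. The highlight section is there because I also want to know which fields matched.

```python
def dis_max_over_fields(text, fields, tie_breaker=0.3, slop=2):
    """Build a dis_max body with one phrase clause per field (hypothetical helper)."""
    return {
        "query": {
            "dis_max": {
                # the best-matching field dominates; other fields add a fraction
                "tie_breaker": tie_breaker,
                "queries": [
                    {"match_phrase": {field: {"query": text, "slop": slop}}}
                    for field in fields
                ],
            }
        },
        # ask for highlights on every searched field to see where the hit was
        "highlight": {"fields": {field: {} for field in fields}},
    }


# Hypothetical field names from a nested business object.
fields = ["owner.org.name", "supplier.org.name", "description"]
body = dis_max_over_fields("acme holdings", fields)
```

Generating one clause per field keeps a phrase from matching across field boundaries, at the cost of a query that grows with the number of fields.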
Sorry for the lengthy post. I will post my architectural/design questions
separately.
Thank you,
Alex