Controlling position offset gap between properties of a JSON object

Hello,

I am new to ES and just started exploring it. One question I have is
regarding _all field indexing. Is there a way to influence the position
offset gap between the properties of my JSON object when it is rolled into
the _all field, similar to how the position_offset_gap setting works for
arrays? The reason I am asking is that I would like a hit on multiple terms
within a single field to score higher than the same terms spread across
multiple fields, in the same way that a match within a single array element
should score higher than one spanning several elements.
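The effect being asked about can be illustrated with a small simulation. This is a simplified model, not Elasticsearch or Lucene code: it assumes whitespace tokenization with lowercasing, advances the position by one per token, and adds a gap between consecutive values, so that an ordered proximity check with a modest slop matches inside one value but not across two. The sample values and gap sizes are hypothetical.

```python
# Simplified model (not Elasticsearch code) of token positions in a
# multi-valued field with a gap inserted between consecutive values.


def assign_positions(values, gap):
    """Return (token, position) pairs for whitespace-tokenized, lowercased values.

    Each token advances the position by 1; `gap` extra positions are added
    between the last token of one value and the first token of the next.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.append((token, pos))
    return positions


def within_slop(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between them."""
    firsts = [p for t, p in positions if t == first]
    seconds = [p for t, p in positions if t == second]
    return any(0 < s - f <= slop + 1 for f in firsts for s in seconds)


if __name__ == "__main__":
    values = ["Acme Holdings", "Widget Division"]
    for gap in (0, 100):
        positions = assign_positions(values, gap)
        # "holdings" ends the first value and "widget" starts the second one.
        print(gap, positions, within_slop(positions, "holdings", "widget", slop=5))
```

With a gap of 0 the cross-value pair "holdings widget" matches at slop 5; with a gap of 100 it does not, while pairs inside one value are unaffected.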

I am trying to achieve this because I am exploring different approaches to
searching a highly structured JSON document with fairly small (a few words)
individual field contents but a very large number of fields and deeply
nested 1-N relationships. Essentially, it is the JSON representation of a
major business object in our system, with a couple of hundred fields, many
of them 1-N, and a graph depth that is sometimes 4-6 levels. It is
attractive to index it as is so we can use the data stored in ES, but I am
not sure that is feasible, and we may be better off with a special
representation optimized for searching.

I should say that the system is for unstructured searches (structured ones
will be there as well, but they are relatively simple once you know which
field(s) you are searching on), where the user does not know exactly which
field may contain the data and wants not only to find the data but also to
see which fields matched. One complication is a large amount of repetitive
data coming from N-1 relations (e.g. an Organization can be referenced
several times within our business object for different purposes, and orgs
are hierarchical, so the top portion of the org hierarchy is highly
repetitive across business objects). This tells me that phrase
auto-suggest will be crucial, especially if I can break it down by the
field where the phrase occurs.

If I take the approach of searching _all, I need to inject as much
"structure" into it as I can, and as far as I understand the only way to do
that is with offset gaps combined with phrase/span queries.
If I take the approach of searching across all fields, it might quickly
become very slow (I have not tried this approach at all yet).

  • Does anyone have experience running complex queries against a couple
    of hundred fields?
  • Will the score calculation be meaningful?
  • And anyway, what kinds of queries can I run against a couple of hundred
    small text fields? AND? SPAN? PHRASE? DIS_MAX?

Sorry for the lengthy post; I will post my architectural/design questions
separately. :)

Thank you,
Alex

--

Another old follow-up, but I'm assuming many people were on vacation over
the Christmas and New Year's holidays.

On 12/16/2012 2:40 PM, AlexR wrote:

> Is there a way to influence the position offset gap between the
> properties of my JSON object when it is rolled into the _all field,
> similar to how the position_offset_gap setting works for arrays? The
> reason I am asking is that I would like a hit on multiple terms within a
> single field to score higher than the same terms spread across multiple
> fields, in the same way that a match within a single array element should
> score higher than one spanning several elements.

If you can't make the _all field work, you might consider doing exactly
what you suggest: create your own "allToSearch" field, then put all the
values you expect to search into an array in this field with an
appropriate position_offset_gap, so that span queries work as you would
like. If you don't use spans or phrases, playing with positions doesn't
matter.
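A sketch of that suggestion, assuming the pre-2.0 mapping syntax that was current when this thread was written (a "string" field with position_offset_gap; later versions rename the option position_increment_gap). The field name "allToSearch" comes from the reply; the type name "business_object", the gap of 100, and the sample document are hypothetical. The code only builds request bodies as Python dicts and does not talk to a cluster.

```python
import json

SEARCH_FIELD = "allToSearch"  # custom catch-all field suggested in the reply
DOC_TYPE = "business_object"  # hypothetical mapping type name

# Pre-2.0 style mapping: each array element indexed into SEARCH_FIELD is
# separated from the next by a position gap (the value 100 is an assumption).
mapping = {
    DOC_TYPE: {
        "properties": {
            SEARCH_FIELD: {"type": "string", "position_offset_gap": 100}
        }
    }
}


def collect_values(obj, out=None):
    """Collect every string leaf of a nested dict/list structure, in order."""
    if out is None:
        out = []
    if isinstance(obj, dict):
        for value in obj.values():
            collect_values(value, out)
    elif isinstance(obj, list):
        for item in obj:
            collect_values(item, out)
    elif isinstance(obj, str):
        out.append(obj)
    return out


def with_search_field(doc):
    """Return a copy of `doc` with all its string values copied into SEARCH_FIELD."""
    enriched = dict(doc)
    enriched[SEARCH_FIELD] = collect_values(doc)
    return enriched


def span_near_query(terms, slop):
    """Ordered span_near over SEARCH_FIELD.

    span_term is not analyzed, so `terms` must already be in indexed form
    (e.g. lowercased when the standard analyzer is used).
    """
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {SEARCH_FIELD: t}} for t in terms],
                "slop": slop,
                "in_order": True,
            }
        }
    }


doc = {
    "owner": {"name": "Acme Holdings", "parent": {"name": "Acme Group"}},
    "contacts": [{"name": "Jane Smith"}, {"name": "John Doe"}],
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(with_search_field(doc), indent=2))
    print(json.dumps(span_near_query(["jane", "smith"], slop=0), indent=2))
```

With a large gap, a span_near whose slop is comfortably smaller than the gap can only match terms that came from the same original value, which is the per-field "structure" the question is after.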

> If I take the approach of searching across all fields, it might quickly
> become very slow (I have not tried this approach at all yet).

>   • Does anyone have experience running complex queries against a couple
>     of hundred fields?

You might be surprised how fast it is.

>   • Will the score calculation be meaningful?

Multifield scoring works great, because it is based on formulas used in
Lucene, but sometimes the scoring can be surprising.
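One common source of surprise with many fields is how per-field scores are combined. The sketch below compares, in simplified form, a dis_max combination (the best field score plus tie_breaker times the sum of the other matching field scores) with a plain sum as produced by a bool/should query, ignoring the coord factor and query normalization of older Lucene versions. The per-field scores are made-up numbers standing in for repetitive organization data matched in several fields.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """dis_max combination: best score + tie_breaker * sum of the other matching scores.

    Fields that did not match are passed as 0 and ignored.
    """
    matching = [s for s in field_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


def bool_should_score(field_scores):
    """Simplified bool/should combination: the sum of matching clause scores
    (coord and query normalization are ignored)."""
    return sum(s for s in field_scores if s > 0)


if __name__ == "__main__":
    # Hypothetical scores for one term: a strong hit in one field plus weaker
    # hits in two repetitive fields (e.g. the same org name referenced twice).
    scores = [2.0, 0.5, 0.5]
    print("bool/should:", bool_should_score(scores))
    print("dis_max, tie_breaker=0.0:", dis_max_score(scores))
    print("dis_max, tie_breaker=0.3:", dis_max_score(scores, tie_breaker=0.3))
```

The plain sum lets a value repeated across many fields outweigh a single strong match, while dis_max with a small tie_breaker mostly rewards the best-matching field.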

>   • And anyway, what kinds of queries can I run against a couple of
>     hundred small text fields? AND? SPAN? PHRASE? DIS_MAX?

You'd have HUGE expressions listing all the various fields.
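Those huge expressions do not have to be written by hand. A minimal sketch, assuming every string leaf of a sample document is searchable under its dotted path (the sample document and the tie_breaker value are hypothetical): it collects the paths and generates a dis_max of match queries, one per field, plus a highlight section so the response can show which fields matched.

```python
import json


def field_paths(obj, prefix=""):
    """Dotted paths of all string leaves; arrays do not add a path segment."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            paths |= field_paths(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for item in obj:
            paths |= field_paths(item, prefix)
    elif isinstance(obj, str):
        paths.add(prefix.rstrip("."))
    return paths


def dis_max_over_fields(text, fields, tie_breaker=0.1):
    """One match clause per field, combined with dis_max, plus highlighting."""
    ordered = sorted(fields)
    return {
        "query": {
            "dis_max": {
                "tie_breaker": tie_breaker,
                "queries": [{"match": {f: text}} for f in ordered],
            }
        },
        # Ask for highlights on every field so hits show where the text matched.
        "highlight": {"fields": {f: {} for f in ordered}},
    }


sample = {
    "owner": {"name": "Acme Holdings", "parent": {"name": "Acme Group"}},
    "contacts": [{"name": "Jane Smith"}, {"name": "John Doe"}],
}

if __name__ == "__main__":
    paths = field_paths(sample)
    print(sorted(paths))
    print(json.dumps(dis_max_over_fields("acme", paths), indent=2))
```

With a couple of hundred fields the generated body is large, but it is built mechanically; whether dis_max, a bool of match clauses, or span queries fit best remains a separate design choice.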


--