Controlling position offset gap between individual fields of _all field

Is there a way to influence the position offset gap between individual fields in
the _all field, similar to how an array's position_offset_gap setting works?
I would like a hit on multiple terms of _all that originate from a single
field to score higher than a hit spread across multiple source fields
within _all.
Essentially, this is the same as boosting the score of a phrase match on a
single array element over one that matches across multiple elements of an array.

Thank you,
Alex

--

The _all field adds all the values from the source fields as one
stream of tokens. There isn't a way of influencing the position
offset gaps based on the source fields.

There is a different way you can achieve this. You can create your own
all field with the use of multi_field. Let's say you have two fields in
your mapping: name and description. You make both fields of type
multi_field, and each of these multi_fields includes your own all field.
Let's name this field 'my_all'. If you give each 'my_all' field the same
index_name, they all point to the same field in the inverted
index. This custom my_all field does take field boundaries into
account, as specified with position_offset_gap.

I have created an example:

You can then put the match phrase query as a should clause with a higher
boost inside your main bool query.
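
A minimal sketch of that approach, not the example referred to above. It only
builds the request bodies as Python dicts and prints them as JSON. It assumes
the legacy mapping syntax of that era (multi_field, index_name,
position_offset_gap); as far as I know, later Elasticsearch versions dropped
multi_field and index_name and renamed position_offset_gap to
position_increment_gap. The gap of 100, the boost of 5.0 and the helper names
are made up for illustration, and only the "properties" part of a type
mapping is shown.

```python
import json

# Hypothetical values chosen for illustration only.
MY_ALL = "my_all"
GAP = 100  # assumed gap; pick one larger than any slop you intend to use


def multi_field(name):
    """Legacy multi_field: the default sub-field plus a shared 'my_all' sub-field."""
    return {
        "type": "multi_field",
        "fields": {
            name: {"type": "string"},
            MY_ALL: {
                "type": "string",
                "index_name": MY_ALL,        # same index_name on every source field
                "position_offset_gap": GAP,  # gap inserted between values
            },
        },
    }


def build_mapping(field_names):
    """Return the 'properties' section mapping every field as a multi_field."""
    return {"properties": {name: multi_field(name) for name in field_names}}


def build_query(text, phrase_boost=5.0):
    """Bool query: terms must match my_all; a phrase match adds a boosted should clause."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {MY_ALL: text}}],
                "should": [
                    {"match_phrase": {MY_ALL: {"query": text, "boost": phrase_boost}}}
                ],
            }
        }
    }


mapping = build_mapping(["name", "description"])
query = build_query("quick brown fox")
print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

With over a hundred fields, build_mapping can be fed the full list of field
names, so the mapping does not have to be written out by hand.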

Martijn

On 19 December 2012 00:34, AlexR roytmana@gmail.com wrote:

> Is there a way to influence the position offset gap between individual fields in
> the _all field, similar to how an array's position_offset_gap setting works?
> I would like a hit on multiple terms of _all that originate from a single
> field to score higher than a hit spread across multiple source fields
> within _all.
> Essentially, this is the same as boosting the score of a phrase match on a single
> array element over one that matches across multiple elements of an array.
>
> Thank you,
> Alex

--
Kind regards,

Martijn van Groningen

--

Thank you, Martijn!
Not just a hint but a complete example. Thanks again!

How will the gap setting via multi_field work for arrays, which will have their
own gap setting?
Like this:
field1 field2 array[0] array[1] field4

With over a hundred fields, it will be quite a mapping if I have to define a
multi_field for each field.
I wonder if I am better off just putting together a big composite field
(array) myself. The only drawback, I guess, is that I will not be able to
have different gaps between different fields, or between fields and arrays.

To be honest, I am not sure using _all (or an equivalent) is the best
strategy. I am very new to ES and Lucene and do not have a good feel for what
works well, or whether I should try multi-field queries rather than a
synthetic _all field. But I am worried that there are too many fields and that,
given a complex query, it'll be a killer. Plus, I do not understand just how
multi-field query scoring works (whether it is done via boolean or
dismax).

I am about to post a big "help me" architectural question :) part of which
is the use of multi-field searches vs. a synthetic _all field, etc.
Hopefully someone will be kind enough to give me a few suggestions as to
which way to go.

--

> How will the gap setting via multi_field work for arrays, which will have their own
> gap setting?
> Like this:
> field1 field2 array[0] array[1] field4

Not sure, but position-wise the field gap can't be crossed. The array
gap depends on the position offset gap set in your custom all field.

> With over a hundred fields, it will be quite a mapping if I have to define a
> multi_field for each field.
> I wonder if I am better off just putting together a big composite field
> (array) myself. The only drawback, I guess, is that I will not be able to
> have different gaps between different fields, or between fields and arrays.

Why would you want to have different position offset gaps? The purpose of this
setting is just to avoid phrase query matches and highlight snippets
that cross the array element boundary.
A single position offset gap is just fine.
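
To make the effect of the gap concrete, here is a small simulation in plain
Python. It is a simplified model and does not call Lucene or Elasticsearch:
it assumes whitespace tokenization and that the gap is simply added to the
running position before each value after the first. The function names and
the sample values are made up for illustration.

```python
def assign_positions(values, gap):
    """Assign a position to every token; add `gap` extra positions before each later value."""
    positions = []
    next_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            next_pos += gap  # boundary between two array elements / source fields
        for token in value.lower().split():
            positions.append((token, next_pos))
            next_pos += 1
    return positions


def min_slop(positions, first, second):
    """Smallest slop at which `first` followed by `second` (in order) would match, or None."""
    best = None
    for tok_a, pos_a in positions:
        if tok_a != first:
            continue
        for tok_b, pos_b in positions:
            if tok_b == second and pos_b > pos_a:
                needed = pos_b - pos_a - 1  # adjacent terms need slop 0
                if best is None or needed < best:
                    best = needed
    return best


values = ["jack smith", "john smith"]
no_gap = assign_positions(values, gap=0)
with_gap = assign_positions(values, gap=100)

print(min_slop(no_gap, "smith", "john"))    # 0: the phrase crosses the boundary
print(min_slop(with_gap, "smith", "john"))  # 100: needs a slop of at least the gap
print(min_slop(with_gap, "john", "smith"))  # 0: still matches inside one value
```

In this model, a phrase query whose slop is smaller than the gap can only
match inside a single value, which is why one gap is enough to keep phrase
matches from crossing boundaries.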

> To be honest, I am not sure using _all (or an equivalent) is the best
> strategy. I am very new to ES and Lucene and do not have a good feel for what
> works well, or whether I should try multi-field queries rather than a
> synthetic _all field. But I am worried that there are too many fields and that,
> given a complex query, it'll be a killer. Plus, I do not understand just how
> multi-field query scoring works (whether it is done via boolean or
> dismax).

It is no different from scoring for a normal field. For scoring
it doesn't matter that the matching terms originate from different
array elements.

--

>> With over a hundred fields, it will be quite a mapping if I have to define a
>> multi_field for each field.
>> I wonder if I am better off just putting together a big composite field
>> (array) myself. The only drawback, I guess, is that I will not be able to
>> have different gaps between different fields, or between fields and arrays.
>
> Why would you want to have different position offset gaps? The purpose of this
> setting is just to avoid phrase query matches and highlight snippets
> that cross the array element boundary.
> A single position offset gap is just fine.

Martijn,

My goal is not to prevent matches across the boundaries of the fields that
make up _all (or a similar composite field), but rather to penalize (reduce
the scores of) such matches relative to a match within an individual field. If I
could control the gaps between the individual "pieces" of my "_all"-like composite
field, I could use the average field length (in words) to set reasonable gaps
between those pieces and perhaps use a phrase query? For example,
field1, field2 and field3 make up my composite field, and these fields
have average word counts of 10, 20 and 30 respectively. The gaps could be 20 for
field1-field2 and 30 for field2-field3.

But as I said, it all might be fairly useless, since in the end combining
fields into a composite one and searching it, rather than the individual fields,
does lose some precision.

If you had a JSON document with over a hundred pretty small text fields (from
3-5 up to 15-20 words long) and needed to search them all most of the time (plus
faceting and search on some known fields, and not much need for boosting individual
fields), would you search on an _all-like composite field or on the individual
fields using a myrecord.* multi-field search? Which do you think would yield better
relevance? How well do multi-field queries (particularly multi-field
phrase queries) calculate scores, especially given the large number of short fields
involved? Will it be prohibitively expensive to search on such a large number
of fields?

Thanks again,
Alex

--