Removing non-unique tokens during indexing

Hi

I came up with the idea of indexing only document-wide unique tokens in a
field. For example, given this document:

{
  "category": "wine",
  "description": "bottle of wine from Marlow brewery"
}

I'd like to remove from the description field every token that also appears
in the category field, making it better suited for search.

That means I need to find a place that has the tokens for both fields and
remove some of them.
Is there an easy way to achieve that during indexing on the ES side?

Thanks for any advice

--
fiedzia@gmail.com
Maciej Dziardziel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You can add both fields to the _all field [1] and then apply an analyzer
with a unique token filter [2] to the _all field. Make sure to exclude
other fields from _all. You can achieve something similar to the _all field
with multi field merging [3].

[1] Elasticsearch Platform — Find real-time answers at scale | Elastic
[2] Elasticsearch Platform — Find real-time answers at scale | Elastic
[3] Elasticsearch Platform — Find real-time answers at scale | Elastic
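
A rough sketch of what this suggestion does to the example document, not a tested configuration. The analyzer name `all_unique` and the helper functions are made up for illustration, and a plain whitespace split stands in for the real tokenizer. The settings dict only shows the general shape of a custom analyzer that uses the built-in `unique` token filter.

```python
# Illustrative analysis settings: a custom analyzer that lowercases tokens
# and then drops repeated ones with the built-in "unique" token filter.
# "all_unique" is a hypothetical name chosen for this sketch.
settings = {
    "analysis": {
        "analyzer": {
            "all_unique": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "unique"],
            }
        }
    }
}


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def unique_tokens(tokens):
    """Keep only the first occurrence of each token, preserving order."""
    seen = set()
    result = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            result.append(token)
    return result


doc = {
    "category": "wine",
    "description": "bottle of wine from Marlow brewery",
}

# Both fields feed one combined field, which is then de-duplicated.
combined = tokenize(doc["category"]) + tokenize(doc["description"])
all_tokens = unique_tokens(combined)
print(all_tokens)
# ['wine', 'bottle', 'of', 'from', 'marlow', 'brewery']
```

Note that this de-duplicates the combined field only. The description field itself still contains "wine".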

Cheers,
Ivan

Thanks, this may be useful, but I need more flexibility. A multi field can
add fields, but I want to subtract them. Could a plugin do that?

It turned out that a stop filter did the trick.
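
The exact configuration is not given here, so the following is only a guess at the general shape: a `stop` token filter whose stopword list holds the category terms, applied to the analyzer of the description field. The names `category_stop`, `description_no_category` and `CATEGORY_TERMS` are hypothetical, and the Python function just imitates the filter with a whitespace split. A stop filter uses a fixed word list, so this assumes the category vocabulary is known when the index is created.

```python
# Hypothetical list of category terms, assumed to be known up front.
CATEGORY_TERMS = ["wine"]

# Illustrative analysis settings: a "stop" token filter that removes the
# category terms, used by a custom analyzer for the description field.
settings = {
    "analysis": {
        "filter": {
            "category_stop": {
                "type": "stop",
                "stopwords": CATEGORY_TERMS,
            }
        },
        "analyzer": {
            "description_no_category": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "category_stop"],
            }
        },
    }
}


def analyze_description(text, stopwords):
    """Imitate the analyzer: lowercase, split on whitespace, drop stopwords."""
    stop = {word.lower() for word in stopwords}
    return [token for token in text.lower().split() if token not in stop]


tokens = analyze_description("bottle of wine from Marlow brewery", CATEGORY_TERMS)
print(tokens)
# ['bottle', 'of', 'from', 'marlow', 'brewery']
```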
