Index first element in array

Hello, I'm using ElasticSearch 0.20.6 and trying to index the first element
in an array as a multi_field type. I need to query for the first item in
an array, and this is the only way I could come up with:

Analysis settings:

index:
  analysis:
    tokenizer:
      first_item_token:
        type: pattern
        pattern: '^(?:[^\w]*)(\w+).*$'
        group: 1
    analyzer:
      first_item:
        type: custom
        tokenizer: first_item_token
        filter: [standard, lowercase]

And finally, my mapping:

"places": {
  "type": "multi_field",
  "fields": {
    "places": {
      "type": "string"
    },
    "places_first": {
      "type": "string",
      "analyzer": "first_item"
    }
  }
}

And my data looks something like:

{"places": ["restaurant", "mall"]},
{"places": ["mall"]},
...

I want to query places.places_first:mall and get one hit. Currently I'm
getting two, which is why I think something must be wrong with my pattern.
Also, is there a better way to query or index the data for my purposes?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Have you tried using the analysis API to test your analyzer?

http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze/
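For reference, a minimal sketch of building an analyze API request URL, assuming the 0.20-era form that passes analyzer and text as query-string parameters. The host http://localhost:9200, the index name places_index and the analyzeUrl helper are hypothetical. Note that the analyze API works on one string at a time, so it does not show how the separate values of an array are handled at index time.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AnalyzeUrl {
    // Hypothetical helper: builds a GET URL for the analyze API, passing the
    // analyzer name and the text to analyze as URL-encoded query parameters.
    static String analyzeUrl(String host, String index, String analyzer, String text) {
        return host + "/" + index + "/_analyze"
                + "?analyzer=" + URLEncoder.encode(analyzer, StandardCharsets.UTF_8)
                + "&text=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Host and index name are placeholders, not taken from the thread.
        String url = analyzeUrl("http://localhost:9200", "places_index", "first_item", "restaurant mall");
        System.out.println(url);
    }
}
```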

I'll play around with the regex later.

--
Ivan

On Tue, May 7, 2013 at 9:40 PM, thesuaves22@gmail.com wrote:

Yes, and the analyze API gives me the single token I expect for this regex
(the first item in the array). Is there something I'm doing wrong in how
I'm indexing the array as a multi_field?

On Wednesday, May 8, 2013 11:00:11 AM UTC-4, Ivan Brusic wrote:

Looking at your data, the places field is multi-valued, so the analyzer
will be applied to each value of the field separately. That means it will
find the first word of every value, and since each of your values is a
single word, that is every value.
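To illustrate the point, here is a rough Java emulation (plain java.util.regex, not Elasticsearch or Lucene code) of a pattern tokenizer with group 1 plus lowercasing, applied once per array value. It assumes the intended pattern captures the first run of word characters, written here as ^(?:[^\w]*)(\w+).*$; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerValueAnalysis {
    // Assumed intent of first_item_token: skip leading non-word characters,
    // then capture the first run of word characters as group 1.
    static final Pattern FIRST_ITEM = Pattern.compile("^(?:[^\\w]*)(\\w+).*$");

    // Rough emulation of a pattern tokenizer with group 1 followed by a
    // lowercase filter: every match contributes the text of capture group 1.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        Matcher m = FIRST_ITEM.matcher(value);
        while (m.find()) {
            tokens.add(m.group(1).toLowerCase());
        }
        return tokens;
    }

    // A multi-valued field is analyzed one value at a time; the tokens from
    // every value are added to the same field of the same document.
    static List<String> indexField(List<String> values) {
        List<String> fieldTokens = new ArrayList<>();
        for (String value : values) {
            fieldTokens.addAll(analyze(value));
        }
        return fieldTokens;
    }

    public static void main(String[] args) {
        // A single string (e.g. what one might send to the analyze API): one token.
        System.out.println(analyze("[\"restaurant\", \"mall\"]")); // [restaurant]

        // The same data indexed as an array: one token per value.
        List<String> doc1 = indexField(List.of("restaurant", "mall"));
        List<String> doc2 = indexField(List.of("mall"));
        System.out.println(doc1); // [restaurant, mall]
        System.out.println(doc2); // [mall]
    }
}
```

Both documents end up with a mall token in places_first, which is why a query for mall returns two hits even though the analyzer looks correct when tested on a single string.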

Your array would need to be represented as a single string for your
tokenizer to work. Lucene has a LimitTokenCountFilter, but it is not
exposed in Elasticsearch. You can create a plugin to support that filter,
or open an issue and have the ES team expose the filter.
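A sketch of the single-string idea, again as a plain Java emulation rather than real Lucene code: the array is joined into one string (the hypothetical joinPlaces helper) and only the first token of that one stream is kept, which is the behavior LimitTokenCountFilter would provide with a limit of 1. The simple \w+ tokenizer stands in for a real analyzer and assumes each place is a single word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstItemOnly {
    // Simple word tokenizer standing in for a real analyzer (assumption).
    static final Pattern WORD = Pattern.compile("\\w+");

    // Emulates the idea behind LimitTokenCountFilter: stop once maxTokens
    // tokens have been emitted from a single token stream.
    static List<String> tokenizeWithLimit(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (tokens.size() < maxTokens && m.find()) {
            tokens.add(m.group().toLowerCase());
        }
        return tokens;
    }

    // Hypothetical helper: flatten the array into one string so the whole
    // field is analyzed as one token stream instead of one stream per value.
    static String joinPlaces(List<String> places) {
        return String.join(" ", places);
    }

    public static void main(String[] args) {
        List<String> doc1 = tokenizeWithLimit(joinPlaces(List.of("restaurant", "mall")), 1);
        List<String> doc2 = tokenizeWithLimit(joinPlaces(List.of("mall")), 1);
        System.out.println(doc1); // [restaurant]
        System.out.println(doc2); // [mall]
    }
}
```

With a limit of 1, the first document only gets restaurant, so a query for mall would match just the second document. Multi-word values such as "shopping mall" would need a different separator and tokenizer for this to work.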

Cheers,

Ivan

On Wed, May 8, 2013 at 8:16 AM, thesuaves22@gmail.com wrote:

Ivan, thank you for your help! I have opened an issue: #3013 on
elastic/elasticsearch, "Analysis: Expose LimitTokenCountFilter in ElasticSearch".

On Wednesday, May 8, 2013 12:20:36 PM UTC-4, Ivan Brusic wrote:

Adding support for an existing Lucene filter is almost trivial. I will
submit a pull request if I find the time (not today!). If you don't want to
wait for a fix to be released, a plugin is not difficult, but it requires a
lot of boilerplate code. You can work off any of the existing analysis
plugins.

--
Ivan

On Wed, May 8, 2013 at 10:31 AM, thesuaves22@gmail.com wrote:

Good to hear it is fairly straightforward! I have an existing plugin I've
built to power the specific search functionality I need. I'll try adding
this to my plugin. Thanks again for your help!

On Wednesday, May 8, 2013 3:07:38 PM UTC-4, Ivan Brusic wrote:
