I have some documents that have a field like this:
trackingid: Api23-82199996
I would like to query on this, but only on the Api23 part. If possible
I want to ignore case (to pick up both api23 and Api23). I tried to
query this using trackingid:api23* and trackingid:Api23* but no result
is returned. If I try trackingid:Api23-82199996 I get results, but
only for a full match of course. I realize there is something I'm
missing, but if anyone can help me understand or come up with a
workaround I'd appreciate it.
You can see that the text Api23-82199996 gets broken down into two terms,
Api23, and 82199996 that get indexed. If you want to treat it as a single
term, you need to define in a mapping that trackingId is not analyzed.
Here's a link to a ticket I opened for the UI, figured I'd start
there: Jira
Thanks a lot, you were right, it was analyzed. However, I changed it (and killed my indices), checked my metadata, and it's now not analyzed. But if the content of the field is Api23 (vs api23), the wildcard query doesn't work. What am I missing? I tried both upper- and lower-case search queries, but the result seems to depend on the case of the content in the document, which is weird to me.
Thanks,
/Hakan
On Oct 8, 2011, at 11:47 AM, Shay Banon wrote:
trackingId is probably analyzed, so it gets broken down into several terms. You can check this as follows:
Create a sample index:
curl -XPUT localhost:9200/test
Then see how the text for trackingId gets analyzed using the default (standard) analyzer.
You can see that the text Api23-82199996 gets broken down into two terms, Api23, and 82199996 that get indexed. If you want to treat it as a single term, you need to define in a mapping that trackingId is not analyzed.
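For anyone following along, here is a rough sketch of what a "not analyzed" mapping body might look like, built as a Python dict and printed as JSON. It assumes an older Elasticsearch release where string fields accept "index": "not_analyzed"; the type name "logs" and the helper function are hypothetical and only there for illustration.

```python
import json


def not_analyzed_mapping(type_name, field):
    """Build a mapping body that indexes `field` as a single exact term.

    Assumes the older string-field syntax ("index": "not_analyzed").
    """
    return {
        type_name: {
            "properties": {
                field: {"type": "string", "index": "not_analyzed"}
            }
        }
    }


# "logs" is a made-up type name; "trackingid" is the field from the thread.
mapping = not_analyzed_mapping("logs", "trackingid")
print(json.dumps(mapping, indent=2))
```

The printed JSON is the kind of body you would send when defining the mapping, before indexing any documents.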
On Fri, Oct 7, 2011 at 10:16 PM, Hakan Lindestaf hakan@lindestaf.com wrote:
Hi,
I have some documents that have a field like this:
trackingid: Api23-82199996
I would like to query on this, but only on the Api23 part. If possible
I want to ignore cases (to pick up both api23 and Api23). I tried to
query this using trackingid:api23* and trackingid:Api23* but no result
is returned. If I try trackingid:Api23-82199996 I get results, but
only for a full match of course. I realize there is something I'm
missing, but if anyone can help me understand or come up with a
workaround I'd appreciate it.
The problem is that I'm using Logstash as the UI (and I've been told it's using the Java API client to access ES). So I can't see what the real search parameter is unfortunately.
However when I do searches I can guess what it does.
If my data looks like this:
api12-xxxxyyyy
then any of these searches brings back the same result:
api12*
Api12*
API12*
However if the data looks like this:
Api12-xxxxyyyy
then none of the combinations above bring back any results (only the full exact match works).
I also verified this with other (non_analyzed) fields. If the content has upper case characters then the wildcard search doesn't seem to work.
/Hakan
On Oct 12, 2011, at 12:01 PM, Jamshid wrote:
So you're using the "keyword" analyzer now? You probably have to set
lowercase_expanded_terms=false.
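As a sketch of what that suggestion could look like in a request body, the snippet below builds a query_string query with lowercase_expanded_terms turned off. It assumes an older Elasticsearch version where query_string still accepts that parameter; the helper name is hypothetical, and whether a UI such as Logstash lets you pass this through is not something the thread confirms.

```python
import json


def prefix_query(field, prefix, lowercase_expanded_terms=False):
    """Build a query_string search body for `field:prefix*`.

    With lowercase_expanded_terms=False the wildcard term is sent as typed
    instead of being lowercased first.
    """
    return {
        "query": {
            "query_string": {
                "query": f"{field}:{prefix}*",
                "lowercase_expanded_terms": lowercase_expanded_terms,
            }
        }
    }


body = prefix_query("trackingid", "Api23")
print(json.dumps(body, indent=2))
```

Note that this only makes the match case-sensitive on the query side; it does not by itself give a case-insensitive search.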
Ahhh, now I understand. That makes sense and it solved my problem. I think the search query was automatically (by the Logstash UI) made lower case, so with the default analyzer it didn't pick up the lower case search terms. With this change it all works! Thanks a lot!
/Hakan
On Oct 12, 2011, at 9:53 PM, David Pilato wrote:
You should use a keyword analyzer with lowercase filter.
Define your own analyzer (keylowercase) and apply it to your field.
Then, when the user enters a search term, lowercase it.
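A minimal sketch of that setup, assuming an older Elasticsearch release with string fields and per-type mappings: the index body below defines a custom analyzer named keylowercase (keyword tokenizer plus lowercase filter) and applies it to the field. The type name "logs" and both helper functions are hypothetical.

```python
import json


def keylowercase_index_body(type_name, field):
    """Index settings plus mapping: one lowercased term per field value."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "keylowercase": {
                        "type": "custom",
                        "tokenizer": "keyword",   # keep the whole value as one token
                        "filter": ["lowercase"],  # then lowercase that token
                    }
                }
            }
        },
        "mappings": {
            type_name: {
                "properties": {
                    field: {"type": "string", "analyzer": "keylowercase"}
                }
            }
        },
    }


def normalize_user_term(term):
    """Lowercase the user's search term so it lines up with the indexed form."""
    return term.lower()


index_body = keylowercase_index_body("logs", "trackingid")
print(json.dumps(index_body, indent=2))
print(normalize_user_term("Api23") + "*")
```

The index would need to be created with this body (and the data reindexed) for the analyzer to take effect.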
I think logstash uses the query_string query to query elasticsearch.
Wildcard / Prefix queries will automatically be lowercased (since they are
not analyzed, Lucene tries its "best" to do some sort of common analysis,
which is lowercasing it). I think you solved your problem, which is mapping
it as keyword and lowercase, which is the best way to solve it.
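To make the behaviour described above concrete, here is a toy model in plain Python. It is not how Elasticsearch or Lucene is implemented; the functions are hypothetical stand-ins that only mimic the effect of lowercasing wildcard terms and of the two ways of indexing the field.

```python
def not_analyzed(value):
    """Toy stand-in for a not_analyzed field: the value is stored as one exact term."""
    return [value]


def keyword_lowercase(value):
    """Toy stand-in for keyword tokenizer + lowercase filter: one lowercased term."""
    return [value.lower()]


def wildcard_prefix_search(indexed_terms, pattern, lowercase_expanded_terms=True):
    """Match a trailing-* pattern against indexed terms.

    By default the prefix is lowercased first, mimicking what the thread
    says happens to wildcard/prefix terms in a query_string query.
    """
    prefix = pattern.rstrip("*")
    if lowercase_expanded_terms:
        prefix = prefix.lower()
    return [term for term in indexed_terms if term.startswith(prefix)]


raw_terms = not_analyzed("Api12-xxxxyyyy")
lowered_terms = keyword_lowercase("Api12-xxxxyyyy")

# Exact term keeps its capital "A", the lowercased prefix cannot match it.
print(wildcard_prefix_search(raw_terms, "Api12*"))
# Lowercased at index time, so any casing of the query matches.
print(wildcard_prefix_search(lowered_terms, "API12*"))
# Turning off the lowercasing makes the exact-case query match the raw term.
print(wildcard_prefix_search(raw_terms, "Api12*", lowercase_expanded_terms=False))
```

This matches what was observed earlier in the thread: lowercase data matched every casing of the query, while data containing upper-case characters matched none of them.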
Hi Kimchy,
I also have a similar issue. For instance, we have values like "City of God" and "God". If I search for "g*", I should get "God" only. Please advise.