Hi all,
I'm having issues with searching the analyzed and not_analyzed fields in a
multi_field object (using ES 0.20.4).
I've created a gist here: https://gist.github.com/derryos/2218785ca960e3a4f30f
The gist workflow is:
1. Create an index with language analyzers configured.
2. Create an index type with a multi_field value which is split into
analyzed and not_analyzed (note that the analyzer is based on another field
value called language).
3. Try to search for terms using the analyzed field (different results are
returned depending on which language is set for indexing/search).
4. Try to search for terms using the not_analyzed field (should get more
results due to the lack of language analysis).
5. Recreate my mapping to try to force ES to store/index the 'raw' field
part of the multi_field.
I'm having trouble doing the searches with the unanalyzed fields, as per
kimchy's example here: https://gist.github.com/kimchy/1296043
This all works, but if I change the last line to be:
curl -XGET localhost:9200/test/_search?q=name.untouched:me
rather than:
curl -XGET localhost:9200/test/_search?q=name.untouched:*
I get 0 results. What I want is the ability to search over the untouched
fields with terms as well.
Any help greatly appreciated.
Derry
Sorry, I think I may have answered my own question (a little)!
If I search for:
curl -XGET localhost:9200/test/_search?q=name.untouched:"test me"
I get the result back, i.e. the not_analyzed field stores the entire text
as one token. In my use case, where I want both "test" and "me" to be
searchable on their own, would stop/simple/whitespace be a better solution
than not_analyzed?
On 28 February 2013 09:09, Derry O' Sullivan derryos@gmail.com wrote:
> I get 0 results. What I want is the ability to search over the untouched
> fields with terms as well.
Why not simply search on the "name" field, which is analyzed?
The default (standard) analyzer tokenizes on common word boundaries and
then applies lowercase and stop filters. The choice of analyzer depends on
your goals.
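To make the analyzer choice more concrete, here is a rough Python sketch of how the whitespace, simple, and stop analyzers (and a not_analyzed field) would roughly split a value. The functions are toy approximations written for this thread, not Elasticsearch or Lucene code, and STOP_WORDS is a small assumed sample rather than the real default list.

```python
import re

# Toy approximations of some built-in analyzers; NOT the real Lucene code.
# STOP_WORDS is a small illustrative sample, not the full default English list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def not_analyzed(text):
    # The whole value is indexed as a single term, unchanged.
    return [text]


def whitespace_analyzer(text):
    # Split on whitespace only; case and punctuation are preserved.
    return text.split()


def simple_analyzer(text):
    # Keep runs of letters only, then lowercase each token.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop_analyzer(text):
    # Same as the simple analyzer, then drop stop words.
    return [t for t in simple_analyzer(text) if t not in STOP_WORDS]


if __name__ == "__main__":
    value = "Test me in the morning"
    for fn in (not_analyzed, whitespace_analyzer, simple_analyzer, stop_analyzer):
        print(fn.__name__, fn(value))
```

With any of the three analyzers, "me" ends up as its own term, so a term search for it can match; only the not_analyzed variant keeps the whole value as one term.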
On Thu, Feb 28, 2013 at 1:16 AM, Derry O' Sullivan derryos@gmail.com wrote:
> In my use case, where I want both "test" and "me" to be searchable on
> their own, would stop/simple/whitespace be a better solution than
> not_analyzed?
Hi Derry
You've misunderstood not_analyzed fields.
A not_analyzed field indexes the exact value of the field as a single
term, e.g.:
"The quick brown fox" ->
analyzed: ["quick","brown","fox"]
not_analyzed: ["The quick brown fox"]
So the only term search that will match on the not_analyzed field is a
search for the exact value "The quick brown fox". Even "the quick brown
fox" won't work, because the case is different.
clint
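The difference can be sketched with a toy inverted index in Python. This is only an illustration of the idea, not how Elasticsearch is implemented: analyze() is a crude stand-in for the standard analyzer, the stop-word set is an assumed one-word sample, and the two documents are made up.

```python
import re

STOP_WORDS = {"the"}  # assumed sample; real analyzers use a longer list


def analyze(text):
    # Crude stand-in for the standard analyzer:
    # lowercase, split into words, drop stop words.
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def build_index(docs, analyzed):
    # Map each indexed term to the set of document ids that contain it.
    index = {}
    for doc_id, value in docs.items():
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def term_search(index, term):
    # A term search is an exact lookup of one term in the index.
    return sorted(index.get(term, set()))


docs = {1: "The quick brown fox", 2: "test me"}
name = build_index(docs, analyzed=True)        # plays the role of "name"
untouched = build_index(docs, analyzed=False)  # plays the role of "name.untouched"

if __name__ == "__main__":
    print(term_search(name, "me"))                        # [2]
    print(term_search(untouched, "me"))                   # []
    print(term_search(untouched, "test me"))              # [2]
    print(term_search(untouched, "the quick brown fox"))  # []
    print(term_search(untouched, "The quick brown fox"))  # [1]
```

This mirrors what the curl examples show: name.untouched:me finds nothing because "me" was never indexed as a term of its own in that field.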
Clint, I thought that not_analyzed meant a more 'simple' analysis, so
thanks for the clarification.
My use case is that I insert content into Elasticsearch based on automatic
language detection (e.g. Tika, user input, or other language-setting
measures). This happens automatically, so I have no idea what the
index_analyzer (set on the language field) is going to be when the content
is ingested. When I go to use the content (in search), I want to use a
'best fit' analyzer or else the language analyzer a user specifies in
search. So my real question is: how do I handle someone searching in a
potentially different language from the one that some relevant content was
indexed with?
My logic in using multi_field was that I would have the content analyzed
in the (hopefully) correct language, plus a catch-all version of the
content which is language-agnostic (sounds to me like this should use
standard or something simpler?).
Therefore, if I search the content using a search analyzer in a different
language from the index analyzer, I would still have some chance of getting
some text matching (although I'm guessing that I would need to do two
searches: one with the 'best fit' analyzer and one with the catch-all
analyzer?).
Thanks again,
Derry
On Friday, 1 March 2013 13:22:50 UTC, Clinton Gormley wrote:
> So the only search that will work on the not_analyzed field is a search
> for "The quick brown fox".
> My logic in using multi_field was that I would have the content analyzed
> in the (hopefully) correct language, plus a catch-all version of the
> content which is language-agnostic (sounds to me like this should use
> standard or something simpler?).
So yes: use one field with just the "standard" analyzer.
> Therefore, if I search the content using a search analyzer in a different
> language from the index analyzer, I would still have some chance of
> getting some text matching (although I'm guessing that I would need to do
> two searches: one with the 'best fit' analyzer and one with the catch-all
> analyzer?).
You don't need two queries, just two query clauses. Or even a
multi_match query that looks at both fields and can assign them
different boosts.
clint
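As a rough illustration of "two query clauses", the Python sketch below builds one search body whose bool query has two should clauses, one per field. It only constructs and prints JSON; nothing is sent to a cluster. The helper name two_clause_query is hypothetical, the field names follow kimchy's example, and it assumes the match query is available in your Elasticsearch version.

```python
import json


def two_clause_query(text, language_field="name", catch_all_field="name.untouched"):
    # One request, two clauses: a document can match on either field,
    # and matching on both contributes to the score twice.
    # Field names are assumptions taken from the example mapping.
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {language_field: text}},
                    {"match": {catch_all_field: text}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # The printed JSON could be used as the body of a _search request.
    print(json.dumps(two_clause_query("quick fox"), indent=2))
```

Each clause is analyzed with the analyzer of its own field, which is why a single request can cover both the language-specific and the catch-all version of the content.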
On Monday, 4 March 2013 10:44:32 UTC, Clinton Gormley wrote:
> You don't need two queries, just two query clauses. Or even a
> multi_match query that looks at both fields and can assign them
> different boosts.
The field would be "name.untouched", but otherwise yes. Also, you can
boost individual fields like this:
"fields": ["name^2","name.untouched"]
clint
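For reference, a minimal Python sketch of a multi_match body using the caret boost syntax. The multi_match query takes its field list under the "fields" key; the helper name multi_match_body and the sample query text are hypothetical, and the code only builds the JSON rather than running a search.

```python
import json


def multi_match_body(text, boosts):
    # boosts maps field name -> boost factor; a boost of 1 is left
    # without a caret suffix, anything else becomes "field^boost".
    fields = [f if b == 1 else f"{f}^{b}" for f, b in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    body = multi_match_body("quick fox", {"name": 2, "name.untouched": 1})
    print(json.dumps(body, indent=2))
```

Here matches on "name" count twice as much as matches on "name.untouched"; the boost values themselves are just an example and would need tuning for real data.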
?
Cheers,
Derry