Forcing Analysis of Terms and Span Terms?

Is there any way to force term queries to be analyzed? How about span_term
queries?

I would like to use the span_near query but it only accepts other span-type
queries as part of its clauses. The other span-type queries are not
analyzed, so you cannot construct a span_near query with analyzed
sub-parts. For example, suppose you wanted a query looking for "singing"
within a two word span of "birds". Simply returning a span_near with
"singing" and "birds" as the clauses will yield zero results because the
analyzer was not run and did not turn "singing" into "sing" and "birds"
into "bird".
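
To make the example concrete, here is a rough Python sketch of the query body in question. The field name "my_field" and the helper span_near() are placeholders for illustration, and the stemmed forms are written by hand on the assumption that the index analyzer really does produce "sing" and "bird".

```python
import json

def span_near(field, terms, slop=2, in_order=False):
    """Build a span_near query body from terms that are already analyzed."""
    clauses = [{"span_term": {field: term}} for term in terms]
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}

# Raw words: span_term looks these up verbatim, so they would miss an index
# that only contains the stemmed forms.
raw_query = span_near("my_field", ["singing", "birds"])

# Hand-stemmed terms, assumed to match what the analyzer indexed.
stemmed_query = span_near("my_field", ["sing", "bird"])

print(json.dumps({"query": stemmed_query}, indent=2))
```

The two bodies differ only in the term strings, which is the whole problem: something has to turn the first list into the second before the query is sent.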

An alternative, I suppose, is to write my own analyzer and run it while
constructing the query. Another option is to make separate calls to
elasticsearch to run the analysis on the terms. Of course, I would prefer
to not have to do either solution. Any suggestions?

Thanks,

--

Hiya

For the example you give, which is fairly simple, you could just use a
match_phrase query:

{ "match_phrase": {
    "my_field": {
      "query": "singing birds",
      "slop": 3
    }
  }
}

Not quite as much control as when using span queries, but it'd serve the
purpose you describe.

clint

--

I instantiate Elasticsearch analyzers on the client side and analyze the
terms myself, without making a separate call to Elasticsearch. You can
create analyzers using the AnalysisModule.

--
Ivan

On Mon, Nov 26, 2012 at 10:08 PM, Michael Sander
<michael.sander@gmail.com> wrote:

Is there any way to force term queries to be analyzed? How about span_term
queries?

I would like to use the span_near query but it only accepts other
span-type queries as part of its clauses. The other span-type queries are
not analyzed, so you cannot construct a span_near query with analyzed
sub-parts. For example, suppose you wanted a query looking for "singing"
within a two word span of "birds". Simply returning a span_near with
"singing" and "birds" as the clauses will yield zero results because the
analyzer was not run and did not turn "singing" into "sing" and "birds"
into "bird".

An alternative, I suppose, is to write my own analyzer and run it while
constructing the query. Another option is to make separate calls to
elasticsearch to run the analysis on the terms. Of course, I would prefer
to not have to do either solution. Any suggestions?

Thanks,

--

I am running python and using haystack to talk to the elasticsearch
instance. Maybe I can copy the analysis code from Whoosh.

Michael Sander

On Tue, Nov 27, 2012 at 6:20 PM, Ivan Brusic <ivan@brusic.com> wrote:

I instantiate Elasticsearch analyzers on the client side and analyze the
terms myself, without making a separate call to Elasticsearch. You can
create analyzers using the AnalysisModule.

--
Ivan

--

@Clinton - Good point, but that was just a toy example. I want to be able
to perform these complex queries.

@Ivan I am in a pure-python environment, so I can't just instantiate an
AnalysisModule.

I think I have two options: (1) make a call to Elasticsearch (a url fetch)
to analyze the tokens or (2) do the analysis in pure python.

Option 1 is probably easy to implement but has the obvious disadvantage of
requiring a URL fetch to analyze each token.

Option 2 may be faster than making a URL fetch, but in pure Python it could
still be slow. Putting speed aside, a bigger problem may arise if the
Python analyzer does not match up exactly with the Elasticsearch analyzer.
I am also loath to write my own analyzer in Python.
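
For what it's worth, option 1 might look roughly like the sketch below. The _analyze endpoint tokenizes the whole text it is given, so one fetch per query string should be enough rather than one per token. This assumes a reasonably recent Elasticsearch release where _analyze accepts a JSON body with "field" and "text"; older releases took these as query-string parameters instead. The helper names, base URL and index name are made up for illustration, and the sample response is hand-written, not captured from a cluster.

```python
import json
import urllib.request

def analyze_request(base_url, index, field, text):
    """Prepare (but do not send) one _analyze request for the whole text."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def terms_from_response(response):
    """Pull the analyzed terms out of a decoded _analyze response."""
    return [tok["token"] for tok in response.get("tokens", [])]

def fetch_terms(base_url, index, field, text):
    """Send the request and return the analyzed terms (needs a live cluster)."""
    request = analyze_request(base_url, index, field, text)
    with urllib.request.urlopen(request) as resp:
        return terms_from_response(json.load(resp))

# Hand-written stand-in for a response, trimmed to the keys used here.
sample = {"tokens": [{"token": "sing", "position": 0},
                     {"token": "bird", "position": 1}]}
terms = terms_from_response(sample)
```

Because the terms come from the same analyzer that built the index, this route avoids the mismatch risk of option 2, at the cost of one extra round trip.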

I think, for now, I am going to go with option 2 and hope that the
analyzers are the same. I found this pure-Python stemming package
(stemming on PyPI), which appears to do what I want.
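
As a rough idea of what option 2 involves, the sketch below lowercases, tokenizes and stems on the client. The toy_stem() function is a deliberately naive placeholder so the example runs without third-party packages; it is not the Porter or Snowball algorithm and will disagree with Elasticsearch on many words. A real stemmer function can be passed as the stem argument instead. The tokenizing regex is also an assumption and may not split text the way the index analyzer does.

```python
import re

def toy_stem(word):
    """Placeholder stemmer: strips a trailing "ing" or "s". NOT Porter/Snowball."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def analyze(text, stem=toy_stem):
    """Lowercase, split on non-alphanumerics, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(tok) for tok in tokens]

terms = analyze("Singing birds")
print(terms)
```

Each resulting term would then go into its own span_term clause. It seems worth spot-checking a handful of words against the index analyzer's output before trusting a client-side stemmer.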

I'll let you know how it goes.

Michael Sander

On Tue, Nov 27, 2012 at 11:17 PM, Michael Sander
<michael.sander@gmail.com> wrote:

I am running python and using haystack to talk to the elasticsearch
instance. Maybe I can copy the analysis code from Whoosh.

Michael Sander
