Here's another example:
Input:
foo w/3 (biz and bar)
Output:
{
"span_near" : {
"clauses" : [
{ "span_term" : { "text" : "foo" } },
{ "span_near" : {
"clauses" : [
{ "span_term" : { "text" : "biz" } },
{ "span_term" : { "text" : "bar" } },
],
"slop" : int(1e6),
"in_order" : False,
"collect_payloads" : False
} },
],
"slop" : 2,
"in_order" : False,
"collect_payloads" : False
}
}
Michael Sander
mes65@cornell.edu
607-227-9859
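The output above follows a pattern that can be sketched with a few small helpers. This is only an illustration inferred from the examples in this thread: the helper names (`term`, `near`, `within`, `both`) are hypothetical, and the rules "w/N becomes slop N - 1" and "and becomes a very large slop" are read off the posted output rather than taken from any library.

```python
# Hypothetical helpers; names and conventions are inferred from the
# examples in this thread, not from Elasticsearch itself.
FIELD = "text"
AND_SLOP = int(1e6)  # "and" is modelled as proximity with a very large slop


def term(word):
    """A single exact term on the assumed field."""
    return {"span_term": {FIELD: word}}


def near(clauses, slop):
    """An unordered span_near over the given clauses."""
    return {
        "span_near": {
            "clauses": list(clauses),
            "slop": slop,
            "in_order": False,
            "collect_payloads": False,
        }
    }


def within(n, left, right):
    """'left w/n right', using the slop = n - 1 convention seen above."""
    return near([left, right], n - 1)


def both(left, right):
    """'left and right' nested inside a proximity expression."""
    return near([left, right], AND_SLOP)


# foo w/3 (biz and bar)
query = within(3, term("foo"), both(term("biz"), term("bar")))
```

Built this way, `query` is the same dictionary as the output shown above.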
On Thu, Oct 24, 2013 at 10:51 AM, Michael Sander mes65@cornell.edu wrote:
Hi David,
This is from one of my tests:
Input:
'ferm! w/5 outil w/5 dispositif'
Output:
{
'span_near': {
'clauses': [
{'span_multi': {'match': {'prefix': {'text': 'ferm'}}}},
{'span_near': {
'clauses': [
{'span_term': {'text': 'outil'}},
{'span_term': {'text': 'dispositif'}}
],
'collect_payloads': False,
'in_order': False,
'slop': 4}
}
],
'collect_payloads': False,
'in_order': False,
'slop': 4
}
}
On Thu, Oct 24, 2013 at 9:38 AM, janssen.dja@gmail.com wrote:
Hi Michael,
I have the same requirement as you: 'converting' sophisticated queries
into the Elasticsearch query DSL.
Would it be possible to get a JSON sample of span_multi usage with
complex proximity?
For example, what would be the right syntax for this request (is it
possible to express it in ES?):
(toto and tata) within/20 (tutu or titi)
Best regards
David
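One possible shape for the request above, as a hedged sketch rather than a tested answer: the "and" group reuses the large-slop span_near convention from the examples in this thread, and the "or" group uses Elasticsearch's span_or query. The field name `text` and the slop of 19 for within/20 are assumptions carried over from those examples.

```python
# Sketch only: field name and slop conventions are assumptions taken
# from the other examples in this thread.
FIELD = "text"


def term(word):
    return {"span_term": {FIELD: word}}


# (toto and tata) within/20 (tutu or titi)
query = {
    "span_near": {
        "clauses": [
            {
                # "and": both terms anywhere in the field, in any order
                "span_near": {
                    "clauses": [term("toto"), term("tata")],
                    "slop": int(1e6),
                    "in_order": False,
                }
            },
            {
                # "or": either term satisfies this clause
                "span_or": {"clauses": [term("tutu"), term("titi")]}
            },
        ],
        "slop": 19,  # within/20, following the w/N -> N - 1 convention
        "in_order": False,
    }
}
```

Note that the span covering both "toto" and "tata" can itself be long, so how the outer slop behaves around it is worth checking against real documents.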
On Tuesday, September 10, 2013 13:42:20 UTC+2, Michael Sander wrote:
Hi Christophe,
Yes, I got it working using span_multi. At the time I originally sent
this message, span_multi did not exist. However, it has since been added
and works great.
I am curious: how are you using Elasticsearch in the patent space? My
site also searches legal documents: www.docketalarm.com/search
Best,
Michael Sander
michael.sander@gmail.com
607-227-9859
On Tue, Sep 10, 2013 at 5:20 AM, Christophe V. <christophe.viaud@cewo.fr>
wrote:
Hi Michael,
I have the same need today, to search patent documents.
Did you succeed in using the span_multi query in Elasticsearch, or did
you migrate to another Lucene-based search engine?
Thanks in advance for your feedback.
regards
--
Christophe
On Friday, November 9, 2012 23:46:35 UTC+1, Michael Sander wrote:
Hi Simon,
Yes I really want to do this and your guess is correct: I am working
on a legal research tool. Lawyers use surprisingly sophisticated queries
to research law. For example, a lawyer researching employment
discrimination lawsuits in New York may use the following query:
(employ* within/5 discrimi*) within/20 (black or latino or hispanic or
(african within/3 american)) and "New York"
It seems complex, but searches like this occur all the time and such
functionality is expected. It's one of the reasons Google Scholar is not
terribly popular with attorneys. Speed is important but not of extreme
importance. A two or three second wait-time is not a deal breaker, but it
definitely needs to be under ten. To make things run faster, I could limit
wildcard queries to require at least four or five letters.
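As a rough illustration of how the query above might be expressed, here is a sketch that combines span_near, span_or, span_multi and a bool query. Everything in it is an assumption layered on the description: the helper names are hypothetical, the field is assumed to be `text`, within/N is mapped to a slop of N - 1, the trailing `and "New York"` is treated as a plain boolean conjunction with a match_phrase query, and the minimum prefix length of four is just one reading of "at least four or five letters".

```python
# Hypothetical helpers for illustration; not part of any library.
FIELD = "text"
MIN_PREFIX_LEN = 4  # assumed threshold for wildcard terms


def term(word):
    return {"span_term": {FIELD: word}}


def prefix(pattern):
    """Turn 'employ*' into a span_multi prefix clause, rejecting short stems."""
    stem = pattern.rstrip("*")
    if len(stem) < MIN_PREFIX_LEN:
        raise ValueError("prefix %r is shorter than %d letters" % (stem, MIN_PREFIX_LEN))
    return {"span_multi": {"match": {"prefix": {FIELD: stem}}}}


def within(n, *clauses):
    """Unordered proximity; within/n is assumed to mean slop n - 1."""
    return {"span_near": {"clauses": list(clauses), "slop": n - 1, "in_order": False}}


def any_of(*clauses):
    return {"span_or": {"clauses": list(clauses)}}


proximity = within(
    20,
    within(5, prefix("employ*"), prefix("discrimi*")),
    any_of(
        term("black"),
        term("latino"),
        term("hispanic"),
        within(3, term("african"), term("american")),
    ),
)

# The final 'and "New York"' is assumed to apply to the whole document.
query = {
    "query": {
        "bool": {
            "must": [proximity, {"match_phrase": {FIELD: "New York"}}]
        }
    }
}
```

With this guard in place, a term such as `ug*` would be rejected before any query is sent, which is the kind of limit described above for keeping wildcard expansion manageable.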
I will look into creating the plugin, however it does not look like a
simple task.
On Friday, November 9, 2012 3:30:29 AM UTC-5, simonw wrote:
Hi Michael,
Queries of this kind are possible, but do you really want to do this?
Take a step back and think about how we would calculate relevance for this.
I don't think you can expect either a reasonable relevance score or
reasonable performance from such a query. The fact that Lucene allows these
kinds of queries is scary enough. I'd really like to hear what you are trying
to achieve, and maybe we can find a better way to do this than multi-term
spans. What is the use case for allowing queries like "pret* and ug*"? Who
types that in? I mean, I could imagine there are use cases like this (lawyers
do weird things with search engines in the patent space...), but maybe you
can elaborate and we can think about a better solution?
simon
On Wednesday, November 7, 2012 8:57:11 PM UTC+1, Michael Sander wrote:
Hi,
Is it possible to construct an Elasticsearch query (or filter) that
detects whether two words with wildcards are within a certain distance of
each other?
For example, I would like a query that detects whether pret* and ug*
are within five words of each other. Such a query should match "She is
pretty and he is ugly."
I think I would need to use the span_near query, but span_near only
accepts a series of span_term's as arguments and span_term doesn't appear
to allow wildcards.
Is it possible to do this with elasticsearch? If not, is this
possible with Lucene directly?
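For reference, the later replies in this thread report that the span_multi query, added to Elasticsearch after this question was posted, solves the problem by wrapping a multi-term query such as prefix so that it can be used inside span_near. A minimal sketch for the `pret*` / `ug*` case follows; the field name `text`, the helper name `prefix_span` and the slop of 4 for "within five words" are assumptions based on the examples elsewhere in the thread.

```python
import json


def prefix_span(field, stem):
    """Wrap a prefix query in span_multi so span_near can accept it."""
    return {"span_multi": {"match": {"prefix": {field: stem}}}}


# pret* within five words of ug*, in either order
query = {
    "query": {
        "span_near": {
            "clauses": [prefix_span("text", "pret"), prefix_span("text", "ug")],
            "slop": 4,  # assumed mapping of "within five words"
            "in_order": False,
        }
    }
}

# Request body that could be sent to a search endpoint
body = json.dumps(query, indent=2)
```

In "She is pretty and he is ugly", three words sit between the two matches, so a slop of 4 would be expected to cover it, assuming the field is analyzed into lowercase single-word tokens.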
FYI, I have an SO question open here:
http://stackoverflow.com/questions/13258997/elasticsearch-query-wildcard-or-stemming-within-a-span-i-e-proximity-query
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/vHQh0ARaAHY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.