Using Terms Stats Facet on "edgeNGram" key field

dadoonet · June 27, 2012, 10:18pm

Heya,

The full gist is here: https://gist.github.com/3007168

Let's say I have a document like this:

{
  "id":"212468504455168001",
  "location":{
    "lat":46.20715933,
    "lon":6.14494212
  },
  "lat":46.20715933,
  "lng":6.14494212,
  "path":"1022232003013110123223112",
  "created_at":1339491413
}

I first defined an analyzer and a tokenizer to use on the path field:

  "analyzer":{
    "myanalyzer":{
      "type":"custom",
      "tokenizer":"mytokenizer"
    }
  },
  "tokenizer":{
    "mytokenizer":{
      "type":"edgeNGram",
      "min_gram":"1",
      "max_gram":"25",
      "side":"front"
    }
  }

Then I defined a mapping for field path:

    "path":{"type":"string", "analyzer":"myanalyzer"}

As you can see, I apply an edgeNGram tokenizer to the path field, so my path is
broken into 25 tokens: 1, 10, 102, 1022, ..., 1022232003013110123223112
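To make the tokenization concrete, here is a small Python sketch of what a front edge n-gram does to the sample path. It is only an approximation for illustration (the function name edge_ngrams is mine, this is not Elasticsearch code):

```python
def edge_ngrams(text, min_gram=1, max_gram=25):
    """Return the front-anchored prefixes of text, shortest first.

    Approximates an edgeNGram tokenizer with side=front; the function
    name and defaults are illustrative, not part of Elasticsearch.
    """
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


tokens = edge_ngrams("1022232003013110123223112")
print(len(tokens))   # one token per prefix length
print(tokens[:4])    # the four shortest prefixes
```

Each document therefore contributes one term per prefix length to the field, which is why a facet keyed on path returns every length at once.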

I now need to compute facets using this field as the key, but only for tokens
of length x (x is chosen by the user).

So if x is 4, I want to compute on the first 4 characters of my path.

My facet is:

"path_lat":{
  "terms_stats":{
    "key_field":"path",
    "value_field":"lat",
    "size":0
  },
  "facet_filter":{
    "geo_bounding_box":{
      "location":{
        "top_left":{ "lat":84.4740645845916, "lon":-179.999999 },
        "bottom_right":{ "lat":-75.67219739055291, "lon":179.999999 }
      }
    }
  }
}

My concern is that I get facet entries for all 25 tokens of each document (see
the result: https://gist.github.com/3007168#file_result.json), but I only need
to compute the facet on the edgeNGram of length 4.

I imagine that I could create a multi-field with 25 sub-fields (path_1 to
path_25), each using an edgeNGram with min/max set to 1, 2, ..., 25, and then
apply my facet on path_4 for example. But is there a fancier way to do it?
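As a rough sketch of that multi-field idea, the index body could be generated rather than written by hand. Everything named here is an assumption for illustration: the type name "doc", the prefix_N tokenizer and analyzer names, and the choice to leave the default sub-field not_analyzed. It has not been run against a cluster.

```python
import json


def build_index_body(max_len=25):
    """Build settings and mapping with one fixed-length edgeNGram per sub-field.

    Names such as "doc", "prefix_N" and "path_N" are hypothetical.
    """
    tokenizers = {}
    analyzers = {}
    # Default sub-field keeps the raw path; sub-fields path_1..path_N hold prefixes.
    fields = {"path": {"type": "string", "index": "not_analyzed"}}
    for n in range(1, max_len + 1):
        # min_gram == max_gram means only the prefix of exactly n characters is emitted.
        tokenizers["prefix_%d" % n] = {
            "type": "edgeNGram", "min_gram": n, "max_gram": n, "side": "front"
        }
        analyzers["prefix_%d_analyzer" % n] = {
            "type": "custom", "tokenizer": "prefix_%d" % n
        }
        fields["path_%d" % n] = {
            "type": "string", "analyzer": "prefix_%d_analyzer" % n
        }
    return {
        "settings": {"analysis": {"analyzer": analyzers, "tokenizer": tokenizers}},
        "mappings": {"doc": {"properties": {
            "path": {"type": "multi_field", "fields": fields}
        }}},
    }


body = build_index_body()
print(json.dumps(body["settings"]["analysis"]["tokenizer"]["prefix_4"]))
```

The obvious cost is 25 analyzers and 25 indexed copies of the same value, which is why I would prefer something lighter.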

Thanks for your help

I hope that my description is clear enough :-/

Cheers

David.

Clinton_Gormley · June 28, 2012, 7:51am

My need is now to compute facets on this field as a key, but only for
tokens with a size of x (x depends to the user).

So if x is 4, I want to compute on the first 4 characters of my path.

My concern is that I get facets for the 25 tokens for each document
(see result : Using Terms Stats Facet on "edgeNGram" key field · GitHub). But,
I only need to compute facet on the edgeNGram with a length of 4.

The "terms" facet accepts a "regex" filter, which would allow you to
restrict results only on terms of length 4.
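For reference, a terms facet restricted this way might be built as below. This is a sketch: the helper name is made up, and the pattern is anchored with ^ and $ because I am not assuming whether the regex is applied as a full match or a partial one. Note that a terms facet only returns counts, not the statistics on lat.

```python
import json
import re


def terms_facet_for_length(field, length, size=10):
    """Build a terms facet body that keeps only terms of exactly `length` characters.

    The helper itself is hypothetical; "regex" is the terms facet parameter
    mentioned above.
    """
    return {"terms": {"field": field, "regex": "^.{%d}$" % length, "size": size}}


facet = terms_facet_for_length("path", 4)
print(json.dumps({"facets": {"path_terms": facet}}, indent=2))

# Local sanity check of the pattern against two edge n-grams of the sample path.
pattern = re.compile(facet["terms"]["regex"])
print(bool(pattern.match("1022")), bool(pattern.match("10222")))
```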

Unfortunately, this doesn't work on the terms_stats facet. I'm sure this
would be easy to implement. Open an issue?

In the meantime, you'll have to do the filtering client-side.
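A minimal sketch of that client-side filtering in Python, assuming the response lists the facet entries under facets.<name>.terms with a "term" key on each entry. The sample response below is invented for illustration and is not taken from the gist:

```python
def keep_terms_of_length(response, facet_name, length):
    """Return only the facet entries whose term has exactly `length` characters.

    Assumes entries live under response["facets"][facet_name]["terms"].
    """
    entries = response.get("facets", {}).get(facet_name, {}).get("terms", [])
    return [entry for entry in entries if len(str(entry["term"])) == length]


# Invented sample response: one document, first five edge n-grams only.
sample_response = {
    "facets": {
        "path_lat": {
            "_type": "terms_stats",
            "terms": [
                {"term": "1", "count": 1, "mean": 46.20715933},
                {"term": "10", "count": 1, "mean": 46.20715933},
                {"term": "102", "count": 1, "mean": 46.20715933},
                {"term": "1022", "count": 1, "mean": 46.20715933},
                {"term": "10222", "count": 1, "mean": 46.20715933},
            ],
        }
    }
}

filtered = keep_terms_of_length(sample_response, "path_lat", 4)
print([entry["term"] for entry in filtered])
```

The server still computes and ships every term, so this only trims the result; it does not reduce the work done by the facet.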

clint

dadoonet · June 28, 2012, 8:13am

Many thanks Clint.

The issue is here (Add regex filter on terms_stats facet, issue #2063):
https://github.com/elasticsearch/elasticsearch/issues/2063

I would like to try to implement it myself. Could you tell me where to start?

David.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet

Clinton_Gormley · June 28, 2012, 8:37am

Issue is
here : Add regex filter on terms_stats facet · Issue #2063 · elastic/elasticsearch · GitHub

I would like to try to implement it myself. Could you tell me where to
start ?

Heh - you're asking me? Perl guy?

I'd start with: grep -r regex src/ | grep facet

clint

dadoonet · June 28, 2012, 10:39am

I would like to try to implement it myself. Could you tell me where to start
?

Heh - you're asking me? Perl guy?

Big LOL

I'd start with: grep -r regex src/ | grep facet

Should I say "thanks" ?

OK. I will try to dig into the code and ask Shay for some advice on how to plug
this new feature in...

Take care
David.

dadoonet · July 20, 2012, 10:08pm

Just an update on this thread.

Thanks to Clint, I found a way to implement it. There is now a pull request for it: Add regex filter on terms_stats facet, pull request #2109 in elastic/elasticsearch.

Cheers

David.
