Using Terms Stats Facet on "edgeNGram" key field

Heya,

The full gist is here: https://gist.github.com/3007168

Let's say I have a document like this:

{
  "id":"212468504455168001",
  "location":{
    "lat":46.20715933,
    "lon":6.14494212
  },
  "lat":46.20715933,
  "lng":6.14494212,
  "path":"1022232003013110123223112",
  "created_at":1339491413
}

I first defined an analyzer and a tokenizer:

  "analyzer":{
    "myanalyzer":{
      "type":"custom",
      "tokenizer":"mytokenizer"
    }
  },
  "tokenizer":{
    "mytokenizer":{
      "type":"edgeNGram",
      "min_gram":"1",
      "max_gram":"25",
      "side":"front"
    }
  }

Then I defined a mapping for the path field:

    "path":{"type":"string", "analyzer":"myanalyzer"}

As you can see, I apply an edgeNGram tokenizer to the path field, so my path
is broken into 25 tokens: 1, 10, 102, 1022, ..., 1022232003013110123223112

What I need now is to compute facets using this field as the key, but only for
tokens of length x (x is chosen by the user).

So if x is 4, I want to facet on the first 4 characters of my path.
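
To make the tokenization concrete, here is a small Python sketch that mimics what a front edgeNGram with min_gram 1 and max_gram 25 produces for this path. The function name edge_ngrams is made up for illustration; this is not Elasticsearch code, only an approximation of the tokenizer's output.

```python
def edge_ngrams(text, min_gram=1, max_gram=25):
    """Return the front-anchored n-grams of text, shortest first.

    Hypothetical helper that only imitates a front edgeNGram tokenizer.
    """
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


path = "1022232003013110123223112"
tokens = edge_ngrams(path)

print(len(tokens))   # 25
print(tokens[:4])    # ['1', '10', '102', '1022']
```

Every one of these 25 tokens is indexed as a separate term for the document, which is why a facet keyed on the field sees all of them.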

My facet is:

"path_lat":{
  "terms_stats":{
    "key_field":"path",
    "value_field":"lat",
    "size":0
  },
  "facet_filter":{
    "geo_bounding_box":{
      "location":{
        "top_left":{ "lat":84.4740645845916,  "lon":-179.999999 },
        "bottom_right":{ "lat":-75.67219739055291, "lon":179.999999 }
      }
    }
  }
}

My concern is that I get facet entries for all 25 tokens of each document (see
result: https://gist.github.com/3007168#file_result.json), but I only need to
compute the facet on the edgeNGram of length 4.

I imagine I could create a multi-field with 25 sub-fields (path_1 to
path_25), each using an edgeNGram with min/max of 1, 2, ..., 25, and then
run my facet on path_4, for example. But is there a fancier way to do it?
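
For reference, the multi-field workaround could be generated rather than written by hand. The Python sketch below builds the analysis settings and mapping as plain dicts. The names edge_N and path_analyzer_N are invented for illustration, and the multi_field layout (a default sub-field named like the parent, other sub-fields addressed as path.path_4) is my assumption about the mapping format of that Elasticsearch version, so check it against the docs before using it.

```python
import json


def build_path_settings(max_len=25):
    """Build analysis settings and a multi_field mapping with one
    fixed-length edgeNGram sub-field per prefix length (hypothetical names)."""
    tokenizers, analyzers = {}, {}
    # Assumed: the default sub-field keeps the original analyzer from the thread.
    fields = {"path": {"type": "string", "analyzer": "myanalyzer"}}
    for n in range(1, max_len + 1):
        # min_gram == max_gram, so only the n-character prefix is emitted
        tokenizers[f"edge_{n}"] = {
            "type": "edgeNGram",
            "min_gram": n,
            "max_gram": n,
            "side": "front",
        }
        analyzers[f"path_analyzer_{n}"] = {
            "type": "custom",
            "tokenizer": f"edge_{n}",
        }
        fields[f"path_{n}"] = {"type": "string", "analyzer": f"path_analyzer_{n}"}
    analysis = {"analyzer": analyzers, "tokenizer": tokenizers}
    mapping = {"path": {"type": "multi_field", "fields": fields}}
    return analysis, mapping


analysis, mapping = build_path_settings()
print(json.dumps(mapping["path"]["fields"]["path_4"]))
```

The facet would then use something like "key_field":"path.path_4", at the cost of indexing the same value 25 more times.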

Thanks for your help

I hope that my description is clear enough :-/

Cheers

David.

What I need now is to compute facets using this field as the key, but only
for tokens of length x (x is chosen by the user).

So if x is 4, I want to facet on the first 4 characters of my path.

My concern is that I get facet entries for all 25 tokens of each document
(see result: https://gist.github.com/3007168#file_result.json), but
I only need to compute the facet on the edgeNGram of length 4.

The "terms" facet accepts a "regex" filter, which would allow you to
restrict results only on terms of length 4.

Unfortunately, this doesn't work on the terms_stats facet. I'm sure this
would be easy to implement. Open an issue?

In the meantime, you'll have to do the filtering client-side.

clint
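
A minimal Python sketch of the client-side filtering suggested above, assuming the facet result has already been parsed into a dict with a "terms" list whose entries carry a "term" key. The helper name and the sample numbers are made up for illustration; they are not taken from the gist.

```python
def filter_terms_by_length(facet, length):
    """Keep only the facet entries whose term has exactly `length` characters."""
    return [
        entry
        for entry in facet.get("terms", [])
        if len(str(entry["term"])) == length
    ]


# Made-up sample shaped like a terms_stats facet result
sample_facet = {
    "_type": "terms_stats",
    "terms": [
        {"term": "1", "count": 3, "mean": 46.0},
        {"term": "102", "count": 3, "mean": 46.0},
        {"term": "1022", "count": 2, "mean": 46.5},
        {"term": "1023", "count": 1, "mean": 45.0},
        {"term": "10222", "count": 2, "mean": 46.5},
    ],
}

kept = filter_terms_by_length(sample_facet, 4)
print([entry["term"] for entry in kept])   # ['1022', '1023']
```

Elasticsearch still computes and returns every token this way, so it only trims the response after the fact.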

Many thanks, Clint.

The issue is here: https://github.com/elasticsearch/elasticsearch/issues/2063
I would like to try to implement it myself. Could you tell me where to start?

David.

On 28 June 2012 at 09:51, Clinton Gormley clint@traveljury.com wrote:


--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet

The issue is here: https://github.com/elasticsearch/elasticsearch/issues/2063

I would like to try to implement it myself. Could you tell me where to
start?

Heh - you're asking me? Perl guy? :-)

I'd start with: grep -r regex src/ | grep facet

clint

I would like to try to implement it myself. Could you tell me where to start?

Heh - you're asking me? Perl guy? :-)

Big LOL ;-)

I'd start with: grep -r regex src/ | grep facet

Should I say "thanks"? ;-)

OK. I will try to dig into the code and ask Shay for some advice on how to plug
this new feature in.

Take care
David.


Just an update on this thread.

Thanks to Clint, I found a way to implement it. There is now a pull request for it: "Add regex filter on terms_stats facet" by dadoonet, Pull Request #2109 on elastic/elasticsearch (GitHub).

Cheers

David.

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On behalf of David Pilato
Sent: Thursday 28 June 2012 12:40
To: elasticsearch@googlegroups.com
Subject: Re: Using Terms Stats Facet on "edgeNGram" key field
