Using Terms Stats Facet on "edgeNGram" key field


(David Pilato) #1

Heya,

Full gist is here : https://gist.github.com/3007168

Let's say I have a document like this :

{
  "id":"212468504455168001",
  "location":{
    "lat":46.20715933,
    "lon":6.14494212
  },
  "lat":46.20715933,
  "lng":6.14494212,
  "path":"1022232003013110123223112",
  "created_at":1339491413
}

I first defined an analyzer and its tokenizer:

  "analyzer":{
    "myanalyzer":{
      "type":"custom",
      "tokenizer":"mytokenizer"
    }
  },
  "tokenizer":{
    "mytokenizer":{
      "type":"edgeNGram",
      "min_gram":"1",
      "max_gram":"25",
      "side":"front"
    }
  }

Then I defined a mapping for the path field:

    "path":{"type":"string", "analyzer":"myanalyzer"}

As you can see, I apply an edgeNGram tokenizer to the path field, so my path
is broken into 25 tokens: 1, 10, 102, 1022, ..., 1022232003013110123223112
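To make the tokenization concrete, here is a minimal Python sketch that mimics what a front-side edge n-gram tokenizer with min_gram 1 and max_gram 25 produces for the sample path. The function name edge_ngrams is made up for illustration; this is not Elasticsearch code, only an approximation of its output.

```python
def edge_ngrams(text, min_gram=1, max_gram=25):
    """Return the front-anchored n-grams of text, shortest first.

    Hypothetical helper that imitates an edgeNGram tokenizer with
    side=front; it is not part of any Elasticsearch client.
    """
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


path = "1022232003013110123223112"
tokens = edge_ngrams(path)

print(len(tokens))  # 25
print(tokens[:4])   # ['1', '10', '102', '1022']
```

Every one of these tokens becomes a separate key in a facet computed on the path field, which is where the problem described below comes from.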

I now need to compute facets using this field as the key, but only for tokens
of length x (x depends on the user).

So if x is 4, I want to compute on the first 4 characters of my path.

My facet is :

"path_lat":{
  "terms_stats":{
    "key_field":"path",
    "value_field":"lat",
    "size":0
  },
  "facet_filter":{
    "geo_bounding_box":{
      "location":{
        "top_left":{ "lat":84.4740645845916, "lon":-179.999999 },
        "bottom_right":{ "lat":-75.67219739055291, "lon":179.999999 }
      }
    }
  }
}

My concern is that I get facet entries for all 25 tokens of each document (see
result: https://gist.github.com/3007168#file_result.json), but I only need to
compute the facet on the edgeNGram with a length of 4.

I imagine that I could create a multi-field with 25 sub-fields (path_1 to
path_25), each using an edgeNGram with min/max set to 1, 2, ..., 25, and then
apply my facet on path_4, for example. But is there a fancier way to do it?
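As a rough sketch of that multi-field idea, the Python below generates the 25 tokenizers, analyzers and sub-fields instead of writing them by hand. The names edge_N and prefix_N are invented for illustration, and the "multi_field" layout (with sub-fields addressed as path.path_4) is an assumption about the mapping format, so check it against your Elasticsearch version before using it.

```python
import json


def build_settings_and_mapping(max_len=25):
    """Build index settings and a mapping with one fixed-length prefix
    sub-field per length (hypothetical names: edge_N, prefix_N, path_N)."""
    tokenizers, analyzers = {}, {}
    # Default sub-field keeps the whole path as a single term.
    fields = {"path": {"type": "string", "index": "not_analyzed"}}
    for n in range(1, max_len + 1):
        # min_gram == max_gram, so each document yields at most one token.
        tokenizers["edge_%d" % n] = {
            "type": "edgeNGram",
            "min_gram": n,
            "max_gram": n,
            "side": "front",
        }
        analyzers["prefix_%d" % n] = {
            "type": "custom",
            "tokenizer": "edge_%d" % n,
        }
        fields["path_%d" % n] = {"type": "string", "analyzer": "prefix_%d" % n}
    settings = {"analysis": {"analyzer": analyzers, "tokenizer": tokenizers}}
    mapping = {"path": {"type": "multi_field", "fields": fields}}
    return settings, mapping


settings, mapping = build_settings_and_mapping()
print(json.dumps(mapping["path"]["fields"]["path_4"]))
```

Under these assumptions the facet would use "key_field":"path.path_4" for x = 4. The cost is 25 extra indexed fields per document.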

Thanks for your help

I hope that my description is clear enough :-/

Cheers

David.


(Clinton Gormley) #2

I now need to compute facets using this field as the key, but only for
tokens of length x (x depends on the user).

So if x is 4, I want to compute on the first 4 characters of my path.

My concern is that I get facets for the 25 tokens for each document
(see result : https://gist.github.com/3007168#file_result.json). But,
I only need to compute facet on the edgeNGram with a length of 4.

The "terms" facet accepts a "regex" filter, which would allow you to
restrict results only on terms of length 4.

Unfortunately, this doesn't work on the terms_stats facet. I'm sure this
would be easy to implement. Open an issue?

In the meantime, you'll have to do the filtering client-side.

clint
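For reference, client-side filtering could look like the Python sketch below. It assumes the search response has already been parsed into a dict shaped like facets -> facet name -> terms, where each entry carries a "term" key; the helper name terms_of_length and the sample values are made up for illustration and are not taken from the gist.

```python
def terms_of_length(response, facet_name, length):
    """Keep only the facet entries whose term has exactly `length` characters.

    Assumes a parsed response of the form
    {"facets": {facet_name: {"terms": [{"term": ...}, ...]}}}.
    """
    entries = response["facets"][facet_name]["terms"]
    return [e for e in entries if len(str(e["term"])) == length]


# Made-up sample response, only to exercise the helper.
sample = {
    "facets": {
        "path_lat": {
            "_type": "terms_stats",
            "terms": [
                {"term": "1", "count": 2},
                {"term": "102", "count": 2},
                {"term": "1022", "count": 1},
                {"term": "1023", "count": 1},
                {"term": "10222", "count": 1},
            ],
        }
    }
}

selected = terms_of_length(sample, "path_lat", 4)
print([e["term"] for e in selected])  # ['1022', '1023']
```

The server still computes and returns stats for every token, so this only trims the result; it does not reduce the work done by the facet.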


(David Pilato) #3

Many thanks Clint.

The issue is here: https://github.com/elasticsearch/elasticsearch/issues/2063
I would like to try to implement it myself. Could you tell me where to start?

David.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Clinton Gormley) #4

The issue is here: https://github.com/elasticsearch/elasticsearch/issues/2063

I would like to try to implement it myself. Could you tell me where to
start ?

Heh - you're asking me? Perl guy? :-)

I'd start with: grep -r regex src/ | grep facet

clint


(David Pilato) #5

I would like to try to implement it myself. Could you tell me where to start?

Heh - you're asking me? Perl guy? :-)

Big LOL ;-)

I'd start with: grep -r regex src/ | grep facet

Should I say "thanks"? ;-)

OK. I will try to dig into the code and ask Shay for some advice on how to plug
this new feature in...

Take care
David.



(David Pilato) #6

Just an update on this thread.

Thanks to Clint, I found a way to implement it, so there is now a pull request for it here: https://github.com/elasticsearch/elasticsearch/pull/2109

Cheers

David.


