Get distinct words for search suggestions


(JAS) #1

Hi,

I like ElasticSearch so far, nice job!

I am indexing what I call Items under a single index called items, a
default index with default config. So I add items like this:

{
"name" : "Steak",
"description" : "yummy"
}

Suppose I've added 1000 items, 300 with the name "Steak", 200 with the
name "Steak Sandwich" and then 500 random names involving the word
Steak.

If I do a simple search like http://localhost:9200/items/_search?q=steak&size=100
I get a list like the following:

{

    {
        "_index": "items",
        "_type": "document",
        "_id": "41ae5059a28b4838ad3bbc19f84610ba",
        "_score": 2.1778517,
        "_source": {
            "id": "41ae5059a28b4838ad3bbc19f84610ba",
            "name": "Steak",
            "description": "\n"
        }
    },

.... A WHOLE BUNCH OF "name" : "Steak" ...

    {
        "_index": "items",
        "_type": "document",
        "_id": "f7c714f8b66f4854884379c3a954e4f6",
        "_score": 2.1087837,
        "_source": {
            "id": "f7c714f8b66f4854884379c3a954e4f6",
            "name": "Steak Encebollado",
            "description": "Steak with onions.\n"
        }
    },

This is not what I need for search suggestions. I basically want "the
set of all distinct names with the word Steak in them". In other
words, even if there are 300 documents with the name "Steak", I only
want the exact name "Steak" to be returned once. Each variation of a
name that contains the word "Steak" should appear exactly once.

How do I do that? I don't want to do it client side, so I want the
limit parameter to mean "return 100 distinct variations of the name
Steak".
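
For reference, the intended semantics can be pinned down with a small client-side sketch in Python. It is purely illustrative, since the point of the question is to get the same result from the server. The function name distinct_names and the sample data are made up for this example.

```python
def distinct_names(names, word, limit):
    """Return up to `limit` distinct names that contain `word` as a
    whole word (case-insensitive), in first-seen order."""
    needle = word.lower()
    seen = set()
    result = []
    for name in names:
        if len(result) >= limit:
            break  # enough distinct variations collected
        if needle not in name.lower().split():
            continue  # the name does not contain the word
        if name in seen:
            continue  # this exact name was already returned
        seen.add(name)
        result.append(name)
    return result


# Hypothetical sample data loosely following the numbers in the post.
names = (["Steak"] * 300 + ["Steak Sandwich"] * 200
         + ["Steak Encebollado", "Pepper Steak"])

print(distinct_names(names, "steak", 100))
```

Here the limit counts distinct names rather than documents, which is the behaviour the size parameter would need to have.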

Is this possible? Is it efficient?

Thanks,

Joel


(Karussell) #2

Why not create a new field that you feed with the name split on
whitespace (=> array of strings), and then facet on that field for
the suggestions? I'm not sure whether you could simply use a
whitespace tokenizer instead.
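
A rough sketch of that idea in Python, which only builds the JSON bodies and does not talk to a server. The field name name_words, the facet name suggestions, and both helper functions are assumptions made for illustration. The request uses the terms facet syntax of the facet API; later Elasticsearch versions replaced facets with aggregations, so treat the exact body as version dependent.

```python
import json


def name_to_words(name):
    """Split a name on whitespace; the resulting list is what would be
    fed into the extra field."""
    return name.split()


def build_facet_request(word, field="name_words", size=100):
    """Build a search body that matches `word` in the name and asks for
    a terms facet on `field` (hypothetical field name)."""
    return {
        "query": {"query_string": {"default_field": "name", "query": word}},
        "size": 0,  # only the facet is of interest, not the hits
        "facets": {
            "suggestions": {"terms": {"field": field, "size": size}}
        },
    }


doc = {"name": "Steak Sandwich", "description": "yummy"}
doc["name_words"] = name_to_words(doc["name"])

print(json.dumps(doc))
print(json.dumps(build_facet_request("steak"), indent=2))
```

A terms facet returns each distinct term once together with a count, so the size inside the facet limits distinct terms rather than documents. If the field is analyzed with the default analyzer, the terms would probably come back lowercased.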

But you probably also want suggestions for parts of words, like 'st',
which should return 'Steak'? Then you should use the (edge) ngram
token filters...
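
To illustrate what an edge ngram filter produces, here is a minimal pure-Python sketch. It does not use Elasticsearch; edge_ngrams and suggest are hypothetical helpers, and the min_gram and max_gram parameters only mirror the idea of the filter's settings of the same name.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` with lengths from min_gram up to
    max_gram (capped at the token length)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def suggest(prefix, names, min_gram=1, max_gram=10):
    """Return the distinct lowercased words from `names` whose edge
    n-grams include `prefix`, in first-seen order."""
    prefix = prefix.lower()
    matches = []
    for name in names:
        for word in name.lower().split():
            grams = edge_ngrams(word, min_gram, max_gram)
            if prefix in grams and word not in matches:
                matches.append(word)
    return matches


print(edge_ngrams("steak", 2, 5))
print(suggest("st", ["Steak", "Steak Sandwich", "Stew"]))
```

Because every prefix of a word is indexed as its own term, a partial input such as 'st' matches 'steak' with an ordinary term lookup.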

Peter.


(system) #3