Term facets - getting results after the field is split on spaces


(vineeth mohan) #1

Hi,
If I run a terms facet on a particular field, the results are the
individual terms produced by splitting the field on spaces.
For example:

If the field values are "Mr King", "King Kong", "Mr CEO", "CEO OF", "Mr King",
then after running a terms facet on this field

{
  "facets": {
    "Categories": {
      "terms": {
        "field": "Name",
        "size": 10
      }
    }
  }
}

I am getting the result as

"Mr" - 3
"King" - 3
"Kong" - 1
"CEO" - 2
"OF" - 1

What I expected was

"Mr King" - 2
"King Kong" - 1
"Mr CEO" - 1

And so on...

Is this the right behavior?
If it is, what is the alternative for getting the desired output?

Thanks
Vineeth


(Clinton Gormley) #2

On Wed, 2011-11-30 at 19:19 +0530, Vineeth Mohan wrote:

> Hi,
> If I run a terms facet on a particular field, the results are the
> individual terms produced by splitting the field on spaces.

This is correct - your field is being analyzed, so what is indexed is the
result of that analysis (i.e. 'mr', 'king', 'kong', etc.), and the facet
counts those indexed terms.
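
As a rough illustration of the point above (this is not Elasticsearch code), the following Python sketch counts terms both ways for the five example values. Splitting on whitespace is only a stand-in for the analyzer, which in practice also lowercases (hence 'mr', 'king'); the function name facet_counts is made up for this example.

```python
from collections import Counter

# The five example values from the question.
names = ["Mr King", "King Kong", "Mr CEO", "CEO OF", "Mr King"]


def facet_counts(values, analyzed):
    """Count terms roughly the way a terms facet would see them.

    analyzed=True  -> each value is split on whitespace
                      (a simplified stand-in for the analyzer)
    analyzed=False -> each value is kept whole, as with not_analyzed
    """
    counts = Counter()
    for value in values:
        terms = value.split() if analyzed else [value]
        counts.update(terms)
    return counts


analyzed_counts = facet_counts(names, analyzed=True)
exact_counts = facet_counts(names, analyzed=False)

print(analyzed_counts.most_common())
print(exact_counts.most_common())
```

The first count reproduces the per-word numbers reported in the question; the second gives the per-phrase numbers that were expected.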

If you want the original phrase to be preserved, then you should map
that field to have {"index": "not_analyzed"}.

However, that has implications for searching too, because you then
wouldn't be able to search for "king" on its own.

If you want to be able to do both, then you should use a multi-field,
with one sub-field analyzed (for searching) and one sub-field not
analyzed (for facets).

clint
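
A minimal sketch of the two mappings described above, written as Python dicts so they can be serialized to JSON request bodies. The type name person and the sub-field name untouched are assumptions for illustration only; multi_field is the mapping type from Elasticsearch releases of this period, so check the mapping reference for the version you run.

```python
import json

# Option 1: keep the whole value as a single term. Good for facets,
# but a search for "king" alone would no longer match.
not_analyzed_mapping = {
    "person": {  # hypothetical type name
        "properties": {
            "Name": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# Option 2: a multi-field with an analyzed sub-field for searching
# and a not_analyzed sub-field for facets.
multi_field_mapping = {
    "person": {  # hypothetical type name
        "properties": {
            "Name": {
                "type": "multi_field",
                "fields": {
                    # Default sub-field, analyzed, addressed as "Name".
                    "Name": {"type": "string", "index": "analyzed"},
                    # Hypothetical sub-field name, addressed as "Name.untouched".
                    "untouched": {"type": "string", "index": "not_analyzed"},
                },
            }
        }
    }
}

# The facet from the question, pointed at the not_analyzed sub-field.
facet_request = {
    "facets": {
        "Categories": {
            "terms": {"field": "Name.untouched", "size": 10}
        }
    }
}

print(json.dumps(multi_field_mapping, indent=2))
print(json.dumps(facet_request, indent=2))
```

With the second mapping, searches keep using Name while the facet uses Name.untouched, so neither use case has to give way to the other.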


(vineeth mohan) #3

I have a particular field whose key name is not known beforehand.

So the schema looks like

Entities : { name : vm , vm : [ {name : abc } , {name : bcd}]}

Here the key vm is not known beforehand, and I want to use the field
Entities.X.name (where X is the unknown key) as a faceting field.
How do I set index: not_analyzed for name alone?

Thanks
Vineeth

On Wed, Nov 30, 2011 at 7:28 PM, Clinton Gormley <clint@traveljury.com> wrote:

> If you want the original phrase to be preserved, then you should map
> that field to have {"index": "not_analyzed"}


(vineeth mohan) #4

So the question is really: how do I change the default for index to
not_analyzed for fields that are not specified in the schema?

Thanks
Vineeth

On Wed, Nov 30, 2011 at 7:37 PM, Vineeth Mohan <vineethmohan@algotree.com> wrote:

> Here the key vm is not known beforehand, and I want to use the field
> Entities.X.name as a faceting field.
> How do I set index: not_analyzed for name alone?


(Clinton Gormley) #5

On Wed, 2011-11-30 at 19:44 +0530, Vineeth Mohan wrote:

> So the question is really: how do I change the default for index to
> not_analyzed for fields that are not specified in the schema?

Have a look at Dynamic Mapping:
http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping.html

clint
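
For a field whose parent key is not known in advance (Entities.X.name), a dynamic template keyed on the field path is one way to apply not_analyzed. The sketch below is an assumption about how such a template might look, not the mapping actually used in this thread; the type name doc and the template name are hypothetical. Python's fnmatch is used only to illustrate which paths the wildcard is meant to cover; it is not Elasticsearch's own matcher.

```python
import json
from fnmatch import fnmatchcase

# Path pattern intended to cover Entities.<any key>.name
PATH_PATTERN = "Entities.*.name"

entities_mapping = {
    "doc": {  # hypothetical type name
        "dynamic_templates": [
            {
                "entity_names_not_analyzed": {  # hypothetical template name
                    "path_match": PATH_PATTERN,
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }
        ]
    }
}


def would_match(path, pattern=PATH_PATTERN):
    """Illustrative glob check only; a stand-in for the real matcher."""
    return fnmatchcase(path, pattern)


print(json.dumps(entities_mapping, indent=2))
print(would_match("Entities.vm.name"))  # nested name under an unknown key
print(would_match("Entities.name"))     # the top-level name field
```

The pattern is deliberately narrow: it targets name fields one level below Entities and leaves Entities.name itself with the default, analyzed mapping.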


(vineeth mohan) #6

OK, I have created the following file

XYZ@XYZ:~/elasticSearch$ cat config/default-mapping.json
{
  "default" : {
    "index" : "not_analyzed"
  }
}

But it's not helping... :(

Is there something I have missed?

Thanks
Vineeth
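
One possible reason the file above has no effect, going by the dynamic mapping page linked earlier in the thread: the default mapping is keyed as _default_ rather than default, and index is a per-field setting, so it has to be attached to fields, for example through a dynamic template. A hedged sketch of what config/default-mapping.json might contain instead (the template name is hypothetical, and this has not been tested against any particular release):

```python
import json

# Assumed shape of a default mapping that makes every dynamically
# added string field not_analyzed.
default_mapping = {
    "_default_": {
        "dynamic_templates": [
            {
                "strings_not_analyzed": {  # hypothetical template name
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }
        ]
    }
}

# Text that would go into config/default-mapping.json.
default_mapping_json = json.dumps(default_mapping, indent=2)
print(default_mapping_json)
```

Note that this applies to all new string fields, which also makes them unsearchable by individual words; a narrower match or path_match pattern limits the effect to the fields used for faceting.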

On Wed, Nov 30, 2011 at 8:19 PM, Clinton Gormley <clint@traveljury.com> wrote:

> Have a look at Dynamic Mapping:
> http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping.html

(vineeth mohan) #7

Finally, I used a dynamic template mapping and got it working.

Once again Clint saved my day :)

Thanks
Vineeth

On Wed, Nov 30, 2011 at 8:52 PM, Vineeth Mohan <vineethmohan@algotree.com> wrote:

> But it's not helping...
> Is there something I have missed?

(Clinton Gormley) #8

On Thu, 2011-12-01 at 11:58 +0530, Vineeth Mohan wrote:

> Finally, I used a dynamic template mapping and got it working.
>
> Once again Clint saved my day :)

Perhaps I should do the ElasticSearch thing and take a new super-hero
name every morning :)

Glad it helped.

clint

