Need count of terms using facets, taking space into account


(anjesh) #1

Hi,

I have the following data in ES:
{
"title": "title1 of project",
"organization": "XYZ company"
},
{
"title": "title2 of project",
"organization": "XYZ company"
},
{
"title": "title3 of project",
"organization": "ABC company"
},

I need the count of organizations as follows:
"ABC company":1
"XYZ company": 2

I tried using facets, but they give the count of individual words:

curl -X POST 'http://localhost:9200/testcompany/activity/_search?pretty=true' -d '
{ "query" : {"match_all": {} },
"facets" : {"organization" : {"terms" : {"field": "organization"}}}}'

gives

"facets" : {
"organization" : {
"_type" : "terms",
"missing" : 0,
"total" : 6,
"other" : 0,
"terms" : [ {
"term" : "company",
"count" : 3
}, {
"term" : "xyz",
"count" : 2
}, {
"term" : "abc",
"count" : 1
} ]
}

I don't know whether there is an option that makes the terms facet count
whole phrases rather than individual words.
I tried searching here and there but couldn't find anything.

Thanks
Anjesh.


(jagdeep) #2

It basically depends on the mapping. You have used the default standard
analyzer; instead of that, you need to use the keyword analyzer.
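
To illustrate the difference, here is a minimal Python sketch that only imitates the two analyzers. It is not Elasticsearch code: `standard_like`, `keyword_like`, and `facet_counts` are made-up helper names, and the real standard analyzer does more than split and lowercase.

```python
import re
from collections import Counter

# The three "organization" values from the original post.
docs = ["XYZ company", "XYZ company", "ABC company"]

def standard_like(text):
    # Rough stand-in for the standard analyzer: split into words and lowercase.
    return [token.lower() for token in re.findall(r"\w+", text)]

def keyword_like(text):
    # Stand-in for the keyword analyzer: the whole value is one token.
    return [text]

def facet_counts(values, analyze):
    # Count each indexed term once per document, like the counts in the thread.
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return dict(counts)

print(facet_counts(docs, standard_like))  # word-level counts
print(facet_counts(docs, keyword_like))   # whole-value counts
```

With the word-splitting stand-in the counts are per word ("company", "xyz", "abc"); with the whole-value stand-in they are per organization name.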

Regards
Jagdeep



(Sumit Guptaa) #3

Hi Jagdeep,

I am also facing the same problem. Can you give me one mapping example for this implementation? It would be very helpful to me.

Thanks,
Sumit Gupta


(Marcin Dojwa) #4

Hi,

I think that setting 'index' for the 'organization' field to 'not_analyzed' should work
the way you want.

Best regards.



(Sumit Guptaa) #5

Hi Marcin,

For getting the phrase count, "not_analyzed" is not working. If you have any idea how to count phrases using a facet query, please help me.

Thanks,
Sumit Gupta


(sujoysett) #6

First register custom analyzers with your own configuration, in a format
like the following, using the index creation API:

{
  "index": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "standard1": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase"
          ]
        },
        "keyword1": {
          "type": "custom",
          "tokenizer": "keyword"
        },
        "keyword2": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  }
}

Next, map these analyzers to the individual fields of the data you are
going to post to your index. Use something like the following with the update
mapping API:

{
  "mediasource": {
    "properties": {
      "mediaSourceTypeId": {
        "index": "analyzed",
        "type": "integer"
      },
      "isuName": {
        "analyzer": "keyword1",
        "type": "string"
      },
      "newsCategories": {
        "properties": {
          "category": {
            "analyzer": "keyword1",
            "type": "string"
          },
          "category_words": {
            "analyzer": "keyword1",
            "type": "string"
          },
          "score": {
            "index": "analyzed",
            "type": "double"
          }
        }
      }
    }
  }
}

For your data example, you have to analyze the "organization" field with
the keyword analyzer, just like I did for the "isuName" field in my example.
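
As a rough sketch of what that mapping body could look like for the "organization" field, the Python snippet below only builds and prints the JSON. The helper name `keyword_mapping` is made up for illustration, the type name "activity" is taken from the original question, and the snippet does not talk to Elasticsearch; the printed body would still have to be sent with the update mapping API.

```python
import json

def keyword_mapping(type_name, field_name):
    # Build a mapping body that analyzes one string field with the
    # built-in keyword analyzer, so the whole value stays a single term.
    return {
        type_name: {
            "properties": {
                field_name: {"type": "string", "analyzer": "keyword"}
            }
        }
    }

# Type and field names come from the original question in this thread.
body = keyword_mapping("activity", "organization")
print(json.dumps(body, indent=2))
```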



(Sumit Guptaa) #7

Hi sujoysett,
thanks for your quick response.

After applying the mapping you defined, searching for a phrase gives no hits, and searching for a single term gives the result described below. I indexed this document:

curl -XPUT 'http://localhost:9200/my_twitter1/my_tweet/1' -d '{
"user" : "hi hello how",
"post_date" : "2011-09-20T16:20:00",
"message" : "abc xyz def abc xyz def"
}'

and then ran this facet query:

curl -X POST 'localhost:9200/my_twitter1/my_tweet/_search?pretty=true' -d '{
"query": {
"term": {
"message": "abc"
}
},
"facets": {
"message": {
"terms": {
"field": "message"
}
}
}
}'

I get counts for the individual words, like "abc":1,"xyz":1,"def":1, and when I search for "abc xyz" there is no hit.

So please help me: how can I search for "abc xyz" and also get the count of "abc xyz" using a facet query?

Thanks,
Sumit Gupta


(jagdeep) #8

Sumit, change the analyzer of this field to keyword, as explained by Sujoy. By
default it is using the standard analyzer.
"message": {
"analyzer": "keyword",
"type": "string"
},

Regards
Jagdeep



(Sumit Guptaa) #9

Hi Jagdeep,

Can you please give me one example of searching for "abc xyz" using a facet query that also gives the count for "abc xyz"?

I am unable to search for "abc xyz" using a facet query.

Thanks,
Sumit Gupta


(Ivan Brusic) #10

Sumit,

I do not think you will be able to achieve what you want without
implementing a custom tokenizer. Analyzed tokenizers will tokenize on
whitespace, and keyword analyzers take the whole term without
stemming/splitting. You need a tokenizer that tokenizes a string into
different permutations of the terms. Something like this tokenizer
must already exist, but I do not think it is part of the default
Lucene/ElasticSearch packages.
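
To make the idea concrete, here is a small Python sketch of one possible reading of "permutations of the terms": overlapping word pairs, sometimes called shingles. It is only a simulation with made-up helper names (`shingles`, `phrase_facet`), not a Lucene or ElasticSearch tokenizer, and it uses the message from Sumit's example.

```python
from collections import Counter

def shingles(text, size=2):
    # Split on whitespace, lowercase, and join each run of `size` adjacent words.
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

def phrase_facet(messages, size=2):
    # Count each phrase once per message, mirroring the per-document
    # counts shown earlier in the thread.
    counts = Counter()
    for message in messages:
        counts.update(set(shingles(message, size)))
    return dict(counts)

messages = ["abc xyz def abc xyz def"]
print(shingles(messages[0]))
print(phrase_facet(messages))
```

With tokens like these in the index, "abc xyz" would exist as a term of its own, which is what both the search and the count need.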

Cheers,

Ivan



(Sumit Guptaa) #11

Hi Ivan,

Can you give me the full implementation for this, so that I can run the facet query on a phrase? Please help me.

Thanks,
Sumit Gupta


(anjesh) #12

Hi,

I managed to get what I was looking for using the following. Thanks,
Jagdeep. I have posted the commands in their entirety; they should work.

curl -XDELETE 'http://localhost:9200/testcompany/'
curl -XPUT 'http://localhost:9200/testcompany/'
curl -XPUT 'http://localhost:9200/testcompany/activity1/_mapping' -d '{
"activity1" : {
"properties" : {
"organization" : {"analyzer": "keyword", "type": "string"}
}
}
}'
curl -XPUT 'http://localhost:9200/testcompany/activity1/1' -d '{
"title": "title1 of project",
"organization": "ABC company"
}'
curl -XPUT 'http://localhost:9200/testcompany/activity1/2' -d '{
"title": "title2 of project",
"organization": "XYZ company"
}'
curl -XPUT 'http://localhost:9200/testcompany/activity1/3' -d '{
"title": "title3 of project",
"organization": "XYZ company"
}'
curl -X POST 'http://localhost:9200/testcompany/activity1/_search?pretty=true' -d '{
"query" : {"match_all":{}},
"facets" : {"organization" : {"terms" : {"field": "organization"}}}
}'

gives

"facets" : {
"organization" : {
"_type" : "terms",
"missing" : 0,
"total" : 3,
"other" : 0,
"terms" : [ {
"term" : "XYZ company",
"count" : 2
}, {
"term" : "ABC company",
"count" : 1
} ]
}

However, now I can't search for ABC in the organization field, as Sumit seems to
be asking.

curl -X POST 'http://localhost:9200/testcompany/activity1/_search?pretty=true' -d '{
"query" : {"term":{"organization": "ABC"}},
"facets" : {"organization" : {"terms" : {"field": "organization"}}}
}'

gives 0 hits.

But

curl -X POST 'http://localhost:9200/testcompany/activity1/_search?pretty=true' -d '{
"query" : {"term":{"organization": "ABC company"}},
"facets" : {"organization" : {"terms" : {"field": "organization"}}}
}'

gives

"facets" : {
"organization" : {
"_type" : "terms",
"missing" : 0,
"total" : 1,
"other" : 0,
"terms" : [ {
"term" : "ABC company",
"count" : 1
} ]
}

I think something is still missing there and I can't seem to figure it out.
The search is case sensitive in this case: "abc company" doesn't give
results. I don't fully understand the internals, notably tokens and
analyzers. I would appreciate it if somebody could point me to the appropriate
posts.
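
A small Python simulation may help explain these results. A term query is not analyzed, so it only matches when the query text equals an indexed token exactly; with the keyword analyzer, the only token per document is the full organization value. The helper names below (`index_keyword`, `term_query`) are made up for illustration and this is not Elasticsearch code.

```python
def index_keyword(values):
    # With a keyword-style analyzer each document contributes exactly
    # one token: the unmodified field value.
    return {doc_id: {value} for doc_id, value in enumerate(values, start=1)}

def term_query(index, term):
    # A term query compares the query text with the indexed tokens as-is:
    # no splitting, no lowercasing.
    return sorted(doc_id for doc_id, tokens in index.items() if term in tokens)

index = index_keyword(["ABC company", "XYZ company", "XYZ company"])

print(term_query(index, "ABC"))          # partial value
print(term_query(index, "ABC company"))  # exact value
print(term_query(index, "abc company"))  # different case
```

Only the exact, case-matching value finds document 1, which matches the 0 hits for "ABC" and for "abc company" above.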

Best
Anjesh



(jagdeep) #13

You either have to use the pattern analyzer with the case_insensitive flag, as
explained here:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern-analyzer.html

Or you need to use a regex with the case_insensitive flag in your query
string.
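
The effect of a case-insensitive whole-value analysis can be sketched in Python as below. This is only a simulation under the assumption that both the indexed value and the query text are lowercased the same way; `lowercase_keyword`, `facet_counts`, and `search` are made-up names, not part of any Elasticsearch API.

```python
from collections import Counter

def lowercase_keyword(text):
    # Keep the whole value as one token, but normalize its case.
    return text.strip().lower()

def facet_counts(values):
    # One token per document, so counting tokens counts documents.
    return dict(Counter(lowercase_keyword(value) for value in values))

def search(values, query):
    # Normalize the query the same way as the indexed values before comparing.
    wanted = lowercase_keyword(query)
    return [i for i, value in enumerate(values, start=1)
            if lowercase_keyword(value) == wanted]

organizations = ["ABC company", "XYZ company", "XYZ company"]

print(facet_counts(organizations))
print(search(organizations, "abc company"))
```

The facet terms then come back lowercased ("abc company", "xyz company"), which is the trade-off for making "abc company" match.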

Regards
Jagdeep


