How to count all occurrences in ES via a search API call?

Hey,

I have data stored in ES (1.4.2) as follows.
1st document:
{"col1": "123", "col2": "tag1,tag2,tag4"}
2nd document:
{"col1": "333", "col2": "tag1,tag4,tag5"}
3rd document:
{"col1": "111", "col2": "tag1,tag1,tag5,tag5"}

Now, when I search for tag1 through the search API, it returns a count of 3,
whereas I am looking for 4, since the 3rd document contains tag1 twice.
I am generating the stats via Kibana (3.1.2), so that is the client making
the calls to ES.
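
To make the gap between the two numbers concrete, here is a minimal Python sketch that runs on the three sample documents locally, without talking to ES at all. The helper names matching_documents and total_occurrences are made up for illustration. The first mirrors what a hit count gives (one per matching document), the second is the per-occurrence count being asked for.

```python
# Sample documents copied from the post above.
docs = [
    {"col1": "123", "col2": "tag1,tag2,tag4"},
    {"col1": "333", "col2": "tag1,tag4,tag5"},
    {"col1": "111", "col2": "tag1,tag1,tag5,tag5"},
]


def matching_documents(docs, tag):
    """Count documents that contain the tag at least once (hit-count style)."""
    return sum(1 for d in docs if tag in d["col2"].split(","))


def total_occurrences(docs, tag):
    """Count every occurrence of the tag, including repeats within one document."""
    return sum(d["col2"].split(",").count(tag) for d in docs)


if __name__ == "__main__":
    print(matching_documents(docs, "tag1"))  # 3
    print(total_occurrences(docs, "tag1"))   # 4
```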

Is there an analyzer or tokenizer that I should be using when creating the
index?
I am not using any special tokenizer when creating the index, and both col1
and col2 are indexed.

Is there a workaround for this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/74315594-b2e8-4b3d-972d-3891af6bc06b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Perhaps use nested documents for your tag list?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html

I don't think you can aggregate substrings within a single field, since a
normal aggregation is based on matches against the entire document. With
nested (or parent/child) documents, each tag becomes its own document that
you can aggregate against.
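
A rough sketch of what that could look like, written as plain Python dicts so it runs without a cluster. The type name "doc", the nested field "tags", its sub-field "name", and the aggregation names are all assumptions made for illustration and are not from the original index. The mapping uses the ES 1.x string / not_analyzed syntax. The last helper only simulates locally what the nested terms buckets should count, namely one per nested tag object.

```python
import json
from collections import Counter

# Sample documents copied from the original post.
docs = [
    {"col1": "123", "col2": "tag1,tag2,tag4"},
    {"col1": "333", "col2": "tag1,tag4,tag5"},
    {"col1": "111", "col2": "tag1,tag1,tag5,tag5"},
]

# Hypothetical ES 1.x mapping: each tag is stored as its own nested object.
mapping = {
    "mappings": {
        "doc": {
            "properties": {
                "col1": {"type": "string"},
                "tags": {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "string", "index": "not_analyzed"}
                    },
                },
            }
        }
    }
}

# Hypothetical search body: a nested aggregation wrapping a terms aggregation.
aggregation = {
    "size": 0,
    "aggs": {
        "tags": {
            "nested": {"path": "tags"},
            "aggs": {"by_tag": {"terms": {"field": "tags.name"}}},
        }
    },
}


def to_nested(doc):
    """Turn the comma-separated col2 string into a list of nested tag objects."""
    names = [t for t in doc["col2"].split(",") if t]
    return {"col1": doc["col1"], "tags": [{"name": n} for n in names]}


def simulate_nested_counts(nested_docs):
    """Local stand-in for the terms buckets: one count per nested tag object."""
    return Counter(tag["name"] for d in nested_docs for tag in d["tags"])


nested_docs = [to_nested(d) for d in docs]

if __name__ == "__main__":
    print(json.dumps(aggregation, indent=2))
    print(simulate_nested_counts(nested_docs)["tag1"])  # 4
```

The trade-off is that the data has to be reindexed in the new shape, and whatever client issues the queries has to be able to send a nested aggregation.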

(In reply to Bhumir Jhaveri, Wednesday, January 14, 2015 at 11:35:07 AM UTC-8.)


Is there any other alternative, given that this particular field would be
accessed from Kibana? I don't think Kibana has support for such an
aggregation, or maybe I have just not explored it enough.

(In reply to Ed Kim, Wednesday, January 14, 2015 at 12:00:17 PM UTC-8.)
