Field count facet?


(caphrim007) #1

Hi folks,

I was reading the facet documentation and was wondering if there was a
way that I could get a field count of the documents in my index.

I see that in the examples one specifies a field to count, and then
gets results from that, but I was interested in seeing how many
documents have a given field, rather than how many have a particular
field value.

So assuming the following data

doc1.field1 = "abc"
doc1.field2 = "123"
doc1.field3 = "fgh"
doc2.field1 = "abc"
doc2.field2 = "123"
doc3.field4 = "123"

it would return

field1: 2
field2: 2
field3: 1
field4: 1

Instead of

term: abc, count: 2
term: 123, count: 3
term: fgh, count: 1
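
To make the difference concrete, here is a small Python sketch that computes both kinds of count over the sample data above. It runs on plain dictionaries and does not talk to Elasticsearch; the variable names are only illustrative.

```python
from collections import Counter

# The three sample documents from the question, as plain dictionaries.
docs = [
    {"field1": "abc", "field2": "123", "field3": "fgh"},
    {"field1": "abc", "field2": "123"},
    {"field4": "123"},
]

# Wanted: for each field name, how many documents contain it.
field_counts = Counter(name for doc in docs for name in doc)

# Not wanted: for each value, how often it occurs across the documents.
value_counts = Counter(value for doc in docs for value in doc.values())

print(dict(field_counts))
print(dict(value_counts))
```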

I basically want to know what the fields in my documents are because I
have an arbitrary list of fields that can exist.

Is this possible?

Thanks,
Tim


(Pavel Penchev) #2

Hi,

If you run a facet on a field, the sum of the counts over its facet
values gives you the number of documents that have a value for this
field (as long as each document holds a single value for it and the
facet returns all of its terms). Alternatively, you could do a range
query -
http://www.elasticsearch.org/guide/reference/query-dsl/range-query.html.
If you don't specify the from/to boundaries, you get all
documents that have a value for the given field.
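
As a rough illustration of the summing idea, the sketch below adds up the per-term counts of one facet result. The shape of the result (a "terms" list of entries with "term" and "count") is my assumption about what a terms facet returns, and docs_with_value is a hypothetical helper name, not part of any client library.

```python
def docs_with_value(facet_result):
    """Sum the per-term counts of a single terms facet result."""
    return sum(entry["count"] for entry in facet_result.get("terms", []))


# Hypothetical facet result for field1 over the sample data in the question.
sample_facet = {"terms": [{"term": "abc", "count": 2}]}

field1_docs = docs_with_value(sample_facet)
print(field1_docs)
```

The sum equals the number of documents only when every document holds at most one value for the field; a document with several distinct values would be counted once per value.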

Regards,
Pavel


(caphrim007) #3

Thanks for the info. A couple more questions/clarifications.

If I were to make a facet that was the sum of the facet values, I
would need to know one of those values to begin with, wouldn't I? I'm
only interested in having a count of the number of fields for a query,
not a count of the number of different values for a specified field.

A range query also looks like it requires a field name; again, I'm
looking for an aggregate count of fields, not field values.

Any ideas?

Thanks,
Tim


(Pavel Penchev) #4

OK, here's my understanding: you have documents with a completely dynamic
schema, fields can be added at any time, and you know neither the
name nor the type of a field in advance. You then want to make a query
that returns which fields appear in the result and the number of
documents that have a value for each field.

I'm not aware of a built-in mechanism in ES to do that. A clumsy and
possibly slow way to do it:

  1. Use the mappings API to obtain all existing fields and their types
    ('curl http://localhost:9200/myindex/_mapping?pretty=true'; see
    http://www.elasticsearch.org/guide/reference/mapping/).
  2. Add a facet request to your query for each of the fields from 1).
  3. In the query response, take each facet and calculate how many
    documents have some value for that field (loop through all the values
    and sum their counts).
  4. Any field from 1) that has no facet result in 3) is a
    field not present in the current result.
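
The steps above can be sketched in Python roughly as follows. This only covers the data handling; the HTTP calls to _mapping and _search are left out. The response shapes (index, then type, then "properties" for the mapping; "facets", then facet name, then "terms" for the search) are my assumptions about what ES returns, and all function names are hypothetical.

```python
def field_names(mapping):
    """Step 1: collect top-level field names from a _mapping response."""
    names = set()
    for types in mapping.values():            # one entry per index
        for type_mapping in types.values():   # one entry per document type
            names.update(type_mapping.get("properties", {}))
    return sorted(names)


def with_field_facets(query, names):
    """Step 2: build a search body with one terms facet per field."""
    return {
        "query": query,
        "facets": {name: {"terms": {"field": name}} for name in names},
    }


def field_counts(response, names):
    """Steps 3 and 4: sum term counts per field; absent fields end up at 0."""
    facets = response.get("facets", {})
    return {
        name: sum(t["count"] for t in facets.get(name, {}).get("terms", []))
        for name in names
    }


# Hypothetical stand-ins for the real _mapping and _search responses.
sample_mapping = {"myindex": {"doc": {"properties": {
    "field1": {"type": "string"}, "field2": {"type": "string"},
    "field3": {"type": "string"}, "field4": {"type": "string"},
}}}}
sample_response = {"facets": {
    "field1": {"terms": [{"term": "abc", "count": 2}]},
    "field2": {"terms": [{"term": "123", "count": 2}]},
    "field3": {"terms": [{"term": "fgh", "count": 1}]},
    "field4": {"terms": []},  # no matching document has field4
}}

names = field_names(sample_mapping)
body = with_field_facets({"match_all": {}}, names)
counts = field_counts(sample_response, names)
print(counts)
```

Two caveats with this sketch: nested object fields are ignored, and as far as I know a terms facet returns only the top terms unless you raise its size, so the sums can undercount fields with many distinct values.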

Hope this helps,
Pavel


(caphrim007) #5

I was completely unaware of the _mapping endpoint.

I can definitely make this work now.

Thanks for the suggested steps Pavel!

-Tim

