Count of term occurrences in fields?

Hi,

I'm wondering if I might be able to use es to help solve an entity
classification problem I'm working on. I'm trying to classify words or
phrases by type based on a corpus of data. For example...

We might have a corpus of people

{ "first_name": "John", "last_name": "Jones", "state": "CA" }
{ "first_name": "John", "last_name": "Terry", "state": "CA" }
{ "first_name": "Richard", "last_name": "John", "state": "CA" }

With an incoming search term of "John" I want to understand how often it
occurs in which fields. The information I want back is something like

{ "first_name": 2, "last_name": 1 }

That would allow me to understand that "John" is most likely a first name,
but may be a last name.

Can anyone think of a way to achieve this with es?

thanks

rob

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,

for your "simple" example I would use a Filter Facet
http://www.elasticsearch.org/guide/reference/api/search/facets/filter-facet/

{
  "facets" : {
    "first_name_facet" : {
      "filter" : {
        "term" : { "first_name" : "john" }
      }
    },
    "last_name_facet" : {
      "filter" : {
        "term" : { "last_name" : "john" }
      }
    }
  }
}
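
For more than a couple of fields, a request body of this shape can be generated rather than written by hand. The following is a minimal Python sketch, not part of the original reply: the helper name `build_field_count_request` is hypothetical, and lowercasing the term assumes the fields are indexed with the default standard analyzer (a term filter is not analyzed, so it has to match the indexed token).

```python
import json


def build_field_count_request(term, fields):
    """Build a search body with one filter facet per field (hypothetical helper)."""
    # Assumption: fields use the default standard analyzer, which lowercases
    # tokens, so the un-analyzed term filter needs the lowercased value.
    token = term.lower()
    facets = {
        field + "_facet": {"filter": {"term": {field: token}}}
        for field in fields
    }
    return {"facets": facets}


body = build_field_count_request("John", ["first_name", "last_name"])
print(json.dumps(body, indent=2, sort_keys=True))
```

Each facet filters on its own field, so the count reported for `last_name_facet` reflects matches in `last_name` only.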

I hope this helps.

Cheers
Valentin

On Thursday, June 27, 2013 10:42:09 AM UTC+2, Rob Styles wrote:

That's exactly what I needed. Combined with size 0 to return only the
facet counts, it works exactly as I want :)

Thanks a lot

rob
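
As a rough illustration of the "size 0" variant described above, the sketch below adds `size: 0` to a filter facet request and turns the facet section of a response into the `{ "first_name": 2, "last_name": 1 }` shape asked for in the question. The helper names are hypothetical, and `sample_response` is hand-written for the three example documents on the assumption that each filter facet comes back with a `count` field; it is not captured output.

```python
def with_counts_only(body):
    """Return a copy of the search body that requests no hits (size 0)."""
    request = dict(body)
    request["size"] = 0
    return request


def field_counts(response):
    """Map '<field>_facet' entries in a search response to {field: count}."""
    suffix = "_facet"
    return {
        name[: -len(suffix)]: facet["count"]
        for name, facet in response.get("facets", {}).items()
        if name.endswith(suffix)
    }


request = with_counts_only({
    "facets": {
        "first_name_facet": {"filter": {"term": {"first_name": "john"}}},
        "last_name_facet": {"filter": {"term": {"last_name": "john"}}},
    }
})

# Hand-written sample (assumed response shape), not real server output.
sample_response = {
    "hits": {"total": 3, "hits": []},
    "facets": {
        "first_name_facet": {"_type": "filter", "count": 2},
        "last_name_facet": {"_type": "filter", "count": 1},
    },
}

print(field_counts(sample_response))
```

With `size` set to 0 the hits array stays empty, so only the facet counts need to be read from the response.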

On 27 June 2013 10:23, Valentin pletzer@gmail.com wrote:
