Facet to find possible keys for querying


(Corey Nolet) #1

Hello,

I've got an "entity" document which looks like this:

{
  id: 'id',
  type: 'person',
  tuples: [
    {
      key: 'nameFirst',
      value: 'john',
      type: 'string'
    },
    {
      key: 'age',
      value: '38',
      type: 'int'
    },
    {
      key: 'nameLast',
      value: 'doe',
      type: 'string'
    }
  ]
}

The tuples field has been mapped in Elasticsearch as a nested type where I
provide both analyzed and not_analyzed indices for each of the nested
fields (for exact and fuzzy match). What I'm trying to do is find, for each
entity type, the unique tuple keys along with their associated types.

In other words, I want to write a web service where someone can start
typing "n" and I'll return "[{ key: 'nameFirst', type: 'string' }, { key:
'nameLast', type: 'string' }]", or they could start typing "a" and I'll
return "[{ key: 'age', type: 'int' }]". If they don't type anything, I'd
like to return the union of the two sets (nameLast, nameFirst, and age).

As I'm reading, I'm seeing that this may be done with facets, but I know
they have some limitations. Is this something that would be possible to do
directly? I'm trying to do this all with one fast query if I can.
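
For illustration, the behaviour described above can be sketched in plain Python, independent of Elasticsearch. This is only a reference for the expected results, not a proposed implementation: the function name `suggest_keys`, the optional `entity_type` argument, and the `ENTITIES` sample are assumptions made for the example, and the prefix match is case-sensitive.

```python
from typing import Dict, Iterable, List, Optional


def suggest_keys(entities: Iterable[dict],
                 prefix: str = "",
                 entity_type: Optional[str] = None) -> List[Dict[str, str]]:
    """Return the unique (key, type) pairs whose key starts with `prefix`."""
    seen = set()
    for entity in entities:
        # Optionally restrict the lookup to a single entity type.
        if entity_type is not None and entity.get("type") != entity_type:
            continue
        for t in entity.get("tuples", []):
            if t["key"].startswith(prefix):
                seen.add((t["key"], t["type"]))
    # Sort so the response order is stable between calls.
    return [{"key": k, "type": ty} for k, ty in sorted(seen)]


# Hypothetical sample data mirroring the entity shown above.
ENTITIES = [{
    "id": "id",
    "type": "person",
    "tuples": [
        {"key": "nameFirst", "value": "john", "type": "string"},
        {"key": "age", "value": "38", "type": "int"},
        {"key": "nameLast", "value": "doe", "type": "string"},
    ],
}]
```

Calling `suggest_keys(ENTITIES, "n")` yields the two name keys, `suggest_keys(ENTITIES, "a")` yields only age, and an empty prefix yields all three.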

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a1f6fa72-4753-43d3-9514-2bbf08bbb5a0%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Corey Nolet) #2

I forgot to mention: I need the ability for the user to specify that they
only care about keys for entities where entity.type === 'person' (or any
other type, for that matter).

On Tuesday, March 4, 2014 11:13:27 PM UTC-5, Corey Nolet wrote:



(Alexander Reelsen) #3

Hey,

Before answering your question, I think the way you are handling your data
might be problematic. You are mixing two things in every document: the data
itself (John Doe, 38 years old) and meta information about that data. Maybe
it makes more sense to put the metadata somewhere else (as it hopefully
applies to all documents of that type). Your approach has another problem:
you will not be able to execute numeric range queries on the age of people,
because the value field is configured to be a string; the same goes for
sorting.

With that said, it might be more useful to have a dedicated index for your
field configuration, which you can query for the use case you outlined in
your post, and a dedicated index for the data. Splitting those makes a lot
of sense IMO. On the other hand, I don't know your data well enough, so maybe
I am completely wrong.
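
A minimal sketch of the dedicated field-configuration index idea, assuming one small metadata document per (entity type, key, tuple type) combination. The field names `entityType`, `key`, and `type`, the ID format, and both function names are hypothetical; the query body assumes `entityType` and `key` are mapped as not_analyzed so that `term` and `prefix` match the raw values.

```python
from typing import Dict


def field_config_docs(entity: dict) -> Dict[str, dict]:
    """Derive one metadata document per (entity type, key, tuple type)."""
    docs = {}
    for t in entity.get("tuples", []):
        # Deterministic ID: indexing the same combination again overwrites
        # the existing document instead of creating a duplicate.
        doc_id = "{}|{}|{}".format(entity["type"], t["key"], t["type"])
        docs[doc_id] = {
            "entityType": entity["type"],
            "key": t["key"],
            "type": t["type"],
        }
    return docs


def key_lookup_query(prefix: str, entity_type: str) -> dict:
    """Search body for the metadata index: keys of one entity type by prefix."""
    must = [{"term": {"entityType": entity_type}}]
    if prefix:
        # Only add the prefix clause when the user has typed something.
        must.append({"prefix": {"key": prefix}})
    return {"query": {"bool": {"must": must}}}
```

The application would index the output of `field_config_docs` into the metadata index whenever it writes an entity, then send `key_lookup_query("n", "person")` to that index for the autocomplete request. The hits are already the key/type pairs, so no faceting is needed.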

Back to your original question. If you store a document like the above and
you execute searches on it, the full document always gets returned, not
just parts of it. You may want to read up on parent-child/nested
functionality though (I still do not like that approach).

Faceting can only be done on single fields, so you will not get back the
tuple you actually need (you could join them via a script facet, but that
seems like another workaround). Alternatively, again, read about
parent-child/nested documents (I still dislike this, but I guess you know
that by now).
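
As a sketch of the nested route, and assuming an Elasticsearch version that offers aggregations (introduced in 1.0 as the successor to facets), a terms aggregation on the key with a sub-aggregation on the type can return key/type pairs in one request. The aggregation names `tuples`, `matching`, `keys`, and `types` and both function names are made up for this example, and the field paths assume the not_analyzed variants of `type`, `tuples.key`, and `tuples.type`.

```python
def key_type_aggregation(entity_type: str, prefix: str = "", size: int = 50) -> dict:
    """Build a search body that buckets nested tuple keys, then their types."""
    # An empty prefix matches every tuple, so the response shape stays the same.
    key_filter = {"prefix": {"tuples.key": prefix}} if prefix else {"match_all": {}}
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"term": {"type": entity_type}},
        "aggs": {
            "tuples": {
                "nested": {"path": "tuples"},
                "aggs": {
                    "matching": {
                        "filter": key_filter,
                        "aggs": {
                            "keys": {
                                "terms": {"field": "tuples.key", "size": size},
                                "aggs": {
                                    "types": {"terms": {"field": "tuples.type"}}
                                },
                            }
                        },
                    }
                },
            }
        },
    }


def pairs_from_response(response: dict) -> list:
    """Flatten the key buckets and their type sub-buckets into key/type pairs."""
    buckets = response["aggregations"]["tuples"]["matching"]["keys"]["buckets"]
    return [
        {"key": key_bucket["key"], "type": type_bucket["key"]}
        for key_bucket in buckets
        for type_bucket in key_bucket["types"]["buckets"]
    ]
```

This still scans the tuples of every matching entity on each request, so it may be slower than querying a small dedicated metadata index.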

One last thing: it's nice to have everything in one query, but don't consider
this a must. If two queries solve your problem, that might make more sense.

--Alex

On Wed, Mar 5, 2014 at 5:15 AM, Corey Nolet cjnolet@gmail.com wrote:



(Corey Nolet) #4

Thanks for your reply Alex. I have replies inline.

bq. but you are also putting meta information in the same document -

Correct. My Elasticsearch implementation is part of a larger framework.
Similar to Pig, Hive, Avro and other data model-agnostic frameworks, I pass
along a small piece of metadata with each key/value that gets stored on an
object. This allows models to change without breaking the analytics
processing or view layers.

bq. you will not be able to execute number range queries on the age of
people

The model I've given in my previous post is actually a little dumbed down.
The framework has a value normalization system that knows how to turn
native datatypes into lexicographically sortable strings (fixed-length byte
arrays or strings for longs/ints, etc.). What I showed in my previous
post is simply a hand-typed version of the actual data model.
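
The framework's actual normalizer is not shown in this thread. As an illustration of the general idea only, one common scheme shifts a signed integer into the unsigned range and writes it as fixed-width, zero-padded hex, so that string order matches numeric order. The function names and the 64-bit default are assumptions for the example.

```python
def encode_sortable_int(value: int, bits: int = 64) -> str:
    """Encode a signed integer as a fixed-width hex string that sorts numerically."""
    offset = 1 << (bits - 1)
    if not -offset <= value < offset:
        raise ValueError("value does not fit in {} bits".format(bits))
    # Shifting by the offset maps the signed range onto 0 .. 2**bits - 1,
    # and zero padding keeps every encoded value the same length.
    return format(value + offset, "0{}x".format(bits // 4))


def decode_sortable_int(text: str, bits: int = 64) -> int:
    """Invert encode_sortable_int."""
    return int(text, 16) - (1 << (bits - 1))
```

With values stored this way, a string range query from `encode_sortable_int(30)` to `encode_sortable_int(40)` matches the same documents a numeric range of 30 to 40 would, and sorting on the encoded field gives numeric order.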

bq. it might be more useful to have a dedicated index for your field
configuration, which you can query for the usecase you outlined in your
post. And you have a dedicated index for the data

This solution sounds wonderful! Is there a way I can do this automatically
in Elasticsearch? I know one of the things I did in my mappings was to
bifurcate the indexes for each of the tuples so that on one index I can do
exact matches and on the other I can do fuzzy matches (I believe I just
used one with analyzed and one with not_analyzed). Is this where I'd tell
it to index all unique tuple key names for me? I agree with you on the
facets: I'd rather not have to perform an aggregated query on ALL the
entity types if it's not necessary.

Thanks much!

On Thursday, March 6, 2014 3:59:43 AM UTC-5, Alexander Reelsen wrote:


