Hi,
I am at a point where some of my search queries are close to 1 MB, for
example queries where the "terms" filter has around 50k ids. I assume this
adds a lot of network overhead when transferring such big queries to ES. I
am wondering if there is a way I can send a "macro" in a query which is
later translated by ES.
Let me describe how it would work. Query sent to ES:
filter: { terms : { ids: [macro_label_list_of_ids] }}
where "macro_label_list_of_ids" is just a label which uniquely identifies a
list of integer ids. Upon receiving this query, ES does a lookup for
"macro_label_list_of_ids" on another index (created by us) to extract the
id values, and then substitutes them wherever "macro_label_list_of_ids"
appears in the query.
So, can you tell me three things:
1. Is this possible in the current version of ES?
2. If not, are there any plans to implement it?
3. Is there any other workaround to shorten my queries? (I can't change
the way I am storing data because of business requirements.)
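To make the idea concrete, here is a rough client-side sketch of the substitution I have in mind. The dict stands in for the lookup index, and all names are made up; ES itself has no such mechanism:

```python
# Stand-in for the lookup index: maps a macro label to its id list.
MACRO_TABLE = {"macro_label_list_of_ids": [101, 102, 103]}

def expand_macros(query, table):
    """Recursively walk a query body and splice id lists in place of
    any macro label found inside a list."""
    if isinstance(query, dict):
        return {k: expand_macros(v, table) for k, v in query.items()}
    if isinstance(query, list):
        out = []
        for item in query:
            if isinstance(item, str) and item in table:
                out.extend(table[item])  # replace the label with its ids
            else:
                out.append(expand_macros(item, table))
        return out
    return query  # scalars pass through unchanged

query = {"filter": {"terms": {"ids": ["macro_label_list_of_ids"]}}}
expanded = expand_macros(query, MACRO_TABLE)
# expanded now carries the full id list in place of the label
```

The point of the feature request is that this expansion would happen on the ES side, so the label, not the 50k ids, crosses the network.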
On Friday, November 2, 2012 10:01:20 AM UTC-7, Igor Motov wrote:
If you have one set of ids per query, and it is applied as a filter to the
entire query, you can create a filtered alias for each id list. Besides
saving on network traffic, you will also save on parsing this huge list of
ids for every request.
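For illustration, a filtered alias might be set up like this (index and alias names are invented; this builds the request body for the `_aliases` endpoint of that era). The id list is sent once, when the alias is created or updated; searches then target the alias with no id list in the query at all:

```python
import json

# Request body for POST /_aliases: one alias per id list, with the
# terms filter baked in. Names "docs" and "idset_abc" are hypothetical.
alias_request = {
    "actions": [
        {
            "add": {
                "index": "docs",       # hypothetical source index
                "alias": "idset_abc",  # one alias per id list
                "filter": {"terms": {"ids": [101, 102, 103]}},
            }
        }
    ]
}
body = json.dumps(alias_request)
# A search against /idset_abc/_search is now implicitly filtered
# to those ids, so the query itself stays tiny.
```

When the batch of ids changes, the same endpoint accepts a `remove` plus `add` pair to swap the filter atomically.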
Hi,
I'm not sure I understand it completely, but 1 MB queries just because of
ID lists scare me.
Your challenge of ID substitution by value extraction looks a lot like
denormalization of data structures (an ID is replaced by the resource the
ID stands for).
Since automatic denormalization needs some mechanics that are currently
missing in ES, you will have to work around some issues. Filtered aliases
or index views are one alternative; another is maintaining ID resolution on
the client side (so the client sends other related values to ES instead of
IDs).
Yes, I am working on a denormalization implementation, a mapper plugin
called elasticsearch-mapper-iri. My plan is to expand documents
automatically if an IRI (Internationalized Resource Identifier) is detected
in a source document, by looking up the resource the IRI points to. Even an
ES GetRequest in the background will be possible, besides fetching non-ES
JSON over HTTP.
Yes, think of a better data model at index time, not at search time.
Your business requirements seem valid in a relational data model, where IDs
play a different role than in an inverted index. An ES data model is
document-centric, so you would have to rethink your data organization
completely. The challenge is constructing documents that do not rely on
"pointers" between them but on hierarchical JSON structures.
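To contrast the two shapes, a minimal sketch (all field names invented):

```python
# Relational style: the document points at another resource by id,
# so someone must resolve the id at query time (the "macro" problem).
normalized_doc = {"order_id": 7, "customer_id": 42}

# Document-centric style: the referenced resource is embedded at index
# time, so queries, filters, and facets can run on the document alone.
denormalized_doc = {
    "order_id": 7,
    "customer": {"id": 42, "name": "ACME Corp", "region": "EU"},
}
```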
Best regards,
Jörg
Thanks for the comments, Jörg. The reason for such big queries is that I am
requesting multiple facets, each with its own facet filter. The whole query
explodes because I am trying to query this for around 50k ids. I am looking
at filtered aliases, and hopefully they will solve my problem; the catch is
that I will have to continuously update the aliases as and when the batch
of ids changes. Your plugin sounds like exactly what I need. Any pointers
to it, and when are you planning to release it?
Thanks!
Vinay
I hack on Elasticsearch plugins in my spare time, just for fun, so sorry,
no ETA for the plugin. It will be announced here on the list, of course.
Here is the status of my activity in that area:
- Netty WebSocket transport plugin and client (done)
- ES client/server codebase modularization (pending)
- HTTP Netty client (done, thanks to asynchttpclient)
- ES HTTP Java RESTful client (in progress)
- IRI/Semantic Web content extraction plugin for the ES client (open)
- IRI/Semantic Web mapper plugin for the ES server (open)
The modularization task is just for better code dependency design and
smaller code units; it is pending because I am waiting for 0.21.
Because my day job is focused on indexing library catalogs with the help of
the Semantic Web, where denormalizing identifiers is a common task, I hope
to make use of ES for Semantic Web data indexing and search.
Each facet with its own filter? Why isn't there one "filtered" query whose
results feed all the facets? Does each facet really look at a different set
of ids?
And why isn't the "macro_label_list_of_ids" from your original
"filter: { terms : { ids: [macro_label_list_of_ids] }}" representable in
the index as a single ID, instead of all the values in the list?
-Paul
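For illustration, the shape Paul is suggesting might look like this (facet API of that era; field names invented): one "filtered" query constrains the result set, and every facet is computed over those same results by default, so the id list appears exactly once instead of once per facet_filter:

```python
# Search body sketch: a single filtered query feeding all facets.
# Field names ("ids", "status", "region") are hypothetical.
search_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"terms": {"ids": [101, 102, 103]}},  # appears once
        }
    },
    "facets": {
        # Facets run over the filtered result set, no per-facet id filter.
        "by_status": {"terms": {"field": "status"}},
        "by_region": {"terms": {"field": "region"}},
    },
}
```

This only works when all facets really should see the same id set; revdev's answer below explains why that assumption does not hold in his case.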
Hi P. Hill,
Yeah, each facet has its own filter because each needs to act on its own
subset of the data, not on the whole data set filtered by the global query.
Secondly, I can't represent all the "ids" with a single "group id" that
groups them together, because these groups are ever-changing: if I added a
group id to each doc, I might have to update all the docs whenever an "id"
moves out of a group.
But I found a solution with the "nested" type. I am now storing the data as
a "nested" array instead of as an object. That works perfectly, but
requires more storage on disk and RAM. Hopefully it will scale smoothly :).
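A minimal sketch of that nested-type setup (field names hypothetical): the mapping declares the array as nested so each group entry is indexed as its own hidden document, and a nested filter matches on membership. Changing a doc's groups then means reindexing that one doc, not updating a group id across thousands of docs:

```python
# Mapping sketch: "groups" is a nested array of objects.
mapping = {
    "doc": {
        "properties": {
            "groups": {
                "type": "nested",
                "properties": {"group_id": {"type": "integer"}},
            }
        }
    }
}

# Filter sketch: match docs whose nested groups contain group_id 7.
query = {
    "filter": {
        "nested": {
            "path": "groups",
            "filter": {"term": {"groups.group_id": 7}},
        }
    }
}
```

The extra disk and RAM cost revdev mentions comes from those hidden per-entry documents that the nested type creates.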