Spatial Query not searching both indexes

I have a query for test purposes where I've created 2 indexes, both with
the same exact data.
When I query each index individually I get the desired results, but when I
query both indexes together I only get the results from geoindex1.
I was thinking maybe duplicates were being weeded out; however, I then
deleted some documents in geoindex1, hoping they would now show up from
geoindex2, but that doesn't happen.
I also tried just doing geoindex*/geo... and that doesn't work either.

Not sure if this is a bug or user error. Any help would be appreciated.

curl -XGET 'http://localhost:9200/geoindex1,geoindex2/geo/_search?pretty=true' -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "500km",
          "location": {
            "lat": 45.59174,
            "lon": 11.4050
          }
        }
      }
    }
  }
}'
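The geo_distance filter in this query keeps only documents whose location lies within 500km of the given point. As a rough illustration of that check, here is a minimal Python sketch using the haversine formula on a spherical Earth. The mean radius of 6371 km is an assumption, Elasticsearch's own distance calculation may differ slightly, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; ES may use a different model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(origin, point, max_km):
    """Hypothetical stand-in for geo_distance: True if point is within max_km of origin."""
    return haversine_km(origin[0], origin[1], point[0], point[1]) <= max_km


ORIGIN = (45.59174, 11.4050)  # the lat/lon used in the query above

print(within_distance(ORIGIN, (45.4642, 9.1900), 500))    # Milan: True
print(within_distance(ORIGIN, (40.7128, -74.0060), 500))  # New York: False
```

Note that this check is applied per document, so it says nothing about which index a hit comes from; that is decided when results from the indexes are merged.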

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

An update to this: when I use aliases I get the same results. It seems
to query only the index that was created first, regardless of the order in
the list. So even if I put geoindex2,geoindex1 or geoindex4,geoindex2, it
pulls the lower-numbered index (which I created first in my test).

If I run a non-spatial query against all indexes, using either hard-coded
indexes (geoindex1,geoindex2,geoindex3...) or an alias referencing all
indexes, the query DOES work. So it appears to be related only to spatial
queries?

Thanks

On Monday, February 25, 2013 12:24:51 AM UTC-5, Dave O wrote:

Please see my test case on GitHub where I re-created the problem. Any help
would be greatly appreciated.
I am running Elasticsearch 0.20.4 on a single instance (laptop) using
VirtualBox running a CentOS 6.3 OS.

Thank you

Hiya

> Please see my test case on GitHub where I re-created the problem. Any
> help would be greatly appreciated.
> I am running Elasticsearch 0.20.4 on a single instance (laptop) using
> VirtualBox running a CentOS 6.3 OS.

Your term query for country 'AD' won't work because "country" is defined
as a field of { type: "string" }, which means that it is "analyzed": the
value "AD" is indexed as the lowercase term "ad". But a term query
searches for the EXACT term "AD", so it won't be found.

Set the country field to { type: "string", index: "not_analyzed" }.
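To see why the term query misses, here is a simplified Python model of the two mappings. It assumes an analyzer that only splits on non-word characters and lowercases; real analyzers do more (Unicode handling, possibly stop words). An analyzed field stores lowercased tokens, a term query looks up its input unchanged, and a not_analyzed field stores the whole value as one exact term. The `mapping` dict mirrors the setting suggested above for the `geo` type; the helper functions are hypothetical.

```python
import re

# The mapping suggested above, written as a Python dict (type "geo" as in the thread).
mapping = {"geo": {"properties": {"country": {"type": "string", "index": "not_analyzed"}}}}


def simple_analyze(text):
    """Rough stand-in for an analyzer: split on non-word characters, then lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def index_terms(value, analyzed):
    """Terms stored in the inverted index for one field value."""
    return set(simple_analyze(value)) if analyzed else {value}


def term_query_matches(value, term, analyzed):
    """A term query is NOT analyzed, so it must equal a stored term exactly."""
    return term in index_terms(value, analyzed)


print(term_query_matches("AD", "AD", analyzed=True))   # False: stored as "ad"
print(term_query_matches("AD", "AD", analyzed=False))  # True: stored as "AD"
```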

Then, as for why results are not being returned from geoindex11/12: a
search returns only the top 10 hits by default, and I think those first
10 results just happen to be in geoindex10.

Try setting { size: 100 } in your query to see more results.
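Here is a simplified Python model of why only one index showed up. Each shard returns its own top `size` hits, and the coordinating step merges them and keeps the overall top `size` (10 by default). With a filtered match_all every hit gets the same score, so which hits survive comes down to tie-breaking. This sketch assumes ties are broken by shard order, which simplifies whatever Elasticsearch does internally. The index names follow the thread; the document counts are hypothetical.

```python
import heapq

DEFAULT_SIZE = 10  # number of hits returned when "size" is not set


def shard_top_hits(index_name, num_docs, size):
    """Hypothetical single-shard index: every doc scores 1.0, as with a filtered match_all."""
    hits = [(1.0, index_name, doc_id) for doc_id in range(num_docs)]
    return hits[:size]  # each shard only sends back its own top `size` hits


def merge_hits(per_shard, size=DEFAULT_SIZE):
    """Coordinating step: keep the overall top `size` hits.

    Assumption: equal scores are broken by shard order, then doc id.
    """
    ranked = []
    for shard_no, hits in enumerate(per_shard):
        for score, index_name, doc_id in hits:
            ranked.append(((-score, shard_no, doc_id), index_name, doc_id))
    return [(index_name, doc_id) for _, index_name, doc_id in heapq.nsmallest(size, ranked)]


def search(indexes, size=DEFAULT_SIZE):
    """Query every (index_name, num_docs) pair and merge the results."""
    per_shard = [shard_top_hits(name, n, size) for name, n in indexes]
    return merge_hits(per_shard, size)


indexes = [("geoindex10", 30), ("geoindex11", 30), ("geoindex12", 30)]
print({name for name, _ in search(indexes)})                   # {'geoindex10'}
print(sorted({name for name, _ in search(indexes, size=100)}))  # all three indexes
```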

clint

Hi Clinton, great! Yes, this worked beautifully in both cases. I have some
learning to do, making the move from an RDBMS to ES. Great product.
Good work on the Perl client also; I've used it a little bit for my client
stuff. Just trying to get familiar with the technology first before I get
too much into the APIs.

Issue Resolved!

Mike

On Monday, February 25, 2013 4:01:12 PM UTC-5, Clinton Gormley wrote:

Hi Clint,
I had one observation that I was hoping you could help with. Not sure if
this is just my laptop PC or normal overhead (which it appears to be), but
basically I have 4 indexes, each with about 9 million records (the same
exact records in each index). I just wanted to do some
benchmarking/performance testing to see how querying multiple indexes would
perform.

I'm seeing about half a second for one index, and then it slowly increases
up to about 1.3 seconds when adding all 4. I'm using the basic polygon
feature in my test case, no aggregates or anything. I would expect a slight
increase as indexes are added, and this increase is very minor, so I'm okay
with it; it seems like it could be normal (especially on my laptop VM).

I was just wondering how parallelization or distribution works when
querying multiple indexes (or a single index with many shards). I
reviewed the documentation on parallel processes but was a little unclear.

If I have a query that queries 4 indexes, do 4 processes (or 4 threads) get
created? Would each one use a separate CPU if I have 4 CPUs? Is there a way
to tune this? As an example, let's say I have 20 indexes that I need to
query at one time (all on the same box/node). 20 processes may overload the
system. Can the process count be controlled? Or do I just need to know the
limitations of the system, e.g. that maybe it can only handle 10 processes?

If I have 20 servers, each with 1 index, and I query them all, would this
behave any differently than 1 server with 20 indexes (as it relates to the
number of processes created)?

My hope is that when I issue a query against, say, 10 indexes, 10
processes get created simultaneously, each process queries its targeted
index, and then the results are aggregated and returned to the user,
rather than 10 processes running serially.

Thank you

Hi Dave

> I'm seeing about half a second for one index, and then it slowly
> increases up to about 1.3 seconds when adding all 4. I'm using the
> basic polygon feature in my test case, no aggregates or anything. I
> would expect a slight increase as indexes are added, and this increase
> is very minor, so I'm okay with it; it seems like it could be
> normal (especially on my laptop VM).

How much ES_HEAP_SIZE are you giving to ES? And how much RAM are you
leaving for the file system cache?

For geo calculations, the field value for every document in every index
being queried needs to be loaded into memory. Once loaded, that "cache"
isn't evicted by default, to speed up future queries.

But it can take up a lot of RAM, especially if you have small heap sizes.
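For a rough sense of scale with the numbers from this thread (4 indexes of about 9 million documents each), here is a back-of-the-envelope Python estimate. It assumes each geo point is held in memory as two 8-byte doubles (lat and lon) and ignores per-segment and JVM object overhead, so treat the result as a lower bound rather than a measurement. The function name is hypothetical.

```python
BYTES_PER_DOUBLE = 8
DOUBLES_PER_POINT = 2  # assumed: one double for lat, one for lon


def geo_fielddata_bytes(docs_per_index, num_indexes):
    """Lower-bound estimate of heap used once every queried geo_point value is loaded."""
    return docs_per_index * num_indexes * DOUBLES_PER_POINT * BYTES_PER_DOUBLE


total = geo_fielddata_bytes(9_000_000, 4)
print(total)                 # 576000000 bytes
print(round(total / 2**20))  # 549 (MiB), before any overhead
```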

> I was just wondering how parallelization or distribution works when
> querying multiple indexes (or a single index with many shards). I
> reviewed the documentation on parallel processes but was a little
> unclear.

Each shard is queried in parallel. So if you have 1 index with 5 shards,
or 5 indices with 1 shard each, both would result in querying 5 shards in
parallel.
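The scatter/gather behaviour described here can be sketched in Python with a thread pool. There is one task per shard, all tasks are submitted at once, and the per-shard results are merged after they complete. This is a conceptual model, not Elasticsearch's actual code. `query_shard`, the shard names, and the `max_workers` cap (standing in for a per-node limit on search threads) are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the peak number of shard queries running at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._running += 1
            self.peak = max(self.peak, self._running)

    def __exit__(self, *exc):
        with self._lock:
            self._running -= 1
        return False  # do not swallow exceptions


def query_shard(shard_name, tracker):
    """Hypothetical per-shard search: does some 'work' and returns fake hits."""
    with tracker:
        time.sleep(0.02)  # stand-in for the real search on this shard
        return [f"{shard_name}:doc{i}" for i in range(3)]


def scatter_gather(shards, max_workers):
    """Query every shard concurrently (at most max_workers at once), then merge."""
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_shard = list(pool.map(lambda s: query_shard(s, tracker), shards))
    merged = [hit for hits in per_shard for hit in hits]
    return merged, tracker.peak


# 5 indices with 1 shard each: 5 independent shard queries.
hits, peak = scatter_gather([f"geoindex{i}" for i in range(1, 6)], max_workers=5)
print(len(hits), peak)

# 20 shards on one node with a cap of 4 threads: never more than 4 run at once.
hits, peak = scatter_gather([f"idx{i}" for i in range(20)], max_workers=4)
print(len(hits), peak)
```

The cap is the point of the second call. Past the number of CPUs, extra concurrent shard queries on one node mostly add context switching rather than speed.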

However, a single shard is concurrent and can make full use of the
resources of a single node, so hosting many shards on a single node
doesn't buy you concurrency. In fact, it'd probably slow things down a
bit, as there would be more context switches.

As you add nodes, shards are redistributed to them, spreading the load
and giving each shard access to more resources.
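Here is a toy Python model of that redistribution: each shard goes to whichever node currently holds the fewest shards. Elasticsearch's real allocator weighs more factors (replicas, per-index balance, disk), so this only illustrates the idea that more nodes means fewer shards competing for each node's CPU and RAM. Node and shard names are hypothetical.

```python
def balance_shards(shards, nodes):
    """Give each shard to the node that currently holds the fewest shards.

    Ties go to the earliest node in `nodes`, so equal-sized nodes fill round-robin.
    """
    assignment = {node: [] for node in nodes}
    for shard in shards:
        target = min(nodes, key=lambda node: len(assignment[node]))
        assignment[target].append(shard)
    return assignment


shards = [f"geoindex{i}[0]" for i in range(1, 21)]  # 20 indices x 1 shard
one_node = balance_shards(shards, ["node1"])
four_nodes = balance_shards(shards, ["node1", "node2", "node3", "node4"])
print({n: len(s) for n, s in one_node.items()})    # {'node1': 20}
print({n: len(s) for n, s in four_nodes.items()})  # 5 shards on each node
```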

> If I have a query that queries 4 indexes, do 4 processes (or 4 threads)
> get created? Would each one use a separate CPU if I have 4 CPUs?
> Is there a way to tune this?

You can specify the max number of threads per node:

> As an example, let's say I have 20 indexes that I need to query at
> one time (all on the same box/node). 20 processes may overload the
> system. Can the process count be controlled? Or do I just need to
> know the limitations of the system, e.g. that maybe it can only handle
> 10 processes?

> If I have 20 servers, each with 1 index, and I query them all, would this
> behave any differently than 1 server with 20 indexes (as it relates to
> the number of processes created)?

> My hope is that when I issue a query against, say, 10 indexes, 10
> processes get created simultaneously, each process queries its
> targeted index, and then the results are aggregated and returned to the
> user, rather than 10 processes running serially.

That's pretty much how it happens.

clint
