Hi Honza,
This is my "full" code:
from elasticsearch import Elasticsearch
import json
import pandas as pd
import numpy as np
import os
# create the connection to ES
es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)
############################################################
####### READ IN THE ORIGINAL SURVEY DATA ###################
############################################################
origall = es.search('survey_data', 'primary',
    body={
        "query": {
            "bool": {
                "must": [
                    {"term": {"file": "original"}}
                ]
            }
        },
        "size": "0"
    }
)
total_o = origall['hits']['total']
origall_o = es.search('tns_survey_data', 'primary',
    body={
        "query": {
            "bool": {
                "must": [
                    {"term": {"file": "original_amit2"}}
                ]
            }
        },
        "size": 20
    }
)
# force it to data frame
orig_dict = origall_o['hits']['hits']
############################################################
####### READ IN THE NEW SURVEY DATA ########################
############################################################
# get the documents
newall = es.search('survey_data', 'primary',
    {
        "query": {
            "bool": {
                "should": [
                    {"term": {"file": "destinationqc22"}},
                    {"term": {"file": "destinationqc33"}},
                    {"term": {"file": "destinationqc44"}}
                ]
            }
        },
        "size": "0"
    }
)
total_n = newall['hits']['total']
newall_n = es.search('tns_survey_data', 'primary',
    {
        "query": {
            "bool": {
                "should": [
                    {"term": {"file": "destinationqc22"}},
                    {"term": {"file": "destinationqc33"}},
                    {"term": {"file": "destinationqc44"}}
                ]
            }
        },
        "size": 20
    }
)
# force it to data frame
new_dict = newall_n['hits']['hits']
print(origall_o)
print(newall_n)
print orig_dict
print new_dict
And when I run it, I get this:
print(origall_o)
{u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
u'timed_out': False}
print(newall_n)
{u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
u'timed_out': False}
print orig_dict
print new_dict
And what I would expect is:
origall_o total is correct (110k hits).
newall_n total should be 84k; I'm not sure why it shows the same 110k as origall_o.
For orig_dict and new_dict I would expect to see the 20 documents that I queried.
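To make the expectation concrete, here is a minimal sketch of pulling the documents out of a search response. The helper name hits_to_records and the sample response are hypothetical; the only assumption about Elasticsearch is the usual response shape, where each entry in ['hits']['hits'] carries the document under '_source'.

```python
def hits_to_records(response):
    """Return the _source dict of every hit in a search response.

    Works on the plain dict returned by es.search(); returns an empty
    list when the response contains no hits (for example with size 0).
    """
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("_source", {}) for hit in hits]


# Hypothetical response, shaped like the output printed above but with hits.
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_source": {"file": "original", "answer": "yes"}},
            {"_id": "2", "_source": {"file": "original", "answer": "no"}},
        ],
    }
}

records = hits_to_records(sample)
print(len(records))
```

The resulting list of dicts can be passed straight to pd.DataFrame(records) to get the data frame mentioned in the code comments.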
Many thanks for your help.
Geza
On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
Hi Geza,
I don't understand what you mean by re-running, can you post the complete
code?
When you do a search with size: 20, can you just print the result of
the search method and see if that data is there?
As a side note, it looks like you are trying to filter out some data.
While this works with a query, you will get much better performance
by using a filtered query with a filter instead.
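A rough sketch of what such a request body could look like, assuming an Elasticsearch version that still supports the filtered query and the terms filter (later major versions removed filtered in favour of bool with a filter clause). The function name filtered_file_query is made up for illustration; the field name "file" and the values come from the code in this thread.

```python
def filtered_file_query(file_values, size=20):
    """Build a search body that selects documents by their "file" value
    with a filter instead of a scoring query."""
    return {
        "query": {
            "filtered": {
                # match_all: no scoring query, the filter does the selection
                "query": {"match_all": {}},
                # terms filter: matches any of the listed values, which
                # replaces the three "should" term clauses
                "filter": {"terms": {"file": list(file_values)}},
            }
        },
        "size": size,
    }


body = filtered_file_query(
    ["destinationqc22", "destinationqc33", "destinationqc44"], size=20
)
print(body["query"]["filtered"]["filter"])
```

The same body can then be passed to the search method in place of the bool query.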
Honza
On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <kere...@gmail.com> wrote:
Hello,
I am querying an elasticsearch index from python. Issue 1 is that when I
change my query and rerun it, my objects in Python don't get refreshed
according to my modified query. Issue 2 is that even if I see that I got
some hits, no data comes through at all (e.g. I see I've got 85k hits, but
when I put the result in a dictionary, it is empty).
from elasticsearch import Elasticsearch
es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)
origall = es.search('esdata', 'primary',
    {
        "query": {
            "bool": {
                "must_not": [
                    {"term": {"file": "original"}}
                ]
            }
        },
        "size": "0"
    }
)
total_o = origall['hits']['total']
At this stage for total_o I get 110k, which is correct. Then I rerun my
query after changing size=0 to size=20, and if I want to have a look at
these 20 hits, I get nothing for this:
orig = origall['hits']['hits']
print(orig)
Then I go back to my original query and change the must_not to must. In this
way I should get 85k hits, but after rerunning it I still get 110k in total_o.
It is quite random when it works and when it doesn't. Sometimes I get my
expected 85k hits, but then this gets stuck, and when I change my query back
to get the 110k, it is still 85k. Also, sometimes I get data in my
orig = origall['hits']['hits'], but then let's say I change the size in my
query to 0 and rerun it, and origall['hits']['hits'] still gives me back the
data.
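One way to rule out stale objects on the Python side is to build the request body from scratch for every call and to pass index, doc_type and body as keyword arguments. The sketch below is only an illustration under those assumptions, not a confirmed fix: term_body and count_and_fetch are hypothetical helper names, and the keyword form of es.search assumes an elasticsearch-py release that accepts doc_type.

```python
def term_body(file_value, size, negate=False):
    """Build a fresh bool/term request body on every call."""
    clause = "must_not" if negate else "must"
    return {
        "query": {"bool": {clause: [{"term": {"file": file_value}}]}},
        "size": size,
    }


def count_and_fetch(es, index, doc_type, file_value, size=20, negate=False):
    """Run the same query twice: once with size 0 for the total,
    once with the requested size for the documents themselves."""
    count = es.search(index=index, doc_type=doc_type,
                      body=term_body(file_value, 0, negate))
    page = es.search(index=index, doc_type=doc_type,
                     body=term_body(file_value, size, negate))
    return count["hits"]["total"], page["hits"]["hits"]


# Usage against a real client would look like:
# total_o, orig = count_and_fetch(es, "esdata", "primary", "original", size=20)
```

Because both requests come from the same helper, the total and the returned hits always refer to the same query, which makes it easier to see whether a mismatch comes from the script or from the cluster.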
I use Anaconda, but I also tried PyCharm and the default Python IDLE, and
these behave the same. I tried creating separate ES connections for all my
queries, which doesn't help. I played around with the cache, but no luck.
I'm running it on a 64 bit, Windows 7 machine.
Any idea what I'm doing wrong? Many thanks,
Geza
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.