The error doesn't show when I specify more than one node, which is good
:), but if I put only one it gives a division-by-zero error.
But now another issue popped up, this time on ES itself. I defined the
second node to be data-only, and its cluster name is the same as the master
node's. If I try to connect directly to the second node on 9501 or even 9201,
I get a timeout.
How can I get a second node to work in cluster?
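For reference, a minimal sketch of what the second (data-only) node's elasticsearch.yml could look like; the node names, ports, and unicast host below are assumptions for illustration, and cluster.name must match the master's:

```yaml
# Sketch for the second, data-only node (values here are assumptions)
cluster.name: mycluster            # must be identical on every node
node.name: node2
node.master: false                 # this node holds data but is never master
node.data: true
http.port: 9201                    # keep clear of node1's 9200 on the same host
transport.tcp.port: 9301
# If multicast discovery is disabled, point the node at the master explicitly:
# discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300"]
```

If the node still times out, checking its log for cluster-join messages usually shows whether it ever discovered the master.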
On Tuesday, 29 October 2013 16:46:59 UTC, Honza Král wrote:
Oops, there was a bug in the sniffing code, now fixed in master:
Thrift plugin uses different format in cluster stats · elastic/elasticsearch-py@04afc03 · GitHub
Can you please try again with master?
Thanks!
On Tue, Oct 29, 2013 at 5:29 PM, Mauro Farracha <farr...@gmail.com> wrote:
It's not related to Thrift; using HTTP also shows this behaviour.
On Tuesday, 29 October 2013 16:27:32 UTC, Mauro Farracha wrote:
Hi Honza,
Yep, I understand the issue around dependencies. The minimum you can do
is probably add this sort of information to the documentation.

Regarding the mapping issue, you were right: adding the index type
solved the problem.

Since elasticsearch-py uses connection pooling with round-robin by
default, I was wondering if I could get more of an improvement with two nodes
up, since I would distribute the load between two servers. But using
ThriftConnection it throws an error which I don't understand, since I'm
pretty sure I'm passing the right configuration:

connection_pool.py", line 60, in select
    self.rr %= len(connections)
ZeroDivisionError: integer division or modulo by zero

Scenarios:
- two nodes, sniff_* properties => ZeroDivisionError
- one node, sniff_* properties => ZeroDivisionError (so it's an issue with the sniff properties?)
- one node, no sniff_* properties => no problems
- two nodes, no sniff_* properties => timeout connecting to ES

Am I right that round-robin is used on each request? So I would end up
sending one bulk action to node1 and the second going to node2?

Thanks
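For what it's worth, the traceback points at the pool's round-robin selector; a simplified sketch (modeled on the traceback, not the actual elasticsearch-py source) shows both the per-request rotation and why an empty connection list, e.g. after sniffing finds no usable hosts, ends in ZeroDivisionError:

```python
class RoundRobinSelector:
    """Simplified sketch of round-robin selection over a connection pool,
    modeled on the traceback above (not the actual elasticsearch-py code)."""

    def __init__(self):
        self.rr = -1  # index of the last connection handed out

    def select(self, connections):
        self.rr += 1
        # If sniffing left the pool with no live connections,
        # len(connections) == 0 and this modulo raises ZeroDivisionError,
        # matching the traceback above.
        self.rr %= len(connections)
        return connections[self.rr]


pool = RoundRobinSelector()
nodes = ["http://node1:9200", "http://node2:9200"]
first, second = pool.select(nodes), pool.select(nodes)  # rotates: node1, then node2
```

If the pool really does rotate per request, consecutive bulk calls would alternate between node1 and node2 as described.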
On Tuesday, 29 October 2013 16:01:17 UTC, Honza Král wrote:
On Tue, Oct 29, 2013 at 4:02 PM, Mauro Farracha <farr...@gmail.com> wrote:
Thanks Honza.
I was importing the class, but not the right way: the from statement was
missing the connection part.

I thought that was the case; I updated the code so that it should work in
the future.

I changed the serializer (used ujson as Jorg mentioned) and I got an
improvement from 2.66MB/s to 4.7MB/s.

Ah, good to know, I will give it a try. I wanted to avoid additional
dependencies, but if it makes sense I will happily switch the client to
ujson. Have you also tried just passing in a big string?

Then I configured ThriftConnection and the write performance increased
to 6.2MB/s. Not bad, but still far off from the 12MB/s from curl.

We still have to deserialize the response, which curl doesn't need to do,
so it will always have an advantage over us, I'm afraid; it shouldn't be
this big, though.

I have two questions:
- Using elasticsearch-py, the index mapping is not the same on the
server side as when using pyes. Am I missing something? With pyes all the
properties were there, but using elasticsearch-py only the type appears on
the server side, and the properties are not the ones I specified. The server
log shows "update_mapping [accesslogs] (dynamic)", which doesn't happen with
pyes. I'm sure I'm missing some property/config.

How I invoke it (the mapping is in the second post):

self.client.indices.put_mapping(index=self.doc_collection, doc_type=self.doc_type, body={'properties': self.doc_mapping})

The body should also include the doc_type, so:
self.client.indices.put_mapping(index=self.doc_collection, doc_type=self.doc_type, body={self.doc_type: {'properties': self.doc_mapping}})
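To illustrate the difference (the type and field names here are made up), the body must be keyed by the document type, or ES falls back to dynamic mapping:

```python
doc_type = "accesslogs"  # hypothetical type name
doc_mapping = {          # hypothetical field definitions
    "path": {"type": "string"},
    "bytes": {"type": "long"},
}

# Wrong: bare properties -- ES ignores them and maps the fields dynamically
wrong_body = {"properties": doc_mapping}

# Right: the body is keyed by the document type
body = {doc_type: {"properties": doc_mapping}}

# client.indices.put_mapping(index="logs", doc_type=doc_type, body=body)
```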
- Also, can you guys share what your performance is on a single local
node?

I haven't done any tests like this; it varies so much with different
HW/configuration/environment that there is little value in absolute
numbers. The only thing that matters is the relative speed of the Python
clients, curl, etc.

As I mentioned in my first post, these are my non-default
configurations; maybe there is still room for improvement? Not to mention,
of course, that these same settings were responsible for the 12MB/s with curl.

indices.memory.index_buffer_size: 50%
indices.memory.min_index_buffer_size: 300mb
index.translog.flush_threshold: 30000
index.store.type: mmapfs
index.merge.policy.use_compound_file: false

These look reasonable, though I am no expert. Also, when using SSDs you
might benefit from switching the kernel IO scheduler to noop:
https://speakerdeck.com/elasticsearch/life-after-ec2

On Tuesday, 29 October 2013 14:32:12 UTC, Honza Král wrote:
you need to import it before you intend to use it:
from elasticsearch.connection import ThriftConnection
On Tue, Oct 29, 2013 at 3:02 PM, Mauro Farracha <farr...@gmail.com> wrote:
Ahhhh you are the source!
As I mentioned in the first post, I also wrote a Python script using
elasticsearch-py and the performance was equal to pyes, but I couldn't get
it working with Thrift. The documentation available to me was not detailed
enough for me to understand how to fully use all the features, and the
Connection/Transport classes were a little confusing.

Maybe you could help me out... the error was:

self.client = Elasticsearch(hosts=self.elasticsearch_conn, connection_class=ThriftConnection)
NameError: global name 'ThriftConnection' is not defined

I have the ES thrift plugin installed (it works with pyes), I have the
thrift Python module installed, and I import the class. I don't know what
I'm missing.

On Tuesday, 29 October 2013 13:16:47 UTC, Honza Král wrote:
I am not familiar with pyes; I did, however, write elasticsearch-py,
and I made sure you can bypass the serialization by doing it yourself. If
needed you can even supply your own serializer: just create an instance
that has .dumps() and .loads() methods and behaves the same as
elasticsearch.serializer.JSONSerializer. You can then pass it to the
Elasticsearch class as an argument (serializer=my_faster_serializer).

On Tue, Oct 29, 2013 at 2:07 PM, Mauro Farracha <farr...@gmail.com> wrote:
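As an illustration of the custom serializer Honza describes above, here is a sketch using ujson with a stdlib json fallback (the class name is made up; only the .dumps()/.loads() interface matters):

```python
try:
    import ujson as _json  # the faster C implementation, if installed
except ImportError:
    import json as _json   # stdlib fallback so this sketch still runs


class FastSerializer:
    """Sketch of a drop-in replacement for JSONSerializer (name made up)."""

    mimetype = "application/json"

    def dumps(self, data):
        # Pass pre-serialized strings through untouched, as the client expects.
        if isinstance(data, str):
            return data
        return _json.dumps(data)

    def loads(self, s):
        return _json.loads(s)


# Hypothetical usage: client = Elasticsearch(serializer=FastSerializer())
```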
Hi Honza,
Ok, that could be a problem. I'm passing a Python dictionary to the
pyes driver. If I send the JSON as a string, could I bypass the
serialization? Are you familiar with the pyes driver?

I saw this method signature, but I don't know what the "header" is,
and whether the document can be one full string with several documents:

index_raw_bulk(header, document)
http://pyes.readthedocs.org/en/latest/references/pyes.es.html#pyes.es.ES.index_raw_bulk

Function helper for fast inserting
Parameters:
- header – a string with the bulk header; must end with a newline
- document – a JSON document string; must end with a newline

On Tuesday, 29 October 2013 12:55:55 UTC, Honza Král wrote:
Hi,
and what was the bottleneck? Has the Python process maxed out the CPU,
or was it waiting for the network? You can try serializing the documents
yourself and passing JSON strings to the client's bulk() method to make
sure that's not the bottleneck (you can pass in a list of strings or just
one big string and we will just pass it along).

The Python client does more than curl: it serializes data and parses
output, which is at least two CPU-intensive operations that need to happen.
One of them you can eliminate.

On Tue, Oct 29, 2013 at 1:23 PM, joerg...@gmail.com <joerg...@gmail.com> wrote:

Also worth mentioning: the number of shards and replicas, which affects
indexing performance a lot.

Jörg
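Honza's suggestion above, serializing the documents yourself and handing bulk() one big string, can be sketched like this (the index and type names are made up):

```python
import json


def bulk_body(index, doc_type, docs):
    """Pre-serialize documents into the newline-delimited bulk format so the
    client can pass the string straight through without re-serializing."""
    lines = []
    for doc in docs:
        # One action/metadata header line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


body = bulk_body("accesslogs", "log", [{"path": "/a"}, {"path": "/b"}])
# client.bulk(body=body)  # hypothetical call, skipping per-document serialization
```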
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.