Help on ---- remote server bulk load

mohammad_rafi_g · April 4, 2013, 6:04am

Hi all,

Subject: how to bulk-load documents onto our server machine (my company server's IP address is like xxx.xxx.x.251) from my local system (my IP address is xxx.xxx.x.182).

We need to push some data to our server (xxx.xxx.x.251) from my laptop (xxx.xxx.x.182).
Can anyone tell me how to load data onto the server machine using the Java API?
Is there any technique to do it?

Regards and thanks to all.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

mohammad_rafi_g · April 4, 2013, 7:00am

Hi all,

In my Java application I have used the following code for the bulk load.
.........
.........
Client client = new TransportClient().addTransportAddress(new InetSocketTransportAddress("xxx.xxx.x.251", 9300));
.......
BulkRequestBuilder bulkRequest = client.prepareBulk();
bulkRequest.add(client.prepareIndex("jan2103", "cdr3", Long.toString(Id))
    .setSource(
        jsonBuilder().startObject()
            .field("id", nextLine[0])
            .field("Call_source", nextLine[1])
            .field("call_destination", nextLine[2])
            .field("billing_number", nextLine[3])
            .field("plan", nextLine[4])
            .field("call_type", nextLine[5])
            .field("call_start_time", nextLine[6])
            .field("call_connected", nextLine[7])
            .field("call_ended", nextLine[8])
            .field("duration", nextLine[9])
            .field("amount", nextLine[10])
        .endObject()));
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
............
............

When I run the above code from my system (IP address like xxx.xxx.x.182), I get the following error:
..........................................
INFO: [Blink] failed to get node info for [#transport#-1][inet[/192.168.2.251:9300]], disconnecting...
org.elasticsearch.transport.NodeDisconnectedException: [][inet[/192.168.2.251:9300]][cluster/nodes/info] disconnected
java.util.concurrent.ExecutionException: org.elasticsearch.client.transport.NoNodeAvailableException: No node available
Caused by: org.elasticsearch.client.transport.NoNodeAvailableException: No node available
..............................................

Can anyone tell me what I have to do to load my data into the server machine's Elasticsearch?
Please also tell me if I have done something wrong in my code, or whether there is another way to load data into remote systems.

Regards and thanks to all.


dadoonet · April 4, 2013, 7:10am

Check your cluster name.
If you changed it, follow instructions here:

If it does not work, check your node firewall settings…

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


jprante · April 4, 2013, 7:15am

You must tell the transport client the cluster name to connect to.

Example

Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "test")
.build();
client = new TransportClient(settings);

Jörg


mohammad_rafi_g · April 4, 2013, 9:41am

Hi David and Prante,
Thanks for the replies.

As per your suggestions:

David wrote:
Check your cluster name.
If you changed it, follow instructions here:

I have checked my cluster name and read the above link, but it is still not working and throws the same exception.

David wrote:
If it does not work, check your node firewall settings…

I don't know how to check my node's firewall settings.
Can you please tell me how to check them and how to solve my problem?

Prante wrote:
Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "test").build();
client = new TransportClient(settings);

I also tried this code, but it is not working either.

Now I am sharing my source code (the part that deals with Elasticsearch) together with the full exception.
public static void main(String[] arguments) throws Exception {
    long id = 1;
    String[] nextLine;   // one parsed CSV row
    Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch257").build();
    Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress("xxx.xxx.x.251", 9300));
    BulkRequestBuilder bulkRequest = client.prepareBulk();
    // reader is the CSV reader created elsewhere in the application (not shown here)
    for (; (nextLine = reader.readNext()) != null; id++) {
        bulkRequest.add(client.prepareIndex("jan2013", "cdr3", Long.toString(id))
            .setSource(
                jsonBuilder().startObject()
                    .field("id", nextLine[0])
                    .field("Call_source", nextLine[1])
                    .field("call_destination", nextLine[2])
                    .field("billing_number", nextLine[3])
                .endObject()));
    }
    BulkResponse bulkResponse = bulkRequest.execute().actionGet();
    client.close();
}
.............................................................................................................................
and Exception in log is:

Apr 4, 2013 2:45:58 PM org.elasticsearch.plugins
INFO: [Black Panther] loaded , sites
Apr 4, 2013 2:46:07 PM org.elasticsearch.client.transport
INFO: [Black Panther] failed to get node info for [#transport#-1][inet[/xxx.xxx.x.251:9300]], disconnecting...
org.elasticsearch.transport.NodeDisconnectedException: [inet[/xxx.xxx.x.251:9300]][cluster/nodes/info] disconnected
Apr 4, 2013 2:46:07 PM org.elasticsearch.client.transport
INFO: [Black Panther] failed to get node info for [#transport#-1][inet[/xxx.xxx.x.251:9300]], disconnecting...
org.elasticsearch.transport.NodeDisconnectedException: [][inet[/xxx.xxx.x.251:9300]][cluster/nodes/info] disconnected
java.util.concurrent.ExecutionException: org.elasticsearch.client.transport.NoNodeAvailableException: No node available
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.enhancesys.elasticsearch.general.dev.IndexCSVData.main(IndexCSVData.java:90)
Caused by: org.elasticsearch.client.transport.NoNodeAvailableException: No node available
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:205)
at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:97)
at org.elasticsearch.client.support.AbstractClient.bulk(AbstractClient.java:141)
at org.elasticsearch.client.transport.TransportClient.bulk(TransportClient.java:328)
at org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(BulkRequestBuilder.java:128)
at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:53)
at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:47)
at com.enhancesys.elasticsearch.general.dev.IndexCSVDataDependent.call(IndexCSVDataDependent.java:16)
at com.enhancesys.elasticsearch.general.dev.IndexCSVDataDependent.call(IndexCSVDataDependent.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
......................................................................................................................................................................................

This is my problem.
From my system (IP address, say, xxx.xxx.x.182), I want to load data into Elasticsearch on another system (IP address, say, xxx.xxx.x.251).
I am totally confused about how to do this.
So please, can anyone tell me how to load some documents from my system into another system's Elasticsearch?

Thanks and regards to you all.


dadoonet · April 4, 2013, 9:58am

So your cluster name running at xxx.xxx.xxx.251 is elasticsearch257?

To check firewall, I often simply use telnet:

telnet xxx.xxx.xxx.251 9300

should give you something like:

Trying ::1...
Connected to localhost.
Escape character is '^]'.

If it's not working, then:
Your node on xxx.xxx.xxx.251 is not running
Your node is blocked by a firewall or any network box. -> ask your network or system admin
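
For anyone without a telnet client at hand, the same reachability test can be sketched in plain Java. This is only a minimal sketch: the class name PortCheck and the 3-second timeout are assumptions made for illustration, and it tests nothing beyond whether a TCP connection to the given host and port can be opened.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Elasticsearch: a plain TCP reachability test.
class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: the node is not running
            // or something on the network is blocking the port.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the server address and the transport port (9300 in this thread).
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        boolean ok = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it from the client machine with the server address and port as arguments. A "NOT reachable" result points to the same two causes listed above.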

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


mohammad_rafi_g · April 5, 2013, 7:18am

Hi David,
Thanks for the reply.
I will try what you suggested. If I find anything new, I will let you know.
Once again, thank you very much.

Regards to all


mohammad_rafi_g · April 15, 2013, 11:37am

Hi David,

As you said:

To check firewall, I often simply use telnet:
telnet xxx.xxx.xxx.251 9300
should give you something like:
Trying ::1...
Connected to localhost.
Escape character is '^]'.

I don't know where I have to run the above command.
Is there a separate tool or software needed to run it?
Please tell me which tool is needed.

In my office we use PuTTY to communicate with the server, but I don't know how to test my server's Elasticsearch availability using PuTTY.

Once again, thank you very much.

Regards
Rafi


dadoonet · April 15, 2013, 11:44am

Please ask your system/network administrators. They should be able to help you to diagnose things.
I'm afraid I can't help more here.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


mohammad_rafi_g · April 15, 2013, 12:04pm

Hi David,

Thank you very much for your support. OK, I will check it on my own.
Once again, thanks.
Please excuse me if I made you afraid.

Regards
Rafi


mohammad_rafi_g · May 14, 2013, 6:37am

Hi all,
At last, I am now able to bulk-load data from my local system to our server system.
We are now using Elasticsearch version 0.20.5 and Java version 1.6.0_32.

Actually, we implemented an application, built it as a runnable JAR, and then ran that JAR on the server machine. Now it is working fine.

Our application takes an average of 130 seconds for 1 million documents (each with 8 fields).
If anyone knows, please tell me: is this good indexing performance or not?
How much time should indexing 1 million documents with 8 fields take?

Thanks to all.

Best Regards
Mohammad Rafi.


dadoonet · May 14, 2013, 6:55am

On a single node? With standard hard disks?

More than 7.5k per sec is not bad at all.
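
The 7.5k figure follows from the numbers reported above: 1 million documents in 130 seconds is roughly 7,692 documents per second. A tiny sketch of that arithmetic, where the class and method names are made up for illustration:

```java
// Hypothetical helper showing the throughput arithmetic only.
class ThroughputEstimate {

    // Average indexing rate in documents per second.
    static double docsPerSecond(long documents, double seconds) {
        return documents / seconds;
    }

    public static void main(String[] args) {
        // Figures from the post above: 1 million documents in 130 seconds.
        double rate = docsPerSecond(1_000_000L, 130.0);
        System.out.printf("%.0f docs/sec%n", rate);
    }
}
```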

You can probably increase it using more boxes (one shard per box), disabling
replicas, disable refresh, use SSD drives…

HTH


mohammad_rafi_g · May 14, 2013, 8:18am

Hi David Pilato,

On a single node? With standard hard disks?

More than 7.5k per sec is not bad at all.

Actually, we have two nodes (both are master nodes) running on our server, and our hard disks are all SSDs.

We have 8 shards and 0 replicas, and the bulk size is 8k.

(We also tried 1k, 4k, 10k and 12k, but at 8k we get good speed with minimum resource usage. While indexing, we call Thread.sleep() to pause the indexing operation.)

With the above configuration, is our indexing speed good?

Now I am sharing a piece of code from our application.

.................................................................................................................................

          XContentBuilder builder = XContentFactory.jsonBuilder();
          builder.startObject();
          int position = 0;
          for (Object fieldName : csvFields) {
              builder.field((String) fieldName, nextLine[position++]);
          }
          builder.endObject();
          // There is a method like .setRefresh(true), but we are not using refresh here.
          bulkRequest.add(Requests.indexRequest("cdrdata").type("cdr")
                  .id(nextLine[0]).create(true).source(builder));

          // Here we call sleep to pause the index operation for a few seconds,
          // because if we do not pause the indexing operation, the ES node heap size
          // grows up to its max size [5gb], causing NoNodeException or something
          // like a timeout exception.
          try {
              Thread.sleep(1000);
          } catch (Exception e) {
          }

          bulkRequest.execute(new ActionListener<BulkResponse>() {
              @Override
              public void onResponse(BulkResponse bulkResponse) {
                  // some print stmts
              }
              @Override
              public void onFailure(Throwable e) {
                  // error print stmts
              }
          });

.....................................................................................................................

Is there anything here we could change to boost the speed of the indexing operation?

You can probably increase it using more boxes (one shard per box), disabling replicas, disable refresh, use SSD drives…

We have 8 shards and 0 replicas, and we have not used any refresh while doing bulk indexing.

But I don't know whether the refresh operation is enabled by default or not. If it is, how do we disable it?

You can probably increase it using more boxes (one shard per box)

You mentioned using one shard per box. I could not clearly understand what "box" means here: is it a server or a node?

Please also give me any suggestions to speed up my indexing operation.

Once again, thanks for replying, David Pilato.

Best Regards

Mohammad Rafi.
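
The fixed bulk size of 8k described in the post above comes down to cutting the CSV rows into batches and handing each full batch to the indexer before the next one starts. The following is only a sketch of that batching step in plain Java. BulkBatcher and BatchSink are hypothetical names, and the sink is a stand-in for whatever actually sends the bulk request; no Elasticsearch calls are made here. Whether waiting for each batch to finish would make the Thread.sleep() unnecessary in the real application is an assumption that would need testing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups rows into fixed-size batches and flushes each one.
class BulkBatcher {

    // Stand-in for "send this batch as one bulk request and wait for the response".
    interface BatchSink {
        void flush(List<String[]> batch);
    }

    // Hands rows to the sink in batches of batchSize; returns the number of batches sent.
    static int sendInBatches(Iterable<String[]> rows, int batchSize, BatchSink sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String[]> batch = new ArrayList<String[]>(batchSize);
        int batches = 0;
        for (String[] row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                sink.flush(batch);
                batches++;
                batch = new ArrayList<String[]>(batchSize); // start a fresh batch
            }
        }
        if (!batch.isEmpty()) { // send the final, partially filled batch
            sink.flush(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        for (int i = 0; i < 20; i++) {
            rows.add(new String[] { String.valueOf(i), "value-" + i });
        }
        int sent = sendInBatches(rows, 8, new BatchSink() {
            @Override
            public void flush(List<String[]> batch) {
                System.out.println("flushing batch of " + batch.size() + " rows");
            }
        });
        System.out.println(sent + " batches sent");
    }
}
```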


dadoonet · May 14, 2013, 8:35am

IMHO, the best you can give to ES is to have one shard per node and one node per server.
So, with 8 shards, you can have up to 8 nodes so up to 8 servers.
That said, you can try to tune it and start with only two shards as you have two nodes. Then increase the number of shards, reindex and see where it goes.

Don't run more than one node on a single server (or only if you have more than 64Gb of RAM).

Refresh is enabled by default. Every second. To disable it:
See the middle of this page (Bulk Indexing Usage): Elasticsearch Platform — Find real-time answers at scale | Elastic
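
As a minimal sketch of what disabling refresh can look like: the index settings REST endpoint accepts an index.refresh_interval value, where -1 disables refresh and 1s restores the default after the bulk load. The host, the HTTP port 9200, and the index name cdrdata below are assumptions (defaults and the name used earlier in the thread), and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: changes index.refresh_interval over the HTTP API.
class RefreshIntervalToggle {

    // JSON body for the settings update, e.g. interval "-1" (disabled) or "1s".
    static String settingsBody(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // URL of the index settings endpoint.
    static String settingsUrl(String host, int httpPort, String index) {
        return "http://" + host + ":" + httpPort + "/" + index + "/_settings";
    }

    // Sends the PUT request and returns the HTTP status code.
    static int putRefreshInterval(String host, int httpPort, String index, String interval)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(settingsUrl(host, httpPort, index)).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(settingsBody(interval).getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        String index = args.length > 1 ? args[1] : "cdrdata";
        String interval = args.length > 2 ? args[2] : "-1";
        System.out.println("PUT " + settingsUrl(host, 9200, index) + " " + settingsBody(interval));
        // Only contact the server when a host was given explicitly.
        if (args.length > 0) {
            System.out.println("HTTP status: " + putRefreshInterval(host, 9200, index, interval));
        }
    }
}
```

Set the interval to -1 before the bulk load and back to 1s once it has finished, otherwise newly indexed documents will not become searchable on their own.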

Does it help?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


mohammad_rafi_g · May 14, 2013, 9:40am

Hi, David Pilato
Thanks a Lot for Replying. and your posts
are very useful to me.

IMHO, the best you can give to ES is to have one shard per node and one
node per server.
So, with 8 shards, you can have up to 8 nodes so up to 8 servers.
Don't run more than one node on a single server (or only if you have more
than 64Gb of RAM).

             Ok David., Now I got some clear idea on it.

Actually., The Server Machine which we are using has 64Gb of Ram but as of
Our Works we have Splited our Server into approximately 10 Virtual Machines
And now, our server has 10Gb which we are using for ES Activities

That said, you can try to tune it and start with only two shards as you
have two nodes. Then increase the number of shards, reindex and see where
it goes.

So now we will use only one node (that is, we will run only one ES instance), and we will collect some statistics with 1 shard, 2 shards, and so on, to test the speed of the index operation.
I will let you know what happens.

Thanks again for replying.

Best Regards
Mohammad Rafi.
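
To make the "one shard per node" advice quoted above concrete, here is a small sketch of how a fixed number of primary shards would be divided among nodes, assuming they are spread as evenly as possible. It is plain arithmetic, not Elasticsearch's actual allocation logic, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of even shard distribution; not Elasticsearch code.
final class ShardSpread {

    // Returns how many primary shards each node would hold if the shards
    // are handed out round-robin, i.e. as evenly as possible.
    static int[] shardsPerNode(int shards, int nodes) {
        if (shards < 0 || nodes <= 0) {
            throw new IllegalArgumentException("shards must be >= 0 and nodes > 0");
        }
        int[] counts = new int[nodes];
        for (int shard = 0; shard < shards; shard++) {
            counts[shard % nodes]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 8 shards, as in the index discussed in this thread.
        for (int nodes : new int[] {1, 2, 8, 10}) {
            System.out.println(nodes + " node(s): " + Arrays.toString(shardsPerNode(8, nodes)));
        }
    }
}
```

With 8 shards, 2 nodes hold 4 shards each and 8 nodes hold 1 each. With 10 nodes, two nodes hold no shard of this index, which is why 8 shards can use up to 8 nodes.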


dadoonet · May 14, 2013, 9:46am

Yes please. Will be happy to hear news from your tests.

About virtual machines, I definitely prefer having one single machine per hardware box. You can start one JVM with 30 GB of RAM instead of 10 VMs with less RAM each.
Also, don't forget that in case of a crash (hard disk failure for example), you will probably lose all your replicas as well (as they are on the same physical box, even if it's another virtual node)…

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


mohammad_rafi_g · May 27, 2013, 1:57pm

Hi David Pilato,

Yes please. Will be happy to hear news from your tests.

We started our Elasticsearch server 30-40 days ago.
Before the restart, when we loaded 10 million documents, indexing took an average of 130 seconds per 1 million documents.

Then we restarted the Elasticsearch server (we simply shut the ES server down and started it again).
I tested different options and got different timings for indexing 10 million documents (with 8 fields each).

After restarting the server, loading 10 million documents takes an average of 60 seconds per 1 million documents (2 nodes, index with 8 shards and 0 replicas, with refresh).

Timings for indexing 1 million documents (8 fields each), all with 0 replicas:

8 shards, with refresh, 2 nodes: avg 60 sec
8 shards, without refresh, 2 nodes: 80-120 sec
8 shards, with refresh, 1 node: more than 80 sec
1 shard, with refresh, 2 nodes: 60-70 sec at first, 90-120 sec after about 4 million documents
2 shards, with refresh, 2 nodes: 60-70 sec at first, 80-110 sec after about 4 million documents
3 shards, with refresh, 2 nodes: 60-70 sec at first, 80-100 sec after about 4 million documents
6 shards, with refresh, 2 nodes: avg 60 sec

So we found that our server performs best with 6-8 shards on 2 nodes.

Now I have one question. Why did our ES server index at an average of 130 seconds per 1 million documents before the restart, and only 60 seconds after it?
Please tell me if you know the reason.
Is there any way to fix this without restarting the ES server?

Thanks and Regards,
Mohammad Rafi
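
For comparison with other setups, the timings above can be turned into documents per second. This is a minimal sketch of that arithmetic only; the class and method names are hypothetical, and the inputs are the per-million timings reported in this post.

```java
// Hypothetical helper that converts "seconds per N documents" into throughput.
final class IndexingThroughput {

    // Documents indexed per second, given a document count and elapsed seconds.
    static double docsPerSecond(long docs, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return docs / seconds;
    }

    public static void main(String[] args) {
        long batch = 1_000_000L; // timings in the post are per 1 million documents
        double[] timingsSec = {130, 60, 80, 120}; // values reported above
        for (double t : timingsSec) {
            System.out.println(String.format("%.0f s per million -> %.0f docs/sec",
                    t, docsPerSecond(batch, t)));
        }
    }
}
```

By this measure the restart roughly doubled throughput, from about 7,700 to about 16,700 documents per second.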

