Thanks again.
Code looks okay, so it might just be the full volume that is in the way.
Jörg
On Tue, Sep 9, 2014 at 8:44 PM, Joshua P <jpeter...@gmail.com> wrote:
I'm going to try to fix the running-out-of-space issue and then try slimming down the settings. Thank you.
This is the code I've been using to index:
// Imports assume Elasticsearch 1.x (TransportClient/ImmutableSettings), Jackson, and Log4j 2;
// DBConnection, PropertyGeneralInfoRow, and geocode are application classes.
import java.io.IOException;
import java.util.List;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

import com.fasterxml.jackson.databind.ObjectMapper;

public class Indexer {

    private static final Logger logger = LogManager.getLogger("ESBulkUploader");

    public static void main(String[] args) throws IOException, NoSuchFieldException {
        DBConnection dbConn = new DBConnection("");
        String query = "SELECT TOP 300000 * FROM vw_PropertyGeneralInfo "
                + "WHERE Country_id = 1 ORDER BY Property_id DESC";
        System.out.println("getting data");
        List<PropertyGeneralInfoRow> pgiTable = dbConn.ExecuteQueryWithoutParameters(query);
        System.out.println("got data");

        ObjectMapper mapper = new ObjectMapper();
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "property_transaction_data").build();
        Client client = new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("192.168.133.131", 9300));

        BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                System.out.println("About to index " + request.numberOfActions()
                        + " records of size " + request.estimatedSizeInBytes() + ".");
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    for (BulkItemResponse item : response.getItems()) {
                        BulkItemResponse.Failure failure = item.getFailure();
                        if (failure != null) {
                            System.out.println(failure.getId() + " -- " + failure.getStatus().name()
                                    + " -- " + failure.getMessage() + " -- " + failure.getType());
                        }
                    }
                }
                System.out.println("Successfully indexed " + request.numberOfActions()
                        + " records in " + response.getTook() + ".");
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                System.out.println("failure somewhere on " + request.toString());
                failure.printStackTrace();
                logger.warn("failure on " + request.toString());
            }
        }).setBulkActions(500).setConcurrentRequests(1).build();

        for (int i = 0; i < pgiTable.size(); i++) {
            // prep location field
            PropertyGeneralInfoRow pgiRow = pgiTable.get(i);
            Double[] location = {pgiRow.getLon_dbl(), pgiRow.getLat_dbl()};
            geocode geocode = new geocode();
            geocode.setLocation(location);
            pgiRow.setGeocode(geocode);

            // prep full address string
            pgiRow.setFulladdressstring(pgiRow.getPropertykey_tx() + ", "
                    + pgiRow.getCity_tx() + ", " + pgiRow.getStateprov_cd() + ", "
                    + pgiRow.getCountry_tx() + ", " + pgiRow.getPostalcode_tx());

            String jsonRow = mapper.writeValueAsString(pgiRow);
            if (jsonRow != null && !jsonRow.isEmpty() && !jsonRow.equals("{}")) {
                bulkProcessor.add(new IndexRequest("rcapropertydata", "rcaproperty")
                        .source(jsonRow.getBytes()));
                // alternative:
                // bulkProcessor.add(client.prepareIndex("rcapropertydata", "rcaproperty")
                //         .setSource(jsonRow));
            } else {
                // don't add null strings..
                try {
                    System.out.println(pgiRow.toString());
                } catch (Exception e) {
                    System.out.println("Some error in the toString() method...");
                }
                System.out.println("Some json output was null. -- "
                        + pgiRow.getProperty_id().toString());
            }
        }

        bulkProcessor.flush();
        bulkProcessor.close();
    }
}
On Tuesday, September 9, 2014 1:57:54 PM UTC-4, Jörg Prante wrote:
Check the path.data setting in config/elasticsearch.yml
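For reference, a minimal sketch of what that might look like in config/elasticsearch.yml (the mount point below is an example, not a recommendation; pick any partition with enough free space):

```yaml
# elasticsearch.yml -- point the data directory at a partition with free space
# (the path is illustrative)
path.data: /mnt/bigdisk/elasticsearch/data
```

After changing it, the node needs a restart, and existing data does not move automatically.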
Jörg
On Tue, Sep 9, 2014 at 7:50 PM, Joshua P jpeter...@gmail.com wrote:
Just reran the indexer and found this error coming up. I'm running out of
disk space on the partition ES wants to write to.
F38KqHhnRDWtiJCss5Wz0g -- INTERNAL_SERVER_ERROR --
TranslogException[[index_type][0] Failed to write operation
[org.elasticsearch.index.translog.Translog$Create@6f1f6b1e]]; nested:
IOException[No space left on device]; -- index_type
Where would I change the write location? Which config file?
On Tuesday, September 9, 2014 1:28:21 PM UTC-4, Joshua P wrote:
Hi Jörg,
Can you elaborate on what you mean by needing more fine tuning?
I've upped the heap size to 4g (in both places I mentioned before, because it's not clear to me which one ES actually uses). I haven't tried to index again yet.
Other than throttling my indexing, what are some other things I need to be
thinking about?
On Tuesday, September 9, 2014 12:53:35 PM UTC-4, Jörg Prante wrote:
Set ES_HEAP_SIZE to at least 1 GB. For smaller heaps like 512m and indexing around 1 million docs, you need some more fine tuning, which is complicated. Your machine is fine to set the heap to 4 GB, which is 50% of the 8 GB RAM.
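On a Debian/Ubuntu package install (assumed here), that would be a single line in /etc/default/elasticsearch, sized for the 8 GB machine above:

```sh
# /etc/default/elasticsearch -- give ES half the machine's RAM
ES_HEAP_SIZE=4g
```

The service has to be restarted for the new heap size to take effect.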
Jörg
On Tue, Sep 9, 2014 at 5:39 PM, Joshua P jpeter...@gmail.com wrote:
Here is /etc/default/elasticsearch

# Run Elasticsearch as this user ID and group ID
#ES_USER=elasticsearch
#ES_GROUP=elasticsearch

# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=512m

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Maximum number of open files, defaults to 65535.
MAX_OPEN_FILES=65535

# Maximum locked memory size. Set to "unlimited" if you use the
# bootstrap.mlockall option in elasticsearch.yml. You must also set
# ES_HEAP_SIZE.
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
#MAX_MAP_COUNT=262144

# Elasticsearch log directory
#LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
#DATA_DIR=/var/lib/elasticsearch

# Elasticsearch work directory
#WORK_DIR=/tmp/elasticsearch

# Elasticsearch configuration directory
#CONF_DIR=/etc/elasticsearch

# Elasticsearch configuration file (elasticsearch.yml)
#CONF_FILE=/etc/elasticsearch/elasticsearch.yml

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will
# lead to not restarting)
#RESTART_ON_UPGRADE=true
I also see the same setting in /etc/init.d/elasticsearch. Do you know
which file takes priority? And what a good size would be?
On Tuesday, September 9, 2014 11:32:19 AM UTC-4, vineeth mohan wrote:
Hello Joshua,
I am not sure which variable you are referring to in the memory settings in the config file; please paste the comment and the config.
I usually change the config from the init.d script.
The best approach would be to bulk index, say, 10,000 feeds in sync mode, wait until everything is indexed, and then proceed to the next batch.
I am not sure about the Java API, but a while back I used to curl the stats API to see how many requests were rejected.
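The batch-and-wait idea above can be sketched as plain batching logic. This is only the partitioning half, with no ES client; the batch size of 10,000 and the "index one batch, then wait" loop body are illustrative, and in real code the loop body would issue a synchronous bulk request and block until the response comes back before continuing:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {

    static final int BATCH_SIZE = 10_000; // illustrative batch size

    // Split the rows into consecutive batches of at most BATCH_SIZE each.
    static <T> List<List<T>> toBatches(List<T> rows) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, rows.size());
            batches.add(rows.subList(start, end));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) rows.add(i);

        // Index one batch at a time; waiting for each bulk response before
        // moving on is what keeps the cluster from being flooded.
        for (List<Integer> batch : toBatches(rows)) {
            System.out.println("indexed batch of " + batch.size());
        }
    }
}
```

With 25,000 rows this yields two full batches and one partial one, which is the shape the loop has to handle.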
Thanks
Vineeth
On Tue, Sep 9, 2014 at 8:58 PM, Joshua P jpeter...@gmail.com wrote:
You also said you wouldn't recommend indexing that much information at once. How would you suggest breaking it up, and what status should I look for before starting another batch? I have to come up with a process that is repeatable and mostly automated.
On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
Thanks for the reply, Vineeth!
What's a practical heap size? I've seen some people say they set it to 30gb, but this confuses me because the comment in the /etc/default/elasticsearch file suggests the max is only 1gb?
I'll look into the threadpool issue. Is there a Java API for monitoring cluster node health? Can you point me to an example or give me a link?
Thanks!
On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
Hello Joshua,
I have a feeling this has something to do with the threadpool. There is a limit on the number of feeds that can be queued for indexing. Try increasing the threadpool queue size for index and bulk to a large number.
Also, through the cluster nodes API you can see if any requests have failed. Monitor that API for requests rejected due to the large volume.
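In elasticsearch.yml that could look like the following sketch (setting names are the 1.x `threadpool.*` form; the sizes are illustrative, not recommendations):

```yaml
# elasticsearch.yml (ES 1.x setting names; queue sizes are illustrative)
threadpool.index.queue_size: 1000
threadpool.bulk.queue_size: 1000
```

A larger queue only buys headroom; if indexing consistently outruns the node, requests will still eventually be rejected.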
Threadpool - …nce/current/modules-threadpool.html
Threadpool stats - …reference/current/cluster-nodes-stats.html
Having said that, I wouldn't recommend bulk indexing that much information at a time, and 512 MB is not going to help much.
Thanks
Vineeth
On Tue, Sep 9, 2014 at 7:48 PM, Joshua P jpeter...@gmail.com wrote:
Hi there!
I'm trying to do a one-time index of about 800,000 records into an instance of Elasticsearch, but I'm having a bit of trouble. It continually fails around 200,000 records. Looking at it in the Elasticsearch Head plugin, my index goes offline and becomes unrecoverable.
For now, I have it running on a VM on my personal machine.
VM Config:
Ubuntu Server 14.04 64-Bit
8 GB RAM
2 Processors
32 GB SSD
Java
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
Elasticsearch is using mostly the defaults. This is the output of:
curl http://localhost:9200/_nodes/process?pretty
{
"cluster_name" : "property_transaction_data",
"nodes" : {
"KlFkO_qgSOKmV_jjj5xeVw" : {
"name" : "Marvin Flumm",
"transport_address" : "inet[/192.168.133.131:9300]",
"host" : "ubuntu-es",
"ip" : "127.0.1.1",
"version" : "1.3.2",
"build" : "dee175d",
"http_address" : "inet[/192.168.133.131:9200]",
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 1092,
"max_file_descriptors" : 65535,
"mlockall" : true
}
}
}
}
I adjusted ES_HEAP_SIZE to 512mb.
I'm using the following code to pull data from SQL Server and index it.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
...