Strange limits when using Java API (SOLVED)


(Ane18s) #1

Hi,

I'm using the Java API to index a lot of large documents. However, some documents are not indexed (or perhaps only partially indexed).

After a lot of tests, I have found that when a field is too large, the document is not being indexed. In the following code, when size=10, everything works and when size=1000, the problem occurs.

Apparently there are some limits that I am not aware of. The following code is executed 500 times or so for each document. The size of cb.string() is around 50 MB.

XContentBuilder cb = XContentFactory.jsonBuilder();
...
String fakeDesc = "";
int size = 10;
for(int i = 0; i < size; i++) fakeDesc += "0";
cb.field("doc-desc", fakeDesc);
...

Please help!

I am using elasticsearch-1.7.0 and Java 1.8.


(Mark Walkom) #2

There is no theoretical limit on the size of a document, so something else is happening.
What do your ES logs show when this "missing" data is being indexed?


(Ane18s) #3

The logs don't show anything. Now I realize that I haven't set up logging correctly. There should be at least some logs, right? So I will set up logging and post the results here.

Thank you for answering, it actually helped!


(Ane18s) #4

Update: The logs did not show anything useful. I have even tried the "trace" level.

I have created a fully working example that reproduces the potential bug (don't forget to substitute cluster.name with your cluster name):

import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class ESTests {

    public static final String INDEX = "tempindex";
    public static final int SIZE1 = 3;
    public static final int SIZE2 = 3;

    public static void main(String[] args) throws Exception {

        Client client = new TransportClient(ImmutableSettings.builder().put("cluster.name", "ncluster").build())
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        System.out.println("Client initialized.");
        try {
            //first delete the old index, if exists.
            client.admin().indices().delete(new DeleteIndexRequest(INDEX)).actionGet();
            System.out.println("Index deleted.");
        } catch (ElasticsearchException e) {
            //ignore
        }

        Settings settings = ImmutableSettings.settingsBuilder().build();
        System.out.println("Settings built.");
        //Then create a new-clean index.
        client.admin().indices().create(new CreateIndexRequest(INDEX, settings)).actionGet();
        System.out.println("Index created.");

        XContentBuilder cb = XContentFactory.jsonBuilder();
        int id = 1;
        cb.startObject();
        cb.field("name", "test document");
        
        {
            

            cb.startArray("testArray");
            String str = "";
            for(int i = 0; i < SIZE1; i++) str += "0"; //515 does not work!
            
            for(int i = 0; i < SIZE2; i++) {
                cb.startObject();
                cb.field("counter", i);
                cb.field("str", str);
                cb.endObject();
            }
            cb.endArray();
            
        }
        
        
        cb.endObject();

        client.prepareIndex(INDEX, "document", Integer.toString(id)).setSource(cb.string()).execute();

    }
}

The above code should insert a document similar to the one below:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "tempindex",
            "_type": "document",
            "_id": "1",
            "_score": 1,
            "_source": {
               "name": "test document",
               "testArray": [
                  {
                     "counter": 0,
                     "str": "000"
                  },
                  {
                     "counter": 1,
                     "str": "000"
                  },
                  {
                     "counter": 2,
                     "str": "000"
                  }
               ]
            }
         }
      ]
   }
}

Now, when I change the constants SIZE1 and SIZE2 to 100 and 1000 respectively, the document is not created! You can play with the values to determine exactly when the error occurs.

To retrieve the document, I use the Sense plugin for Chrome and the following command:

POST /tempindex/document/_search?pretty=1
{
     "query" : {
        "match_all" : {}
    }
}

Any ideas?


(Christoph) #5

I cannot reproduce this; the test code seems to be missing something. I left a comment on the GitHub issue.


(Ane18s) #6

Your response solved the problem:

Just calling execute() on the IndexRequestBuilder will return a Future, but that only starts the index operation. If your test now aborts, the cluster might not have received it. Instead, like when issuing an index request via REST, you should wait for the response using execute().actionGet(). There's even a handy shortcut for this with the get() method on all ActionRequestBuilders.

Calling execute().actionGet() was the solution.

Thank you!
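
For anyone who lands here later, the effect can be illustrated without a cluster. The following is only a sketch using plain java.util.concurrent, not the Elasticsearch client: indexAsync is a hypothetical stand-in for prepareIndex(...).execute(), and the daemon worker thread is an assumption chosen so that the JVM would be free to exit before the task finishes if main did not wait.

```java
import java.util.concurrent.*;

public class WaitForResponseSketch {

    // Hypothetical stand-in for an asynchronous index call: it returns a
    // Future immediately and does the actual work on another thread.
    static Future<String> indexAsync(ExecutorService pool, String doc) {
        return pool.submit(() -> {
            Thread.sleep(200); // simulate sending a large request
            return "indexed " + doc.length() + " chars";
        });
    }

    public static void main(String[] args) throws Exception {
        // Daemon worker thread (an assumption for this sketch): it does not
        // keep the JVM alive, so returning from main() would drop pending work.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });

        Future<String> pending = indexAsync(pool, "0000");
        // At this point the operation has only been started.
        System.out.println("done right after submitting: " + pending.isDone());

        // Blocking on the Future plays the role of execute().actionGet().
        String response = pending.get(5, TimeUnit.SECONDS);
        System.out.println("after waiting: " + response);
    }
}
```

In the test program above, the corresponding change would be to end the last statement with .execute().actionGet() (or the get() shortcut mentioned in the quote) instead of just .execute().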

