Documents indexed, but cannot 'GET' them

mp2893 · April 9, 2012, 5:13am

Hi,

I have an index named 'news' and a mapping type named 'document'.
'news' has 5 shards and 2 replicas.
Below is the mapping of 'document':

{
  "document" : {
    "_parent" : {
      "type" : "cluster"
    },
    "_routing" : {
      "required" : true
    },
    "_source" : {
      "enabled" : false
    },
    "properties" : {
      "clusterid" : {
        "type" : "string",
        "store" : "yes"
      },
      "company" : {
        "type" : "string"
      },
      "companyNum" : {
        "type" : "string"
      },
      "count" : {
        "type" : "integer"
      },
      "date" : {
        "type" : "date",
        "store" : "yes",
        "format" : "YYYY-MM-dd"
      },
      "text" : {
        "type" : "string",
        "analyzer" : "snowball",
        "term_vector" : "with_positions_offsets"
      },
      "title" : {
        "type" : "string",
        "boost" : 2.0,
        "analyzer" : "snowball",
        "store" : "yes",
        "term_vector" : "with_positions_offsets"
      },
      "url" : {
        "type" : "string"
      }
    }
  }
}

Don't mind the '_parent'. I don't think that is relevant.

I bulk-indexed 427410 JSON documents and got no errors; everything seemed to go fine.
Below is the code for bulk indexing (only part of it, actually):

public void insertDocumentBulk(ArrayList<String> lineList, String documentMapping) throws Exception
{
    BulkRequestBuilder brb = client.prepareBulk();

    for (String line : lineList)
    {
        JSONObject jobj = new JSONObject(line);
        String id = jobj.getString("docid");
        String clusterId = jobj.getString("clusterid");

        // Normalize the date and flatten the sentence array into one string
        String dateFormat = convertDateFormat(jobj.getInt("date"));
        JSONArray sentArray = jobj.getJSONArray("text");
        String text = jsonArrayToString(sentArray);
        jobj.put("text", text);
        jobj.put("date", dateFormat);
        jobj.remove("docid");

        // Index as a child of the 'cluster' document whose id is clusterId
        brb.add(client.prepareIndex(index, documentMapping, id)
                .setParent(clusterId)
                .setSource(jobj.toString()));
    }

    BulkResponse bulkResponse = brb.execute().actionGet();

    // Count every failed item; hasFailures() alone only tells you whether any item failed
    int count = 0;
    for (BulkItemResponse item : bulkResponse)
    {
        if (item.isFailed())
        {
            count++;
        }
    }
    System.out.println("error count: " + count);
}

But when I try to GET some of the documents, I get nothing.
For example, the query
"curl -XGET 'http://etridorm.iptime.org:9200/news/document/20110803_0_96464759519569618?fields=title'"
returns
{"_index":"news","_type":"document","_id":"20110803_0_96464759519569618","exists":false}

It's not that none of the documents can be retrieved: some of them are
accessible, but some of them aren't.
The funny thing is that the query
"curl -XGET 'http://etridorm.iptime.org:9200/news/document/_count?q=*'"
returns {"count":427410,"_shards":{"total":5,"successful":5,"failed":0}},
which is exactly the number of documents I had indexed.

I flushed (curl -XPOST 'http://etridorm.iptime.org:9200/news/document/_flush')
and refreshed (curl -XPOST 'http://etridorm.iptime.org:9200/news/document/_refresh'),
but neither improved the situation.

Am I doing something wrong here?
I'd appreciate any help.

Ed

mp2893 · April 10, 2012, 12:14am

I've found the answer to my problem myself (as is often the case).
If a document was indexed with "_parent" set, you need to specify the
"routing" parameter when you GET it.
For example, if you index a document with the id "1234" and the parent
"5678", the GET command should be
"curl -XGET 'http://localhost:9200/index/type/1234?routing=5678'".
Hope this helps someone like me.
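
A minimal sketch of building that routed GET URL in Java, assuming the ids are plain strings; the class and method names are made up for illustration. The routing value has to be exactly the parent id that was passed to setParent() at index time (in the bulk code above, the document's clusterid). With the Java client, the equivalent should be calling setRouting(parentId) on the builder returned by client.prepareGet(index, type, id), but check the method names against your client version.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RoutedGetUrl {

    // URL-encodes one value (form-style encoding, which leaves letters,
    // digits, '_' and '-' unchanged, so ids like the ones in this thread survive as-is).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Builds the GET URL for a child document. The child was stored on the
    // shard chosen from its parent's id, so that id goes in ?routing=.
    public static String build(String baseUrl, String index, String type,
                               String id, String parentId) {
        return baseUrl + "/" + enc(index) + "/" + enc(type) + "/" + enc(id)
                + "?routing=" + enc(parentId);
    }

    public static void main(String[] args) {
        // Child "1234" whose parent is "5678", as in the post.
        System.out.println(build("http://localhost:9200", "index", "type", "1234", "5678"));
        // prints http://localhost:9200/index/type/1234?routing=5678
    }
}
```
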

kimchy · April 11, 2012, 11:31am

Yeah, that's because the child document is routed based on the parent id, so
parent and child end up on the same shard.
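
This can be illustrated with a small simulation. Elasticsearch picks a shard with shard = hash(routing) % number_of_primary_shards, where routing defaults to the document id; indexing with setParent() makes the parent id the routing value. A GET without ?routing= therefore hashes the child's own id and checks a single shard, which only coincidentally (roughly one time in five with 5 shards) is the shard holding the child, while _count runs on every shard and sees all documents. The sketch is hypothetical: class and method names are invented, all children share one made-up parent id, and String.hashCode() stands in for Elasticsearch's real hash function, so only the mechanism, not the shard numbers, carries over.

```java
public class ShardRoutingSketch {

    // Stand-in for shard = hash(routing) % number_of_primary_shards.
    // Elasticsearch uses its own hash function rather than String.hashCode(),
    // so the shard numbers here will not match a real cluster.
    public static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    // Counts how many children "doc-0" .. "doc-(n-1)", all indexed under one
    // parent, a GET without ?routing= would find. Such a GET routes on the
    // child's own id instead of the parent's.
    public static int plainGetHits(String parentId, int n, int shards) {
        int storedOn = shardFor(parentId, shards); // every child lives on the parent's shard
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (shardFor("doc-" + i, shards) == storedOn) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int shards = 5;                // 'news' has 5 primary shards
        String parentId = "cluster-1"; // hypothetical parent id
        int n = 1000;
        System.out.println("parent routes to shard " + shardFor(parentId, shards));
        System.out.println("GET without routing finds "
                + plainGetHits(parentId, n, shards) + " of " + n);
        // A GET with ?routing=cluster-1 computes shardFor(parentId, shards)
        // and therefore always lands on the shard that holds the child.
    }
}
```

Replicas do not change the picture: each replica is a copy of one shard, so it holds the same subset of documents as its primary.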

Allan_Johns · June 18, 2013, 4:39am

So I can't GET a document with a known ID unless I know its parent's ID as
well?

This really throws a spanner in the works for me. I started getting the
same problem (random GETs failing) when I added parent-child documents to
my db, but I was relying on being able to get a document knowing only its
ID. Is there a workaround?

thx
A
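
One possible workaround, sketched here under assumptions, is to look the id up on every shard instead of just one: a search request sent without routing (for example an ids query on the document id) is executed on all shards, so it can find a child whose parent is unknown, at the cost of querying every shard. Other options are to keep your own child-to-parent mapping, or to choose child ids from which the parent id can be derived. The in-memory model below is hypothetical (class and method names are invented, String.hashCode() replaces Elasticsearch's hash) and only contrasts a single-shard lookup with an all-shard one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class BroadcastLookupSketch {

    // One map per primary shard: document id -> stored document.
    private final List<Map<String, String>> shards = new ArrayList<>();

    public BroadcastLookupSketch(int numberOfShards) {
        for (int i = 0; i < numberOfShards; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Stand-in hash; Elasticsearch uses its own function.
    private int shardFor(String routing) {
        return Math.floorMod(routing.hashCode(), shards.size());
    }

    // Indexing a child with a parent: the parent id is the routing value.
    public void indexChild(String id, String parentId, String doc) {
        shards.get(shardFor(parentId)).put(id, doc);
    }

    // GET without ?routing=: only the shard chosen from the id itself is checked.
    public Optional<String> get(String id) {
        return Optional.ofNullable(shards.get(shardFor(id)).get(id));
    }

    // GET with ?routing=<parentId>.
    public Optional<String> get(String id, String routing) {
        return Optional.ofNullable(shards.get(shardFor(routing)).get(id));
    }

    // Workaround when only the id is known: check every shard, the way a
    // search without routing fans out. Costs one lookup per shard instead of one.
    public Optional<String> findOnAllShards(String id) {
        for (Map<String, String> shard : shards) {
            String doc = shard.get(id);
            if (doc != null) {
                return Optional.of(doc);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        BroadcastLookupSketch index = new BroadcastLookupSketch(5);
        for (int i = 0; i < 100; i++) {
            index.indexChild("doc-" + i, "cluster-1", "body of doc-" + i);
        }
        int plain = 0;
        int broadcast = 0;
        for (int i = 0; i < 100; i++) {
            if (index.get("doc-" + i).isPresent()) plain++;
            if (index.findOnAllShards("doc-" + i).isPresent()) broadcast++;
        }
        System.out.println("plain GET: " + plain + "/100, all-shard lookup: " + broadcast + "/100");
    }
}
```

The trade-off is cost: a routed GET touches one shard, while the all-shard lookup touches every shard, which matters if you do it for every request.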
