Indexing documents with complex _id fields

I'm importing data from a MongoDB store that uses subdocuments for the _id
field, and I'm having trouble searching the data that I'm importing. I've
reduced the problem to the following example.

Insert two documents, one with a complex _id:

curl -XPOST "localhost:9200/index/test" -d '{"key":"value"}'
curl -XPOST "localhost:9200/index/test" -d '{"_id":{"name":"sfrenkiel"},"key":"value2"}}}'

Fetching the first works, of course:

curl "localhost:9200/index/test/_search?pretty=true" -d '{"query":{"term":{"key":"value"}}}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "index",
      "_type" : "test",
      "_id" : "A4AWuBXdQZqLxjxHK_jk9g",
      "_score" : 0.30685282, "_source" : {"key":"value"}
    } ]
  }
}

But fetching the second does not:

curl "localhost:9200/index/test/_search?pretty=true" -d '{"query":{"term":{"key":"value2"}}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Using the following to list the indexed values reveals that only the first
value has been indexed (if I understand correctly what this command is
doing):

curl -XGET "http://localhost:9200/index/_search?pretty=true" -d '{"query": {"match_all": {}}, "facets": {"tag": {"terms": {"field": "key"}}}, "size": 0}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "tag" : {
      "_type" : "terms",
      "missing" : 1,
      "total" : 1,
      "other" : 0,
      "terms" : [ {
        "term" : "value",
        "count" : 1
      } ]
    }
  }
}

I've tried different mapping strategies and every option I can think of,
and nothing seems to work. Strangely, entering the query for the second
record into the head plugin seems to work, but the same query sent through
curl (or the Java driver) does not. Can anyone help me out?
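
A side observation on the example: the second request body ends with two extra closing braces, so it is not a single well-formed JSON value. Whether Elasticsearch rejects, truncates, or tolerates that trailing text is not established in this thread, so the following is only a sanity check, sketched with Python's standard json module. The helper name is_valid_json is made up for illustration.

```python
import json

# The two request bodies exactly as posted in the example above.
first_body = '{"key":"value"}'
second_body = '{"_id":{"name":"sfrenkiel"},"key":"value2"}}}'


def is_valid_json(text):
    """Return True if text is exactly one complete JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(first_body))        # True
print(is_valid_json(second_body))       # False: extra data after the object
print(is_valid_json(second_body[:-2]))  # True once the two stray braces are dropped
```

Re-running the test with the stray braces removed would at least rule the payload out as a factor.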

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

I haven't tested anything, but I think if you

you might get this behaviour. Try recreating your index with the mapping
configured correctly, and I'd guess it will work.

--Alex

(In reply to Scott Frenkiel's original message of Sat, Aug 3, 2013, 9:48 PM.)

Thanks for responding.

  • Do a POST without specifying an ID in the URL (which means the id is
    autogenerated)

I'm not sure how to insert a record with the _id specified in the URL when
the _id is a subdocument anyway, so POST is the only method I know.

I also tried creating the index with this mapping:

{ "mappings": { "type" : { "properties" : { "_id" : { "path" : "_id" } } } } }

and got the error "Trying to parse an object but has a different type". I
have no idea what that means.

Your post prompted me to look more closely at how my two documents are
stored. The one with the complex id looks like:

"hits" : [ {
  "_index" : "index",
  "_type" : "test",
  "_id" : "oPC7SzPXTEa8wT9iZ6S5gw",
  "_score" : 1.0, "_source" : {"_id":{"name":"sfrenkiel"},"key":"value2"}}}
}, {

So I notice that the _id field in the document source is different from the
top-level _id. When using the MongoDB river, I see that the top-level _id
and the _id from the source are the same. Regardless, I see the same
problematic behavior in both cases (the other fields in the original source
doc are not indexed).

For those not familiar with MongoDB, it uses the same _id key for primary
keys. So is there something special about inserting a record that has a
complex _id field? As an additional test, I inserted a doc with a complex
field NOT called _id, and was able to query the other fields in the
document just fine.
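
That last test suggests one possible workaround, which is not confirmed as a fix anywhere in this thread: move the subdocument out of _id before indexing and derive a plain string id from it. A minimal Python sketch, assuming the documents are available as dicts; the function name prepare_for_indexing and the field name mongo_id are hypothetical choices for illustration.

```python
import json


def prepare_for_indexing(mongo_doc, id_field="mongo_id"):
    """Split a MongoDB-style document into (string id, body without _id).

    id_field is a hypothetical field name chosen for this sketch.
    """
    body = dict(mongo_doc)  # shallow copy so the caller's document is untouched
    raw_id = body.pop("_id", None)
    if raw_id is None:
        return None, body  # no _id present: the id can be autogenerated
    if isinstance(raw_id, dict):
        # Sorted keys make the same subdocument always yield the same string.
        doc_id = json.dumps(raw_id, sort_keys=True, separators=(",", ":"))
        body[id_field] = raw_id  # keep the original subdocument under another name
    else:
        doc_id = str(raw_id)
    return doc_id, body


doc_id, body = prepare_for_indexing({"_id": {"name": "sfrenkiel"}, "key": "value2"})
print(doc_id)  # {"name":"sfrenkiel"}
print(body)    # {'key': 'value2', 'mongo_id': {'name': 'sfrenkiel'}}
```

The derived id contains braces and quotes, so it would need URL-encoding (for example with urllib.parse.quote) before being placed in a request path.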

Thanks,
Scott

(In reply to Alexander Reelsen's message of Mon, Aug 5, 2013, 4:25 AM.)
