Problem with _id path setting and fields with periods

Hello everyone! Searched the group for an answer to this question but
couldn't find anything. My apologies if something similar was posted.

We've created a mapping according to this documentation to try to tell
ElasticSearch where to pull the '_id' field from:
http://www.elasticsearch.org/guide/reference/mapping/id-field.html

The mapping looks like:

{
  "document" : {
    "_id" : {
      "path" : "A.field"
    }
  }
}

This does not seem to work, because ES is treating the period as JSON
dot-notation. We've tried to escape the period to no avail. So the
documents we post look like:

{"A.field": "4000/1"}

But ES expects it to look like:

{"A": {"field": "4000/1"}}

So we end up having an auto-generated "_id" anyways.
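For illustration, here is a rough Python sketch of turning the flat form into the nested form before posting, so that a path like "A.field" can resolve. The helper name nest_dotted_keys is made up for this example and is not part of any ES client; it also assumes no key is both a leaf and a prefix of another key.

```python
def nest_dotted_keys(flat):
    """Expand keys such as "A.field" into nested dicts: {"A": {"field": ...}}.

    Hypothetical helper for illustration only. It assumes no key is both a
    value and a parent (e.g. "A" and "A.field" in the same document).
    """
    nested = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = nested
        # Walk (or create) one dict level per path component except the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


doc = nest_dotted_keys({"A.field": "4000/1"})
print(doc)  # {'A': {'field': '4000/1'}}
```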

A couple of questions:

  • Are there other "gotchas" like this caused by having periods in fields?
    Is that just generally a bad idea?
  • Is it recommended to not overwrite the "_id" field? Our IDs for our
    documents look like that above; are string IDs a bad idea? Is the slash a
    problem? Does it affect hashing/distribution of documents?

Thanks in advance for the help!

- Ash

--

Hiya Ash

  • Are there other "gotchas" like this caused by having periods in
    fields? Is that just generally a bad idea?

It is generally a bad idea.

  • Is it recommended to not overwrite the "_id" field? Our IDs for our
    documents look like that above; are string IDs a bad idea? Is the
    slash a problem? Does it affect hashing/distribution of documents?

It is fine to override the _id field. The slash may be tricky, because
it needs to be passed in the URL, so you should make sure that it is
always URI escaped: 4000%2F1
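As a small sketch of that escaping in Python: the standard library's quote() leaves "/" untouched by default, so it has to be told to encode it. The index name "myindex" below is invented for the example; "document" is the type from the mapping above.

```python
from urllib.parse import quote


def escape_id(doc_id: str) -> str:
    """Percent-encode a document ID for use as a single URL path segment."""
    # quote() treats "/" as safe by default, so pass safe="" to encode it too.
    return quote(doc_id, safe="")


# "myindex" is a made-up index name; "document" is the type from the mapping.
path = "/myindex/document/" + escape_id("4000/1")
print(path)  # /myindex/document/4000%2F1
```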

clint

--

Thanks for the quick reply Clinton!

I figured it would be a bad idea to have periods in fields (due to Object
type and JSON dot notation for referencing nested fields), but the system
we are migrating from has a period in just about every field. I am assuming
there is no such thing as "field aliasing" in ES? Could not confirm or
deny this based on the brief research I did.

And thanks for the heads up with the slashes in the _id field. Figured it
had to be URI escaped.

- Ash

On Tuesday, January 22, 2013 9:48:27 AM UTC-5, Clinton Gormley wrote:

--

Hiya Ash

I figured it would be a bad idea to have periods in fields (due to
Object type and JSON dot notation for referencing nested fields), but
the system we are migrating from has a period in just about every
field. I am assuming there is no such thing as "field aliasing" in ES?
Could not confirm or deny this based on the brief research I did.

You could try using the 'index_name' parameter in the field mapping to
control the field names in the index, but that introduces a level of
complexity that would be best avoided.

You say that you're migrating from an old system - it'd surely be better
in the long run to go ahead and rename your fields.
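A minimal sketch of such a rename during migration, in Python. Replacing the period with an underscore is just one possible convention, and rename_fields is a hypothetical helper name, not anything ES provides.

```python
def rename_fields(doc, sep="_"):
    """Return a copy of doc with periods in top-level field names replaced.

    Hypothetical helper; the underscore separator is an arbitrary choice.
    """
    renamed = {}
    for key, value in doc.items():
        new_key = key.replace(".", sep)
        # Refuse to silently overwrite, e.g. when "A.field" and "A_field" both exist.
        if new_key in renamed:
            raise ValueError("field name collision on %r" % new_key)
        renamed[new_key] = value
    return renamed


print(rename_fields({"A.field": "4000/1"}))  # {'A_field': '4000/1'}
```

With the field renamed this way, the "_id" path in the mapping would point at "A_field" instead of "A.field".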

clint

--