This does not seem to work, because ES is treating the period as JSON
dot-notation. We've tried to escape the period to no avail. So the
documents we post look like:
{"A.field": "4000/1"}
But ES expects it to look like:
{"A": {"field": "4000/1"}}
So we end up having an auto-generated "_id" anyways.
A couple of questions:
Are there other "gotchas" like this caused by having periods in
fields? Is that just generally a bad idea?
It is generally a bad idea.
Is it recommended to not overwrite the "_id" field? Our IDs for our
documents look like that above; are string IDs a bad idea? Is the
slash a problem? Does it affect hashing/distribution of documents?
It is fine to override the _id field. The slash may be tricky, because
it needs to be passed in the URL, so you should make sure that it is
always URI escaped: 4000%2F1
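For illustration, here is a minimal Python sketch of that escaping. The index and type names ("myindex", "mytype") and the helper name document_path are placeholders made up for this example, not anything from the thread. The point is only that the slash has to be percent-encoded before the ID goes into the URL path:

```python
from urllib.parse import quote


def document_path(index: str, doc_type: str, doc_id: str) -> str:
    """Build the URL path for a document, percent-encoding the ID."""
    # safe="" makes quote() encode "/" as well; by default it leaves "/" alone.
    return "/{}/{}/{}".format(index, doc_type, quote(doc_id, safe=""))


print(document_path("myindex", "mytype", "4000/1"))
# prints /myindex/mytype/4000%2F1
```

Most HTTP client libraries will not do this for you when the ID is pasted into the path by hand, so it is probably worth escaping in one place.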
I figured it would be a bad idea to have periods in fields (due to Object
type and JSON dot notation for referencing nested fields), but the system
we are migrating from has a period in just about every field. I am assuming
there is no such thing as "field aliasing" in ES? Could not confirm or
deny this based on the brief research I did.
And thanks for the heads up with the slashes in the _id field. Figured it
had to be URI escaped.
Ash
On Tuesday, January 22, 2013 9:48:27 AM UTC-5, Clinton Gormley wrote:
Hiya Ash
You could try using the 'index_name' parameter in the field mapping to
control the field names in the index, but that introduces a level of
complexity that would be best avoided.
You say that you're migrating from an old system - it'd surely be better
in the long run to go ahead and rename your fields.
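As a rough sketch of what renaming could look like during the migration, the Python below shows two possible approaches applied to a document before it is posted: replacing the periods with another separator, or expanding the dotted names into the nested form ES expects. Both helper names (flatten_names, nest_names) and the underscore separator are assumptions for the example, not something prescribed in this thread:

```python
def flatten_names(doc, sep="_"):
    """Return a copy of doc with every "." in a key replaced by sep."""
    if isinstance(doc, dict):
        return {key.replace(".", sep): flatten_names(value, sep)
                for key, value in doc.items()}
    if isinstance(doc, list):
        return [flatten_names(item, sep) for item in doc]
    return doc


def nest_names(doc):
    """Return a copy of doc with dotted top-level keys expanded into objects.

    Assumes no key is both a leaf and a prefix of another key
    (e.g. "A" and "A.field" in the same document).
    """
    nested = {}
    for key, value in doc.items():
        parts = key.split(".")
        target = nested
        # Walk (or create) the intermediate objects for all but the last part.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


print(flatten_names({"A.field": "4000/1"}))  # {'A_field': '4000/1'}
print(nest_names({"A.field": "4000/1"}))     # {'A': {'field': '4000/1'}}
```

Which of the two fits better depends on how the fields are queried afterwards. The nested form matches the structure shown earlier in the thread, while the flat form keeps every field at the top level.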