I have dates mapped as strings, which was a mistake, and I need them mapped as longs so as to sort on them successfully (as I've been advised in previous threads - thanks!).
How can I handle this migration? A few options occur to me:
Index into a completely new index. When it is done, switch over my queries.
Create a new field. The old one (the string) is called "receivedDate"; I could create "receivedDateAsLong", update all the documents to fill in that field, and, once done, sort on that field.
Do something with multi fields. In experiments, I found that I can take a non-multi field and turn it into a multi field. However, I wonder how I'd tell the sort which instance of the multi field to sort on.
Something else I'm not thinking of that you might suggest.
Or maybe none of these is the right idea. In my ideal world, I'd be left with one non-multi field holding a long (the string field going away), and I would not need to rebuild an entirely new index. I don't know if that will be possible.
Also, if possible, I'd like to index the receivedDate field without having to get all the other fields. (I don't have the source anymore; I disabled it, and the original data for all the other fields is a pain for me to get at again). Again, this may not be possible, but it is a definite preference.
Your simplest and cleanest solution going forward is to completely reindex the data using the date mapping (and not string). Even if you add the new date mapping as an additional mapping, you will still need to reindex all the data (there is no option to update just one field in a doc), so the more efficient approach is to index into a new index.
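A minimal sketch of the two pieces such a reindex needs, assuming the stored strings hold epoch milliseconds (the question says they are long values mapped as strings). The request body follows the shape used by recent Elasticsearch versions, where a numeric value sent to a "date" field is read as epoch milliseconds under the default format; older versions nest "properties" under a type name, so adapt it to your version. `migrate_doc` and the "subject" field are hypothetical.

```python
import json

# Hypothetical body for creating the new index (recent Elasticsearch shape;
# older versions put "properties" under a mapping type name).
NEW_INDEX_BODY = {
    "mappings": {
        "properties": {
            "receivedDate": {"type": "date"}
        }
    }
}


def migrate_doc(doc: dict) -> dict:
    """Return a copy of `doc` whose receivedDate is an int (epoch ms) instead of a numeric string."""
    migrated = dict(doc)
    migrated["receivedDate"] = int(str(doc["receivedDate"]).strip())
    return migrated


if __name__ == "__main__":
    print(json.dumps(NEW_INDEX_BODY))
    # "subject" is a made-up field standing in for the rest of the document.
    print(migrate_doc({"receivedDate": "1300000000000", "subject": "hello"}))
```

Each migrated document is then indexed into the new index, and queries are switched over once it is complete.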
Thanks for your help above. I am having trouble getting back at the original documents to reindex the data (a long and unfortunate story). So for now, I am trying to come up with a workaround until I can get back to the original data to reindex. My dates are currently indexed as:
A long value, but mapped as type string (foolish, I know).
The long cast to a float, mapped as type float (another mistake).
I'm very sorry to have to ask for help with a hacky workaround based on mistakenly indexed data, but I'm not going to be able to get out of that situation quickly.
When I try to sort on the long-stored-as-string, it's unreliable. If my requested result set size is a relatively high percentage of the matching docs (say, 35 of 100 matching docs), it works fine. If I want a low percentage of the matching docs (say, 35 of 1000 matching docs), they often don't come back sorted correctly (they seem to come back in the order indexed, not in the order of the long-stored-as-string value). Also, BTW, I can't get it to sort ASCENDING, but that is not so bad, as I want DESCENDING.
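One way a string sort can disagree with a numeric sort is that text is compared character by character, so numbers with different digit counts come out in the wrong order. A small sketch, with made-up timestamps; if every stored value has the same number of digits, text order and numeric order agree, so this alone may not explain the behaviour described above. Zero-padding to a fixed width (19 digits covers any non-negative long) is shown as one possible workaround, not as anything Elasticsearch does for you.

```python
def sort_desc_as_strings(values: list[str]) -> list[str]:
    """Descending sort comparing the values as text, character by character."""
    return sorted(values, reverse=True)


def sort_desc_as_numbers(values: list[str]) -> list[str]:
    """Descending sort comparing the values as integers."""
    return sorted(values, key=int, reverse=True)


def zero_pad(value: str, width: int = 19) -> str:
    """Left-pad a non-negative integer string so that text order matches numeric order."""
    return value.zfill(width)


if __name__ == "__main__":
    # Hypothetical epoch-ms values: one from 2001 (12 digits), one from 2011 (13 digits).
    stamps = ["999999999999", "1300000000000"]
    print(sort_desc_as_strings(stamps))  # ['999999999999', '1300000000000']
    print(sort_desc_as_numbers(stamps))  # ['1300000000000', '999999999999']
    padded = [zero_pad(s) for s in stamps]
    print(sort_desc_as_strings(padded) == [zero_pad(s) for s in sort_desc_as_numbers(stamps)])  # True
```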
When I try to sort on the long cast to a float (stored as type float), and my query term matches a large number of documents (thousands), it is again unreliable. Sometimes it works fine (generally the first query after restarting the client). In subsequent queries, I tend to be missing the most recent month or so, but the earlier results do come back and are sorted correctly.
I should add: when I try to sort on the long cast to a float (stored as type float) and my query term matches a small number of documents, it works fine.
I'm not sure I understand, then: is it a problem in ES and how it sorts (based on the actual mapped types), or a problem with the long losing resolution as a float, and with the long being represented as a string?
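To make the "losing resolution" point concrete: a float field holds a 32-bit IEEE 754 value with a 24-bit significand, while an epoch-millisecond timestamp from around 2011 needs about 41 bits. A minimal sketch, assuming the dates are epoch milliseconds; the timestamps are made up. Near 1.3e12 ms, adjacent float32 values are 131072 ms (a little over two minutes) apart, so documents received within that window can store the same float and their relative sort order becomes arbitrary.

```python
import struct


def to_float32(value: int) -> float:
    """Round a long to the nearest IEEE 754 single-precision value, as a 32-bit float field stores it."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def float32_gap(value: int) -> float:
    """Distance from the float32 nearest `value` to the next larger float32 (positive values only)."""
    f = to_float32(value)
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - f


if __name__ == "__main__":
    t1 = 1_300_000_000_000   # hypothetical receivedDate, March 2011
    t2 = t1 + 60_000         # one minute later
    print(to_float32(t1))                    # 1300000014336.0
    print(to_float32(t1) == to_float32(t2))  # True: both collapse to the same float
    print(float32_gap(t1))                   # 131072.0 ms between adjacent floats
```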
You can try to use the scrolling search API to paginate through the docs and reindex them, though it gets quite expensive as you paginate further. If you can chunk the search requests into separate ones yourself (for example, based on time, or something else), then you can run a search for each chunk and reindex the data.
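The chunking idea can be sketched as a loop over time windows, with one search and one batch of index requests per window. This is a minimal sketch under stated assumptions: `search_window` and `index_doc` are hypothetical callables you would back with your own client (for example, a range query on whichever date field you can still filter on), and it assumes the search responses carry the data each document needs, which may not hold with the source disabled.

```python
from typing import Callable, Iterable, Iterator, Tuple


def time_windows(start_ms: int, end_ms: int, step_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) windows that together cover [start_ms, end_ms)."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + step_ms, end_ms)
        yield lo, hi
        lo = hi


def reindex_in_chunks(
    start_ms: int,
    end_ms: int,
    step_ms: int,
    search_window: Callable[[int, int], Iterable[dict]],  # hypothetical: docs with lo <= date < hi
    index_doc: Callable[[dict], None],                     # hypothetical: writes one doc to the new index
) -> int:
    """Search each window separately and reindex its hits; return the number of docs reindexed."""
    count = 0
    for lo, hi in time_windows(start_ms, end_ms, step_ms):
        for doc in search_window(lo, hi):
            index_doc(doc)
            count += 1
    return count


if __name__ == "__main__":
    # In-memory stand-ins for the real search and index calls.
    docs = [{"receivedDate": t} for t in (5, 12, 18, 24)]
    target: list[dict] = []
    n = reindex_in_chunks(
        0, 25, 10,
        search_window=lambda lo, hi: [d for d in docs if lo <= d["receivedDate"] < hi],
        index_doc=target.append,
    )
    print(n, target)
```

Because the windows are half-open and contiguous, each document falls into exactly one chunk, so none is reindexed twice; keeping each window small keeps each search cheap compared with paging deep into one large result set.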