How to migrate a field type


(John Chang) #1

I have dates mapped as strings, which was a mistake, and I need them mapped as longs so as to sort on them successfully (as I've been advised in previous threads - thanks!).

How can I handle this migration? A few options occur to me:

  1. Index into a completely new index. When it is done, switch over my queries.
  2. Create a new field. The old one (the string) is called "receivedDate". I could create "receivedDateAsLong" and then update all the documents to fill in that field. Once done, sort on that field.
  3. Do something with multi fields. In experiments, I found that I can take a non-multi field and turn it into a multi. However, I do wonder how I'd tell the sort which instance of the multi field to sort on.
  4. Something else I'm not thinking of that you might suggest.

Or, maybe none of these are the right idea. In my ideal world, I'd be left with 1 non-multi field with a long (the string field going away) and I would not need to rebuild an entirely new index. I don't know if that will be possible.

Also, if possible, I'd like to index the receivedDate field without having to get all the other fields. (I don't have the source anymore; I disabled it, and the original data for all the other fields is a pain for me to get at again). Again, this may not be possible, but it is a definite preference.

Thanks.


(Shay Banon) #2

Your simplest and cleanest solution going forward is to completely reindex
the data using the date mapping (and not string). Even if you add the new
date mapping as additional mapping, you will still need to reindex all the
data again (no option to update just one field in a doc), so a more
optimized manner is to just index into a new doc.
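
A minimal sketch of what the reindex would change, assuming the dates are epoch milliseconds held as strings. Only the field name "receivedDate" comes from this thread; the sample document and the helper name are hypothetical, and the envelope around "properties" differs between Elasticsearch versions, so treat the mapping as a fragment rather than a complete request body.

```python
import json

# Mapping fragment declaring receivedDate as a date instead of a string.
# Assumption: the surrounding request envelope depends on the ES version in use.
DATE_MAPPING = {"properties": {"receivedDate": {"type": "date"}}}


def to_epoch_millis(doc):
    """Return a copy of doc with receivedDate parsed from a string to an int."""
    fixed = dict(doc)
    fixed["receivedDate"] = int(doc["receivedDate"])
    return fixed


# Hypothetical document as it might look before and after conversion.
old_doc = {"receivedDate": "1292632064000", "subject": "hello"}
new_doc = to_epoch_millis(old_doc)

print(json.dumps(DATE_MAPPING, indent=2))
print(json.dumps(new_doc))
```

Each converted document would then be indexed into the new index created with the date mapping.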

On Sat, Dec 18, 2010 at 2:10 AM, John Chang jchangkihtest2@gmail.com wrote:

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-migrate-a-field-type-tp2108002p2108002.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.


(John Chang) #3

Thanks for your help above. I am having trouble getting back at the original documents to reindex the data (a long and unfortunate story). So for now, I am trying to come up with a work-around until I can get back to the original data to reindex. My dates are currently indexed as:

  1. Long value, but as type string. (foolish, I know)
  2. Long cast to a float, as type float. (another mistake)

I'm very sorry to have to ask for help on a hacky work-around based on mistakenly indexed data, but I'm not going to be able to get out of that situation quickly.

When I try to sort on the long-stored-as-string, it's unreliable. If my requested result set size is a relatively high percentage of the matching docs (say, 35 of 100 matching docs), it works fine. If I want a low percentage of matching docs (say, 35 of 1000 matching docs), they often don't come back sorted right (they seem to come back in the order indexed, not in the order of the long-stored-as-string value). Also, BTW, I can't get it to sort ASCENDING, but that is not so bad, as I want DESCENDING.

When I try to sort off the long cast to a float (stored as type float), if my query term matches a large number of documents (thousands), then it is again unreliable. Sometimes it works fine (generally the first query after restarting the client). In subsequent queries, I tend to be missing the most recent month or so, but the results before that in time do come back and are sorted correctly.

Thanks,
John


(John Chang) #4

I should add: When I try to sort off the long cast to a float (stored as type float), if my query term matches a small number of documents, it works fine.


(Shay Banon) #5

I am not sure I understand then, is it a problem in ES and how it sorts
(based on the actual types mapped), or the problem with long losing
resolution as float, and long represented as string?
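
To illustrate the two data problems named above, here is a small sketch that does not touch ES at all. It assumes the dates are epoch milliseconds; the sample timestamps are made up for the demonstration. A 32-bit float keeps only 24 significant bits, so nearby timestamps can collapse into one value, and strings compare character by character rather than numerically.

```python
import struct


def as_float32(value):
    """Round-trip a number through a 32-bit float and return it as an int."""
    packed = struct.pack("f", float(value))
    return int(struct.unpack("f", packed)[0])


# Two made-up timestamps one second apart (December 2010, epoch millis).
t1 = 1292632064000
t2 = t1 + 1000

# Both land on the same float32 value, so their order cannot be recovered.
print(as_float32(t1), as_float32(t2))

# Strings sort lexicographically, which only matches numeric order
# when every value has the same number of digits.
print(sorted(["9", "10", "100"]))
print(sorted([9, 10, 100]))
```

At this magnitude, adjacent float32 values are 131072 ms (roughly two minutes) apart, which would make the ordering of closely spaced documents arbitrary.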

You can try and use the scrolling search api to paginate through docs and
reindex them, though it gets expensive quickly as you paginate more. If you
can, yourself, chunk search requests into separate ones (for example, based
on time, or something else), then you can use search for each chunk and
reindex the data.
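
The chunking idea can be sketched roughly as follows, assuming there is some field whose values can be split into numeric ranges. The helper names are hypothetical, the request itself is left out, and the "gte"/"lt" keys follow the range query syntax of recent Elasticsearch releases; the parameter names may differ in older ones.

```python
def time_windows(start_ms, end_ms, step_ms):
    """Yield half-open (lo, hi) windows that cover [start_ms, end_ms)."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + step_ms, end_ms)
        yield lo, hi
        lo = hi


def range_query(field, lo, hi):
    """Build a search body matching docs with lo <= field < hi."""
    return {"query": {"range": {field: {"gte": lo, "lt": hi}}}}


# One search body per window; each chunk is searched and reindexed separately.
DAY_MS = 24 * 60 * 60 * 1000
bodies = [
    range_query("receivedDate", lo, hi)
    for lo, hi in time_windows(0, 3 * DAY_MS, DAY_MS)
]
print(len(bodies))
```

Half-open windows keep a document that sits exactly on a boundary from being picked up by two chunks.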

On Thu, Jan 6, 2011 at 8:16 PM, John Chang jchangkihtest2@gmail.com wrote:

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-migrate-a-field-type-tp2108002p2207200.html

