Java API: Indexing not analyzed text fields


(Ryan Chazen) #1

Hey

I'm inserting some documents using

Map<String, Object> map = new HashMap<>();
map.put("field1", "something");
// ...
client.prepareIndex("a", "a").setSource(map).get();

This inserts field1 as a text field with an analyzed index, but it's not
actually text, just a string ID or other data field. This is inside an ORM,
so I'd like it to be indexed as not analyzed automatically when the data is
inserted.
I can't seem to find anything in the API for defining the mapping at
indexing time. Have I missed something? I'd prefer not to have to run a
separate mapping request after inserting the data, as it would slow the ORM
down to be constantly checking whether fields are analyzed or not...
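A minimal sketch of the mapping being asked for, assuming Elasticsearch 1.x mapping syntax (`"type": "string"` with `"index": "not_analyzed"`; later versions replaced this with the `keyword` type). The JSON is built with plain string concatenation only to keep the example dependency-free, and field names are assumed not to need JSON escaping. In the 1.x Java API such a mapping can be supplied when the index is created, e.g. `client.admin().indices().prepareCreate("a").addMapping("a", json).get()`; check that call chain against your client version.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds an Elasticsearch 1.x-style type mapping whose listed string fields are not analyzed. */
final class NotAnalyzedMapping {

    private NotAnalyzedMapping() {
    }

    /**
     * Returns a mapping such as
     * {"a":{"properties":{"field1":{"type":"string","index":"not_analyzed"}}}}.
     * Assumes type and field names are plain identifiers that need no JSON escaping.
     */
    static String forType(String type, String... notAnalyzedFields) {
        List<String> properties = new ArrayList<>();
        for (String field : notAnalyzedFields) {
            // Each listed field is indexed as a single exact term, without tokenizing.
            properties.add("\"" + field + "\":{\"type\":\"string\",\"index\":\"not_analyzed\"}");
        }
        return "{\"" + type + "\":{\"properties\":{" + String.join(",", properties) + "}}}";
    }
}
```

For example, `NotAnalyzedMapping.forType("a", "field1")` yields the mapping for the `field1` case in the question.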

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fe818851-a429-47cc-b119-142c4f0933be%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ryan Chazen) #2

OK, never mind, I see how it's meant to be done now:

On app start, check whether the index already exists. If it doesn't, create
it with the correct mappings.
If it does, retrieve the mappings and compare them. If they're wrong, update
them with a put mapping request containing the correct mappings.

I'd guess that comparing mappings during every data insert would be very
slow, and that's why the API is set up this way. Makes sense.

So only one question: is the "retrieve mappings and compare" step
necessary? If I always send a put mapping request with the correct
mappings, will it just do nothing if they're already correct?
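The startup check described above might look like the following sketch. `IndexAdmin` is a hypothetical interface standing in for the Elasticsearch admin client (in the 1.x Java API the rough counterparts are the exists, create-index, get-mappings and put-mapping request builders under `client.admin().indices()`), and `InMemoryIndexAdmin` is a test fake. Comparing mappings as raw strings is a simplification: the mapping Elasticsearch returns can include defaults and a different key order, so a real comparison would need to parse and normalize both sides.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical view of the few index-admin operations the startup check needs. */
interface IndexAdmin {
    boolean indexExists(String index);

    void createIndex(String index, String type, String mappingJson);

    /** Returns the current mapping JSON, or null if the type has no mapping yet. */
    String getMapping(String index, String type);

    void putMapping(String index, String type, String mappingJson);
}

/** Create the index if it is missing; otherwise compare mappings and update if they differ. */
final class MappingSync {

    private MappingSync() {
    }

    /** Returns "created", "unchanged" or "updated" so the caller can log what happened. */
    static String ensureMapping(IndexAdmin admin, String index, String type, String wanted) {
        if (!admin.indexExists(index)) {
            admin.createIndex(index, type, wanted);
            return "created";
        }
        // String equality is a simplification: returned mappings may include defaults.
        if (wanted.equals(admin.getMapping(index, type))) {
            return "unchanged";
        }
        admin.putMapping(index, type, wanted);
        return "updated";
    }
}

/** In-memory fake for tests; a real implementation would delegate to the Elasticsearch client. */
final class InMemoryIndexAdmin implements IndexAdmin {
    private final Set<String> indices = new HashSet<>();
    private final Map<String, String> mappings = new HashMap<>(); // key: "index/type"

    @Override
    public boolean indexExists(String index) {
        return indices.contains(index);
    }

    @Override
    public void createIndex(String index, String type, String mappingJson) {
        indices.add(index);
        mappings.put(index + "/" + type, mappingJson);
    }

    @Override
    public String getMapping(String index, String type) {
        return mappings.get(index + "/" + type);
    }

    @Override
    public void putMapping(String index, String type, String mappingJson) {
        mappings.put(index + "/" + type, mappingJson);
    }
}
```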

On Saturday, February 8, 2014 11:30:56 PM UTC+2, Ryan Chazen wrote:


(Alexander Reelsen) #3

Hey,

you may want to use index templates, so that you do not have to check this
inside your application and can simply rely on the mapping being configured
as expected. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html

--Alex
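As a sketch of what such a template contains, assuming Elasticsearch 1.x, where the index-name pattern goes in a top-level `"template"` key (later versions renamed it `index_patterns`): the helper below wraps a type mapping into a template body. The class name `IndexTemplateBody` and the pattern `orm_*` are illustrative only. The resulting JSON would be registered once with `PUT /_template/{name}`; the 1.x Java API exposes the same operation through `client.admin().indices().preparePutTemplate(name)`. Any index created afterwards whose name matches the pattern picks up the mapping, so the application no longer has to check it.

```java
/** Builds the body of an Elasticsearch 1.x index template (PUT /_template/{name}). */
final class IndexTemplateBody {

    private IndexTemplateBody() {
    }

    /**
     * @param indexPattern index names the template applies to, e.g. "orm_*"
     * @param order        when several templates match, higher orders override lower ones
     * @param type         mapping type name
     * @param typeMapping  the JSON object for that type, e.g. {"properties":{...}}
     */
    static String build(String indexPattern, int order, String type, String typeMapping) {
        return "{\"template\":\"" + indexPattern + "\","
                + "\"order\":" + order + ","
                + "\"mappings\":{\"" + type + "\":" + typeMapping + "}}";
    }
}
```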

On Sat, Feb 8, 2014 at 10:51 PM, Ryan Chazen ryanza@gmail.com wrote:


(Ryan Chazen) #4

Hey, thanks, that looks useful. I'm guessing there is no Java API for it,
though? It's as if the Java API devs went halfway and then got bored.

On Mon, Feb 10, 2014 at 2:59 PM, Alexander Reelsen alr@spinscale.de wrote:


(Alexander Reelsen) #5

Hey,

no, not at all... basically every REST/HTTP action uses the Java API.
What makes you think so?
Just take a look at RestGetIndexTemplateAction and the other
Rest*IndexTemplateAction classes to see how it works...

--Alex

On Mon, Feb 10, 2014 at 5:01 PM, Ryan Chazen ryanza@gmail.com wrote:


(Ryan Chazen) #6

A Google search for "elasticsearch java api" + the feature name didn't
return anything, so I just assumed... my bad. Not that I'd ever have thought
to look for a Rest*IndexTemplateAction :)

I think I've tracked it down now though, thanks!

So if I set a template mapping for "_default_" as not analyzed, it will
make all indexes created with an explicit mapping be not analyzed, correct?
Then I could just set the particular indexes that need analysis (using
@Analyzed or similar in my class). Defaulting to not analyzed seems to make
sense, since most fields just need basic matching for property storage and
only things like names would need analysis.

Am I making a bad choice there? I'm assuming that using a dual mapping for
everything would double (or more) the total index size.
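For the question about a default not-analyzed mapping, a sketch assuming Elasticsearch 1.x syntax: a `_default_` mapping with a dynamic template that maps every dynamically detected string field to `not_analyzed`. Dynamic templates apply only to fields added dynamically, so fields the ORM marks for analysis would get an explicit `analyzed` mapping that is not affected. For the "dual mapping" option, 1.x supports multi-fields via a `fields` block, shown in `analyzedWithRaw`, which indexes the value twice (analyzed, plus a not_analyzed sub-field). The class name, the dynamic-template name `strings_not_analyzed` and the sub-field name `raw` are illustrative choices, not anything the thread specifies.

```java
/** Elasticsearch 1.x mapping fragments for "not analyzed by default, analyzed on request". */
final class DefaultStringMappings {

    private DefaultStringMappings() {
    }

    /**
     * A _default_ mapping whose dynamic template maps every newly seen string field to
     * not_analyzed. Fields with an explicit mapping are not affected by dynamic templates.
     */
    static String defaultNotAnalyzed() {
        return "{\"_default_\":{\"dynamic_templates\":[{\"strings_not_analyzed\":{"
                + "\"match\":\"*\","
                + "\"match_mapping_type\":\"string\","
                + "\"mapping\":{\"type\":\"string\",\"index\":\"not_analyzed\"}}}]}}";
    }

    /** Explicit property entry for one field that should be full-text analyzed. */
    static String analyzed(String field) {
        return "\"" + field + "\":{\"type\":\"string\",\"index\":\"analyzed\"}";
    }

    /** "Dual" mapping: analyzed main field plus a not_analyzed "raw" sub-field (multi-field). */
    static String analyzedWithRaw(String field) {
        return "\"" + field + "\":{\"type\":\"string\",\"index\":\"analyzed\","
                + "\"fields\":{\"raw\":{\"type\":\"string\",\"index\":\"not_analyzed\"}}}";
    }
}
```

With the multi-field variant, full-text queries would target `name` while exact matches and sorting would use `name.raw`.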

On Mon, Feb 10, 2014 at 6:08 PM, Alexander Reelsen alr@spinscale.de wrote:


(Brian Yoder) #7

Ryan,

The retrieve-mappings step is not necessary. If you automated it, it would
also need to handle the situation where the new mappings are not compatible
with the existing mappings, and ES already does that check for you.

Your approach of checking for the index and, only if it doesn't exist,
creating the index and putting the mappings is good. I wrote a tool that
does this, and I call it before my bulk-load updates are performed. I can
also use it to update mappings, though I use that only for experimentation
and rarely in production. Still, it works well, but sometimes I run into
the case where the new mappings aren't compatible, and then I learn a new
thing or two!

The template idea is good; I used to use that. But once I figured out the
Java put and get mapping APIs, I stopped doing that. I have 6 different
indices on my laptop and each has its own unique mapping that is very
different from that of any other index. So a template would be of no use to
me.

I do lock down ES so that an index is never created automatically, nor is a
new field in a new type created automatically; an explicit mapping is
always required. That has really helped me juggle those 6 different indices
and catch field name spelling errors and bad data really quickly.

Hope this helps! Post back if you'd like additional details on the Java API
for putting and updating mappings.

Brian
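One way to get the lock-down described above, assuming Elasticsearch 1.x: automatic index creation can be turned off on the nodes with `action.auto_create_index: false` in `elasticsearch.yml`, and a type mapping with `"dynamic": "strict"` makes indexing fail when a document contains a field that is not in the mapping. The thread does not say which settings were used, so this is one possible approach. The sketch below builds such a strict mapping and adds a hypothetical client-side check (`unknownFields`) that reports unmapped field names before a request is sent, which is one way an ORM could surface spelling errors early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Strict 1.x type mapping plus a client-side unknown-field check (hypothetical ORM helper). */
final class StrictMapping {
    private final String type;
    private final Set<String> fields; // mapped field names, in declaration order

    StrictMapping(String type, String... fields) {
        this.type = type;
        this.fields = new LinkedHashSet<>(Arrays.asList(fields));
    }

    /** Mapping JSON with "dynamic":"strict"; every field is a not_analyzed string for brevity. */
    String toJson() {
        List<String> props = new ArrayList<>();
        for (String f : fields) {
            props.add("\"" + f + "\":{\"type\":\"string\",\"index\":\"not_analyzed\"}");
        }
        return "{\"" + type + "\":{\"dynamic\":\"strict\",\"properties\":{"
                + String.join(",", props) + "}}}";
    }

    /** Returns the document keys the mapping does not declare (e.g. misspelled names). */
    List<String> unknownFields(Map<String, ?> document) {
        List<String> unknown = new ArrayList<>();
        for (String key : document.keySet()) {
            if (!fields.contains(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }
}
```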



(Ryan Chazen) #8

Thanks Brian, that makes sense. I'm going for a much more generic approach
than just 6 predefined indices, though; the base idea should stay the
same.

Mappings not being compatible would be a problem. I think in my case, if a
mapping gets changed to be incompatible, I'd probably just throw an
exception and terminate. Maybe with a special run mode to download an
index's source data as JSON to a file, drop and re-create the index, and
then upload the data again. Obviously that would need to be run manually by
a user after taking their site down, though, since it would not work in
production and isn't really possible to do nicely.
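The fail-fast plan plus the manual rebuild mode could be sketched as below. Everything here is hypothetical: `IndexStore` stands in for whatever client wrapper the ORM uses, and `MappingConflictException` stands in for the error the cluster reports when a put mapping conflicts with the existing mapping (assumed to surface as a `MergeMappingException` in the 1.x Java client; check your version). The rebuild keeps the dumped sources in memory for brevity, whereas the plan above writes them to a file first, which matters because a crash between the delete and the reload would otherwise lose the data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical signal that the cluster rejected a mapping as incompatible. */
class MappingConflictException extends RuntimeException {
    MappingConflictException(String message) {
        super(message);
    }
}

/** Hypothetical storage operations needed for the startup check and the rebuild mode. */
interface IndexStore {
    void putMapping(String index, String type, String mappingJson); // may throw MappingConflictException

    List<String> dumpSources(String index); // every document's _source as JSON

    void deleteIndex(String index);

    void createIndex(String index, String type, String mappingJson);

    void bulkIndex(String index, String type, List<String> sources);
}

final class MappingMigration {

    private MappingMigration() {
    }

    /** Normal startup: always put the mapping and fail fast if it cannot be applied. */
    static void applyOrFail(IndexStore store, String index, String type, String mappingJson) {
        try {
            store.putMapping(index, type, mappingJson);
        } catch (MappingConflictException e) {
            throw new IllegalStateException("Mapping for " + index + "/" + type
                    + " is incompatible; run the offline rebuild mode", e);
        }
    }

    /** Manual, offline rebuild: dump sources, drop and re-create the index, reload the data. */
    static int rebuild(IndexStore store, String index, String type, String mappingJson) {
        List<String> sources = new ArrayList<>(store.dumpSources(index)); // in practice, persist to a file first
        store.deleteIndex(index);
        store.createIndex(index, type, mappingJson);
        store.bulkIndex(index, type, sources);
        return sources.size();
    }
}
```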

What I'm trying to put together is a fully async ORM for Elasticsearch that
lets you use Elasticsearch as a DB (similar to how you'd use MongoDB with
something like Morphia). It's mostly at the experimental stage, as I wanted
to learn what you'd need to do to make an ORM for Java, along with learning
Elasticsearch.

It works well for storing/retrieving complex POJOs and for simple queries,
but it's missing useful advanced features, like setting fields to use
specific analysis via annotations, which I'm trying to work out how to add
now. It's a pretty fun project so far, and I think it's turning into
something useful.
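The annotation idea could be sketched as a small reflection pass that derives an Elasticsearch 1.x type mapping from a POJO. `@Analyzed` is the hypothetical annotation floated earlier in the thread, and `ExamplePerson` is an illustrative entity. String fields default to `not_analyzed` unless annotated; a few primitive types map to `integer`, `long` and `boolean`; anything else is left to dynamic mapping. Fields are sorted by name because the JVM does not guarantee the order of `getDeclaredFields()`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical ORM annotation: this String field should be full-text analyzed. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Analyzed {
}

/** Illustrative entity. */
final class ExamplePerson {
    String id;             // exact-match identifier, mapped not_analyzed
    @Analyzed String name; // full-text searchable
    int age;
}

/** Derives an Elasticsearch 1.x type mapping from a POJO's declared instance fields. */
final class AnnotationMappings {

    private AnnotationMappings() {
    }

    static String forClass(String type, Class<?> pojo) {
        Map<String, String> props = new TreeMap<>(); // sorted for deterministic output
        for (Field f : pojo.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only persistent instance fields
            }
            String mapping = mappingFor(f);
            if (mapping != null) {
                props.put(f.getName(), mapping);
            }
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            parts.add("\"" + e.getKey() + "\":" + e.getValue());
        }
        return "{\"" + type + "\":{\"properties\":{" + String.join(",", parts) + "}}}";
    }

    /** Returns the field's mapping JSON, or null to leave it to dynamic mapping. */
    private static String mappingFor(Field f) {
        Class<?> t = f.getType();
        if (t == String.class) {
            String index = f.isAnnotationPresent(Analyzed.class) ? "analyzed" : "not_analyzed";
            return "{\"type\":\"string\",\"index\":\"" + index + "\"}";
        }
        if (t == int.class || t == Integer.class) {
            return "{\"type\":\"integer\"}";
        }
        if (t == long.class || t == Long.class) {
            return "{\"type\":\"long\"}";
        }
        if (t == boolean.class || t == Boolean.class) {
            return "{\"type\":\"boolean\"}";
        }
        return null; // nested objects, dates, etc. are out of scope for this sketch
    }
}
```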

On Mon, Feb 10, 2014 at 6:22 PM, InquiringMind brian.from.fl@gmail.com wrote:

