Java API: Indexing not analyzed text fields

Ryan_Chazen · February 8, 2014, 9:30pm

Hey

I'm inserting some documents using

Map<String, Object> map = new HashMap<>();
map.put("field1", "something");
...
client.prepareIndex("a", "a").setSource(map).get();

This inserts field1 as a text field with an analyzed index, but it's not
actually text, just a string id or other data field. This is inside an ORM,
so I'd like to have it set to use a not analyzed index automatically when
inserting the data.
I can't seem to find anything in the API for defining the mapping when
indexing. Have I missed something? I'd prefer not to have to run a special
mapping query after inserting the data as it would slow the ORM down to be
constantly checking if indexes are analyzed or not...

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fe818851-a429-47cc-b119-142c4f0933be%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ryan_Chazen · February 8, 2014, 9:51pm

OK, never mind, I see how it's meant to be done now:

On app start, check if the index exists already. If it doesn't, create it
with the correct mappings.
If it does, retrieve the mappings and compare. If they're wrong, update them
with a put mapping call that carries the correct mappings.

I'd guess that doing the mapping compares during data insert would be very
slow and it's why the API is set up this way. Makes sense.

So only one question: is the "retrieve mappings and compare" step
necessary? If I always putmapping with the correct mappings, will it just
do nothing if they're already correct?
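The mapping that such a startup step would send can be sketched with plain JDK collections. This is only a sketch under assumptions: it uses the 1.x string mapping syntax ("type": "string", "index": "not_analyzed"), the class name NotAnalyzedMapping is made up for illustration, and the client call in the comment should be checked against the client version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the mapping source for one type as a nested Map (Elasticsearch 1.x syntax).
final class NotAnalyzedMapping {

    // Returns {"properties": {<field>: {"type": "string", "index": "not_analyzed"}, ...}}
    static Map<String, Object> forFields(String... fieldNames) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (String name : fieldNames) {
            Map<String, Object> field = new LinkedHashMap<>();
            field.put("type", "string");
            field.put("index", "not_analyzed");
            properties.put(name, field);
        }
        Map<String, Object> mapping = new LinkedHashMap<>();
        mapping.put("properties", properties);
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Object> mapping = forFields("field1");
        System.out.println(mapping);
        // With a 1.x client this map could be sent once at startup, e.g.
        // client.admin().indices().preparePutMapping("a").setType("a").setSource(mapping).get();
        // (assumed call chain, not verified against a specific release).
    }
}
```

Building the mapping once at startup keeps the per-document prepareIndex call unchanged, which is the point of not checking mappings on every insert.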

spinscale · February 10, 2014, 12:59pm

Hey,

you may want to use index templates so that you do not have to
check this inside your application and can simply rely on the
mapping being configured as expected; see the index templates
documentation.

--Alex
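For illustration, a template body along these lines might look as follows. It is a sketch under assumptions, not the configuration Alex linked to: it uses the 1.x template syntax, matches every index name with "*", and the dynamic template name strings_not_analyzed and the class name NotAnalyzedTemplate are invented here. The body would be registered once through the _template endpoint under a template name of your choosing.

```java
// Holds an index template body (Elasticsearch 1.x syntax) as a JSON string.
final class NotAnalyzedTemplate {

    // Applies to every new index ("*") and maps each dynamically detected
    // string field as not_analyzed through a dynamic template.
    static String body() {
        return "{"
            + "\"template\":\"*\","
            + "\"mappings\":{\"_default_\":{\"dynamic_templates\":[{"
            + "\"strings_not_analyzed\":{"
            + "\"match_mapping_type\":\"string\","
            + "\"mapping\":{\"type\":\"string\",\"index\":\"not_analyzed\"}"
            + "}}]}}"
            + "}";
    }

    public static void main(String[] args) {
        // Print the body so it can be inspected or sent with any HTTP client.
        System.out.println(body());
    }
}
```

Because the template is applied when an index is created, the application does not need to inspect mappings while inserting documents.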

Ryan_Chazen · February 10, 2014, 4:01pm

Hey, thanks - that looks useful. I'm guessing there is no Java API for it
though? It's like the Java API devs went halfway and then got bored.

spinscale · February 10, 2014, 4:08pm

Hey,

No, not at all... basically every REST/HTTP action uses the Java API.
What makes you think so?
Just take a look at RestGetIndexTemplateAction and all the other
Rest*IndexTemplateAction classes to see how it works...

--Alex

Ryan_Chazen · February 10, 2014, 4:19pm

A Google search for "elasticsearch java api" + feature didn't return
anything, so I just assumed... my bad. Not that I'd have ever thought to
look for a Rest*IndexTemplateAction.

I think I've tracked it down now though, thanks!

So if I set a template mapping for "default" as not analyzed, it will
make all indexes created with an explicit mapping be not analyzed, correct?
Then I could just set the particular indexes that need analysis (using
@Analyzed or similar in my class). Defaulting to not analyzed seems to make
sense, since most fields just need basic matching for property storage and
only things like names would need analysis.

Am I making a bad choice there? I'm assuming that using a dual mapping for
everything would double (or more) the total index size.
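A rough sketch of that annotation idea, using only reflection from the JDK. Everything named here is hypothetical: the @Analyzed annotation, the Person class, and the AnnotationMapping helper are invented for illustration, and the output uses the 1.x string mapping syntax. Only String fields are handled; other field types are left to Elasticsearch's defaults.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

final class AnnotationMapping {

    // Builds the "properties" part of a mapping: String fields without
    // @Analyzed become not_analyzed, annotated ones stay analyzed.
    static Map<String, Map<String, String>> properties(Class<?> type) {
        Map<String, Map<String, String>> props = new LinkedHashMap<>();
        for (Field f : type.getDeclaredFields()) {
            // Skip compiler-generated fields and anything that is not a String.
            if (f.isSynthetic() || f.getType() != String.class) continue;
            Map<String, String> spec = new LinkedHashMap<>();
            spec.put("type", "string");
            spec.put("index", f.isAnnotationPresent(Analyzed.class) ? "analyzed" : "not_analyzed");
            props.put(f.getName(), spec);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(properties(Person.class));
    }
}

// Hypothetical marker: String fields carrying it keep an analyzed index.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Analyzed {}

// Example entity, invented for illustration.
class Person {
    String id;
    @Analyzed String name;
    int age;
}
```

The resulting map could be wrapped in a "properties" key and sent as the type's mapping when the ORM first sees the class, so the default stays not analyzed and analysis is opt-in per field.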

brian_yoder · February 10, 2014, 4:22pm

Ryan,

The retrieve mappings step is not necessary. If it's automated, then it
would also need to handle the situation where the new mappings are not
compatible with the existing mappings. And ES already does that.

Your steps to check for the index and then, only if the index doesn't
exist, create the index and put the mappings are good. I wrote a tool that
does this, and I call it before my bulk-load updates are performed. I can
also use it to update mappings, though I use that only for experimentation
and not much for production. Still, it works well but sometimes I run into
the case where the new mappings aren't compatible and then I learn a new
thing or two!

The template idea is good; I used to use that. But once I figured out the
Java put and get mappings, I don't do that anymore. I have 6 different
indices on my laptop and each has its own unique mapping that is very
different from that of any other index. So a template would be of no use to
me.

I do lock down ES so that an index is never created automatically, nor is a
new field in a new type created automatically; an explicit mapping is
always required. That has really helped me juggle those 6 different indices
and catch field name spelling errors and bad data really quickly.

Hope this helps! Post back if you'd like additional details on the Java API
for putting and updating mappings.

Brian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c511fe05-544d-422e-9f79-e1d3ef924bed%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ryan_Chazen · February 10, 2014, 4:49pm

Thanks Brian, that makes sense. I'm going for a much more generic attempt
than just 6 defined indexes though - but the base idea should stay the
same.

Mappings not being compatible would be a problem. I think in my case if a
mapping gets changed to be incompatible, I'd probably just throw an
exception and terminate. Maybe with a special run mode to download an
index's source data as json to a file, drop and re-create the index, and
then upload the data again. Obviously that would need to be run manually
by a user after taking their site down, as it would not work in
production and isn't really possible to do nicely.

What I'm trying to put together is a full async ORM for elasticsearch that
lets you use elasticsearch as a DB (similar to how you'd use mongodb with
something like morphia). Mostly it's at the experimental stage as I wanted
to learn what you'd need to do to make an ORM for Java, along with learning
Elasticsearch.

It works well for storing/retrieving complex pojos, and for simple queries,
but it's missing useful advanced features like setting fields to use
specific analysis using annotations and that kind of thing which I'm trying
to work out how to add now. It's a pretty fun project so far, and it's
turning into something useful I think.
