How to deal with fields you don't mean to index but still need after a query?

Jingang_Wang · March 8, 2013, 2:03pm

Hi guys,

The documents I want to index contain 3 fields: doc_id, doc_content and
doc_time.
I will only query doc_content, but I also need the other 2 fields
after the query.
In other words, I want to use hit.getSource().get("doc_id") and
hit.getSource().get("doc_time") after searching in doc_content.
The solution I have in mind is to treat those two fields like ordinary
items in a database.

The mapping I constructed is attached below. I don't know
whether it is right.

XContentBuilder mapping = jsonBuilder()
    .startObject()
        .startObject("myType")
            .startObject("properties")
                .startObject("doc_id").field("type", "string").field("index", "no").endObject()
                // "index" accepts "analyzed", "not_analyzed" or "no"; "yes" is not a valid value
                .startObject("doc_content").field("type", "string").field("index", "analyzed").endObject()
                .startObject("doc_time").field("type", "string").field("index", "no").endObject()
            .endObject()
        .endObject()
    .endObject();

Should I set "index" to "no" or "not_analyzed" for the fields that I do not
want to index?
Any help would be appreciated!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Clinton_Gormley · March 8, 2013, 2:09pm

Heya

> Should I set "index" to "no" or "not_analyzed" for the fields that I do
> not want to index?
> Any help would be appreciated!

"no" is the correct value if you don't want the fields to be searchable.
"analyzed" is only useful for fields of type "string", and indicates
that the field should be searchable, and that its value should be
passed through the analysis process before being indexed.

For non-string fields, the options are "no" or "not_analyzed" (as
non-strings never have an analysis process).

clint
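
For readers following along, here is a minimal sketch of the JSON that such a mapping would serialize to, using the "index" values described above. It uses plain Java only, so it runs without the Elasticsearch client on the classpath; the class MappingSketch and its helper methods are hypothetical names made up for this illustration, not part of any Elasticsearch API. Since _source is stored by default, fields mapped with "index": "no" should still come back from hit.getSource().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the Elasticsearch API.
class MappingSketch {

    // Builds one field definition: {"type": "string", "index": <indexOption>}.
    static Map<String, Object> stringField(String indexOption) {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "string");
        field.put("index", indexOption);
        return field;
    }

    // Mapping for the three fields from the question:
    // only doc_content is searchable, the other two are not indexed.
    static Map<String, Object> buildMapping() {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("doc_id", stringField("no"));
        properties.put("doc_content", stringField("analyzed"));
        properties.put("doc_time", stringField("no"));

        Map<String, Object> type = new LinkedHashMap<>();
        type.put("properties", properties);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("myType", type);
        return root;
    }

    // Minimal JSON writer for nested maps with string values (no escaping).
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(",");
                }
                first = false;
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
            }
            return sb.append("}").toString();
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(toJson(buildMapping()));
    }
}
```

The resulting string is the same structure the XContentBuilder chain in the question is meant to produce, so it can be compared against the mapping the cluster reports back.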


Jingang_Wang · March 8, 2013, 2:13pm

Thank you, Clinton.
I ran an experiment with a small document set, ~8 MB.
The index created from it reaches ~38 MB.
Is this normal?

Best,
Jingang


--
Wang Jingang(王金刚)
Ph.D Candidate at
Lab of High Volume Language Information Processing & Cloud Computing
School of Computer Science
Beijing Institute of Technology
Beijing 100081
P.R China


Clinton_Gormley · March 8, 2013, 2:26pm

On Fri, 2013-03-08 at 22:13 +0800, Jingang Wang wrote:

> Thank you, Clinton.
> I ran an experiment with a small document set, ~8 MB.
> The index created from it reaches ~38 MB.
> Is this normal?

You're storing a few things:

 - indexes on the fields themselves
 - indexes on the _all field
 - the _source

Also, you're doing this on 5 separate shards.

The index growth will slow down as you index more docs with terms in
common. You can set the _source field and the terms indexes to be
compressed (although with the next version compression will happen
automatically), and you can disable the _all field if you're not
intending to use it.

clint
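
As a rough sketch of the two knobs mentioned above, the snippet below builds the JSON for index settings with a chosen shard count and for a type mapping with _all disabled, and computes the size overhead from the numbers in this thread. It is plain Java with no Elasticsearch dependency; the class IndexSizeSketch and its methods are hypothetical names for this illustration, and choosing 1 shard is only an assumption for a small test index.

```java
// Hypothetical helper class for illustration; not part of the Elasticsearch API.
class IndexSizeSketch {

    // Index settings JSON with an explicit number of primary shards.
    static String settingsJson(int shards) {
        return "{\"index\":{\"number_of_shards\":" + shards + "}}";
    }

    // Type mapping JSON that switches off the _all field.
    static String mappingJson(String typeName) {
        return "{\"" + typeName + "\":{\"_all\":{\"enabled\":false}}}";
    }

    // Ratio of on-disk index size to raw input size.
    static double overhead(double indexMb, double rawMb) {
        return indexMb / rawMb;
    }

    public static void main(String[] args) {
        System.out.println(settingsJson(1));
        System.out.println(mappingJson("myType"));
        // Numbers from this thread: ~38 MB of index for ~8 MB of input.
        System.out.println(overhead(38.0, 8.0));
    }
}
```

With the figures quoted in the thread, the overhead works out to 4.75 times the input size; it is worth measuring again after changing the settings, since the effect depends on the data.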


Jingang_Wang · March 8, 2013, 2:49pm

I'm using elasticsearch 0.90 beta1, so compression happens
automatically: according to the official guide, from
version 0.90 onwards, all stored fields (including _source) are always
compressed.
If I wanted to enable it manually, should I set the mapping as follows?

XContentBuilder mapping = jsonBuilder()
    .startObject()
        .startObject("myType")
            // _source is configured at the type level, beside "properties", not inside it
            .startObject("_source").field("compress", "true").endObject()
            .startObject("properties")
                .startObject("doc_id").field("type", "string").field("index", "no").endObject()
                // "index" accepts "analyzed", "not_analyzed" or "no"; "yes" is not a valid value
                .startObject("doc_content").field("type", "string").field("index", "analyzed").endObject()
                .startObject("doc_time").field("type", "string").field("index", "no").endObject()
            .endObject()
        .endObject()
    .endObject();


Clinton_Gormley · March 8, 2013, 3:03pm

On Fri, 2013-03-08 at 22:49 +0800, Jingang Wang wrote:

> I'm using elasticsearch 0.90 beta1, so compression happens
> automatically: according to the official guide, from
> version 0.90 onwards, all stored fields (including _source) are always
> compressed.
> If I wanted to enable it manually, should I set the mapping as
> follows?

If you're using 0.90+ then you don't need to set anything. It is always
compressed.

clint
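
To make the version rule above concrete, here is a small sketch that adds the _source compress setting to the type mapping only for versions before 0.90 and leaves it out otherwise. It is plain Java that builds the JSON by hand; the class SourceCompressionSketch, its methods, and passing the version as two integers are all assumptions made for this illustration, not an Elasticsearch API.

```java
// Hypothetical helper class for illustration; not part of the Elasticsearch API.
class SourceCompressionSketch {

    // True for versions before 0.90, where _source compression had to be requested.
    static boolean needsExplicitCompression(int major, int minor) {
        return major == 0 && minor < 90;
    }

    // Type mapping JSON; "_source" sits beside "properties" at the type level.
    static String typeMappingJson(String typeName, int major, int minor) {
        StringBuilder sb = new StringBuilder("{\"" + typeName + "\":{");
        if (needsExplicitCompression(major, minor)) {
            sb.append("\"_source\":{\"compress\":true},");
        }
        sb.append("\"properties\":{\"doc_content\":{\"type\":\"string\",\"index\":\"analyzed\"}}");
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(typeMappingJson("myType", 0, 20)); // includes _source.compress
        System.out.println(typeMappingJson("myType", 0, 90)); // no _source entry needed
    }
}
```

Only doc_content is included to keep the sketch short; the other fields from the question would go into "properties" in the same way.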


Jingang_Wang · March 8, 2013, 3:11pm

OK, I understand now.
Thank you again for your patient explanation.
