Not able to understand the mapping in elasticsearch


(R_C) #1

Hi,

I am very new to Elasticsearch and am currently working with v5.4.3.
I was going through the concept of mapping but could not get much clarity on the different types of mapping.
For example -
curl -XPUT 'http://localhost:9200/twitter/user/XYZ?pretty' -H 'Content-Type: application/json' -d '{ "name" : "ABC" }'

Is this automatic mapping (or default mapping), since no mapping is defined before the data is inserted?

And is it explicit mapping when we define the mapping first and then insert data accordingly, as below?
[root@node1 ~]# curl -X PUT "localhost:9200/test" -H 'Content-Type: application/json' -d'
{
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
},
"mappings" : {
"type1" : {
"properties" : {
"field1" : { "type" : "text" }
}
}
}
}
'
{"acknowledged":true,"shards_acknowledged":true,"index":"test"}

And now I insert data after defining the mapping:
[root@node1 ~]# curl -X PUT "localhost:9200/test/type1/1?" -H 'Content-Type: application/json' -d'

{
"title": "User2-Document6"
}
'

Is my understanding correct? I read the mapping documentation on the official site, but it is not very clear to me.
Kindly help me understand this.
Any leads would be highly appreciated!

Thanks,
R_C


(Thomas Dasch) #2

Hey Roshni,

With your first example,

PUT twitter/user/xyz
{
"name": "ABC"
}

you get a mapping of,

{
  "twitter": {
    "mappings": {
      "user": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

ES has mapped your name field as both text and keyword (look at multi-fields). The text field is run through an analyzer, while the name.keyword sub-field indexes the exact value as a single, unanalyzed term.
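
To make the text/keyword distinction concrete, here is a rough Python sketch of what each sub-field contributes to the index. This is not Elasticsearch code: `approx_standard_analyze` is a hypothetical helper that only approximates the standard analyzer (lowercasing and splitting on non-word characters), and `keyword_terms` is a hypothetical helper that mimics the `ignore_above` setting from the mapping above.

```python
import re


def approx_standard_analyze(value):
    """Rough stand-in for the standard analyzer (an approximation, not the real thing):
    lowercase the value and split it on runs of non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def keyword_terms(value, ignore_above=256):
    """A keyword field indexes the whole value as one term.
    Values longer than ignore_above are not indexed at all."""
    return [value] if len(value) <= ignore_above else []


doc = {"name": "ABC Def"}
print(approx_standard_analyze(doc["name"]))  # tokens for the text field
print(keyword_terms(doc["name"]))            # single term for the keyword sub-field
```

The text field ends up with one term per word, which is what full-text queries match against, while the keyword sub-field keeps the original value intact for exact matching.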

With your second example, you are correct: you are using explicit mapping. You are setting up a mapping type named type1 with properties, and field1 is where you specify the name of the field being mapped.

In your example

> PUT test
> {
>   "mappings": {
>     "type1": {
>       "properties": {
>         "field1": {
>           "type": "text"
>         }
>       }
>     }
>   }
> }

If you insert your data

PUT test/type1/1
{
"title": "User2-Document6"
}

The result of your mapping if you run

GET test/_mapping/type1

will be

> {
>   "test": {
>     "mappings": {
>       "type1": {
>         "properties": {
>           "field1": {
>             "type": "text"
>           },
>           "title": {
>             "type": "text",
>             "fields": {
>               "keyword": {
>                 "type": "keyword",
>                 "ignore_above": 256
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }

Which I don't think was your intention. You now have mappings for both field1 and title. You probably wanted just the title field, so you could have done the following:

> PUT test
> {
>   "mappings": {
>     "type1": {
>       "properties": {
>         "title": {
>           "type": "text"
>         }
>       }
>     }
>   }
> }

PUT test/type1/1
{
"title": "User2-Document6"
}

And then a GET test/_mapping/type1 would produce the following

> {
>   "test": {
>     "mappings": {
>       "type1": {
>         "properties": {
>           "title": {
>             "type": "text"
>           }
>         }
>       }
>     }
>   }
> }

Which I think was your intended result, maybe? A good read is here. Hope this helps!
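
The behaviour described in this reply (an explicitly mapped field stays as defined, while an unmapped string field gets the default text-plus-keyword mapping) can be sketched in Python as follows. This is only a simplified model, not how Elasticsearch is implemented: `apply_dynamic_mapping` is a hypothetical helper, and it handles string values only.

```python
import copy

# Default dynamic mapping for a JSON string value, matching the mapping output shown above.
DYNAMIC_STRING_MAPPING = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def apply_dynamic_mapping(properties, document):
    """Return a copy of the mapping properties after indexing a document.
    Fields that are already mapped are left untouched; unmapped string
    fields are added with the default dynamic mapping."""
    merged = copy.deepcopy(properties)
    for field, value in document.items():
        if field not in merged and isinstance(value, str):
            merged[field] = copy.deepcopy(DYNAMIC_STRING_MAPPING)
    return merged


doc = {"title": "User2-Document6"}

# Explicit mapping for field1 only: title is added dynamically.
print(apply_dynamic_mapping({"field1": {"type": "text"}}, doc))

# Explicit mapping for title: nothing is added, title stays plain text.
print(apply_dynamic_mapping({"title": {"type": "text"}}, doc))
```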


(R_C) #3

Thanks, that was a big help! I need some more insight into the index options: analyzed, not_analyzed, and the default.

Also, what exactly is the role of keyword?

Case 1

curl -X PUT "localhost:9200/elasticsearch_data" -H 'Content-Type: application/json' -d'
{
"mappings": {
"user" : {
"properties" : {
"text" : {
"type" : "string",
"analyzer": "standard"
}
}
}
}
}
'

And inserted the same data 5 times (5 docs):
curl -X PUT "localhost:9200/elasticsearch_data/user/1?" -H 'Content-Type: application/json' -d'
{
"text": "This is a string only "
}
'
Case 2

curl -X PUT "localhost:9200/elasticsearch_data_notanalyzed" -H 'Content-Type: application/json' -d'
{
"mappings": {
"user" : {
"properties" : {
"text" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
'

And inserted the same data 5 times (as above):

curl -X PUT "localhost:9200/elasticsearch_data_notanalyzed/user/1?" -H 'Content-Type: application/json' -d'
{
"text": "This is a string only "
}
'

Case 3:
Automatic mapping

curl -X PUT "localhost:9200/elasticsearch/user/5?" -H 'Content-Type: application/json' -d'
{
"text": "This is a string only "
}
'

(Inserted 5 docs with the same data.)

Index storage status

[root@node1 ~]# curl -X GET "localhost:9200/_cat/indices?v"
health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   elasticsearch_data_notanalyzed HaTktqKVRVCS6cxeCg1sUg   5   1          5            0     14.2kb         14.2kb
yellow open   elasticsearch_data             Z8SkXvZwTNqaYVzXickCdA   5   1          5            0     14.2kb         14.2kb
yellow open   elasticsearch                  tT_gYO4dRPa3AlOutylN9g   5   1          5            0     17.1kb         17.1kb

It seems the last case takes more storage space for the same data.
Can you please help me understand the reason for this observation?

Thanks and regards,
Roshni


(Thomas Dasch) #4

I'm glad that was helpful! I'll do my best to try and answer your other questions.

With dynamic mapping, ES maps string values as multi-fields. The default analyzer is the Standard Analyzer. The two field types are text, which by default uses the standard analyzer to divide the text into tokens for the inverted index, and keyword, which is not analyzed: keyword holds the exact text that was put into ES. There is an older blog post here that could benefit you by explaining why Elastic made the switch to multi-fields for string types.

Having your data indexed as text allows full-text search. Having your data indexed as keyword allows exact-match searches, sorting, and aggregations to be performed.
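
As an illustration of the difference, the following Python sketch builds the three kinds of request bodies you would typically send to the `_search` endpoint. The helper names are hypothetical, and the sketch assumes a dynamically mapped field, so the keyword sub-field is named `<field>.keyword`; it only constructs the JSON and does not talk to a cluster.

```python
import json


def full_text_query(field, text):
    """match query: the query text is analyzed, then matched against the text field's tokens."""
    return {"query": {"match": {field: text}}}


def exact_value_query(field, value):
    """term query: the value is compared as-is against the unanalyzed keyword sub-field."""
    return {"query": {"term": {field + ".keyword": value}}}


def value_counts_aggregation(field, name="by_value"):
    """terms aggregation: buckets documents by their exact keyword value."""
    return {"size": 0, "aggs": {name: {"terms": {"field": field + ".keyword"}}}}


for body in (
    full_text_query("name", "abc"),
    exact_value_query("name", "ABC"),
    value_counts_aggregation("name"),
):
    print(json.dumps(body))
```

Each printed body could be passed to curl with `-d` against an index's `_search` endpoint, in the same way as the requests earlier in this thread.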

Regarding your last question about the difference in the index sizes, I don't know.

