Cannot set eager loading in ES as described in the ES guide


#1

Hi there.
I'm running the ES stack (ES, KB, LS, FB) version 5.5.1. Because I have 780 000 000 entries, we wanted to make some fields eagerly loaded.

I've followed the ES guide:
https://www.elastic.co/guide/en/elasticsearch/guide/current/preload-fielddata.html

My index is named statistika, so I tried to execute:

PUT /statistika/_mapping/log
{
  "acronym": {
    "type": "text",
    "fielddata": {
      "loading": "eager"
    }
  }
}

After running this, I get this result:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Root mapping definition has unsupported parameters:  [acronym : {type=text, fielddata={loading=eager}}]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Root mapping definition has unsupported parameters:  [acronym : {type=text, fielddata={loading=eager}}]"
  },
  "status": 400
}

Can anyone please help me with how to set eager or eager_global_ordinals?

Thanks,
Mojster


(Shane Connelly) #2

You've got a few problems. Most of them relate to the fact that the guide was unfortunately last updated around Elasticsearch 2.x and is missing some of the changes that happened in 5.x. The guide does largely still apply, but one of the bigger changes between versions 2 and 5 was the split of the string type into text and keyword. There is a bit of history there, so if you're interested in reading up on why this changed and what text and keyword actually mean, you may want to read https://www.elastic.co/blog/strings-are-dead-long-live-strings

OK, so first and foremost, you're missing a top-level properties key in your mapping. That is, if you just wanted to create a text field in the statistika index and log type, it'd be as follows:

PUT /statistika/_mapping/log
{
  "properties": {
    "acronym": {
      "type": "text"
    }
  }
}

The error you're getting is saying that acronym can't be at the root level, because Elasticsearch is expecting properties at the root level and then acronym under that.

The second thing is that the mappings for fielddata and string changed between 2.x and 5.x, for the reasons outlined in the blog. If you're on our reference docs, you can have a look at the old 2.4 string type at https://www.elastic.co/guide/en/elasticsearch/reference/2.4/string.html and compare it to the new text type at https://www.elastic.co/guide/en/elasticsearch/reference/5.5/text.html and the new keyword type at https://www.elastic.co/guide/en/elasticsearch/reference/5.5/keyword.html. From the naming (acronym), it seems likely that you actually want to use a keyword type in 5.x rather than a text type. You may also note here that the documentation is versioned: the URL contains the version of the docs, so you can match it to the version you're running for the most up-to-date information. There's a drop-down selector in the upper-right corner of every docs page that lets you select the version of the docs you're viewing.

The third thing is that before turning on fielddata / eager_global_ordinals on a text field, you may want to look into whether and why you would or would not want to. Docs for that are at https://www.elastic.co/guide/en/elasticsearch/reference/5.5/fielddata.html

Finally, to address your question directly: if you look at the 5.5 text docs I linked above, you'll see the fielddata and eager_global_ordinals parameters. Putting it all together, the following enables eager global ordinals on a text field (substitute your own index, type, and field names, such as acronym):

PUT /yourindex/_mapping/yourtype
{
 "properties": {
   "yourfield": {
     "type": "text",
     "fielddata": true,
     "eager_global_ordinals": true
   }
 }
}

and if you want to enable eager global ordinals on a keyword type, it's

PUT /yourindex/_mapping/yourtype
{
 "properties": {
   "yourfield": {
     "type": "keyword",
     "eager_global_ordinals": true
   }
 }
}
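
If you would rather script this than use Console, here is a minimal Ruby sketch using only the standard library. It builds the put-mapping request for the keyword variant above without sending it. The helper name build_mapping_request and the localhost:9200 address in the commented-out part are assumptions for illustration; yourindex, yourtype, and yourfield are the same placeholders as above.

```ruby
require 'json'
require 'net/http'

# Hypothetical helper: builds (but does not send) a put-mapping request
# that enables eager global ordinals on a keyword field.
def build_mapping_request(index, type, field)
  body = {
    'properties' => {               # field definitions must sit under "properties"
      field => {
        'type' => 'keyword',
        'eager_global_ordinals' => true
      }
    }
  }
  request = Net::HTTP::Put.new("/#{index}/_mapping/#{type}")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(body)
  request
end

request = build_mapping_request('yourindex', 'yourtype', 'yourfield')
puts request.path
puts request.body

# To actually send it (assumes Elasticsearch listens on localhost:9200):
# response = Net::HTTP.start('localhost', 9200) { |http| http.request(request) }
# puts response.body
```

Keeping the body construction in one place makes it harder to forget the properties level that caused the original error.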

#3

Thanks for your reply. It's really kind of you to explain the background of changes in ES regarding my issue.

Yes, I saw that the guide's version is 2.x, but I thought that if there was no newer version, then nothing had changed.
After experimenting, I found a few GitHub issues, which explained to me that some rework had been done.

You asked me why I need the fielddata functionality. I'm currently running an index with 780 000 000 entries. If no searches are made for some time, the first search takes quite a while. I thought I'd use eager_global_ordinals to solve this issue. My first field should be acronym, because all searches must be acronym based.

All my fields/types were created by Logstash. After applying fielddata, I still have both types on this field. Is there a way to remove one of the text/keyword types and keep just the other?

For example, here is my current _mapping:

{
  "statistika": {
    "mappings": {
      "log": {
        "acronym": {
          "full_name": "acronym",
          "mapping": {
            "acronym": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

If I try to apply eager global ordinals with the keyword type, I get this error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [acronym] of different type, current_type [text], merged_type [keyword]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [acronym] of different type, current_type [text], merged_type [keyword]"
  },
  "status": 400
}

Is there a way to do a mass update? I found an option, update_all_types, but couldn't figure out how to use it.
Logstash created 65 types, so this would come in handy.


(Shane Connelly) #4

You're getting both the text and keyword field because you've applied string to the mapping. The first link I posted last time ("strings are dead, long live strings") explains why this is happening. It explains that if you had created a string type for acronym, you'd end up with an acronym field of type text and an acronym.keyword field of type keyword. That's exactly what you have here. I think what you want out of this acronym field based upon the name is for it to be a keyword type, not a text type. This means you need to set that up in your Logstash index template. After that, if you want to completely eliminate the text type field, you'll need to create a new index and reindex your documents from your existing index into the new one. There's no way to delete a field from a mapping without reindexing.
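
As a rough illustration of the create-then-reindex step, here is a Ruby sketch using only the standard library. It only builds the two request bodies and sends nothing. The new index name statistika_v2 and both helper names are assumptions, and only the acronym field is shown, so a real mapping would have to list every field and type you need.

```ruby
require 'json'

# Hypothetical helper: body for `PUT /statistika_v2`, creating the new index
# with acronym mapped as a keyword only (no text field, no .keyword sub-field).
def new_index_body(type)
  {
    'mappings' => {
      type => {
        'properties' => {
          'acronym' => { 'type' => 'keyword', 'eager_global_ordinals' => true }
        }
      }
    }
  }
end

# Hypothetical helper: body for `POST /_reindex`, copying documents from the
# existing index into the new one.
def reindex_body(source, dest)
  { 'source' => { 'index' => source }, 'dest' => { 'index' => dest } }
end

puts JSON.pretty_generate(new_index_body('log'))
puts JSON.pretty_generate(reindex_body('statistika', 'statistika_v2'))
```

The new index has to exist with its mapping before the reindex runs; otherwise dynamic mapping would recreate the text plus keyword pair.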


#5

Thanks again for your answer.

I've read the article about strings. I know this is happening because I let LS create those fields on its own.
Yes, the acronym field is an abbreviation of our title field.

So my first step is to create a new index and define all 65 types with all their attributes manually.
After that, I should reindex all the data into my new index.

I'll let you know of my progress.


#6

@shanec:
I've started with a small index with only one mapping.

I know I can change a string to a date with the mutate plugin. As I see now, it can only convert to integer, float, string, and boolean.
So is there a way to create a keyword field with LS, or must I create all the mappings before starting with LS?


#7

Found the answer here:
Little Logstash Lessons: Using Logstash to help create an Elasticsearch mapping template

So yes, I have to write all the mapping.
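
For anyone landing here later, a mapping template of that kind can be generated with a few lines of Ruby. This is only a sketch under assumptions: the helper name, the index pattern statistika*, the type name log, and the field list are illustrative. The top-level template key is how Elasticsearch 5.x names the index pattern.

```ruby
require 'json'

# Hypothetical helper: builds an Elasticsearch 5.x index template that maps
# the given fields as keyword instead of the default text + .keyword pair.
def keyword_template(index_pattern, type, fields)
  properties = fields.each_with_object({}) do |name, props|
    props[name] = { 'type' => 'keyword' }
  end
  {
    'template' => index_pattern,   # 5.x key for the index name pattern
    'mappings' => { type => { 'properties' => properties } }
  }
end

puts JSON.pretty_generate(keyword_template('statistika*', 'log', ['acronym']))
```

Saved to a file, the output can be handed to the Logstash elasticsearch output through its template option, so that indices matching the pattern are created with these mappings.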


#8

I've removed the old index and created a new one with mapping predefined.

Last time, 760 710 643 entries went through Logstash without any problems. This time I have a predefined mapping and have replaced the kv filter with a custom filter written in Ruby, because kv was not working as it should.

After 81 711 586 entries I'm at a standstill. I'm getting this error:

2017-09-07T08:37:04+02:00 ERR Failed to publish events caused by: read tcp 127.0.0.1:49569->127.0.0.1:5044: i/o timeout
2017-09-07T08:37:04+02:00 INFO Error publishing events (retrying): read tcp 127.0.0.1:49569->127.0.0.1:5044: i/o timeout
2017-09-07T08:37:33+02:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_errors=1 libbeat.logstash.published_but_not_acked_events=1940
2017-09-07T08:38:03+02:00 INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.write_bytes=550

As was recommended in the topic "ERR Failed to publish events caused by: read tcp IP:40634->IP:5044: i/o timeout", I've raised the timeout to 60s, but it still doesn't work.


#9

I've found this in logstash log:

[2017-09-07T08:50:36,681][ERROR][logstash.outputs.elasticsearch] Encountered an unexpected error submitting a bulk request! Will retry. {:error_message=>"incompatible encodings: Windows-1250 and UTF-8", :class=>"Encoding::CompatibilityError", :backtrace=>["e:/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.8-java/lib/logstash/outputs/elasticsearch/common.rb:153:in `submit'", "org/jruby/RubyArray.java:1613:in `each'", "org/jruby/RubyEnumerable.java:971:in `each_with_index'", "e:/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.8-java/lib/logstash/outputs/elasticsearch/common.rb:131:in `submit'", "e:/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.8-java/lib/logstash/outputs/elasticsearch/common.rb:91:in `retrying_submit'", "e:/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.8-java/lib/logstash/outputs/elasticsearch/common.rb:42:in `multi_receive'", "e:/logstash/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:13:in `multi_receive'", "e:/logstash/logstash-core/lib/logstash/output_delegator.rb:47:in `multi_receive'", "e:/logstash/logstash-core/lib/logstash/pipeline.rb:420:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "e:/logstash/logstash-core/lib/logstash/pipeline.rb:419:in `output_batch'", "e:/logstash/logstash-core/lib/logstash/pipeline.rb:365:in `worker_loop'", "e:/logstash/logstash-core/lib/logstash/pipeline.rb:330:in `start_workers'"]}

And here's my pipeline:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
	beats {
		port => "5044"
	}
}
# The filter part of this file is commented out to indicate that it is
# optional.
filter {
	mutate {
		gsub => ["message", "\|C3\|", "|cir=C3|"]
	}

#	kv {
#		field_split => "|"
#	}
	ruby {
		code => "
			a = event.get('message').split('|').delete_if{|x| !x.match(/=/)}
			a.each {|y| b = y.split('=', 2)
				event.set(b[0].strip, b[1])
			}
			event.set('acronym', event.get('acronym').upcase)"
	}
	mutate {
		gsub => ["date", " ", ";"]
		convert => {"type" => "integer"}
		convert => {"rptPackageStatus" => "integer"}
		add_field => {"country" => "si"}
	}
	date {
		locale => "en"
		match => ["date", "dd.MM.YYYY;HH:mm:ss"]
		timezone => "Europe/Ljubljana"
		target => "date"
	}
	date {
		locale => "en"
		match => ["returnDate", "dd.MM.YYYY"]
		timezone => "Europe/Ljubljana"
		target => "returnDate"
	}
	date {
		locale => "en"
		match => ["firstsignUpDate", "dd.MM.YYYY"]
		timezone => "Europe/Ljubljana"
		target => "firstsignUpDate"
	}
	date {
		locale => "en"
		match => ["lastVisitDate", "dd.MM.YYYY"]
		timezone => "Europe/Ljubljana"
		target => "lastVisitDate"
	}
	date {
		locale => "en"
		match => ["loanDate", "dd.MM.YYYY"]
		timezone => "Europe/Ljubljana"
		target => "loanDate"
	}
	date {
		locale => "en"
		match => ["lastProlongDate", "dd.MM.YYYY"]
		timezone => "Europe/Ljubljana"
		target => "lastProlongDate"
	}
	date {
		locale => "en"
		match => ["reservationDate", "dd.MM.YYYY"]
		timezone => "Europe/Ljubljana"
		target => "reservationDate"
	}
}
output {
	elasticsearch {
		hosts => [ "localhost:9200" ]
		index => "transakcije"
		document_type => "log_transakcije"
	}
#	stdout { codec => rubydebug }
}
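
For context, Encoding::CompatibilityError is what Ruby raises when it has to combine two strings in different encodings that both contain non-ASCII characters. The thread does not establish where the Windows-1250 string comes from, so the following is only a sketch of one possible mitigation: a standalone version of the ruby filter logic above that transcodes every key and value to UTF-8. The method names and the sample message are assumptions; inside Logstash the resulting pairs would be passed to event.set.

```ruby
# Hypothetical helper: returns a UTF-8 copy of a string, replacing characters
# that cannot be converted instead of raising.
def to_utf8(value)
  value.encode('UTF-8', invalid: :replace, undef: :replace)
end

# Standalone version of the filter logic: split on '|', keep only the
# key=value parts, and normalise every key and value to UTF-8.
def parse_pairs(message)
  pairs = message.split('|').select { |part| part.include?('=') }
  pairs.each_with_object({}) do |part, fields|
    key, value = part.split('=', 2)
    fields[to_utf8(key.strip)] = to_utf8(value)
  end
end

# Sample message encoded as Windows-1250 (an assumption for illustration).
message = 'acronym=šola|C3|title=Knjižnica'.encode('Windows-1250')
fields = parse_pairs(message)
fields.each { |key, value| puts "#{key} (#{value.encoding}): #{value}" }
```

Once every value is UTF-8, appending it to other UTF-8 text no longer mixes encodings, which is the situation the error message describes.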

(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.