Need help with Performance and Storage Factors

(Sidra Farooq) #1

Hi,

I am new to Elasticsearch and have some basic knowledge of it, but at this point I don't know how performance will be affected by the choices I make when using Elasticsearch. I came across some questions and am hoping to get answers.

  1. AWS provides a built-in Elasticsearch 5.5 service. Should I opt for it, given that 5.6 is the latest release? What other options do I need to consider when installing ES on AWS?

  2. How can I analyze my database's storage requirements in ES? If I have 800GB+ of data, how much space will I need in ES?

  3. Logstash is used to get data from MySQL into ES. On a live server, are we still going to use Logstash? If so, what is the proper way to transfer data from our MySQL database to ES?

  4. Is a _bulk update faster than updating documents one by one? For example, whenever an order is created it will be synced to ES. Which is better for performance: updating one by one or in bulk?

  5. From my research, the ngram tokenizer is used to search for parts of words. Is there any other way?

  6. Should we use the default mappings or custom mappings for our fields? And how much space will these mappings take?

  7. What other factors do I need to consider when it comes to improving Elasticsearch performance?

  8. How much knowledge do I need to run ES without any errors?

Thank you.

(Mark Walkom) #2
  1. The only real choice is to use which provides the latest releases and without any limitations to Elasticsearch
  2. It depends. What sort of data is it? What sort of analysis are you going to apply (i.e. the mapping)?
  3. Logstash :slight_smile:
  4. Bulk, always!
  5. It depends on what problem you want to solve here.
  6. Explicit (custom) mappings are always better because you are explicit about how each field is indexed. How much space they take is something you need to test.
  7. What problems are you having?
  8. What errors are you having?
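To illustrate the "Bulk, always!" answer: instead of one HTTP request per order, changes can be buffered and sent to the `_bulk` endpoint as newline-delimited JSON (an action line followed by a source line, with a trailing newline at the end). The sketch below only builds the request bodies and does not talk to a cluster. It assumes a 5.x cluster where a `_type` is still required; the index name `orders`, the type `order`, the field names and the batch size are all hypothetical.

```python
import json
from typing import Dict, Iterator, List, Sequence


def bulk_body(orders: Sequence[Dict], index: str = "orders", doc_type: str = "order") -> str:
    """Build an NDJSON body for POST /_bulk (hypothetical index/type names)."""
    lines: List[str] = []
    for order in orders:
        # Action/metadata line: "index" creates or replaces the document with this _id.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(order["id"])}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(order))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def batches(items: Sequence[Dict], size: int) -> Iterator[Sequence[Dict]]:
    """Split buffered orders into fixed-size chunks, one _bulk request each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    pending = [{"id": i, "sku": f"SKU-{i}", "qty": 1} for i in range(1, 6)]
    for chunk in batches(pending, size=2):
        print(bulk_body(chunk), end="")
```

Each body would be sent with `Content-Type: application/x-ndjson`. A good batch size depends on document size and on the cluster, so it is worth measuring rather than guessing.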

(Sidra Farooq) #3

Basically, I want to search for parts of words: if I have "world" and I type "orld" in the search, it should show me all values matching "orld". I need this to search product names, ASINs, etc. What is the best way to solve this in Elasticsearch?

(Mark Walkom) #4

ngrams are the way to do that, but they can be expensive.
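A minimal sketch of what an ngram setup for product names could look like, written as the Python dicts you would send as the body of the create-index request and of a search request. It assumes a 5.x cluster, where mappings still have a type level (here a hypothetical `product`); the analyzer, tokenizer and field names and the 3-character gram size are illustrative assumptions, not recommendations. It is also an example of an explicit mapping: each field's type and analyzer are stated instead of being guessed by dynamic mapping.

```python
import json

# Hypothetical index settings and mapping for partial-word search (ES 5.x layout).
INDEX_BODY = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "trigram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,  # "orld" -> "orl", "rld"
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],  # split on anything else
                }
            },
            "analyzer": {
                "trigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "trigram_tokenizer",
                    "filter": ["lowercase"],  # case-insensitive matching
                }
            },
        }
    },
    "mappings": {
        "product": {  # mapping type, still required in 5.x
            "properties": {
                # Used at index time and, by default, at search time too.
                "product_name": {"type": "text", "analyzer": "trigram_analyzer"},
                # Exact ASIN lookups on the keyword, partial ones on the subfield.
                "asin": {
                    "type": "keyword",
                    "fields": {"partial": {"type": "text", "analyzer": "trigram_analyzer"}},
                },
            }
        }
    },
}

# Search body: require every gram of the query to match, to cut down noisy hits.
SEARCH_BODY = {
    "query": {"match": {"product_name": {"query": "orld", "operator": "and"}}}
}

if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
    print(json.dumps(SEARCH_BODY, indent=2))
```

`"operator": "and"` only requires every gram to be present, not that the grams are adjacent, so some loose matches are still possible. In 7.x and later the type level under `mappings` is removed, so the `product` wrapper would go.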

(Sidra Farooq) #5

What do you mean by expensive?

(Mark Walkom) #6


The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word of the specified length.

Have a read of the rest of that page, it runs through an example :slight_smile:
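To make "expensive" concrete, here is a rough pure-Python imitation of the tokenizer behaviour quoted above, assuming `token_chars` of letters and digits (anything else splits words). It is not Elasticsearch's implementation, only a way to count how many tokens a single word turns into; each of those tokens becomes a term that has to be indexed and stored.

```python
import re
from typing import List


def ngram_tokens(text: str, min_gram: int, max_gram: int) -> List[str]:
    """Approximate the ngram tokenizer: split on non-letter/digit characters, then
    emit every gram of length min_gram..max_gram, grouped by start position."""
    tokens: List[str] = []
    for word in re.findall(r"[^\W_]+", text):  # runs of letters/digits
        for start in range(len(word)):
            for n in range(min_gram, max_gram + 1):
                if start + n <= len(word):
                    tokens.append(word[start:start + n])
    return tokens


if __name__ == "__main__":
    for lo, hi in [(3, 3), (2, 5)]:
        grams = ngram_tokens("world", lo, hi)
        print(f"min_gram={lo} max_gram={hi}: {len(grams)} tokens {grams}")
```

With `min_gram=3` and `max_gram=3` the word "world" becomes 3 terms (`wor`, `orl`, `rld`); widening the range to 2..5 turns it into 10 terms, versus a single term with the standard analyzer. Repeated across every product name in the index, that is where the extra index size and indexing time come from.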

(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.