Mysql data mapping for Elasticseach


(Hilal) #1

Hi,
I am in the process of converting my queries from Mysql to Elasticsearch, and would like to know if my mapping makes sense.

I've identified 4 common types of data that I want to store.

  1. tip : "tip" is my enum field name in mysql db. For exampe, tip=M, tip=X or tip=D.

  2. usul : "usul" is my enum field name in mysql db.
    For example, usul=A, usul=P, usul=M or usul=K.

  3. maliyet : "maliyet" is my char field name in mysql db.
    For example, maliyet=1, maliyet=2, maliyet=X or maliyet=0.

  4. islem : "islem" is my varchar field name in mysql db.
    For example, islem=Y, islem=YD, islem=YDDD, islem=YI or islem=YSS.

Mappings:

1. tip - {type: string, index:not_analyzed} 

2. usul - {type: string, index:not_analyzed} 

3. maliyet - {type: string, index:not_analyzed} 

4. islem - {type: nested, index:not_analyzed} 

However, my serach results is not correct. I want to do mapping. How can I do mapping these fields?


(Thomas Decaux) #2

Hello, you create the index (like a mysql data-base) first:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#mappings

With your mapping, then insert documents.


(Hilal) #3

I have a index called "ihaleler". My question is this not

I indexed 10000 datas but these fields ('tip', 'usul', 'maliyet', 'islem') mapping is not correct. So my search results is not correct.

What should be the type of these fields for mapping?

PUT ihaleler/ihale/_mapping
{
  "properties": {
    "tip" : {
      "type" : "?"
        , "index": "not_analyzed"
      }
  }
}

(David Pilato) #4

I think your mapping is correct and using not_analyzed is the way to go in such a case.

But it's hard to tell why your query fails without seeing the query.

Remember that not_analyzed will not lowercase your term. So if you indexed YSS, searching for yss won't work.


(Hilal) #5

Ok. I did "usul", "tip" and "maliyet" fields. But I have a problem about "islem". It differs from others.
My query :

http://10.0.2.15:8080/filter.php?islem=YDDD

My result:

found = 2
->YDDD
->YDDD

It works true actually but I want to be YD instead of YDDD

Because YDDDDD=YDDDD=YDDD=YD. So it have to be unique array.

Mysql db screenshot:

CODES:

"islem" mapping on sense editor :

PUT ihaleler/ihale/_mapping
{
  "properties": {
    "islem": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "turkish"
    }
  }
}

indexed codes on php (index.php):

....
$islem=$ihale['islem'];
            $islem_array = array();
               $islem_array[] = 'Y';
               //$islem=$ihale['islem']; 
               for($i=0;$i<strlen($islem);$i++){
                   $islem_array[]=substr($islem,0,1);
               }
            $islem=array_unique($islem_array);    
   
$dizi['islem']=$ihale['islem'];

filter codes on php(filter.php) :

 unset($filter_islem);
    if(isset($_GET['islem'])){
    $islemler=$_GET['islem'];
    $islemler=trim($islemler, ',');
    $islemler_array=explode(',', $islemler);
    $islem=array_unique($islemler_array);
    $filter_islem['match_phrase']['islem']=$_GET['islem'];
    }
if(is_array($filter_islem)){
    $searchParams['body']['query']['bool']['should']=$filter_islem;
} 

How can I do? Is the mapping must be a nested instead of string?


(David Pilato) #6

If you want to run a query like "starts with", you could use edge n grams for that.


(Hilal) #7

no it is not.

Y: Yeni (English meanings new)
D: Düzeltme (English meanings correction)

Y: Yeni(New)
YD: Yeni Düzeltme(New Correction)
YDD: Yeni Düzeltme Düzeltme(New Correction Correction)
YDDD: Yeni Düzeltme Düzeltme Düzeltme(New Correction Correction Correction) 
...etc.

When I query YDDD It must come just YD. (because YD=YDDD)


(David Pilato) #8

Sorry. I don't understand.

Can you reproduce your case with a small example?

  • create a new index
  • create a type with a mapping and a single field
  • index some documents using this single field
  • query

(Hilal) #9

1.I have mysql datase and my field name is islem(english meaning process).
2.I created index on sense editor:

DELETE ihaleler
PUT ihaleler

3.I did mapping for my field "islem"

PUT ihaleler/ihale/_mapping
{
  "properties": {
    "islem": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "turkish"
    }
  }
}

4.I indexed data on php (index.php):
I made array of process variable in here. After I did unique this array.

....
$islem=$ihale['islem'];
            $islem_array = array();
               $islem_array[] = 'Y';
               //$islem=$ihale['islem']; 
               for($i=0;$i<strlen($islem);$i++){
                   $islem_array[]=substr($islem,0,1);
               }
            $islem=array_unique($islem_array);    
$dizi['islem']=$ihale['islem'];

5.Filtered indexing data on php(filter.php) :

 unset($filter_islem);
    if(isset($_GET['islem'])){
    $islemler=$_GET['islem'];
    $islemler=trim($islemler, ',');
    $islemler_array=explode(',', $islemler);
    $islem=array_unique($islemler_array);
    $filter_islem['match_phrase']['islem']=$_GET['islem'];
    }
if(is_array($filter_islem)){
    $searchParams['body']['query']['bool']['should']=$filter_islem;
}

6.And query :

http://10.0.2.15:8080/filter.php?islem=YDDD

I want to be YD of this query result .
Because in the future it could be very long.

For example:
like YDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD


(David Pilato) #10

Where is the edge n gram in your mapping?


(tri-man) #11

I think you can do ?islem=YDD* when searching.

In stead of using your filter.php, try to verify the search results from ES's REST interface to make sure it works the way you want... then review the filter.php module.

Here is the URL you can try...
http://[es host:es port]/[index name]/_search?q=islem:YDD*

Here is the link for further info
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html


(Hilal) #12

edge n gram :

PUT /ihaleler
    {
        "settings" : {
            "analysis" : {
                "analyzer" : {
                    "my_edge_ngram_analyzer" : {
                        "tokenizer" : "my_edge_ngram_tokenizer"
                    }
                },
                "tokenizer" : {
                    "my_edge_ngram_tokenizer" : {
                        "type" : "edgeNGram",
                        "min_gram" : "0",
                        "max_gram" : "1",
                        "token_chars": [ "letter" ]
                    }
                }
            }
        }
    }

(Hilal) #13

NO thn tri-man.
You didn't understand me.

when I do query http://10.0.2.15:8080/filtreleme.php?islem=YDDDDDD

my result shoulnd't be islem=YDDDDDD .
It should be islem=YD. Five of the D here is unnecessary


(tri-man) #14

Got it now... use the edgeNGram as suggested by @dadoonet is the right approach. I would change "min_gram" to 1 and "max_gram" to 2 based on your example (ie. Y, YD, YDD, etc.)


(Hilal) #15

I did but result is the same.
My query :

http://10.0.2.15:8080/filter.php?islem=YDDD

My result:

found = 2
->YDDD
->YDDD

Actually it should be :

    found = 2
    ->YD
    ->YD

(tri-man) #16

Oh, are you expecting to see the results back as YD, not YDDD (where YDDD is the actual value)?


(Hilal) #17

yes. My islem datasY,D,I,S

for example :

Y
YD
YI
YDI
YDDDD
YS
......

example query and expecting result:

if I do query Y, my result must be Y.
if I do query YD, my result must be YD.
if I do query YI, my result must be YI.
if I do query YDDDDD, my result must be YD.
if I do query YDD, my result must be YD....

They must be unique array.


(tri-man) #18

The Edge NGram tokenizer or the "my_edge_ngram_analyzer" tells ES how to tokenize a value before indexing the information. When the data comes in as Y, YD, YDD, YDDD, YDDDD, etc. ES will index Y and YD as tokens, it won't index YDD, YDDD, etc b/c of this custom analyzer. Also, it won't change the original values either. When you search with YD, YDD, YDDD, etc.. ES will be able to serve the request and the returned results will be in its original string as you have seen. If you only wants YD as you said, you might need an extra step to turn the output value to the ngram value that you prefer to see.

This link might have what you are looking for

Since you are using PHP, I'm assuming you are using ES REST web services for this kind of search, you can check out this URL to product the tokens you want

http://[es host]:[es port]/[index name]/_analyze?analyzer=my_edge_ngram_analyzer&text=YDDD


(Hilal) #19

Thank you. I tried.

http://localhost:9200/ihaleler/_analyze?analyzer=my_edge_ngram_analyzer&text=YDDD

result:

{"tokens":[{"token":"Y","start_offset":0,"end_offset":1,"type":"word","position":0},{"token":"YD","start_offset":0,"end_offset":2,"type":"word","position":1}]}


(Hilal) #20

I have another idea.

I did nested mapping this field.

PUT ihaleler/ihale/_mapping
{ "properties": {
    "islem": {
      "type": "nested"
    }}} 

result:

{
  "acknowledged": true
}

After I did unique array in php code. I indexed with my php codes:

....
    $islem=$ihale['islem'];
                $islem_array = array();
                   $islem_array[] = 'Y';
                   //$islem=$ihale['islem']; // abcdef
                   for($i=0;$i<strlen($islem);$i++){
                       $islem_array[]=substr($islem,0,1);
                   }
                $islem=array_unique($islem_array);    
    $dizi['islem']=$ihale['islem'];

but it has failed:

Fatal error:  Uncaught exception 
'Guzzle\Http\Exception\ClientErrorResponseException' with message  'Client error response[status code] 400  [reason phrase] Bad Request [url] http://localhost:9200/ihaleler/ihale/1030235' in/home/admin web/server.com/public_html/vendor/guzzle/http/Guzzle/Http/Exception/BadResponseException.php:43 Stack trace: #0 /home/adminweb/server.com/public_html/vendor/guzzle/http/Guzzle/Http/Message/Request.php(145): Guzzle\HttpExceptionBadResponseException::factory(Object(Guzzle\Http\Message\EntityEnclosingRequest),Object(Guzzle\Http\Message\Response))#1 [internal function]:Guzzle\Http\MessageRequest::onRequestError(Object(Guzzle\Common\Event),
     'request.error',Object(Symfony\Component\EventDispatcher\EventDispatcher))#2 /home/admin/web/server.com/public_html/vendor/symfony/event-dispatcher/Symfony/Component/EventDispatcher/EventDispatcher.php(164):call_user_func(Array, Object(Guzzle\Common\Event), 'request.error',Object(Symfony\Component\EventDispatcher\EventDispatcher))#3 /home/admin/web/ in /home/adminweb/server.com/public_html/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Connections/GuzzleConnection.php on line 266