Text analysis


(Priyasmini Sahoo) #1

I have an index in Elasticsearch whose fields contain string values. Can I tokenize them, or otherwise do text analysis on that index?


(David Pilato) #2

It's done by default at index and search time.
What do you mean? What is your need exactly?
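
To see what the default analysis does to a string, you can run some text through the _analyze API. A minimal sketch, assuming the standard analyzer (the default for text fields); the sample sentence is made up for illustration:

```
# Run a sample sentence through the standard analyzer
POST /_analyze
{
  "analyzer": "standard",
  "text": "Loving the new Elasticsearch release!"
}
```

The response lists one entry per token, with its position and character offsets. For this input the standard analyzer should produce the lowercased words loving, the, new, elasticsearch and release.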


(Priyasmini Sahoo) #3

I want to do sentiment analysis of Twitter data. I tried to set up standard tokenization by running this in Dev Tools:

        PUT /priya_twitter_pok
        {
          "settings": {
            "number_of_shards": 1,
            "analysis": {
              "filter": {
                "autocomplete_filter": {
                  "type": "edge_ngram",
                  "min_gram": 1,
                  "max_gram": 20
                }
              },
              "analyzer": {
                "autocomplete": {
                  "type": "custom",
                  "tokenizer": "standard",
                  "filter": [
                    "lowercase",
                    "autocomplete_filter"
                  ]
                }
              }
            }
          }
        }

but I get this error:

        {
          "error": {
            "root_cause": [
              {
                "type": "resource_already_exists_exception",
                "reason": "index [priya_twitter_pok/iyshYCcuSA6CQ_OsKEShHg] already exists",
                "index_uuid": "iyshYCcuSA6CQ_OsKEShHg",
                "index": "priya_twitter_pok"
              }
            ],
            "type": "resource_already_exists_exception",
            "reason": "index [priya_twitter_pok/iyshYCcuSA6CQ_OsKEShHg] already exists",
            "index_uuid": "iyshYCcuSA6CQ_OsKEShHg",
            "index": "priya_twitter_pok"
          },
          "status": 400
        }


(David Pilato) #4

Please format your code, logs or configuration files using the </> icon as explained in this guide and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

Add:

DELETE /priya_twitter_pok

at the beginning of your script.
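
Put together, the script could look like the sketch below. It assumes you are fine with losing whatever documents are currently in priya_twitter_pok, since deleting the index removes them. The last request and its sample word are only there for illustration:

```
# Remove the existing index (this also deletes its documents)
DELETE /priya_twitter_pok

# Recreate it with the custom analysis settings
PUT /priya_twitter_pok
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}

# Check what the autocomplete analyzer produces for a sample word
POST /priya_twitter_pok/_analyze
{
  "analyzer": "autocomplete",
  "text": "Elastic"
}
```

With these settings the last request should return the prefixes e, el, ela, elas, elast, elasti and elastic, which is how you can confirm that the analyzer was registered on the new index.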


(Priyasmini Sahoo) #5

Sorry for the bad formatting. I ran the following request in Dev Tools:

        PUT /priya_twitter_pok
        {
          "settings": {
            "number_of_shards": 1,
            "analysis": {
              "filter": {
                "autocomplete_filter": {
                  "type": "edge_ngram",
                  "min_gram": 1,
                  "max_gram": 20
                }
              },
              "analyzer": {
                "autocomplete": {
                  "type": "custom",
                  "tokenizer": "standard",
                  "filter": [
                    "lowercase",
                    "autocomplete_filter"
                  ]
                }
              }
            }
          }
        }

After executing the above request, I get the following error:

        {
          "error": {
            "root_cause": [
              {
                "type": "resource_already_exists_exception",
                "reason": "index [priya_twitter_pok/iyshYCcuSA6CQ_OsKEShHg] already exists",
                "index_uuid": "iyshYCcuSA6CQ_OsKEShHg",
                "index": "priya_twitter_pok"
              }
            ],
            "type": "resource_already_exists_exception",
            "reason": "index [priya_twitter_pok/iyshYCcuSA6CQ_OsKEShHg] already exists",
            "index_uuid": "iyshYCcuSA6CQ_OsKEShHg",
            "index": "priya_twitter_pok"
          },
          "status": 400
        }

And can you tell me why I would have to DELETE my index before beginning my script?


(David Pilato) #6

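Because your index priya_twitter_pok already exists. PUT /priya_twitter_pok is a create-index request, so Elasticsearch rejects it with resource_already_exists_exception when an index with that name is already there. Deleting the index first lets the PUT succeed, but it also removes the documents stored in it.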
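
If the data in the index has to be kept, deleting is not the only route: analysis settings can be added to an existing index while it is closed. A sketch, assuming the index can be taken offline briefly. number_of_shards is left out because it cannot be changed on an existing index:

```
# Close the index so that analysis settings can be changed
POST /priya_twitter_pok/_close

# Add the custom filter and analyzer to the existing index
PUT /priya_twitter_pok/_settings
{
  "analysis": {
    "filter": {
      "autocomplete_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 20
      }
    },
    "analyzer": {
      "autocomplete": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "autocomplete_filter"
        ]
      }
    }
  }
}

# Reopen the index
POST /priya_twitter_pok/_open
```

Keep in mind that a field only uses the new analyzer if its mapping refers to it, and documents that were indexed earlier are not re-analyzed.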

