Getting Elasticsearch exception while indexing using Python script

ravi-shanker · June 17, 2021, 6:17pm

Hello All

I am trying to read, parse and index an HTML file using the Python script below.

from elasticsearch import Elasticsearch 
from bs4 import BeautifulSoup
import glob

es=Elasticsearch([{'host':'ip-address','port':9200}])

def remove_tags(html):
    # Parse the HTML content
    soup = BeautifulSoup(html, "html.parser")

    # Remove <style> and <script> elements entirely
    for data in soup(['style', 'script']):
        data.decompose()

    # Return the remaining visible text, joined by single spaces
    return ' '.join(soup.stripped_strings)

path = 'path_of_html_file'
files = glob.glob(path)
for file in files:
    fname = open(file, 'r')
    e1 = remove_tags(fname)
    res = es.index(index='ep1', doc_type='employee', id=1, body=e1)

When I run the script above on my Linux EC2 instance, I get the following error:

/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
  warnings.warn(message, category=ElasticsearchWarning)
Traceback (most recent call last):
  File "readMount_Parse_Index.py", line 25, in <module>
    res = es.index(index='ep1',doc_type='emp',id=1,body=e1)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 411, in index
    body=body,
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
elasticsearch.exceptions.RequestError: RequestError(400, u'mapper_parsing_exception', u'not_x_content_exception: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes')
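The 400 error says Elasticsearch could not recognize the request body as JSON (or another XContent format it supports). As far as I know, the Python client passes a `str` body through unchanged instead of serializing it, so the server receives the bare text returned by `remove_tags`. The sketch below needs only the standard library (no cluster); the helper `is_json_object` and the sample text are hypothetical, used only to contrast plain text with a JSON-encoded document.

```python
import json


def is_json_object(payload):
    """Return True if payload parses as a JSON object, the shape a document body needs."""
    try:
        return isinstance(json.loads(payload), dict)
    except ValueError:
        # json.JSONDecodeError (Python 3) is a subclass of ValueError;
        # Python 2.7 raises ValueError directly.
        return False


# Hypothetical example of what remove_tags() might return for an HTML page
extracted_text = "Employee Directory Ravi Engineering"

print(is_json_object(extracted_text))                            # plain text
print(is_json_object(json.dumps({"content": extracted_text})))  # JSON object
```

The first call prints `False` and the second prints `True`; only the second form is something Elasticsearch can index as a document.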

Can somebody who has faced the same issue before help me out?

Thanks!

ravi-shanker · June 21, 2021, 7:19am

@Badger Hello Badger, could you please help me with this issue?

spinscale · June 21, 2021, 8:35am

Please refrain from pinging people directly (especially over the weekend). This is a community forum, and folks will chime in to help if there is a good, reproducible use case.

It looks to me as if you are not creating a JSON object to be indexed into Elasticsearch, but are instead trying to send raw text data. You need to send JSON to Elasticsearch.
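One way to do that is to wrap the extracted text in a Python dict, which the client serializes to a JSON object. This is a minimal sketch, not a definitive fix: the helper `build_document` and the field names `filename` and `content` are hypothetical choices, and the `es.index(...)` call is shown only as a comment because it needs a live cluster.

```python
import json
import os


def build_document(text, path):
    """Wrap extracted text in a dict so the client sends a JSON object, not raw text."""
    return {
        "filename": os.path.basename(path),  # hypothetical field name
        "content": text,                     # hypothetical field name
    }


doc = build_document("Employee Directory Ravi Engineering", "/data/html/emp1.html")
print(json.dumps(doc, sort_keys=True))
# prints {"content": "Employee Directory Ravi Engineering", "filename": "emp1.html"}

# Against a live cluster (not run here), the loop from the question would become:
#   es.index(index='ep1', doc_type='employee', id=os.path.basename(file), body=doc)
```

Using the file name (or any other unique value) as the `id` also matters: with `id=1` for every file, each indexing call replaces the previous document, so only the last file would remain in the index.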

system · July 19, 2021, 8:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
