How to define content type


(Rohit Kshirsagar) #1

Hello!
Greetings!

We are new to Elasticsearch and are currently stuck on the following issue.

Objective: We are trying to fetch JSON-formatted files from a particular website and then use Elasticsearch to store and later search them. The files are generated dynamically at a fixed frequency. We have a Python script that fetches the files from the website.

Issue: Since we need the fetched files to be JSON, we are trying to define the content type in the Python script, but we have been unable to do so.
Below is the Python code. Could you please give us some feedback on how to define the content type here?

import elasticsearch
import simplejson as json
import requests
from elasticsearch import helpers
from bs4 import BeautifulSoup
import pandas as pd
from stringify import stringify_py

es = elasticsearch.Elasticsearch([{'host':'127.0.0.1', 'port':9200}])
res = requests.get('http://127.0.0.1:9200')
print(res.content)

r = requests.get("https://coinmarketcap.com/currencies/ripple/#markets")

soup = BeautifulSoup(r.content, "lxml")
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
data = df[0].to_json(orient = 'records')

es.index(index='test', doc_type='people', id=1, body=data)

Thanks,
Rohit


(Mujtaba Hussain) #2

Do you mean when you call the remote website for the JSON files, or when you index said JSON into ES? For the former, I assume you will have to look up the docs for the requests module.

For the latter, what does the ES Python SDK doc tell you?
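
For the indexing side, the header in question is `Content-Type: application/json`. The sketch below is only an illustration: it uses Python's standard-library `urllib.request` rather than the `requests` module or the Elasticsearch client from the original post, and the helper name `build_index_request` and the sample document are hypothetical. It builds the PUT request without sending it, so the header can be inspected.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    # Serialize the document to JSON bytes for the request body.
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # declares the body format
        method="PUT",
    )


# Hypothetical sample document; host, index and type are taken from the post.
req = build_index_request("http://127.0.0.1:9200", "test", "people", 1, {"name": "example"})
print(req.get_method(), req.full_url)
print(req.header_items())
# urllib.request.urlopen(req) would actually send it (needs a running node).
```

With `requests`, the same dictionary can be passed through the `headers=` argument of `requests.put`.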


(Rohit Kshirsagar) #3

Hello,
Thank you for the response.

I would like to elaborate on the exact error we are getting; it occurs while indexing the JSON file.
Please refer to the error message below.

It would be a great help to us if you could provide some input here.

Error

b'{\n "name" : "b9as1yU",\n "cluster_name" : "elasticsearch",\n "cluster_uuid" : "5KaX8y0NSAKKUrukl4PF0w",\n "version" : {\n "number" : "6.0.1",\n "build_hash" : "601be4a",\n "build_date" : "2017-12-04T09:29:09.525Z",\n "build_snapshot" : false,\n "lucene_version" : "7.0.1",\n "minimum_wire_compatibility_version" : "5.6.0",\n "minimum_index_compatibility_version" : "5.0.0"\n },\n "tagline" : "You Know, for Search"\n}\n'

PUT /test/people/1 [status:406 request:0.013s]
Traceback (most recent call last):
  File "C:\Users\吳倫彰\Desktop\BlockChain Partners internship\Project 2\EStest.py", line 73, in <module>
    es.index(index='test', doc_type='people', id=1, body=data)
  File "C:\Users\吳倫彰\AppData\Local\Programs\Python\Python36-32\lib\site-packages\elasticsearch\client\utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "C:\Users\吳倫彰\AppData\Local\Programs\Python\Python36-32\lib\site-packages\elasticsearch\client\__init__.py", line 263, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "C:\Users\吳倫彰\AppData\Local\Programs\Python\Python36-32\lib\site-packages\elasticsearch\transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "C:\Users\吳倫彰\AppData\Local\Programs\Python\Python36-32\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "C:\Users\吳倫彰\AppData\Local\Programs\Python\Python36-32\lib\site-packages\elasticsearch\connection\base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(406, 'Content-Type header [] is not supported')
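
Two things appear to be going on here, and both readings are assumptions since the thread never states the client version. The empty brackets in `Content-Type header []` suggest the request reached the 6.0.1 node without any Content-Type header, which can happen when the installed `elasticsearch` Python package is older than the server. Separately, `to_json(orient='records')` returns a JSON array as a string, whereas `es.index` expects a single JSON object per document. A minimal sketch of splitting such a string into one action per row, in the dictionary shape accepted by `elasticsearch.helpers.bulk`; the function name `records_to_actions` and the sample rows are hypothetical:

```python
import json


def records_to_actions(records_json, index, doc_type):
    """Yield one bulk action per row of a JSON array string."""
    # Parse the array, then number the rows from 1 to use as document ids.
    for doc_id, row in enumerate(json.loads(records_json), start=1):
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
            "_source": row,  # each row is a plain JSON object
        }


# Hypothetical rows standing in for the scraped table.
sample = '[{"Source": "A", "Pair": "XRP/USD"}, {"Source": "B", "Pair": "XRP/BTC"}]'
actions = list(records_to_actions(sample, "test", "people"))
print(actions[0])
# With a live cluster and a matching client version: helpers.bulk(es, actions)
```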

