I cannot figure out how to send JSON data via Python to the post data to jobs API. The documentation is not clear on the format the JSON file needs to be in. I've tried many different options, but I keep getting different errors.
Here is the code I use to read a JSON file saved as file_name.json and post its contents:

import os
import json

from elasticsearch import Elasticsearch
from elasticsearch.client.xpack import MlClient

es = elastic_connection()  # helper that returns an Elasticsearch instance
es_ml = MlClient(es)

def post_training_data(directory='Training Data', file_name='file_name.json'):
    with open(os.path.join(directory, file_name), mode='r') as train_file:
        train_data = json.load(train_file)
        es_ml.post_data(job_id=job_id, body=train_data)  # job_id is defined elsewhere

post_training_data()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "..\train_model.py", line 218, in post_training_data
    self.es_ml.post_data(job_id=self.job_id, body=train_data)
  File "..\inc_anamoly\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "..\inc_anamoly\lib\site-packages\elasticsearch\client\xpack\ml.py", line 81, in post_data
    body=self._bulk_body(body))
AttributeError: 'MlClient' object has no attribute '_bulk_body'
I'm still not clear on exactly how to get the post data to jobs API to work via the Python Elasticsearch client. The documentation says to send the data like so: "A sequence of one or more JSON documents containing the data to be analyzed. Only whitespace characters are permitted in between the documents."
How is this possible in Python? The json library only lets you serialize multiple JSON docs as a comma-separated list (i.e., a JSON array).
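One way to produce that format (a sketch; the field names below are made up, not from my real data) is to serialize each doc individually with json.dumps and join the results with newlines, rather than dumping the whole list as one JSON array:

```python
import json

# Hypothetical sample docs; real docs would carry your own fields.
docs = [
    {"timestamp": "2019-01-01T00:00:00Z", "value": 1.5},
    {"timestamp": "2019-01-01T00:05:00Z", "value": 2.0},
]

# json.dumps serializes one doc at a time; joining with "\n" yields
# "one or more JSON documents" separated only by whitespace.
body = "\n".join(json.dumps(doc) for doc in docs)
```

Each line of the resulting string is then an independent, parseable JSON document.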
After this fix is applied, the model accepts the documents and shows the correct number processed, but the results do not seem accurate. Very few anomalies are detected, even when sending 5 months of data (about 300,000 JSON docs).
When I sent the JSON docs as one string with no whitespace separating them, I got this error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "..\train_model.py", line 289, in post_training_data
    self.es_ml.post_data(job_id=self.job_id, body=myjsons)
  File "..\inc_anamoly\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "..\inc_anamoly\lib\site-packages\elasticsearch\client\xpack\ml.py", line 81, in post_data
    body=self.client._bulk_body(body))
  File "..\inc_anamoly\lib\site-packages\elasticsearch\transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "..\inc_anamoly\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 186, in perform_request
    self._raise_error(response.status, raw_data)
  File "..\inc_anamoly\lib\site-packages\elasticsearch\connection\base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: RequestError(400, 'parse_exception', 'The input JSON data is malformed.')
The _bulk_body method serializes the data to the proper format expected by ML. After applying this fix, passing a list of dicts as the body parameter should work.
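For reference, the serialization _bulk_body performs is roughly equivalent to this sketch (an illustration, not the library's exact code): each list item ends up as JSON on its own line, with a trailing newline.

```python
import json

def bulk_body(body):
    # Strings are passed through as-is; anything else is JSON-serialized.
    # The items are newline-delimited with a trailing newline, which is
    # the whitespace-separated format the post data endpoint expects.
    lines = [item if isinstance(item, str) else json.dumps(item)
             for item in body]
    return "\n".join(lines) + "\n"

payload = bulk_body([{"a": 1}, {"b": 2}])
```

So a plain list of dicts passed as body gets turned into the newline-delimited document sequence before it is sent over the wire.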
As for the results of your job, there could be any number of reasons why you are seeing them. What type of analysis are you trying to do, and what job configuration did you create?
Also, if you haven't already, take a look at our getting started material for creating ML jobs here.