No errors but nothing gets indexed when using the Bulk API

Hi

I have written a script to read a list ("dummy") and index it into Elasticsearch. I converted the list into a list of dictionaries and used the "Bulk" API to index it into Elasticsearch.

There are no errors, but nothing gets indexed.
I have also checked that the mapping is correct.

THE MAPPING

THE MESSAGE AFTER EXECUTING THE SCRIPT

NOTHING GETS INDEXED

THE SCRIPT

import elasticsearch6  ###elasticsearch
# use the elasticsearch client's helpers module for the _bulk API
from elasticsearch6 import Elasticsearch, helpers
import datetime
import re




# connection settings for the Python Elasticsearch client
ES_DEV_HOST = "http://localhost:9200/"
INDEX_NAME = "coral_ia" #name of index
DOC_TYPE = 'coral_edge'  #type of data



dummy = ['labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n', 'Inference: corkscrew, bottle screw\n', 'Score: 0.03125 \n', '\n', 'TPU_temp(°C): 57.05\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n', "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n", 'Score: 0.09375 \n', '\n', 'TPU_temp(°C): 56.8\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n', 'Inference: pick, plectrum, plectron\n', 'Score: 0.09766 \n', '\n', 'TPU_temp(°C): 57.55\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n', 'Inference: ringlet, ringlet butterfly\n', 'Score: 0.48047 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n',
'*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n', 'Inference: admiral\n', 'Score: 0.59375 \n', '\n', 'TPU_temp(°C): 57.05\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n', 'Inference: lycaenid, lycaenid butterfly\n', 'Score: 0.41406 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n', 'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n', 'Score: 0.36328 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n', 'Inference: bow tie, bow-tie, bowtie\n', 'Score: 0.33984 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n', 'Inference: pick, plectrum, plectron\n', 'Score: 0.17578 \n', '\n', 'TPU_temp(°C): 57.3\n',
'##################################### \n', '\n']


regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, dummy))
match = [line.rstrip('\n') for line in match_regex]   # strip trailing newlines
#print("match list", match, "\n")



# build one dict per result block; a new block starts at each "labels" line
groups = [{}]

for line in match:
    key, value = line.split(": ", 1)
    if key == "labels":
        if groups[-1]:
            groups.append({})
    groups[-1][key] = value.strip()


"""
Initialize Elasticsearch by server's IP'
"""
def initialize_elasticsearch():
    n = 0
    while n <= 10:
        try:
            es = Elasticsearch(ES_DEV_HOST)
            print("Initializing Elasticsearch...")
            return es
        except elasticsearch6.exceptions.ConnectionTimeout as e:  ###elasticsearch
            print(e)
            n += 1
            continue
    raise Exception



"""
Create an index in Elasticsearch if one isn't already there
"""
def initialize_mapping(es):
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            #'type': {'type':'keyword'},
            'labels': {'type': 'keyword'},
            'Model': {'type': 'keyword'},
            'Image': {'type': 'keyword'},
            'Time(ms)': {'type': 'short'},
            'Inference': {'type': 'text'},
            'Score': {'type': 'short'},
            'TPU_temp(°C)': {'type': 'short'}
        }
    }
    print("Initializing the mapping ...")  
    if not es.indices.exists(INDEX_NAME):
        es.indices.create(INDEX_NAME)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=INDEX_NAME)
        



def generate_actions():
    print("Generating actions ...")
    return [{
        '_index': INDEX_NAME,
        '@timestamp': datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S"),
        '_type': DOC_TYPE,
        '_source': doc
        } for doc in groups]



def main():
    es = initialize_elasticsearch()
    initialize_mapping(es)

    try:
        res = helpers.bulk(client=es, index=INDEX_NAME, actions=generate_actions())
        print("\nhelpers.bulk() RESPONSE:", res)
        print("RESPONSE TYPE:", type(res))
    except Exception as err:
        print("\nhelpers.bulk() ERROR:", err)


if __name__ == "__main__":
    main()
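A note on the grouping loop in the script above: each group is a dict, so the second "Time(ms)" line overwrites the first, which is why the search results posted further down show only the second timing of each run (5.7 rather than 23.1 for efficientnet-edgetpu-S). The values also stay strings. A minimal sketch, assuming you want to keep both timings and store the numbers as floats; `parse_groups`, `LINE_RE` and `NUMERIC_KEYS` are hypothetical names, and the sample lines are taken from the dummy list:

```python
import re

# A few lines copied from the "dummy" list in the question
lines = [
    'labels: imagenet_labels.txt \n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 23.1\n',
    'Time(ms): 5.7\n',
    'Score: 0.03125 \n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n',
    'labels: imagenet_labels.txt \n',
    'Model: inception_v1_224_quant_edgetpu.tflite \n',
    'Time(ms): 21.2\n',
    'Time(ms): 3.6\n',
    'Score: 0.17578 \n',
]

NUMERIC_KEYS = {"Time(ms)", "Score", "TPU_temp(°C)"}
# "key: value" lines; the key may end with a unit in parentheses, e.g. "Time(ms)"
LINE_RE = re.compile(r'^(\w+(?:\([^)]*\))?):\s*(.*)$')


def parse_groups(raw_lines):
    """Group "key: value" lines into one dict per "labels:" block."""
    groups = []
    for raw in raw_lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip blank lines, separators and the free-text note
        key, value = m.group(1), m.group(2).strip()
        if key == "labels":
            groups.append({})
        if not groups:
            continue  # ignore anything before the first "labels:" line
        doc = groups[-1]
        if key == "Time(ms)":
            # keep every timing instead of letting the last one win
            doc.setdefault("Time(ms)", []).append(float(value))
        elif key in NUMERIC_KEYS:
            doc[key] = float(value)
        else:
            doc[key] = value
    return groups
```

Keeping "Time(ms)" as a list works because Elasticsearch accepts arrays of values for any field type, so both timings can go into one numeric field.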

What's the result when you run GET coral_ia/_search in Console?

PS: Please paste the result as code and not as a screenshot, since it's much easier for us to work with and for others to find.

This is the result I get from GET coral_ia/_search.
By the way, I am using Elasticsearch 6.x.y.

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "6v31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "efficientnet-edgetpu-S_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "5.7",
          "Inference": "corkscrew, bottle screw",
          "Score": "0.03125",
          "TPU_temp(°C)": "57.05"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "7_31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "inception_v2_224_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "16.6",
          "Inference": "lycaenid, lycaenid butterfly",
          "Score": "0.41406",
          "TPU_temp(°C)": "57.3"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "8f31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "mobilenet_v1_1.0_224_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "3.0",
          "Inference": "bow tie, bow-tie, bowtie",
          "Score": "0.33984",
          "TPU_temp(°C)": "57.3"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "7f31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "inception_v3_299_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "51.3",
          "Inference": "ringlet, ringlet butterfly",
          "Score": "0.48047",
          "TPU_temp(°C)": "57.3"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "8v31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "inception_v1_224_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "3.6",
          "Inference": "pick, plectrum, plectron",
          "Score": "0.17578",
          "TPU_temp(°C)": "57.3"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "6_31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "efficientnet-edgetpu-M_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "10.8",
          "Inference": "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
          "Score": "0.09375",
          "TPU_temp(°C)": "56.8"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "7P31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "efficientnet-edgetpu-L_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "31.0",
          "Inference": "pick, plectrum, plectron",
          "Score": "0.09766",
          "TPU_temp(°C)": "57.55"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "7v31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "inception_v4_299_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "101.2",
          "Inference": "admiral",
          "Score": "0.59375",
          "TPU_temp(°C)": "57.05"
        }
      },
      {
        "_index": "coral_ia",
        "_type": "coral_edge",
        "_id": "8P31M3MBt1HfpHGwlrAa",
        "_score": 1,
        "_source": {
          "labels": "imagenet_labels.txt",
          "Model": "mobilenet_v2_1.0_224_quant_edgetpu.tflite",
          "Image": "insect.jpg",
          "Time(ms)": "3.3",
          "Inference": "leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea",
          "Score": "0.36328",
          "TPU_temp(°C)": "57.3"
        }
      }
    ]
  }
}

I also posted this question on Stack Overflow, in case you need more details: https://stackoverflow.com/questions/62778983/compressor-detection-can-only-be-called-on-some-xcontent-bytes-or-compressed-xc/62779491?noredirect=1#comment111024595_62779491

None of these documents have an @timestamp field, so Discover can't show anything with your time filter.
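To confirm which documents are missing the field, a bool/must_not/exists query can be used. It is sketched here as a Python dict; with the 6.x Python client it could be passed as es.search(index=INDEX_NAME, body=missing_timestamp_query), or its JSON pasted after GET coral_ia/_search in Console. The variable name is only illustrative:

```python
import json

# Matches every document that has no value at all for @timestamp
missing_timestamp_query = {
    "query": {
        "bool": {
            "must_not": {"exists": {"field": "@timestamp"}}
        }
    }
}

# JSON body to paste into Console after "GET coral_ia/_search"
print(json.dumps(missing_timestamp_query, indent=2))
```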

OK. So is the problem in my initialize_mapping or in generate_actions?

def initialize_mapping(es):
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            #'type': {'type':'keyword'},
            'labels': {'type': 'keyword'},
            'Model': {'type': 'keyword'},
            'Image': {'type': 'keyword'},
            'Time(ms)': {'type': 'short'},
            'Inference': {'type': 'text'},
            'Score': {'type': 'short'},
            'TPU_temp(°C)': {'type': 'short'}
        }
    }
    print("Initializing the mapping ...")  
    if not es.indices.exists(INDEX_NAME):
        es.indices.create(INDEX_NAME)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=INDEX_NAME)
        



def generate_actions():
    print("Generating actions ...")
    return [{
        '_index': INDEX_NAME,
        '@timestamp': datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S"),
        '_type': DOC_TYPE,
        '_source': doc
        } for doc in groups]

The mapping should be fine, but the @timestamp field never gets a value. Maybe your time parsing is already failing?
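Separately from the timestamp: Time(ms), Score and TPU_temp(°C) are mapped as short, which is an integer type. With the default coerce setting, Elasticsearch truncates fractional values for integer fields, so a Score of 0.03125 is indexed as 0 (the _source still shows the original string). If the decimals matter, a mapping along these lines may fit better, assuming float precision is enough for these measurements:

```python
# Same fields as initialize_mapping(), with the decimal-valued fields mapped as float
mapping_classification = {
    "properties": {
        "@timestamp": {"type": "date"},
        "labels": {"type": "keyword"},
        "Model": {"type": "keyword"},
        "Image": {"type": "keyword"},
        "Time(ms)": {"type": "float"},
        "Inference": {"type": "text"},
        "Score": {"type": "float"},
        "TPU_temp(°C)": {"type": "float"},
    }
}
```

The type of an existing field cannot be changed in place, so this would need a fresh index (or deleting coral_ia first) followed by re-indexing the data.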

Any suggestions on what I should use instead?
I tested it before and it worked, but I had never tested it with the Bulk API.

I would log the bulk request before sending it, both to check its content and to try running it manually.
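The helper's behaviour can be approximated offline to see what would be sent. The sketch below is a simplified stand-in, not the real elasticsearch-py code: it assumes, as far as I can tell from the 6.x client's expand_action, that metadata keys such as _index and _type go into the action line and that _source, when present, is sent as the whole document. Under that assumption, any other top-level key, such as the @timestamp in generate_actions, is silently dropped. `preview_bulk_lines` and the example timestamp are hypothetical:

```python
import json

# Simplified list of metadata keys (assumption, not the client's full list)
METADATA_KEYS = ("_index", "_type", "_id", "_routing")


def preview_bulk_lines(actions):
    """Return an approximation of the NDJSON body helpers.bulk() would send."""
    out = []
    for action in actions:
        data = dict(action)  # copy so the caller's action is not modified
        op_type = data.pop("_op_type", "index")
        meta = {key: data.pop(key) for key in METADATA_KEYS if key in data}
        out.append(json.dumps({op_type: meta}))
        # if "_source" is present it becomes the whole document; other keys are ignored
        out.append(json.dumps(data.get("_source", data)))
    return "\n".join(out) + "\n"


# Same shape as one action built by generate_actions() in the question
action = {
    "_index": "coral_ia",
    "@timestamp": "2020-07-08T10:00:00",
    "_type": "coral_edge",
    "_source": {"Model": "inception_v1_224_quant_edgetpu.tflite"},
}
print(preview_bulk_lines([action]))  # the document line has no @timestamp
```

Printing the real actions this way (or logging the request body) before calling helpers.bulk() makes this kind of silent drop visible.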

I have changed it to datetime.now(), but the problem persists. Maybe it's my generate_actions function.

I think it is my generate_actions function.

How can I rewrite it? I used to use this one:

def yield_docs():
    doc = {
        '_index': INDEX_NAME,
        '_type': DOC_TYPE,
        '_source': {'data': groups}
        }
    yield doc

The problem there was that I was sending an array instead of individual objects, so I changed it and added the timestamp.
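yield_docs as written sends a single action whose _source holds the whole list. A minimal sketch of a generator that yields one action per group instead, assuming groups is the list of dicts built by the script and reusing its INDEX_NAME and DOC_TYPE values; it also puts @timestamp inside each _source. The name `yield_group_docs` is hypothetical:

```python
import datetime

INDEX_NAME = "coral_ia"  # same values as in the question's script
DOC_TYPE = "coral_edge"


def yield_group_docs(groups):
    """Yield one bulk action per parsed group instead of one action for the whole list."""
    for doc in groups:
        source = dict(doc)  # copy so the group dict itself is not modified
        # @timestamp goes inside _source so it ends up in the indexed document
        source["@timestamp"] = datetime.datetime.now(datetime.timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%S"
        )
        yield {
            "_index": INDEX_NAME,
            "_type": DOC_TYPE,
            "_source": source,
        }
```

With the 6.x client this would be called as helpers.bulk(es, yield_group_docs(groups)); because it is a generator, the helper can consume the actions in chunks instead of needing the whole list up front.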

It is the timestamp. When I remove it from initialize_mapping and generate_actions, the data appears in Discover, but without a timestamp and no longer as a nested array.

Any suggestions?

The problem was the timestamp: it has to go inside _source, for each document in the for loop.

def generate_actions():
    return [{
        '_index': INDEX_NAME,
        '_type': DOC_TYPE,
        '_source': {
            "data": doc,
            "@timestamp": datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S"),
        }
    } for doc in groups]

The remaining problem is that I now have two mappings: the bulk request creates its own.
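One likely explanation, assuming the "two mappings" are two sets of fields in the same index: wrapping each group under "data" turns the fields into data.Model, data.Score and so on, which the explicit mapping (defined for top-level Model, Score, ...) does not cover, so Elasticsearch maps them dynamically. The sketch below only lists the leaf field paths each document shape produces; `field_paths` and the example values are hypothetical:

```python
def field_paths(doc, prefix=""):
    """Return the dotted leaf field paths of a JSON-like document."""
    paths = set()
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")  # recurse into objects
        else:
            paths.add(path)
    return paths


group = {"Model": "inception_v1_224_quant_edgetpu.tflite", "Score": "0.17578"}

wrapped = {"data": group, "@timestamp": "2020-07-08T10:00:00"}  # shape used above
flat = {**group, "@timestamp": "2020-07-08T10:00:00"}           # fields kept at top level

print(sorted(field_paths(wrapped)))  # ['@timestamp', 'data.Model', 'data.Score']
print(sorted(field_paths(flat)))     # ['@timestamp', 'Model', 'Score']
```

Building _source as {**doc, "@timestamp": ...} keeps the field names at the top level, so they should line up with the mapping created in initialize_mapping.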
