Creating a custom id in a mapping


#1

Hello,

I'm trying to give my indexed documents a custom id. So far no luck. This is a sample from my dataset:

_DESCRIPTIONS_ | _VALUES_
ID             | WAUX87
Model          | Avensis

Each document has an ID in its data, and I want the corresponding value (in this case WAUX87) to be used as the _id in Elasticsearch. I'm trying to do that in my mapping:

PUT pdf_docs
{
  "settings": {
    "number_of_shards": 12,
    "number_of_replicas": 2
  },
  "mappings": {
    "pdf_docs": {
      "_id": {
        "path": "ID"
      },
      "properties": {
        "DESCRIPTIONS": {
          "type": "string",
          "index": "not_analyzed"
        },
        "VALUES": {
          "type": "string",
          "index": "not_analyzed"
        },
        "COLUMN-3": {
          "type": "string",
          "index": "not_analyzed"
        },
        "COLUMN-4": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

The mapping above gives me an error: _id is not configurable
I've Googled and searched the official docs, but I can't figure out how to do this. Any ideas?


(Yannick Welsch) #2

Have a look here: https://www.elastic.co/blog/great-mapping-refactoring#meta-fields

You need to explicitly pass the _id in your index requests.
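
For illustration only (this is not from the linked post): a minimal Python sketch of what passing the _id explicitly can look like when the request body is built for the bulk API. The helper name bulk_index_lines is hypothetical, and the optional _type entry is an assumption that only applies to older Elasticsearch versions that still use mapping types, as the mapping in this thread does.

```python
import json


def bulk_index_lines(index, doc_id, doc, doc_type=None):
    """Return the two NDJSON lines of one bulk "index" action with an explicit _id."""
    # The action line carries the metadata, including the id chosen by the client.
    meta = {"_index": index, "_id": doc_id}
    if doc_type is not None:
        # Assumption: only older Elasticsearch versions expect a mapping type here.
        meta["_type"] = doc_type
    # The bulk format is one JSON object per line, each terminated by a newline.
    return json.dumps({"index": meta}) + "\n" + json.dumps(doc) + "\n"


body = bulk_index_lines(
    "pdf_docs",
    "WAUX87",
    {"ID": "WAUX87", "Model": "Avensis"},
    doc_type="pdf_docs",
)
print(body, end="")
```

The resulting text would be sent as the body of a request to the _bulk endpoint. Because the id comes from the data itself, indexing the same document again overwrites it instead of creating a duplicate.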


#3

Ok thanks! Is it possible to write something in Logstash for it? I have thousands of CSV files to index, and I would like each one to have a custom id.


(Yannick Welsch) #4

The Logstash Elasticsearch output has a document_id option, which can be set to the value of the ID field.


#5

Thanks for your answer. I've played around with document_id in my Logstash conf, but I don't get the right results yet. This is my conf:

input
{
    file
    {
        path => "/home/DSAdmin/pdf-1.csv"
        start_position => "beginning"
        ignore_older => 0
        sincedb_path => "/dev/null"
     }
}

filter {
    csv {
        columns => ["DESCRIPTIONS", "VALUES", "DATE"]
        separator => ','
    }
}

output
{
    elasticsearch {
        action => "index"
        hosts => "localhost"
        index => "pdf"
        document_id => "%{?}"
        workers => 8
    }
    stdout {}
}

With the dataset I have, I don't know what to put in document_id (therefore the question mark).

_DESCRIPTIONS_ | _VALUES_
Id             | WAUX87
Model          | Avensis

I want Logstash to look at the row featuring 'Id' and use the corresponding value in the next column as a custom id which is in this case 'WAUX87'. I hope it's possible or else I'll look for other ways.
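
One possible way to do this outside Logstash, sketched in Python. It assumes each file is comma-separated, holds exactly one document as key/value rows, and may start with a DESCRIPTIONS,VALUES header row. The function name pivot_key_value_csv is hypothetical, and the id row is matched case-insensitively because the samples in this thread show both "ID" and "Id".

```python
import csv
import io


def pivot_key_value_csv(text, id_key="id"):
    """Turn key/value rows into one document and pick out its custom id."""
    doc = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        key, value = row[0].strip(), row[1].strip()
        if key == "DESCRIPTIONS":
            continue  # skip the header row, if the file has one
        doc[key] = value
    # Find the row whose key is the id, ignoring case.
    doc_id = next((v for k, v in doc.items() if k.lower() == id_key.lower()), None)
    if doc_id is None:
        raise ValueError("no id row found")
    return doc_id, doc


sample = "DESCRIPTIONS,VALUES\nId,WAUX87\nModel,Avensis\n"
doc_id, doc = pivot_key_value_csv(sample)
print(doc_id, doc)
```

The returned id can then be passed as the _id of the index request, with the dictionary as the document body.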


(Yannick Welsch) #6

This is a pure Logstash question now, so it may be better to ask in the Logstash forums.


#7

Ok, thanks for your help :)

