Hey everyone!
I am quite new to Elasticsearch, so feel free to criticize my approach. I am currently working on a project in which multiple PDF documents (a series of court decisions) need to be searchable. The PDF documents are ingested into Elasticsearch via FSCrawler (thank you dadoonet for this amazing program).
For all newly created documents, I want to make sure that they have certain properties. In my case, each decision has, for example, a CourtInstance, a DecissionDate, and so on. When the data is ingested via FSCrawler, I would like these values to default to null, so that they can later be updated based on a query. To test things out, I created a test index with the following reindex request:
POST _reindex
{
  "source": {
    "index": "courtdecissions"
  },
  "dest": {
    "index": "testindex"
  }
}
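To make the goal concrete, this is roughly what I would like to happen to each document's source, sketched in plain Python. This is only an illustration of the intent: the helper name `with_default_nulls` and the sample document are made up, and nothing here is part of FSCrawler or Elasticsearch.

```python
# Hypothetical helper: give every expected field an explicit null (None)
# unless the document already carries a value for it.

EXPECTED_FIELDS = [
    "CourtCategory",
    "CourtInstance",
    "DecissionDate",
    "DecissionNumber",
    "CaseNumber",
]


def with_default_nulls(source, fields=EXPECTED_FIELDS):
    """Return a copy of source in which each missing field is set to None."""
    result = dict(source)  # shallow copy, the input is left untouched
    for field in fields:
        result.setdefault(field, None)  # existing values are kept
    return result


# Made-up sample of a crawled document.
crawled = {"content": "text of a decision", "CourtInstance": "Supreme Court"}
print(with_default_nulls(crawled))
```

The point of the sketch is that the fields end up present in the source with an explicit null, rather than being absent.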
Then I added the following mapping:
PUT /testindex/_mapping
{
  "properties": {
    "CourtCategory": {
      "type": "keyword",
      "null_value": "null"
    },
    "CourtInstance": {
      "type": "keyword",
      "null_value": "NULL"
    },
    "DecissionDate": {
      "type": "date",
      "null_value": "NULL"
    },
    "DecissionNumber": {
      "type": "keyword",
      "null_value": "NULL"
    },
    "CaseNumber": {
      "type": "keyword",
      "null_value": "NULL"
    },
    "References": {
      "type": "text"
    },
    "Summaries": {
      "type": "text"
    },
    "Comments": {
      "type": "text"
    }
  }
}
This seems to work, since the updated mapping shows up in "GET /testindex/_mapping".
Afterward, I ran the POST _reindex request shown above.
However, when I query for the null value, nothing is returned:
GET /testindex/_search
{
  "query": {
    "term": {
      "CaseNumber": "NULL"
    }
  }
}
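My reading of the null_value documentation is that it only replaces an explicit null in the document source and is not applied to a field that is missing altogether, which might be relevant here. Below is a minimal sketch of that reading in plain Python. It is a toy model, not Elasticsearch code: `index_terms` is a made-up helper, the sample documents are invented, and arrays are ignored for simplicity.

```python
# Toy model of which terms a keyword field with null_value would index.
# It only mirrors the documented rule; it is not Elasticsearch code.

MISSING = object()  # sentinel meaning "field is absent from the source"


def index_terms(source, field, null_value=None):
    """Return the terms one document would contribute for a keyword field."""
    value = source.get(field, MISSING)
    if value is MISSING:
        # Absent field: nothing is indexed and null_value is not applied.
        return []
    if value is None:
        # Explicit null: null_value, if configured, is indexed in its place.
        return [null_value] if null_value is not None else []
    return [value]


# Made-up sample documents.
doc_without_field = {"content": "some decision text"}
doc_with_null = {"content": "some decision text", "CaseNumber": None}

print(index_terms(doc_without_field, "CaseNumber", null_value="NULL"))  # []
print(index_terms(doc_with_null, "CaseNumber", null_value="NULL"))      # ['NULL']
```

If that reading is right, a term query for "NULL" could only match documents whose source contains "CaseNumber": null explicitly.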
Any suggestions or ideas? Thank you in advance.