Get file name from index based on content search in index

Hi,
Thanks in advance

I have loaded PDF files into an index.
All the data in each PDF file is stored as content in the index.
Now I want to get the file name based on a content search.

Suppose I search for "football" in the content: I need the names of all the files in which "football" appears.

I want to view this in Kibana.

Suppose the file name was indexed, for example:

{
"file_name": "some_pdf_file_name.pdf",
"file_path": "http://some_url_to_my_pdf_file",
"file_content": "some content related to football"
}

Hi,
I don't want to specify the file name or file path in the query.
What I need is this: when I search on content, I want to display the names of the files in which that content exists.

Yes, I understand your request, but as I said, suppose you have already indexed a document with several fields such as file name and content. Then you query on the file content and you have the _source ....

When I query on content, it also displays the files in which "football" does not exist.

I am using the query below.

GET /football_index/_search
{
  "query": {
    "match": {
      "content": "football"
    }
  }
}

Can you share the mapping of your index "football_index" and some of the docs returned by your query?

{
"_index" : "football_index",
"_type" : "_doc",
"_id" : "55ffd32fdca4684a20c6da2e714e1976",
"_score" : 2.0674558,
"_source" : {
"content" : """Lionel Andrés Messi Cuccittini,[note 1] commonly known as Lionel Messi or Leo Messi (Spanish pronunciation: [ljoˈnel anˈdɾez ˈmesi] (About this soundlisten);[A] born 24 June 1987), is an Argentine professional footballer who plays as a forward and captains both Spanish club Barcelona and the Argentina national team. Often considered the best player in the world and widely regarded as one of the greatest players of all time, Messi has won a record six Ballon d'Or awards,[note 2] and a record six European Golden Shoes. He has spent his entire professional career with Barcelona, where he has won a club-record 34 trophies, including ten La Liga titles, four UEFA Champions League titles and six Copas del Rey. A prolific goalscorer and creative playmaker, Messi holds the records for most goals in La Liga (441), a La Liga and European league season (50), most hat-tricks in La Liga (36) and the UEFA Champions League (8), and most assists in La Liga (182) and the Copa América (12). He has scored over 700 senior career goals for club and country.

Born and raised in central Argentina, Messi relocated to Spain to join Barcelona at age 13, for whom he made his competitive debut aged 17 in October 2004. He established himself as an integral player for the club within the next three years, and in his first uninterrupted season in 2008–09 he helped Barcelona achieve the first treble in Spanish football; that year, aged 22, Messi won his first Ballon d'Or. Three successful seasons followed, with Messi winning four consecutive Ballons d'Or, making him the first player to win the award four times and in a row.[9] During the 2011–12 season, he set the La Liga and European records for most goals scored in a single season, while establishing himself as Barcelona's all-time top scorer. The following two seasons, Messi finished second for the Ballon d'Or behind Cristiano Ronaldo (his perceived career rival), before regaining his best form during the 2014–15 campaign, becoming the all-time top scorer in La Liga and leading Barcelona to a historic second treble, after which he was awarded a fifth Ballon d'Or in 2015. Messi assumed the captaincy of Barcelona in 2018, and in 2019 he secured a record sixth Ballon d'Or.

An Argentine international, Messi is his country's all-time leading goalscorer. At youth level, he won the 2005 FIFA World Youth Championship, finishing the tournament with both the Golden Ball and Golden Shoe, and an Olympic gold medal at the 2008 Summer Olympics. His style of play as a diminutive, left-footed dribbler drew comparisons with his compatriot Diego Maradona, who described Messi as his successor. After his senior debut in August 2005, Messi became the youngest Argentine to play and score in a FIFA World Cup during the 2006 edition, and reached the final of the 2007 Copa América, where he was named young player of the tournament. As the squad's captain from August 2011, he led Argentina to three consecutive finals: the 2014 FIFA World Cup, for which he won the Golden Ball, and the 2015 and 2016 Copas América. After announcing his international retirement in 2016, he reversed his decision and led his country to qualification for the 2018 FIFA World Cup, and a third-place finish at the 2019 Copa América.

One of the most famous athletes in the world, Messi has been sponsored by sportswear company Adidas since 2006 and has established himself as their leading brand endorser. According to France Football, he was the world's highest-paid footballer for five years out of six between 2009 and 2014, and was ranked the world's highest-paid athlete by Forbes in 2019. Messi was among Time's 100 most influential people in the world in 2011 and 2012. In 2020, he was awarded the Laureus World Sportsman of the Year."""
,
"meta" : { },
"file" : {
"extension" : "txt",
"content_type" : "text/plain; charset=UTF-8",
"created" : "2020-06-19T11:06:35.436+00:00",
"last_modified" : "2020-04-09T11:14:34.320+00:00",
"last_accessed" : "2020-06-19T11:06:35.436+00:00",
"indexing_date" : "2020-06-19T11:25:01.640+00:00",
"filesize" : 9781,
"filename" : "Messi.txt",
"url" : """file://D:\football_data\Messi.txt"""
},
"path" : {
"root" : "4bd37c1058a1860e1386c249e48da2",
"virtual" : "/Messi.txt",
"real" : """D:\football_data\Messi.txt"""
}
}
}

Hi Yassine LASRI,
Here is my index mapping:

{
"football_index" : {
"mappings" : {
"dynamic_templates" : [
{
"raw_as_text" : {
"path_match" : "meta.raw.*",
"mapping" : {
"fields" : {
"keyword" : {
"ignore_above" : 256,
"type" : "keyword"
}
},
"type" : "text"
}
}
}
],
"properties" : {
"attachment" : {
"type" : "binary"
},
"attributes" : {
"properties" : {
"group" : {
"type" : "keyword"
},
"owner" : {
"type" : "keyword"
}
}
},
"content" : {
"type" : "text"
},
"file" : {
"properties" : {
"checksum" : {
"type" : "keyword"
},
"content_type" : {
"type" : "keyword"
},
"created" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"extension" : {
"type" : "keyword"
},
"filename" : {
"type" : "keyword",
"store" : true
},
"filesize" : {
"type" : "long"
},
"indexed_chars" : {
"type" : "long"
},
"indexing_date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"last_accessed" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"last_modified" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"url" : {
"type" : "keyword",
"index" : false
}
}
},
"meta" : {
"properties" : {
"altitude" : {
"type" : "text"
},
"author" : {
"type" : "text"
},
"comments" : {
"type" : "text"
},
"contributor" : {
"type" : "text"
},
"coverage" : {
"type" : "text"
},
"created" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"creator_tool" : {
"type" : "keyword"
},
"date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"description" : {
"type" : "text"
},
"format" : {
"type" : "text"
},
"identifier" : {
"type" : "text"
},
"keywords" : {
"type" : "text"
},
"language" : {
"type" : "keyword"
},
"latitude" : {
"type" : "text"
},
"longitude" : {
"type" : "text"
},
"metadata_date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"modifier" : {
"type" : "text"
},
"print_date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"publisher" : {
"type" : "text"
},
"rating" : {
"type" : "byte"
},
"relation" : {
"type" : "text"
},
"rights" : {
"type" : "text"
},
"source" : {
"type" : "text"
},
"title" : {
"type" : "text"
},
"type" : {
"type" : "text"
}
}
},
"path" : {
"properties" : {
"real" : {
"type" : "keyword",
"fields" : {
"fulltext" : {
"type" : "text"
},
"tree" : {
"type" : "text",
"analyzer" : "fscrawler_path",
"fielddata" : true
}
}
},
"root" : {
"type" : "keyword"
},
"virtual" : {
"type" : "keyword",
"fields" : {
"fulltext" : {
"type" : "text"
},
"tree" : {
"type" : "text",
"analyzer" : "fscrawler_path",
"fielddata" : true
}
}
}
}
}
}
}
}
}

I don't see the problem; the doc contains the token "football".

GET football-index/_search
{
  "query": {
    "match": {
      "content": "football"
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

When I use this query,
it also displays the files and data in the index in which "football" does not exist.

GET /football_index/_search
{
  "query": {
    "match": {
      "content": "football"
    }
  }
}

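You are using the default analyzer, which lowercases tokens,
so a token like Football will still match football. A token like footballer would match as well only if a stemming analyzer were configured.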

@ylasri is there any option,if i only want the exact match of "football" data.

i am new to elastic search, that is why it is more complex to me to write queries.

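In the document example you provided, the word appears in three forms:

football
Football
footballer

Elasticsearch uses the standard analyzer by default for text fields (your mapping does not set another one on content, unless your index settings define a different default analyzer). This analyzer tokenizes on word boundaries and lowercases the tokens, but it does not stem. So football and Football both match the query, while footballer is indexed as a separate token and would only match with a stemming analyzer such as "english".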
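
One way to check which tokens are actually produced for the content field is the _analyze API. This is a minimal sketch, assuming the index is named football_index as in the mapping above; the sample text is made up for illustration.

```
GET /football_index/_analyze
{
  "field": "content",
  "text": "Football and the footballer"
}
```

If the field uses the standard analyzer, the response should list the tokens football, and, the, footballer, which shows that lowercasing is applied but footballer stays a separate token.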

To control that behavior explicitly, you can define your own custom analyzer on the content field, for example one that only uses the standard tokenizer and lowercases the tokens:

PUT football-index
{
  "settings": {
     "analysis": {
      "analyzer": {
        "content_analyzer": {
          "type": "custom", 
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }, 
  "mappings": {
    "dynamic_templates": [
      {
        "raw_as_text": {
          "path_match": "meta.raw.*",
          "mapping": {
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            },
            "type": "text"
          }
        }
      }
    ],
    "properties": {
      "attachment": {
        "type": "binary"
      },
      "attributes": {
        "properties": {
          "group": {
            "type": "keyword"
          },
          "owner": {
            "type": "keyword"
          }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "content_analyzer"
      },
      "file": {
        "properties": {
          "checksum": {
            "type": "keyword"
          },
          "content_type": {
            "type": "keyword"
          },
          "created": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "extension": {
            "type": "keyword"
          },
          "filename": {
            "type": "keyword",
            "store": true
          },
          "filesize": {
            "type": "long"
          },
          "indexed_chars": {
            "type": "long"
          },
          "indexing_date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "last_accessed": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "last_modified": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "url": {
            "type": "keyword",
            "index": false
          }
        }
      },
      "meta": {
        "properties": {
          "altitude": {
            "type": "text"
          },
          "author": {
            "type": "text"
          },
          "comments": {
            "type": "text"
          },
          "contributor": {
            "type": "text"
          },
          "coverage": {
            "type": "text"
          },
          "created": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "creator_tool": {
            "type": "keyword"
          },
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "description": {
            "type": "text"
          },
          "format": {
            "type": "text"
          },
          "identifier": {
            "type": "text"
          },
          "keywords": {
            "type": "text"
          },
          "language": {
            "type": "keyword"
          },
          "latitude": {
            "type": "text"
          },
          "longitude": {
            "type": "text"
          },
          "metadata_date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "modifier": {
            "type": "text"
          },
          "print_date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "publisher": {
            "type": "text"
          },
          "rating": {
            "type": "byte"
          },
          "relation": {
            "type": "text"
          },
          "rights": {
            "type": "text"
          },
          "source": {
            "type": "text"
          },
          "title": {
            "type": "text"
          },
          "type": {
            "type": "text"
          }
        }
      },
      "path": {
        "properties": {
          "real": {
            "type": "keyword",
            "fields": {
              "fulltext": {
                "type": "text"
              },
              "tree": {
                "type": "text",
                "fielddata": true
              }
            }
          },
          "root": {
            "type": "keyword"
          },
          "virtual": {
            "type": "keyword",
            "fields": {
              "fulltext": {
                "type": "text"
              },
              "tree": {
                "type": "text",
                "fielddata": true
              }
            }
          }
        }
      }
    }
  }
}
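
Note that an analyzer set at index creation only affects documents indexed into that index afterwards. A minimal sketch of copying the existing documents over with the _reindex API, assuming the original index is football_index and the new one is football-index as created above:

```
POST /_reindex
{
  "source": {
    "index": "football_index"
  },
  "dest": {
    "index": "football-index"
  }
}
```

After that, the search requests would need to target football-index instead of football_index.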

@ylasri But there are more documents in my index,
and none of the other documents contains "football", "Football", or "footballer".

Yet the query still displays those remaining documents, in which "football", "Football", and "footballer" do not exist.

Something is wrong in our discussion :)
I asked you to give an example of a returned document that doesn't contain the keyword you are searching for, and the example you provided contains at least 2 matches.

@ylasri this is a returned document in which "football" does not exist:

{
"_index" : "football_index",
"_type" : "_doc",
"_id" : "55ffd32fdca4684a20c6da2e714e1976",
"_score" : 2.0674558,
"_source" : {
"content" : """Rohit Gurunath Sharma (born 30 April 1987) is an Indian international cricketer who plays for Mumbai in domestic cricket and captains Mumbai Indians in the Indian Premier League as a right-handed batsman and an occasional right-arm off break bowler. He is the vice-captain of the Indian national team in limited-overs formats.

Outside cricket, Sharma is an active supporter of animal welfare campaigns. He is the official Rhino Ambassador for WWF-India and is a member of People for the Ethical Treatment of Animals (PETA). He has worked with PETA in its campaign to raise awareness of the plight of homeless cats and dogs in India"""
,
"meta" : { },
"file" : {
"extension" : "txt",
"content_type" : "text/plain; charset=UTF-8",
"created" : "2020-06-19T11:06:35.436+00:00",
"last_modified" : "2020-04-09T11:14:34.320+00:00",
"last_accessed" : "2020-06-19T11:06:35.436+00:00",
"indexing_date" : "2020-06-19T11:25:01.640+00:00",
"filesize" : 9781,
"filename" : "Rohit.txt",
"url" : """file://D:\football_data\Rohit.txt"""
},
"path" : {
"root" : "4bd37c1058a1860e1386c249e48da2",
"virtual" : "/Rohit.txt",
"real" : """D:\football_data\Rohit.txt"""
}
}
}

Strange. I just tried on my ES node, and this doc is not retrieved by the match query for the keyword football.
Which Elasticsearch version are you using?
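
One way to see why a particular document is returned is the _explain API. This is a minimal sketch, assuming the index name football_index; DOC_ID is a placeholder for the _id of the unexpected hit.

```
GET /football_index/_explain/DOC_ID
{
  "query": {
    "match": {
      "content": "football"
    }
  }
}
```

The response reports whether the document matched the query and, if it did, which term contributed to the score.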

@ylasri 7.6.2

How are you querying your node? Are you using Kibana Dev Tools, a third-party library, or curl?

@ylasri I am using Kibana Dev Tools.

Can I get only the file names of the documents whose content matches "football"?
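
A minimal sketch of one way to do this with source filtering, assuming the field name file.filename from the mapping posted earlier in this thread:

```
GET /football_index/_search
{
  "_source": ["file.filename"],
  "query": {
    "match": {
      "content": "football"
    }
  }
}
```

Each hit should then carry only file.filename in its _source instead of the full content.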