How to search with Elasticsearch in selected PDF files only

I have a requirement to search with Elasticsearch for a particular word, text or sentence in selected PDF files only, not in all the files present in a specific folder.

I am using Elasticsearch, FSCrawler and Kibana.

Any ideas?

What have you tried so far?
What is not working?

Currently the process is that the PDF files are put in a folder and that path is given to FSCrawler, but the search then looks for the text in all the files.

My requirement is that the UI shows the list of all files, the user selects 2 or 3 files (or as many as they want), and then searches for a particular word, sentence or text in those selected files only.

I am new to Elasticsearch. Is there a way to send the selected files in the request so that the results come from those files only?

I could not find anything on Google about how to do this.

I think what you are describing is more related to the UI part than to the server side.

But anyway, let's say you have indexed all your files with FSCrawler in Elasticsearch. So you have documents like:

{
  "_index" : "fscrawler_doc",
  "_type" : "_doc",
  "_id" : "dd18bf3a8ea2a3e53e2661c7fb53534",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "content" : "This file contains some words.\n",
    "meta" : {
      "raw" : {
        "X-Parsed-By" : "org.apache.tika.parser.DefaultParser",
        "Content-Encoding" : "ISO-8859-1",
        "Content-Type" : "text/plain; charset=ISO-8859-1"
      }
    },
    "file" : {
      "extension" : "txt",
      "content_type" : "text/plain; charset=ISO-8859-1",
      "indexing_date" : "2017-01-04T21:01:08.043",
      "filename" : "test.txt"
    },
    "path" : {
      "virtual" : "/test.txt",
      "real" : "/path/to/test.txt"
    }
  }
}

You can first run:

GET fscrawler_doc/_search

This retrieves the list of documents (only the first 10 documents by default). You can also run:

GET fscrawler_doc/_search?size=9999

This returns a list of up to 9999 documents, for example.
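For the file list in the UI you probably do not need the full extracted content of every file. Here is a rough sketch in Python of how the listing request and the file picker data could be built. The function names `build_listing_body` and `extract_file_choices` are hypothetical, and the field names (`file.filename`, `path.virtual`) are taken from the sample document above, so adjust them if your mapping differs.

```python
from typing import Any, Dict, List, Tuple


def build_listing_body(size: int = 100) -> Dict[str, Any]:
    """Body for `GET fscrawler_doc/_search` that lists files without their content."""
    return {
        "size": size,
        # Source filtering: return only the fields the file picker needs.
        "_source": ["file.filename", "path.virtual"],
        "query": {"match_all": {}},
    }


def extract_file_choices(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn a search response into (document id, filename) pairs for the UI."""
    choices = []
    for hit in response.get("hits", {}).get("hits", []):
        filename = hit.get("_source", {}).get("file", {}).get("filename", "")
        choices.append((hit["_id"], filename))
    return choices
```

The `_id` of each hit is the value to keep for every file the user ticks in the list.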

Then let's say you select 3 documents from the UI and you want to search within those documents only. You can run something like:

GET fscrawler_doc/_search
{
  "query": {
    "bool" : {
      "must" : {
        "match" : { "content" : "foo" }
      },
      "filter": {
        "ids" : { "values" : ["dd18bf3a8ea2a3e53e2661c7fb53534", "id2", "id3"] }
      }
    }
  }
}

That way you will search for the text foo in those 3 documents only.
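If the UI backend is written in Python, the same request body could be generated from whatever the user selected. This is only a sketch: `build_selected_files_query` is a hypothetical helper, and it assumes the UI hands you the `_id` values of the selected documents.

```python
import json
from typing import Any, Dict, Iterable


def build_selected_files_query(text: str, selected_ids: Iterable[str]) -> Dict[str, Any]:
    """Build a bool query that matches `text` in `content`, restricted to the given ids."""
    ids = list(selected_ids)
    if not ids:
        # An empty ids filter would match nothing, so fail early instead.
        raise ValueError("select at least one file before searching")
    return {
        "query": {
            "bool": {
                "must": {"match": {"content": text}},
                # The filter restricts the search without affecting the score.
                "filter": {"ids": {"values": ids}},
            }
        }
    }


if __name__ == "__main__":
    body = build_selected_files_query("foo", ["id1", "id2", "id3"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `GET fscrawler_doc/_search`, either from the Kibana console or from any HTTP client.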


Thank you so much. That's the idea I was looking for.
I will try it :)
