Can we index .zip files using the ingest attachment plugin?

Hi,

We can index .txt, .xlsx, .pptx, and .pdf files using the ingest attachment plugin. Can we index zip files using the same plugin? I tried it, but I don't see the attachment content getting indexed.
Can anyone help me with this?

Thanks,
Priyanka

No, I don't think you can.

FSCrawler supports it though.
But in any case, I think you should unzip your files instead. There are several downsides to indexing zip files:

  • memory usage when extracting big zip files
  • if you have tons of files within your zip, it will be hard for the end user to know which inner file contains the text.
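
For the "unzip first" approach, here is a minimal sketch in Python. It is only an illustration, not something FSCrawler or the ingest attachment plugin provides; the function name `extract_zip` is made up for this example. It copies each inner file in chunks to keep memory usage low, so that every extracted file can then be indexed as its own document.

```python
import shutil
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every regular file from zip_path into dest_dir.

    Returns the list of extracted file paths. (Hypothetical helper,
    not part of FSCrawler or Elasticsearch.)
    """
    dest = Path(dest_dir).resolve()
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = (dest / info.filename).resolve()
            # Skip entries that would land outside dest_dir (e.g. "../x").
            if dest not in target.parents:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # Stream the entry to disk in chunks instead of reading it whole.
            with archive.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(target)
    return extracted
```

Each returned path keeps the inner file's own name, which makes it easier to tell the end user which file a hit came from.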

My 2 cents

Hi @dadoonet,

Thanks for your reply.
I am trying to index files using FSCrawler. FSCrawler will propose to create your first job, and I have created a job using 'FSCrawler job_name'.
Now I am trying to perform the next step as mentioned, but it is again asking me to create job_name:
$ bin/fscrawler --config_dir ./test job_name

After config_dir, which path should I provide?

Could you please help me with this?

Thanks,
Priyanka.

config_dir is where you want to store FSCrawler config files.
It can be whatever you want.

If you don't define config_dir, it will use a default location (your home directory/.fscrawler).
So you can just run:

$ bin/fscrawler job_name

to create an empty job, and then the same command again:

$ bin/fscrawler job_name

to run the job.
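
To make the lookup rule concrete, here is a small Python sketch of it. This is not FSCrawler code; the function names are made up, and the idea that each job lives in a subfolder named after the job is an assumption for illustration.

```python
from pathlib import Path


def resolve_config_dir(config_dir=None):
    """Return the folder used for config files.

    Uses the given config_dir if there is one, otherwise the default
    location described above (home directory/.fscrawler).
    """
    if config_dir is not None:
        return Path(config_dir)
    return Path.home() / ".fscrawler"


def job_dir(job_name, config_dir=None):
    # Assumption: each job is stored in a subfolder named after the job.
    return resolve_config_dir(config_dir) / job_name


print(job_dir("job_name"))            # under the home directory
print(job_dir("job_name", "./test"))  # under ./test instead
```

Under that assumption, a job created without --config_dir and then started with --config_dir ./test would be looked up in two different folders, so it would be reported as missing.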

Hi @dadoonet,

Thanks for your reply!!!
I tried to run the newly created job, but I am getting the error below about the ES client version:

E:\fscrawler-es7-2.7-20190218.195814-11\fscrawler-es7-2.7-SNAPSHOT\bin>fscrawler test
11:09:46,562 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
11:09:46,562 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
java.lang.RuntimeException: The Elasticsearch client version [7] is not compatible with the Elasticsearch cluster version [6.6.1].
    at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.checkVersion(ElasticsearchClientV7.java:183) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:142) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:263) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
11:09:46,562 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [test] stopped
11:09:46,562 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [test] stopped

Thanks,
Priyanka

Please format your code, logs or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

This is the icon to use if you are not using markdown format:

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.

I updated your post.

Didn't you see this?

We can not start Elasticsearch Client. Exiting. java.lang.RuntimeException: The Elasticsearch client version [7] is not compatible with the Elasticsearch cluster version [6.6.1].

You need to download the FSCrawler version that matches your Elasticsearch cluster version, as the documentation says.
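
If you want to check this before starting the crawler, here is a rough Python sketch. It assumes the cluster is reachable at http://localhost:9200 without authentication, and it assumes the rule is simply "same major version", which is what the error message suggests; the function names are made up and this is not FSCrawler's actual check.

```python
import json
from urllib.request import urlopen


def cluster_version(base_url="http://localhost:9200"):
    """Read the version number from the cluster's root endpoint.

    Assumption: the cluster is reachable at base_url without authentication.
    """
    with urlopen(base_url) as response:
        return json.load(response)["version"]["number"]


def is_compatible(client_major, cluster_version_number):
    """True when the client's major version equals the cluster's major version.

    Assumption: major-version equality is the rule, as the error suggests.
    """
    return int(cluster_version_number.split(".")[0]) == client_major


print(is_compatible(7, "6.6.1"))  # False, the situation in the log above
print(is_compatible(6, "6.6.1"))  # True
```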

Hi @dadoonet,

Thanks for your reply!!!
I have created a job and run it, and it is indexing the files in the folder (\tmp\es). I have created an index pattern and tried to discover it, but I am not able to see the attachment content being indexed. How can I tell whether all the attachments I have added to the folder are getting indexed?

Kindly guide me further on this.

Thanks,
Priyanka

Could you run this in the Kibana dev console:

GET /_cat/indices?v
GET /test/_search
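
Once you have the JSON body returned by the search, a quick way to check whether text was extracted is to list each file name with the start of its content. The Python sketch below assumes the documents carry the extracted text in a `content` field and the file name in `file.filename`; check your own mapping, as those field names are an assumption here. The sample response is made up for illustration.

```python
def summarize_hits(search_response, max_chars=80):
    """Return (filename, content snippet) pairs from a _search response body.

    Assumption: extracted text is in "content" and the file name is in
    "file.filename"; adjust the keys to match your mapping.
    """
    rows = []
    for hit in search_response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        filename = source.get("file", {}).get("filename", "<unknown>")
        content = source.get("content", "")
        rows.append((filename, content.strip()[:max_chars]))
    return rows


# Hypothetical response body, trimmed to the fields used above.
sample = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "file": {"filename": "report.zip"},
                    "content": "\nHello from inside the zip\n",
                }
            }
        ]
    }
}
print(summarize_hits(sample))
```

An empty snippet next to a file name would mean the document was indexed but no text was extracted from it.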

Hi @dadoonet,

Yes, now I can see the doc is indexed. But how can I see the indexed attachment content in the Discover section?
I have created an index pattern, but I am not able to see the content of the attachment in Discover as I could with mapper attachment.

Thanks,
Priyanka

Can you share the output of the commands I pasted?

Hi @dadoonet,

Please find the attached document.

Thanks,
Priyanka

Please don't post images of text, as they are hard to read and not searchable.

Instead, paste the text and format it with the </> icon. Check the preview window.

Hi @dadoonet,

Thanks for your help!!
This issue has been resolved now. I am able to index zip files, discover the index pattern in Kibana, and see the attachment content as well.

One more question regarding FSCrawler: can we provide a database connection to FSCrawler so that files can be indexed using file paths from the database?

Could you please guide me?

Thanks,
Priyanka Yerunkar
