Automatic keyword extraction in Elasticsearch

Hi all,

I have started using Elasticsearch to index my corpus of PDF files. I succeeded
in indexing the PDFs as attachments (base64), and my search queries on the
content work correctly, but I couldn't find out how to automatically extract
keywords from these files in Elasticsearch. Is it possible to do that with
Elasticsearch or not?

Could anybody help with relevant links or advice?

Thanks a lot.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fab9d7ba-502b-4166-874a-552834be18db%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Are you using the mapper attachments plugin, or do you extract the text yourself using Tika or similar?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Hi David,

Thank you for your answer.

I am using the mapper attachments plugin.

So the mapper attachments plugin tries to extract keywords: https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L533-542

Why do you think they are not extracted?

Note that the mapper plugin never modifies _source, which contains exactly what you sent to Elasticsearch.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

I am really new to Elasticsearch (I have only been using it for a week), and
my Google searches didn't lead me to this code.

So, if they are extracted, how can I retrieve them?

(Please be patient with my ignorance :frowning: )

Thanks a lot for your help, sir.

I think this could help: https://github.com/elasticsearch/elasticsearch-mapper-attachments#querying-or-accessing-metadata

Replace file.content_type with file.keywords in that example.

This should work.
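
For illustration, here is a minimal Python sketch of the two request bodies this would involve, modelled on the README section linked above with file.content_type swapped for file.keywords. The type name `person`, the field name `file`, and both helper functions are assumptions made for the sketch, and the exact mapping syntax may differ between plugin and Elasticsearch versions. The bodies are only built and printed, not sent to a cluster.

```python
import json


def attachment_mapping(doc_type="person", field="file", stored=("keywords",)):
    """Build a mapping that stores selected attachment metadata sub-fields.

    Assumption: Elasticsearch 1.x with the mapper attachments plugin, where
    the attachment type accepts a "fields" section as in the plugin README.
    """
    sub_fields = {name: {"type": "string", "store": True} for name in stored}
    return {
        doc_type: {
            "properties": {
                field: {"type": "attachment", "fields": sub_fields}
            }
        }
    }


def metadata_search(field="file", wanted=("keywords",)):
    """Build a search body that asks for stored metadata fields.

    _source is left untouched by the plugin, so the extracted values have to
    be requested explicitly as stored fields.
    """
    return {"fields": [f"{field}.{name}" for name in wanted]}


mapping_body = attachment_mapping()
search_body = metadata_search()

print(json.dumps(mapping_body, indent=2))  # body for the _mapping request
print(json.dumps(search_body, indent=2))   # body for the _search request
```

The mapping has to be in place before the PDFs are indexed, since only documents indexed afterwards will have the stored sub-field.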

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

Hi David,

Reading the code made me realize that I didn't explain what I need very well.

What I mean by automatic extraction is not retrieving the keywords I had
already entered in my metadata, but an intelligent extraction from the text
itself, like Alchemy, which is based on machine learning:

http://www.alchemyapi.com/products/demo/alchemylanguage/

I think it is not possible with Elasticsearch, because that is not the
purpose of this tool :frowning:

Thanks a lot, David, for your generous help :slight_smile:

Maybe you could build something using the percolator feature?
I know that some users use it to classify information, though it's not automatic classification: you need to provide the queries yourself.
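
To make the idea concrete, here is a toy, in-memory imitation of percolation in Python: queries are registered first, then each document is run against them and tagged with every query it matches. It does not call the Elasticsearch percolator API. The `REGISTERED_QUERIES` table, the topic labels, and the rule that all terms of a query must appear are assumptions chosen for the sketch.

```python
import re

# Hypothetical hand-written queries: topic label -> terms that must all appear.
REGISTERED_QUERIES = {
    "mobile-apps": {"mobile", "application"},
    "business": {"business", "plan"},
    "search": {"elasticsearch", "index"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def percolate(text, queries=REGISTERED_QUERIES):
    """Return the sorted labels of all registered queries the text matches."""
    tokens = tokenize(text)
    return sorted(label for label, terms in queries.items() if terms <= tokens)


doc = "Business plan for a mobile application developed by our company."
labels = percolate(doc)
print(labels)
```

The labels returned this way could be stored on the document as an extra field before it is indexed. The classification is only as good as the queries someone writes by hand.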

My 0.05 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

Hi Marria,

Firstly, why do you need to extract the keywords? Are you trying to
extract entities (e.g. company names, people, places), tag for
sentiment, or do term expansion (automatically add synonyms or related
terms)?

We've used Stanford NLP (http://nlp.stanford.edu/) successfully for entity
extraction and basic sentiment tagging. Python NLTK (http://www.nltk.org/)
is another option.

You're right, this isn't a core function of Elasticsearch, but rather
something you would do at index time to enhance the data before you
index it, or at query time to enhance a query before you use it on the
index. You should also bear in mind that most of these tools only have a
certain success rate, may need training and may have a significant
overhead. Certainly take with a very large pinch of salt any claims of
'intelligence' especially from closed-source vendors.
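
As a rough illustration of enriching documents at index time, the Python sketch below ranks each document's terms by TF-IDF and attaches the top ones as a `keywords` field before the document would be sent to Elasticsearch. TF-IDF is a simple stand-in here, not what Stanford NLP or NLTK do. The function name, the stop-word list, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "to", "on", "with"}


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def enrich_with_keywords(docs, top_n=2):
    """Return copies of the documents with a 'keywords' list ranked by TF-IDF."""
    tokenized = [tokenize(d["content"]) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    enriched = []
    for doc, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        scores = {
            t: (count / len(tokens)) * math.log(n_docs / df[t])
            for t, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        enriched.append({**doc, "keywords": ranked[:top_n]})
    return enriched


corpus = [
    {"title": "A", "content": "Mobile application business plan for a mobile startup"},
    {"title": "B", "content": "Search engine indexing of scientific papers"},
    {"title": "C", "content": "Business plan for a search startup"},
]
enriched = enrich_with_keywords(corpus)
for doc in enriched:
    print(doc["title"], doc["keywords"])
```

On a small corpus the scores are noisy, which fits the warning above about limited success rates.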

HTH

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk

OK David,

I'll look into it and see what it gives :slight_smile:

Thanks a lot :slight_smile:

Hi Charlie,

Thanks for your help.

I need to extract important keywords, themes, topics, and place names from any
text (my current corpus is a set of PDF files).
I work with a research team, and they want to do this on the scientific
papers they work with.
For example, from a document about the business plan of a mobile application
developed by a certain company, I need to be able to extract keywords such as
the name of the company, "mobile", "application", "business plan", "technology", etc.

I think it is possible to do this with Stanford NLP, right?

Thanks again :slight_smile:

On 16/02/2015 16:27, Marria wrote:

I think it is possible to do this with Stanford NLP, right?

Partially. Use Apache Tika or pdftotext to grab the text from the files,
then try Stanford NER to get the name of the company. The more generic
words won't be something you can get with NER, but if they're indexed by
Elasticsearch they'll be available for searching.

Theme and topic extraction is a harder problem. You might want to look
at Latent Semantic Indexing which promises much in this area (but is
frankly hard to get working and may have low quality). Consider your use
case again - why do you need themes/topics and how will it help the user?

Cheers

Charlie

Hi Charlie,

I am really grateful to you.

Well, my supervisor is not available this week, so I can't ask him why he wants
this. But I think he wants to classify his scientific documents by topic.

He wants all of this (extraction of keywords and themes/topics, classification
of documents, etc.) integrated directly in Elasticsearch. So, after
indexing in Elasticsearch, he wants to be able to extract them (for one
document or a group of documents). I am not sure whether it is feasible to
implement all this in Elasticsearch; I am not even sure whether
Elasticsearch is the best tool for these purposes.

What do you think?

Cheers :slight_smile:

On 17/02/2015 10:21, Marria wrote:

I am not sure if it is feasible to implement all this on Elasticsearch, I am
not even sure if the Elasticsearch is the best tool for these purposes.

You can index this extra information (metadata) in Elasticsearch, and
use it to search/filter/facet to extract the documents accordingly.
However, creating this extra information is something you would do as
part of the document ingestion process, before indexing with Elasticsearch.
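
As a sketch of what that could look like, the Python snippet below shows a document carrying metadata added during ingestion, plus a search body that matches one keyword and counts the most frequent keywords among the hits (a facet). The field names `keywords` and `topics`, the sample values, and the helper function are hypothetical. It also assumes the `keywords` field is mapped so that values are matched exactly rather than analyzed. Nothing is sent to a cluster.

```python
import json

# Hypothetical document as it might look after enrichment at ingestion time.
enriched_doc = {
    "title": "Business plan for a mobile application",
    "keywords": ["mobile", "application", "business plan"],
    "topics": ["technology"],
}


def keyword_search(keyword, facet_size=10):
    """Build a search body: exact match on one keyword plus a keyword facet."""
    return {
        "query": {"term": {"keywords": keyword}},
        "aggs": {
            "top_keywords": {"terms": {"field": "keywords", "size": facet_size}}
        },
    }


search_body = keyword_search("mobile")
print(json.dumps(enriched_doc, indent=2))  # what would be indexed
print(json.dumps(search_body, indent=2))   # what would be searched
```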

Classification is another thing entirely. There are lots of ways to do
this (manual, naive Bayes, using stored expressions), but again this
is adding metadata to a document.
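
To give a feel for the naive Bayes option, here is a small self-contained Python sketch of a multinomial naive Bayes classifier with add-one smoothing. The class name, the labels, and the four training sentences are made up for the example; a real classifier would need a properly labelled training set and evaluation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training documents
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes()
clf.train("mobile application business plan market revenue", "business")
clf.train("startup funding revenue business model", "business")
clf.train("neural network training machine learning model", "machine-learning")
clf.train("text classification with machine learning algorithms", "machine-learning")

label = clf.classify("a plan for revenue and funding")
print(label)
```

The predicted label is exactly the kind of metadata that would be stored on the document before indexing.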

I think you have lots of things to do here, and your first step would be
to understand the definition of each and what is possible and with which
tool.

Cheers

Charlie

OK Charlie, I understand.

Thanks a lot.
