Weird Behavior of Elastic Search


(rajeev reddy) #1

Hi All,

I am using the Elasticsearch mapper attachment plugin to index PDF and DOCX
files into Elasticsearch. When I search Elasticsearch for a particular string,
the search returns zero results. When I search for a single character, the
search does return results.

ElasticSearch Version: 1.1.0
Mapper Attachment version: 2.0.0

Machine details: Windows Server 2008 R2 Service Pack 1
Java runtime: Java 7 update 1

I have also attached the PDF file that I indexed into Elasticsearch. The PDF
file contains the word "Sample".

Any help is greatly appreciated. I am stuck here. Please help me.

Here are my search queries:

  1. POST /cg_attachments/IncidentRequest/_search?pretty=true
    {"query":{"query_string":{"fields":["attachmentFullpath","attachmentContenttype"],"query":"Sample"}},"fields":["attachmentParentId"]}

Elasticsearch response:
{
  "took": 45,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

  2. POST /cg_attachments/IncidentRequest/_search?pretty=true
    {"query":{"query_string":{"fields":["attachmentFullpath","attachmentContenttype"],"query":"c"}},"fields":["attachmentParentId"]}

Elasticsearch response:
{
  "took": 453,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0.061956294,
    "hits": [
      {
        "_index": "cg_attachments",
        "_type": "IncidentRequest",
        "_id": "39",
        "_score": 0.061956294,
        "fields": {
          "attachmentParentId": [
            52
          ]
        }
      },
      {
        "_index": "cg_attachments",
        "_type": "IncidentRequest",
        "_id": "40",
        "_score": 0.061956294,
        "fields": {
          "attachmentParentId": [
            53
          ]
        }
      },
      {
        "_index": "cg_attachments",
        "_type": "IncidentRequest",
        "_id": "41",
        "_score": 0.061956294,
        "fields": {
          "attachmentParentId": [
            53
          ]
        }
      },
      {
        "_index": "cg_attachments",
        "_type": "IncidentRequest",
        "_id": "42",
        "_score": 0.061956294,
        "fields": {
          "attachmentParentId": [
            54
          ]
        }
      },
      {
        "_index": "cg_attachments",
        "_type": "IncidentRequest",
        "_id": "43",
        "_score": 0.061956294,
        "fields": {
          "attachmentParentId": [
            55
          ]
        }
      },
      {
        "_index": "cg_attachments",
        "_type": "IncidentRequest",
        "_id": "44",
        "_score": 0.061956294,
        "fields": {
          "attachmentParentId": [
            55
          ]
        }
      }
    ]
  }
}

  3. GET /cg_attachments/IncidentRequest/42
    This works correctly and returns the expected document.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cf8b84ad-1bde-4f78-a798-ccd898ada2d7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

Can you show the code of how you index the pdf as well as the mapping?
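
For anyone following along, here is a minimal sketch of one way to pull the mapping and see which fields, if any, are mapped as type "attachment". It assumes a node at http://localhost:9200 and the index name cg_attachments from the first post; the helper names (mapping_url, attachment_fields, fetch_mapping) are made up for this illustration, and the sample property names in the usage note are only guesses.

```python
import json
from urllib.request import urlopen

ES_URL = "http://localhost:9200"  # assumption: default local node


def mapping_url(index, base=ES_URL):
    """Build the URL of the get-mapping endpoint for one index."""
    return "{}/{}/_mapping".format(base, index)


def attachment_fields(properties, prefix=""):
    """Return dotted names of fields whose mapped type is 'attachment'."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "attachment":
            found.append(path)
        # Object fields carry their own nested "properties" block.
        found.extend(attachment_fields(spec.get("properties", {}), path + "."))
    return sorted(found)


def fetch_mapping(index):
    """Fetch the mapping over HTTP (needs a reachable node; not run here)."""
    with urlopen(mapping_url(index)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Passing the "properties" block of a type to attachment_fields returns the attachment-typed field names. An empty list would suggest the base64 content was indexed as an ordinary string rather than parsed by the plugin.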



(rajeev reddy) #3

I used NEST to index the documents into Elasticsearch. Here is the code
snippet.

        try
        {
            Task<IBulkResponse> bulkResponse;
            foreach (string currentItem in files)
            {
                log.Info("Started indexing information into elasticSearch" + currentItem);
                byte[] fileArray = File.ReadAllBytes(currentItem);

                int entityId = IndexerHelper.GetEntityId(currentItem);
                string myType = IndexerHelper.GetTypeName(currentItem);
                bulkRequest.Index<Attachments>(m => m.Object(new Attachments
                    {
                        AttachmentContenttype = IndexerHelper.GetContentType(currentItem),
                        AttachmentEncodedfile = Convert.ToBase64String(fileArray, 0,
                            fileArray.Length, Base64FormattingOptions.InsertLineBreaks),
                        AttachmentFullpath = currentItem,
                        AttachmentLastmodified = File.GetLastWriteTime(currentItem),
                        AttachmentParentId = IndexerHelper.GetParentId(currentItem)
                    })
                    .Index(indexName)
                    .Type(myType)
                    .Id(entityId)
                );
                bulkCount++;
                if (bulkRequestSize == bulkCount)
                {
                    bulkResponse = esBulkClient.BulkAsync(bulkRequest);
                    bulkCount = 0;
                    bulkRequest = new BulkDescriptor();
                }
                fileArray = null;
            }

            bulkResponse = esBulkClient.BulkAsync(bulkRequest);

            foreach (BulkOperationResponseItem item in bulkResponse.Result.Items)
            {
                if (item.OK == false && item.Error != null)
                {
                    log.Error("Not indexed ID:" + item.Id + " of Type:" + item.Type);
                    log.Error(" Error is:" + item.Error);
                }
            }
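
One thing that may be worth checking: both queries in the first post list only attachmentFullpath and attachmentContenttype under "fields", so the extracted file text would not be searched by them. The sketch below builds the same query_string body with a configurable field list. The field name attachmentEncodedfile is an assumption (the camel-cased form of the AttachmentEncodedfile property above), and query_string_body is a helper invented for this illustration.

```python
import json


def query_string_body(text, search_fields, return_fields):
    """Build a query_string search body over the given fields."""
    return {
        "query": {
            "query_string": {"fields": list(search_fields), "query": text}
        },
        "fields": list(return_fields),
    }


# The body as posted: only the path and content-type fields are searched.
original = query_string_body(
    "Sample",
    ["attachmentFullpath", "attachmentContenttype"],
    ["attachmentParentId"],
)

# Hypothetical variant that also searches the attachment field.
# "attachmentEncodedfile" is an assumed field name, not confirmed by the thread.
widened = query_string_body(
    "Sample",
    ["attachmentFullpath", "attachmentContenttype", "attachmentEncodedfile"],
    ["attachmentParentId"],
)

if __name__ == "__main__":
    print(json.dumps(widened, indent=2))
```

If the widened body returns hits where the original does not, the problem is in which fields are queried rather than in indexing.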



(Binh Ly-2) #4

Attachments require a specific mapping. It is likely that your mapping is
not correct. Here is a simple example of how to index a PDF with the proper
attachment type:

    class T1
    {
    }

    private static void IndexPdf()
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"), "_all");
        var client = new ElasticClient(settings);

        client.CreateIndex("foo", c => c
            .AddMapping<T1>(m => m
                .Properties(props => props
                    .Attachment(s => s
                        .Name("file")
                        .FileField(fs => fs.Store())
                    )
                )
            )
        );

        var doc = new { file = new {
            content = Convert.ToBase64String(File.ReadAllBytes(@"C:\ESData\pdf\fn6742.pdf")),
            _indexed_chars = -1
        }};

        client.Index(doc, i => i.Index("foo").Type("t1"));
    }

Then after that you can run a search like this:

POST localhost:9200/foo/t1/_search
{
  "fields": "file",
  "query": {
    "match": {
      "file": "blah bah"
    }
  }
}
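
For readers not using NEST, the following is a rough sketch of the JSON bodies the example above would correspond to, built as plain Python dictionaries. It is an approximation under assumptions: the exact JSON that NEST emits is not shown in this thread, and naming the stored sub-field after the field itself ("file") follows my understanding of mapper-attachments 2.0.0. The helper names are invented for illustration.

```python
import base64
import json


def attachment_mapping(type_name, field_name):
    """Mapping body declaring `field_name` as an attachment field."""
    return {
        type_name: {
            "properties": {
                field_name: {
                    "type": "attachment",
                    # Assumption: the content sub-field shares the field's name.
                    "fields": {field_name: {"store": "yes"}},
                }
            }
        }
    }


def attachment_document(field_name, file_bytes):
    """Document body carrying the base64-encoded file."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    # _indexed_chars = -1 mirrors the C# example (no extraction limit).
    return {field_name: {"content": encoded, "_indexed_chars": -1}}


def match_body(field_name, text):
    """Search body running a match query against the attachment field."""
    return {"fields": [field_name], "query": {"match": {field_name: text}}}


if __name__ == "__main__":
    print(json.dumps(attachment_mapping("t1", "file"), indent=2))
    print(json.dumps(match_body("file", "blah bah"), indent=2))
```

Comparing the output of attachment_mapping with what GET /foo/_mapping actually returns is a quick way to confirm that the attachment type was applied.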



(rajeev reddy) #5

I am indexing it the same way you suggested.

I am able to index into Elasticsearch properly. Only the search
functionality is not working. With the same code on another machine, I am
able to both index and search.
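
Since the same code behaves differently on two machines, one thing to compare is the set of plugins each node reports. The sketch below assumes the nodes info endpoint (GET /_nodes/plugins) returns a "nodes" object whose entries each have a "name" and a "plugins" list. The helper names and the sample plugin name used in the usage note are hypothetical.

```python
import json
from urllib.request import urlopen


def plugins_by_node(nodes_info):
    """Map each node name to the sorted plugin names it reports."""
    result = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        name = node.get("name", node_id)
        result[name] = sorted(p.get("name", "") for p in node.get("plugins", []))
    return result


def missing_plugins(reference, other):
    """Plugin names present in `reference` but absent from `other`."""
    return sorted(set(reference) - set(other))


def fetch_nodes_info(base_url):
    """Fetch plugin info over HTTP (needs a reachable node; not run here)."""
    with urlopen(base_url + "/_nodes/plugins") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Running fetch_nodes_info against both machines and passing the two plugin lists to missing_plugins would show whether the attachment plugin is absent, or present in a different version, on the failing one. Comparing the index mappings of the two machines is the other obvious check.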


