Ingest Attachment: Parse Error using NEST


(Pierre) #1

I got the following error after executing my C# code:

[2017-05-08T17:34:15,135][DEBUG][o.e.a.b.TransportShardBulkAction] [-h4nwTK] [documents][3] failed to execute bulk item (index) BulkShardRequest [[documents][3]] containing [index {[documents][document][1], source[{"path":"D:\\foo.docx","attachment":{"date":"2017-05-04T15:58:00Z","content_type":"application/vnd.openxmlformats-officedocument.wordprocessingml.document","author":"Pierre","language":"et","content":"Java, C++, C","content_length":13},"id":1}]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [attachment]
	at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:298) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:450) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:467) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:383) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:373) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:93) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:66) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:277) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:532) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:509) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.prepareIndexOperationOnPrimary(TransportShardBulkAction.java:447) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:455) ~[elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:143) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:113) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:69) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:939) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:908) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:322) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:264) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:888) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:885) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:147) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock(IndexShard.java:1654) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:897) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.access$400(TransportReplicationAction.java:93) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:281) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:260) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:252) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:618) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.3.1.jar:5.3.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.1.jar:5.3.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_131]
	at java.lang.Thread.run(Unknown Source) [?:1.8.0_131]
Caused by: java.lang.IllegalStateException: Can't get text on a START_OBJECT at 1:37

Any ideas?

Best regards,
Pierre


(Pierre) #2

C# code snippet:

            var indexResponse = client.CreateIndex(index, c => c
                .Settings(s => s
                    .Analysis(a => a
                        .Analyzers(ad => ad
                            .Custom("windows_path_hierarchy_analyzer", ca => ca
                                .Tokenizer("windows_path_hierarchy_tokenizer")
                            )
                         )
                         .Tokenizers(t => t
                            .PathHierarchy("windows_path_hierarchy_tokenizer", ph => ph
                                .Delimiter('\\')
                            )
                         )
                     )
                 )
                 .Mappings(m => m
                    .Map<Document>(mp => mp
                        .AutoMap()
                        .AllField(all => all
                        .Enabled(false)
                    )
                    .Properties(ps => ps
                        .Text(s => s
                            .Name(n => n.Path)
                            .Analyzer("windows_path_hierarchy_analyzer")
                        )
                        .Object<Attachment>(a => a
                            .Name(n => n.Attachment)
                            .AutoMap()
                        )
                    )
                 )
               )
            );

            client.PutPipeline("attachments", p => p
                .Description("Document attachment pipeline")
                .Processors(pr => pr
                    .Attachment<Document>(a => a
                        .Field(f => f.Cv)
                        .TargetField(f => f.Attachment)
                    )
                    .Remove<Document>(r => r
                        .Field(f => f.Cv)
                    )
                )
            );

            client.Index(document, i => i.Pipeline("attachments"));
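
For reference, if I am reading the NEST fluent syntax correctly, the two calls above should translate to roughly the following REST API requests (this is a sketch assuming NEST's default camel-casing of the POCO property names; `.AutoMap()` on the `Attachment` object would additionally emit sub-properties under `attachment`, elided here; the exact bodies can be confirmed from the debug mode output):

```
PUT /documents
{
  "settings": {
    "analysis": {
      "analyzer": {
        "windows_path_hierarchy_analyzer": {
          "type": "custom",
          "tokenizer": "windows_path_hierarchy_tokenizer"
        }
      },
      "tokenizer": {
        "windows_path_hierarchy_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "\\"
        }
      }
    }
  },
  "mappings": {
    "document": {
      "_all": { "enabled": false },
      "properties": {
        "path": {
          "type": "text",
          "analyzer": "windows_path_hierarchy_analyzer"
        },
        "attachment": {
          "type": "object"
        }
      }
    }
  }
}

PUT /_ingest/pipeline/attachments
{
  "description": "Document attachment pipeline",
  "processors": [
    { "attachment": { "field": "cv", "target_field": "attachment" } },
    { "remove": { "field": "cv" } }
  ]
}
```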

(Russ Cam) #3

Hey @PierreFre, what does the Document type definition look like, and where is the instance assigned to document constructed? Could you edit your question to add those, please?


(Pierre) #4

Hello @forloop,
here is the Document class:

        public class Document
        {
            public int Id { get; set; }
            public string Path { get; set; }
            public string Cv { get; set; }
            public Attachment Attachment { get; set; }
        }

Here is the rest of the code:

            var index = "documents";
            var node = new Uri("http://localhost:9200");
            var settings = new ConnectionSettings(node)
                .InferMappingFor<Document>(m => m.IndexName(index))
                .EnableDebugMode();

            var client = new ElasticClient(settings);

            var file = @"D:\foo.docx";

            var cvBytes = File.ReadAllBytes(file);
            var cvToBase64 = Convert.ToBase64String(cvBytes);

            var document = new Document()
            {
                Id = 1,
                Path = file,
                Cv = cvToBase64
            };
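
Not an answer as such, but one thing worth adding while debugging (a sketch; `IsValid`, `ServerError` and `DebugInformation` are the standard NEST response members): check each response instead of discarding it, since `CreateIndex`, `PutPipeline` and `Index` can all fail without throwing.

```csharp
// Sketch: inspect the NEST response instead of ignoring it.
var indexResponse = client.Index(document, i => i.Pipeline("attachments"));

if (!indexResponse.IsValid)
{
    // DebugInformation contains the request and response bodies
    // when EnableDebugMode() is set on the connection settings.
    Console.WriteLine(indexResponse.DebugInformation);

    if (indexResponse.ServerError != null)
    {
        Console.WriteLine(indexResponse.ServerError.Error.Reason);
    }
}
```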

(Russ Cam) #5

Do you also have the log of requests and responses from enabling debug mode? If it's long, you can create a gist and link to it.


(Pierre) #6

Hello Russ,

here is the log.

regards,
Pierre


(Pierre) #7

@forloop, do you have any ideas or alternatives? I was thinking of extracting the text from the Word and PDF documents myself and saving it in a text field.
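
As a rough sketch of that fallback for .docx files (assuming the DocumentFormat.OpenXml NuGet package; PDF would need a separate text-extraction library), the extracted plain text could be stored in the existing `Cv` property and indexed without the pipeline:

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Sketch: read the document body text directly instead of sending
// base64 content through the attachment pipeline.
// InnerText concatenates all text runs in the document body.
string ExtractDocxText(string path)
{
    using (var doc = WordprocessingDocument.Open(path, isEditable: false))
    {
        return doc.MainDocumentPart.Document.Body.InnerText;
    }
}

var document = new Document
{
    Id = 1,
    Path = file,
    Cv = ExtractDocxText(file)  // plain text now, so no pipeline needed
};

client.Index(document);
```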

Best regards,
Pierre


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.