Regarding path.real

Hi,
I am using Elasticsearch and FSCrawler together. I have indexed a document directory with the default FSCrawler settings: no template and no filter applied, just the standard configuration.

I have a simple question.
I want to search for some text in the documents, but only retrieve results that match my criteria for the full file path (a UNC path). I can see there is a property path.real, but it is case sensitive. In my case I want path.real to be case insensitive. I am on a Windows environment.

My objective is to search for some text only in the specified file(s). The file path match should be case insensitive.

Filtering on path.real (MyDoc.path.real) is case sensitive: I get results when the filter value matches the stored case exactly, but the query below returns nothing when the case differs.
I think it is executing the query first and then assigning the value of MyDoc.path.real.

Overall, my goal is either to search for something in a given file only, or to search for something and then filter on the full file path as per my input:

        var testResults = client.Search<MyDoc>(selector => selector
            .Query(q => q
                .Bool(b => b
                    .Should(s => s
                        .Match(m => m
                            .Field(f => f.content)
                            .Query(query.ToLower()) //this is my local variable
                        )
                    )
                    .Filter(fi => fi
                        .Match(r => r
                            .Field(f => f.path.real)
                            .Query(fullFilePath.ToLower()) //this is my local variable
                        )
                    )
                )
            )
        );

I am using the NEST client, so I would prefer an answer that is compatible with NEST.

Below is my default mapping for indexing the documents.

{"MyDoc":{"mappings":{"dynamic_templates":[{"raw_as_text":{"path_match":"meta.raw.*","mapping":{"fields":{"keyword":{"ignore_above":256,"type":"keyword"}},"type":"text"}}}],"properties":{"attachment":{"type":"binary"},"attributes":{"properties":{"group":{"type":"keyword"},"owner":{"type":"keyword"},"permissions":{"type":"long"}}},"content":{"type":"text"},"file":{"properties":{"checksum":{"type":"keyword"},"content_type":{"type":"keyword"},"created":{"type":"date","format":"dateOptionalTime"},"extension":{"type":"keyword"},"filename":{"type":"keyword","store":true},"filesize":{"type":"long"},"indexed_chars":{"type":"long"},"indexing_date":{"type":"date","format":"dateOptionalTime"},"last_accessed":{"type":"date","format":"dateOptionalTime"},"last_modified":{"type":"date","format":"dateOptionalTime"},"url":{"type":"keyword","index":false}}},"meta":{"properties":{"altitude":{"type":"text"},"author":{"type":"text"},"comments":{"type":"text"},"contributor":{"type":"text"},"coverage":{"type":"text"},"created":{"type":"date","format":"dateOptionalTime"},"creator_tool":{"type":"keyword"},"date":{"type":"date","format":"dateOptionalTime"},"description":{"type":"text"},"format":{"type":"text"},"identifier":{"type":"text"},"keywords":{"type":"text"},"language":{"type":"keyword"},"latitude":{"type":"text"},"longitude":{"type":"text"},"metadata_date":{"type":"date","format":"dateOptionalTime"},"modifier":{"type":"text"},"print_date":{"type":"date","format":"dateOptionalTime"},"publisher":{"type":"text"},"rating":{"type":"byte"},"raw":{"properties":{"Content-Encoding":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"Content-Type":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"X-Parsed-By":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"resourceName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"relation":{"type":"text"},"rights":{"type":"text"},"source":{"type":"text"},"title":{"type":"text"},"type":{"type":"text"}}},"path":{"properties":{"real":{"type":"keyword","fields":{"fulltext":{"type":"text"},"tree":{"type":"text","analyzer":"fscrawler_path","fielddata":true}}},"root":{"type":"keyword"},"virtual":{"type":"keyword","fields":{"fulltext":{"type":"text"},"tree":{"type":"text","analyzer":"fscrawler_path","fielddata":true}}}}}}}}}

path.real is indeed a keyword. If I'm not mistaken, at index time two sub-fields are generated:

  • path.real.tree

  • path.real.fulltext

          "real": {
            "type": "keyword",
            "fields": {
              "tree": {
                "type": "text",
                "analyzer": "fscrawler_path",
                "fielddata": true
              },
              "fulltext": {
                "type": "text"
              }
            }
          },
    

Maybe that could help you solve your use case?
Otherwise, you need to change the mapping.

Yes, path.real is a keyword; that is clear from the mapping itself. But how do I solve this? Should I convert it to text?
I am new to Elasticsearch, so my question might seem silly to you.

I said that you can search within the 2 other fields I shared. That should work.
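
For NEST users, a minimal sketch of what searching one of those sub-fields might look like is below. It assumes that `real` stays a plain `string` on the C# class, that `client` is an already configured `ElasticClient`, and that the installed NEST version provides the `Suffix` extension for addressing multi-fields; `fullFilePath` is the local variable from the original query.

```csharp
using Nest;

// Assumption: MyDoc.path.real is declared as a string property.
// Suffix("fulltext") only changes the inferred field name to "path.real.fulltext";
// it does not require a matching C# property.
var byPath = client.Search<MyDoc>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.path.real.Suffix("fulltext"))
            .Query(fullFilePath) // analyzed at search time, so no ToLower() should be needed
        )
    )
);
```

Because `path.real.fulltext` is an analyzed text field, a plain match query matches on individual tokens of the path, so it can also return files that share only part of the path. Whether tokens are lowercased depends on the analyzer in effect; the mapping shows none on this sub-field, so the index default applies.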

Either it's my bad luck or my query is too basic... I am just a beginner.
I tried to modify my MyDoc class as below:

public class MyDoc
{
    public string content { get; set; }
    public ESPath path { get; set; }
}
public class ESPath
{
    public string root { get; set; }
    public ESReal real { get; set; }
}
public class ESReal
{
    public ESFields fields { get; set; }
}
public class ESFields
{
    public string fulltext { get; set; }
    public ESTree tree { get; set; }
}

public class ESTree
{
    public string analyzer { get; set; }
    public bool fielddata { get; set; }
}

but it throws an exception while querying.

Error : expected:'{', actual:'"\XYC\GT1\2091.pdf"', at offset:9406

I am sure I am making a mistake somewhere, but I am not able to debug it, since I am a Microsoft guy and can only understand C# code.

I already tried modifying the MyDoc class in various ways, but it only works when I declare real as a string.
Sorry to bother you, but I am a newbie.

I'm not a C# dev so I'm afraid I can't help more without seeing pure elasticsearch queries.

OK, let me try another way.
I am simply trying to search for some text with the GET API using Fiddler.

The response I am getting in Fiddler is:

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 2737

{
  "took": 158,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 17.183006,
    "hits": [
      {
        "_index": "mydoc",
        "_type": "_doc",
        "_id": "8c6d32b7c47f9b9c3c748069831c60da",
        "_score": 17.183006,
        "_source": {
          "content": "Free text search using ElasticSearch.\n",
          "meta": {
            "author": "abc, xyz",
            "date": "2020-06-09T10:35:00.000+00:00",
            "language": "en",
            "modifier": "abc, xyz",
            "publisher": "ABC",
            "created": "2020-06-09T10:34:00.000+00:00",
            "raw": {
              "date": "2020-06-09T04:35:00Z",
              "cp:revision": "1",
              "Total-Time": "1",
              "extended-properties:AppVersion": "16.0000",
              "meta:paragraph-count": "1",
              "meta:word-count": "7",
              "dc:creator": "abc, xyz",
              "extended-properties:Company": "ABC",
              "Word-Count": "7",
              "dcterms:created": "2020-06-09T04:34:00Z",
              "meta:line-count": "1",
              "dcterms:modified": "2020-06-09T04:35:00Z",
              "Last-Modified": "2020-06-09T04:35:00Z",
              "Last-Save-Date": "2020-06-09T04:35:00Z",
              "meta:character-count": "43",
              "Template": "Normal",
              "Line-Count": "1",
              "Paragraph-Count": "1",
              "meta:save-date": "2020-06-09T04:35:00Z",
              "meta:character-count-with-spaces": "49",
              "Application-Name": "Microsoft Office Word",
              "extended-properties:TotalTime": "1",
              "modified": "2020-06-09T04:35:00Z",
              "Content-Type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
              "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
              "creator": "abc, xyz",
              "meta:author": "abc, xyz",
              "meta:creation-date": "2020-06-09T04:34:00Z",
              "extended-properties:Application": "Microsoft Office Word",
              "meta:last-author": "abc, xyz",
              "Creation-Date": "2020-06-09T04:34:00Z",
              "xmpTPg:NPages": "1",
              "resourceName": "ElasticSearch.docx",
              "Character-Count-With-Spaces": "49",
              "Last-Author": "abc, xyz",
              "Character Count": "43",
              "Page-Count": "1",
              "Revision-Number": "1",
              "Application-Version": "16.0000",
              "extended-properties:Template": "Normal",
              "extended-properties:DocSecurityString": "None",
              "Author": "äbc, xyz",
              "publisher": "ABC",
              "meta:page-count": "1",
              "dc:publisher": "ABC"
            }
          },
          "file": {
            "extension": "docx",
            "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "created": "2020-06-09T04:38:33.511+00:00",
            "last_modified": "2020-06-09T04:35:37.385+00:00",
            "last_accessed": "2020-06-09T04:38:33.511+00:00",
            "indexing_date": "2020-06-13T04:43:51.324+00:00",
            "filesize": 11451,
            "filename": "ElasticSearch.docx",
            "url": "file://\\\\ABC\\GTFiles\\ElasticSearchTests\\ElasticSearch.docx"
          },
          "path": {
            "root": "d9754dc78afc186c778be87faffd22",
            "virtual": "/ElasticSearchTests/ElasticSearch.docx",
            "real": "\\\\ABC\\GTFiles\\ElasticSearchTests\\ElasticSearch.docx"
          },
          "attributes": {
            "owner": "ABC\\axyz",
            "permissions": 0
          }
        }
      }
    ]
  }
}

Here I cannot see the fulltext property.
What could be the reason? Why is Fiddler not showing the fulltext property?

It's not in the _source document but it's indexed.
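
In NEST terms, that probably explains the deserialization error earlier in the thread: `_source` carries `path.real` as a plain string, so a class that models it as an object cannot be read back. A sketch of a class that mirrors the `_source` shape shown in the response, limited to the fields used in this thread:

```csharp
// Models only the parts of _source used in this thread.
public class MyDoc
{
    public string content { get; set; }
    public ESPath path { get; set; }
}

public class ESPath
{
    public string root { get; set; }

    // A string in _source. The fulltext and tree sub-fields exist only in the index
    // and are addressed by field name in queries, not as C# properties.
    public string real { get; set; }
}
```

This matches the observation above that the query only works when `real` is declared as a string.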

Can you please give me an example in which I search for some text and get results only for a specified file, filtered by filename along with its full path, without case sensitivity?

Suppose I indexed the c:\abc directory with the default mapping, and now I am searching for some text that is present in both File1 and File2:
C:\abc\File1.txt
C:\abc\File2.txt
But I want to apply a criterion/filter/where clause on the full path of the file so that the search returns results only for File2, and it should not be case sensitive.

How would I do that using the NEST client?

How do I access the path.real.fulltext property using the NEST client?

You can run a search like:

GET /_search?q=path.real.fulltext:file1

I don't know.
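
On the NEST side, one possible sketch for the File1/File2 scenario follows. It is an untested example under stated assumptions, not a confirmed answer: the node URL is a placeholder, the index name `mydoc` is taken from the response above, the search text is invented, `MyDoc` is assumed to declare `path.real` as a string, and `path.real.fulltext` is assumed to use the default analyzer, which lowercases tokens. The content clause goes in `Must` so that it is required (a `Should` clause next to a `Filter` is optional by default), and the path clause goes in `Filter` as a phrase match so that it restricts the hits without affecting the score.

```csharp
using System;
using Nest;

// Placeholder URL; "mydoc" is the index name seen in the search response.
var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    .DefaultIndex("mydoc");
var client = new ElasticClient(settings);

string query = "some text";                // hypothetical search text
string fullFilePath = @"c:\ABC\file2.TXT"; // deliberately different case from C:\abc\File2.txt

var results = client.Search<MyDoc>(s => s
    .Query(q => q
        .Bool(b => b
            // Required full-text match on the document body.
            .Must(m => m
                .Match(mm => mm
                    .Field(f => f.content)
                    .Query(query)))
            // Restrict hits to the given path. The sub-field is referenced by name
            // because it has no C# property of its own.
            .Filter(fi => fi
                .MatchPhrase(mp => mp
                    .Field("path.real.fulltext")
                    .Query(fullFilePath))))));

foreach (var hit in results.Hits)
{
    Console.WriteLine($"{hit.Source.path.real} (score {hit.Score})");
}
```

A phrase match is not an exact-path match: a longer path containing the same token sequence would also pass the filter. If exact, case-insensitive matching is required, changing the mapping (for example a keyword field with a lowercase normalizer) is likely the more robust route.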
