Elasticsearch as Search Database for HTS Database

tm13 · August 15, 2018, 6:36pm

We are looking to build a search tool for the HS code database (the Harmonized Tariff Schedule).

Ideally, we want to search for queries like:

"men's blue shirt"

and find the HS code from the database. As of now, we only care about the code and description.

Considering that there are up to 9 indent levels for the description, I designed the mappings like so:

{
"demo_htsdata": {
"mappings": {
"doc": {
"properties": {
"hts_code": {
"type": "text"
},
"hts_description_0..9": {
"type": "text"
}
}
}
}
}
}
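
For anyone reproducing this: a minimal sketch of how the mapping above could be spelled out in full, assuming "hts_description_0..9" is shorthand for ten separate text fields (hts_description_0 through hts_description_9). The function name build_mapping and the constant MAX_LEVELS are made up for illustration; the "doc" type follows the mapping shown in the post.

```python
import json

# Assumption: "hts_description_0..9" stands for ten fields, levels 0 through 9.
MAX_LEVELS = 10  # hypothetical constant, not from the original post


def build_mapping(levels=MAX_LEVELS):
    """Return an index-creation body with one text field per indent level."""
    properties = {"hts_code": {"type": "text"}}
    for level in range(levels):
        properties[f"hts_description_{level}"] = {"type": "text"}
    # "doc" is the mapping type name used in the post.
    return {"mappings": {"doc": {"properties": properties}}}


if __name__ == "__main__":
    # Print the body that would be sent when creating the index.
    print(json.dumps(build_mapping(), indent=2))
```

The printed JSON is the request body only; sending it to the cluster is left out here.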

I then populated the cluster with this sample data:

{
"hts_code": "6205.30.10.00",
"hts_description_0": "Men's or boys' shirts",
"hts_description_1": "Of man-made fibers:",
"hts_description_2": "Certified hand-loomed and folklore products (640)"
},
{
  "hts_code": "6205.30.15.10",
  "hts_description_0": "Men's or boys' shirts",
  "hts_description_1": "Containing 36 percent or more by weight of wool or fine animal hair",
  "hts_description_2": "Men's"
},
{
"hts_code": "6205.30.15.20",
"hts_description_0": "Men's or boys' shirts",
"hts_description_1": "Containing 36 percent or more by weight of wool or fine animal hair",
"hts_description_2": "Boys'"
},
{
"hts_code": "6205.20",
"hts_description_0": "Men's or boys' shirts",
"hts_description_1": "Of cotton"
},
{
"hts_code": "0102.29.20.11",
"hts_description_0": "Live bovine animals",
"hts_description_1": "Cows imported specially for dairy purposes.",
"hts_description_2": "Weighing less than 90 kg each"
},
{
"hts_code": "6205.90.10.00",
"hts_description_0": "Men's or boys' shirts",
"hts_description_1": "Of other textile materials:",
"hts_description_2": "Containing 70 percent or more by weight of silk or silk waste"
},
{
"hts_code": "0102.29.20.12",
"hts_description_0": "Live bovine animals",
"hts_description_1": "Cows imported specially for dairy purposes.",
"hts_description_2": "Weighing 90 kg or more each"
},
{
"hts_code": "6205",
"hts_description_0": "Men's or boys' shirts"
},
{
"hts_code": "6205.90.10.00",
"hts_description_0": "Men's or boys' shirts",
"hts_description_1": "Of other textile materials:",
"hts_description_2": "Containing 70 percent or more by weight of silk or silk waste"
}
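
One way the sample documents could be loaded is through the bulk API, which takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below only builds that payload. The helper name to_bulk_body is hypothetical, the index name demo_htsdata is taken from the mapping above, and the "_type" entry assumes the single-type "doc" setup shown in the post.

```python
import json


def to_bulk_body(docs, index="demo_htsdata", doc_type="doc"):
    """Build a newline-delimited bulk payload: one action line per document."""
    lines = []
    for doc in docs:
        # Action line: index the next line as a new document with an auto-generated id.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"hts_code": "6205", "hts_description_0": "Men's or boys' shirts"},
        {
            "hts_code": "6205.20",
            "hts_description_0": "Men's or boys' shirts",
            "hts_description_1": "Of cotton",
        },
    ]
    print(to_bulk_body(sample), end="")
```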

Now searching for

{
"query": {
"query_string": {
"query": "men's blue cotton shirt"
}
}
}

Yields a result of:

"hits": {
    "total": 7,
    "max_score": 1.7368788,
    "hits": [
        {
            "_index": "demo_htsdata_test_cow",
            "_type": "doc",
            "_id": "h91oPmUBRmE13omutgb1",
            "_score": 1.7368788,
            "_source": {
                "hts_code": "6205.20",
                "hts_description_0": "Men's or boys' shirts",
                "hts_description_1": "Of cotton"
            }
        },
        {
            "_index": "demo_htsdata_test_cow",
            "_type": "doc",
            "_id": "id1wPmUBRmE13omuXwZ_",
            "_score": 0.6548752,
            "_source": {
                "hts_code": "6205.90.10.00",
                "hts_description_0": "Men's or boys' shirts",
                "hts_description_1": "Of other textile materials:",
                "hts_description_2": "Containing 70 percent or more by weight of silk or silk waste"
            }
        },

This is a perfect result. We will be able to use the first returned item and get the 6-digit HS code.

However, when searching with:

{
  "query": {
    "query_string": {
      "query": "men's blue folklore shirt"
    }
  }
}

Yields:

 "hits": {
        "total": 7,
        "max_score": 0.6548752,
        "hits": [
            {
                "_index": "demo_htsdata_test_cow",
                "_type": "doc",
                "_id": "id1wPmUBRmE13omuXwZ_",
                "_score": 0.6548752,
                "_source": {
                    "hts_code": "6205.90.10.00",
                    "hts_description_0": "Men's or boys' shirts",
                    "hts_description_1": "Of other textile materials:",
                    "hts_description_2": "Containing 70 percent or more by weight of silk or silk waste"
                }
            },
            {
                "_index": "demo_htsdata_test_cow",
                "_type": "doc",
                "_id": "hN1RPmUBRmE13omunwZ7",
                "_score": 0.63465416,
                "_source": {
                    "hts_code": "6205.30.10.00",
                    "hts_description_0": "Men's or boys' shirts",
                    "hts_description_1": "Of man-made fibers:",
                    "hts_description_2": "Certified hand-loomed and folklore products (640)"
                }
            },

The database will be searched by non-technical users, who prefer free-form, Google-style searches and should not need to know the indentation level.

I am trying to understand:

1. What's the better way to organise the data? Is it by manually extracting the keywords of each indent level and putting them into a field? A better mapping strategy, perhaps?
2. How can the search yield better results? In the case of "men's blue folklore shirt", the folklore product had a lower _score than the first item in the list. More specific queries with more keywords do yield better results, but since the search terms are entered by the end users themselves, I do not think they will be verbose or specific.

I want to account for the possibility that the end user will not be very verbose with the search terms. How can I make use of the full capabilities of Elasticsearch for this use case?
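
To make the first question concrete, here is a rough sketch of one option: joining every indent level into a single field at index time so that a free-form query does not depend on which level a word appears in. This is only an illustration of the idea, not a tested answer. The field name hts_description_full and both helper names are hypothetical, and whether this ranks the folklore product higher would have to be checked against the real data.

```python
def flatten_description(doc, max_levels=10):
    """Copy doc and add one field that joins all indent levels in order."""
    parts = []
    for level in range(max_levels):
        text = doc.get(f"hts_description_{level}")
        if text:
            # Drop the trailing colon some levels carry, e.g. "Of man-made fibers:".
            parts.append(text.rstrip(":").strip())
    flat = dict(doc)
    # Hypothetical field name; it is not part of the original mapping.
    flat["hts_description_full"] = " ".join(parts)
    return flat


def build_match_query(user_text):
    """Build a match query against the combined field."""
    return {"query": {"match": {"hts_description_full": {"query": user_text}}}}


if __name__ == "__main__":
    doc = {
        "hts_code": "6205.30.10.00",
        "hts_description_0": "Men's or boys' shirts",
        "hts_description_1": "Of man-made fibers:",
        "hts_description_2": "Certified hand-loomed and folklore products (640)",
    }
    print(flatten_description(doc)["hts_description_full"])
    print(build_match_query("men's blue folklore shirt"))
```

The combined field would need its own entry in the mapping (type text) before documents are indexed this way.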

system · September 12, 2018, 6:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
