Duplicates when paging


(Juan) #1

Hi,

I am using ElasticSearch version 0.90.2, and when paging through results I
get duplicate documents returned across pages. The issue is that our
application displays the total number of records found, but as the user
pages, the duplicate documents are removed from the UI for obvious reasons,
so the number of documents shown does not match the total. I had a look at
the scroll search type, but it is not really an option because, as far as I
understand, it does not support sorting, scoring and faceting. We have a
2-node cluster with 5 shards and 5 replicas each.

Here is an example of my mapping.

{
"OrganizationProfile" : {
"properties" : {
"Campaigns" : {
"properties" : {
"AddedAt" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Name" : {
"type" : "string"
}
}
},
"CandidateType" : {
"type" : "long"
},
"Fired" : {
"type" : "boolean"
},
"FirstJoined" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Flags" : {
"type" : "string"
},
"JoinCampaign" : {
"type" : "string"
},
"LastJoined" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"LastUpdated" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"NoCall" : {
"type" : "boolean"
},
"Notes" : {
"properties" : {
"Author" : {
"type" : "string"
},
"AuthorId" : {
"type" : "string"
},
"Content" : {
"type" : "string"
},
"Created" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"NoteType" : {
"type" : "long"
},
"_id" : {
"type" : "string"
}
}
},
"OrganizationId" : {
"type" : "long"
},
"Profile" : {
"properties" : {
"ExistingProfileId" : {
"type" : "long"
},
"PersonalDetail" : {
"properties" : {
"Email" : {
"type" : "string"
},
"FirstName" : {
"type" : "string"
},
"HighestEducationLevel" : {
"type" : "long"
},
"LastName" : {
"type" : "string"
},
"Location" : {
"properties" : {
"Address" : {
"type" : "string"
},
"City" : {
"type" : "string"
},
"Coordinates" : {
"type" : "geo_point"
},
"Country" : {
"type" : "string"
},
"PostCode" : {
"type" : "string"
},
"State" : {
"type" : "string"
}
}
},
"Phone" : {
"type" : "string"
}
}
},
"PhotoUrl140" : {
"type" : "string"
},
"PhotoUrl50" : {
"type" : "string"
},
"PhotoUrl80" : {
"type" : "string"
},
"ProfessionalDetail" : {
"properties" : {
"Education" : {
"properties" : {
"CourseMajor" : {
"type" : "string"
},
"EndDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Level" : {
"type" : "long"
},
"OrganizationName" : {
"type" : "string"
},
"StartDate" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
},
"Interests" : {
"properties" : {
"Title" : {
"type" : "string"
},
"_id" : {
"type" : "long"
}
}
},
"WorkHistories" : {
"properties" : {
"Employer" : {
"type" : "string"
},
"EndDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"JobTitle" : {
"type" : "string"
},
"StartDate" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
}
}
},
"ShortCode" : {
"type" : "string"
},
"Version" : {
"type" : "long"
},
"_id" : {
"type" : "string"
}
}
},
"RecruiterCampaigns" : {
"properties" : {
"AddedAt" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Name" : {
"type" : "string"
}
}
},
"ReferringUrl" : {
"type" : "string"
},
"Source" : {
"type" : "string"
},
"Student" : {
"type" : "boolean"
},
"Tags" : {
"type" : "string"
},
"Unsubscribed" : {
"type" : "boolean"
},
"Version" : {
"type" : "long"
},
"Veteran" : {
"type" : "boolean"
},
"id" : {
"type" : "string"
},
"organizationId" : {
"type" : "long"
}
}
}
}

And here is an example of the query.

{
  "from": 0,
  "size": 20,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "mcghee*",
          "fields": [
            "JoinCampaign^1",
            "Campaigns.Name^2",
            "RecruiterCampaigns.Name^2",
            "Notes.Content^2",
            "Tags^2",
            "Profile.ProfessionalDetail.WorkHistories.Employer^3",
            "Profile.ProfessionalDetail.WorkHistories.JobTitle^3",
            "Profile.PersonalDetail.Location.City^4",
            "Profile.PersonalDetail.Location.Country^4",
            "Profile.PersonalDetail.Location.PostCode^3",
            "Profile.PersonalDetail.Location.State^4",
            "Profile.PersonalDetail.Phone^1",
            "Profile.PersonalDetail.FirstName^5",
            "Profile.PersonalDetail.LastName^5",
            "Profile.PersonalDetail.Email^7"
          ],
          "default_operator": "and"
        }
      },
      "filter": {
        "term": {
          "OrganizationId": "23"
        }
      }
    }
  },
  "fields": [
    "Profile",
    "Tags",
    "Campaigns",
    "RecruiterCampaigns",
    "CandidateType",
    "Unsubscribed",
    "Student",
    "Fired",
    "NoCall",
    "Veteran",
    "Notes",
    "_id",
    "OrganizationId"
  ]
}
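
For anyone trying to reproduce this, here is a rough sketch of how the pages are requested and how duplicates across pages could be detected. It is written in Python purely for illustration: `page_body` and `find_duplicate_ids` are hypothetical helper names, not part of any client library, and the responses are assumed to have the usual `hits.hits[]._id` shape. No request is actually sent here.

```python
import copy


def page_body(base_body, page, size=20):
    """Return a copy of the search body with from/size set for a zero-based page."""
    body = copy.deepcopy(base_body)
    body["from"] = page * size
    body["size"] = size
    return body


def find_duplicate_ids(responses):
    """Return the _ids that appear on more than one page, in first-seen order.

    `responses` is a list of search responses, one per page (assumed shape:
    {"hits": {"hits": [{"_id": ...}, ...]}}).
    """
    first_page_seen = {}
    duplicates = []
    for page_no, response in enumerate(responses):
        for hit in response["hits"]["hits"]:
            doc_id = hit["_id"]
            if (
                doc_id in first_page_seen
                and first_page_seen[doc_id] != page_no
                and doc_id not in duplicates
            ):
                duplicates.append(doc_id)
            first_page_seen.setdefault(doc_id, page_no)
    return duplicates
```

Running `find_duplicate_ids` over the responses for pages 0, 1, 2, ... of the same query shows which documents come back more than once, which is the mismatch against the reported total described above.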

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

I'm not sure I got your question right. Do you have this problem while
indexing and searching (which means new data is being added, changing the
search results), or while searching only? If the latter, you should take a
look at the preference setting, whose value could be your user's session ID
from the web application. See

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html
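
As a minimal sketch of that suggestion: `preference` is passed as a query-string parameter on the search request, and a custom string value is meant to send requests carrying the same value to the same shard copies, so every page of one user's search should be answered by the same copies. The host, the index name and the `search_url` helper below are assumptions for illustration only.

```python
from urllib.parse import quote, urlencode


def search_url(base_url, index, session_id):
    """Build a _search URL that carries the user's session ID as the preference."""
    # Values starting with "_" are used for the built-in preferences
    # (for example _primary or _local), so avoid them for custom values.
    if session_id.startswith("_"):
        raise ValueError("custom preference value should not start with '_'")
    query_string = urlencode({"preference": session_id})
    return "{}/{}/_search?{}".format(
        base_url.rstrip("/"), quote(index, safe=""), query_string
    )


# Hypothetical host, index name and session ID.
print(search_url("http://localhost:9200", "profiles", "sess42"))
```

The same URL would then be used for every page of that session, with only `from` changing in the request body.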

--Alex

On Thu, Oct 24, 2013 at 11:20 PM, Juan Herbst jjherbst@gmail.com wrote:

Hi,

I am using ElasticSearch version 0.90.2 and when using the paging
functionality I manage to get duplicates returned. The issue is that on our
application we use the total records found, but when we start paging the
duplicates documents are removed from the U.I. for obvious reasons. Thus
not matching the total. I had a look at the scroll search type but is not
really an option because it does not support as far as I understand,
sorting scoring and faceting. We have a 2 node cluster with 5 shards and 5
replicas each.



(Juan) #3

Thank you Alex, I will give it a try. FYI, it happens even when we are not
indexing.

Juan

On Friday, 25 October 2013 20:44:58 UTC+13, Alexander Reelsen wrote:

Hey,

not sure, if I got your question right? Do you have this problem while
indexing and searching (which means new data is added, changing the search
results), or while search only? If the latter, you should take a look at
the preference setting, which could be your users sessionid from the web
application. See

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html

--Alex



(system) #4