Hot questions for using Azure Cognitive Search


Question:

1) Suppose I use the Azure Search API to upload a new document:

POST /indexes/[index name]/docs/index?api-version=[api-version]  

2) I get a response with an HTTP code 201 (document was successfully created)

3) I use the API again to DELETE the newly uploaded document

Can I be 100% sure that the document will eventually be deleted? Or will the delete fail if indexing has not completed?


Answer:

I work on the Azure Search team. Once you get a success code (HTTP 201) from the indexing API, the document has been indexed successfully. This means the document exists in the internal data structures and can be deleted. The indexed document might not be available for searching immediately, because that requires an internal refresh of the index.

Deletion is lazy: for performance, documents are first marked for deletion and only later removed from the index. As a result, deleted documents might still show up in search results for a few seconds after the delete request completes. I hope this answers your question.
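To make the upload-then-delete sequence concrete, here is a minimal sketch of the JSON bodies sent to the `docs/index` endpoint and of how the per-document result could be checked. It assumes the REST "Index documents" batch shape (a `value` array whose items carry `@search.action`) and per-item response fields `key`, `status` and `statusCode`; verify these against the api-version you target. The key field name `id`, the sample documents, and the helper names `build_batch` and `rejected_keys` are hypothetical.

```python
import json


def build_batch(action, docs):
    """Wrap documents in an index batch body; every item gets an @search.action."""
    return {"value": [{**doc, "@search.action": action} for doc in docs]}


def rejected_keys(response_body):
    """Return the keys of items the service reported as failed (status false)."""
    return [item["key"] for item in response_body["value"] if not item["status"]]


# Hypothetical index whose key field is "id".
upload = build_batch("upload", [{"id": "1", "title": "Hello"}])
# A delete only needs the key field of the document to remove.
delete = build_batch("delete", [{"id": "1"}])
print(json.dumps(delete))
```

Checking `status` per item matters because a batch can partially succeed: an item that came back with `status` true (statusCode 201 for a newly created document) is stored, so a later delete for the same key can be accepted even though the document may not be searchable yet.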

Question:

1) Suppose I use the Azure Search API to upload a new document:

POST /indexes/[index name]/docs/index?api-version=[api-version]  

2) I get a response with an HTTP code 201 (document was successfully created)

3) I use the API again to search for the newly uploaded document

Can I be 100% sure that I will get the document in the results? Or could there be a delay in the indexing process?


Answer:

No, it is not guaranteed that the document will be returned by the query. The delay is usually on the order of seconds, but it can be longer depending on the overall system load. You'll need to run tests on your service to find the typical delay for your application.

Azure Search offers eventual consistency, which means the index will become consistent at some point in the future, but exactly when is not guaranteed.
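One way to measure the typical delay on your own service is to poll with a timeout and backoff. This is a minimal sketch, assuming you supply a `search` callable (hypothetical) that runs a query for the document's key and returns whether it was found; `wait_until_searchable` and its defaults are illustrative, not an Azure SDK function. A `True` result only says that the replica serving that particular query had the document.

```python
import time
from typing import Callable


def wait_until_searchable(search: Callable[[str], bool], doc_key: str,
                          timeout_s: float = 30.0,
                          initial_delay_s: float = 0.5) -> bool:
    """Poll `search` until it reports the document, or give up after timeout_s.

    Returns True if some query saw the document before the deadline.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        if search(doc_key):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep no longer than the time left before the deadline.
        time.sleep(min(delay, remaining))
        # Exponential backoff, capped at 5 seconds between polls.
        delay = min(delay * 2, 5.0)
```

Recording `time.monotonic()` before the upload and after this function returns gives one sample of the indexing delay under your current load.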

Even polling for the document until it shows up in a query result is not sufficient to guarantee consistency for indexes with multiple replicas, because queries can be interleaved with the new document being merged into each replica of the index. For example:

  1. Replicas A and B are consistent
  2. Client uploads new document
  3. Replica A receives the upload request
  4. Replica A processes the upload request and is ready to return the new document in query results
  5. Client queries for the new document, which happens to be served by Replica A, and gets the new document in the result
  6. Client queries for the new document again, which happens to be served by Replica B this time, and does not get the new document in the result
  7. New document is processed by Replica B
  8. Both replicas are now consistent again
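The interleaving above can be replayed with a toy, in-memory model. The `Replica` class and `replay_scenario` function are hypothetical illustrations, not part of any Azure SDK; real replicas receive and merge documents internally, and clients cannot choose which replica serves a query.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model of one index replica: a document is queryable only once processed."""
    name: str
    pending: list = field(default_factory=list)  # received but not yet processed
    visible: set = field(default_factory=set)    # documents returned by queries

    def receive(self, doc_id):
        self.pending.append(doc_id)

    def process(self):
        # Make every received document searchable on this replica.
        self.visible.update(self.pending)
        self.pending.clear()

    def query(self, doc_id):
        return doc_id in self.visible


def replay_scenario():
    a, b = Replica("A"), Replica("B")    # 1. both replicas consistent (empty)
    doc = "doc-1"                        # 2. client uploads a new document
    a.receive(doc)                       # 3. replica A receives the upload
    a.process()                          # 4. A can now return the document
    first = a.query(doc)                 # 5. query served by A: found
    second = b.query(doc)                # 6. query served by B: not found
    b.receive(doc)                       # 7. the document reaches B...
    b.process()                          #    ...and B processes it
    consistent = a.visible == b.visible  # 8. replicas agree again
    return first, second, consistent
```

Calling `replay_scenario()` returns `(True, False, True)`: the document was seen, then missed, then both replicas converged, which is why a single successful poll does not prove every later query will find it.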