Every now and then I bump into stuffs that I didn’t even hear about when it comes to working with SQL Server. This time I heard the word Semantics Search and was wondering where and when did it come. In this blog, I will put the foundation for this and later we will look at how this can be used in our environments.
Wikipedia defines Semantics as the study of meaning. Microsoft defines “Semantics” as providing higher-level insight into large quantities of data even across diverse unstructured data.
In SQL Server 2012, this meaning is around key phrases in documents and documents that share similar concepts. However, if we define extraction as “from unstructured data, create structured data that describes it”, then SQL Server already has three instances of extraction: full-text indexing, XML indexing, and spatial indexing (this inputs unstructured spatial values and then creates a structured index supporting spatial queries).
Full-text search uses an inverted index for storing keywords from documents. This can answer questions like “What documents contain this word” or “What documents have this word NEAR that word” or “What documents have SQL Server in the title?”
While full-text search lets you query the words in a document, semantic search lets you query the meaning of the document. Semantic Search adds two additional index types as part of SQL Server’s vision for Rich Unstructured Data Services (RUDS).
- A tag index – Given a corpus of unstructured text documents, extract the key concepts in these documents or more specifically, the key phrases.
- A similarity index – utilizing the extracted key concepts to determine which concepts are related, using related concepts to infer related documents, etc. An example of using similarity is to find resumes based on a job description.
SQL Server Semantics falls into “Patterns/Outliers” segmentation. It is optimally designed to perform the extraction one time, but populate both the full-text search and the two sematic indexes, as configured. Therefore, it is possible to create a full-text index without semantic indexes, but it is not possible to create semantic indexes without also creating the full-text index.
Note: You cannot create one semantic index and not the other. You may only use one, but both are created during population when specifying to include.
If you are wondering if this is enabled on your SQL Server 2012 or 2014 instance, you can run the following command to determine if full-text and semantic search are installed. A return value of 1 indicates that these components are installed; a return value of 0 indicates that they are not installed.
Before I write the subsequent blogs around what these are and how to enable, I am interested in knowing has anyone worked or used Sematic Search with SQL Server in your applications?
Reference: Pinal Dave (http://blog.sqlauthority.com)