What Semantic Search Means for Recruiting

Semantics is the field of study that focuses on meaning. A semantic search engine, therefore, would be one capable of understanding the meaning of the content it searches. We define meaning as the message intended, expressed or signified in symbols, words, phrases, sentences and larger blocks of text.

A semantic search engine would need to understand not only the meaning of the data but also the meaning of the question being asked. And it would need to do this instantly and automatically, returning only results that match what the asker intended and none that carry a different meaning.

For example, a semantic search engine could distinguish results that lead to people's resumes or profiles from results that lead to employment advertisements. An even simpler example would be telling the difference between Apple the fruit, Apple the company (or its products) and Apple the record label. Search for just "Apple" in Google and most of the top results will be about the company, not the other two, because most of the pages people link to and click on are about Apple Inc. (and/or its products). This is how statistically based, popularity-driven ranking methods such as Google's PageRank work. This is not semantic search at all. Popular pages are not necessarily credible, and credible sources are not always popular.


One of the largest problems with implementing true semantic search has been that it is difficult for the computer to know who you are. In the example above, it would have to tell whether you are a job seeker or a recruiter. Unless the search engine can learn from your past search behavior or your previous selections, you would have to indicate a category manually so it can classify the results. Some search engines approximate semantic search by asking you to tag, catalog, sort and otherwise "train" them, which is too time-consuming for the average user.
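To make the point concrete, here is a minimal Python sketch, assuming a toy cue-word classifier. The cue lists and the `classify` and `filter_for` helpers are hypothetical illustrations, not how any real engine works. Notice that the searcher's role still has to be supplied by hand, which is exactly the problem described above.

```python
import re

# Hypothetical cue words; a real engine would learn signals like these, not hard-code them.
RESUME_CUES = {"experience", "education", "references", "objective", "skills"}
JOB_AD_CUES = {"apply", "salary", "benefits", "requirements", "hiring"}


def classify(text: str) -> str:
    """Label a document 'resume' or 'job_ad' by counting the cue words it contains."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    resume_score = len(words & RESUME_CUES)
    job_ad_score = len(words & JOB_AD_CUES)
    # Ties default to 'resume' in this toy version.
    return "resume" if resume_score >= job_ad_score else "job_ad"


def filter_for(role: str, docs: list[str]) -> list[str]:
    """Keep only the document type this searcher wants; the role must be told to us."""
    wanted = "resume" if role == "recruiter" else "job_ad"
    return [doc for doc in docs if classify(doc) == wanted]


docs = [
    "Java developer. Experience: 8 years. Education: BSc. Skills: Spring",
    "We are hiring a Java developer. Salary and benefits listed. Apply now",
]
print(filter_for("recruiter", docs))   # the resume
print(filter_for("job_seeker", docs))  # the job ad
```

The sorting itself is easy; what the sketch cannot do is figure out on its own whether you are a recruiter or a job seeker.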

So why is this important to you? Well, if a computer knows what you mean right away, without having to learn from you or be trained by you, it can give you only relevant results instead of all the junk you have to sift through manually in today's search engines. Technology is getting there, but it is not close enough yet. In my opinion, there is still no search tool that comes close to understanding meaning and context, much less subtext, but a few are getting close enough to be worth exploring.

Where Is Semantic Search Today?

Semantic search promises that we should be able to search content on the Internet without needing to be experts in search. To do this, it needs to be automatic, and it must not require us to go around tagging and cataloging content just so computers can "understand" it. We simply don't have time to waste when a computer should be smart enough to derive context, subtext and meaning for us. Tagging and annotating is not the answer, either. Another part of the problem is the lack of sheer computing power. There are limits to what we can compute today, and search engines are still essentially racks full of computers running relational databases.

You see, search problems that have an exponential number of possible solutions can't be solved by merely analyzing relational data. Think of all the possible meanings of a simple word like "well." It could be a hole in the ground for extracting water, oil, gas or brine… or it could be a container such as an inkwell, or health-related as in "I'm not well," or an interjection in conversation: "Well, then what I suggest is…" Or perhaps it signifies abundance, as in "a well of information." It could even mean one of the Internet's original communities, The WELL. And that is only a start. As a noun alone there are several others, like the open space through all the floors in a building (stair well), nautical (anchor well), aeronautical (wheel well) or even the space in front of a judge's bench in England. Then, if you consider all the verb forms and idioms, well… you see?

When a human reads "stair well" they don’t imagine an oil well inside a stair, they automatically know what it means. Computers, on the other hand, have to calculate dozens of variations and probabilities to be able to arrive at a best guess. I’m sure you’ve looked up words in the dictionary only to find they have three or four completely different definitions, sometimes even more! Disambiguating them is easy for humans because of context and subtext, but not very easy for computers.

Context is the physical text surrounding a word, sentence or paragraph. In other words, you can physically read context, and so can search engines. Words can be indexed and statistically analyzed, which is mostly how major search engines work today. Subtext, however, is the implicit or underlying meaning of text. Machines have not yet been able to "read" subtext; it must be interpreted through inference, intuition, knowledge of the subject matter, educated guessing, or assumptions reached by logical leaps. To do this, machines would have to use massive computing power to establish all possible relationships between words.
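Reading context in this statistical sense can be sketched in a few lines of Python. This is a minimal illustration, assuming simple lowercase tokenization and a fixed window of neighboring words; the `context_counts` helper and its `window` parameter are hypothetical. It tallies which words surround a target word. It captures context, but nothing in it touches subtext.

```python
import re
from collections import Counter


def context_counts(text: str, target: str, window: int = 3) -> Counter:
    """Count words appearing within `window` positions of each occurrence of `target`."""
    # Assumption: lowercase alphabetic tokens are "good enough" for this illustration.
    words = re.findall(r"[a-z]+", text.lower())
    counts: Counter = Counter()
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - window)
            # Neighbors to the left and right of the target, excluding the target itself.
            neighbors = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts


text = "They drilled a new oil well near the farm. The oil well ran dry."
print(context_counts(text, "well", window=2).most_common(2))
```

Counts like these tell a machine that this "well" keeps company with "oil," which is useful statistics but still not understanding.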

This kind of soft information is easily interpreted by a human with just enough knowledge of the landscape to make logical leaps. For example, if you read the words "windows" and "vista" on a page that has other words that look related to computers, you immediately know the page is about the Microsoft Windows Vista operating system. But if a computer picks up those two words on a page, it could associate them with concepts such as a view out of a window, without really understanding the underlying meaning.
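One rough way a machine might use surrounding words to make the Windows Vista call is to compare the page against short lists of words associated with each sense, a simplified version of the overlap idea behind the Lesk algorithm. The sense lists and the `disambiguate` helper below are hypothetical and hand-written for illustration; a real system would need far larger, learned inventories.

```python
import re

# Hypothetical, hand-written sense signatures: words that tend to accompany each meaning.
SENSES = {
    "windows": {
        "operating system": {"microsoft", "software", "install", "upgrade", "pc", "drivers"},
        "glass opening": {"glass", "curtains", "sunlight", "room", "house", "frame"},
    },
    "vista": {
        "operating system": {"microsoft", "software", "install", "upgrade", "pc"},
        "scenic view": {"mountain", "valley", "scenery", "sunset", "overlook", "panoramic"},
    },
}


def disambiguate(word: str, page_text: str) -> str:
    """Pick the sense whose signature shares the most words with the page."""
    page_words = set(re.findall(r"[a-z]+", page_text.lower()))
    scores = {sense: len(sig & page_words) for sense, sig in SENSES[word].items()}
    best = max(scores, key=scores.get)
    # With no overlapping clues at all, admit we can't tell.
    return best if scores[best] > 0 else "unknown"


tech_page = "Install Windows Vista on your PC and upgrade the drivers"
scenic_page = "Sunlight poured through the windows of the mountain cabin, with a panoramic vista of the valley"
print(disambiguate("windows", tech_page), "|", disambiguate("vista", scenic_page))
```

It works only as well as its word lists. It matches surface context, not the intent behind a page.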

Someday, search engines may be able to infer meaning from the pages they index. I'm waiting, with bated breath, for a solution that approaches the artificial intelligence needed to extract meaning from pages successfully. While we're not yet at the point where machines can understand meaning, enough progress has been made toward the promise of true semantic search that some products are worth reviewing:

• Monster.com's 6Sense: designed specifically for recruiters and specialized in understanding patterns found in resumes, 6Sense uses intuitive, concept-based searching and combines it with libraries of job titles, skills, experience levels, industries and education to interpret resumes. The technology can recognize subtleties in a resume that until now only humans could detect, such as the difference between recent and past experience, or between junior and senior levels of experience.

• TalentSpring: also designed for recruiters, this technology applies semantic analysis to resume search, but it can also take structured information, such as people's profiles on social networks, and match it to employment requirements using both ontological categorization and semantic analysis.

• The "big five" search engines: Ask.com, Bing.com, Exalead.com, Google.com and Yahoo.com each employ different combinations of lexical or ontological, statistical, and user-behavior learning technologies in an attempt to understand what searchers really want, instead of simply returning exact matches for the keywords typed into the search box.

• Vertical search engines: index many websites but focus deeply on a single topic. For example, Wink.com searches only for people within social networks like LinkedIn, Twitter and Facebook, while Spock.com and Zoominfo.com collate biographical information about people from a multitude of publicly available online sources.

Continued pressure on recruiters' time will increase demand for these tools, leading to less expensive and more specialized solutions. Applying semantic search to an industry vertical is efficient because custom dictionaries make it possible to get closer to understanding meaning without having to "know everything." Semantic search engines that search nothing but resumes, for example, can get very good at understanding resumes even with limited computing power. Apply the same approach to even narrower ecosystems, and ambiguous terms whose definitions vary widely from field to field (e.g., "architect" and "struts" mean something very different in software than in real estate) take on a single, specific meaning, which makes them much easier for machines to comprehend. More competition in this space means smaller players will come up with viable solutions that attack different aspects of what gets lumped under semantic search, making semantic search ubiquitous and, soon, completely transparent.
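Here is a minimal sketch of why a vertical dictionary helps, assuming one hand-built dictionary per domain. The `DOMAIN_DICTIONARIES` entries and the `interpret` helper are hypothetical. Once the domain is fixed, a term like "struts" has a single meaning to look up, so no cross-domain guessing is needed.

```python
# Hypothetical vertical dictionaries: within one domain, each term has a single meaning.
DOMAIN_DICTIONARIES = {
    "software": {
        "architect": "designs the structure of software systems",
        "struts": "Apache Struts, a Java web application framework",
    },
    "real estate": {
        "architect": "designs buildings",
        "struts": "structural braces that support a frame",
    },
}


def interpret(term: str, domain: str) -> str:
    """Look a term up in one vertical's dictionary; unknown terms are reported, not guessed."""
    return DOMAIN_DICTIONARIES[domain].get(term.lower(), "not in this domain's dictionary")


print(interpret("Struts", "software"))
print(interpret("Struts", "real estate"))
```

The trade-off is coverage: a resume-only engine knows its own vocabulary well and little else, which is exactly why it can run on modest computing power.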