Have We Got A Match For You
That's exactly what the government is up against in its attempt to identify terrorists, what law-enforcement agencies go through when they look for identity-theft criminals, and what airlines contend with as they try to sort through their passenger name records to stop known terrorists from boarding their planes.
Sifting through billions of names and their variations to properly identify individuals blacklisted by the government is a task so specialized that some government solution providers are making names for themselves in this niche.
Indeed, the market for data-quality software is estimated to reach $600 million this year, according to Giga Information Group, a Forrester Research subsidiary. But the market for value-added services is much bigger, so VARs can expect to tap into a market that will reach into the billions.
And buoying their push into this space are recommendations from the 9/11 Commission "to close the long-standing holes in our border security that are caused by the U.S. government's ineffective name-handling software."
Acxiom, Convera, Group 1, Object Sciences and SRD are among the major solution providers differentiating themselves in this market by tapping into the need for name recognition. That work can fall under broader applications of data quality, data cleansing, data mining or knowledge management, and it can improve customer relationship management initiatives.
These companies are implementing name-recognition products from vendors like Language Analysis Systems (LAS), which is based in Herndon, Va., in the Center for Innovative Technology Building, located practically on the Dulles Airport runway. Another company developing custom name-matching systems is Search Software America, an Intellisync company, with offices in Old Greenwich, Conn., and around the world.
DATA MINING PAYS OFF
Object Sciences, an IT government solution provider based in Alexandria, Va., recently passed the $20 million mark in sales and celebrated the hiring of its 100th employee last June. Due in large measure to increased success with its name-recognition work, Object Sciences has grown 40 percent to 50 percent year over year. It recently won a three-year contract from the Department of the Army (a one-year base with two option years) worth up to $89 million, providing real-transformational solutions and technology to military intelligence.
Data mining is a piece of the solution Object Sciences provides using LAS' product. With so much data, especially when it involves billions of different names, it's not an easy task. And Brice Eldridge, director of strategic planning and business development at privately held Object Sciences, sees this as a huge opportunity.
"The budget for the intelligence community is large—we're talking billions of dollars," Eldridge says. "Instead of floating in a bathtub now, we're floating in a lake as far as massive amounts of data. Part of what we provide, an important piece of what we provide in solving military intelligence, is counting all the massive amounts of data and how you get knowledge out of that."
Object Sciences' job is to tag data in a way that allows its customers to determine what is relevant and what isn't.
"So, how do you get relevance and significance out of this data in a timely way? I don't think it's a big mental leap to say if you put names on watch-lists, you still need a mechanism to know that the name you have in your hand can be matched to a terrorist watch-list," Eldridge says. "LAS is a step forward."
A ROSE BY ANY OTHER NAME
To be sure, that step forward has taken a long time. In 1969, the United States sent a man to the moon, but engineers, linguists, technologists and scientists are still grappling with how to create a flawless name-recognition technology.
The 9/11 Report states that one of the most important problems to address is transliterations of the same name; for example, the lack of a single convention for transliterating Arabic names—that is, translating from the written Arabic, which uses different characters, to the Romanized version—"allowed the 19 hijackers to vary the spelling of their names to defeat name-based watch-list systems and confuse any potential efforts to locate them," according to the report.
Waleed Al-Shehri, one of the hijackers on American Airlines Flight 11, which crashed into the North Tower of New York's World Trade Center with 92 people on board, could have Romanized his name legitimately as Oualid Chihri. Not even the most sophisticated name-recognition software could find those kinds of matches.
The problem is that all the name-matching systems in use today derive from an algorithm called Soundex, which was first tested in 1918 to process census data from 1890. But, according to Jack Hermansen, CEO of LAS, Soundex, which is used to help associate names, is very simplistic. Based on a key-code system, it has no sensitivity to other cultures.
"It's one-size-fits-all. So we missed a lot of people in our database," Hermansen says. "That has cost human lives."
For example, a "false negative" from a name search allowed Mir Aimal Kasi to enter the country, purchase a gun and eventually shoot five people outside the McLean CIA building in Virginia. Kasi was listed in a federal database as a suspected terrorist under the spelling Kansi. Soundex does not have the ability to parse, or track, the names in all the ways they could have been interpreted.
LAS, a privately run company with $10 million in sales that was co-founded in 1984 by Hermansen and his partner, CTO Leonard Shaefer, is attempting to change that. "The databases we've been looking at over the past 10 years are rife with errors. We have software that will fix that," Hermansen vows. "Our software cuts down on false positives by 90 percent."
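To make the Soundex limitation concrete, here is a minimal Python sketch of the commonly documented American Soundex rules (keep the first letter, map the remaining consonants to digits, collapse repeats, pad to four characters). It is an illustration only and is not taken from LAS or any vendor's product; the function name `soundex` is an assumption made for this example.

```python
# Consonant groups and their Soundex digits (standard American Soundex).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex key for a Romanized name."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]                 # the first letter is always kept as-is
    prev = CODES.get(letters[0], "")   # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal digits
        code = CODES.get(ch, "")       # vowels and Y have no digit
        if code and code != prev:
            key.append(code)
        prev = code                    # a vowel resets the repeat check
    return ("".join(key) + "000")[:4]


for a, b in (("Kasi", "Kansi"), ("Waleed", "Oualid"), ("Shehri", "Chihri")):
    print(a, soundex(a), "|", b, soundex(b))
# Kasi K200 | Kansi K520
# Waleed W430 | Oualid O430
# Shehri S600 | Chihri C600
```

In each pair the two keys differ, so a lookup on one spelling would not retrieve the other. The last two pairs differ only in the retained first letter, which is one reason a single key-code scheme struggles with transliterated names.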
The LAS product is capable of recognizing names in a number of ways. For instance, it can separate out elements that accompany a person's name but aren't really part of it, such as the German "Von" or "Hajj," an honorific that Muslims add to their names to show that they've made the pilgrimage to Mecca, considered a milestone in their religion.
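As a rough illustration of that idea, the Python sketch below separates honorifics and particles from the core of a name using two tiny lookup tables. The tables, the `analyze` function and its output layout are all hypothetical, made up for this example; they say nothing about how the LAS software is actually built, and a real system would need far larger, culture-specific rules.

```python
# Hypothetical, deliberately tiny lookup tables for illustration only.
HONORIFICS = {"hajj"}               # titles added to a name
PARTICLES = {"von", "bin", "al"}    # connecting words such as "son of"


def analyze(full_name):
    """Split a name into honorifics, particles and core name tokens."""
    result = {"honorifics": [], "particles": [], "core": []}
    # Treat hyphens as separators so "Al-Shehri" yields "Al" and "Shehri".
    for token in full_name.replace("-", " ").split():
        key = token.strip(".,").lower()
        if key in HONORIFICS:
            result["honorifics"].append(token)
        elif key in PARTICLES:
            result["particles"].append(token)
        else:
            result["core"].append(token)
    return result


print(analyze("Hajj Waleed Al-Shehri"))
# {'honorifics': ['Hajj'], 'particles': ['Al'], 'core': ['Waleed', 'Shehri']}
```

Comparing only the `core` tokens would let "Hajj Waleed Al-Shehri" and "Waleed Shehri" be treated as candidates for the same person, while the separated tokens remain available as hints about the name's cultural origin.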
Up until 2001, LAS was a consulting services and custom software development firm. Then it turned its focus to product sales. Today, the company offers a suite of eight easily integrated, off-the-shelf name-recognition software products (including NameHunter and the accompanying module, NameClassifier, MetaMatch, Name Reference Library, NameGenderizer, Name Standardization, Name Variation Server and its latest flagship product, NameParser) off the GSA Schedule and through VAR and OEM partners. The technology supports languages from more than 76 countries and 200 U.S. consulates around the world.
TRACKING DOWN THE 9/11 TERRORISTS
About one week after the 9/11 terrorist attacks, a government official with the now-abolished INS was working with Hermansen piloting his software in its ports of entry. An investigator was looking at the passenger manifest of the people who were on those ill-fated flights.
"We used a special demo copy. I had been working with Jack in LAS, and we had piloted the software in our ports of entry," a government spokeswoman said on condition of anonymity. She says an individual used a different name to enter the United States than what he was using familiarly, but the software enabled the investigator to track down the variants of the name spelling.
"This is one of the most useful uses of this software; you get a multiple variation of how each name is spelled. For instance, the name Mohammad has about 25 different spellings," the government source says. "It was a real asset for the investigators post-9/11. It was very critical in helping them to search for the names on the passenger manifest. And it made flight schools more cognizant of who they were admitting."
Another solution provider working in this space is Lanham, Md.-based Group 1, which sells both packaged software solutions and the services to implement them. More specifically, the company provides data-quality products to ensure the accuracy, timeliness and consistency of data in the financial, retail, health care, government, high-tech, manufacturing, telco and utility markets, serving more than 3,500 customers around the world, according to Ken Chow, vice president of marketing and product management. Some of its better-known customers include Chrysler, Verizon and JC Penney. In the government sector, Group 1, which was acquired by Pitney Bowes in July, serves branches of the armed services as well as Fannie Mae, Freddie Mac and foreign governments, such as Canada's.
"For many of our customers, particularly in the government and financial sectors, it's important to match names for money laundering, homeland security and to de-duplicate databases in order to make sure the person they bring into their [database] is the same person they need to match the name," Chow says. "For us in the Western world, we're all pretty familiar with Roman names, but that only represents a fraction of the world. We found it to be an extremely daunting challenge.
"This is really better off as a buy rather than a build, and we looked for the right company to partner with," he continues. "We felt we found the best possible technology with LAS. The technology allows us to provide customers a high level of assurance that they can identify identities."
WHO ARE YOU? WE REALLY WANT TO KNOW.
According to Mark Beyer, senior program director of Meta Group's research services, field experience is what gives solution providers such as Object Sciences and Group 1 a corner on this market. Beyer, who is working on an as-yet-unpublished piece dealing with the application of enhanced name recognition, explains that name matching and name analysis are different.
"Name matching is always based on a probability score," he explains. "You have to decide after you get a matched score, what are you going to do with it?" he says. For example, Beyer says, "If I get 'Ken,' is it the same as 'Kenneth' or 'Kenney?' In order for the matching process to work, I have to have a thesaurus of all the nicknames and abbreviations that come from a name and see if it's already on the list. If it's on the list, I have a match."
Chow uses his own last name, which is Chinese, as an example of how the software works. Chinese is logographic, not alphabetic, as English is. That means Chinese characters emphasize the meaning of the word, not the sound of it.
"My own name is spelled Chow. In China, the actual family name is Zhou or Chou. That's where the problem lies," he says. "If you have someone from one culture transliterating from another culture handing it to a third party, you won't be able to match the name." Name analysis is a little different, Beyer explains, in that it basically takes apart the pieces of a name and determines that certain aspects of those pieces imply it came from another language, such as "Von," or a derivative, such as "Bin," which means "son of."
"When you put name recognition along with name analysis, the name analysis engine drives a more accurate kind of thesaurus," Beyer says.
Why is that important? Because, as Beyer points out, an identity is more important than a name.
"For instance, if I'm trying to protect someone against identity theft, and I see someone who didn't use the appropriate name, I can figure out [that person] is using someone else's information," he says.
Beyer explains that a VAR can compete in the market through increased competence in this field, and he says he sees strong applications of this software in security, fraud protection and enhancement of social welfare services.
"The more things you do to increase the competence in a matched score, the more you can automate the process," Beyer says, which, for customers, translates to cost savings, better service and increased competence in the system's ability to deliver.
Group 1, for example, has introduced a capability that has enhanced its ability to use consumer names.
"LAS allows you to see through these vast libraries and allows [you] to understand how the name cultures, spellings and pronunciations are transliterated," Chow says, noting that Acxiom is one of its biggest customers. "That's something that can only be learned through decades of experience. We began to test it, and it worked [great]."
The company installs on and supports a wide variety of platforms, including mainframe, AS/400 and any flavor of Unix and Windows, according to Chow.
"Our ability to parse, match and validate addresses, which is certainly one of our flagship functions, was developed over decades. There's a high barrier to entry. But there's a need for services for data quality," Chow says.
And the market is huge not just for security, but in the commercial sector. "People are getting tired of having their names mangled in a computer," Hermansen says.