How two Chinese women turned their Ph.D. theses into machine learning that makes connections between seemingly unrelated events to discover emerging fraud schemes.
By Hank Tucker, Forbes Staff
Back in 2006, after earning her Ph.D. from Carnegie Mellon, Yinglian Xie wasn’t thinking about whether she wanted to make her career in the U.S. or her native China, let alone about someday being an entrepreneur. Instead, she was laser-focused on the best place to continue her work–her thesis was on identifying potential internet security threats by looking for correlations between seemingly unrelated events. She ended up at Microsoft Research’s Silicon Valley labs, a think tank for tech breakthroughs. “They were truly the best in the world,” marvels Xie, noting that her colleagues included multiple winners of the Turing Award–a.k.a. the Nobel prize for computer science.
Three weeks later, fresh off earning her own Ph.D. at the University of California, Berkeley (also with a thesis on internet security), Fang Yu arrived at the same lab. The women had grown up 30 minutes apart near the city of Suzhou, just west of Shanghai, and quickly became friends and scientific collaborators.
Today, Xie, 48, and Yu, 46, have dozens of academic papers and 11 years of startup struggles under their belts. Now, the company they cofounded in 2013, Mountain View, California-based DataVisor, has finally broken through in the hot area of protecting financial firms and their customers from fraud. With Xie as CEO and Yu as chief product officer, DataVisor’s revenue shot up 67% in 2024 to $50 million, helping it land for the first time on Forbes’ Fintech 50 list recognizing America’s most innovative fintech startups. Its customers include SoFi, Affirm and Marqeta.
These days, banks and fintechs use multiple security techniques and vendors in a veritable arms race with the scamsters; DataVisor’s niche is finding emerging fraud networks before they can inflict big losses. These networks are continually finding new ways to take over clusters of user accounts, to exploit vulnerabilities like credit bureau leaks to submit fraudulent loan applications, or to dupe unsuspecting users into paying for illegitimate products. Any time a new fraud ring or method gets uncovered, fraud detection models can be updated to spot those red flags, but that doesn’t always help users who were already affected.
“With typical machine learning, you need to train it on something to learn and get better,” says Xie. “You’re always reactive, detecting attack patterns from months ago when things have already moved and changed.”
DataVisor’s secret sauce is what’s known as “unsupervised” machine learning, which uses algorithms to analyze unlabeled data sets and discover correlations on their own, without humans telling them what targets or categories to look for. Xie won’t go into detail on what makes the patented algorithms work, but gives a down-to-earth example: Imagine a fraud ring gains access to a bank’s data and identifies a certain victim profile, like longtime older customers who have high average transaction amounts and less digital experience. They would then be sent fraudulent offers to purchase a gift card, and if the dollar amount is under their typical transaction, it’s less likely to set off the bank’s fraud filters. But DataVisor’s unsupervised machine learning would be able to make connections between these bank customers in milliseconds–connections no one told it to look for–and block the onslaught of phony offers in real time.
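For readers who want a concrete picture of what “unsupervised” means here, the idea can be sketched in a few lines of Python. This is a toy illustration only, not DataVisor’s patented method, which Xie does not describe. It assumes a made-up list of events with hypothetical field names (`account`, `device`, `offer`) and simply links accounts that share any attribute value, using a standard union-find structure to collect the linked groups. No event is ever labeled as fraud; the groups emerge from the data alone.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x's group, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_accounts(events, min_size=3):
    """Group accounts that share any attribute value; no labels are used.

    events: list of dicts with an "account" key plus arbitrary attributes.
    Returns the groups (sets of accounts) with at least min_size members.
    """
    parent = {}
    first_seen = {}  # (attribute, value) -> first account seen with it

    for event in events:
        account = event["account"]
        parent.setdefault(account, account)
        for key, value in event.items():
            if key == "account":
                continue
            other = first_seen.setdefault((key, value), account)
            # Merge the two groups whenever an attribute value is shared.
            parent[find(parent, account)] = find(parent, other)

    groups = defaultdict(set)
    for account in parent:
        groups[find(parent, account)].add(account)
    return [g for g in groups.values() if len(g) >= min_size]


# Hypothetical, hand-made events; the field names are illustrative only.
events = [
    {"account": "a1", "device": "d-77", "offer": "gift-card-50"},
    {"account": "a2", "device": "d-77", "offer": "gift-card-25"},
    {"account": "a3", "device": "d-12", "offer": "gift-card-25"},
    {"account": "a4", "device": "d-90", "offer": "newsletter"},
]

suspicious = cluster_accounts(events)
```

In this made-up data, accounts a1 and a2 share a device and a2 and a3 share an offer, so all three end up in one group while a4 stands alone. A production system would need far more careful linking rules, since common values would otherwise tie unrelated customers together.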
“That’s very unique about DataVisor, that we’re able to do real-time clustering,” says Yu. “They have brand-new schemes coming up every day or even every hour.” That capability is particularly valuable these days, Xie says, because “pretty much all of today’s major attacks are coming from these coordinated fraud rings.”
In fact, one recent report by identity verification firm Au10tix declared 2024 the year of “Fraud As A Service.” The average number of incidents in each coordinated “mega attack” doubled from 4,000 to 8,000, the report says, and this commoditization of criminality is taking more and more money from consumers. Fraud losses reported to the Federal Trade Commission in the U.S. reached $10.4 billion in 2023 and $8.7 billion in the first three quarters of 2024, on pace to set another record.
Say this for Xie and Yu–they have the technical chops to back up their claims that their algorithms are superior. Both were collegiate superstars, with Xie ranking first out of 140 computer science students graduating with her from Peking University and Yu interning with the founding members of Microsoft Research Asia while an undergrad at Fudan University in Shanghai, which is what inspired her to pursue a Ph.D.
Both came to the U.S. for graduate study, believing it was the place to pursue cutting-edge computer science. They gained green cards and stayed to work for Microsoft, eventually becoming citizens.
In seven years at Microsoft, both published dozens of papers that have since been cited thousands of times. They often collaborated as coauthors on published papers covering subjects like a novel approach to detecting search bot traffic, or how to identify malicious web advertising schemes.
“We had a lot of ideas, but we were always waiting for other people to pick up the idea for it to become reality. We talked about, if we stay in Microsoft Research for another year, we can publish maybe three or four papers per year, but after many years, you get unhappy with that level of impact,” says Xie. “We wanted something more real.”
They had also been approached by researchers at other companies like Yelp, Pinterest and Facebook who had read their papers and wanted to collaborate on similar data analysis problems. So in 2013 they took the entrepreneurial plunge.
“Before we started the company, we asked around to people, are we ready?” says Yu. “The unanimous answer was ‘no, you guys do not know what’s ahead of you.’”
Starting with their own savings and Silicon Valley connections, they picked up a couple of early customers like Yelp, which wanted to identify whether users were abusing its system with reviews, and Chinese instant messaging app Momo. They closed a $14.5 million Series A round in 2015, the first outside money they raised, and carved out a niche building security solutions for high-tech internet companies. In 2018, DataVisor raised another $40 million led by Sequoia China, and it achieved a $390 million valuation, according to Pitchbook, after raising $12 million more in 2019.
But beneath the surface, their market was drying up. DataVisor was focused on abuse of the promotions that tech companies offered to attract more users, but its customers weren’t offering as many of these rewards, which didn’t typically lead to high retention rates. At the same time, financial institutions were digitizing their systems much more rapidly, and DataVisor shifted its strategy to target them as customers.
The pivot wasn’t fast. Xie says it took two or three years to “turn our product inside out” and make its algorithms more end-to-end, like a one-stop shop for fraud prevention. DataVisor raised $40 million in December 2022 led by Brighton Park Capital to assist in its reinvention. It took a lower $260 million valuation for that round, according to Pitchbook, at a time when investors’ enthusiasm for fintechs was sinking. The round brought its total funding to more than $100 million. Forbes estimates the two founders have collectively retained around 25% of the company.
Xie describes the change in market sentiment in 2022 as a “reality check” for her and her board after the previous year set a high bar for valuations. But she never felt serious pressure to bring in more experienced managers to raise another round. Instead, she says DataVisor ended up with multiple term sheets to consider and chose to partner with Brighton Park, a growth equity firm based in Greenwich, Connecticut, which believed in her long-term vision.
“That was a point where we needed additional funding to bring us to the next level,” says Xie. “We were very confident that we still had the best technology in the world.”
So far, the new look has been successful. DataVisor’s 50 customers now include buy-now, pay-later firm Affirm, digital bank SoFi and card issuer Marqeta. Although that customer count is paltry compared with identity verification firms like Persona or Socure, which serve thousands of businesses, DataVisor promises a deep relationship covering everything from onboarding users to monitoring their transactions and wire transfers. Customers pay an annual subscription fee that varies based on the volume of events they need DataVisor to process.
Xie says many customers now come to DataVisor looking to consolidate their fraud-fighting efforts after finding that trying to integrate several different vendors creates headaches and inconsistencies. That handholding and holistic approach allows DataVisor to charge more for each additional customer than the competition does. “Over time, we hope that gives us a pathway to become a much bigger company,” she says.