How Clearbit built the Risk API and stopped spammy sign-ups
At Clearbit, we pride ourselves on making it possible for anyone to get started with our APIs and data within minutes. Part of this means we provide a free trial to users who sign up at clearbit.com. A single trial account can't do anything particularly troublesome, but en masse, trial accounts can create traffic and usage patterns we'd consider abusive at a free tier.
Every night around 10 pm PST, a pattern of sign-ups was appearing that featured suspicious emails from disposable email providers, like email@example.com. Each account would blow through the limits for our APIs in seconds, and then another illegitimate account would be created. Over the next couple of days, we determined that hundreds of accounts fitting the same profile were being created each evening by a spam bot, something we definitely considered abusive.
At the time, we were using Google's reCAPTCHA, but it wasn't slowing down these risky sign-ups. It later turned out that reCAPTCHA had a vulnerability that allowed it to be bypassed by running the audio captcha through Google's own speech recognition API.
Stopping malicious users
With reCAPTCHA failing us, we calculated a number of our own data points designed to block risky sign-ups. These include things like identifying multiple sign-ups from a single IP address, blacklisting known bad and disposable email services, and running the sign-up's email address through our own Enrichment API to determine if it looks like a legitimate person or company. These data points are used individually as signals of a sign-up's risk factor, and then in aggregate through a machine learning algorithm that identifies patterns of risky behavior.
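To make the idea of individual signals concrete, here is a minimal sketch in Python. It is not our production code: the domain list, the `SignUp` fields, the per-IP limit, and the weights are all hypothetical, and a simple weighted sum stands in for the machine learning step described above.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical blocklist; a real deployment would use a maintained list.
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com"}

# Hypothetical integer weights (0-100 scale). This weighted sum is a
# stand-in for a trained model, not the model itself.
WEIGHTS = {"disposable_email": 50, "ip_reuse": 30, "no_enrichment_match": 20}


@dataclass
class SignUp:
    email: str
    ip: str
    enriched: bool  # assumed flag: True if an enrichment lookup found a person or company


def risk_signals(signup: SignUp, recent_ips: Counter, max_per_ip: int = 3) -> dict:
    """Compute each signal independently so it can be inspected on its own."""
    domain = signup.email.rsplit("@", 1)[-1].lower()
    return {
        "disposable_email": domain in DISPOSABLE_DOMAINS,
        # A Counter returns 0 for IPs it has never seen.
        "ip_reuse": recent_ips[signup.ip] >= max_per_ip,
        "no_enrichment_match": not signup.enriched,
    }


def risk_score(signals: dict) -> int:
    """Aggregate the signals that fired into a single 0-100 score."""
    return sum(WEIGHTS[name] for name, fired in signals.items() if fired)
```

Keeping the signals separate from the aggregate score means each one can be logged and tuned individually, while the aggregation step can later be swapped for a learned model without changing how the signals are collected.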
The best part about being able to identify risky and fraudulent behavior is that it's not a problem specific to Clearbit — the patterns and behaviors we see are consistent amongst most web properties. We also know that the more data you can aggregate the more useful it can be. With more information about what is risky and what is not, we can improve our machine learning and detection algorithms to generate more accurate risk scores. As a result, we're releasing our Risk API for public use, absolutely free for up to 50,000 requests a month. You can get started integrating it right now!
How do you use it?
We're big advocates of progressively increasing friction when dealing with risky actors. A great example of this is the Clearbit web form.
Initially, a user is asked for their email and their password. The form uses the provided email address to proactively fill out other important information through the Clearbit Enrichment API and also calculates a Risk score for the sign-up. If the sign-up appears to be moderately risky, we increase friction by adding Google reCAPTCHA. If the sign-up appears any more than moderately risky, we ask the user to verify their phone number via SMS or their email address via a confirmation email.
This combination of steps:
- makes it incredibly simple for legitimate actors to complete the sign-up process
- gives companies in-depth information about who is signing up
- increases friction for illegitimate actors to the point where the effort required is greater than the benefit returned
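The progressive-friction flow described above can be sketched in a few lines of Python. This is an illustration only: it assumes a numeric risk score from 0 to 100, and the two thresholds and the `Challenge` names are hypothetical rather than values the Risk API defines.

```python
from enum import Enum


class Challenge(Enum):
    NONE = "none"                      # low risk: let the sign-up through
    RECAPTCHA = "recaptcha"            # moderate risk: add a captcha
    VERIFY_CONTACT = "verify_contact"  # higher risk: SMS or confirmation email


# Hypothetical cut-offs; tune these against your own traffic.
MODERATE_THRESHOLD = 40
HIGH_THRESHOLD = 70


def challenge_for(score: int) -> Challenge:
    """Pick the least intrusive challenge that matches the risk score."""
    if score >= HIGH_THRESHOLD:
        return Challenge.VERIFY_CONTACT
    if score >= MODERATE_THRESHOLD:
        return Challenge.RECAPTCHA
    return Challenge.NONE
```

Checking the highest threshold first keeps the logic easy to extend: adding a stricter tier is one more branch at the top, and legitimate users with low scores never see any extra step.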
By implementing the Risk API in this way, we've blocked 1,081 illegitimate sign-ups to clearbit.com over the past month (27.56% of total attempts), and verified an additional 909 users via SMS (32% of successful sign-ups) to ensure they were legitimate.