Every API service should include request rate limiting. Until recently, Zencoder didn't have it, which was less than ideal: it left us exposed to several kinds of problems. The good news is that as of today, we have full rate limiting.
Why rate limit API requests?
Rate limiting is important for a few reasons.
First, one of the golden rules of multi-tenant architecture is that one user's behavior shouldn't affect anyone else. If you're using Dropbox to store files, and someone else uploads a few petabytes of data, you shouldn't get an error saying Dropbox doesn't have any space left for you. Or if your neighbor floods his cable internet connection with BitTorrent traffic, your own service shouldn't be degraded because of it.
Without rate limiting, one user of an API can adversely affect another. One customer could flood a service with unnecessary API requests, whether intentionally, abusively, or accidentally, and other users could then experience slowdowns or capacity issues. Of course, the service itself should be able to handle any reasonable load. But no matter how scalable or performant a service is, there are always limits. Whether the limit is 100 requests/second or 1,000,000 requests/second, it is there. Rate limiting ensures that everyone gets fair access to those finite resources.
That leads to the second reason rate limiting is important for APIs: it's easy to make mistakes when integrating with an API. Because APIs are called programmatically, it isn't particularly difficult to send more requests than you intend. If you forget the exit condition in a loop, or forget to wait a few seconds (or minutes) between poll requests, your simple API integration can turn into an accidental denial-of-service attack.
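Both mistakes are cheap to guard against. The following is a minimal sketch, assuming a hypothetical `get_state` callable that fetches a job's state and returns it as a string; the state names and the 5-second / 120-poll defaults are illustrative, not Zencoder values. It shows an explicit exit condition, a delay between polls, and a hard cap on the total number of requests.

```python
import time
from typing import Callable, Optional

# Hypothetical terminal states; check your API's documentation for the real ones.
TERMINAL_STATES = frozenset({"finished", "failed", "cancelled"})


def poll_until_done(
    get_state: Callable[[], str],
    interval_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll get_state() until it reports a terminal state, or give up after max_polls."""
    for attempt in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:  # exit condition: stop as soon as the work is done
            return state
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait between requests instead of hammering the API
    return None  # gave up after max_polls requests
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting; in production the default `time.sleep` is used.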
What's more, when an API is billed on a per-use basis, rate limits are actually a safeguard. If you're exceeding a reasonable rate limit, there is a good chance that you're doing something wrong. And if you're doing something wrong with a per-use API, it's possible that you're accidentally racking up a giant bill.
How it works at Zencoder
Zencoder now limits how many times you can call a particular method within a given timeframe. Limits are tracked per method (resource), with the exception of progress requests, which are tracked per file. If you exceed your quota, Zencoder returns a 403 response with the body "403 Forbidden (Rate Limit Exceeded)", along with a "Retry-After" header containing the number of seconds until your quota resets. In addition, every HTTP response includes an "X-Zencoder-Rate-Remaining" header, which gives the number of calls you can still make to that resource in the current time frame.
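A client can read these signals directly from each response. Here is a minimal sketch, assuming the response is available as a status code, a plain dict of headers, and a body string; `RateLimitInfo` and `rate_limit_info` are hypothetical names, not part of any Zencoder library. Checking the body distinguishes a rate-limit 403 from other 403s, such as a bad API key.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limited: bool               # True if the request was rejected for rate limiting
    retry_after: Optional[int]  # seconds until the quota resets (Retry-After)
    remaining: Optional[int]    # calls left in this window (X-Zencoder-Rate-Remaining)


def _int_header(headers: Mapping[str, str], name: str) -> Optional[int]:
    # HTTP header names are case-insensitive, so compare them in lowercase.
    for key, value in headers.items():
        if key.lower() == name.lower():
            try:
                return int(value.strip())
            except ValueError:
                return None  # not an integer (e.g. an HTTP-date); not handled here
    return None


def rate_limit_info(status: int, headers: Mapping[str, str], body: str) -> RateLimitInfo:
    """Extract the rate-limit signals described above from one HTTP response."""
    limited = status == 403 and "Rate Limit Exceeded" in body
    return RateLimitInfo(
        limited=limited,
        retry_after=_int_header(headers, "Retry-After") if limited else None,
        remaining=_int_header(headers, "X-Zencoder-Rate-Remaining"),
    )
```

Watching `remaining` lets an integration slow down before it ever hits the limit, rather than only reacting to the 403.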
The limits are as follows.
Job creation requests (POST): 1,000 per minute
File progress requests (GET): 60 per minute per file
All other GET requests: 60 per minute total
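To make the per-method and per-file bookkeeping concrete, here is a hypothetical fixed-window model of the limits listed above. It is an illustration only: the post doesn't describe how Zencoder counts requests internally, so the fixed-window strategy, the per-account keying, and every name below are assumptions.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple

# The limits listed above as (max requests, window length in seconds).
LIMITS: Dict[str, Tuple[int, int]] = {
    "create_job": (1000, 60),  # job creation (POST)
    "progress": (60, 60),      # file progress (GET), counted separately per file
    "other_get": (60, 60),     # all other GETs, counted together
}

Key = Tuple[str, str, Optional[str]]


class FixedWindowLimiter:
    """Hypothetical fixed-window model of the limits; not Zencoder's actual code."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.windows: Dict[Key, Tuple[float, int]] = {}  # key -> (window start, count)

    def check(self, account: str, kind: str, file_id: Optional[str] = None) -> Tuple[bool, int, int]:
        """Record one request; return (allowed, remaining, retry_after_seconds)."""
        limit, window = LIMITS[kind]
        # Progress requests get one bucket per file; other kinds share one bucket each.
        key: Key = (account, kind, file_id if kind == "progress" else None)
        now = self.clock()
        start, count = self.windows.get(key, (now, 0))
        if now - start >= window:  # the old window has expired; start a new one
            start, count = now, 0
        if count >= limit:
            # Rejected: report whole seconds until this window resets (like Retry-After).
            return False, 0, max(1, math.ceil(start + window - now))
        count += 1
        self.windows[key] = (start, count)
        return True, limit - count, 0
```

Because progress requests are keyed by file, polling one output at its limit doesn't reduce the quota for polling another; the returned `remaining` value plays the role of the X-Zencoder-Rate-Remaining header.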
These limits are more than enough for reasonable usage. They allow up to one progress request per second for each output file, and 1,000 job creation requests per minute, which is a lot. And if you've architected your Zencoder integration properly, you won't need more than 60 other GET requests per minute. (If you do, by the way, let us know.)
Exceeding a limit doesn't mean that Zencoder is broken for you or that your application will fail, as long as you handle the limits properly. Just code your Zencoder integration to retry the request after the time period given in the Retry-After header, and you'll be set.
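A retry wrapper along these lines is one way to do that. This is a minimal sketch, assuming a hypothetical `send` callable that performs the request and returns `(status, headers, body)`; the five-attempt cap and the 5-second fallback wait are illustrative choices, not Zencoder recommendations.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); if the response reports a rate limit, wait Retry-After seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        rate_limited = status == 403 and "Rate Limit Exceeded" in body
        if not rate_limited or attempt == max_attempts:
            return status, headers, body  # success, a different error, or out of attempts
        # Header names are case-insensitive; fall back to default_wait if absent or malformed.
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            wait = float(lowered.get("retry-after", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)
    raise ValueError("max_attempts must be at least 1")
```

Capping the number of attempts matters: a retry loop without one is exactly the kind of runaway loop that rate limiting is meant to catch.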
Rate limiting is a feature, not a bug
From one perspective, rate limiting is a Bad Thing. "What do you mean, I can't send as many requests as I want?" But that's not how cloud resources work. If you want to launch more than 20 servers using Amazon EC2, you need to get in touch with Amazon to increase your instance limit. This limit protects you from accidentally launching 1M servers; it protects Amazon from giving giant supercomputer clusters for 30 days to fake accounts and stolen credit cards; and it protects other EC2 users from running out of capacity due to a single user's demand. Even though EC2 has massive computing resources, those resources are finite.
We've placed these limits at a level that customers should never hit under normal circumstances, and we aim to raise them over time. But if you ever do hit a rate limit, just have your application wait the number of seconds given in the Retry-After header and then retry the request. It should already be doing this for certain HTTP status codes, such as 503.
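For transient errors like 503, a common pattern is to honor Retry-After when the server sends it and otherwise fall back to exponential backoff with random jitter. The sketch below assumes that policy; the function name, the 1-second base, and the 60-second cap are hypothetical, and only 503 is treated as retryable because it's the code mentioned here.

```python
import random
from typing import Callable, Mapping, Optional

# Status codes treated as transient; 503 is the one named above, extend as needed.
RETRY_STATUSES = frozenset({503})


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rand: Callable[[], float] = random.random,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None if not retryable."""
    if status not in RETRY_STATUSES:
        return None
    # Prefer the server's own hint when Retry-After is given as a number of seconds.
    lowered = {k.lower(): v for k, v in headers.items()}
    hint = lowered.get("retry-after")
    if hint is not None:
        try:
            return float(hint)
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to backoff below
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**(attempt - 1))).
    return rand() * min(cap, base * 2 ** (attempt - 1))
```

The jitter spreads retries from many clients over time, so they don't all hit a recovering service at the same moment.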
And remember that rate limits are a feature, not a bug. Without rate limiting, your use of an API is in jeopardy every time someone writes buggy integration code.