During Amazon’s earliest days (1994-1995), CTO Shel Kaphan and Software Engineer Paul (then) Barton-Davis had to write all the software needed to power Amazon.com on the day it offered its website to the world to sell books (official launch date was July 16, 1995). The book catalog was online, and it needed an index (well, it needed several indexes, but that’s another story); specifically, it needed a unique key for each item in the catalog. Because the databases they were using to create the catalog were indexed by 10-character-long ISBN (International Standard Book Number), Shel and Paul decided to use ISBN as their key.
Unfortunately — and Shel was well aware of this very quickly, but of course by that time, it was too late — ISBNs are terribly abused in the United States. The company that issues ISBNs, Bowker, charges a lot of money for ISBNs (from the perspective of small publishers, anyway), and publishers don’t necessarily read all the rules. Small publishers were re-using ISBNs, and they also took their range of ISBNs and numbered through the entire range, rather than respecting the rule that the final character is actually a checksum, and you can only iterate through some of the digits. (It’s actually worse than just not using the last digit, but I’m not getting into that here.)
Shel very quickly removed all ‘checksum software checks’ (which would have made sure it was a legal ISBN), but Amazon was still stuck with a code base that stored the key value in 10 character strings, and which also stored them in other databases with similar constraints.
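For readers who have not met the ISBN checksum rule, here is a minimal sketch in Python of the kind of check that was removed. It follows the standard ISBN-10 rule (weights 10 down to 1, sum divisible by 11, with X standing for 10 in the last position). The function name is made up for illustration, and this is not Amazon's code.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Return True if `isbn` satisfies the standard ISBN-10 check-character rule."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            value = int(ch)
        elif ch == "X" and position == 9:
            value = 10  # "X" means 10 and is only legal as the final character
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


# A publisher who "numbers through the entire range" emits every possible
# final character, but only one of them is legal for a given 9-character prefix.
candidates = ["030640615" + last for last in "0123456789X"]
legal = [c for c in candidates if is_valid_isbn10(c)]
print(legal)
```

With the check removed, all eleven candidates above would be accepted as keys, which is what allowed the abused ISBNs into the catalog.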
When we wanted to include other products in the catalog (e.g. CDs, DVDs, Electronics), we needed something other than ISBNs, because:
(1) non-book products don’t have ISBNs,
(2) it would be cost-prohibitive to buy ISBNs from Bowker, and
(3) the namespace addressable by ISBNs is too small for the number of products we expected to be selling.
I (Rebecca Allen) proposed that we replace the 10-character ISBN as the key for our catalog with a minimum-impact-on-the-code “ASIN”. The ASIN would be the ISBN (if the item were a book and had an ISBN), or a 10 character serial number that represented a base 62 number. I got a ton of objections to base 62 (the 26 letters of the alphabet in both upper and lower case, plus the 10 digits), because that would have involved a case sensitive string: b000000000 would be a very different number from B000000000. Very. Different.
I somewhat agreed, and decided that a base 36 number (the letters of the alphabet plus the 10 digits) would work just fine. There was some debate about whether the key should have any structure. For example, credit card numbers have quite a bit of structure to them, as do well-constructed ISBNs for that matter. I objected to subdividing and having different parts of the string mean different things because it would substantially reduce the addressable space, and limit the number of items that could be moved through Amazon before having to go back and do this re-keying exercise again, with an even bigger code base.
The people who wanted “special” ASINs were pretty persistent, though, so I threw them a bone: they could have all the ASINs that started with the letter A, and I would start the counter for ASINs at B000000000. Finally, this proposal had to get past Shel, and Shel was not super keen on the idea of someone going through the code and changing every last place that ISBNs were referenced to something that was an ASIN. Even though this proposal actually minimized the hazards in several ways (same length string, same set of allowable characters, ISBNs are still legal ASINs, etc.), it was really going to involve a lot of code being changed at the same time. And an error would be very bad.
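The numbering scheme described above (10 characters, base 36, counter starting at B000000000) can be sketched in a few lines of Python. This is only an illustration of the idea under those stated assumptions. The names `to_base36` and `nth_asin` are hypothetical, and nothing here reflects how Amazon actually assigns ASINs.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 0-9 then A-Z: 36 symbols
ASIN_LENGTH = 10
FIRST_SERIAL = int("B000000000", 36)  # the counter's starting point, per the text


def to_base36(n: int, width: int = ASIN_LENGTH) -> str:
    """Encode a non-negative integer as a zero-padded, fixed-width base 36 string."""
    if n < 0 or n >= 36 ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def nth_asin(n: int) -> str:
    """Hypothetical helper: the n-th serial ASIN, counting from 0 at B000000000."""
    return to_base36(FIRST_SERIAL + n)


print(nth_asin(0))   # B000000000
print(nth_asin(35))  # B00000000Z
print(nth_asin(36))  # B000000010
```

Because the key is a plain counter with no internal structure, every one of the 36 possible values in every position is usable, which is the argument made above against subdividing the string.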
Humor can go a long ways, though. I had already taken to presenting this as simultaneously the official term, Amazon Standard Identification/Item Number (ASIN), and as “It’s A SIN that we even have to do this”, which was a direct reference to the unfortunate ISBN key we currently had throughout the catalog and the system as a whole. Having worked with Shel fairly closely for a while, I hit upon the idea of presenting it to him as Arc Sine (the inverse sine function, commonly abbreviated asin, a pun that only mathematically inclined humans would enjoy). While that might not be humorous to everyone, it worked for Shel, and he gamely considered the proposal and agreed to it in his capacity as CTO, which was necessary for this project to go forward.
Where are ASINs now? Brad Stone has a new book, Amazon Unbound. It is (at the time I am writing this) available on Amazon as a Hardcover book, an Audio Book on CD, a Kindle Book, and an Audible Audio Book. The ASIN for each of the first two is a 10 character ISBN, which is created by taking the 13 character ISBN, stripping the 978 prefix off of it, and recomputing the final check character (the two ISBN formats use different checksum rules). If you browse to the Detail Page, you can find this information on the page as you select different product formats (Hardcover, Kindle, etc.). The Kindle and Audible ASINs are as follows:
Kindle ASIN: B08TB1TP7H
Audible audiobook ASIN: B08V9DZZHG
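The step from a 13 character ISBN to the 10 character form used for the print editions can be sketched as follows. This is a minimal Python illustration of the standard conversion: drop the 978 prefix, drop the old check digit, and compute a fresh ISBN-10 check character. The function name is made up, and the sample ISBN is a commonly used textbook example, not one of the Amazon Unbound editions.

```python
def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert a 978-prefixed ISBN-13 to its ISBN-10 form."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or any(d not in "0123456789" for d in digits):
        raise ValueError("expected 13 digits")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBN-13s have an ISBN-10 form")
    core = digits[3:12]  # drop the 978 prefix and the ISBN-13 check digit
    total = sum((10 - i) * int(d) for i, d in enumerate(core))  # weights 10..2
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("978-0-306-40615-7"))  # 0306406152
```

In this example, simply chopping off the prefix would give 0306406157, which differs from the real ISBN-10 in its last character.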
Right away, we can see several things, before we do any calculations or conversions. First, there are not very many 0s after that B! In fact, there is just one zero. It’ll be a little while before that zero rolls over to 1, because the next character after the 0 is an 8, and in base 36 an 8 still has a long way to go before it passes Z. If we look at more recently assigned ASINs, such as the one for Jayne Ann Krentz’s forthcoming Lightning in a Mirror, that 8 has already rolled over to a 9: B092V35KJF.
I guess the first observations are:
(1) Amazon has gone through a lot of ASINs since I last checked.
(2) They are, unsurprisingly, going through them very, very fast.
Let’s do some math!
What is the distance between the Kindle and Audiobook ASIN listed above? Let’s just ignore the B0, because those are going to result in 0, and this gives us a slightly smaller number to deal with. (If that B ever rolls over to a C, you cannot just lop it off anymore; you have to keep it around long enough to subtract B000000000 from it, because ASINs start counting from that, not from 0. But we can ignore that for these calculations. We’re leaving the 8, though, so we can see how many ASINs have ever been assigned.)
Using: https://www.translatorscafe.com/unit-converter/en-US/numbers/39-13/base-36-base-10/ (I just googled for one and picked without testing, so buyer beware 🙂 This is the same tool I used in 2016.)
8V9DZZHG = 694,961,274,724
8TB1TP7H = 690,708,193,757
The immediate observation that these are quite distant from each other now has something we can emotionally connect to: they are about 4 billion (with a “B” as in Bezos) ASINs apart!
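If you would rather not trust a web converter, the same conversion can be checked locally. A minimal Python sketch, using the built-in `int(text, 36)` to parse base 36 (the variable names are just for illustration):

```python
# The two ASINs from above with the leading "B0" dropped, parsed as base 36.
kindle = int("8TB1TP7H", 36)
audible = int("8V9DZZHG", 36)

print(f"{kindle:,}")            # 690,708,193,757
print(f"{audible:,}")           # 694,961,274,724
print(f"{audible - kindle:,}")  # 4,253,080,967

# Keeping the full ASIN and subtracting the starting value gives the same
# answer, and this form keeps working if the B ever rolls over to a C.
start = int("B000000000", 36)
assert int("B08V9DZZHG", 36) - start == audible
```

The numbers match the ones from the online tool, and the gap between the two editions is a bit over 4.25 billion.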
Also, these are really big numbers! When I did this calculation back in 2016 and left myself some notes in my blog, it was smaller: https://walkitout.dreamwidth.org/1402703.html
At the time, I used the Schmilco album, and got 117,690,453,020.
So, between sometime in 1997 or 1998 (I forget exactly when I created ASINs, but I left in 1998, so it had to be before that, and it wasn’t one of the first things I did, so it was not in 1996) and 2016, Amazon used more than 100 billion (with a B, for Bezos) ASINs. Between 2016 and now, they’ve gone through at least 577 billion more, which is more than five times that.
Whee! Are they going to run out?
NO! Come on. When I designed this, I made sure they were not going to run out in my lifetime, and I’m not dead yet. The first two characters of the Schmilco pre-order in 2016 were _also_ B0. We’re a really long ways away from rolling over to C, much less running out. But we’re definitely picking up speed along the way.
I’ll now speculate a little about why the velocity has increased. First, while it was pretty obvious way back in 1997 (or 1998) that we would be needing a distinct ASIN for each size / color of a T-shirt (for example), at the time I in no way imagined Amazon selling goods with expiration dates (failure of the imagination on my part, and I apologize). They are using a new ASIN for each new expiration date on a given product. This is the correct choice, but you can see where selling lettuce, for example, is going to really move through a lot of ASINs over time. I’m not sure how Amazon Restaurants was implemented (I wasn’t even aware of it until recently), but it may have used up a lot of ASINs as well. Second, vendors have the option of using a distinct ASIN in each geographic region / market. That may also be causing some acceleration. I’m sure there are other things that I am unaware of.
(If you are curious, C000000000 – B000000000 = 1000000000 in base 36 which is 101,559,956,668,416 in base 10. Good luck having any kind of “feel” for that number, but it is bigger than the national debt, for sure.)
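To get at least a rough feel for that number, here is a small Python sketch that recomputes it and compares it with the Audible ASIN quoted earlier. It assumes, as described above, that serial ASINs count up from B000000000. The variable names are illustrative only.

```python
start = int("B000000000", 36)
b_block = int("C000000000", 36) - start  # how many ASINs begin with "B"

# Ten characters with the first one fixed leaves nine free positions.
assert b_block == 36 ** 9 == 101_559_956_668_416

used = int("B08V9DZZHG", 36) - start  # the Audible ASIN quoted earlier
fraction_used = used / b_block
print(f"{b_block:,} ASINs start with B")
print(f"about {fraction_used:.2%} of them sit below the Audible ASIN")
```

Under that assumption, well under 1% of the B block lies below the Audible ASIN.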
Recently, Amazon released a report on measures it takes to deal with scam listings, counterfeit goods and the like. It included the somewhat astonishing statistic:
“In 2020, we prevented over 6 million attempts to create new selling accounts, stopping bad actors before they published a single product for sale, and blocked more than 10 billion suspected bad listings before they were published in our store.”
I guess if you are running through more than a hundred billion legitimate listings per year, 10 billion suspected bad listings could be expected? Although I guess assigning ASINs to listings that are then thwarted may also use up ASINs more quickly.
As you can see, catalogs and unique IDs (ASINs, in this case) have a lot of complexity. I hope this gives you some sense of the depth and breadth of thinking that went into developing ASINs in the early days of Amazon.com.
What Do Other Online Stores Use Instead of ASIN?
At one point, Amazon and Target had a contractual relationship in which Amazon supplied a web presence for Target. That ended in 2011, but there was substantial continuity in the appearance for a Target customer from when Amazon was running the web presence. And that continues today.
ASIN appears in most URLs that lead to item detail pages on Amazon. Similarly, the TCIN (Target _____ Item Number?) appears in the URL and/or on the page of a Target product.
Both the Amazon ASIN and Target TCIN are 10 character strings; the Target URL element starts consistently as “A-” followed by 8 numbers. I have not seen any non-digit characters other than the leading “A-”. This looks like a serial item number which so far is 8 digits in length, and which has been padded out to the 10 characters that would have been an ASIN in the Amazon code base. I’m not saying it is that; but it just looks like that.
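The two shapes described above are easy to state as patterns. The following Python sketch encodes only what is observed in the text: an ASIN is 10 characters drawn from digits and upper case letters, and the Target URL element is "A-" followed by 8 digits. The function names and the sample URL are hypothetical, and neither pattern is an official specification from Amazon or Target.

```python
import re
from typing import Optional

# Shapes as observed in the text, not official formats.
ASIN_SHAPE = re.compile(r"[0-9A-Z]{10}")
TCIN_URL_ELEMENT = re.compile(r"(?<![0-9A-Za-z])A-([0-9]{8})(?![0-9])")


def looks_like_asin(text: str) -> bool:
    """True if `text` is exactly 10 characters of 0-9 / A-Z."""
    return ASIN_SHAPE.fullmatch(text) is not None


def tcin_from_url(url: str) -> Optional[str]:
    """Return the 8 digits after "A-" in a URL, or None if no such element exists."""
    match = TCIN_URL_ELEMENT.search(url)
    return match.group(1) if match else None


sample_url = "https://shop.example.com/p/some-item/-/A-12345678"  # made-up URL
print(tcin_from_url(sample_url))      # 12345678
print(looks_like_asin("B08TB1TP7H"))  # True
print(len("A-12345678"))              # 10, the same length as an ASIN
```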
The Target item detail page also lists a UPC (I will not be explaining that here — it’s the scannable bar code printed on the item) and a DPCI. DP stands for Department (which is the first few digits of the code), C stands for Class (the next couple digits), and the final few digits are the Item itself. This is the kind of structured information in an item number which some people wanted to include in an item number at Amazon, and which I was very resistant to. I believe the DPCI to be a Target specific item number which predated their web presence.
Walmart item URLs have a trailing 9 digit number, which, if you type it into the search box, pretty reliably returns a search screen with whatever item you ripped it off of at the top. If you enter it into BrickSeek on the Walmart SKU search screen, it’ll show the same item. I think that trailing 9 digit number is a Walmart SKU. However, it is NOT the same as the Walmart # which displays on the detail page, and I can’t find a displayed line item that is called a SKU or that matches the trailing 9 digit number, so I am not certain what they are doing in Bentonville :-)
One note: the Walmart 9 digit number is not monotonically increasing over time; there are relatively recent items that start with a 2, and somewhat older things that start with an 8 (and which are otherwise the same general type of product; in this case, paperback books).
This post was written by Rebecca Allen (Amazonian Software Engineer 1996-1998).