During Amazon’s earliest days (1994-1995), CTO Shel Kaphan and Software Engineer Paul (then) Barton-Davis had to write all the software needed to power Amazon.com on the day it offered its website to the world to sell books (official launch date was July 16, 1995). The book catalog was online, and it needed an index (well, it needed several indexes, but that’s another story); specifically, it needed a unique key for each item in the catalog. Because the databases they were using to create the catalog were indexed by 10-character-long ISBN (International Standard Book Number), Shel and Paul decided to use ISBN as their key.
Unfortunately — and Shel was well aware of this very quickly, but of course by that time, it was too late — ISBNs are terribly abused in the United States. The company that issues ISBNs, Bowker, charges a lot of money for ISBNs (from the perspective of small publishers, anyway), and publishers don’t necessarily read all the rules. Small publishers were re-using ISBNs, and they also took their range of ISBNs and numbered through the entire range, rather than respecting the rule that the final character is actually a checksum, and you can only iterate through some of the digits. (It’s actually worse than just not using the last digit, but I’m not getting into that here.)
Shel very quickly removed all ‘checksum software checks’ (which would have made sure it was a legal ISBN), but Amazon was still stuck with a code base that stored the key value in 10 character strings, and which also stored them in other databases with similar constraints.
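For readers who have not met the ISBN-10 checksum, here is a minimal Python sketch of the kind of check that got removed. It follows the standard ISBN-10 rule (weights 10 down to 2 on the first nine digits, modulo 11, with X standing for 10). It is not the code Amazon ran, and the function names are invented for this illustration.

```python
def isbn10_check_char(first_nine: str) -> str:
    """Return the ISBN-10 check character for the first nine digits."""
    if len(first_nine) != 9 or not (first_nine.isascii() and first_nine.isdigit()):
        raise ValueError("expected exactly nine ASCII digits")
    # Weights run 10, 9, ..., 2 across the first nine digits.
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """True if the string is ten characters long with a correct check character."""
    if len(isbn) != 10:
        return False
    try:
        return isbn10_check_char(isbn[:9]) == isbn[9].upper()
    except ValueError:
        return False


# A publisher that numbers straight through its range produces mostly bad keys:
print(is_valid_isbn10("0306406152"))  # a well-formed ISBN-10
print(is_valid_isbn10("0306406153"))  # the "next" number, which fails the check
```

With a check like this in place, every one of those numbered-straight-through ISBNs would have been rejected at the door, which is why the check had to go.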
When we wanted to include other products in the catalog (e.g. CDs, DVDs, Electronics), we needed something other than ISBNs, because:
(1) other non-book products don’t have ISBNs,
(2) it would be cost-prohibitive to buy ISBNs from Bowker, and
(3) the namespace addressable by ISBNs is too small for the number of products we expected to be selling.
I (Rebecca Allen) proposed that we replace the 10-digit ISBN as the key for our catalog with a minimum-impact-on-the-code “ASIN”. The ASIN would be the ISBN (if the item were a book and had an ISBN), or a 10 character serial number that represented a base 62 number. I got a ton of objections to base 62 (the 26 letters of the alphabet, case sensitive, plus the 10 digits), because that would have involved a case sensitive string — b000000000 would be a very different number from B000000000. Very. Different.
I somewhat agreed, and decided that a base 36 number (the letters of the alphabet plus the 10 digits) would work just fine. There was some debate about whether the key should have any structure. For example, credit card numbers have quite a bit of structure to them, as do well-constructed ISBNs for that matter. I objected to subdividing and having different parts of the string mean different things because it would substantially reduce the addressable space, and limit the number of items that could be moved through Amazon before having to go back and do this re-keying exercise again, with an even bigger code base.
The people who wanted “special” ASINs were pretty persistent, though, so I threw them a bone: they could have all the ASINs that started with the letter A, and I would start the counter for ASINs at B000000000. Finally, this proposal had to get past Shel, and Shel was not super keen on the idea of someone going through the code and changing every last place that ISBNs were referenced to something that was an ASIN. Even though this proposal actually minimized the hazards in several ways (same length string, same set of allowable characters, ISBNs are still legal ASINs, etc.), it was really going to involve a lot of code being changed at the same time. And an error would be very bad.
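To make the counter idea concrete, here is a small Python sketch of a fixed-width base 36 serial that starts at B000000000. It only illustrates the scheme as described above, under the assumption that the digits sort as 0-9 followed by A-Z. It is not how Amazon actually assigns ASINs, and `to_base36` and `next_asin` are hypothetical names.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 0-9 then A-Z: 36 symbols
WIDTH = 10                                         # same length as an ISBN-10
FIRST_SERIAL = "B000000000"                        # per the text, counting starts here


def to_base36(value: int, width: int = WIDTH) -> str:
    """Render a non-negative integer as a zero-padded base 36 string."""
    if value < 0 or value >= 36 ** width:
        raise ValueError("value does not fit in the key width")
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def next_asin(current: str) -> str:
    """Return the key that follows `current` in plain serial order."""
    return to_base36(int(current, 36) + 1)


print(next_asin(FIRST_SERIAL))   # B000000001
print(next_asin("B00000000Z"))   # B000000010
```

Because the key has no internal structure, every one of the 36 possibilities in every position is usable, which is the whole argument against subdividing the string.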
Humor can go a long ways, though, and I had already taken to presenting this as simultaneously the official term, Amazon Standard Identification/Item Number (ASIN), but also as, “It’s A SIN that we even have to do this”, which was a direct reference to the unfortunate ISBN key we currently had throughout the catalog and the system as a whole. Having worked with Shel fairly closely for a while, I hit upon the idea of presenting it to him as Arc Sine (which refers to a term that only mathematically-prone humans would enjoy). While that might not be humorous to everyone, it worked for Shel, and he gamely considered the proposal and agreed to it in his capacity as CTO, which was necessary for this project to go forward.
Where are ASINs now? Brad Stone has a new book, Amazon Unbound. It is (at the time I am writing this) available on Amazon as a Hardcover book, an Audio Book on CD, a Kindle Book, and an Audible Audio Book. If you browse to the Detail Page, you can find the ASIN on the page as you select different product formats (Hardcover, Kindle, etc.). The ASIN for the first two is a 10 character ISBN, which is created by taking the 13 character ISBN and stripping the 978 prefix off of it. The Kindle and Audible ASINs, by contrast, are as follows:
Kindle ASIN: B08TB1TP7H
Audible audiobook ASIN: B08V9DZZHG
Right away, we can see several things, before we do any calculations or conversions. First, there are not very many 0s after that B! In fact, there is just one zero. It’ll be a little while before that zero rolls over to 1, because the next digit after the 0 is an 8. If we look at more recently assigned ASINs, such as for Jayne Ann Krentz’s forthcoming Lightning in a Mirror, that has already rolled over to a 9: B092V35KJF.
I guess the first observations are:
(1) Amazon has gone through a lot of ASINs since I last checked.
(2) They are, unsurprisingly, going through them very, very fast.
Let’s do some math!
What is the distance between the Kindle and Audiobook ASIN listed above? Let’s just ignore the B0, because those are going to result in 0, and this gives us a slightly smaller number to deal with. (If that B ever rolls over to a C, you cannot just lop it off anymore; you have to keep it around long enough to subtract B000000000 from it, because ASINs start counting from that, not from 0. But we can ignore that for these calculations. We’re leaving the 8, though, so we can see how many ASINs have ever been assigned.)
Using: https://www.translatorscafe.com/unit-converter/en-US/numbers/39-13/base-36-base-10/ (I just googled for one and picked without testing, so buyer beware 🙂 This is the same tool I used in 2016.)
8V9DZZHG = 694,961,274,724
8TB1TP7H = 690,708,193,757
The immediate observation that these are quite distant from each other now has something we can emotionally connect to: they are about 4 billion (with a “B” as in Bezos) ASINs apart!
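If you would rather not trust a random web converter, the same conversion can be done in a few lines of Python, since the built-in `int` accepts a base of 36. This is just a sketch of the arithmetic above; `asin_serial` is a made-up helper name, and it assumes the ASIN starts with B0 as both of these do.

```python
def asin_serial(asin: str) -> int:
    """Interpret the eight characters after the leading 'B0' as a base 36 number."""
    if len(asin) != 10 or not asin.startswith("B0"):
        raise ValueError("this shortcut only works for ASINs that start with B0")
    return int(asin[2:], 36)


kindle = asin_serial("B08TB1TP7H")
audible = asin_serial("B08V9DZZHG")

print(f"{kindle:,}")            # 690,708,193,757
print(f"{audible:,}")           # 694,961,274,724
print(f"{audible - kindle:,}")  # 4,253,080,967
```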
Also, these are really big numbers! When I did this calculation back in 2016 and left myself some notes in my blog, it was smaller: https://walkitout.dreamwidth.org/1402703.html
At the time, I used the Schmilco album, and got 117,690,453,020.
So, between sometime in 1997 or 1998 (I forget exactly when I created ASINs, but I left in 1998, so it had to be before that, and it wasn’t one of the first things I did, so it was not in 1996) and 2016, Amazon used over 100 Billion (with a B, for Bezos) ASINs. Between 2016 and now, they’ve gone through roughly another 577 billion, which is nearly 5 times the 2016 total.
Whee! Are they going to run out?
NO! Come on. When I designed this, I made sure they were not going to run out in my lifetime, and I’m not dead yet. The first two characters of the Schmilco pre-order in 2016 were _also_ B0. We’re a really long ways away from rolling over to C, much less running out. But we’re definitely picking up speed along the way.
I’ll now speculate a little about why the velocity has increased. First, while it was pretty obvious way back in 1997 (or 1998) that we would be needing a distinct ASIN for each size / color of a T-shirt (for example), at the time I in no way imagined Amazon selling goods with expiration dates (failure of the imagination on my part, and I apologize). They are using a new ASIN for each new expiration date on a given product. This is the correct choice, but you can see where selling lettuce, for example, is going to really move through a lot of ASINs over time. I’m not sure how Amazon Restaurants was implemented (I wasn’t even aware of it until recently), but it may have used up a lot of ASINs as well. Second, vendors have the option of using a distinct ASIN in each geographic region / market. That may also be causing some acceleration. I’m sure there are other things that I am unaware of.
(If you are curious, C000000000 – B000000000 = 1000000000 in base 36 which is 101,559,956,668,416 in base 10. Good luck having any kind of “feel” for that number, but it is bigger than the national debt, for sure.)
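The same arithmetic in Python, for anyone who wants to check it or see how much of the B range is gone. This is a back-of-the-envelope sketch that treats the Audible ASIN above as if it were the most recently assigned one, which is an assumption made only for illustration.

```python
# Size of the range between B000000000 and C000000000.
span = int("C000000000", 36) - int("B000000000", 36)
assert span == 36 ** 9

# Assumption: the Audible ASIN B08V9DZZHG stands in for "latest assigned".
used = int("B08V9DZZHG", 36) - int("B000000000", 36)

print(f"{span:,}")           # 101,559,956,668,416
print(f"{used:,}")           # 694,961,274,724
print(f"{used / span:.2%}")  # 0.68%
```

So even at the current pace, well under one percent of the B range has been used.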
Recently, Amazon released a report on measures it takes to deal with scam listings, counterfeit goods and the like. It included the somewhat astonishing statistic:
“In 2020, we prevented over 6 million attempts to create new selling accounts, stopping bad actors before they published a single product for sale, and blocked more than 10 billion suspected bad listings before they were published in our store.”
I guess if you are running through more than a hundred billion legitimate listings per year, 10 billion suspected bad listings could be expected? Although I guess ASINs assigned to listings that are then thwarted may also run through ASINs more quickly.
As you can see, Catalogs and unique IDs (ASINs, in this case) have a lot of complexity. I hope this gives you some sense for the depth and breadth of thinking that went into developing ASINs in the early days of Amazon.com.
What Do Other Online Stores Use Instead of ASIN?
At one point, Amazon and Target had a contractual relationship in which Amazon supplied a web presence for Target. That ended in 2011, but there was substantial continuity in the appearance for a Target customer from when Amazon was running the web presence. And that continues today.
ASIN appears in most URLs that lead to item detail pages on Amazon. Similarly, the TCIN (Target _____ Item Number?), appears in the URL and/or on the page of this product.
Both the Amazon ASIN and Target TCIN are 10 character strings; the Target URL element starts consistently as “A-” followed by 8 digits. I have not seen any non-digit characters other than the leading “A-”. This looks like a serial item number which so far is 8 digits in length, and which has been padded out to the 10 characters that would have been an ASIN in the Amazon code base. I’m not saying it is that; but it just looks like that.
The Target item detail page also lists a UPC (I will not be explaining that here — it’s the scannable bar code printed on the item) and a DPCI. DP stands for Department (which is the first few digits of the code), C stands for Class (the next couple digits), and the final few digits are the Item itself. This is the kind of structured information in an item number which some people wanted to include in an item number at Amazon, and which I was very resistant to. I believe the DPCI to be a Target specific item number which predated their web presence.
Walmart item URLs have a trailing 9 digit number, which, if you type it into the search box, pretty reliably returns a search screen with the item you took it from at the top. If you enter it into BrickSeek on the Walmart SKU search screen, it’ll show the same item. I think that trailing 9 digit number is a Walmart SKU. However, it is NOT the same as the Walmart # which displays on the detail page, and I can’t find a displayed line item that is called a SKU or that matches the trailing 9 digit number, so I lack certainty about what they are doing in Bentonville :-)
One note: the Walmart 9 digit number is not monotonically increasing over time; there are relatively recent items that start with a 2, and somewhat older things that start with an 8 (and which are otherwise the same general type of product, in this case paperback books).
This post was written by Rebecca Allen (Amazonian Software Engineer 1996-1998).
Gary Williams says
“The ASIN for the first two is a 10 character ISBN, which is created by taking the 13 character ISBN and stripping the 978 prefix off of it.”
ISBN-10 uses a different algorithm for the check digit than ISBN-13. The check digits will coincidentally match less than 10% of the time.
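The check-digit point can be sketched in Python as follows. This is a minimal illustration using the standard ISBN-10 rule; `isbn13_to_isbn10` is a hypothetical helper name, and for brevity it does not verify the ISBN-13 check digit of its input.

```python
def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert a 978-prefixed ISBN-13 to ISBN-10, recomputing the check character."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 13 digits")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBN-13s have an ISBN-10 form")
    core = digits[3:12]  # drop the 978 prefix and the ISBN-13 check digit
    # ISBN-10 check: weights 10..2 on the nine core digits, modulo 11.
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


# The ISBN-13 ends in 7, but the matching ISBN-10 ends in 2.
print(isbn13_to_isbn10("978-0-306-40615-7"))  # 0306406152
```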
Brian Horakh says
I was the first developer/company to integrate in a programmatic fashion with the Amazon Marketplace (and also Amazon Payments, some other Amazon services).
We had an e-commerce platform with around 5,000 prominent eBay sellers and several had been invited to participate.
Few (if any) of them had UPCs, and even if they did, often the UPC was the same for all variations of a product (e.g. a shirt with one UPC would come in small, med, large), while each variation requires its own ASIN. Usually these items were being manufactured and directly imported from China.
We contacted GS1 about getting a prefix of UPCs for our premium sellers to share but couldn’t find a way to make it profitable for them and worth the trouble for us.
Anyway I remember asking “how should we handle this” to the AWS account who was assigned to one of our client accounts (explaining why we couldn’t get the client online) and we were told “just make up the numbers”. I balked and said “this is a really bad idea, are you sure?” .. “yes, just make them up”, I got it in writing. I picked some obscure prefix that was used for I think automotive parts and wrote a system that just started generating UPCs with valid checksums.
A few years later we get a call from Amazon “YOU MUST STOP DOING THIS” .. (yeah, finally had a collision) and I’m like “yeah, except YOU TOLD US TO” .. obviously that person wasn’t working at Amazon anymore, and the email did fuck all. Our sellers had grown extremely accustomed to not needing to pay for UPCs and the idea they’d need to start flew like a lead balloon .. plus none of our competitors were doing that either.
I can’t recall how it got resolved, I think we ended up selectively disabling the feature in specific categories which probably had UPCs and leaving it on for apparel and others which didn’t.
Anyway hope you appreciate the anecdote.
Rebecca Allen says
That’s a fantastic and all too believable story. Thanks for sharing!
Rebecca Allen says
Great Story and all too believable! Thanks for sharing!
Fun fact, if my memory is correct: when I worked at Target a number of years ago, I learned an 004- DPCI was actually printed signage, be it categorical (“Clearance”) or seasonal (“Music For The Holidays”). Different types and themes had different class numbers and each new issue sent from on high got its own item number.
I’m curious, what were those “A….” ASINs used for? Are they visible somewhere? I have only ever seen B…. ASINs
Rebecca Allen says
To the best of my knowledge, those ASINs have never been used, but I’ve been gone a long time, so what do I know.
When ASINs came out I tried to convince Amazon to name them BASINs, as Book Amazon Standard Identification Number, then they could talk about the Amazon Basin. Little did I realize at the time that ASINs were for everything, not just books, so Amazon wisely ignored me.
Yet to this day some sellers still talk about BASINs.
Fascinating post – thanks for this. I’m genuinely impressed you left space for such a massive catalogue. The expiration dates detail is particularly good. How many ids have been taken up just by lettuce I wonder?
Great article, Rebecca. I am curious; did you try to avoid duplicates when assigning the ASIN? When a new book was added to the catalog, perhaps without its ISBN, did you check that the book had not been assigned a number? Or is it the case that ASIN is scoped by the seller/merchant? If you tried to avoid duplicates, I’d love to know how you went about it. Thanks!
Rebecca Allen says
Since a lot of ASIN assignment was automated or not within anything resembling my control, no, I did not try to avoid duplicates. A lot of the obvious ways of avoiding “duplicates” would have treated things as the same that were actually quite different (hardback vs trade paperback vs mass market paperback vs movie based on the book, etc.).
One of the last projects I worked on was an authority database system, that would enable the catalog department to disentangle authors with the same name who were different people, or different names who were the same person, books with the same title that were different books or books with different titles that were the same book, etc. I tried to create as flexible a system as possible. I don’t really know whether the system was used after I left or for how long — it could still be around, or it could have been completely replaced.