- Disk vendors quote disk capacities in base-10 numbers; a drive advertised as “3TB” can store roughly 3,000,000,000,000 bytes of data.
- Computers address memory and disk in base-2, so 1KB is actually 1,024 bytes and 3TB is actually 3,298,534,883,328 bytes.
- RAID formatting trades data storage capacity for the ability to recover data when a drive fails.
- File system formatting uses a portion of the (possibly RAID-protected) storage capacity to store information about files (file names, disk locations, access times, permissions, etc.).
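To put numbers on the first two bullets, here is a minimal Python sketch of the conversion between the vendor's base-10 terabytes and the computer's base-2 terabytes. The function name `advertised_to_binary_tb` is made up for illustration; the only facts it relies on are that a vendor terabyte is 10^12 bytes and a base-2 terabyte is 2^40 bytes.

```python
def advertised_to_binary_tb(advertised_tb: float) -> float:
    """Convert a vendor (base-10) terabyte figure to base-2 terabytes."""
    advertised_bytes = advertised_tb * 10**12  # what the drive label promises
    return advertised_bytes / 2**40            # what the operating system reports


if __name__ == "__main__":
    # A "3TB" drive as the computer sees it.
    print(f"{advertised_to_binary_tb(3):.2f} TB")  # 2.73 TB
    # The byte count a true base-2 3TB would need.
    print(f"{3 * 2**40:,} bytes")                  # 3,298,534,883,328 bytes
```

The 2.73TB figure is the per-drive capacity that the RAID arithmetic should start from.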
From this point we trade usable capacity for various things like resiliency (RAID) and simplicity of management (file systems)—the only things that actually qualify as formatting.
Disk drives will fail, so some form of RAID is used to protect the data. If the above 360 drives are formatted in RAID6 (a best practice for 3TB SATA drives), let’s say 8D+2P, the usable capacity of the storage arrays is “only” 786TB (2.73TB/drive * 8 data drives * 36 RAID groups), a whopping 317TB off of the advertised raw capacity of 1.08PB. (And by the way, that was calculated with no spare drives configured, a definite no-no.)
317TB is enough SAN capacity for a good-sized enterprise, production and development included. That’s beyond the pale for a rounding error.
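The usable-capacity arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `raid_usable_tb` and its parameters are hypothetical, every RAID group is assumed to have the same data+parity layout, and file system overhead is ignored.

```python
def raid_usable_tb(drives: int, drive_tb: float, data_per_group: int,
                   parity_per_group: int, spares: int = 0) -> float:
    """Usable capacity for identical RAID groups, in the same unit as drive_tb.

    Assumes whole groups only; leftover drives that cannot fill a group
    contribute nothing. Spares are set aside before groups are formed.
    """
    group_size = data_per_group + parity_per_group
    groups = (drives - spares) // group_size
    return groups * data_per_group * drive_tb


if __name__ == "__main__":
    # 360 drives at 2.73TB (base-2) each, RAID6 as 8D+2P, no spares.
    usable = raid_usable_tb(360, 2.73, data_per_group=8, parity_per_group=2)
    print(f"{usable:.0f} TB usable")  # 786 TB usable
```

Passing a nonzero `spares` value shows how quickly the figure shrinks further once hot spares are set aside.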
The problem only gets worse as we talk larger and larger capacities. Here’s a table I put together to illustrate the difference between base-2 and base-10 in terms of storage capacity:
| Capacity | Computer (bytes) | Salesman (bytes) | Difference (%) | Difference |
|---|---|---|---|---|
| 1KB | 1,024 | 1,000 | 2.3% | 24 bytes |
| 1MB | 1,048,576 | 1,000,000 | 4.6% | 47 KB |
| 1GB | 1,073,741,824 | 1,000,000,000 | 6.9% | 70 MB |
| 1TB | 1,099,511,627,776 | 1,000,000,000,000 | 9.1% | 93 GB |
| 1PB | 1,125,899,906,842,624 | 1,000,000,000,000,000 | 11.2% | 115 TB |
| 1EB | 1,152,921,504,606,846,976 | 1,000,000,000,000,000,000 | 13.3% | 136 PB |
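The table's figures can be reproduced with a short Python sketch. Two things are inferred from the numbers rather than stated in the text: the percentage is taken relative to the base-2 value, and the absolute gap is expressed in the next smaller base-2 unit. The helper names `capacity_gap` and `table_rows` are made up for illustration.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]


def capacity_gap(power: int):
    """Return (base-2 bytes, base-10 bytes, percent gap, gap in bytes)."""
    binary = 1024 ** power
    decimal = 1000 ** power
    diff = binary - decimal
    percent = 100 * diff / binary  # relative to the base-2 value
    return binary, decimal, percent, diff


def table_rows():
    """Build one formatted row per unit, from KB up to EB."""
    rows = []
    for power, unit in enumerate(UNITS, start=1):
        binary, decimal, percent, diff = capacity_gap(power)
        # Express the gap in the next smaller base-2 unit (bytes for KB).
        smaller = "bytes" if power == 1 else UNITS[power - 2]
        gap = diff / 1024 ** (power - 1)
        rows.append(
            f"1{unit} | {binary:,} | {decimal:,} | {percent:.1f}% | {gap:.0f} {smaller}"
        )
    return rows


if __name__ == "__main__":
    for row in table_rows():
        print(row)
```

Python integers are arbitrary precision, so the petabyte and exabyte byte counts come out exact rather than rounded the way a spreadsheet would show them.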
To make matters worse, in 1998 the IEC attempted to unilaterally redefine the terms kilobyte, megabyte, etc. to use base-10, coining obnoxious terms for base-2 numbers which understandably nobody uses. While I applaud the effort to standardize, such slow adoption in 14 years is evidence they made the wrong choice and muddied the waters. I don't get the impression the IEC was too concerned about how capacity was measured "once upon a time," only that SI prefixes were changed, so it probably went something like this ("I am your king"... "I didn't vote for you"... "You don't vote for kings."...). A computer will only ever consume space in powers of two until someone figures out how to bring quantum computing to the mainstream. Until then you'll have to pry my base-2 kilobytes out of my cold, dead fingers. "Help, help! I'm being repressed!"
As sales engineers, it’s our responsibility to ensure we’re properly assessing customer needs and architecting to meet them. In my experience, quoting capacities in base-2 numbers is the easiest way to communicate in a manner most likely to meet customer expectations. My personal preference is to quote usable RAID-formatted capacity (i.e., “13.3TB RAID5 usable”) to call out the difference from “raw” storage bids.