142

Which conversion should I teach to my undergrad students? That 1 kB is 1024 bytes (binary) as everyone learned back in the nineties or the recent industry-led "friendly" conversion that says that 1 kB is in fact 1000 bytes (decimal)?

My immediate feeling goes toward the binary conversion, but when IEC says otherwise and major OSs decide for the decimal conversion (Mac OS X ≥ 10.6 and Ubuntu ≥ 10.10 now use the SI prefixes exclusively to refer to powers of 1000) I'm not so sure anymore.

Ola Ström
  • 139
  • 1
  • 1
  • 6
alves
  • 1,513
  • 2
  • 8
  • 7
  • 27
    Please notice that the SI prefix "kilo" is always written with a lowercase "k". Personally, I am used to seeing "kB", even when it strictly isn't a SI prefix. – Andreas Rejbrand Mar 09 '18 at 19:08
  • In the millions of lines of code I've read `n×2^(m×10)` is actually incredibly rare. Think about where it actually makes sense: when dealing with very time-consuming algorithms which might be slightly faster (such as PGP) or when certain that your data will align with storage layout (such as database internals). Making text fields n×1024 characters long is just cargo culting. – l0b0 Mar 09 '18 at 20:09
  • 39
    [xkcd](https://xkcd.com/394/) – Kevin Mar 09 '18 at 21:09
  • 5
    RAM is sold in KiB, MiB, GiB and hard disks in kB, MB, GB. Both are often labelled kB, MB, GB. So it is not always about programming. – ctrl-alt-delor Mar 10 '18 at 10:06
  • 1
    When standards are unclear it is best to just tell them that. Just like there is this problem with $\subseteq \subset \subsetneq$ in maths, different people use the middle one for either the left or the right. Our maths professor told us about this problem and then stated how he is going to use them in the future. Which is then personal preference – Felix B. Mar 10 '18 at 10:32
  • 13
    What I find amusing is that the power-of-two version (The one that's clearly what is usually desired) has no justification whatsoever for the use of the "Kilo" prefix--it's just that some arbitrary power of two happens to come fairly close to some arbitrary power of 10 so we ignore the difference for the convenience of being able to say "K" (or "M" or "G") because "0x0200"abyte is too hard to say. – Bill K Mar 12 '18 at 16:08
  • 1
    @BillK I was going to say the same thing - we religiously fight for K=1024 and we completely forget that "kilo" was "invented" long time before computers, and associating it with 1024 was nothing but a steal - convenient but highly improper. If 2^10 was, say, 1256, then for sure we wouldn't call it a "kilo" byte. – Bogdan Alexandru Mar 12 '18 at 20:11
  • 5
    Byte is not an SI unit. The SI unit for quantity is the mole. 1 GB is approximately 1.66 femtomole bytes – Erwin Bolwidt Mar 13 '18 at 10:38
  • 1
    @l0b0 There is an actual reason for using 1024 byte fields: the number of bits you need to address the bytes allows up to 1024 bytes. Using 1000 is fine, but your address space allows 24 more bytes, so why don't you use them? So 1024 is a "binary even" number for a reason, not just cargo cult. – allo Mar 14 '18 at 08:33
  • 1
    @allo What do you mean "allows up to 1024 bytes"? You're going to have to specify what you are referring to if you're making such a claim, otherwise it's still cargo cult. – l0b0 Mar 14 '18 at 08:38
  • Let's say you have a string of 1024 bytes. Then you can use 10 bits to address any character in this string. Make it 1025 bytes and you need 11 bits. Make it 1000 bytes and you have some addresses left over. Most of the time the constraint on the address length is harder than the one on the field itself. So making it 1024 is quite nice when you need at least 1000 bytes, and avoiding the need for 1025 is a good idea. – allo Mar 14 '18 at 08:40
  • 1
    This question requires correction. The linked IEC doc says nothing about KB. Please remove the false premise or otherwise eliminate this confusion, which only serves to further muddy an already unclear situation. – Sentinel Mar 14 '18 at 15:21
  • When the largest memory was 32k it wasn't as confusing. To anyone who entered the computing biz before about 1990, K = 1024 and M = 1,048,576 (when describing RAM/disk storage size). But this is confusing when a 64K RAM is 65,536 bytes, and also confusing when 64kHz is 64,000 Hz. IBM decided to change the notation back in maybe 1990, and most of the rest of the industry went along, some (the old big iron crowd) grudgingly, some (the new consumer computer biz) more happily. – Hot Licks Mar 14 '18 at 21:53
  • 1
    Teach them that both exist, explain WHY both exist, and tell them that people will likely be incredibly inconsistent as to which one they mean. – Shadur Mar 16 '18 at 11:58
  • KiB = 1024 Bytes > KB = 1000 Bytes > Kib = 1024 bits > Kb = 1000 bits – J.Money Nov 13 '19 at 23:42
  • Read this: https://stackoverflow.com/a/69679309/217867 – Lonnie Best Oct 22 '21 at 15:30

13 Answers

191

You should teach both, and you probably want to use the binary unit. When you are talking about the difference, it may be helpful to tell them about how to tell the difference when reading them:

The SI kilo- is k:
$1\ \text{kB (kilobyte)} = 10^{3}\ \text{bytes} = 1000\ \text{bytes}$

While the binary kibi- is Ki:
$1\ \text{KiB (kibibyte)} = 2^{10}\ \text{bytes} = 1024\ \text{bytes}$

I notice that you used KB in your question to refer to both sizes; perhaps you should also point out that KB could be interpreted as either of these prefixes (though Wikipedia suggests it is most often used in place of KiB). In your position, I would suggest clarifying which one you mean if you use this notation.

(While you're going over confusing units, a related difference in writing units is that lowercase b is bits, uppercase B is bytes; an eightfold difference is much more significant than 2.4%.)
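To let students see the two conventions side by side, a minimal Python sketch follows. The function name `format_bytes` and the unit lists are illustrative choices, not part of any standard library; the sketch assumes the kB = 1000 B and KiB = 1024 B definitions given above.

```python
# Hypothetical helper: render a byte count with SI (kB) or IEC (KiB) units.
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def format_bytes(n, binary=False):
    """Return n bytes as a string, scaled by 1000 (SI) or 1024 (IEC)."""
    base = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    value = float(n)
    index = 0
    # Divide until the value fits under one step of the chosen base.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


print(format_bytes(1024))               # decimal reading: 1.02 kB
print(format_bytes(1024, binary=True))  # binary reading: 1.00 KiB
```

Printing the same count both ways makes the 2.4% gap at the kilo level visible, and the unit suffix tells the reader which convention was applied.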

thesecretmaster
  • 4,785
  • 3
  • 21
  • 48
Mike P
  • 1,756
  • 1
  • 7
  • 4
  • This was the answer I was looking for when I read the question. Was surprised to see the top-voted one did not mention kibibytes, which is really at the heart of the answer here. – Ghoti and Chips Mar 09 '18 at 20:58
  • 52
    Beyond just teaching both, you need to teach that **k/kilo** can mean either depending on context/who's using it. Just because kibble exists doesn't mean people like or actually use it. – R.. GitHub STOP HELPING ICE Mar 10 '18 at 01:14
  • 7
    If you cover bits and bytes, you should also at least briefly mention that a "kilobit" is nearly always 1000 bits (because networking) and a "kilobyte" is nearly always 1024 bytes (because everything-except-for-networking). – Kevin Mar 10 '18 at 07:45
  • 29
    1MiB is ≈5% bigger than 1MB, 1GiB is 7.4% bigger than 1GB, and 1TiB is nearly 10% bigger than 1TB. – ctrl-alt-delor Mar 10 '18 at 10:14
  • 1
    @ctrl-alt-delor, still nowhere near 8x. – prl Mar 11 '18 at 04:14
  • 17
    I was always taught that the base is binary, an 8-bit word is a Byte, a 16-bit word is two Bytes and, following binary convention 1KB is 1024 Bytes, 1MB is 1024 KB, 1GB is 1024 MB, 1TB is 1024 GB - and in binary, the base unit of computing, it makes perfect sense. I have always found the attempted adoption of SI usage an incorrect and unnecessary confusion. That said, as an educator, a student will need to understand the confusion. – Willtech Mar 11 '18 at 04:16
  • 56
    *"Should I teach that 1 KB = 1024 bytes or 1000 bytes?"* **Yes.** :-) – user541686 Mar 11 '18 at 06:01
  • 5
    @Willtech - an octet is 8 bits. An Apple ][+ has 7 bit bytes. – TOOGAM Mar 11 '18 at 09:29
  • 4
    @Kevin and for those of us older than ethernet, kilobits are binary https://electronics.stackexchange.com/questions/95389/why-are-eeprom-sizes-measured-in-k-or-kbit-and-not-kbyte-or-byte – Pete Kirkham Mar 11 '18 at 22:38
  • 5
    I'd also teach to **read the context**: "four kay" being for things like block sizes or amounts of memory is *almost always* going to mean 4096. If someone colloquially says "sixty-four kay of flash" when talking about a microcontroller, that means 65536 bytes (maybe missing some). If someone says "2 gigs of memory" when talking about RAM, that means 2 GiB. By way of contrast, I've never heard an Ethernet frame quoted as being "1.5 kay", or "9 kay", because that would be misleading/confusing. Mass storage is the thing at fault here. – Nick T Mar 12 '18 at 00:41
  • 1
    @Willtech There used to be reasons why 1024 byte powers were convenient, but they no longer apply with more advanced technology that is capable of dividing by 1000. – user253751 Mar 12 '18 at 04:27
  • 3
    @immibis has how RAM is manufactured changed? It is still binarily addressable in structure. – Willtech Mar 12 '18 at 04:33
  • 3
    @Willtech RAM is only a small slice of the things that are measured in bytes, and most other things (such as hard drives) have no reason to have power-of-two sizes. Modern systems are also much more able to deal with NPOT sizes - for example we no longer want to hard-wire address decoders using the minimal number of gates. By the way, modern memory chips actually include spare rows to automatically replace faulty ones, so that proves they can make any number of rows (because the total isn't a power of two). – user253751 Mar 12 '18 at 04:46
  • @immibis When we invent technology like PAE, that doesn't justify limiting paging to 01110111001101011001010000000000 bytes. – Willtech Mar 12 '18 at 05:30
  • Alright, most addressing is probably in hex values but it is compatible with binary. – Willtech Mar 12 '18 at 05:48
  • 1
    You should teach a generalization of @R.. 's excellent comment: *"Just because a standard exists doesn't mean people like or actually use it."*. I **wish** they had taught me that. Then I would still encourage them to use standards whenever practical (instead of [reinventing the wheel](https://xkcd.com/927/)), even if it means jumping through a couple of hoops. – xDaizu Mar 12 '18 at 08:35
  • 5
    Just to add to this though, prepare your students for the fact that they will almost certainly never find anyone using the word "kibibyte" in the real world. I've been working in software for 20+ years, as well as spending years building my own PCs, and I've never once seen it used. Occasionally I've seen people write "KiB" - and it's always read as "kilobytes". Even that is a small minority of cases though. – Graham Mar 12 '18 at 09:14
  • I have a 2TB hard drive. The capacity from hardware properties shows 1907601MB which doesn't add up with either formats. Why is that? – FisNaN Mar 12 '18 at 14:07
  • 2
    @FisNaN: 1907601 MiB * 1048576 B/MiB = 2000264626176 B; the hardware properties are using the wrong prefix – Mike P Mar 12 '18 at 17:17
  • Congratulations on going over 100 up votes. I think that is a first here. – Buffy Mar 12 '18 at 19:42
  • 5
    And while your students may never even use one, it may be instructive, as an example of just how confusing the terminology can get, to note that a 1.44 "megabyte" floppy disk is neither 1000*1000*1.44 nor 1024*1024*1.44, but in fact 1000*1024*1.44 bytes. – Clement Cherlin Mar 12 '18 at 23:46
  • @NickT, we say “one and a half kay” for the size of a packet all the time. But we’re talking about the amount of memory needed to hold it, not the actual transmission, so maybe that makes the difference. – prl Mar 13 '18 at 08:00
  • 1
    @ClementCherlin Actually, floppies were simply 1440 KiB; as much KiB's as they could fit on the disk. Hard disks and flash memory still have blocks the size of some-power-of-two of bytes, and an integer multiple of those for the total capacity. – JimmyB Mar 13 '18 at 09:55
  • 1
    @prl 1518 B and 1522 B (+4 for VLAN tags in 802.3ac) are common for Ethernet frames, when you say "one and a half kay", to which are you referring? Sure, once you establish either and are exclusively using it, abbreviate, but to a new person/group, they'll have no idea. – Nick T Mar 13 '18 at 15:59
  • @NickT, 1.5 KB is enough memory to hold either size. – prl Mar 14 '18 at 01:37
  • @prl punch 1536 into the wrong MTU field and you just slowed your network by 10-20% – Nick T Mar 14 '18 at 03:26
  • @NickT, again, I’m talking the amount of memory needed to contain a packet, not what is in the packet. – prl Mar 14 '18 at 04:43
  • I would say that technically, KiB means 1024 bytes and kB means 1000 bytes, but a more informal convention exists to use KB (or kB?) for 1024 bytes. – Micah Walter Mar 14 '18 at 19:18
  • 1
    @JohnPeyton: Uppercase "K" has been widely recognized as unambiguously meaning 1024, at least in contexts that would support using a lowercase "k" when appropriate, for 40+ years; the notion that there's anything wrong with that notation is far more recent. – supercat Mar 14 '18 at 21:35
  • 1
    @supercat I'm not sure I would admit that such usage is unambiguous, given that the prefix has been defined as meaning 1000 for centuries. And the same applies to the prefixes M, G, etc., which have no such case distinction to help. – Micah Walter Mar 14 '18 at 21:39
  • 1
    @JohnPeyton: When has the uppercase version been defined as a prefix meaning 1000, other than in contexts where lowercase would have been unavailable? I agree the larger prefixes have always been a problem, though I'm a bit curious as to why there's only now an interest in changing them? If someone had proposed new terms in the 1970s, that would have been much more useful than redefining the terms after decades of use. – supercat Mar 14 '18 at 22:07
  • 1
    Moreover, teach that the 1 kB = 1000 B and 1 KiB = 1024 B is the _official standard_ (by ISO, etc.) for the usage of these prefixes but there still exist a lot of _non-standard_ usages in relation to kB (usually given as "KB"), especially in older products and legacy systems. And if ambiguity can be a problem to say that one should either specify that kB = 1000 B or to stick with KiB ( = 1024 B) (and the higher MiB, GiB, etc.) exclusively as that one is unambiguous even if not a nice decimal factor (though you SHOULD know your powers of 2! :) ). – The_Sympathizer Mar 17 '18 at 08:47
  • 3
    But in the interests of getting standards used and adopted, _firmly discourage_ them from _using_ kB/KB _themselves_ to mean 1024 B, _even if others are doing so_. They should _only_ know this usage for _understanding others' work_, NOT for using it in their own! The way to break the habit is to get the next generation doing the right thing so they replace the old dogs and push bad practices into the history bin where they belong. – The_Sympathizer Mar 17 '18 at 08:48
  • @JimmyB My point is that the "megabyte" used in the description "1.44 megabyte floppy disk" is neither power-of-ten nor power-of-two, but a combination of both. The reasons why a high-density 3.5" floppy disk has that particular size are not relevant to this discussion. The inconsistent use of terminology is. See http://mathworld.wolfram.com/Megabyte.html – Clement Cherlin Mar 18 '18 at 14:44
  • @ClementCherlin Ah, yes, 1440 KiB is neither 1.44 MiB nor 1.44MB, but either ~1.41MiB or ~1.47 MB. – JimmyB Mar 19 '18 at 10:37
69

You should teach them it's messed up beyond repair, and it's their generation's job to teach the next generation to use the silly-sounding standard prefixes, so that when they finally retire (and the current old-timers are more permanently removed from the argument), there can finally be a consensus.

As the matters currently stand, all the prefixes are unknowable without context. A networking megabit is $10^6$ bits, a filesystem megabyte is $2^{20}$ bytes, a hard drive megabyte is somewhere pretty close to $10^6$ bytes, and a megapixel is "probably a million pixels, who cares."

Bass
  • 791
  • 4
  • 3
  • 1
    The consensus seems to be that disk size is the nearest simple approximation *lower than* n×1000^m. So 2.057×10^12 bytes would be *advertised as* 2 TB, not 2.1 TB. – l0b0 Mar 09 '18 at 20:15
  • 2
    I'd note the prefixes rarely (basically never) have their binary meaning with units other than *bytes*. A megapixel is 1 million pixels, a megabit is a million bits. – cHao Mar 12 '18 at 19:51
  • 1
    The filesystem megabyte being $2^{20}$ bytes - maybe. Sometimes in the same OS you'll see "megabytes" (including decimal precision) being $10^6$ in some of the tools and $2^{20}$ in others. Most often in command line tools vs GUI tools, but I know of an OS where even different OS-provided GUI tools disagree on this... – davidbak Mar 17 '18 at 02:31
  • @davidbak is right. It depends on OS also. In 2009, Apple switched to standards-based prefixes for filesystems etc, to match disk drive manufacturers, i.e. GB = 10^9 bytes. https://eshop.macsales.com/blog/1852-snow-leopard-changes-they-way-we-look-at-gigabytes-and-megabytes-and-kilobytes-as-well/ Ubuntu changed in 2010 https://wiki.ubuntu.com/UnitsPolicy When will Windows catch up with reality? – nealmcb Mar 10 '21 at 21:11
54

Actually, you need to teach them both so that they are warned that the usage is not consistent. Then you can choose one as a standard in your course going forward.

Which you choose depends a bit on what you are teaching. If it is how to evaluate hard drives, etc. then $K = 1000$ works now. For most programming, however, $K = 2^{10} = 1024$ is probably best.
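The hard-drive case can be worked through as a short Python sketch. The function name `advertised_to_binary` is made up for illustration, and the sketch assumes the drive is advertised in decimal units while the reader wants the same capacity in binary units.

```python
# Hypothetical helper: convert an advertised decimal capacity to binary units.
def advertised_to_binary(value, power):
    """Convert value * 1000**power bytes into units of 1024**power bytes.

    power=1 is kB -> KiB, power=3 is GB -> GiB, power=4 is TB -> TiB.
    """
    total_bytes = value * 1000 ** power
    return total_bytes / 1024 ** power


# A drive sold as "3 TB" holds 3 * 10**12 bytes.
tib = advertised_to_binary(3, 4)
print(f"3 TB is about {tib:.2f} TiB")
```

A tool that counts in powers of 1024 but labels the result "TB" will therefore show a smaller number than the one on the box, even though the byte count is identical.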

Sadly, the dual meaning is likely due to manufacturers trying to avoid confusion in the minds of unsophisticated customers.

ctrl-alt-delor
  • 10,635
  • 4
  • 24
  • 54
Buffy
  • 35,808
  • 10
  • 62
  • 115
  • 4
    Kilobyte was coined far before the 1,000 byte kilobyte in 1998. IEC really just made a mess of things. – phyrfox Mar 09 '18 at 19:09
  • 48
    Yes, but kilo = 1000 goes back to 1795: https://www.etymonline.com/word/kilo- So non-geeks have some precedence here, perhaps. But more important: If you teach them just the one thing as the "correct thing" you are setting them up for confusion later. The world is messy. Teachers shouldn't pretend it isn't. Being dogmatic isn't very helpful. – Buffy Mar 09 '18 at 19:12
  • @GorchestopherH 2 x 2^30 = 2,147,483,648 It takes creative rounding to get that to 2.2TB. It would be more like 7.4% for "free". – Keeta - reinstate Monica Mar 09 '18 at 20:33
  • 1
    @Buffy but kilo SI is always small k. Large K is not kilo. – Sentinel Mar 10 '18 at 00:54
  • 2
    @Sentinel: Except when set in all caps, say, in a C macro name under normal macro naming convention... – R.. GitHub STOP HELPING ICE Mar 10 '18 at 01:17
  • 1
    Or in spoken conversation, where capitalization is hard to indicate without awkward sentence structure. I'll let my answer stand as is, I think. – Buffy Mar 10 '18 at 01:19
  • 5
    Also kB/KB doesn't help with MB, GB, TB which a) are much more relevant b) have much bigger differences. – Maciej Piechotka Mar 10 '18 at 02:20
  • 27
    "Sadly, the dual meanings is likely due to manufacturers trying to avoid confusion in the minds of unsophisticated customers" More likely it is advertisers wanting their product to sound larger than it really is. Why advertise a 3TB hard drive using the correct 1TB=1024*1024*1024*1024 bytes when you can advertise a 3.3TB hard drive using the lawyer approved 1TB=1000*1000*1000*1000 bytes. 3.3 is bigger than 3, right? – Readin Mar 10 '18 at 05:10
  • 25
    @Readin Or, as I see it more often, a 3TB drive that actually has 2.7TB of total storage. – Nic Mar 10 '18 at 05:43
  • @MaciejPichota Correct. But the question is specifically about kilo. – Sentinel Mar 10 '18 at 07:31
  • @NicHartley It can equally well be argued that that is because *you* are misinterpreting what a TB is.. – user253751 Mar 12 '18 at 04:17
  • 4
    @immibis Oh, for sure. It's just two different standards. Of course, it's still an unfair representation, because they know full well that their users' computers will use the base-2 versions, so when they're expecting their computer to say 3 terabytes, they'll actually get 2.7. – Nic Mar 12 '18 at 05:25
  • 1
    @NicHartley I know Windows does that (calculate in TiB then display as TB). Does OSX do that? Does Linux do that? How about industrial control systems that also use hard drives? I wouldn't say the manufacturer is lying, I'd say Windows is lying. – user253751 Mar 13 '18 at 02:16
  • @immibis For the record with Linux if you look at ls(1) you'll see: *`--block-size=SIZE scale sizes by SIZE before printing them; e.g., '--block-size=M' prints sizes in units of 1,048,576 bytes; see SIZE format below`* and *`The SIZE argument is an integer and optional unit (example: 10K is 10*1024). Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).`* I don't know about other tools off hand I just remember it typically being 2^10. On one of my smaller disks in this box: *`Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors`* (fdisk output). – Pryftan Mar 14 '18 at 20:47
  • @immibis As for the syscall stat(2): *`off_t st_size; /* Total size, in bytes */ blksize_t st_blksize; /* Block size for filesystem I/O */ blkcnt_t st_blocks; /* Number of 512B blocks allocated */`* All in all the idea of having more than just the 2^10 version was a terrible idea. – Pryftan Mar 14 '18 at 20:49
  • @Pryftan 512B isn't any of the above units. At best it's 0.5 KiB. But who wants to be mixing powers-of-10 fractions with power-of-1024 units? How much is 0.1 KiB? – user253751 Mar 14 '18 at 22:28
  • @immibis: 0.5 is an exact base-ten representation of a power-of-two fraction. – supercat Mar 14 '18 at 22:33
  • @immibis 2^10 == 1024. 1024/2 is what? And what is the size of a sector (though it's not a requirement)? 512 is what? 2^9. Nothing I suggested is amiss. – Pryftan Mar 15 '18 at 02:47
  • I believe Mac OS X uses powers of 10 for disk space. I don't know about Windows/Linux. – SilverWolf May 04 '18 at 19:10
  • @Buffy. I don't think you can reasonably say that the scholars who came up with the metric system *weren't* geeks. – TRiG Jul 23 '19 at 23:19
22

The difference between providing your students with a proper discussion of this topic, and simply teaching them one or the other, is the difference between being a real educator and being a reciter of factoids.

If there is no single correct definition of KB for you, then why would you instill something different in your students? The answer to your question is thus obvious in its formation. Your responsibility as a teacher is to convey an understanding of the issue, not to boil it down to one-or-another fact that you know to be less-than-true.

ruief
  • 329
  • 1
  • 2
  • 5
    I agree but before providing a proper discussion with my students, I'm providing a proper discussion here which was my intention in the first place (instead of getting simple _one or the other_ answers). – alves Mar 11 '18 at 20:32
18

Yes, I agree with the other answers: teach both, and also note the similarity.

The difference

  • $\text{ki} = 1024 = 2^{10}$
  • $\text{k} = 1000 = 10^3$
  • $\text{k}, \text{M}, \text{G}, \text{T}, \text{P}$ is sometimes used to mean $\text{ki}, \text{Mi}, \text{Gi}, \text{Ti}, \text{Pi}$

The similarity

  • $1 = \text{k}^0$ and $1 = \text{ki}^0$
  • $\text{k} = \text{k}^1$ and $\text{ki} = \text{ki}^1$
  • $\text{M} = \text{k}^2$ and $\text{Mi} = \text{ki}^2$
  • $\text{G} = \text{k}^3$ and $\text{Gi} = \text{ki}^3$
  • $\text{T} = \text{k}^4$ and $\text{Ti} = \text{ki}^4$
  • $\text{P} = \text{k}^5$ and $\text{Pi} = \text{ki}^5$
  • $\text{E} = \text{k}^6$ and $\text{Ei} = \text{ki}^6$

Quick maths

$2^{64} = 2^{6 \times 10 + 4} = (2^{10})^6 \times 2^{4} = \text{ki}^6 \times 16 = 16\text{ Ei}$, so $64\text{ bits}$ can address $16\text{ Ei}$ locations.

This is similar to, but not the same as, the base 10 system that they (should) know. First we break the exponent into blocks of 10 (instead of breaking a decimal number into blocks of 3 digits); the remainder becomes an ordinary power of two that we write in base 10, and the rest is the same.
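The same trick can be sketched in Python: split the exponent into blocks of 10 and keep the remainder as a plain power of two. The function name `power_of_two_with_prefix` is hypothetical, and the prefix list uses the IEC spelling "Ki" where this answer writes "ki".

```python
# Binary prefixes for 2**0, 2**10, 2**20, ... up to 2**60.
BINARY_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]


def power_of_two_with_prefix(exponent):
    """Express 2**exponent as a small number followed by a binary prefix."""
    blocks, remainder = divmod(exponent, 10)  # blocks of 10 bits, leftover bits
    if blocks >= len(BINARY_PREFIXES):
        raise ValueError("exponent too large for the prefixes listed here")
    return f"{2 ** remainder} {BINARY_PREFIXES[blocks]}".strip()


print(power_of_two_with_prefix(64))  # 64-bit address space
print(power_of_two_with_prefix(32))  # 32-bit address space
```

For 64 the split is 6 blocks of 10 with 4 left over, which gives 16 Ei; for 32 it is 3 blocks with 2 left over, which gives 4 Gi.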

Where used (mainly)

It is important to show where the two systems are used. Some answers say that they have never seen the $1000$ based SI system used in computing, but it turns out that the SI system is used a lot, depending on what is being measured.

  • IEC 60027-2 A.2 and ISO/IEC 80000 e.g. $\text{ki}$:
    • measures of primary memory: RAM, cache.
    • measure of file sizes, partition sizes, and disk sizes within OS.
  • SI units e.g. $\text{k}$:
    • measures of secondary memory devices: hard-disks, SSDs.
    • network speeds.
    • CPU / memory / bus speeds.
    • all other speeds.

However, the symbol $\text{ki}$ is not always used at this time.


see also https://en.wikipedia.org/wiki/Binary_prefix

ctrl-alt-delor
  • 10,635
  • 4
  • 24
  • 54
  • 6
    This answer begs the question. – prl Mar 11 '18 at 04:26
  • @prl If you are meaning dodge the question (answering a different question), then you are partly correct. I am trying to extend on other answers. And to give some advice on “How”, where the question was “Which”. – ctrl-alt-delor Mar 11 '18 at 19:33
  • 1
    IMO this is the best answer, but it could be slightly improved by explicit mention of *style*. I.e. in the same way that there are different styles for citing papers, or for delimiting lists (vide Oxford comma), there are different styles for formatting numbers. In an IEC publication post-2000 you can assume that house style will be SI / *bi. Other organisations / publishers may use other styles. – Peter Taylor Mar 12 '18 at 09:56
  • Pretty good answer. Two nitpicks: 0) For all the prefixes (k, M, Mi, Gi, etc.), use roman type, not italic; I suggest using `\text{}`. 1) Ki must have a capital K. – Nayuki Mar 13 '18 at 05:02
  • @Nayuki “The first letter of each such prefix is therefore identical to the corresponding SI prefixes, except for "K", which is used interchangeably with "k", whereas in SI, only the lower-case k represents 1000.” — https://en.wikipedia.org/wiki/Binary_prefix – ctrl-alt-delor Mar 13 '18 at 11:32
  • @nayuki feel free to do the other formatting: Italic → roman. – ctrl-alt-delor Mar 13 '18 at 11:34
  • @ctrl-alt-delor: The fact that there is no SI uppercase "K" SI prefix means that it can unambiguously represent 1024, especially if it's pronounced "kay" rather than "kill-o". IMHO, use of lowercase k for 1024 is and always has been sloppy. – supercat Mar 13 '18 at 23:03
  • @Supercat, I would agree if it was unambiguous that £100m = 10p ($100m = 10¢). If also does not scale to M, G, T … – ctrl-alt-delor Mar 13 '18 at 23:09
  • @ctrl-alt-delor: I agree that the use of uppercase M for 1E6 and lowercase m for 1E-3, and lowercase mu for 1E-6, along with the fact that uppercase Mu looks like an uppercase M, means there's no M-like letter available for anything else. From a pronunciation standpoint, using letter names for powers of two, "e.g. 4 em-bytes", "2 gee-bytes", or even "1.44 kilo-kay bytes" [the floppy size], would seem good. The only problem is with the written form. – supercat Mar 14 '18 at 15:59
  • @supercat If you have a compose key, then you can type µ by pressing composemu, or on this site by typing `$\mu$` to get $\mu$ – ctrl-alt-delor Mar 14 '18 at 16:03
  • @ctrl-alt-delor: That's great for representing micro; my point is that all of the "M-ish" characters are used up for SI prefixes, leaving none for a power-of-two prefix. – supercat Mar 14 '18 at 17:52
  • Note that the SI prefix is k (not K) whereas 1024 bytes is ordinarily represented as 1KB (1KiB if we must). – Willtech Mar 14 '18 at 21:01
11

I've worked in IT professionally since the mid-1980s. My current practice is to write whichever of e.g. KB or KiB that I mean at the time, with KB meaning $10^3$ and KiB meaning $2^{10}$. If I'm talking about the RAM in a machine I'll write e.g. "64MiB" and if I'm talking about the as-manufactured and as-marketed size of a disk drive I'll write "1TB." I am not, however, prepared to use words like "mebibyte" in conversation. Maybe one day I'll change my verbal abbreviations from e.g. "meg" to "meb" but I'm not there yet.

ctrl-alt-delor
  • 10,635
  • 4
  • 24
  • 54
WatcherOfAll
  • 119
  • 2
  • 5
    I've never seen, in a similar timeframe, MiB etc. used for RAM. KB/MB/GB/TB as concerned with RAM always is 1024-based. – AnoE Mar 10 '18 at 08:52
  • 6
    If you're using upper-case *K* for *kilo*, you're wrong. (I have seen people mixing up millimetre with megamolar.) – TRiG Mar 10 '18 at 20:24
  • 2
    I think I'd sooner say/write "binary megabyte" for MiB than "mebibyte", but the abbreviation would be OK. – Monty Harder Mar 12 '18 at 21:05
  • @MontyHarder: From a pronunciation standpoint, how about em-byte? – supercat Mar 14 '18 at 15:59
  • @supercat "em-byte" sounds like an abbreviation of megabyte. It therefore doesn't resolve the ambiguity the way MiB does. I find MiB a useful abbreviation (the "i" infix represents "b_i_nary"), but the word "mebibyte" itself is not coming out of my mouth smoothly, if at all. – Monty Harder Mar 14 '18 at 19:23
  • @MontyHarder: I don't mind the written form MiB nearly as much as the pronunciation. The pronunciation of "Blvd." is "boulevard", not "bull-vid". One might have to tweak a pronunciation rule slightly to say that if letters are pronounced for both the prefix and the unit [e.g "kay-gee" for mass or "kay-em" for distance], then the power-of-ten prefix applies. I think "bits" and "bytes" are usually pronounced as words, however, rather than as "bee". – supercat Mar 14 '18 at 20:16
  • @supercat If I speak one part of the actual word I will speak it all; so I'd say kilobyte but I'd not say Kbyte like that (more so Kbyte I'd pronounce kilobyte). I'd pronounce KB as the two letters individually. Maybe you're not saying that but in case you are that's how I would do it. I commend anyone who actually goes further than the binary way but I just can't for whatever reason. I'm not sure it's that I'm stubborn (although that's part of it). Even then MiB looks awful to me. Unfortunately language is meant to communicate and the double standards here complicate matters but what to do? – Pryftan Mar 14 '18 at 21:18
  • @Pryftan: I would pronounce 65,536 bytes as sixty-four kay-bytes, and a 32768Hz signal as thirty-two kay-hertz. The use of a pronounced letter only for the prefix indicates, at least to my ear, a meaning different from pronouncing the metric prefix. – supercat Mar 14 '18 at 21:27
  • @supercat Fair enough. I'm not saying you're wrong, of course; I'm just saying how I would do it. I'm unsure how I would pronounce your examples but possibly literally the way you wrote them. In any event I never say K-bits or M-bits; I either say kilobits or megabits or Kb/Mb (or whatever). Similarly Mbps I say megabits per second. And I thought the mess was bad enough; apparently it's even more confusing. Such is life when you have unwise standard changes (in one version of *`accept(2)`* Linus cited another example this one being a blunder of POSIX and how they tried to save face). – Pryftan Mar 14 '18 at 21:35
  • @TRiG: I'll grant you that in reference to where I used "K" for "kilo-" but apparently "K" for "kebi-" is correct under both IEC and JEDEC nomenclature. – WatcherOfAll Jul 23 '19 at 21:35
  • k for kilo; Ki for kibi, yes. Odd, but true. (Not kebi, though. That's some Anime character.) Incidentally, I'm happy to see I'm not the only person happy to reawaken year-long conversations. – TRiG Jul 23 '19 at 23:14
7

The basic confusion is in the notation at the KB (base 2 derived) vs kB (SI unit) unit level, and it is helpful to understand the origin of the use of the base 2 derived unit.

A computer is a binary machine.

At the basic level, memory addressing is binary. Usually, at the programmatic level, the addressing is keyed in hexadecimal format (it was originally binary); however, hexadecimal is also base 2 derived (it is base 16, or $2^4$) and so is directly compatible.

Beginning at the KB level for communicating understanding here is useful since the concepts of base 2 derived units have existed since before MB was in common usage (no differentiation in prefix from SI unit).

On a memory controller IC, imagine that the address selectors are a row of switches (binary logic gates): depending on how they are switched, the memory at a specific address is read out on the data lines. The data is stored and returned as bytes.

There has always been a limited number of address lines available to address memory, and the complete set of addresses for a given number of address bits is always a power of 2. So, on a 4KB machine, there are 12 address lines representing addresses 0 through 4095 (4096 bytes). The highest address is 111111111111 in binary, 0FFF in hexadecimal, or 4095 in decimal, which gives 4096 addressable bytes. It would not be logical to limit address mapping to 4000 bytes for the sake of decimal convention when there are 12 addressing bits available.
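The address-line arithmetic can be checked with a short Python sketch. The function name `address_space` is hypothetical and only introduced here for illustration; it assumes one byte per address.

```python
def address_space(address_lines):
    """Return (addressable bytes, highest address) for a number of
    binary address lines, assuming one byte is stored per address."""
    count = 2 ** address_lines   # each extra line doubles the address space
    return count, count - 1      # addresses run from 0 to count - 1


count, highest = address_space(12)
print(count)                 # 4096 bytes, i.e. 4KB in the binary convention
print(format(highest, "b"))  # 111111111111
print(format(highest, "X"))  # FFF
```

Capping the same machine at 4000 bytes would leave 96 perfectly valid addresses unused.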

This logic followed initially to hard disks also, where blocks are groups of bytes accessed by address, however (and I have not checked), I do hear that perhaps hard disk vendors find it less critical to use 'round addressing' formats, particularly considering the following.

All standard values in computer terminology are base 2 derived, although, for marketing purposes, one vendor's 20MB hard disk may not be as large as that of a vendor keeping the convention. It is convenient to slap 20MB on something even if it does not contain as many blocks and is easier to manufacture because there is less data density required.

Early IDE hard disks (there were other earlier systems before IDE), before the Logical Block Addressing (LBA) system was introduced, used to be configured by cylinders, heads, and sectors (CHS). The entire addressing system was binary, and even standard Unix utilities used 1024 byte blocks for display.[1] Standard tools like Conky still use base 2 for display of RAM and HDD information, although they use the GiB style format to avoid confusion. Later, the LBA addressing system allowed for logical mapping of the CHS format as hard disk size grew; however, LBA simply applies the CHS format addressing internally in the hard disk's onboard controller and allows the OS (and the programmer) to just consider the logical blocks.

The base 2 logic follows through to larger numbers, for example, 1111111111111111111111111111111 bytes is 2GB in standard usage or 7FFFFFFF bytes in hexadecimal. It is only in decimal where this looks untidy as 2,147,483,647 bytes, but the underlying technology and conventions are not decimal. Computers are not decimal machines; they are binary machines.

Network addressing also uses binary masks on every one of millions of data packets every second to ensure correct routing, but it has been a long time since the data portion of a network packet resembled a base 2 number. Probably the outermost layer of the packet still does {conjecture}.

You will no doubt need to mention that there is confusion, especially when it comes to marketing of products as being a particular size, and that there are some programmatic implementations that display values using SI units (it is no longer more inconvenient or slower {actually, it is probably still slower, but on modern computers it is no longer noticeable} for computer programmers to implement decimal, particularly for display), but there can be no doubt that, for computer usage, the correct answer is the base 2 convention.

The JEDEC 100B.01 standard defines the prefix K as 1024, meaning that 1KB is 1024 bytes.

rel:
[1] Wikipedia - Cylinder-head-sector (CHS) - https://en.wikipedia.org/wiki/Cylinder-head-sector

This question has been extensively explored.

SuperUser - Size of files in Windows OS. (It's KB or kB?) - https://superuser.com/questions/938234/size-of-files-in-windows-os-its-kb-or-kb

Most OS's and the vast majority of devices that deal with memory/storage use the prefixes K for Kilo to mean 1024 bytes, so when I get RAM that says it's a 4GB module, I know it's 4 Gibi-Bytes (4*1024*1024*1024) and not Giga-Bytes (4*1000*1000*1000).


Quora - Where do we use 1 kB = 1000 bytes, 1 MB = 1000 kB, 1 GB = 1000 MB, 1 TB = 1000 GB? And where do we use 1 KB = 1024 bytes, 1 MB = 1024 KB, 1 GB = 1024 MB, 1 TB = 1024 GB? - https://www.quora.com/Where-do-we-use-1-kB-1000-bytes-1-MB-1000-kB-1-GB-1000-MB-1-TB-1000-GB-And-where-do-we-use-1-KB-1024-bytes-1-MB-1024-KB-1-GB-1024-MB-1-TB-1024-GB

The second idea was formulated by Computer industry 1KB = 1024 bytes 1MB = 1024 KB 1GB = 1024 MB Notice I am using capital B and not small b, and capital B implies bytes The small b should not be used This is the case always and is true for things related to computers


The first idea was formulated by Tele-communication industry and is applicable not for data size (bits and bytes) but for data speed (bits per seconds or bytes per second) 1Kbps = 1000 bps (bits per second) 1Mbps = 1024 Kbps 1Gbps = 1024 Mbps Notice I am using small b and not capital B, and small b implies bits The capital B should not be used This is the case always and is true for things related to data transmission

Willtech
  • 201
  • 1
  • 4
  • Why don't you come by [The Classroom (the site's chat room)](https://chat.stackexchange.com/rooms/59174/the-classroom)? – ItamarG3 Mar 12 '18 at 13:32
6

I am adding a second answer to clarify some issues with the question and to clear the obvious confusion in the answers.

  1. The question incorrectly states that the linked IEC communication recommends KB to mean 1000. The link refers to 'kilo' only.

  2. kB may mean the SI kilobyte, i.e. 1000 bytes.

  3. KB does and has always meant 1024 bytes.

Number 3 is essentially the only useful definition in software engineering. Note that the K is capitalized.

There is also KiB, which is equivalent to KB. Note that the word kilo is always represented by a small k. For OP ever to teach KB as 1000 would be flat wrong.

The above does not apply to MB and higher. There the usage is ambiguous and depends on context.

Sentinel
  • 224
  • 1
  • 5
  • 9
    Note that while KB as 1000 may be flat wrong, it's also necessary to teach that a lot of people do this wrong, and thus students must never trust KB to mean 1024 without further knowledge of the context. – Peter Mar 10 '18 at 23:39
  • 1
    @Peter Agreed 100% A broad discussion of history and context in a way that is interesting and entertaining would help differentiate a mediocre from a decent education. – Sentinel Mar 11 '18 at 08:33
  • 2
    In what way is number 3 "the only useful definition"? – user253751 Mar 12 '18 at 04:19
  • @BenI. '@Sentinel Please remember to assume good intentions and to adhere to our [Be Nice](https://cseducators.stackexchange.com/help/be-nice) policy. I've moved this discussion [to chat](https://chat.stackexchange.com/rooms/74412/discussion-on-answer-by-sentinel-should-i-teach-that-1-kb-1024-bytes-or-1000-b), where you can continue it in a respectful fashion, remembering that we're all reasonable human beings who are here to discuss teaching together. – thesecretmaster Mar 12 '18 at 23:38
  • 1
    @immibis - it was said to be "the only useful definition ***in software engineering***". Because of the binary nature of computer architecture and software, it's probably correct. Outside of discussions about computers and particularly software, it is most likely not correct. – Kevin Fegan Mar 14 '18 at 00:34
  • 3
    @KevinFegan: The only situations I can think of where using an uppercase K for 1000 should not be viewed as being simply wrong would those where a lowercase "k" is unavailable, e.g.some situations involving signage or limited character sets. – supercat Mar 14 '18 at 16:01
  • Can you cite any sources for claim #3 that KB "has always meant 1024 bytes"? That sort of clarity has certainly not been my experience over my 5 decades in the industry. – nealmcb Mar 13 '21 at 01:13
5

Teach them that without context, you don't know because there most certainly are people out there who will use k to mean 1000 and others who will use k to mean 1024. Which is right is not relevant because both usages are out there. This leaves any use of "k" with bytes ambiguous unless whoever gave the number also specified what they meant.

For this reason I'd recommend that you teach that when giving a value in bytes, always use an IEC prefix like Ki instead. 10 kB is ambiguous, 10 KiB is not.
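A minimal sketch of what that looks like in practice, assuming a hypothetical helper named `format_bytes` (not taken from any particular library). It prints the same byte count with IEC prefixes in steps of 1024 or with SI prefixes in steps of 1000, so the reader never has to guess which one was meant.

```python
def format_bytes(n, binary=True):
    """Format a byte count with an unambiguous prefix.

    binary=True  -> IEC prefixes (KiB, MiB, ...), steps of 1024
    binary=False -> SI prefixes (kB, MB, ...), steps of 1000
    """
    step = 1024 if binary else 1000
    if binary:
        units = ["B", "KiB", "MiB", "GiB", "TiB"]
    else:
        units = ["B", "kB", "MB", "GB", "TB"]
    value = float(n)
    for unit in units:
        # Stop once the value fits this unit, or when no larger unit is left.
        if value < step or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= step


print(format_bytes(10240))                # 10.00 KiB
print(format_bytes(10240, binary=False))  # 10.24 kB
```

The same 10240 bytes read as a round number in one convention and not in the other, which is exactly the ambiguity a bare "10 kB" hides.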

We can declare certain usages are "wrong" all we want, and I'm not saying that is necessarily unjustified, but that doesn't make those usages go away.

smithkm
  • 167
  • 3
  • Not seen many decimal based computers recently so Kb when referring to computer isn't ambiguous – Neuromancer Mar 09 '18 at 22:05
  • 3
    @Neuromancer Whether it's ambiguous or not has nothing to do with decimal based computers... – user253751 Mar 10 '18 at 01:57
  • @smithkm Show me where k small k is ambiguous. – Sentinel Mar 10 '18 at 07:38
  • 1
    @Neuromancer Kb means... Maybe kb. Oh, the speed of telephone modems that were common until the early 2000s was given in kb/s. – rexkogitans Mar 10 '18 at 20:40
  • @rexkogitans It was Kbps for Kilobits per second. Of course some networking utilities would scale it to bytes and that would be KB/s (usually something like that) but the modems were Kbps just like now it might be Mbps or Gbps (and so on). Or if you're extremely unlikely yes Kbps. (Perhaps some wrote it as kbps though) – Pryftan Mar 14 '18 at 21:22
2

Teach them both, but focus on 1024 in problems. They'll need to convert bandwidth, etc. in networking and other courses.

Converting using 1000 is easy but 1024 is tricky, so focus on that; the knowledge will help them in computer architecture, assembly and networking courses. They'll have to work with it someday, so get them ready.

Lynob
  • 121
  • 2
  • @immibis '@Lynob If you'd like to continue this discussion, please do so [in chat](https://chat.stackexchange.com/rooms/74414/discussion-on-answer-by-lynob-should-i-teach-that-1-kb-1024-bytes-or-1000-byte). But, if you simply believe the answer is incorrect, [downvote and move on](https://meta.stackexchange.com/a/186084/303538). – thesecretmaster Mar 12 '18 at 23:51
1

The other answers all give solid reasons for teaching that both exist and how badly messed up the current situation is. This is important, but it does not clarify what the students should prefer to use themselves. This answer focuses on the practical side of what the students can do after learning about the current situation from the other answers.

Assume the worst-case

As with all uncertainty in computing, the safest option is always to assume the worst-case scenario, that is, to minimise the chance that an incorrect assumption will cause bugs.

In this situation, the following can be applied to cover your bases:

  • Assume the amount of resource you have is in multiples of 1000 Bytes.

  • Assume resources used by 3rd party libraries etc. are in multiples of 1024 Bytes.

  • Provide any figures for resources you use as multiples of 1000 Bytes.

These three assumptions ensure that:

  • At worst, you will think you have fewer resources than you actually do. For example, assuming 4kB RAM means "4000 Bytes" could mean you plan for having 96 fewer Bytes than you actually do. But it means you will never plan for having 96 Bytes more than you actually do.

  • At worst, you will assume the library that said it uses 2kB RAM meant it uses 48 Bytes more memory than it actually does (assume it meant 2048, not 2000). But you will never plan for it using 48 Bytes less RAM than it actually does.

  • At worst, 3rd parties will assume your program uses more resources than it does, by assuming you meant 1024 Bytes per kB not 1000. But you will never accidentally lead somebody to think it uses less than it actually does.
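The three assumptions can be sketched in a few lines of Python. All names here (`available_bytes`, `dependency_bytes`, `reported_kb`) are hypothetical, chosen only to illustrate the reasoning with the 4kB and 2kB figures from the examples above.

```python
SI_KB = 1000      # smallest plausible meaning of "1 kB"
BINARY_KB = 1024  # largest plausible meaning of "1 kB"


def available_bytes(advertised_kb):
    """Pessimistic reading of a resource you have been given."""
    return advertised_kb * SI_KB


def dependency_bytes(documented_kb):
    """Pessimistic reading of a resource somebody else consumes."""
    return documented_kb * BINARY_KB


def reported_kb(actual_bytes):
    """Figure to publish for your own usage, rounded up in units of 1000 Bytes."""
    return -(-actual_bytes // SI_KB)  # ceiling division on integers


budget = available_bytes(4) - dependency_bytes(2)
print(budget)               # 1952 Bytes left to plan with
print(reported_kb(budget))  # 2
```

Whichever convention the other party actually meant, the real remaining budget can only be equal to or larger than the planned one.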

Of course, it's not ideal to have to "lose" resources unnecessarily. But in the general case, the small difference is unlikely to be enough (especially as a student) to make their project unfeasible. In those specific cases where it is, they should already be measuring the exact footprints of everything and not assuming the sizes of anything from documentation alone.

The benefit, however, is that your assumptions about what somebody else meant by "2kB" will not hurt you when they're wrong, which, both in this specific case and as a general lesson to your students, I feel is important.

Bilkokuya
  • 180
  • 4
0

“Which conversion should I teach to my undergrad students?”

Are these engineering related undergrads? If yes, I'd go with 1024, based on binary math, as that is what engineering is based on.

You can count off the bits on your fingers:

  • $1$ finger = $2$ states, 0 and 1.
  • $2,4,8,16,32,64, 128, 256, 512, 1024$. With $x$ fingers, the number of states represented is $2^x$ (the values in this list), while the highest decimal value that can be realized is 1 less.
  • $2^1 -1 = 1$. Therefore 0,1
  • $2^2 - 1 = 3$. Therefore 0,1,2,3
  • $2^3 - 1 = 7$. Therefore 0,1,2,3,4,5,6,7
  • etc. up to $2^8 - 1 = 255$. Therefore 256 states, from 0 to 255.

Manufacturers may advertise as 2.2TB, but the operating system will report it as 2TB, or maybe even 2TB usable.
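The gap between the two figures is just the ratio between the decimal and binary units. A quick Python check, assuming the label means exactly 2.2 × 10^12 bytes and the operating system divides by 1024^4:

```python
TB = 1000 ** 4   # decimal terabyte, as printed on the drive label
TIB = 1024 ** 4  # binary tebibyte, often still displayed as "TB"

advertised_bytes = 2_200_000_000_000  # assumed meaning of "2.2TB" on the label
reported = advertised_bytes / TIB

print(f"{reported:.2f}")  # 2.00
```

Under these assumptions the drive is not missing any capacity; the same byte count is simply being divided by a larger unit.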

ctrl-alt-delor
  • 10,635
  • 4
  • 24
  • 54
  • 3
    Incorrect, unfortunately. Different operating systems report differently. Specifically the fruity ones. – Peter Mar 10 '18 at 23:43
  • Incorrect, fortunately. Decent operating systems report sizes correctly, with GB = 1 billion bytes. The fruity ones started it. – gnasher729 Mar 11 '18 at 18:02
  • 4
    @gnasher729: Given that allocation units are multiples of 512 bytes on just about every operating system, reporting disk utilization in units of 1024 bytes makes a lot more sense to me than reporting in base ten units. – supercat Mar 13 '18 at 23:06
-1

In my 26 years as a professional software engineer I have never encountered KB to mean anything other than 1024.

Teach them whatever definitions you like and make sure that they know that 1024 is the only useful one.

Sentinel
  • 224
  • 1
  • 5
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/74396/discussion-on-answer-by-sentinel-should-i-teach-that-1-kb-1024-bytes-or-1000-b). Discussion is for chat, not for comments, and any further discussion in the comments will be deleted. – thesecretmaster Mar 12 '18 at 17:30