189

In computer science, we usually count starting from 0. Is there an effective way to explain why to new programmers who ask?

I've read a bunch of different sources that list several reasons for 0-indexing. However, they either seem unconvincing, or seem hard to explain to a new programmer who doesn't have much experience with computer science yet:

  • Zero-based indexing lets you write loop conditions as i < n; with one-based indexing, we'd have to write i < n+1, so zero-based indexing supposedly saves one instruction. This makes no sense to me; i <= n seems like it would be fine.

  • In C, we can use *a to access the first element of array a. This doesn't seem very relevant today for a new student who isn't programming in C.

  • With zero-based indexing, the expression a[i] compiles to an operation on memory address a + c*i where c is a constant representing the size of a single array element; with one-based indexing, it would need to compile to a + c*i - c. This reason seems debatable today (couldn't a modern compiler often optimize this away?), but more importantly, I don't really want to be explaining memory addresses to a new programmer who doesn't understand memory layout and maybe doesn't even understand arrays yet.

Is there a better way to explain why to a new student who is just getting started with computer science?

D.W.
  • 1,895
  • 2
  • 9
  • 11
  • 15
    As an extra source of input, have you seen the other question [Real life examples of 0-indexing](https://cseducators.stackexchange.com/q/213/104)? Might be some help there. – Gypsy Spellweaver Sep 03 '18 at 02:05
  • 4
    Please avoid adding your answers in the comments. It gets around our quality control systems, for example downvoting and collaborative editing. It also unfairly puts your answer above everyone else's. I've moved these answers-in-comments [to chat](https://chat.stackexchange.com/rooms/82811/discussion-on-question-by-d-w-why-do-we-count-starting-from-zero) for posterity. Please avoid adding future answers in the comment section. – thesecretmaster Sep 06 '18 at 01:19
  • 4
    **Please do not use the comment section to answer the question.** I will be periodically deleting all answers-in-comments. Instead, you're welcome to write up your thoughts as an answer, using the form at the bottom of the page. – thesecretmaster Sep 09 '18 at 00:45
  • It is actually simple to answer: we start counting from zero instead of one for the same reason that rulers and measuring tapes start from zero instead of one. The "counting" index is the offset from the array's beginning, so it makes sense that the beginning is offset from itself by zero, or in other words, it is where it is. – NOT CSEducator May 18 '22 at 21:15

30 Answers

229

None of the reasons you suggest really get to the heart of why we use zero-indexing in CS. Dijkstra's EWD831 explains why this convention works out the best. It comes down to the fact that we want to represent sequences of integers as half-open intervals that are inclusive on the start side.

To denote the subsequence of natural numbers 2, 3, ..., 12 without the pernicious three dots, four conventions are open to us
a) 2 ≤ i < 13
b) 1 < i ≤ 12
c) 2 ≤ i ≤ 12
d) 1 < i < 13

To paraphrase Dijkstra:

  • (a) and (b) have the advantage that subtracting the bounds gives you the length, which is convenient because you don't need to remember to add or subtract 1.
  • (a) and (c) have the advantage that sequences starting with zero don't need a negative lower bound -- and negative numbers aren't natural numbers.
  • (a) and (d) have the advantage that if you create an interval that starts with zero and shrink it down to zero-length, you don't need a negative for the upper bound.

Because of the above, we want to write all of our intervals as (a).

Once you accept that (a) is the correct way of specifying intervals, indexing an array of length N as [0, N) is much nicer than [1, N+1).


Two additional notes in favor of using (a) for intervals.

The first point above is important because half-open intervals nicely decompose into other half-open intervals. This makes implementing divide-and-conquer algorithms like merge sort significantly less error prone.

For example:

  • [4, 14) can be broken into the equal-sized intervals [4, 9) and [9, 14), and the middle index is computed as $9 = \frac{4 + 14}{2}$. This is quite clean and nice.
  • [4, 13] decomposes into [4, 8] and [9, 13], where you get the end of one sequence as $8 = \frac{4 + 13 + 1}{2} - 1$, and the beginning of the next sequence as $9 = \frac{4 + 13 + 1}{2}$. It's easy to forget to add or subtract 1 somewhere.

It's also a bit jarring to specify intervals with (b), because it feels unnatural to skip the first element when you're writing a loop.


As for how to explain the above to your students, I think that they just need to accept that breaking sequences of integers in half is something they'll do later on, and then a few examples will quickly point to [,) intervals and zero indexing as the most natural choice.
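
For instance, here is a minimal sketch in C (my own illustration, not something from EWD831 or the answer above) of the kind of divide-and-conquer code the half-open convention keeps clean: every subinterval is again half-open, its length is simply hi - lo, and no +1/-1 corrections appear anywhere.

#include <stdio.h>

/* Sum a[lo..hi) by splitting the half-open interval in half --
   the same decomposition a merge sort would use. */
static int sum(const int *a, int lo, int hi) {
    if (hi - lo == 0) return 0;        /* the length of [lo, hi) is just hi - lo */
    if (hi - lo == 1) return a[lo];
    int mid = (lo + hi) / 2;           /* [lo, hi) = [lo, mid) + [mid, hi) */
    return sum(a, lo, mid) + sum(a, mid, hi);
}

int main(void) {
    int a[] = {3, 1, 4, 1, 5, 9, 2, 6};
    printf("%d\n", sum(a, 0, 8));      /* the whole array is the interval [0, N) */
    return 0;
}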

drawoc
  • 1,819
  • 1
  • 5
  • 4
  • 7
    The half-intervals decomposing into half-intervals is a nice touch. Dijkstra's explanation does a good job of justifying the use of the common scheme within CS as it was becoming a modern science. – Gypsy Spellweaver Sep 03 '18 at 06:53
  • 31
    Nice contribution and good insight. You've made this corner of the internet a better place :) Welcome to [cseducators.se]. Take a look around, I'd love to know what else you can add. – Ben I. Sep 03 '18 at 12:11
  • 4
    While a good reason to start indexing at 0, it feels much easier and more natural to mentally parse (d) and especially (c) than (a) or (b) - "numbers from x to y" in natural language usually means inclusive of both (or exclusive of both) – msam Sep 04 '18 at 13:34
  • 2
    Note that adopting the convention more widely helps in a number of other scenarios, not just array indices. For instance, it [eliminates a whole class of errors in date handling](https://sqlblog.org/2011/10/19/what-do-between-and-the-devil-have-in-common). If you're doing something like a fee scale for package weights, the exclusive upper-bound also allows measurement result to be independent of the limit specified (ie - if an inclusive upper bound is given as `2.9` but the scale measures as `2.95` you have issues). – Clockwork-Muse Sep 04 '18 at 16:21
  • 4
    @msam: Fully closed or fully open may be "more natural," but the subtraction property of half-open is extremely important. The interval [a, a+b) is exactly b elements long, without any pesky +1 or -1 terms. You simply cannot avoid having +1s and -1s everywhere when you use fully closed or fully open intervals. – Kevin Sep 05 '18 at 01:30
  • 11
    This also matches the behavior of the modulo operator, the return value of which lies in [0, n). `number_of_values_ending_in_a_particular_digit[x % 10] += 1` – Roman Odaisky Sep 05 '18 at 01:48
  • 2
    This is a good reason, but I think a more fundamental one is simply hardware: binary registers and memory locations representing unsigned integers have a 0 "all bits off" state. Indexing by one would waste that. – Lee Daniel Crocker Sep 05 '18 at 17:14
  • 1
    Every answer with Dijkstra in it is nice. But......... this answer seems to answer a completely different question, doesn't it? This answer only concerns itself with the openness of intervalls. Neither the quote from Dijkstra nor the rest of the answer has anything to do at all with the actual question - which is, to rephrase a bit, "why do array indices start at 0" (sic). – AnoE Sep 06 '18 at 12:07
  • 1
    Those reasons don't seem entirely compelling in the argument of 0 vs 1. For example, in the second point you can argue that sequences starting with 1 don't need negative numbers either, and in the third point, you can argue that sequences starting with 1 also don't use negative numbers for zero-length (although the counter-counter-argument is that they still require reversed bounds, which may be confusing). And if you know that the lower bound is 1 then the upper bound in form (c) is also the length, no arithmetic required. – Miral Sep 07 '18 at 00:25
  • 1
    Good point about the half-open ranges. The other common case where these come in useful is string manipulation. Having zero-based half-open ranges there makes substring positions and lengths a lot cleaner to work with. – Dewi Morgan Sep 07 '18 at 19:49
  • 1
    What do you mean by "indexing an array of length N as [0, N)"? – HelloGoodbye Sep 11 '18 at 14:12
  • "indexing an array of length N as [0, N)" - perhaps I'm not understanding the notation (especially the varied brackets), but surely an array with 0 ... N elements is not of length N? – youcantryreachingme May 18 '21 at 01:31
114

If I can modify the question, I can answer what I believe you are looking for.


The question is "Why do we count starting from zero?" The answer is "we don't" Not even in computer science do we "count" from zero. A list with 15 items in it from outside CS still has 15 items in it inside CS realms. Many languages include a count type function for arrays, and such a function would still return 15 for that list.


If the question becomes "Why do we index from zero?" then it has a different answer. This time the answer is "because that's the way everyone else does it." When we separate the concepts of "indexing" and "counting," things become simpler to explain. As is often true of any concept, using the proper terminology eliminates many difficulties.


Why we, as humans, index from zero while counting from one requires only a little thought to see. Assuming a classroom environment, where the class period lasts for a specified time, perhaps 50 minutes, you can ask, just after a full-minute mark, how many minutes the class has been in session. If asked at 5:02 into class, the answer will be five minutes. Ask them how long class had been in session five minutes ago, which would be around 0:20 into the class period. The answer should be zero minutes. Once they accept that it was zero minutes, even at 0:59 into the class, you can emphasize that even though the class time had not reached a full minute, it is still class time, and should have some way of "indexing" it. Since the number before one is zero, it must have an index of zero. Hence the "first minute" (by counting) has an index of zero.

The same thought experiments can be conducted with a ruler/tape measure, where the measure starts at zero, and the first centimeter/inch is called, and written, zero. The same concept applies to any distance, such as miles or kilometers, but those are harder to fit into the classroom. In the USA, most highways have "mile markers" that help in locating people for emergency responders. They mark the distance from the beginning of that highway on its south-western end. (The beginning is at the state line if it crosses from one state to another.) "Mile marker" 1 is set after the first mile is completed. "Mile marker" 0 is sometimes seen on the state line, though it is not always there. The first mile, in counting, has an index of 0.

Another example, though not quite as helpful, and requiring that the students have some algebra experience, is exponents. Limiting it to non-negative powers of ten applied to whole numbers, we can use the fact that $10^0 = 1$ while $10^1 = 10$. The exponent is the index for how many places to the right to move the decimal point (again, a distance to move), while the count of the digits before the decimal point will be 1 larger.

Final demonstration: "What is the index of the first position on the volume knob below?" "What is the index of the last position on that volume knob?" "How many positions can you count on that volume knob?"

Volume Knob


Once the "index" vs "count" is solved, you can move into the CS realm, where the same practices are followed.

An array is "indexed" from zero and "counted" from one. Why? Because that is the way humans index and count. It certainly comes in handy when doing arithmetic with memory addresses, and there's nothing wrong with taking advantage of things that "just work" naturally. Similar shortcuts include dividing and multiplying by powers of two by shifting bits to the right of left. Its usefulness to the compiler, or the developer, however, is not the why. Such "benefits" are only side effects of "what was" before CS was a reality.

The bottom line is that indexing from zero and counting from one existed a very long time before electronic computers were invented. Like any other science, computer science has to build upon what already exists before inventing new things. Zero-based indexing is one such "pre-existing" thing.

Gypsy Spellweaver
  • 5,425
  • 2
  • 17
  • 34
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/83029/discussion-on-answer-by-gypsy-spellweaver-why-do-we-count-starting-from-zero). – Ben I. Sep 11 '18 at 17:54
  • "*If asked at **5:02** minutes into class, the answer will be five minutes. Ask them how long class had been in session five minutes ago, which would be around **0:20** into the class period.*" Can't tell if you're assuming it takes 18 seconds, give or take, to answer the first & ask the second question or if it's a typo. ;^) – ruffin Sep 19 '18 at 14:18
  • 1
    @ruffin 18 seconds is an estimate of the dialog time. – Gypsy Spellweaver Sep 19 '18 at 15:44
  • You are correlating continuous time with discrete "buckets" for storing things - completely irrelevant. To counter, I could say "in which minute are we when we are 0:20 into the class?" and the answer would be "the 1st". So why not index from 1? As for your volume knob, I would assume zero to mean zero volume - that would be an empty array; an array with zero elements. That does not mean the 1st element should be called element zero. Completely irrelevant and I can't believe this got 100+ votes. – youcantryreachingme May 18 '21 at 01:36
  • @youcantryreachingme "How old is the child?" and "Which year is the child in?" also differ by one. It depends on teh question asked. "How many elements in the array?" and "What's the last index of the array?" differ by one. Interestingly enough, if you've used elementary school graphing, zero is also the beginning of each axis, known as the "origin." That difference between "count" and "index" is included in the opening of the answer. Of course, if you find this answer to not be helpful, for any reason, or none at all, you can still down-vote it. – Gypsy Spellweaver May 18 '21 at 02:42
  • @GypsySpellweaver - again - continuous time and continuous axes - irrelevant. The correct answer is Buffy's. See also my "TLDR" comment there. Let's not clog up these comments further by going back and forth. – youcantryreachingme May 18 '21 at 23:43
  • Great answer. You could have mentioned the concepts of "ordinal" vs "cardinal" for "index" & "count". Together with the answer that quotes Dijkstra, this is pretty much it. – Victor Eijkhout May 11 '22 at 16:35
63

I'm surprised that the following hasn't been stated yet. All of the answers given so far seem to be "after the fact" explanations of something that is really based on the way people built most (not all) early computers as binary machines.

To make things a bit more compact here, I'll assume we are creating a 4 bit (nibble) based machine. We don't want to use more binary components to build a nibble than we need to because we are cheap, so we use four bi-stable components (relays, vacuum tubes, transistors). We think of (perhaps) the two states as zero and one rather than, say, red and green.

So we have (0000) through (1111) as the combinations of the four transistors of a nibble. We can interpret them any way that we like. Suppose we want to interpret them as integers. How shall we assign the different codes to integers? We could let (0000) represent 42, a fundamental universal constant, and (0001) represent 3, an approximation to pi, and so on, but we realize pretty soon that any computations we want to do with such an encoding would be pretty complex - and complexity in a machine costs money. We are cheap, remember.

So we notice that the codings are actually binary numbers. Well, almost nobody used binary numbers before this, so we start to think about them seriously now, as they have an application. I note that some early computers were actually decimal, not binary, but their designers recognized that while convenient, it wasn't cheap.

Now, using binary numbers, not just binary encoding, it becomes "obvious" that the codes represent zero through fifteen, not one through sixteen. Using (0000) to represent 1 just seems dumb at this level - and at this time.

Now, (a bit later) we want to index an "array" of nibbles. How shall we do it? Well, we assign index numbers to the individual cells. Suppose we have four such cells. We could use (0001) through (0100) to index them (1 through 4), but now (a-ha) if we have sixteen nibbles to index and start with one, we can only get (our count) to fifteen without using another nibble or letting (0000) represent the last (sixteenth) cell. That seems dumb, so we (a-ha) index from zero. Now we can index sixteen cells with sixteen codes, and it is cheap and pretty natural. The only cost is a bit of confusion in the minds of beginning programmers, but others will pay those costs and our machines can be cheap.

No contest. Index from zero.
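
Here is a tiny sketch in C of that "sixteen cells, sixteen codes" point (my illustration, not part of the original answer): a four-bit value runs from 0 through 15 and therefore addresses all sixteen cells, while a one-based scheme would need a fifth bit to reach the sixteenth cell.

#include <stdio.h>

int main(void) {
    unsigned char cells[16];              /* sixteen cells to index     */
    for (unsigned i = 0; i < 16; i++)     /* i always fits in 4 bits:   */
        cells[i] = (unsigned char)i;      /* 0000 (0) through 1111 (15) */
    /* with one-based indexing, the sixteenth cell would need index 16,
       which is 10000 in binary -- five bits, one more than we paid for */
    printf("first cell: index 0 (0000), last cell: index 15 (1111)\n");
    return 0;
}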

The other answers here explore why the arithmetic inside the machine is cheap this way, so I won't repeat it here. But note that all of this happened before any languages at all were invented other than the simplest of machine coding without any abstraction facilities at all. It was just economics and engineering. Make it simple, keep it cheap.

Buffy
  • 35,808
  • 10
  • 62
  • 115
  • 8
    Give your students this simple task and they will find the answer themselves: **Number these ten students, each with a unique number, but you can only use a single digit** - they will quickly come up with the solution to number them 0 to 9 - and that is why we index starting at zero, we don't like waste :-) – Falco Sep 03 '18 at 11:46
  • 21
    Interestingly, all the *early* languages have *one*-based (or *user-defined*) indexing (e.g. Fortran (1957), Algol (1958)), or don't have a notion of "array" at all (Lisp (1957)). Zero-based indexing is a rather *modern* invention. – Jörg W Mittag Sep 03 '18 at 14:51
  • 10
    Interesting that on rotary telephones 0 (with ten pulses) comes after 9, but then pulsing the line zero times doesn't really work physically. – Scott Rowe Sep 03 '18 at 17:36
  • 6
    @JörgWMittag: Interesting! I wonder if designers of early languages were thinking that anyone who knew how computers and binary numbers really worked would just write in assembly language. Early computers were slow, and early compilers presumably didn't optimize well / at all. Anyway, this answer is exactly what occurred to me, too: letting an N-bit index address a full 2^N elements. Power-of-2 buffer/object sizes are a big deal. – Peter Cordes Sep 04 '18 at 00:01
  • 2
    @PeterCordes, languages have been a search for higher levels of abstraction and the ability to write more complex programs without getting lost in the details of assembler. The assumption was that problems were outgrowing our ability to think about them in low level terms. Languages aren't a placebo for babies, but a tool for experts. For a shock, look at the Google home page. Then use your browser to look at the actual source for that. Now try to duplicate it in assembler. Note that even assembler is a language with _minimal_ abstraction, unlike machine code with none at all. – Buffy Sep 04 '18 at 00:08
  • @Buffy: Yes, I understand that's the other motivation for higher-level languages, and why I use `bash` as a shell instead of an interactive assembler where I'd type `mov eax, __NR_fork` / `syscall`. But I thought 1958 was still early enough in the history of computing that performance might still trump everything for many problems. I could easily be wrong. Having grown up with C (and assembly), 1-based indexing seems totally unnecessary and unnatural *to me*. I'm not familiar enough with the history of computing and the motivations of early language architects to rule out my guess. – Peter Cordes Sep 04 '18 at 00:15
  • 8
    @ScottRowe zero had 11 pulses, and one had 2 pulses. This was to stop the equipment reacting to random single pulses. – ctrl-alt-delor Sep 04 '18 at 08:09
  • 2
    @PeterCordes It's one of the reasons so many computer scientists/engineers hated C and Unix - they took the "performance is better than correctness" approach, which was contrary to the prevailing thought "as computer software gets more complex, it's even *more* important to keep things correct". A correct program that's easy to understand gives you more opportunities for performance improvements too - most of the time when a C program was faster it did so by sacrificing correctness (e.g. ignoring boundary conditions). Dijkstra's lessons on sustainability and quality of software are great intro – Luaan Sep 05 '18 at 08:59
  • 1
    @Luaan: I'm not familiar with early C or Unix being unpopular for that reason. Early compilers sucked at optimization, so to make things fast people left out safety checks? These days optimizing compilers can hoist bounds-checks out of loops if it's the same bounds on every iteration, so you can write safer code (especially with C++ to hide the noise behind `std::vector`). C makes correctness optional, and makes you do it yourself, but it's still possible to write correct C functions / programs. These days it's a nice language for writing hot loops, but not great for maintainability. – Peter Cordes Sep 05 '18 at 09:10
  • @Luaan: Anyway, I don't see what zero-indexing has to do with correctness in any given language. zero-indexing isn't *why* C makes unsafe code more easily possible. Correlation / causation? – Peter Cordes Sep 05 '18 at 09:12
  • @PeterCordes You need to compare early C with languages of the time - C lacked pretty much all the things that other languages used to help you write correct programs. The goal in C was simplicity on the compiler and potentially performance, the trade-off was effort on part of the programmer. Contrast the difference between, say, iterating over an array in a for loop and doing a `map` - in the first case, you've opened yourself to potential incorrectness (out-of-bounds etc.), in the second these don't even make sense. C's design went directly against what those people thought was good design. – Luaan Sep 05 '18 at 13:18
  • 1
    @Luaan, don't lose track of the fact that C excelled at the task it was actually designed for. Kernighan and Ritchie were, at the time, at Bell Labs looking for a Better Assembler for a "spare" DEC PDP-7 computer they were using to develop Unix. It has the features it has precisely to match instructions in the PDP assembler set. That it was used for other purposes is a fact of history, of course, but for its designed usage it was near ideal. Ritchie wasn't trying to outdo Algol (or COBOL, of course), just do assembler more effectively. – Buffy Sep 05 '18 at 13:25
  • 1
    @Luaan, however, don't conclude from the above that I think C is a _good_ or even _adequate_ solution for modern programming. I would use it if I were writing device drivers and such, but prefer a safer language with features such as those described by user Peter Cordes. Even safe auto-typing is a big win. Actually safe anything is a big win, especially memory management. – Buffy Sep 05 '18 at 13:34
  • @ctrl-alt-delor - That's certainly not the case for British lines, and wikipedia suggests it wasn't the common system (https://en.wikipedia.org/wiki/Pulse_dialing). We used to have fun by dialing people's numbers by hitting the hangup button the appropriate number of times. Well, maybe "fun" is pushing it. – Guy G Sep 05 '18 at 15:06
  • @Guyg I did it to defeat the lock (in England), and I had to add one to each number to get it to work. – ctrl-alt-delor Sep 05 '18 at 15:21
  • 1
    @ctrl-alt-delor - I could swear I used to use 1 for 1, 2 for 2, etc, then 10 for zero (also in England). Some quick Googling seems to confirm this, but I'm happy to be corrected... – Guy G Sep 05 '18 at 15:29
  • @guyg I just tried it, it seems that you are correct. I wonder where I got that idea from. – ctrl-alt-delor Sep 05 '18 at 15:34
  • @JörgWMittag yes but the thing that was established before arrays/indeces was that numbers themselves start at 0 in binary. Because numbers start at 0 it is most natural for indexing to start at 0. – levininja Sep 05 '18 at 16:54
  • 6
    @JörgWMittag "*Zero-based indexing is a rather modern invention.*" It's been around since the first assembler. – RonJohn Sep 09 '18 at 09:25
  • @ctrl-alt-delor IIRC the pulses are when you *lift* the button. There's 1 *pulse* for 1, but you have to press it down first. – Will Crawford Sep 10 '18 at 02:07
  • 1
    @PeterCordes Early computers didn't have assemblers, programs were written in machine language directly. – Willtech Sep 13 '18 at 10:51
  • This answer addresses the question as I understood it. – Willtech Sep 13 '18 at 10:52
  • 1
    @Willtech: Wasn't assembly well-established before higher-level languages came along? Hmm, maybe not, but same difference in machine code directly: you add an offset to a pointer to get a new address. (At least on machines that *have* normal address registers for indirect addressing, or memory-indirect addressing) And/or you have indexed addressing modes. I don't think any hardware ISAs had 1-based indexed addressing, but I don't actually know really early ISAs. (I know 8-bit ISAs like 8080 have pointer registers, though.) – Peter Cordes Sep 13 '18 at 17:23
  • 1
    @PeterCordes Direct machine code programming preceded assembly languages, assembly languages were the first higher level languages. Machine code vs assembly code: https://stackoverflow.com/questions/466790/assembly-code-vs-machine-code-vs-object-code – Willtech Sep 14 '18 at 23:22
  • 1
    @Willtech: yes, I know that. I have a gold badge in the `[assembly]` tag over on SO. My point was that normal hardware has 0-based indexing / pointer-math, whether you generate machine-code from asm or write the machine code directly. And my other point was to acknowledge that maybe I should have said "machine code" instead of "assembly" when talking about alternatives to higher-level languages like Fortran, if asm wasn't already widespread before higher-level languages were first being developed. – Peter Cordes Sep 14 '18 at 23:35
  • TLDR: 0 is the 1st binary number of relevance to electronics and computing and therefore when talking about sets of binary numbers, we use that valuable precious resource of the zero value in the limited binary number set available when computing was being invented. *This* is the most sensible answer. Because the code will need to *match* a binary 0000 when referencing the 1st element of a set. Kudos. – youcantryreachingme May 18 '21 at 01:39
40

You have already gotten several good answers about why zero-based indexing is useful, and the difference between indexing and counting has been explained.

I now want to challenge the basic premise of your question: it is not, in fact, universally agreed-upon that indexing starts at zero.

In Excel, which is arguably the most widely-used programming language in the world (even though it doesn't look much like a "traditional" programming language), row-indexing starts at 1 and not 0, and column-indexing starts at A and not ε.

In Visual Basic, the programmer can choose between zero-based and one-based indexing as the default indexing on a per-module (per-file) basis with the Option Base declaration. Plus, the programmer can declare the exact index range for each individual array using the To keyword in the array declaration, e.g.:

Dim Count(100 to 500) as Integer

This declares an array of integers named Count with indices ranging from 100 to 500 (inclusive).

In the "Wirthian languages", i.e. Pascal and all its successors (Modula-2, Oberon, Component Pascal), arrays can be indexed by any arbitrary range of scalars (except reals).

type
   foo = array[-10 .. 10] of real;
   (* an array of reals with indices ranging from -10 to +10 *)

   bar = array['b' .. 'g'] of 100..200;
   (* an array of integers from 100 to 200 with indices ranging from 'b' to 'g' *)

   weekday = (monday, tuesday, wednesday, thursday, friday, saturday, sunday);
   baz = array[tuesday..friday] of boolean;
   (* an array of booleans with indices ranging from tuesday to friday *)

The Wirthian languages, in turn, inherited this from ALGOL-60 and ALGOL-68.

Ada is similar, any discrete type can be used as the index type. Interestingly, idiomatic Ada style suggests using a range starting from 1 and not 0 when using an integer range as indices.

In Eiffel, which is heavily inspired by Wirthian languages, array indices have explicit lower and upper bounds which can be any signed 32-bit integer, so you could have an array with indices ranging from -1000000 to -1, for example.

In Fortran, the default is to start at 1, but like ALGOL, Pascal, Eiffel, and VB, you can specify any arbitrary lower and upper bound.

In Matlab, indexing starts at 1. In APL and Perl, you can choose.

Even in the "real world", there are different schemes. E.g., in Germany, the ground floor, i.e. the floor you enter from street level is called "ground floor", and is usually labelled 0 on elevator buttons that use numbers ("EG" if using letters). The floors above ground level are called "1st upper floor" (and so on) and labelled 1, 2, etc. (or "OG 1", …) The floors below ground level are called "1st lower floor" (and so on) and numbered -1, -2, etc. (or "UG 1", …)

In the US, "1st floor" is the floor at ground level.

Apparently, in Barcelona, there is a "ground floor", a "primary floor", and then the "first floor" is two stairs up from ground level.

There is an interesting discussion about this on the Wiki: http://wiki.c2.com/?ZeroAndOneBasedIndexes

I also found an essay that compares the syntactic and semantic noise of typical tasks using many different indexing schemes: http://enchantia.com/graphapp/doc/tech/arrays1.html

Jörg W Mittag
  • 1,086
  • 6
  • 9
  • 1
    It would seem that in Barcelona the "first floor" is one floor above the "primary floor", which name suggest that it is the "origin" for the building's life. The "ground floor" is presumably where garage, storage, utilities, and other facilities exist while regular rooms (kitchen etc.) are on the primary floor. Servants' quarters, are also likely to be "below" the primary floor. The ground floor would, presumably, have an index of -1. – Gypsy Spellweaver Sep 03 '18 at 17:41
  • 2
    In one university I have visited, the floor numbers were in feet above sea level. So not even consecutive. (This is also discussed here https://cseducators.stackexchange.com/a/215/204 ) – ctrl-alt-delor Sep 04 '18 at 08:19
  • 4
    The reason for the numbering of floors in Barcelona is due to an old building regulation which dictated no buildings could have more than six floors. To work around this restriction, extra floor names were added so that the maximum numbered floor was always #6. The extra floor names are "Entresuelo" and "Principal" just above the ground floor, and "Atico" and "Sobreatico" just above the sixth floor. (Not all buildings will have all of these floor names). – Aaron F Sep 04 '18 at 12:20
  • 1
    There's nothing factually incorrect about these examples but it doesn't mean that it's logical. The first hour of the day on a 12 hour clock is 12 (while minutes start at 0) and the second hour starts at 1. A 24 hour clock starts at 0. Which makes more sense? – JimmyJames Sep 04 '18 at 19:55
  • To add a floor counting example, in some parts in Germany counting starts with ground floor ("Erdgeschoss"), then floor 1, in other parts, counting starts with floor 1 at the ground. (I almost installed a network in the wrong floor). – Philm Sep 05 '18 at 15:11
  • Next: In Siemens buildings, it is a tradition that the most lower floor gets number 1, so normally the buildings have (at least) one basement so ground level has number 2 in elevators (which is obviously confusing to most guests)- but if the building has more than one basement- it is not easy to know which number has the ground level- not very convenient for guests or imagine fire alarm. – Philm Sep 05 '18 at 15:13
  • 2
    *Nothing* Wirthian is inherited from ALGOL-68. – philipxy Sep 06 '18 at 02:35
  • 4
    +1, but this is only half the answer. The other half is that in the 70's C made the choice to directly support only indexing arrays from 0, as this made the compiler simplest (which was really C's primary design goal). Because C compilers were so simple, they were easy to make and put on any machine, and as a result of this and a few other bits of historical luck, C became super popular. As a result of that, a whole host of other languages took their lead from C. – T.E.D. Sep 06 '18 at 14:32
  • @Philm: The Wiki discussion I linked to has an even more confusing example. At IBM Rochester, they use the same numbering, but the buildings are so tightly interconnected that they look and feel like *one* building. So, you could walk a couple of meters on the ground floor and pass from floor 2 to floor 3 without even noticing, just because you happened to step from a building that has 1 basement to a building that has 2. – Jörg W Mittag Sep 06 '18 at 14:50
23

When I've explained this to beginning students, I don't stray far from your third reason, though I agree that the beginning of arrays is early to introduce the concept of memory addresses. Among other things, that invites questions about your variable c, when the only variable they need to worry about is i.

Instead of memory addresses, I talk about distances from the start. I've given a sample below using paper and a pencil as a reference point, though in my classroom I use the whiteboard and hold a marker. All of the numbers are in little boxes, and together they form a rectangle.

Look at the list of numbers on the paper. We all know about ordinal numbers, "first", "second", "third", and so forth. But when we program, we actually refer to the items on the list in a slightly different way, as 0, 1, and 2. This is called 0-based indexing, and the reasons for it don't matter right now. What matters is that the first item on the list is actually item 0.

Honestly, that's all you need to know to use arrays correctly. But if you want a hint about why we actually do this, you can think about zero-based indexing as the distance from the head of the array.

So, if I'm at the head of the array, how many moves do I have to make to read the first number? No moves, I'm already there. What about the second number? I make one move to get there. The fourth number? One, two, three moves.

Beyond the most important fact that we start counting from 0, thinking about it as "moves" is actually a good way to think about array indexes, because when we come back to this topic later, you'll see that the distance from the head turns out to be an important concept for understanding a lot of things a computer does. Don't worry, we'll get there soon enough.

So, for now, what is the index for this spot in the array? (I point to a random index, wait for the students to arrive at the answer.) Good, well done. Now, moving on to ...

Ben I.
  • 32,726
  • 11
  • 68
  • 151
  • 1
    'cardinal numbers, "first", "second", "third"'. These are ordinals, not cardinals. Cardinals count amount. Ordinals count order. – Potato44 Sep 03 '18 at 12:43
  • @Potato44 Yes, they are. What an embarrassing error! Thanks for pointing it out, I'll fix it now. – Ben I. Sep 03 '18 at 14:49
  • 2
    Ah, but some of us are willing to say zero-th, also. – Buffy Sep 03 '18 at 14:54
10

In computer science, we usually count starting from 0.

In programming or in (theoretical) computer science?

In Programming

In the C programming language you count from 0 to (N-1). And of course in languages which are influenced by C: Java, JavaScript, PHP, C#, C++...

You already named the reason for this:

In C, we can use *a to access the first element of array a.

... and because a[i] is the same as *(a+i), the first element of the array must have the index 0.
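
A minimal C demonstration of that identity (my own sketch, not from the answer):

#include <stdio.h>
#include <assert.h>

int main(void) {
    int a[] = {10, 20, 30};
    assert(a[0] == *(a + 0));      /* the first element is 0 away from a */
    assert(a[2] == *(a + 2));
    printf("%d %d\n", a[0], *a);   /* both print 10 */
    return 0;
}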

This doesn't seem very relevant today for a new student who isn't programming in C.

In many (most?) other programming languages (Basic, Pascal and Matlab for example) you typically count from 1 to N, not from 0 to (N-1).

(For example in for loops.)

As already said in the other answers there often is the possibility to define the index of the first element of an array freely (e.g. in Pascal) or the index of the first element is even fixed to 1 (e.g. in Matlab).

These languages don't have pointer arithmetic.

In (theoretical) computer science

I have no idea if they count from 0 there.

However I think that the programming languages which are used most influence the way of thinking in theoretical computer science, too.

Martin Rosenau
  • 201
  • 1
  • 3
  • 2
    I think the C equivalence of `arr[i]` = `*(arr+i)` follows from its choice of zero-based indexing, not the other way around. But good point that having pointer arithmetic at all makes zero-based indexing much more natural, and that most still-used languages with 1-based indexing don't have that. C started as a portable assembly language, so it's totally natural that it works like asm in this regard. – Peter Cordes Sep 04 '18 at 00:04
  • 1
    @PeterCordes I'm not sure. But I think the fact that the array operator can be used for pointer arithmetic at all (such as `ptr[-5]`) played a role in defining that `ptr[i]` is equal to `*(ptr+i)`. – Martin Rosenau Sep 04 '18 at 05:53
  • 3
    The advantage of zero-based indexing becomes even more apparent when you need to map a 2D matrix to a linear array. – 200_success Sep 04 '18 at 18:13
  • 2
    This is the missing half of Jorg's answer, which together make the **correct** answer. C made the choice to only support indexing arrays from 0, and after it became popular it influenced the design of a lot of other languages. However, its quite likely *most* languages don't force a 0-based index, and some people thinking this is a computer language thing, (or worse yet, a Computer Science thing) is nothing more than myopia. – T.E.D. Sep 06 '18 at 14:37
  • Most index from 0, including BASIC (though I'm sure there's a variant which starts at 1). [Here's a table](https://www.wikiwand.com/en/Comparison_of_programming_languages_(array)#/Array_system_cross-reference_list). – Schwern Sep 10 '18 at 22:53
  • @Schwern Depending on the variant of BASIC arrays start with 0 or 1. However it was typical to start `FOR` loops at 1, not at 0! I just looked into the Acorn Electron manual: Even in BBC Basic (where arrays always started at 0) `FOR` loops were typically starting at 1. – Martin Rosenau Sep 11 '18 at 05:16
  • @200_success Not really - that's just an entirely *separate* advantage of zero-based indexing. And rarely needs to be explicit in the external interface anyway - that's just again going back to C's pointer arithmetic fetish. How structures are laid out in memory shouldn't be used to design how they're supposed to be used. If you have code-level abstraction, what's the difference between `matrix[0, 0]` and `matrix[1, 1]`? How does it affect your use of `matrix.ToLinear()`? The interface should be meaningful on its own, separate of its concrete implementation. – Luaan Sep 13 '18 at 10:26
9

In languages such as C, the first item in an array has an offset of zero from the pointer. If the size of your objects is 4 bytes, the next item has an offset of 1 x 4 bytes.

Index   Size    Offset
0       4       base + 0
1       4       base + 4
2       4       base + 8
3       4       base + 12

And so on.

If the items were indexed on natural counting, there would have to be an adjustment made by both the computer and the programmer to find the location of the item in memory. Mistakes would be made by having this additional step, particularly on the human side of the process. It's easier, although slightly unintuitive, to use zero-based indexing.
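
To make the table above concrete, here is a small C sketch (my addition, not part of the original answer) that prints each element's byte offset from the base of the array; on a platform with 4-byte int it prints 0, 4, 8 and 12.

#include <stdio.h>

int main(void) {
    int a[4] = {0};
    char *base = (char *)a;                       /* address of the array's start */
    for (int i = 0; i < 4; i++)                   /* offset = index * sizeof(int) */
        printf("index %d -> offset %td\n", i, (char *)&a[i] - base);
    return 0;
}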

CJ Dennis
  • 209
  • 1
  • 2
  • 3
    Welcome to [cseducators.se]! Isn't this just a restatement of OP's third bullet? – Ben I. Sep 03 '18 at 12:16
  • yes, the pointer (which is the array) contains the 1st element, then you add 1 to get the next element etc. In assembler we were taught indexing to access arrays, so moving to C it all made simple sense. – WendyG Sep 03 '18 at 12:17
  • 1
    @BenI. My answer addresses the human side of "making it easy". – CJ Dennis Sep 03 '18 at 12:20
  • Welcome to the community, please read the question. The OP states in the question that this explanation is unsatisfactory, for the intended audience. – ctrl-alt-delor Sep 04 '18 at 07:58
  • Keep in mind that C arrays are rather plain. In many languages, the pointer refers not to the first item, but to things such as the length of the array. And yet both 0-indexed and 1-indexed arrays exist. Indeed, you could use your argument to argue that arrays should start at 1, because that would allow you to store metadata in the "0 offset". – Luaan Sep 05 '18 at 09:05
  • @Luaan If the metadata is at "0 offset" there's no guarantee it's the same number of bytes as each entry. In reality, it's a packed format: `metadata` + N x `array-item`. Only the `array-item`s are guaranteed to each be the same size. The array part is still 0 based. – CJ Dennis Sep 06 '18 at 23:43
  • @CJDennis As surprising as it might sound, plenty of languages older than C didn't have this problem at all - e.g. all arrays had the same element size. C became somewhat famous for encouraging these kinds of hardware-specific optimizations (of course, how effective these turned out to be depended entirely on the programmer). Many considered this to be a step back - instruction-by-instruction performance wasn't thought as an end goal in itself. But regardless, this doesn't change that as soon as you need an offset anyway, 0-based arrays are no longer easier to index than 1-based. – Luaan Sep 13 '18 at 10:18
7

Simply put, we do not count from zero, we shift from zero

You can think of C as a neat way to not write different assembly for every architecture/machine/processor in existence. Instead, take a simple and short macro-like language for an abstracted machine, compile that, and brk() your way on abstracted memory.

But abstracted memory is only a sequence of bytes; you need a way to refer to specific segments. Enter pointers. But to not make every item an allocation (think of a string where every char is separately allocated), the next step is "lists". The most compact list1 is a pointer to the first element, and then put every other element in adjacent slots.

Every element is, then, indexed as a shift from the first. And because of that, the shift index starts from zero.


1 This is from an era when you needed entire meetings to decide whether it was worth spending a whopping 52 bytes to have your system know about leap years...

6

I will try to answer, without reference to low-level programming (or any programming language), without mention of history, or much maths.

As already stated in some answers, we do not count from zero (using the value 0 to represent 1, and 1 to represent 2 (Usually)). So what do we do?

We measure from zero. If I give you a ruler, and ask you to measure something, you measure from zero. If I ask you to tell me how far a cell in an array is from the start, then you measure from zero (the first cell is zero from the start: it is at the start), the 2nd cell is 1 from the start.

ctrl-alt-delor
  • 10,635
  • 4
  • 24
  • 54
  • 3
    We absolutely count from zero. If you want to count sheep hopping over a fence, you initialize the `sheep_count` variable to 0, not 1. If no sheep appear, you must report that zero value as the count. – Kaz Sep 06 '18 at 00:03
  • @Kaz That simply isn't true at all. This should be pretty obvious when you take into account that zero-as-a-number is a rather recent invention compared to natural numbers. People always considered "nothing" as a special case, and it took effort to introduce zero to math in a meaningful way (and at the cost of confusions like division by zero). You may have been *taught* to count from zero, but it certainly isn't natural to humans. If you want to count sheep hopping over a fence, do you initialize `cow_count` to zero first? I hope not :) – Luaan Sep 13 '18 at 10:38
6

From time to time I have had to do numerical work in MATLAB and the 1-based indexing always stuck out like a sore thumb, so I have a few examples I can provide where 0-based indexing is advantageous.

Modular access

This is the simplest example. If you want to "wrap around" an array, the modulus operator works like a charm. (And don't say "what if the modulus operator were defined from 1 to n instead"—that would be mathematically ludicrous.)

str = "This is the song that never ends... "
position = 300
charAtPosition = str[position % (str.length)]

Real-world examples of this might include working with days of the week, generating repeating gradients, creating hash tables, or coming up with a predictable item from a small list given a large ID number.

Distance of zero

In a lot of algorithms and applications, we need to work with distances. Distances can't be negative—but they can be zero. For example, one might want to count how many occurrences there are of each distance in a data set. Indexing from 0 allows this to be handled more naturally. (Even if distances aren't integers—see below.)

Discretisation

Let's say you want to generate a histogram of heights. The first "bucket" is 50cm-70cm. The next is 70-90cm. And so on up to 210cm.

Using zero-based indexing, the code looks like

bucket[floor((height - 50) / 20)] += 1

With 1-based indexing, it looks like

bucket[ceil((height - 50) / 20)] += 1
  or
bucket[floor((height - 50) / 20) + 1] += 1

(the two have slightly different semantics). So you can either have a stray "+ 1" or use the ceil function. I believe the floor function is more "natural" than the ceil function since floor(a/b) is the quotient when you divide a by b; for this reason integer division often suffices in such calculations whereas the ceil version would be more complex. Also, if you wanted the buckets in reverse order, this is 0-based indexing:

BUCKETS = (210 - 50) / 20

#intuitive because BUCKETS - 1 is the highest index
bucket[(BUCKETS - 1) - floor((height - 50) / 20)] += 1

versus

#not intuitive... what does BUCKETS + 1 represent?
bucket[BUCKETS + 1 - ceil((height - 50) / 20)] += 1
  or
#not intuitive... how come a forwards list needs "+ 1" but
#backwards list doesn't?
bucket[BUCKETS - floor((height - 50) / 20)] += 1

Change of Coordinate Systems

This is related to the above "direction flip". Although I will use a 1 dimensional example, this also applies to 2D and higher dimensional translations, rotations, and scaling.

Let's suppose that, for each slot in the top row, you want to find the best-fitting slot in the bottom row. (This often occurs when you want to efficiently reduce the quality of some data; image resizing is a 2D version of this).

With 0-based indexing:

0-based indexing

#intuitive because the centre of slot[0] is actually at x co-ordinate 0.5
nearest = floor((original + 0.5) / 5 * 7);

With 1-based indexing:

1-based indexing

#Why -0.5? Can you explain? And then we need to "fix" it with a +1
nearest = floor((original - 0.5) / 5 * 7) + 1;

Nice Invariants

In order to prove that programs work correctly, computer scientists use the concept of an "invariant"—a property that is guaranteed to be true at a certain place in the code, no matter the program flow up to that point.

This is not just a theoretical concept. Even if invariants are not explicitly provided, code that has nice invariants tends to be easier to think about and explain to others.

Now, 0-based loops often have an invariant that relates a variable to how many items have been processed. E.g.

i = 0
#invariant: i contains the number of characters collected
while not endOfInput() and i < 10:
    # invariant: i contains the number of characters collected
    input[i] = getChar()
    # (invariant temporarily broken)
    i += 1
    #invariant: i contains the number of characters collected
#invariant: i contains the number of characters collected

There are two basic ways to write the same loop with 1-based indexing:

i = 1
while not endOfInput() and i <= 10:
    input[i] = getChar()
    i += 1

In this case i does not represent the number of characters collected, so the invariant is a bit messier. (You would have to remember to subtract one later on if you wanted the count.) Here is the other way:

i = 0
while not endOfInput() and i < 10:
    i += 1
    input[i] = getChar()

Now the invariant holds again so this approach seems preferable, but it seems odd that the initial value of i can't be used as an array index—that we are not "ready to go" straight away and have to fix up i first. I suppose it's initially odd for learners of zero-based languages that, after the loop terminates, the variable is one more than the highest index used, but it all follows from the basic rule of "a list of size i can be indexed from 0 to i-1".

I admit that these arguments may not be very strong. That said, off-by-1 errors are one of the most common types of bug and at least in my experience, starting arrays from 1 needs a whole lot more adding and subtracting 1 than 0-based arrays.

Artelius
  • 261
  • 1
  • 3
  • +1 for the _modulus_ operation argument. I also really like the _nice invariants_ one. I always make a point of explaining to my team the importance of "least surprise" in designing and implementing new features (both in terms of UX design and code design). I feel the two concepts share commonality in that one of the aims of _least surprise_ design is to preserve a high quantity of _nice invariants_! –  Sep 08 '18 at 06:38
  • You're blinded by what you're used to. `for` loops are looping from `a` to `b`. *Count* is entirely irrelevant! Pascal made this much more explicit in its own loop structures - `for i := 1 to 10` makes this rather obvious. As far as I'm concerned, your code assumes that one variable contains *two* separate meanings that happen to coincide. That's *wrong*; it may be useful for some cases, but you're reusing the same variable to mean two different things. `i` doesn't contain the count because that's its purpose - it contains it by accident. The same with your buckets example. – Luaan Sep 13 '18 at 10:34
  • @Luaan That's the whole point of my post! The multiple meanings don't "happen" to coincide; they are mathematically related, and it is much easier to think about and check for errors if this relation is equal rather than off-by-one. I will admit there are a lot of times where I need to convert array indices to a "1-to-n" system—but almost all of these are to pretty-print things for humans. That is an important consideration—after all, we generally write programs using decimal not hex—however it is also important that code is as easy as possible to review for bugs. – Artelius Sep 13 '18 at 21:55
6

Rather than answer myself based off my personal preference, as all other posters have done, I would instead like to link the following very in-depth analysis:

The conclusion it comes to is as follows:

In conclusion, I set out to prove by usage cases which scheme was better: indexing arrays from zero, or indexing arrays from one. I think the discussion has shown that there is no mathematically strong argument that favours one or the other. Every usage case which asserts one scheme is better than another because that other scheme is more awkward can be countered either by a language syntax which ameliorates the problem, or else can be balanced against some other usage cases in which the first scheme is more awkward than the other it is being compared against. There are many data points, and the analysis is complex, yielding no clear winner.

However, the thing which tips the balance in my mind is historical convention and interoperability issues. There are so many existing applications for zero-based indexing, and many existing languages which use it, that interoperability and learnability are biased in favour of that camp in my opinion. It is disheartening that the result of an intellectual discussion should be settled with "that's the way we've always done things" but in this case it seems a justified conclusion.

That being said, I think for certain specialised or high-level languages, I hope there is a place for one-based indexing, because I feel it has been an inadequately explored solution to the job of addressing elements of an array. I hope this essay has shown that there are some inherently good things about these schemes, and I also hope that future discussions of the issue won't unnecessarily conflate indexing from one with including end-points in loops or slices; although related, the two are not the same issue.


My personal preference is 1-based because, in my experience, the +-1s in 0-based indexing tend to pop up more often in high-level logic, while 1-based indexing's +-1s tend to pop up in lower-level code, where you need to be paying attention anyway, and not as often in high-level code; this leads me to commit notably fewer off-by-one bugs in 1-based languages like Lua.

Llamageddon
  • 161
  • 2
  • 1
    Nice first contribution here. Welcome to [cseducators.se]! – Ben I. Sep 12 '18 at 15:09
    @BenI. Thanks! :-) As a fan of 1-based indexing, it really pains me when people speak from their personal preference, and I in particular hate Dijkstra's very selective argument. I've been lucky to find the resource I linked a while ago, because it is a much better analysis than anything I could hope to do. – Llamageddon Sep 12 '18 at 15:55
4

I think it's best to view addresses and indices not as identifying objects or elements, but rather as marking the spaces between or on either side of them. Thus, an array with four elements would have the following indices:

Addr:    Base+0   Base+1   Base+2   Base+3   Base+4
           V        V        V        V        V
Indices:   0        1        2        3        4
Elements:    [FIRST] [SECOND] [THIRD]  [FOURTH]

Each element is associated with two addresses--the one immediately preceding it and the one immediately following it. The address immediately following any element other than the last will immediately precede another element, and likewise the address immediately preceding any element other than the first will follow another element. Even though each address other than the first and last will be associated with two elements (one preceding and one following), by convention applying the * operator to an address yields the one following.

Note that unlike the "half-open interval" view, this approach does not require any special handling for pointers that go one beyond the last element. The address "Base+4" above is associated with the four-element array just like any other, but the * operator cannot be applied to it because that operator would require the ability to compute the next address.

supercat
  • 189
  • 4
  • 1
    I like this. This is also a useful way of thinking about labels in assembly language (as zero-width). Addresses and labels don't have any natural width associated with them; that depends on what type of load/store you do. (The type system in most languages associates an element width with a pointer, but this is why you can't add/subtract `void*` in C.) – Peter Cordes Sep 05 '18 at 01:54
4

Why not?

Why not count from zero? (This answer is a bit like Buffy's answer, but with simpler logic than many of the other answers.) Zero is a perfectly valid number, so why not use it? Failing to use that available number is just a waste. You wouldn't want to start lower, since negative numbers are a bit more complicated to deal with, and may even be unavailable (when using unsigned numbers). But zero is available, so why waste the possibility of using it?

If you do that, it's like the 63 SPT (sectors per track/head) limit that contributed to the (504 MiB / 528 MB) barrier. That limit used a 1-based count. If a zero-based count had been used, the limit would have been 512 MiB instead of 504 MiB. In a day and age when hard drives were counted in megabytes, that limit was nearly 1.6% smaller because someone decided to do a one-based count for one of the numbers. So, keeping that real-world cost in mind, what benefit is there to trying to exclude zero? If there isn't one, then don't do it.

Standardizing on one number is nice. When I was ten years old, I knew how to program. Sometimes I counted from one (e.g., "for(x=1;x<=10;x++)"), while other times I counted from zero (e.g., "for(x=0;x<10;x++)"). Sometimes I would mix up those methods and create a fencepost bug (using "for(x=0;x<=10;x++)"). Later in life, I was pretty strongly compelled to count from zero and drop the test for equality (just testing for inequality using <, instead of "less than or equal" using <=), and once I got into a standardized habit, I found myself far less likely to make those off-by-one errors.

Ultimately, the code in many "high level" programming languages is just meant to be converted into "low level" assembly language, which is designed to operate the way an actual chip does. Zero is kind of a special number (having unique properties with both addition and multiplication), while one has less special behavior (having a similar unique property with multiplication, but not addition). So it makes a bit of sense for zero to be special.

Also, if you think of "what's left", zero is more logical. If you count down (e.g., "for(x=10;x;x--)"), then do you want to exit when there is one job left, or zero jobs left? Zero is more sensible in that particular case, and that may be a key reason why some of the Intel assembly-language instructions behave the way they do.
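
A minimal sketch of such a countdown (illustrative only):

    #include <stdio.h>

    /* The loop condition is just "x", so it stops exactly when zero jobs remain. */
    int main(void) {
        for (int x = 10; x; x--)                 /* runs with x = 10, 9, ..., 1 */
            printf("%d job(s) still to do\n", x);
        printf("done: 0 jobs left\n");           /* loop exits when x == 0 */
        return 0;
    }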

TOOGAM
  • 251
  • 1
  • 3
3

I think that one of the first programming languages to use zero-based indexing was BCPL. In BCPL, pointers and integers are interchangeable, and arrays are represented by pointers. Subscripting uses the operator !, so you can get the first and second members of an array using A!0 or A!1. These are defined to be equivalent to !(A+0) and !(A+1), where the unary ! operator dereferences a pointer. It's neat, simple, and consistent, and it wouldn't work with 1-based addressing.

A lot of BCPL ideas were carried into C, and the rest is history.
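
For instance, the same identity survives in C, where subscripting is defined in terms of pointer arithmetic (a small illustrative sketch):

    #include <stdio.h>

    /* a[i], *(a + i) and even i[a] are the very same expression in C.
     * This identity only works because the first element sits at offset 0. */
    int main(void) {
        int a[] = {11, 22, 33};
        printf("%d %d %d\n", a[1], *(a + 1), 1[a]);   /* prints "22 22 22" */
        return 0;
    }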

But although zero-based indexing seems to work best for system programming languages, there are many modern languages that chose to start at 1. XPath is an obvious example. I think the rationale there was that it was expected to be used by non-programmers (e.g. document authors), and you can't expect a document author to think of the first chapter in a book as chapter 0.

Michael Kay
  • 376
  • 1
  • 4
  • 1
    Indeed, BCPL seems to be the origin of 0-based indexing. We can try to explain to ourselves or to pupils why indexing starts from 0, but the fact is that it only does so in certain languages, and most of those do so for historical reasons. This blog shines some light on the origin of 0-indexing: http://exple.tive.org/blarg/2013/10/22/citation-needed/ – idrougge Sep 06 '18 at 12:37
  • Nice blog article. I'm amused by the notion that we're more likely to have heard of Eben Upton than of Martin Richards. – Michael Kay Sep 06 '18 at 13:31
  • 1
    ... It reflects that the author is clearly digging into software history for the first time and is surprised by his discoveries, thinking that perhaps no one else knows this. However, his conclusions are a bit inconsistent. Zero-based indexing in BCPL was an inevitable consequence of having pointers rather than arrays as the fundamental data structuring concept; it wasn't to make array indexing more efficient. – Michael Kay Sep 06 '18 at 13:38
  • Since we're on Computer Science Educators, I might mention that Martin Richards and Steve Bourne between them probably taught me everything I really needed to know about programming -- mainly by example; they both wrote beautiful code. – Michael Kay Sep 06 '18 at 13:42
3

Counting is always 100% arbitrary. Counting from 0, from 1, or from 7 all make exactly the same sense. Our reliance on indexing from 1 in real life is no different from our reliance on 10 as the base of our positional system. It's nothing but a convention.

The only issue is "How do I remember where we agreed to start counting?" From there comes the follow-up question: "What is the default value of an integer variable?" or "If you started with a freshly initialized integer, what would you expect its value to be?"

And this is the reason why we start at 0: Integers start at 0, so it's helpful to map the first entry of an array to the first value of a variable capable of indexing it.

Agent_L
  • 141
  • 3
  • 1
    BCPL, where zero-based indexing probably originated, didn't initialize variables. So that's hardly the right justification. Also, *integers don't start at 0*. Integers don't *have* a start. As for a default in your favorite language (if it even has one), that's entirely arbitrary - zero is convenient in pre-zeroed memory, but before modern memory management, there were no definite default values. A pretty common practice is to always initialize variables with a meaningful value, rather than some "whateverdefault" - e.g. if you want a control variable that counts down, initialize it with count. – Luaan Sep 13 '18 at 10:45
  • @Luaan Yeah, this is what I meant by saying that it's arbitrary. I don't agree with BCPL as the origin. In asm when you think of a memory area as an array, you index it with addr+(0*stride), addr+(1*stride), etc – Agent_L Sep 18 '18 at 08:12
  • In assembly, you didn't have any abstraction (at first). Indexing an array is already an abstraction - it's a map between "I want the 3rd element of the array" and what's actually needed to get that particular element. The fact that you don't need to know the element size in BCPL (or anything else about the array) is a big difference. Indeed, in assembly, you'd often see very complicated addressing patterns that were tailored to the capabilities of the hardware, rather than the logical structure of the program; in C, if you wanted the next element, you did `element++`. – Luaan Sep 18 '18 at 11:10
  • Now, it's true that in C, that actually was a low-level pointer operation. But that's not the purpose of it, that's just a constraint to make the compiler simpler. The meaning is still "the next element" or "the 3rd element", not "pointer + 0x12". C is a rather bare language, but it's still way above the abstraction of then-assembly. Meanings started to creep into things that used to be pure procedural code. Of course, it couldn't compare to anything like LISP - the ties to hardware were far too strong for that. But there were things like `0` as pointer value, or what `i++` means. – Luaan Sep 18 '18 at 11:13
  • @Luaan This is why I wrote "when you **think** of a memory as an array". It doesn't matter if the abstraction is a part of a language or your train of thought. BCPL built on established standards of thinking about a program. I don't really get where we are going, because I feel like you're restating my concepts but phrasing them like counter-arguments. If you believe that my answer can be improved, then it appears you need to be more specific. – Agent_L Sep 18 '18 at 12:12
3

An example from a non-technical field where counting universally starts from 1, and it has strange consequences.

In music, chord intervals start from the unison, or 1st — that is, the two notes sound the same — and go up: 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, which is an octave. Thus:

Octave intervals

Now we go further up from the octave: 9th, 10th, 11th, 12th, 13th, 14th, 15th = 2 octaves.

Notice something strange? The 8th of the octave did not double to a 16th of two octaves!

What gives? The real issue is this:

An octave has 7 notes!

"Octave" itself is a misnomer!

i.e. C, D, E, F, G, A, B is the "octave"

After that the "loop" repeats at/from the next C.

And so if we had counted — better, measured — the C-C chord — the so-called "unison" — as a zeroth, then C-D, a single whole-tone step, would be a first, and so on up to C-(next)C being a 7th. And continuing, C to the C an octave beyond that would be a 14th, thereby keeping our (arithmetic) expectation intact.

Of course the key, as many others have pointed out, is that the distance (measured rather than counted) between C and itself is zero!
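
Put as plain arithmetic (my restatement of the point above):

    1-based interval names:  8th + 8th = 15th   (they compose as a + b - 1)
    0-based distances:       7   + 7   = 14     (they simply add)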

In more Math language

In ℤ, {0, 1} are special.

The infinite cyclic group (ℤ, +) has 0 as identity and 1 as generator.

The field ℝ (or ℚ) has no generator, i.e. 0 remains special, 1 less so.

Rusi
  • 918
  • 4
  • 12
  • It gets even weirder when you consider that a second plus a second plus a second is a third, and that you can add unisons (the first) infinitely without ever moving at all. – Ben I. Oct 05 '20 at 10:42
  • Umm.... Cute observation @BenI. But this is a composition of two different things: the minor one being counting from 1 the major one being the wondrous irregularity of the scale. Which itself we experience through two *big* simplifying approximations: the uncountable infinity of frequencies reduced to (western) music (C# spelt different from Db but the C-D whole tone = D-E whole tone). Then reduced once again to 12 notes of the keyboard – Rusi Oct 05 '20 at 11:06
  • Cute observation is what I was going for :) But it's truly exactly the same observation as octave plus octave is fifteenth, just with smaller numbers. – Ben I. Oct 05 '20 at 14:46
  • There's also a bit of something not quite right, actually. The octave doesn't have 7 notes; the octave is an interval (so at most, it only has 2 notes), and it is named for being the eighth note along the scale up from the tonic. So I think you meant to say "an octave of a major scale only has 7 notes", but even this wouldn't quite be correct. It would be correct to say that "an octave of a major scale is comprised of seven notes beyond its initial note". – Ben I. Oct 05 '20 at 14:59
  • @beni. «but even this wouldn't quite be correct. It would be correct to say that "an octave of a major scale is comprised of seven notes beyond its initial note"» Well, there's a programming theorem that every top-tested while can be written as a bottom-tested do-while, and conversely. I wrote the 7 as CDEFGAB; you could write (reference C) DEFGABC. Whichever way you count, there are 7 distinct notes followed (or preceded) by the "loop". – Rusi Oct 05 '20 at 15:55
  • But neither of those is an octave, those notes outline major and minor sevenths, respectively. The octave only has two notes, such as CC. – Ben I. Oct 06 '20 at 03:14
  • Let me try a different tack @BenI. {a, b, c, d, e, f, g} is a set with 7 elements (cardinality 7). You're not allowed 'h' (unless you're German!). In the music world (assuming a fixed scale/mode, no accidentals, modulation and other fancy stuff) **your palette consists of 7 "colours"** – Rusi Oct 06 '20 at 03:43
  • Perfect. I agree! This is all correct. – Ben I. Oct 06 '20 at 10:15
1

Slight generalization of CJ Denis's answer: the advantage of zero-based indexing becomes apparent as soon as you start working with relative indices. If you have a base array A and, over it, say, a sliding window W, then with 1-based indexing you would have to subtract 1 every time you add a W-relative index to an A-based one.
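
A minimal C sketch of the point (array contents and names are just for illustration):

    #include <stdio.h>

    /* With 0-based indexing, a window-relative offset can be added straight to
     * the window's start; with 1-based indexing the same access would need
     * A[start + w - 1]. */
    int main(void) {
        int A[] = {5, 6, 7, 8, 9, 10};
        int start = 2;                         /* the window W begins at A[2] */

        for (int w = 0; w < 3; w++)            /* w is the index within the window */
            printf("W[%d] = %d\n", w, A[start + w]);   /* no -1 correction needed */
        return 0;
    }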

Edheldil
  • 19
  • 1
  • And if I add your SSN to my SSN... how does that translate to the array? Indices are a different level of abstraction than "there's 20 bytes after this pointer that represent an array of `short`s". You can argue that (e.g.) C chose this way of representing arrays and indices because of performance targets, but that's about all you can argue about that. A lot of the problems in software engineering stem from this way of thinking that just because it's possible to add two numbers together, it makes sense. It may have its place in the lowest levels of our systems, but not normal application code. – Luaan Sep 18 '18 at 11:20
1

There are many answers, but I haven't seen an attempt to make it short and comprehensible for non-programmers.

I don't really want to be explaining memory addresses to a new programmer who doesn't understand memory layout and maybe doesn't even understand arrays yet.

A simple and short example.
We have a list of elements aligned one after the other. The start of this list, the first element, is flagged with a pin.
To get the first element, we go to where the pin is.
To get the second element, we need to go to the pin plus one element.
Starting at zero, the computer can simply calculate where to go from the start location.
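
As a worked example (numbers invented for illustration): if the pin sits at position 1000 and each element is 4 units wide, then

    element 0 starts at 1000 + 0 * 4 = 1000   (right at the pin)
    element 1 starts at 1000 + 1 * 4 = 1004
    element 2 starts at 1000 + 2 * 4 = 1008

With 1-based numbering, the computer would have to compute 1000 + (i - 1) * 4 instead.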

More thoughts on that.
Counting starts with "one" for the first element, even for programmers. But programmers don't count elements when they start at zero; they give them names. These names start at the first number that can be written with digits: zero denotes "nothing", but it is the first valid number.

couldn't a modern compiler often optimize this away?

I don't think it can in general. The calculation described above is fundamental: the CPU has to compute an address from the start location, and a compiler can't optimize away a calculation that must happen at run time.

puck
  • 111
  • 2
  • Of course a compiler could optimize this away - this was very common on x86, for example, because of its powerful memory addressing; especially since 486 and Pentium. And of course, just have a look at any modern disassembled code - in most cases, you will not find any direct indexing from the base of the array anyway. Ultimately, in many cases, indexing an array is a relic anyway - for most cases where you see indexing, what you really want is iteration. It's just that many programmers and languages are stuck without distinction between the two. – Luaan Sep 13 '18 at 10:55
  • C and many other languages don't have iterators, so we really should know what index access is and how it works. Modern techniques are sometimes great, but sometimes they are simply overcomplicated and less comprehensible compared to [i], and I don't think there is no index accessing behind iterators - we just don't see it any more. So I doubt iterators are really faster than a simple [i]. – puck Sep 13 '18 at 11:07
  • C was deficient in that regard even in comparison with other languages *of its time*. And no, they're certainly not *less* comprehensive. You need to look at code with fresh eyes - a lot of the crazy doesn't seem crazy anymore when you get used to it. The thing about iterators is that any indexing is strictly an implementation detail - which allows you to give much more freedom to the code that's using them as an interface. If you pass an array, it will always be an array. If you pass an iterable, the implementation can do whatever fits - like using a linked list. – Luaan Sep 13 '18 at 17:00
  • Iterators can easily be faster than pointers, and they can be slower than pointers. But they're definitely easier to understand and reason about, and they allow a lot of optimisations that simply aren't possible with "pointers to an array". Just consider something like map-reduce - it wouldn't be possible if people stuck to the idea that arrays are just a pointer to the first item in an array. Unsurprisingly, it came from the other side of the language spectrum - languages that value abstraction over micro-tweaks. – Luaan Sep 13 '18 at 17:02
1

Languages which start from 0 are currently popular, but there are others which start from 1. My examples below are quite old. Many of the start-from-0 languages are derived from or inspired by C.

Cobol is not fashionable now, but it is still an important language in business, and its arrays start at 1. RPG (not role-playing games) also has arrays which start from 1. RPG is ugly and horrible, but it is also popular in business. My Fortran is rather rusty, but I think that it also started at 1. Very long ago, I used Algol, and you could choose the starting index: 0, 1, or even negative. For some applications, negative indexes, e.g. from -10 to +10, were convenient. I am not sure whether this was a standard feature or just a quirk of the dialect that I used. Things were not so standardised back then.

Edited to remove possible unintended offence.

badjohn
  • 119
  • 3
  • Pascal and Visual Basic (and many other Basics) also preferred natural indices. Using zero-based indexing really has to do with conflating indexing and pointer arithmetic. If I want an array from -10 to 10, why should I index it from 0, right? Let the compiler/library deal with the details. C didn't have indexing at all, *only* pointer arithmetic, so there was little choice. For some weird reason, even today people still count this as C's *advantage*, rather than misfeature. Oh well. – Luaan Sep 13 '18 at 10:49
  • I considered including Basic but it was so long since I have used it that I was not sure. Also, as you hint, it is probably dialect dependent. I miss the ability to choose the minimum and maximum index. It is a shame that it has not caught on. – badjohn Sep 13 '18 at 11:39
  • Plenty of languages on the "abstraction makes things easier" side of the barrier still support ranges. You just need to give up C-like languages, for the most part (though e.g. C# does have support for arbitrary indexing, it's not really "equal citizen" compared to zero-based). – Luaan Sep 13 '18 at 17:05
1

You give the answer yourself in the third bullet point, and you dismiss it too easily.

Let me delve into my personal experience from when I learned C in the mid-1980s. I remember when I first saw 68k assembler code produced from indexing an array in C and realized that variables are just constant addresses, and indices are just offsets to them, so that addressing an array element a[i] could be understood as *(a+i), which simply was a readily available addressing mode in machine code if address a was already in a register.

It was a revelation on many levels: What C is (a macro assembler with a few bells and whistles plus standard library), what a compiler does, under what constraints programming languages work, and why C was so fast.
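
A small C sketch of that same observation (printf on a modern compiler rather than a 68k listing; the example is purely illustrative):

    #include <stdio.h>

    /* &a[i] is literally the array's address plus i element-sizes, which is
     * exactly the computation a register-indexed addressing mode performs. */
    int main(void) {
        int a[4] = {1, 2, 3, 4};
        for (int i = 0; i < 4; i++)
            printf("&a[%d] = %p  (a + %zu bytes)\n",
                   i, (void *)&a[i], i * sizeof a[0]);
        return 0;
    }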

Why do you want to deprive your students of this insight? I think teaching arrays (or any programming, really) benefits tremendously from teaching memory addresses and some rudimentary assembler, so just go for it. All language abstraction is operating under the constraints of the metal. Your students will never understand why certain seemingly simple things are expensive or impossible if they don't know what they are actually doing.1

To sum up, essentially your third bullet point is the reason for the decision by the designers of C (or probably rather, BCPL or B) to start indexing at 0. Saving a subtraction at each element access, i.e. in many hot loops, was certainly highly relevant in 1970. True, today these original reasons are largely irrelevant; but today we have 45 years of code and a diverse C-rooted language tree. Where these languages are concerned, we are locked in for good. As Jörg correctly mentioned in another answer, the decision was made differently for many other languages, probably not coincidentally some for which speed is secondary and some which target a lay audience.


1Don't get me wrong, I like abstraction. Having had the glimpse under the hood lets me actually appreciate the safety, correctness and comfort provided by high-level languages and libraries.
1

It's not an answer, but it is additional information.

When I was a student, what eventually helped it click for me is that I always found it really annoying and weird that the "19th Century" goes from 1800-1899, and the "20th Century" goes from 1900-1999.

But if we had started indexing centuries at 0, then the "0th Century" would go from 0-99, the "1st Century" from 100-199, ... the "19th Century" from 1900-1999, and the "20th Century" from 2000-2099.

Likewise – if you're in a place where soccer/football is popular – we typically say the "1st minute" to talk about 0:00–0:59, the "27th minute" to talk about 26:00–26:59, etc. Had we started indexing minutes at 0 instead, the "27th minute" would more logically be 27:00–27:59. (Of course, both of these ignore the additional confusion that the 90th minute is sometimes considered to be 89:00–93:00.)
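
As a small illustrative C sketch of the arithmetic (the formulas assume the popular '00-'99 century convention):

    #include <stdio.h>

    /* With 0-indexed century names, "which century is this year in?" is a
     * plain division; the popular 1-indexed naming needs a +1. Same for
     * match minutes. */
    int main(void) {
        int year = 1984;
        printf("0-indexed century: %d (the %d00s)\n", year / 100, year / 100);
        printf("1-indexed century: %dth\n", year / 100 + 1);

        int seconds = 26 * 60 + 30;          /* 26:30 into a football match */
        printf("0-indexed minute:  %d\n", seconds / 60);             /* 26   */
        printf("1-indexed minute:  %dth minute\n", seconds / 60 + 1);/* 27th */
        return 0;
    }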

Bryan
  • 119
  • 2
  • 4
    Actually, not everyone agrees with your first paragraph. Some count centuries starting with year "1" as in 1901 thru 2000. There was a lot of worry about that at the millennium and the danger that systems would fail at the rollover whenever that was. Also, I'll note that the Western calendar doesn't have a year 0. It jumps from 1BCE to 1CE with no 0 in between. Personally, though, The 17th century, being the sixteen-hundreds has always been a pain for me. – Buffy Sep 11 '18 at 16:17
  • And then when a student well-actually's you like that, they fell into your trap and you say, "Wouldn't it make a lot more sense if we started at year 0 and our centuries started at 1900, not 1901? That's 0-indexing." :P – Bryan Sep 11 '18 at 16:36
  • Hmmm. Well, actually, I was just pointing out an historical fact. – Buffy Sep 11 '18 at 17:13
  • @Buffy Surely that is now anachronistic? I never heard a peep about that when the year 2000 was rolling around. It seemed clear enough at the time that the popular understanding is that the centuries go from '00-'99. – Ben I. Sep 11 '18 at 17:43
  • Well, see, https://www.scientificamerican.com/article/when-is-the-beginning-of/ – Buffy Sep 11 '18 at 17:45
  • But the technical issue mostly involved using two bytes for a date's year, so the issue was what happens when 99 becomes 00. The fear was that all date computations would fail and take financial systems (etc.) with them. – Buffy Sep 11 '18 at 17:50
  • There were other issues with two digit dates, of course. some people on their 100th birthday were suddenly considered infants and their pensions, etc were put on hold along with some other legal problems. I don't know if you are old enough to remember the web in the previous century, but it was common (nearly universal) for a date form to have two digits for year. You entered your birthday as 7/9/88, for example. That has changed now. – Buffy Sep 11 '18 at 17:55
  • So in a race the person in 0th place is the winner (gets the gold). The 1st place gets silver, and the 2nd place gets bronze. – ctrl-alt-delor Sep 14 '18 at 17:02
1

The technical answer, the one where you explain memory and the base and index registers, should be your preferred method of instruction. I understand that some people think it's daunting, but in reality it is really straightforward, and it should be the first thing you teach about programming. Stack overflows, stack underflows, buffer overflows, null pointer exceptions, dangling pointers, memory leaks, etc. are all a lot easier to understand if you start with the basics of what a computer actually does. And you don't need to get super-technical with buses and caches and so on, just enough so that when a memory-related problem happens, they will more easily understand why it happened.

As a comparison, which do we teach first in math: addition or multiplication? The reason is that the latter is a concept built upon the foundation of the former. Teaching what a variable is before teaching what memory is, is akin to teaching multiplication without ever explaining addition. Everything we do in every computer language on the market today uses registers and memory. All of them. 100%. To completely gloss over the fundamental basics of computing is to do a serious disservice to those you are supposed to be educating.

I realize that many modern languages have done away with bare pointers because of their potential for misuse, but that means that the new generation of developers are growing up without understanding a critical part of computers. And pointers are still everywhere, just neatly tucked away behind references, objects, automatic variables, and so on. Which leads to questions about in/out parameters, why changing a value here affected data elsewhere, etc.

Until we have a computer that literally has no concept of a stack, registers, heap/dynamic memory, loops, branches, etc, no matter how far we abstract, at some point, the processor needs to do its thing. And that thing will be done by loading a memory address to a base register, loading an index register, and loading the data from memory as a result of the final calculation. If students do not learn this in class, they will certainly not learn it in the real world in any appreciable amount of time, and will write far inferior code for perhaps the rest of their developer career.


Teach your students the basics of computers, and I can virtually guarantee that anyone with sufficient aptitude to be a developer to begin with will never ask the question "why do we count from zero." Once the basics are established, everything else becomes that much easier.

phyrfox
  • 161
  • 2
  • Variables are an entirely separate concept from memory; even today, most of the variables you use never make it into memory. As for pointers... pointers are everywhere, yes. Pointer *arithmetic* is a fetish that should have died ages ago. Multiplication *can* be explained in terms of addition, but it doesn't have to be. The reason we teach addition first is because it's easy to teach algebraically, while multiplication makes far more sense geometrically - and it so happens that most math teachers are a lot more comfortable with algebra than geometry. But that's an entirely subjective value. – Luaan Sep 13 '18 at 10:55
1

Everything naturally starts at 0. In the beginning, there was nothing. Now, there is all.

You might want to know if you have something or not, and how many as well. It's just easier to say how many.

Also, try counting without 0.
1, 2, 3, 4, 5, 6, 7, 8, 9: that's only 9 numbers. But if you count 10-19, you have 10 different numbers; 10-20, you have 11. It's easier to just go by the tens place and group the numbers in tens.

ctrl-alt-delor
  • 10,635
  • 4
  • 24
  • 54
0

There are several explanations. But I would say that the main reason is that the number in computer science represents a memory offset, i.e. the physical distance from the beginning of the memory.

0 then means "no distance from the beginning of the memory", that is, the start.

  • I 100% agree. When I saw this answer for the 1st time, I was totally convinced. However this is not the first time that this answer has been given, and **some** of the older versions give deeper explanations. – ctrl-alt-delor Sep 14 '18 at 16:53
0

You are starting with the first element of your index set.

In a math text, your index set usually is $\mathbb{N}\setminus\{0\}$, because this is how you normally count.

But in your PC you use an unsigned integer type as the index set. And the first element of this ordered set is the number $0$, not the number $1$.

So the problem is your number type. You could have a natural-number datatype without zero for your index and everything would be fine. But we like to re-use existing data types used for arithmetic, which normally need to have a $0$, which comes before the $1$.

allo
  • 119
  • 3
0

Very simple: this is the offset from the array's address. So if an array called x is located at, say, 0x1000, an array that starts there has its first element x[first] at 0x1000. By using:

 0 as [first]

there's no math to compute its start, and it is computationally efficient.

JosephDoggie
  • 111
  • 3
  • 1
    I 100% agree. When I saw this answer for the 1st time, I was totally convinced. However this is not the first time that this answer has been given, and some of the older versions give better explanations. – ctrl-alt-delor Sep 14 '18 at 16:53
  • As pointed out, other earlier answers mention this, but it is a concise explanation, "in my own words" IMHO. – JosephDoggie Sep 14 '18 at 19:21
0

Gypsy Spellweaver's answer pretty much nails it, and looking at an index as an offset to a memory address (as JosephDoggie and others suggest) gives a precise technical reason. However, there is something I'd like to add:

It is a great tragedy that we don't "count" from zero in our everyday life. That is because early humans didn't have a concept of zero. That concept was unfortunately introduced much too late. We naturally use a decimal system because of our ten fingers (we also use a base 5 system for stroked lists, with the diagonal stroke representing a thumb). We should actually use a base 11 system, because we can display not ten, but eleven states with our hands: 0 fingers up to 10 fingers up. So if only the concept of zero had evolved earlier in the history of mankind, "counting" from zero would feel perfectly natural to us (and Matlab wouldn't start indexing at 1).

user82593
  • 9
  • 2
  • I am not sure that I agree, with all of paragraph 2 (counting from zero), but the base 11 thing is spot on. Why did I never see this. I am going to use base 11 from now on (or at least for a week). Oh no I failed already, I write eleven in base 10. I should have written 10 (that is eleven in base 10). Yes that is what I will do. I will use base 10 from now on. – ctrl-alt-delor Sep 14 '18 at 16:50
0

Modern computer implementations (physical, CMOS, as well as most virtual machines) almost completely use only binary logic (although some flash memories may use more than 2 states per physical bit cell).

By convention, if you have 1 bit of binary information, or a single logic signal, there are two possible states, and we call one state “0” and the other state “1”. If that 1 bit addresses 2 bits of memory, then the two addresses are 0 and 1 to match the binary state nomenclature of the address bit.

Early TTL Logic data books labeled the signal state nearest 0 Volts as logic “0” in the truth tables.

Compositions of a small number of logic bits (more than one) are usually numerically interpreted as sums of powers of two times those binary bits, and the minimum unsigned sum is zero.
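
As a concrete case (a made-up 3-bit example): three address bits b2 b1 b0 are read as 4*b2 + 2*b1 + 1*b0, which names eight locations:

    000 -> 0    001 -> 1    010 -> 2    011 -> 3
    100 -> 4    101 -> 5    110 -> 6    111 -> 7

The smallest value any combination of bits can spell is all-zeros, so the first address is 0.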

Many early programming languages (assembly) grew out of shortcuts for data and program entry via binary switches. The conventions followed. Lots of binary zeros sum to numeric zero. And you count up starting from there.

If you are a CS educator, you should not be teaching computation as magic, but as the result of the implementation of comprehensible binary logic.

hotpaw2
  • 1,895
  • 8
  • 14
-1

We count from 1 because we started counting before the zero was invented.

We should have all switched to the new, lowest digit when it was invented but this habit was impossible to kill so we stuck to 1.

The revolution that came with the zero is called positional notation (https://en.wikipedia.org/wiki/Positional_notation), also known as "place-value". Positional notation allowed us to re-use the same digits to count tens, hundreds, thousands, etc. If you pay attention, you'll notice that tens, hundreds and thousands all start with the new, lowest digit: zero. But for units it was too late; too hard to change.

However, many programming languages did make the switch and count from zero, because keeping units special and different from all the other digits creates irregularities and edge cases; see examples at https://en.wikipedia.org/wiki/Zero-based_numbering#Numerical_properties and in some of the other answers here.

MarcH
  • 115
  • 1
  • This is a weird and false history. Counting from 1 still *exists* because it is useful in many contexts. The question you are responding to is about why counting from zero is useful in *this* context. – Ben I. May 04 '22 at 22:48
  • Which exact fact is false @BenI.? You don't say. Counting from zero is useful in any context, Intelpedia gives many examples unrelated to computers. – MarcH May 04 '22 at 23:08
  • The second article you linked to makes one generalized mention of the problems with zero-based numbering, though since it's an article about the benefits, it isn't a surprise that it doesn't speak more to it. Zero-based counting is excellent for distances (such as arrays) and its analogs, but disallows the ordinal and cardinal count to match. If you have index zero, you now have one element, and that mismatch is permanent. And if you have zero elements, then the question of index becomes undefined.... – Ben I. May 05 '22 at 02:58
  • ... Sometimes we want distance measuring, sometimes we want ordinal and cardinal to match, such as when we are counting eggs in an egg container. Both systems have their use and their place. – Ben I. May 05 '22 at 03:00
  • You intimate a history in which habit, not clear usefulness, is the reason that standard counting is still around, and it's simply not true. Every school in the world teaches traditional counting to small children, and children everywhere learn that most people have ten fingers, and learn how to count them so that they *arrive at that number.* It's a big benefit, and should not be discounted. – Ben I. May 05 '22 at 03:05
  • So you don't think we counted from 1 before zero was invented? – MarcH May 06 '22 at 04:37
  • We did, I don't contend that you are wrong about everything. The false history is the part where you say that we kept standard counting because of "habit" instead of "usefulness". Both systems are useful. – Ben I. May 06 '22 at 13:13
  • "Both systems have their use and their place" - you're justifying habit better than I would ever have. – MarcH Jun 01 '22 at 01:16
  • Even in programming, no programming language has removed the standard count. It would be nonsensical to use positional notation for the size of an array, and we don't, ever. When your maximum index is [0] (positional counting), your list size is 1 (standard counting, which also matches the maximum ordinal number). If we all "switched over" you list size would now be zero if your only position was zero, and you would no longer be answering the question "how many items are in this list", because that is not a positional question. – Ben I. Jun 01 '22 at 01:36
  • You're very very close. I've never seen someone explaining pure convention so well while falling so short of realizing it's just convention. Fascinating. – MarcH Jun 02 '22 at 02:25
  • **Person 1:** *holds up an apple* How many apples am I holding? **Person 2:** Zero. **Person 1:** *puts down the apple* How many now? **Person 2:** an undefined number. If that conversation sounds useful to you, then this conversation probably isn't. – Ben I. Jun 02 '22 at 02:53
-3

"Thing zero"

I have nothing, I have one thing, I have two things...

0,1,2...

0x00, 0x01, 0x02...

0000, 0001, 0010...

There is nothing in my register, there is one thing in my register, there are two things in my register.

There is nothing in my array, there is one thing in my array, there are two things in my array.

If you started at one, you'd always have a minimum of one thing.

Munkee
  • 17