10

Consider the following:

$ ksh -c '1(){ echo hi;};1'
ksh: 1: invalid function name
$ dash -c '1(){ echo hi;};1'
dash: 1: Syntax error: Bad function name
$ bash -c '1(){ echo hi;};1'
bash: `1': not a valid identifier
bash: 1: command not found
$ mksh -c '1(){ echo hi;};1'
hi

Basically, I was trying to declare functions 1 and 0 as shorthands for true and false, but as you can see, I ran into a problem with using numeric names for functions. The same behavior occurs with aliases and with two-digit names.

The question is "why"? Is it mandated by POSIX, or is it just a quirk of Bourne-like shells?

See also the related question.

Sergiy Kolodyazhnyy
  • 1
    Things that are mandated by posix are mostly sometime quirks of Bourne-like shells. :P – muru Nov 19 '17 at 02:52
  • I see what you did there . . . >:) lol – Sergiy Kolodyazhnyy Nov 19 '17 at 02:53
  • 4
    @fkraiem Please refer to https://meta.askubuntu.com/q/13807/295286 Shells/shell scripting are and always have been on topic on Ask Ubuntu :) It's an essential topic to proper system administration of any Ubuntu system – Sergiy Kolodyazhnyy Nov 19 '17 at 06:02
  • 4
    It's worth noting that 0 is true in shell scripting and 1 is false (really, any non-zero is treated as false), in case anyone reading this is unaware. This is backwards from most other programming languages. – Soron Nov 19 '17 at 06:57
  • 2
    @EthanKaminski yep, as far as exit statuses of commands , that's absolutely true. Return value of 0 is true in shell. However, in arithmetic expansion $((...)) return statuses are flipped - 1 is true and 0 is false for consistency with C-language syntax . Try for example bash -c 'echo $((1==1));echo $((1==2))' What I was trying to do outside of this question was actually "reverse" the behavior. See the last example on my answer here to see what exactly I was trying to do. Silly idea, but nonetheless works – Sergiy Kolodyazhnyy Nov 19 '17 at 07:03
  • @SergiyKolodyazhnyy oh, now that's an interesting wrinkle I hadn't encountered before. I know about command substitution, but $((...)) is new to me. Good luck to you, then! – Soron Nov 19 '17 at 07:50

3 Answers

14

POSIX says:

2.9.5 Function Definition Command

A function is a user-defined name that is used as a simple command to call a compound command with new positional parameters. A function is defined with a "function definition command".

The format of a function definition command is as follows:

 fname ( ) compound-command [io-redirect ...]

The function is named fname; the application shall ensure that it is a name (see XBD Name) and that it is not the name of a special built-in utility. An implementation may allow other characters in a function name as an extension. The implementation shall maintain separate name spaces for functions and variables.

And:

3.235 Name

In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.

Note: The Portable Character Set is defined in detail in Portable Character Set.

So a word beginning with a digit cannot be a function name.
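As a rough illustration of the Name rule quoted above, here is a minimal sketch of a check written in POSIX sh. The function name `is_posix_name` and the sample candidates are hypothetical, chosen only for this example. It tests whether a word fits the definition, not whether a particular shell accepts it as an extension.

```shell
#!/bin/sh
# is_posix_name (hypothetical helper): succeed if $1 is a Name as defined
# in XBD 3.235: only underscores, digits, and alphabetics from the portable
# character set, and not starting with a digit.
# The body runs in a subshell so that LC_ALL=C does not leak to the caller.
is_posix_name() (
    LC_ALL=C
    case $1 in
        '' | [0-9]* | *[!A-Za-z0-9_]*) exit 1 ;;  # empty, leading digit, or bad character
        *) exit 0 ;;
    esac
)

# Try a few candidate function names.
for candidate in 1 0 1a _ok true_; do
    if is_posix_name "$candidate"; then
        echo "$candidate: valid name"
    else
        echo "$candidate: not a valid name"
    fi
done
```

A shell that accepts names rejected by this check, as mksh does with `1` in the question, is using the extension that POSIX explicitly permits.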

muru
  • Still POSIX doesn't quite say why, but I'll take it as "Because standards" answer. Thanks – Sergiy Kolodyazhnyy Nov 19 '17 at 03:06
  • 4
    @SergiyKolodyazhnyy I'd say it's an inherited thing. This standard for names is fairly common in other things as well (IIRC C names also follow the same standard), so probably it's a Unix thingy. Also, in C, it makes it easier to parse – muru Nov 19 '17 at 03:08
  • 3
    @muru in C it'd bring some ambiguity if it were allowed. E.g. what would 1L mean? A function name? Or a long int literal? – Ruslan Nov 19 '17 at 08:46
  • 2
    Adding to above, it's worth noting that in C, a bare function name can act as a pointer to that function. This allows you to pass functions as parameters to a function, store references to them in variables, etc. Often used for callbacks. This in contrast to the name of the function followed by (), possibly with arguments inside, which denotes a call to the function in question (and takes on the value that is returned by the function being called). So if you have a function int f() { return 42; } in C, f is valid in a pointer context, and f() is valid in a non-pointer, integer context. – user Nov 19 '17 at 19:27
  • Bash functions can begin with numbers. E.G: function 1a { echo "called me!";} – David Farrell Jun 21 '21 at 02:24
13

This is a common convention in many languages, meant to prevent confusion between numeric literals and the names of variables, functions, or methods.

Consider:

var 1 = 100

print 1*10 //should return 10 but would instead return 1000

var x = 5
x += 1
print x //returns 105, not 6    

def 100(num)
  return num * 1000
end

var y = 10 + 100(10)
print y // returns 100010 instead of 1010

As you can see, if numbers were allowed as variable or function names, doing math later in a program could become very confusing, and you would need creative workarounds whenever you actually wanted to do arithmetic with those numbers. It can also produce unexpected results in some languages. Imagine you are incrementing a loop counter, but one of the numbers involved is already a variable holding a string: the program would immediately throw an error. If you were not the original author of the code, that error could take quite a while to track down.

In a nutshell, this is why most languages do not allow you to use a number as the name of a variable, function, or method.
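The closest shell analogue is probably arithmetic expansion, where a variable name may appear without a leading `$`. The following is a small sketch, assuming a dash- or bash-like shell; the variable names `x`, `sum`, and `verdict` are made up for the example. Inside `$(( ))` a bare name is read as a variable and a bare digit string as a literal, which only works unambiguously because names cannot be made of digits.

```shell
#!/bin/sh
x=5

# Inside $(( )), "x" is looked up as a variable and "1" is a numeric literal.
sum=$((x + 1))
echo "$sum"

# "1=100" is not treated as an assignment here, because 1 is not a valid
# name; the shell tries to run a command called "1=100" instead.
# (Assumption: dash or bash. zsh, for one, does treat this as an assignment
# to the first positional parameter.)
if ( 1=100 ) 2>/dev/null; then
    verdict=accepted
else
    verdict=rejected
fi
echo "assignment to 1: $verdict"
```

Under dash or bash, the first `echo` should print 6 and the assignment should be rejected.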

Josh
  • I was going to comment "but variables in shell need to be prepended with $ for being expanded" but then again if shell is inspired by other languages, that's OK I guess, plus in arithmetic expansion (( variables aren't required to have leading $. OK, I can understand that – Sergiy Kolodyazhnyy Nov 19 '17 at 04:22
  • 3
    Yes, it's a convention because it will create unexpected results in almost every language, but, even in a situation where it would work, it can make code very difficult to comprehend for anyone who would have to work on your code after you. – Josh Nov 19 '17 at 04:27
  • 2
    @SergiyKolodyazhnyy arithmetic expansion allows referencing variable names without $, so there's that. But that's probably secondary to the other reason of "just following common convention" – hobbs Nov 19 '17 at 07:03
  • @hobbs yep, absolutely agree with that – Sergiy Kolodyazhnyy Nov 19 '17 at 07:06
  • 1
    This "most languages" explanation makes little sense for shells. All Bourne-style shells have parameters whose names look like numeric literals or operators: 0 for the shell or script name, 1, 2, ..., for positional parameters, * for them joined, - for enabled options, ? for the last exit status, and ! for the most recent asynchronous job's PID. (Thus, even in $(( )), the $ is often needed.) The code shown here to demonstrate the suggested rationale isn't a script for any Bourne-style shell, since there's no way to demonstrate it that way, as it doesn't apply to them. – Eliah Kagan Nov 19 '17 at 13:57
  • This is the best answer because it answers why you don't want to do it. – FKEinternet Nov 20 '17 at 02:35
10

In C, consider an expression like:

1000l + 2.0f;

Is 1000l a variable or a constant? Because variable names cannot begin with a digit, it has to be a constant. This makes parsing easier and stricter (typos like 1000k can be easily caught). It's also easier to have a single rule for variable and function names, since functions can be treated as variables too. Of course, parsers are now far more complex and powerful, and we have things like custom literals in C++. But back in those ancient days of prehistory, sacrificing a bit of unnecessary flexibility could make your compilation (or interpretation) times much shorter (and people still complain about C++ compilation times).

And you can see the effects of C's influence throughout the shell language, so it's not surprising that the Bourne shell (or C shell), and hence POSIX, restricted the class of allowed names to the same one that C uses.

Olorin
  • 2
    As explanations go, this is the correct one. It's not that the same considerations apply to all languages or that Bourne-style shells have a syntax similar to that of C, but that the culture associated with C was strong and shell designers had to come up with some guarantees as to what identifiers would be allowed. I think this could use examples of "you can see the effects of a C influence throughout the shell language," as similarities between them really don't outweigh the differences (think about what numbers are considered true, for example). Nonetheless, this answer is correct. – Eliah Kagan Nov 20 '17 at 04:46
  • @Eliah arguably the numbers thing is also a direct consequence of the exit statuses for success and failure in C, so I tend to think in terms of success and failure instead of true and false when writing shell tests. You're right in that plenty of shell syntax is nothing like C, but examples include the braces, the semicolons, the short-circuiting && and ||, the lack of support for the ASCII nul in strings, – Olorin Nov 20 '17 at 05:41