$ (echo hello; echo there) | sed ':a;$!N;s/\n/string/;ta'
hellostringthere

The above sed command replaces the newline character with the string "string". But I don't know the meaning of :a;$!N;s/\n/string/;ta within the single quotes. I understand the middle part, s/\n/string/, but not the function of the first part (:a;$!N;) or the last part (ta).

terdon
Avinash Raj

2 Answers


These are the, admittedly cryptic, sed commands. Specifically (from man sed):

: label
         Label for b and t commands.

t label
         If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script.

n N
         Read/append the next line of input into the pattern space.
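To see each of these commands on its own, here is a small sketch with made-up input. The first function loops with `:a`/`ta` until the substitution stops succeeding; the second shows `N` joining pairs of lines. Each command is passed with a separate `-e` so the label is terminated by a newline, which should keep it portable to seds that do not end a label at `;`.

```shell
#!/bin/sh
# :a defines label a; ta jumps back to it only while the preceding
# s/// keeps succeeding, so this loops until no "a" is left.
loop_demo() {
  printf 'aaa\n' | sed -e ':a' -e 's/a/b/' -e 'ta'
}

# N appends a newline plus the next input line to the pattern space,
# so the s/// below sees two lines at once (even line count on purpose).
pair_demo() {
  printf 'a\nb\nc\nd\n' | sed -e 'N' -e 's/\n/+/'
}

loop_demo   # prints: bbb
pair_demo   # prints: a+b, then c+d
```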

So, the script you posted can be broken down into (spaces added for readability):

sed ':a;  $!N;  s/\n/string/;  ta'
     ---  ----  -------------  --
      |     |        |          |--> if a substitution was made, go back (`t`) to `a`
      |     |        |-------------> substitute newlines with `string`
      |     |----------------------> If this is not the last line (`$!`), append the 
      |                              next line to the pattern space.
      |----------------------------> Create the label `a`.
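The same script can also be written with one command per line, which leaves room for a comment on each step. This is a sketch; `join_with_string` is a made-up name. Putting every command on its own line also sidesteps seds (the BSD/macOS one, for instance) that read a label name up to the end of the line, semicolons included.

```shell
#!/bin/sh
# The one-liner ':a;$!N;s/\n/string/;ta', one command per line.
join_with_string() {
  sed '
    # create the label a
    :a
    # unless this is the last line, append "\n" plus the next line
    $!N
    # replace the (first) embedded newline with "string"
    s/\n/string/
    # if that substitution succeeded, jump back to label a
    ta
  '
}

printf 'hello\nthere\n' | join_with_string   # prints: hellostringthere
```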

Basically, what this is doing could be written in pseudocode as

while (not last line){
    append next line to the pattern space and replace \n with 'string'
}
}
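One way to watch that loop run is to add the POSIX `l` command, which prints the pattern space with embedded newlines shown as `\n` and the end marked with `$`. In this sketch `-n` plus a final `p` keep the normal output to a single copy; the input and the name `trace_join` are only for illustration. The last pass shows `$!N` being skipped, the substitution failing and `t` falling through.

```shell
#!/bin/sh
# Print the pattern space (via l) right after each $!N step.
trace_join() {
  printf 'line1\nline2\nline3\n' |
    sed -n -e ':a' -e '$!N' -e 'l' -e 's/\n/string/' -e 'ta' -e 'p'
}

trace_join
# prints:
# line1\nline2$
# line1stringline2\nline3$
# line1stringline2stringline3$
# line1stringline2stringline3
```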

You can understand this a bit better with a more complex input example:

$ printf "line1\nline2\nline3\nline4\nline5\n" | sed ':a;$!N;s/\n/string/;ta'
line1stringline2stringline3stringline4stringline5

I am not really sure why the $! is needed. As far as I can tell (at least with GNU sed), you can get the same output with

printf "line1\nline2\nline3\nline4\nline5\n" | sed ':a;N;s/\n/string/;ta'
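The `$!` itself is an address (`$`, the last line) followed by `!`, which negates it, so `$!N` runs `N` on every line except the last. A quick sketch of the same negation with `d`, using made-up input and hypothetical function names:

```shell
#!/bin/sh
# An address followed by ! selects every line the address does NOT match.
keep_2_to_3() { printf 'a\nb\nc\nd\n' | sed '2,3!d'; }  # deletes all but lines 2-3
keep_last()   { printf 'a\nb\nc\nd\n' | sed '$!d'; }    # deletes all but the last line

keep_2_to_3   # prints b, then c
keep_last     # prints d
```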
terdon
  • The !$ is meant not to match the last newline, IMO. – Braiam May 05 '14 at 16:09
  • @Braiam not too sure about that, it's $! not !$. However, it might also be !N and not $!. – terdon May 05 '14 at 16:19
  • I was trying to parse the texinfo page but didn't find references to either !N or $!. So I still think it is checking whether the last line is a newline or EOF. – Braiam May 05 '14 at 16:23
  • I try to think of $! as an address 'range' with a postfix complement operator - so $!N (do N everywhere except for address $) is really the same syntax as something like m,n!d (delete everything except lines m to n). – steeldriver May 05 '14 at 17:31
  • : is the analog of a goto label, and in fact : was used to mark goto labels in the Thompson shell, so it would have been familiar to people back in the day who used both sed and the Thompson shell – Sergiy Kolodyazhnyy Aug 07 '18 at 12:59
  • When does appending the next line to the pattern space happen? Does the substitution happen on just the new line before it is appended, or on the concatenated string afterwards (less efficient, since it tries to substitute the same beginning all over again)? – msciwoj Sep 04 '18 at 10:26
  • @msciwoj they happen in the order they are written. That's why it works. If the substitution were only done on the original line, before concatenating, then it would only ever remove one \n from the first line and there would be no point in concatenating. – terdon Sep 04 '18 at 10:30
  • @terdon I was rather thinking it would take place on the new line being appended (before appending). Imagine 10 lines of 100 chars each: if the append happens before the substitution, there should be a huge performance cost (first substituting on 100 chars, then on 200 chars, then on 300 chars and so on, each time going through the beginning of the string that has already been substituted). Is that how this works? – msciwoj Sep 04 '18 at 10:56
  • @msciwoj I see what you mean. I think that is indeed how it works, but only because I am assuming the operations happen in the order in which they are written. My previous comment was wrong, it could also work by substituting first and appending later. You make a good point. This might be worth its own question, either here or on Unix & Linux. – terdon Sep 04 '18 at 11:11
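Since sed runs commands in the order written, `N` appends first and `s/\n/string/` then searches the pattern space from its start, so each pass does re-read the text joined so far. One common way around that (a different idiom from the one in the question, and not benchmarked here) is to collect every line first and substitute once with the `g` flag. A sketch, where `join_all` is a made-up name:

```shell
#!/bin/sh
# Collect the whole input first, then do ONE global substitution, so the
# already-joined text is not rescanned on every pass. The $!{...} guard
# also keeps N from running on the last line (matters for one-line input).
join_all() {
  sed '
    :a
    $!{
      N
      ba
    }
    s/\n/string/g
  '
}

printf 'line1\nline2\nline3\n' | join_all   # prints: line1stringline2stringline3
```

Like the original script, this keeps the whole input in the pattern space, so it is only suited to input that fits comfortably in memory.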

I am posting this answer because I see a lot of confusion about why the last line is excluded when executing N (through the line address $!), and because the OP was confused about the meaning of :a;$!N; in sed commands in general, not only in the specific one he posted.

Well, the benefit of using $!N instead of N is not evident in the examples proposed (by the OP and by @terdon), since no "important" (keep reading) command is performed on the last line after the N command. (Indeed, the result is the same if one strips that line address off.)

In a more complex example (for instance, substituting a sentence in a file where its two words sometimes appear on one line and sometimes on two lines), excluding the last line from the N command could be crucial! If the last line is not excluded, then when N is executed on it, sed hits EOF and exits immediately, preventing all subsequent commands (branching commands as well, namely t and b) from being executed.
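A minimal sketch of the difference, using made-up input and an extra `s/$/!/` as the "important" command that should run once the joining is done. With `$!N` the loop ends normally on the last line and the final command still runs; with plain `N`, sed stops at the failed `N` and the final command is never reached.

```shell
#!/bin/sh
# With $!N: the loop exits normally and s/$/!/ still runs.
with_guard() {
  printf 'a\nb\nc\n' | sed '
    :a
    $!N
    s/\n/string/
    ta
    s/$/!/
  '
}

# With plain N: N eventually runs with no next line, sed stops there and
# s/$/!/ never runs. GNU sed then prints the pattern space as it is
# (astringbstringc); a sed that follows POSIX strictly prints nothing.
without_guard() {
  printf 'a\nb\nc\n' | sed '
    :a
    N
    s/\n/string/
    ta
    s/$/!/
  '
}

with_guard      # prints: astringbstringc!
without_guard
```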

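In the overly simple examples shown, we can safely remove $! and let N fail and sed exit, since the skipped s command would do nothing even if it were executed: there is no \n left to match. (This relies on GNU sed, which prints the pattern space when N finds no next line; a sed that follows POSIX strictly exits without printing it.)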

Enlico