awk or sed command to replace line break plus text containing spaces

Question

An answer to another question suggests sed -i 's/original/replacement/g' file.txt to replace specific words in a text file. My starting situation looks like this:

        Item: PRF
        Type: File
        Item: AOX
        Type: Folder
        Item: DD4
        Type: File

My ending situation should look like this:

        Item: PRF^Type: File
        Item: AOX^Type: Folder
        Item: DD4^Type: File

Notes: (1) The Ask Ubuntu interface seems to suppress some of the leading spaces before Item: and Type:. There are in fact eight leading spaces. (2) I may have erred in using simplistic examples of Item. The items are actually partial Windows paths (lacking e.g., D:), some of which are quite long. A more accurate example would be Item: Folder\Some Folder\A file name.txt.

I've tried this, with and without double quotes:

sed -i 's/\n"        Type: "/\^"Type: "/g' file.txt

That gives me no errors, but also no changes. Also tried this:

awk '/ "        Item: " / { printf "%s", $0"^" } / "        Type: " / { gsub(/^[ \t]+/,"",$0); print $0 }' source.txt

I tried that to verify that I would be changing only those entries with eight blank spaces before "Item." That didn't work. Trying it with no spaces and no double quotes, as in the answer (below), also failed. Trying it with gawk -i inplace produced source.txt containing zero bytes.

My title initially specified sed. An answer proposing awk alerted me to that alternative, which (now that I'm looking at it) seems more capable. But I cannot figure out how to make it work.

"but also no changes" .. do you mean changes in the file? If you want in-place changes a la sed -i, you'd need to use GNU awk with the -i inplace option — muru, Jan 16 '23 at 12:00
Ah. I thought one answer (below) was saying that GNU awk was the default in Ubuntu. Apparently I misunderstood that: https://askubuntu.com/a/1420570/80644. With sudo apt install gawk the -i inplace option did modify source.txt, though with undesirable results (see edited question, above). — Ray Woodcock, Jan 16 '23 at 12:18
I don't remember if it's the default or not, but anyway, the command in the answer works for me, but your post has some weirdness: / " Item: " /, / " Type: " / - these don't match anything in the input file you have shown, so nothing gets printed, so your input file is replaced with nothing. — muru, Jan 16 '23 at 12:22
"The items are actually partial Windows paths" ... Was your input file edited on Windows at some point? ... If yes, then it might have \r\n line endings (Windows-style newlines, with a carriage return before each line feed), and you need to run it through e.g. dos2unix file to correct that before processing it with either sed or awk — Raffa, Jan 16 '23 at 12:36
Also see: https://askubuntu.com/editing-help#code for how to code format properly (either indent by 4 spaces or wrap with triple-backticks) — muru, Jan 16 '23 at 13:45
The Windows-style newline was the problem. To fix it, I opened the file in gedit and used Save As to change the line ending from Windows to Unix/Linux. — Ray Woodcock, Jan 16 '23 at 21:41
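
For readers who prefer to stay in the terminal, the carriage-return diagnosis from the comments can be checked and fixed without an editor. This is only a sketch: the sample file is created on the spot and is hypothetical, and dos2unix (mentioned above) does the same job if it is installed.

```shell
# Build a small sample with Windows (CRLF) line endings; the content is made up.
tmp=$(mktemp)
printf '        Item: PRF\r\n        Type: File\r\n' > "$tmp"

# Hold a literal carriage return in a variable so the patterns stay portable.
cr=$(printf '\r')

# Count lines ending in a carriage return; a non-zero count means CRLF endings.
before=$(grep -c "$cr\$" "$tmp" || true)

# Strip the trailing carriage return from every line, writing to a new file.
sed "s/$cr\$//" "$tmp" > "$tmp.unix"
after=$(grep -c "$cr\$" "$tmp.unix" || true)

echo "CRLF lines before: $before, after: $after"
rm -f "$tmp" "$tmp.unix"
```

Once the carriage returns are gone, patterns that expect Type: File to end the line (or a newline to follow the text directly) can match as intended.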

steeldriver · Answer 1 · 2023-01-17T00:21:20.910

By default, sed only loads one line at a time into its pattern space. You can use the N command to load another line.

In fact, your question is a variant of a well-known "one-liner" for joining lines based on the initial character(s) of the following line¹:

40. Append a line to the previous if it starts with an equal sign "=".
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'

So given

$ cat file.txt
    Item: PRF
    Type: File
    Item: AOX
    Type: Folder
    Item: DD4
    Type: File

(which has 4 initial spaces), then

$ sed -E -e :a -e '$!N;s/\n {4}Type: (File|Folder)/^Type: \1/; ta' -e 'P;D' file.txt
    Item: PRF^Type: File
    Item: AOX^Type: Folder
    Item: DD4^Type: File

Add -i or -i.bak to edit the file in place once you are happy that it is doing the right thing.
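
As a sketch of how the same one-liner might be adapted to the question's real data (eight leading spaces, items that are paths containing spaces), the version below matches only the start of the Type: line instead of listing its possible values. The sample data is made up, and the file is assumed to have Unix line endings.

```shell
# Sample input with eight leading spaces and a path containing spaces,
# as described in the question (the data itself is hypothetical).
input=$(mktemp)
cat > "$input" <<'EOF'
        Item: Folder\Some Folder\A file name.txt
        Type: File
        Item: AOX
        Type: Folder
EOF

# :a   label that the loop branches back to
# $!N  unless on the last line, append the next input line to the pattern space
# s//  if the appended line starts with eight spaces and "Type: ", join it with "^"
# ta   if the substitution succeeded, branch back to :a
# P;D  print and delete the first line of the pattern space, then start over
out=$(sed -E -e :a -e '$!N;s/\n {8}Type: /^Type: /;ta' -e 'P;D' "$input")
printf '%s\n' "$out"
rm -f "$input"
```

With this sample, each Item: line keeps its eight leading spaces and gains ^Type: ... at the end; the backslashes and spaces in the path are left untouched because they are data, not part of the pattern.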

Alternatively, you could use the following non-streaming ed editor script to match the Type: lines, substitute ^ for the leading spaces, then join to the preceding line, writing the result back to the same file:

g/^ \{4\}Type:/s//^Type:/\
-1,.j
wq

You can implement that as a non-interactive shell one-liner:

printf '%s\n' 'g/^ \{4\}Type:/s//^Type:/\' '-1,.j' 'wq' | ed -s file.txt

See The GNU ed line editor for details.

¹ See, for example, Sed One-Liners Explained, Part I: File Spacing, Numbering and Text Conversion and Substitution.

@RayWoodcock :a sets a label for the conditional branch ta. See Commands for sed gurus — steeldriver, Jan 17 '23 at 13:05

Raffa · Accepted Answer · 2023-01-16T14:03:43.470


I would use awk … It is a straightforward one-liner like so:

awk '/Item:/ { printf "%s", $0"^" } /Type:/ { gsub(/^[ \t]+/,"",$0); print $0 }' file

That is … if the line has Item: in it, print it without appending a newline (printf doesn't append a newline by default) but append the ^ character at the end … and if the line has Type: in it, remove all leading spaces and tabs and print it with a trailing newline (print appends a newline by default).

The above command will not modify the original file; it only prints the modified text to the terminal.
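
If the input may have come from Windows, a slightly more defensive variant of the same idea might look like the sketch below. It is not the command from this answer: it additionally strips a trailing carriage return from each line and anchors the patterns to the start of the line, so that a path which happens to contain Item: or Type: is not misread. The sample input is hypothetical.

```shell
# Hypothetical sample with CRLF endings and eight leading spaces.
out=$(printf '        Item: Folder\\Some Folder\\A file name.txt\r\n        Type: File\r\n' |
  awk '
    { sub(/\r$/, "") }                            # drop a trailing carriage return, if present
    /^[ \t]*Item:/ { printf "%s^", $0; next }     # print the Item line plus "^", with no newline
    /^[ \t]*Type:/ { sub(/^[ \t]+/, ""); print }  # strip leading blanks, then print with a newline
  ')
printf '%s\n' "$out"
```

The next after the Item: rule simply skips the remaining rules for that line; it is a design choice for clarity rather than a requirement.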

To edit the original file in place, use the -i inplace option of GNU awk (it might be the default on Ubuntu ... check with awk -W version); if it is not, you can install gawk and then use it like so:

gawk -i inplace '/Item:/ { printf "%s", $0"^" } /Type:/ { gsub(/^[ \t]+/,"",$0); print $0 }' file
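
Because an in-place edit replaces the file with whatever the program prints, a pattern that matches nothing leaves an empty file. One cautious alternative, sketched here with plain awk and a temporary file (the file names are stand-ins), is to replace the original only when the output is non-empty:

```shell
# Stand-in for the real file; the content is made up.
file=$(mktemp)
printf '    Item: PRF\n    Type: File\n' > "$file"

# Write the result to a temporary file first.
tmp=$(mktemp)
awk '/Item:/ { printf "%s", $0"^" } /Type:/ { gsub(/^[ \t]+/,"",$0); print $0 }' "$file" > "$tmp" &&
  [ -s "$tmp" ] &&      # -s: the temporary file exists and is not empty
  mv "$tmp" "$file"     # only now overwrite the original

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```

If the patterns match nothing, the mv never runs and the original file is left as it was, which makes it easier to notice and debug the mismatch.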

edited Jan 16 '23 at 14:03

answered Jan 14 '23 at 12:11


This is a wilderness to me. (1) Would I be better advised to use printf "%s", $0 (see https://stackoverflow.com/a/46455937/711879)? (2) If printf $0 doesn't append a newline in the first part, why does it append a newline in the second part? (3) I think [ \t] refers to any occurrence of space or tab, but what do / and + characters do in gsub(/^[ \t]+/,"",$0)? – Ray Woodcock Jan 14 '23 at 18:22

(1) Yes, printf "%s", $0"^" would be a better safety measure … (2) It’s print (not printf) in the second part … (3) The slashes in /.../ delimit a regular expression, and + matches one or more consecutive occurrences of the bracket expression [ \t] that precedes it – Raffa Jan 14 '23 at 19:48
Very helpful. Thank you. Clarification on print vs. printf: https://en.wikibooks.org/wiki/An_Awk_Primer/Output_with_print_and_printf. Follow-up question regarding + : doesn't gsub (as distinct from sub) already match multiple occurrences within the specified string - or is that defeated by ^ ? Anyway, I wasn't successful so far. Editing the question to update. – Ray Woodcock Jan 15 '23 at 01:17

@RayWoodcock "doesn't gsub (as distinct from sub) already match multiple occurrences within the specified string?" ... It does if you don't anchor the regex to the beginning of the line with ^ (only one space or tab can ever sit at the very beginning of the line and satisfy that condition ... hence the +) ... It's worth mentioning that in your case sub(/^[ \t]+/,"",$0) is an alternative option too ... Also, please notice that we only see your provided example input and expected output and write our answers to help you achieve just that ... We don't see the other context you see :-) – Raffa Jan 15 '23 at 09:16
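
The point about the anchor and the + can be tried directly. The sketch below feeds one made-up line (two spaces, a tab, a space, then Type: File) through three variants and prints the length of each result:

```shell
out=$(printf '  \t Type: File\n' | awk '{
  a = $0; b = $0; c = $0
  sub(/^[ \t]+/, "", a)    # anchored, one or more: removes the whole leading run
  gsub(/^[ \t]+/, "", b)   # same effect, since ^ can match only once per line
  gsub(/^[ \t]/, "", c)    # without the +: only the first blank is removed
  printf "%d %d %d\n", length(a), length(b), length(c)
}')
printf '%s\n' "$out"
```

The first two lengths are 10 (just Type: File), while the third is 13, because the anchored pattern without + can consume only the single character at the start of the line.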
