
I have strings:

fvvDataFolders/DDB/DDB2018-02-21oM]
fbbDataFolders/DDB/DDB2018-02-22oM]

I want to extract the part that starts with Data and ends in what looks like a date, stripping everything else:

DataFolders/DDB/DDB2018-02-21
DataFolders/DDB/DDB2018-02-22

How can I do it?

Josef Klimuk

2 Answers


You can use the command grep in this way:

grep -oP 'Data.*[0-9]{4}-[0-9]{2}-[0-9]{2}' input-file > output-file
  • -o, --only-matching - show only the part of a line matching PATTERN.
  • -P, --perl-regexp - PATTERN is a Perl regular expression. In this case, -E, --extended-regexp (PATTERN is an extended regular expression, ERE) would work as well.
  • the regexp 'Data.*[0-9]{4}-[0-9]{2}-[0-9]{2}' matches your requirements: it begins with the string Data, followed by any number (*) of any characters (.), and ends with the date format: 4 digits from 0 to 9, a dash, 2 digits from 0 to 9, a dash, and 2 digits from 0 to 9.
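As the bullets say, -E should work as well as -P for this pattern, because it only uses {n} intervals and no Perl-only syntax. A minimal sketch, assuming GNU grep; the temporary file stands in for input-file and is filled with the two lines from the question:

```shell
#!/usr/bin/env bash
# Assumes GNU grep. A temp file stands in for the question's input-file.
tmp=$(mktemp)
printf '%s\n' 'fvvDataFolders/DDB/DDB2018-02-21oM]' \
              'fbbDataFolders/DDB/DDB2018-02-22oM]' > "$tmp"

# -o: print only the matched part; -E: extended regex, so {4} works without -P
out=$(grep -oE 'Data.*[0-9]{4}-[0-9]{2}-[0-9]{2}' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

This prints DataFolders/DDB/DDB2018-02-21 and DataFolders/DDB/DDB2018-02-22, the same as the -P version.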

Here is also a sed solution:

sed -r 's/^.*(Data.*[0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\1/' /tmp/input-file 
  • redirect the output to a new file with > output-file, or use the option -i.bak to edit the file in place and keep a backup copy of the original.
  • -r, --regexp-extended - use extended regular expressions in the script.
  • the command s means substitute: s/<string-or-regexp>/<replacement>/.
  • ^.* matches the beginning ^ of the line, followed by any number of any characters.
  • .*$ matches any number of any characters, up to the end $ of the line.
  • the capture group (...) is available in the replacement as \1, so the whole line ^.*$ is replaced by the part that matches what is inside the parentheses.
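To see the in-place option from the bullets above in action, here is a small sketch that runs the same sed command with -i.bak on a scratch copy of the question's data. It assumes GNU sed (for -r and the attached -i suffix); the temporary directory is only for the demonstration:

```shell
#!/usr/bin/env bash
# Assumes GNU sed. A scratch directory holds a copy of the question's input.
dir=$(mktemp -d)
printf '%s\n' 'fvvDataFolders/DDB/DDB2018-02-21oM]' \
              'fbbDataFolders/DDB/DDB2018-02-22oM]' > "$dir/input-file"

# -i.bak edits input-file in place and saves the original as input-file.bak
sed -r -i.bak 's/^.*(Data.*[0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\1/' "$dir/input-file"

edited=$(cat "$dir/input-file")
backup=$(cat "$dir/input-file.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

rm -rf "$dir"
```

After the run, input-file holds only the extracted paths, while input-file.bak still has the original lines.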
pa4080

Either

grep -P -o 'Data.+?\d\d\d\d-\d\d-\d\d'

or

perl -pe 's/^.+(Data.+?\d\d\d\d-\d\d-\d\d).+$/$1/'

will do. They both print the minimal string that starts with Data and ends in what looks like a date (YYYY-MM-DD).
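The "minimal" part comes from the lazy quantifier .+? and only matters when a line contains more than one date after Data. A sketch comparing it with the greedy .* pattern, assuming GNU grep built with PCRE support (-P); the sample line is hypothetical and deliberately has two dates:

```shell
#!/usr/bin/env bash
# Hypothetical sample line with two dates after "Data". Assumes GNU grep with -P.
line='xxDataA/2018-01-01/B/2018-02-02yy'

# Greedy .* runs on to the LAST date on the line
greedy=$(printf '%s\n' "$line" | grep -oP 'Data.*[0-9]{4}-[0-9]{2}-[0-9]{2}')
# Lazy .+? stops at the FIRST date after "Data"
lazy=$(printf '%s\n' "$line" | grep -oP 'Data.+?\d\d\d\d-\d\d-\d\d')

printf 'greedy: %s\nlazy:   %s\n' "$greedy" "$lazy"
```

Here the greedy pattern returns DataA/2018-01-01/B/2018-02-02, while the lazy one returns DataA/2018-01-01. For the two lines in the question, which have only one date each, both give the same result.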

echo "fvvDataFolders/DDB/DDB2018-02-21oM]" > input.txt
echo "fbbDataFolders/DDB/DDB2018-02-22oM]" >> input.txt
grep -P -o 'Data.+?\d\d\d\d-\d\d-\d\d' input.txt

# output:
DataFolders/DDB/DDB2018-02-21
DataFolders/DDB/DDB2018-02-22

perl -pe 's/^.+(Data.+?\d\d\d\d-\d\d-\d\d).+$/$1/' input.txt

# output:
DataFolders/DDB/DDB2018-02-21
DataFolders/DDB/DDB2018-02-22
PerlDuck