I have a CSV file with two columns (and a header row) where each element is a number between 0 and 199. I want to convert each number to its corresponding URL. Here is an example:

41,51

should become:

http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/41.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/51.jpg

Here is the list.csv file I want to convert:

$ head list.csv
imageA,imageB
41,51
172,100
99,149
83,72
84,160
186,8
93,198
150,21
63,102
Mona Jalal

3 Answers

Using sed:

sed -r 's#^([0-9]+),([0-9]+)$#http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/\1\.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/\2\.jpg#' input-file
  • Redirect the output to a new file with > output-file, or use the option -i.bak to edit the file in place and keep a backup copy.
  • -r, --regexp-extended - use extended regular expressions in the script.
  • The s command means substitute: s#<string-or-regexp>#<replacement>#.
  • # is used as the delimiter. Usually / plays this role, but here the <replacement> contains many slashes, and with # as the delimiter we do not need to escape each of them.
  • ^ matches the beginning of the line and $ matches the end of the line, so only lines that consist of exactly two comma-separated numbers are rewritten.
  • [0-9]+ matches one or more digits.
  • Within the <replacement>, the capture groups ([0-9]+) are available as \1 and \2.
  • \. in the <replacement> produces a literal dot. The backslash is optional there, because the dot has no special meaning in the replacement part.
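To see what the ^ and $ anchors do, here is a minimal sketch that runs the anchored command on a small sample. The temporary file, the variable names and the malformed third row (41,51,7) are made up for illustration, and -E is used as the portable spelling of -r:

```shell
# Hypothetical sample: the header, one valid row, and one malformed row
sample=$(mktemp)
printf '%s\n' 'imageA,imageB' '41,51' '41,51,7' > "$sample"

base='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/'

# Anchored pattern: a line is rewritten only if it is exactly "<digits>,<digits>"
result=$(sed -E "s#^([0-9]+),([0-9]+)\$#${base}\1.jpg,${base}\2.jpg#" "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The header and the malformed row pass through unchanged, and only the 41,51 row is turned into two URLs.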

Here is a simplification proposed by @dessert:

sed -r 's#([0-9]+)#http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/\1\.jpg#g' input-file
  • here we assume the file format is homogeneous, as it is in the example, and we do not need to match the whole line.
  • the g flag (at the end) repeats the substitution for each occurrence of the matched regex to the end of the line.
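The effect of the g flag is easy to check on a single line. This is only a sketch; the variable names are made up for illustration:

```shell
base='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/'
line='41,51'

# Without g: only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed -E "s#([0-9]+)#${base}\1.jpg#")

# With g: every match on the line is replaced
all=$(printf '%s\n' "$line" | sed -E "s#([0-9]+)#${base}\1.jpg#g")

printf '%s\n' "$first_only" "$all"
```

Without g the second number stays as a bare 51. Note also that the unanchored pattern rewrites any run of digits, so a header such as image1,image2 would be changed too; the header in the question contains no digits, so it is left alone.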

It is also possible to use variables for the base URL and the file extension:

URL='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/'; EXT='.jpg'
sed -r "s#([0-9]+)#$URL\1$EXT#g" input-file
  • Note: double quotes are used here so that the shell expands $URL and $EXT before sed sees the script.
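The difference between the two quoting styles can be sketched like this (the names single and double are made up for illustration):

```shell
URL='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/'; EXT='.jpg'

# Single quotes: the shell passes $URL and $EXT to sed literally
single=$(printf '41\n' | sed -E 's#([0-9]+)#$URL\1$EXT#g')

# Double quotes: the shell expands the variables first
double=$(printf '41\n' | sed -E "s#([0-9]+)#$URL\1$EXT#g")

printf '%s\n' "$single" "$double"
```

With single quotes the output is the literal text $URL41$EXT. One caveat of this approach: the variable contents become part of the sed script, so a value containing #, & or a backslash would need escaping first.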
pa4080

I'd probably use awk, e.g.

awk -F, -v baseurl='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/' '
  FNR>1 {printf("%s%d.jpg,%s%d.jpg\n", baseurl, $1, baseurl, $2)}
' list.csv
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/41.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/51.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/172.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/100.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/99.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/149.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/83.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/72.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/84.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/160.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/186.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/8.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/93.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/198.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/150.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/21.jpg
http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/63.jpg,http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/102.jpg
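The FNR>1 condition is what keeps the header out of the result: %d formats a non-numeric field such as imageA as 0. A small sketch of the difference, using a made-up two-line sample file and made-up variable names:

```shell
# Hypothetical sample: the header plus one data row
sample=$(mktemp)
printf '%s\n' 'imageA,imageB' '41,51' > "$sample"
base='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/'

# With the guard: record 1 (the header) is skipped
guarded=$(awk -F, -v baseurl="$base" 'FNR>1 {printf("%s%d.jpg,%s%d.jpg\n", baseurl, $1, baseurl, $2)}' "$sample")

# Without the guard: the header fields are formatted with %d and come out as 0
unguarded=$(awk -F, -v baseurl="$base" '{printf("%s%d.jpg,%s%d.jpg\n", baseurl, $1, baseurl, $2)}' "$sample")

printf '%s\n' "$guarded" "$unguarded"
rm -f "$sample"
```

Without the guard an extra first line ending in 0.jpg,...0.jpg appears in the output.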
steeldriver

Here is some shell code for you:

firstline=true
url_before_id='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/'
url_after_id='.jpg'
# word splitting yields one CSV row per iteration (the rows contain no spaces)
for id in $(cat list.csv)
do
  if $firstline;then
    # print the header row unchanged
    firstline=false;echo "$id"
  else echo "$url_before_id${id%%,*}$url_after_id,$url_before_id${id##*,}$url_after_id"
  fi
done

or in one line:

firstline=true;url_before_id='http://www.cs.bu.edu/~betke/research/vc-crowd/MSCOCO/';url_after_id='.jpg';for id in $(cat list.csv);do if $firstline;then firstline=false;echo "$id";else echo "$url_before_id${id%%,*}$url_after_id,$url_before_id${id##*,}$url_after_id";fi;done
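The two parameter expansions in the loop do the actual splitting of each row. A minimal sketch of what they return for one row (the names left and right are made up for illustration):

```shell
id='41,51'

# %%,* removes the longest suffix matching ",*": what remains is the text before the first comma
left=${id%%,*}

# ##*, removes the longest prefix matching "*,": what remains is the text after the last comma
right=${id##*,}

printf '%s\n' "$left" "$right"
```

For a two-column row these are simply the first and the second field.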
pa4080
Boba Fit