I'm trying to parse the contents of an HTML file to scrape a download directory. I've reduced my command to an MWE that reproduces the issue:
sed -e 's|\(href\)|\1|' index.html
This prints the entirety of index.html. I originally thought the problem was my expression, but this very basic expression behaves the same way, so that can't be it.
The same happens if I remove -e or if I add g at the end.
It's been a while since I've used sed. Am I doing something wrong here? Is sed getting confused by the characters in an HTML file?
grep is the command to go with. – Ravexina Mar 21 '19 at 20:32
(or, ) does not change the behaviour – Brydon Gibson Mar 21 '19 at 20:36
grep -o so grep prints only the matched (non-empty) parts of a matching line. – Ravexina Mar 21 '19 at 20:52
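To illustrate the grep -o suggestion, here is a minimal sketch. The sample page, the file names in it, and the href pattern are assumptions made up for the demonstration, not taken from the asker's real index.html. It also shows the likely reason the original command printed everything: sed prints every input line by default, and replacing href with itself changes nothing.

```shell
# Hypothetical sample page standing in for the real index.html.
tmp=$(mktemp -d)
cat > "$tmp/index.html" <<'EOF'
<html><body>
<a href="file1.tar.gz">file1</a>
<p>no link here</p>
<a href="file2.tar.gz">file2</a>
</body></html>
EOF

# The command from the question: sed prints every line, matched or not,
# and the substitution replaces href with href, so the file comes out unchanged.
sed_default=$(sed -e 's|\(href\)|\1|' "$tmp/index.html")

# grep -o prints only the matched part of each matching line.
grep_out=$(grep -o 'href="[^"]*"' "$tmp/index.html")

# A sed alternative: -n suppresses the automatic printing, and the p flag
# prints only lines where the substitution succeeded.
# Assumes at most one link per line (the leading .* is greedy).
sed_out=$(sed -n 's|.*href="\([^"]*\)".*|\1|p' "$tmp/index.html")

printf '%s\n' "$grep_out" "$sed_out"
rm -r "$tmp"
```

For anything beyond a quick scrape, a real HTML parser is usually more robust than regular expressions, since attributes can span lines or use single quotes.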