I'd like to take snapshots of a YouTube page (e.g. youtube.com/feed/trending).

Here's what I tried:

wget https://www.youtube.com/feed/trending --convert-links -E

The issue is with the video thumbnails. YouTube seems to serve empty GIF placeholders as thumbnails (later replaced with the real thumbnails? Not sure). In the case of trending, I see the thumbnails of the first 6 videos, but everything else is gray/missing. Is this JavaScript-related? Does anyone know a reliable way to snapshot a web page that works for complicated pages such as YouTube?

Thanks

1 Answer

I've no idea what you're using this for, but the proper way to grab that page's content is via the YouTube API. You can search by trending. It returns clean JSON responses that you can reshape however you like.
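As a rough sketch of what that could look like in Python: this assumes the YouTube Data API v3 `videos.list` endpoint with `chart=mostPopular` is the closest equivalent of the trending page, and that you have your own API key. The function names `most_popular_url` and `summarize` are made up for illustration, and no request is actually sent here.

```python
from urllib.parse import urlencode

# Assumption: videos.list with chart=mostPopular stands in for "trending".
API_URL = "https://www.googleapis.com/youtube/v3/videos"


def most_popular_url(api_key, region="US", max_results=25):
    """Build the request URL for the most popular videos in a region."""
    params = {
        "part": "snippet",
        "chart": "mostPopular",
        "regionCode": region,
        "maxResults": max_results,
        "key": api_key,  # your own key; none is bundled here
    }
    return API_URL + "?" + urlencode(params)


def summarize(response):
    """Reduce a decoded JSON response to rank, id, title and thumbnail URL."""
    rows = []
    for rank, item in enumerate(response.get("items", []), start=1):
        snippet = item.get("snippet", {})
        thumb = snippet.get("thumbnails", {}).get("high", {}).get("url")
        rows.append({
            "rank": rank,
            "id": item.get("id"),
            "title": snippet.get("title"),
            "thumbnail": thumb,
        })
    return rows
```

Fetching `most_popular_url(key)` with any HTTP client, decoding the body as JSON, and passing it to `summarize` gives a small list you can write to disk with a timestamp, which is already a snapshot of the ranking.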

The page itself looks like it uses the API, or perhaps a private version of it. The data is in the page; it's just deferred. Here's one of the images:

<img width="196" onload=";__ytRIL(this)" alt="" height="110" src="/yts/img/pixel-vfl3z5WfW.gif" data-thumb="https://i.ytimg.com/vi/Rqa9ph0cWSA/hqdefault.jpg?custom=true&amp;w=196&amp;h=110&amp;stc=true&amp;jpg444=true&amp;jpgq=90&amp;sp=68&amp;sigh=Vt5qpPXMxoaOiEG4ohSszdhmMJU" data-ytimg="1" >

Normally you'd be able to fix this image with a simple string replace, but YouTube changes the order of the attributes between refreshes. You need to process the HTML and copy each data-thumb attribute into src (and strip out a lot of other tags). Then you'd need to download those images (because wget won't have fetched them) and then convert the links.
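The attribute rewrite can be sketched in Python without depending on attribute order. This is only an illustration: it assumes the markup looks like the `<img>` tag quoted above (double-quoted attributes, no `>` inside attribute values), and `promote_thumbnails` and `thumbnail_urls` are hypothetical helper names. Downloading the images and rewriting the links to local paths is left out.

```python
import html
import re

# Each pattern is applied inside a single <img ...> tag, so the order of
# the attributes within the tag does not matter.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
DATA_THUMB = re.compile(r'\sdata-thumb="([^"]*)"')
SRC = re.compile(r'(\s)src="[^"]*"')
ONLOAD = re.compile(r'\sonload="[^"]*"')


def promote_thumbnails(page_html):
    """Replace each placeholder src with the URL held in data-thumb."""
    def fix_tag(match):
        tag = match.group(0)
        thumb = DATA_THUMB.search(tag)
        if thumb is None:
            return tag  # not a deferred image, leave it alone
        real_url = thumb.group(1)
        tag = DATA_THUMB.sub("", tag, count=1)
        tag = ONLOAD.sub("", tag, count=1)  # drop the lazy-load hook
        tag, replaced = SRC.subn(
            lambda m: f'{m.group(1)}src="{real_url}"', tag, count=1
        )
        if replaced == 0:
            # No src attribute at all: add one right after "<img".
            tag = tag[:4] + f' src="{real_url}"' + tag[4:]
        return tag

    return IMG_TAG.sub(fix_tag, page_html)


def thumbnail_urls(page_html):
    """List the real thumbnail URLs (entities decoded) to download."""
    return [
        html.unescape(url)
        for tag in IMG_TAG.findall(page_html)
        for url in DATA_THUMB.findall(tag)
    ]
```

Call `thumbnail_urls` on the original HTML first to get the list of images to fetch, then `promote_thumbnails` to produce the rewritten page. A real HTML parser would be more robust than regular expressions if the markup turns out to be less regular than the sample.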

But that's considerably more work than just using the API from the first paragraph.
Not to mention that what you're doing is all sorts of against their terms of service.

So my answer to this is the same as the short one: Use the API.


If you're really determined to do this the wrong way, you can manipulate a real browser, get it to load the page and then dump the DOM (what it's actually rendering).

It's actually rather neat and serves a real purpose for testing automation and generating screenshots of pages, but you're still going to have to pass it through something to convert the links and download the assets. You're probably bored of me saying this now but, just use the API ☺
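For illustration only, here is one way to dump the rendered DOM from Python by driving headless Chrome/Chromium through its `--headless --dump-dom` command-line mode. This is a swapped-in technique, not necessarily the tool the answer had in mind. The binary name `chromium` is an assumption about your system, `dump_dom_command` and `dump_dom` are hypothetical helper names, and images that only load on scroll may still come back as placeholders.

```python
import subprocess


def dump_dom_command(url, browser="chromium"):
    """Build the command line; the browser binary name is an assumption."""
    return [browser, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_dom(url, browser="chromium", timeout=60):
    """Run the headless browser and return the serialized DOM as text."""
    result = subprocess.run(
        dump_dom_command(url, browser),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the browser exits with an error
    )
    return result.stdout


# Example (needs a local Chromium install and network access):
#   page = dump_dom("https://www.youtube.com/feed/trending")
#   with open("trending.html", "w", encoding="utf-8") as fh:
#       fh.write(page)
```

The output is the DOM after scripts have run, so it still needs the link conversion and asset download mentioned above before it works as an offline snapshot.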

Oli
  • I will give the linked PhantomJS approach a shot and report back on whether it works for the YouTube case, thanks!

    The rationale for the request is that I wanted something very easy to set up; I only need this for debugging purposes (it's nice to have a dead simple way to see what the ranking looked like some time ago, without relying on internal tools or APIs). I do work at YouTube :-P

    – Dimitris Andreou Mar 02 '17 at 12:24