The rendering process for browsers is very well defined: it follows a rigid, explicit ruleset in which (virtually) every eventuality is accounted for and handled. This is a poor fit for machine learning, which works best when we have a large pool of examples and don't know the ruleset, letting the model figure it out. Even if you were to train a neural network to process that input, there are several things you would have to account for:
1. Variance in data.
Not all web pages are equal in length or complexity, and a neural network asked to generate rendered output directly from raw HTML would produce garbage most of the time.
2. Training time.
The time it would take for a neural network to learn HTML tags, attributes, the DOM tree, and every element (including new ones added every few years), along with how each one renders and behaves, would be extremely long: most likely several years on a fast computer, if it were possible at all.
3. Interactivity.
Web pages aren't just static; they change according to their HTML, CSS, and JavaScript. Not only would your system have to handle the rendering step, it would also have to understand JavaScript, a Turing-complete scripting language, as well as CSS, a simpler stylesheet language that is nonetheless deeply intertwined with HTML. If you thought learning the rendering process was hard, try training a neural network to handle complicated scripting patterns.
4. New standards.
Not all HTML is equal, because of differing standards. WHATWG began working on HTML5 in 2004, and browsers started implementing it not long after. In 2004, there were very few examples of HTML5 sites to train your network on in the first place. Sure, now it's standardized and nearly every website uses it, but what about HTML6? When its first specification is released (probably 2017-2025), virtually no websites will use it, because nothing will support it. Only when it finally becomes standard, probably in the late 2020s or early 2030s, will you have enough data to train your monstrous system of neural networks.
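To make the "rigid ruleset" point concrete, here is a minimal sketch (using Python's standard-library `HTMLParser`, chosen purely for illustration) showing that parsing HTML is deterministic: the same input always produces the same sequence of events, following fixed rules from the HTML standard, with no training data involved.

```python
# Sketch: HTML parsing is rule-based and deterministic, not learned.
# Python's stdlib HTMLParser fires a fixed sequence of events for a
# given input, per the parsing rules of the HTML standard.
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Records every parse event so we can inspect the deterministic output."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


parser = EventLogger()
parser.feed("<p>Hello <b>world</b></p>")
print(parser.events)
# Identical input yields identical events on every run:
# [('start', 'p'), ('text', 'Hello'), ('start', 'b'),
#  ('text', 'world'), ('end', 'b'), ('end', 'p')]
```

A neural network, by contrast, would have to approximate this exact mapping from millions of examples, with no guarantee of getting any single page right.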
As for AI in general, one could argue that browsers already use "AI" in their rendering process: they intelligently decide what to render (taking CSS into account) and when, in order to minimize render time; they selectively apply different JavaScript parsers to different sections of code to optimize speed; and the whole system has been tuned against yet another ruleset to make rendering and interacting with a web page as seamless as possible. Your system will never be as good as what hundreds of humans have optimized over twenty years.
Trying to solve HTML rendering with neural networks is akin to trying to drive a nail with a screwdriver. It's just not going to work.
Hope this was helpful!