The following post has a bit of math, which I hope helps explain the problem better. Unfortunately, it seems this SE site does not support LaTeX:
Document summarization is very much an open problem in AI research. One way this task is currently handled is called "extractive summarization". The basic strategy is as follows: split the document into sentences and present as a summary the subset of sentences that together covers all the important details in the document. Assign sentence $i$, $1 \leq i \leq n$, a variable $z_i \in \{ 0, 1 \}$, where $z_i = 1$ indicates the sentence was selected and $z_i = 0$ means it was left out. Then $z_i z_j = 1$ if and only if both sentences were chosen. We also define an importance weight $w_i$ for sentence $i$ and interaction terms $w_{i,j}$ between sentences $i$ and $j$.
Let $x_i$ be the feature vector for sentence $i$. $w_i = w(x_i)$ captures how important it is to include this sentence (or the topics covered by it), while $w_{i,j} = w(x_i,x_j)$ measures the amount of overlap between sentences in our summary. Finally, we put all this into an optimization problem:
$$
\begin{aligned}
\underset{z_i}{\text{maximize }} & \sum_{i} w_i z_i - \sum_{i < j} w_{i,j} z_i z_j \\
\text{s.t. } & z_i \in \{ 0, 1 \}
\end{aligned}
$$
This objective tries to maximize the total weight of the selected sentences while minimizing the amount of overlap between them. It is an integer programming problem, similar to finding the maximum-weight independent set in a graph, and many techniques exist to solve such problems.
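To make the optimization concrete, here is a minimal brute-force sketch in Python. The function name, the tiny example weights, and the exhaustive search are my own illustration rather than a method from a particular paper; a real system would hand this to an ILP solver or use a greedy approximation instead of enumerating all $2^n$ selections:

```python
from itertools import product

def best_summary(w, W):
    """w: list of per-sentence importances w_i.
       W: dict mapping pairs (i, j) with i < j to overlap penalties w_ij."""
    n = len(w)
    best_score, best_z = float("-inf"), None
    for z in product((0, 1), repeat=n):          # all 2^n possible selections
        score = sum(w[i] * z[i] for i in range(n))
        score -= sum(W.get((i, j), 0.0) * z[i] * z[j]
                     for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best_score, best_z = score, z
    return best_z, best_score

# Example: sentences 0 and 1 overlap heavily, so only one of them survives.
z, score = best_summary([3.0, 2.5, 1.0], {(0, 1): 4.0, (0, 2): 0.2})
print(z, score)   # (1, 0, 1) with score 3.0 + 1.0 - 0.2 = 3.8
```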
This design, in my opinion, captures the fundamental problems in text summarization and can be extended in many ways. We will discuss those in a bit, but first we need to fully specify the weight functions $w$. $w_i = w(x_i)$ could be a function of the sentence $i$ alone, but it could also depend on the sentence's place in the document or its context (Is the sentence at the beginning of a paragraph? Does it share common words with the title? What is its length? Does it mention any proper nouns? etc.)
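As a toy illustration of such features (the particular features and the hand-set coefficients below are purely hypothetical choices, not values from any reference):

```python
# Toy sketch of an importance score w_i built from simple surface features.
# Feature choices and coefficients are illustrative only.
def sentence_importance(sentence, index, title):
    words = set(sentence.lower().split())
    title_overlap = len(words & set(title.lower().split()))   # shared words with the title
    position_bonus = 1.0 if index == 0 else 0.0                # reward the opening sentence
    length_penalty = abs(len(sentence.split()) - 20) / 20.0    # prefer roughly 20-word sentences
    return 1.5 * title_overlap + 1.0 * position_bonus - 0.5 * length_penalty
```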
$w_{i,j} = w(x_i,x_j)$ is a similarity measure. It measures how much repetition there will be if we include both sentences in the summary. It can be defined by looking at the words the two sentences have in common. We can also extract topics or concepts from each sentence and see how many they share, and use language features such as pronouns to see if one sentence expands on another.
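A simple stand-in for $w_{i,j}$ (one possible choice among many; production systems often use TF-IDF cosine similarity or topic overlap instead) is the Jaccard similarity of the two sentences' word sets:

```python
# Sketch of a pairwise overlap score w_ij: Jaccard similarity of word sets.
def sentence_overlap(s1, s2):
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```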
To improve the design, we could first do keyphrase extraction, i.e. identify key phrases in the text and define the above problem in terms of those instead of trying to pick whole sentences. That is similar to what Google does to summarize news articles in its search results, although I am not aware of the details of their approach. We could also break the sentences up further into concepts and try to establish the semantic meaning of each sentence (Ponzo and Fila are people P1 and P2, a mall is a place P, P1 and P2 went to the place P at time T (a day), the mode of transport was walking, and so on). To do this, we would need a semantic ontology or other common-sense knowledge databases. However, all the parts of this last semantic classification problem are open, and I have not seen anyone make satisfactory progress on it yet.
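For a rough idea of what keyphrase candidates might look like, here is a crude frequency-based extractor; its stopword list and scoring are simplified placeholders of my own, while real systems use part-of-speech patterns, TextRank-style graph ranking, or supervised models:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}

def candidate_keyphrases(text, top_k=5):
    # Keep maximal runs of non-stopwords as crude candidate phrases,
    # then rank candidates by how often they occur in the text.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    phrases, current = [], []
    for w in words:
        if w and w not in STOPWORDS:
            current.append(w)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return Counter(phrases).most_common(top_k)
```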
We could also tweak the objective above so that instead of setting the tradeoff between the sentence importance $w_i$ and the diversity score $w_{i,j}$ by hand, we learn it from data. One way to do this is to use Conditional Random Fields to model the data, but many others surely exist.
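One way to write this down (my notation, not a formulation from a specific paper) is to give the weight functions parameters $\theta$ and introduce a tradeoff coefficient $\lambda$:

$$
\underset{z_i}{\text{maximize }} \sum_{i} w_\theta(x_i)\, z_i - \lambda \sum_{i < j} w_\theta(x_i, x_j)\, z_i z_j
$$

and then fit $\theta$ and $\lambda$ so that the selected sentences match human-written reference summaries on training documents.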
I hope this answer explains the basic problems that need to be solved to make progress towards good summarization systems. This is an active field of research and you will find the most recent papers via Google Scholar, but read the Wikipedia page first to learn the relevant terms.