In L1 regularization, the penalty term computed for every parameter is the absolute value of that weight (times some regularization factor).
Thus, irrespective of whether a weight is positive or negative (due to the absolute value) and irrespective of how large the weight is, a penalty is incurred as long as the weight is nonzero. So the only way a training procedure can considerably reduce the L1 regularization penalty is by driving all (unnecessary) weights towards 0, which results in a sparse representation.
Of course, the L2 regularization term is also strictly 0 only when all weights are 0. However, in L2, a weight's contribution to the penalty is proportional to its squared value. Therefore, a weight whose absolute value is smaller than 1, i.e. $|w| < 1$, is punished much less by L2 than it would be by L1, which means that L2 puts less emphasis on driving all weights towards exactly 0. This is because squaring a value in $(0, 1)$ yields a value of lower magnitude than the un-squared value itself: $x^2 < |x|$ for all $x$ with $0 < |x| < 1$.
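A quick numerical check makes this concrete. The sketch below compares the per-weight L1 and L2 penalty contributions for a few small weights; the regularization factor `lam` and the specific weight values are illustrative assumptions, not taken from any particular model.

```python
def l1_penalty(w, lam=1.0):
    # L1 contribution of a single weight: lam * |w|
    return lam * abs(w)

def l2_penalty(w, lam=1.0):
    # L2 contribution of a single weight: lam * w^2
    return lam * w ** 2

# For |w| < 1, the squared (L2) penalty is strictly smaller than the L1 one.
for w in [0.9, 0.5, 0.1, 0.01]:
    print(f"w={w}: L1={l1_penalty(w):.4f}, L2={l2_penalty(w):.4f}")
```

For `w = 0.1`, the L1 contribution is 0.1 but the L2 contribution is only 0.01, so L2 barely "notices" weights that are already small.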
So, while both regularization terms reach 0 only when all weights are 0, the L1 term penalizes small weights with $|x| < 1$ much more strongly than L2 does, thereby driving each weight more strongly towards 0 than L2 does.