In general, if you have a composite function $h(x) = g(f(x))$, then $\frac{dh}{dx} = \frac{d g}{df} \frac{d f}{dx}$. In your case, the function to differentiate is
$$L_{i}(\theta_{i}) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \left[ \left(r+\gamma \max_{a'} Q(s',a';\theta_{i}^{-}) - Q(s,a;\theta_{i}) \right)^2 \right]$$
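(As a toy illustration of the chain rule above: if $f(x) = 3x$ and $g(u) = u^2$, then $h(x) = g(f(x)) = (3x)^2$, and the rule gives $\frac{dh}{dx} = \frac{dg}{df}\frac{df}{dx} = 2(3x) \cdot 3 = 18x$, which matches differentiating $9x^2$ directly. In the loss above, the squaring plays the role of $g$ and the term inside the square plays the role of $f$.)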
So, we want to calculate $\nabla_{\theta_i} L_{i}(\theta_{i})$, which is equal to
$$\nabla_{\theta_i} \mathbb{E}_{(s,a,r,s') \sim U(D)} \left[ \left(r+\gamma \max_{a'} Q(s',a';\theta_{i}^{-}) - Q(s,a;\theta_{i}) \right)^2 \right] \label{2} \tag{2} $$
For clarity, let's ignore the iteration number $i$ in \ref{2}, which can thus more simply be written as
$$\nabla_{\theta} \mathbb{E}_{(s,a,r,s') \sim U(D)} \left[ \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta ) \right)^2 \right] \label{3} \tag{3} $$
The subscript of the expectation operator, $(s,a,r,s') \sim U(D)$, means that the expectation is taken with respect to the multivariate random variable $\mathbf{E}$ (for experience), whose values (or realizations) are tuples $\mathbf{e}=(s,a,r,s')$ drawn uniformly at random from the experience replay buffer $D$, i.e. $\mathbf{E} \sim U(D)$. However, let's omit this subscript for now: $\gamma$ and $a'$ are not random variables, so $\mathbf{E}$ is the only random variable in \ref{3} and there is no ambiguity about which random variable the expectation is taken over. Hence, \ref{3} can be written as
$$\nabla_{\theta} \mathbb{E} \left[ \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta ) \right)^2 \right] \label{4} \tag{4} $$
Now, recall that, in the case of a discrete random variable, the expected value is a weighted sum. In the case of a continuous random variable, it is an integral. So, if $\mathbf{E}$ is a continuous random variable, then the expectation \ref{4} can be expanded to
$$
\int_{\mathbb{D}} {\left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta )\right)}^2 f(\mathbf{e}) \, d\mathbf{e}
$$
where $f$ is the density function associated with $\mathbf{E}$ and $\mathbb{D}$ is the domain of the random variable $\mathbf{E}$.
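As a side note: since the replay buffer $D$ is finite and the sampling is uniform, the discrete version of this expectation is simply an average over the buffer,
$$\mathbb{E}\left[ \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\right)^2 \right] = \frac{1}{|D|} \sum_{(s,a,r,s') \in D} \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\right)^2,$$
which is what you estimate in practice with a mini-batch; the continuous (integral) form is used here only to make the differentiation-under-the-integral argument explicit.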
The derivative of an integral can be calculated with the Leibniz integral rule. When the bounds of integration are constants ($a$ and $b$), the Leibniz integral rule reduces to
$$\frac{d}{dx}\left(\int_{a}^{b} f(x,t)\,dt\right) = \int_{a}^{b} \frac{\partial}{\partial x} f(x,t)\,dt.$$
Observe that the derivative is taken with respect to the variable $x$, while the integration is taken with respect to the variable $t$.
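For a concrete sanity check of this rule, take $f(x,t) = (xt)^2$ with bounds $a=0$ and $b=1$: computing the integral first gives $\int_0^1 x^2 t^2 \, dt = \frac{x^2}{3}$, whose derivative is $\frac{2x}{3}$, while differentiating under the integral sign gives $\int_0^1 2xt^2 \, dt = \frac{2x}{3}$ as well.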
In our case, the domain of integration, $\mathbb{D}$, is constant with respect to $\theta$: it simply represents the space of all experiences in the replay buffer, $D$, and does not depend on the parameters. Therefore, the gradient in \ref{4} can be written as
\begin{align}
&\nabla_{\theta} \int_{\mathbb{D}} \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta )\right)^2 f(\mathbf{e}) \, d\mathbf{e} \\
&\quad = \int_{\mathbb{D}} \nabla_{\theta} \left( \left( r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta )\right)^2 f(\mathbf{e}) \right) d\mathbf{e} \\
&\quad = \mathbb{E} \left[ \nabla_{\theta} { \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta) \right)}^2 \right] \label{5}\tag{5}
\end{align}
where the last equality holds because $f(\mathbf{e})$ does not depend on $\theta$, so $\nabla_{\theta} \left( (\cdot)^2 f(\mathbf{e}) \right) = f(\mathbf{e}) \nabla_{\theta} (\cdot)^2$, and the remaining integral is, by definition, again an expectation.
Recall that the derivative of $g(x)=x^2$ is $g'(x)=2x$ and that the derivative of a constant is zero. Note now that the only term in \ref{5} that contains $\theta$ is $Q(s,a; \theta)$: the target network parameters $\theta^{-}$ are held fixed, so $r$ and $\gamma \max_{a'} Q(s',a';\theta^{-})$ are constants with respect to $\theta$. Hence, \ref{5} can be written as
\begin{align}
&\mathbb{E} \left[ \nabla_{\theta} {\left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta)\right)}^2 \right] \\
&\quad = \mathbb{E} \left[ 2 \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta)\right) \nabla_{\theta} \left( r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta) \right) \right] \\
&\quad = \mathbb{E} \left[ 2 \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta)\right) \left( \nabla_{\theta} r + \nabla_{\theta} \left( \gamma \max_{a'} Q(s',a';\theta^{-}) \right) - \nabla_{\theta} Q(s,a; \theta) \right) \right] \\
&\quad = \mathbb{E} \left[ - 2 \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta)\right) \nabla_{\theta} Q(s,a; \theta) \right] \\
&\quad = -2\, \mathbb{E} \left[ \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a; \theta)\right) \nabla_{\theta} Q(s,a; \theta) \right]
\end{align}
The $-2$ disappears because it can be absorbed into the learning rate.
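If you want to check this result numerically, here is a minimal sketch, assuming PyTorch is available; the tiny linear Q-network, the single hand-made transition and all the variable names are just illustrative, not part of any particular DQN implementation. It compares the gradient that automatic differentiation produces for the squared TD error (with the target computed by a frozen target network and detached, i.e. treated as a constant) against the analytic expression $-2 \left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\right) \nabla_{\theta} Q(s,a;\theta)$ derived above.

```python
# Minimal numerical check of the derived gradient (illustrative sketch, assumes PyTorch).
import torch

torch.manual_seed(0)

n_states, n_actions, gamma = 4, 2, 0.99

# Online network Q(.; theta) and target network Q(.; theta^-): one linear layer each.
q = torch.nn.Linear(n_states, n_actions)
q_target = torch.nn.Linear(n_states, n_actions)
q_target.load_state_dict(q.state_dict())  # copy of the online parameters, as in DQN

# One sampled transition e = (s, a, r, s').
s = torch.randn(n_states)
a = 1
r = torch.tensor(0.5)
s_next = torch.randn(n_states)

# Target computed with the target network and detached from the graph,
# so it is a constant with respect to theta (as in the derivation above).
with torch.no_grad():
    target = r + gamma * q_target(s_next).max()

td = target - q(s)[a]   # TD error
loss = td ** 2          # squared TD error

# Gradient of the loss with respect to theta, via autograd.
grads_autograd = torch.autograd.grad(loss, list(q.parameters()))

# Analytic gradient: -2 * (TD error) * grad_theta Q(s, a; theta).
grads_q = torch.autograd.grad(q(s)[a], list(q.parameters()))
grads_analytic = [-2.0 * td.detach() * g for g in grads_q]

for g1, g2 in zip(grads_autograd, grads_analytic):
    assert torch.allclose(g1, g2, atol=1e-6)
print("autograd gradient matches -2 * (TD error) * grad Q(s, a; theta)")
```

The detach/no-grad step is exactly the "$\theta^{-}$ is held fixed" assumption used in the derivation; without it, autograd would also propagate through the target term and the two gradients would no longer agree.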