I'm using an Ubuntu 14.04.4 server, running sshd OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8, OpenSSL 1.0.1f 6 Jan 2014.

The server is also running Intel's DPDK framework, which I use to develop network software; part of that involves bringing down interfaces at the Linux level to bind them to DPDK. However, the network interface used to ssh in from the outside is never brought up or down; only the others are touched.

Most of the time ssh works fine, but once every few days it stops working: existing ssh sessions are interrupted, and trying to reconnect with ssh -v halts at the message Local version string SSH-2.0 ... (i.e. the client can establish a TCP connection; it is the SSH exchange that fails).
Connecting to the machine directly doesn't work either: the command-line interface doesn't show up, just a blank screen.
TCP connections can be established, and the machine still answers pings.
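
To tell this state apart from an ordinary network outage, a small probe can check whether the server ever sends its SSH identification string after the TCP handshake. The following is only a sketch: the function names are hypothetical, and it assumes bash (for the /dev/tcp pseudo-device) and the coreutils timeout command are available on the client.

```shell
#!/usr/bin/env bash
# Sketch: distinguish "TCP connect fails" from "TCP connects, but sshd never
# sends its identification string". Function names are hypothetical.

# Classify the first line received after connecting.
classify_banner() {
    case "$1" in
        SSH-*) echo "ssh-ok" ;;      # server sent an SSH identification string
        "")    echo "no-banner" ;;   # nothing usable arrived
        *)     echo "unexpected" ;;  # something other than sshd answered
    esac
}

# probe_ssh HOST [PORT] [SECONDS]
# Prints one of: ssh-ok, no-banner, unexpected, tcp-failed
probe_ssh() {
    local host=$1 port=${2:-22} wait=${3:-5}
    local banner status
    # The inner shell exits 2 if the connect fails and 3 if no line arrives
    # within $wait seconds; the outer `timeout` bounds a connect that hangs.
    banner=$(timeout "$((wait * 2))" bash -c '
        exec 3<>"/dev/tcp/$1/$2" || exit 2
        IFS= read -r -t "$3" line <&3 || exit 3
        printf "%s\n" "$line"
    ' _ "$host" "$port" "$wait")
    status=$?
    case "$status" in
        0) classify_banner "$banner" ;;
        3) echo "no-banner" ;;
        *) echo "tcp-failed" ;;
    esac
}

# Usage (the hostname is a placeholder):
#   probe_ssh server.example.com 22 5
```

In the failure described here, such a probe would be expected to report no-banner rather than tcp-failed, which matches ssh -v stalling right after it prints its own version string.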

This is pretty annoying, since the server then needs to be rebooted.

I enabled debug3 logging on the server, and the entries in /var/log/auth.log when a client tries (and fails) to connect look like this:

sshd[1688]: debug3: fd 5 is not O_NONBLOCK
sshd[1688]: debug1: Forked child 39149.
sshd[1688]: debug3: send_rexec_state: entering fd = 13 config len 724
sshd[1688]: debug3: ssh_msg_send: type 0
sshd[1688]: debug3: send_rexec_state: done
sshd[39149]: debug3: oom_adjust_restore
sshd[39149]: Set /proc/self/oom_score_adj to 0
sshd[39149]: debug1: rexec start in 5 out 5 newsock 5 pipe 12 sock 13

This log doesn't seem any different from the one for successful connections, except that it stops there, whereas successful connections continue (the next line is then debug1: inetd sockets after dupping: ...).

The problem seems to arise right when an interface is bound to or unbound from DPDK.

What could be causing this? Are there workarounds?

  • Hi, when you say "not the one SSH sessions are using" do you mean a) not the one your connection is running on or b) you explicitly configured ssh to not care about it (e.g. ListenAddress)? – Christian Ehrhardt Jan 23 '17 at 07:34
  • @ChristianEhrhardt There are no explicitly configured addresses, but a) is correct, not the one the connection is on. I had a chance to connect to the machine directly today, and it turns out the command-line interface doesn't work there either - I've updated the question. – Solal Pirelli Jan 24 '17 at 16:47
  • Have you tried making ssh and DPDK stay away from each other more explicitly? a) configure ssh to listen only on its own subnet, b) set the DPDK PCI whitelist (-w) to be more specific. To be clear, I'd expect your case to work as-is, but it does not, so this is just an attempt to raise the wall between DPDK and ssh a bit more. FWIW I looped enabling/disabling a DPDK app and binding/unbinding devices today without losing my ssh connection, so no repro here (Ubuntu 16.10 / DPDK 16.07.2) – Christian Ehrhardt Jan 30 '17 at 06:44
  • You said the local command line didn't work either. Are there any new insights the community should know about to help you get past the issue? – Christian Ehrhardt Feb 20 '17 at 06:39

1 Answer

I had issues with ssh timeouts and found a workaround by tuning the kernel's TCP keepalive settings:

 sudo sysctl -w net.ipv4.tcp_keepalive_time=50 \
 net.ipv4.tcp_keepalive_intvl=10 \
 net.ipv4.tcp_keepalive_probes=5
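
To see what these three values amount to, the worst-case time before the kernel gives up on an idle, dead connection is the idle time before the first probe plus the probe interval times the number of unanswered probes. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the settings only affect sockets that have keepalive enabled (sshd enables it by default through TCPKeepAlive).

```shell
#!/usr/bin/env bash
# Worst-case seconds before an idle, dead connection is dropped:
# idle time before the first probe + (probe interval * unanswered probes).
# The function name is hypothetical.
keepalive_detect_seconds() {
    local idle=$1 intvl=$2 probes=$3
    echo $(( idle + intvl * probes ))
}

keepalive_detect_seconds 50 10 5      # values set above: prints 100
keepalive_detect_seconds 7200 75 9    # usual Linux defaults: prints 7875

# To evaluate the values currently active on a machine:
#   keepalive_detect_seconds \
#       "$(sysctl -n net.ipv4.tcp_keepalive_time)" \
#       "$(sysctl -n net.ipv4.tcp_keepalive_intvl)" \
#       "$(sysctl -n net.ipv4.tcp_keepalive_probes)"
```

Note that sysctl -w only changes the running kernel; the values are lost on reboot unless they are also written to a file under /etc/sysctl.d/.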