
(See the EDIT added at the end, about the avahi stuff.)


When I run ssh -v [ubuntu_comp_hostname].local, it just does this:

    OpenSSH_8.2p1, LibreSSL 3.0.2  
    debug1: Reading configuration data /home/o1/.ssh/config  
    debug1: Reading configuration data /etc/ssh/ssh_config  
    debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling  
    ssh: Could not resolve hostname [ubuntu_comp_hostname].local: Name or service not known  

(Like, this didn't even start after rebooting or waking one of the computers from sleep or anything. It was working; I got up for a few minutes to get some tea, came back, and it had suddenly stopped working.)


I can still ssh in by using the IP of the ubuntu computer (ie, in place of [ubuntu_comp_hostname].local). It's the IP I get by running this command, which I copied from somewhere I forget and have had hanging around in an alias for a while:

    ifconfig | sed -En 's/127.0.0.1//;s/.*inet (addr:)?(([0-9]*\.){3}[0-9]*).*/\2/p'
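
For reference, here is the same sed expression wrapped in a small shell function, with comments on what each step does. The function name ipv4_addrs is made up for illustration, and the sketch assumes ifconfig's usual output formats: the current "inet 192.168.1.23" style and the older "inet addr:192.168.1.23" style.

```shell
#!/usr/bin/env bash
# Print the IPv4 addresses found in ifconfig-style output read from stdin,
# skipping the loopback address (same sed expression as the alias above).
ipv4_addrs() {
  # -E: extended regexes; -n: print only lines where the final "p" fires.
  # 1) s/127.0.0.1//  deletes the loopback address, so the "lo" line no
  #                   longer matches step 2 and is never printed. (The dots
  #                   are unescaped and match any character; harmless here.)
  # 2) s/.../\2/p     keeps just the dotted quad after "inet " (or after the
  #                   older "inet addr:") and prints it. "inet6" lines never
  #                   match, because "inet " requires a space after "inet".
  sed -En 's/127.0.0.1//;s/.*inet (addr:)?(([0-9]*\.){3}[0-9]*).*/\2/p'
}
```

Usage: ifconfig | ipv4_addrs. On Ubuntu, hostname -I also prints the host's addresses directly, without depending on ifconfig's output format.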


I can still ssh in to the other computer from the ubuntu computer, by using the hostname of the other computer.


And very confusingly to me,
I'm running a synergy client on the ubuntu computer,
and the client and server connect by using the two computers' hostnames,
and that still works just fine
(I just killed and restarted synergy on both computers to check, and yup, it reconnected just fine.)


I have no clue: where do I start on trying to debug this?



EDIT (copied up from my comments below):

I just tried ps -eo cmd | grep avahi, and found (besides avahi-daemon: chroot helper)
this:

    avahi-daemon: running [ubuntu_comp_hostname-2.local]

(ie, there's this extra -2 on the end of the hostname for no apparent reason).
(Trying to connect with ssh ubuntu_comp_hostname-2.local does work.)

I'm extremely ignorant on all this stuff, but I'm guessing that...

  • avahi crashed or restarted or something?

  • And then it tried to run as plain [ubuntu_comp_hostname.local],
    but that was still hanging around in a defunct state or something,
    so it automatically appended a -2 to the end of the actual hostname?

And yeah, sudo service avahi-daemon restart worked
-- ie, there's no more -2,
and everything connects by hostname.local as expected again
(even after I removed the ~/.ssh/config file that I had added as a workaround, as recommended; I didn't have any such file before)

I'm still wondering what caused the problem in the first place, though,
and how to prevent it from happening again
(or at least get it to automatically fix itself?)
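
For the "automatically fix itself" part, here is a minimal watchdog sketch. It assumes the process title has exactly the form shown above (avahi-daemon: running [name.local]) and that, on a conflict, avahi logs a line like "Host name conflict, retrying with name-2" in journalctl's default short format. All function names are hypothetical, and restarting the daemon only papers over whatever is causing the conflict.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting avahi-daemon's fallback hostname.

# Read process titles (e.g. from "ps -eo cmd") on stdin and print the name
# avahi-daemon announces: "avahi-daemon: running [myhost-2.local]" -> "myhost-2.local".
avahi_announced_name() {
  sed -nE 's/^ *avahi-daemon: running \[([^]]+)].*/\1/p'
}

# Exit status 0 if the announced name ($1) is non-empty and is not
# "<hostname>.local", where $2 is the hostname. Assumption: any domain
# part after the first dot of the hostname is dropped before comparing.
avahi_name_conflicted() {
  local announced="$1" host="${2%%.*}"
  [ -n "$announced" ] && [ "$announced" != "$host.local" ]
}

# Read "journalctl -u avahi-daemon" output on stdin and print the timestamp
# and fallback name of each "Host name conflict" line.
avahi_conflict_history() {
  sed -nE 's/^([A-Z][a-z]{2} +[0-9]{1,2} [0-9:]{8}) .*Host name conflict, retrying with ([^ ]+).*/\1 \2/p'
}

# Assumed watchdog step (run as root, e.g. from cron or a systemd timer):
# restart avahi-daemon only when it has fallen back to a suffixed name.
avahi_watchdog() {
  local announced
  announced="$(ps -eo cmd | avahi_announced_name | head -n 1)"
  if avahi_name_conflicted "$announced" "$(hostname)"; then
    echo "avahi-daemon is announcing $announced, restarting it" >&2
    service avahi-daemon restart
  fi
}
```

For example, after saving this as avahi-check.sh (a hypothetical filename), sudo bash -c '. ./avahi-check.sh; avahi_watchdog' does the check-and-restart, and journalctl -u avahi-daemon | avahi_conflict_history lists when the fallback happened before, which may help line the conflicts up with sleep/wake or network changes.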

Owen_AR

2 Answers


EDIT

The cause of the problem

As for why the server name is not correctly resolved to the IP, I can't tell; that requires diagnosis. Please perform the following actions:

  1. Post the contents of /etc/hosts, /etc/resolv.conf and /etc/nsswitch.conf.
  2. Post the contents of ${HOME}/.ssh/config. Redact (for privacy) as little as possible.
  3. Rename config and try ssh-ing. Post the outcome.
  4. Try connecting without .local, using both the IP and the server name. Post the outcome.
  5. Describe your network configuration: 1) Where are the connecting machines located? 2) Which machine provides DNS?
  6. Try host hostname_of_ubuntu_comp. Post the outcome.
  7. Further debug with ssh -vvv .... Post the outcome.
  8. Post the output of netstat -plunt | grep 22 on the server.
  9. Post the output of systemctl status ssh on the server.
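
For step 1, one thing worth checking in /etc/nsswitch.conf on a glibc-based Linux client is whether the hosts: line consults an mDNS module at all, since that is typically how .local names get resolved there (on Ubuntu desktops the line usually looks like hosts: files mdns4_minimal [NOTFOUND=return] dns). A minimal sketch, assuming the usual one-line hosts: format; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the sources on the "hosts:" line of an nsswitch.conf read from stdin,
# one per line, e.g. files, mdns4_minimal, [NOTFOUND=return], dns.
nss_hosts_sources() {
  awk '$1 == "hosts:" { for (i = 2; i <= NF; i++) print $i }'
}

# Exit status 0 if an mDNS module (mdns, mdns4_minimal, mdns_minimal, ...)
# is listed, i.e. names like host.local can be looked up via multicast DNS.
nss_uses_mdns() {
  nss_hosts_sources | grep -q '^mdns'
}
```

Usage: nss_uses_mdns < /etc/nsswitch.conf && echo "mDNS is in the lookup path". Separately, getent hosts [ubuntu_comp_hostname].local goes through essentially the same NSS lookup that ssh uses, so it tests name resolution without involving ssh at all.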

Related:

  1. What does ".local" do?
  2. https://serverfault.com/questions/831747/why-is-ssh-not-resolving-this-hostname/831769
  3. https://stackoverflow.com/questions/20252294/ssh-could-not-resolve-hostname-hostname-nodename-nor-servname-provided-or-n
  4. https://docstore.mik.ua/orelly/networking_2ndEd/ssh/ch12_01.htm

A way to make it work

You say you can ssh into the server using the IP. Let's say the command you use to connect is

    $ ssh user_at_server@10.20.10.10

Then make sure ${HOME}/.ssh/config on your client contains the following lines (the last two are convenient, but optional):

    Host server_user
      HostName      10.20.10.10
      User          user_at_server
      ServerAliveInterval 30
      ForwardX11 yes

Then make sure the right permissions are set (ssh refuses to use a config file that other users can write to):

    $ chmod 0600 ${HOME}/.ssh/config
    $ chmod 0700 ${HOME}/.ssh

You should now be able to

    $ ssh server_user
  • Well, thanks, that works as a workaround. (I just added a .local like Host server_user.local, since all my shell functions expect it.) I don't think I can count this as the actual answer, though, since I still have no idea what caused the problem or how to fix that root cause. I mean, it worked perfectly since I installed the ubuntu and set up ssh (about 5 days ago), and then suddenly (again, literally just got tea and came back) ... poof? As far as I can remember, I didn't install/upgrade/change anything in the system. (And /var/log/apt/history.log agrees, if I understand it right.) – Owen_AR Mar 28 '21 at 06:58
  • @Owen_AR - Please see edited answer. – sancho.s ReinstateMonicaCellio Mar 28 '21 at 10:18
  • Oh, thanks for continuing to help! I just tried ps -eo cmd|grep avahi, and found (besides avahi-daemon: chroot helper) this: avahi-daemon: running [ubuntu_comp_hostname-2.local]. (And ssh ubuntu_comp_hostname-2.local works as expected.) I'm extremely ignorant on all this stuff, but I'm guessing that avahi ... crashed or restarted or something? And then it tried to run as plain [ubuntu_comp_hostname.local], but that was still hanging around in a defunct state or something, so it automatically appended a -2 to the end of the actual hostname? Does that sound like I'm on the right track? – Owen_AR Mar 30 '21 at 04:39
  • And yeah, sudo service avahi-daemon restart worked (no more -2, and I removed the ~/.ssh/config files from both computers (didn't have any before), and everything connects by hostname.local as expected again.) I'm still wondering what caused the problem in the first place, though, and how to prevent it from happening again (or at least get it to automatically fix itself?) – Owen_AR Mar 30 '21 at 04:57
  • @Owen_AR - I wouldn't know. You had already reset the problematic condition, and you neither tried what I suggested nor posted the output of the diagnostics I mentioned. So it is hard for me to guess now, beyond what I did before. But it is good that you are up and running now. – sancho.s ReinstateMonicaCellio Mar 30 '21 at 21:56

Using .local as your LAN domain could be the problem, since it is reserved for multicast DNS (mDNS) and zeroconf. Recent Linux updates enforce this. You can safely change that to .lan, but you will have to update all your scripts.

https://en.wikipedia.org/wiki/.local#:~:text=local%20is%20a%20special%2Duse,localhost.

user3806
  • Huh??? I never configured it to use .local -- it's just a systemd service that comes enabled by default, and that's what it does without any configuration. It's just that there seems to be some strange race-condition that it hits every once in a while where it goes like (from journalctl -u avahi-daemon, with hostname "myhostname") Oct 20 18:59:22 myhostname avahi-daemon[575]: Host name conflict, retrying with myhostname-2, and then I have to notice the problem and manually do sudo service avahi-daemon restart to get it back to plain avahi-daemon: running [myhostname.local]. – Owen_AR Nov 07 '21 at 17:20
  • You can have zeroconf-autoconfigured IPs, but you shouldn't be using them or .local if you have DHCP active. The DHCP server (your ISP's box or your router) also does DNS. Sometimes this DNS is incorrectly configured and uses .local as a search domain, for instance. You could look in /etc/resolv.conf and check the search option, or look for the search option in the output of "systemd-resolve --status". You have found that adding -2 is the result of a hostname conflict, which I interpret as two hosts requesting the same hostname on a network. Possibly two NICs on the same network? – user3806 Nov 09 '21 at 01:02
  • Thanks for trying to help, but honestly I don't know enough to really understand most of that (although I did take a look at that etc file and command output). But taking a step back... ultimately, how could it make sense to count this as anything other than a bug somewhere in the default setup? Cuz the only relevant change I made to the out-of-box setup was apt install openssh-server, after which everything almost always "just works", except for this weird intermittent bug... every week or two on average? (ie, sometimes a couple times a week, sometimes not for a couple weeks) – Owen_AR Nov 09 '21 at 16:30
  • (Whereas if using hostname.lan rather than hostname.local is "supposed" to be the "official" "default", and this is somehow supposed to be "enforced" now... then that "default" should've been automatically set up, right?) – Owen_AR Nov 09 '21 at 16:31
  • Your router should be correctly set up. Your router defines two key pieces of information:
    • The name of your local network: in theory it can be anything, but network security is increased if you choose among the domains returned by "systemd-resolve --status" under DNSSEC_NTA (corp, home, internal, intranet, lan, private, test), excluding "local".
    – user3806 Nov 12 '21 at 13:45
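
To follow up on the suggestion to check the search option in /etc/resolv.conf, here is a minimal sketch that lists the search domains and flags local. It only parses the file (not the systemd-resolve --status output), follows glibc's rule that the last search line wins, and ignores the older domain keyword; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the search domains from a resolv.conf read from stdin, one per line.
# If there are several "search" lines, only the last one counts (as in glibc);
# the older "domain" keyword is ignored in this sketch.
resolv_search_domains() {
  awk '$1 == "search" { line = $0 }
       END { n = split(line, f); for (i = 2; i <= n; i++) print f[i] }'
}

# Exit status 0 if "local" (with or without a trailing dot) is a search domain.
resolv_searches_local() {
  resolv_search_domains | grep -qxE 'local\.?'
}
```

Usage: resolv_searches_local < /etc/resolv.conf && echo "resolv.conf searches .local".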