SSH Canonicalization

A couple of days ago I discovered OpenSSH's ability to manage canonicalization. You might be thinking to yourself, "That's nice. Doesn't the system resolver take care of canonicalization? Why would you want OpenSSH to handle it?" Those are fair questions, and I hope to provide fair answers. Further, I hope that my answers will intrigue you into using OpenSSH's canonicalization ability yourself.

The biggest reason that I see to have OpenSSH do the canonicalization is that it will canonicalize the host names for you and then re-parse the config files with the canonicalized host name. Let me say it again: OpenSSH will use the canonicalized name to match Host entries in the config files as if you had typed the full canonical name on the command line. This means that you can empower OpenSSH to do even more for you.

The following OpenSSH client config file will allow you to apply different ssh configuration options based on hostname pattern matches.

CanonicalizeFallbackLocal no
CanonicalizeHostname yes

Host *
	IdentityFile ~/.ssh/exampleCOM
	User gtaylor
Host *
	IdentityFile ~/.ssh/exampleNET
	Port 2222
Host *
	ForwardX11 yes
	IdentityFile ~/.ssh/exampleORG
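
To make that concrete, here is a minimal sketch of what a complete file along these lines could look like. The domains example.com, example.net, and example.org are placeholders, assumed here only because they line up with the IdentityFile names, so substitute your own. The CanonicalDomains line lists the domain suffixes that ssh tries, in order, when it canonicalizes a name typed without a domain.

```
# Sketch only: example.com, example.net and example.org are placeholder domains.
CanonicalizeHostname yes
CanonicalizeFallbackLocal no
# Domain suffixes to try, in order, for names typed without a domain.
CanonicalDomains example.com example.net example.org

# These patterns only match once ssh re-parses the file with the canonical name.
Host *.example.com
	IdentityFile ~/.ssh/exampleCOM
	User gtaylor
Host *.example.net
	IdentityFile ~/.ssh/exampleNET
	Port 2222
Host *.example.org
	ForwardX11 yes
	IdentityFile ~/.ssh/exampleORG
```

If you want to check which entry matched, running ssh -G followed by the short host name should print the options ssh would use for that connection without actually connecting.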

Suppose that I have three Raspberry Pis connected to RepRap printers that I help friends manage, netreprap being one of them.

Sure, I could add individual Host entries to my OpenSSH client config file. But suppose that I actually needed to work with thousands of devices that canonicalize to a few domain names. Adding each of them to the OpenSSH client config file does not scale as well as I might want.

So, here's what the OpenSSH client does when I issue the ssh netreprap command in my terminal.

  1. Process the configuration file(s) looking for Host entries that match "netreprap"
  2. Attempt to resolve the name under one canonical domain, but fail.
  3. Attempt to resolve the name under the next canonical domain, and succeed.
  4. Process the configuration file(s) again, looking for Host entries that match the canonical host name

#1 may or may not succeed, depending on what other Host entries are in the config file. #2 fails because no host with that name exists in that domain. #3 succeeds because a host with that name does exist there. #4 then re-processes the config file(s) looking for any Host entries that match the canonical host name. Since the canonical host name matches one of the Host patterns, ssh applies that entry's IdentityFile and Port settings to the connection.

Having OpenSSH do the canonicalization has the added advantage that ssh knows the canonical name and can use it to apply different settings, something that could not be done if the system resolver library did the canonicalization for us. The system resolver would still resolve the host names, but ssh would never see the canonical form and so could not apply settings based on it.

If all of the host names followed a naming pattern you could probably create Host entries based off of those patterns. However, if your host names don't follow a pattern, you have to rely on the canonical host name, where you can apply a Host pattern.
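
As a small sketch of that first case, assume (hypothetically) that every printer host has a short name ending in reprap. A single pattern on the short name would then be enough, with no canonicalization involved:

```
# Hypothetical naming scheme: every short host name ends in "reprap".
Host *reprap
	User gtaylor
```

This pattern matches the name exactly as it is typed on the command line, which is why it only helps when the short names themselves are predictable.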

Hopefully you can see how this will scale up quite well without needing to put a bunch of Host entries in your config file. My specific use case is with sub-domains rather than the top level domain, but the same idea holds true. I don't have any host names that appear in multiple sub-domains. Thus the three canonicalization directives and Host entries will cover thousands of machines for me, making for a nice simple client config file.

Update: 2016-01-17

A co-worker brought to my attention a problem that SSH canonicalization caused. Ultimately the problem was that CanonicalizeFallbackLocal was set to no, which makes ssh fail immediately when a host name cannot be found in any of the domains listed in CanonicalDomains. All of the testing that I had done was with hosts that were covered by CanonicalDomains or by other Host entries. We ended up going back to the default of CanonicalizeFallbackLocal yes. This allowed all host names to resolve correctly.
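
In config terms, the fix amounts to something like the following sketch. The domain names are placeholders, not the ones we actually use.

```
CanonicalizeHostname yes
# yes is the default: a name that is not found under any of the
# CanonicalDomains is handed to the system resolver instead of failing.
CanonicalizeFallbackLocal yes
# Placeholder domains; substitute your own.
CanonicalDomains example.com example.net example.org
```

Since yes is the default, simply deleting the CanonicalizeFallbackLocal line should have the same effect.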