OpsMgr 2007 PKI and Gateway Scenarios Part 4: Troubleshooting Mutual Authentication
You can find the following troubleshooting information in the updated version of our Gateway and PKI Scenarios Guide. Thanks to Cameron, Neale, and anyone else I failed to mention for their input.
Let me start by saying you can generally spot failure of mutual authentication in a certificate-based authentication scenario by the presence of the following error event.
Event Type: Error
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21006
Description: The OpsMgr Connector could not connect to NOCRMS01.noc.momresources.org:5723. The error code is 10061L (No connection could be made because the target machine actively refused it.)
So you have followed all the instructions, and the agent still cannot communicate with the gateway, or the gateway cannot communicate with the upstream management server. Now what?
We will validate three key areas:
- Network connectivity
- Name resolution
- Certificate installation and configuration
Start by verifying network connectivity and name resolution
1. Validate that the Gateway Server can ping the Management Server by fully qualified domain name (FQDN). If ping to the Management Server is blocked, validate with the telnet test instead (see step 3 below).
If the two nodes attempting to communicate cannot reach one another by name (whether Management Server to gateway, Management Server to agent, or agent to gateway), mutual authentication will fail.
2. Validate that the Management Server can ping the Gateway server by fully qualified domain name.
Mutual authentication is a two-way process. Name resolution failures in either direction will cause the process to fail.
NOTE: In scenarios where the two hosts communicate across the Internet and the remote host's FQDN cannot be resolved because it is not publicly resolvable (a name in a private namespace, such as contoso.int), you may add the FQDN and IP address to the hosts file on the opposing computer.
In other words, if the Management Server cannot resolve the FQDN of the Gateway Server, add the FQDN and public IP address of the Gateway Server to the hosts file on the Management Server. If the Gateway Server cannot resolve the Management Server, do the same on the Gateway Server using the Management Server's FQDN and public IP address.
3. Validate that the Gateway Server can telnet to the Management Server on port 5723.
This simply validates that 1) the Management Server is listening on the expected port and 2) the port is reachable through any firewalls in the path.
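The name resolution and port checks above can also be scripted. The following is a minimal Python sketch, not part of the original guide: the function names resolve_fqdn and can_connect are hypothetical, and the example host name in the comments is a placeholder. It uses the operating system resolver (which normally honors hosts file entries) and a plain TCP connection attempt, which is roughly what the telnet test does. A refused connection corresponds to the 10061 error shown in event 21006.

```python
import socket


def resolve_fqdn(fqdn):
    """Return the sorted IP addresses the local resolver reports for fqdn.

    Returns an empty list when the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        return []
    # Each entry's sockaddr tuple starts with the IP address string.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port=5723, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


# Example usage (hypothetical FQDN; substitute your Management Server):
#   print(resolve_fqdn("ms01.contoso.int"))
#   print(can_connect("ms01.contoso.int", 5723))
```

Run the check from the Gateway Server against the Management Server, then run resolve_fqdn from the Management Server against the Gateway Server, since name resolution must work in both directions.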
Verify certificates are present on the Management Server
1. Validate that the root certificate exists in Local Computer / Trusted Root Certification Authorities / Certificates on the Management Server
This is often overlooked. If you are issuing certificates from an internal Root CA, that Root CA may not be trusted by default (and definitely will not be if it is a stand-alone Root CA).
2. Validate that the certificate exists in Local Computer / Personal / Certificates on the Management Server
The requested certificate must of course be installed if we expect the process to work.
3. Validate that the certificate exists in Local Computer / Operations Manager / Certificates on the Management Server
The requested certificate must of course be installed if we expect the process to work.
Verify certificates are present on the Gateway Server or Agent computer (in scenarios where no gateway exists)
1. On the Gateway Server, validate that the root certificate exists in Local Computer / Trusted Root Certification Authorities / Certificates
2. On the Gateway Server, validate that the certificate exists in Local Computer / Personal / Certificates
3. On the Gateway Server, validate that the certificate exists in Local Computer / Operations Manager / Certificates.
Verify MOMCertImport successfully wrote the Certificate serial number to the registry
1. On the Gateway Server, validate that the registry value HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings\ChannelCertificateSerialNumber exists and contains the certificate's serial number in reverse order. The serial number is shown in the Serial number field in the details of the certificate in the Local Computer / Personal / Certificates folder.
2. On the Management Server, validate that the registry value HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings\ChannelCertificateSerialNumber exists and contains the certificate's serial number in reverse order. The serial number is shown in the Serial number field in the details of the certificate in the Local Computer / Personal / Certificates folder.
Writing the certificate serial number to the registry is the job of the MOMCertImport tool. If you do not see the certificate's serial number here, run the tool again.
What does “serial number reversed” refer to? Great question!
When you view the serial number field in the certificate properties in the Certificate Store, it looks like this:
But in the registry key mentioned here, it looks like this:
If you look closely, you will see the serial number in the registry is entered in reverse order! No idea why this is. However, when you turn this around and compare the two, the values should match. If not, delete the value and try again.
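Comparing the two values by eye is error-prone, so here is a small Python sketch of the comparison. It assumes the reversal is byte by byte (two hex digits at a time) rather than character by character, which you should confirm against your own values. The function names and the sample serial number are hypothetical and not taken from a real certificate.

```python
def reverse_serial_bytes(serial_hex):
    """Reverse a hex serial number one byte (two hex digits) at a time.

    Whitespace between bytes, as shown in the certificate details, is ignored.
    """
    cleaned = "".join(serial_hex.split()).lower()
    if len(cleaned) % 2 != 0:
        raise ValueError("serial number must contain whole bytes")
    return bytes.fromhex(cleaned)[::-1].hex()


def serials_match(cert_serial, registry_value):
    """Return True if the registry value is the byte-reversed cert serial."""
    normalized = "".join(registry_value.split()).lower()
    return reverse_serial_bytes(cert_serial) == normalized


# Hypothetical values for illustration only:
#   serials_match("61 0a 1b 2c", "2c 1b 0a 61")
```

Paste the serial number from the certificate details as the first argument and the registry value as the second. If the function returns False under the byte-reversal assumption, delete the registry value and run MOMCertImport again.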
Verifying successful mutual authentication and communication
After verifying the above and repeating configuration steps as necessary, you can confirm success by looking for the following event in the Operations Manager event log on the gateway or agent machine:
Event Type: Information
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21025
Computer: CUSTSCE01
Description:
OpsMgr has received new configuration for management group LAB from the Configuration Service. The new state cookie is “78 50 81 18 D4 B8 39 18 47 08 26 F7 93 1F B2 D2 A1 73 37 69 ”
For a detailed HOW-TO on gateway and certificate-based authentication scenarios in Operations Manager 2007, see our recently updated Gateway and PKI Scenarios Guide.
