IProvider: Oleg Grodzevich Scientific and Software Development Playground
#

Mysterious VPN authentication problems

Sat 17 Feb 2007 | 4:54 PM | Oleg Grodzevich

Yesterday night I’ve discovered that users from one of our domains can no longer establish VPN connections. A mysterious situation indeed — just two days ago everything was working fine, I’ve been doing absolutely nothing critical on either of the VPN server or the domain controller, and yet any connection attempt results in VPN error 691 with a very promising text “Access denied because username and/or password is invalid on the domain.”

Quick investigation only added to the puzzle. Examining events revealed the following error message in Application log:

Event Type: Error
Event Source:   IAS
Event Category: None
Event ID:   5052
Date:       2/17/2007
Time:       4:10:52 PM
User:       N/A
Computer:   COMPUTERNAME
Description:
The description for Event ID ( 5052 ) in Source ( IAS ) cannot
be found. The local computer may not have the necessary
registry information or message DLL files to display messages
from a remote computer. You may be able to use
the /AUXSOURCE= flag to retrieve this description;
see Help and Support for details.
The following information is part of the event: DOMAINNAME.

The 5052 event means that no domain controller can be located for the DOMAINNAME. Well, it could be a good explanation for the situation. But… the domain controller was available. Same users could log into the VPN server remotely, dcdiag saw the DC, etc.

The (usual) first thing to blame is the DNS resolution. I thoroughly checked DNS settings just to discover everything was perfect there. Neither of netdiag, dcdiag, or dnslint reported problems. What was even more mysterious that users from a different domain with exactly the same configuration experienced no problems. These domains have a cross-forest trust and while one of them works the other doesn’t.

Having no other option as to go really deep I’ve turned on the tracing on IAS:

netsh ras diagnostics set rastracing * enabled

Once tracing is enabled, IAS/RADIUS starts spamming into %SystemRoot%/tracing folder. In my case the file iassam.log is of a special interest. What I got was the following:

NT-SAM Names handler received request with user identity
DOMAINNAME\USERNAME.
Username is already an NT4 account name.
SAM-Account-Name is "DOMAINNAME\USERNAME".
NT-SAM Authentication handler received request for
DOMAINNAME\USERNAME.
Processing MS-CHAP v2 authentication.
LogonUser succeeded.
NT-SAM User Authorization handler received request for
DOMAINNAME\USERNAME.
Opening LDAP connection to DOMAINCONTROLLER.
The registry value DisableLdapEncryption does not exist.
Using default 0
Trying to set LDAP encryption = 1
LDAP ERROR in ldap_bind_s. Code = 82
Extended error string: (null)
LDAP connect failed: A directory service error has occurred.

Which means that it was able to authenticate the user (i.e. confirm one’s identity) but failed to authorize establishing of the VPN connection. Lack of authorization usually means that you are not granted Dial-in permission, but in our case IAS failed to bind to LDAP when trying to retrieve permission bits. Moreover, it failed with a very explanatory error message — “A directory service error has occurred.”

Well, it looked like the domain controller was in fact around, but might have a corrupted database (very reassuring.) To test this hypothesis I started ldp.exe which allows to connect, bind, and then explore the LDAP tree. But…, unlike IAS, it managed to bind rather perfectly. At this point I’ve pretty much exhausted all of ideas. Some sort of a mystery, especially when you have a perfectly working second domain.

I was left no choice as to try to recollect all of my actions of past few days. Meditating for some time I realized that I’d been doing something with the second domain (the one working well). What I did was setup a time synchronization with an external server because its clock was a little bit behind… time, Time, TIME!!? I rushed to check the time on the domain controller in trouble. Lo and behold, its clock was a few minutes past the world time. At the same moment the VPN server had its clock synchronized with the other DC that had the correct time. Moving the clock forward on the domain controller instantly fixed the problem! I still couldn’t believe it.

Leave your five cents

Required