All,
I have a WCF web service (let's called service "B") hosted under IIS using a service account (VM, Windows 2003 SP2). The service exposes an endpoint that use WSHttpBinding with the default values except for maxReceivedMessageSize, maxBufferPoolSize, maxBufferSize and some of the time outs that have been increased.
The web service has been load tested using Visual Studio Load Test framework with around 800 concurrent users and successfully passed all tests with no exceptions being thrown. The proxy in the unit test has been created from configuration.
There is a sharepoint application that use the Office Sharepoint Server Search service to call web services "A" and "B". The application will get data from service "A" to create a request that will be sent to service "B". The response coming from service "B" is indexed for search. The proxy is created programmatically using the ChannelFactory.
When service "A" takes less than 10 minutes, the calls to service "B" are successfull. But when service "A" takes more time (~20 minutes) the calls to service "B" throw the following exception:
Exception Message: An unsecured or incorrectly secured fault was received from the other party. See the inner FaultException for the fault code and detail
Inner Exception Message: The message could not be processed. This is most likely because the action 'namespace/OperationName' is incorrect or because the message contains an invalid or expired security context token or because there is a mismatch between bindings. The security context token would be invalid if the service aborted the channel due to inactivity. To prevent the service from aborting idle sessions prematurely increase the Receive timeout on the service endpoint's binding.
The binding settings are the same, the time in both client server and web service server are synchronize with the Windows Time service, same time zone.
When i look at the server where web service "B" is hosted i can see the following security errors being logged:
Source: Security
Category: Logon/Logoff
Event ID: 537
User NT AUTHORITY\SYSTEM
Logon Failure:
Reason: An error occurred during logon
Logon Type: 3
Logon Process: Kerberos
Authentication Package: Kerberos
Status code: 0xC000006D
Substatus code: 0xC0000133
After reading some of the blogs online, the Status code means STATUS_LOGON_FAILURE and the substatus code means STATUS_TIME_DIFFERENCE_AT_DC. but i already checked both server and client clocks and they are syncronized.
I also noticed that the security token seems to be cached somewhere in the client server because they have another process that calls the web service "B" using the same service account and successfully gets data the first time is called. Then they start the proccess to update the office sharepoint server search service indexes and it fails. Then if they called the first proccess again it will fail too.
Has anyone experienced this type of problems or have any ideas?
Regards,
--Damian