Last week, I noticed Dynamics CRM had become slower and my DB had grown too large for the amount of real records I expected it to have. After some investigation I realized it was a combination of things.
The symptoms of a problem
There were way too many Matchcode Update records with a Waiting status. The normal System Jobs report can’t show more than 250 records at once and I had gone through several pages already. This simple SQL query showed that there were more than 100.000 records.
Select COUNT(*) As TotalWaitingTasks from dbo.AsyncOperationBase where StatusCode=10
This was going to take a long time to solve. (It took me about 8 hours to have the DB back to a normal size, because its server is not specially powerful).
I found several posts that suggested changing the status, the posts offered code to do it programmatically, but due to the number of records I’d have to modify them to be able to do it in chunks, avoiding a prolonged database lockout. The console utilities were too involved and after doing some investigation I realized that the fields I needed to change were two in a single table. Another simple SQL Query took care of it in a few hours.
Update dbo.AsyncOperationBase
Set StatusCode = 32, StateCode = 3
where OperationType = 12 and StatusCode=10
The OperationType value is for MatchCode Update, later on I found a few other workflows, so I just changed this the corresponding value. The value of 10 in StateCode is for “Waiting” and setting it to 3 is equivalent to “Completed”
I could’ve wrapped this one to do it chunks, but the day was over and nobody would be using the DB until the next day so I left it running.
Day 2
Early next day the update had finished. All System Jobs involved had been canceled. Next steps, delete the records and then shrink the DB.
However before I set out to delete them I needed to find the cause while the data was still there. Another simple query was needed:
Select Top 5 [Message] from dbo.AsyncOperationBase where StateCode = 3
I just get the first five, because obviously the result set would be way to big. In the field the reason was clear, it was a communication problem. The Message field showed an error stack trace:
System.Net.WebException: The request failed with HTTP status 400: Bad Request.
at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, _
WebResponse response, Stream responseStream, Boolean asyncCall)
at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
at Microsoft.Crm.SdkTypeProxy.CrmService.RetrieveMultiple(QueryBase query)
at Microsoft.Crm.Asynchronous.SdkTypeProxyCrmServiceWrapper.RetrieveMultiple(QueryBase query)
at Microsoft.Crm.Asynchronous.UpdateContractStatesOperation.ExpireContracts()
at Microsoft.Crm.Asynchronous.UpdateContractStatesOperation.InternalExecute(AsyncEvent asyncEvent)
at Microsoft.Crm.Asynchronous.AsyncOperationCommand.Execute(AsyncEvent asyncEvent)
at Microsoft.Crm.Asynchronous.AsyncHostHandler.Handle(AsyncEvent asyncEvent)
at Microsoft.Crm.Asynchronous.QueueManager.PoolHandler.ProcessAsyncEvent(AsyncEvent asyncEvent)
A quick look around and I found that this was due to an incorrect value for AsyncSdkRootDomain in the DeploymentProperties table. Yet another simple query.
SELECT [ColumnName],[NVarCharColumn]
FROM [MSCRM_CONFIG].[dbo].[DeploymentProperties]
Where [ColumnName] = 'AsyncSdkRootDomain'
The value in this field has to be like where “server” matches the name of the host where Dynamics CRM is installed, and port the port used when installing it. The standard default port is 5555. Therefor, for a host called BIGSERVER with a standard install it would be “BIGSERVER:5555”.
After changing the value to the correct one, and verifying that connection was successful I used a query I used this query from Microsoft Support to delete the Cancelled jobs. The query is originally intended to fix a problem with all completed jobs, which includes cancelled and successful ones. When used, some history for workflows will be lost. This was not an issue for me and I ran it with confidence.
The query also took a long time to run, but in the end it did its job. If you inspect it you will see that it appropriately does the job in steps limited to 2000 records and also uses transactions in case any errors happen.
I repeated this steps for other Testing organizations and I went to bed happy with the results of the day.