Its Me

Monday, September 15, 2008

Failed On Start (Retrying) - Error Moss 2007 Sharepoint Services

All,

I have been under the scanner in my office to resolve one of the most complex (atleast in my office) Sharepoint issues. Its the standard 'Failed on Start(retrying)' error message that shows up whenever a workflow associated with the list fails. The workflow developed in SPD fails almost on all of my lists.

The workflows used to work fine till sometime and suddenly got to fail one day. Now, i thot this can be solved easily by googling it, but it was not. I did not get to the exact solution, but, i did get some nice references like the following: Look at the logs first, Rodney Langley's blog about application pool credentials, Workflow Bugs etc. The best link is the 3rd one which i got to get good leads as to why the workflows suddenly started to fail. THis happened on both our test and production servers with 3 days difference.

First i went to look into the log files immediately after running the workflow on a list. Checked the time stamp when the workflow failed and went to %ProgramFiles%\Common Files\Microsoft Shared\web server extensions\12\LOGS to open the latest log file. Then searched for 'Workflow Infrastructure'. The Level for this entry was: Unexpected. The full details were something like this:

06/11/2008 10:05:34.65 OWSTIMER.EXE (0x1A8C) 0x12EC Windows SharePoint Services Workflow Infrastructure 72fu Unexpected Load Workflow Class: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.InvalidOperationException: The requested Performance Counter is not a custom counter, it has to be initialized as ReadOnly. at System.Diagnostics.PerformanceCounter.Initialize() at System.Diagnostics.PerformanceCounter..ctor(String categoryName, String counterName, String instanceName, Boolean readOnly) at System.Workflow.Runtime.PerformanceCounterManager.CreateCounters(String name) at System.Workflow.Runtime.Hosting.DefaultWorkflowSchedulerService.OnStarted() at System.Workflow.Runtime.Hosting.WorkflowRuntimeService.HandleStarted(Object source, WorkflowRuntimeEventArgs e) at System.EventHandler`1.Invoke(Object sender, TEventArgs e)... 06/11/2008 10:05:34.65* OWSTIMER.EXE (0x1A8C) 0x12EC Windows SharePoint Services Workflow Infrastructure 72fu Unexpected ... at System.Workflow.Runtime.WorkflowRuntime.StartRuntime() at Microsoft.SharePoint.Workflow.SPWinOeHostServices..ctor(SPSite site, SPWeb web, SPWorkflowManager manager, SPWorkflowEngine engine) --- End of inner exception stack trace --- at System.RuntimeMethodHandle._InvokeConstructor(Object[] args, SignatureStruct& signature, IntPtr declaringType) at System.RuntimeMethodHandle.InvokeConstructor(Object[] args, SignatureStruct signature, RuntimeTypeHandle declaringType) at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes) at System.Ac... 06/11/2008 10:05:34.65* OWSTIMER.EXE (0x1A8C) 0x12EC Windows SharePoint Services Workflow Infrastructure 72fu Unexpected ...tivator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes) at System.Reflection.Assembly.CreateInstance(String typeName, Boolean ignoreCase, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes) at Microsoft.SharePoint.Workflow.SPWorkflowManager.LoadPluggableClass(String classname, String assembly, Object[] parameters) 06/11/2008 10:05:34.66 OWSTIMER.EXE (0x1A8C) 0x12EC Windows SharePoint Services Workflow Infrastructure 98d8 Unexpected System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.InvalidOperationException: The requested Performance Counter is not a custom counter, it has to be initialized as ReadOnly. at System.Diagnostics.PerformanceCounter.Initialize() at System.Diagnostics.PerformanceCounter..ctor(String categoryName, String counterName, String instanceName, Boolean readOnly) at System.Workflow.Runtime.PerformanceCounterManager.CreateCounters(String name) at System.Workflow.Runtime.Hosting.DefaultWorkflowSchedulerService.OnStarted() at System.Workflow.Runtime.Hosting.WorkflowRuntimeService.HandleStarted(Object source, WorkflowRuntimeEventArgs e) at System.EventHandler`1.Invoke(Object sender, TEventArgs e) at System.Workfl... 06/11/2008 10:05:34.66* OWSTIMER.EXE (0x1A8C) 0x12EC Windows SharePoint Services Workflow Infrastructure 98d8 Unexpected ...ow.Runtime.WorkflowRuntime.StartRuntime() at Microsoft.SharePoint.Workflow.SPWinOeHostServices..ctor(SPSite site, SPWeb web, SPWorkflowManager manager, SPWorkflowEngine engine) --- End of inner exception stack trace --- at System.RuntimeMethodHandle._InvokeConstructor(Object[] args, SignatureStruct& signature, IntPtr declaringType) at System.RuntimeMethodHandle.InvokeConstructor(Object[] args, SignatureStruct signature, RuntimeTypeHandle declaringType) at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes) at System.Activator.CreateInstanc... 06/11/2008 10:05:34.66* OWSTIMER.EXE (0x1A8C) 0x12EC Windows SharePoint Services Workflow Infrastructure 98d8 Unexpected ...e(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes) at System.Reflection.Assembly.CreateInstance(String typeName, Boolean ignoreCase, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes) at Microsoft.SharePoint.Workflow.SPWorkflowManager.LoadPluggableClass(String classname, String assembly, Object[] parameters) at Microsoft.SharePoint.Workflow.SPWorkflowManager.GetService(String objClass, String objAssm, SPSite parentSite, SPWeb parentWeb, SPWorkflowEngine engine) at Microsoft.SharePoint.Workflow.SPWorkflowManager.RunWorkflowElev(SPWorkflow originalWorkflow, SPWorkflow workflow, Collection`1 events, SPRunWorkflowOptions runOptions)

Almost all the times i ran the workflow i get the above entries in the logs. Then i figured out that the reason was because of a "bug" with Windows Workflow Foundation which doesnt set one of the dirtry boolean bits to true or false accordingly. This is tied up with the Performance Counters values which trigger the OSTIMER.exe to schedule the workflows to run whenever the list items are updated. So, the following URL points you to get the hotfix which fixes this dirty boolean bit accordingly. WWF HOTFIX(this is the .NET 3.0 WWF hotfix).
So, now i formulated the steps to repair this stupid error which i know for sure most of them have no clue why its caused.

1) Open the Performance Counter MMC. (in the run prompt enter perfmon.exe). Expand the 'Performance Logs and Alerts' item. Click on the 'Counter Logs' item. Open the 'System Overview' properties.Click on 'Add Counters' button. In the Performance Object list box, check to see if there is an entry for 'Windows Workflow Foundation'. If its not there then this the problem. Somehow the counters got messed up. This means you need to add the right perf counters to this list so that all the associations of timer jobs are kicked off correctly.

2) If the .NET Framework 3.0 is installed on the system, then please repair it from Add/Remove Programs snap-in. Or else, my suggested method would be to RE-INSTALL .NET 3.0 framework.

3) After the .NET 3.0 is installed, install the .NET 3.0 hotfix. Link: WWF Hotfix.

4) After the .NET 3.0 Hotfix is installed, stop and start all the Application Pool instances and also the Web-Sites in the IIS Manager.How to do it? Here are the steps:
a) In the run prompt enter: inetmgr.exe. From the IIS snap-in, expand the server to see the details.
b) Expand the 'Application Pools' item. Now click on each of the pool identity, hit the Stop button. Then hit the Start button.
c) Repeat the above steps (b) for 'Web Sites' item.
[Stop and Start the relevant items specific to Sharepoint]

5) Once this is done, reset the IIS just to make sure that everthing works fine. To do this: in the run prompt enter: iisreset.

6) Now the final step is to re-associate all the Performance Counters with the windows workflow foundations instances. In the run prompt enter: cmd. Basically opening a command prompt.Change your working directory to the following: \WINDOWS\Microsoft.NET\Framework\v3.0\Windows Workflow Foundation

Now execute the following command: "LodCtr PerfCounters.ini" (excluding the quotes). If its success you do not get any message. Assume that if there is no message, the command executed successfully.

Now repeat the steps indicated in Step - 1. Now you should see 'Windows Workflow Foundation' counter in the list.
Re-execute any workflow on any list that has workflows. It should be successful. For me these steps worked like a charm albeit for all the efforts i had to put in to find out the root cause. If you liked my blog, please add your comments. That surely motivates me :).

thanks,
-Mantrala, Kiran

8 comments:

Jovix said...

I've been searching for weeks to find a solution to this problem. I'm going to give this a try and let you know. However I have different variations of the Workflow problem. Throughout all of my testing, I have encountered the "Failed On Start (Retrying)". Also, in other tests the workflow kicks off but the status remains on Starting and never changes. I have also encountered an issue where the workflow works only on the first time it runs, any attempts after the first run fails. I've found blogs and articles on how to fix these issues but nothing has worked so far. Thanks for sharing your findings with the rest of us...I will let you know what I come up with.

Unknown said...

Sure. If my blog helped you, I am more than happy.
-Kiran

Unknown said...

Dear Kiran,

I am from Taiwan and your article are very useful to me.Thank you very much.

Best Regards,
James

Mantrala, Kiran said...

Thank you James. Glad that my blog helped you.

Unknown said...

Hi Mantrala,

Your solution is very well documented and thorough. But i can see the WWF performance counters in that but still the workflow fails to start. Here are the details about my error.

My WF used to work fine, but they stopped working one day (i dont know why?). I have created one of the sites using the LendingLibrary template that comes from Microsoft out of the box solution. When a new item is created in the list, the workflow gives and error "Failed to start (retrying)". And say after about 2 minutes, it starts then works fine. It happens every time when a new item is created. The error from the log files is attached below. If you can look into this matter and see what's causing the workflow not to kick off straight away. I am stuck with this problem for weeks and have tried many solutions, but no avail.

04/22/2009 10:38:39.63 w3wp.exe (0x102C) 0x05F8 Windows SharePoint Services Workflow Infrastructure 72fs Unexpected RunWorkflow: System.ArgumentException: Value does not fall within the expected range. at Microsoft.SharePoint.Workflow.SPWorkflowActivationProperties..ctor(SPWorkflow workflow, Int32 runAsUserId, String associationData, String initiationData) at Microsoft.SharePoint.Workflow.SPWinOEWSSService.MakeActivation(SPWorkflow workflow, SPWorkflowEvent e) at Microsoft.SharePoint.Workflow.SPWinOeEngine.RunWorkflow(Guid trackingId, SPWorkflowHostService host, SPWorkflow workflow, Collection`1 events, TimeSpan timeOut) at Microsoft.SharePoint.Workflow.SPWorkflowManager.RunWorkflowElev(SPWorkflow originalWorkflow, SPWorkflow workflow, Collection`1 events, SPRunWorkflowOptions runOptions)
04/22/2009 10:38:39.63 w3wp.exe (0x102C) 0x05F8 Windows SharePoint Services Workflow Infrastructure 98d7 Unexpected System.ArgumentException: Value does not fall within the expected range. at Microsoft.SharePoint.Workflow.SPWorkflowActivationProperties..ctor(SPWorkflow workflow, Int32 runAsUserId, String associationData, String initiationData) at Microsoft.SharePoint.Workflow.SPWinOEWSSService.MakeActivation(SPWorkflow workflow, SPWorkflowEvent e) at Microsoft.SharePoint.Workflow.SPWinOeEngine.RunWorkflow(Guid trackingId, SPWorkflowHostService host, SPWorkflow workflow, Collection`1 events, TimeSpan timeOut) at Microsoft.SharePoint.Workflow.SPWorkflowManager.RunWorkflowElev(SPWorkflow originalWorkflow, SPWorkflow workflow, Collection`1 events, SPRunWorkflowOptions runOptions)

Mantrala, Kiran said...

Soban, not all problems can be solved by my solution. Each and every case is different. So, I would suggest you to look through the logs thoroughly and get the root cause.

Indrajeet Kumpavat said...

Hi Kiran,

This error come sporadically for me. It does not happens all the time. Some time workflow triggers.

is your issue also sporadic one?

And can this solution work for my solution?

Just want to confirm before applying the solution on my system.

Thanks,
Indrajeet.

Mantrala, Kiran said...

Indrajeet,
This error came once in 3 days, but then increased the frequency to almost instantly. Meaning, whenever I had the workflow kick in, it failed. So, I had to fix the issue. Trying my solution should help you fix what you are facing, but, I will not guarantee it. Like I indicated, it worked for me and its been more than an year that I did not see this issue at all. The server is pounded with heavy load ever since, but still it never failed for me. Good luck.