tracker issue : CF-3681626

Title:

poor error handling when neo-cron.xml is corrupted or truncated


Status/Resolution/Reason: Closed/Fixed/

Reporter/Name(from Bugbase): Tim Parker / Tim Parker (Tim Parker)

Created: 12/09/2013

Components: Scheduler

Versions: 10.0

Failure Type: Usability Issue

Found In Build/Fixed In Build: Final /

Priority/Frequency: Minor / Few users will encounter

Locale/System: English / Platforms All

Vote Count: 8

Listed in the version 2016.0.0.297996 Issues Fixed doc
Problem Description: if neo-cron.xml does not contain valid XML, the 'scheduled tasks' tab of the CF administrator crashes.  Most (all?) attempts to use CFSchedule also fail with an unhelpful message.  Users without intimate knowledge of CF internals will probably be forced to reinstall CF after hours (days?) of frustration.

Example:
   "The system has attempted to use an undefined value, which usually indicates a programming error, either in your code or some system code. Null Pointers are another name for undefined values."

stack trace:

   0: ........coldfusion.scheduling.CronServiceImpl.updateTask[cronserviceimpl.java:1128]
   1: ........coldfusion.tagext.lang.ScheduleTag.doActionUpdate[scheduletag.java:1036]
   2: ........coldfusion.tagext.lang.ScheduleTag.doStartTag[scheduletag.java:707]
   3: ........coldfusion.runtime.CfJspPage._emptyTcfTag[cfjsppage.java:2799]
   4.... <CFSchedule action="update"...>


Steps to Reproduce:
  1) manually modify neo-cron.xml so it is no longer valid XML (delete the closing </wddxPacket> tag, for example)
  2) (optional??) restart CF
  3) open CF administrator, browse to Scheduled Tasks tab <crash>
  - or -
  4) attempt to add/update a new job using CFSchedule


Actual Result:
  null pointer exception, useless error message

Expected Result:
  error message pointing to the cause of the problem - something like 'scheduling data has been corrupted, please use ColdFusion administrator to repair'.  In CF administrator, neo-cron.xml should be replaced with empty (and valid) file.
 - or - at least... we should see something like 'XML (WDDX) parse error in neo-cron.xml'...

in the context of the CF administrator (where access is nominally limited to appropriate/knowledgeable people)... the corrupted file should be replaced by an empty one and appropriate messages displayed... doing the replacement in the context of a CFSchedule call would be a bad thing, since evidence of the corruption would be lost.  Replaced file(s) should be renamed so diagnostic information is not lost.
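
The behaviour requested above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: the class and method names (CronFileCheck, describeProblem, quarantine) are hypothetical and are not part of ColdFusion, and the check only tests for well-formed XML, not for a valid WDDX packet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not ColdFusion code.
final class CronFileCheck {

    /** Returns null when the file is well-formed XML, otherwise a readable reason. */
    static String describeProblem(Path file) {
        try {
            if (!Files.exists(file)) {
                return file.getFileName() + " is missing";
            }
            if (Files.size(file) == 0) {
                return file.getFileName() + " is empty (0 bytes)";
            }
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            try (InputStream in = Files.newInputStream(file)) {
                builder.parse(in);
            }
            return null;
        } catch (SAXException e) {
            return "XML (WDDX) parse error in " + file.getFileName() + ": " + e.getMessage();
        } catch (IOException | ParserConfigurationException e) {
            return "cannot read " + file.getFileName() + ": " + e.getMessage();
        }
    }

    /** Renames a damaged file so the diagnostic evidence is kept; returns the new path. */
    static Path quarantine(Path file) throws IOException {
        Path target = file.resolveSibling(
                file.getFileName() + ".corrupt-" + System.currentTimeMillis());
        return Files.move(file, target);
    }
}
```

In this sketch a CFSchedule call would only report the string from describeProblem, while the administrator would additionally call quarantine before writing a fresh empty file, which matches the split described above.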

  

Any Workarounds:
  replace corrupted neo-cron.xml with a valid one

Exception Stack trace for Dev/QE Reference:
================================
"Error","DefaultQuartzScheduler_Worker-8","06/02/15","15:00:12",,"java.lang.OutOfMemoryError: GC overhead limit exceeded "
coldfusion.server.ServiceRuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at coldfusion.server.ServiceBase.doSerialize(ServiceBase.java:258)
	at coldfusion.server.ServiceBase.access$100(ServiceBase.java:37)
	at coldfusion.server.ServiceBase$2.run(ServiceBase.java:204)
	at java.security.AccessController.doPrivileged(Native Method)
	at coldfusion.server.ServiceBase.serialize(ServiceBase.java:200)
	at coldfusion.scheduling.CronServiceImpl.store(CronServiceImpl.java:491)
	at coldfusion.scheduling.CronServiceImpl.updateAndStore(CronServiceImpl.java:1233)
	at coldfusion.scheduling.CronTask.execute(CronTask.java:80)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:560)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
"Error","DefaultQuartzScheduler_Worker-8","06/02/15","15:00:45",,"GC overhead limit exceeded"
java.lang.OutOfMemoryError: GC overhead limit exceeded
"Error","DefaultQuartzScheduler_Worker-5","06/02/15","15:01:22",,"GC overhead limit exceeded"
java.lang.OutOfMemoryError: GC overhead limit exceeded
"Error","DefaultQuartzScheduler_Worker-2","06/02/15","15:01:11",,"java.lang.OutOfMemoryError: GC overhead limit exceeded "
coldfusion.server.ServiceRuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at coldfusion.server.ServiceBase.doSerialize(ServiceBase.java:258)
	at coldfusion.server.ServiceBase.access$100(ServiceBase.java:37)
	at coldfusion.server.ServiceBase$2.run(ServiceBase.java:204)
	at java.security.AccessController.doPrivileged(Native Method)
	at coldfusion.server.ServiceBase.serialize(ServiceBase.java:200)
	at coldfusion.scheduling.CronServiceImpl.store(CronServiceImpl.java:491)
	at coldfusion.scheduling.CronServiceImpl.updateAndStore(CronServiceImpl.java:1233)
	at coldfusion.scheduling.CronTask.execute(CronTask.java:80)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:560)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
"Error","DefaultQuartzScheduler_Worker-1","06/02/15","15:01:22",,"java.lang.OutOfMemoryError: GC overhead limit exceeded "
coldfusion.server.ServiceRuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at coldfusion.server.ServiceBase.doSerialize(ServiceBase.java:258)
	at coldfusion.server.ServiceBase.access$100(ServiceBase.java:37)
	at coldfusion.server.ServiceBase$2.run(ServiceBase.java:204)
	at java.security.AccessController.doPrivileged(Native Method)
	at coldfusion.server.ServiceBase.serialize(ServiceBase.java:200)
	at coldfusion.scheduling.CronServiceImpl.store(CronServiceImpl.java:491)
	at coldfusion.scheduling.CronServiceImpl.updateAndStore(CronServiceImpl.java:1233)
	at coldfusion.scheduling.CronTask.execute(CronTask.java:80)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:560)

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	3681626

Deployment Phase:	Release Candidate

External Customer Info:
External Company:  
External Customer Name: TParker
External Customer Email:  
External Test Config: My Hardware and Environment details:

Attachments:

Comments:

This does not happen in the latest build. Are you still facing the issue?
Comment by Suchika S.
13847 | December 18, 2013 04:50:55 AM GMT
This may be harder to reproduce than I had thought - I repeated the original steps, as well as some more radical corruptions in neo-cron.xml - and it appears that CF has an additional data store for this data - from which neo-cron.xml seems to be getting reconstructed. The initial instance where this issue was seen was after restore from a backup (apparently somewhat corrupt, given the state of neo-cron.xml). We have also seen the problem on some other installations where neo-cron.xml has zero bytes, resulting in the same crash.
Comment by External U.
13848 | December 18, 2013 10:41:39 AM GMT
Can you please tell us exactly which installation instances resulted in the crash? That will make it easier to reproduce. Thanks, Suchika
Comment by Suchika S.
13849 | December 19, 2013 12:54:23 AM GMT
+3 I've had a corrupted neo-cron.xml file 3 times now, once on production. In order to get things working I had to copy another neo-cron.xml file into the corrupted (blank) one plus ensure the existence of neo-cron.bak to get things working again. My best guess why this happens is that CF is shutting down mid update and then on restart it's not able to restore. Please fix this as it causes a lot of panic if it happens on production as you have to then make sure to restore all the tasks appropriately.
Vote by External U.
13866 | May 12, 2014 07:54:52 AM GMT
Many times scheduled tasks are critical to business processes, and the fact that they're "out of sight-out of mind" means they really need to be reliable. Please make whatever changes are necessary to avoid ever having the neo-cron file become corrupt.
Vote by External U.
13867 | May 12, 2014 08:41:47 AM GMT
We just traced an unstable server back to this problem - CF10 with updater 13 - so this clearly hasn't been fixed. Still no clue as to where the corruption is happening, but we would at least like to see some error handling improvements so we get a message like 'neo-cron.xml is corrupted' instead of an apparently random null pointer exception with no useful diagnostic information (at least to anyone without the source code for CFSchedule internals).

As with other cases, neo-cron.xml had been somehow truncated to zero bytes.

Sample CFSchedule instance (in this case, action="update", but the action probably doesn't matter):

   <cfschedule ACTION="Update" URL="http://myBox/something.cfm" PORT="80" TASK="myTestJob" OPERATION="HTTPRequest" PUBLISH="yes" PATH="c:/temp/logs/" FILE="myTestJob.log" INTERVAL="once" STARTDATE="2014-09-03" STARTTIME="11:15:37" REQUESTTIMEOUT="7200">

Stack Trace fragment (from null pointer exception):

   0: ........coldfusion.scheduling.CronServiceImpl.updateTask[cronserviceimpl.java:1128]
   1: ........coldfusion.tagext.lang.ScheduleTag.doActionUpdate[scheduletag.java:1036]
   2: ........coldfusion.tagext.lang.ScheduleTag.doStartTag[scheduletag.java:707]
   3: ........coldfusion.runtime.CfJspPage._emptyTcfTag[cfjsppage.java:2799]
   4: CFM....{caller}

==============

Bottom line here is three things:

  1) improved diagnostics - at a minimum, throw an exception with a useful error message. This could easily save a development team many hours or days, since most of us don't routinely inspect our neo-xyzzy.xml files for corruption - nor do we routinely look at the various log files written by CF [that's a separate topic - there's much room for improvement in this area]

  2) improved recovery (if neo-cron.xml is empty, all scheduling information is lost) - so consider replacing it with an empty instance so further CFSchedule invocations don't crash - and, if the root cause remains difficult to track, consider maintaining backups which can swap in when the zero-length (or otherwise invalid neo-cron.xml) case recurs. This probably wouldn't prevent all data loss, but it could be enough to keep a critical system running [the assumption is that anything not in the backup is probably a recent change]

  3) figure out the cause of the corrupted/truncated file - clearly the end goal, but also likely to be the hardest part...
Comment by External U.
13850 | September 03, 2014 01:45:38 PM GMT
Can you attach the original neo-cron file you had? We will try to repro the issue with the same set of tasks you had and check where exactly the file gets corrupted (if it does).
Comment by Uday O.
13851 | November 24, 2014 04:41:54 AM GMT
We are experiencing the same error on our development server. The error was noticed when we tried to use cfschedule to create / update a task. The resulting error was null null. The neo-cron.xml was last modified on 8/13/2013 and is empty.
Vote by External U.
13868 | December 01, 2014 03:10:30 PM GMT
This occurs if the PC crashes or CF is terminated due to high memory causing full CPU usage. CF is unfortunately updating the cron file at the time of the crash, which results in an empty file. This is so the cron knows when to run next, I presume. The severity of this needs to be increased, as I have to revert to a backed-up version of the cron and bring the CF service offline to replace it. 2 suggestions: 1) Write the next run times to a separate "next run" file which can be automatically recreated if the previous one is corrupt. 2) You should write the updated schedules to neo-cron-tmp.xml, delete the neo-cron.xml, then rename neo-cron-tmp.xml to neo-cron.xml - if for any reason the server doesn't get to do the rename part of the process, when the service is restarted it can rename the file at that point. Or if you have any better solutions - but please, please FIX THIS. Invoices and scheduled emails not being sent to customers is embarrassing!!!!
Vote by External U.
13869 | January 29, 2015 11:09:00 PM GMT
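
The temp-file-then-rename suggestion in the comment above can be sketched in Java as follows. This is a minimal illustration, not ColdFusion's actual write path: the class name AtomicConfigWriter and the ".tmp" suffix are assumptions. It differs from the suggestion in one respect: instead of deleting neo-cron.xml first, it renames the temp file over the target, so there is no window in which no file exists. Per the Files.move documentation, whether ATOMIC_MOVE replaces an existing target is implementation-specific, so a real implementation would need to verify this on each supported platform.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not ColdFusion code.
final class AtomicConfigWriter {

    /** Writes content to a sibling temp file, flushes it, then renames it over target. */
    static void write(Path target, byte[] content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(content);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to push the data to disk before the rename makes it visible.
            ch.force(true);
        }
        try {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic rename; no longer crash-safe.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

If the process dies before the rename, the old neo-cron.xml is still intact and only the temp file is incomplete, which is the property the commenter is asking for.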
PLEASE VOTE UP!!! >>>>>>>>>>>>>>>>
Comment by External U.
13852 | January 29, 2015 11:09:39 PM GMT
Happened again this morning after windows update restarted the server - XML empty, invoices not sent!!!! Please help - this is costing money!
Comment by External U.
13853 | February 01, 2015 04:52:48 PM GMT
Following cf11 crash, I had this neo-cron.xml erased (size 0kb). Obviously, it would help to have a new valid one created.
Comment by External U.
13854 | April 02, 2015 05:31:31 AM GMT
Following cf11 crash, I had this neo-cron.xml erased (size 0kb). Obviously, it would help to have a new valid one created.
Vote by External U.
13870 | April 02, 2015 05:32:22 AM GMT
We've just encountered this situation, and since we're running Standard, not Enterprise, we can't recreate the XML file via the Administrator. We're going to start backing up the ColdFusion engine files as well as our content files, but that doesn't help much at this time. At a minimum, a useful error message would save a lot of time figuring out what happened.
Vote by External U.
13871 | April 07, 2015 10:13:39 AM GMT
the content of neo-cron.xml has gone - can you please FIX this error - I cannot afford for scheduled tasks to keep being forgotten so that I have to rebuild them
Comment by External U.
13855 | April 13, 2015 09:53:09 AM GMT
Hi Tparker & All, We tried corrupting the neo-cron.xml, as per your suggestion. Also, we tried using the cfschedule tag, but it appears to be working in every scenario. Please send an email at cfinstal@adobe.com with reference to this bug, and we will reach out to you.
Comment by Anit K.
13856 | June 17, 2015 03:39:29 PM GMT
Hi All, We would need your help in understanding and reproducing the issue. We are unable to validate the same, with our setup. Please send an email at cfinstal@adobe.com with reference to this bug, and we will reach out to you.
Comment by Anit K.
13857 | June 22, 2015 07:02:15 AM GMT
[Third Reminder] Hi All, We would need your help in understanding and reproducing the issue. We are unable to validate the same, with our setup. Please send an email at cfinstal@adobe.com with reference to this bug, and we will reach out to you.
Comment by Anit K.
13858 | June 25, 2015 07:23:41 AM GMT
In the absence of others replying to this log, I will pick it up because we have had it occur on a number of occasions. As far as I can tell, the issue occurs when there is a heap space memory error while running a scheduled task or a process that takes a long time to complete. It may also be linked with SQL locking errors. What I have tried to do is rewrite queries that update tables to reduce the number of records being updated and thus the time a specific query runs. This appears to have been successful, but it may be that the ColdFusion server (CF11 Update 5) could be better tuned to increase performance. My issue is that while I am unclear on the cause, the impact is huge and it is not clear when it has occurred. I can send logs etc. - please advise. I have also sent this to the address listed below. Regards, Richard
Comment by External U.
13859 | June 25, 2015 08:11:11 AM GMT
Richard, please do share log at cfinstal@adobe.com and mention this bug#.
Comment by Priyank S.
13860 | June 25, 2015 06:13:30 PM GMT
Also encountered this. CF 10 Update 15, Standard Edition. Hoping to restore neo-cron.xml from our server backup. This has been a huge productivity drain for our staff, and is impacting our ability to meet SLAs.
Vote by External U.
13872 | July 21, 2015 05:02:09 PM GMT
Hi All, We haven't received a corrupted neo-cron.xml from any of the users. We are closing this bug. Please send an email at cfinstal@adobe.com with reference to this bug, whenever you have the required corrupted files.
Comment by Anit K.
13861 | August 03, 2015 07:06:18 PM GMT
Hi Anit, I have replied to this - the problem is that the neo-cron.xml is blank - I still don't know what you want it for. What we need is some means of reporting this when it happens, i.e. if neo-cron.xml is size 0 then do something, because it has been wiped. Regards, Richard
Comment by External U.
13862 | August 04, 2015 02:31:04 AM GMT
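
One possible reading of "if neo-cron.xml is size 0 then do something" is a check at service startup that reports the condition and, where a non-empty neo-cron.bak exists (the backup file mentioned in an earlier comment), restores from it. The Java below is a sketch under those assumptions; CronStartupCheck and its Outcome values are hypothetical names, and nothing here is confirmed ColdFusion behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not ColdFusion code.
final class CronStartupCheck {

    enum Outcome { OK, RESTORED_FROM_BACKUP, UNRECOVERABLE }

    /** Detects a missing or zero-byte cron file and restores it from a non-empty backup. */
    static Outcome check(Path cron, Path backup) throws IOException {
        boolean cronWiped = !Files.exists(cron) || Files.size(cron) == 0;
        if (!cronWiped) {
            return Outcome.OK;
        }
        boolean backupUsable = Files.exists(backup) && Files.size(backup) > 0;
        if (!backupUsable) {
            // Nothing to restore from: the caller should raise a clear, named error.
            return Outcome.UNRECOVERABLE;
        }
        Files.copy(backup, cron, StandardCopyOption.REPLACE_EXISTING);
        return Outcome.RESTORED_FROM_BACKUP;
    }
}
```

A caller would log RESTORED_FROM_BACKUP and UNRECOVERABLE prominently, so that the wipe is reported at the time it is detected rather than discovered later through a null pointer exception.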
+1 - I've seen this many times where neo-cron.xml becomes 0 byte
Vote by External U.
13873 | September 23, 2015 12:28:58 AM GMT
A fix has been made such that neo-* files won't get emptied and, at the same time, backed-up files will always be valid ones (a backup won't ever be a 0-sized file, as happens now).
Comment by Krishna R.
13863 | September 30, 2015 11:23:19 AM GMT
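
The comment above does not say how the fix is implemented. Purely as an illustration of the second half of that statement (a backup that is never overwritten with a 0-sized file), a guard could look like the Java sketch below. BackupGuard and refreshBackup are hypothetical names; this is not Adobe's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not ColdFusion code.
final class BackupGuard {

    /** Copies source over backup only when source is non-empty; returns true if it copied. */
    static boolean refreshBackup(Path source, Path backup) throws IOException {
        if (!Files.isRegularFile(source) || Files.size(source) == 0) {
            // Leave the last good backup untouched.
            return false;
        }
        Files.copy(source, backup, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

A size check alone would not catch a truncated but non-empty file; a stricter guard would also parse the source before accepting it as a backup.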
Which update is this fixed in? I have this problem frequently on several different servers with various clients. It's a serious bug and has been around for years. I'm not entirely convinced that the bug has been fixed, based on the discussion below, and would like to see more information about what has been done.
Comment by External U.
13864 | July 28, 2016 03:21:53 AM GMT
Krishna, I have a couple of follow-up/clarification requests for you. And after that I have some thoughts for others dealing with this.

1) As the previous user asked in July, can you confirm specifically when this fix (which you mention on 9/30/15) was made, in terms of both the version of CF (2016, 11, and/or 10) and the update, and especially for 10, which was the version originally discussed in this bug? I have looked at the bugfix lists for the two CF10 updates before and after then (16-20) and don't see anything seemingly referring to a fix related to scheduled tasks, the neo files, or crashes.

2) And would I be reading this right to say that the "fix" is not so much one that will PREVENT a 0-byte XML file (as may happen on a CF crash) but rather that the fix was to prevent the previously created neo-cron.bak file from being OVERWRITTEN with that 0-byte file? If so, then people should understand that they're not saying you will never get another 0-byte neo-cron.xml file (a separate problem that could and should be solved) but rather that at least you should be able to recover by restoring the neo-cron.bak, which should be good from the last time it was written (though we could ask also that at startup, the system could check and see the 0-byte neo-cron.xml and just auto-restore the neo-cron.bak for us, rather than leave people panicked about "having lost all their scheduled tasks").

All that said, I had someone get this while on CF10 update 18, which is from Nov 2015, so two months after Krishna's comment, and they said that the neo-cron.bak WAS 0-byte also. I sure hope you did mean that you fixed this for CF10 as well as 11 and 2016, but it's unclear above. And if you did, and it should have been in CF10 u18, it's a bummer that this client STILL had a 0-byte bak file.

...

For folks still suffering the bigger problem here, of lost/corrupted scheduled tasks and 0-byte neo-cron.xml files, as I say I've been working with someone experiencing these recently, and based on that and reading all the comments here, I have some thoughts which MAY give a little more perspective to what's happening. I welcome feedback:

a) You may wonder "how could the neo-cron.xml get corrupted by a crash in the first place?" Well, it seems that if CF crashes (in a certain way) while CF is in the process of editing any of these neo-*.xml files (in the cfusion/lib folder, or [instance]\lib if using multiple instances), then they get left as 0-byte files.

b) And you may wonder, "why does this happen to some people and not others?" I think there are two facets to that. First, judging from comments others have made here and some of my own experience here, it seems it only happens when CF/the JVM is so impacted that you have to kill it or it itself crashes. (Many people refer to CF "crashing" when instead simply it is no longer processing requests, and they have to restart it. I'd bet that if CF "restarts" when requested, this problem of 0-byte files would not occur. It's likely only when CF is killed or crashes itself, non-gracefully.)

c) So you may wonder next, "yeah but it's getting corrupted even when we haven't touched the scheduled tasks in weeks or longer. Why would it be being 'written to' by CF....and why isn't this happening to other neo xml files"? There's a seeming answer there, too. (I say "seeming" because all these are just my assertions for now. I hope Adobe or others will chime in with confirmation or proof, or I may get to create specific proofcases.) The issue may be that the neo-cron.xml is being edited NOT because YOU are editing the scheduled tasks but rather because CF itself updates that file whenever a scheduled task is FIRED. Yep, it tracks in that same file the date/time of the last execution. And if you think about it, if you have one or many sched tasks which fire frequently, like each minute, then there would be a FAR greater chance that CF would be trying to edit that file when for whatever reason it may crash/hang up. (And of course, not even everyone who runs tasks frequently necessarily has reason for CF to crash this way.) So that may explain why this happens to some folks more than others, for what that may be worth (and perhaps is another reason Adobe said in some comments that they could not "recreate the problem").

So might there be another "solution"/fix to this problem? It seems so. Perhaps the tracking of the last run of these tasks ought to be moved out to its own file (as another commenter here proposed), so that it's only THAT file which is being updated so frequently and at worst it's only THAT last-run information which is lost rather than ALL scheduled tasks. Looking forward to thoughts from Adobe or others on this. (And if you ARE suffering this still, it may be helpful to start clarifying the specific CF version and update level you're on.)
Comment by Charlie A.
13865 | December 23, 2016 05:34:30 PM GMT
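
The idea of moving last-run tracking into its own file, as proposed in the comment above, can be sketched in Java as follows. Everything here is an assumption for illustration: the class name LastRunStore, the use of a java.util.Properties file, and storing epoch milliseconds per task name are not how ColdFusion stores this data. The point of the sketch is only that a damaged or empty last-run file degrades to "no history" while the task definitions, kept in a different file, are never touched when a task fires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not ColdFusion code.
final class LastRunStore {

    private final Path file;

    LastRunStore(Path file) {
        this.file = file;
    }

    /** Loads last-run times; a missing, empty, or damaged file yields an empty map. */
    Map<String, Long> load() {
        Map<String, Long> result = new HashMap<>();
        if (!Files.isRegularFile(file)) {
            return result;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException | IllegalArgumentException e) {
            return result; // only last-run history is lost
        }
        for (String task : props.stringPropertyNames()) {
            try {
                result.put(task, Long.parseLong(props.getProperty(task)));
            } catch (NumberFormatException ignored) {
                // Skip a damaged entry and keep the rest.
            }
        }
        return result;
    }

    /** Records that a task fired; rewrites only the last-run file. */
    void recordRun(String task, long epochMillis) throws IOException {
        Map<String, Long> all = load();
        all.put(task, epochMillis);
        Properties props = new Properties();
        all.forEach((name, time) -> props.setProperty(name, Long.toString(time)));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "last-run times only; task definitions live in another file");
        }
    }
}
```

The write in recordRun is deliberately simple and is itself not crash-safe; it could be combined with the temp-file-and-rename approach another commenter proposed.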