Recovering jobs that are sitting on a job queue is something none of the HA products can do. They can replicate the attributes of a job queue but not its content. In our view this is a major crack in anyone’s recovery process, because without it you have no idea what state your application database is going to be in. A number of customers now see this requirement as key to recovery and have been using JobQGenie as their application of choice. Even if you don’t have an HA solution, being able to reload failed jobs with the same attributes they first ran with, without lots of re-keying, is a handy option.
We have now brought JobQGenie to a stage of development where any problems we hit are getting easier to understand and fix. A case in point: a customer reported that lots of jobs were showing in the JobQGenie logs as still on the job queue when in fact they had run and ended. This was confusing for the customer, and he asked us to investigate the problem with him.
Our first thought was that this could be a timing issue. Reviewing the logs showed some problems during the IPL: the JobQGenie jobs were submitted but did not actually start collecting data until a full 3 minutes after the log said they had been submitted. During that window the other subsystems had been started, and all of the jobs had run before JobQGenie had time to collect any information.
The puzzling part was why the end messages had not been processed, as these are not time dependent. Taking a closer look, we could see that JobQGenie had seen the messages but simply skipped past them. We then looked more closely at why this could happen, because the code appeared to process every entry, even if only to set the job state to ended.
After a couple of hours we found the issue. When we capture the data from the exit points, we use the internal job identifier to tag every record in the data files. Because the system had been IPL’d while the jobs were still sitting on the job queue waiting to run, their internal job identifiers changed during the IPL. So when we looked for the records in the files to update, they could not be found, because the internal job identifiers no longer matched!
We had to create a program that resets all of the required internal job identifiers within the data queues and files, so that when the jobs run, the new internal job identifiers are ready to be linked.
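To make the failure and the fix concrete, here is a minimal Python sketch over in-memory lists rather than the real data queues and files. It is not JobQGenie’s program. It assumes each captured record also stores the qualified job name (name/user/number) and that this name stays the same across the IPL while the internal job identifier changes. The names `JobRecord`, `remap_internal_ids`, `apply_end_message` and the `current_ids` lookup are all hypothetical. The sketch shows an end message carrying a post-IPL identifier matching nothing, and the same message linking correctly once the stored identifiers have been remapped.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class QualifiedJobName:
    name: str
    user: str
    number: str

@dataclass(frozen=True)
class JobRecord:
    internal_id: bytes        # internal job identifier captured when the job was submitted
    job: QualifiedJobName     # assumption: the qualified job name is captured alongside it
    state: str                # "JOBQ" while waiting, "ENDED" once the end message is applied

def remap_internal_ids(records, current_ids):
    """Re-tag captured records with the internal identifiers reported after an IPL.

    current_ids maps QualifiedJobName -> current internal identifier (hypothetical source).
    Returns (remapped_records, unmatched_records); unmatched records keep their old identifier.
    """
    remapped, unmatched = [], []
    for rec in records:
        new_id = current_ids.get(rec.job)
        if new_id is None:
            unmatched.append(rec)     # no current identifier found; keep the record for review
            remapped.append(rec)
        else:
            remapped.append(replace(rec, internal_id=new_id))
    return remapped, unmatched

def apply_end_message(records, internal_id):
    """Mark the record tagged with internal_id as ended; return (records, found)."""
    found = False
    result = []
    for rec in records:
        if rec.internal_id == internal_id:
            rec = replace(rec, state="ENDED")
            found = True
        result.append(rec)
    return result, found

# Hypothetical job whose internal identifier changed during the IPL.
job = QualifiedJobName("PAYROLL", "BATCHUSR", "123456")
old_id, new_id = bytes(16), bytes([1]) * 16
records = [JobRecord(old_id, job, "JOBQ")]

# Without remapping, the end message carries the new identifier and matches nothing (found is False).
_, found = apply_end_message(records, new_id)

# After remapping, the same end message is linked to the captured record (found is True).
records, unmatched = remap_internal_ids(records, {job: new_id})
records, found = apply_end_message(records, new_id)
```

This ordering also explains why the reset should run before the monitored job queues are released. Any end message processed before the remap is simply not matched.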
The process takes a few seconds to run and should be run before any job activity is started after an IPL. You can safely start JobQGenie before the process is run, but allowing jobs in the monitored job queues to run before it has completed will result in missed data.
The new programs will be available once we have done more testing.
Other enhancements have been added alongside this fix, such as the ability to filter by job queue when reviewing the job list, lower job data search overhead through narrower search criteria, and the collection of additional data.
Chris…