Apr 25

Save and Restore Spool File challenges


When something that was working suddenly stops working, it can be a real challenge to find out what changed to cause the problem! We had been working on improving the spool file replication process for a couple of weeks and had made a number of changes to the source files, so when things stopped working unexpectedly we immediately assumed we had introduced a bug somewhere.

For some reason, every spool file replication request on the test system started to error: each save and restore process failed when the target system tried to restore the spool file from the save file we had created and sent across. I have to say the QSRRSTO API is a real pain to configure for spool file restores; the documentation is pretty poor, and what should work (according to the documentation) does not when coded. We spent many hours experimenting with the API and the parameter structure passed to it before we finally came up with something that would actually work, so when things started to go wrong we immediately assumed this was where we had introduced the bug, or that the design simply would not work in differing environments.
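
For anyone heading down the same path, this is roughly the shape of the call we ended up with. Treat it as a minimal ILE C sketch rather than our production code: the user space name and the fill_restore_keys() helper are placeholders, and the keyed parameter records the user space must contain are exactly the part you will need the QSRRSTO documentation (and some experimenting) for.

  /* Minimal sketch: calling the Restore Object List (QSRRSTO) API.  */
  /* The API takes a qualified user space holding keyed restore      */
  /* parameters, plus the standard API error code structure.         */
  #include <string.h>
  #include <qusec.h>                    /* Qus_EC_t error structure  */

  #pragma linkage(QSRRSTO, OS)
  void QSRRSTO(char usrspc[20], void *errcode);

  int restore_splf(void)
  {
      Qus_EC_t errcode;
      char usrspc[20];

      /* Qualified user space: 10-char name plus 10-char library.    */
      /* "RSTPARMS  QTEMP     " is a placeholder; create and load    */
      /* the space (QUSCRTUS/QUSCHGUS) before making this call.      */
      memcpy(usrspc, "RSTPARMS  QTEMP     ", 20);

      /* fill_restore_keys(usrspc); -- hypothetical helper that      */
      /* loads the keyed records (save file name, device *SAVF,      */
      /* spooled file selection data and so on) into the space.      */

      errcode.Bytes_Provided = sizeof(errcode);
      QSRRSTO(usrspc, &errcode);

      return (errcode.Bytes_Available == 0) ? 0 : -1;
  }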

After a lot of experimenting and hair-pulling we finally found the problem! One system had its time zone set to QN0500UTCS and the other to QN0500EST. While both of these are UTC−5, the mismatch somehow caused the timestamps in the save file to be stored incorrectly. I have attached a printout below to show what was happening.

Saved Object Information                                                  Page 1
5761SS1 V6R1M0 080215                       SHIELD3     04/25/11 15:20:30 EDT
Save file . . . . . . . . . . . . . . :  Q064313873
  Library . . . . . . . . . . . . . . :    HA4I61SPLF
Records . . . . . . . . . . . . . . . :  88
Save operation:
  Save command  . . . . . . . . . . . :    SAVOBJ
  Save date/time  . . . . . . . . . . :    04/25/11 15:11:36
  Save while active . . . . . . . . . :    *NO
  Data compressed . . . . . . . . . . :    No
  Release level . . . . . . . . . . . :    V6R1M0
Data saved:
  Library . . . . . . . . . . . . . . :    *SPLF
  System serial number  . . . . . . . :    10-0B3AA
  Private authorities . . . . . . . . :    No
Objects
  Object       Type     Attribute   Owner          Size (K)   Data   Text
  *SPLF        *OUTQ                                     16   YES
Spooled Files
               Spooled File                             Creation  Creation  Output
  File         Number   Job         User    Number  System   Date      Time      Queue    Library  ASP
  QPDZDTALOG   000001   CHRISH#X21  CHRISH  050909  SHIELD2  04/25/11  16:11:36  QPRINT   QGPL       1
Summary
  Number of objects saved . . . . . . . :  1
  Members . . . . . . . . . . . . . . . :  0
  Access paths  . . . . . . . . . . . . :  0
  Spooled files . . . . . . . . . . . . :  1
                  * * * * *   E N D  O F  L I S T I N G   * * * * *

You will notice that the save date/time is 04/25/11 15:11:36 BUT the spool file creation time is 04/25/11 16:11:36, so according to the listing the spool file was created an hour after it was saved! After checking the save file on the source system we found the save file creation date and the save date both matched. Also, the save file is copied from the source to the target in its entirety, so the data within it should be the same on both systems; in this instance, however, the data conversion appears to be skewed by the time zone setting. Setting the QTIMZON system values to match resolved our issues, but we still have a concern about this behaviour (it's probably working as designed, so it's not a bug!) which IBM should take a look at. In our code we set the restore buffer criteria using the date and time we retrieved from the source save file to determine the right spool file to restore. That obviously does not work where the system renders the times in the save file according to different time zone settings.
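
If you want to guard against the same trap, it is worth having your replication jobs retrieve QTIMZON from both systems and complain when they differ, before trusting any timestamps taken from a save file. Here is a rough ILE C sketch using the Retrieve System Values (QWCRSVAL) API; the receiver parsing follows the documented entry layout, but verify the offsets against your release.

  /* Retrieve the QTIMZON system value so replication jobs can warn  */
  /* when the source and target time zones do not match.             */
  #include <string.h>
  #include <qusec.h>

  #pragma linkage(QWCRSVAL, OS)
  void QWCRSVAL(void *rcvr, int rcvrlen, int count, char *names,
                void *errcode);

  int get_qtimzon(char tz[11])
  {
      char rcvr[256];
      char name[11] = "QTIMZON   ";     /* blank-padded to 10 chars  */
      Qus_EC_t errcode;
      int offset;

      errcode.Bytes_Provided = sizeof(errcode);
      QWCRSVAL(rcvr, sizeof(rcvr), 1, name, &errcode);
      if (errcode.Bytes_Available != 0)
          return -1;

      /* Receiver: Bin(4) count, then Bin(4) offsets to each entry.  */
      /* Entry: Char(10) name, Char(1) type, Char(1) status,         */
      /* Bin(4) data length, then the data itself.                   */
      memcpy(&offset, rcvr + 4, sizeof(offset));
      memcpy(tz, rcvr + offset + 16, 10);
      tz[10] = '\0';
      return 0;
  }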

So if you start to see problems with restore operations and you are using the QSRRSTO API, you might want to check this out before spending hours, as we did, looking for a coding error.

Chris…

Apr 20

Spool file replication requires good application practices


We have just spent a day looking at an issue with one of our clients and the HA4i replication process, and came away with some pertinent findings we thought we would share with everyone.

HA4i relies on the audit journal as its trigger for spool file replication, which means we see every request that generates a T-SF entry. Once we get the entry we can see the data it contains and create the appropriate response for the HA4i programs to carry out. In this particular case the client was seeing thousands of errors being logged and resources being consumed at a terrific rate! At first we thought the process was getting behind, meaning the spool file was being deleted before we had a chance to save it for transport to the remote system, but closer inspection showed that the process was generating over 60 spool file requests per second. Even closer inspection showed that the spool files were being deleted as soon as they were created! This meant we were trying to save objects which had already been deleted. All the while, HA4i was creating save files, trying to save the objects, creating delete commands and then logging the errors for each entry, and still keeping up pretty well with the data being placed in the audit journal. So the question was: why was this happening, and what could we do about it?
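
To give an idea of the kind of defensive logic this pushes a replication product towards, here is a small illustrative sketch (plain C, with an invented entry layout, not HA4i code) that scans a batch of audit entries and skips the save for any spool file that is deleted later in the same batch:

  /* Illustrative only: collapse create/delete pairs in a batch of   */
  /* audit entries so we never attempt to save a spool file that     */
  /* has already been deleted. The entry layout is invented for the  */
  /* example and is not the real T-SF entry format.                  */
  #include <stdio.h>
  #include <string.h>

  typedef struct {
      char action;                 /* 'C' = created, 'D' = deleted   */
      char key[27];                /* job/user/number/file/splf id   */
  } AuditEntry;

  /* Return 1 when the spool file created at entries[i] is deleted   */
  /* later in the same batch, so the save can be skipped.            */
  static int deleted_later(const AuditEntry *entries, int n, int i)
  {
      int j;
      for (j = i + 1; j < n; j++)
          if (entries[j].action == 'D' &&
              strcmp(entries[j].key, entries[i].key) == 0)
              return 1;
      return 0;
  }

  static void process_batch(const AuditEntry *entries, int n)
  {
      int i;
      for (i = 0; i < n; i++) {
          if (entries[i].action != 'C')
              continue;
          if (deleted_later(entries, n, i)) {
              printf("Skipping %s: deleted before save\n", entries[i].key);
              continue;
          }
          /* save_spool_file(entries[i].key); -- hypothetical hook   */
          printf("Saving %s\n", entries[i].key);
      }
  }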

There was one particular job involved which created these entries on one output queue. This job ran constantly and woke up every few minutes to carry out the spooling process. The client informed us that this job did not really create the spool files; it simply opened the print stream, checked whether it had any data to write and, if not, closed the stream again. Unfortunately, the OS creates a spool file every time the stream is opened and deletes it again when the stream is closed with nothing written. This is why we were seeing so many create and delete requests in the journal, and at such a fantastic rate.

This shows how a simple design flaw can create a large overhead on the system: each time the job opened a stream (printer file) it was allocating resources, only to delete them again on the close. We could see this because the spool file number kept changing, and a member was being allocated and deleted in the appropriate QSPL file used to store the spool file data. We have suggested to the client that they change the program to check whether there is data to be written before they open the file, which will save countless resources on the system. They will no longer allocate everything only to delete it because nothing was written, the overhead of opening and closing the stream will be removed from the program, thousands of audit journal entries will disappear, and our processes will not be chasing their tails trying to keep up with an impossible task.
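
The fix really is as simple as it sounds. As a sketch in ILE C terms (assuming a standard printer file such as QSYSPRT and a hypothetical have_data_to_print() check), the pattern changes from open-check-close to check-then-open:

  /* Corrected pattern: only open the printer file when there is     */
  /* something to write, so no empty spool file is created and then  */
  /* deleted on every cycle.                                         */
  #include <stdio.h>

  extern int  have_data_to_print(void);  /* hypothetical check       */
  extern void write_report(FILE *prtf);  /* hypothetical writer      */

  void print_cycle(void)
  {
      FILE *prtf;

      if (!have_data_to_print())
          return;                 /* nothing to do: no open, no      */
                                  /* spool file, no audit entries    */

      /* Record-mode open of a printer file in ILE C.                */
      prtf = fopen("QSYSPRT", "wb type=record");
      if (prtf == NULL)
          return;

      write_report(prtf);
      fclose(prtf);
  }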

So if you are an application developer and see this kind of process being carried out on your systems, take a look to see if you can remove the unnecessary overhead and wasted resources. It will also make the use of a replication process a lot more viable. The IBM i is a fantastic box which allows a lot of sloppy processes to be employed with minimal impact, but just because it allows you to do something does not mean it is right. Our base principle in everything we do is to keep it simple but make sure it's resource-aware. Always try to make the process lean and mean while ensuring it carries out the task effectively.

Chris…

Apr 18

Planning for COMMON 2011 at the end of the month

We thought it was about time we went back to the COMMON conference this year, so we have bitten the bullet and decided to attend the Minneapolis event at the end of this month. As usual we are doing things at the last moment, so we have a lot to get into place before we attend. We had considered taking a booth, but our failure to commit early enough meant we missed the chance to get everything arranged in time. Maybe we will move fast enough to get a booth at the next conference, or perhaps at one of the European events.

Our main focus will be attending the sessions related to PHP and High Availability, so if you are there make sure you say Hi! If you are looking at High Availability there are a couple of sessions I would suggest (particularly if you are considering a home-grown solution): the Larry Youngren and Chuck Stupca sessions spread throughout the conference, such as HA on a Shoestring.

I am personally looking forward to finding out what IBM and Zend have come up with for the open source PHP toolkit. One of the biggest problems I see with the IBM i and its PHP implementation is its lack of openness, so maybe this will help remove some of those issues for me. We are also looking forward to meeting up with our friends at Aura Equipments, who have a booth; if you are interested in PHP on the IBM i, these are people you should make sure you pop round to see. They have a solution which allows you to connect to the IBM i from other platforms using PHP calls, just the same as you do from the IBM i HTTP/Zend solution, except you can talk to remote IBM i installations. This brings a number of benefits, one of which we feel is particularly important: security. You don't need to expose your IBM i to the internet to get IBM i content delivered to the internet! Performance is another big one for us, as we only run small systems, but the new Zend/IBM announcements may ease that concern somewhat if they actually deliver on their promises.

At the conference we will be sporting our new HA4i product logo, which we have been working on for some time. We feel it shows our commitment to the IBM i very well and doesn't take a lot of interpretation for anyone to understand what the product is all about. Let us know what you think; I have put copies of the logos below and will be wearing the HA4i one proudly throughout the week. So if you see me, make sure you pull me over for a chat about the product and what we have to offer.

Here is the new HA4i Logo, make sure you keep your eyes open for it.

HA4i ~ affordable Availability for the IBM i

HA4i Logo

This is the new DR4i logo, we won’t be sporting it at the event but look for it in the months to come.

DR4i ~ availability without complexity

DR4i Logo

We haven't been posting much lately, mainly because we are developing our own journal apply process. Not that the IBM apply process is bad, but we have found a couple of unique situations where it doesn't fit the application environment too well. Not only that, but when we find anything that does need attention we have a lot of work to do with IBM to convince them that a change would benefit everyone, and as everyone knows that can sometimes be an uphill trudge. The basic technology for applying journal entries already exists for DB files, because we developed it for the RAP product to apply the updates to the job data files. Now we have to cater for a lot more journal entry types, and as usual that comes with a lot of quirks to work around, such as IFS-based entries where IBM doesn't always store the object path but instead uses an object ID which can differ between systems! We have the core functionality developed; all we have to do now is build the checks and balances for when things go wrong and make sure we have sufficient error handling to keep it running when errors occur.
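
As an example of the IFS quirk: on the system that deposited the journal entry you can turn the 16-byte file ID back into a path with the Qp0lGetPathFromFileID() API, but the same ID means nothing on the target system, so the path has to be resolved on the source and carried across with the entry. A rough sketch follows; the Qp0lFID_t handling is from memory of the QSYSINC headers, so verify it on your release.

  /* Resolve an IFS file ID from a journal entry back to a path on   */
  /* the local system. The file ID is only meaningful on the system  */
  /* that generated it, which is why the path must be resolved here  */
  /* before the entry is shipped to the target.                      */
  #include <stdio.h>
  #include <string.h>
  #include <errno.h>
  #include <Qp0lstdi.h>           /* Qp0lGetPathFromFileID()         */

  int fid_to_path(const char *fid, char *path, size_t size)
  {
      Qp0lFID_t fileid;

      memcpy(&fileid, fid, sizeof(fileid));
      if (Qp0lGetPathFromFileID(path, size, fileid) == NULL) {
          fprintf(stderr, "FID lookup failed: errno %d\n", errno);
          return -1;
      }
      return 0;
  }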

The new apply process should be available in the next major release, which we hope to announce before the end of the year; testing will probably take up most of that time! Other enhancements are available in recent PTFs, which can be downloaded from the website. HA4i is developing fast and continues to provide an affordable High Availability solution for the IBM i market. If you are looking at availability, make sure you add HA4i to your list of products to review; you may be surprised at just how affordable it can be.

If you are at COMMON make sure you say hello! I look forward to meeting new people and discussing what we can do for you. You never know, you might even be fortunate enough to catch me at the bar and manage to drag a free beer out of me!

Chris…