Jun 28

Restore a PF with all attributes


Because the IBM i restore process does not guarantee that an object will be restored with the same attributes it was saved with (and in many cases it is not), you must delete the object before running the restore request to ensure the attributes are restored correctly. For many object types this is not a problem, but for physical files it can be.

A physical file cannot be deleted while any logical files are dependent on it, so to delete the physical file you must first delete all of the logical files, then the physical, and only then restore the new physical file. That creates an obvious problem: the logical files were deleted to allow the physical file to be restored correctly, and now they are gone.

The solution is to save all of the logical files before carrying out the physical file process, then restore them once it has completed. Pretty simple really, but it has a few challenges. First, there could be many logical files built over the physical, and they could live in any number of libraries, so you have to be able to record which logical files exist and have the ability to save them. Once you have this information you also need to be able to roll back out of the process should anything go wrong at any point.

We spent some time looking at the options before we came up with what appears to be a pretty elegant solution. We did take the option to limit the number of logical files we would support to 200; we are not sure there are many shops out there with 200 logical files built over a single physical file (if you are one of them, let us know), but we didn't think there would be. The command takes the file object to be restored and the save file the object is stored in. It retrieves the logical file information from the physical file, carries out a save of every logical file and deletes the existing one, then deletes the existing physical file and restores it from the passed save file before restoring the logical files and cleaning up.
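For illustration, the underlying flow resembles the CL sketch below. All library, file, and save file names here are placeholders, and the real command loops over every dependent logical file it finds rather than handling just one:

```
/* Find the logical files dependent on the physical file */
DSPDBR     FILE(APPLIB/MYPF) OUTPUT(*OUTFILE) OUTFILE(QTEMP/DBRLIST)

/* For each dependent logical file found: save it, then delete it */
SAVOBJ     OBJ(MYLF) LIB(LFLIB) DEV(*SAVF) SAVF(QTEMP/LFSAVF) OBJTYPE(*FILE)
DLTF       FILE(LFLIB/MYLF)

/* Now the physical can be deleted and restored from the passed save file */
DLTF       FILE(APPLIB/MYPF)
RSTOBJ     OBJ(MYPF) SAVLIB(APPLIB) DEV(*SAVF) SAVF(MYLIB/PFSAVF)

/* Finally put the saved logical files back */
RSTOBJ     OBJ(MYLF) SAVLIB(LFLIB) DEV(*SAVF) SAVF(QTEMP/LFSAVF)
```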

If you would like to try out the command, it will be available for download from the site soon. If you need the logical file count increased, let us know and we can provide an updated version.
This will also be integrated into the Shield products at the next relevant update.

Chris…

Jun 27

DSPJRN returns strange results.


We have a client running our HA software who recently had files showing up as missing from the target system even though they existed on the source system. Our first thought was that some process must be deleting and re-adding the files without journaling them. That turned out to be the case, but the problem we encountered was identifying the actual reason for the delete and which process was doing it.

The client has limited knowledge of the application, and the supplier promised faithfully that it does not delete and re-add files. We had to prove that it wasn't the HA application causing the issue, because as usual that's where the finger always points. The first thing we did was run DSPJRN against the journal on the production system, looking for any entries for the files and using the entire receiver chain to make sure we caught everything. The command came back with 0 entries converted. As we had no history with the files, we assumed they had never been journaled and so had never been kept updated. Easy fix: we journaled the files and sync'd them up. Then the same files re-appeared in the audits a couple of days later; the audits simply reported that the files did not exist on the target. Again we ran the DSPJRN command to see what had happened, and it came back stating that the file had no entries in the journal.
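For reference, the kind of request we were running looked like this (journal and file names are placeholders); RCVRNG(*CURCHAIN) tells DSPJRN to search the entire attached receiver chain:

```
DSPJRN     JRN(APPLIB/APPJRN) FILE((APPLIB/MYFILE *FIRST)) +
             RCVRNG(*CURCHAIN) OUTPUT(*PRINT)
```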

We believed we must have done something wrong with the journaling and sync'ing of the file, so we made the same request to start journaling and re-sync, only this time we checked to make sure the files were journaled and identical on both systems after the sync. The next audit came up clean, so we thought it was fixed. It wasn't: a couple of days later the files appeared in the audit reports as missing from the remote system again!

DSPJRN on the source system yet again showed no entries! So we looked at the logs on the target system and could see the files had been synced in a particular receiver, as we expected. Looking closely at the entries deposited by the APYJRNCHG command, we could see the file was being deleted as part of the apply process. Using the data in the receiver, we were able to track down the offending program and prove to the client that we were in fact working as expected; the application and a user were responsible for deleting the files. Now the client and the application vendor have to decide how they want to handle the files and whether they are important for recovery. They cannot automatically journal new objects because of the high volume of temporary objects created in the database library. Why developers don't separate temporary objects from production objects is beyond us. Another option would be to clear the files instead of deleting them!

Enough of the rant! What does appear strange to us is how the DSPJRN command behaved. On the source system the object existed, yet DSPJRN failed to find any entries for it; if the object had been journaled we could have believed the command was matching on the JID, but it wasn't journaled. On the target system the object did not exist, and yet the DSPJRN command did find all the entries for that object; like the source system, it's not journaled there either, so why did it find the entries? The problem only appeared when we were looking for entries for a specific object; trawling through the receivers showed all of the entries for all the objects, so they are there.

So if you run DSPJRN looking for entries for a specific object, be aware that the results may not be right.

Chris…

Jun 20

Restore of a journaled object does not always restore the journal info!


Here is a BIG problem with the IBM i restore process. If I have a file which was saved while it was journaled to JRNA and I restore it over the same file now journaled to JRNB, the restore works just fine with no errors, but the restored object is NOT journaled to the journal it was saved under; even if the target object is no longer journaled at all, the restored object is still not journaled to the original journal. The problem, as we have described before, is that IBM does not guarantee object consistency for a restored object, only data consistency.

Even if we do not enter the ALWOBJDIF(*ALL) parameter, the restore still does not restore the journal information with the object. For the restore to bring the journal information with it, you have to delete the object on the target and then restore; journaling then starts to the correct journal and everything is as it should be.
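Assuming hypothetical library and file names, the sequence that does bring the journal information back looks like this (note the original journal must still exist on the target for journaling to restart against it):

```
/* Remove the target copy; journaling to JRNB ends with the object */
DLTF       FILE(APPLIB/MYFILE)
/* The restore now starts journaling to the journal the file was saved under */
RSTOBJ     OBJ(MYFILE) SAVLIB(APPLIB) DEV(*SAVF) SAVF(QGPL/MYSAVF) OBJTYPE(*FILE)
```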

But what about those objects with logical files built over them? Before you can delete a physical file you have to delete all of the logical files built over it. Finding them can be achieved with a simple command, or even an API, to identify the related logical files before you delete the offending items. Then you can delete the actual physical file and restore the new object, with the expected result that it will be journaled, and journaled to the right journal. Seems simple enough, until you then have to re-create all of the logical files you just deleted to allow the restore to work correctly. Hopefully you have a save of them or, as in our case where the journaling is required for High Availability, can copy them from the source system.

I think this is a very bad design and IBM should fix it, but unless people get off their butts and make some noise, IBM will sit happily behind the "working as designed" clause it brings out for such occasions.

Chris…

Jun 16

HA4i PTF04 content


We have a new PTF scheduled for release in the next couple of weeks for HA4i. The main focus has been on adding new management features which improve the ability to automate certain functions. Below is a quick list of the changes we have made in this PTF.

The SyncMgr concept was introduced in PTF03; it is a mechanism for replicating journaled objects in a manner that allows them to be restored at the correct point, so the journal entries after the restore match up with the base object. New features provide a better overview of the actual replication process for the objects, and changes to the save process, such as compression and access path saves, now improve network utilization and transfer speed. Another nice feature we added is re-submission of failed requests: if an object cannot be saved due to object locks and the like, it is added back to the bottom of the list. This provides a much more robust process which ensures objects are replicated as soon as possible without interfering with normal replication.

The Status Check process, which looks at all of the running processes and reports any anomalies, can now link to the Email Manager and send those status checks out to the registered email accounts. This provides quick and efficient notification when things start to go wrong. Jobs going into message wait, or jobs that should be running but are not, are reported to both the QSYSOPR message queue and the email recipients.

Audits have also been improved in a number of ways, but most significantly we have tied the file audits to the apply processing. Because an audit has to run when there is no user activity and all existing data has been applied, audits tend to be submitted overnight when no one is around, so we have delivered an automated process that applies the existing data in the attached receiver before running an audit. This means the audit can be sure all available data has been applied before it runs, and users do not have to run complicated cross-system checks to make sure everything is as it should be. You can now submit a SBMSYNCREQ directly from the audit display to re-sync any object that requires it using the SyncMgr technology.

As part of the improvements to the status displays, we have added a new status field which shows the size of an object that is in error, such as a file. This data is pulled from the source system to show the actual size of the object that will be replicated if a SBMSYNCREQ is made, which helps those who need to know just how big an object is and how long it will take to replicate once started. We have also provided the ability to not save the access paths with the file; many customers found that access paths increased the size of a saved object significantly, which in turn caused other issues as the object was saved and restored between the systems. The default is still to save the access paths, but using a simple data area you can force the processes to ignore this setting and save only the object, not its access paths.
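As a purely hypothetical illustration (the data area and library names below are invented, not the product's actual names), that kind of data area switch is a one-line change:

```
/* '0' = do not save access paths with the object (hypothetical names) */
CHGDTAARA  DTAARA(HA4IPROD/SAVACCPTH) VALUE('0')
```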

If you have a very slow connection between the systems, or the link is already fully utilized just managing normal replication, you may not be able to re-sync an object over the link. This PTF adds a new feature which runs the same save process as SBMSYNCREQ but does not transfer the save file between the systems. Instead you get a notification of the save file created, which you can then save off to an alternative medium (tape, CD, etc.) and transport to the target site for restore. The apply process can be forced into a held state until you have restored the object, ensuring the entry generated by HA4i to restore the object is not processed before the save file is actually available on the remote system.

Automation is a big requirement when it comes to role swaps. This PTF brings a new one-shot role swap which allows a *PLANNED role swap to be carried out from a single system: the remote system is switched under the control of the source system without the users having to sign on to the remote system. *UNPLANNED role swaps still require each system to be switched independently. We have also taken the opportunity to add exit point processing to the role swap; at each stage a program is called by the role swap program to allow other activities to be carried out. As usual, if the process or the exit program fails, the role swap is halted at that point awaiting a restart.

Object and spool file replication have been updated and a number of new features added, the main one being a filter for multiple failures for the same object or command. Now you will only see one request even if the object is in error multiple times, such as when an object is changed constantly but is always locked. Any errors can be re-submitted automatically using the commands provided, as well as individually using the panel group options.

As always, we have made many changes at the behest of customers. They run the product in far more complicated environments than we are able to reproduce on our test systems, so their input is very important to us. The above changes reflect our ongoing commitment to the HA4i product and to providing our customers with a simple yet effective HA solution.

If you wish to speak with us about HA4i and how it can provide you with a Cost Effective HA solution let us know. The price may be a pleasant surprise.

Chris…

Jun 08

SMS4i Available for download


The licensed program product of SMS4i is now available for download. You will need to sign in to the website to get a copy, which provides a free 30-day trial code on installation. The manual is under way but, knowing most users, it seldom gets read, so we are in no great rush to publish it. If you have difficulty understanding what to do, here is a quick overview of the process.

1. Download and unzip the save file from the website using the links provided.
2. Copy the save file to the IBM i.
3. RSTLICPGM (1SMS4IP is the LPP ID).
4. GO SPMMAIN to get to the menu (the library is already in the library list after the RSTLICPGM).
5. Configure the SMTP server and the sender's email address (Option 3 from the menu).
6. Start the SMS server (Option 1 from the menu).
7. Configure any additional providers or users using the interface provided (Option 7 or 8).
8. Use the interface of choice to send a test message (Option 7 or Option 8).
9. Check the log (Option 9) to ensure the message was sent to the SMTP server.
10. Check for any bounced email in the sending email account's inbox.
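The save file steps above translate to something like the following, assuming you unzip the download on a PC and FTP it to the system in binary mode; QGPL/SMS4I is just a placeholder name, and DEV(*SAVF) on RSTLICPGM requires a reasonably current IBM i release (restore from media on older ones):

```
/* On the IBM i: create an empty save file to receive the upload */
CRTSAVF    FILE(QGPL/SMS4I)
/* FTP the unzipped save file into QGPL/SMS4I in binary mode, then: */
RSTLICPGM  LICPGM(1SMS4IP) DEV(*SAVF) SAVF(QGPL/SMS4I)
GO         MENU(SPMMAIN)
```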

That’s all there is to it. Happy to help if you get stuck; we only have accounts with Bell, Rogers, AT&T and Virgin, which all worked perfectly (I love it when a plan comes together). I will be posting a few screen shots later, but it’s getting a bit late now!

If you want pricing let us know; it’s not expensive and should provide value to any IBM shop that requires simple event notification.

Chris…

Jun 03

SMS Text from the IBM i.


I had a call recently about sending SMS text messages from the IBM i. One of our partners provides a solution for this, but it requires the purchase of a GSM modem and a service contract with a cell provider. One of the things I have been looking at is how to build some kind of SMS status messaging into our HA4i product, so the client’s request got me thinking.

SMS texts are ideal for status messaging because they allow simple messages to be sent to mobile devices: not only our favorite smart phones, but a lot of other phones that support SMS but not email and the like. HA4i has had the emailer built in for many years, but SMS was one area we just kept ignoring until now. As I said, the client’s request piqued my interest again, so I decided to take a look at what is available out there to allow me to build a simple SMS status service.

I could have built a program to talk with a GSM modem, which is probably fairly easy to accomplish. Another option would be the SMS text providers out there that supply a web interface for sending SMS texts to users at a very reasonable cost (about 2 cents) using HTTPS requests over an SSL connection. But I finally ended up looking at doing it via the providers’ email-to-SMS service. The nice thing about this method is that it is free to send the message; the receiver may incur costs if they do not have a text messaging plan on their phone.

The first thing we did was look for a list of the providers and their gateway addresses. On top of this, we found that the cell number sometimes needs a prefix to work on certain providers’ networks, so we had to take the format of the address on the email we send into consideration. Once we had all of this information we could start to build the solution. Our first pass would be very simple: take a message input from a user, format it to the desired string, and send it out via the SMTP connection we use today.
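For example, the email-to-SMS address is simply the cell number at the provider’s gateway domain. The numbers below are made up and the domains are commonly published ones, so verify the gateway (and any required number prefix) with the carrier before relying on it:

```
4165551234@txt.bell.ca       Bell
4165551234@pcs.rogers.com    Rogers
5551234567@txt.att.net       AT&T
```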

As always, the biggest time consumer was building the interfaces to the actual programs that do the work; we also had to build interfaces to allow access to the configuration data and to control the functions. The next step was to determine how the product would pick up the message requests and send them out to the SMTP server. As we have been using data queues for years in all of our products, we decided to build a program that accepts requests via a data queue and sends them out to the SMTP server. This program is submitted to a specific subsystem and job queue and constantly waits on the data queue for a request, either to send a message or to end.
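A caller pushes a request onto that queue with the QSNDDTAQ API; in CL it looks something like the sketch below. The queue name, library, and message layout are hypothetical, not the product’s actual definitions:

```
PGM
  DCL        VAR(&LEN)  TYPE(*DEC)  LEN(5 0) VALUE(50)
  DCL        VAR(&DATA) TYPE(*CHAR) LEN(50)
  /* Hypothetical layout: cell number followed by the message text */
  CHGVAR     VAR(&DATA) VALUE('4165551234 Night save completed OK')
  /* Queue the request; the server job picks it up and emails the gateway */
  CALL       PGM(QSNDDTAQ) PARM('SMSDTAQ   ' 'SMSLIB    ' &LEN &DATA)
ENDPGM
```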

The first couple of send attempts failed due to programming errors (we forgot to NULL-terminate some strings), but after that everything worked out just fine. We have managed to send test messages to our Bell and Rogers cell phones without any problems (well, we did have a problem with the Rogers phone because it had to be registered to receive the messages).

So if you are interested, we now have a working IBM i to SMS messaging product. It does not have all the bells and whistles of the major providers, and we can only do one-way text messages, but the cost for the sender is very attractive.

Let us know if you want to give it a spin; we will be packaging it up and posting it to the websites in the coming weeks.

Chris…