May 19

HA4i gets a new one-button role-swap.


One of the most requested features for HA4i has been a single command that carries out the required actions on both systems for *PLANNED role-swaps. Clients also asked for exit points at each stage so that their own programs can be called to carry out additional tasks while the role-swap is running. These features have now been added to HA4i.

We have built a set of processes which allow the role-swap to be carried out with a single command from the source system: it rebuilds the journal environment for the applications to use, starts the remote journals and starts the required jobs on each system to carry out the replication tasks. To allow user programs to be called we have also added a new exit program (we also provide a skeleton program for users to modify) that can be used to run additional actions at each stage of the role-swap. As before, we track what is happening at each stage, so if a particular action fails the process can automatically restart from that point and move forward. This can be done either by restarting the *PLANNED switch from the source system or by issuing an *UNPLANNED switch request from each system.
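
To make the exit-point idea more concrete, here is a minimal CL sketch of the kind of program a user might plug in. The stage names ('PRESWITCH' and 'POSTSWITCH'), the single parameter and the APPLIB/APPSBS names are purely illustrative; the skeleton program shipped with HA4i defines the real interface, which may pass different values.

             /* Illustrative sketch only - the real HA4i exit interface may differ */
             PGM        PARM(&STAGE)
             DCL        VAR(&STAGE) TYPE(*CHAR) LEN(10)
             SELECT
             /* Before the switch: quiesce the application subsystem */
             WHEN       COND(&STAGE *EQ 'PRESWITCH') THEN(DO)
               ENDSBS     SBS(APPSBS) OPTION(*IMMED)
             ENDDO
             /* After the switch: restart the application on the new primary */
             WHEN       COND(&STAGE *EQ 'POSTSWITCH') THEN(DO)
               STRSBS     SBSD(APPLIB/APPSBS)
             ENDDO
             ENDSELECT
             ENDPGM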

One of the big questions customers always ask is how long it takes to switch. With 4 journals configured, it took HA4i just over 2 minutes to switch on our small systems. That includes changing all of the journals, cleaning up the receivers, starting the remote journals in the correct direction and starting all of the HA4i programs again. Is that important? Probably not; the largest amount of time taken for most role-swaps using any HA software is the time it takes to manage the other environment requirements, such as starting the application and making sure the users can access the system. However, it does give customers a very simple role-swap capability and recovery options should things go wrong.

We are constantly updating the product and continue to add new features which make it easier to use. For a free 30-day trial of HA4i, contact us to discuss.

Chris…

May 13

Shock horror, IBM Systems Director Navigator for i is better! Well, it was! Now it is again… maybe.

After a 3-day upgrade from V6.1 to V7.1 we thought things were getting better. Then we noticed that the HTTP server was starting, which should not have happened as we had set it not to start any servers when it was running V6R1. So we thought we would go back into the *ADMIN server and set it not to start up again. As soon as we tried to access the pages it would end abnormally with an SSL error, so we raised a PMR with IBM. The fix was fairly simple: we just deleted the customer config and restarted the server and we were in. The biggest shock was just how much faster it was than previously; not sure why.
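
For anyone who has not had to do this before, the *ADMIN instance can be bounced from a command line with the standard TCP server commands; once it is back up, the console is normally reached on port 2001 of the system.

             ENDTCPSVR  SERVER(*HTTP) HTTPSVR(*ADMIN)
             /* wait for the ADMIN jobs in the QHTTPSVR subsystem to end, then */
             STRTCPSVR  SERVER(*HTTP) HTTPSVR(*ADMIN)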

We were also informed by IBM that we should install the latest PTF groups, which we did. Unfortunately we then had no access to Director yet again; it just sits there for hours waiting to connect…

So, thinking back on what we had done previously, I deleted the customer config again, which removed the SSL connection, and restarted the server. We could then access all of the pages again, and while the performance left a hell of a lot to be desired it could at least be used. We then decided to configure the SSL connection again, to see if the upgrade had caused the problems we saw, and we could not get back into the server at all; we just kept getting the "access to / is denied" message… So it's back to no SSL until we get the time to understand what issue setting up SSL is giving us.

If you are going to use IBM Systems Director Navigator for i and you see the problems we did, and you don't need SSL, turn it off; life is a lot simpler that way.

Chris…

May 12

Streamline the remote journal to get the most out of your bandwidth


One of the major concerns we see from HA installations is the bandwidth required to send the data from the source system to the target system using the remote journal process. It's not that the remote journal function adds a lot of overhead to the journal data, but that a lot of people simply do not understand what you can do with a remote journal. So here is a quick overview of the actions you should take to reduce the bandwidth overhead between the systems for remote journaling.

1. The journal was not developed for transporting data between systems; it was developed as a tool which could be used to rebuild the system in the event of a failure. That is why the same data can be used to keep the data in sync between systems, but it carries a lot of information which is simply not required just for updating that data. In particular, the journal also carries a LOT of data about access paths. It needs this data to keep IPL times to a minimum; if you have a system failure and access paths were not journaled, you could be in for what they call "the long wait". But that information is not required or used on the target system by your HA product, so the first thing you should do is tell the remote journal function to ignore it. This just requires the *RMVINTENT setting on the journal, and from V6R1 onwards it is set automatically when you create a new journal. The problem is that most companies have migrated old journals, so the option will not be set. This can save you a lot of bandwidth, especially for customers who have large files with lots of logical files built over them (most ERP customers have this without question).
2. The journal can be set up to capture both before and after images; the standard today for all HA software is to require only the *AFTER image. If you use commitment control, the OS forces *BOTH for any object which comes under commitment control, so it will set that automatically.
3. The journal adds a further 96 bytes of data to every entry in the journal; 56 of those bytes are not required on the target system by the apply processes and can be discarded without any problems. To remove these 56 bytes, simply change the journal to use *MINFIXLEN in the RCVSIZOPT setting. For users with millions of entries being transported per hour this can significantly reduce the bandwidth requirements. This setting also improves journal performance, because it reduces the overhead required to collect the data, specifically the program information.
4. There is one other setting which can be used to reduce the amount of data being transferred: the MINENTDTA setting. This tells the journaling function to store only the changed data in the journal entries rather than the entire record. For applications that do a lot of updates this can reduce the data stored, but for applications that mostly do adds it is probably not going to show much improvement because the entire record is new. If you do not need to see the data in the journal entries, use the *FILE and *DTAARA settings, as these give the biggest benefit. Using *FLDBDY keeps the data viewable at field boundaries, so there will still be some padding in the data.

Once you have done all of this you should see some significant gains in the bandwidth requirements. If you have already applied these settings you are on the right track; make sure you are also using the new *MAXOPT2 setting or greater, as this provides some performance improvements for journaling as well.
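
To pull the settings above together, here is a rough sketch of the commands involved; APPLIB, APPJRN and MYFILE are placeholder names, so adjust them to your own environment.

             /* Attach a new receiver and apply the bandwidth-saving options */
             CHGJRN     JRN(APPLIB/APPJRN) JRNRCV(*GEN) +
                          RCVSIZOPT(*RMVINTENT *MINFIXLEN *MAXOPT2) +
                          MINENTDTA(*FILE *DTAARA)

             /* When starting journaling on a file, capture after-images only */
             /* (commitment control will force *BOTH where it needs to)       */
             STRJRNPF   FILE(APPLIB/MYFILE) JRN(APPLIB/APPJRN) +
                          IMAGES(*AFTER) OMTJRNE(*OPNCLO)

             /* Check the attributes now in effect */
             WRKJRNA    JRN(APPLIB/APPJRN)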

Chris…

May 12

Need your help with a Save/Restore issue


In a previous post we mentioned a problem we had found with the save and restore process: objects that are restored over existing objects will not have all of their attributes restored. The only way to ensure the object is fully restored is to delete it before you do the restore. The biggest problem I see with that is that if I am doing a RSTLIB, I would have to delete every object before running the command, which means that if the media is damaged (see the previous post about how an OS upgrade took 2 days to complete because of damaged media) you have already deleted the only copies of the very objects you were trying to replace! In my mind that is not good enough for a system which is supposed to be the best BUSINESS system around!
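
If you do decide the delete-first approach is worth it, a rough sketch of what it looks like for a whole library is below; TAP01 and APPLIB are placeholder names, and the first step is there precisely because of the damaged-media risk mentioned above.

             /* Confirm the save media can actually be read before deleting anything */
             DSPTAP     DEV(TAP01) DATA(*SAVRST)

             /* Then clear the target library so nothing survives the restore */
             CLRLIB     LIB(APPLIB)
             RSTLIB     SAVLIB(APPLIB) DEV(TAP01)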

So I would like to ask everyone who is reading this post to take action along with us. I have asked IBM to fix the problem, but their answer is quite simple: this is how it is designed to work! They only guarantee the integrity of the object data on the restore, not the integrity of the object itself. This means you could end up with objects which are a mix of what you thought you had restored and what you thought you had replaced. The only way IBM will do anything about the problem is if people raise DCREQs asking for the design to be changed. It also has to be something that helps the entire user base, not just a few. I think this falls into that category, but I am sure IBM will only look at it if there are plenty of requests for it to happen. If you are a LUG member, even better: you get a much bigger say in what goes into development than anyone else, so if you are attending the next LUG meeting make sure you bring it up there as a big requirement.

If you don't know how to raise a DCREQ, I have listed a couple of links below. The COMMON link just puts it into a pile for COMMON representatives to consider; they will review it and pass it on to IBM if they wish to support it. The good thing is that COMMON has some sway with IBM over what goes into development; the bad thing is that it has to go through a committee stage before being passed on to IBM.

Here is the link for the DCREQ form that any IBM customer can fill in. DCREQ

Here is the link for COMMON’s DCREQ form COMMON request

You can also simply raise a PMR asking IBM to fix this, but you may end up going via the DCREQ channel anyhow. Either way, get involved and ask for this change; it's important for everyone.

Chris..

May 11

6.1 to 7.1 is not for the faint-hearted.


Well, I am still installing the OS 28 hours after starting! The media IBM shipped was not good; in fact I have 3 duff DVDs out of 6, which is definitely not a good batting average. It's a good job I could download them from the ESS site, otherwise I would be waiting until IBM could ship me replacements. I thought it could be the DVD drive on the system, so I copied the DVDs to an image catalog using another system, transferred them to the system I am upgrading, and after loading the image catalog they still refused to load, which points to duff media rather than the device.
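
For anyone who has not built an image catalog before, the sequence we used is roughly the following; the catalog name, directory and device names are just placeholders.

             /* Create a virtual optical device and vary it on */
             CRTDEVOPT  DEVD(OPTVRT01) RSRCNAME(*VRT)
             VRYCFG     CFGOBJ(OPTVRT01) CFGTYPE(*DEV) STATUS(*ON)

             /* Build the catalog and copy each DVD into it from a physical drive */
             CRTIMGCLG  IMGCLG(UPGRADE71) DIR('/upgrade71') CRTDIR(*YES)
             ADDIMGCLGE IMGCLG(UPGRADE71) FROMDEV(OPT01)  /* repeat for each DVD */

             /* Load the catalog on the virtual device and verify it for an upgrade */
             LODIMGCLG  IMGCLG(UPGRADE71) DEV(OPTVRT01)
             VFYIMGCLG  IMGCLG(UPGRADE71) TYPE(*UPGRADE) SORT(*YES)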

I am not saying the 7.1 install is bad, just the media IBM shipped. So if you are planning to install 7.1 and were shipped the DVDs when it first came out, you may want to check the quality of the DVDs before going ahead with the install. Better still, download new images from the IBM ESS site and use those.

Chris…

May 11

Upgrade i/OS 6.1 to 7.1 SRC B9003460


Finally decided to bite the bullet and upgrade our development system to i/OS 7.1. The OS should be pretty stable now and we felt it was time to take advantage of some of the improvements.

The process was meant to be a fairly simple one: we would save our existing system and start the installation while we downloaded the PTFs from Fix Central. That's when things started to go a bit off plan. The first problem we encountered was that the Fix Central server gave us an estimated availability time of 2:55 PM for our download; we thought OK, we can live with that, because the OS install is going to take a few hours anyhow, so let's get moving. 2:55 came and went, and every time we tried to resume the download it would push the time out further. So we contacted IBM, who said they had just found out that there were problems with Fix Central and they expected my downloads would be available soon (that was 4:40). In the meantime we were installing the new version, and the first part went well: the Licensed Internal Code installed perfectly and the system IPL'd to show us on the right level (7.1). So we inserted the next DVD and it set off installing the required objects.

After some time we noticed the console had disappeared and was in connection wait; a quick check of the panel on the system showed SRC B9003460. I searched the web trying to find out exactly what this SRC meant and found 2 links, both of which went to a V5R4 PSP. Even a search in the manuals failed to show exactly what this message was. So we took the nearest solution in the manual for the B900 XXXX codes and restarted the installation. This time we watched it closely and saw the stage 5 install get to 90% before the console disappeared and the SRC came up on the panel again! It was now 6 PM and the PTFs were still waiting to be created before we could download them. I had had enough and decided I would contact IBM bright and early in the morning to get their help.

At around 8 PM I received a note from IBM stating that the PTF download was now available, so I started it off ready for the next day; it finally finished downloading at 1:30 AM. I decided I would download the DVD which was in error from the IBM ESS site, so at 6:30 AM I started the download and placed a SEV1 call with IBM asking for assistance. IBM called back and we found that the SRC actually points to a media error (good job I downloaded the image), so I burned the image to DVD, loaded it and started the upgrade again. Thankfully it started and moved through the rest of the DVDs without further problems, other than a number of failures on the LPPs??

So if you are upgrading to 7.1 using the IBM-shipped media and find SRC B9003460 on the panel, you could have a media error which needs to be resolved. The main reason I am posting this is to add to the number of hits you will find when you Google the SRC! I remember Steve Will's presentation at COMMON where he said users had commented on just how easy the upgrade to 7.1 had been; I had an amusing chat when I logged the call, as the support rep said "7.1, is that new? I have no logs for that OS yet!". I am sure it's all going to be worth the pain, honest…

Chris…

May 09

When is a restored object not a restored object?


We have been trying to figure out just what a restore operation does and does not do. We had long believed that the restore process would take the object which was saved and completely restore it over any existing object. That is not true! When you carry out a restore operation there are a number of attributes which are not restored if the object already exists on the system.

That does not seem right to us, but according to IBM it's how it was designed! For years we have told clients to clear out a library before doing a restore, just to make sure there are no surprises when they restore the new objects. Not because we thought the new objects would not be restored in their entirety, but because we wanted to make sure any orphaned objects were cleaned up and the restored library would look like the saved library at that point in time. As it turns out, there are a lot of other reasons to make sure you delete an object before you restore the new copy.

We have known for some time that a restored object keeps the existing object's audit value setting: if we save an object from one system with its auditing value set to *CHANGE and restore it over an existing object set to *NONE, the final object retains the existing object's value (*NONE in this case). Is that good? I don't think so. After all, the main reason we restore an object is to get an exact copy of what we saved, isn't it? Let's look at another small example: we have a source file member which is created with CLP as its type (it could be anything, but we will use this for clarity), which is saved and copied to another system. I then change the original member's type to CLLE and, because I am keeping the two systems in sync, I save the object and restore it to the other system. Would you expect the type of the target member to be CLP or CLLE? In fact it remains CLP. Even if I change the data in the member and save and restore it again and again, it will remain CLP until I delete the existing object on the other system and restore it once more. To me that is a big problem! I have always thought the restore operation restored what I had saved, not that it decided which parts to restore and which parts not to.
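
If you want to see this for yourself, the standard display commands below are a quick way to compare what you end up with after a restore; APPLIB, MYFILE and QCLSRC are placeholder names.

             /* The full display shows the object auditing value the restore left in place */
             DSPOBJD    OBJ(APPLIB/MYFILE) OBJTYPE(*FILE) DETAIL(*FULL)

             /* Member-level detail shows the source type that also survives the restore */
             DSPFD      FILE(APPLIB/QCLSRC) TYPE(*MBR)

             /* If needed, reset the auditing value yourself after the restore */
             CHGOBJAUD  OBJ(APPLIB/MYFILE) OBJTYPE(*FILE) OBJAUD(*CHANGE)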

I took a quick look around the documentation to see if I could find a definitive list of the attributes which are ignored on a restore when the object already exists, but I could not find one. That's not to say there isn't one, just that I couldn't find it. My concern is that others, like me, thought a restore did the whole job and not just part of it, and have been manually restoring objects in the expectation that they would be the same.

So if you are doing restores and need the object to be exactly the same as the object on the save media, make sure you delete the existing object first. Furthermore, if you feel this is not correctly designed, discuss your concerns with IBM.

Chris…