Dec 09

High Availability Secret Weapons

I just read an article in IT Jungle which made me smile to myself; basically it was stating how a competitor of our High Availability products has a secret weapon in their simulated role swap process. Firstly, it's not secret, they use it to sell against their competition, so it's simply a feature.

Showing the feature off as one that no one else has is fine, but to then go on and trash someone else’s idea (Focal Point Solutions' new FlashCopy process) stating that theirs is better made me take a closer look at the content of the “story”. A few points struck me as a little misleading, so I thought I would give my own perspective on what is being said.

The main point is really that people who invest in High Availability hardly ever test that it works as it should, and so it fails to deliver on the expectations the solution should provide. Having just been involved with a client who was running a competing product (not the one mentioned), I fully agree with that statement; had this client actually bothered to test the environment at all they would have identified a number of significant issues before needing to use the backup system. The client actually failed to switch over correctly and lost a lot of time and data in the process. None of this was the fault of the High Availability solution; the client had simply failed to maintain the environment, and the processes required to make the switch effectively were just ignored.

The next statement made me think: they say this is a very important feature, yet it is only available in the Enterprise version? If it's so important and so effective, why should only Enterprise level clients get the use of it? The point they are trying to make is that High Availability users need to test regularly, but in the next breath they state the feature is only available in their premier product. Surely it should be available in all levels of their product?

Simulated roleswap? Because they do not switch actual production processing to the target system, they are expecting the client to decide what to run on the target system to determine if the switch would actually work should a real roleswap be needed. So it won’t be running a real production environment, which means they may not run everything that would run in true production. This is normal and not something that should be a concern, but what is the difference between that and simply turning off replication while a test is run and then creating a recovery position to start the replication again? Maybe it's because it is automated. The point is that if you are not running a REAL production environment, all you are doing is testing that the test you have developed runs! Roleswaps which run the actual PRODUCTION workload on the system are the best way to check that everything is going to work as it should. Running a simulated roleswap is just a backstop to test new features and to verify that the scripts which have been written to carry out the roleswap are going to work.

The Focal Point Solutions offering does allow the same level of testing that this simulated switch does; the comment about it not being a valid approach because the environment used for the test is not what the client will eventually use is absolute bunkum! The biggest benefit the Focal Point Solutions offering has over this one is that the Recovery Time Objective is not affected at all while any testing takes place. Their recovery position is totally protected at all times, and if the client needs to switch midway through testing they can do so without having to reset the target environment and then catch up any PRODUCTION changes which were stored while the test was being set up and run. To me that is a far better solution than having to switch over to the target system for testing. Our LVLT4i product also offers a similar approach because we just take the backup and use it for testing, and I see no additional benefit a simulated roleswap would offer. With the LVLT4i approach the comment made about not testing on the same machine you will use is also moot; the target system is only a backup for the iASP data, and when a switch is required the data will be migrated to another system for the client to access and use. I have not dug too deep into the Focal Point Solutions offering, but if it gets High Availability clients to test more, it has to be a better offering than one which does not provide such opportunities.

With all of that being said, a test is a test is a test is a test! If you really have to switch due to a disaster the roleswap will be put under a lot more pressure, and a lot of what you tested may not perform as it did during the test. No matter what type of testing you do, it's better than doing nothing; stating one method is better than another is where you have to start looking at reality. We encourage everyone to take a look at the new products and features out there. High Availability is changing and people's requirements are changing with it; Recovery Time Objectives in the minutes range are not for everyone and are very rarely met when disaster actually strikes. Moving the responsibility for managing the High Availability solution off to a professional organization that specializes in providing such services may be a lot better and a lot cheaper than trying to do it all yourself.

If you are interested in discussing your existing solution or want to hear about one of our products let us know. We are constantly updating our products and offerings to meet the ever changing needs of the IBM i client base.

Chris…

Dec 08

System Values and LVLT4i

System values are an important part of the working environment on the IBM i, so it is important that they are correctly set and ready for when you move to a recovery system. LVLT4i works in an environment where setting the system values as part of the replication process is not an option, in just the same way we cannot replicate profiles and authorities. So we had to come up with a process which would allow us to build the required environment as part of the recovery process.

When we first looked at how we could use LVLT4i we were thinking that the recovery process would use a system save to recover the client's environment and then restore the iASP data over it to bring the client's data and objects up to the last transaction. That was one of the reasons the Recovery Time Objective was going to be so long; it takes quite some time to restore a system save. Even if we used image catalogs for the restore it was still going to take a significant amount of time, which encouraged us to start looking at the options we had.

One of the major advantages we wanted to push for LVLT4i is the ability to take a backup of a client's applications and data from the iASP and use it for things such as DR testing, application upgrades and OS upgrade testing. To do this we envisage the Managed Service Provider having a recovery partition running the correct level of OS for the clients; the backup of the iASP could be copied over to the running environment and the client could do their testing without affecting their current DR position. Once the test was completed the system could be scratched and made ready for the next client to use. As part of the discussions we looked at how we could speed up the save and recovery processes (see our blog entry on saving to a QNAP NAS) using image catalog technology so that the Recovery Time Objective could be reduced to an absolute minimum. The programs we created for that testing are actually in use in our own environments and have significantly reduced the save times, plus they provide us with a much faster recovery time should we ever need to set a recovery in motion.

Profiles and passwords were our first priority because they tend to change a lot; we came up with a process that allows the Managed Service Provider to restore the iASP data and then, using automated scripts, recover the user profiles and passwords before setting the authority. Profile recovery has already been implemented in LVLT4i, and testing shows that the process is very effective and fast. The next item we wanted to cover was system values; as with user profiles they cannot be replicated to the target system from the client. Using the experience we gained with the storage of the profile data, we have now built a retrieval process that captures all of the system values and then keeps them in sync. When a client recovery is required, scripts will be run that set all of the captured system values on the recovery partition.
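
This is not the LVLT4i code, just a minimal sketch of the idea using the QWCRSVAL (Retrieve System Values) API and assuming its documented receiver layout; it retrieves a single value (QUTCOFFSET) which could then be stored away and later replayed with CHGSYSVAL on the recovery partition.

#include <stdio.h>
#include <string.h>
#include <qusec.h>                      /* Qus_EC_t error code structure       */
#include <qwcrsval.h>                   /* QWCRSVAL - Retrieve System Values   */

int main(void) {
    char rcv[1024];                     /* receiver variable                   */
    char names[1][10] = {"QUTCOFFSET"}; /* system value name, 10 characters    */
    Qus_EC_t err = {0};                 /* error code structure                */
    int num_rtnd, offset, dta_len;
    char val[64];

    err.Bytes_Provided = sizeof(err);
    QWCRSVAL(rcv, sizeof(rcv), 1, (char *)names, &err);
    if(err.Bytes_Available > 0) {
        printf("QWCRSVAL error %.7s\n", err.Exception_Id);
        return -1;
    }
    /* receiver: Bin(4) count, Bin(4) offset(s), then per value:               */
    /* Char(10) name, Char(1) type, Char(1) status, Bin(4) length, data        */
    memcpy(&num_rtnd, rcv, 4);
    memcpy(&offset, rcv + 4, 4);
    memcpy(&dta_len, rcv + offset + 12, 4);
    memcpy(val, rcv + offset + 16, dta_len);
    val[dta_len] = '\0';
    printf("QUTCOFFSET = %s\n", val);   /* store this for replay at recovery   */
    return 0;
}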

We believe that LVLT4i is a big step forward in being able to provide a recovery process for many IBM i users; even if they have an existing High Availability product in use today they will see many benefits from using it as their preferred recovery tool. We are noticing that many of the companies that implemented a High Availability solution are not able to keep up with the changing technology being provided, which means the recovery capabilities of the solution are eroded and its value is no longer what it used to be. Data protection is the most important part of any availability solution, so managing it needs to be a top priority; a Recovery Time Objective of 4 – 12 hours should be more than enough for most of the IBM i community, so paying for a Recovery Time Objective of minutes is not practical or beneficial.

LVLT4i, when managed by a reputable Managed Service Provider, should provide users with a better recovery position at a price that meets even the tightest of budgets. We believe that recovery solutions are better managed by those who are committed to them and who continue to develop the skills to maintain them at all times. Although we are not big “Cloud” supporters, we think LVLT4i and the services offered by a Managed Service Provider could make the difference in being able to see value from a properly managed recovery process; offloading the day to day management to a service provider alone should show significant savings.

If you would like to know more about LVLT4i and its capabilities please call us and we will be happy to discuss. If you prefer to use Email we have a contact process on our website under contact us that you can use.

Chris…

Dec 02

Getting the most from LVLT4i

While it is early days for the LVLT4i product we have already had a number of interesting conversations with IBM i users and Managed Service Providers about how we see it being deployed to the smaller IBM i user base.

Price advantages
For the smaller IBM i user the thought of going to a full blown High Availability solution has always been one that comes with thoughts of big budgets and lots of heartache. The client needs a duplicate system plus the infrastructure required to allow the replication processes to sync data and objects between the systems. Add to this the licenses for the High Availability product, OS and ISV software, and many clients simply do not see availability protection at this level as a viable option.
Even if they identify a Managed Service Provider who could offer the target environment, they still see this as something beyond their budget.
LVLT4i is aimed at easing that problem. It is a Managed Service offering with subscription based pricing based on the client's system (IBM tier group); this allows the MSP to grow the business without having to invest in up front licensing costs while providing a hardware platform which meets their customers' requirements. The iASP technology also reduces costs for the Managed Service Provider because they can run many clients on a single target LPAR/system, removing the one to one relationship generally seen in this scenario. The client only pays a monthly fee, has no upfront capital expense to get signed off, and will probably find the target systems are much faster and newer than their existing systems.

Skills advantages
We have been involved with the IBM i (and its predecessors) in the High Availability market for nearly 25 years and have carried out a lot of High Availability software implementations. During that time we have seen a lot of the problems people encounter when trying to implement and manage a High Availability environment. Moving that skill requirement to a Managed Service Provider brings a number of benefits. The client's staff will not have to keep up with the changing capabilities of the High Availability product; they can concentrate on their main focus, which is providing an IT infrastructure that meets the business's needs. Installation and ongoing management of the replicated environment will be handled by the Managed Service Provider, so there are no more consultancy fees to the High Availability software provider every time you need to make a minor change. The Managed Service Provider will have a lot of knowledge spread throughout their team, and many of that team will have specialist skills that can be brought in to figure out problems.

Technology advantages
LVLT4i uses iASP technology on the target system; the client's system continues to use *SYSBAS so no changes are required to the client's applications. When the client needs to test or recover, the iASP data is saved and restored back to *SYSBAS. This brings some added advantages because the content of those iASPs can be saved and restored at any time to another LPAR/system for testing. This will allow you to test a new release of software without impacting your current production or recovery position; LVLT4i will continue to keep the recovery partition in sync. Recovery testing will be improved because you will be able to check that the recovery procedures you have developed actually work, all while your existing recovery protection is maintained. Checking whether a new application update works, checking out your application on a new release, checking the migration of data to a new release/application: all of these can be carried out without affecting your production or recovery position. If you need extra backups these can be taken on the target system at any time during the day; suspending the apply processes while the backup is performed, or doing a save while active, is not a problem.
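
As a rough illustration of that save and restore flow (this is only a sketch, not LVLT4i itself; the library, iASP and save file names are made up for the example, and in practice the two commands run on different partitions):

#include <stdlib.h>

int main(void) {
    /* on the LVLT4i target: save an application library out of the client's iASP */
    system("SAVLIB LIB(APPLIB) DEV(*SAVF) SAVF(QGPL/APPSAVF) ASPDEV(CLIENT1)");
    /* on the test partition: restore the same library back into *SYSBAS */
    system("RSTLIB SAVLIB(APPLIB) DEV(*SAVF) SAVF(QGPL/APPSAVF) RSTASPDEV(*SYSBAS)");
    return 0;
}
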
The technology implemented at the Managed Service Provider will probably be much newer and faster than the client would invest in themselves; this means the advantages of running on newer systems and OS levels can be shown to the client's management, perhaps convincing them that their existing infrastructure should be improved.
JQG4i will be implemented for those who need job queue content recovery and analysis; this means you can re-launch jobs that did not complete or start, using the exact same parameters they were launched with on the source.

LVLT4i is the next level of protection for those who currently use tapes and vaulting for recovery. The Recovery Point Objective is already the same as a High Availability offering (at the transaction level), while the Recovery Time Objective is in the 4 – 12 hour range, which is better than existing tape and vaulting solutions. We are not stopping there; we are already looking at how we can improve the Recovery Time Objective through additional automation and new replication processes, and in fact we have already added features to the product that will help reduce the time it takes to recover a client's system to the recovery partition at the Managed Service Provider. The JQG4i offering adds a new dimension to the recovery process; it brings a very important technology to users that is not available in many of the High Availability offerings today, and this could mean the difference between being able to recover or not.

Even if you already run a High Availability solution today you should look at this offering; having someone else manage the environment and provide the Recovery Point Objective/Recovery Time Objective it offers could be exactly what you need. Many are running a High Availability solution to meet a Recovery Point Objective and are not really interested in a Recovery Time Objective of minutes; that could be costing you more than it's worth to maintain. LVLT4i and a Managed Service could offer significant benefits.

If you are interested in knowing more about LVLT4i and the Managed Service Providers we are working with let us know. We are actively seeking more Managed Service Providers who are interested in helping us build a better recovery solution for the IBM i user base.

Chris…

Nov 27

Operational Assistant backup to QNAP NAS using NFS

After a recent incident (not related to our IBM i backups) we decided to look at how we back up the data from the various systems we deploy. We wanted to be able to store our backups in a central store which would allow us to recover data and objects from a known point in time. After some discussion we decided to set up a NAS and have all backups copied to it from the source systems. We already use a QNAP NAS for other data storage, so we decided on a QNAP TS-853 Pro for this purpose. The NAS and drives were purchased and set up with RAID 6 plus a hot spare for the disk protection, which left us around 18TB of available storage.

We will use a shared folder for each system plus a number of sub-directories for each type of save (*DAILY *WEEKLY *MONTHLY); the daily save required a directory for each day Mon – Thu, as Friday would be either a *WEEKLY or *MONTHLY save, in line with our existing tape saves. Below is a picture of the directories.

Folder List

We looked at a number of options for transporting the images off the IBM i to the NAS, such as FTP, Windows shares (Samba) and NFS. FTP would be OK, but managing the scripts to carry out the FTP process could become quite cumbersome and probably not very stable. A Windows share using Samba seemed like a good option, but after some research we found that the IBM i did not play very well in that area. So it was decided to set up NFS; we had done this before with our Linux systems but never from a QNAP NAS to an IBM i.

We have 4 systems defined, Shield6 – 9, each with its own directory and sub-tree for storing the images created from the saves. The NAS was configured to allow the NFS server to share the folders and provide secure access. At first we had a number of problems with access because it was not clear how the NFS access was set, but as we poked around the security settings we did find where the access had to be set. The picture below shows how we set the folders to be accessible from our local domain. Once the security was set we started the NFS server on the NAS.

Folder Security Setting

The NAS was now configured and ready to accept mount requests. There are some additional security options which we will review later, but for the time being we are going to leave them all at the defaults. The IBM i also needs to be configured to allow the NFS mounts to be added; we chose to have the QNAP folders mounted over /mnt/shieldnas1, which has to exist before the MOUNT request is run. The NFS services also have to be running on the IBM i before the MOUNT command is run, otherwise it cannot negotiate the mount with the remote NFS server. We started all of the NFS services at once, even though some were not going to be used (the IBM i will not be exporting any directories for NFS mounts, so that service does not need to run), because starting the services in the right order is also critical. We mounted the shared folder from the NAS over the directory on the IBM i using the command shown in the following display.

Mount command for shared folder on NAS
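
For reference, the setup from the IBM i side looks roughly like the sketch below, written in the same quick system() style as the save program later in this post. The export path and mount point are the ones described in this post, while the shieldnas1 host name is just an assumption for the example.

#include <stdlib.h>

int main(void) {
    /* the NFS services must be running before the mount is attempted */
    system("STRNFSSVR SERVER(*ALL)");
    /* mount the NAS export over the local directory (which must already exist);
       we assume here the NAS is known to the IBM i as shieldnas1 */
    system("MOUNT TYPE(*NFS) MFS('shieldnas1:/Backups/Shield6') "
           "MNTOVRDIR('/mnt/shieldnas1')");
    return 0;
}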

The following display shows the mapped directories below the mount once it was successfully made.

Subtree of the mounted folder

The actual shared folder /Backups/Shield6 is hidden by the mount point /mnt/shieldnas1; when we create the mount points on the other systems they will all map over their relevant system folders, i.e. /Backups/Shield7 etc, so that only the save directories need to be added to the store path.

We are using the Operational Assistant for the backup process; this can be set up using the GO BACKUP command and taking the relevant options to set the save parameters. We are currently using this for the existing tape saves and wanted to be able to carry out the same saves but with the target set to an image catalog; once the save completed we would copy the image catalog entries to the NAS.

One problem we found with the Operational Assistant backup is that you only have two options for the IFS save: all or nothing. We do not want some directories to be saved (especially the image catalog entries), so we needed a way to ensure that they are never saved by any of the save processes. We did this by setting the *ALWSAV attribute for the directory and its subtree to *NO. Now when the SAV portion of the save runs it does not save the backup directory or any of the others we do not need saved.
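
The attribute change itself is a one liner; here is a quick sketch in the same style as the save program below, using the /backup directory where our image catalog entries live (the path is the only assumption here, taken from the copy commands in the program).

#include <stdlib.h>

int main(void) {
    /* mark the image catalog directory and everything below it as not
       eligible for save, so the SAV portion of the backup skips it */
    system("CHGATR OBJ('/backup') ATR(*ALWSAV) VALUE(*NO) SUBTREE(*ALL)");
    return 0;
}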

The image catalog was created so that, if required, we could generate physical tapes from the image catalog entries using DUPTAP etc. The settings therefore had to be compatible with the tapes and the drive we have. The size of the images can be set when they are added, and we did not want the entire volume size to be allocated when each one was created; setting ALCSTG to *MIN only allocates the minimum amount of storage required, which when we checked for our tapes was 12K.
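
As a rough guide to what that setup involves, the sketch below shows the general shape; it is not our exact commands. The catalog name and IMGSIZ value are made up for the example, while the VRTTAP01 device and DAYA01 volume names match the ones used in the program further down.

#include <stdlib.h>

int main(void) {
    /* virtual tape device and an image catalog held in the /backup directory */
    system("CRTDEVTAP DEVD(VRTTAP01) RSRCNAME(*VRT)");
    system("VRYCFG CFGOBJ(VRTTAP01) CFGTYPE(*DEV) STATUS(*ON)");
    system("CRTIMGCLG IMGCLG(BACKUP) DIR('/backup') TYPE(*TAP)");
    /* add a volume allocating only the minimum storage up front */
    system("ADDIMGCLGE IMGCLG(BACKUP) FROMFILE(*NEW) TOFILE(DAYA01) "
           "IMGSIZ(64000) ALCSTG(*MIN)");
    /* make the catalog available to the virtual device */
    system("LODIMGCLG IMGCLG(BACKUP) DEV(VRTTAP01) OPTION(*LOAD)");
    return 0;
}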

For the save process, which is to be added as a job schedule entry, we created a program in ‘C’ (you could use any programming language you want), listed below, that runs the correct save process for us in the same manner as the Operational Assistant backup does. We used the RUNBCKUP command as this will use the Operational Assistant files and settings to run the backups. The program is very quick and dirty, but for now it works well enough to prove the technology.


#include<stdio.h>
#include<string.h>
#include <stdlib.h>
#include<time.h>

int main(int argc, char **argv) {
int dom[12] = {31,28,31,30,31,30,31,31,30,31,30,31}; /* days in month */
char wday[7][3] = {"Sun","Mon","Tue","Wed","Thu","Fri","Sat"}; /* dow array */
int dom_left = 0; /* days left in month */
char Path[255]; /* path to cpy save to */
char Cmd[255]; /* command string */
time_t lt; /* time struct */
struct tm *ts; /* time struct GMTIME */
int LY; /* Leap year flag */

if(time(&lt) == -1) {
printf("Error with Time calculation Contact Support \n");
exit(-1);
}
ts = gmtime(&lt);
/* if leap year LY = 0 */
LY = ts->tm_year%4;
/* if leap year increment feb days in month */
if(LY == 0)
dom[1] = 29;
/* check for end of month */
dom_left = dom[ts->tm_mon] - ts->tm_mday;
if((dom_left < 7) && (ts->tm_wday == 5)) {
system("RUNBCKUP BCKUPOPT(*MONTHLY) DEV(VRTTAP01)");
sprintf(Path,"/mnt/shieldnas1/Monthly");
/* move the save object to the NAS */
sprintf(Cmd,
"CPY OBJ('/backup/MTHA01') TODIR('%s') TOCCSID(*CALC) REPLACE(*YES)",
Path);
}
else if(ts->tm_wday == 5) {
system("RUNBCKUP BCKUPOPT(*WEEKLY) DEV(VRTTAP01)");
sprintf(Path,"/mnt/shieldnas1/Weekly");
/* move the save object to the NAS */
sprintf(Cmd,
"CPY OBJ('/backup/WEKA01') TODIR('%s') TOCCSID(*CALC) REPLACE(*YES)",
Path);
}
else {
system("RUNBCKUP BCKUPOPT(*DAILY) DEV(VRTTAP01)");
sprintf(Path,"/mnt/shieldnas1/Daily/%.3s",wday[ts->tm_wday]);
/* move the save object to the NAS */
sprintf(Cmd,
"CPY OBJ('/backup/DAYA01') TODIR('%s') TOCCSID(*CALC) REPLACE(*YES)",
Path);
}
if(system(Cmd) != 0)
printf("%s\n",Cmd);
return 0;
}

The program checks the day of the week and the number of days left in the month; this allows us to change the Friday backup to *WEEKLY or *MONTHLY if it is the last Friday of the month. Using the job scheduler we added the above program to an entry which runs at 23:55:00 every Monday to Friday (we do not back up on Saturday or Sunday at the moment).

On a normal day our *DAILY backup runs for about 45 minutes when carried out to tape, the weekly about 2 hours and the monthly about 3 hours. From the testing we have done so far, the save to the image catalog took about 1 minute for the *DAILY and, more surprisingly, only 6 minutes for the *MONTHLY save (which saves everything). The time it took to transfer our *DAILY save to the NAS (about 300MB) was only a few seconds; the *MONTHLY save, which was 6.5GB, took around 7 minutes to complete.

We will keep reviewing the results and improving the program as we find new requirements, but for now it is sufficient. The existing tape saves will still run in tandem until we have proven the recovery processes. The speed differential alone makes the cost of purchase a very worthwhile investment; having everyone off the system for a few hours to complete a save is a lot more intrusive than doing it for a few minutes. We can also copy the save images back to other systems to restore objects very easily using the same NFS technology, speeding up recovery. I will look at the iASP saves next, as this coupled with LVLT4i could be a real life saver when rebuilding system images.

Hope you find the information useful.

Chris…

Oct 24

PowerHA and LVLT4i.

We have had a number of conversations about LVLT4i and what it offers to the Managed Service Provider (MSP). As part of those discussions the IBM solution PowerHA often comes up, as it also uses iASP technology, but that is really where the similarity ends.

PowerHA uses the iASP to isolate the objects that are to be replicated to another system/storage device and it has an exact copy of the iASP from the source on the target. Changes are captured at the hardware level and are sent to the remote system as they occur.

LVLT4i only replicates objects to a remote iASP; it uses either audit journal triggers or the remote journal technology to capture and send the data. The source object resides in *SYSBAS and the target object in an iASP, which is used primarily to allow multiple copies of the same library/object combination to be stored on a single system. The remote iASP is always available to the user.

iASP is not widely implemented at customer sites; this is in part due to the lack of iASP support built into many of the applications that run on the IBM i today (many of them were built before iASP technology was available). For a customer to migrate an application to allow iASP use there are a number of constraints which have to be considered, plus each user's environment has to be adjusted to allow the iASP content to be used (SETASPGRP etc). This has further limited the use of iASP, as many do not feel the benefits of moving to the iASP model outweigh the cost of migration. Another issue is that you are adding an additional storage management requirement; the iASP is disk based, which will require protection to be added in some form. With LVLT4i you can leave your system unchanged; only the target system needs the iASP set up, and that will be in the hands of your Managed Service Provider. The decision about what to replicate is yours, and with some professional help from a Managed Service Provider who knows your application it should be pretty bullet proof when it comes to recovery.

If you implement PowerHA you are probably going to need to set up an Admin Domain; this is where any *SYSBAS objects such as system values, profiles and configuration objects are managed. In LVLT4i we do not manage system values or configuration objects (configuration objects can be troublesome, especially with TCP/IP). We have, however, just built in a new profile and password process to allow the security aspects of an application to be managed across systems in real time. Simple scripts can capture configuration and system value settings, many of which are not important to your application, so LVLT4i has you covered. If we find a need to build in system value or configuration management we will do so fairly rapidly.

PowerHA is priced by core, so you license it for each active core on each system. Using CBU licensing, PowerHA can take advantage of fewer active cores on the target and only activate them when the system is required; unfortunately, in an HA environment you are probably switching regularly, so you will end up with the same number of active cores all the time. LVLT4i is priced by IBM tier regardless of the number of active cores. The target system license is included with the source system license regardless of the target system tier, so a Managed Service Provider who has a P30 to support many P05 clients is not penalized.
PowerHA also comes in a few flavors, which are decided by the type of setup you require. Some of the functionality, such as asynchronous mirroring, is only available in the Enterprise edition, so if you need to ensure your application is not constrained by remote confirmation processing (waiting for the remote system to confirm it has the data) you are going to need the Enterprise edition, which costs more per core. LVLT4i comes in one flavor and is based on a rental model; the transport of data over synchronous/asynchronous remote journals is available to all, plus it supports any geographic model.

Because the iASP is always available, the ability to back up at any time is possible with LVLT4i. With PowerHA you have to use FlashCopy to make another disk based copy of the iASP, which can then be used for the backup to tape etc.; that requires a duplicate set of disks to match the iASP content. With LVLT4i you can use save while active or suspend the apply process for point in time saves; the remote journal will still be receiving your application updates, which can be applied once the save has completed, so data protection is not exposed.

RPO is an important number which is regularly bandied around by the High Availability providers; PowerHA states it is 0 because everything is replicated at the hardware level. We believe LVLT4i is pretty close to the same, but there are a couple of things to consider.

First of all, an RPO of 0 requires synchronous delivery of changes; if you use an asynchronous delivery method, queued changes will affect that figure for either solution. LVLT4i uses remote journaling for data changes, so if you use synchronous mode I feel the two are similar in effect.

Because we use a different process for object changes, any object updates are going to be dependent on the level of change activity being processed by the object replication processes. The amount of data being replicated is also a factor, as a single stream of object changes is used to transfer the updates. We have done a lot of work on minimizing the data which has to be sent over the wire, such as using commands instead of save/restore, pipelining changes so multiple updates to an object are optimized into a single action, and compressing within the save process. This has greatly reduced the activity and therefore the bandwidth requirements.

PowerHA is probably better at object replication because of the technology IBM can access, plus it is carried out in line with the data changes. The same constraints about using synchronous mode affect the object replication process, so bandwidth is going to be a major factor in the speed of replication. Having said that, most of the smaller clients we have implemented any kind of availability for (HA4i/DR4i) do not see significant object activity and see little to no backlog in the object replication process.

The next recovery figure, RTO, is about how long it will take from making the decision to switch to actually being switched. My initial findings about iASP tended to show a fairly long role-swap time because you had to vary the iASP off and then on again to make it available. We have never purchased PowerHA, so our tests are based on how long it took to vary a single iASP off and then on again on our P05 system (approximately 20 minutes). I suspect the newer and faster systems have reduced the time it takes, but it is still fairly long. LVLT4i is not a contender in this role because we expect its role-swap times to be pretty extended (4 – 12 hours) even with a lot of automation and preparation.

One of the issues which affects all High Availability solutions is the management of batch; if you have a batch process running at the time of failure it could affect the integrity of the application data on the target system. LVLT4i and PowerHA both have this limitation, as the capture of job queue content is not possible even in an iASP, but we have a solution which, when integrated with LVLT4i, will allow you to reload job queues and identify orphaned data which has been applied by a batch process. Our JQG4i product captures all activity for specific job queues and tracks each job from load to completion. This allows you to recover the entire application environment to a known start point and thereby ensure your data integrity is maintained. Just being able to automatically reload jobs that did not run before the system failure is a big advantage that many current users benefit from.

There are plenty of options out there to choose from, but each has its own strengths and weaknesses. LVLT4i uses the same replication technology as our HA4i and DR4i products, with enhancements to allow the use of an iASP as the target disk. It is not designed to meet the same RTO expectations as PowerHA, even though both make effective use of iASP technology. However, PowerHA is not necessarily the best option for everyone because it does have a number of dependencies that make it more difficult/costly to implement than a logical replication solution; you have to weigh up the pros and cons of each technology and decide what is important.

If you are interested in knowing more or would like to see a demo of the LVLT4i product please let us know and we will be happy to schedule one.

Chris…

Oct 23

SAVSECDTA timing?

We are looking at how to manage the recovery of profiles and passwords in an environment where the profiles cannot be managed constantly. When using our HA4i product we have the ability to constantly maintain the user profiles and passwords because the user profiles are allowed to exist on the target system. However, in an environment such as that required for the LVLT4i product, the user profiles cannot exist there because they may conflict with profiles from other clients (all user profiles have to exist in *SYSBAS).

The process we have tested involves using the SAVSECDTA command to save the data to a save file; this save file can be automatically replicated to the iASP on the target system. The profile information is captured in a file which is also replicated to the target iASP using the normal replication processes (remote journals). When the system needs to be rebuilt for recovery, the information collected by SAVSECDTA will be restored, the profiles will be updated using the profile data we have collected, and then the RSTAUT command will be run. This brings the system and profiles up to the latest content available.
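
To make the flow a little more concrete, here is a rough sketch of the commands involved at each end; the save file names are illustrative, and in reality the save, the replication of the save file and the restore happen on different systems.

#include <stdlib.h>

int main(void) {
    /* on the client system: save the security data into a save file,
       with the compression level set high as mentioned above */
    system("CRTSAVF FILE(QTEMP/SECDTA)");
    system("SAVSECDTA DEV(*SAVF) SAVF(QTEMP/SECDTA) DTACPR(*HIGH)");
    /* the save file is then replicated to the iASP on the target system */

    /* at recovery time on the rebuilt system: restore the profiles from
       the replicated copy (library name illustrative) and the authorities */
    system("RSTUSRPRF DEV(*SAVF) SAVF(QGPL/SECDTA) USRPRF(*ALL)");
    system("RSTAUT USRPRF(*ALL)");
    return 0;
}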

While we were testing the processes we noticed a very strange thing. The first time we ran the request on a system it took a little while to complete, about 1 minute, but when we ran the request again it took only a couple of seconds. The content of the save file was the same (we even set the compression level to high with no significant impact), so why was it taking so long the first time? We thought that maybe it was because the save file was already available (we put it in QTEMP), but signing off and on and then retrying gave us the same results; it still only took a few seconds to complete the save. Signing on to another system and doing the exact same process yielded the same results: the first time took about 1 minute while subsequent tries took only a few seconds.

We do not know what is going on under the covers, but it certainly seems like something gets lined up after the first save; this leads us to believe that doing a SAVSECDTA on a regular basis (nightly?) may not be a bad thing. If you have any information as to why, let us know, as we are very curious.

LVLT4i is new, and while we feel the product should attract a number of Managed Service Providers we are interested in knowing what you think. Would you be interested in a solution that provides a very low RPO (close to zero data loss) with an RTO in the 4 – 12 hour time frame? If you are interested let us know; we will be happy to put you in touch with one of the MSPs we have been working with. If you are an MSP and would like to know more or even see a demo of the product, let us know as well; we are excited by the opportunities this could bring.

Chris…

Oct 20

New Product Library Vault, Why?

We have just announced the availability of a new product, Library Vault for IBM i (LVLT4i), which is aimed primarily at Managed Service Providers. The product allows the replication of data and objects from *SYSBAS on a client's system to an iASP on a target system.

The product evolved after a number of discussions with Managed Service Providers who were looking for something less than a full blown High Availability product but more than a simple Disaster Recovery solution. It had to be flexible enough to be licensed by the replication content, not by the systems it runs on.

We looked at our existing products and how the licensing worked, and it became very apparent that neither would fit the role: they were both licensed at the system level, HA4i was more than they needed because it had all the bells and whistles associated with a High Availability product, and DR4i just didn’t have the object capabilities required. So we had to look at what we could do to build something that sits in the middle and license it in a manner that would allow the price to be fair for all parties.

Originally the product was going to be used in an LPAR to LPAR scenario, because the plan was to use the HA4i product with some functionality removed; however, one of the MSPs decided that managing lots of LPARs, even if they are hosted as VMs under an IBM i host, would entail too much management and effort. The RTO was not going to be the main driver here, only the RPO, so keeping down the overhead of managing the solution would be a deciding factor. We looked at how to implement the existing redirection process that HA4i and DR4i use for mapping libraries, and it soon became very apparent to us that this would not be ideal, as each transaction being processed would require a lot of effort to set the target object. So we decided to look at how we could take the iASP technology we had built many years ago for our RAP product and structure it in a manner which would meet all of the requirements.

After some discussion and trials we eventually had a working solution that would deliver an effective iASP based replication process. Next we needed to set the licensing to allow flexibility in how it could be deployed. The original concept was to set the licensing at the library level, as most clients would be basing their recovery on a number of libraries, so we started adding the ability to manage the number of licenses against the number of libraries. What at first seemed a simple task soon threw up more questions than answers! The number of libraries, even with a range, was not going to be a fair basis for setting our price; some libraries would be larger than others and have more activity, which would generate more work for the replication process. Also, the IFS would be totally outside of the licensing, as it has no correlation with a library based object (nesting of directories), so it would need to be managed separately. We also recognized that the data apply was based solely on the journal, so library based licensing would not work for it either.

The key to getting this to work would be flexibility. We needed to understand this from the MSP's position: the effort required to manage the setup and licensing had to be simple enough for the sales person to be able to go in and know what price to set. So we eventually came back to IBM tier based pricing, even though we have the ability to license all the way down to the object, CPU, LPAR, journal etc. We needed to give the MSP flexibility to sell the solution at an affordable price without complex license charts. We also understand that an MSP will grow the business and probably have additional resources available for new clients in advance, so we decided that the price had to be based on the client's system and not on the pair of systems being used.

LVLT4i is just getting started; its future will be defined by the MSP community who use it, because they will drive the development of new features. We have always felt that availability is best handled by professionals: availability is not a one off project, it has to evolve as the client's requirements evolve and develop. Our products hopefully give clients the ability to move through a natural progression from DR to HA. Just because you don’t need High Availability today doesn’t mean you won't later, and we have yet to find anyone who doesn’t need to protect their data. Having that data protected to the nearest transaction at an affordable cost is something we want to provide.

If you feel LVLT4i is right for you, let us know; we will be happy to put you in touch with one of the partners we are working with to discuss your needs. If you would like to discuss other opportunities for the product, such as data aggregation or centralized storage, let us know as well; we are always happy to see if the technology we have fits other interests.

Chris…

Jun 05

What does V8R0 of HA4i look like?

While we wait for IBM to get back to us about our PowerVM activations (3 days and counting; I often wonder, does IBM want to service clients?) I thought I would start to show some of the changes we have made in the next release of HA4i. The announcement date for the next release is a little way off, as we still have to get the manual and the new PHP interfaces finished, but we are all excited about some of the new capabilities so we thought we would start to share.

As the PHP interface is not yet completed and we have found the IBM Access for Web product performs very well, we thought it would be an ideal opportunity to show it off at the same time as we display some of our new features. So far the displays have been pretty pleasing, with no problems in showing the content effectively. Again we will point out that the web interface is being run on one system (shield7) while the system running HA4i is another (shield8); the ability to launch a 5250 session from the web interface to another system without the web software running on that system is pretty neat in our view.

The first screen we will share is the main monitoring screen; this is a screen shot of the 5250 green screen output using the standard Client Access emulator.

5250 Roleswap Status Green screen

Here is the IBM Access for Web output of the same screen; we have placed arrows and markers to show some of the features, which we describe below.

Roleswap Status Access for Web

Arrow 1.
A) These are the options available against each of the environment definitions; they can be used to drill down into more specific data about each of the processes involved in the replication of the objects and data.

B) You will notice that we can end and start each environment separately; there is also an option on the operations menu which will start and stop every environment at once.

C) You can Roleswap each individual environment; the previous version only allowed a total system Roleswap.

Arrow 2.
A) Some environments should not allow Roleswaps to be carried out; we have defined 2 such environments to replicate the JQG4i data. Because the data is only ever updated on the generating system and each system has its own data sets, you would never want to switch the direction of replication. The Y/N flags show that the BATCHTST environment can be switched while the JQG4i environments cannot.

Arrow 3.
A) These are the environment names; each environment runs its own configurations and processes.

Arrow 4.
A) This is the mode of the environment on this system: *PROD states that this is a source system where the object changes are captured, while *BACKUP is where the changes will be applied. When viewing the remote system these roles will be reversed.

Arrow 5.
A) If there are any errors or problems found within any of the replication processes you should not carry out a roleswap. HA4i retrieves the status from both the local and remote systems to determine if an environment is capable of being roleswapped based on the state of the replication processes. As you can see, if an environment should not be roleswapped the entry is marked as *NA.

Arrow 6/7/8.
A) This is the state of the various replication processes: *GOOD states that there are no errors and everything that should be running is, while *NOCFG states that no configurations exist that require the replication process to be running. The data status is for the journal apply process, which could encompass more than one apply process if there is more than one journal configured for the environment.

Arrow 9.
A) You can view the configs from any system, but changes to the configs can only be carried out on the *BACKUP system. The configuration pages can be accessed using this button (F17 on the 5250 green screen).
B) The Remote Sys button (F11 on the 5250 green screen) just displays the remote system information.

There are a lot more new features in the next release which will make HA4i more competitive in complex environments; over the next few weeks/months we will show you what they are and why they are important. The big takeaway from the above is the ability to define a much more granular approach to your replication needs. Because we can define multiple systems and multiple environments, HA4i is going to be a lot more useful when you need to migrate to new hardware or expand data replication beyond 2 systems.

We hope that you like the features, and if you are looking at implementing a new HA solution or replacing an existing one, we hope you will consider HA4i.

Chris…

Feb 12

Adding a Bar Graph for CPU Utilization to HA4i and DR4i

As part of the ongoing improvements to the PHP interfaces we have developed for HA4i and DR4i, we decided to add a little extra to the dashboard display. The dashboard is a quick view of the status of the replication processes; we used traffic lights as indicators to show whether the processes are running OK. A recent post discussed using a gauge to display the overall CPU utilization for the system, but we wanted to take it one step further: we wanted to be able to show just the CPU utilization for the HA4i/DR4i jobs.

We thought about how the data should be displayed and settled on a bar graph; the bars of the graph represent the CPU utilization as a percentage of the available CPU, with one bar created for each job running in the HA4i/DR4i subsystem. This gave us a couple of challenges, because we needed to determine just how many jobs were running and then build a table which would be used to display the data. There are plenty of bar graph examples out there which show how to use CSS and HTML to display data; our only difference is that we would need to extract the data from the system and then build the bar graph based on what we were given.

The first program we needed to create was one which would retrieve the information about the jobs that are running and which could be called from the Easycom program interface. We have already published a number of tests around this technology, so we will just show you the code we added to allow the data to be extracted. To that end we extended the PHPTSTSRV service program with the following function.

typedef _Packed struct job_sts_info_x {
char Job_Name[10];
char Job_User[10];
char Job_Number[6];
int CPU_Util_Percent;
} job_sts_info_t;

int get_job_sts(int *num_jobs, job_sts_info_t dets[]) {
int i,rc = 0,j = 0; /* various ints */
int dta_offset = 0;
char msg_dta[255]; /* message data */
char Spc_Name[20] = "QUSLJOB QTEMP "; /* space name */
char Format_Name[8] = "JOBL0100"; /* Job Format */
char Q_Job_Name[26] = "*ALL HA4IUSER *ALL "; /* Job Name */
char Job_Type = '*'; /* Job Info type */
char *tmp;
char *List_Entry;
char *key_dta;
Qus_Generic_Header_0100_t *space; /* User Space Hdr Ptr */
Qus_JOBL0100_t *Hdr;
Qwc_JOBI1000_t JobDets;
EC_t Error_Code = {0}; /* Error Code struct */

Error_Code.EC.Bytes_Provided = _ERR_REC;
/* get usrspc pointers */
/* memcpy(Q_Job_Name,argv[1],26); */
QUSPTRUS(Spc_Name,
&space,
&Error_Code);
if(Error_Code.EC.Bytes_Available > 0) {
if(memcmp(Error_Code.EC.Exception_Id,"CPF9801",7) == 0) {
/* create the user space */
if(Crt_Usr_Spc(Spc_Name,_1MB) != 1) {
printf(" Create error %.7s\n",Error_Code.EC.Exception_Id);
exit(-1);
}
QUSPTRUS(Spc_Name,
&space,
&Error_Code);
if(Error_Code.EC.Bytes_Available > 0) {
printf("Pointer error %.7s\n",Error_Code.EC.Exception_Id);
exit(-1);
}
}
else {
printf("Some error %.7s\n",Error_Code.EC.Exception_Id);
exit(-1);
}
}
QUSLJOB(Spc_Name,
Format_Name,
Q_Job_Name,
"*ACTIVE ",
&Error_Code);
if(Error_Code.EC.Bytes_Available > 0) {
printf("QUSLJOB error %.7s\n",Error_Code.EC.Exception_Id);
exit(-1);
}
List_Entry = (char *)space;
List_Entry += space->Offset_List_Data;
*num_jobs = space->Number_List_Entries;
for(i = 0; i < space->Number_List_Entries; i++) {
Hdr = (Qus_JOBL0100_t *)List_Entry;
memcpy(dets[i].Job_Name,Hdr->Job_Name_Used,10);
memcpy(dets[i].Job_User,Hdr->User_Name_Used,10);
memcpy(dets[i].Job_Number,Hdr->Job_Number_Used,6);
QUSRJOBI(&JobDets,
sizeof(JobDets),
"JOBI1000",
"*INT ",
Hdr->Internal_Job_Id,
&Error_Code);
if(Error_Code.EC.Bytes_Available > 0) {
printf("QUSRJOBI error %.7s\n",Error_Code.EC.Exception_Id);
exit(-1);
}
dets[i].CPU_Util_Percent = JobDets.CPU_Used_Percent;
List_Entry += space->Size_Each_Entry;
}
return 1;
}

The program calls the QUSLJOB API to create a list of the jobs which are running under the user profile HA4IUSER (we would change the code to DR4IUSER for the DR4i product) and then uses the QUSRJOBI API to get the CPU utilization for each of the jobs. We did consider using just the QUSLJOB API with keys to extract the CPU usage, but the above program does everything we need just as effectively. As each job is found we write the relevant information to the structure which was passed in by the PHP program call.

The PHP side of things uses the i5_toolkit to call the program, but you could just as easily (well, maybe not quite as easily :-)) use XMLSERVICE to carry out the data extraction. We first created the page which is used to display the bar chart; this in turn calls the functions required to connect to the IBM i and build the table which displays the chart. Again, we are only showing the code which is additional to the code we have already provided in past examples. First, this is the page which will be requested to display the chart.

<?php
/*
Copyright © 2010, Shield Advanced Solutions Ltd
All rights reserved.

http://www.shieldadvanced.ca/

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Neither the name of the Shield Advanced Solutions, nor the names of its
contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

*/
// start the session to allow session variables to be stored and addressed
session_start();
require_once("scripts/functions.php");
// load up the config data
if(!isset($_SESSION['server'])) {
load_config("scripts/config_1.conf");
}
$conn = 0;
$_SESSION['conn_type'] = 'non_encrypted';
if(!connect($conn)) {
if(isset($_SESSION['Err_Msg'])) {
echo($_SESSION['Err_Msg']);
$_SESSION['Err_Msg'] = "";
}
echo("Failed to connect");
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<style>
td.value {
background-image: url(img/gl.gif);
background-repeat: repeat-x;
background-position: left top;
border-left: 1px solid #e5e5e5;
border-right: 1px solid #e5e5e5;
padding:0;
border-bottom: none;
background-color:transparent;
}

td {
padding: 4px 6px;
border-bottom:1px solid #e5e5e5;
border-left:1px solid #e5e5e5;
background-color:#fff;
}

body {
font-family: Verdana, Arial, Helvetica, sans-serif;
font-size: 80%;
}

td.value img {
vertical-align: middle;
margin: 5px 5px 5px 0;
}

th {
text-align: left;
vertical-align:top;
}

td.last {
border-bottom:1px solid #e5e5e5;
}

td.first {
border-top:1px solid #e5e5e5;
}

table {
background-image:url(img/bf.png);
background-repeat:repeat-x;
background-position:left top;
width: 33em;
}

caption {
font-size:90%;
font-style:italic;
}
</style>
</head>

<body>
<?php get_job_sts($conn,"*NET"); ?>
</body>
</html>

The above code shows the STYLE element we used to form the bar chart; normally we would put this in a CSS file and include that file, but in this case, as it is just for demonstrating the technology, we decided to leave it in the page header. The initial code of the page starts up the session, includes the functions code, loads the config data which is used to make the connection to the IBM i, and then connects to the IBM i. Once that is done, the get_job_sts function contained in functions.php is called. Here is the code for that function.


/*
* function to display bar chart for active jobs and CPU Usage
* @parms
* the connection to use
*/

function get_job_sts(&$conn,$systyp) {
// get the number of jobs and the data to build the bars for

$desc = array(
array("Name" => 'NumEnt', "io" => I5_INOUT, "type" => I5_TYPE_INT),
array("DSName" =>"jobdet", "count" => 30, "DSParm" => array(
array("Name" => "jobname", "io" => I5_OUT, "type" => I5_TYPE_CHAR, "length" => "10"),
array("Name" => "jobuser", "io" => I5_OUT, "type" => I5_TYPE_CHAR, "length" => "10"),
array("Name" => "jobnumber", "io" => I5_OUT, "type" => I5_TYPE_CHAR, "length" => "6"),
array("Name" => "cpu", "io" => I5_OUT, "type" => I5_TYPE_INT))));
// prepare for the program call
$prog = i5_program_prepare("PHPTSTSRV(get_job_sts)", $desc, $conn);
if ($prog == FALSE) {
$errorTab = i5_error ();
echo "Program prepare failed <br>\n";
var_dump ( $errorTab );
die ();
}
// set up the input output parameters
$parameter = array("NumEnt" => 0);
$parmOut = array("NumEnt" => "nbr", "jobdet" => "jobdets");
$ret = i5_program_call($prog, $parameter, $parmOut);
if (!$ret) {
throw_error("i5_program_call");
exit();
}
echo("<table cellspacing='0' cellpadding='0' summary='CPU Utilization for HA4i Jobs'>");
echo("<caption align=top>The current CPU Utilization for each HA4i Job on the " .$systyp ." System</caption>");
echo("<tr><th scope='col'>Job Name</th><th scope='col'>% CPU Unitlization</th></tr>");
for($i = 0; $i < $nbr; $i++) {
$cpu = $jobdets[$i]['cpu']/10;
if($i == 0) {
echo("<tr><td class='first' width='20px'>" .$jobdets[$i]['jobname'] ."</td><td class='value first'><img src='img/util.png' alt='' width='" .$cpu/2 ."%' height='16' />" .$cpu ."%</td></tr>");
}
elseif($i == ($nbr -1)) {
echo("<tr><td class='last' width='20px'>" .$jobdets[$i]['jobname'] ."</td><td class='value last'><img src='img/util.png' alt='' width='" .$cpu/2 ."%' height='16' />" .$cpu ."%</td></tr>");
}
else {
echo("<tr><td width='20px'>" .$jobdets[$i]['jobname'] ."</td><td class='value'><img src='img/util.png' alt='' width='" .$cpu/2 ."%' height='16' />" .$cpu ."%</td></tr>");
}
}
echo("</table>");
return 1;
}

The program call is prepared with a maximum of 30 job info structures; we would normally determine this before the call and set the actual number of jobs to extract, but in this instance we simply decided that 30 structures would be more than enough. After the program is called and the data returned, we build the table structure that is used to display the data. We originally allowed the bar to take up the full table width, but after testing on our system, which has uncapped CPU, we found that we would sometimes get over 100% CPU utilization. We still show the actual utilization but decided to halve the bar width, which gave us a better display.

HA4i is running on our system in test mode, so the CPU utilization is pretty low even when we run a saturation test, but the image capture below will give you an idea of what the above code produces in our test environment.

CPU Utilization Bar Chart for the HA4i jobs

Now we just need to include the relevant code in the HA4i/DR4i PHP interfaces and we will be able to provide more data via the dashboard, which should help with managing the replication environment. You can see the original bar chart on which this example was based here.

Happy PHP’ing.

Chris…

Dec 18

New features added to HA4i

A couple of new features have been added to the HA4i product as a result of customer requests. Auditing is one area where HA4i has always been strong, but as customers get used to the product they find areas where they would like some adjustments. The object auditing process was one such area: the client was happy that the results of the audits were correct, but asked if we could selectively determine which attributes of an object are to be audited, as some results, while correct, are not important to them.

The existing process was a good place to start, so we decided to use it as the base but, while we were making changes, also improve the audit to bring in more attributes to be checked. We determined that a totally new set of programs would be required, including new commands and interfaces; this would allow the existing audit process to remain intact where clients have already programmed it into their schedulers and programs. The new audits run by retrieving the list of parameters to be checked from a control file and only compare the configured parameters. The results have been tested by the client and he has given us the nod to say this meets with his approval. We also added new recovery features which allow out of sync objects to be repaired more effectively.

Another client approached us with a totally different problem: they were seeing errors logged from the journal apply process because developers were saving and restoring journaled objects from the production environment into test libraries on the production system. This caused a problem because the objects are automatically journaled to the production journal when they are restored, so when the apply process finds the entry in the remote journal it tries to create the object on the target system and fails because the library does not exist. To overcome this we amended the code which supports the redirection technology for the remote apply process (it allows journal entries for objects in one library to be applied to objects in another library) to support a new keyword, *IGNORE. When the apply process finds these definitions it will automatically ignore any requests for objects in the defined library. NOTE: the best solution would have been to move the developers off the production system and develop more HA friendly approaches to making production data available, but in this case that was not an option.

We are always learning and adding new features to HA4i, many of them from customer requirements or suggestions. Being a small organization allows us to react very quickly to these requirements and provide our clients with a High Availability solution that meets their needs. If you are looking for an affordable High Availability or Disaster Recovery solution, or would like to ask about replacing an existing solution, give us a call. We are always happy to look at your needs and see if HA4i fits your requirements and budget.

Chris…