Dec 15

QNAP and NFS issues

I don’t know why, but the setup we created for backing up to the QNAP NAS from the IBM i LPARs stopped working. We have been installing PTFs all weekend as we try to get Node.js up and running on them, but that was not the issue. The problem seemed to be related to the way the exports were applied by the IBM i when running the MOUNT command. We spent a lot of time trying to change the authorities on the mount point, to no avail. When the mount point was created everything was owned by the profile that created the directories; however, once I mounted the remote directories the ownership changed to QSECOFR, and even as a user with all authorities I could not view the mounted directories. I also had no way of changing the authorities, whether signed on as QSECOFR or not.

I spent a lot of time playing with the authorities on the remote NAS, trying to change the authorities of the shared folder; I even gave full access to anyone on the share, which did not work. Eventually (I think I tried every setting possible) I stumbled across the issue. When I looked at the NFS security on the QNAP NAS, it has a dropdown which shows the NFS host access. Originally this was set to ‘*’, which I assumed meant that it would allow access from any host. However, when I changed this to reflect the local network ‘192.168.100.*’ everything started to work again.

So if you are trying to set this up and stumble into authority issues, try setting the host access to reflect your local LAN. I will try to delve a little more into exactly what the setting does later.

Chris…

Dec 08

System Values and LVLT4i

System values are an important part of the working environment on the IBM i, so it is important that they are correctly set, ready for when you move to a recovery system. LVLT4i works in an environment where setting the system values as part of the replication process is not an option, in just the same way that we cannot replicate profiles and authorities. So we had to come up with a process that would allow us to build the required environment as part of the recovery process.

When we first looked at how we could use LVLT4i we were thinking that the recovery process would use a system save to recover the client’s environment and then restore the iASP data over it to bring the client data and objects up to the last transaction. That was one of the reasons that the Recovery Time Objective was going to be so long: it takes quite some time to restore a system save. Even if we used Image Catalogs for the restore it was still going to take a significant amount of time, which encouraged us to start looking at the options we had.

One of the major advantages we wanted to push for LVLT4i is the ability to take a backup of a client’s applications and data from the iASP and use it for things such as DR testing, application upgrade and OS upgrade testing. To do this we envisage the Managed Service Provider having a recovery partition running the correct level of OS for the client; the backup of the iASP could be copied over to the running environment and the client could do their testing without affecting their current DR position. Once the test was completed the system could be scratched and made ready for the next client to use. As part of the discussions we looked at how we could speed up the save and recovery processes (see our Blog entry on saving to a QNAP NAS) using the image catalog technology so that the Recovery Time Objective could be reduced to an absolute minimum. Those programs we created for the testing are actually in use in our environments; they have significantly reduced the save times and provide us with a much faster recovery time should we ever need to set a recovery in motion.

Profiles and Passwords were our first priority because they tend to change a lot, we came up with a process that allows the Managed Service Provider to restore the iASP data and then using automated scripts recover the User Profiles and Passwords before setting the authority. Profile recovery has already been implemented in LVLT4i and testing shows that the process is very effective and fast. The next item we wanted to cover was system values, again as with User Profiles they cannot be replicated to the target system from the client. Using the experience we gained with the storage of the profile data etc. we have now built a retrieval process that will capture all of the system values and then keep those system values in sync. When the client recovery is required scripts will be run that will allow all of the captured system values to be set on the recovery partition.
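As a rough illustration of the last step, a recovery script could turn each captured system value back into a CHGSYSVAL command. The sketch below is only an assumption about how that might look; the struct, its field names and the function name are made up for the example and are not LVLT4i's own code. Quoting rules also differ between system values, so treat the quoted VALUE as a simplification.

```c
#include <stdio.h>

/* One captured system value: its name and the value saved from the client.
   The struct and field names are illustrative only. */
struct sysval {
    const char *name;   /* for example "QDATFMT" */
    const char *value;  /* for example "YMD" */
};

/* Build a CHGSYSVAL command for one captured value.
   Returns 0 on success, -1 if the buffer is too small. */
int build_chgsysval(const struct sysval *sv, char *buf, size_t len)
{
    int n = snprintf(buf, len, "CHGSYSVAL SYSVAL(%s) VALUE('%s')",
                     sv->name, sv->value);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

On the recovery partition the resulting string could be passed to system() for each captured value in turn.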

We believe that LVLT4i is a big step forward in being able to provide a recovery process for many IBM i users, even if they have an existing High Availability product in use today they will see many benefits from using it as their preferred recovery tool. We are noticing that many of those companies that implemented a High Availability Solution are not able to keep up with the changing technology being provided, this means that the recovery capabilities of the solution are being eroded and their value is no longer what it used to be. Data protection is the most important point of any availability solution so managing it needs to be a top priority, having a Recovery Time Objective of 4 – 12 hours should be more than enough for most of the IBM i community so paying for a Recovery Time Objective of minutes is not practical or beneficial.

LVLT4i, when managed by a reputable Managed Service Provider, should provide the users with a better recovery position at a price that meets even the tightest of budgets. We believe that recovery solutions are better managed by those who are committed to them and who continue to develop the skills to maintain them at all times. Although we are not big “Cloud” supporters, we think LVLT4i and the services offered by a Managed Service Provider could make the difference in being able to see value from a properly managed recovery process; offloading the day to day management to a service provider alone should show significant savings.

If you would like to know more about LVLT4i and its capabilities please call us and we will be happy to discuss. If you prefer to use Email we have a contact process on our website under contact us that you can use.

Chris…

Dec 02

Getting the most from LVLT4i

While it is early days for the LVLT4i product we have already had a number of interesting conversations with IBM i users and Managed Service Providers about how we see it being deployed to the smaller IBM i user base.

Price advantages
For the smaller IBM i user the thought of going to a full blown High Availability Solution has always been one that comes with thoughts of big budgets and lots of heartache. The clients need a duplicate system plus the infrastructure required to allow the replication processes to sync data and objects between the systems. Add to this the licenses for the High Availability Product, OS and ISV software, and many clients do not believe availability protection at this level is a viable option.
Even if they identify a Managed Service Provider who could offer the target environment, they still see this as something beyond their budget.
LVLT4i is aimed at easing that problem. It is a Managed Service offering with subscription based pricing based on the client’s system (IBM Tier group); this allows the MSP to grow the business without having to invest in up front licensing costs while providing a hardware platform which meets their customers’ requirements. The iASP technology also reduces the costs for the Managed Service Provider because they can run many clients on a single target LPAR/system, removing the one to one relationship generally seen in this scenario. The client will only pay a monthly fee, will have no upfront capital expense to get signed off, and will probably find the target systems are much faster and newer than their existing systems.

Skills advantages
We have been involved with IBM i (and its predecessors) for nearly 25 years in the High Availability market and we have carried out a lot of High Availability software implementations. During that time we have seen a lot of the problems people encounter when trying to implement and manage a High Availability environment. Moving that skill requirement to a Managed Service Provider will bring a number of benefits. The client staff will not have to keep up with the changing capabilities of the High Availability Product; they can concentrate on their main focus, which is providing an IT infrastructure to meet the business’s needs. Installation and ongoing management of the replicated environment will be handled by the Managed Service Provider, so there are no more consultancy fees to the High Availability Software provider every time you need to make a minor change. The Managed Service Provider will have a lot of knowledge spread throughout their team, and many of that team will have specialist skills that can be brought in to figure out problems.

Technology advantages
LVLT4i uses iASP technology on the target system; the client’s system will continue to use *SYSBAS so no changes are required for the client’s applications. When the client needs to test or recover, the iASP data is saved and restored back to *SYSBAS. This brings some added advantages because the content of those iASPs can be saved and restored at any time to another LPAR/System for testing. This will allow you to test a new release of software without impacting your current production or recovery position; LVLT4i will continue to keep the recovery partition in sync. Recovery testing will be improved because you will be able to check that the recovery procedures you have developed work, all of this while your existing recovery protection is maintained. Being able to check if a new application update works, check out your application on a new release, check the migration of data to a new release/application: all of these can be carried out without affecting your production or recovery position. If you need extra backups to be taken these can be carried out on the target system at any time during the day; suspending the apply processes while the backup is performed or doing a save while active is not a problem.
The technology which is implemented at the Managed Service Provider will probably be much newer and faster than the client would invest in. This means the advantages of running on the newer systems and OS could be shown to the client’s management, maybe convincing them that their existing infrastructure should be improved.
JQG4i will be implemented for those who need job queue content recovery and analysis, this means you can re-launch jobs that did not complete or start using the exact same parameters they were launched with on the source.

LVLT4i is the next level of protection for those who currently use tapes and vaulting for recovery. The Recovery Point Objective is already the same as a High Availability offering (at the transaction level) while the Recovery Time Objective is in the 4 – 12 hour range, which is better than existing tape and vaulting solutions. We are not stopping there; we are already looking at how we can improve the Recovery Time Objective through additional automation and new replication processes. In fact we have already added additional features to the product that will help reduce the time it takes to recover a client’s system to the recovery partition at the Managed Service Provider. The JQG4i offer adds a new dimension to the recovery process: it brings a very important technology to the users that is not available in many of the High Availability offerings today, and this could mean the difference between being able to recover or not.

Even if you already run a High Availability solution today you should look at this offering; having someone else manage the environment and provide the Recovery Point Objective/Recovery Time Objective it offers could be what you need. Many are running a High Availability solution to meet the Recovery Point Objective and are not interested in a Recovery Time Objective of minutes; this could be costing you more than it’s worth to maintain. LVLT4i and a Managed Service could offer significant benefits.

If you are interested in knowing more about LVLT4i and the Managed Service Providers we are working with let us know. We are actively seeking more Managed Service Providers who are interested in helping us build a better recovery solution for the IBM i user base.

Chris…

Nov 27

Operational Assistant backup to QNAP NAS using NFS

After a recent incident (not related to our IBM i backups) we decided to look at how we backed up our data from the various systems we deploy. We wanted to be able to store our backups in a central store which would allow us to recover data and objects from a known point in time. After some discussion we decided to set up a NAS and have all backups copied to it from the source systems. We already use a QNAP NAS for other data storage so decided on a QNAP TS-853 Pro for this purpose. The NAS and drives were purchased and set up with RAID 6 and a hot spare for the disk protection, which left us around 18TB of available storage.

We will use a shared folder for each system plus a number of sub-directories for each type of save (*DAILY *WEEKLY *MONTHLY), the daily save required a day for each day Mon – Thu as Friday would either be a *WEEKLY or *MONTHLY save as per our existing tape saves. Below is a picture of the directories.

Folder List

We looked at a number of options for transporting the images off the IBM i to the NAS such as FTP, Windows shares (SAMBA) and NFS. FTP would be OK but managing the scripts to carry out the FTP process could become quite cumbersome and probably not very stable. The Windows share using SAMBA seemed like a good option but after some research we found that the IBM i did not play very well in that area. So it was decided to set up NFS; we had done this before using our Linux systems but never from a QNAP NAS to an IBM i.

We have 4 systems defined, Shield6 – 9, each with its own directory and sub-tree for storing the images created from the save. The NAS was configured to allow the NFS server to share the folders and provide secure access. At first we had a number of problems with the access because it was not clear how the NFS access was set, but as we poked around the security settings we did find out where you had to set the access. The picture below shows how we set the folders to be accessible from our local domain. Once the security was set we started the NFS server on the NAS.

Folder Security Setting

The NAS was now configured and ready to accept mount requests, there are some additional security options which we will review later but for the time being we are going to leave them all set up to the defaults. The IBM i also needs to be configured to allow the NFS mounts to be added, we chose to have the QNAP folders mounted over /mnt/shieldnas1 which has to exist before the MOUNT request is run. The NFS services also have to be running on the IBM i before the MOUNT command is run otherwise it cannot negotiate the mount with the remote NFS server. We started all of the NFS Services at once even though some were not going to be used (The IBM i will not be exporting any directories for NFS mounts so that service does not need to run) because starting the services in the right order is also critical. We mounted the shared folder from the NAS over the directory on the IBM i using the command shown in the following display.

Mount command for shared folder on NAS

The following display shows the mapped directories below the mount once it was successfully made.

Subtree of the mounted folder

The actual shared folder /Backups/Shield6 is hidden by the mount point /mnt/shieldnas1. When we create the mount points on the other systems they will all map over their relevant system folders (i.e. /Backups/Shield7 etc.) so that only the save directories need to be added to the store path.
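For anyone who cannot see the screenshots, the mount request described above can be sketched as a small helper that builds the CL command for each system. This is a sketch under assumptions: the helper name and the NAS host name used in the usage note are hypothetical, and the command is reduced to the TYPE, MFS and MNTOVRDIR parameters. The NFS services need to be running first (STRNFSSVR SERVER(*ALL) starts them all) and the local directory has to exist.

```c
#include <stdio.h>

/* Build the MOUNT command for one system's shared folder.
   host   : NAS host name or address (hypothetical in this example)
   share  : shared folder on the NAS, e.g. "/Backups/Shield6"
   mntdir : local directory to mount over; it must already exist
   Returns 0 on success, -1 if the buffer is too small. */
int build_mount_cmd(const char *host, const char *share,
                    const char *mntdir, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "MOUNT TYPE(*NFS) MFS('%s:%s') MNTOVRDIR('%s')",
                     host, share, mntdir);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with a host of shieldnas1, a share of /Backups/Shield7 and a mount directory of /mnt/shieldnas1 gives the command for the second system; only the share changes from system to system.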

We are using the Operational Assistant for the backup process, this can be setup using the GO BACKUP command and taking the relevant options to set up the save parameters. We are currently using this for the existing Tape saves and wanted to be able to carry out the same saves but have the target set to an Image Catalog, once the save was completed we would copy the Image Catalog Entries to the NAS.

One problem we found with the Operational Assistant backup is that you only have 2 options for the IFS save, all or nothing. We do not want some directories to be saved (especially the image catalog entries) so we needed a way to ensure that they are never saved by any of the save processes. We did this by setting the *ALWSAV attribute for the directory and subtree to *NO. Now when the SAV portion of the save runs it does not save the Backup directory or any of the other ones we do not need saved.
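The attribute change can be made with the CHGATR command. The helper below is a minimal sketch that builds that command for a directory and its subtree; the function name is ours for the example, and /backup in the usage note is simply the directory the save program further down copies from.

```c
#include <stdio.h>

/* Build a CHGATR command that marks a directory and everything below it
   as not eligible for saving (*ALWSAV set to *NO).
   Returns 0 on success, -1 if the buffer is too small. */
int build_nosave_cmd(const char *dir, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "CHGATR OBJ('%s') ATR(*ALWSAV) VALUE(*NO) SUBTREE(*ALL)",
                     dir);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For /backup this produces CHGATR OBJ('/backup') ATR(*ALWSAV) VALUE(*NO) SUBTREE(*ALL), which can be run once from a command line or passed to system().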

The image catalog was created so that if required we could generate physical tapes from the image catalog entries using DUPTAP etc. Therefore settings had to be compatible with the tapes and drive we have. The size of the images can be set when they are added and we did not want the entire volume’s size to be allocated when it was created; setting ALCSTG to *MIN only allocates the minimum amount of storage required, which when we checked for our tapes was 12K.

For the save process, which is to be added as a Job Schedule entry, we created a program in ‘C’ which we have listed below (you could use any programming language you want) that would run the correct save process for us in the same manner as the Operational Assistant Backup does. We used the RUNBCKUP command as this will use the Operational Assistant files and settings to run the backups. The program is very quick and dirty but for now it works well enough to prove the technology.
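To show where the ALCSTG setting goes, here is a hedged sketch that builds an ADDIMGCLGE command for a new, empty virtual volume. The catalog name in the usage note is hypothetical, the helper name is ours, and the image size and density parameters are deliberately left out because they depend on the tapes and drive in use.

```c
#include <stdio.h>

/* Build an ADDIMGCLGE command that adds a new virtual tape volume to an
   image catalog, allocating only the minimum storage up front.
   catalog : image catalog name (hypothetical in this example)
   volume  : used for both the image file name and the volume name
   Returns 0 on success, -1 if the buffer is too small. */
int build_addimgclge(const char *catalog, const char *volume,
                     char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "ADDIMGCLGE IMGCLG(%s) FROMFILE(*NEW) TOFILE(%s) "
                     "ALCSTG(*MIN) VOLNAM(%s)",
                     catalog, volume, volume);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a catalog called BACKUPCLG and a volume of DAYA01 the result is ADDIMGCLGE IMGCLG(BACKUPCLG) FROMFILE(*NEW) TOFILE(DAYA01) ALCSTG(*MIN) VOLNAM(DAYA01).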


#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv) {
    int dom[12] = {31,28,31,30,31,30,31,31,30,31,30,31};            /* days in month */
    char wday[7][4] = {"Sun","Mon","Tue","Wed","Thu","Fri","Sat"};  /* dow array */
    int dom_left = 0;                   /* days left in month */
    char Path[255];                     /* path to cpy save to */
    char Cmd[255];                      /* command string */
    time_t now;                         /* current calendar time */
    struct tm *ts;                      /* broken down local time */
    int year;                           /* full calendar year */

    if(time(&now) == -1) {
        printf("Error with Time calculation Contact Support \n");
        exit(-1);
    }
    /* local time, not GMT, so the day matches the job schedule entry */
    ts = localtime(&now);
    /* tm_year counts years from 1900 */
    year = ts->tm_year + 1900;
    /* if leap year increment feb days in month */
    if((year % 4 == 0 && year % 100 != 0) || year % 400 == 0)
        dom[1] = 29;
    /* check for end of month */
    dom_left = dom[ts->tm_mon] - ts->tm_mday;
    if((dom_left < 7) && (ts->tm_wday == 5)) {
        /* last Friday of the month */
        system("RUNBCKUP BCKUPOPT(*MONTHLY) DEV(VRTTAP01)");
        sprintf(Path,"/mnt/shieldnas1/Monthly");
        /* copy the save image to the NAS */
        sprintf(Cmd,
                "CPY OBJ('/backup/MTHA01') TODIR('%s') TOCCSID(*CALC) REPLACE(*YES)",
                Path);
    }
    else if(ts->tm_wday == 5) {
        /* any other Friday */
        system("RUNBCKUP BCKUPOPT(*WEEKLY) DEV(VRTTAP01)");
        sprintf(Path,"/mnt/shieldnas1/Weekly");
        /* copy the save image to the NAS */
        sprintf(Cmd,
                "CPY OBJ('/backup/WEKA01') TODIR('%s') TOCCSID(*CALC) REPLACE(*YES)",
                Path);
    }
    else {
        system("RUNBCKUP BCKUPOPT(*DAILY) DEV(VRTTAP01)");
        sprintf(Path,"/mnt/shieldnas1/Daily/%.3s",wday[ts->tm_wday]);
        /* copy the save image to the NAS */
        sprintf(Cmd,
                "CPY OBJ('/backup/DAYA01') TODIR('%s') TOCCSID(*CALC) REPLACE(*YES)",
                Path);
    }
    /* if the copy fails, print the command so it can be re-run by hand */
    if(system(Cmd) != 0)
        printf("%s\n",Cmd);
    return 0;
}

The program will check for the day of the week and the number of days in the month, this allows us to change the Friday Backup to *WEEKLY or *MONTHLY if it is the last Friday of the month. Using the Job Scheduler we added the above program to an entry which will run at 23:55:00 every Monday to Friday (we do not back up on Saturday or Sunday at the moment) and set it up to run.
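The last Friday test is the part most likely to be reused elsewhere, so here it is pulled out into a small function that can be checked without running a save. This is a sketch rather than the program we run; the enum and function names are made up for the example. A Friday is the last one in the month when fewer than 7 days remain after it.

```c
/* Which Operational Assistant backup should run on a given date. */
enum save_type { SAVE_DAILY, SAVE_WEEKLY, SAVE_MONTHLY };

/* Full Gregorian leap year rule. */
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days in a month; month is 0-11 as in struct tm. */
static int days_in_month(int year, int month)
{
    static const int dom[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
    return (month == 1 && is_leap(year)) ? 29 : dom[month];
}

/* year  : full year, e.g. 2014
   month : 0-11, mday : 1-31, wday : 0-6 with Sunday = 0 (as in struct tm)
   A Friday with fewer than 7 days left in the month is the last Friday. */
enum save_type pick_save(int year, int month, int mday, int wday)
{
    int left = days_in_month(year, month) - mday;

    if (wday == 5)
        return (left < 7) ? SAVE_MONTHLY : SAVE_WEEKLY;
    return SAVE_DAILY;
}
```

For example, Friday 28 November 2014 has 2 days left in the month and so selects the monthly save, while Friday 21 November has 9 days left and selects the weekly save.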

On a normal day, our *DAILY backup runs for about 45 minutes when being carried out to a tape, the weekly about 2 hours and the monthly about 3 hours. From the testing we have done so far, the save to the image catalog took about 1 minute for the *DAILY and, more surprisingly, only 6 minutes for the *MONTHLY save (which saves everything). The time it took to transfer our *DAILY save to the NAS (about 300MB) was only a few seconds; the *MONTHLY save, which was 6.5 GB, took around 7 minutes to complete.

We will keep reviewing the results and improve the program as we find new requirements but for now it will be sufficient. The existing Tape saves will still run in tandem until we prove the recovery processes. The speed differential alone makes the cost of purchase a very worthwhile investment, getting off the system for a few hours to complete a save is a lot more intrusive than doing it for a few minutes. We can also copy the save images back to other systems to restore objects very easily using the same NFS technology and speeding up recovery. I will also look at the iASP saves next as this coupled with LVLT4i could be a real life saver when re-building system images.

Hope you find the information useful.

Chris…

Oct 24

PowerHA and LVLT4i.

We have had a number of conversations about LVLT4i and what it offers to the Managed Service Provider (MSP). As part of those discussions the IBM solution PowerHA often comes up as it also uses iASP technology, but that is really where the similarity ends.

PowerHA uses the iASP to isolate the objects that are to be replicated to another system/storage device and it has an exact copy of the iASP from the source on the target. Changes are captured at the hardware level and are sent to the remote system as they occur.

LVLT4i only replicates objects to a remote iASP, it uses either Audit journal triggers or the Remote Journal technology to capture and send the data. The source object resides in *SYSBAS and the target object in an iASP, it is used primarily to allow multiple copies of the same library/object combination to be stored on a single system. The remote iASP is always available to the user.

iASP is not widely implemented at customer sites; this is in part due to the lack of support for iASPs built into many of the applications that run on the IBM i today (many of the applications were built before iASP technology was available). For a customer to migrate an application to allow iASP use there are a number of constraints which have to be considered, plus each user’s environment has to be adjusted to allow the iASP content to be used (SETASPGRP etc.). This has further limited the use of iASP as many do not feel the benefits of moving to the iASP model outweigh the cost of migration. Another issue is that you are now adding an additional storage management requirement: the iASP is disk based, which will require protection to be added in some form. With LVLT4i you can leave your system unchanged; only the target system is going to need iASP setup and that will be in the hands of your Managed Service Provider. The decision about what to replicate is yours; with some professional help from a Managed Service Provider who knows your application it should be pretty bullet proof when it comes to recovery.

If you implement PowerHA you are probably going to need to set up an Admin Domain; this is where any *SYSBAS objects such as system values, profiles and configuration objects are managed. In LVLT4i we do not manage system values or configuration objects (configuration objects can be troublesome, especially with TCP/IP). We have however just built in a new profile and password process to allow the security aspects of an application to be managed across systems in real time. Simple scripts can capture configuration and system value settings, many of which are not important to your application, so LVLT4i has you covered. If we find a need to build in system value or configuration management we will do so fairly rapidly.

PowerHA is priced by Core, so you license it for each Active Core on each system. Using CBU licensing, PowerHA can utilize fewer active cores on the target and only activate them when the system is required. Unfortunately in a HA environment you are probably switching regularly so you will have the same number of active cores all the time. LVLT4i is priced by IBM tier regardless of the number of active cores. The target system license is included with the source system license regardless of the target system tier, so a Managed Service Provider who has a P30 to support many P05 clients is not penalized.
PowerHA also comes in a few flavors which are decided on by the type of set up you require. Some of the functionality such as Asynchronous mirroring is only available in the Enterprise edition, so if you need to ensure your application is not constrained by remote confirmation processing (waiting for the remote system to confirm it has the data) you are going to need the Enterprise edition, which costs more per core. LVLT4i comes in one flavor and is based on a rental model; the transport of data over Synchronous/Asynchronous remote journals is available to all, plus it supports any geographic model.

Because the iASP is always available the ability to backup at any time is possible with LVLT4i. With PowerHA you have to use a Flashcopy to make another disk based copy of the iASP which can then be used for the back up to tape etc. That requires a duplicate set of disks to match the iASP content. With LVLT4i you can use Save While Active or suspend the apply process for point in time saves, the remote journal will still be receiving your application updates which can be applied once the save has completed so data protection is not exposed.

RPO is an important number which is regularly bandied about by the High Availability providers; PowerHA states it is 0 because everything is replicated at the hardware level. We believe LVLT4i is pretty close to the same but there are a couple of things to consider.

First of all, RPO of 0 will require synchronous delivery of changes, if you use an Asynchronous delivery method queued changes will affect that for either solution. LVLT4i uses Remote journalling for data changes, so if you use Synchronous mode I feel the two are similar in effect.

Because we use a different process for object changes, any object updates are going to be dependent on the level of change activity being processed by the object replication processes. The amount of data being replicated is also a factor as a single stream of object changes is used to transfer the updates. We have done a lot of work on minimizing the data which has to be sent over the wire, such as using commands instead of save restore, pipe-lining changes so multiple updates to an object are optimized into a single action, and compression within the save process. This has greatly reduced the activity and therefore bandwidth requirements.
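To give a feel for what collapsing multiple updates into a single action means, the sketch below walks a queue of pending object changes and keeps one entry per object, carrying the most recent sequence number. It illustrates the general idea only and is not the code used in the product; the struct, its fields and the function name are invented for the example.

```c
#include <string.h>

#define MAX_NAME 64

/* One queued object change (illustrative layout only). */
struct change {
    char object[MAX_NAME];  /* qualified object name, e.g. "LIBA/FILE1" */
    int  seq;               /* sequence number of the change */
};

/* Collapse the queue in place so each object appears once, keeping the
   latest sequence number seen for it. Objects stay in order of first
   appearance. Returns the new number of entries. */
size_t coalesce(struct change *q, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* look for an earlier entry for the same object */
        for (j = 0; j < out; j++) {
            if (strcmp(q[j].object, q[i].object) == 0) {
                q[j].seq = q[i].seq;    /* later change replaces earlier */
                break;
            }
        }
        if (j == out)                   /* first time this object is seen */
            q[out++] = q[i];
    }
    return out;
}
```

Four queued changes, three of them against the same file, would go over the wire as two actions instead of four.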

PowerHA is probably better at object replication because of the technology IBM can access, plus it is going to be carried out in line with the data changes. The same constraints about using synchronous mode affect the object replication process so bandwidth is going to be a major factor in the speed of replication etc. Having said that, most of the smaller clients we have implemented any kind of availability for (HA4i/DR4i) do not see significant object activity and little to no backlogs in the object replication process.

The next recovery figure RTO talks about how long it will take from making the decision to switch, to actually switching. My initial findings about iASP tended to show a fairly long role-swap time because you had to vary off the iASP and then on again to make it available. We have never purchased PowerHA so our tests are based around how long it took to vary off and then on again a single iASP on our P05 system (approximately 20 minutes). I would suspect the newer and faster systems have reduced the time it takes but it is still a fairly long time. LVLT4i is not a contender in this role because we expect the role-swap times to be pretty extended (4 – 12 hours) even if you do a lot of automation and preparation.

One of the issues which affect all High Availability Solutions is the management of batch, if you have a batch process running at the time of failure it could affect the integrity of the application data on the target system. LVLT4i and PowerHA both have this limitation as the capture of job queue content is not possible even in an iASP, but we have a solution which when integrated with LVLT4i will allow you to reload job queues and identify orphaned data which has been applied by a batch process. Our JQG4i product captures all activity for specific job queues and will track each job from load to completion. This will allow you to recover the entire application environment to a known start point and thereby ensure your data integrity is maintained. Just being able to automatically reload jobs that did not run before the system failure is a big advantage that many current users benefit from.

There are plenty of options out there to choose from but each has its own strengths and weaknesses. LVLT4i uses the same replication technology as our HA4i and DR4i products with enhancements to allow the use of iASP as the target disk. It is not designed to meet the same RTO expectations as PowerHA even though both make effective use of iASP technology. However, PowerHA is not necessarily the best option for everyone because it does have a number of dependencies that make it more difficult/costly to implement than a logical replication solution; you have to weigh up the pros and cons of each technology and make a decision about what is important.

If you are interested in knowing more or would like to see a demo of the LVLT4i product please let us know and we will be happy to schedule.

Chris…

Oct 23

SAVSECDTA timing?

We are looking at how to manage the recovery of profiles and passwords in an environment where the profiles cannot be managed constantly. When using our HA4i product we have the ability to constantly maintain the user profiles and passwords because the user profiles are allowed to exist on the target system. However, in an environment such as that required for the LVLT4i product, user profiles cannot exist on the target because they may conflict with profiles from other clients (all user profiles have to exist in *SYSBAS).

The process we have tested involves using the SAVSECDTA command to save the data to a save file, this save file can be automatically replicated to the iASP on the target system. The Profile information is captured in a file which is also replicated to the target iASP using normal replication processes (Remote Journals). When the system needs to be rebuilt for recovery the information collected in the SAVSECDTA file will be restored, the profiles will be updated using the profile data we have collected and then the RSTAUT command will be run. This will bring the system and profiles up to the latest content available.
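The two system commands that bracket the recovery can be sketched as follows. This is an outline under assumptions: the function name and the save file name in the usage note are hypothetical, and the profile update that runs between the two commands is our own process and is not shown.

```c
#include <stdio.h>

/* Fill cmds[0] with the restore of the SAVSECDTA save file and cmds[1]
   with the authority restore that finishes the process. The profile
   update step runs between the two and is not shown here.
   savf : qualified save file name (hypothetical in this example)
   Returns 0 on success, -1 if a command does not fit. */
int profile_recovery_cmds(const char *savf, char cmds[2][128])
{
    int n = snprintf(cmds[0], sizeof cmds[0],
                     "RSTUSRPRF DEV(*SAVF) SAVF(%s) USRPRF(*ALL)", savf);
    if (n < 0 || (size_t)n >= sizeof cmds[0])
        return -1;

    n = snprintf(cmds[1], sizeof cmds[1], "RSTAUT USRPRF(*ALL)");
    return (n < 0 || (size_t)n >= sizeof cmds[1]) ? -1 : 0;
}
```

Given a save file of RCVLIB/SECDTA, the first command is RSTUSRPRF DEV(*SAVF) SAVF(RCVLIB/SECDTA) USRPRF(*ALL) and the second is RSTAUT USRPRF(*ALL).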

While we were testing the processes we noticed a very strange thing. The first time we ran the request on a system it took a little while to complete about 1 minute, but when we ran the request again it took only a couple of seconds? The content of the save file was the same (we even set the compression level to high with no significant impact) but why is it taking so long the first time? We thought that maybe it was because the save file was already available (we put it in QTEMP) but again signing off and on then retrying gave us the same results, it now only took a few seconds to complete the save? Signing onto another system and doing the exact same process yielded the same results, the first time took about 1 minute while subsequent tries only took a few seconds.

We do not know what is going on under the covers but it certainly seems like something gets lined up after the first save, this leads us to believe that doing a SAVSECDTA on a regular basis (nightly?) may not be a bad thing. If you have any information as to why, let us know as we are very curious.

LVLT4i is new, and while we feel the product should attract a number of Managed Service Providers, we are interested in knowing what you think. Would you be interested in a solution that provides a very low RPO (close to zero data loss) with an RTO in the 4 – 12 hour time frame? If you are interested let us know, we will be happy to put you in touch with one of the MSPs we have been working with. If you are an MSP and would like to know more or even see a demo of the product let us know as well, we are excited by the opportunities this could bring.

Chris…

Oct 20

New Product Library Vault, Why?

We have just announced the availability of a new product, Library Vault for IBM i (LVLT4i), which is aimed primarily at Managed Service Providers. The product allows the replication of data and objects from *SYSBAS on a client’s system to an iASP on a target system.

The product evolved after a number of discussions with Managed Service Providers who were looking for something less than a full-blown High Availability product but more than a simple Disaster Recovery solution. It had to be flexible enough to be licensed by the replication content, not by the systems it runs on.

We looked at our existing products and how the licensing worked, and it became very apparent that neither would fit the role as they were both licensed at the system level. HA4i was also more than they needed because it has all the bells and whistles associated with a High Availability product, while DR4i just didn’t have the object capabilities required. So we had to look at what we could do to build something that sits in the middle and license it in such a manner that would allow the price to be fair for all parties.

Originally the product was going to be used in an LPAR to LPAR scenario because the plan was to use the HA4i product with some functionality removed. However, one of the MSPs decided that managing lots of LPARs, even if they are hosted as VMs under an IBM i host, would entail too much management and effort. The RTO was not going to be the main driver here, only the RPO, so keeping the overhead of managing the solution low would be a deciding factor. We looked at how to implement the existing redirection process used for mapping libraries that HA4i and DR4i use, but it soon became very apparent to us that this would not be ideal as each transaction being processed would require a lot of effort to set the target object. So we decided to look at how we could take the iASP technology we had built many years ago for our RAP product and structure it in a manner which would meet all of the requirements.

After some discussion and trials we eventually had a working solution that would deliver an effective iASP based replication process. Next we needed to set the licensing to allow flexibility in how it could be deployed. The original concept was to set the licensing at the library level, as most clients would be basing their recovery on a number of libraries, so we started adding the ability to manage the number of licenses against the number of libraries. What at first seemed to be a simple task soon threw up more questions than answers! The number of libraries, even with a range, was not going to be a fair basis for setting our price: some libraries would be larger than others and have more activity, which would generate more work for the replication process. Also the IFS would be totally outside of the licensing as it has no correlation with a library based object (nesting of directories), so it would need to be managed separately. We also recognized that the Data Apply was based solely on the Journal, so library based licensing would not work for it either.

The key to getting this to work would be flexibility. We needed to understand this from the MSP’s position: the effort required to manage the set up and licensing had to be simple enough for the sales person to be able to go in and know what price to set. So we eventually came back to the IBM tier based pricing, even though we have the ability to license all the way back to the object, CPU, LPAR, Journal etc. We needed to give the MSP flexibility to sell the solution at an affordable price without complex license charts. We also understand that an MSP would grow the business and probably have additional resources available for new clients in advance, so we decided that the price had to be based on the client’s system and not on the pair of systems being used.

LVLT4i is just getting started, its future will be defined by the MSP community who use it because they will drive the development of new features. We have always felt that Availability is best handled by professionals because Availability is not a one-off project, it has to evolve as the client’s requirements evolve and develop. Our products hopefully give clients the ability to move through a natural progression from DR to HA. Just because you don’t need High Availability today doesn’t mean you won’t later, and we have yet to find anyone who doesn’t need to protect their data. Having that data protected to the nearest transaction at an affordable cost is something we want to provide.

If you feel LVLT4i is right for you let us know, we will be happy to put you in touch with one of the partners we are working with to discuss your needs. If you would like to discuss other opportunities for the product such as data aggregation or centralized storage let us know, we are always happy to see if the technology we have fits other interests.

Chris…

Jun 23

Annoying CPF9E7F message fixed

After the attempted migration from i-hosting-i to a VIOS based partition configuration and subsequent rebuild of the i-hosting-i partitions, we found that the QSYSOPR message queue was being sent CPF9E7F messages constantly. We checked the HMC configurations and everything looked OK because we had configured 4 partitions with a total of 2 Processors out of the 4 we have available. We had upgraded the system to have 4 available processors ready for the VIOS configurations where we intended to use 2 for IBMi, 1 for AIX and 1 for Linux.

We asked our sales rep what the problem was, especially as we have a license for the additional AIX core which we wanted to implement as well. His response was to speak with support as it looked like we were exceeding our licenses. Eventually we raised a PMR and spoke with IBM, who informed us that while we were not technically exceeding our entitlement, the way the IBM i OS calculated the available CPU cores meant it saw a problem. The answer was pretty simple to implement: we had to set up Shared Processor Pools and allocate a maximum number of available cores to that pool. We then had to make each partition use that pool so that we could not exceed our entitlement. This was done using the Shared Processor Pool Management option in the HMC, where we created the new pool and set the partitions to use that pool. That fixed the immediate problem, but the partition profiles also needed updating and the partitions needed to be re-booted for the changes to take permanent effect.
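A rough way to picture why the pool removed the message is sketched below. This is one reading of the explanation above, not IBM’s actual licensing algorithm. It assumes the OS counts the cores the IBM i partitions could potentially use, taken here as the total of their virtual processors limited by the size of the pool they run in. The virtual processor counts are made-up values, and the function names are purely illustrative.

```python
def potential_cores(virtual_processors: list[int], pool_max_cores: int) -> int:
    """Cores the partitions could use: their virtual processors, capped by the pool."""
    return min(sum(virtual_processors), pool_max_cores)


def exceeds_entitlement(virtual_processors: list[int],
                        pool_max_cores: int,
                        licensed_cores: int) -> bool:
    """True when the potential usage is higher than the licensed core count."""
    return potential_cores(virtual_processors, pool_max_cores) > licensed_cores


if __name__ == "__main__":
    ibm_i_partitions = [1, 1, 1, 1]   # hypothetical virtual processors per partition
    licensed = 2                      # IBM i cores we are entitled to use

    # Default pool spans all 4 activated cores, so the potential is 4.
    print(exceeds_entitlement(ibm_i_partitions, pool_max_cores=4, licensed_cores=licensed))
    # A dedicated pool capped at 2 cores keeps the potential within the entitlement.
    print(exceeds_entitlement(ibm_i_partitions, pool_max_cores=2, licensed_cores=licensed))
```

Under these assumptions the partitions never needed to use more than 2 cores to trigger the message, it was enough that they could have.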

When we created the IBM i shared pool we also took the opportunity to create an AIX pool and a Linux pool so that when we add those partitions to the system we can correctly allocate the additional processors to them.

We no longer see the CPF9E7F messages and everything runs just the same as it always did. We continue to learn just how capable the IBM i Power system can be, the downside to that is just how complex it can be as well. We hope to set up the AIX partition and Linux partitions in the near future, we will post our experiences as we go along.

Chris…

Jun 12

Issue with ‘restore 21’ resolved, everything running

The problems with the restore 21 of the partition data have been resolved and all of the partitions are now up and running.

The problem which gave us the most grief was the update to the content of the partition which was running V7R2. For some reason the restore operation kept hanging at different spots in the restore 21 process. One of the problems seemed to be with damaged objects on the system, which caused the restore to hang and required a forced power off of the partition (SYSREQ 2 did nothing). We cleaned up the damaged objects and started the restore again, only for it to hang again while restoring the IFS. This time we could end the restore operation with SYSREQ 2 and get back to a command line. There was nothing in the joblog to show why the restore was hanging, so we eventually ran the command to restore the IFS manually. We then started the partition and everything looked OK, but when we tried to start the HTTP server (we like the mobile support so we needed it running) it kept ending abnormally. It turns out we forgot to run the RSTAUT command, which restore 21 does after the RST for the IFS completes. After we ran the RSTAUT the jobs all started up correctly and we had the partition up and running again.

The other problem we had was with a V6R1 partition, which refused to start, complaining about a lack of resource (B2008105 LP=00004). As this was a deployment of a running configuration we thought nothing had changed and wondered why it would no longer start up. In the back of our minds we had a vague recollection that setting up partitions for V6R1 on Power7+ systems required the RestrictedIO partition flag to be set, so we looked through the partition profile to find where it was set, without success. We discovered that it is not part of the profile, you have to set the flag in the properties for the partition. Once we had done this the partition came up without any further problems and we now had all of our original configuration up and running.

We made a couple of additional changes to the configs because one of the reasons we really liked the VIOS option was being able to start everything up at once. With our set up we were powering up the host partition and then powering up each of the clients manually. We wanted to be able to power on the system and have all of the partitions fire up automatically. Also, when we wanted to power down, we just wanted to power down the host partition and have it take care of all the hosted partitions. The answer is the Power Controlling settings. We set up each of the NWSD objects in the hosting server to be Power Control *YES, then updated the profiles for the hosted partitions to be Power Controlled by the hosting partition. After initializing the profiles with the NWSD objects varied off and shutting down the profiles, we varied on the NWSD objects and the partitions automatically started up. Now when we start the main partition the other partitions all start once the NWSD is activated (they are all set to vary on at IPL). We also set the hosting partition to power on when the server was powered on and the server to power off when all of the partitions were ended. We have not tested the power down sequence to make sure the guest partitions are ended normally when we PWRDWNSYS *IMMED on the hosting partition, but it should shut down each partition gracefully before shutting itself down.

Now it’s back to HA4i development and testing for the new release, manuals to write and a new PHP interface to design and code. Even though we like the Web Access for i interface it is not as comprehensive as the PHP interface in terms of being able to configure and manage the product.

If you are planning a move to partitioning your Power system we hope the documenting of our experiences is helpful.

Chris…

Jun 11

Rebuild of the i-hosting-i underway.

We have finally started the rebuild of the data for the i-hosting-i partitions and came across a few problems.

First problem was to do with the system plan. Before we started down the VIOS route we created a system plan from the existing partition and system information and checked it to make sure we had no errors logged. Nothing was shown as a problem so our plan was to use it to deploy again if we could not get the VIOS set up functioning. As it turns out we could not use the system plan, the deployment failed every time because of adapter issues which did not show up when we viewed the plan on the HMC.

This required us to edit the system plan, which meant using the System Planning Tool. We downloaded the SPT to a PC and installed it. A slight issue with Windows 8 meant we had to run the program in Windows 7 mode to get it to install, but once it was up and running we managed to import the original system plan. Even though the system plan was created from a running system with active partitions, the planning tool threw up a lot of errors. We had problems with the addition of the internal SATA tape drive blocking the USB adapter and so on, which took a pretty long time to understand. In the end we just configured the few things we had to have in order to export the plan, and exported it ready for import to the HMC. Eventually the plan did deploy on the HMC so it looked like we were ready to go.

We did an IPL D using the SAVSYS tape and all seemed to go well until we got to the DASD configuration in DST. We had the LIC installed on the first drive as the load source, but we needed to add all of the other drives and Raid protect them. As we progressed through the DST options we kept getting errors about connections being missing. A search using Google turned up nothing so we decided to take the F10 option (ignore the message and continue). That turned out to be a problem because we only had one of the Raid cards set up, not both (I thought we only had one but 2 show up in the hardware list), so when we took the option to add the drives to ASP1 and then started Raid protection it took hours (IBM support did try to help by DLPAR’ing the additional Raid card but we were too late to gain any benefit). Six hours later we had the drives set up and protected.

Because this is the hosting partition the other partition data was restored at the same time, which took about 5 hours to complete. We checked that the NWSD objects for the hosted partitions had been restored and configured correctly. We saw that they were in a VARIED OFF state so we VARIED them ON and watched as they became ACTIVE, so far so good.

At this point we thought OK, we are now ready to start the other partitions. We took the option to activate the first partition profile on the HMC but it quickly came to a grinding halt! The SRC code displayed was B2004158 LP=0002. Not much information turned up with a Google search so I tried to get a console up to see what was actually going on. It appears that when you first start the partition you need to specifically set the advanced start up parameters (the normal setting is do not override the Mode and source settings). We just set it to B,N and the partition started up.

We still have one partition which fails to start. This is a V6R1 partition, and while we did see some reference in the VIOS configurations to dedicated IO for V6R1 on Power 7+, we know this was running before so we think it may have been damaged on the restore of the NWSD. We have a full system save on tape for it, so as soon as everything else is fixed we will try an IPL D with the SAVSYS and rebuild the data.

After over a week of fighting with IBM to get the right hardware and software to run a VIOS based partitioned system we have accepted that i-hosting-i will be the solution for now. We have already started to look at SAN in the hopes of one day having enough bandwidth to trek down this road again, this time we know that internal disks are not for VIOS partitioning! Pity the IBM sales team didn’t know that before we ordered the additional hardware for Ethernet and the additional core activations for PowerVM. I am sure that with enough trial and error you could get a VIOS running with internal disk, but if the performance is degraded as IBM suggests (they don’t say by how much) I think it may be a futile exercise.

Hope you find the information useful, maybe it will help you avoid some of the pitfalls we came across and save you time and money :-).

Chris…