Oct 30

Recovery of a SAP multi tiered environment

One of the problems we have been asked to review is the use of RAP in a SAP multi-tier environment. The question came about because some time ago we identified a lack of support within the APYJRNCHG(X) commands for the creation of Data Areas and Data Queues. The fact that IBM would automatically journal a new Data Area or Data Queue on the source system led us to believe that the commands would correspondingly create the new objects on the target system. This is not true: IBM does not process any create entries for these objects. In fact it gets worse, because if you delete an existing Data Area and then recreate it, IBM will apply the delete but will not do the recreate! I think this is a major exposure for anyone rebuilding a system after a system loss.

One question that came to mind: if this is a restriction for RAP, it is also going to be a restriction for anyone who needs to recover at a recovery site. If you have been fortunate enough to have saved copies of the journal receivers as well as the system save, you would expect a recovery using the APYJRNCHG(X) commands to let you rebuild up to the latest receiver. If this turns out to be a long-term problem, then SAP multi-tier environments are going to be particularly prone to not being recoverable much beyond the initial save!

For RAP there is a fix: we can simply write an additional module which, if required, can be run against the journal receiver entries to create the Data Areas and so on on the target system and add the data as necessary. Once we create them, however, they cannot be updated by the APYJRNCHG commands because the JID (journal identifier) is going to be wrong between the systems. Another option would be to copy the objects between the systems when required, but how do you do that with the technology used by the product? The latest copy may already have data in it which should be removed before the APYJRNCHG command is run, or it may even have too much data in it! All of this adds up to a lot of problems when trying to work around something which by rights should be part of the APYJRNCHG(X) commands. A further option is to run a scanner on the source which automatically copies the object to the target as it is created on the source, and deletes it as necessary. This is obviously not as easy as it seems, but with the new version 2.0 technology it will be within reach.
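To make the exposure a little more concrete, here is a minimal sketch in C. It is purely an illustration under stated assumptions: the entry kinds, the structures and the replay() function are all hypothetical, and it does not model real journal codes, the JID, or how APYJRNCHG or RAP work internally. It only shows why a replay that honours deletes but ignores creates leaves the target without the object, and what a supplementary create-and-load pass would change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified entry kinds. Real journal entries carry
   journal codes and entry types that are not modelled here. */
typedef enum { ENT_CREATE, ENT_DELETE, ENT_CHANGE } entry_kind;

typedef struct {
    entry_kind kind;
    int        value;   /* new content for ENT_CREATE and ENT_CHANGE */
} entry;

/* Stand-in for the target system's copy of one Data Area. */
typedef struct {
    bool exists;
    int  value;
} data_area;

/* Replay the entries against the target copy.
   handle_creates == false mimics the behaviour described in the post:
   deletes are applied, creates are skipped, and changes can only be
   applied while the object exists.
   handle_creates == true mimics an extra module that also creates the
   object and loads its data. */
void replay(data_area *target, const entry *log, size_t n, bool handle_creates)
{
    assert(target != NULL && (log != NULL || n == 0));

    for (size_t i = 0; i < n; i++) {
        switch (log[i].kind) {
        case ENT_DELETE:
            target->exists = false;
            break;
        case ENT_CREATE:
            if (handle_creates) {
                target->exists = true;
                target->value  = log[i].value;
            }
            break;
        case ENT_CHANGE:
            if (target->exists)
                target->value = log[i].value;
            break;
        }
    }
}
```

Replaying a delete, a create and a change with handle_creates set to false leaves the object missing on the target; with it set to true the object exists and holds the last value written.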

Using one of the Vision products (I know MiMiX had an option that was specifically designed for SAP multi-tier) will give you the results you need; however, the cost of that is going to be a lot more than a recovery solution or one which implements the technology RAP does.

We have written to IBM but so far have just had the confidentiality agreement barrier thrown up: they can't provide us with future directions without one in place! On the other hand, if I do have one and they give me an idea of when it will be fixed, I can't tell you about it. I think it is worth a number of those SAP multi-tier customers, or indeed any customer running software with this kind of trait, asking their recovery solution provider to in turn ask IBM for advice. BCRS, I am sure, has many SAP customers in its fold and they need to understand how they can recover; they are paying IBM to do it with them after all!

We will offer an interim solution. It is only a problem for those applications which create and delete Data Areas on the fly, so it is not a show stopper for a lot of companies; the lack of knowledge about the shortcoming is.

Chris…

Oct 30

Vision of the future

It has been a long time now since Vision went through the HA market space and, with the help of Thomas Cressy, purchased the top players in it. So what has changed, both within the space and within the Vision company?

Many of you will already have been through to see David Vasta's comments on Vision. He is very clearly upset with the level of support he has been given since Vision took over, and has seen many problems. This is only the latest of many posts in which David has berated the new Vision. David is one person and obviously does not represent the view of the entire install base that Vision now owns, but if there is one there could be many more.

When Vision started on its quest to become the leading HA vendor I speculated on what I felt was going to happen. I think now is as good a time as any to see if any of my views are turning out to be right.

The initial blog entry, "HA where will it be next year", mentioned the state of the market: most of the sales being made were replacements as opposed to new sales. This was a trend which helped no one except the customers in the short term. The replacements were more about getting headlines than making business decisions; they would be done either at a very large discount or even for free! The effect of that was less money for the companies and, in some cases, a much better product for the customer. How is that bad? Well, if the companies don't gain adequate revenue then they cannot afford to provide superior support or develop the product further. It's a short-term solution which usually ends badly. You will be glad to hear that Vision is not in the game of replacing its own products: if you have a Vision product and want to move to one of the others, it is going to be an upgrade and will attract the appropriate costs. Vision has not been shy about mentioning to the press that they have more than exceeded their goals for the initial 30 days after the buyouts were completed.

They have also set the products into market segments: if you are an enterprise customer these are the solutions you will be offered; as an SMB customer this is the option you have. This is something I said would happen and it looks like it has! One problem you could run into is if you're an SMB customer and need one of the features which are only available in the enterprise products. Source side send, for example, is available in the MiMiX and Orion products but not in the iTera product. If you need this feature you will have to buy the enterprise product. The importance of source side send is that it can significantly reduce the bandwidth required to send the journal entries between the systems; for big companies bandwidth is not normally a problem, but for small companies it may be.

I have also heard rumors that the costs have been raised; while I don't think this is always a bad thing, it does show the dominant position they now hold. Before iTera came to the market the HA products commanded a very high price; iTera changed that, which meant more customers could afford the solution. Now we may see a reversal of that trend, with many customers deciding they can't afford the solution anymore. As usual there will be products to fill the gaps. Our RAP product is one of those, and as we grow the technology we will reduce the gap between it and the full-grown HA solutions. The latest release should meet most companies' HA/DR needs. You can always buy new technology, but you can't buy back your data!

Another rumor is that Vision will retain all of the products until at least 2012; they are expecting to take the best of each solution and merge that into a single product sometime in the next 3 years. This is a good thing for the customers, as they don't have to worry about holding off a purchasing decision in case the product is retired or a new product comes out to replace it. Before the mergers there was talk about how many customers each of the companies had; if you added all of the numbers together it came out at around 7,000-8,000 customers, depending on whose figures you looked at. The new number is 5,000+, which is approximately 6% of the i5 install base (according to the rumors). This number is expected to double in the next 12 months, with approximately 10-15% of the i5 install base running an HA/DR solution of some kind! That's a lot of market to go after and I am sure more companies will join the scramble to gain a piece of the pie. The news releases from Vision are showing significant new sales (not replacements), which would back that statement up.

I had hoped that a new single solution would be available well before the time frames being touted; however, I also see that this is good for the customers and the sellers alike. Customers don't have to worry about their investment, and the seller can provide a working solution today regardless of what market segment the customer is in. We should not forget the other products which the companies now hold, such as Replicate1 from Lakeview and the Orion suite with its multi-platform elements. These must make up a fair percentage of the additional revenue and license installs the company now has.

Data Mirror could be a fly in the ointment, but having worked the sales process and competed against the product in the field, I think IBM has to make a number of changes before it will be able to dent the inroads Vision is going to make. In the end that could be too late. I hope not, as that would give Vision a dominant position they haven't held for a long time, and it may end up pricing many new customers out of the market.

The question is going to be how well they transition the support and future development teams into a working unit. Support needs Development in order to gain an intricate understanding of the technology that's deployed, and Development needs to understand how that technology is working in various customer environments, which comes from the feedback to Support. From David's rants I would take the opinion that this is probably going to be the major challenge for Vision. The sales teams have an easier job, as the competition has been reduced significantly, yet the development, support and installation teams have a lot of cross training and technology sharing to do.

I will keep reviewing the situation and post as I see new snippets of information which deserve review and mention.

Chris…

Oct 26

Hardware HA solutions

I like to keep up to speed with what's happening in the HA industry, so when IBM published its latest magazine (October 2007) I noticed with interest that they were concentrating on HA. One of the people who should be acutely aware of HA as it relates to the iSeries is Steve Finnes, and I saw that he had published his views on the HA trends. The article is co-authored, so I am not sure how much of it is actually Steve's thinking.

Anyhow, the first article I reviewed was from Doug Rock, the publisher. It talks about the need for a backup in many aspects of our lives and then goes on to introduce the reasons you should read the articles, because HA is important (OK, looking good so far!). He did go on to mention other articles (that's his job) but they didn't have the same spark for me as the HA stuff.

I read the article by Richard Schoen on .NET with interest as I want to think about a GUI interface, but that's another story...

Then I hit the pages I was interested in… I started to read with enthusiasm, thinking: OK, these guys really know what's going on, and maybe we will get an insight into IBM's view of the latest developments, what IBM is going to do with iCluster, and how they are going to get customers to buy into something that's dear to my heart! The second page started to set alarm bells ringing, but I thought: OK, they have to go on about XSM, iASP, switched disk and the rest; it's IBM technology after all, isn't it! I got to the end and thought I had missed a few pages! Where is all the detail about the iCluster product and the Vision products and how they fit into the picture? So I thought, OK, there is another section, I will read that. I now see that this is not about HA; it's about the IBM hardware technology. I took a few days to let the articles sink in, and today I went back and started to highlight some of the things I didn't think put the technology in its true light (in the end the highlighting covered more than the normal text!). I even found some statements to be outright misleading, so here are a few of the things I think need some clarification from someone who isn't hardware or IBM biased. Before I do that, I would like to say we are committed to this space, and if IBM actually has a solution which addresses the needs of the i5 community we will endorse it with all the enthusiasm we can muster.

The article goes on about logical replication, which seems to be pointing at the current crop of HA ISV solutions, but I am not sure, as it doesn't match what I know to be true (iCluster is IBM's and is not even mentioned). Here are a few of the passages I highlighted, each followed by my comments.
"HA macro trends follow two broad categories: those IT organizations that are moving to solutions that can provide real-time data protection, and those moving to solutions that can provide near-continuous application availability. The former tend to be small to mid-sized businesses."
I agree with this statement in one way, in that Clustering (near-continuous application availability) is only affordable by the big players; the smaller company either can't manage the complexity of a Cluster or doesn't have the environment to support one.
"This trend is largely driven by a growing intolerance to data loss and by compliance requirements in certain industries." But didn't IBM announce that the SMB market is key to their growth?
Again I agree wholeheartedly. I see data protection as being the most important factor for anyone needing Availability (notice I didn't say High Availability). But you don't need Clustering, XSM or the like to achieve that! Simple logical replication is all that's needed, with a reasonable RTO (Recovery Time Objective).
"Many of these clients are moving toward independent auxiliary storage pools (IASPs). IASP based solutions, such as cross-site mirroring (XSM) or the Copy Services for System i Toolkit, protect datacenter operations while using a logical-replication solution to support disaster-recovery requirements."
This is where I started to wonder about the article. Is the logical replication software not the most widely used? How many are moving to iASP for data protection? Most of the implementations I have been involved with are not using it for HA but more for data separation. I have seen the use of switched disk and LPARs, but again the customer saw major setbacks with performance and management. I looked at getting the technology for testing to see its true potential but gave up after finding out the cost of doing so; I only have a small system, so it may not be fair to say that for everyone.
"Logical replication and Switchable Resources"
This section went on to talk about how logical replication only replicates journalled objects in near-realtime. What about audit journal based replication? Is that not near-realtime? It may not be as fast as the disk block copy, but the article starts out with "This trend is largely driven by a growing intolerance to data loss and by compliance requirements in certain industries", which would suggest to me that for most companies object replication isn't a top priority. One part of this paragraph really confused me. It says: "Only those objects that are journaled (database, IFS, data area, data queue) can be replicated in near real-time; all other changed objects are typically captured via a save-restore process and then replicated to the target system. Practically speaking, this means the source and target copies of data may not be in sync until something is done to synchronize them. For some customers, synchronizing the data on the two systems prior to a role swap can be a lengthy process. The logical replication approach is good for many customers as it lets them meet their disaster-recovery requirements and gives them the flexibility of using the copy of the data on the second system for non-update workload (e.g., queries, backups, etc.)."
What do they mean by "Practically speaking, this means the source and target copies of data may not be in sync until something is done to synchronize them"? Is that not what the ISV provided HA products do? Last time I checked they did, anyhow. If they are talking about Remote Journalling as a stand-alone, I agree! But they then state you can use the data on the second system for non-update workloads??? Would that not say it's in sync???

OK, so I have given you a start. There is a lot more I see as contradictory, but this blog entry would be too long for anyone to bother to read it! The second article, about how IBM has a "REMEDY for what Ails you", is much of the same: no mention of logical replication, and it only talks about IBM's solutions! Again, IBM has to tout its own products, but they should at least mention that theirs is not the only solution set available. If you want to read the articles in full, they are "FAST Growth" and "A REMEDY for What Ails You".

By the way, they do say it takes 15 minutes to switch the iASP, which has to be added to the time taken to bring the applications etc. online. A good HA implementation based on one of the ISV products takes a few seconds (it could be minutes, but not 15 minutes) to switch the product; the rest of the time is taken starting the applications!

I am sure this is more about selling IBM offerings than letting customers know the breadth of offerings available to meet the growing need for Data protection and Availability. We try to offer a complete list of the solutions available, including ours, on our website. Vision has now amalgamated 3 products under one roof, so we will amend the site to reflect that position. I do ask that you consider all of the options. IBM has a number of options which do fit the bill for a number of customers, but they don't necessarily have them all, let alone the best…

Chris…

Oct 25

Next!

The new Version is finally packaged and ready to ship! We just have to get the manual proofed and finalized before we create the web components and put it up for all to see. I don’t know why but I am already looking for the next challenge! I have a million things I want to do with all sorts of technology yet I also need to stop the very long hours we have been doing.

One of the items I do want to look at is the GUI interfacing of the i5 applications. A lot of talk has been going around lately about what technology should be available from IBM or whoever, and how that technology should be presented to the developer. The problem is always going to be how they allow the old DDS based displays to be converted! I have developed ALL of our displays in UIM, so it's even harder for me; none of the current batch of GUI generators actually allows the use of panel groups for conversion. My solution would be to totally ignore the old stuff and just let the application owners develop new interfaces with a new technology IBM makes available. That's why you pay the maintenance charges, isn't it? Plus they could always charge an additional fee for those who need the GUI interface by making it the next version. I did look at the GUI generators and thought about generating the DDS displays just so I could use the generator, but to be honest the displays look absolutely awful. If I was looking for a solution and someone gave a presentation using the kind of output I have seen, I wouldn't be encouraged to buy. I even looked at the IBM offering, iSeries Access for Web, and thought it looked terrible. As long as you have to add function keys as buttons, you spoil the interface… I am sure there are better solutions, and IBM must have the ability to add them to the i5. We have a shortened AIX kernel running, so why can't it be extended to allow the XWindows technology to run? IBM, if you're going to do that let me know; it will save me a lot of work…

My leaning today is to look at a Windows based solution where the programs are called directly from the client and the data is passed back after running the i5 service. I am not sure which technology to try out. I have seen .NET being promoted as a suitable solution, but having spent some time developing with MS Visual Studio and seeing the mess you can get into with .NET connectivity and management, I am not sure it's the right path. PHP could be another alternative, but shipping a web configuration to suit the application and making sure it runs on different types of browsers will be a challenge. Looking at the forums, a lot of i5 staff don't run the Apache server, and those that do have some challenges with its configuration. So over the next few months I expect to try out a few possibilities and see what falls out the bottom.

There are also some new items to add to the RAP product; the basics are there but, as usual, you can't stand still! I want to develop the FTP Client further, as the 5250 interface is not what I would want even though it is better than the command line IBM offers.

While the major project for this month is now complete there are a lot more in the wings waiting to be done. I keep saying I will finish the LAMPS project so maybe I need to get on with that next? Who really knows!

Chris…

Oct 22

RAP Support for V5R3

One of the problems we have been asked to address is the support of V5R3. The technology built into RAP V1R1 has always been aimed at the V5R4 OS level, due to the technology employed to manage the journals on the target system. However, we have had a pre-release technology available in test for some time which provides the same functionality to the front end of the apply process as the new technology. If you are interested in a V5R3 compliant copy of the product please let us know. The only requirement for V5R4 was on the target system, so we can build the new Source system options to run from V5R3 onwards, and we have a fully compliant V5R3 set of options already developed and tested. The V5R3 Target code is as yet untested in a production environment, but we can do this fairly quickly as the majority of the code has already been through significant testing; we just need to production test the delta of the code.

If you are interested in helping with the testing and have a development or test environment you are willing to run the tests against please let us know.

Chris…

Oct 21

Busy with RAP400 but getting near!

Testing the code you have developed is not the easiest of tasks, especially when you have such a complex set up as we now have with the RAP product. Another problem is the size of the systems we have for development and the size of the objects we can create using the data generators we have.

One problem we recognized was the need to support the biggest journal sequence number that can be generated, 18,446,744,073,709,551,600, which is 15 short of the largest unsigned long long value supported on the i5, 18,446,744,073,709,551,615. We had developed the original code to use QXXZTOI to change the 20 character sequence number to an unsigned long long. This is obviously an error, which only works as long as we don't have a value above the maximum integer value supported, which is 2147483647. If we did get a value above this the call returned -16. Looking around for a solution, we decided to use the atoll() function, which converts a character string to a long long integer. Having read the information we decided it would meet all of our needs. (Strictly speaking, atoll() returns a signed long long, which tops out at 9,223,372,036,854,775,807, so an unsigned conversion such as strtoull() is what is needed to cover the very top of the range.) We coded the call as below.

QjoRtvJrnReceiverInformation(&Rcvr_Output, 
                             &Rcvr_Struct_Size, 
                             msg->JrnRcv1, 
                             "RRCV0100", 
                             &Error_Code); 
if(Error_Code.EC.Bytes_Available > 0) { 
   snd_msg("JRN0090",msg->JrnRcv1,20); 
   return -1; 
   } 
Last_Seq_Number = atoll(Rcvr_Output.Last_Seq_Num_Long); 
Start_Seq_Number = atoll(Rcvr_Output.First_Seq_Num_Long);

This gave us a problem which took some time to figure out! We couldn't understand why, when the sequence numbers were correctly identified in the structure, the Start Sequence number would go off the scale! We just could not see why one call to the atoll() function would return the right value yet the other returned the wrong one! So we went back to the documentation and re-read the call requirements, as we seemed to be calling it OK.

Then we looked at how the call really worked, first of all here is the structure that is filled by the API. Note we have removed the bits of the structure which are not important for this explanation.

typedef char Qjo_Seq_Num_t[20]; 
typedef _Packed struct Qjo_RRCV0100 
{ 
..... 
Qjo_Seq_Num_t                  First_Seq_Num_Long; 
Qjo_Seq_Num_t                  Last_Seq_Num_Long; 
Qjo_ASP_Dev_Name_t             ASP_Dev_Name; 
..... 
} Qjo_RRCV0100_t;

So you can see that First_Seq_Num_Long is a 20 character field, and Last_Seq_Num_Long has the same format. Neither field is null terminated. The problem is how atoll() works: it keeps reading the character string until it meets a character that is not '0'-'9', not until the end of the field. So passing in the pointer to First_Seq_Num_Long means it carries on reading the data from the Last_Seq_Num_Long field! This is why our program correctly read the Last_Seq_Num_Long but not the First_Seq_Num_Long (more by chance than judgement, presumably because the field which follows it does not start with a digit).
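A minimal sketch of that overrun, assuming a simplified layout: the flat 51 byte buffer, the 10 character blank padded name and the trailing null are hypothetical stand-ins for the relevant part of the structure (the real one has no terminator), and digits_consumed() is an illustrative helper, not part of any API. It uses strspn() to count how many digits a digit-scanning routine such as atoll() would walk over from a given starting point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEQ_LEN  20   /* width of each sequence number field */
#define NAME_LEN 10   /* assumed width of the field that follows */

/* Count the consecutive decimal digits found starting at p, which is
   how far a digit-scanning conversion would read. p must point into a
   null terminated buffer. */
size_t digits_consumed(const char *p)
{
    assert(p != NULL);
    return strspn(p, "0123456789");
}

/* Build a flat stand-in for the record: first sequence number, last
   sequence number, a blank padded name, then a terminating null that
   the real structure does not have. */
void build_record(char out[2 * SEQ_LEN + NAME_LEN + 1])
{
    memcpy(out,               "00000000000000000001", SEQ_LEN);
    memcpy(out + SEQ_LEN,     "00000000000000004711", SEQ_LEN);
    memcpy(out + 2 * SEQ_LEN, "*SYSBAS   ",           NAME_LEN);
    out[2 * SEQ_LEN + NAME_LEN] = '\0';
}
```

Starting at the first field the scan runs across 40 digits, straight through both sequence numbers; starting at the second field it stops after 20 only because the next byte happens not to be a digit.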

Simple fix: we just created a temporary variable, copied the 20 characters from the structure into it, made sure it was null terminated (so the variable needs to be 21 characters long, not 20), and passed the temporary variable into the atoll() function call. Everything now works as expected.
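Here is a minimal sketch of the temporary variable approach, not the code we shipped. The function name seq_to_ull() is hypothetical, it assumes the field always holds exactly 20 digits with no blanks, and it swaps atoll() for the standard C99 strtoull() (assuming your compiler provides it) because atoll() returns a signed long long and cannot hold values above 9,223,372,036,854,775,807.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SEQ_LEN 20   /* width of the sequence number field */

/* Convert a fixed width, non-terminated field of 20 digits into an
   unsigned long long.
   Returns 0 on success, or -1 if the field is not all digits or the
   value does not fit in an unsigned long long. */
int seq_to_ull(const char field[SEQ_LEN], unsigned long long *out)
{
    char  tmp[SEQ_LEN + 1];   /* 20 digits plus the terminating null */
    char *end;

    assert(field != NULL && out != NULL);

    /* Bounded copy, so the conversion can never run into the next field. */
    memcpy(tmp, field, SEQ_LEN);
    tmp[SEQ_LEN] = '\0';

    /* Reject blanks, signs and anything else that is not a digit. */
    if (strspn(tmp, "0123456789") != SEQ_LEN)
        return -1;

    errno = 0;
    *out = strtoull(tmp, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;
    return 0;
}
```

With the structure shown earlier it would be called as seq_to_ull(Rcvr_Output.First_Seq_Num_Long, &value), and the same call works for Last_Seq_Num_Long without relying on whatever happens to follow it in memory.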

We are near to finishing the testing of the new version and will post screen shots of the new functions to the website once we have them captured. Then it's on to the manual and getting the rest of the marketing and sales prep done.

Hope the explanation makes sense!

Chris…

Oct 14

Still not sure?

I mentioned in the last post a problem I was having with the return values from called functions. IBM has not got back to me yet, so I decided to play with the code a little more! The problem seems to be the depth of nesting I have in the modules: I moved the code which printed out the results to the previous function and now everything is working as expected. I tried moving the code back, to make sure I hadn't changed something elsewhere that fixed the problem, and the code failed in exactly the same manner again. While I may well be doing something wrong, I cannot understand why a call to the printf function would cause the returned value to be correctly evaluated. Still, we will see what IBM has to say about the issue. The problem, as always, is that they generally need a copy of the code so they can run the test on their systems. This will require me to send the entire code set because of the way the product communicates between the modules and systems! I am not sure IBM will like having to build the entire product just to see what is causing the problem, especially as it may be down to me!

I think, as the function is now working, I will probably let IBM close the call and hope I don't see a recurrence of the problem in future code enhancements. I will let you know if IBM finds or gives me a suitable answer. As the code has now been fixed and the product is working, I will reduce the severity, which could end up with the problem being lost in the dust!

Chris…

Oct 12

Brain dead and lost

I have spent the last 8 hours trying to find out why a part of my program logic was not firing, and finally sent a request off to IBM to see if they can shed some light on it. It's all part of the new Audit functions for the RAP product that will use the client server technology we built into the latest release. The program basically reads each record in the file and compares it with the record on the target system. If the record does not exist on the target it flags it as not found, if the data is different it flags it as such, and so on. I have a number of functions built into a module (all part of the code re-use and modularization). The initial program calls a function from the module, which calls another function from the same module that does the file reads; this then calls another function to send the data to the remote system and return a value dependent on the remote system's return value! (Confused? You should try reading through the code!) This is where the problem starts!
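As an aside, the comparison just described boils down to something like the minimal sketch below. The names are hypothetical and it compares two local buffers rather than talking to a remote system, so it is not the RAP code; the numeric values simply mirror the return values used in the code further down.

```c
#include <stddef.h>
#include <string.h>

/* Result codes, mirroring the values returned further down the post. */
enum {
    AUD_ERROR     = -1,  /* process error */
    AUD_MATCH     =  0,  /* no error found */
    AUD_DIFFERENT =  1,  /* data comparison error */
    AUD_NOT_FOUND =  2   /* record missing on the target */
};

/* Hypothetical helper: compare one source record with the target copy.
   tgt == NULL stands in for "the target has no such record". */
int audit_record(const void *src, const void *tgt, size_t len)
{
    if (src == NULL || len == 0)
        return AUD_ERROR;
    if (tgt == NULL)
        return AUD_NOT_FOUND;
    return memcmp(src, tgt, len) == 0 ? AUD_MATCH : AUD_DIFFERENT;
}
```

The caller then only has to branch on the four possible values, which is exactly the part that is misbehaving for me.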

Here is the code for the call which I am trying to capture.

ret = Send_Request(&Record_Inf,DataLength,sockfd,prtf);
if(ret == -1) {
   /* do something here */
   return ret;
   }
else if(ret == 1) {
   /* do something here */
   }
else if(ret == 2) {
   /* do something here */
   }

This results in all of the if statements being skipped! However, if I add a printf statement:

ret = Send_Request(&Record_Inf,DataLength,sockfd,prtf);
printf("%d\n",ret);
if(ret == -1) {
   /* do something here */
   return ret;
   }
else if(ret == 1) {
   /* do something here */
   }
else if(ret == 2) {
   /* do something here */
   }

The if statements correctly capture the return value and respond!

I added a statement to send a message within the called process just to make sure the return code was being set correctly. This is the code which runs and I have removed all the other code for the function to protect the innocent!

/* enclosing if test on the reply removed along with the rest of the function */
   if(memcmp(recv_buf,"NF",2) == 0) {
      /* record not found error */
      sprintf(msg_dta,"%d",Rec_src->rrn);
      snd_aud_msg("AUD0020",msg_dta,strlen(msg_dta));
      return 2;
      }
   if(memcmp(recv_buf,"    ",4) == 0) {
      /* process error */
      return -1;
      }
   /* Data Comparison Error reported */
   sprintf(msg_dta,"%d",Rec_src->rrn);
   snd_aud_msg("AUD0021",msg_dta,strlen(msg_dta));
   return 1;
   }
else {
   /* no error found */
   return 0;
   }

I am getting the messages in the message queue, so I know the return is being called with the relevant value. But unless I add the printf statement after the function call, the if statements are skipped!

As always I doubt myself for a long time and try to fix the problem in whatever way I can, but this is just mystifying me! I have tried adding other actions, such as adding the return value to another value, but nothing other than calling printf seems to work. I even called printf with just a string and that worked as well! So it's not necessarily just the integer value I am using.

I will let you know if I am stupid (IBM shows me the error of my ways as usual) or if there is actually a problem with the compiler.

Chris…

Oct 09

How long is too long!

As part of my ongoing CRM projects I had to install Visual Studio .NET 2003 because SQL Server 2000 does not support reports created with Visual Studio 2005. I wasn't too disturbed when it took 50 minutes to install on the laptop, but it is now over 2 hours later and the SP1 install is still running! I recently installed the RSE from Eclipse on my Windows PC, which also took a fair amount of time but not this long. Why does it take so long to install a package which comes on 2 CDs? The SP1 package was a single DVD, so while it can hold a fair amount of data, surely it's not more than the whole product was originally?

I am sure the WDS doesn't take this long to install, even with all of its components!

Chris…