Jun 26

RAP with automated IFS replication

The PTF that gives the Receiver Apply Program IFS replication based on the QAUDJRN, instead of the user journals, is finally built and running in our test environments. The user journals replicate changes very well in most circumstances, but we found a few small issues when testing at a customer recently. The biggest problem was that the journal apply process (the APYJRNCHG command) cannot handle new IFS objects created with the move or copy options on the WRKLNK screens.

We never intended to create our own IFS replication process, as we felt the capabilities provided by the IBM APYJRNCHG command would be more than adequate. Having found this glaring problem, we decided to bite the bullet and code up the required replication functions based on entries deposited in the QAUDJRN. Those who follow the iSeriesNetwork forums or this blog will have seen some notes about the issues we encountered when starting down this path, but we finally have working code which appears to cover most of the scenarios we can throw at it.

We took the decision to keep it simple and only allow configuration at the directory level. We did add the ability to add a directory as an Exclude, which allows sub directories to be omitted. The option to set individual objects up for replication was discounted, as we feel one of the major issues with the HA products out there today is just how complex they can be to set up. A simple directory-based configuration lets us make programmatic decisions more quickly than if we had to filter through loads of generic object entries. Users can also lose track of when an object meets an inclusion or exclusion clause! One of the biggest complaints we have heard in the past is just how hard it can be to set up the replication parameters for some of the HA tools. That’s why it takes 2 weeks to install these products using specialized staff.

The changes are pretty significant, so the decision to release this as a PTF was not taken lightly. However, another recent PTF, which allows remote libraries to be filtered for the apply process, excludes the IFS when that filtering is configured, so we had to do something now and provide it to the customer base.

If you would like to test the new functionality, let us know. The PTF will be in test for some time to ensure we have not missed something significant. Replication using journals is still supported and will not be replaced by this technology for the time being.

Chris…

Jun 25

Problem with CPFA0A9 resolved

I was having problems forwarding message CPFA0A9, which was received in an exception routine, to the message queue of our Receiver Apply Program. The cause is the message data, which is of type *CCHAR with CCSID 1200.

Initially we tried to convert the message data using the iconv() functions but had limited success. After posting on the iSeriesNetwork forums we managed to identify the problem and some possible solutions. Eventually we decided to use the optional CCSID parameter on the QMHSNDM API. This also required us to change the CCSID of the message queue from 65535, which is the default, to 37 for our system! That could have caused problems later with the installation of the product, because we would have to set the CCSID of this message queue as we installed it.

The problem then changed: we now saw some converted text in the message queue, but not all of it. As the numbers didn’t ring any bells we logged a problem with IBM.
Here is the test program we supplied to IBM to show the results we were seeing.

#include <except.h>               /* Exception Handling */
#include <signal.h>               /* Exception signals */
#include <qmhchgem.h>             /* Change Error message */
#include <qmhsndpm.h>             /* Send Program Message */
#include <qmhsndm.h>              /* Send Non Program Msg */
#include <qcmdexc.h>              /* command execution */
#include <qusec.h>                /* Error Code Structs */
#include <stdio.h>                /* standard I/O */
#include <stdlib.h>               /* standard library */
#include <string.h>               /* memory and string*/

typedef struct  EC_x {
                Qus_EC_t EC;
                char Exception_Data[1024];
                }EC_t;

static void rep_cmd_check(_INTRPT_Hndlr_Parms_T *excp_info) {
int *count = (int *)(excp_info->Com_Area);
int CCSID = 1200;                           /* CCSID CPFA0A9 */
char MsgQ[20] = "TESTMSGQ  *LIBL     ";
char Msg_Type[10] = "*INFO     ";           /* msg type */
char QRpy_Q[20] = {' '};                    /* reply queue */
char Msg_Key[4] = {' '};                    /* msg key */
char msg_dta[1024];                         /* message buffer */
EC_t Error_Code = {0};                      /* error code struct */

Error_Code.EC.Bytes_Provided = sizeof(Error_Code);

*count = 1;
printf("Message length %d",excp_info->Msg_Data_Len);
if(memcmp(excp_info->Msg_Id,"CPFA0A9",7) == 0) {
   QMHSNDM(excp_info->Msg_Id,
          "QCPFMSG   *LIBL     ",
          excp_info->Ex_Data,
          excp_info->Msg_Data_Len,
          Msg_Type,
          MsgQ,
          1,
          QRpy_Q,
          Msg_Key,
          &Error_Code,
          CCSID);
   if(Error_Code.EC.Bytes_Available > 0) {
      printf("Failed %.7s",Error_Code.EC.Exception_Id);
      }
   QMHCHGEM(&(excp_info->Target), 0,
           (char *) (&(excp_info->Msg_Ref_Key)),
           "*HANDLE   ","",0,&Error_Code);
    return;
   }
return;
}

int main(int argc, char** argv) {
volatile int e_count = 0;                   /* error flag */
char msg_dta[50];
char reply;
char cmd_str[255] = "RMVLNK OBJLNK('/home/c.hris/eclipse/RSE/SBM00002.log')";

#pragma exception_handler(rep_cmd_check,e_count,0,_C2_ALL,_CTLA_HANDLE)
QCMDEXC(cmd_str,strlen(cmd_str));
#pragma disable_handler

if(e_count > 0) {
   exit(-1);
   }
exit(0);
}  


The message we saw in the message queue was only partly converted, with non-printable characters following the correctly converted ones.

IBM advised us that the problem was related to the exception data not being the entire message data: it is restricted to the first 48 bytes of the message data! We had missed that entirely. Most of the messages we had processed in the past must have had message data of less than 48 bytes. We had told the QMHSNDM API we were passing 78 bytes, it would pass on 37 bytes of converted data, but we only saw 24 bytes of actual converted data in the message queue.

So, as per IBM’s advice, we coded up the exception routine to get the message data directly from the message itself. We took the decision to do this even if the message data was less than 48 bytes, because with UTF-16 data (this is CCSID 1200) each character takes up 2 bytes, so in effect we would only process 24 characters of message data in the exception routine. Also, having been burned this time, we didn’t want it to happen again with some other message we had yet to stumble upon.

IBM’s suggestion was to use the QMHRCVPM API, which turned out to be a double win because it automatically converts the data to the job CCSID. The message queue could remain at CCSID 65535 with no side effects.

This is the code now.

#include <except.h>               /* Exception Handling */
#include <signal.h>               /* Exception signals */
#include <qmhchgem.h>             /* Change Error message */
#include <qmhsndpm.h>             /* Send Program Message */
#include <qmhsndm.h>              /* Send Non Program Msg */
#include <qmhrcvpm.h>             /* Recv Program Msg */
#include <qcmdexc.h>              /* command execution */
#include <qusec.h>                /* Error Code Structs */
#include <stdio.h>                /* standard I/O */
#include <stdlib.h>               /* standard library */
#include <string.h>               /* memory and string*/

typedef struct  EC_x {
                Qus_EC_t EC;
                char Exception_Data[1024];
                }EC_t;

typedef _Packed struct  Rcv_Msg_x {
                        Qmh_Rcvpm_RCVM0100_t msg_struct;
                        char msg_data[2048];
                        }Rcv_Msg_t; 

static void rep_cmd_check(_INTRPT_Hndlr_Parms_T *excp_info) {
int *count = (int *)(excp_info->Com_Area);
char MsgQ[20] = "TESTMSGQ  *LIBL     ";
char Msg_Type[10] = "*INFO     ";           /* msg type */
char QRpy_Q[20] = {' '};                    /* reply queue */
char Msg_Key[4] = {' '};                    /* msg key */
char msg_dta[1024];                         /* message buffer */
Rcv_Msg_t rtv_dta;                          /* message struct */ 
EC_t Error_Code = {0};                      /* error code struct */

Error_Code.EC.Bytes_Provided = sizeof(Error_Code);

*count = 1;
QMHRCVPM(&rtv_dta,
         sizeof(rtv_dta),
         "RCVM0100",
         "*         ",
         0,
         "*ANY      ",
         (char *) (&(excp_info->Msg_Ref_Key)),
         0,
         "*SAME     ",
         &Error_Code);
if(Error_Code.EC.Bytes_Available > 0) {
   snd_error_msg(Error_Code);
   sprintf(msg_dta,"Failed to retrieve message data");
   snd_msg("GEN0001",msg_dta,strlen(msg_dta));
   return;
   }  

QMHSNDM(excp_info->Msg_Id,
        "QCPFMSG   *LIBL     ",
        rtv_dta.msg_data,
        rtv_dta.msg_struct.Data_Returned,
        Msg_Type,
        MsgQ,
        1,
        QRpy_Q,
        Msg_Key,
        &Error_Code);  
if(Error_Code.EC.Bytes_Available > 0) {
   snd_error_msg(Error_Code);
   sprintf(msg_dta,"Failed to send message");
   snd_msg("GEN0001",msg_dta,strlen(msg_dta));
   return;
   }  
QMHCHGEM(&(excp_info->Target), 0,
         (char *) (&(excp_info->Msg_Ref_Key)),
         "*HANDLE   ","",0,&Error_Code); 
return;
}

int main(int argc, char** argv) {
volatile int e_count = 0;                   /* error flag */
char msg_dta[50];
char reply;
char cmd_str[255] = "RMVLNK OBJLNK('/home/c.hris/eclipse/RSE/SBM00002.log')";

#pragma exception_handler(rep_cmd_check,e_count,0,_C2_ALL,_CTLA_HANDLE)
QCMDEXC(cmd_str,strlen(cmd_str));
#pragma disable_handler

if(e_count > 0) {
   exit(-1);
   }
exit(0);
}  

Now you will get the correct message displayed in the message queue.

Hope this helps others who have the same problem.

Chris…

Jun 18

Is TigerDirect coining it in 1c at a time?

I was just going through the charges on my credit card and found a small and somewhat insignificant error, insignificant until you consider it could be happening to thousands of clients every day. Every order I placed last month has been incremented by 1 cent when I check the invoice against my credit card bill. They may have made an extra couple of cents out of me, but what if this has happened to everyone? How much will they have made then?

Have you purchased something from TigerDirect recently and noticed the same problem? It may just be me, and it’s probably not worth the effort of calling to get it sorted, but could that be why it’s done this way?

Chris…

Jun 12

IFS paths and some interesting points.

As part of building a process to replicate changes to IFS objects outside of the journal environment we have today, we came across a couple of interesting points.

1. The maximum path size defined for an Audit Journal entry is 5002 bytes (this includes the 2 bytes which hold the length of the path).
2. The maximum length of a path name is 16MB.
3. The maximum length of a component is 255 characters (CCSID dependent).
4. There is no defined limit to the number of sub directories.

We had to build screens which would support the maximum sizes. We took the journal limit as the limit for all path names; after all, if we can’t see the path in the journal entry we can’t replicate it! Can you imagine creating a path which is 16MB long? We looked at the maximum size we could fit onto a screen and it was just over 1600 characters. The longest path we could find under the QOpenSys directory was just over 80 characters!
Setting the maximum size we support to 5000 characters should be more than enough for most people, shouldn’t it?

What’s the longest path name you have ever had to work with?

Chris…

Jun 12

Object Auditing Attribute for Directories

The IFS replication process is going along pretty well, and we have a number of new features in test which should make the final implementation of a real-time replication process pretty good. One of the problems we faced was how to ensure the object auditing values are correctly set for all objects below a directory. If you create a new object in the directory, you need to make sure its auditing value is correctly set, otherwise you will not see the changes to the object in the audit journal.

To ensure new objects get the correct setting we needed to set the *CRTOBJAUD attribute for the directory. The CHGATR command is provided to set the attribute, but it has a few shortcomings.

The initial programs we created simply took the path we were interested in and used the CHGATR command with SUBTREE(*ALL). While this did work, it had the annoying side effect of sending a message for every non-*DIR object in the path! We could have simply monitored for each of the messages and removed them from the user’s view, but that seemed like a lot of work.

The process we use to set up the initial directory mapping uses a combination of opendir() and readdir() to display a list of all directories. Options are provided to walk down the sub directories, where you can set *INCLUDE or *EXCLUDE against the individual directories. Our main rule is that if a directory is set for *INCLUDE, all sub directories below it are automatically selected. An *EXCLUDE selection only excludes the selected directory, not all of its sub directories. Setting the auditing flag on initial selection was fairly easy, as the CHGAUD command allows subtree processing with fairly minimal messaging being returned. However, we also needed to set the *CRTOBJAUD attribute for all sub directories to ensure every new object created below the base path would be captured and replicated to the target system. The CHGATR command does support the SUBTREE parameter but, as mentioned above, it results in a message for every object which is not suitable for the attribute setting.

We had looked at the Qp0lProcessSubtree() API in the past and discounted it due to some problems in getting it to function as we wanted. However, we felt it would be the correct solution in this instance, so we set about creating a couple of functions which would allow us to carry out the task in hand.

There is a good sample in the UNIX-type APIs section which shows how to use the API with a function as the exit point. This was a very good starting point, which resulted in the following code.

First of all we had to call a function which would determine the right exit point to call, as we had to set the value to either *CHANGE or *NONE for each directory we found. We created an initial function which is passed a pointer to the directory we are interested in, plus the setting that the directory and its sub directories have to be set to.

The functions are part of a bigger suite of functions, so while they compile in our test environment you may have to make some changes to get them to compile in yours. These are also non-production functions at the moment, which will be enhanced with better error control etc. in the future.

Here are some of the important structures etc. we used in the functions.

#include <Qp0lstdi.h>           /* Object information */
#include <qlg.h>                /* Qlg structs */
#include <qtqiconv.h>           /* ccsid conversion */
/* other headers not described */

#define PATH_TYPE_CHAR           0x00000000
#define PATH_TYPE_POINTER       0x00000001
#define PATH_TYPE_CHAR_2        0x00000002
#define PATH_TYPE_POINTER_2    0x00000003

union pName_Type {
      char pName_Char[2048];
      char *pName_Ptr;
      };

typedef _Packed struct  pName_Struct_x {
                        Qlg_Path_Name_T qlgStruct;
                        union pName_Type Path;
                        } pName_Struct_t;

typedef _Packed struct  Objtypes_List_x {
                        uint Number_Of_Objtypes;
                        char Objtype[2][11];
                        } Objtypes_List_t; 

Here is the function which is called first from another function.


/**
  * (function) Set_Dir_Atr
  * set the *CRTOBJAUD Attribute on subdirectories
  * @parms
  *     1 directory to start from
  *     2 attribute value
  * returns 1 if the directory was set OK
  */

int Set_Dir_Atr(char *cwd, char *type) {
int rc = 0;                                 /* return code */
char msg_dta[255];                          /* msg data */
IFS_Path_t Path;                            /* path name struct */
Objtypes_List_t MyObj_types;                /* obj types struct */
Qp0l_User_Function_t User_function;         /* user function struct */

struct {
   uint  DataIn;
   uint  DataOut;
   } CtlBlkAreaName;

/* set up the function call */
memset((void *)&User_function, 0x00, sizeof(Qp0l_User_Function_t));
User_function.Function_Type = QP0L_USER_FUNCTION_PTR;
User_function.Mltthdacn[0] = QP0L_MLTTHDACN_NOMSG;
if(memcmp(type,"*CHANGE",7) == 0)
  User_function.Procedure = &Set_Dir_Atr_Chg;
else
  User_function.Procedure = &Set_Dir_Atr_None;

/* set up the path name struct */
memset((void*)&Path, 0x00, sizeof(Path));
Path.Path_Dets.CCSID = 0;
Path.Path_Dets.Path_Type = 0;
Path.Path_Dets.Path_Length = strlen(cwd);
memcpy(Path.Path_Dets.Path_Name_Delimiter,"/ ",2);
memcpy(Path.Path_Name,cwd,strlen(cwd));

/* set up the object types */
MyObj_types.Number_Of_Objtypes = 2;
memcpy(&MyObj_types.Objtype[0],"*DIR       ",11);
memcpy(&MyObj_types.Objtype[1],"*NOQSYS    ",11);

/* parenthesise the assignment so rc holds the API return value */
if((rc = Qp0lProcessSubtree((Qlg_Path_Name_T *)&Path,
                            QP0L_SUBTREE_YES,
                            (Qp0l_Objtypes_List_t *)&MyObj_types,
                            QP0L_LOCAL_REMOTE_OBJ,
                            (Qp0l_IN_EXclusion_List_t *)NULL,
                            QP0L_PASS_WITH_ERRORID,
                            &User_function,
                            &CtlBlkAreaName)) == 0) {
   sprintf(msg_dta,"Process successful");
   snd_msg("GEN0001",msg_dta,strlen(msg_dta));
   }
else {
   sprintf(msg_dta,"ERROR on Qp0lProcessSubtree(): error = %d\n", errno);
   snd_msg("GEN0001",msg_dta,strlen(msg_dta));
   return -1;
   }
return 1;
}  
 

You will notice we have passed in a selection list: we are only interested in *DIR types, plus we want to ignore the QSYS.LIB types when the function is passed a path which is part of an iASP. We process all sub directories (QP0L_SUBTREE_YES) and currently select both local and remote objects (QP0L_LOCAL_REMOTE_OBJ); we expect to change this to QP0L_LOCAL_OBJ in production, as we are only interested in the local IFS.

Next we needed a function for each request which would set the attribute accordingly. This is the function for the *CHANGE request; the *NONE request is the same except for the value being set in the command. We are looking at using the control block to pass in a value which would decide the setting, but that’s for another day.

/**
  * (function) Set_Dir_Atr_Chg
  * set the *CRTOBJAUD Attribute on subdirectories within a directory
  * @parms
  *     exit-point parameters supplied by Qp0lProcessSubtree()
  * sets *Ret_val to 0 if the directory was set OK
  */

void Set_Dir_Atr_Chg(uint *Sel_sts,
                    uint *Err_val,
                    uint *Ret_val,
                    Qlg_Path_Name_T *Obj_name,
                    void  *Func_ctl_blk) {
int i = 0;                                  /* counter */
int len = 0;                                /* counter */
int ret = 0;                                /* return value */
size_t insz;                                /* path len */
size_t outsz = 2048;                        /* converted outbuf size */
char outbuf[2048];                          /* output buffer */
char *outbuf_ptr;                           /* ptr to output buffer */
iconv_t cd;                                 /* convert struct */
size_t ret_iconv;                           /* returned value */
char cmd[2048 + 64];                        /* command string, fits outbuf */
char msg_dta[255];                          /* message data */
char *path_ptr;                             /* ptr to path string */
pName_Struct_t *pName;                      /* Path name struct */
QtqCode_T toCode   =    {37,0,0,0,0,0};     /* CCSID to struct */
QtqCode_T fromCode = {61952,0,0,1,0,0};     /* CCSID from struct */

if(*Sel_sts == QP0L_SELECT_OK) {
   if(Obj_name != NULL) {
      pName = (pName_Struct_t *)Obj_name;
      if(Obj_name->Path_Type & PATH_TYPE_POINTER)
         path_ptr = pName->Path.pName_Ptr;
      else
         path_ptr = (char *)pName->Path.pName_Char;
      /* convert to US CCSID */
      insz = pName->qlgStruct.Path_Length;
      outbuf_ptr = (char *)outbuf;
      memset(outbuf_ptr, 0x00, sizeof(outbuf));   /* clear the whole buffer */
      cd = QtqIconvOpen(&toCode,&fromCode);
      if(cd.return_value == -1) {
         *Ret_val = errno;
         return;
         }
      ret_iconv = (iconv(cd,(char **)&(path_ptr),&insz,(char **)&(outbuf_ptr),
                  &outsz));
      if(ret_iconv != 0) {
         *Ret_val = errno;                  /* save before iconv_close() */
         ret_iconv= iconv_close(cd);
         return;
         }
      ret_iconv = iconv_close(cd);
      strcpy(cmd,"CHGATR OBJ('");
      strcat(cmd,outbuf);
      strcat(cmd,"') ATR(*CRTOBJAUD) VALUE(*CHANGE)");
      ret = Issue_Cmd(cmd);
      if(ret != 1) {
         sprintf(msg_dta,"Failed to set *CHANGE - %.200s",outbuf);
         snd_msg("GEN0001",msg_dta,strlen(msg_dta));
         *Ret_val = -1;
         return;                            /* keep the failure code */
         }
      }
   }
*Ret_val = 0;
}   

This function is called for each directory returned from the API, and initial testing has shown no problems so far! We did have some problems with the path pointer which we have yet to determine the reason for: if we left out the conversion routines the path could not be correctly mapped. If you want to look at this and let us know what you find, please be our guest. It works for now, so getting down to the nitty gritty of the issue is less important for us. You should also consider the CCSID conversion, as this is particular to our system and may not work on yours.

That’s it, we can now set the attribute correctly for the IFS directories without the masses of error messages we got when we simply called CHGATR with SUBTREE(*ALL)! If you have any comments or want to add any changes to the code for others to see, please feel free to comment.

Hope this helps others who, like us, struggled initially with the API.

Chris…

Jun 02

IFS Replication now working

After some difficulties we have now managed to create an IFS replication process which allows IFS objects to be replicated from one system to another using a selection panel on the source system. This is the start of the new IFS replication process which we expect to replace the Journal based replication used by RAP today.

The reason for the new technology is to allow a new filtering function we are testing to work correctly. One side effect of the RAP technology is the locking of all journalled objects on the target side by the IBM APYJRNCHG process. If you have many libraries associated with the target journal this could create some conflicts, so we have developed an alternative process which allows the libraries to be defined for the APYJRNCHG request. This in turn creates its own side effect, because the IFS entries are ignored when this technique is used. So we needed an alternative IFS replication process which would sit alongside the object replication process available in V4R1 of RAP.

After many hours of resolving what in the end turned out to be stupid coding issues, we now have a fully working solution which allows single objects to be selected on the source without having to worry about the path on the target; it is built automatically to allow the restore to work. This functionality will be included as a standalone command and interface within the product but will also form the basis of the automated IFS replication routines we will build.

The new functionality will be added in the next PTF which should be available at the end of this quarter.

Chris…

Jun 02

New Version of JobQGenie showing great promise.

JobQGenie has been around for a number of years. Originally written for a customer who had major issues with recovery after a system failure, it had changed little in how it collected the information required for recovery purposes. That all changed when a new customer started to run it in his very complex environment. We found that the collection process started to trip over itself as it tried to keep up with the amount of data it was having to collect. The customer readily admits his environment is a challenge for many of the products he runs; JobQGenie was just another one!

We had to look at ways of removing the timing issues without significantly changing the product’s capabilities. This resulted in a total rewrite of the collection and monitoring engine, which in turn meant we had to rewrite all of the interfaces, not a small task. After some initial testing we found a few issues and resolved them with lots of help from the customer. Eventually they decided the product was stable enough and moved it into a production environment.

We had to fine tune a few things but now it seems to be running OK. We found a small issue where the system puts some incorrect data into some display variables, which is more annoying than anything else. The reason for the strange data is not clear, so we are working on finding a solution for it. One thing we did feel was worth sharing though is the following quote: “XXXXXXXXXX appears to cycle through the 6 digit job numbers about every week or two. If JQG can track all that without skipping a beat, I think we can put up with a few odd symbols”. Not that putting up with a few odd symbols is acceptable, but cycling through the job numbers every week or so shows just how much work JQG is having to do, and as they say it doesn’t skip a beat! That is certainly a big step forward for JQG when I look back at some of the initial data capture technology we used and how much data was missed.

While JobQGenie may not be as widely used as it should be, having a customer show us how well the product can perform gives us a lot of pride in what we have achieved! Perhaps now the product will get the attention it deserves and we will see more people using the product (which by the way is the only product on the market that provides these capabilities).

Chris…

Jun 02

New Version of Save File replication tool

Someone asked why we had placed the replication tool on the site. We did it just to allow those who need such a tool to have one for free, with no strings attached. But we can see why it really doesn’t provide much more than the OS copy tools such as SNDNETF. So we looked at how you could integrate the tool’s capabilities to provide some kind of automation. The replication of the save file itself is no great feat, but being able to use it to carry out an overnight transfer may be useful. So here is our take on what you can do with it.

First you need to automate the save of objects to the save file. This is quite simple, as you can use the IBM commands to create the save file and save the objects to it. The SYNCSAVF command provided then allows you to send the save file to a remote system (multiple calls can send it to multiple systems). This requires only simple CL programming to get the objects saved to a save file and replicated to the remote system(s).

The target side needed a way for a program to see when a file had been successfully replicated. The initial version just kept this information to itself, so we added a message to the message queue which states that the save file was successfully replicated. Now all you need to do is write a looping program that looks for this message and restores the save file’s contents.

We have hundreds of ideas on how to improve it further but don’t have the time to implement them. If you use it and find some limitation which you would like to see addressed, let us know and we will see how we can do it, time being the only constraint.

Another option would be to share the code. We had looked at open source projects before but no one seemed to be interested, so we dropped the idea.

Chris…