Feb 12

Adding a Bar Graph for CPU Utilization to HA4i and DR4i

As part of the ongoing improvements to the PHP interfaces we have developed for HA4i and DR4i, we decided to add a little extra to the dashboard display. The dashboard is a quick view of the status of the replication processes; we use traffic lights as indicators to show whether the processes are running OK. Another recent post discussed using a gauge to display the overall CPU utilization for the system, but we wanted to take it one step further and show just the CPU utilization for the HA4i/DR4i jobs.

We thought about how the data should be displayed and settled on a bar graph. Each bar represents the CPU utilization of one job running in the HA4i/DR4i subsystem, as a percentage of the available CPU. This gave us a couple of challenges: we needed to determine how many jobs were running and then build a table to display the data. There are plenty of bar graph examples out there which show how to use CSS and HTML to display data; our only difference is that we need to extract the data from the system and then build the bar graph from what we are given.

The first program we needed to create was one which would retrieve the information about the running jobs and could be called through the Easycom program interface. We have already published a number of tests around this technology, so we will just show the code we added to extract the data. To that end we extended the PHPTSTSRV service program with the following function.

typedef _Packed struct job_sts_info_x {
    char Job_Name[10];
    char Job_User[10];
    char Job_Number[6];
    int CPU_Util_Percent;
} job_sts_info_t;

/* List the active HA4IUSER jobs and return the CPU utilization of each. */
/* The caller must size dets[] for every job that can be returned.       */
int get_job_sts(int *num_jobs, job_sts_info_t dets[]) {
    int i;
    /* API names are blank padded to their fixed lengths */
    char Spc_Name[20] = "QUSLJOB   QTEMP     ";         /* space name */
    char Format_Name[8] = "JOBL0100";                   /* Job Format */
    char Q_Job_Name[26] = "*ALL      HA4IUSER  *ALL  "; /* Job Name */
    char *List_Entry;
    Qus_Generic_Header_0100_t *space;                   /* User Space Hdr Ptr */
    Qus_JOBL0100_t *Hdr;
    Qwc_JOBI1000_t JobDets;
    EC_t Error_Code = {0};                              /* Error Code struct */

    Error_Code.EC.Bytes_Provided = _ERR_REC;
    /* get usrspc pointers */
    QUSPTRUS(Spc_Name,
             &space,
             &Error_Code);
    if(Error_Code.EC.Bytes_Available > 0) {
        if(memcmp(Error_Code.EC.Exception_Id,"CPF9801",7) == 0) {
            /* object not found, so create the user space */
            if(Crt_Usr_Spc(Spc_Name,_1MB) != 1) {
                printf(" Create error %.7s\n",Error_Code.EC.Exception_Id);
                exit(-1);
            }
            QUSPTRUS(Spc_Name,
                     &space,
                     &Error_Code);
            if(Error_Code.EC.Bytes_Available > 0) {
                printf("Pointer error %.7s\n",Error_Code.EC.Exception_Id);
                exit(-1);
            }
        }
        else {
            printf("Some error %.7s\n",Error_Code.EC.Exception_Id);
            exit(-1);
        }
    }
    /* list the active jobs for the user into the user space */
    QUSLJOB(Spc_Name,
            Format_Name,
            Q_Job_Name,
            "*ACTIVE   ",
            &Error_Code);
    if(Error_Code.EC.Bytes_Available > 0) {
        printf("QUSLJOB error %.7s\n",Error_Code.EC.Exception_Id);
        exit(-1);
    }
    /* walk the list entries using the offsets in the generic header */
    List_Entry = (char *)space;
    List_Entry += space->Offset_List_Data;
    *num_jobs = space->Number_List_Entries;
    for(i = 0; i < space->Number_List_Entries; i++) {
        Hdr = (Qus_JOBL0100_t *)List_Entry;
        memcpy(dets[i].Job_Name,Hdr->Job_Name_Used,10);
        memcpy(dets[i].Job_User,Hdr->User_Name_Used,10);
        memcpy(dets[i].Job_Number,Hdr->Job_Number_Used,6);
        /* retrieve the CPU figures using the internal job identifier */
        QUSRJOBI(&JobDets,
                 sizeof(JobDets),
                 "JOBI1000",
                 "*INT                      ",
                 Hdr->Internal_Job_Id,
                 &Error_Code);
        if(Error_Code.EC.Bytes_Available > 0) {
            printf("QUSRJOBI error %.7s\n",Error_Code.EC.Exception_Id);
            exit(-1);
        }
        dets[i].CPU_Util_Percent = JobDets.CPU_Used_Percent;
        List_Entry += space->Size_Each_Entry;
    }
    return 1;
}

The function calls the QUSLJOB API to list the jobs running under the HA4IUSER user profile (we would change this to DR4IUSER for the DR4i product) and then uses the QUSRJOBI API to get the CPU utilization for each job. We did consider using just the QUSLJOB API with keys to extract the CPU usage, but the above code does everything we need just as effectively. As each job is found, we write the relevant information to the structure array which was passed in by the PHP program call.
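The list walk trusts that the caller's array is large enough for every entry the API returns. Below is a minimal sketch of the same offset-and-entry-size walk with a cap on the number of entries copied. It is plain C that compiles anywhere; `list_header_t`, `list_entry_t` and `copy_entries` are hypothetical stand-ins for illustration, not the IBM generic header or JOBL0100 definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the generic user space header: only the three
   fields the walk needs. The real layout comes from the IBM i headers. */
typedef struct {
    int offset_list_data;    /* bytes from start of space to first entry */
    int number_list_entries; /* how many entries the list API wrote */
    int size_each_entry;     /* bytes between consecutive entries */
} list_header_t;

/* Hypothetical stand-in for one list entry. */
typedef struct {
    char job_name[10];
} list_entry_t;

/* Copy at most max_out job names out of the space; return the number copied. */
int copy_entries(const char *space, char out[][10], int max_out) {
    list_header_t hdr;
    const char *entry;
    int i, n;

    memcpy(&hdr, space, sizeof(hdr));
    n = hdr.number_list_entries;
    if (n > max_out)
        n = max_out;               /* never write past the caller's array */
    entry = space + hdr.offset_list_data;
    for (i = 0; i < n; i++) {
        memcpy(out[i], entry, sizeof(((list_entry_t *)0)->job_name));
        entry += hdr.size_each_entry;
    }
    return n;
}
```

With a cap like this, the value returned to the caller is the number of entries actually copied, which may be smaller than the number of jobs the list API found.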

The PHP side of things uses the i5_toolkit to call the program, but you could just as easily (well, maybe not as easily :-)) use XMLSERVICE to carry out the data extraction. We first created the page which displays the bar chart; this in turn calls the functions required to connect to the IBMi and build the table for the chart. Again, we are only showing the code which is additional to what we have already provided in past examples. First, this is the page which is requested to display the chart.

<?php
/*
Copyright © 2010, Shield Advanced Solutions Ltd
All rights reserved.

http://www.shieldadvanced.ca/

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Neither the name of the Shield Advanced Solutions, nor the names of its
contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

*/
// start the session to allow session variables to be stored and addressed
session_start();
require_once("scripts/functions.php");
// load up the config data
if(!isset($_SESSION['server'])) {
    load_config("scripts/config_1.conf");
}
$conn = 0;
$_SESSION['conn_type'] = 'non_encrypted';
if(!connect($conn)) {
    if(isset($_SESSION['Err_Msg'])) {
        echo($_SESSION['Err_Msg']);
        $_SESSION['Err_Msg'] = "";
    }
    echo("Failed to connect");
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<style>
td.value {
    background-image: url(img/gl.gif);
    background-repeat: repeat-x;
    background-position: left top;
    border-left: 1px solid #e5e5e5;
    border-right: 1px solid #e5e5e5;
    padding: 0;
    border-bottom: none;
    background-color: transparent;
}

td {
    padding: 4px 6px;
    border-bottom: 1px solid #e5e5e5;
    border-left: 1px solid #e5e5e5;
    background-color: #fff;
}

body {
    font-family: Verdana, Arial, Helvetica, sans-serif;
    font-size: 80%;
}

td.value img {
    vertical-align: middle;
    margin: 5px 5px 5px 0;
}

th {
    text-align: left;
    vertical-align: top;
}

td.last {
    border-bottom: 1px solid #e5e5e5;
}

td.first {
    border-top: 1px solid #e5e5e5;
}

table {
    background-image: url(img/bf.png);
    background-repeat: repeat-x;
    background-position: left top;
    width: 33em;
}

caption {
    font-size: 90%;
    font-style: italic;
}
</style>
</head>

<body>
<?php get_job_sts($conn,"*NET"); ?>
</body>
</html>

The above code shows the STYLE element we used to form the bar chart. Normally we would put this in a CSS file and include that file, but as this is just a demonstration of the technology we decided to leave it in the page header. The initial code of the page starts the session, includes the functions code, loads the config data used to make the connection to the IBMi, and then connects to the IBMi. Once that is done, the get_job_sts function contained in functions.php is called. Here is the code for that function.


/*
 * function to display bar chart for active jobs and CPU Usage
 * @parms
 * the connection to use
 * the system type shown in the table caption
 */

function get_job_sts(&$conn,$systyp) {
    // get the number of jobs and the data to build the bars for

    $desc = array(
        array("Name" => 'NumEnt', "io" => I5_INOUT, "type" => I5_TYPE_INT),
        array("DSName" =>"jobdet", "count" => 30, "DSParm" => array(
            array("Name" => "jobname", "io" => I5_OUT, "type" => I5_TYPE_CHAR, "length" => "10"),
            array("Name" => "jobuser", "io" => I5_OUT, "type" => I5_TYPE_CHAR, "length" => "10"),
            array("Name" => "jobnumber", "io" => I5_OUT, "type" => I5_TYPE_CHAR, "length" => "6"),
            array("Name" => "cpu", "io" => I5_OUT, "type" => I5_TYPE_INT))));
    // prepare for the program call
    $prog = i5_program_prepare("PHPTSTSRV(get_job_sts)", $desc, $conn);
    if ($prog == FALSE) {
        $errorTab = i5_error ();
        echo "Program prepare failed <br>\n";
        var_dump ( $errorTab );
        die ();
    }
    // set up the input output parameters
    $parameter = array("NumEnt" => 0);
    // the returned values are placed in $nbr and $jobdets
    $parmOut = array("NumEnt" => "nbr", "jobdet" => "jobdets");
    $ret = i5_program_call($prog, $parameter, $parmOut);
    if (!$ret) {
        throw_error("i5_program_call");
        exit();
    }
    echo("<table cellspacing='0' cellpadding='0' summary='CPU Utilization for HA4i Jobs'>");
    echo("<caption align=top>The current CPU Utilization for each HA4i Job on the " .$systyp ." System</caption>");
    echo("<tr><th scope='col'>Job Name</th><th scope='col'>% CPU Utilization</th></tr>");
    for($i = 0; $i < $nbr; $i++) {
        // the value is returned in tenths of a percent
        $cpu = $jobdets[$i]['cpu']/10;
        if($i == 0) {
            echo("<tr><td class='first' width='20px'>" .$jobdets[$i]['jobname'] ."</td><td class='value first'><img src='img/util.png' alt='' width='" .$cpu/2 ."%' height='16' />" .$cpu ."%</td></tr>");
        }
        elseif($i == ($nbr -1)) {
            echo("<tr><td class='last' width='20px'>" .$jobdets[$i]['jobname'] ."</td><td class='value last'><img src='img/util.png' alt='' width='" .$cpu/2 ."%' height='16' />" .$cpu ."%</td></tr>");
        }
        else {
            echo("<tr><td width='20px'>" .$jobdets[$i]['jobname'] ."</td><td class='value'><img src='img/util.png' alt='' width='" .$cpu/2 ."%' height='16' />" .$cpu ."%</td></tr>");
        }
    }
    echo("</table>");
    return 1;
}

The program call is prepared with a maximum of 30 job info structures. We would normally determine the actual number of jobs before the call and size the array to match, but for this example we simply decided that 30 structures would be more than enough. After the program is called and the data returned, we build the table structure used to display the data. We originally allowed the bar to take up the full table width, but after testing on our system, which has uncapped CPU, we found that we would sometimes get over 100% CPU utilization. We still show the actual utilization but halve the bar width, which gives a better display.
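The scaling described here is small enough to sketch on its own. The following C helpers mirror the arithmetic in the PHP loop (divide by 10, then halve for the bar width), assuming the value comes back in tenths of a percent as that division implies. The function names are hypothetical, and the clamp to 0..100 is our own addition for illustration rather than part of the original code.

```c
/* Convert the raw value (assumed to be tenths of a percent) to a percentage. */
double cpu_percent(int cpu_tenths) {
    return cpu_tenths / 10.0;
}

/* Bar width as a percentage of the table cell: half the CPU figure.
   The clamp is an added safeguard, not something the PHP code does. */
double bar_width_percent(int cpu_tenths) {
    double width = cpu_percent(cpu_tenths) / 2.0;
    if (width < 0.0)
        width = 0.0;
    if (width > 100.0)
        width = 100.0;
    return width;
}
```

For example, a raw value of 1250 is displayed as 125% but drawn as a bar 62.5% wide, so a job on uncapped CPU has to reach 200% before its bar fills the cell.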

HA4i is running on our system in test, so the CPU utilization is pretty low even when we run a saturation test, but the image capture below will give you an idea of what the above code produces in our test environment.

[Image: CPU Utilization Bar Chart HA4i]

Now we just need to include the relevant code in the HA4i/DR4i PHP interfaces and we will be able to provide more data via the dashboard, which should help with managing the replication environment. You can see the original bar chart on which this example was based here.

Happy PHP’ing.

Chris…

Feb 07

Slow Response with i5_pconnect().


While experimenting with the latest version of our DR4i PHP interface we came across a slight issue with the i5_connection routines. The problem only appeared after we moved the code from our PC testing environment to the iAMP install, so we thought it was simply a slowdown caused by moving from the PC to the IBMi; unfortunately, that was only part of the problem. As soon as we found the issue we contacted Aura and asked them for support. They came back asking how the problem was manifesting itself, as they had not seen it elsewhere and were not sure what could be causing it.

We asked Aura what could have changed in the code to cause such a significant slowdown. They said that nothing had changed, and because they were not able to recreate the issue in their network they could not understand why we were seeing it. After some further discussion and discovery they let us know that they had moved from the gethostbyname() API to the getaddrinfo() API in preparation for IPV6 support. getaddrinfo() is the API which should be used in place of gethostbyname() where IPV6 support is required.

We scoured the internet and found a number of entries which discussed the slowdown of lookups when getaddrinfo() was used. It was obviously a problem and we needed to understand how this was playing a part in our environment but not in Aura’s. So our first action was to write a test program which would take a host name and try to resolve that using the getaddrinfo() API. Here is the code we started off with.


#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <leawi.h> /* CEE date functions */

#ifndef NI_MAXHOST
#define NI_MAXHOST 1025
#endif

int main(int argc,char **argv) {
    int error;
    int junkl;                      /* Int holder */
    double secs;                    /* Secs holder */
    char Time_Stamp[18];            /* Time Stamp holder */
    char hostname[NI_MAXHOST] = ""; /* Host name returned */
    unsigned char junk2[23];        /* Junk char string */
    struct addrinfo *result;
    struct addrinfo *res;

    if(argc < 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    /* time before the lookup */
    CEELOCT(&junkl, &secs,junk2,NULL);
    CEEDATM(&secs,"YYYYMMDDHHMISS999",Time_Stamp,NULL);
    printf("Start = %s\n",Time_Stamp);
    error = getaddrinfo(argv[1], NULL, NULL, &result);
    /* time now */
    CEELOCT(&junkl, &secs,junk2,NULL);
    CEEDATM(&secs,"YYYYMMDDHHMISS999",Time_Stamp,NULL);
    printf("After getaddrinfo = %s\n",Time_Stamp);
    if(error != 0) {
        fprintf(stderr, "error in getaddrinfo: %s\n", gai_strerror(error));
        exit(EXIT_FAILURE);
    }
    /* loop over all returned results and do inverse lookup */
    for(res = result; res != NULL; res = res->ai_next) {
        error = getnameinfo(res->ai_addr,
                            res->ai_addrlen,
                            hostname,
                            NI_MAXHOST,
                            NULL,
                            0,
                            0);
        if(error != 0) {
            fprintf(stderr, "error in getnameinfo: %s\n", gai_strerror(error));
        }
        if(*hostname != '\0')
            printf("hostname: %s\n", hostname);
        CEELOCT(&junkl, &secs,junk2,NULL);
        CEEDATM(&secs,"YYYYMMDDHHMISS999",Time_Stamp,NULL);
        printf("After getnameinfo = %s\n",Time_Stamp);
    }
    freeaddrinfo(result);
    return 0;
}

When we ran this test program against our network with a simple host name which is defined in our HOST file, here is a sample of the output.

Start = 20130205095444797
After getaddrinfo = 20130205095453000
hostname: SHIELD3.SHIELD.LOCAL
After getnameinfo = 20130205095453000
Press ENTER to end terminal session.

This showed a response time of just over 8 seconds for the getaddrinfo() API! Obviously this would not be acceptable, as the lookup is done each time a connection is made. It was an issue for us because we do not have a DNS server to resolve our local names and instead rely on the HOST table entries. Our default search is set to *LOCAL, so we would have expected getaddrinfo() to look up the address in the HOST table first and resolve it there. But due to the way the API has been coded, it was always going out to the DNS server asking for an IPV6 address before looking for the IPV4 address in the HOST table.
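The elapsed time can be read straight off the two stamps in the sample output. As a small worked example, the portable C sketch below parses the YYYYMMDDHHMISS999 format used by the test program and returns the difference in milliseconds. The helper names are hypothetical, and the sketch assumes both stamps fall on the same day.

```c
#include <stdio.h>

/* Milliseconds since midnight from a YYYYMMDDHHMISS999 stamp, or -1 if the
   stamp cannot be parsed. Only the time-of-day part is used. */
long ms_of_day(const char *stamp) {
    int hh, mi, ss, ms;

    /* skip the 8 date characters, then read HH, MI, SS and milliseconds */
    if (sscanf(stamp, "%*8c%2d%2d%2d%3d", &hh, &mi, &ss, &ms) != 4)
        return -1;
    return ((hh * 60L + mi) * 60L + ss) * 1000L + ms;
}

/* Elapsed milliseconds between two stamps taken on the same day. */
long elapsed_ms(const char *start, const char *end) {
    return ms_of_day(end) - ms_of_day(start);
}
```

Applied to the stamps above, `elapsed_ms("20130205095444797", "20130205095453000")` gives 8203, that is 8.203 seconds spent inside getaddrinfo().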

We then looked at the documentation a lot more closely and after some experimentation found that if we removed the Domain Information from the TCP/IP setup (option 12 on the CFGTCP menu), requests for a server name came back immediately, but as soon as we added Domain Information such as ‘shield3.shield.local’ the response time would jump back up to over 8 seconds. Again, this was not acceptable, as the environment we needed the fix for uses NamedVirtualHosting, which always passes in an FQDN.

This is when we raised a PMR with IBM, supplied them with all of the data we had been using, and asked for support. They came back with a link to a document which described the problem exactly; it only affects i/OS from V6R1 onwards. From V6R1, IBM implemented the getaddrinfo() API to do IPV6 lookups first, so it always goes out to the DNS for name resolution even if an IPV4 address can be resolved from the HOST file! It only drops back to an IPV4 lookup after the IPV6 lookup has failed!

The answer in the end was very simple: we just had to set the AI_ADDRCONFIG flag in the hints passed to getaddrinfo(), and it then only does an IPV6 lookup if at least one IPV6 address has been configured (::1 is not considered a configured IPV6 address). Now we see immediate responses from the API and everything works as it should, even with the Domain Information configured.
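For anyone wanting to try the same change, here is a minimal sketch of a lookup with the flag set. It uses only the standard sockets API; the helper names `make_hints` and `resolve_host` are hypothetical, and the choice of SOCK_STREAM is an assumption on our part (a TCP connection), not something taken from the Easycom code.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Build hints asking getaddrinfo() to return addresses only for the
   address families that are actually configured on this host. */
struct addrinfo make_hints(void) {
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* still allow IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* assumption: TCP connection */
    hints.ai_flags = AI_ADDRCONFIG;  /* skip families with no configured address */
    return hints;
}

/* Resolve host; returns 0 and sets *result on success, otherwise an error
   code suitable for gai_strerror(). Free *result with freeaddrinfo(). */
int resolve_host(const char *host, struct addrinfo **result) {
    struct addrinfo hints = make_hints();

    return getaddrinfo(host, NULL, &hints, result);
}
```

In the test program above, this amounts to passing a hints structure as the third argument of getaddrinfo() instead of NULL.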

If you are seeing a dramatic slowdown in your TCP/IP connections after migrating to V6R1 and you or your application vendor are using the getaddrinfo() API, you may want to consider the above. The Easycom connection routines are affected at the moment, but a fix is being developed to resolve the issue.

Chris…