If you are writing embedded SQL application using C/C++ or SQLJ application using Java, you will come across an issue about how to use DB2 PREPARE statement effectively in your application code. When I talk to Oracle or SQL Server DBA, they all seem to have an impression that the database will cache the prepare statement and it is OK to use PREPARE anywhere in the code since database engine will not parse and compile that SQL again.

Same is true in DB2 also but even if database engine will not parse that SQL again, you still do the calculations to determine if that SQL is in cache or not. This might become an issue if you are writing a high performance application and even milliseconds of performance may add up to several minutes or hours performance bottleneck when you do millions of such transactions.

The following is a simple test to insert 100,000 rows in DB2 using a dynamic SQL statement in an embedded SQL application using C. The same principle holds good for Java or SQLJ programs.

In the first test, we will prepare the statement outside the loop and in another do the same test with PREPARE inside the loop. This is a most common application issue I have seen in many different places where perception of PREPARE is misunderstood and used incorrectly.

Sample C source

The following source will truncate the DB2 table by calling db2 import API to import an empty file to truncate the table. It prepares a dynamic SQL statement to insert rows in a table 100,000 times and measures the elapsed time.

$ cat dynsql.sqc 
#include <stdio.h>
#include <string.h>
#include <sqlca.h>
#include <db2ApiDf.h>           /* sqluimpr() */

#define NAME_COLUMN_SIZE 27 
#define DUMMY_FILENAME  "/dev/null"

/*
db2 "create table fedsch.fedtest (c1 int not null, c2 varchar(20) not null, c3 decimal(12,3), c4 int not null generated always as identity)"
*/

int main()
{
  int i = 0, rc = 0;
  struct sqldcol dataDescriptor;
  unsigned char actionBuffer[sizeof(struct sqlchar) + 20 + 130 + 1 + 130 + 1];
  struct sqlchar *pAction = (struct sqlchar *)actionBuffer;
  struct sqluimpt_in inputInfo;
  struct sqluimpt_out outputInfo;

    
  EXEC SQL INCLUDE SQLCA;

  EXEC SQL BEGIN DECLARE SECTION; 
     char sqlBuffer[1024];
     char name[27]; 
     sqlint32 c1;
     char c2[21];
     double c3;
  EXEC SQL END DECLARE SECTION;

  /* prepare import parameters */
  dataDescriptor.dcolmeth = SQL_METH_D;
  pAction->length = sprintf(pAction->data, "REPLACE INTO FEDSCH.FEDTEST");
  inputInfo.sizeOfStruct = SQLUIMPT_IN_SIZE;
  inputInfo.commitcnt = 10;
  inputInfo.restartcnt = 0;
  outputInfo.sizeOfStruct = SQLUIMPT_OUT_SIZE;

  strcpy(sqlBuffer, "INSERT INTO fedsch.fedtest (c1, c2, c3) VALUES (?,?,?)");

  EXEC SQL WHENEVER SQLERROR GOTO error_section; 

  EXEC SQL CONNECT TO SAMPLE;

  rc = sqluimpr(DUMMY_FILENAME, NULL, &dataDescriptor, pAction, SQL_DEL,
    NULL, DUMMY_FILENAME, SQLU_INITIAL, &inputInfo, &outputInfo,
    NULL, NULL, &sqlca);
  if (rc) 
  {
     goto error_section;
  }

  EXEC SQL VALUES CURRENT TIMESTAMP INTO :name;
  EXEC SQL PREPARE stmtdyn FROM :sqlBuffer;
  printf ("Server time at start of the test is %s\n", name);
  for (i = 0; i < 100000; ++i)
  {
     c1 = i + 10000;
     strcpy(c2, "United States");
     c3 = 1234.4567893;
     //EXEC SQL PREPARE stmtdyn FROM :sqlBuffer;
     EXEC SQL EXECUTE stmtdyn USING :c1, :c2, :c3;
     if (i % 100 == 0)
     {
          //printf("Commiting results at %d\n", i);
          EXEC SQL COMMIT;
     }
  }
  EXEC SQL VALUES CURRENT TIMESTAMP INTO :name;
  EXEC SQL CONNECT RESET;
  printf ("Server time at end of the test is %s\n", name);
  return 0;

  error_section:
  {
      printf ("Error occured SQLCODE = %d\n", sqlca.sqlcode);
      return -1;
  }
}

Create test DB2 table

$ db2 connect to sample
$ db2 -tf ~/sqllib/misc/EXPLAIN.DDL
$ db2 "create table fedsch.fedtest (c1 int not null, c2 varchar(20) not null, c3 decimal(12,3), c4 int not null generated always as identity)"
$ db2 connect reset

Makefile to compile the program

$ cat makefile 
DB2PATH = $(HOME)/sqllib

ERASE=rm -f

all : \
        test

clean : 
        $(ERASE) *.o

cleanall : \
        clean
        $(ERASE) *.bnd

test :
        ./prepcmd sample dynsql

prepcmd command to compile DB2

$ cat prepcmd 
PACKAGETOKEN=$(date +"%j%M%S")
db2 -v connect to $1
db2 -v "prep $2.sqc bindfile explain yes explsnap yes package using P$PACKAGETOKEN preprocessor 'gcc -I$HOME/sqllib/include -E -o $2.i'"
gcc $2.c -L$HOME/sqllib/lib -ldb2 -o $2prg
db2 -v connect reset

Run the test

First compile the program as it is so that you use PREPARE outside the loop and in second test, uncomment the PREPARE inside the loop and compare the results.

PREPARE outside the loop

$ ./dynsqlprg 
Server time at start of the test is 2007-05-16-12.42.42.465417
Server time at end of the test is 2007-05-16-12.42.48.251052

PREPARE inside the loop

$ ./dynsqlprg 
Server time at start of the test is 2007-05-16-12.43.45.993893
Server time at end of the test is 2007-05-16-12.43.54.469465

If you compare the timings for the above test for an insert of 100,000 rows, the performance difference is 46.5%. This is significant for high throughput systems.

You should take following precautions while you write your embedded SQL or simple Java programs.

  • Use PREPARE in your Java programs in your constructor and make sure that you instantiate your Java class only once before your iterative loop.
  • Use PREPARE in your embedded SQL only once and use it over and over again.
  • Always use parameter markers for performance and do not send hard coded SQL statement to engine

You may violate above rules if performance is not the criteria for you.