New POI Release Brings Excel's XLSX Support to RPG
by Scott Klement - February 25th, 2010 |
(Index of Scott's articles on Excel, RPG and Java) |
Since 2004, I've been writing articles about using the HSSF component of Apache's POI to create Excel spreadsheets from RPG. The articles have been wildly popular, but they were missing something: support for Microsoft's new file formats. You see, starting with Microsoft Office 2007, the default file format for Excel has been the Open Office XML (OOXML) format for spreadsheets, represented by the extension .XLSX. Although this has been the standard for Excel users for a few years now, it wasn't available in POI.
Until now.
Actually, that's not completely true. Support for OOXML was first released in POI version 3.5 back in September 2009. However, the changes that the Apache folks made to POI were massive, and it's taken me awhile to rewrite the RPG interfaces to be compatible with the new version of POI. In December 2009, Apache released version 3.6 of POI, which is the current version as I write this, and it's the version that I've tested all my RPG interfaces with.
The original support for Excel was provided in a set of Java classes called Horrible Spread Sheet Format (HSSF). The developer who created it came up with that name as a joke, and it stuck. In the new version, HSSF refers to the original Binary Excel format. The new XML-based format is referred to as XML Spread Sheet Format (XSSF). The new POI also contains a set of format-independent routines that will work with either the HSSF or XSSF formats.
Requirements
Here are the basic requirements for using the new version of the POI package from RPG:
- V5R4 or higher ILE RPG Compiler (licpgm 57xx-WDS, option 31)
- IBM Developer Kit for Java (licpgm 57xx-JV1, *base)
- Java 5.0 or newer (at least one of the following licpgms 57xx-JV1, opt 7, 8, 9, 10, 11 or 12)
- System Openness Includes (licpgm 57xx-SS1, opt 13)
- QShell (57xx-SS1, opt 30) not required but highly recommended
- Apache POI version 3.5 or higher (3.6 recommended)
- The HSSFR4 RPG code (link found at the end of this article)
With those options installed, you'll be able to work with the "Binary Excel" (.XLS) format of Excel document that was available in previous releases of POI. To take things a step further and use the OOXML support (.XLSX), you'll need the following additional Java JAR files available:
- dom4j-1.6.1.jar from DOM4J (version 1.6.1 or higher)
- jsr173_1.0_api.jar from XmlBeans 2.5.0 (or higher)
- xbean.jar from XmlBeans 2.5.0 (or higher)
Compatibility
I took great pains to keep the RPG interfaces to HSSF unchanged. You should be able to install the new version of POI and my RPG routines (HSSFR4) and continue using your existing code that uses these routines, and it should continue to function as it did before, still creating .XLS spreadsheets using the HSSF code.
However, I should warn you that the changes that were made to POI were absolutely massive, and therefore I cannot guarantee 100 percent backward compatibility. Therefore, I suggest doing at least some light testing with the new version before putting it in front of your users. If you find something that does not work, I suspect that simply recompiling your programs, and rebinding them, will solve the problem.
If you need to make more significant changes or need help solving a particularly tough conversion problem, please post it to the System iNetwork Forums so that I, and others, know about the problem and can offer friendly advice.
Setting Up POI Environments
Perhaps you'd like to have the old version of POI running for your production code, and the new version available on the same system for testing. Or perhaps you have some programs you don't want to update to the new POI, but you still want to develop code using the new POI. Whatever the case, it makes sense to configure your system to be able to run both versions simultaneously, but in different environments.
Or . . . even if you only need to run the latest version, you will need to know how to set up that environment!
It's important to understand that the Java code you'll be working with is packaged in a JAR file. JAR files are really just .ZIP files that are filled up with oodles and oodles of Java class objects. If you think of a Java class as a compiled service program, then you can think of a JAR file as a zipped library that's full of service programs containing routines you can call. One of the neat things about JAR files is that you can run the code in them without unzipping the archive! You can run them straight out of the archive, and in many cases this is even faster than extracting the files. Why? Because the bottleneck for loading the classes into memory is the disk access, and in zipped format they are compressed, so fewer bytes need to be read from disk.
In Java, class files are located using an environment variable named CLASSPATH. A CLASSPATH is very similar to a library list, except that we use library lists for finding all sorts of objects (programs, data areas, files, etc.) whereas Java uses only the CLASSPATH to find classes. A CLASSPATH is a colon-delimited list of IFS directories that might contain the Java classes you want to run code from.
A CLASSPATH is an important part of setting up separate environments for POI. The CLASSPATH is how Java finds the POI routines to call. So if you want to have one environment that uses POI version 3.1 and another that uses POI 3.6, this can be accomplished by having a separate CLASSPATH for each environment.
For example, on my system, I decided to create an IFS folder named /java that I will use to keep all the Java routines that I use frequently. Within that /java folder, I have subfolders for each group of JAR files I use. On my system, it looks like this:
/java | ||
/dom4j | DOM4J version 1.6.1, used for XSSF | |
/jdbc | JDBC drivers for accessing remote databases (not needed for POI) | |
/poi3.1 | This is where the older POI version 3.1 JAR files are. | |
/poi3.6 | This is where the new POI version 3.6 JAR files are. | |
/xlparse | This is where I keep the JAR file for the Event-Model Excel parsing (not yet updated to 3.6). | |
/xmlbeans | This is where I keep the JAR files for XmlBeans 2.5.0 (required for XSSF support). | |
-other- | I have other Java utilities (such as JFreeChart) here as well. |
Creating IFS Folders
To create this directory structure on your system, you'd run the following CL commands:
MKDIR DIR('/java') DTAAUT(*RX) MKDIR DIR('/java/poi3.1') DTAAUT(*RX) MKDIR DIR('/java/poi3.6') DTAAUT(*RX) MKDIR DIR('/java/xmlbeans') DTAAUT(*RX) (If you'd like to use XSSF) MKDIR DIR('/java/dom4j') DTAAUT(*RX) (If you'd like to use XSSF)
Note: It's important to make sure you do not have any POI JAR files in /QIBM/UserData/Java400/ext
. This directory is a special directory of JAR files that are automatically loaded and takes precedence over the CLASSPATH. While that can sometimes simplify the management of JAR files, it also makes it impossible to have two versions of POI operating on the same computer at the same time, because the version in the /QIBM/UserData/Java400/ext
folder will always take precedence!
Creating Libraries
In addition to the IFS directory, I also have separate libraries named POI31 and POI36. In these libraries, I keep the HSSFR4 service program and related copy books, binding directories, and so forth.
POI36 | Library for Scott's HSSFR4 and related code for POI 3.6. |
POI31 | Older POI 3.1 version of Scott's HSSFR4 utilities. |
Environment Starters
I like to write CL programs that set up my library list, CLASSPATH, and other variables that control my environment. That way, whenever I want to work in one environment versus another, I can simply sign on and call the appropriate CL program.
Here's an example CL program that sets up the (older) POI 3.1 environment for me to use for programs that haven't yet been migrated to the new POI:
PGM RMVLIBLE POI36 MONMSG CPF2104 ADDLIBLE POI31 *LAST MONMSG CPF2103 ADDENVVAR ENVVAR(CLASSPATH) + VALUE('/java/poi3.1/poi-3.1-FINAL-20080629.jar+ :/java/poi3.1/poi-contrib-3.1-FINAL-20080629.jar+ :/java/poi3.1/poi-scratchpad-3.1-FINAL-20080629.jar') + LEVEL(*JOB) + REPLACE(*YES) ADDENVVAR ENVVAR(QIBM_RPG_JAVA_PROPERTIES) + VALUE('-Djava.version=1.4;+ -Djava.awt.headless=true;+ -Dos400.awt.native=true;') + REPLACE(*YES) ENDPGM
The preceding CL program adds only the POI 3.1 JAR files to the CLASSPATH. It also sets up the QIBM_RPG_JAVA_PROPERTIES variable that sets some additional parameters that control how RPG starts the Java Virtual Machine (JVM). In this case, I only need to use Java 1.4 (or higher) since that's all that's required for POI 3.1
To set up an environment for the new 3.6 version of POI, I'd set up my environment like this:
PGM RMVLIBLE POI31 MONMSG CPF2104 ADDLIBLE POI36 *LAST MONMSG CPF2103 ADDENVVAR ENVVAR(CLASSPATH) + VALUE('/java/poi3.6/poi-3.6-20091214.jar+ :/java/poi3.6/poi-ooxml-3.6-20091214.jar+ :/java/poi3.6/poi-ooxml-schemas-3.6-20091214.jar+ :/java/dom4j/dom4j-1.6.1.jar+ :/java/xmlbeans/jsr173_1.0_api.jar+ :/java/xmlbeans/xbean.jar') + LEVEL(*JOB) + REPLACE(*YES) ADDENVVAR ENVVAR(QIBM_RPG_JAVA_PROPERTIES) + VALUE('-Djava.awt.headless=true;+ -Dos400.awt.native=true;') + REPLACE(*YES) ADDENVVAR ENVVAR(JAVA_HOME) + VALUE('/QOpenSys/QIBM/ProdData/JavaVM/jdk50/32bit') + REPLACE(*YES) ENDPGM
This version loads the 3.6 versions of the POI classes, as well as the POI36 library to my library list. This way, I'll get the newer versions of the Java classes, and I can use the new features available in POI 3.6.
You'll also notice that I'm no longer using the java.version setting in the QIBM_RPG_JAVA_PROPERTIES variable to control the version of Java that I'm running. IBM has deprecated that method and has suggested that the JAVA_HOME variable is a better way to select which JVM you use. In this example, I'm using the 32-bit IBM Technology for Java version of the JVM (57xx-JV1, option 8). If I wanted to use the "Classic" JVM, (57xx-JV1 option 7) I could do that by setting JAVA_HOME as follows:
ADDENVVAR ENVVAR(JAVA_HOME) + VALUE('/QIBM/ProdData/Java400/jdk15') + REPLACE(*YES)
Also, you'll notice that the CLASSPATH above contains all the JAR files needed for both XSSF and traditional HSSF work in POI. That's versatile, because I can use both document formats in the same job. However, if I don't need the XML (XSSF) version, I could reduce the memory footprint and simplify things a bit by setting my CLASSPATH as follows:
ADDENVVAR ENVVAR(CLASSPATH) + VALUE('/java/poi3.6/poi-3.6-20091214.jar') + LEVEL(*JOB) + REPLACE(*YES)
Hopefully you see that there are a lot of potential options for setting up an environment. I've barely scratched the surface of all the options I can use, but I suspect these will be the options that most readers find useful.
There are some important things to note about these environments:
- Environment variables (set with ADDENVVAR) have case-sensitive names. You must use the names CLASSPATH, JAVA_HOME, and QIBM_RPG_JAVA_PROPERTIES, in all uppercase as shown.
- The first time you invoke Java from an RPG program in a given job, the JVM will be loaded into your job's memory and thereafter will remain resident. You will notice a delay of a few seconds while the JVM loads.
- The environment variables are checked only when the JVM first loads. They are not checked thereafter. Consequently, you must set up your environment before Java is used in a given job.
- If you want to switch environments, you'll need to end the job (sign off if it's an interactive job) and start a new job in order to pick up the new settings.
Obtaining the JAR Files
The POI code is open-source (no charge) software. The extra JAR files that it depends on are also open-source packages.
Many of these projects provide things in both a "Binary" and "Source" distribution. Binary means that it does not contain the source code, rather it contains precompiled Java code, ready for you to use. Source means that you get the source code, and it's up to you to compile it. I recommend the "Binary" option unless you are comfortable with compiling Java projects.
Typically, the downloads will be ZIP files from the respective project's home page. You'll want to download the ZIP file from the appropriate site and then extract the ZIP file to get the JAR file(s) from it.
You should then upload the JAR file to the appropriate directory in your IFS.
At the time that I'm writing this (February 2010), these are the links for the downloads of the various JAR files. However, as time goes by, I'm sure the active version numbers will change, as will the exact links for downloading them. Here are links to the respective project's home pages, just in case:
From the downloads for these projects, you'll need to unzip the main files and get the appropriate JAR files to put into your IFS. Here are the JAR files you'll need, and the location I suggest using in the IFS:
Required for | |||||
---|---|---|---|---|---|
Tool | JAR file | IFS Directory | HSSF | XSSF | Description |
POI 3.6 | poi-3.6-20091214.jar | /java/poi3.6 | X | X | Main POI code |
POI 3.6 | poi-ooxml-3.6-20091214.jar | /java/poi3.6 | X | Additional POI code for XSSF | |
POI 3.6 | poi-ooxml-schemas-3.6-20091214.jar | /java/poi3.6 | X | Schemas for XSSF code | |
POI 3.6 | poi-examples-3.6-20091214.jar | /java/poi3.6 | Java examples of how to use POI (not required) | ||
POI 3.6 | poi-contrib-3.6-20091214.jar | /java/poi3.6 | Utilities contributed to the POI project (not required) | ||
POI 3.6 | poi-scratchpad-3.6-20091214.jar | /java/poi3.6 | Experimental features of POI that may or may not be kept in future releases (not required) | ||
DOM4J 1.6.1 | dom4j-1.6.1.jar | /java/dom4j | X | XML Parser (DOM, SAX, JAXP) for Java | |
XmlBeans 2.5.0 | xbean.jar | /java/xmlbeans | X | XMLBeans Utility | |
XmlBeans 2.5.0 | jsr173_1.0_api.jar | /java/xmlbeans | X | API needed for XMLBeans to parse XML |
Note: Many of the JAR file names above contain version numbers. If you download a different version of the tool, please be aware that the name of the JAR file will differ slightly.
The XmlBeans project contains additional JAR files that I did not list above. These files are not required by POI at this time.
Original (Binary) Excel vs. XML Excel
The RPG code to read or write spreadsheets in HSSF format has not changed. If you are new to HSSF format, you should read the previous articles in this series to understand how to create spreadsheets with HSSF. Links to previous articles can be found on my home page, alongside the code download.
However, please understand that the routines discussed in those articles will create files in traditional Excel (.XLS) format and not in the new XML-based .XLSX format.
Writing Format-Independent Code
The bulk of the changes made to POI, as well as the HSSFR4 RPG tools for working with POI have been related to enabling support for XSSF as well as providing format-independent support.
In the actual POI Java code for Excel, you can call three distinct sets of routines. Routines that begin with HSSF are part of the traditional support, routines starting with XSSF are for XML support, and routines without either of these prefixes are for writing code that's format independent. If you write format-independent code, the same code can be used to generate either HSSF or XSSF format, as well as any new formats added in the future. This method is the most powerful and recommended way of creating spreadsheets.
For example, the Java code to create a workbook with one tab, containing one cell using traditional HSSF code would look something like this:
HSSFWorkbook wb = new HSSFWorkbook(); HSSFSheet s = wb.createSheet(); HSSFRow row = sheet.createRow((short)0); HSSFCell cell = row.createCell(0); cell.setCellValue(1);
To code the same thing with XSSF, you'd do this instead:
XSSFWorkbook wb = new XSSFWorkbook(); XSSFSheet s = wb.createSheet(); XSSFRow row = sheet.createRow((short)0); XSSFCell cell = row.createCell(0); cell.setCellValue(1);
The problem with using the preceding XSSF code is that your program would be able to produce only the XML format of Excel. What if you wanted to write something that's independent of the format? Wouldn't it be better if your code would work for either format? To do that, you'd code the following:
Workbook wb=null; if (use_xssf == TRUE) wb = new XSSFWorkbook(); else wb = new HSSFWorkbook(); Sheet s = wb.createSheet(); Row row = sheet.createRow(0); Cell cell = row.createCell(0); cell.setCellValue(1);
When you create the new workbook, you have to decide whether it's an XSSF or HSSF object. However, for the rest of the code, you can code it exactly the same whether it's XSSF or HSSF. The underlying code looks at the workbook object and determines which underlying routine to call (the HSSF or XSSF routine) so you don't have to code things separately for each format.
In my RPG interfaces to these routines, I used the format-independent routines wherever I could. I kept the existing HSSF routines for backward compatibility, but I tried not to add any new code that used the HSSF or XSSF routines directly.
However, if I used the naming scheme that POI used, I quickly ran into problems. Too many of my existing programs had fields with names like Row, Cell, Sheet, and Workbook. Using these simpler names caused a lot of conflicts for me. So I made the decision to prefix them with SS (for "spreadsheet") to eliminate these conflicts.
Therefore, the RPG code to create a new spreadsheet and add a simple cell to it in a format-neutral way looks like this:
D book s like(SSWorkbook) D sheet s like(SSSheet) D row s like(SSRow) D cell s like(SSCell) D filename s 100a varying . . ss_begin_object_group(100); if (xssf); book = new_XSSFWorkbook(); filename = '/tmp/example.xlsx'; else; book = new_HSSFWorkbook(); filename = '/tmp/example.xls'; endif; sheet = SS_newSheet(book: 'Test'); row = SSSheet_createRow(sheet: 0); cell = SSRow_createCell(row: 0); SSCell_setCellValueD(cell: 1234); ss_Save(book: filename); ss_end_object_group();
You still have to create the workbook in either XSSF or HSSF format so that the system knows what type of document to work with. Once you've done that, the same code can be used to add sheets, rows, and cells without worrying about which format the data is to be stored in. The POI code figures out the format from the workbook you created initially and stores the data accordingly.
The routines work exactly the same way as the older HSSF routines do. The only difference is that they start with SS instead of HSSF, and they work with data in either format.
I have been changing my existing programs to use these format-independent routines so that I can output XSSF spreadsheets. So far I have found this process to be quite painless—by and large, it's just a matter of doing a find/replace to change HSSF into SS, then saving the member and recompiling it. However, it's worth noting that the copy book name and binding directory name have not changed—so be careful when doing a find/replace that you keep the copy book name set to HSSF_H and the binding directory as HSSF.
Reading Existing Workbooks in a Format-Independent Way
Like the earlier versions of POI, you can read an existing spreadsheet into memory, retrieve its cells, add new cells, or update existing cells. The format-independent routines do this exactly the same way as the HSSF routines did.
The neat thing about using the format-independent routines instead of the HSSF routines is that they work with either document type. When creating a workbook from scratch (in the code samples of the previous section), I had to specify either new_HSSFWorkbook() or new_XSSFWorkbook. However, when reading an existing workbook, this is unnecessary—it'll figure out the format from the file you're reading!
Therefore, I can do something like this:
ss_begin_object_group(100); // // Load an existing spreadsheet into memory // filename = '/tmp/demo.xls'; book = ss_open(filename); if (book = *null); complain('Unable to open workbook!'); endif; // // get the SSCell object that needs changing // sheet = ss_getSheet(book: 'January'); if (sheet = *null); Complain('No January sheet in workbook!'); endif; row = SSSheet_getRow(sheet: 5); if (row = *null); row = SSSheet_createRow(sheet: 5); endif; cell = SSRow_GetCell(row: 1); if (cell = *null); cell = SSRow_createCell(row: 1); endif; SSCell_setCellValueD(cell: 12345); ss_Save(book: filename); ss_end_object_group();
I think it's really cool that I don't have to specify the format at all. Simply by changing the filename to point to an HSSF or XSSF file, I can automatically read it in the right format and update that format. No changes to the code are needed at all (other than converting the filename variable into a parameter, that is).
The Event Model
In previous articles, I demonstrated how to use the "event model" to parse an existing spreadsheet. The event model cycles through the entire workbook, and for each cell it finds, it calls a routine of your choosing to give you the contents of a given numeric, character, or formula cell.
I demonstrated this with a utility that I called XLPARSER4, intended to make it easy to parse an XLS workbook from RPG.
I'm currently in the process of converting XLPARSER4 to use POI 3.6. The existing copy of XLPARSER4 does not work with POI 3.6, yet. Look for an article in this newsletter within the next month or two that provides an updated copy of XLPARSER4 that will work with POI 3.6 and will let you parse both HSSF and XSSF workbooks. In the meantime, please keep using XLPARSER4 with POI 3.1.
Examples with Format-Independent Code
All the RPG sample programs that I've published in previous articles in this series have been updated to use format-independent code. If you download the code from this article (or subsequent articles) you'll find that updated copies of all of the previous sample programs are included. Looking over the code in these updated programs is a great way to familiarize yourself with the new format-independent routines.
Looking Forward
I'd like your opinion . . . would it make sense for me to rewrite my existing POI articles to take advantage of the new version? I find myself wondering how difficult this stuff is for someone to learn at this point in time. You'd have to go back and learn how to do things the old way, then read the later articles and update what you've learned. Is that easy enough to do? Or would it make sense to release a new series of articles that demonstrates the new POI version from the ground up?
If you have an opinion on this, please leave a comment, below.
Conclusion & Download
This article is the culmination of a lot of research and recoding of my programs. I'm very pleased (and relieved) to finally be releasing it to the public. I don't normally spend this much time writing code for a single article!
Click here to download the HSSFR4 (RPG utilities for Excel) code, as well as examples.
I sincerely hope you find this as useful and fun as I do!