IBM i and Unicode  

The Unicode character set is a computing-industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Unicode can be implemented by different character encodings: the standard defines UTF-8, UTF-16, and UTF-32, and several other encodings are in use.
The main difference between traditional character sets (for example US-ASCII and ISO-8859-1) and Unicode encodings (for example UTF-8) lies in how many characters they can represent and how many bits they use for each one. ASCII uses seven bits per character (128 characters), and ISO-8859-1 uses one byte per character (256 characters). Unicode instead offers encodings built on 8-, 16- and 32-bit code units: UTF-8 uses one to four bytes per character, UTF-16 uses two or four bytes, and UTF-32 always uses four.
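To make the variable-length scheme concrete, here is a minimal C sketch (C is used because the iconv API discussed on this page is a C interface; `utf8_encode` is a hypothetical helper, not part of CVT101) that encodes one Unicode code point as UTF-8. With it, 'A' (U+0041) stays the single byte 0x41, 'é' (U+00E9) becomes 0xC3 0xA9, '€' (U+20AC) becomes 0xE2 0x82 0xAC, and U+1F600 needs four bytes. In ISO-8859-1 (CCSID 819), by contrast, 'é' is the single byte 0xE9.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out (which must hold at least 4 bytes)
 * and returns the number of bytes written, or 0 if cp is not a
 * valid Unicode scalar value (a surrogate or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 7 bits: same byte as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* UTF-16 surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 21 bits: 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

The leading bits of the first byte tell a decoder how many continuation bytes (each starting with the bits 10) follow, which is why plain ASCII text is already valid UTF-8.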

While on IBM i a stream file can easily be converted from one CCSID to another (for example from CCSID 819, ISO 8859-1 ASCII, to CCSID 37, U.S. English EBCDIC), converting a stream file from a regular CCSID (such as 37 or 819) to a Unicode CCSID (such as 1208, UTF-8) is not easy, and neither is reading a Unicode stream file from an RPG program.
This is why we provide some tools to convert character strings or stream files to or from Unicode UTF-8 (CCSID 1208).

  • Working with Unicode stream files
    When a stream file with a non-Unicode CCSID is opened for reading with the open flag O_TEXTDATA (in a job whose CCSID is not 65535!), its data is automatically converted to the job CCSID, so the program reading it has no trouble interpreting it.
    Opening a Unicode stream file with the O_TEXTDATA flag, instead, results in an exception.
    For this reason a Unicode stream file should be converted to a "regular" CCSID before a program reads it.
    This is why some conversion tools have been made available in library CVT101.
    Note that these tools have been fully tested only for UTF-8 (CCSID 1208).
    All these conversion tools are based on the iconv API, the IBM i system interface that converts strings from one CCSID to another, Unicode included.

  • The Iconv API
    According to Wikipedia, "the iconv API is the standard programming interface for converting character strings from one character encoding to another in Unix-like operating systems."
    The iconv API was made available on iSeries with release V5R2.
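    The iconv API is well suited to converting from the Unicode code page (1208) to any other code page or CCSID, including that of the running job, and of course it also works in the other direction.
    Its only drawback is that it is not an easy API to work with: the caller passes pointer and length pairs for the input and output buffers, and iconv updates them as it converts.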
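    A minimal portable C sketch of what working with iconv involves, using the POSIX functions iconv_open, iconv and iconv_close. The helper name `convert_alloc` and the charset names "UTF-8" and "ISO-8859-1" are assumptions for a generic Unix-like system; on IBM i, iconv_open is documented to take IBM-specific code identifiers derived from CCSIDs (and QtqIconvOpen accepts CCSIDs directly), so check the IBM i documentation before adapting it. The sketch shows the two usual traps: iconv moves the buffer pointers and decrements the byte counters as it goes, and when the output buffer fills up it stops with E2BIG, so the caller must enlarge the buffer and resume.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert in_len bytes from charset `from` to
 * charset `to`, growing the output buffer whenever iconv() reports E2BIG.
 * Returns a malloc'd buffer (the caller frees it) and stores its length
 * in *out_len; returns NULL on any error. */
char *convert_alloc(const char *to, const char *from,
                    const char *in, size_t in_len, size_t *out_len)
{
    iconv_t cd = iconv_open(to, from);       /* note: target first, then source */
    if (cd == (iconv_t)-1)
        return NULL;                         /* unknown or unsupported charset */

    size_t cap = in_len + 16;                /* initial guess, grown below */
    size_t used = 0;                         /* bytes produced so far */
    char *out = malloc(cap);
    char *inp = (char *)in;                  /* iconv() advances this pointer... */
    size_t in_left = in_len;                 /* ...and decrements this counter */

    while (out != NULL) {
        char *outp = out + used;             /* resume where the last call stopped */
        size_t out_left = cap - used;
        size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
        used = cap - out_left;
        if (rc != (size_t)-1)
            break;                           /* all input converted */
        if (errno != E2BIG) {                /* EILSEQ or EINVAL: bad input */
            free(out);
            out = NULL;
            break;
        }
        cap *= 2;                            /* output buffer full: grow it */
        char *bigger = realloc(out, cap);
        if (bigger == NULL) {
            free(out);
            out = NULL;
            break;
        }
        out = bigger;
    }
    iconv_close(cd);
    if (out != NULL)
        *out_len = used;
    return out;
}
```

    Stateful encodings (such as mixed EBCDIC with shift-out/shift-in) would also need a final flush call, iconv(cd, NULL, NULL, &outp, &out_left); the sketch omits it because UTF-8 and single-byte code pages are stateless.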

  • The CvtStg (Convert String) subprocedure
    Library CVT101 contains a small service program, CVT101/CVT101, which provides a Convert String subprocedure (CvtStg) that converts a string from one code page (CCSID) to another through the iconv API.
    The following example shows how to use the CvtStg subprocedure:
     * Assume that 
     * 1-The string to be converted is in a buffer addressed by the following pointer
    D SrcPointer      s               *
     *  -its length is defined by the following variable
    D SrcLen          s             10u 0
     *  -its code page (CCSID) is defined by the following variable
    D SrcCodePage     s             10u 0
     * 2-The buffer to receive the converted string is addressed by the following pointer
    D TgtPointer      s               *
     *  -its maximum length is defined by the following variable
    D TgtMaxLen       s             10u 0
     *  -the variable to receive its computed length is
    D TgtLen          s             10u 0
     *  -and the target code page is defined in
    D TgtCodePage     s             10u 0
     * Define two more variables to save/restore the two string pointers
    D SrcPointerSav   s               *
    D TgtPointerSav   s               *
    
     *In order to call the subprocedure CvtStg you need the following prototype:
    D CVTSTG          pr                 
    D                               10u 0
    D                                 *  
    D                               10u 0
    D                               10u 0
    D                                 *  
    D                               10u 0
    D                               10u 0
     
     *This is how you perform the string conversion:
    C                   eval      SrcPointerSav=SrcPointer                 
    C                   eval      TgtPointerSav=TgtPointer                 
    C                   callp     CvtStg(SrcCodePage:SrcPointer:SrcLen:
    C                             TgtCodePage:TgtPointer:TgtMaxLen:TgtLen)
    C                   eval      SrcPointer=SrcPointerSav                 
    C                   eval      TgtPointer=TgtPointerSav
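    The save/restore lines in the RPG sample suggest that CvtStg, like iconv itself, moves the two pointers forward as it converts (an inference from the sample, not verified against the CVT101 source). Under that assumption, the following hypothetical C function `cvt_stg` mirrors the parameter order of the RPG prototype, with charset names in place of CCSIDs because the mapping from CCSIDs to iconv names is platform-specific.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical C counterpart of CvtStg: same parameter order as the RPG
 * prototype (source code page, pointer, length; target code page,
 * pointer, maximum length, computed length), but with charset names
 * instead of CCSIDs. Like iconv(), it advances *src and *tgt past the
 * data it converts. Stores the converted length in *tgt_len and returns
 * 0 on success, -1 on error (including a target buffer that is too small). */
int cvt_stg(const char *src_cs, char **src, size_t src_len,
            const char *tgt_cs, char **tgt, size_t tgt_max, size_t *tgt_len)
{
    iconv_t cd = iconv_open(tgt_cs, src_cs);
    if (cd == (iconv_t)-1)
        return -1;
    size_t in_left = src_len;                /* like SrcLen, counted down */
    size_t out_left = tgt_max;               /* like TgtMaxLen, counted down */
    size_t rc = iconv(cd, src, &in_left, tgt, &out_left);
    iconv_close(cd);
    *tgt_len = tgt_max - out_left;           /* computed length, like TgtLen */
    return rc == (size_t)-1 ? -1 : 0;
}
```

    A caller follows the same pattern as the RPG sample: copy the source and target pointers, call cvt_stg, then reset both pointers to the saved copies before reading the converted data from the start of the target buffer.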
    


  • Command StmfCvt - Convert a stream file
    This CVT101 command converts a stream file from one code page to another:
                             Convert a stream file (STMFCVT)
                                                                                    
     Type choices and press Enter.
    
     Source stream file . . . . . . . SRCSTMF                                       
                     
     Target stream file . . . . . . . TGTSTMF                                       
                     
     Target code page . . . . . . . . TGTCODEPAG    819    Number, *JOB, *UNICODE...
    
     Display target stream file . . . DSPTGT        *NO
    
    
    Note 1- The code page of the source stream file does not have to be specified.
    Note 2- The target stream file can be the same as the source stream file.
    Note 3- The target code page can be specified as
    • a number lower than 65535 (e.g. 819 for ISO 8859-1 ASCII, 1208 for Unicode UTF-8)
    • *JOB to use the job CCSID as target code page
    • *UNICODE to mean code page 1208
    • *ASCII to mean code page 819
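    The behavior of STMFCVT can be approximated on a generic POSIX system with the following C sketch; `stmf_cvt` and `read_all` are hypothetical names. IBM i stream files carry a CCSID attribute, which is why Note 1 lets the source code page go unspecified; a plain POSIX file has no such tag, so the sketch takes the source charset as a parameter. Reading the whole source into memory before opening the target for writing is what makes an in-place conversion (Note 2) safe.

```c
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file as raw bytes. Returns a malloc'd buffer and its
 * size in *len, or NULL on failure. */
static char *read_all(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t cap = 4096, n = 0;
    char *buf = malloc(cap);
    while (buf != NULL) {
        n += fread(buf + n, 1, cap - n, f);
        if (n < cap) {                       /* short read: end of file or error */
            if (ferror(f)) {
                free(buf);
                buf = NULL;
            }
            break;
        }
        cap *= 2;                            /* buffer full: grow and keep reading */
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) {
            free(buf);
            buf = NULL;
            break;
        }
        buf = bigger;
    }
    fclose(f);
    if (buf != NULL)
        *len = n;
    return buf;
}

/* Hypothetical portable analogue of STMFCVT. Unlike the CL command, the
 * source charset must be given explicitly. Returns 0 on success, -1 on error. */
int stmf_cvt(const char *src_path, const char *src_cs,
             const char *tgt_path, const char *tgt_cs)
{
    size_t in_len = 0;
    char *in = read_all(src_path, &in_len);
    if (in == NULL)
        return -1;

    size_t out_max = 4 * in_len + 4;         /* generous bound, see note below */
    char *out = malloc(out_max);
    iconv_t cd = iconv_open(tgt_cs, src_cs);
    int rc = -1;
    if (out != NULL && cd != (iconv_t)-1) {
        char *inp = in, *outp = out;
        size_t in_left = in_len, out_left = out_max;
        if (iconv(cd, &inp, &in_left, &outp, &out_left) != (size_t)-1) {
            /* The whole source is already in memory, so the target may
               be the same file as the source. */
            FILE *f = fopen(tgt_path, "wb");
            if (f != NULL) {
                size_t out_len = out_max - out_left;
                if (fwrite(out, 1, out_len, f) == out_len)
                    rc = 0;
                if (fclose(f) != 0)
                    rc = -1;
            }
        }
    }
    if (cd != (iconv_t)-1)
        iconv_close(cd);
    free(out);
    free(in);
    return rc;
}
```

    The allowance of 4 output bytes per input byte (plus 4 for a possible byte-order mark) covers conversions among single-byte code pages, UTF-8, UTF-16 and UTF-32; if a conversion needs more, iconv stops with E2BIG and the function returns -1 instead of writing a truncated file.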

  • Command DirCvt - Convert all the stream files within a directory
    This CVT101 command converts all the stream files in a given directory to another code page:
                          Convert all STMF's in a dir (DIRCVT)
                                                                                    
     Type choices and press Enter.                                           
                                                                                    
     Source directory . . . . . . . . SRCDIR                                       
                     
     Target directory . . . . . . . . TGTDIR                                       
                     
     Target code page . . . . . . . . TGTCODEPAG    819    Number, *JOB, *UNICODE...
    
    
    
    Note 4- The stream files in the directory can have different code pages.
    Note 5- The target directory can be the same as the source directory.
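    DIRCVT applies the same conversion to every stream file in a directory. The sketch below, with the hypothetical names `dir_cvt`, `file_cvt_fn` and `dry_run`, walks a directory with the POSIX opendir/readdir functions and hands each regular file to a caller-supplied conversion callback. Because the callback receives each path, it can choose a different source code page per file, as Note 4 requires; the `dry_run` callback only lists the files it would convert.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Per-file conversion step: receives the source and target paths plus a
 * caller context, and returns 0 on success. */
typedef int (*file_cvt_fn)(const char *src_path, const char *tgt_path, void *ctx);

/* Hypothetical analogue of DIRCVT: applies cvt to every regular file in
 * src_dir, writing to the same file name in tgt_dir (which may be equal
 * to src_dir). Subdirectories are skipped. Returns the number of failed
 * files, or -1 if src_dir cannot be opened. */
int dir_cvt(const char *src_dir, const char *tgt_dir, file_cvt_fn cvt, void *ctx)
{
    DIR *d = opendir(src_dir);
    if (d == NULL)
        return -1;
    int failures = 0;
    char src[4096], tgt[4096];
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        int a = snprintf(src, sizeof src, "%s/%s", src_dir, e->d_name);
        int b = snprintf(tgt, sizeof tgt, "%s/%s", tgt_dir, e->d_name);
        if (a < 0 || b < 0 || (size_t)a >= sizeof src || (size_t)b >= sizeof tgt) {
            failures++;                      /* path too long to build */
            continue;
        }
        struct stat st;
        if (stat(src, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                        /* ".", "..", subdirectories, etc. */
        if (cvt(src, tgt, ctx) != 0)
            failures++;
    }
    closedir(d);
    return failures;
}

/* Example callback: a dry run that only reports what would be converted
 * and counts the files in the int that ctx points to. */
int dry_run(const char *src_path, const char *tgt_path, void *ctx)
{
    printf("%s -> %s\n", src_path, tgt_path);
    ++*(int *)ctx;
    return 0;
}
```

    To perform a real conversion, the callback would wrap a file converter (for instance one based on iconv), and passing the same directory as source and target gives the in-place case of Note 5.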

