Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
Unicode can be implemented by different character encodings. The Unicode standard defines UTF-8, UTF-16, and UTF-32, and several other encodings are in use.
The main difference between traditional character sets (examples: US-ASCII, ISO-8859-1) and Unicode encodings (example: UTF-8) is in the way
they encode each character and the number of bits they use for it. ASCII originally used seven bits to encode each character.
In contrast, Unicode offers encoding forms built on 8-, 16-, or 32-bit code units (UTF-8, UTF-16, UTF-32); in UTF-8 and UTF-16 a single character may occupy more than one code unit.
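The difference in storage size can be seen with a short sketch. Python is used here only for illustration and is not part of CVT101; the codec names `ascii`, `latin-1`, `utf-8`, `utf-16-le` and `utf-32-le` are Python's names for the encodings discussed above.

```python
# Illustration only: how many bytes one character needs in several encodings.
ENCODINGS = ("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch):
    """Return {encoding name: byte count} for a single character.

    The byte count is None when the character does not exist in that
    character set (for example, the euro sign in ISO-8859-1).
    """
    sizes = {}
    for enc in ENCODINGS:
        try:
            sizes[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None
    return sizes


if __name__ == "__main__":
    # "A", e-acute (U+00E9) and the euro sign (U+20AC)
    for ch in ("A", "\u00e9", "\u20ac"):
        print(repr(ch), encoded_sizes(ch))
```

The single-byte character sets either store a character in one byte or cannot store it at all, while UTF-8 grows from one to several bytes as needed.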
On IBM-i a stream file can easily be converted from one CCSID to another (example: from CCSID 819 US-ASCII to CCSID 37 English EBCDIC).
Converting a stream file from a regular CCSID (like 37 or 819) to a Unicode CCSID (like 1208 UTF-8) is not as easy,
nor is it straightforward to read a Unicode stream file from an RPG program.
This is why we provide some tools to convert character strings or stream files to or from Unicode UTF-8 (CCSID 1208).
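To make the idea of a CCSID-to-CCSID conversion concrete, here is a minimal off-platform sketch in Python. It is not how the CVT101 tools are implemented. The table `CCSID_TO_CODEC` and the function `convert_bytes` are hypothetical names, and the pairing of CCSIDs 819, 37 and 1208 with the Python codecs `latin-1`, `cp037` and `utf-8` is an assumption made for illustration.

```python
# Assumed pairing of IBM i CCSIDs with Python codec names (illustration only).
CCSID_TO_CODEC = {
    819: "latin-1",   # the "ASCII" code page used in this document
    37: "cp037",      # English EBCDIC
    1208: "utf-8",    # Unicode UTF-8
}


def convert_bytes(data, src_ccsid, tgt_ccsid):
    """Convert a byte string from one code page to another.

    The bytes are first decoded into characters using the source code page,
    then re-encoded using the target code page.
    """
    text = data.decode(CCSID_TO_CODEC[src_ccsid])
    return text.encode(CCSID_TO_CODEC[tgt_ccsid])


if __name__ == "__main__":
    ascii_bytes = b"Hello"
    ebcdic_bytes = convert_bytes(ascii_bytes, 819, 37)
    utf8_bytes = convert_bytes(ebcdic_bytes, 37, 1208)
    print(ascii_bytes.hex(), ebcdic_bytes.hex(), utf8_bytes.hex())
```

The same characters are stored as different byte values in CCSID 819 and CCSID 37, which is why data must be converted before a program expecting the other code page can read it.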
- Working with Unicode stream files
When a stream file with a non-Unicode CCSID is opened for reading with the open flag O_TEXTDATA (in a job whose CCSID is not 65535), its data is automatically converted to the job CCSID, so the program reading it can interpret the data without problems.
By contrast, opening a Unicode stream file with the open flag O_TEXTDATA results in an exception.
Because of this, a Unicode stream file should be converted to a "regular" CCSID in order to be read by a program.
This is why some conversion tools have been made available in CVT101.
Note that these tools have been fully tested only for UTF-8 (CCSID 1208).
These conversion tools are all based on the iconv API, the IBM-i system interface that converts strings from one CCSID to another, Unicode included.
- The Iconv API
According to Wikipedia,
"the iconv API is the standard programming interface for converting character strings from one character encoding to another in Unix-like operating systems."
The iconv API was made available on iSeries with release V5R2.
The iconv API is well suited to converting from the Unicode code page (1208) to any other code page or CCSID, including that of the running job, and it works equally well in the other direction.
The only problem is that it is not an easy API to deal with.
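One reason buffer-oriented conversion is awkward is that a multi-byte UTF-8 character can be split across two input buffers, so the converter has to carry state from one call to the next. The sketch below models that situation in Python with the standard `codecs` incremental decoder. It does not call iconv itself, and `convert_chunks` is a hypothetical helper name.

```python
import codecs


def convert_chunks(chunks, src_codec="utf-8", tgt_codec="latin-1"):
    """Convert a sequence of byte buffers from one encoding to another.

    An incremental decoder keeps any incomplete multi-byte sequence found
    at the end of one buffer and completes it with the next buffer.
    """
    decoder = codecs.getincrementaldecoder(src_codec)()
    out = bytearray()
    for chunk in chunks:
        out += decoder.decode(chunk).encode(tgt_codec)
    # final=True raises UnicodeDecodeError if the input ended in the
    # middle of a character.
    out += decoder.decode(b"", final=True).encode(tgt_codec)
    return bytes(out)


if __name__ == "__main__":
    # "café" in UTF-8, with the two bytes of "é" split across two buffers
    print(convert_chunks([b"caf\xc3", b"\xa9"]))
```

Converting each buffer independently would fail on the first buffer, because its last byte is only half of a character.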
- The CvtStg (Convert String) subprocedure
Library CVT101 contains a small service program CVT101/CVT101.
This service program provides a Convert String (CvtStg) subprocedure that converts a string from one code page (CCSID) to another
through the iconv API.
The following example shows how to use the CvtStg subprocedure:
      * Assume that:
      * 1-The string to be converted is in a buffer addressed by this pointer
     D SrcPointer      s               *
      *   -its length is held in this variable
     D SrcLen          s             10u 0
      *   -its code page (CCSID) is held in this variable
     D SrcCodePage     s             10u 0
      * 2-The buffer receiving the converted string is addressed by this pointer
     D TgtPointer      s               *
      *   -its maximum length is held in this variable
     D TgtMaxLen       s             10u 0
      *   -the computed length of the converted string is returned here
     D TgtLen          s             10u 0
      *   -the target code page (CCSID) is held in this variable
     D TgtCodePage     s             10u 0
      * Define two more variables to save/restore the two string pointers
     D SrcPointerSav   s               *
     D TgtPointerSav   s               *
      * In order to call the subprocedure CvtStg you need this prototype:
     D CVTSTG          pr
     D                               10u 0
     D                                 *
     D                               10u 0
     D                               10u 0
     D                                 *
     D                               10u 0
     D                               10u 0
      * This is how you perform the string conversion:
     C                   eval      SrcPointerSav=SrcPointer
     C                   eval      TgtPointerSav=TgtPointer
     C                   callp     CvtStg(SrcCodePage:SrcPointer:SrcLen:
     C                             TgtCodePage:TgtPointer:TgtMaxLen:TgtLen)
     C                   eval      SrcPointer=SrcPointerSav
     C                   eval      TgtPointer=TgtPointerSav
- Command StmfCvt - Convert a stream file
This CVT101 command converts a stream file from one code page to another:
Convert a stream file (STMFCVT)
Type choices and press Enter.
Source stream file . . . . . . . SRCSTMF
Target stream file . . . . . . . TGTSTMF
Target code page . . . . . . . . TGTCODEPAG 819 Number, *JOB, *UNICODE...
Display target stream file . . . DSPTGT *NO
Note 1- The code page of the source stream file does not have to be specified.
Note 2- The target stream file could be the same as the source stream file.
Note 3- The target code page can be specified as
- a number lower than 65535 (e.g. 819 for U.S. ASCII, 1208 for Unicode)
- *JOB to use the job CCSID as target code page
- *UNICODE to mean code page 1208
- *ASCII to mean code page 819
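The rules in Note 3 can be summarized in a short sketch. It only models how the special values map to code page numbers and is not the command's actual implementation. The function name `resolve_target_code_page`, the `job_ccsid` parameter, the case-insensitive matching and the lower bound of 1 are all assumptions made for illustration.

```python
# Special values named in Note 3 and the code pages they stand for.
SPECIAL_VALUES = {"*UNICODE": 1208, "*ASCII": 819}


def resolve_target_code_page(value, job_ccsid):
    """Return the numeric code page for a TGTCODEPAG-style value.

    value     -- a number, a numeric string, or *JOB / *UNICODE / *ASCII
    job_ccsid -- CCSID of the current job (hypothetical parameter; on the
                 real system the command would obtain it itself)
    """
    if isinstance(value, str):
        key = value.strip().upper()   # assumption: case does not matter
        if key == "*JOB":
            return job_ccsid
        if key in SPECIAL_VALUES:
            return SPECIAL_VALUES[key]
        value = int(key)              # ValueError for unknown keywords
    # Note 3 requires a number lower than 65535; the lower bound is assumed.
    if not 0 < value < 65535:
        raise ValueError("code page must be a number lower than 65535")
    return value


if __name__ == "__main__":
    for v in ("*JOB", "*UNICODE", "*ASCII", 819):
        print(v, "->", resolve_target_code_page(v, job_ccsid=37))
```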
- Command DirCvt - Convert all the stream files within a directory
This CVT101 command converts all the stream files in a given directory to another code page:
Convert all STMF's in a dir (DIRCVT)
Type choices and press Enter.
Source directory . . . . . . . . SRCDIR
Target directory . . . . . . . . TGTDIR
Target code page . . . . . . . . TGTCODEPAG 819 Number, *JOB, *UNICODE...
Note 4- The stream files in the directory could have different code pages.
Note 5- The target directory could be the same as the source directory.