Technical Specifications for ETL Process to Re-encrypt the Doc Header Content Field and the Maintenance Document Content Field.
Overview
The Kuali database contains 3 tables that each have a field of type CLOB, where the CLOB contains encrypted XML content. These tables are:
KR schema: KREW_DOC_HDR_CNTNT_T
KR schema: KRNS_MAINT_DOC_T
KFS schema: KRNS_MAINT_DOC_T
Note that the KRNS_MAINT_DOC_T table in the KR schema is relatively small, but is included here for completeness.
The KREW_DOC_HDR_CNTNT_T table in the KR schema and the KRNS_MAINT_DOC_T in the KFS schema are very large and each contain an encrypted column. The re-encryption process is too time-consuming to process the entire table every time we refresh a database. Therefore, we need to maintain a copy where the encrypted column is encrypted with the dev key. This will allow a ready to use table for the refresh process.
In order to minimize processing time, we will keep an updated table. Every night we will add the rows that have been added or modified to the production table. Only the modified and new rows will need to be encrypted each night.
Details
- Develop a process that maintains a copy of each table (krew_doc_hdr_cntnt_t and krns_maint_doc_t tables).
A base line copy of the table will be created. The ETL process will run each night and update the copy with any new records that have been added to the production version.
The krew_doc_hdr_cntnt_t table exists in the kr_prod schema and the krns_maint_doc_t table exists in both the kfs_prod and kr_prod schemas.
The process will include a hook to call a re-encryption function. The re-encrypt function should be called to transform the doc_cntnt_txt field before storing the value in the copy table. This function is not yet developed. The ETL process should run with or without the call to the re-encrypt function.
The key to both the krew_doc_hdr_cntnt_t and krns_maint_doc_t tables is doc_hdr_id. The only other field is doc_cntnt_txt, or doc_cntnt, which is an encrypted string of type CLOB. Since the encrypted value will be different in the copy table it will be inefficient to compare the production table with the copy table. In order to make comparisons easier, we discussed maintaining a third table. This table will be updated exactly the same as the copy table, but without re-encrypting the doc_contnt_txt field. For example:
Table A – production table.
Table B – copy of production table.
Table C – re-encrypted table.
The delta of A and B are the days changes. The delta is applied directly (without re-encrypting) to B. The same delta is applied to C, but with re-encrypting.
2. A re-encryption function must be created.
This will be an Oracle function that calls a new java class, Reencrypt.java. This class will utilize the existing EncryptionService.decrypt method, and also a new class DevEncryptionService to first decrypt, then encrypt and return a new encrypted value.
A re-encrypt function will call this class passing a string.
There is one unresolved problem. That is the CLOB fields that are longer than 4000 characters cannot be converted to varchar2 in SQL Developer. This may just be a problem with the tool. There is no issue when done through java using OJB or the Spring library.