PDF generation in ABAP is simple, and many tutorials show how to develop an application that generates a PDF and saves it on the back end or on the front end. However, from time to time, the generated PDF is corrupted. This has occurred for several clients. This note describes the problem and provides a solution.
Problem
Generating a PDF is reasonably easy with ABAP, whether from a Smart Form or from Adobe Document Services. However, sometimes the generated PDF cannot be opened with Adobe Reader (some other readers may still open it). The reader displays a message indicating that the PDF is corrupted.
If you look at the PDF using an application such as Notepad++, you may notice that the file ends with the text %%EOF, followed by some NULL characters.
Workaround
Removing the trailing NULLs with the same Notepad++ gives a file that is no longer corrupted.
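The manual clean-up can also be scripted on the application server. The following is only a minimal sketch, assuming the file is small enough to be read into memory in one piece. The report name, the path in w_file and the lv_/lt_/ls_ variables are hypothetical. Note that it also drops the line feed that may follow %%EOF.

```abap
REPORT z_pdf_trim_sketch.

CONSTANTS c_eof TYPE x LENGTH 5 VALUE '2525454F46'. " %%EOF in hexadecimal

DATA: w_file   TYPE string VALUE '/tmp/corrupted.pdf', " hypothetical path
      lv_pdf   TYPE xstring,
      lt_match TYPE match_result_tab,
      ls_match TYPE match_result,
      lv_last  TYPE i,
      lv_keep  TYPE i,
      lv_extra TYPE i.

* In binary mode, reading into an xstring fetches the whole file.
OPEN DATASET w_file FOR INPUT IN BINARY MODE.
IF sy-subrc <> 0.
  RETURN. " file could not be opened
ENDIF.
READ DATASET w_file INTO lv_pdf.
CLOSE DATASET w_file.

* Locate every %%EOF marker and keep the last one.
FIND ALL OCCURRENCES OF c_eof IN lv_pdf IN BYTE MODE RESULTS lt_match.
IF sy-subrc <> 0.
  RETURN. " no marker at all: leave the file untouched
ENDIF.
lv_last = lines( lt_match ).
READ TABLE lt_match INTO ls_match INDEX lv_last.

lv_keep  = ls_match-offset + ls_match-length. " bytes up to the end of %%EOF
lv_extra = xstrlen( lv_pdf ) - lv_keep.       " everything after the marker
WRITE: / 'Bytes after the last %%EOF:', lv_extra.

* Rewrite the file without anything after the marker.
lv_pdf = lv_pdf(lv_keep).
OPEN DATASET w_file FOR OUTPUT IN BINARY MODE.
TRANSFER lv_pdf TO w_file.
CLOSE DATASET w_file.
```

Like the Notepad++ edit, this only repairs a file that has already been written. It does not remove the cause.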
Explanation
The PDF specification (ISO 32000-1:2008) contains the following text, which helps explain the problem (quoted from a Stack Overflow answer on the subject):
7.5.5. File Trailer
The trailer of a PDF file enables a conforming reader to quickly find the cross-reference table and certain special objects. Conforming readers should read a PDF file from its end. The last line of the file shall contain only the end-of-file marker, %%EOF.
In fact, previous versions of Adobe Reader were a bit more lax on this point and allowed the %%EOF marker to be followed by some data. Since this could be used nefariously, this tolerance has been partly removed.
Experimental tests show that if there are more than 1000 NULLs after the %%EOF marker, the PDF is considered invalid.
Then comes the question: why are there trailing NULLs? The easy answer is that you put them there...
Each time a customer had the problem, we found code similar to the following, used to save the data on the ECC instance:
OPEN DATASET w_file FOR OUTPUT IN BINARY MODE.
LOOP AT i_pdf INTO s_tab.
  TRANSFER s_tab TO w_file. " always writes the full 1024-byte line
ENDLOOP.
CLOSE DATASET w_file.
However, S_TAB is a line of 1024 raw bytes. You are adding data to the file in blocks of 1024 bytes. If the %%EOF marker is at the beginning of your last block, the rest of that block consists of NULLs, and they are written to the file along with the real data.
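The arithmetic behind the padding can be sketched as follows. The PDF size of 2053 bytes is purely illustrative: it was chosen so that the five bytes of %%EOF are the first bytes of the third block. The report and variable names are hypothetical.

```abap
REPORT z_pdf_padding_sketch.

CONSTANTS c_block TYPE i VALUE 1024.  " width of one line of the binary table

DATA: lv_pdf_size TYPE i VALUE 2053,  " illustrative size: 2 full blocks + %%EOF
      lv_blocks   TYPE i,
      lv_written  TYPE i,
      lv_padding  TYPE i.

lv_blocks  = ( lv_pdf_size + c_block - 1 ) DIV c_block. " lines needed for the PDF
lv_written = lv_blocks * c_block.       " bytes written by a plain TRANSFER loop
lv_padding = lv_written - lv_pdf_size.  " NULL bytes behind the real data

WRITE: / 'Blocks:', lv_blocks,
       / 'Bytes written:', lv_written,
       / 'Trailing NULLs:', lv_padding.
```

With these numbers the loop writes 3 blocks, that is 3072 bytes, and leaves 1019 trailing NULLs, which is above the limit of roughly 1000 observed in the tests.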
Solution
What you should do is write the data in blocks of 1024 bytes, except for the last line. For this last line, find where the %%EOF marker ends and write only the data up to that point:
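DATA: c_eof      TYPE x LENGTH 5 VALUE '2525454F46', " %%EOF in hexadecimal
      table_size TYPE i,
      position   TYPE i,
      linesize   TYPE i.

table_size = lines( i_pdf ).
OPEN DATASET w_file FOR OUTPUT IN BINARY MODE.
LOOP AT i_pdf INTO s_tab.
  IF sy-tabix EQ table_size. " last line
    FIND FIRST OCCURRENCE OF c_eof IN s_tab
         IN BYTE MODE MATCH OFFSET position.
    IF sy-subrc EQ 0. " marker found (the usual case)
      linesize = position + 6. " 5 bytes for %%EOF + 1 byte for the LF
      IF linesize > 1024. " marker ends the block, no LF behind it
        linesize = 1024.
      ENDIF.
    ELSE. " not found: probably split across two lines
      linesize = 512.
    ENDIF.
  ELSE. " not the last line
    linesize = 1024.
  ENDIF.
  TRANSFER s_tab TO w_file LENGTH linesize.
ENDLOOP.
CLOSE DATASET w_file.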
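The special case is when the marker cannot be found on the last line: it is then probably split between this line and the previous one. In that case we write only 512 bytes, which keeps the number of trailing NULLs under the 1000 limit.
>> This solution has yet to generate a corrupted PDF.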
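The loop above searches for the marker because it only has the table of 1024-byte lines to work with. If the step that fills i_pdf also reports the exact size of the PDF in bytes, a different technique is possible: count the bytes down and shorten only the last write. The sketch below assumes such a value is available in lv_pdf_size. That variable, the report name, the type ty_block and the file path are hypothetical and not part of the original code.

```abap
REPORT z_pdf_write_sketch.

TYPES ty_block TYPE x LENGTH 1024.

DATA: i_pdf        TYPE STANDARD TABLE OF ty_block WITH DEFAULT KEY,
      s_tab        TYPE ty_block,
      w_file       TYPE string VALUE '/tmp/example.pdf', " hypothetical path
      lv_pdf_size  TYPE i, " assumption: exact PDF length in bytes
      lv_remaining TYPE i,
      lv_len       TYPE i.

* i_pdf and lv_pdf_size are assumed to be filled by the PDF conversion step.

lv_remaining = lv_pdf_size.
OPEN DATASET w_file FOR OUTPUT IN BINARY MODE.
IF sy-subrc <> 0.
  RETURN. " file could not be opened
ENDIF.
LOOP AT i_pdf INTO s_tab.
  IF lv_remaining <= 0.
    EXIT. " all real bytes have been written
  ENDIF.
  IF lv_remaining < 1024.
    lv_len = lv_remaining. " last block: write only the real bytes
  ELSE.
    lv_len = 1024.
  ENDIF.
  TRANSFER s_tab TO w_file LENGTH lv_len.
  lv_remaining = lv_remaining - lv_len.
ENDLOOP.
CLOSE DATASET w_file.
```

This variant does not depend on where %%EOF falls inside the last block, but it is only as reliable as the size value it is given.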