Simulated Dendrite: Corrupted PDF Generation in ABAP

PDF generation in ABAP is simple, and a lot of tutorials show how to develop an application that generates a PDF and saves it on the back-end or on the front-end. However, from time to time, the generated PDF is corrupted. This occurred for several clients. This note describes the problem and provides a solution.

Problem

Generating a PDF is reasonably easy with ABAP, be it from a smartform or from Adobe Document Services. However, sometimes the generated PDF cannot be opened with Adobe Reader (some other readers may still open it). The reader displays a message indicating that the PDF is corrupted.

If you look at the PDF using an application such as Notepad++, you may notice that at the end of the file, the PDF is closed by the text %%EOF, followed by some NULL characters.

Workaround

Using the same Notepad++ to remove the trailing NULLs gives a file that is no longer corrupted.

Explanation

If you look at the PDF specification (ISO 32000-1:2008), the following text helps explain the problem (quoted from a Stack Overflow answer on the subject):

7.5.5. File Trailer

The trailer of a PDF file enables a conforming reader to quickly find the cross-reference table and certain special objects. Conforming readers should read a PDF file from its end. The last line of the file shall contain only the end-of-file marker, %%EOF.

In fact, previous versions of Adobe Reader were a bit more lax on this point, and allowed the %%EOF marker to be followed by some data. Since this could be used nefariously, this tolerance has been partly removed.

Experimental tests show that if there are more than 1000 NULLs after the %%EOF marker, the PDF is considered invalid.

That raises the question: why are there trailing NULLs? The easy answer is that you put them there...

Each time a customer had the problem, we found code similar to the following, used to save the data on the ECC instance:

OPEN DATASET w_file FOR OUTPUT IN BINARY MODE.
LOOP AT i_pdf INTO s_tab.
   TRANSFER s_tab TO w_file. " always writes the full 1024-byte line
ENDLOOP.
CLOSE DATASET w_file.

However, S_TAB is a line of 1024 raw bytes, so you are writing data to the file in blocks of 1024 bytes. If the %%EOF marker is at the beginning of your last block, the rest of that block is filled with NULLs, and those NULLs are written to the file as well.
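To make the padding concrete, here is a small arithmetic sketch. The PDF size of 2500 bytes is a made-up value chosen only for illustration; with it, three lines of 1024 bytes are written, the file ends up 3072 bytes long, and 572 of those bytes are trailing NULLs.

```abap
REPORT z_pdf_padding_demo.

CONSTANTS c_line_len TYPE i VALUE 1024. " width of one table line, as in the text

DATA: pdf_size  TYPE i VALUE 2500, " hypothetical PDF size in bytes
      line_cnt  TYPE i,
      file_size TYPE i,
      padding   TYPE i.

" Number of 1024-byte lines needed to hold the PDF (rounded up).
line_cnt  = ( pdf_size + c_line_len - 1 ) DIV c_line_len.
" Writing every line in full produces this many bytes on disk.
file_size = line_cnt * c_line_len.
" Everything beyond the real PDF content is NULL padding.
padding   = file_size - pdf_size.

WRITE: / 'Lines:', line_cnt,
       / 'File size:', file_size,
       / 'Trailing NULLs:', padding.
```

The closer the real size is to a multiple of 1024, the fewer NULLs are appended, which may explain why the corruption only shows up from time to time.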

Solution

What you should do is write the data in blocks of 1024 bytes, except for the last line. For this last line, you should find where the %%EOF marker ends and write only the data up to that point:

DATA: c_eof      TYPE x LENGTH 5 VALUE '2525454F46', " %%EOF in hexadecimal
      table_size TYPE i,
      position   TYPE i,
      linesize   TYPE i.

table_size = lines( i_pdf ). " index of the last line

OPEN DATASET w_file FOR OUTPUT IN BINARY MODE.
LOOP AT i_pdf INTO s_tab.
   IF sy-tabix EQ table_size. " last line
      FIND FIRST OCCURRENCE OF c_eof IN s_tab
         IN BYTE MODE MATCH OFFSET position.
      IF sy-subrc EQ 0. " if found (should be the case)
         linesize = position + 6. " size of %%EOF + size of LF
      ELSE. " not found: probably split
         linesize = 512.
      ENDIF.
   ELSE. " not the last line
      linesize = 1024.
   ENDIF.
   TRANSFER s_tab TO w_file LENGTH linesize.
ENDLOOP.
CLOSE DATASET w_file.

The special case is when the marker cannot be found on the last line: it is probably split between this line and the previous one. In this case we write only 512 bytes (to stay under the 1000-byte limit).

>> This solution has yet to generate a corrupted PDF.
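As a possible alternative, if the exact length of the PDF in bytes is already known, the length of the last line can be computed instead of searched for. The sketch below is not the code used at the clients: the variable pdf_size, the line type ty_line and the file path are assumptions made for illustration, and i_pdf is assumed to be filled by whatever produced the PDF.

```abap
REPORT z_pdf_save_by_size.

TYPES ty_line TYPE x LENGTH 1024. " assumed line type: 1024 raw bytes

DATA: i_pdf     TYPE STANDARD TABLE OF ty_line, " PDF lines, assumed filled elsewhere
      s_tab     TYPE ty_line,
      w_file    TYPE string VALUE '/tmp/demo.pdf', " hypothetical server path
      pdf_size  TYPE i, " exact PDF length in bytes, assumed known
      remaining TYPE i,
      linesize  TYPE i.

remaining = pdf_size.

OPEN DATASET w_file FOR OUTPUT IN BINARY MODE.
IF sy-subrc <> 0.
  MESSAGE 'Could not open the file' TYPE 'E'.
ENDIF.

LOOP AT i_pdf INTO s_tab.
  IF remaining <= 0.
    EXIT. " all real PDF bytes have been written
  ENDIF.
  " Full lines are written whole; the last one is cut at the real end.
  IF remaining < 1024.
    linesize = remaining.
  ELSE.
    linesize = 1024.
  ENDIF.
  TRANSFER s_tab TO w_file LENGTH linesize.
  remaining = remaining - linesize.
ENDLOOP.

CLOSE DATASET w_file.
```

This variant avoids the split-marker case entirely, but it only works when the size is reliable; when it is not available, searching for %%EOF as shown above remains the fallback.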

Front-end download

If you are saving the PDF on the front-end (instead of the back-end) as a binary file, you may have the same problem if you don't give the size of the binary file. In this case, since the data structure is the same, the file size will be a multiple of 1024.
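A minimal sketch of a front-end download that passes the size could look like this. It assumes the download goes through cl_gui_frontend_services=>gui_download; the variable pdf_size, the line type and the target path are hypothetical, and i_pdf is assumed to be filled beforehand.

```abap
REPORT z_pdf_frontend_download.

TYPES ty_line TYPE x LENGTH 1024. " assumed line type: 1024 raw bytes

DATA: i_pdf    TYPE STANDARD TABLE OF ty_line, " PDF lines, assumed filled elsewhere
      pdf_size TYPE i, " exact PDF length in bytes, assumed known
      w_path   TYPE string VALUE 'C:\temp\demo.pdf'. " hypothetical target path

" BIN_FILESIZE tells the download how many bytes are real content,
" so the padding of the last 1024-byte line is not written.
cl_gui_frontend_services=>gui_download(
  EXPORTING
    bin_filesize = pdf_size
    filename     = w_path
    filetype     = 'BIN'
  CHANGING
    data_tab     = i_pdf
  EXCEPTIONS
    OTHERS       = 1 ).

IF sy-subrc <> 0.
  MESSAGE 'Download failed' TYPE 'E'.
ENDIF.
```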

10 February 2017
