HTML-to-RTF conversion

One of the essential PD4ML features is the creation of RTF documents from styled HTML templates.

Although RTF is an old and well-standardized format, few viewers implement all of its features. For example, on the macOS platform, tables look corrupted (like a series of plain text paragraphs) and images do not show up at all in the rendered layout. MS Word is probably the most feature-rich RTF viewer/editor, and PD4ML's RTF output is primarily targeted at it.

PD4ML can convert the following HTML features to RTF:

  • Page margins
  • Text styles and fonts
  • Text backgrounds
  • Text indentation
  • Ordered and unordered lists (also in right-to-left Arabic and Hebrew direction)
  • Tables, including correctly nested tables, with column and row spans, table and cell backgrounds, and cell paddings. Border style (width) is currently not supported.
  • Images
  • Hyperlinks (external and internal), image hyperlinks
  • Headers and footers, including a separate header and footer for the title page
  • Forced page breaks
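For orientation, several of these features correspond to plain RTF control words: `\margt`/`\margl`/`\margb`/`\margr` for page margins (in twips), `\titlepg` with `\headerf` for a separate title-page header, a `HYPERLINK` field for links and `\page` for a forced page break. The following is a hand-written sketch, not PD4ML's implementation or output; the class name, texts and margin values are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RtfSkeleton {
    // RTF measures lengths in twips: 1440 twips per inch, 25.4 mm per inch
    static int mmToTwips(double mm) {
        return (int) Math.round(mm * 1440 / 25.4);
    }

    // Escape the characters that are special in RTF text: backslash and braces.
    // Non-ASCII text would also need RTF Unicode escapes, which this sketch omits.
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}");
    }

    static String build(String url) {
        StringBuilder rtf = new StringBuilder();
        rtf.append("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\n");
        // Document-level page margins: top 10 mm, left 20 mm, bottom 10 mm, right 10 mm
        rtf.append("\\margt").append(mmToTwips(10))
           .append("\\margl").append(mmToTwips(20))
           .append("\\margb").append(mmToTwips(10))
           .append("\\margr").append(mmToTwips(10)).append('\n');
        // titlepg gives the first page its own header (headerf); other pages use header
        rtf.append("\\sectd\\titlepg{\\headerf\\pard Title page header\\par}");
        rtf.append("{\\header\\pard Running header\\par}\n");
        // A hyperlink is a HYPERLINK field: instruction (target) plus displayed result
        rtf.append("\\pard\\f0\\fs24 See {\\field{\\*\\fldinst HYPERLINK \"")
           .append(escape(url)).append("\"}{\\fldrslt\\ul ")
           .append(escape(url)).append("}}.\\par\n");
        // The page control word forces a page break
        rtf.append("\\page\n\\pard Second page\\par\n}");
        return rtf.toString();
    }

    public static void main(String[] args) throws Exception {
        Path out = Path.of(args.length > 0 ? args[0] : "skeleton.rtf");
        Files.writeString(out, build("https://pd4ml.com/"), StandardCharsets.US_ASCII);
        System.out.println("wrote " + out);
    }
}
```

Opening the generated file in MS Word is a quick way to compare these control words with what PD4ML produces for the same HTML.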

Among the supported features, nested tables deserve special attention. Creating nested tables is not a trivial task even in WYSIWYG editors, but PD4ML handles them reliably.

Tables serve another purpose as well. In RTF, it is not possible to define a border around a paragraph or assign a background color to it at the block level. PD4ML therefore implicitly transforms such paragraphs into single-cell RTF tables, which makes the missing features available. With this approach, support for nested tables becomes critical, since tables can appear in places the document author does not expect.
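The single-cell table workaround can be sketched in raw RTF as follows. This is a hypothetical helper, not PD4ML code: it wraps one paragraph into a one-row, one-cell table with a border on all four sides (`\clbrdrt`, `\clbrdrl`, `\clbrdrb`, `\clbrdrr`) and a cell background taken from the color table (`\clcbpat1`). The widths (9000 twips for the cell, 15 twips for the border) and the color are arbitrary example values.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SingleCellTable {
    // Wraps one paragraph of RTF text into a table with a single row and a single cell.
    // The cell gets a single-line border on all four sides and the background color
    // stored as entry 1 of the color table.
    static String boxedParagraph(String text, int widthTwips, int borderTwips) {
        String border = "\\brdrs\\brdrw" + borderTwips;   // single line, width in twips
        return "\\trowd\\trgaph108"                        // new row; half the gap between cells
            + "\\clbrdrt" + border + "\\clbrdrl" + border
            + "\\clbrdrb" + border + "\\clbrdrr" + border
            + "\\clcbpat1"                                 // cell background: color table entry 1
            + "\\cellx" + widthTwips + "\n"                // right boundary of the only cell
            + "\\pard\\intbl " + text + "\\cell\\row\n"   // cell content, end of cell and row
            + "\\pard\n";                                  // reset paragraph properties, leaving the table
    }

    static String document(String body, int red, int green, int blue) {
        // The color table starts with ';' (entry 0 = auto), so the RGB color is entry 1
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}"
            + "{\\colortbl;\\red" + red + "\\green" + green + "\\blue" + blue + ";}\n"
            + body + "}";
    }

    public static void main(String[] args) throws Exception {
        String rtf = document(
            boxedParagraph("A paragraph with a border and a background.", 9000, 15)
                + "\\pard Regular paragraph\\par\n",
            255, 255, 204);
        Files.writeString(Path.of("boxed.rtf"), rtf, StandardCharsets.US_ASCII);
        System.out.println(rtf);
    }
}
```

Because the wrapped paragraph becomes a table, any table already inside it ends up nested, which is why nested table support matters for this approach.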

The HTML-to-RTF conversion can be triggered by the following API calls.

With the com.pd4ml API:

// read and parse HTML
pd4ml.readHTML(inputStream);
boolean convertImagesToWmf = true; // 'true' improves compatibility but increases the resulting file size
pd4ml.writeRTF(outputStream, convertImagesToWmf);

With the org.zefer.pd4ml API:

pd4ml.outputFormat(PD4Constants.RTF);
// or, to convert images to WMF:
pd4ml.outputFormat(PD4Constants.RTF_WMF);
pd4ml.render(inputStream, outputStream);

The equivalents in the JSP taglib (the tag prefix depends on the taglib in use):

<pd4tl:transform ... outputFormat="rtf"> ... </pd4tl:transform>
<pd4tl:transform ... outputFormat="rtfwmf"> ... </pd4tl:transform>

<pd4ml:transform ... outputFormat="rtf"> ... </pd4ml:transform>
<pd4ml:transform ... outputFormat="rtfwmf"> ... </pd4ml:transform>

In this case, the transform tag automatically sets the corresponding Content-Type HTTP header (application/rtf).

In the command line tool (launched either as an executable JAR or through the Pd4Cmd class, depending on the PD4ML version):

java -Xmx512m -Djava.awt.headless=true -jar ./pd4ml.jar <URL> 1200 -out doc.rtf -outformat rtf
java -Xmx512m -Djava.awt.headless=true -jar ./pd4ml.jar <URL> 1200 -out doc.rtf -outformat rtfwmf

java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd <URL> 1200 -out doc.rtf -outformat rtf
java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd <URL> 1200 -out doc.rtf -outformat rtfwmf
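When the command line tool is launched from another Java program, the arguments can be handed to `ProcessBuilder` as a list, one element per argument, which avoids shell quoting problems with URLs. A minimal sketch, assuming `pd4ml.jar` is in the working directory and `java` is on the `PATH`; the class and method names are hypothetical, and only the flags from the executable-JAR commands above are used.

```java
import java.io.IOException;
import java.util.List;

public class RtfCli {
    // Builds the argument list for the executable-JAR form of the command line tool.
    // Each argument is a separate list element, so no shell quoting is needed.
    static List<String> command(String url, int width, String outFile, boolean wmf) {
        return List.of(
            "java", "-Xmx512m", "-Djava.awt.headless=true",
            "-jar", "./pd4ml.jar", url, Integer.toString(width),
            "-out", outFile, "-outformat", wmf ? "rtfwmf" : "rtf");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = command("https://pd4ml.com/i/rtf/demo.htm", 1200, "doc.rtf", false);
        // inheritIO() forwards the tool's console output to this process
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        int exitCode = process.waitFor();
        System.out.println("pd4ml exited with code " + exitCode);
    }
}
```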

The only difference between RTF and RTF_WMF lies in the embedded images. In RTF mode, images are embedded "as is" (PNG, JPEG etc.). In RTF_WMF mode, all images are converted to WMF format for compatibility with WordPad.exe. The drawback of this improved compatibility is a significantly larger output file.
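For reference, embedding an image "as is" in RTF means writing a `\pict` group whose blip keyword names the original format (`\pngblip`, `\jpegblip`), followed by the raw image bytes as hexadecimal text; that encoding alone roughly doubles the size of the image data. The helper below is a hypothetical sketch, not PD4ML's code; it recognizes only PNG and JPEG by their magic bytes and assumes 96 dpi (15 twips per pixel) when computing the display size.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RtfPict {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Picks the RTF blip type from the file's magic bytes (PNG: 89 'P' 'N' 'G', JPEG: FF D8)
    static String blipType(byte[] img) {
        if (img.length >= 4 && (img[0] & 0xFF) == 0x89
                && img[1] == 'P' && img[2] == 'N' && img[3] == 'G') {
            return "\\pngblip";
        }
        if (img.length >= 2 && (img[0] & 0xFF) == 0xFF && (img[1] & 0xFF) == 0xD8) {
            return "\\jpegblip";
        }
        throw new IllegalArgumentException("only PNG and JPEG are handled in this sketch");
    }

    // Builds a pict group that embeds the image bytes unchanged, hex-encoded.
    // picw/pich carry the pixel size; picwgoal/pichgoal the display size in twips.
    static String pict(byte[] img, int widthPx, int heightPx) {
        StringBuilder sb = new StringBuilder("{\\pict").append(blipType(img))
            .append("\\picw").append(widthPx).append("\\pich").append(heightPx)
            .append("\\picwgoal").append(widthPx * 15)
            .append("\\pichgoal").append(heightPx * 15).append('\n');
        for (int i = 0; i < img.length; i++) {
            int b = img[i] & 0xFF;
            sb.append(HEX[b >>> 4]).append(HEX[b & 0x0F]);  // two hex digits per byte
            if (i % 64 == 63) {
                sb.append('\n');  // line breaks inside hex data are ignored by RTF readers
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RtfPict image.png widthPx heightPx out.rtf
        byte[] img = Files.readAllBytes(Path.of(args[0]));
        String rtf = "{\\rtf1\\ansi\n\\pard "
            + pict(img, Integer.parseInt(args[1]), Integer.parseInt(args[2])) + "\\par\n}";
        Files.writeString(Path.of(args[3]), rtf, StandardCharsets.US_ASCII);
        System.out.printf("%d image bytes -> %d RTF characters%n", img.length, rtf.length());
    }
}
```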

Full converter Java application examples:

Using the com.pd4ml API:

package samples;

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.security.InvalidParameterException;

import com.pd4ml.Dimensions.Units;
import com.pd4ml.PD4ML;
import com.pd4ml.PageMargins;
import com.pd4ml.PageSize;

public class GettingStarted2 {
	protected int topValue = 10;
	protected int leftValue = 20;
	protected int rightValue = 10;
	protected int bottomValue = 10;
	protected int userSpaceWidth = 1300;

	public static void main(String[] args) {
		try {
			GettingStarted2 jt = new GettingStarted2();
			jt.doConversion("https://pd4ml.com/i/rtf/demo.htm", "c:/invoice.rtf");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public void doConversion( String url, String outputPath ) 
				throws InvalidParameterException, MalformedURLException, IOException {
		File output = new File(outputPath);

		PD4ML pd4ml = new PD4ML();

		pd4ml.setHtmlWidth(userSpaceWidth); // set frame width of "virtual web browser"

		// choose target paper format and "rotate" it to landscape orientation
		pd4ml.setPageSize(PageSize.A4.rotate());

		// define page margins
		pd4ml.setPageMargins(new PageMargins(topValue, leftValue, bottomValue, rightValue, Units.MM));

		// read and parse HTML
		pd4ml.readHTML(new URL(url));

		// try-with-resources closes the output file even if the conversion fails
		try (java.io.FileOutputStream fos = new java.io.FileOutputStream(output)) {
			boolean convertImagesToWmf = false; // embed images "as is" (PNG, JPEG etc.)
			pd4ml.writeRTF(fos, convertImagesToWmf); // actual document conversion from URL to RTF file
		}

		System.out.println( outputPath + "\ndone." );
	}
}

Using the org.zefer.pd4ml API:

package samples;

import java.awt.Insets;
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.security.InvalidParameterException;

import org.zefer.pd4ml.PD4Constants;
import org.zefer.pd4ml.PD4ML;

public class GettingStarted2 {
	protected int topValue = 10;
	protected int leftValue = 20;
	protected int rightValue = 10;
	protected int bottomValue = 10;
	protected int userSpaceWidth = 1300;

	public static void main(String[] args) {
		try {
			GettingStarted2 jt = new GettingStarted2();
			jt.doConversion("https://pd4ml.com/i/rtf/demo.htm", "c:/invoice.rtf");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public void doConversion( String url, String outputPath ) 
				throws InvalidParameterException, MalformedURLException, IOException {
		File output = new File(outputPath);

		PD4ML pd4ml = new PD4ML();

		pd4ml.setHtmlWidth(userSpaceWidth); // set frame width of "virtual web browser"

		// choose target paper format and "rotate" it to landscape orientation
		pd4ml.setPageSize(pd4ml.changePageOrientation(PD4Constants.A4));

		// define page margins in millimeters
		pd4ml.setPageInsetsMM(new Insets(topValue, leftValue, bottomValue, rightValue));

		// generate RTF (with images converted to WMF) instead of PDF
		pd4ml.outputFormat(PD4Constants.RTF_WMF);

		// try-with-resources closes the output file even if the conversion fails
		try (java.io.FileOutputStream fos = new java.io.FileOutputStream(output)) {
			pd4ml.render(new URL(url), fos); // actual document conversion from URL to RTF file
		}

		System.out.println( outputPath + "\ndone." );
	}
}

RTF conversion samples: