Variable data jobs are increasingly used in some workflows. Variable data printing jobs usually have relatively large areas of the page that remain constant or are repeated over multiple pages, with small areas, such as text, changing on each page. Time savings can be made by processing the constant areas only once, especially if the constant areas are complex or large graphic objects. This is the idea behind the Harlequin VariData (HVD) feature. The RIP detects constant areas within a PDF file, retains them, and then re-uses them as necessary.

Any PDF file whose pages share raster elements and have marks that change from page to page should be accelerated by this optimization in the RIP. The RIP scans the PDF for such pages, RIPs the shared raster elements once, and then retains them for use on subsequent pages with the same raster elements.

HVD intelligently identifies graphical elements and groups of graphical elements that are used together multiple times. In doing so it can make use of the "hint" attributes defined in ISO 16612-2 (PDF/VT). Specifically, GTS_Encapsulated and GTS_XID are used, even if the file is a baseline PDF and not PDF/VT. Inclusion of those keys in a PDF file that is being created for variable data printing will likely increase the HVD scan speed.

Harlequin VariData modes

HVD has two modes of operation:

HVD internal mode (iHVD) is where the combination of cached and uncached elements to form the final page raster is performed within the RIP. This mode supports one shared background per page, and one variable element composed over the top of the background. By design, iHVD is more restricted in which marks it can cache. Hence, eHVD and iHVD scans may identify different combinations of graphical elements for caching.
In HVD external mode (eHVD), fixed and variable elements are provided to a client built into the RIP skin, along with metadata defining how to reassemble these elements into final pages. HVD can cache and compose a page from any number of rasters in external mode. In addition, it can cope with imposed flats where several images and text layers are placed on top of each other. In general, external HVD is faster than internal HVD, because it can decompose the variable data job into smaller elements, which can be cached more effectively.

External HVD has two sub-types: position-independent and non-position-independent eHVD. These are explained in eHVD elements and backgrounds. Position-independent eHVD allows any single cached element to be used at multiple x,y offsets on the page. Its use leads to increased efficiency, particularly for certain classes of VDP jobs such as those containing multiple coupons in lots of different permutations from page to page. Position-independence is especially valuable:
- When a page layout "flexes" with the specific data included in each instance. For example, in a direct mail piece where graphics and images are moved down the page if one recipient's address is longer than others.
- When many instances of a direct mail piece or label are imposed together into a single PDF "page" representing an imposed sheet, and where significant graphics on each imposed instance are selected based on the recipient's metadata and therefore appear pseudo-randomly laid out when the sheets as a whole are viewed.
When using position-independent HVD, you should note the following:
- By default, /OptimizedPDFIgnorePatternPhase is set to false, meaning that the presence of a pattern in the job being scanned amends processing within the RIP if position-independent HVD is enabled. The rasters and events are still output in the format expected for the value of the OptimizedPDFPositionIndependent flag, but multiple instances of the same graphic or collection of graphics at different phase offsets relative to the pixel grid are treated as different and are rendered separately.
To treat these as the same and hence potentially improve processing speed, set /OptimizedPDFIgnorePatternPhase to true.
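
As a sketch, position-independent eHVD with pattern phase ignored might be configured as follows. This assumes that /OptimizedPDFIgnorePatternPhase is set through setpdfparams in the same way as the other OptimizedPDF keys shown in this section. GGDUMB1 is the diagnostic cache ID and is used here only as a placeholder for the cache ID of the raster backend in use.

```postscript
<<
  /EnableOptimizedPDFScan true
  /OptimizedPDFExternal true
  /OptimizedPDFPositionIndependent true
  % Assumption: this key is a PDF parameter like the keys above.
  % true treats instances at different pattern phase offsets as the same.
  /OptimizedPDFIgnorePatternPhase true
  % Placeholder: diagnostic cache ID, replace for production use.
  /OptimizedPDFCacheID (GGDUMB1)
>> setpdfparams
```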

HVD external mode is the same as previously described by Global Graphics as "ERR2". Some source code files, configurations and page features still refer to ERR2.

HVD documentation

The Harlequin Extensions Manual describes the configuration commands to activate and control HVD.
Harlequin Technical Note Hqn101: Harlequin VariData hint tags describes the hints that can be included in an optimized PDF or PDF/VT file to further enhance HVD behavior.

Raster back ends that work with HVD

HVD in internal mode works with all Harlequin Core raster back ends. In external mode the raster back end needs to explicitly handle the eHVD handshake (except when using a diagnostic value of OptimizedPDFCacheID).

In the "clrip" application, the raster backends that support HVD external mode are:

HVDNONE: This backend is implemented in hvdrast.c. It uses a /OptimizedPDFCacheID value of GG_HHR_HVDNONE_ERR2 or GG_HHR_HVDNONE_SHM_ERR2. HVDNONE discards the raster data. The difference between the cache ID values is that GG_HHR_HVDNONE_ERR2 saves the cached element data in process memory framebuffers, whereas GG_HHR_HVDNONE_SHM_ERR2 saves the cached element data in shared memory framebuffers before discarding them. When using the Scalable RIP, GG_HHR_HVDNONE_SHM_ERR2 may be able to share some element rasters between different Farm RIPs.

HVDRAW: This backend is implemented in hvdrast.c. It uses a /OptimizedPDFCacheID value of GG_HHR_HVDRAW_ERR2 or GG_HHR_HVDRAW_SHM_ERR2. HVDRAW generates a raw file for each element, named <id>.raw, and an XML file containing the page and element info, named <job>.pages.xml. The difference between the cache ID values is that GG_HHR_HVDRAW_ERR2 saves the cached element data in process memory framebuffers, whereas GG_HHR_HVDRAW_SHM_ERR2 saves the cached element data in shared memory framebuffers. When using the Scalable RIP, GG_HHR_HVDRAW_SHM_ERR2 may be able to share some element rasters between different Farm RIPs. For more information on the format of raw files, see the /RAW raster backend.

DEMOTIFF, LIBTIFF, LIBTIFFPS: These backends are implemented in libtiffrast.c. All of these variants of the TIFF output backend use the /OptimizedPDFCacheID value of GG_HHR_LIBTIFF_ERR2.

ASYNCTIFF: This backend is implemented in asynctiffrast.c. This variant of the TIFF output backend uses the /OptimizedPDFCacheID value of GG_HHR_ASYNCTIFF_ERR2.

FRAMETIFF: This backend is implemented in frametiffrast.c. This variant of the TIFF output backend uses the /OptimizedPDFCacheID values of GG_HHR_FRAMETIFF_ERR2 or GG_HHR_FRAMETIFF_SHM_ERR2. The difference between the cache ID values is that GG_HHR_FRAMETIFF_ERR2 saves the cached element data in process memory framebuffers, whereas GG_HHR_FRAMETIFF_SHM_ERR2 saves the cached element data in shared memory framebuffers. When using the Scalable RIP, GG_HHR_FRAMETIFF_SHM_ERR2 may be able to share some element rasters between different Farm RIPs.
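
As an illustration of how a cache ID from the list above pairs with external mode, the following sketch selects the process-memory variant for the HVDRAW backend. It assumes the cache ID is passed as a string, as the GGDUMB1 examples elsewhere in this section do, and that the raster backend itself is selected separately (not shown). The supplied HVDRaw page feature remains the authoritative configuration.

```postscript
<<
  /EnableOptimizedPDFScan true
  /OptimizedPDFExternal true
  % Cache ID for the HVDRAW backend, process memory framebuffers.
  % Assumption: given as a string, like (GGDUMB1) in other examples.
  /OptimizedPDFCacheID (GG_HHR_HVDRAW_ERR2)
>> setpdfparams
```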

HVD example configurations

Four example page features that turn on HVD optimization are provided with Harlequin Core, all of them found in the SW/Page Features directory:

HVDInternal enables the internal HVD mode, which can be used with any raster output backend.
HVDNone is to be used with the HVDNONE example raster backend, discarding output data.
HVDRaw is to be used with the HVDRAW example backend, delivering raw raster data and metadata.
HVDDemo, which can be used with any raster backend to demonstrate how pages are deconstructed by HVD.

See the comments in each page feature for more detail.

The OptimizedPDFCacheID usually needs to be the appropriate string for the raster backend in use; the exception is the GGDUMB1 cache ID, which can be used with any raster backend to demonstrate the elements that the page is constructed from.

eHVD and ContoneMask

HVD external mode usually needs to use ContoneMask for masking, when the raster backend is programmed to handle it. This shifts the color values in the output raster, so that the client composing the raster elements can detect whether or not a pixel was touched when rendering elements. If the raster backend is programmed appropriately, you may want to add the following to your configuration or a Page Feature:

<< /ContoneMask 1 >> setpagedevice

HVD incompatibilities

HVD and TrapPro are mutually exclusive. If an attempt is made to enable them both at the same time, HVD is turned off with the warning:

%%[ Warning: TrapPro enabled disabling Harlequin VariData ]%%.

Likewise, RLE output and HVD are incompatible and turning both on at once disables HVD with the warning:

%%[ Warning: RLE output enabled disabling Harlequin VariData ]%%.

HVD auto mode

HVD auto mode detects when a PDF job was created by a known variable-data application, and can automatically enable HVD for these jobs. You can use /EnableOptimizedPDFScan as a tri-state parameter, setting it to /Always, /Never or /Auto while also retaining the boolean options of true and false for backwards compatibility: where /Always is the same as true and /Never is the same as false.

Both internal and external HVD optimizations can benefit from running in auto mode. The use of the auto HVD optimization may require an update to your RIP license.

An example of configuring for HVD auto mode for external HVD:

<<
  /EnableOptimizedPDFScan /Auto
  /OptimizedPDFScanLimitPercent 50
  /OptimizedPDFExternal true
  /OptimizedPDFCacheID (GGDUMB1)
>> setpdfparams
<<
  /ContoneMask 1
>> setpagedevice

An example for internal mode:

<<
  /EnableOptimizedPDFScan /Auto
  /OptimizedPDFScanLimitPercent 50
  /OptimizedPDFCacheID (GGIRR)
>> setpdfparams

A more complete example is included in the page feature file SW/Page Features/HVDInternal.

When in auto mode a procset called /HqnHVDParams invokes a procedure /SetFromInfoAndMetadata, which sets /EnableOptimizedPDFScan based on any of the following:

PDF/VT tag in the metadata dictionary
Creator or Producer key in the info dictionary
CreatorTool or Producer key in metadata dictionary

When set to auto mode and a PDF file is submitted, the RIP:

Scans the document-level metadata and determines whether the file is tagged as a PDF/VT file. If so, it processes the file as if /EnableOptimizedPDFScan had been set to true.
Looks at the producer and creator strings in the document info dictionary and their equivalents in the document-level metadata and compares those with values in a lookup table of strings used by common VDP composition tools. If the strings match, the file is processed as if /EnableOptimizedPDFScan had been set to /Always; if they don't, it acts as if /EnableOptimizedPDFScan had been set to /Never.

The producer/creator look-up table is available in a PostScript language file called SW/Usr/HqnVariableDataCreators. This file can be edited to add extra creators or names of procedures. If extra names of procedures are added they must be defined in the /HqnHVDParams procset. This PostScript language file returns a dictionary. The keys of the dictionary are names of procedures for matching the known variable data creator strings to the value for creator or producer found in the metadata or info dictionary. Corresponding values of the dictionary are arrays of strings of known variable data creators.

The SW/Usr/HqnVariableDataCreators file is read from the /HqnHVDParams procset. Safety code is provided which produces a warning for incorrect stack handling or type of returns.
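
The shape of the dictionary described above can be sketched as follows. This is not the content of the shipped file: the procedure name and the creator strings are hypothetical placeholders, and any procedure name used as a key would need to be defined in the /HqnHVDParams procset.

```postscript
% Sketch only: the file leaves a single dictionary on the operand stack.
<<
  % Key: name of a matching procedure (hypothetical name; it must be
  % defined in the /HqnHVDParams procset).
  % Value: array of known variable data creator strings (placeholders).
  /ExampleMatchProc [
    (Example VDP Composer)
    (Another Example Tool)
  ]
>>
```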

Warning: Metadata dictionaries in PDF files can have several different formats, and not all formats are currently supported. For example, abbreviated XML is not supported, and metadata that has been compressed or encrypted cannot be parsed by the RIP.

You can set the variable /HvdParamsDebug at the top of the /HqnHVDParams procset to true to view extra debug information.

Warning: Setting /EnableOptimizedPDFScan to /Always, /Auto or true implies that the rest of the RIP configuration is appropriate for use with HVD. This means, for instance, that if you are using simple imposition you should turn HVD off.

Using HVD with ranges of pages

A large job can be split into "chunks" of data with the use of /PageRange. Here, for example, the job is split into chunks of 10 pages:

/PDFContext (%E%//TestJobs/largejob.pdf) (r) file << >> pdfopen def
  PDFContext << /PageRange [ [1 10] ] >> pdfexecid
  PDFContext << /PageRange [ [11 20] ] >> pdfexecid
  PDFContext << /PageRange [ [21 30] ] >> pdfexecid
  PDFContext << /PageRange [ [31 40] ] >> pdfexecid
  PDFContext << /PageRange [ [41 50] ] >> pdfexecid
PDFContext pdfclose

While running this PostScript language fragment in an HVD setup, if, for example, during the first page range (1 to 10) some variable data is retained for re-use but the scan is aborted during a subsequent range, the scan for variable data is aborted for the rest of the job. Thus, if you are using small chunks of data and are seeing jobs aborting the HVD scan when you think there should be re-use of data, you should increase the /OptimizedPDFScanLimitPercent value, possibly up to the maximum of 100%, in which case the HVD scan continues for the whole job.

If you are writing a PostScript language control stream that needs to execute chunks from different PDF files you should call pdfclose on the first PDF file before calling pdfexecid on a chunk from the second to ensure that HVD scanning is triggered for the second file.
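
A minimal sketch of such a control stream follows, using the same operators as the page range example above. The file names and the names FirstPDF and SecondPDF are hypothetical.

```postscript
% First PDF file: run a chunk, then close the context.
/FirstPDF (%E%//TestJobs/first.pdf) (r) file << >> pdfopen def
  FirstPDF << /PageRange [ [1 10] ] >> pdfexecid
FirstPDF pdfclose

% Second PDF file: opened and executed only after the first is closed,
% so that HVD scanning is triggered for this file.
/SecondPDF (%E%//TestJobs/second.pdf) (r) file << >> pdfopen def
  SecondPDF << /PageRange [ [1 10] ] >> pdfexecid
SecondPDF pdfclose
```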

eHVD diagnostic modes

Some diagnostic modes are available to determine RIP behavior when using HVD. These modes should not be used in production, but can be useful when trying to determine why a job behaves in a particular way. There are several Harlequin-internal values for the /OptimizedPDFCacheID PDF parameter that can be used for diagnosing HVD issues when /OptimizedPDFExternal is set to true:

GGDUMB1: Each raster element identified by the HVD scanner is output exactly once. This can be used in conjunction with any raster output backend, such as TIFF or None, to emulate the entire RIP output for a given job, assuming that no purges of the back end cache took place.

GGDUMB0: No raster elements are output. That is, only the HVD scan is performed on the job. This is useful if the only item of interest is the RIP monitor output messages about the scan.

GGVARONLY: This mode outputs only the variable data elements, i.e., those with a hit count of exactly 1.

GGCACHEONLY: This mode outputs only the cached elements, i.e., those with a hit count of two or greater.

The following example PostScript language code turns HVD on and selects a diagnostic mode to output each raster element exactly once:

<<
  /EnableOptimizedPDFScan true
  /OptimizedPDFCacheID (GGDUMB1)
  /OptimizedPDFExternal true
>> setpdfparams

For Harlequin Core, see also the supplied HVDDemo example page feature.

Harlequin VariData and the Scalable RIP

The Scalable RIP can be configured to use HVD. When using HVD, it is important to realize that each job is split up into chunks, and the chunks are farmed out to separate RIPs for interpretation and rendering. HVD scans the start of each job it sees to determine whether there is enough repeated content to be worthwhile caching. If not enough content is repeated, HVD disables caching for the rest of the job. In the Scalable RIP, HVD scanning and re-use is performed on each Farm RIP independently. When sending page ranges from a job to a Farm RIP, the Scalable RIP keeps the job context open on the Farm RIP if it had previously run a page range from the same Scalable RIP job.

When using HVD with the Scalable RIP, this means that:

The chunk size must be large enough for HVD to be able to detect re-used content within the first page range of a job.
The HVD scan limit percentage must be set so that HVD is likely to detect re-used content within the first page range of a job.

If HVD does not detect enough content re-use within the first page range of a job, it will disable re-use not just for the first page range, but for all subsequent page ranges of the job sent to the same Farm RIP.

The default chunk size that the Scalable RIP uses to split PDF jobs is 1, which will prevent HVD from working with iHVD and non-position-independent eHVD. For some common PDF/VT job types, HVD can be turned on automatically if the job is likely to benefit, and the chunk size set to a different value (50 in the following example) by using the AutoHVDChunkSize configuration in your configuration file or a page feature:

50 /HqnScalableRIP /ProcSet findresource /AutoHVDChunkSize get exec

This example utilizes the same functionality outlined in the Extensions Manual; for more information see Auto mode for HVD.

The configuration or page feature using /AutoHVDChunkSize also needs to set up the HVD cache ID, any other parameters except for /EnableOptimizedPDFScan, and any license key required for HVD.

The HVD scan limit percentage is configured using the /OptimizedPDFScanLimitPercent PDF parameter. The default value for this is 10 (i.e., up to 10% of the job will be scanned for re-use). For use with the Scalable RIP, this will be the percentage of the first page range encountered, so a much higher percentage is appropriate. To scan the entire submitted page range for re-use, this parameter should be set in the configuration or a page feature to 100%:

<<
  /OptimizedPDFScanLimitPercent 100
  /OptimizedPDFExternal true
  /OptimizedPDFCacheID (GGDUMB1)
  /OptimizedPDFPositionIndependent true
>> setpdfparams

The chunk size can be set explicitly by using the DefaultPageChunkSize key in the global configuration file, but this has the disadvantage that it will affect all jobs (not just variable data jobs), reducing the load-balancing capability of the Scalable RIP for small non-variable data jobs. The chunk size can be set for each job separately by setting a parameter on the internal Scalable RIP device. An alternate method of configuring the chunk size for variable data jobs separately from non-variable data jobs is to set the /SetChunkSize parameter using the Scalable RIP procset, thus adding this to your configuration:

50 /HqnScalableRIP /ProcSet findresource /SetChunkSize get exec

This will set the chunk size for the current configuration to 50 pages. This configuration option can also be added in a page feature.
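
Putting the chunk size and scan limit settings together, a configuration for the Scalable RIP might look like the following sketch. It combines only fragments already shown in this section. GGDUMB1 is the diagnostic cache ID and stands in for the cache ID of the raster backend in use, and the chunk size of 50 is an example value rather than a recommendation.

```postscript
% Set the chunk size for this configuration to 50 pages.
50 /HqnScalableRIP /ProcSet findresource /SetChunkSize get exec

<<
  /EnableOptimizedPDFScan true
  % Scan the whole of the first page range for re-use.
  /OptimizedPDFScanLimitPercent 100
  /OptimizedPDFExternal true
  % Placeholder: diagnostic cache ID, replace for production use.
  /OptimizedPDFCacheID (GGDUMB1)
>> setpdfparams
```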

APIs and support code for HVD

As well as supporting eHVD directly in some raster output backends, the Harlequin RIP core library contains a library to help integrate eHVD and support functions in the SDK to simplify enabling eHVD in raster backends.

You may also want to implement your own clients of the eHVD event API, especially if you have hardware that supports composing of multiple rasters.