
DDEX’s flat-file standards, such as the DSR and CDM standards, require that each Record in a message created in accordance with the relevant standard contains exactly the number of Cells defined in that standard, even if some of those Cells are empty. If, for example, the standard requires that a Record of type XX01 has six Cells, but the sender of the message only wishes to communicate data in the first, second and fourth Cells, the Record must still contain six Cells separated by five Cell Delimiters (i.e. “tab” characters, U+0009).

This means that, for this example, the following data must be sent (the right arrow character → represents a Cell Delimiter and the ⏎ character represents the Record Delimiter):

XX01→DataForCell2→→DataForCell4→→⏎
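The padding rule in this example can be sketched in a few lines of Python. This is only an illustration, not part of any DDEX tool: the function name build_record is hypothetical, and the line feed used as Record Delimiter is an assumption that should be checked against the relevant standard.

```python
CELL_DELIMITER = "\t"    # tab character, U+0009
RECORD_DELIMITER = "\n"  # assumption: line feed; check the relevant standard


def build_record(cells_by_position, cell_count):
    """Return one Record that contains exactly cell_count Cells.

    cells_by_position maps 1-based Cell positions to values.
    Positions without a value become empty Cells.
    """
    cells = [cells_by_position.get(i, "") for i in range(1, cell_count + 1)]
    return CELL_DELIMITER.join(cells) + RECORD_DELIMITER


# The XX01 example: six Cells, data only in the first, second and fourth.
record = build_record({1: "XX01", 2: "DataForCell2", 4: "DataForCell4"}, 6)
print(repr(record))
```

Joining six Cells always yields five Cell Delimiters, whether or not the trailing Cells carry data.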

However, many spreadsheet applications do not export tab-separated value (TSV) files in the form required by the DSR and CDM standards. Instead, they usually omit the empty Cells at the end of a Record altogether, that is, the Cells to the right of the last Cell that contains data.

Alternatively, some spreadsheet applications add extra empty Cells at the end of a Record, after the last Cell that contains data. Neither of these has any consequence for the recipient of the message when the TSV file is imported into another spreadsheet application. However, such messages may not be processed correctly by ingestors created in accordance with the DSR and CDM standards, because some ingestors expect the correct number of Cell Delimiters in every Record.

Perl script: dsr_cdm_length.pl

The Perl script dsr_cdm_length.pl is a simple tool that enables the sender of a DSR or CDM message to export data from a spreadsheet application and then ensure that the message has the correct number of Cell Delimiters in every Record, regardless of whether data follows each Cell Delimiter.

Perl is a simple but powerful scripting language that is natively available on most Unix-like systems, for example Linux or macOS. Perl can also be easily installed on Windows machines from perl.org.

The tool can be called using a command line prompt on Unix and Windows machines as follows:

dsr_cdm_length.pl input.tsv output.tsv

When started, the script looks at each Record in the input file and writes the same Record to the output file, but with the correct number of Cell Delimiters for that Record.
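The per-Record correction described above might look roughly like the following Python sketch. It is not the Perl script's actual implementation. The lookup table EXPECTED_CELLS and the function normalise_record are hypothetical names, and only the XX01 entry (six Cells) is taken from the example on this page; the real Cell counts are defined in the DSR and CDM standards.

```python
# Hypothetical lookup table: Record type -> number of Cells.
# Only "XX01" (six Cells) comes from the example on this page.
EXPECTED_CELLS = {"XX01": 6}


def normalise_record(line, expected_cells=EXPECTED_CELLS):
    """Return the Record with the expected number of Cells, without line ending."""
    cells = line.rstrip("\r\n").split("\t")
    expected = expected_cells.get(cells[0])
    if expected is None:
        # Unknown Record type: leave the Record unchanged.
        return "\t".join(cells)
    # Drop surplus empty Cells that a spreadsheet added at the end.
    while len(cells) > expected and cells[-1] == "":
        cells.pop()
    # Restore empty Cells that a spreadsheet omitted at the end.
    cells.extend([""] * (expected - len(cells)))
    return "\t".join(cells)


truncated = "XX01\tDataForCell2\t\tDataForCell4\n"
print(repr(normalise_record(truncated)))
```

In this sketch, surplus Cells that contain data are deliberately left in place, because removing them would discard information; such Records would need manual review.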



The output file name can also be omitted when calling the script. In that case, the script creates an output file name based on the input file name.



The script ensures conformance with Part 8 in Version 1.4 (and later) of the DSR standard as well as Part 2 in Version 1.0.1 (and later) of the CDM standard. A recipient of a message created in accordance with the DSR or CDM standard can also use the script to check that every Record in the message has the correct number of Cells as defined in the relevant standard.
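A recipient-side check of this kind could be sketched in Python as follows. The function find_cell_count_errors and the sample data are hypothetical, and the expected Cell counts must be taken from the relevant standard; only the six Cells for XX01 come from the example on this page.

```python
def find_cell_count_errors(lines, expected_cells):
    """Return (line number, Record type, Cells found, Cells expected) per mismatch."""
    errors = []
    for number, line in enumerate(lines, start=1):
        cells = line.rstrip("\r\n").split("\t")
        expected = expected_cells.get(cells[0])
        # Record types missing from the table are skipped, not reported.
        if expected is not None and len(cells) != expected:
            errors.append((number, cells[0], len(cells), expected))
    return errors


sample = [
    "XX01\tDataForCell2\t\tDataForCell4\t\t\n",  # six Cells: correct
    "XX01\tDataForCell2\t\tDataForCell4\n",      # four Cells: truncated
]
errors = find_cell_count_errors(sample, {"XX01": 6})
print(errors)
```

Reporting line numbers alongside the counts makes it easier to locate the offending Records in the original file.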

Download the tool

dsr_cdm_length.pl

 
