6.2.3 Delimiters

6.2.3.1 Record Delimiter

Each file making up a BWARM data feed shall be divided into individual Records, with each Record placed on one line terminated by a line feed (Unicode U+000A) or a carriage return and line feed pair (Unicode U+000D U+000A).
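As an illustration only, a recipient could split the text of a file into Records along the following lines. This is a minimal sketch, not part of the standard: the function name split_records is hypothetical, and the sketch assumes the file has already been decoded into a text string.

```python
from typing import List


def split_records(text: str) -> List[str]:
    """Split decoded file text into Records on LF or CRLF terminators."""
    records = text.split("\n")
    # A terminated final Record leaves one empty trailing piece; drop it.
    if records and records[-1] == "":
        records.pop()
    # Remove the carriage return left behind by a CRLF terminator.
    return [r[:-1] if r.endswith("\r") else r for r in records]
```

The sketch deliberately avoids str.splitlines(), which also breaks lines on characters other than the two terminators named in this clause.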

6.2.3.2 Primary Delimiter

Cells within a Record are separated by tab characters (Unicode U+0009). The files defined in this standard are therefore TSV files and have a .tsv file extension.

6.2.3.3 Secondary Delimiter

Should a single Cell contain two or more data elements, these data elements shall be separated by a pipe character (Unicode U+007C). All data elements in a multi-value Cell shall be of the same primitive data type (see Clause 6.2.4).

6.2.3.4 Namespace Delimiter

Should a Cell contain a data element whose origin needs to be provided, the data element shall be preceded by a string that provides a "namespace" and two colon characters (Unicode U+003A).

For example, a party identifier can be communicated as ISNI::0000000081266409, indicating that the identifier (0000000081266409) is an International Standard Name Identifier (ISNI).

The sender of the BWARM data feed should ensure that the recipient can, for each specific namespace, ingest data in this form.
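The namespace form described above could be read by a recipient roughly as follows. This is a hedged sketch: the function name split_namespace is hypothetical, and the sketch assumes that the first occurrence of two colon characters marks the end of the namespace.

```python
from typing import Optional, Tuple


def split_namespace(element: str) -> Tuple[Optional[str], str]:
    """Return (namespace, value); namespace is None when no '::' is present."""
    namespace, delimiter, value = element.partition("::")
    if not delimiter:
        # No namespace delimiter: the whole data element is the value.
        return None, element
    return namespace, value
```

For instance, split_namespace("ISNI::0000000081266409") yields the namespace ISNI and the identifier 0000000081266409.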

6.2.3.5 Spaces and delimiters

Delimiters shall not be surrounded by extra space characters.

For example, the writer pair Lennon/McCartney should be communicated as Lennon|McCartney and not as Lennon⎵|⎵McCartney.

6.2.3.6 Received spaces and delimiters

If a sender has received data with extra white space characters, the sender is encouraged to trim them when compiling a file created in accordance with this standard. For example, if the sender received data with the writers Lennon as Lennon⎵ and McCartney as McCartney⎵, then the writer pair should be communicated by the sender as Lennon|McCartney.

However, it is also permitted, for a sender that received data with the writers Lennon as “Lennon⎵” and McCartney as “McCartney⎵”, to communicate the writer pair as Lennon⎵|McCartney⎵ if the sender is required to provide data “as received” from third parties.
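The two permitted behaviours can be sketched as a single helper with a switch. The names join_values and trim are hypothetical, and the sketch assumes that the received values contain no characters that would themselves need escaping.

```python
from typing import List


def join_values(values: List[str], trim: bool = True) -> str:
    """Join received values into one multi-value Cell.

    With trim=True, leading and trailing white space is removed (encouraged);
    with trim=False, the values are passed on "as received".
    """
    if trim:
        values = [v.strip() for v in values]
    return "|".join(values)
```

With the values "Lennon " and "McCartney " (each with one trailing space), the default call gives Lennon|McCartney, while trim=False keeps the trailing spaces.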

6.2.3.7 Communicating delimiters

To communicate a Delimiter in the data within a Cell, such a Cell shall not be enclosed in double quote characters. Instead, the Delimiter shall be immediately preceded by an escaping code as follows:

  1. To escape a tab character contained in a text string, the escaping code is the backslash character (Unicode U+005C). Therefore, the string A[TAB]B would have to be communicated as A\[TAB]B (with [TAB] representing the tabulator);

  2. To escape a pipe character contained in a text string, the escaping code is a double backslash (two Unicode U+005C characters). Therefore, the string A|B would have to be communicated as A\\|B; and

  3. To communicate a backslash character, the escaping code is a triple backslash (three Unicode U+005C characters), which immediately precedes the backslash being communicated. Therefore, the string A\B would have to be communicated as A\\\\B.

These escaping mechanisms must be used for all special characters in all Cells, whether those Cells allow multiple values or not. A non-escaped pipe character in a single-value Cell is, consequently, an error. For the avoidance of doubt, escaping a character that should not be escaped, or not escaping a character that should have been escaped, will lead to an invalid file.
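The escaping rules above can be illustrated with a small encoder and parser. This is a sketch under stated assumptions, not a normative implementation: all names (escape_value, tokenize, parse_record) are hypothetical, the escaping codes are taken exactly as listed in this clause, and the parser assumes a left-to-right scan in which the longest escaped sequence is matched first.

```python
from typing import List, Tuple

TAB, PIPE, BACKSLASH = "\t", "|", "\\"

# Escaping code placed immediately before each special character.
ESCAPE_CODE = {TAB: BACKSLASH, PIPE: BACKSLASH * 2, BACKSLASH: BACKSLASH * 3}

# Escaped sequences as they appear in a file, longest first.
SEQUENCES = [
    (BACKSLASH * 4, BACKSLASH),   # triple backslash + backslash
    (BACKSLASH * 2 + PIPE, PIPE), # double backslash + pipe
    (BACKSLASH + TAB, TAB),       # single backslash + tab
]


def escape_value(value: str) -> str:
    """Escape one data element before it is placed in a Cell."""
    return "".join(ESCAPE_CODE.get(ch, "") + ch for ch in value)


def tokenize(text: str) -> List[Tuple[str, bool]]:
    """Return (character, is_escaped) pairs, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        for sequence, ch in SEQUENCES:
            if text.startswith(sequence, i):
                tokens.append((ch, True))
                i += len(sequence)
                break
        else:
            if text[i] == BACKSLASH:
                # A backslash outside any escaped sequence makes the file invalid.
                raise ValueError(f"stray backslash at offset {i}")
            tokens.append((text[i], False))
            i += 1
    return tokens


def parse_record(record: str) -> List[List[str]]:
    """Parse one Record into Cells, each a list of unescaped data elements."""
    cells: List[List[List[str]]] = [[[]]]
    for ch, escaped in tokenize(record):
        if not escaped and ch == TAB:
            cells.append([[]])        # primary delimiter: start a new Cell
        elif not escaped and ch == PIPE:
            cells[-1].append([])      # secondary delimiter: start a new value
        else:
            cells[-1][-1].append(ch)  # ordinary or escaped character
    return [["".join(value) for value in cell] for cell in cells]
```

A sender would call escape_value on each data element and then join elements with pipe characters and Cells with tab characters; parse_record reverses the process. The sketch does not know which Cells are single-value, so checking that such a Cell yields exactly one data element is left to the caller.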