4 Syntax

A TEON file is a utf-8-encoded TEON document.

This specification neither recommends nor prohibits the use of a BYTE ORDER MARK (BOM).

A TEON text is zero or more TEON lines, separated by newlines. A TEON text represents a TEON document. Its semantics are defined by the TEON parsing algorithm.

A TEON line consists of zero or more code points other than U+000D and U+000A. It is either a scalar line, enumeration line, list line, empty line, or invalid line.

A scalar line is a $ character followed by an escaped field name followed by a : character followed by an escaped field value. There MUST NOT be more than one scalar line with the same escaped field name.

An enumeration line is a & character followed by an escaped field name followed by a : character followed by an escaped field value. There MUST NOT be more than one enumeration line with the same pair of escaped field name and escaped field value.

A list line is a @ character followed by an escaped field name followed by a : character followed by an escaped field value.

An empty line is the empty string.

Any other TEON line is an invalid line. There MUST NOT be any invalid line.

An escaped field name is a sequence of one or more items, each of which is either a code point other than U+000D, U+000A, and :, or an escape. There MUST NOT be a \ character that is not part of an escape in an escaped field name.

An escaped field value is a sequence of zero or more items, each of which is either a code point other than U+000D and U+000A, or an escape. There MUST NOT be a \ character that is not part of an escape in an escaped field value. There MUST NOT be an escape \C in an escaped field value.

An escape is a \ character followed by r, n, C, or \.

Escapes \r, \n, \C, and \\ represent U+000D, U+000A, U+003A, and U+005C, respectively.
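
The line kinds and escape rules above can be sketched as two small checks. This is an illustrative Python sketch, not part of the specification: the names LINE_RE, NAME_OK, VALUE_OK, classify, and escapes_conform are hypothetical. It assumes that a stray \ still matches the line grammar as an ordinary code point and only violates the separate MUST NOT requirements.

```python
import re

# Line shape from the grammar: sigil, non-empty name without ":", ":", value.
LINE_RE = re.compile(r"([$&@])([^:\r\n]+):([^\r\n]*)")

# Conformance of escapes: every backslash must start one of the defined escapes.
NAME_OK = re.compile(r"(?:[^\\]|\\[rnC\\])+")
VALUE_OK = re.compile(r"(?:[^\\]|\\[rn\\])*")  # \C is not allowed in values

KINDS = {"$": "scalar", "&": "enumeration", "@": "list"}


def classify(line: str) -> str:
    """Return the kind of a single TEON line (hypothetical helper)."""
    if line == "":
        return "empty"
    m = LINE_RE.fullmatch(line)
    return KINDS[m.group(1)] if m else "invalid"


def escapes_conform(line: str) -> bool:
    """Check the MUST NOT rules on backslashes for one line (hypothetical helper)."""
    m = LINE_RE.fullmatch(line)
    if m is None:
        return line == ""  # an empty line has nothing to violate
    name, value = m.group(2), m.group(3)
    return bool(NAME_OK.fullmatch(name) and VALUE_OK.fullmatch(value))
```

Because the first : ends the field name, a line such as `@item:a:b` is a list line whose escaped field value is `a:b`.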

A newline is either an optional U+000D character followed by a U+000A character, or a U+000D character. It SHOULD be a U+000A character.
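
As an illustration of the newline definition, a TEON text could be split into lines as follows. This is a minimal Python sketch; NEWLINE_RE and split_lines are hypothetical names. Python's built-in str.splitlines() is avoided here because it also splits on other separators such as U+2028, which are ordinary code points in a TEON line.

```python
import re

# CRLF must be tried first so that it counts as a single newline.
NEWLINE_RE = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split a TEON text into TEON lines on CRLF, CR, or LF."""
    return NEWLINE_RE.split(text)
```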

4.1 Parsing

The TEON parsing algorithm, which MUST be used to parse a TEON file or TEON text into a TEON document, is as follows:

Let input be the input.
If input is a TEON file, run the steps to utf-8 decode input and set input to the output of the steps.
Let output be an empty TEON document.
Let scalars, enums, and lists be empty sets.
Split input by newlines.
For each substring line obtained by the previous step, in order, run these substeps:
If line is a scalar line
1. Let name be the escaped field name in line.
2. Let value be the escaped field value in line.
3. Unescape name.
4. Unescape value.
5. If scalars contains a field whose field name is name, report a parse error and remove the field from scalars.
6. Add a field whose field name is name and field value is value to scalars.
If line is an enumeration line
1. Let name be the escaped field name in line.
2. Let value be the escaped field value in line.
3. Unescape name.
4. Unescape value.
5. If enums contains a field whose field name is name,
  1. Let set be the field value of the field.
  2. If set already contains value, report a parse error.
  3. Otherwise, add value to set.
6. Otherwise, add a field whose field name is name and field value is a set which only contains value to enums.
If line is a list line
1. Let name be the escaped field name in line.
2. Let value be the escaped field value in line.
3. Unescape name.
4. Unescape value.
5. If lists contains a field whose field name is name, append value to the list in the field value of the field.
6. Otherwise, add a field whose field name is name and field value is a list which only contains value to lists.
If line is an empty line
1. Do nothing.
Otherwise
1. Report a parse error.
Set the set of scalar fields of output to scalars.
Set the set of enumeration fields of output to enums.
Set the set of list fields of output to lists.
Return output.
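
The steps above can be sketched in Python as follows. This is a non-normative illustration under several assumptions: the document is modelled as a plain dict with the hypothetical keys "scalars", "enums", and "lists"; parse errors are collected in a list of strings; the unescape helper is a condensed regex-based stand-in for the unescape steps; and decoding with "utf-8-sig" and errors="replace" only approximates the utf-8 decode steps.

```python
import re

LINE_RE = re.compile(r"([$&@])([^:\r\n]+):([^\r\n]*)")
ESCAPES = {"r": "\r", "n": "\n", "C": ":", "\\": "\\"}


def unescape(s, is_value, errors):
    """Condensed unescape: known escapes are replaced, anything else is kept."""
    def repl(m):
        d = m.group(1)
        if d not in ESCAPES:
            errors.append(f"bad escape {m.group(0)!r}")
            return m.group(0)
        if d == "C" and is_value:
            errors.append("\\C in field value")
        return ESCAPES[d]
    return re.sub(r"\\(.?)", repl, s)


def parse(text):
    """Parse a TEON text; returns (document, parse_errors)."""
    errors = []
    scalars, enums, lists = {}, {}, {}
    for line in re.split(r"\r\n|\r|\n", text):
        if line == "":
            continue  # empty line: do nothing
        m = LINE_RE.fullmatch(line)
        if m is None:
            errors.append(f"invalid line {line!r}")
            continue
        sigil = m.group(1)
        name = unescape(m.group(2), False, errors)
        value = unescape(m.group(3), True, errors)
        if sigil == "$":
            if name in scalars:
                errors.append(f"duplicate scalar {name!r}")
            scalars[name] = value  # the later line replaces the earlier field
        elif sigil == "&":
            values = enums.setdefault(name, set())
            if value in values:
                errors.append(f"duplicate enumeration value for {name!r}")
            values.add(value)
        else:
            lists.setdefault(name, []).append(value)
    return {"scalars": scalars, "enums": enums, "lists": lists}, errors


def parse_file(data):
    """Parse a TEON file given as bytes (a leading BOM, if any, is dropped)."""
    return parse(data.decode("utf-8-sig", errors="replace"))
```

Returning the errors alongside the document lets a conformance checker inspect them while other callers ignore the second element.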

To unescape a string string, the implementation MUST run these steps:

Let removed be an empty list.
Let i be zero.
Loop: Let c be the ith character in string (using 0-based index). If there is no ith character in string, remove characters whose index is listed in removed from string and abort these steps.
If c is a U+005C character,
1. Increment i by one.
2. Let d be the ith character in string. If there is no ith character in string, report a parse error and go to the step whose label is loop.
3. If d is an r character, replace the ith character in string by a U+000D character and add i − 1 to removed.
4. Otherwise, if d is an n character, replace the ith character in string by a U+000A character and add i − 1 to removed.
5. Otherwise, if d is a C character, replace the ith character in string by a U+003A character and add i − 1 to removed. If string was originally an escaped field value, report a parse error.
6. Otherwise, if d is a \ character, replace the ith character in string by a U+005C character and add i − 1 to removed.
7. Otherwise, report a parse error.
Increment i by one.
Go to the step whose label is loop.
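
The unescape steps can be followed index by index, as in the Python sketch below. It is an illustration only: the function signature, the is_value flag, and the errors list are assumptions, and the cases for r, n, C, and \ are treated as mutually exclusive. An unknown escape or a trailing \ is left in the string unchanged, since neither adds an index to removed.

```python
def unescape(string, is_value=False, errors=None):
    """Unescape an escaped field name or value, collecting parse errors."""
    if errors is None:
        errors = []
    chars = list(string)  # one entry per code point
    removed = set()       # indexes of backslashes to drop at the end
    i = 0
    while i < len(chars):
        if chars[i] == "\\":
            i += 1
            if i >= len(chars):
                errors.append("backslash at end of string")
                break
            d = chars[i]
            if d == "r":
                chars[i] = "\r"
                removed.add(i - 1)
            elif d == "n":
                chars[i] = "\n"
                removed.add(i - 1)
            elif d == "C":
                chars[i] = ":"
                removed.add(i - 1)
                if is_value:
                    errors.append("\\C in escaped field value")
            elif d == "\\":
                removed.add(i - 1)  # the second backslash stays as U+005C
            else:
                errors.append("unknown escape")
        i += 1
    return "".join(c for k, c in enumerate(chars) if k not in removed)
```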

The parsing algorithm returns a TEON document. It might also report zero or more parse errors.

This specification does not define how parse errors are handled by the application that uses the TEON parsing algorithm. A TEON parser might not report parse errors, as most applications other than conformance checkers will not need them.

If the input TEON text contains a surrogate code point, the output TEON document can contain it.

4.2 Serialization

The TEON serialization algorithm, which SHOULD be used to serialize a TEON document into a TEON text, is as follows:

Let document be the TEON document to serialize.
Let lines be an empty list.
For each scalar field field in document, sorted by field name in code point order,
1. Append to lines the string consisting of a $ character, followed by the field name of field (field name escaped), followed by a : character, followed by the field value of field (field value escaped).
For each enumeration field field in document, sorted by field name in code point order,
1. For each value value in the field value of field, sorted in code point order,
  1. Append to lines the string consisting of a & character, followed by the field name of field (field name escaped), followed by a : character, followed by value (field value escaped).
For each list field field in document, sorted by field name in code point order,
1. For each value value in the field value of field, in order,
  1. Append to lines the string consisting of a @ character, followed by the field name of field (field name escaped), followed by a : character, followed by value (field value escaped).
Return the concatenation of the strings in lines, separated by a U+000A character.

To field value escape a string string, the implementation MUST run these steps:

Replace any occurrence of a U+005C character in string by \\.
Replace any occurrence of a U+000D character in string by \r.
Replace any occurrence of a U+000A character in string by \n.

To field name escape a string string, the implementation MUST run these steps:

Field value escape string.
Replace any occurrence of a U+003A character in string by \C.
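
The serialization steps and the two escaping procedures can be sketched together in Python. This is a non-normative illustration: the document is assumed to be a dict with the hypothetical keys "scalars" (name to value), "enums" (name to set of values), and "lists" (name to list of values). Python compares strings by code point, so sorted() gives the ordering described above.

```python
def field_value_escape(s):
    """Escape backslash first, then CR and LF."""
    return s.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")


def field_name_escape(s):
    """Field value escape, then escape ':' as \\C."""
    return field_value_escape(s).replace(":", "\\C")


def serialize(doc):
    """Serialize a document dict into a TEON text."""
    lines = []
    scalars = doc.get("scalars", {})
    for name in sorted(scalars):
        lines.append("$" + field_name_escape(name) + ":"
                     + field_value_escape(scalars[name]))
    enums = doc.get("enums", {})
    for name in sorted(enums):
        for value in sorted(enums[name]):
            lines.append("&" + field_name_escape(name) + ":"
                         + field_value_escape(value))
    lists = doc.get("lists", {})
    for name in sorted(lists):
        for value in lists[name]:  # list order is preserved
            lines.append("@" + field_name_escape(name) + ":"
                         + field_value_escape(value))
    return "\n".join(lines)
```

Escaping the backslash before the other characters matters: doing it last would double the backslashes introduced by \r, \n, and \C.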

The TEON text returned by the serialization algorithm can be non-conforming if the input TEON document is non-conforming.

TEON

Living Standard — 15 April 2015
