This section is non-normative.
TEON is a serialization format for name/value pairs.
TEON texts are easy to read and write by hand. It is easy to compare TEON texts on a line-by-line basis.
That is, it should not be difficult to fix conflicting version-controlled TEON file revisions.
It is easy to implement on any platform or in any language. It is well-defined such that there can be multiple interoperable implementations.
All notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.
Notes are shown like this.
The key words "MUST", "MUST NOT", "SHOULD", and "MAY" in the normative parts of this document are to be interpreted as described in RFC 2119.
Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent.
Implementations MAY impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
The terms code point, scalar value, utf-8, and utf-8 decode are defined in the Encoding Standard.
A TEON document has a set of scalar fields, a set of enumeration fields, and a set of list fields. Any item in these sets, referred to as a field, is a pair of a field name and a field value.
A field name is a non-empty string. There cannot be more than one field with the same field name in a set.
The field value of a scalar field is a string.
The field value of an enumeration field is a set of zero or more strings. The same string cannot appear more than once in a field value.
The field value of a list field is an ordered list of zero or more strings.
A string is a sequence of zero or more code points. It MUST NOT contain a code point that is not a scalar value.
A conforming TEON document cannot contain any surrogate code point.
The semantics of the data structure represented by a TEON document are application-dependent.
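The data model above can be sketched in Python as follows. This is only an illustration, not part of the format: the names `TeonDocument`, `has_only_scalar_values`, and `is_conforming` are hypothetical, and using one dictionary per field set is an assumption chosen because dictionary keys already guarantee at most one field per name within a set.

```python
from dataclasses import dataclass, field


def has_only_scalar_values(s: str) -> bool:
    """Return True if s contains no surrogate code point (U+D800..U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


@dataclass
class TeonDocument:
    # One dictionary per field set; keys are field names.
    scalars: dict[str, str] = field(default_factory=dict)
    enumerations: dict[str, set[str]] = field(default_factory=dict)
    lists: dict[str, list[str]] = field(default_factory=dict)

    def is_conforming(self) -> bool:
        """Check that names are non-empty and no string holds a surrogate."""
        names = [*self.scalars, *self.enumerations, *self.lists]
        values = list(self.scalars.values())
        for group in (*self.enumerations.values(), *self.lists.values()):
            values.extend(group)
        return all(name != "" for name in names) and all(
            has_only_scalar_values(s) for s in names + values
        )
```

Because the three sets are separate, the same field name may appear once in each of them; a Python `set` drops duplicate enumeration values, while a `list` keeps both order and duplicates.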
A TEON file is a utf-8-encoded TEON document.
This specification does not recommend or prohibit use of BYTE ORDER MARK (BOM).
A TEON text is zero or more TEON lines, separated by newlines. A TEON text represents a TEON document. Its semantics are defined by the TEON parsing algorithm.
A TEON line consists of zero or more code points other than U+000D and U+000A. It is either a scalar line, enumeration line, list line, empty line, or invalid line.
A scalar line is a $ character followed by an escaped field name followed by a : character followed by an escaped field value. There MUST NOT be more than one scalar line with the same escaped field name.
An enumeration line is a & character followed by an escaped field name followed by a : character followed by an escaped field value. There MUST NOT be more than one enumeration line with the same pair of escaped field name and escaped field value.
A list line is a @ character followed by an escaped field name followed by a : character followed by an escaped field value.
An empty line is the empty string.
Any other TEON line is an invalid line. There MUST NOT be any invalid line.
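As a rough illustration of the line kinds, the following Python sketch classifies a single line and, separately, checks the escape-related requirements. The function names and the choice of regular expressions are assumptions made for this example. `classify_line` follows the loose grammar (the first `:` ends the escaped field name), while `is_conforming_line` additionally rejects a `\` that is not part of an escape and a `\C` in the value. Checks that span several lines, such as duplicate scalar lines, are out of scope here.

```python
import re

# One escape: a backslash followed by r, n, C, or another backslash.
_ESCAPE = r"\\[rnC\\]"
# Strict escaped field name: escapes, or code points other than CR, LF, ':' and '\'.
_STRICT_NAME = rf"(?:{_ESCAPE}|[^\r\n:\\])+"
# Strict escaped field value: \r, \n, \\ escapes, or code points other than CR, LF and '\'.
_STRICT_VALUE = r"(?:\\[rn\\]|[^\r\n\\])*"

_LOOSE_LINE = re.compile(r"([$&@])([^\r\n:]+):([^\r\n]*)")
_STRICT_LINE = re.compile(rf"[$&@]{_STRICT_NAME}:{_STRICT_VALUE}")

_KINDS = {"$": "scalar", "&": "enumeration", "@": "list"}


def classify_line(line: str) -> str:
    """Return 'scalar', 'enumeration', 'list', 'empty', or 'invalid'."""
    if line == "":
        return "empty"
    match = _LOOSE_LINE.fullmatch(line)
    return _KINDS[match.group(1)] if match else "invalid"


def is_conforming_line(line: str) -> bool:
    """True for an empty line or a line whose escapes satisfy the MUST NOTs."""
    return line == "" or _STRICT_LINE.fullmatch(line) is not None
```

For instance, `$a:b\Cc` is classified as a scalar line, but it is not conforming because `\C` appears in the escaped field value.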
An escaped field name is a sequence of one or more items, each of which is either a code point other than U+000D, U+000A, and :, or an escape. There MUST NOT be a \ character that is not part of an escape in an escaped field name.
An escaped field value is a sequence of zero or more items, each of which is either a code point other than U+000D and U+000A, or an escape. There MUST NOT be a \ character that is not part of an escape in an escaped field value. There MUST NOT be an escape \C in an escaped field value.
An escape is a \ character followed by r, n, C, or \.
Escapes \r, \n, \C, and \\ represent U+000D, U+000A, U+003A, and U+005C, respectively.
A newline is either an optional U+000D character followed by a U+000A character, or a U+000D character. It SHOULD be a U+000A character.
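The newline definition above can be sketched as a small Python splitter. The name `split_lines` is hypothetical. Python's built-in `str.splitlines` is deliberately avoided because it also breaks on characters such as U+000C and U+2028, which are ordinary code points in a TEON line.

```python
import re

# CRLF, lone LF, or lone CR. CRLF is listed first so it counts as one newline.
_NEWLINE = re.compile(r"\r\n|\n|\r")


def split_lines(text: str) -> list[str]:
    """Split a TEON text into TEON lines at each newline."""
    return _NEWLINE.split(text)
```

A text that ends with a newline yields a trailing empty string, which is simply an empty line. In this sketch an empty text also yields one empty line rather than zero lines.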
The TEON parsing algorithm, which MUST be used to parse a TEON file or TEON text into a TEON document, is as follows:
To unescape a string string, the implementation MUST run these steps:
U+005C character,
r character, replace the ith character in string by a U+000D character and add i − 1 to removed.
n character, replace the ith character in string by a U+000A character and add i − 1 to removed.
C character, replace the ith character in string by a U+003A character and add i − 1 to removed. If string was originally an escaped field value, report a parse error.
\ character, replace the ith character in string by a U+005C character and add i − 1 to removed.
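The effect of unescaping can be sketched in Python as a single left-to-right scan. This is not a transcription of the steps above: it builds a new string instead of tracking indices in a removed list, and the names `unescape` and `is_value` are hypothetical. Keeping a stray `\` as-is while reporting a parse error is an assumption of this sketch.

```python
_ESCAPES = {"r": "\r", "n": "\n", "C": ":", "\\": "\\"}


def unescape(s: str, *, is_value: bool = False) -> tuple[str, list[str]]:
    """Return the unescaped string and a list of parse error messages."""
    out: list[str] = []
    errors: list[str] = []
    i = 0
    while i < len(s):
        ch = s[i]
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if ch == "\\" and nxt in _ESCAPES:
            if nxt == "C" and is_value:
                # \C is still decoded, but it is an error in a field value.
                errors.append(f"\\C in escaped field value at index {i}")
            out.append(_ESCAPES[nxt])
            i += 2  # consume both characters of the escape
        else:
            if ch == "\\":
                # Assumption: a backslash outside any escape is kept verbatim.
                errors.append(f"backslash not part of an escape at index {i}")
            out.append(ch)
            i += 1
    return "".join(out), errors
```

Scanning once matters: in the input `\\n`, the first two characters form the escape `\\`, so the result is a backslash followed by `n`, not a line feed. Chained global replacements would get this case wrong.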
The parsing algorithm returns a TEON document. It might also report zero or more parse errors.
This specification does not define how parse errors are handled by the application that uses the TEON parsing algorithm. A TEON parser might not report parse errors as most applications other than conformance checkers will not need them.
If the input TEON text contains a surrogate code point, the output TEON document can contain it.
The TEON serialization algorithm, which SHOULD be used to serialize a TEON document into a TEON text, is as follows:
$ character, followed by field name escaped field name of field, followed by a : character, followed by field value escaped field value of field, to lines.
& character, followed by field name escaped field name of field, followed by a : character, followed by field value escaped value, to lines.
@ character, followed by field name escaped field name of field, followed by a : character, followed by field value escaped value, to lines.
U+000A character.
To field value escape a string string, the implementation MUST run these steps:
Replace each U+005C character in string by \\.
Replace each U+000D character in string by \r.
Replace each U+000A character in string by \n.
To field name escape a string string, the implementation MUST run these steps:
Replace each U+003A character in string by \C.
The TEON text returned by the serialization algorithm can be non-conforming if the input TEON document is non-conforming.
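The escaping and serialization steps can be sketched in Python as shown below. Several points are assumptions rather than statements of this specification: the function names, the use of plain dictionaries as input, that a field name receives the field value escaping before `:` is replaced, that enumeration values are emitted in sorted order for stable output, and that lines are joined by U+000A without a trailing newline.

```python
def field_value_escape(s: str) -> str:
    # Backslash goes first so backslashes added by later steps are not doubled.
    return s.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")


def field_name_escape(s: str) -> str:
    # Assumption: same escaping as a value, then ':' becomes the \C escape.
    return field_value_escape(s).replace(":", "\\C")


def serialize(
    scalars: dict[str, str],
    enumerations: dict[str, set[str]],
    lists: dict[str, list[str]],
) -> str:
    """Emit scalar lines, then enumeration lines, then list lines."""
    lines: list[str] = []
    for name, value in scalars.items():
        lines.append("$" + field_name_escape(name) + ":" + field_value_escape(value))
    for name, values in enumerations.items():
        for value in sorted(values):  # sorting is an assumption of this sketch
            lines.append("&" + field_name_escape(name) + ":" + field_value_escape(value))
    for name, values in lists.items():
        for value in values:  # list order and duplicates are preserved
            lines.append("@" + field_name_escape(name) + ":" + field_value_escape(value))
    return "\n".join(lines)
```

In this sketch an enumeration or list field with zero values produces no line at all, so it does not survive a round trip through the text form.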
A TEON file or TEON text can be labeled as MIME type text/teon. No parameter is defined for the type.
A TEON file can use the file name extension .teon. It MAY use another extension.
There are test data.
There is a Perl implementation.
This document is written by Wakaba <wakaba@suikawiki.org>.
This document is developed as part of the TR project.
Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.