TEON

Living Standard — 15 April 2015

Latest version
https://wakaba.github.io/spec-teon/
Version history
https://github.com/wakaba/spec-teon/commits/gh-pages

Table of contents

  1 Introduction
  2 Definitions
  3 Data model
  4 Syntax
    4.1 Parsing
    4.2 Serialization
  5 Identifiers
  Tests and implementation
  Author

1 Introduction

This section is non-normative.

TEON is a serialization format for name/value pairs.

TEON texts are easy to read and write by hand. It is easy to compare TEON texts on a line-by-line basis.

That is, it should not be difficult to resolve conflicts between version-controlled revisions of a TEON file.

It is easy to implement on any platform or in any language. It is well-defined, such that there can be multiple interoperable implementations.

2 Definitions

All notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

Notes are shown like this.

The key words "MUST", "MUST NOT", "SHOULD", and "MAY" in the normative parts of this document are to be interpreted as described in RFC 2119.

Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent.

Implementations MAY impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

The terms code point, scalar value, utf-8, and utf-8 decode are defined in the Encoding Standard.

3 Data model

A TEON document has a set of scalar fields, a set of enumeration fields, and a set of list fields. Any item in these sets, referred to as a field, is a pair of field name and field value.

A field name is a non-empty string. A set cannot contain more than one field with the same field name.

The field value of a scalar field is a string.

The field value of an enumeration field is a set of zero or more strings. The same string cannot appear more than once in a field value.

The field value of a list field is an ordered list of zero or more strings.

A string is a sequence of zero or more code points. It MUST NOT contain a code point that is not a scalar value.

A conforming TEON document cannot contain any surrogate code point.

The semantics of the data structure represented by a TEON document are application-dependent.
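The data model above could be held in memory roughly as follows. This is only a sketch: the names TeonDocument, is_scalar_value_string, and is_conforming are hypothetical and not defined by this specification, and the choice of one dictionary per kind of field is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TeonDocument:
    """Hypothetical in-memory form of a TEON document (illustrative names)."""

    # Scalar fields: field name -> a single string.
    scalars: dict[str, str] = field(default_factory=dict)
    # Enumeration fields: field name -> set of distinct strings.
    enums: dict[str, set[str]] = field(default_factory=dict)
    # List fields: field name -> ordered list of strings (repeats allowed).
    lists: dict[str, list[str]] = field(default_factory=dict)


def is_scalar_value_string(s: str) -> bool:
    """True if s contains no surrogate code points (U+D800 to U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def is_conforming(doc: TeonDocument) -> bool:
    """Check the data model constraints: non-empty names, no surrogates."""
    for mapping in (doc.scalars, doc.enums, doc.lists):
        for name, value in mapping.items():
            if name == "" or not is_scalar_value_string(name):
                return False
            values = [value] if isinstance(value, str) else value
            if not all(is_scalar_value_string(v) for v in values):
                return False
    return True
```

Using a dictionary per set of fields makes field names unique within each set, while still allowing the same name to appear once as a scalar field, once as an enumeration field, and once as a list field.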

4 Syntax

A TEON file is a utf-8-encoded TEON document.

This specification does not recommend or prohibit the use of a BYTE ORDER MARK (BOM).

A TEON text is zero or more TEON lines, separated by newlines. A TEON text represents a TEON document. Its semantics are defined by the TEON parsing algorithm.

A TEON line consists of zero or more code points other than U+000D and U+000A. It is either a scalar line, enumeration line, list line, empty line, or invalid line.

A scalar line is a $ character followed by an escaped field name followed by a : character followed by an escaped field value. There MUST NOT be more than one scalar line with the same escaped field name.

An enumeration line is a & character followed by an escaped field name followed by a : character followed by an escaped field value. There MUST NOT be more than one enumeration line with the same pair of escaped field name and escaped field value.

A list line is a @ character followed by an escaped field name followed by a : character followed by an escaped field value.

An empty line is the empty string.

Any other TEON line is an invalid line. There MUST NOT be any invalid line.

An escaped field name is a sequence of one or more items, each of which is either a code point other than U+000D, U+000A, and :, or an escape. There MUST NOT be a \ character that is not part of an escape in an escaped field name.

An escaped field value is a sequence of zero or more items, each of which is either a code point other than U+000D and U+000A, or an escape. There MUST NOT be a \ character that is not part of an escape in an escaped field value. There MUST NOT be an escape \C in an escaped field value.

An escape is a \ character followed by r, n, C, or \.

Escapes \r, \n, \C, and \\ represent U+000D, U+000A, U+003A, and U+005C, respectively.
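The line definitions above can be sketched with regular expressions. This is an illustrative reading rather than a normative grammar: classify_line and has_conforming_escapes are hypothetical names, and the sketch assumes that the structural shape of a line (sigil, name, colon, value) is checked separately from the MUST NOT requirements on backslashes.

```python
import re

# Structural shape of a field line: sigil, non-empty name without ':', ':', value.
_FIELD_LINE = re.compile(r"([$&@])([^:]+):(.*)", re.DOTALL)
# Conforming escaped field name: no ':' and no '\' outside \r, \n, \C, \\.
_STRICT_NAME = re.compile(r"(?:[^:\\]|\\[rnC\\])+", re.DOTALL)
# Conforming escaped field value: no '\' outside \r, \n, \\ (so no \C either).
_STRICT_VALUE = re.compile(r"(?:[^\\]|\\[rn\\])*", re.DOTALL)

_KINDS = {"$": "scalar", "&": "enumeration", "@": "list"}


def classify_line(line: str):
    """Return (kind, escaped_name, escaped_value) for one TEON line.

    The name and value are None for empty and invalid lines.
    Assumes line contains no U+000D or U+000A.
    """
    if line == "":
        return ("empty", None, None)
    m = _FIELD_LINE.fullmatch(line)
    if m is None:
        return ("invalid", None, None)
    return (_KINDS[m.group(1)], m.group(2), m.group(3))


def has_conforming_escapes(escaped_name: str, escaped_value: str) -> bool:
    """True if every backslash in the name and the value is part of an allowed escape."""
    return (
        _STRICT_NAME.fullmatch(escaped_name) is not None
        and _STRICT_VALUE.fullmatch(escaped_value) is not None
    )
```

Because an escaped field name cannot contain a : character, the first : after the sigil always ends the name; any later : belongs to the value.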

A newline is either an optional U+000D character followed by a U+000A character, or a U+000D character. It SHOULD be a U+000A character.
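Splitting a TEON text at newlines as defined above might look like the following sketch. The name split_lines is hypothetical; the only assumption is that the two-character sequence U+000D U+000A is treated as a single newline.

```python
import re

# Try CRLF first so that it counts as one newline rather than two.
_NEWLINE = re.compile(r"\r\n|\n|\r")


def split_lines(text: str) -> list:
    """Split a TEON text into TEON lines at CRLF, LF, or CR."""
    return _NEWLINE.split(text)
```

Python's built-in str.splitlines is not used here because it also breaks at other characters, such as U+2028, which are ordinary code points within a TEON line.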

4.1 Parsing

The TEON parsing algorithm, which MUST be used to parse a TEON file or TEON text into a TEON document, is as follows:

  1. Let input be the input.
  2. If input is a TEON file, run the steps to utf-8 decode input and set input to the output of the steps.
  3. Let output be an empty TEON document.
  4. Let scalars, enums, and lists be empty sets.
  5. Split input by newlines.
  6. For each substring line obtained by the previous step, in order, run these substeps:
    If line is a scalar line
    1. Let name be the escaped field name in line.
    2. Let value be the escaped field value in line.
    3. Unescape name.
    4. Unescape value.
    5. If scalars contains a field whose field name is name, report a parse error and remove the field from scalars.
    6. Add a field whose field name is name and field value is value to scalars.
    If line is an enumeration line
    1. Let name be the escaped field name in line.
    2. Let value be the escaped field value in line.
    3. Unescape name.
    4. Unescape value.
    5. If enums contains a field whose field name is name,
      1. Let set be the field value of the field.
      2. If set already contains value, report a parse error.
      3. Otherwise, add value to set.
    6. Otherwise, add a field whose field name is name and field value is a set which only contains value to enums.
    If line is a list line
    1. Let name be the escaped field name in line.
    2. Let value be the escaped field value in line.
    3. Unescape name.
    4. Unescape value.
    5. If lists contains a field whose field name is name, append value to the list in the field value of the field.
    6. Otherwise, add a field whose field name is name and field value is a list which only contains value to lists.
    If line is an empty line
    Do nothing.
    Otherwise
    Report a parse error.
  7. Set the set of scalar fields of output to scalars.
  8. Set the set of enumeration fields of output to enums.
  9. Set the set of list fields of output to lists.
  10. Return output.
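The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: parse_teon and _unescape are hypothetical names, parse errors are collected as plain strings, the document is returned as three dictionaries, and the use of Python's "utf-8-sig" codec with errors="replace" is assumed to approximate utf-8 decode.

```python
import re

_NEWLINE = re.compile(r"\r\n|\n|\r")
_FIELD_LINE = re.compile(r"([$&@])([^:]+):(.*)", re.DOTALL)
_ESCAPES = {"r": "\r", "n": "\n", "C": ":", "\\": "\\"}


def _unescape(s, is_value, errors):
    """Replace \\r, \\n, \\C and \\\\; other backslashes are kept and reported."""
    out, i = [], 0
    while i < len(s):
        c = s[i]
        if c != "\\":
            out.append(c)
            i += 1
            continue
        if i + 1 >= len(s):
            errors.append("backslash at end of string")
            out.append(c)
            break
        d = s[i + 1]
        if d in _ESCAPES:
            if d == "C" and is_value:
                errors.append("\\C in field value")
            out.append(_ESCAPES[d])
        else:
            errors.append("unknown escape")
            out.append(c + d)
        i += 2
    return "".join(out)


def parse_teon(data):
    """Parse a TEON file (bytes) or TEON text (str).

    Returns (scalars, enums, lists, errors).
    """
    if isinstance(data, bytes):
        # Assumption: this codec drops a leading BOM and replaces bad bytes.
        data = data.decode("utf-8-sig", errors="replace")
    scalars, enums, lists, errors = {}, {}, {}, []
    for line in _NEWLINE.split(data):
        if line == "":
            continue  # empty line: do nothing
        m = _FIELD_LINE.fullmatch(line)
        if m is None:
            errors.append("invalid line")
            continue
        sigil = m.group(1)
        name = _unescape(m.group(2), False, errors)
        value = _unescape(m.group(3), True, errors)
        if sigil == "$":
            if name in scalars:
                errors.append("duplicate scalar field")
            scalars[name] = value  # the later line replaces the earlier one
        elif sigil == "&":
            members = enums.setdefault(name, set())
            if value in members:
                errors.append("duplicate enumeration value")
            members.add(value)
        else:
            lists.setdefault(name, []).append(value)
    return scalars, enums, lists, errors
```

In this sketch the errors list is informational only; the returned fields are the same whether or not the caller inspects it.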

To unescape a string string, the implementation MUST run these steps:

  1. Let removed be an empty list.
  2. Let i be zero.
  3. Loop: Let c be the ith character in string (using 0-based index). If there is no ith character in string, remove characters whose index is listed in removed from string and abort these steps.
  4. If c is a U+005C character,
    1. Increment i by one.
    2. Let d be the ith character in string. If there is no ith character in string, report a parse error and go to the step whose label is loop.
    3. If d is a r character, replace the ith character in string by a U+000D character and add i − 1 to removed.
    4. Otherwise, if d is a n character, replace the ith character in string by a U+000A character and add i − 1 to removed.
    5. Otherwise, if d is a C character, replace the ith character in string by a U+003A character and add i − 1 to removed. If string was originally an escaped field value, report a parse error.
    6. Otherwise, if d is a \ character, replace the ith character in string by a U+005C character and add i − 1 to removed.
    7. Otherwise, report a parse error.
  5. Increment i by one.
  6. Go to the step whose label is loop.
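The unescape steps can be followed quite literally, as in the sketch below. The function name unescape, the is_field_value flag, and the returned error list are assumptions made for illustration; the sketch treats the r, n, C, and \ cases as mutually exclusive, with any other character after a backslash reported as a parse error.

```python
def unescape(string: str, is_field_value: bool = False):
    """Return (unescaped_string, parse_errors) for an escaped name or value."""
    chars = list(string)
    removed = set()  # indexes of backslashes to drop at the end
    errors = []
    replacements = {"r": "\r", "n": "\n", "C": ":", "\\": "\\"}
    i = 0
    while i < len(chars):
        if chars[i] == "\\":
            i += 1
            if i >= len(chars):
                # A trailing backslash is kept in the result.
                errors.append("backslash at end of string")
                break
            d = chars[i]
            if d in replacements:
                chars[i] = replacements[d]
                removed.add(i - 1)
                if d == "C" and is_field_value:
                    errors.append("\\C is not allowed in a field value")
            else:
                # Unknown escape: both characters are kept unchanged.
                errors.append("unknown escape")
        i += 1
    result = "".join(ch for idx, ch in enumerate(chars) if idx not in removed)
    return result, errors
```

Note that a \C found in a field value is still replaced by a : character; the parse error only flags the input as non-conforming.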

The parsing algorithm returns a TEON document. It might also report zero or more parse errors.

This specification does not define how parse errors are handled by the application that uses the TEON parsing algorithm. A TEON parser might not report parse errors as most applications other than conformance checkers will not need them.

If the input TEON text contains a surrogate code point, the output TEON document can contain it.

4.2 Serialization

The TEON serialization algorithm, which SHOULD be used to serialize a TEON document into a TEON text, is as follows:

  1. Let document be the TEON document to serialize.
  2. Let lines be an empty list.
  3. For each scalar field field in document, sorted by field name in code point order,
    1. Add to lines a string consisting of a $ character, followed by the result of field name escaping the field name of field, followed by a : character, followed by the result of field value escaping the field value of field.
  4. For each enumeration field field in document, sorted by field name in code point order,
    1. For each value value in the field value of field, sorted in code point order,
      1. Add to lines a string consisting of a & character, followed by the result of field name escaping the field name of field, followed by a : character, followed by the result of field value escaping value.
  5. For each list field field in document, sorted by field name in code point order,
    1. For each value value in the field value of field, in order,
      1. Add to lines a string consisting of a @ character, followed by the result of field name escaping the field name of field, followed by a : character, followed by the result of field value escaping value.
  6. Return the concatenation of the strings in lines, separated by a U+000A character.
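A minimal sketch of the serialization steps, assuming the document is passed in as three dictionaries (scalar fields, enumeration fields, list fields). The names serialize_teon, _escape_value, and _escape_name are hypothetical. Python compares strings by code point, so the built-in sorted gives the ordering the steps ask for.

```python
def _escape_value(s: str) -> str:
    # Backslashes are doubled first so that later escapes are not re-escaped.
    return s.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")


def _escape_name(s: str) -> str:
    return _escape_value(s).replace(":", "\\C")


def serialize_teon(scalars: dict, enums: dict, lists: dict) -> str:
    """Serialize fields into a TEON text with U+000A between lines."""
    lines = []
    for name in sorted(scalars):
        lines.append("$" + _escape_name(name) + ":" + _escape_value(scalars[name]))
    for name in sorted(enums):
        for value in sorted(enums[name]):  # set members in code point order
            lines.append("&" + _escape_name(name) + ":" + _escape_value(value))
    for name in sorted(lists):
        for value in lists[name]:  # list order is preserved
            lines.append("@" + _escape_name(name) + ":" + _escape_value(value))
    return "\n".join(lines)
```

Under these steps an enumeration field or list field whose value is empty contributes no line to the output.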

To field value escape a string string, the implementation MUST run these steps:

  1. Replace any occurrence of a U+005C character in string by \\.
  2. Replace any occurrence of a U+000D character in string by \r.
  3. Replace any occurrence of a U+000A character in string by \n.

To field name escape a string string, the implementation MUST run these steps:

  1. Field value escape string.
  2. Replace any occurrence of a U+003A character in string by \C.
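The two escaping procedures map directly onto chained string replacements, as in this sketch. The function names are hypothetical. The order of the replacements matters: the backslash has to be handled first, otherwise the backslashes introduced by \r and \n would themselves be doubled.

```python
def field_value_escape(string: str) -> str:
    """Escape U+005C, U+000D and U+000A, in that order."""
    return (
        string.replace("\\", "\\\\")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )


def field_name_escape(string: str) -> str:
    """Field value escape, then replace ':' with the \\C escape."""
    return field_value_escape(string).replace(":", "\\C")
```

Replacing : after the other steps is safe because none of the earlier replacements introduces a : character.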

The TEON text returned by the serialization algorithm can be non-conforming if the input TEON document is non-conforming.

5 Identifiers

A TEON file or TEON text can be labeled as MIME type text/teon. No parameter is defined for the type.

A TEON file can use the file name extension .teon. It MAY use another extension.

Tests and implementation

There are test data.

There is a Perl implementation.

Author

This document is written by <wakaba@suikawiki.org>.

This document is developed as part of the TR project.

Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.