Contents
  1. Binary
  2. Textual
    1. Structured text
    2. Semi-structured text
    3. Unstructured text
      1. Text selectors
        1. regex
        2. bounds
        3. scanf
  3. Filing

Contexts

Context providers are structs which provide a new view of the data. They are packable structs themselves, with their own keys for extracting the context; they then (probably) transform it and provide the new data for their children to use. They define locator keys based on the kind of context they provide.

Because contexts completely change how the structs work, they block all inheritance by their children. That is, their children cannot inherit keys from the context provider or higher. However, a context provider's grandchild can inherit keys from its parent just fine.
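The inheritance rule above can be sketched as a key lookup that walks up the tree but stops before it reaches a context provider. This is only an illustration in Python: the Struct class, its fields, and the lookup method are hypothetical names, not part of any real implementation.

```python
class Struct:
    """Hypothetical struct node; all names here are illustrative only."""

    def __init__(self, name, keys=None, parent=None, is_context=False):
        self.name = name
        self.keys = dict(keys or {})
        self.parent = parent
        self.is_context = is_context

    def lookup(self, key):
        """Return the nearest definition of key without crossing a context provider."""
        node = self
        while node is not None:
            if key in node.keys:
                return node.keys[key]
            # A context provider blocks inheritance for its direct children,
            # so stop before reading keys from it or from anything above it.
            if node.parent is None or node.parent.is_context:
                return None
            node = node.parent
        return None


root = Struct("root", {"endian": "little"}, is_context=True)
child = Struct("child", {"size": 4}, parent=root)
grandchild = Struct("grandchild", parent=child)
```

Here child cannot see endian because its parent is a context provider, while grandchild can still inherit size from child.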

While there could be any number of context types, we currently define three: binary, textual, and filing. The root may be any type the application requests. For Exchange, it will be binary if the user interacts with a ROM (or whatever) and filing if the user interacts with a folder.

Binary

As the name implies, binary contexts provide access to binary data. Struct types which can be used in a binary context are called serializable types. They must define the methods serialize and unserialize in order to pack and unpack, respectively. The basic binary context provider is the bin type.

The locator keys defined by binary contexts are:

The locators have the following constraints, from which calculations of missing values can potentially be drawn:
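A minimal sketch of how such inference could work, assuming the locator keys are base, length, and end (the names the unstructured text section uses when it refers to binary contexts) and assuming the single constraint base + length = end. Neither assumption is confirmed by this section, and the function name is hypothetical.

```python
def infer_locators(base=None, length=None, end=None):
    """Fill in whichever of base, length, end is missing.

    ASSUMPTION: the only constraint is base + length == end.
    """
    known = sum(value is not None for value in (base, length, end))
    if known < 2:
        raise ValueError("need at least two of base, length, end")
    if base is None:
        base = end - length
    elif length is None:
        length = end - base
    elif end is None:
        end = base + length
    # When all three were given, verify that they agree.
    if base + length != end:
        raise ValueError("inconsistent locators")
    if base < 0 or length < 0:
        raise ValueError("locators must not be negative")
    return base, length, end
```

For example, infer_locators(base=0x10, length=4) yields an end of 0x14 under this assumed constraint.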

Textual

Textual contexts provide string data. They may either be structured text (like XML) or unstructured text (like TXT). Struct types which can be used in a textual context are called stringifiable types. They must define the methods stringify and parse in order to pack and unpack, respectively.

Structured text

Structured text refers largely to object notation and markup, such as JSON, YAML, and XML. The basic structured text context provider is the markup type.

Each type will define its own selectors which are natural for that format and provide them through the base locator key. It may (and probably will) accept multiple types of selectors, though. Because it's common to match multiple entities, these contexts also provide a limit locator key. There is no linear ordering.

XML will use XPath selectors (and others should also accept this, since it's so general), JSON can use jq or JSON Pointers, HTML could use CSS Selectors, probably with jQuery extensions, etc.
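As a rough illustration of a base selector combined with a limit, the sketch below uses Python's xml.etree.ElementTree, which supports only a subset of XPath. The select function and its parameter names are hypothetical; they simply mirror the base and limit keys described above.

```python
import xml.etree.ElementTree as ET


def select(xml_text, base, limit=None):
    """Return the text of elements matching base, capped at limit if given."""
    root = ET.fromstring(xml_text)
    matches = root.findall(base)  # ElementTree's limited XPath subset
    if limit is not None:
        matches = matches[:limit]
    return [element.text for element in matches]


doc = "<items><item>a</item><item>b</item><item>c</item></items>"
first_two = select(doc, ".//item", limit=2)
```

ElementTree happens to return matches in document order; that is a property of this library and not something the structured text context promises.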

Semi-structured text

Semi-structured text includes formatted text (RTF, LaTeX) and possibly even programming or annotated natural languages. The text content is unstructured, but it has structuring around it. These are not a priority but they should be considered during design.

Unstructured text

This works similarly to binary data except instead of being a sequence of bytes, it's a sequence of decoded characters. As such, it can accept a base similar to binary contexts, but also ones similar to regular text contexts. The basic unstructured text context provider is the string type.

The context itself must specify how it's interpreting the individual characters. It could perform one of the forms of Unicode normalization, it might accept combining diacritics as part of the preceding character, or it might just provide the raw codepoints (as text, though).
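The three interpretations mentioned above can be compared with Python's unicodedata module. This is a sketch only; the function names are hypothetical, and grouping by combining class is a simplification of full grapheme cluster segmentation.

```python
import unicodedata


def as_normalized(text, form="NFC"):
    """Characters after applying a Unicode normalization form."""
    return list(unicodedata.normalize(form, text))


def as_clusters(text):
    """Attach combining marks to the character that precedes them."""
    clusters = []
    for ch in text:
        # combining() returns a nonzero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def as_codepoints(text):
    """Raw code points, each still represented as text."""
    return list(text)


sample = "e\u0301a"  # 'e', COMBINING ACUTE ACCENT, 'a'
```

The same three code points give two characters under NFC, two clusters under the grouping rule, and three entries as raw code points, so offsets into the text differ depending on which interpretation the context declares.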

Children can either use base, length, and end where base and end act like in binary contexts; or they can use just base where it's a selector and may define a limit. Both forms are linearly ordered. If no locator keys are specified at all, it assumes the positional offset method.

The context itself may declare a key ignore which accepts a selector which must correctly match all unselected text between children. If it is declared and it fails to match, an error should be raised.
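A sketch of the ignore check, using a Python regular expression as a stand-in for whatever selector the context declares. The function name and the representation of children as (start, end) offsets are assumptions. Only gaps between consecutive children are checked, and empty gaps are skipped; the text above does not say how leading text, trailing text, or empty gaps should be treated.

```python
import re


def check_ignored(text, child_spans, ignore_pattern):
    """Raise ValueError if any text between children fails the ignore pattern.

    child_spans is an ordered, non-overlapping list of (start, end) offsets.
    """
    ignore = re.compile(ignore_pattern)
    for (_, prev_end), (next_start, _) in zip(child_spans, child_spans[1:]):
        gap = text[prev_end:next_start]
        # ASSUMPTION: an empty gap needs no match.
        if gap and ignore.fullmatch(gap) is None:
            raise ValueError(f"ignore selector does not match {gap!r}")
    return True
```

With the text "10, 20,30" and children at (0, 2), (4, 6), and (7, 9), the pattern ,\s* accepts both gaps, while a stray semicolon between children would raise an error.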

Offset form:

This form also has the same constraints as binary contexts, and inferences can be made for missing data:

Selector form:

Text selectors

These are keystructs so they are declared in MPRL like type(basic) or type[arg1, arg2] or type.subtype(basic) etc.

regex

By default (without subtyping) it constructs a Perl Regular Expression. The author may also specify regex.perl specifically.

Other subtypes include:

bounds

Declared in MPRL like bounds[start, end] which selects text between (not including) the first start it finds and the nearest end it finds after.

If start or end is itself a list, the selection can start and end at any of the respective matches. That is, it will end with any option, regardless of which option it started at. start and end may each also be a regex, as in bounds[regex(start expression), end]; the same wrapping works for end alone or for both independently.

In this example, Element's value is xyz d="

string Example {
	data: 'stuff <xyz d=">"> aa'
	string Element {
		base: bounds["<", ">"]
	}
}
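The plain bounds behavior can be approximated in Python as follows. This is a sketch under assumptions: the function name is hypothetical, markers are literal strings or lists of literal strings (regex markers are not handled), and a failed match returns None.

```python
def bounds(text, start, end):
    """Text between the first start marker and the nearest end marker after it."""
    starts = [start] if isinstance(start, str) else list(start)
    ends = [end] if isinstance(end, str) else list(end)

    # Find the earliest occurrence of any start marker.
    first = None
    for marker in starts:
        index = text.find(marker)
        if index != -1 and (first is None or index < first[0]):
            first = (index, len(marker))
    if first is None:
        return None
    begin = first[0] + first[1]

    # Find the nearest occurrence of any end marker after the start.
    stops = [i for i in (text.find(marker, begin) for marker in ends) if i != -1]
    if not stops:
        return None
    return text[begin:min(stops)]
```

Applied to the data in the example above with "<" and ">", it stops at the ">" inside the quotes, which is why the value is cut short.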

Another option is bounds.recurse[start, end, others] which selects text between (not including) start and the matching end, but only selects the top level ones. others is optional, but is a variadic list of oStart:oEnd markers between which start and end matches are ignored.

In this example, Element's value is xyz d=">"

string Example {
	data: 'stuff <xyz d=">"> aa'
	string Element {
		base: bounds.recurse["<", ">", '"':'"', "'":"'"]
	}
}
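A sketch of the recursive variant in Python, tracking nesting depth and skipping over regions delimited by the others markers. The function name is hypothetical, markers are literal strings, only the first top level match is returned, and an unterminated others region is treated as no match; all of these are assumptions.

```python
def bounds_recurse(text, start, end, others=()):
    """Text between the first top level start and its matching end.

    others is a sequence of (o_start, o_end) pairs; start and end markers
    found between such a pair are ignored.
    """
    i = 0
    depth = 0
    content_start = None
    while i < len(text):
        # Skip a whole ignored region if one begins here.
        skipped = False
        for o_start, o_end in others:
            if text.startswith(o_start, i):
                close = text.find(o_end, i + len(o_start))
                if close == -1:
                    return None  # ASSUMPTION: unterminated region means no match
                i = close + len(o_end)
                skipped = True
                break
        if skipped:
            continue
        if depth > 0 and text.startswith(end, i):
            depth -= 1
            if depth == 0:
                return text[content_start:i]
            i += len(end)
        elif text.startswith(start, i):
            depth += 1
            if depth == 1:
                content_start = i + len(start)
            i += len(start)
        else:
            i += 1
    return None
```

On the example data, the quoted ">" is skipped as part of an ignored region, so the match runs to the closing ">" of the tag.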

scanf

Borrowed from C, this uses the form scanf(format), where the format is as defined for C's scanf. %p is not supported.

While this may be used as the base for numbers, the format must then contain a single %i, %d, %u, %o, %x, or %n. If it contains more than one of those, it must raise an error. Anything else in the format will be ignored. For some theoretical floating point struct type, it works the same, but with %f, %e, %g, and %a. For any string types, it will validate the match but not interpret the values.
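The single numeric specifier rule could be checked along these lines. The function name is hypothetical, the regular expression covers only the common shape of a conversion (optional *, width, and length modifier), and treating a format with no numeric specifier as an error is an assumption; the text above only requires an error when there are several.

```python
import re

NUMERIC = set("iduoxn")

# Optional '*', optional width, optional length modifier, then the conversion.
_CONVERSION = re.compile(r"%\*?\d*(?:hh|h|ll|l|j|z|t|L)?(.)", re.DOTALL)


def numeric_specifier(fmt):
    """Return the single numeric conversion character in a scanf format."""
    found = [c for c in _CONVERSION.findall(fmt) if c in NUMERIC]
    if len(found) > 1:
        raise ValueError("format contains more than one numeric specifier")
    if not found:
        # ASSUMPTION: a numeric type needs exactly one numeric specifier.
        raise ValueError("format contains no numeric specifier")
    return found[0]
```

A literal %% is consumed as a pair, so a d that follows it is not counted as a conversion.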

scanf itself provides a list basic value of what it matched, such that index 0 is the first matching %-specifier, etc.
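To illustrate the list of matches, here is a deliberately small scanf emulation in Python that translates a format into a regular expression. It is a sketch, not a faithful scanf: only %d, %u, %x, %o, %s, and %% are handled, widths and length modifiers are not, and the function names are hypothetical.

```python
import re

# Regex fragments for the handful of conversions this sketch supports.
_PATTERNS = {
    "d": r"[-+]?\d+",
    "u": r"\d+",
    "x": r"[-+]?(?:0[xX])?[0-9a-fA-F]+",
    "o": r"[-+]?[0-7]+",
    "s": r"\S+",
}


def scanf_to_regex(fmt):
    """Translate a small subset of scanf format syntax into a compiled regex."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%":
            if i + 1 >= len(fmt):
                raise ValueError("dangling % at end of format")
            spec = fmt[i + 1]
            if spec == "%":
                parts.append("%")
            elif spec in _PATTERNS:
                # Like C, these conversions skip leading whitespace.
                parts.append(r"\s*(" + _PATTERNS[spec] + ")")
            else:
                raise ValueError(f"unsupported specifier %{spec}")
            i += 2
        elif ch.isspace():
            parts.append(r"\s*")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def scanf(fmt, text):
    """Return the matched text for each specifier, in order, or None."""
    match = scanf_to_regex(fmt).match(text)
    return list(match.groups()) if match else None
```

Index 0 of the returned list corresponds to the first %-specifier in the format, index 1 to the second, and so on.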

Filing

Filing contexts offer whole files. This can be thought of as essentially an archive or a folder. Struct types which can be used in a filing context are called stringifiable types. They must define the methods to_file and from_file in order to pack and unpack, respectively. The basic filing context provider is the folder type.

It provides the locator key base which is the filename (including path as appropriate) and the optional encoding key type which specifies the mimetype. If the context has a strict ordering of files, then the children are linearly ordered.
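As a sketch of what a folder context might expose, the Python function below walks a directory and reports a base and a type for each file. The function name and the dictionary shape are hypothetical, the mimetype is guessed from the filename only, and sorting by path is just one possible ordering.

```python
import mimetypes
from pathlib import Path


def list_children(root):
    """Return base (relative path) and type (guessed mimetype) for each file."""
    root = Path(root)
    children = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        mime, _encoding = mimetypes.guess_type(path.name)
        children.append({
            "base": path.relative_to(root).as_posix(),
            "type": mime,  # None when the extension is unknown
        })
    return children
```

A real provider would more likely take type from the struct's own declaration than from the file extension; guessing is used here only to keep the example self-contained.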