element:
.Bd -literal -offset indent
96M
.Ed
.Ss "The Gettext Modifier ({g:})"
The gettext modifier is used to translate individual fields using the
gettext domain (typically set using the "{G:}" role) and current
language settings.
Once libxo renders the field value, it is passed
to
.Xr gettext 3 ,
where it is used as a key to find the native language
translation.
.Pp
In the following example, the strings "State" and "full" are passed
to
.Fn gettext
to find locale-based translated strings.
.Bd -literal -offset indent
xo_emit("{Lgwc:State}{g:state}\\n", "full");
.Ed
.Ss "The Key Modifier ({k:})"
The key modifier is used to indicate that a particular field helps
uniquely identify an instance of list data.
.Bd -literal -offset indent
EXAMPLE:
    xo_open_list("user");
    for (i = 0; i < num_users; i++) {
        xo_open_instance("user");
        xo_emit("User {k:name} has {:count} tickets\\n",
            user[i].u_name, user[i].u_tickets);
        xo_close_instance("user");
    }
    xo_close_list("user");
.Ed
.Pp
Currently the key modifier is only used when generating XPath values
for the HTML output style when
.Dv XOF_XPATH
is set, but other uses are likely in the near future.
.Ss "The Leaf-List Modifier ({l:})"
The leaf-list modifier is used to distinguish lists where each
instance consists of only a single value.
In XML, these are rendered as single elements, whereas JSON renders
them as arrays.
.Bd -literal -offset indent
EXAMPLE:
    xo_open_list("user");
    for (i = 0; i < num_users; i++) {
        xo_emit("Member {l:name}\\n", user[i].u_name);
    }
    xo_close_list("user");
XML:
    <user>phil</user>
    <user>pallavi</user>
JSON:
    "user": [ "phil", "pallavi" ]
.Ed
.Ss "The No-Quotes Modifier ({n:})"
The no-quotes modifier (and its twin, the 'quotes' modifier) affects
the quoting of values in the JSON output style.
JSON uses quotes for
string values, but no quotes for numeric, boolean, and null data.
.Xr xo_emit 3
applies a simple heuristic to determine whether quotes are
needed, but often this needs to be controlled by the caller.
.Bd -literal -offset indent
EXAMPLE:
    const char *value = is_true ? "true" : "false";
    xo_emit("{n:fancy/%s}", value);
JSON:
    "fancy": true
.Ed
.Ss "The Plural Modifier ({p:})"
The plural modifier selects the appropriate plural form of an
expression based on the most recent number emitted and the current
language settings.
The contents of the field should be the singular
and plural English values, separated by a comma:
.Bd -literal -offset indent
xo_emit("{:bytes} {Ngp:byte,bytes}\\n", bytes);
.Ed
The plural modifier is meant to work with the gettext modifier ({g:})
but can work independently.
.Pp
When used without the gettext modifier or when the message does not
appear in the message catalog, the first token is chosen when the last
numeric value is equal to 1; otherwise the second value is used,
mimicking the simple pluralization rules of English.
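.Pp
The fallback rule is small enough to sketch in plain C.
The helper below is hypothetical: plural_pick() is not a libxo
function, it does not consult a message catalog, and it only mirrors
the English rule described above.

```c
#include <string.h>

/*
 * Hypothetical helper (not a libxo function): choose a token from a
 * "singular,plural" field using the fallback rule, which selects the
 * first token when n == 1 and the second token otherwise.
 * The chosen token is copied into buf, which is returned.
 */
const char *
plural_pick(const char *field, unsigned long n, char *buf, size_t bufsiz)
{
    const char *comma = strchr(field, ',');
    const char *start = field;
    size_t len;

    if (bufsiz == 0)
        return buf;

    if (comma == NULL) {
        len = strlen(field);            /* No plural form supplied */
    } else if (n == 1) {
        len = (size_t)(comma - field);  /* Singular: before the comma */
    } else {
        start = comma + 1;              /* Plural: after the comma */
        len = strlen(start);
    }

    if (len >= bufsiz)
        len = bufsiz - 1;               /* Truncate to fit the buffer */
    memcpy(buf, start, len);
    buf[len] = '\0';
    return buf;
}
```

With the field "byte,bytes", a count of 1 selects "byte" and every
other count, including 0, selects "bytes".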
.Pp
When used with the gettext modifier, the
.Xr ngettext 3
function is
called to handle the heavy lifting, using the message catalog to
convert the singular and plural forms into the native language.
.Ss "The Quotes Modifier ({q:})"
The quotes modifier (and its twin, the 'no-quotes' modifier) affects
the quoting of values in the JSON output style.
JSON uses quotes for
string values, but no quotes for numeric, boolean, and null data.
.Xr xo_emit 3
applies a simple heuristic to determine whether quotes are
needed, but often this needs to be controlled by the caller.
.Bd -literal -offset indent
EXAMPLE:
    xo_emit("{q:year/%d}", 2014);
JSON:
    "year": "2014"
.Ed
.Ss "The White Space Modifier ({w:})"
The white space modifier appends a single space to the data value:
.Bd -literal -offset indent
EXAMPLE:
    xo_emit("{Lw:Name}{:name}\\n", "phil");
TEXT:
    Name phil
.Ed
.Pp
The white space modifier is only used for the TEXT and HTML output
styles.
It is commonly combined with the colon modifier ('{c:}').
It is purely a convenience feature.
.Pp
Note that the sense of the 'w' modifier is reversed for the units role
({Uw:}); a blank is added before the contents, rather than after it.
.Ss "Field Formatting"
The field format is similar to the format string for
.Xr printf 3 .
Its use varies based on the role of the field, but generally is used to
format the field's contents.
.Pp
If the format string is not provided for a value field, it defaults
to "%s".
.Pp
Note that a field definition can contain zero or more printf-style
.Dq directives ,
which are sequences that start with a '%' and end with
one of the following characters: "diouxXDOUeEfFgGaAcCsSp".
Each directive
is matched by one or more arguments to the
.Xr xo_emit 3
function.
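.Pp
As an illustration of that definition, the following sketch counts the
directives in a format string.
The function count_directives() is hypothetical and is not part of
libxo; treating "%%" as a literal percent sign is an assumption
borrowed from printf(3) rather than something stated here.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of libxo): count printf-style
 * directives.  A directive starts at '%' and runs through the first
 * conversion character from the documented set.  "%%" is assumed to
 * be a literal percent sign, as in printf(3).
 */
int
count_directives(const char *fmt)
{
    static const char conv[] = "diouxXDOUeEfFgGaAcCsSp";
    int count = 0;

    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {      /* "%%" is not a directive */
            fmt++;
            continue;
        }
        /* Skip modifiers until a conversion character is found */
        while (*fmt != '\0' && strchr(conv, *fmt) == NULL)
            fmt++;
        if (*fmt != '\0') {
            count++;
            fmt++;
        }
    }
    return count;
}
```

Each counted directive corresponds to at least one argument that the
caller must supply.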
.Pp
The format string has the form:
.Bd -literal -offset indent
'%' format-modifier * format-character
.Ed
.Pp
The format-modifier can be:
.Bl -bullet
.It
a '#' character, indicating the output value should be prefixed with
"0x", typically to indicate a base 16 (hex) value.
.It
a minus sign ('-'), indicating the output value should be padded on
the right instead of the left.
.It
a leading zero ('0') indicating the output value should be padded on the
left with zeroes instead of spaces (' ').
.It
one or more digits ('0' - '9') indicating the minimum width of the
argument.
If the width in columns of the output value is less than
the minimum width, the value will be padded to reach the minimum.
.It
a period followed by one or more digits indicating the maximum
number of bytes which will be examined for a string argument, or the maximum
width for a non-string argument.
When handling ASCII strings this
functions as the field width but for multi-byte characters, a single
character may be composed of multiple bytes.
.Xr xo_emit 3
will never dereference memory beyond the given number of bytes.
.It
a second period followed by one or more digits indicating the maximum
width for a string argument.
This modifier cannot be given for non-string arguments.
.It
one or more 'h' characters, indicating shorter input data.
.It
one or more 'l' characters, indicating longer input data.
.It
a 'z' character, indicating a 'size_t' argument.
.It
a 't' character, indicating a 'ptrdiff_t' argument.
.It
a ' ' character, indicating a space should be emitted before
positive numbers.
.It
a '+' character, indicating a sign should be emitted before any number.
.El
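.Pp
Most of these modifiers behave as they do in printf(3), so their
effect can be tried with snprintf(3) as a stand-in.
The structure and function names below are hypothetical, and the
second-period (maximum column) modifier is specific to libxo and is
not exercised here.

```c
#include <stdio.h>

/* One result buffer per modifier; names are illustrative only. */
struct fmt_demo {
    char alt[16];       /* '#' flag */
    char left[16];      /* '-' flag */
    char zero[16];      /* '0' flag */
    char width[16];     /* minimum width */
    char prec[16];      /* precision on a string */
    char space[16];     /* ' ' flag */
    char plus[16];      /* '+' flag */
};

/* Format one integer and one string with each modifier in turn. */
void
fmt_demo_fill(struct fmt_demo *d, int value, const char *str)
{
    snprintf(d->alt, sizeof(d->alt), "%#x", (unsigned int)value);
    snprintf(d->left, sizeof(d->left), "[%-5d]", value);
    snprintf(d->zero, sizeof(d->zero), "[%05d]", value);
    snprintf(d->width, sizeof(d->width), "[%5d]", value);
    snprintf(d->prec, sizeof(d->prec), "[%.3s]", str);
    snprintf(d->space, sizeof(d->space), "[% d]", value);
    snprintf(d->plus, sizeof(d->plus), "[%+d]", value);
}
```

Called with the value 42 and the string "abcdef", the buffers hold
"0x2a", "[42   ]", "[00042]", "[   42]", "[abc]", "[ 42]", and "[+42]".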
.Pp
Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
removed eventually.
.Pp
The format character is described in the following table:
.Bl -column C "Argument Type12"
.It Sy "C" "Argument Type " "Format"
.It d "int " "base 10 (decimal)"
.It i "int " "base 10 (decimal)"
.It o "int " "base 8 (octal)"
.It u "unsigned " "base 10 (decimal)"
.It x "unsigned " "base 16 (hex)"
.It X "unsigned " "base 16 (hex)"
.It D "long " "base 10 (decimal)"
.It O "unsigned long " "base 8 (octal)"
.It U "unsigned long " "base 10 (decimal)"
.It e "double " "[-]d.ddde+-dd"
.It E "double " "[-]d.dddE+-dd"
.It f "double " "[-]ddd.ddd"
.It F "double " "[-]ddd.ddd"
.It g "double " "as 'e' or 'f'"
.It G "double " "as 'E' or 'F'"
.It a "double " "[-]0xh.hhhp[+-]d"
.It A "double " "[-]0Xh.hhhp[+-]d"
.It c "unsigned char " "a character"
.It C "wint_t " "a character"
.It s "char * " "a UTF-8 string"
.It S "wchar_t * " "a unicode/WCS string"
.It p "void * " "'%#lx'"
.El
.Pp
The 'h' and 'l' modifiers affect the size and treatment of the
argument:
.Bl -column "Mod" "d, i " "o, u, x, X "
.It Sy "Mod" "d, i " "o, u, x, X"
.It "hh " "signed char " "unsigned char"
.It "h " "short " "unsigned short"
.It "l " "long " "unsigned long"
.It "ll " "long long " "unsigned long long"
.It "j " "intmax_t " "uintmax_t"
.It "t " "ptrdiff_t " "ptrdiff_t"
.It "z " "size_t " "size_t"
.It "q " "quad_t " "u_quad_t"
.El
.Ss "UTF-8 and Locale Strings"
All strings for
.Nm libxo
must be UTF-8.
.Nm libxo
will handle turning them
into locale-based strings for display to the user.
.Pp
For strings, the 'h' and 'l' modifiers affect the interpretation of
the bytes pointed to by the argument.
The default '%s' string is a 'char *'
pointer to a string encoded as UTF-8.
Since UTF-8 is compatible with
.Em ASCII
data, a normal 7-bit
.Em ASCII
string can be used.
"%ls" expects a
"wchar_t *" pointer to a wide-character string, encoded as 32-bit
Unicode values.
"%hs" expects a "char *" pointer to a multi-byte
string encoded with the current locale, as given by the
.Ev LC_CTYPE ,
.Ev LANG ,
or
.Ev LC_ALL
environment variables.
The first variable in this list that is set is used; if none of them
is set, the locale defaults to
.Em UTF-8 .
.Pp
.Nm libxo
will
convert these arguments as needed to either UTF-8 (for XML, JSON, and
HTML styles) or locale-based strings for display in text style.
.Bd -literal -offset indent
xo_emit("All strings are utf-8 content {:tag/%ls}",
L"except for wide strings");
.Ed
.Pp
"%S" is equivalent to "%ls".
.Pp
For example, a function is passed a locale-based name, a hat size,
and a time value.
The hat size is formatted in a UTF-8 (ASCII)
string, and the time value is formatted into a wchar_t string.
.Bd -literal -offset indent
void print_order (const char *name, int size,
                  struct tm *timep) {
    char buf[32];
    const char *size_val = "unknown";

    if (size > 0) {
        snprintf(buf, sizeof(buf), "%d", size);
        size_val = buf;
    }

    wchar_t when[32];
    /* wcsftime() takes a count of wide characters, not bytes */
    wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);

    xo_emit("The hat for {:name/%hs} is {:size/%s}.\\n",
            name, size_val);
    xo_emit("It was ordered on {:order-time/%ls}.\\n",
            when);
}
.Ed
.Pp
It is important to note that
.Xr xo_emit 3
will perform the conversion
required to make appropriate output.
Text style output uses the
current locale (as described above), while XML, JSON, and HTML use
UTF-8.
.Pp
UTF-8 and locale-encoded strings can use multiple bytes to encode one
column of data.
The traditional "precision" (aka "max-width") value
for "%s" printf formatting becomes overloaded since it specifies both
the number of bytes that can be safely referenced and the maximum
number of columns to emit.
.Xr xo_emit 3
uses the precision as the former,
and adds a third value for specifying the maximum number of columns.
.Pp
In this example, the name field is printed with a minimum of 3 columns
and a maximum of 6.
Up to ten bytes are used in filling those columns.
.Bd -literal -offset indent
xo_emit("{:name/%3.10.6s}", name);
.Ed
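.Pp
The difference between the byte limit and the column limit can be
sketched in plain C.
The function utf8_clip() below is hypothetical and is not how libxo is
implemented; it assumes one column per code point, whereas real code
would consult wcwidth(3), and it does not handle the minimum-width
padding.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return how many bytes of the UTF-8 string s
 * may be emitted without examining more than max_bytes bytes or
 * producing more than max_cols columns.  Assumes each code point
 * occupies exactly one column.
 */
size_t
utf8_clip(const char *s, size_t max_bytes, size_t max_cols)
{
    size_t used = 0, cols = 0, len, i;

    while (used < max_bytes && cols < max_cols && s[used] != '\0') {
        unsigned char c = (unsigned char)s[used];

        /* Sequence length implied by the lead byte */
        len = (c < 0x80) ? 1 : (c >= 0xf0) ? 4 : (c >= 0xe0) ? 3 : 2;
        if (used + len > max_bytes)
            break;              /* Character would cross the byte limit */
        for (i = 1; i < len; i++) {
            if (((unsigned char)s[used + i] & 0xc0) != 0x80)
                return used;    /* Malformed or truncated sequence */
        }
        used += len;
        cols++;
    }
    return used;
}
```

For a six-byte, five-column string such as "naive" spelled with a
two-byte i-diaeresis, a limit of three columns keeps four bytes, while
a limit of three bytes keeps only two, because the two-byte character
would cross the byte limit.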
.Ss "Characters Outside of Field Definitions"
Characters in the format string that are not part of a field definition are
copied to the output for the TEXT style, and are ignored for the JSON
and XML styles.
For HTML, these characters are placed in a <div>
with class "text".
.Bd -literal -offset indent
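EXAMPLE:
    xo_emit("The hat is {:size/%s}.\\n", size_val);
TEXT:
    The hat is extra small.
XML:
    <size>extra small</size>
JSON:
    "size": "extra small"
HTML:
    <div class="text">The hat is </div>
    <div class="data" data-tag="size">extra small</div>
    <div class="text">.</div>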
.Ed
.Ss "'%n' is Not Supported"
.Nm libxo
does not support the '%n' directive.
It is a bad idea and we
just do not do it.
.Ss "The Encoding Format (eformat)"
The "eformat" string is the format string used when encoding the field
for JSON and XML.
If not provided, it defaults to the primary format
with any minimum width removed.
If the primary is not given, both default to "%s".
.Sh EXAMPLE
In this example, the value for the number of items in stock is emitted:
.Bd -literal -offset indent
xo_emit("{P: }{Lwc:In stock}{:in-stock/%u}\\n",
instock);
.Ed
.Pp
This call will generate the following output:
.Bd -literal -offset indent
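TEXT:
     In stock: 144
XML:
    <in-stock>144</in-stock>
JSON:
    "in-stock": 144,
HTML:
    <div class="line">
      <div class="padding"> </div>
      <div class="label">In stock</div>
      <div class="decoration">:</div>
      <div class="padding"> </div>
      <div class="data" data-tag="in-stock">144</div>
    </div>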
.Pp
Clearly HTML wins the verbosity award, and this output does
not include
.Dv XOF_XPATH
or
.Dv XOF_INFO
data, which would expand the penultimate line to:
.Bd -literal -offset indent
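    <div class="data" data-tag="in-stock"
       data-xpath="/top/data/item/in-stock"
       data-type="number"
       data-help="Number of items in stock">144</div>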
.Ed
.Sh WHAT MAKES A GOOD FIELD NAME?
To make useful, consistent field names, follow these guidelines:
.Ss "Use lower case, even for TLAs"
Lower case is more civilized.
Even TLAs should be lower case
to avoid scenarios where the differences between "XPath" and
"Xpath" drive your users crazy.
Using "xpath" is simpler and better.
.Ss "Use hyphens, not underscores"
Use of hyphens is traditional in XML, and the
.Dv XOF_UNDERSCORES
flag can be used to generate underscores in JSON, if desired.
But the raw field name should use hyphens.
.Ss "Use full words"
Do not abbreviate especially when the abbreviation is not obvious or
not widely used.
Use "data-size", not "dsz" or "dsize".
Use
"interface" instead of "ifname", "if-name", "iface", "if", or "intf".
.Ss "Use <verb>-<units>"
Using the form <verb>-<units> or <verb>-<classifier>-<units> helps in
making consistent, useful names, avoiding the situation where one app
uses "sent-packet" and another "packets-sent" and another
"packets-we-have-sent".
The <units> can be dropped when it is
obvious, as can obvious words in the classification.
Use "receive-after-window-packets" instead of
"received-packets-of-data-after-window".
.Ss "Reuse existing field names"
Nothing is worse than writing expressions like:
.Bd -literal -offset indent
if ($src1/process[pid == $pid]/name ==
$src2/proc-table/proc/p[process-id == $pid]/proc-name) {
...
}
.Ed
.Pp
Find someone else who is expressing similar data and follow their
fields and hierarchy.
Remember the quote is not
.Dq "Consistency is the hobgoblin of little minds"
but
.Dq "A foolish consistency is the hobgoblin of little minds" .
.Ss "Think about your users"
Have empathy for your users, choosing clear and useful fields that
contain clear and useful data.
You may need to augment the display content with
.Xr xo_attr 3
calls or "{e:}" fields to make the data useful.
.Ss "Do not use an arbitrary number postfix"
What does "errors2" mean?
No one will know.
"errors-after-restart" would be a better choice.
Think of your users, and think of the future.
If you make "errors2", the next guy will happily make
"errors3" and before you know it, someone will be asking what is the
difference between errors37 and errors63.
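.Pp
Several of the guidelines above are mechanical enough to check in
code.
The checker below is a hypothetical sketch and is unrelated to the
checks performed by xolint(1); it flags upper-case letters,
underscores, and a number glued to the end of a word.
The last test is a heuristic and will also flag legitimate names that
end in a digit.

```c
#include <ctype.h>
#include <string.h>

/* Result codes for the hypothetical checker below. */
enum name_problem {
    NAME_OK = 0,
    NAME_UPPER,         /* Contains an upper-case letter */
    NAME_UNDERSCORE,    /* Uses '_' where '-' is preferred */
    NAME_NUMBERED       /* Ends in an arbitrary number, as in "errors2" */
};

/* Return the first guideline that the field name appears to break. */
enum name_problem
check_field_name(const char *name)
{
    size_t len = strlen(name);
    size_t i;

    for (i = 0; i < len; i++) {
        if (isupper((unsigned char)name[i]))
            return NAME_UPPER;
        if (name[i] == '_')
            return NAME_UNDERSCORE;
    }

    /* Strip trailing digits, then see whether a letter precedes them */
    i = len;
    while (i > 0 && isdigit((unsigned char)name[i - 1]))
        i--;
    if (i < len && i > 0 && isalpha((unsigned char)name[i - 1]))
        return NAME_NUMBERED;

    return NAME_OK;
}
```

Under these rules "errors-after-restart" and "data-size" pass, while
"XPath", "data_size", and "errors37" are each reported.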
.Ss "Be consistent, uniform, unsurprising, and predictable"
Think of your field vocabulary as an API.
You want it useful,
expressive, meaningful, direct, and obvious.
You want the client
application's programmer to move between parts of the system without
needing to understand a variety of opinions on how fields are named.
They should
see the system as a single cohesive whole, not a sack of cats.
.Pp
Field names constitute the means by which client programmers interact
with our system.
By choosing wise names now, you are making their lives better.
.Pp
After using
.Xr xolint 1
to find errors in your field descriptors, use
.Dq "xolint -V"
to spell check your field names and to detect different
names for the same data.
.Dq dropped-short
and
.Dq dropped-too-short
are both reasonable names, but using them both will lead users to ask
about the difference between the two fields.
If there is no difference,
use only one of the field names.
If there is a difference, change the
names to make that difference more obvious.
.Sh SEE ALSO
.Xr libxo 3 ,
.Xr xolint 1 ,
.Xr xo_emit 3