Ian Ward
I hope to add support for multi-byte encoded and bidirectional text to
my curses-based UI library:
http://excess.org/urwid/
I would like to support whatever encoding the user likes. Are there
functions for:
- querying the preferred encoding
- splitting encoded strings into characters based on an encoding
- determining the direction (L to R, R to L) of each character
- determining the number of columns used by each character when written
to the terminal
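
For reference, a minimal sketch of how these four queries might look using only Python's standard library. The helper names (preferred_encoding, split_characters, direction, columns) are made up for illustration, and the column rule is a rough approximation based on East Asian Width, not a terminal-accurate wcwidth():

```python
import locale
import unicodedata

def preferred_encoding():
    # Encoding requested by the user's locale, e.g. "UTF-8".
    return locale.getpreferredencoding(False)

def split_characters(raw, encoding):
    # Decoding collapses each multi-byte sequence into one code point.
    return list(raw.decode(encoding))

def direction(ch):
    # Bidirectional class from the Unicode database: "L", "R", "AL", "EN", ...
    return unicodedata.bidirectional(ch)

def columns(ch):
    # Approximation: combining marks take 0 columns,
    # East Asian wide/fullwidth characters take 2, everything else 1.
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1
```

Real terminals can disagree with this width estimate (ambiguous-width characters in particular), so it should be treated as a starting point only.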
I currently use a "line translation" structure to store instructions
for mapping a text string to a two-dimensional "canvas". Its current,
simple format is described here:
http://excess.org/urwid/reference.html#Text-get_line_translation
The line translation structures describe the result of
word-wrapping/clipping and justification applied to the source text. A
*new* line translation format would have to support characters that are
N bytes in the string and M columns wide when displayed, as well as text
that is displayed in a different order than it appears in the string.
Is normalizing bidirectional text orthogonal to wrapping/clipping and
aligning that text? Could I create a "direction translation" structure
that describes how a given string can be reordered Left-to-Right, then
solve the wrapping and alignment with this normalized version?
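
As an illustration of what a "direction translation" could contain, the sketch below builds a list mapping each display position to an index in the source string. It is a deliberate simplification and not the Unicode Bidirectional Algorithm: it only reverses maximal runs of strongly right-to-left characters and ignores neutrals, digits, embedding levels and paragraph direction. The name visual_order is hypothetical:

```python
import unicodedata

def visual_order(text):
    """Map each display position to an index in `text` (simplified)."""
    order = []
    run = []  # indices of the current right-to-left run
    for i, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            run.append(i)
        else:
            # Flush the pending RTL run in reversed order.
            order.extend(reversed(run))
            run = []
            order.append(i)
    order.extend(reversed(run))
    return order
```

For a seven-character string whose characters 2 to 4 are Hebrew letters, this gives [0, 1, 4, 3, 2, 5, 6]. One caveat on the orthogonality question: the Unicode Bidirectional Algorithm (UAX #9) specifies that reordering is applied per line, after line breaking, so wrapping a string that has already been reordered as a whole may not give the same result.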
In what situations are characters modified, removed or inserted as part
of displaying them? (e.g. punctuation being reversed when it surrounds
R to L text)
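
One such case can be probed from Python: the Unicode database flags characters that are mirrored in right-to-left context. In the sketch below, unicodedata.mirrored() is the real standard-library call, while MIRROR_PAIRS and mirror_glyph are hypothetical and cover only a handful of brackets for illustration:

```python
import unicodedata

# Hypothetical, deliberately incomplete table of mirrored partners.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[",
                "{": "}", "}": "{", "<": ">", ">": "<"}

def is_mirrored(ch):
    # True if Unicode marks the character as mirrored in RTL text.
    return bool(unicodedata.mirrored(ch))

def mirror_glyph(ch):
    # Return the partner to draw in RTL context, or the character itself.
    if is_mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)
    return ch
```

A full implementation would take the pairs from the Unicode BidiMirroring data file instead of a hand-written table.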
TIA
Ian Ward <ian#excess,org>