urwid with multi-byte encoded and bidirectional text?


Ian Ward

I hope to add support for multi-byte encoded and bidirectional text to
my curses-based UI library:
http://excess.org/urwid/

I would like to support whatever encoding the user likes. Are there
functions for:
- querying the preferred encoding
- splitting encoded strings into characters based on an encoding
- determining the direction (L to R, R to L) of each character
- determining the number of columns used by each character when written
to the terminal
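In Python's standard library, the closest matches for these four needs appear to be locale.getpreferredencoding(), bytes.decode(), unicodedata.bidirectional() and unicodedata.east_asian_width(). The sketch below combines them. `describe` and `column_width` are hypothetical helper names, and the width rule is only an approximation of the C library's wcwidth() (it ignores control characters and treats East Asian "ambiguous" characters as one column).

```python
import locale
import unicodedata

def column_width(ch):
    """Approximate terminal columns for one character (not an exact wcwidth())."""
    if unicodedata.combining(ch):
        return 0  # combining marks draw on top of the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters (e.g. CJK) take two columns
    return 1

def describe(data, encoding=None):
    """Split encoded bytes into (char, byte length, bidi class, columns) tuples."""
    if encoding is None:
        # Encoding of the user's locale; a curses program would normally call
        # locale.setlocale(locale.LC_ALL, "") at startup so this reflects the
        # user's environment rather than the default "C" locale.
        encoding = locale.getpreferredencoding(False)
    return [
        (ch, len(ch.encode(encoding)), unicodedata.bidirectional(ch), column_width(ch))
        for ch in data.decode(encoding)
    ]
```

For example, describe("a\u00e9\u4e2d\u05d0".encode("utf-8"), "utf-8") reports a Latin letter, an accented letter (2 bytes), a CJK ideograph (3 bytes, 2 columns) and a Hebrew letter (bidi class "R").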

I currently use a "line translation" structure to store instructions
for mapping a text string to a two-dimensional "canvas". Its current,
simple format is described here:
http://excess.org/urwid/reference.html#Text-get_line_translation

The line translation structures describe the result of
word-wrapping/clipping and justification applied to the source text. A
*new* line translation format would have to support characters that are
N bytes in the string and M columns wide when displayed, as well as text
that is displayed in a different order than it appears in the string.
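One possible shape for such a format, sketched in Python: each display line is a list of segments, and each segment records how many screen columns it fills and which byte range of the encoded source it comes from. The `Segment` class and `hard_wrap` function are hypothetical (not part of urwid), the width rule approximates wcwidth() via unicodedata, the encoding is assumed to be stateless (e.g. UTF-8 or EUC-JP, not ISO-2022), and wrapping breaks at any character rather than at word boundaries.

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class Segment:
    columns: int  # screen columns this segment occupies
    start: int    # byte offset of the segment's first byte in the source
    end: int      # byte offset one past the segment's last byte

def char_width(ch):
    """Approximate column width: 0 for combining marks, 2 for wide/fullwidth."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def hard_wrap(data, encoding, width):
    """Break encoded text into lines of at most `width` columns (character wrap)."""
    lines = []
    start = offset = cols = 0
    for ch in data.decode(encoding):
        w = char_width(ch)
        if cols + w > width and cols > 0:
            # This character does not fit: close the current line here.
            lines.append([Segment(cols, start, offset)])
            start, cols = offset, 0
        offset += len(ch.encode(encoding))  # N bytes in the string...
        cols += w                           # ...M columns on the screen
    lines.append([Segment(cols, start, offset)])
    return lines
```

Keeping a list of segments per line (rather than a single range) leaves room for lines whose display order differs from string order, where one line may need several out-of-order segments.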

Is normalizing bidirectional text orthogonal to wrapping/clipping and
aligning that text? Could I create a "direction translation" structure
that describes how a given string can be reordered Left-to-Right, then
solve the wrapping and alignment with this normalized version?
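For comparison, the Unicode Bidirectional Algorithm (UAX #9) resolves embedding levels for a whole paragraph, but applies its reordering rules (L1 to L4) to each line only after line wrapping. That suggests the two steps are not fully orthogonal: levels can be computed once, but reordering happens per wrapped line. The sketch below illustrates this with a heavily simplified resolver, not a UAX #9 implementation: it assumes a left-to-right paragraph, uses only levels 0 and 1, treats digits and punctuation as neutrals, and ignores explicit embeddings. `embedding_levels` and `visual_order` are hypothetical names.

```python
import unicodedata

def embedding_levels(text):
    """Per-character level: 0 = LTR, 1 = RTL (simplified, LTR paragraph)."""
    strong = []
    for ch in text:
        kind = unicodedata.bidirectional(ch)
        strong.append(0 if kind == "L" else 1 if kind in ("R", "AL") else None)
    levels = []
    for i, lvl in enumerate(strong):
        if lvl is None:
            # Neutral/weak character: take the surrounding direction if the
            # nearest strong characters on both sides agree, else the base (0).
            before = next((s for s in reversed(strong[:i]) if s is not None), 0)
            after = next((s for s in strong[i + 1:] if s is not None), 0)
            lvl = before if before == after else 0
        levels.append(lvl)
    return levels

def visual_order(levels, start, end):
    """String indices of the line text[start:end], in left-to-right display order."""
    order = list(range(start, end))
    i = 0
    while i < len(order):
        if levels[order[i]] == 1:
            # Reverse each maximal run of RTL characters within this line only.
            j = i
            while j < len(order) and levels[order[j]] == 1:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
        else:
            i += 1
    return order
```

With text = "abc \u05d0\u05d1\u05d2 def", reordering the whole string shows the Hebrew letters reversed. Wrapping the logical string after six characters and reordering each line separately gives "abc \u05d1\u05d0" and "\u05d2 def", whereas cutting the already-reordered string at six characters would give "abc \u05d2\u05d1", a different first line.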

In what situations are characters modified/removed/inserted as part of
displaying them? (e.g. punctuation being reversed when surrounding R to L
text)
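The punctuation case corresponds to "mirroring": characters such as parentheses and brackets are displayed with their mirror-image glyph when they end up in a right-to-left run. Python's unicodedata.mirrored() reports whether a character is mirrorable, but the standard library does not expose the actual mirror mapping (Unicode publishes it as BidiMirroring.txt), so the sketch below uses a small hypothetical table covering only ASCII pairs. It takes per-character levels as input (odd = right-to-left), however they were computed; `display_glyphs` is a hypothetical name.

```python
import unicodedata

# Hypothetical partial mirror table: ASCII pairs only. The complete mapping is
# Unicode's BidiMirroring.txt, which unicodedata does not provide.
ASCII_MIRRORS = {"(": ")", ")": "(", "[": "]", "]": "[",
                 "{": "}", "}": "{", "<": ">", ">": "<"}

def display_glyphs(text, levels):
    """Swap in mirrored glyphs for mirrorable characters at odd (RTL) levels."""
    out = []
    for ch, level in zip(text, levels):
        if level % 2 == 1 and unicodedata.mirrored(ch):
            out.append(ASCII_MIRRORS.get(ch, ch))  # unknown pairs left unchanged
        else:
            out.append(ch)
    return "".join(out)
```

Because the substitution is one character for one character of the same width, it changes what is drawn without changing the byte-to-column bookkeeping of a line translation.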

TIA

Ian Ward <ian#excess,org>