UTF-8 regular expressions

Ben Lee

Hi,
So I read the post from a while back about packing multi-byte UTF-8
characters as octal:

r = Regexp.compile("ab\304\243cd", 0, "UTF-8")

or
r = Regexp.compile("ab#{[0x123].pack('U')}cd", 0, "UTF-8")

So this seems to be a way to list out individual multi-byte UTF-8
characters.
I was wondering if there's a convenient way to specify a range of
UTF-8 characters?

For instance, the darn
0x2002-2003
0x2013-2014
0x2018-201E
characters?

Thanks,
Ben
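
One possible approach, sketched here rather than taken from any reply in the thread: pack each range endpoint with pack('U'), as in the second example above, and join the endpoints with "-" inside a character class. The helper name utf8_class is made up for illustration. The sketch assumes Ruby 1.9 or later, where the packed strings are tagged as UTF-8 and Regexp.new picks that up; on 1.8 the pattern would presumably need to be compiled in UTF-8 mode as in the examples above.

```ruby
# Hypothetical helper (not from the thread): build a character class
# string from a list of [low, high] code point pairs.
def utf8_class(ranges)
  body = ranges.map do |lo, hi|
    # pack('U') turns a code point into its UTF-8 encoded character
    "#{[lo].pack('U')}-#{[hi].pack('U')}"
  end.join
  "[#{body}]"
end

# The three ranges from the question.
ranges = [[0x2002, 0x2003], [0x2013, 0x2014], [0x2018, 0x201E]]
punct  = Regexp.new(utf8_class(ranges))

punct =~ "a\u2013b"             # => 1   (U+2013 is inside a range)
punct =~ "abc"                  # => nil (plain ASCII, no match)
"x\u2019y".gsub(punct, "'")     # => "x'y"
```

On Ruby 1.9 and later the same class can also be written directly with \u escapes in a regex literal: /[\u2002-\u2003\u2013-\u2014\u2018-\u201E]/.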
 
