Avoiding memcpy() during String creation in a C extension

  • Thread starter Wincent Colaiuta
  • Start date
W

Wincent Colaiuta

I'm working on a fast wikitext-to-HTML translator written in C and am
trying to optimize it.

Profiling shows that a fair portion of the time is being spent
instantiating temporary strings and much of that time is spent in
memcpy.

In much of the code I could avoid many String instantiations and just
prepare everything in a buffer at the raw byte level, not touching any
Ruby objects, and do a rb_str_new() at the end.

The problem with this approach is that Ruby will alloc space for the
string and then memcpy the buffer, which is wasteful seeing as I've
already done the alloc and I just want to relinquish control of that
block of memory and hand it over to Ruby.

Basically what I would like is a String instantiation method which
could take a pointer to a buffer and basically take ownership of that
buffer instead of calling memcpy() on it.

I've looked in string.c to see if such a function exists and it
appears that it does not. I know I could write my own such function
but that is starting to meddle with the internals a little bit too
much for my liking, and I'd need a separate version for Ruby 1.8 and
another for 1.9. It would be very brittle and could break if changes
are made to Ruby in the future.

Any ideas? Or should I just resign myself to that redundant memcpy()?
We're not talking about a lot of memory, but it's something that can
run thousands of times per second and is currently processing wikitext
at about 6 megabytes per second and I'd like to make it even faster.

Cheers,
Wincent
 
T

Tim Hunter

Wincent said:
Basically what I would like is a String instantiation method which
could take a pointer to a buffer and basically take ownership of that
buffer instead of calling memcpy() on it.

I've looked in string.c to see if such a function exists and it
appears that it does not.

Instead of putting your data into a buffer and then asking Ruby to make
a String out of it, could you use rb_str_buf_new() to create a String
with a specific capacity and then put your data into the String's
buffer?
 
W

Wincent Colaiuta

Instead of putting your data into a buffer and then asking Ruby to make
a String out of it, could you use rb_str_buf_new() to create a String
with a specific capacity and then put your data into the String's
buffer?

Thanks, Tim.

That would indeed work I think. Didn't know about that function but it
will come in very handy I think. And if the buffer needs to grow as I
append output to it then rb_str_cat should be relatively efficient (I
certainly noticed a huge speed-up when I started using rb_str_cat with
raw buffers rather than creating new Ruby Strings and using
rb_str_append).

Cheers,
Wincent
 
P

Phlip

Wincent said:
I'm working on a fast wikitext-to-HTML translator written in C and am
trying to optimize it.

Profiling shows that a fair portion of the time is being spent
instantiating temporary strings and much of that time is spent in
memcpy.

Shouldn't you just focus on streaming? I always figured that getting faster than
fgetc() and fputc() would requiring reinventing all the memory-mapped files and
DMA that back those up...
 
W

Wincent Colaiuta

Shouldn't you just focus on streaming? I always figured that getting faster than
fgetc() and fputc() would requiring reinventing all the memory-mapped files and
DMA that back those up...

I'm not sure what you mean. Can you clarify a bit? The translator
consists of an extremely faster Ragel tokenizer/lexer/scanner that
emits tokens one at a time, and a parser that decides what to do with
them. The tokens themselves don't contain copies of the corresponding
substrings, but pointers into the input string. The parser is not a
recursive descent parser but an on-the-fly translator that decides
what to emit as it sees each token (thus avoiding function call
overhead; we don't need the same level of structured parsing that a
recursive descent parser would give us because we're only transforming
wikitext into HTML, not parsing a programming language; wikitext can
be imperfect and have syntax errors whereas a strictly defined
programming language must not).

But when it comes time to emit the output you generally have to
instantiate some strings. As an example, consider parsing the
wikitext, ''foo'', which should be translated into <em>foo</em>. At
the Ruby level you want a simple interface, where you pass in a
standard String object and get back a standard String object:

parser.parse "''foo''"
=> "<em>foo</em>"

So under the covers in the C extension this is what happens:

0. Initialize empty output buffer, currently just an empty Ruby
string
1. Provide tokenizer with a pointer to the start of the input string
(no copies are made)
2. Ask tokenizer for the next token
3. Get an "EM" token (no copy of token text made, just a pointer
into the input string
4. Parser appends "<em>" to output buffer using rb_str_cat (avoiding
instantiating of Ruby String)
5. Ask tokenizer for next token
6. Get a "PRINTABLE" token (no copy of token text made, just a
pointer into the input string)
7. Parser appends "foo" to output buffer using rb_str_cat (again
avoiding instantiating a Ruby String)
8. Ask tokenizer for next token
7. Get an "EM" token
9. Parser appends "</em> to output buffer using rb_str_cat
10. Return finished String to caller

So in a general sense it is a "streaming" transformer (meaning that it
transforms the input on the fly without doing any lookahead except in
some special circumstances where it has to ask the lexer for another
token in order to decide what to emit), but I gather from your comment
that you're referring to something else.

So in looking at the code there are now very few String/Object
instantiations that I can avoid because I am already doing as much as
I can at the raw byte level (with rb_str_cat and memcpy itself).
Nevertheless it looks like those few sites are now the lowest-hanging-
fruit reported in profiling and in order to make it faster that's what
I have to attack. So I sent my original email in this thread because
there are still _some_ temporary instantiations that I could avoid on
occasions where the parser calls out to other functions (which are
actually mostly inlined), and being able to pass in a buffer without
having Ruby memcpy it would be a great help.

Cheers,
Wincent
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,996
Messages
2,570,238
Members
46,826
Latest member
robinsontor

Latest Threads

Top