Regexp problem for template language

Tobias Luetke

I showed this to a few people during RubyConf but couldn't find a good
solution.
Liquid is a non-evaling (HTML) template engine which I use to allow my
customers to edit their shop's appearance in Shopify.

I would like to tokenize the entire document in one pass.
This is a stripped-down test case demonstrating the blocker. Basically
I can't find a good way to get all the text between {% tags %} and {{
variables }} (which I have omitted from the test case for
simplicity).

Once this is addressed I'll release Liquid to RubyForge.

Testcase:

require 'test/unit'
TokenizationRegexp = /\{%.*?%\}|[^\{]+/ # this doesn't work

class ParsingTest < Test::Unit::TestCase

  def test_tokenization

    # Please make me work

    text = "Hello im liquid this: {% is a tag %} curly brackets like this { may appear in the text } please parse me"
    assert_equal ["Hello im liquid this: ", "{% is a tag %}", " curly brackets like this { may appear in the text } please parse me"],
      text.scan(TokenizationRegexp)
  end

  def test_tokenization_without_curly

    text = "Hello im liquid this: {% is a tag %}"
    assert_equal [ "Hello im liquid this: ", "{% is a tag %}"],
      text.scan(TokenizationRegexp)
  end

end

# Loaded suite test
# Started
# F.
# Finished in 0.021796 seconds.
#
# 1) Failure:
# test_tokenization(ParsingTest) [test.rb:11]:
# <["Hello im liquid this: ",
# "{% is a tag %}",
# " curly brackets like this { may appear in the text } please parse me"]> expected but was
# <["Hello im liquid this: ",
# "{% is a tag %}",
# " curly brackets like this ",
# " may appear in the text } please parse me"]>.
#
# 2 tests, 2 assertions, 1 failures, 0 errors
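
The failure above comes from the second branch: [^\{]+ stops at every "{", and a lone "{" that does not start a tag matches neither branch, so scan silently skips it. The sketch below reproduces that on a shorter string and shows one possible scan-based variant that uses a negative lookahead so a "{" not followed by "%" counts as ordinary text. FixedScanRegexp and OriginalRegexp are illustrative names, not from the thread, and the variant still ignores {{ variables }}, as the test case does.

```ruby
# Sketch only: both constant names are made up for this illustration.
OriginalRegexp  = /\{%.*?%\}|[^\{]+/
FixedScanRegexp = /\{%.*?%\}|(?:[^{]|\{(?!%))+/

text = "this: {% is a tag %} like this { may appear } done"

# The original regexp drops the lone "{" because neither branch can match it.
original_tokens = text.scan(OriginalRegexp)
# => ["this: ", "{% is a tag %}", " like this ", " may appear } done"]

# With the lookahead, "{" followed by anything but "%" stays inside the text token.
fixed_tokens = text.scan(FixedScanRegexp)
# => ["this: ", "{% is a tag %}", " like this { may appear } done"]

p original_tokens
p fixed_tokens
```

Neither regexp has a capture group, so scan returns a flat array of strings.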
 
Sean O'Halpin

Using split instead of scan and a slightly different regex:

require 'test/unit'

TokenizationRegexp = /(\{%.*?%\})+/ # Note different regex

class ParsingTest < Test::Unit::TestCase

  def test_tokenization
    text = "Hello im liquid this: {% is a tag %} curly brackets like this { may appear in the text } please parse me"
    assert_equal ["Hello im liquid this: ", "{% is a tag %}", " curly brackets like this { may appear in the text } please parse me"],
      text.split(TokenizationRegexp) # Note: using split
  end

  def test_tokenization_without_curly
    text = "Hello im liquid this: {% is a tag %}"
    assert_equal [ "Hello im liquid this: ", "{% is a tag %}"],
      text.split(TokenizationRegexp) # Note: using split
  end
end

Watch out for gmail's line breaks!

HTH,

Regards,

Sean
 
Sean O'Halpin

I made a mistake in the revised regex - it shouldn't have the +

TokenizationRegexp = /(\{%.*?%\})/
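
To illustrate why the + matters, here is a small sketch with two adjacent tags. With the +, both tags are consumed by one match and the capture group only keeps the last repetition, so split loses the first tag. Without it, every tag comes back, though split also emits an empty string between adjacent tags, which a tokenizer would probably want to drop. The variable names are made up for this example.

```ruby
with_plus    = /(\{%.*?%\})+/
without_plus = /(\{%.*?%\})/

text = "a{% x %}{% y %}b"

# One match swallows both tags; the group keeps only the last repetition.
lossy = text.split(with_plus)
# => ["a", "{% y %}", "b"]

# Each tag is its own match; note the empty string between the two tags.
full = text.split(without_plus)
# => ["a", "{% x %}", "", "{% y %}", "b"]

# Dropping empty strings leaves just the real tokens.
tokens = full.reject { |t| t.empty? }
# => ["a", "{% x %}", "{% y %}", "b"]

p lossy, full, tokens
```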

Regards,

Sean
 
Kevin Ballard

If you want between {{ }} variables, shouldn't that second branch of
the regexp be {2,} rather than +? Not to mention that it never matches
the closing }}.

It might be better to, instead of trying to do it all with just a
regex, implement a stateful parser that matches an opening tag and
matches a closing tag from there. That way you can also add spiffy
stuff like nested tags.
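
A minimal sketch of what such a stateful pass could look like, using StringScanner from Ruby's standard library. The tokenize method and the :tag / :variable / :text labels are hypothetical names for this illustration, not part of Liquid, and the sketch does not attempt nested tags; it only shows the "match an opener, then its closer, from the current position" idea.

```ruby
require 'strscan'

# Hypothetical helper: walks the source once and returns [type, text] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if (tag = scanner.scan(/\{%.*?%\}/m))
      tokens << [:tag, tag]
    elsif (var = scanner.scan(/\{\{.*?\}\}/m))
      tokens << [:variable, var]
    else
      # Plain text: take one character unconditionally (this also covers an
      # unterminated opener), then everything up to the next "{%" or "{{".
      text = scanner.getch
      text << scanner.scan(/(?:[^{]|\{(?![%{]))*/).to_s
      tokens << [:text, text]
    end
  end
  tokens
end

result = tokenize("Hello {% if x %}{{ name }} { brace } bye")
p result
# => [[:text, "Hello "], [:tag, "{% if x %}"],
#     [:variable, "{{ name }}"], [:text, " { brace } bye"]]
```

Because each token carries its type, a later parsing stage could keep a stack of open block tags, which is where nesting would be handled.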
 
Tobias Luetke

I omitted the {{ }} search from the regexp for this example, the
second part was only for trying to get all the text in between {% tags
%}.

Thanks to Sean for providing a working way to satisfy the test
cases in one call!
 
Sean O'Halpin

> I omitted the {{ }} search from the regexp for this example, the
> second part was only for trying to get all the text in between {% tags
> %}.

require 'test/unit'

TokenizationRegexp = /(\{%.*?%\})|(\{\{.*?\}\})/ # with {{}} tags

class ParsingTest < Test::Unit::TestCase

  def test_tokenization
    text = "Hello im liquid this: {% is a tag %} curly brackets like this { may appear in the text } please parse {{me}}"
    assert_equal ["Hello im liquid this: ", "{% is a tag %}", " curly brackets like this { may appear in the text } please parse ", "{{me}}"],
      text.split(TokenizationRegexp) # Note: using split
  end

  def test_tokenization_without_curly
    text = "Hello im liquid this: {% is a tag %}"
    assert_equal [ "Hello im liquid this: ", "{% is a tag %}"],
      text.split(TokenizationRegexp) # Note: using split
  end
end

-- OUTPUT --
Loaded suite liquid-regex
Started
..
Finished in 0.0 seconds.

2 tests, 2 assertions, 0 failures, 0 errors
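
One thing the test cases do not cover is a tag that spans a line break. In Ruby, "." does not match a newline unless the regexp carries the /m flag, so such a tag would be left inside the surrounding text. Whether Liquid tags are meant to span lines is an assumption here; the sketch below only shows the difference, with made-up variable names.

```ruby
single_line = /(\{%.*?%\})|(\{\{.*?\}\})/
multi_line  = /(\{%.*?%\})|(\{\{.*?\}\})/m

text = "a{% if\n x %}b"

# Without /m the tag is never matched, so the string comes back unsplit.
unsplit = text.split(single_line)
# => ["a{% if\n x %}b"]

# With /m the "." can cross the newline and the tag becomes its own token.
parts = text.split(multi_line)
# => ["a", "{% if\n x %}", "b"]

p unsplit, parts
```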

Regards,

Sean
 
