Regexp problem for template language

Tobias Luetke · Oct 18, 2005

I showed this to a few people during RubyConf but couldn't find a good
solution.
Liquid is a non-evaling (HTML) template engine which I use to let my
customers edit their shop's appearance in Shopify.

I would like to tokenize the entire document in one pass.
This is a stripped-down test case demonstrating the blocker. Basically,
I can't find a good way to get all the text between {% tags %} and {{
variables }} (which I have omitted from the test case for
simplicity).

Once this is addressed I'll release Liquid to RubyForge.

Test case:

require 'test/unit'

TokenizationRegexp = /\{%.*?%\}|[^\{]+/ # this doesn't work

class ParsingTest < Test::Unit::TestCase

  def test_tokenization
    # Please make me work
    text = "Hello im liquid this: {% is a tag %} curly brackets like this { may appear in the text } please parse me"
    assert_equal ["Hello im liquid this: ",
                  "{% is a tag %}",
                  " curly brackets like this { may appear in the text } please parse me"],
                 text.scan(TokenizationRegexp)
  end

  def test_tokenization_without_curly
    text = "Hello im liquid this: {% is a tag %}"
    assert_equal ["Hello im liquid this: ", "{% is a tag %}"],
                 text.scan(TokenizationRegexp)
  end

end

# Loaded suite test
# Started
# F.
# Finished in 0.021796 seconds.
#
# 1) Failure:
# test_tokenization(ParsingTest) [test.rb:11]:
# <["Hello im liquid this: ",
# "{% is a tag %}",
# " curly brackets like this { may appear in the text } please parse me"]> expected but was
# <["Hello im liquid this: ",
# "{% is a tag %}",
# " curly brackets like this ",
# " may appear in the text } please parse me"]>.
#
# 2 tests, 2 assertions, 1 failures, 0 errors

Sean O'Halpin · Oct 18, 2005

Using split instead of scan and a slightly different regex:

require 'test/unit'

TokenizationRegexp = /(\{%.*?%\})+/ # Note different regex

class ParsingTest < Test::Unit::TestCase

  def test_tokenization
    text = "Hello im liquid this: {% is a tag %} curly brackets like this { may appear in the text } please parse me"
    assert_equal ["Hello im liquid this: ",
                  "{% is a tag %}",
                  " curly brackets like this { may appear in the text } please parse me"],
                 text.split(TokenizationRegexp) # Note: using split
  end

  def test_tokenization_without_curly
    text = "Hello im liquid this: {% is a tag %}"
    assert_equal ["Hello im liquid this: ", "{% is a tag %}"],
                 text.split(TokenizationRegexp) # Note: using split
  end
end

Watch out for gmail's line breaks!

HTH,

Regards,

Sean

Sean O'Halpin · Oct 18, 2005

I made a mistake in the revised regex: it shouldn't have the +.

TokenizationRegexp = /(\{%.*?%\})/
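
The difference only shows up when two tags sit directly next to each other, which neither test case covers. The sketch below uses made-up variable names and a made-up input string to compare the two versions: with the +, the group repeats and split returns only its last capture, so the first tag is dropped.

```ruby
# Hypothetical names; the two patterns are the ones posted in this thread.
with_plus_re    = /(\{%.*?%\})+/ # first version
without_plus_re = /(\{%.*?%\})/  # corrected version

# Made-up input with two adjacent tags.
text = "a{% one %}{% two %}b"

with_plus    = text.split(with_plus_re)
without_plus = text.split(without_plus_re)

p with_plus    # => ["a", "{% two %}", "b"]
p without_plus # => ["a", "{% one %}", "", "{% two %}", "b"]
```

Note that the corrected version yields an empty string between adjacent tags; a caller that does not want it could filter the result with reject(&:empty?).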

Regards,

Sean

Kevin Ballard · Oct 18, 2005

If you want between {{ }} variables, shouldn't that second branch of
the regexp be {2,} rather than +? Not to mention that it never matches
the closing }}.

It might be better to, instead of trying to do it all with just a
regex, implement a stateful parser that matches an opening tag and
matches a closing tag from there. That way you can also add spiffy
stuff like nested tags.
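
A minimal sketch of what such a stateful tokenizer might look like, using StringScanner from Ruby's standard library. The tokenize method is hypothetical and is not Liquid's implementation; it only handles flat {% %} and {{ }} tokens and makes no attempt at nesting.

```ruby
require 'strscan'

# Hypothetical helper: walks the source once and returns text, {% tag %}
# and {{ variable }} tokens in document order.
def tokenize(source)
  tag_re  = /\{%.*?%\}|\{\{.*?\}\}/m # a complete tag or variable
  text_re = /.[^{]*/m                # one character, then everything up to the next "{"
  scanner = StringScanner.new(source)
  tokens  = []
  in_text = false                    # true while the last token is plain text

  until scanner.eos?
    if scanner.scan(tag_re)
      tokens << scanner.matched
      in_text = false
    else
      chunk = scanner.scan(text_re)
      if in_text
        tokens.last << chunk         # a lone "{" continues the current text token
      else
        tokens << chunk
        in_text = true
      end
    end
  end
  tokens
end

p tokenize("Hi {% tag %} a { b } c {{me}}")
# => ["Hi ", "{% tag %}", " a { b } c ", "{{me}}"]
```

Because the scanner decides at each position whether a tag starts there, a lone curly bracket simply becomes part of the surrounding text instead of ending it.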

Tobias Luetke · Oct 18, 2005

I omitted the {{ }} search from the regexp for this example, the
second part was only for trying to get all the text in between {% tags
%}.

Thanks to Sean for providing a working way to satisfy the test
cases in one call!
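
For comparison, the original scan approach can also be made to pass the first test case if the text branch is allowed to consume a curly bracket that does not start a tag. This is only a sketch with a hypothetical variable name; it uses a negative lookahead and, like the test case, leaves out {{ variables }}.

```ruby
# Hypothetical pattern: a {% tag %}, or a run of characters where each one is
# either not "{" or is a "{" that is not followed by "%".
scan_re = /\{%.*?%\}|(?:[^{]|\{(?!%))+/

text = "Hello im liquid this: {% is a tag %} curly brackets like this { may appear in the text } please parse me"
tokens = text.scan(scan_re)

p tokens
# => ["Hello im liquid this: ", "{% is a tag %}",
#     " curly brackets like this { may appear in the text } please parse me"]
```

An unterminated {% is silently skipped by this pattern, so the split version is probably still the simpler choice.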

Sean O'Halpin · Oct 18, 2005

> I omitted the {{ }} search from the regexp for this example, the
> second part was only for trying to get all the text in between {% tags
> %}.

require 'test/unit'

TokenizationRegexp = /(\{%.*?%\})|(\{\{.*?\}\})/ # with {{}} tags

class ParsingTest < Test::Unit::TestCase

  def test_tokenization
    text = "Hello im liquid this: {% is a tag %} curly brackets like this { may appear in the text } please parse {{me}}"
    assert_equal ["Hello im liquid this: ",
                  "{% is a tag %}",
                  " curly brackets like this { may appear in the text } please parse ",
                  "{{me}}"],
                 text.split(TokenizationRegexp) # Note: using split
  end

  def test_tokenization_without_curly
    text = "Hello im liquid this: {% is a tag %}"
    assert_equal ["Hello im liquid this: ", "{% is a tag %}"],
                 text.split(TokenizationRegexp) # Note: using split
  end
end

-- OUTPUT --
Loaded suite liquid-regex
Started
..
Finished in 0.0 seconds.

2 tests, 2 assertions, 0 failures, 0 errors

Regards,

Sean
