a 100-line indentation-based preprocessor for HTML

Steve Howell

Python has this really neat idea called indentation-based syntax, and
there are folks that have caught on to this idea in the HTML
community.

AFAIK the most popular indentation-based solution for generating HTML
is a tool called HAML, which actually is written in Ruby.

I have been poking around with the HAML concepts in Python, with the
specific goal of integrating with Django. But before releasing that,
I thought it would be useful to post code that distills the basic
concept with no assumptions about your target renderer. I hope it
also serves as a good example of what you can do in exactly 100 lines
of Python code.

Here is what it does...

You can use indentation syntax for HTML tags like table.

From this...

table
    tr
        td
            Left
        td
            Center
        td
            Right

...you get this:

<table>
    <tr>
        <td>
            Left
        </td>
        <td>
            Center
        </td>
        <td>
            Right
        </td>
    </tr>
</table>

Lists and divs work the same way, and note that attributes are not
a problem.

From this...

div class="spinnable"
    ul
        li id="item1"
            One
        li id="item2"
            Two

...you get this:

<div class="spinnable">
    <ul>
        <li id="item1">
            One
        </li>
        <li id="item2">
            Two
        </li>
    </ul>
</div>
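
The attribute handling shown above comes down to one rule: the whole line goes inside the opening tag, and only its first word is used for the closing tag. Here is a minimal sketch of that rule in isolation. The helper name `make_tags` is made up for illustration and is not part of the posted code.

```python
def make_tags(line):
    """Return (start_tag, end_tag) for a line such as 'li id="item1"'."""
    # The full line, attributes included, goes inside the opening tag.
    start_tag = '<%s>' % line
    # Only the first whitespace-separated word names the element to close.
    end_tag = '</%s>' % line.split()[0]
    return start_tag, end_tag


print(make_tags('li id="item1"'))
```

Because the attributes are never parsed, anything after the tag name is passed through untouched.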

You can still use raw HTML tags where appropriate (such as when
converting legacy markup to the new style).

From this...

<table>
    tr
        td
            <b>Hello World!</b>
</table>

...you get this:

<table>
    <tr>
        <td>
            <b>Hello World!</b>
        </td>
    </tr>
</table>

And here is the code:

import re

def convert_text(in_body):
    '''
    Convert HAML-like markup to HTML. Allow raw HTML to
    fall through.
    '''
    indenter = Indenter()
    for prefix, line, kind in get_lines(in_body):
        if kind == 'branch' and '<' not in line:
            html_block_tag(prefix, line, indenter)
        else:
            indenter.add(prefix, line)
    return indenter.body()


def html_block_tag(prefix, line, indenter):
    '''
    Block tags have syntax like this and only
    apply to branches in indentation:

    table
        tr
            td class="foo"
                leaf #1
            td
                leaf #2
    '''
    start_tag = '<%s>' % line
    end_tag = '</%s>' % line.split()[0]
    indenter.push(prefix, start_tag, end_tag)


class Indenter:
    '''
    Example usage:

    indenter = Indenter()
    indenter.push('', 'Start', 'End')
    indenter.push('    ', 'Foo', '/Foo')
    indenter.add ('        ', 'bar')
    indenter.add ('    ', 'yo')
    print(indenter.body())
    '''
    def __init__(self):
        self.stack = []  # open blocks as (prefix, end) pairs
        self.lines = []  # output lines collected so far

    def push(self, prefix, start, end):
        # Emit the start line and remember the end line for later.
        self.add(prefix, start)
        self.stack.append((prefix, end))

    def add(self, prefix, line):
        # Blank lines never close a block; they are just passed through.
        if line:
            self.pop(prefix)
        self.insert(prefix, line)

    def insert(self, prefix, line):
        self.lines.append(prefix+line)

    def pop(self, prefix):
        # Close every open block indented at least as far as prefix.
        while self.stack:
            start_prefix, end = self.stack[-1]
            if len(prefix) <= len(start_prefix):
                # Put the end line before any trailing blank lines.
                whitespace_lines = []
                while self.lines and self.lines[-1] == '':
                    whitespace_lines.append(self.lines.pop())
                self.insert(start_prefix, end)
                self.lines += whitespace_lines
                self.stack.pop()
            else:
                return

    def body(self):
        self.pop('')
        return '\n'.join(self.lines)

def get_lines(in_body):
    '''
    Splits out lines from a file and identifies whether lines
    are branches, leaves, or blanks. The detection of branches
    could probably be done in a more elegant way than patching
    the last non-blank line, but it works.
    '''
    lines = []
    last_line = -1
    for line in in_body.split('\n'):
        m = re.match(r'(\s*)(.*)', line)
        prefix, line = m.groups()
        if line:
            line = line.rstrip()
            if last_line >= 0:
                old_prefix, old_line, ignore = lines[last_line]
                if len(old_prefix) < len(prefix):
                    lines[last_line] = (old_prefix, old_line,
                                        'branch')
            last_line = len(lines)
            lines.append((prefix, line, 'leaf')) # leaf for now
        else:
            lines.append(('', '', 'blank'))
    return lines

As I mention in the comment for get_lines(), I wonder if there are
more elegant ways to deal with the indentation, both of the input and
the output.
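
One possible alternative to patching the last non-blank line, offered only as a sketch: walk the lines in reverse and remember the indent of the nearest non-blank line below. Each line can then be classified in a single step. The name `classify_lines` is hypothetical. It is meant to return the same (prefix, line, kind) tuples as get_lines(), but it has not been tested against the full converter.

```python
import re


def classify_lines(text):
    """Return (prefix, content, kind) tuples; kind is 'branch', 'leaf' or 'blank'."""
    parsed = []
    for raw in text.split('\n'):
        prefix, content = re.match(r'(\s*)(.*)', raw).groups()
        parsed.append((prefix, content.rstrip()))

    result = []
    next_indent = None  # indent width of the nearest non-blank line below
    for prefix, content in reversed(parsed):
        if not content:
            result.append(('', '', 'blank'))
            continue
        # A line is a branch when the next non-blank line is indented deeper.
        is_branch = next_indent is not None and next_indent > len(prefix)
        result.append((prefix, content, 'branch' if is_branch else 'leaf'))
        next_indent = len(prefix)
    result.reverse()
    return result


for item in classify_lines('table\n    tr\n\n        Left\n    tr'):
    print(item)
```

The trade-off is that the whole input is parsed before anything is classified, so it needs two passes, but no tuple is rewritten after the fact.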
 
Steve Howell


This is a neat idea, but would a two-character indentation not be enough?

The code actually preserves whatever indent style you give as input,
as long as you are consistent. I used 4-space indents in my examples
since that seems to be the convention in Python, but when I did my
stint in Ruby, I got pretty comfortable with 2-space indents as well,
and I think it can make sense for HTML, where you do get pretty deep
sometimes with idioms like table/tr/td.
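
To illustrate why the indent width does not matter, here is a compact stand-in for the converter. It is a sketch, not the posted code: the name `render` is made up, and it drops blank lines for brevity. It only compares indent lengths and reuses each line's own prefix for the closing tag, so 2-space input produces 2-space output.

```python
def render(text):
    """Tiny stand-in converter: tags on branch lines, text on leaves."""
    lines = [ln for ln in text.split('\n') if ln.strip()]
    out, stack = [], []  # stack holds (indent_width, closing_line)
    for i, line in enumerate(lines):
        body = line.strip()
        indent = line[:len(line) - len(line.lstrip())]
        # Close every open tag at this depth or deeper.
        while stack and stack[-1][0] >= len(indent):
            out.append(stack.pop()[1])
        nxt = lines[i + 1] if i + 1 < len(lines) else ''
        nxt_width = len(nxt) - len(nxt.lstrip())
        if nxt_width > len(indent) and '<' not in body:
            # Branch line: emit the opening tag, remember the closing tag.
            out.append('%s<%s>' % (indent, body))
            stack.append((len(indent), '%s</%s>' % (indent, body.split()[0])))
        else:
            # Leaf or raw HTML: pass through unchanged.
            out.append(indent + body)
    while stack:
        out.append(stack.pop()[1])
    return '\n'.join(out)


print(render('ul\n  li\n    One'))
```

Feeding the same markup with 4-space indents gives the same structure with 4-space indents in the output.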
 
