Steve Howell
Python has this really neat idea called indentation-based syntax, and
there are folks that have caught on to this idea in the HTML
community.
AFAIK the most popular indentation-based solution for generating HTML
is a tool called HAML, which actually is written in Ruby.
I have been poking around with the HAML concepts in Python, with the
specific goal of integrating with Django. But before releasing that,
I thought it would be useful to post code that distills the basic
concept with no assumptions about your target renderer. I hope it
also serves as a good example of what you can do in exactly 100 lines
of Python code.
Here is what it does...
You can use indentation syntax for HTML tags like table.
From this...
table
    tr
        td
            Left
        td
            Center
        td
            Right
...you get this:
<table>
    <tr>
        <td>
            Left
        </td>
        <td>
            Center
        </td>
        <td>
            Right
        </td>
    </tr>
</table>
Lists and divs work the same way, and note that attributes are not
a problem.
From this...
div class="spinnable"
    ul
        li id="item1"
            One
        li id="item2"
            Two
...you get this:
<div class="spinnable">
    <ul>
        <li id="item1">
            One
        </li>
        <li id="item2">
            Two
        </li>
    </ul>
</div>
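Attributes come for free because the whole line becomes the start tag, while only the first word becomes the end tag. Here is a minimal sketch of that one step; it mirrors what html_block_tag() in the listing does, but the helper name tags_for is made up for illustration and is not part of the posted code.

```python
def tags_for(line):
    # Hypothetical helper: same string handling as html_block_tag().
    start_tag = '<%s>' % line             # whole line, attributes included
    end_tag = '</%s>' % line.split()[0]   # first word only: the tag name
    return start_tag, end_tag


if __name__ == '__main__':
    print(tags_for('li id="item1"'))
```

Note that this only works when the tag name is the first whitespace-separated word on the line.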
You can still use raw HTML tags where appropriate (such as when
converting legacy markup to the new style).
From this...
<table>
    tr
        td
            <b>Hello World!</b>
</table>
...you get this:
<table>
    <tr>
        <td>
            <b>Hello World!</b>
        </td>
    </tr>
</table>
And here is the code:
import re

def convert_text(in_body):
    '''
    Convert HAML-like markup to HTML.  Allow raw HTML to
    fall through.
    '''
    indenter = Indenter()
    for prefix, line, kind in get_lines(in_body):
        if kind == 'branch' and '<' not in line:
            html_block_tag(prefix, line, indenter)
        else:
            indenter.add(prefix, line)
    return indenter.body()

def html_block_tag(prefix, line, indenter):
    '''
    Block tags have syntax like this and only
    apply to branches in indentation:

    table
        tr
            td class="foo"
                leaf #1
            td
                leaf #2
    '''
    start_tag = '<%s>' % line
    end_tag = '</%s>' % line.split()[0]
    indenter.push(prefix, start_tag, end_tag)

class Indenter:
    '''
    Example usage:

    indenter = Indenter()
    indenter.push('',     'Start', 'End')
    indenter.push('  ',   'Foo', '/Foo')
    indenter.add ('    ', 'bar')
    indenter.add ('    ', 'yo')
    print(indenter.body())
    '''
    def __init__(self):
        self.stack = []
        self.lines = []

    def push(self, prefix, start, end):
        self.add(prefix, start)
        self.stack.append((prefix, end))

    def add(self, prefix, line):
        if line:
            self.pop(prefix)
        self.insert(prefix, line)

    def insert(self, prefix, line):
        self.lines.append(prefix+line)

    def pop(self, prefix):
        while self.stack:
            start_prefix, end = self.stack[-1]
            if len(prefix) <= len(start_prefix):
                whitespace_lines = []
                while self.lines and self.lines[-1] == '':
                    whitespace_lines.append(self.lines.pop())
                self.insert(start_prefix, end)
                self.lines += whitespace_lines
                self.stack.pop()
            else:
                return

    def body(self):
        self.pop('')
        return '\n'.join(self.lines)

def get_lines(in_body):
    '''
    Splits out lines from a file and identifies whether lines
    are branches, leaves, or blanks.  The detection of branches
    could probably be done in a more elegant way than patching
    the last non-blank line, but it works.
    '''
    lines = []
    last_line = -1
    for line in in_body.split('\n'):
        m = re.match(r'(\s*)(.*)', line)
        prefix, line = m.groups()
        if line:
            line = line.rstrip()
            if last_line >= 0:
                old_prefix, old_line, ignore = lines[last_line]
                if len(old_prefix) < len(prefix):
                    lines[last_line] = (old_prefix, old_line,
                        'branch')
            last_line = len(lines)
            lines.append((prefix, line, 'leaf')) # leaf for now
        else:
            lines.append(('', '', 'blank'))
    return lines

As I mention in the comment for get_lines(), I wonder if there are
more elegant ways to deal with the indentation, both of the input and
the output.
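
One possible answer for the input side, offered as a sketch rather than a drop-in replacement: walk the lines bottom-up, so each line already knows the indentation of the next non-blank line below it and nothing has to be patched afterwards. The function name classify is made up for this example; it is meant to return the same (prefix, line, kind) tuples that get_lines() produces, but it has only been checked against small inputs like the ones in this post.

```python
def classify(text):
    """Return (prefix, content, kind) tuples, kind in branch/leaf/blank.

    Hypothetical alternative to get_lines(): a non-blank line is a
    'branch' when the next non-blank line below it is indented deeper.
    """
    rows = []
    for raw in text.split('\n'):
        content = raw.strip()
        # Leading whitespace only; blank lines get an empty prefix.
        prefix = raw[:len(raw) - len(raw.lstrip())] if content else ''
        rows.append((prefix, content))

    result = []
    next_depth = -1  # indentation of the nearest non-blank line below
    for prefix, content in reversed(rows):
        if not content:
            result.append(('', '', 'blank'))
            continue
        kind = 'branch' if next_depth > len(prefix) else 'leaf'
        result.append((prefix, content, kind))
        next_depth = len(prefix)
    result.reverse()
    return result


if __name__ == '__main__':
    sample = 'table\n    tr\n        td\n            Left'
    for row in classify(sample):
        print(row)
```

The trade-off is that the whole input is held in memory and traversed twice, which seems acceptable for template-sized files.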