aurora
This is an entry I just added to ASPN. It is a somewhat novel technique I
have employed quite successfully in my code. I repost it here for more
exposure and discussion.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/475158
wy
------------------------------------------------------------------------
Title: Design mini-language for data input
Description:
Many programs need a set of initial data. For ease of use and flexibility,
design a mini-language for your input data. Use Python's superb text
handling capability to parse and build the data structure from the input
text.
Source:
# this is an example to demonstrate the programming technique

DATA = """
# data source: http://www.mongabay.com/igapo/world_statistics_by_pop.htm
# Country / Capital / Area [sq. km] / 2002 Population Estimate
China / Beijing / 9,596,960 / 1,284,303,705
India / New Delhi / 3,287,590 / 1,045,845,226
United States / Washington DC / 9,629,091 / 280,562,489
Indonesia / Jakarta / 1,919,440 / 231,328,092
Russia / Moscow / 17,075,200 / 144,978,573
"""

def initData():
    """ parse and return a country list of (name, capital, area,
    population) """
    countries = []
    for line in DATA.splitlines():
        # filter out blank lines/comment lines
        line = line.strip()
        if not line or line.startswith('#'):
            continue

        # 4 fields separated by '/', with surrounding white space removed
        parts = [part.strip() for part in line.split('/')]
        country, capital, area, population = parts

        # remove commas in numbers
        area = int(area.replace(',', ''))
        population = int(population.replace(',', ''))

        countries.append((country, capital, area, population))
    return countries

def findLargestCountry(countries):
    # your algorithm here; as a placeholder, pick the record with the
    # largest area (field 2 of each tuple)
    return max(countries, key=lambda record: record[2])

def main():
    countries = initData()
    print(findLargestCountry(countries))

if __name__ == '__main__':
    main()
Discussion:
Problem
-------
Many programs need a set of initial data. The simplest way is to construct
the Python data structure directly, as shown below. This is often not ideal.
Algorithms and data structures tend to change. Python statements are also
likely to differ textually from the data source, which might be text pulled
from web pages or other places. This means a great deal of effort is often
needed to format and maintain the input as Python statements.

This is a sample program that initializes some geographical data.
# map of country -> (capital, area, population)
COUNTRIES = {}
COUNTRIES['China'] = ('Beijing', 9596960, 1284303705)
COUNTRIES['India'] = ('New Delhi', 3287590, 1045845226)
COUNTRIES['United States'] = ('Washington DC', 9629091, 280562489)
COUNTRIES['Indonesia'] = ('Jakarta', 1919440, 231328092)
COUNTRIES['Russia'] = ('Moscow', 17075200, 144978573)
Mini-language
-------------
A more flexible approach is to define a mini-language to describe the
data. This can be as simple as formatting data into a multi-line string.

1. Define the data format in text. It should mirror the data source and
   be designed for ease of human editing.
2. Define the data structure.
3. Write glue code to parse the input data and initialize the data
   structure.
In the example above we use one line for each record. Each record has four
fields (country, capital, area and population) separated by slashes. One
immediate benefit is that we no longer need to type quotes around every
string literal. This concise data format is much easier to read and edit
than Python statements.
The parser simply breaks the input text into lines using splitlines() and
then loops through them one by one. It is useful to tolerate some extra
white space so that the format is more forgiving for a human editor. In this
case the numbers (area, population) from the data source contain commas.
Rather than editing them out manually, we copy them as is into the text.
They are then parsed into integers using
    area = int(area.replace(',',''))
A slash is chosen as the separator (rather than the more common comma)
because it does not otherwise appear in the data. A record is split into
fields using
    line.split('/')
Don't forget to remove the extra white space around each field using the
string method strip().
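
As a minimal sketch of the split-and-strip step described above, the helper
below parses a single record and reports a malformed line instead of failing
with an unpacking error. The name parse_record, the lineno parameter and the
error wording are assumptions made for illustration; they are not part of the
recipe.

```python
def parse_record(line, lineno=0):
    """Split one 'Country / Capital / Area / Population' line into a tuple.

    parse_record is a hypothetical helper name used for illustration only.
    """
    # split on the separator and drop the white space around each field
    parts = [part.strip() for part in line.split('/')]
    if len(parts) != 4:
        raise ValueError('line %d: expected 4 fields, got %d: %r'
                         % (lineno, len(parts), line))
    country, capital, area, population = parts
    # the numbers keep their commas in the input text; remove them here
    return (country, capital,
            int(area.replace(',', '')),
            int(population.replace(',', '')))


record = parse_record('  Russia /  Moscow / 17,075,200 / 144,978,573 ')
print(record)
```

Checking the field count up front is a design choice: a person editing the
text by hand then gets a message that points at the offending line.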
Finally the parser builds a list of country records, each a tuple of
(country, capital, area, population). It is just as easy to turn them into
objects or any other data structure as desired.
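
To illustrate turning the tuples into other structures, here is a small
sketch using collections.namedtuple from the standard library. The Country
type and the variable names are hypothetical, and the two rows are copied
from the sample data in the recipe.

```python
from collections import namedtuple

# Country is a hypothetical record type introduced for this sketch
Country = namedtuple('Country', 'name capital area population')

# tuples in the shape produced by the parser in the recipe
rows = [
    ('China', 'Beijing', 9596960, 1284303705),
    ('Russia', 'Moscow', 17075200, 144978573),
]

countries = [Country(*row) for row in rows]

# fields can now be read by name instead of by position
largest = max(countries, key=lambda c: c.area)
print(largest.name, largest.area)

# a dictionary keyed by country name is equally easy to build
by_name = {c.name: c for c in countries}
print(by_name['China'].capital)
```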
The mini-language technique can be refined to represent more complex, more
structured input. It makes transformation and maintenance of input data
much easier.
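
One possible refinement, sketched under assumptions that are not in the
recipe: a line in square brackets opens a group, and the records that follow
belong to it. The bracket syntax, the function name parse_groups and the
grouping by continent are all hypothetical choices made for illustration.

```python
DATA = """
# [Group] lines open a section; the records below belong to it
[Asia]
China / Beijing / 9,596,960 / 1,284,303,705
India / New Delhi / 3,287,590 / 1,045,845,226

[North America]
United States / Washington DC / 9,629,091 / 280,562,489
"""


def parse_groups(text):
    """Return a dict mapping group name -> list of record tuples."""
    groups = {}
    current = None
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        # filter out blank lines/comment lines, as in the recipe
        if not line or line.startswith('#'):
            continue
        if line.startswith('[') and line.endswith(']'):
            # a section header starts (or reopens) a group
            current = groups.setdefault(line[1:-1].strip(), [])
            continue
        if current is None:
            raise ValueError('line %d: record before any [group]' % lineno)
        name, capital, area, population = [p.strip() for p in line.split('/')]
        current.append((name, capital,
                        int(area.replace(',', '')),
                        int(population.replace(',', ''))))
    return groups


groups = parse_groups(DATA)
for group, records in groups.items():
    print(group, [record[0] for record in records])
```

The per-record parsing is unchanged; only one extra line type is recognised,
which keeps the input as easy to edit by hand as the flat version.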