problem with meteo datas

napolpie

----Original message----
From: (e-mail address removed)
Date: 3 May 2007, 10:02
To: <[email protected]>
Subject: problem with meteo datas

Hello,
I'm Peter. I'm new to Python coding and I'm using pyparsing to
extract data from a meteo Arpege file.
The file is long and is made up of words and numbers, like this:

GRILLE EURAT5
Coin Nord-Ouest : 46.50/ 0.50 Coin Sud-Est : 44.50/ 2.50
MODELE PA
PARAMETRE P
NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27
NIVEAU MER 0 ECHEANCE 3.0 DATE 20020304000000 NB_POINTS 25
1019.80 1019.78 1019.92 1020.18 1020.34
1018.94 1019.24 1019.54 1020.08 1020.32
1018.24 1018.64 1018.94 1019.84 1019.98
1017.48 1017.88 1018.28 1018.98 1019.98
1016.62 1017.08 1017.66 1018.26 1018.34
NIVEAU MER 0 ECHEANCE 6.0 DATE 20020304000000 NB_POINTS 25
1019.37 1019.39 1019.57 ........
.........
....... .........
.......
.......

........
....... .........
NIVEAU MER 0 ECHEANCE 48.0 DATE 20020304000000 NB_POINTS 25
1017.84 1017.46 1017.14 1016.86 1016.58
1017.28 1016.90 1016.46 1016.48 1016.34
1016.50 1016.06 1015.62 1015.90 1015.72
1015.94 1015.30 1014.78 1014.68 1014.86
1015.86 1015.10 1014.36 1014.00 1013.90

..............................
MODELE PA PARAMETRE T
NIVEAU HAUTEUR 2 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
1.34 1.51 1.40 0.56 -0.36
1.73 1.43 0.89 -0.16 -0.99
2.06 1.39 1.14 -0.53 -0.99
2.12 2.22 2.15 0.76 -1.16
1.67 1.45 1.40 1.26 0.28
NIVEAU HAUTEUR 2 ECHEANCE 3.0 DATE 20020304000000 NB_POINTS 25
0.94 1.16 1.03 0.44 -0.41
0.95 0.61 0.22 .............................................

I'm at the beginning of the work, and for the moment I have written
this code, which extracts only the numeric data as strings:

from pyparsing import *

# a decimal number: optional minus sign, then digit groups joined by "."
dec = Combine(Optional("-") + delimitedList(Word(nums), ".", combine=True))
datas = ZeroOrMore(dec)

f = file("arqal-Arpege.00", "r")
g = file("out3", "w")
for line in f:
    try:
        # parse each line on its own and write its numbers joined by ";"
        result = datas.parseString(line)
        add = result
        add1 = ";".join(add)
        print >> g, "(", add1, ")"
    except ParseException, pe:
        print pe

This is the output written to the file "out3":

( )
( )
( )
( 1020.91;1020.87;1020.91;1021.05;1021.13 )
( 1020.07;1020.27;1020.49;1020.91;1021.15 )
( 1019.37;1019.65;1019.79;1020.53;1020.77 )
( 1018.73;1018.89;1019.19;1019.83;1020.81 )
( 1018.05;1018.19;1018.75;1019.55;1020.27 )
( )
( 1019.80;1019.78;1019.92;1020.18;1020.34 )
( 1018.94;1019.24;1019.54;1020.08;1020.32 )
( 1018.24;1018.64;1018.94;1019.84;1019.98 )
( 1017.48;1017.88;1018.28;1018.98;1019.98 )
( 1016.62;1017.08;1017.66;1018.26;1018.34 )
( )
( 1019.37;1019.39;1019.57;1019.9;......;
.........
.........;1016.87)
( )
( 1017.84;1017.46;1017.14;1016.86;1016.58 )
( 1017.28;1016.90;1016.46;1016.48;1016.34 )
( 1016.50;1016.06;1015.62;1015.90;1015.72 )
( 1015.94;1015.30;1014.78;1014.68;1014.86 )
( 1015.86;1015.10;1014.36;1014.00;1013.90 )

So the words are gone, but the problem is that now I have to arrange
these numbers in a kind of NESTED matrix, emulated in Python by a
nested dictionary:

{ 'P' : { MER 0 : [ (1020.91; 1020.87; ........; 1020.27) ; (.........) ;
                    (1019.80; 1019.78; ........; 1018.26; 1018.34) ]; ......;
          SOL 0 : [ (.......); .....; (........) ] };
  'T' : { SOL 0 : [ (.....; ......) ; (ECHEANCE 3.0) ; (ECHEANCE 6.0) ;
                    (.......; ........) ];
          HAUTEUR 2 : [ (.......; ......; ......) ] } }
======>>>>>>
{ 'Parameter X' : { Level X : [ (forecast steps every 3 hours from +0 to +48 hours) ; ] } }
The outer dictionary is keyed by PARAMETER. In the example it is
P = 'Pressure', but there are many others: Temperature = T, Wind = U
and V, etc. The second, nested level is another dictionary keyed by
NIVEAU (level). In the example it is MER 0 = sea level, or SOL 0
(ground level), but it can also be HAUTEUR 2,10 (height 2,10 meters
above the ground), etc. (the keywords are French).
Every level is then associated with a LIST OF TUPLES, [(....);
(....);(....)], one tuple per forecast step. In French this is the
ECHEANCE: ECHEANCE XX.X = forecast hour +3.0, +6.0 and so on up to
48 h. This gives a list of tuples [(ECHEANCE 3.0); (ECHEANCE 6.0);
(ECHEANCE XX.0);.........;(ECHEANCE 48.0)], like so:
[(1019.37; 1019.39;........;1020.27);(.........);(1019.80;
1019.78;........;1018.26;1018.34)], where each tuple is, in the end,
the data grid: (5 x 5 points) = 25 values
1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27

So my question is: what is the best way to start writing the pyparsing
grammar so that it recognizes the 'words' inside the data file and
puts the data into the nested dictionary and lists of tuples described
above?
Attached are one meteo Arpege data file and the text of this message
as an OpenOffice file.

Thanks a lot to anyone who can tell me anything that helps solve this
big problem (big for me)!
 
Paul McGuire

Peter -

Your first attempt at pyparsing is a good step - just get something
working! You've got a first pattern working that detects and extracts
all decimal numbers. (I think you are the first one to model a
decimal number as a delimited list of integers with "." as the
delimiter.)

The next step is to start looking for some higher-level text groups or
patterns. Your data is well structured as an n-level hierarchy, that
looks to me like:

- model+parameter
  - level
    - nb_points
  - level
    - nb_points
  - level
    - nb_points
- model+parameter
  - level
    - nb_points
  - level
    - nb_points
...

You can build your pyparsing grammar from the ground up, first to
parse individual terminal expressions (such as decimal numbers which
you already have), and then build up to more and more complex
structures within your data.

The first thing to change about your approach is to start looking at
this data as a whole, instead of line by line. Instead of extracting
this first line of 5 point values:

1020.91 1020.87 1020.91 1021.05 1021.13

look at this as one piece of a larger structure, a data set for a
given niveau:

NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27

So let's create a parser for this structure that is the next step up
in the data hierarchy.

NIVEAU, ECHEANCE, DATE, and NB_POINTS are helpful labels for marking
the data, but not really important to return in the parsed results.
So I will start by creating definitions for these labels which will
parse them, but leave out (suppress) them from the returned data:

NIVEAU, ECHEANCE, DATE, NB_POINTS = \
    map(Suppress, "NIVEAU ECHEANCE DATE NB_POINTS".split())

You stated that there are several options for what a niveau identifier
can look like, so this should be its own expression:

niveau_ref = Literal("MER 0") | Literal("SOL 0") | \
    Combine(Literal("HAUTEUR ") + eurodec)

(I defined eurodec as you defined dec, but with a comma delimiter.)
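
The definition of eurodec is not shown in this post. As a rough sketch, assuming it mirrors dec exactly and only swaps "." for ",", it might look like this (written for Python 3; the test strings are just illustrations):

```python
from pyparsing import Combine, Optional, Word, nums, delimitedList

# dec as in the original post: digit groups joined by "."
dec = Combine(Optional("-") + delimitedList(Word(nums), ".", combine=True))

# eurodec: assumed to be the same pattern with "," as the separator
eurodec = Combine(Optional("-") + delimitedList(Word(nums), ",", combine=True))

dec_token = dec.parseString("-0.36")[0]
euro_token = eurodec.parseString("2,4")[0]
plain_token = eurodec.parseString("2")[0]  # a bare integer also matches

print(dec_token, euro_token, plain_token)
```

Both patterns return the number as a single string, so a level such as "HAUTEUR 2" or "HAUTEUR 2,4" comes back as one token.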

I'll also define a dateString as a Word(nums) of exactly 14 digits,
but you can come back to this later and refine this as you like (build
in parse-time conversion for example).

dateString = Word(nums,exact=14)
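
As an illustration of the parse-time conversion mentioned above, a parse action could turn the 14 digits into a datetime. This is only a sketch for Python 3: the YYYYMMDDhhmmss layout is an assumption read off the sample value 20020304000000, not something the file format is known to guarantee.

```python
from datetime import datetime
from pyparsing import Word, nums

dateString = Word(nums, exact=14)

# Convert the matched token while parsing.
# Assumption: the 14 digits are laid out as YYYYMMDDhhmmss.
dateString.setParseAction(
    lambda tokens: datetime.strptime(tokens[0], "%Y%m%d%H%M%S"))

parsed = dateString.parseString("20020304000000")[0]
print(parsed.isoformat())
```

With the parse action attached, the "date" field in the results would hold a datetime object instead of a string.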

And then you can create an expression for a full niveau's-worth of
data:

niveau = NIVEAU + niveau_ref + \
    ECHEANCE + dec + \
    DATE + dateString + \
    NB_POINTS + countedArray(dec)

Notice that we can use the pyparsing built-in countedArray to capture
all of the data point values, since NB_POINTS gives the number of
points to follow, and these are followed immediately by the points
themselves. Pyparsing will convert all of these into a nice n-element
list for us.

You astutely requested that these values should be accessible like
values in a dict, so we do this in pyparsing by adding results names:

niveau = NIVEAU + niveau_ref.setResultsName("niveau") + \
    ECHEANCE + dec.setResultsName("echeance") + \
    DATE + dateString.setResultsName("date") + \
    NB_POINTS + countedArray(dec).setResultsName("nb_points")

Now you should be able to search through your data file, extracting
all of the niveaux (?) and their related data:

f = file("arqal-Arpege.00", "r")
fdata = f.read()  # read the entire file, instead of going line-by-line
for n in niveau.searchString(fdata):
    print n.niveau
    print n.dump()
    pointValues = map(float, n.nb_points[0])
    print "- NB_POINTS mean:", sum(pointValues) / len(pointValues)
    print

(I also added some examples of extracting data using the results
names. You can also use dict-style notation, n["niveau"], if you
prefer.)

Gives this output (I've truncated with '...' for the sake of Usenet
posting, but the actual program gives the full lists of values):

MER 0
['MER 0', '0.0', '20020304000000', ['1020.91', '1020.87', ...
- date: 20020304000000
- echeance: 0.0
- nb_points: [['1020.91', '1020.87', '1020.91', '1021.05', ...
- niveau: MER 0
- NB_POINTS mean: 1020.0052

MER 0
['MER 0', '3.0', '20020304000000', ['1019.80', '1019.78', ...
- date: 20020304000000
- echeance: 3.0
- nb_points: [['1019.80', '1019.78', '1019.92', '1020.18', ...
- niveau: MER 0
- NB_POINTS mean: 1018.9736

MER 0
['MER 0', '48.0', '20020304000000', ['1017.84', '1017.46', ...
- date: 20020304000000
- echeance: 48.0
- nb_points: [['1017.84', '1017.46', '1017.14', '1016.86', ...
- niveau: MER 0
- NB_POINTS mean: 1015.9168

HAUTEUR 2
['HAUTEUR 2', '0.0', '20020304000000', ['1.34', '1.51', '1.40', ...
- date: 20020304000000
- echeance: 0.0
- nb_points: [['1.34', '1.51', '1.40', '0.56', '-0.36', '1.73', ...
- niveau: HAUTEUR 2
- NB_POINTS mean: 0.9028

HAUTEUR 2,4
['HAUTEUR 2,4', '3.0', '20020304000000', ['1.34', '1.51', '1.40', ...
- date: 20020304000000
- echeance: 3.0
- nb_points: [['1.34', '1.51', '1.40', '0.56', '-0.36', '1.73', ...
- niveau: HAUTEUR 2,4
- NB_POINTS mean: 0.9028
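
Each nb_points list is flat. If the 5 x 5 grid from the original question is wanted, the list can be cut into rows with plain Python. A minimal sketch, assuming the values are stored row by row (the file layout suggests this, but nothing in the thread confirms it); to_grid is a made-up helper name:

```python
def to_grid(values, width=5):
    """Split a flat list into consecutive rows of `width` values."""
    if len(values) % width:
        raise ValueError("number of values is not a multiple of the row width")
    return [values[i:i + width] for i in range(0, len(values), width)]

# First two rows of the HAUTEUR 2 / ECHEANCE 0.0 sample, already as floats
flat = [1.34, 1.51, 1.40, 0.56, -0.36,
        1.73, 1.43, 0.89, -0.16, -0.99]

grid = to_grid(flat)
print(grid[1])
```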

Now I'll let you take this the next step: compose the expression for
the model+parameter hierarchy level (hint: the body of each
model+parameter value will be an expression of
OneOrMore( Group( niveau ) ) - be sure to give this a results name,
too).
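
Once every level has been parsed, the nested dictionary from the original question can be assembled with plain Python. The sketch below leaves the grammar aside and assumes the parsed results have already been flattened into (parameter, niveau, echeance, values) records; that record layout and the name build_tree are illustrative, not part of pyparsing, and the value lists are shortened to two points each.

```python
def build_tree(records):
    """Group records into {parameter: {niveau: [tuple_of_values, ...]}}."""
    tree = {}
    # Sort by lead time so each list runs from the earliest echeance onwards
    for parameter, niveau, echeance, values in sorted(records, key=lambda r: r[2]):
        levels = tree.setdefault(parameter, {})
        levels.setdefault(niveau, []).append(tuple(values))
    return tree

# Hypothetical records, using values from the sample file
records = [
    ("P", "MER 0", 3.0, [1019.80, 1019.78]),
    ("P", "MER 0", 0.0, [1020.91, 1020.87]),
    ("T", "HAUTEUR 2", 0.0, [1.34, 1.51]),
]

tree = build_tree(records)
print(tree["P"]["MER 0"])
```

Using setdefault keeps the code short: the inner dictionary and the list are created the first time a parameter or level is seen.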


-- Paul
 
