A. Farber
Hi,
I'm trying to parse this kind of text:
TARGET Snake.app
TARGETTYPE app
UID 0x100079be 0x103F5BE8
TARGETPATH \system\apps\Snake
SOURCEPATH ..\UiSrc
SOURCE CSApplication.cpp CSAppUI.cpp CSDocument.cpp
CSView.cpp CSViewControl.cpp CSSettingsDialog.cpp
SOURCE CSHighScoreDialog.cpp CSKeyboardReader.cpp
CSGameDrawer.cpp CSPauseNoteDialog.cpp
CSHighscoreStore.cpp CSPlayDialog.cpp
CSHelpDialog.cpp
SOURCE CSConnectionNoteDialog.cpp
CSAsyncWait.cpp // added 13.02.2004
CSGameOverNoteDialog.cpp // added 24.7.2001
SOURCE ..\audiosrc\CSoundBank.cpp CdBitmapManager.cpp
DOCUMENT ..\group\Snake.loc
by the following grammar:
startrule: comment(s) | directive(s)
comment: m{//[^\\n]*}
directive: keyword value(s)
{ print "KEYWORD: $item{keyword}\n" }
value: file | type | uid
file: m{[\\w\\\\/.-]+}
type: 'app' | 'dll'
uid: /0[xX][0-9a-fA-F]+/
keyword:
'AIF' |
'DOCUMENT' |
'LANG' |
'LIBRARY' |
'RESOURCE' |
'SOURCE' |
'SOURCEPATH' |
'SYSTEMINCLUDE' |
'TARGET' |
'TARGETPATH' |
'TARGETTYPE' |
'UID' |
'USERINCLUDE'
But I only get a single line printed out:
KEYWORD: TARGET
Probably because the first word is being parsed
as the "keyword" TARGET, with all the remaining
words in the file being parsed as "value"s.
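The suspicion above can be checked without the parser. The following is a minimal sketch in plain Perl, assuming the doubled backslashes in the grammar are there only because the grammar text sits in a double-quoted string. The names $file_re and @tokens are made up for the illustration. It tests the "file" pattern against a few tokens from the sample, including keywords and the // comment marker.

```perl
use strict;
use warnings;

# The "file" pattern from the grammar, written without the extra
# escaping that a double-quoted grammar string would need (assumption).
my $file_re = qr{[\w\\/.-]+};

# Tokens taken from the sample input: keywords, a comment marker
# and two real values.
my @tokens = ('TARGETTYPE', 'UID', 'SOURCE', '//', 'Snake.app', '..\UiSrc');

for my $token (@tokens) {
    my $is_file = $token =~ /\A$file_re\z/ ? 'yes' : 'no';
    print "$token matches file: $is_file\n";
}

# Second pitfall: the literal 'TARGET' is a prefix of 'TARGETTYPE',
# so a plain string match succeeds at the start of a TARGETTYPE line.
my $prefix_hit = 'TARGETTYPE app' =~ /\ATARGET/ ? 'yes' : 'no';
print "TARGET matches start of TARGETTYPE: $prefix_hit\n";
```

Every token in the list is accepted by the "file" pattern, so nothing in the grammar stops value(s) from running to the end of the input.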
I've tried to change the last rule in my grammar to
a set of regexes (which has uglified it as well):
keyword:
/^\s*AIF/ |
/^\s*DOCUMENT/ |
/^\s*LANG/ |
/^\s*LIBRARY/ |
/^\s*RESOURCE/ |
/^\s*SOURCE/ |
/^\s*SOURCEPATH/ |
/^\s*SYSTEMINCLUDE/ |
/^\s*TARGET/ |
/^\s*TARGETPATH/ |
/^\s*TARGETTYPE/ |
/^\s*UID/ |
/^\s*USERINCLUDE/
But the problem persists. I wonder if there is a
nice way to solve this (probably frequent) problem?
I.e. I'd like the words matching the keyword list
above to be parsed as "keyword", not as "value" -
provided they are found at the beginning of a line.
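One way this is sometimes handled in Parse::RecDescent is a negative lookahead (the "...!" prefix), so that a value is refused wherever a keyword could start. The sketch below is untested against the full file and makes several assumptions that are not in the grammar above: comments are folded into the skip pattern via $Parse::RecDescent::skip, the start rule is reduced to directive(s), the three value alternatives are collapsed into "file", and the keyword list becomes one regex with a trailing lookahead. It does not check that a keyword really sits at the start of a line, only that the token is a whole keyword.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Assumption: treat whitespace and // comments as skippable filler
# between tokens. Must be set before the parser is constructed.
$Parse::RecDescent::skip = qr{(?:\s+|//[^\n]*)*};

# Single-quoted heredoc, so backslashes reach the grammar unchanged.
my $grammar = <<'END_GRAMMAR';
startrule: directive(s) /\z/

directive: keyword value(s)
    { print "KEYWORD: $item{keyword}\n" }

# Negative lookahead: a value may not begin where a keyword begins.
value: ...!keyword file

file: m{[\w\\/.-]+}

# Longer names first; the trailing lookahead rejects prefixes, so
# TARGET cannot match the front of TARGETTYPE or of TARGET.cpp.
keyword: /(?:SYSTEMINCLUDE|USERINCLUDE|SOURCEPATH|TARGETPATH|TARGETTYPE|DOCUMENT|RESOURCE|LIBRARY|SOURCE|TARGET|LANG|AIF|UID)(?![\w\\\/.-])/
END_GRAMMAR

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

# A shortened version of the sample input.
my $text = <<'END_MMP';
TARGET Snake.app
TARGETTYPE app
SOURCE CSApplication.cpp CSAppUI.cpp
CSAsyncWait.cpp // added 13.02.2004
DOCUMENT ..\group\Snake.loc
END_MMP

defined $parser->startrule($text)
    or die "Parse failed\n";
```

With the lookahead in place, value(s) should stop at the next keyword, so each directive gets its own KEYWORD line instead of the first one consuming everything.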
Thank you for any suggestions
Alex
PS: Also, is there a way to make the grammar
case-insensitive, without using /regexes/i
(since that would make my grammar less readable)?
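A possible compromise for the case question, sketched under the assumption that a single /i is tolerable as long as the keyword list itself stays readable: keep the keywords in a plain Perl list and generate the rule from it. The names @keywords, $alternation, $keyword_re and $keyword_rule are invented for this example, and the (?!\w) guard is an added assumption to keep TARGET from matching the front of TARGETTYPE.

```perl
use strict;
use warnings;

# The keyword list stays readable and is written only once.
my @keywords = qw(AIF DOCUMENT LANG LIBRARY RESOURCE SOURCE SOURCEPATH
                  SYSTEMINCLUDE TARGET TARGETPATH TARGETTYPE UID USERINCLUDE);

# Longest first, so TARGETTYPE is tried before TARGET.
my $alternation = join '|',
    sort { length($b) <=> length($a) or $a cmp $b } @keywords;

# One case-insensitive pattern for quick checks outside the parser.
my $keyword_re = qr/(?:$alternation)(?!\w)/i;

# The same pattern as grammar text, to be appended to the rest of
# the grammar before it is handed to Parse::RecDescent->new.
my $keyword_rule = "keyword: /(?:$alternation)(?!\\w)/i\n";

for my $word (qw(target TargetType Uid Snake.app)) {
    my $verdict = $word =~ /\A$keyword_re\z/ ? 'keyword' : 'not a keyword';
    print "$word: $verdict\n";
}
print $keyword_rule;
```

This still relies on /i, but only in one generated line rather than in thirteen hand-written regexes.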