Easy Method to 'slurp' a line tokenized by split

Rhugga

I am writing a syslog parser that normalizes log entries and then loads
them into an Oracle database. I open the input file, process each
line, and tokenize it for use, but I am running into a roadblock that I
can't seem to find a clean solution for. (My Perl skills have grown
rusty over the years.)

Here is a sample entry from my input file:

1 Feb 8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice]
TLD(0) opening robotic path /dev/sg/c3t0l0

This is basically 7 fields of info:
<log count> <timestamp (3 fields)> <hostname> <process info> <log
content>

(<log count> is the number of times an identical log entry was detected
and truncated)

So using split I break this down into components:

@ARGS = (split / /, $line);
$line =~ s/ +/ /g;
$line =~ s/^ +//g;
$log_count = $ARGS[0];
$log_month = $ARGS[1];
$log_day = $ARGS[2];
$log_time = $ARGS[3];
$log_hostname = $ARGS[4];
$log_proc_info = $ARGS[5];
$log_message = $ARGS[6];

My problem is that I want $log_message to contain everything after the
process info field. (In the sample entry above, $log_proc_info will
contain tldcd[928].) However, $log_message will only contain the next
space-delimited field, in this case '[ID'. What I want to do, after I
glean $log_proc_info, is set $log_message to the remaining bytes up to
but not including EOL (i.e. I want $log_message =
"[ID 359804 daemon.notice] TLD(0) opening robotic path /dev/sg/c3t0l0").

I hope this is making sense; I have been working on no sleep as usual.

Thanks for any help,
CC
 
Paul Lalli

Rhugga said:
@ARGS = (split / /, $line);
$line =~ s/ +/ /g;
$line =~ s/^ +//g;
$log_count = $ARGS[0];
$log_month = $ARGS[1];
$log_day = $ARGS[2];
$log_time = $ARGS[3];
$log_hostname = $ARGS[4];
$log_proc_info = $ARGS[5];
$log_message = $ARGS[6];

My problem is I want $log_message to contain everything after the
process info field.

Have you considered reading the documentation for the function you're
using? The solution is readily available to anyone who reads
perldoc -f split
(fourth paragraph)

Paul Lalli
 
nobull

Rhugga said:
@ARGS = (split / /, $line);
$log_count = $ARGS[0];
$log_month = $ARGS[1];
$log_day = $ARGS[2];
$log_time = $ARGS[3];
$log_hostname = $ARGS[4];
$log_proc_info = $ARGS[5];
$log_message = $ARGS[6];

This is more simply written

my ($log_count,$log_month,$log_day,$log_time,$log_hostname,
$log_proc_info,$log_message) = split / /, $line;

Note: I've guessed that the omission of my() was a mistake since it
probably was. An assignment without a my() overwrites one or more
existing variables rather than introducing new ones. Unless you have a
positive reason to modify an existing variable it is generally not a
good idea to do so.

Note: It would probably be more natural to represent the parsed record
as a hash rather than 7 separate scalars. It is generally a good idea
to use the natural representations of things unless you have a positive
reason to do otherwise.

my %log;
@log{'count','month','day','time','hostname',
'proc_info','message'} = split / /, $line;

Rhugga said:
My problem is I want $log_message to contain everything after the
process info field. (in the sample entry above, $log_proc_info will
contain tldcd[928]).

You appear to have a question about the split() function.

Have you considered the radical option of looking-up the description of
the split function in the reference manual?

Pay particular attention to the semantics of the third argument (LIMIT).
 
Rhugga

Maybe I am seeing a bug on SLES 9 then. I initially tried an approach
like this:

my ($log_count, $log_month, $log_day, $log_time, $log_hostname,
$log_proc_info, $log_message) = split( / /, $line);

I was getting erratic, inconsistent results (which is why I dumbed
it down to the brute-force approach I posted in the original post). The
code I listed above is in a debug type of setup right now. I'll look at
it again; I'm just working on not much sleep and haven't written much
Perl in the last 5 years, so I'm very rusty. Ironically, I parse the
local timestamp into vars in the same way as you are suggesting for my
data and that works fine.
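
For what it's worth, one possible cause of inconsistent results (an assumption; nothing in this thread confirms it) is that split / / treats every single space as a delimiter. Syslog traditionally pads single-digit days with an extra space ("Feb  8"), and two consecutive spaces produce an empty field that shifts every later field by one. The special pattern ' ' splits on runs of whitespace instead. A minimal sketch, using a hypothetical line with a padded day:

```perl
use strict;
use warnings;

# Hypothetical line with a space-padded day ("Feb  8"); not taken from the real log.
my $line = '1 Feb  8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice] TLD(0) opening';

# A literal single-space pattern yields an empty field between the two spaces.
my @by_single_space = split / /, $line, 7;

# The special pattern ' ' splits on runs of whitespace and skips leading whitespace.
my @by_whitespace = split ' ', $line, 7;

print "single space: day='$by_single_space[2]' time='$by_single_space[3]'\n";
print "whitespace:   day='$by_whitespace[2]' time='$by_whitespace[3]'\n";
```

With the single-space pattern the day slot comes back empty and the time slot holds the day; with ' ' the fields line up as expected.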

@ARGS is in all caps because it is my preference to use all caps for
var names of 'un-refined' data, meaning data I intend to do something
with later on. When I see @ARGS or @MYVAR, for example, that tells me
this is raw data generated from a split() or something similar. Just a
personal preference. I'm the only one who sees my code, so I don't need
to adhere to coding standards as one would in a team environment.

The reason why my() is missing is because the code you see is inside a
loop, I define all those vars before the loop using my(). Once one
iteration of the loop ends I no longer have a need for anything stored
in those vars. (as this is getting shoved into oracle)

Thanks for all the suggestions, I definitely have much more to go on
now.

Thx
 
Rhugga

Just FYI:

@ARGS = (split / /, $line, 7);

This works perfectly, so many thanks for that suggestion. I just need
to revert back to my ($count, $month, ...) = (split / /, $line, 7);

Thx all.
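
For anyone finding this thread later, here is a minimal sketch of the list-assignment form with a LIMIT of 7, run against the sample entry from the first post (joined onto one line, which is an assumption about how it appears in the real file). The seventh variable receives the unsplit remainder of the line:

```perl
use strict;
use warnings;

# Sample entry from the original post, joined onto a single line.
my $line = '1 Feb 8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice] '
         . 'TLD(0) opening robotic path /dev/sg/c3t0l0';

# LIMIT of 7: six fields are split off, the seventh gets the rest of the line.
my ($log_count, $log_month, $log_day, $log_time, $log_hostname,
    $log_proc_info, $log_message) = split / /, $line, 7;

print "proc_info: $log_proc_info\n";
print "message:   $log_message\n";
```

If the line still ends in a newline when it is split, a chomp($line) beforehand keeps the EOL out of $log_message.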
 
Eric Bohlman

Rhugga said:
The reason why my() is missing is because the code you see is inside a
loop, I define all those vars before the loop using my(). Once one
iteration of the loop ends I no longer have a need for anything stored
in those vars. (as this is getting shoved into oracle)

If the variables are used only inside the loop, you should be declaring
them inside the loop. Variables should have the narrowest possible scope;
among other things, it will make it easier to understand/modify your code
when you come back to it six months later.
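
A rough sketch of what that could look like, assuming the lines are already in memory (the @lines array, its second entry, and the @messages stand-in for the Oracle insert are all hypothetical):

```perl
use strict;
use warnings;

# Hypothetical input; the real parser would read these from the log file.
my @lines = (
    '1 Feb 8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice] TLD(0) opening robotic path /dev/sg/c3t0l0',
    '3 Feb 8 00:06:02 back-0202 tldcd[928]: [ID 359804 daemon.notice] TLD(0) closing robotic path',
);

my @messages;
for my $line (@lines) {
    # Declared inside the loop body: fresh variables on every iteration,
    # so nothing from a previous line can leak into the next one.
    my ($count, $month, $day, $time, $hostname, $proc_info, $message)
        = split / /, $line, 7;

    push @messages, $message;    # stand-in for the database insert
}

print "$_\n" for @messages;
```

Only @messages outlives the loop here; the seven per-line variables go out of scope at the end of each iteration.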
 
