New to Ruby

bigbrother

Hey guys, I'm pretty new to Ruby and I've got a question.
I want to parse a log file (standard Apache log) and look for a couple of things.
1270.0.1 - - [13/Dec/2007:09:44:41 -0600] "GET /v700.php?GN=MosherHB&UN=mosherhb&URL=http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif HTTP/1.1" 200 23 "-" "Filter7"

OK, that's a snippet from my log files. These requests pass through a
content-filtering program. The GN= field in the URL is the account name and
UN= is the user name. I want to parse a log file and find out which user (in
any group) has made the most requests, then grab the requests that user has
made and put them in an array, and then time how long it takes to reach that
host (to test the filter load).

Can anyone please help me?
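For the last step (timing how long it takes to reach a host), one possible starting point is sketched below. It rests on assumptions not in the post: "getting to the host" is taken to mean one plain HTTP GET per URL, the `timed` and `time_fetch` names are made up, and it needs Ruby 2.1 or later for `Process.clock_gettime`. Whether the requests should go through the filter is up to your setup; Net::HTTP also accepts proxy address and port arguments for that.

```ruby
require "net/http"
require "uri"

# Runs the block and returns [block_result, elapsed_seconds].
# A monotonic clock is used so system clock changes can't skew the timing.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Hypothetical helper: time a single plain HTTP GET to the given URL.
# It talks to the host directly; it does not route through any proxy or filter.
def time_fetch(url)
  timed { Net::HTTP.get_response(URI.parse(url)) }
end

# Example (needs network access):
#   response, seconds = time_fetch("http://www.example.com/")
#   puts "#{response.code} in #{format('%.3f', seconds)} s"
```

Keeping the timer separate from the HTTP call means the same `timed` helper can wrap whatever kind of request ends up making sense for the filter test.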
 
Drew Olson

Joshua said:
Can anyone please help me?

First, welcome to ruby! Second, maybe something like this:

user_requests = Hash.new{0}
File.open("my_log.txt") do |line|
  user = line.scan(/US=(\w+)/).flatten.first
  user_requests[user] += 1
end

max_requests = user_requests.inject([0,nil]) do |max,user,requests|
  if requests > max[0]
    max = requests,user
  end
  max
end

puts "User #{max_requests[1]} has #{max_requests[0]} requests"
 
Drew Olson

Drew said:
First, welcome to ruby! Second, maybe something like this:

Whoops, that only gets you the max requests from the user, but it should
get you on the right track.
 
Sebastian Hungerecker

Drew said:
user = line.scan(/US=(\w+)/).flatten.first

user = line[/US=(\w+)/, 1]

No need for scan if you only care about the first match.
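Applied to the sample line from the first post (using UN=, which the original poster says holds the user name), `String#[]` with a capture-group index returns just the captured text, or `nil` when the pattern doesn't match. A small sketch:

```ruby
# The sample log line from the first post, joined onto a single line.
line = '1270.0.1 - - [13/Dec/2007:09:44:41 -0600] "GET /v700.php?GN=MosherHB&UN=mosherhb&URL=http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif HTTP/1.1" 200 23 "-" "Filter7"'

# String#[](regexp, n) returns capture group n of the first match.
user    = line[/UN=(\w+)/, 1]
account = line[/GN=(\w+)/, 1]

# The scan-based version reaches the same value via an intermediate array.
via_scan = line.scan(/UN=(\w+)/).flatten.first

# When the pattern doesn't match, String#[] returns nil instead of raising.
missing = "a line without a user field"[/UN=(\w+)/, 1]

p user, account, via_scan, missing
```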

HTH,
Sebastian
 
Drew Olson

Sebastian said:
Drew said:
user = line.scan(/US=(\w+)/).flatten.first

user = line[/US=(\w+)/, 1]

Sebastian -

Thanks for posting that. I'm always finding myself needing something
similar and had a feeling there was an easier syntax.

- Drew
 
Sebastian Hungerecker

Drew said:
Sebastian said:
Drew said:
user = line.scan(/US=(\w+)/).flatten.first

user = line[/US=(\w+)/, 1]

Thanks for posting that. I'm always finding myself needing something
similar and had a feeling there was an easier syntax.

You're welcome. There's also Regexp#match if String#[] is not enough (e.g. if
you need to access the values of more than one capturing group), but you still
only need one match:
md = "I have 100 dollars for you.".match(/(\d+) (dollars|euros)/)
md[0] #=> "100 dollars"
md[1] #=> "100"
md[2] #=> "dollars"

You only need scan when a match can occur multiple times in a string.
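To make the match/scan difference concrete, here is a sketch on a made-up sentence (a variation of the example above) that contains two possible matches:

```ruby
text = "I have 100 dollars and 20 euros for you."

# match: a MatchData for the FIRST match only, with every group available.
md = text.match(/(\d+) (dollars|euros)/)
whole    = md[0]
amount   = md[1]
currency = md[2]

# captures returns just the groups, which suits multiple assignment.
amount2, currency2 = md.captures

# scan: every match in the string; with groups, one array of captures per match.
all = text.scan(/(\d+) (dollars|euros)/)

p whole, amount, currency, all
```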


HTH,
Sebastian
 
bigbrother

Whoops, that only gets you the max requests from the user, but it should
get you on the right track.

Cool, thanks. When I try it, though, I get:
Joshua$ ./test.rb
./test.rb:4: private method `scan' called for #<File:access.log (closed)> (NoMethodError)
from ./test.rb:3:in `open'
from ./test.rb:3
(I use Linux.)
What does that mean?
 
Sebastian Hungerecker

bigbrother said:
Drew said:
File.open("my_log.txt") do |line|
user = line.scan(/US=(\w+)/).flatten.first
user_requests[user] += 1
end

Cool, thanks. When I'm trying it though I get
Joshua$./test.rb
./test.rb:4: private method `scan' called for #<File:access.log
(...)
What does that mean?

It means that you're calling scan on line while line is a File object, which
has no public scan method. If you change File.open to File.foreach, line will
actually be a line of the file (i.e. a string) and the code will work.
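A small sketch of that difference, using a throwaway two-line log written to a temp file (the file contents and user names are made up):

```ruby
require "tempfile"

# Write a tiny two-line log to a temporary file for the demonstration.
log = Tempfile.new("access_log")
log.write("GET /a?UN=alice&URL=http://x HTTP/1.1\nGET /b?UN=bob&URL=http://y HTTP/1.1\n")
log.close

# File.open yields the File object itself, not a line.
opened_class = File.open(log.path) { |f| f.class }

# File.foreach yields each line as a String.
users = []
File.foreach(log.path) do |line|
  users << line[/UN=(\w+)/, 1]
end

# File.open also works if you iterate over the file's lines yourself.
users_via_open = File.open(log.path) { |f| f.each_line.map { |l| l[/UN=(\w+)/, 1] } }

log.unlink
p opened_class, users, users_via_open
```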


HTH,
Sebastian
 
bigbrother

bigbrother said:
Drew Olson wrote:
File.open("my_log.txt") do |line|
user = line.scan(/US=(\w+)/).flatten.first
user_requests[user] += 1
end
Cool, thanks. When I'm trying it though I get
Joshua$./test.rb
./test.rb:4: private method `scan' called for #<File:access.log
(...)
What does that mean?

It means that you're calling scan on line when line is File object which
doesn't have scan. If you change File.open to File.foreach, line will
actually be a line of the file (i.e. a string) and the code will work.

HTH,
Sebastian
Either I'm dumb or something's not right:
jthomas@jthomas-desktop:~/work$ ./test.rb
./test.rb:14: undefined method `>' for nil:NilClass (NoMethodError)
from ./test.rb:20:in `inject'
from ./test.rb:13:in `each'
from ./test.rb:13:in `inject'
from ./test.rb:13


Sorry guys, I'm really new to programming.
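A likely cause of that error (an inference from the first code sample, since the actual test.rb wasn't posted): when inject walks a Hash, each entry arrives as a single [user, count] array, so with `|max,user,requests|` the whole pair lands in `user` and `requests` is nil. The sketch below reproduces that with made-up counts and shows two ways around it:

```ruby
# Hypothetical request counts, shaped like user_requests in the first code sample.
counts = { "alice" => 3, "bob" => 5, "carol" => 2 }

# Original block signature: inject yields the memo plus ONE [user, count]
# array per entry, so `user` receives the whole pair and `requests` is nil.
error = nil
begin
  counts.inject([0, nil]) do |max, user, requests|
    requests > max[0] ? [requests, user] : max
  end
rescue NoMethodError => e
  error = e # calling > on nil
end

# Fix 1: parentheses destructure the pair into its two parts.
best = counts.inject([0, nil]) do |max, (user, requests)|
  requests > max[0] ? [requests, user] : max
end

# Fix 2: max_by does the comparison and returns the [user, count] pair.
top_user, top_count = counts.max_by { |_user, requests| requests }

puts "User #{best[1]} has #{best[0]} requests"
```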
 
Marc Heiler

I'm dumb or something's not right

I think you're the only one who insists on that ;)
Sorry guys, I'm really new to programming.

I don't think anyone gets mad about this (and if someone does, that person has
a bad attitude), but at any rate you should consider posting your .rb file
too, so others can see what you did wrong and correct it.
 
Drew Olson

Joshua said:
On Dec 18, 2:59 pm, Sebastian Hungerecker wrote:
Sorry guys, I'm really new to programming.

Ok, based on feedback from people in the thread and re-reading your
initial explanation, I *think* this is what you want:

user_requests = Hash.new{[]}

File.foreach("my_log.txt") do |line|
  user = line[/US=(\w+)/,1]
  # Array#+ needs an array on the right, so wrap the line in one
  user_requests[user] += [line]
end

max_user_info = user_requests.max{|a,b| a[1].size <=> b[1].size}

puts "User name: #{max_user_info[0]}"
puts "Requests:"
max_user_info[1].each do |request|
  puts request
end
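For comparison, here is a sketch of the same grouping that uses UN= (per the correction that follows) and a Hash whose default block stores a fresh array, so `<<` can append in place. The sample lines and the `requests_by_user` name are made up for illustration; for a real log you could pass `File.foreach("access.log")`, which returns an enumerator when called without a block on reasonably recent Rubies.

```ruby
# Group log lines by their UN= user name; lines without UN= are skipped.
def requests_by_user(lines)
  by_user = Hash.new { |hash, user| hash[user] = [] }
  lines.each do |line|
    user = line[/UN=(\w+)/, 1]
    by_user[user] << line if user
  end
  by_user
end

# Hypothetical sample data in the same shape as the log in the first post.
sample = [
  'GET /v700.php?GN=A&UN=alice&URL=http://a.example/1 HTTP/1.1',
  'GET /v700.php?GN=A&UN=bob&URL=http://b.example/1 HTTP/1.1',
  'GET /v700.php?GN=B&UN=alice&URL=http://a.example/2 HTTP/1.1',
  'no user field on this line'
]

by_user = requests_by_user(sample)
top_user, top_requests = by_user.max_by { |_user, reqs| reqs.size }

puts "User name: #{top_user}"
puts "Requests:"
puts top_requests
```

Note that `Hash.new([])` would not work with `<<` here: every missing key would share, and mutate, the same default array.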
 
bigbrother

Ugh, US= should be UN=, sorry for the multiple emails.

-Drew

Thanks for the help. Can you answer a few questions, though?
OK, when it prints out the user requests, is there a way I can use a regex to
get just the URL? That's really what I need in that array.
Thanks for all your help. I've got a book on Ruby and will start reading it
when I get the chance.
 
bigbrother

I guess I should clarify: the first bit of regex works fine. It gets the users
and selects the user with the most requests. But when it prints out the
requests, it does this:
User name: tendercarepro
Requests:
70.xxx.xxx.xxx - - [13/Dec/2007:09:44:39 -0600] "GET /…&UN=tendercarepro&URL=http://s.wsj.net/public/resources/images/OB-AV076_Umami_20071205151444.jpg HTTP/1.1" 200 17 "-" "Filter7"
I want to strip out everything but the base URL, in this case
http://s.wsj.net/public/resources/images/OB-AV076_Umami_20071205151444.jpg
(taken from my firewall logs).
Sorry if I'm annoying you guys; you've been a huge help.
 
botp

I want to strip out everything but the base URL

hint:

botp@it:~$ irb
irb(main):001:0> line=%q(1270.0.1 - - [13/Dec/2007:09:44:41 -0600] "GET /v700.php?GN=MosherHB&UN=mosherhb&URL=http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif HTTP/1.1" 200 23 "-" "Filter7")
=> "1270.0.1 - - [13/Dec/2007:09:44:41 -0600] \"GET /v700.php?GN=MosherHB&UN=mosherhb&URL=http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif HTTP/1.1\" 200 23 \"-\" \"Filter7\""

irb(main):002:0> user,url = line.match(/UN=(\w+)\&URL=(.+)\sHTTP/).captures
=> ["mosherhb", "http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif"]

irb(main):003:0> p user,url
"mosherhb"
"http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif"
=> nil

kind regards -botp
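Putting that hint together with the grouping step, a sketch that collects just the URLs per user and picks the busiest user might look like this. It swaps `(.+)` for `(\S+)` so the URL capture can't run past the first space; the sample lines and the `user_and_url` name are hypothetical, and for a real log you would iterate over `File.foreach("access.log")` instead of `lines`.

```ruby
# Same idea as the irb hint, but \S+ (no whitespace) for the URL capture.
URL_PATTERN = /UN=(\w+)&URL=(\S+)\sHTTP/

# Returns [user, url] for a matching log line, or nil otherwise.
def user_and_url(line)
  md = line.match(URL_PATTERN)
  md && md.captures
end

# Hypothetical sample lines in the same shape as the real log.
lines = [
  '1.2.3.4 - - [13/Dec/2007:09:44:39 -0600] "GET /v700.php?GN=A&UN=alice&URL=http://a.example/1.gif HTTP/1.1" 200 17 "-" "Filter7"',
  '1.2.3.4 - - [13/Dec/2007:09:44:40 -0600] "GET /v700.php?GN=A&UN=bob&URL=http://b.example/x.jpg HTTP/1.1" 200 17 "-" "Filter7"',
  '1.2.3.4 - - [13/Dec/2007:09:44:41 -0600] "GET /v700.php?GN=B&UN=alice&URL=http://a.example/2.jpg HTTP/1.1" 200 23 "-" "Filter7"'
]

# Build user => [url, url, ...]; non-matching lines are skipped.
urls_by_user = Hash.new { |hash, user| hash[user] = [] }
lines.each do |line|
  user, url = user_and_url(line)
  urls_by_user[user] << url if user
end

top_user, top_urls = urls_by_user.max_by { |_user, urls| urls.size }

puts "User name: #{top_user}"
puts top_urls
```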
 
