zdennis
I am doing some large queries with MySQL, and the memory that gets allocated never seems to go back
to the system. In this test I am querying 47,000 records.
My setup for this test is Ubuntu Breezy, Rails 1.0, MySQL 5.0.18 and a compiled MySQL/Ruby 2.7 driver,
running in production mode. I have written a helper for my program that monitors object counts and
memory utilization, and uses a threshold of 10% to determine how many objects are sticking around and
how many are being garbage collected.
On the first run:
Loaded suite test/unit/table1_test
Started
String count 73685
Building query 21Mb
String count 73745
Received query results 89Mb
String count 901034
Threshold breaker String (827236) started w/ 73743 ended w/ 900979
Threshold breaker Table1 47000 started w/ 0 ended w/ 47000
Starting GC 89Mb
String count 25041
Done with GC 82Mb
Threshold breaker String -48704 started w/ 73743 ended w/ 25039
The first block of output shows that when I started my test, Ruby was using 21Mb of memory and over
70,000 strings were in existence. After the query results were constructed there were over 900,000
strings in existence, and my Ruby process had grown to 89Mb. The threshold breakers show a gain of
47,000 Table1 objects and 827,236 Strings since the first object count was captured (which
occurred right before the second String count).
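The threshold check behind those lines can be written as a pure function, which makes it easy to verify against the numbers above. This is a minimal sketch, not part of the test code: `threshold_breakers` is a hypothetical name, and class names are plain strings here because Table1 is not loaded. It applies the same rule `print_threshold_breakers` uses with a threshold of 10: a class is reported when its count rises above 1.1 times its starting count or falls below the starting count divided by 1.1.

```ruby
# Hypothetical pure-function version of the threshold check.
# Returns { klass => [before_count, after_count] } for every class whose
# count moved by more than `percent` percent in either direction.
def threshold_breakers(before, after, percent)
  factor = 1.0 + percent / 100.0
  after.each_with_object({}) do |(klass, now), result|
    was = before.fetch(klass, 0)  # classes absent at the start count as 0
    result[klass] = [was, now] if now > was * factor || now < was / factor
  end
end

# Counts taken from the first run above (class names as strings).
before = { 'String' => 73_743 }
after  = { 'String' => 900_979, 'Table1' => 47_000 }
threshold_breakers(before, after, 10).each do |klass, (was, now)|
  puts "#{klass}: #{now - was} (started w/ #{was}, ended w/ #{now})"
end
```

Run against the first-run counts, it reports the same two gains as the output above: 827,236 Strings and 47,000 Table1 records.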
After garbage collection the String count is down 48,704 from when it was first captured, and there
is no threshold breaker for Table1: there were 0 when the program started, and there are 0 in
existence now. However, the process never seems to shrink below 82Mb.
If I query again, memory goes up to 133Mb; after all Strings and ActiveRecord models have been
garbage collected, memory goes down ever so slightly again. This continues as long as I keep
querying.
The test is broken out into three methods:
- test_build_mem_usage (count objects, perform the query, store the results in a local variable)
- test_starting_gc (count objects, run GC.start, count objects again)
- test_z (run the query/GC cycle three times, then recount objects after the final GC)
I guess my biggest unknown at the moment is this: as I do large queries, is Ruby just hanging onto
that space? Why would it keep growing the next time I run a 47,000-item query, if it already has
unused space available from my last query?
At the bottom of this post are the actual test schema and test code I was using.
Zach
---- start schema ----
create table table1 (
  id int unsigned not null auto_increment,
  description varchar(255),
  store_name varchar(255),
  address1 varchar(40),
  address2 varchar(40),
  city varchar(40),
  state varchar(15),
  zip_code varchar(5),
  primary key( id )
) ENGINE=MyISAM;
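To reproduce the test, table1 needs about 47,000 rows. A minimal sketch for generating them, assuming made-up data is acceptable: `seed_sql` is a hypothetical helper that emits multi-row INSERT statements in batches, with every value sized to fit the columns above (zip_code is always 5 characters, address2 is left NULL).

```ruby
# Hypothetical seed helper: returns INSERT statements that fill `table1`
# with `n` rows of made-up data, `batch` rows per statement.
def seed_sql(n, batch = 1000)
  rows = (1..n).map do |i|
    format("('Item %d', 'Store %d', '%d Main St', NULL, 'Springfield', 'IL', '%05d')",
           i, i % 100, i, i % 100_000)
  end
  rows.each_slice(batch).map do |slice|
    "INSERT INTO table1 (description, store_name, address1, address2, city, state, zip_code) VALUES\n" +
      slice.join(",\n") + ';'
  end
end
```

Printing `seed_sql(47_000)` with `puts` and piping the output into the `mysql` command-line client for your database loads the rows.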
---- start test code ----
# hook into the Rails environment
require File.dirname(__FILE__) + '/../test_helper'
require 'table1'
class Object
  # Tally every live object by class, e.g. { String => 73685, ... }
  def count_objects
    objects = Hash.new{ |h,k| h[k]=0 }
    ObjectSpace.each_object{ |obj| objects[obj.class] += 1 }
    objects
  end

  # Print each class whose count in hsh2 differs from its count in hsh1
  # by more than `threshold` percent (10 means 10%)
  def print_threshold_breakers hsh1, hsh2, threshold
    factor = 1.0 + threshold / 100.0
    hsh2.each_key do |key|
      max_num = hsh1[key] * factor
      min_num = hsh1[key] / factor
      if hsh2[key] > max_num or hsh2[key] < min_num
        putsf "Threshold breaker #{key}", "(#{hsh2[key]-hsh1[key]}) started w/ #{hsh1[key]} ended w/ #{hsh2[key]}"
      end
    end
  end

  # Count live objects that are instances of clazz (or a subclass)
  def count_objects_for clazz
    c = 0
    ObjectSpace.each_object{ |o| c+=1 if o.is_a? clazz }
    c
  end

  # Resident set size (RSS) of this process, formatted as e.g. "89Mb"
  def mem_usage
    # `ps -F` prints a header line and one data line for this process
    line_arr = `ps -p #{Process.pid} -F`.split( /\n/ )
    # split both lines into columns: headers and data
    arr1, arr2 = line_arr.map{ |line| line.split( /\s+/ ) }
    # the last column (CMD) can contain spaces, so join any leftover
    # data elements to get the same number of elements as arr1
    column_arr = [ arr1 ]
    column_arr << arr2[0 .. arr1.size-2] + [ arr2[arr1.size-1 .. -1].join( ' ' ) ]
    # pair each header with its value and build a hash
    hsh = {}
    column_arr.transpose.each{ |header, value| hsh[header] = value }
    # RSS is reported in kilobytes
    (hsh[ 'RSS' ].to_i / 1024.0).round.to_s << "Mb"
  end

  # Print a label and its values in two fixed-width columns
  def putsf label, *args
    printf( "%-40.40s %-40s\n", label.to_s, args.join( ' ' ) )
  end

  def print_class_count clazz
    putsf "#{clazz.name} count", count_objects_for( clazz )
  end
end
class TableTest < Test::Unit::TestCase
  def test_build_mem_usage
    print_class_count String
    putsf 'Building mem usage', mem_usage
    h1 = @@h1 = count_objects
    print_class_count String
    records = Table1.find :all, :limit=>47000
    @@oid = records.object_id
    h2 = count_objects
    putsf 'Done building mem usage', mem_usage
    print_class_count String
    print_threshold_breakers h1, h2, 10
    sleep 2
    puts
  end

  def test_starting_gc
    putsf 'Starting GC', mem_usage
    h1 = count_objects
    GC.start
    h2 = @@h2 = count_objects
    print_class_count String
    putsf 'Done with GC', mem_usage
    print_threshold_breakers h1, h2, 10
    puts
  end

  def test_z
    test_build_mem_usage
    test_starting_gc
    test_build_mem_usage
    test_starting_gc
    test_build_mem_usage
    test_starting_gc
    print_class_count String
    print_class_count Table1
    putsf 'Done', mem_usage
    print_threshold_breakers @@h1, @@h2, 10
    puts
    ObjectSpace.each_object{ |obj| puts "FOUND THE RECORD ARRAY " if obj.object_id == @@oid }
    # ObjectSpace.each_object { |obj| puts obj if obj.is_a? String }
  end
end
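The parsing half of `mem_usage` above can be pulled out into a pure function so it can be checked against a captured `ps -F` listing instead of a live process. A minimal sketch: `rss_mb_from_ps` is a hypothetical name, and it assumes the same two-line `ps -F` layout (header line, then one data line whose last column, CMD, may contain spaces).

```ruby
# Hypothetical pure helper: given the text printed by `ps -p PID -F`
# (a header line plus one data line), return the RSS column as "NNMb".
def rss_mb_from_ps(output)
  header, data = output.split("\n").map { |line| line.split(/\s+/) }
  # The last column (CMD) may itself contain spaces, so fold any extra
  # data fields back into a single value.
  data = data[0 .. header.size - 2] + [data[header.size - 1 .. -1].join(' ')]
  fields = Hash[header.zip(data)]
  # ps reports RSS in kilobytes.
  (fields['RSS'].to_i / 1024.0).round.to_s + 'Mb'
end
```

With a made-up listing whose RSS column is 91136 (KB), the helper returns "89Mb", since 91136 / 1024 is exactly 89.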