Easily parsing a string to retrieve values and assign them to a variable/symbol.

Eric DUMINIL

Hi!

I've been searching through APIs for a while, in desperate need of an easy
way to parse a string and retrieve data from it (forget about Regexp or
scanf), so that any non-Rubyist I work with could describe, with a single
string, an FTP directory in which some files are saved. Moreover, I
need some metadata so that I can effectively sort and work with the data I
retrieve from this FTP server.

For example, I would not know which file I should retrieve on:
'ftp://ftp.org/DATA/mike'
but
'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' would do
just fine, so that I could, for example, get this hash:
{:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
for this filename:
'ftp://ftp.org/DATA/mike/2005/10-15.txt'


I don't know if such a method is already available for Ruby, so I
decided to implement it on my own. Here it is:

###### Source ###########################################

class String
  # Matches self against +description+ and returns a Hash of the values
  # found (keyed by variable name), or false if the string does not fit.
  def parse_for_variables(description, begin_var_name="{", end_var_name="}")
    split_reg_exp = Regexp.new(Regexp.quote(begin_var_name) + "(.+?)" + Regexp.quote(end_var_name))
    variables = []
    is_a_variable_name = true
    # Splitting on a regexp with a group yields literal text and variable
    # names alternately, starting with literal text (possibly empty).
    pattern = description.split(split_reg_exp).collect { |str|
      is_a_variable_name = !is_a_variable_name
      if is_a_variable_name
        variables << str.sub(/:(\d+)$/, '').intern
        # "name:3" captures exactly 3 characters, "name" at least one
        str =~ /:(\d+)$/ ? "(.{#{$1}})" : "(.+)"
      else
        Regexp.quote(str)
      end
    }.join
    searching_reg_exp = Regexp.new("^" + pattern + "$")
    values = searching_reg_exp.match(self).to_a[1..-1]

    !values.nil? &&
      variables.length == values.length &&
      Hash.check_for_consistency_and_create_from_arrays(variables, values)
  end
end


class Hash
  def self.create_from_arrays(keys, values)
    self[*keys.zip(values).flatten]
  end

  # Like create_from_arrays, but returns false if a key appears twice
  # with different values.
  def self.check_for_consistency_and_create_from_arrays(keys, values)
    result = {}
    keys.each_with_index { |k, i|
      raise ArgumentError if result.has_key?(k) and result[k] != values[i]
      result[k] = values[i]
    }
    result
  rescue ArgumentError
    false
  end
end

############################################################


#### Examples ###############################################

irb(main):026:0> 'foobar'.parse_for_variables('foo{name}')
=> {:name=>"bar"}

# You can specify the length of a string by adding :i to the end of a
# variable name

irb(main):027:0> 'foobar'.parse_for_variables('foo{name:3}')
=> {:name=>"bar"}

irb(main):028:0> 'foobar'.parse_for_variables('foo{name:2}')
=> false

irb(main):029:0> 'foobar'.parse_for_variables('foo{name}')
=> {:name=>"bar"}

# By default, variable names are written between {}, but this can be
# overridden with optional arguments

irb(main):030:0> 'foo(bar){|x|
x+2}'.parse_for_variables('foo(<<arg>>){|<<var>>|
<<expression>>}','<<','>>')
=> {:arg=>"bar", :var=>"x", :expression=>"x+2"}

irb(main):031:0>
'C:\Windows\system32\vbrun700.dll'.parse_for_variables('{disk}:\{path}\{filename}.{extension}')
=> {:disk=>"C", :extension=>"dll", :filename=>"vbrun700",
:path=>"Windows\\system32"}

irb(main):032:0>
'2006-12-09.csv'.parse_for_variables('{year}-{month}-{day}.csv')
=> {:year=>"2006", :day=>"09", :month=>"12"}

irb(main):033:0> '2005 12 15'.parse_for_variables('{year} {month} {day}')
=> {:year=>"2005", :day=>"15", :month=>"12"}

irb(main):034:0>
'20061209.txt'.parse_for_variables('{year:4}{month:2}{day:2}.txt')
=> {:year=>"2006", :day=>"09", :month=>"12"}

irb(main):035:0>
'20061209.txt'.parse_for_variables('{year:2}{month:2}{day:2}.txt')
=> false

# You can use a variable name twice:
irb(main):036:0>
'DATA/2007/2007-12-09.csv'.parse_for_variables('DATA/{year}/{year}-{month}-{day}.csv')
=> {:year=>"2007", :day=>"09", :month=>"12"}

# as long as values are consistent:
irb(main):037:0>
'DATA/2007/2006-12-09.csv'.parse_for_variables('DATA/{year}/{year}-{month}-{day}.csv')
=> false

irb(main):038:0> 'whateverTooLong'.parse_for_variables('whatever{name:4}')
=> false

irb(main):039:0>
'whateverAsLongAsIWant'.parse_for_variables('whateverKsome_variableK','K','K')
=> {:some_variable=>"AsLongAsIWant"}

irb(main):040:0>
'whatevertoolong.csv'.parse_for_variables('whatever$name:4$.csv','$','$')
=> false
############################################################


Have you ever used such a method?
Is it possible to implement it in a more elegant way?


Thanks for reading, and please feel free to use my code if you ever need it,

Eric Duminil
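
On the "more elegant way" question, one possible variant is sketched below. It is not the code from the post: the helper names `template_to_regexp` and `parse_template` are made up for illustration, and it assumes Ruby 1.9 or later (named groups) and variable names made only of word characters. A repeated name is turned into a backreference, so the consistency check is left to the regexp engine. It also anchors with \A and \z rather than ^ and $, which differs for multi-line strings.

```ruby
# Sketch only: helper names are hypothetical, not from the original post.
# Assumes Ruby 1.9+ (named groups) and variable names made of word characters.

# Turn a template such as 'DATA/{year}/{month:2}.csv' into an anchored Regexp.
def template_to_regexp(template, open = "{", close = "}")
  token = Regexp.new(Regexp.quote(open) + "(.+?)" + Regexp.quote(close))
  seen = {}
  parts = template.split(token).each_with_index.map do |part, i|
    next Regexp.quote(part) if i.even?   # even slots are literal text
    name, width = part.match(/\A(.+?)(?::(\d+))?\z/).captures
    if seen[name]
      "\\k<#{name}>"                      # repeated name: must equal the first value
    else
      seen[name] = true
      width ? "(?<#{name}>.{#{width}})" : "(?<#{name}>.+)"
    end
  end
  Regexp.new("\\A" + parts.join + "\\z")
end

# Returns a Hash of symbol => string, or false when the string does not fit.
def parse_template(string, template, open = "{", close = "}")
  m = template_to_regexp(template, open, close).match(string)
  return false unless m
  Hash[m.names.map { |n| n.to_sym }.zip(m.captures)]
end
```

With these definitions, `parse_template('20061209.txt', '{year:4}{month:2}{day:2}.txt')` should give the same three values as the original method, and the inconsistent `DATA/2007/2006-12-09.csv` case should give false.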
 
Daniel Lucraft

Eric said:
'20061209.txt'.parse_for_variables('{year:4}{month:2}{day:2}.txt')
=> {:year=>"2006", :day=>"09", :month=>"12"}

I like this. It's sort of like a cut down regex for non-programmers. You
should write this up with a definition and put it in a library. I bet
people would use it.

Don't forget to come up with a cool name.

best,
Dan
 
Peña, Botp

From: Eric DUMINIL
# For example, I would not know which file I should retrieve on:
# 'ftp://ftp.org/DATA/mike'
# but
# 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' would do
# just fine, so that I could, for example, get this hash:
# {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
# for this filename:
# 'ftp://ftp.org/DATA/mike/2005/10-15.txt'

very nice.
but would it be more practical if we delineate a variable just like we
used to in ruby inline string; ie, use #{var} instead of just {var}

this would be handy like, if i want to rename or move all folders under
/mike/2005/ to /mike/2007/ eg.. the retrieval and assignment string just
stay the same...

kind regards -botp
 
Eric DUMINIL

Hi
Thanks for the appreciation!
Your suggestion is interesting, even though I'm not sure it would work, because:

'foobar'.parse_for_variables('foo#{name}','#{')
=> {:name=>"bar"}

works, but when you use it with double quotes string:

'foobar'.parse_for_variables("foo#{name}",'#{')
NameError: undefined local variable or method `name' for main:Object

it already tries to evaluate "name" inside the string...
so either you get retrieval or assignment right, but not both :(
Anyway, assignment is not that big a deal:

(irb) h={:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
=> {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}

(irb) 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt'.gsub(/\{(.+?)\}/){h[$1.intern]}
=> "ftp://ftp.org/DATA/mike/2005/10-15.txt"

Best regards,

Eric
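
The gsub one-liner above could be wrapped in a small helper so that the same description string serves for both retrieval and assignment, including the "move 2005 to 2007" case. This is only a sketch: `fill_template` is a hypothetical name, and it assumes the hash keys are symbols, as returned by parse_for_variables.

```ruby
# Sketch only: fill_template is a hypothetical name, not part of the original code.
def fill_template(template, values, open = "{", close = "}")
  token = Regexp.new(Regexp.quote(open) + "(.+?)" + Regexp.quote(close))
  template.gsub(token) do
    key = $1.sub(/:\d+\z/, "").to_sym   # drop an optional ":width" suffix
    values.fetch(key).to_s              # raises if the hash lacks the key
  end
end

h = {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
template = 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt'

old_path = fill_template(template, h)                        # path for 2005
new_path = fill_template(template, h.merge(:year => "2007")) # same file under 2007
```

Using fetch instead of [] is a design choice: a misspelled variable name fails loudly rather than silently inserting an empty string.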
 
Peña, Botp

From: Eric DUMINIL
# 'foobar'.parse_for_variables("foo#{name}",'#{')
# NameError: undefined local variable or method `name' for main:Object

oops, totally ignored that, was thinking about lazy evals..
i think your current interface is good, it would be easy to infix the
"#" later...

kind regards -botp
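
The NameError comes from when Ruby interpolates: a double-quoted literal is expanded as soon as it is evaluated, before parse_for_variables ever sees it. A short illustration follows; the local variable `name` is defined here only for the demonstration.

```ruby
name = "bar"   # hypothetical local variable, only here for the demonstration

literal      = 'foo#{name}'    # single quotes: no interpolation, text kept as written
interpolated = "foo#{name}"    # double quotes: expanded when the literal is evaluated
escaped      = "foo\#{name}"   # a backslash before # suppresses interpolation
```

So a `#{var}` description can still be passed around, as long as it is written with single quotes or with the `#` escaped.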
 
SonOfLilit

There is an option for regexen to lazily evaluate. So you could
represent a regex-free string with a regex like that, then whenever
you need it - evaluate the regex, convert it to a string and use it
:).

OR you could store the string 'stuff \#{name}' and later #eval() it, or
use something similar and less dangerous, when you need its evaluation.

Aur

 
Eric DUMINIL

I think that what you describe is exactly what I implemented as
searching_reg_exp.

For example searching_reg_exp corresponding to
'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' is:
/^ftp:\/\/ftp\.org\/DATA\/(.+)\/(.+)\/(.+)\-(.+)\.txt$/

if you want it to be non-greedy, it would be:
/^ftp:\/\/ftp\.org\/DATA\/(.+?)\/(.+?)\/(.+?)\-(.+?)\.txt$/
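
Both patterns give the same result on the FTP path above; they can differ once a separator also occurs inside a value. A small sketch with a made-up input:

```ruby
greedy = /^(.+)-(.+)$/
lazy   = /^(.+?)-(.+?)$/
sample = 'a-b-c'   # made-up input with the separator inside a value

greedy_values = greedy.match(sample).captures   # => ["a-b", "c"]
lazy_values   = lazy.match(sample).captures     # => ["a", "b-c"]
```

The greedy form gives the extra text to the first variable, the non-greedy form to the last one.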

Or did I get you wrong?

I wouldn't choose the eval() path for security reasons, as you mentioned...
'foo{system("rm -rf ~/")}' would be pretty bad!

Which method were you thinking about when you wrote "something similar
and less dangerous"?

Bye,

Eric
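
One candidate for "something similar and less dangerous" might be Kernel#format with named references, available since Ruby 1.9. It only substitutes values from a hash and never evaluates the template as code. The sketch below assumes the description is written with %{name} instead of {name}; `render_path` is a hypothetical wrapper name.

```ruby
# Sketch only: render_path is a hypothetical helper name.
# Kernel#format fills %{name} references from a hash without evaluating code.
def render_path(template, values)
  format(template, values)
end

template = 'ftp://ftp.org/DATA/%{user_name}/%{year}/%{month}-%{day}.txt'
values   = {:user_name=>"mike", :year=>"2005", :month=>"10", :day=>"15"}
path     = render_path(template, values)   # => "ftp://ftp.org/DATA/mike/2005/10-15.txt"

# A value that looks like code is inserted as plain text, never executed.
hostile  = render_path('foo%{x}', :x => 'system("echo hi")')
```

The price is that a literal percent sign in the template has to be written as %%.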
 
