Extraction of single subarrays from multidimensional array

Maurizio Cirilli

Hi there,

I am a new member of this group and a Ruby newbie. After a long and
unsuccessful search on this topic, I am asking for your help with
picking out single subarrays from a multidimensional array.
In short, I somehow stored the following m-array (the single strings are
DNA codons):
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"], ["aaa", "aag"],
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

What I would like to get are the separate arrays ss[0], ss[1] and ss[2]
by iterating over the array ss.
The method array.clone looks perfect for this purpose:

irb(main):039:0> v0 = ss[0].clone
=> ["tcg", "agt", "tct", "agc", "tca", "tcc"]

but I did not find the right way to iterate this method over the
m-array and get indexed subarrays.

I tried iterations like:

v"#{n}"= ss.clone(n) do |n|
end
or

ss(n).each do |n|
v"#{n}" = ss.clone(n)
end

with no success.
Any help is greatly appreciated. Thanks.

-- Maurizio
 
Siep Korteling

You're working too hard. The simplest way usually works. Just

ss.each do |codon_cluster|
  # do something with codon_cluster, like:
  p codon_cluster
end

is enough.
Since all these strings are different objects (eating memory) you might
want to convert them into symbols (google them) as soon as possible.
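
A minimal sketch of both suggestions, assuming ss is the nested array from the first post with its outer opening bracket restored. The name codon_symbols is made up here for illustration.

```ruby
# Nested array from the original question (outer opening bracket restored).
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"],
      ["aaa", "aag"],
      ["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

# Each block invocation receives one subarray; no index bookkeeping is needed.
ss.each do |codon_cluster|
  p codon_cluster
end

# Hypothetical name: the same structure with every codon String turned into a Symbol.
codon_symbols = ss.map { |cluster| cluster.map { |codon| codon.to_sym } }
p codon_symbols[1]   # prints [:aaa, :aag]
```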
 
Robert Klemme

I am a new member of this group and a newbie about Ruby. After a
not successful and
extensive search on this topic, I ask your help solving a problem in
picking out single subarrays
from a multidimesional array.
In short, I somehow stored the following m-array (single strings are
DNA codons):
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"], ["aaa", "aag"],
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

Btw, I assume you do bioinformatics. If all your Arrays contain three
letter sequences you should probably change entries to Symbols (there
are only 4 ^ 3 = 64 of them).

I'd probably replace these Arrays by a custom class for handling
sequences and work with that. Then you can optimize internal
representation (e.g. use a Fixnum to code a three letter sequence) to
save even more memory. I believe there are libraries for bioinformatics
out there which probably do exactly that.
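
As a rough sketch of the integer encoding idea (in current Ruby, Fixnum is simply Integer): each base is treated as a base-4 digit, so a three-letter codon fits in 0..63. The module name CodonCode and its methods are hypothetical and not taken from BioRuby or any other library.

```ruby
# Hypothetical helper module, not part of BioRuby or any library.
module CodonCode
  BASES = %w[a c g t].freeze

  # Encode a three-letter codon as an Integer in 0..63 (base-4 digits).
  def self.encode(codon)
    codon.each_char.inject(0) do |acc, base|
      digit = BASES.index(base) or raise ArgumentError, "unknown base #{base.inspect}"
      acc * 4 + digit
    end
  end

  # Decode an Integer in 0..63 back into its three-letter codon.
  def self.decode(code)
    [code / 16, (code / 4) % 4, code % 4].map { |d| BASES[d] }.join
  end
end

code = CodonCode.encode("tcg")
p code                     # prints 54
p CodonCode.decode(code)   # prints "tcg"
```
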
What I would like to get are the separated arrays s[0], s[1] and s[2]
by iteration over array ss.
The method array.clone looks perfect for this aim:

irb(main):039:0> v0 = ss[0].clone
=> ["tcg", "agt", "tct", "agc", "tca", "tcc"]

but I did not find the right way to iterate this method over the
m-array and get indexed subarrays.

Do you actually need a copy or do you want to reference the original?
If you need the original here's the simplest approach

a, b, c = *ss

For a copy you can do

a, b, c = *ss.map(&:clone) # 1.9.*
a, b, c = *ss.map {|x| x.clone} # 1.8.6 and earlier

Note that you then still share the String instances! So if you want to
manipulate individual strings you need to take a different approach, e.g.

a, b, c = *ss.map {|arr| arr.map {|s| s.dup}}
a, b, c = *Marshal.load(Marshal.dump(ss))
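
To illustrate the difference between the shallow and the deep copy, here is a small sketch on a shortened array; the names shallow and deep are only for this example.

```ruby
ss = [["tcg", "agt"], ["aaa", "aag"]]

# Shallow copy: new outer arrays, but the very same String objects inside.
shallow = ss.map { |arr| arr.clone }
shallow[0][0] << "!"          # in-place change of a shared String
p ss[0][0]                    # prints "tcg!" (the original changed too)

# Deep copy: every String is duplicated, so the original stays untouched.
deep = ss.map { |arr| arr.map { |s| s.dup } }
deep[1][0] << "?"
p ss[1][0]                    # prints "aaa"
```
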
I tried iterations like:

v"#{n}"= ss.clone(n) do |n|
end
or

Apart from the fact that it does not work, what's the point of creating
variables with calculated, indexed names when you can already do indexed
access via the Array? That does not seem like a viable approach.

Kind regards

robert
 
Maurizio Cirilli

Thanks a lot, Robert, for your clear explanation and help.
In order to fully understand the code you provided, could you
please tell me what the role of the asterisk is in the
statement:

a, b, c = *ss

I did not find (or probably I just missed) this operator in the Ruby
docs I have.
Btw, bioinformatics libraries for the Ruby community are provided by
the BioRuby project.

-- Maurizio

 
w_a_x_man

Do you actually need a copy or do you want to reference the original?
If you need the original here's the simplest approach

a, b, c = *ss


This is simpler.

a, b, c = ss
 
Jesús Gabriel y Galán

Thanks a lot Robert for your clear explanation and help.
In order to fully understand the code you provided, could you
please to tell what is the role of the asterisk in the
statement:

a, b, c = *ss

I did not find (or probably I just missed) this operator in the Ruby
docs I have.

It's usually called the splat operator, and its function in the above
expression is to take the array elements one by one and use them in
the parallel assignment, so that the first element is assigned to a,
the second to b, the third to c, and any others are discarded.

It's also used to collect the rest of the parameters in an assignment
or in a method call:

irb(main):001:0> ss = [1,2,3,4,5]
=> [1, 2, 3, 4, 5]
irb(main):002:0> a,b,c = *ss
=> [1, 2, 3, 4, 5]
irb(main):003:0> a
=> 1
irb(main):004:0> b
=> 2
irb(main):005:0> c
=> 3
irb(main):006:0> a,b,*c = *ss
=> [1, 2, 3, 4, 5]
irb(main):007:0> a
=> 1
irb(main):008:0> b
=> 2
irb(main):009:0> c
=> [3, 4, 5]
irb(main):010:0> def test a,b,*c
irb(main):011:1> p [a,b,c]
irb(main):012:1> end
=> nil
irb(main):013:0> test 1,2,3,4,5,6
[1, 2, [3, 4, 5, 6]]



Jesus.
 
Colin Bartlett

Thanks a lot Robert for your clear explanation and help.
In order to fully understand the code you provided, could you
please to tell what is the role of the asterisk in the
statement:

a, b, c = *ss

I did not find (or probably I just missed) this operator in the Ruby
docs I have.
Btw, bioinformatics libraries to Ruby community are provided by
the BioRuby project guys.

There's an explanation of *array in the online Programming Ruby, probably in
the sections on assignment and/or method calls: I did think about searching
for it, but the link below looks as though it has a reasonable explanation.
Subject to correction by anyone more knowledgeable than me, the second
statement below (extracted from the linked page) also applies to assignment,
so you can do something like:
aa = [1, 2]
bb = [4, 5]
cc = [7, 8]
a, b, c, d, e, f, g = *aa, 3, bb, 6, *cc
which sets a to 1, b to 2, c to 3, d to [4, 5], e to 6, f to 7, g to 8.

As w_a_x_man pointed out, if the right hand side of an assignment statement
is an array, and there are two or more variables on the left hand side of
the assignment statement, then Ruby automatically expands the array for you,
so you can omit the "*" operator if you want to.

http://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Method_Calls
...
Variable Length Argument List, Asterisk Operator

The last parameter of a method may be preceded by an asterisk(*), which is
sometimes called the 'splat' operator. This indicates that more parameters
may be passed to the function. Those parameters are collected up and an
array is created.
...
The asterisk operator may also precede an Array argument in a method call.
In this case the Array will be expanded and the values passed in as if they
were separated by commas.
...
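
A short sketch of the second statement, the expansion of an Array argument in a method call. The method describe is hypothetical and exists only to show how the arguments arrive.

```ruby
# Hypothetical method used only to show how arguments arrive.
def describe(first, second, *rest)
  "first=#{first} second=#{second} rest=#{rest.inspect}"
end

codons = ["aaa", "aag", "ctg", "tta"]

# Without the asterisk the whole array would be one argument;
# with it, each element becomes its own argument.
puts describe(*codons)   # prints first=aaa second=aag rest=["ctg", "tta"]
```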
 
Maurizio Cirilli

Thank you all for the very instructive replies.
I have one last question: how can this iteration through the splat
operator be made general, i.e. flexible enough to cover cases in which
the number of subarrays (a, b, c in the above example) is unknown?
I mean, the splat operator, which iterates automatically, does not
return any count of the columns of the ss input m-array, so how do I
know how many variables to put on the left side of the assignment
a, b, c = *ss ?

Maybe such a question is trivial, but not for me: I spent several hours
thinking about it and I still have no clue how to do it (the hard life
of beginners!!) :)

Thanks again.

- Maurizio
 
Jeremy Bopp

Thank you all for the very very instructive replies-
I have the very last question: how to make this iteration
through splat operator general i.e. flexible covering cases
in which the number of subarrays (a,b,c) in the above example
is unknown? I mean, the splatter operator, doing iteration
automatically,
does not return any count on the columns of the ss input m-array so
how to know
how many variables to put on the left side of the assignment
a, b, c = *ss ?

Maybe such question is trivial but not for me: I spent several hours
thinking about that and still I have no clue how to do that (the hard
life
of the beginners!!) :)

It is not possible to do what you're proposing. If you just want to
iterate over the array contents, use the each method of the array object:

ss.each do |item|
# Do something with the item here.
end

-Jeremy
 
Robert Klemme

Thanks a lot Robert for your clear explanation and help.
In order to fully understand the code you provided, could you
please to tell what is the role of the asterisk in the
statement:

a, b, c = *ss

I did not find (or probably I just missed) this operator in the Ruby
docs I have.
Btw, bioinformatics libraries to Ruby community are provided by
the BioRuby project guys.

There's an explanation of *array in the online Programming Ruby, probably in
the sections on assignment and/or method calls: I did think about searching
for it, but the link below looks as though it has a reasonable explanation.
Subject to correction by anyone more knowledgeable than me, the second
statement below (extracted from the linked page) also applies to assignment,
so you can do something like:
aa = [1, 2]
bb = [4, 5]
cc = [7, 8]
a, b, c, d, e, f, g = *aa, 3, bb, 6, *cc
which sets a to 1, b to 2, c to 3, d to [4, 5], e to 6, f to 7, g to 8.

As w_a_x_man pointed out, if the right hand side of an assignment statement
is an array, and there are two or more variables on the left hand side of
the assignment statement, then Ruby automatically expands the array for you,
so you can omit the "*" operator if you want to..

It even works with one variable to the left - but then you need a comma:

09:11:30 ~$ ruby19 -e 'a=%w{foo bar baz};b,=a;p b'
"foo"

While splat alone does not work in this case:

09:11:50 ~$ ruby19 -e 'a=%w{foo bar baz};b=*a;p b'
["foo", "bar", "baz"]

You need to add the comma here as well

09:12:24 ~$ ruby19 -e 'a=%w{foo bar baz};b,=*a;p b'
"foo"

Of course, you could also do

09:12:50 ~$ ruby19 -e 'a=%w{foo bar baz};b=a.first;p b'
"foo"
09:13:18 ~$ ruby19 -e 'a=%w{foo bar baz};b=a[0];p b'
"foo"

Or, if destruction is allowed:

09:13:23 ~$ ruby19 -e 'a=%w{foo bar baz};b=a.shift;p b'
"foo"
http://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Method_Calls
...
Variable Length Argument List, Asterisk Operator

The last parameter of a method may be preceded by an asterisk(*), which is
sometimes called the 'splat' operator. This indicates that more parameters
may be passed to the function. Those parameters are collected up and an
array is created.
...

Actually this is not correct any more for 1.9.*: here the splat
operator can occur at _any_ position and Ruby will do the pattern
matching for you:

09:13:29 ~$ ruby19 -e 'def f(a,*b,c) p a, b, c end;f(1,2,3,4,5)'
1
[2, 3, 4]
5
09:15:03 ~$ ruby19 -e 'def f(*a,b,c) p a, b, c end;f(1,2,3,4,5)'
[1, 2, 3]
4
5
09:15:42 ~$ ruby19 -e 'def f(a,b,*c) p a, b, c end;f(1,2,3,4,5)'
1
2
[3, 4, 5]
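
The same matching applies to parallel assignment in 1.9 and later: a single splat may sit at the start, in the middle, or at the end of the left-hand side. A small sketch with made-up variable names:

```ruby
codons = %w[tcg agt tct agc tca]

# Splat in the middle: first and last are picked off, the rest is collected.
first, *middle, last = codons
p first    # prints "tcg"
p middle   # prints ["agt", "tct", "agc"]
p last     # prints "tca"

# Splat at the start: everything but the final element is collected.
*init, tail = codons
p init     # prints ["tcg", "agt", "tct", "agc"]
p tail     # prints "tca"
```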

Kind regards

robert
 
Robert Klemme

It is not possible to do what you're proposing.

I go further and say: it is not even reasonable to do that. That's
the same as setting local variables with calculated names like v1, v2,
v3 etc. If someone wants to do that he must be aware that access to
these variables (since they are generated) must be generated as well.
In this case using an Array indexing is the more appropriate
mechanism.
If you just want to
iterate over the array contents, use the each method of the array object:

Exactly!

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Maurizio Cirilli

OK, so it looks like there is no way in Ruby to extract
single subarrays from md-arrays with unknown dimensions.

Thanks all for help.

-- Maurizio
 
Alex Gutteridge

OK, so looks like there is no way with Ruby to extract
single subarrays from md-arrays with unknown dimensions.

Thanks all for help.

-- Maurizio

It almost certainly can. I think you just need to rephrase your question
so people can see what exactly you want to do. Here is an irb session
showing you one way of doing what I think you want to do:
ss = [[1,2,3],[4,5,6],[7,8,9]]
=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
main_scope = binding()
=> #<Binding:0x...>
ss.each_with_index{|x,i| eval("v#{i} = x.clone",main_scope) }
=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v1
=> [4, 5, 6]
v0
=> [1, 2, 3]
v2
=> [7, 8, 9]

It's almost certainly not the 'right' way to do what you really want
though.
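
One possible alternative that avoids eval, sketched under the assumption that the labels v0, v1, ... are wanted only for lookup: keep the copies in a Hash keyed by those labels. The name copies is made up for this example.

```ruby
ss = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Hypothetical name: a Hash keyed by the labels the eval version would have used.
copies = {}
ss.each_with_index { |sub, i| copies["v#{i}"] = sub.clone }

p copies["v1"]     # prints [4, 5, 6]
p copies.keys      # prints ["v0", "v1", "v2"]
```

This works for any number of subarrays, and the lookups can be generated in the same way as the keys.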
 
Robert Klemme

OK, so looks like there is no way with Ruby to extract
single subarrays from md-arrays with unknown dimensions.

I can only echo what Alex wrote: you *can* extract arrays from
arbitrarily nested arrays, but generating variable names is almost
certainly the wrong way to go about it. What exactly do you want to
do? Can you describe the input and what you want to do with it, with
more context than you have provided so far?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
w_a_x_man

OK, so  looks like there is no way with Ruby to extract
single subarrays from md-arrays with unknown dimensions.

Thanks all for help.

-- Maurizio

If you don't know the length of ss when you write your program,
then obviously you don't know how many variables to use in your
assignment statement. You don't know whether to say

a,b,c = ss

or

a,b,c,d = ss

or

a,b,c,d,e = ss


However, those variables are not needed at all. To extract the
first subarray, say ss[0] or ss.first. To extract the last
subarray, say ss[-1] or ss.last. To extract each in turn along
with its index:

ss.each_with_index{|x,i| p i, x}
0
["tcg", "agt", "tct", "agc", "tca", "tcc"]
1
["aaa", "aag"]
2
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]

All of this will become obvious after you have some programming
experience.
 
Maurizio Cirilli

Sorry for my bad explanation. In my case I actually know how many
columns are in the md-array, i.e. how many subarrays are to be extracted,
because I read them as back-translated amino acids from a gene database,
but this number can change from case to case. So to make my code
generally usable I have to take this "variable" into consideration;
otherwise I have to change this number by hand every time I run the
program. In the example I provided this number is 3, but the number of
subarrays can vary. That is what I meant by "unknown" dimension.
Sorry again for the misunderstanding.

- Maurizio
 
Robert Klemme

Sorry for my bad explanation, in my case I actually know how many
columns are in md-array
i.e how many sub-arrays are to be extracted because I read them as
backtraslated aminocids
from a gene database but this number can change from case to case. So
to make my code of
general use I have to take in consideration this "variable" otherways
I have to change by hand
this number every time I run the program. In the example I provided
this number is 3 but this
number of subarrays can vary. In this respect I wrote "unknown"
dimension.

Then why don't you just iterate the outermost Array and be done?

robert
 
Maurizio Cirilli

Dear Robert,

this is what I want to do:

(1) extract the subarrays from an md-array (the number of subarrays
can vary from case to case)
(2) use the separated subarrays to build all the nucleotide sequences
that can be made from them by combining the codons at each position
(row), i.e. all the 6*2*6 sequences, each 9 nucleotides long, in my
example
(3) convert them to strings and put them in a single array (this is
required for compatibility with the BioRuby classes and methods that
deal with DNA sequences as strings)
(4) run further analysis on these sequences with BioRuby.

That's it.

While it is clear to me how to do points 2, 3 and 4, I really struggled
with how to accomplish point 1, which is the subject of this thread.

-- Maurizio
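
A possible sketch of points 1 to 3 that works for any number of subarrays. It relies on the splat in parallel assignment discussed earlier and on Array#product; the names first, rest and sequences are made up for illustration, and the BioRuby step (point 4) is left out.

```ruby
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"],
      ["aaa", "aag"],
      ["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

# Point 1: split off the first subarray; the splat collects however many remain.
first, *rest = ss

# Points 2 and 3: Array#product builds every combination that takes one codon
# per position, and join turns each combination into a single String.
sequences = first.product(*rest).map { |codons| codons.join }

p sequences.size    # prints 72 (6 * 2 * 6)
p sequences.first   # prints "tcgaaactg"
```

No named variable per subarray is needed: the number of subarrays never appears in the code, so it can change from case to case.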
 
