jim.goodman
I want to find out how many unique "regions" are in part of the list
attached. I have also included the Perl script that I have
created. I am new to this, so please no flames. It appears to
partially work, but the output isn't what I wanted: it
appears to loop and actually find non-unique items instead of just the
unique ones. I don't know, I'm lost, that's why I'm here :)! As I look
at it more, I understand why it's doing what it is (it's a second-loop
thing with the "uniqueness"), but I don't know how to fix it to get the
end result I want.
#!/usr/bin/perl
use strict;
use warnings;
open (ORG_STUFF, "/Users/goodman/Desktop/region.txt") or die "Can't open ORG_STUFF : $!";
my (@org, @new); # declare arrays
my ($org_data, $new_data, $org_line, $new_line); # declare variables
while( <ORG_STUFF> ) {
    push @org, $_; # push the data line onto the array
}
push @new, "helpme"; # load something into the array or it won't match at all
foreach $org_line (@org) { #loop through the original data
    $org_data = $3 if ($org_line =~ /(.*?\t)(.*?\t)(.*?\n)/); #get the data chunk
    chomp $org_data;
    foreach $new_line (@new) {
        if ($org_data ne $new_line) {
            push @new, $org_data;
        }
    }
}
close ORG_STUFF;
print @new;
Now part of the input file. I am interested in the "third" chunk of
data on each line, and only the unique ones!
1 0,1160,10,00 Napa Valley
2 0,1160,100,00 Monterey Bay Area
3 0,1160,1000,00 Napa Valley
4 0,1160,1001,00 Napa Valley
5 0,1160,1002,00 Sonoma
6 0,1160,1003,00 South Central Coast
7 0,1160,1005,00 Sonoma
8 0,1160,1006,00 South Central Coast
9 0,1160,1007,00 South Central Coast
10 0,1160,1008,00 Napa Valley
11 0,1160,1009,00 Napa Valley
12 0,1160,101,00 Napa Valley
13 0,1160,1010,00 Sonoma
14 0,1160,1011,00 Sonoma
15 0,1160,1012,00 Sonoma
16 0,1160,1013,00 South Central Coast
17 0,1160,1014,00 Napa Valley
18 0,1160,1015,00 Piedmont
19 0,1160,1016,00 Lombardy
20 0,1160,1017,00 Veneto
21 0,1160,1018,00 Veneto
22 0,1160,1019,00 Tuscany
23 0,1160,102,00 Napa Valley
24 0,1160,1020,00 Sicily
25 0,1160,1021,00 Veneto
26 0,1160,1022,00 Piedmont
27 0,1160,1023,00 Piedmont
28 0,1160,1024,00 Piedmont
29 0,1160,1025,00 Piedmont
30 0,1160,1026,00 Piedmont
31 0,1160,1027,00 Veneto
32 0,1160,1028,00 Latium & Rome
33 0,1160,1029,00 Tuscany
34 0,1160,103,00 Sierra Foothills
35 0,1160,1030,00 Mendocino
36 0,1160,1031,00 Napa Valley
37 0,1160,1032,00 Central Valley
38 0,1160,1033,00 Willamette Valley
39 0,1160,1034,00 Napa Valley
40 0,1160,1035,00 New England
41 0,1160,1036,00 New England
42 0,1160,1037,00 South Central Coast
43 0,1160,1038,00 Hudson River Valley
44 0,1160,1039,00 Hudson River Valley
45 0,1160,104,00 Napa Valley
46 0,1160,1040,00 Hudson River Valley
47 0,1160,1041,00 Hudson River Valley
48 0,1160,1042,00 Hudson River Valley
49 0,1160,1043,00 Bordeaux
50 0,1160,1044,00 Bordeaux
From the above list, in the end, I should get output that looks
something like this, maybe sorted alphabetically?
Napa Valley
Monterey Bay Area
Sonoma
South Central Coast
Piedmont
Lombardy
Veneto
Tuscany
Sicily
Latium & Rome
Sierra Foothills
Mendocino
Central Valley
Willamette Valley
New England
Hudson River Valley
Bordeaux
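
For what it's worth, a common way to get this kind of result is to use a hash to remember which values have already been seen, rather than a second loop over @new. The sketch below is only an illustration of that idea, not the original script: the sub name unique_regions and the inline @sample data are made up for the example, and it assumes the three fields are tab-separated, as the regex in the script suggests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the distinct
# third tab-separated field of each line, in first-seen order.
sub unique_regions {
    my @lines = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        chomp(my $copy = $line);                    # work on a copy, strip newline
        my (undef, undef, $region) = split /\t/, $copy, 3;
        next unless defined $region && length $region;   # skip malformed lines
        # $seen{$region}++ is false only the first time a region shows up
        push @unique, $region unless $seen{$region}++;
    }
    return @unique;
}

# A few sample lines in the same layout as region.txt (assumed tab-separated).
my @sample = (
    "1\t0,1160,10,00\tNapa Valley\n",
    "2\t0,1160,100,00\tMonterey Bay Area\n",
    "3\t0,1160,1000,00\tNapa Valley\n",
    "5\t0,1160,1002,00\tSonoma\n",
    "7\t0,1160,1005,00\tSonoma\n",
);

my @regions = unique_regions(@sample);
print "$_\n" for sort @regions;                     # alphabetical order
print scalar(@regions), " unique regions\n";
```

To run it against the real file, the lines read from the filehandle could be passed to unique_regions in place of @sample. Because each region is looked up in the hash, no placeholder entry like "helpme" is needed to get the comparison started.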