klashxx
Hi, I need a fast way to delete duplicate entries from very huge
files (>2 GB); these files are plain text.
...To clarify, this is the structure of the file:
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0005655,00|||+0000000000000,00
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0000000000000,00|||+0000000000000,00
30xx|4150010003502043|CARDS|20081031|MP415001|00000024265698|01|F|1804|00|||+0000000000000,00|||+0000000000000,00
The key is formed by the first 7 fields (the delimiter is the pipe), and
I want to print or delete only the duplicates. For instance, the first
two records above share the same key, so the second one is a duplicate.
I tried all the usual methods (awk / sort / uniq / sed / grep ...) but
it always ended with the same result: out of memory!
I'm using large HP-UX servers.
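For example, this is roughly what I tried, shown here as a minimal Perl
sketch of the usual awk seen[] idiom (keep the first record per key,
skip the rest); the %seen hash has to hold an entry for every unique
key, which is what exhausts memory:

#!/usr/bin/perl
use strict;
use warnings;

# Keep the first record for each key, skip later duplicates.
# %seen grows with one entry per unique key -- this is the OOM problem.
my %seen;
while (my $line = <>) {
    my $key = join '|', (split /\|/, $line)[0 .. 6];   # first 7 fields
    print $line unless $seen{$key}++;
}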
I'm very new to Perl, but I read somewhere that the Tie::File module can
handle very large files. I tried it but cannot get the right code...
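Roughly, this is the direction I was going in (just a sketch, with a
placeholder filename; I'm not sure it's correct, and the %seen hash
still grows with each unique key):

#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

# Tie::File maps the file's lines to an array without slurping the
# whole file into memory. 'bigfile.txt' is a placeholder name.
tie my @lines, 'Tie::File', 'bigfile.txt'
    or die "Cannot tie file: $!";

my %seen;
my $i = 0;
while ($i <= $#lines) {
    # Key = first 7 pipe-delimited fields of the current record.
    my $key = join '|', (split /\|/, $lines[$i])[0 .. 6];
    if ($seen{$key}++) {
        splice @lines, $i, 1;   # delete the duplicate line in place
    }
    else {
        $i++;
    }
}
untie @lines;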
Any advice will be very welcome.
Thank you in advance.
Regards
PS: I do not want to split the files.