Extract data from ASCII file

Ren

Suppose I have a file containing several lines similar to this:

:10000000E7280530AC00A530AD00AD0B0528AC0BE2

The data I want to extract are 8 hexadecimal strings, the first of
which is E728, like this:

:10000000 E728 0530 AC00 A530 AD00 AD0B 0528 AC0B E2

Also, the bytes in the string are reversed. The E728 needs to be 28E7,
0530 needs to be 3005 and so on.

I can do this in C++ and Pascal, but it seems like Python may be more
suited for the task.

How is this accomplished using Python?
 
Mike C. Fletcher

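With Python 2.3:

>>> def process( line ): # name assumed; the original def line was lost
...     line = line[9:] # skip prefix
...     while line:
...         prefix, line = line[:4],line[4:]
...         yield prefix[2:]+prefix[:2]
...
>>> for number in process( ":10000000E7280530AC00A530AD00AD0B0528AC0BE2" ):
...     print number
...
28E7
3005
00AC
30A5
00AD
0BAD
2805
0BAC
E2
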
If you want to convert the hexadecimal strings to actual integers, use
int( prefix, 16 ).
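
As a rough sketch of that conversion (written for Python 3; the helper name `swapped_words` and its `offset` parameter are assumptions made for illustration, not part of the session above), the byte-swapped strings can be turned into integers in one pass:

```python
def swapped_words(line, offset=9):
    """Yield byte-swapped chunks of `line`, four characters at a time."""
    for i in range(offset, len(line), 4):
        chunk = line[i:i + 4]
        # Swap the two 2-character halves: "E728" becomes "28E7".
        yield chunk[2:] + chunk[:2]


line = ":10000000E7280530AC00A530AD00AD0B0528AC0BE2"
words = list(swapped_words(line))
# int(text, 16) parses a hexadecimal string into an integer.
values = [int(word, 16) for word in words]
print(values[:3])  # [10471, 12293, 172]
```

Converting after the swap keeps the integer consistent with the reordered string; converting the unswapped chunk would give the value in the file's original byte order.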

HTH,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
 
Irmen de Jong

Ren said:
Suppose I have a file containing several lines similar to this:

:10000000E7280530AC00A530AD00AD0B0528AC0BE2

Say the file is called data.txt
Try this:
---------------------------------
def process(line):
    line=line[9:]
    result=[]
    for i in range(0,32,4):
        result.append( line[i+2:i+4] + line[i:i+2] )
    return result

for line in open("data.txt"):
    print process(line)
---------------------------------
For your single example data line, it prints
['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC']

It's a list containing the 8 extracted hexadecimal strings.
Instead of printing the list you can do whatever you want with it.
If you need more info, just ask.
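
One detail worth keeping in mind when reading from a file: iterating over a file yields each line with its trailing newline. A fixed range like the one above never reaches it, but a loop that runs to the end of the line would fold the newline into the last chunk. The following is a minimal Python 3 sketch of the same idea that strips the line first. The `offset` and `count` parameters are hypothetical additions, and `io.StringIO` stands in for `open("data.txt")` so the example runs without a file on disk.

```python
import io


def process(line, offset=9, count=8):
    """Return `count` byte-swapped 4-character words starting at `offset`."""
    line = line.strip()  # drop the trailing newline that file iteration keeps
    end = offset + 4 * count
    return [line[i + 2:i + 4] + line[i:i + 2] for i in range(offset, end, 4)]


# Stand-in for the file: the sample line from the question, repeated twice.
data = io.StringIO(
    ":10000000E7280530AC00A530AD00AD0B0528AC0BE2\n"
    ":10000000E7280530AC00A530AD00AD0B0528AC0BE2\n"
)

# Skip blank lines so an empty last line does not produce empty words.
rows = [process(line) for line in data if line.strip()]
print(rows[0])
# ['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC']
```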

--Irmen de Jong
 
eleyg

The first response only works with python-2.3 (yield is a newly
reserved word).

The second response did not work for me and left off the last couple
values.

You might want to try this. It walks down the line four characters at a
time, swaps the two 2-character halves, and appends the result to a list.
It also accepts a second list argument that receives the first 8 digits
(lists are mutable, so the caller sees the appended value).

-------------------------------------------------------
from types import *

def process(line,key):
    """ Pass in a string type (line) and
    an empty list to store the key """
    if type(key) is ListType and key == []:
        key.append(line[1:9]) # the 8 digits after the leading ':'
    else:
        print "Key not ListType or not empty"
    result=[]
    line=line[9:]
    while line:
        k2,k1 = line[:2],line[2:4]
        line=line[4:]
        result.append(k1+k2)
    return result
-------------------------------------------------------
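
An alternative to filling a caller-supplied list is to return the header together with the words. This is only a sketch in Python 3; the name `split_record` and the `header_len` parameter are assumptions made for illustration.

```python
def split_record(line, header_len=8):
    """Return (header, words) for a line such as ':10000000E728...E2'."""
    body = line.strip()[1:]  # drop the leading ':'
    header, data = body[:header_len], body[header_len:]
    # Swap the two 2-character halves of every 4-character chunk.
    words = [data[i + 2:i + 4] + data[i:i + 2] for i in range(0, len(data), 4)]
    return header, words


header, words = split_record(":10000000E7280530AC00A530AD00AD0B0528AC0BE2")
print(header)     # 10000000
print(words[0])   # 28E7
print(words[-1])  # E2
```

Returning a tuple avoids the type check on the second argument, and the caller can unpack both results in a single assignment.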
 
William Park

1. Use FIELDWIDTHS in (GNU) Awk.

2. Use string slices in Python.

3. Use parameter expansion (substring extraction) in the (Bash) shell.
 
Anton Vredegoor

The first response only works with python-2.3 (yield is a newly
reserved word).

The second response did not work for me and left off the last couple
values.

The third response uses typechecking and stores a value in an
unreachable place ...

Maybe the feachur-less code is better (tested very lightly):

def asBytes(line,offset):
    """ split a line into 2-char chunks, starting at offset"""
    res = []
    for i in range(offset,len(line),2):
        res.append(line[i:i+2])
    return res

def asWords(line,offset=0,swapbytes=0):
    """split a line into words that have maximally 4 chars,
    starting at offset, optionally swapping 2-char chunks"""
    res = []
    flip = 0
    for b in asBytes(line,offset):
        if flip:
            if swapbytes:
                res.append(b+prev)
            else:
                res.append(prev+b)
        else:
            prev = b
        flip = 1-flip
    if flip:
        res.append(b)
    return res

def test():
    line =":10000000E7280530AC00A530AD00AD0B0528AC0BE2"
    print asWords(line,offset=9,swapbytes=1)

if __name__=='__main__':
    test()

output is:

['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC', 'E2']

Anton
 
Ren

What is 'prefix' used for? I searched the docs and didn't come up with
anything that seemed appropriate.


 
Irmen de Jong

Ren said:
What is 'prefix' used for? I searched the docs and didn't come up with
anything that seemed appropriate.

Umm... it's just a variable name :)

--Irmen
 
Mike C. Fletcher

Ren said:
What is 'prefix' used for? I searched the docs and didn't come up with
anything that seemed appropriate.

It's just the name (variable) I used to store the "prefix" of the rest
of the line. It could just as easily have been called "vlad", but using
simple, descriptive names for variables makes the code easier to read
(in most cases, this being the obvious counter-example). In Python when
you assign to something:

x, y = v, t

you are creating a (possibly new) bound name (if something of the same
name exists in a higher namespace it is shadowed by this bound name, so
even if there was a built-in function called "prefix" my assignment to
the name would have shadowed the name).

This line here says:

prefix, line = line[:4],line[4:]

that is, assign the name "prefix" to the result of slicing the line from
the starting index to index 4, and assign the name "line" to the result
of slicing from index 4 to the ending index. Under the covers the
right-hand-side of the expression is creating a two-element tuple, then
that tuple is unpacked to assign its elements to the two variables on
the left-hand-side.
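
A small Python 3 sketch of the unpacking described above. The shortened sample string and the names `pair`, `low` and `high` are chosen only for illustration.

```python
line = "E7280530"

# The right-hand side is evaluated first, using the old value of `line`,
# and only then are the two names bound to the two tuple elements.
prefix, line = line[:4], line[4:]
print(prefix, line)  # E728 0530

# The same thing in two explicit steps: build the tuple, then unpack it.
pair = (line[:2], line[2:])
low, high = pair
print(high + low)  # 3005
```

Because the whole right-hand side is computed before any name is rebound, `line[4:]` still sees the original string even though `line` is also an assignment target.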

Python is a fairly small language; if a linguistic construct works a
particular way in one context it *normally* works that way in every
context (unless the programmer explicitly changes that (and that's
generally *only* done by meta-programmers seeking to create
domain-specific functionality, and even then as a matter of style, it's
kept to a minimum to avoid confusing people (and in this particular
case, AFAIK there's no way to override variable assignment (though (evil
;) ) people have proposed adding such a hook on numerous occasions)))).

The later line is simply manipulating the (string) object now referred
to as "prefix":

result.append( prefix[2:]+prefix[:2] )

that is, take the result of slicing from index 2 to the end and add it
to the result of slicing from the start to index 2. This swaps the two
2-character hexadecimal byte encodings within each 4-character chunk.
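
The same slice expression also copes with the short final chunk, because a slice that runs past the end of a string yields an empty string instead of raising an error. A minimal Python 3 sketch, where the helper name `swap` is hypothetical:

```python
def swap(chunk):
    """Move the first two characters of `chunk` behind the rest."""
    return chunk[2:] + chunk[:2]


print(swap("E728"))  # 28E7
# For a 2-character chunk, chunk[2:] is empty, so the chunk is unchanged.
print(swap("E2"))    # E2
```

That is why the trailing 'E2' comes through intact when the loop runs to the end of the line.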

Oh, and since someone took issue with my use of (new in Python 2.2)
yield (luddites :) ;) ), here's a non-generator version using the same
basic code pattern:
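
>>> def process( line ): # name assumed; the original def line was lost
...     line = line[9:] # skip prefix
...     result = []
...     while line:
...         prefix, line = line[:4],line[4:]
...         result.append( prefix[2:]+prefix[:2] )
...     return result
...
>>> process( ":10000000E7280530AC00A530AD00AD0B0528AC0BE2" )
['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC', 'E2']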

Have fun :) ,
Mike

 
Anton Vredegoor

Mike C. Fletcher said:
Oh, and since someone took issue with my use of (new in Python 2.2)
yield (luddites :) ;) ), here's a non-generator version using the same
basic code pattern:
...     line = line[9:] # skip prefix
...     result = []
...     while line:
...         prefix, line = line[:4],line[4:]
...         result.append( prefix[2:]+prefix[:2] )
...     return result

The basic problem with this code pattern is that every pass through the
loop copies the whole remainder of the line (line[4:]). With a short line
that is no problem, but the total work grows quadratically with the line
length, so it doesn't scale well.

After reconsidering all alternatives I finally favor a variant of
Irmen's code, but without slicing the whole line and -after all-
definitely *using* yield because it seems appropriate here.

def process(line,offset):
    for i in xrange(offset,len(line),4):
        yield line[i+2:i+4] + line[i:i+2]

def test():
    line = ":10000000E7280530AC00A530AD00AD0B0528AC0BE2"
    print '\n'.join(process(line,9))

if __name__=='__main__':
    test()

output is:

28E7
3005
00AC
30A5
00AD
0BAD
2805
0BAC
E2

Anton
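
If integers are the end goal, the byte swap can also be read as little-endian decoding of 16-bit words, which the standard library handles directly. The following is only a sketch in Python 3, not something proposed in the thread: the name `decode_words` and its `offset` and `count` parameters are assumptions, and it decodes just the 8 data words, leaving the trailing 'E2' aside.

```python
import struct


def decode_words(line, offset=9, count=8):
    """Decode `count` little-endian 16-bit words from the hex text in `line`."""
    hex_text = line.strip()[offset:offset + 4 * count]
    raw = bytes.fromhex(hex_text)  # two hex digits per byte
    # "<" selects little-endian byte order, "H" is an unsigned 16-bit integer.
    return list(struct.unpack("<%dH" % count, raw))


words = decode_words(":10000000E7280530AC00A530AD00AD0B0528AC0BE2")
print(["%04X" % word for word in words])
# ['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC']
```

Formatting with `%04X` restores the 4-character uppercase strings, so the result can be compared directly with the slicing versions above.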
 
