blatt
Hi all,
and a particular hello to Chris Angelico, whose criticisms and
suggestions pushed me to fully revise my application for
hex-dumping text that contains UTF-8 characters.
If you are not using Python 3, the UTF-8 codec can add further programming
problems, especially if you are not a guru...
The script looks long, but that is because I commented it heavily... sorry.
It is very useful (at least IMHO...).
It works under Linux, but there is still a small problem that I haven't
solved (at least not programmatically...).
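In Python 2 a line read from a file is a byte string, so a hex dump has to work out how many bytes each character occupies from the lead byte of its UTF-8 sequence. The script below does this by checking the first hex digit of each byte ('c' for 2 bytes, 'e' for 3). A minimal sketch of the general rule, written in Python 3; the function name `utf8_sequence_length` is ours, not part of the script, and it classifies bytes by bit pattern only, without validating the sequence.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `lead` has.

    Classification is by bit pattern only (no validation). A result of 0
    means `lead` is a continuation byte (0x80-0xbf), which never starts
    a character.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError(f"{lead!r} is not a byte value")
    if lead < 0x80:   # 0xxxxxxx: ASCII, 1 byte
        return 1
    if lead < 0xC0:   # 10xxxxxx: continuation byte
        return 0
    if lead < 0xE0:   # 110xxxxx: 2-byte sequence (lead 0xc_ or 0xd_)
        return 2
    if lead < 0xF0:   # 1110xxxx: 3-byte sequence (lead 0xe_)
        return 3
    if lead < 0xF8:   # 11110xxx: 4-byte sequence (lead 0xf0-0xf7)
        return 4
    raise ValueError(f"0x{lead:02x} never appears in UTF-8")
```

For example, `utf8_sequence_length(0xc3)` returns 2 (`è` is `c3 a8`), so the synchronized text row needs one dot after `è`; for `€` (`e2 82 ac`) it returns 3. Lead bytes 0xd0-0xdf (Cyrillic, Hebrew, Arabic) and 0xf0-0xf4 (for instance emoji) also start multi-byte characters.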
# -*- coding: utf-8 -*-
# px.py vers. 11 (pxb.py) # python 2.6.6
# hex-dump w/ or w/out utf-8 chars
# Because spaces are shown as blanks in the hex rows, this script
# reveals incorrect indentation (better than tabnanny).
# To save the output: python pxb.py hex.txt > px9_out_hex.txt
nLenN=3 # number of digits for line numbers
# Version almost completely rewritten on the basis of the
# criticisms and modifications suggested by Chris Angelico.
# In the first version the UTF-8 bytes were shown horizontally in hex:
# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a
# ... but I had to insert additional chars (dots) to keep the
# text and the hex part synchronized:
# 005 # qwerty: non è. unicode bensì. ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a
# In the second version I followed Chris's suggestion
# "to show the hex UTF-8 vertically":
# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 c 7666666 6667c 676660
#     3 175249a efe 3 5e93f45 25e33 13399a
#                   a             a
#                   8             c
# Between the two solutions I chose the first one + synchronization,
# which seems more compact and easier to program (... I'm lazy...).
# various run options:
# std : python px.py file
# bash cat : cat file | python px.py (alias hex)
# bash echo: echo line | python px.py " "
# Synchronizes UTF-8 sequences whose lead byte is 0xc_ (2 bytes)
# or 0xe_ (3 bytes); characters with other lead bytes (0xd_, 0xf_)
# are left out of the text row.
# Tip: it helps to keep, in a separate file, all the special
# characters of interest together with their names.
# Error:
# echo '345"789"'|hex > 345"789"
#                       33323332   instead of 333233320
#                       3452789 a  instead of 34527892a
# ... workaround: avoid a " right before the \n at the end of the test line
# echo "345'789'"|hex > 345'789'
#                       333233320
#                       34577897a
# The same error occurs with every run option.
# If someone can solve this bug...
###################
import fileinput

lF = []  # input file as a list of lines
for line in fileinput.input():  # handles all the details of args-or-stdin
    lF.append(line)
sSpacesXLN = ' ' * (nLenN + 1)  # indent of the hex rows, under the text
for n in xrange(len(lF)):
    sLineHexND = lF[n].encode('hex')  # ND = no delimiter (space)
    sLineHex = lF[n].encode('hex').replace('20', ' ')
    sLineHexH = sLineHex[::2]   # high nibbles
    sLineHexL = sLineHex[1::2]  # low nibbles
    # text row in hex, with a '.' (0x2e) added for every extra byte of a
    # multi-byte char; continuation bytes (0x80-0xbf) match no branch
    sSynchro = ''
    for k in xrange(0, len(sLineHexND), 2):
        if sLineHexND[k] < '8':     # 1-byte (ASCII) char
            sSynchro += sLineHexND[k] + sLineHexND[k+1]
        elif sLineHexND[k] == 'c':  # 2-byte char + 1 dot
            sSynchro += 'c' + sLineHexND[k+1] + sLineHexND[k+2] + \
                        sLineHexND[k+3] + '2e'
        elif sLineHexND[k] == 'e':  # 3-byte char + 2 dots
            sSynchro += 'e' + sLineHexND[k+1] + sLineHexND[k+2] + \
                        sLineHexND[k+3] + sLineHexND[k+4] + \
                        sLineHexND[k+5] + '2e2e'
    # text output (synchronized); the line keeps its own '\n'
    print str(n+1).zfill(nLenN) + ' ' + sSynchro.decode('hex'),
    print sSpacesXLN + sLineHexH
    print sSpacesXLN + sLineHexL + '\n'
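A possible explanation for the `345"789"` error described in the script's comments, offered as a hypothesis to check: `replace('20', ' ')` runs on the hex text, where it also matches a '2' ending one byte followed by a '0' starting the next. In `345"789"\n` the last two bytes are `22 0a`, whose hex text `220a` contains `20` across the byte boundary, so one hex digit disappears and the digits after it shift between the high and low rows. With `'` (0x27) the pair becomes `270a`, which contains no `20`; this agrees with the workaround. Any byte ending in 2 followed by a byte 0x00-0x0f (tab, newline) would be hit in the same way. One way around it is to decide per byte whether it is a space. A minimal sketch in Python 3; `dump_line` and `format_dump` are hypothetical names, not from the script. Unlike the script, it strips the trailing newline instead of showing it as 0a, and it assumes every character takes one column on screen (East Asian wide characters do not).

```python
from collections.abc import Iterable


def dump_line(text: str) -> tuple[str, str, str]:
    """Return (text_row, high_row, low_row) for one line without its newline.

    Every UTF-8 byte gets one column: its high nibble goes into high_row and
    its low nibble into low_row. A character that encodes to n bytes is
    followed by n - 1 dots in text_row, so the three rows stay aligned.
    Spaces are detected per byte (0x20), so the '2' ending one byte and the
    '0' starting the next are never mistaken for a space.
    """
    text_row, high_row, low_row = [], [], []
    for ch in text:
        data = ch.encode("utf-8")
        text_row.append(ch + "." * (len(data) - 1))
        for byte in data:
            if byte == 0x20:  # a real space: blank in both hex rows
                high_row.append(" ")
                low_row.append(" ")
            else:
                high_row.append(format(byte >> 4, "x"))
                low_row.append(format(byte & 0x0F, "x"))
    return "".join(text_row), "".join(high_row), "".join(low_row)


def format_dump(lines: Iterable[str], width: int = 3) -> str:
    """Number each line and lay out text/high/low rows like the script.

    `width` plays the role of nLenN (digits in the line number); the hex
    rows are indented by width + 1 spaces so they sit under the text.
    """
    pad = " " * (width + 1)
    out = []
    for n, line in enumerate(lines, start=1):
        text, high, low = dump_line(line.rstrip("\n"))
        out.extend([f"{n:0{width}d} {text}", pad + high, pad + low, ""])
    return "\n".join(out) + "\n"
```

Under a UTF-8 locale, `print(format_dump(fileinput.input()), end="")` would cover the same run options as the script (file arguments or stdin). For `345"789"` the sketch gives the high row `33323332` and the low row `34527892`, with no shifted digits.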
If something is hard to follow, probably because of the fonts, the best
thing is to open it in an editor with a monospaced font...
As I already told Chris... criticism is welcome!
Bye, Blatt.