Thursday, January 15, 2009

This can't be right

Lines in the input file may contain a hex sequence interspersed among ASCII text, in the form 0x{hex characters}. Each sequence should be replaced by the UTF-8 text its bytes encode.

# Python 2 (print statement, byte strings)
import fileinput
import re
import codecs

utf = codecs.lookup('utf-8')
hex_pair = re.compile("[0-9a-f]{2}")   # one byte, written as two hex digits
pattern = re.compile("0x[0-9a-f]*")    # a whole 0x... run

def hexrepl(matchobj):
    # Turn each pair of hex digits in the matched run into one byte.
    numbers = []
    for m in hex_pair.findall(matchobj.group(0)):
        numbers.append(chr(int(m, 16)))
    result = "".join(numbers)
    # Round-trip through the codec so that invalid UTF-8 raises an error.
    return utf.encode(utf.decode(result)[0])[0]

for line in fileinput.input():
    line = re.sub(pattern, hexrepl, line)
    print line.rstrip()


This feels overly complicated, and I'm only an occasional Python hack. I'm forgetting something.
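
One possibly simpler route, sketched here for Python 3 rather than the Python 2 used above: let bytes.fromhex do the pair-by-pair conversion and decode the result directly. The names HEX_RUN and decode_hex_runs are made up for this sketch, and it assumes every run has an even number of lowercase hex digits. Unlike the original pattern, an odd trailing digit is left in the text instead of being dropped.

```python
import re

# Assumption: a run is "0x" followed by one or more pairs of lowercase hex digits.
HEX_RUN = re.compile(r"0x((?:[0-9a-f]{2})+)")

def decode_hex_runs(line):
    """Replace each 0x... run in line with the UTF-8 text its bytes encode."""
    def repl(match):
        # bytes.fromhex turns "6869" into b"hi"; decode raises
        # UnicodeDecodeError if the bytes are not valid UTF-8.
        return bytes.fromhex(match.group(1)).decode("utf-8")
    return HEX_RUN.sub(repl, line)
```

Applied line by line (for example over fileinput.input(), as in the original loop), this should do the same job without the manual chr/int loop or the codec round-trip.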
