Bhay Zone
I am pretty new to Ruby and am trying to read text data coming from a
backend that can only be queried using proprietary command-line
interface (CLI) commands.
The problem is that this text data contains non-ASCII characters. I
don't know what these characters are, nor do I know the encoding.
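When the encoding is unknown, a useful first step is to look at the raw bytes. The sketch below is a minimal illustration, assuming a Ruby 1.9+ interpreter; the sample string and the helper name `non_ascii_bytes` are hypothetical, not part of the original code. It copies the data, relabels it as binary (so Ruby does no encoding validation), and lists every byte above 0x7F with its offset. For what it is worth, the octal escapes \221 to \224, \226 and \227 used in the code below are the bytes 0x91 to 0x94, 0x96 and 0x97, which Windows-1252 maps to curly quotes and en/em dashes, so that encoding is a plausible guess, but only a guess.

```ruby
# Hypothetical sample standing in for the CLI output.
sample = "caf\xE9 \x93quoted\x94"

# Return [byte, offset] pairs for every byte outside 7-bit ASCII.
# The string is copied and relabelled as BINARY, so Ruby never tries to
# validate it as UTF-8 and cannot raise an encoding error here.
def non_ascii_bytes(str)
  bytes = str.dup.force_encoding(Encoding::BINARY)
  bytes.each_byte.with_index.select { |byte, _offset| byte > 0x7F }
end

non_ascii_bytes(sample).each do |byte, offset|
  printf("offset %d: 0x%02X\n", offset, byte)
end
# prints:
# offset 3: 0xE9
# offset 5: 0x93
# offset 12: 0x94
```

Running this over a real chunk of the backend output shows which byte values actually occur, which narrows down the candidate encodings.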
Earlier, when we were using Ruby 1.8.7, we had some code that handled
these characters well. Now, after switching to Ruby 1.9.2, the
same code breaks with encoding errors such as "invalid multibyte sequence"
in gsub.
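The behaviour change can be reproduced in isolation. In Ruby 1.9 and later every String carries an encoding label, and a regexp match against a string whose bytes are not valid in that encoding raises ArgumentError. This is a minimal sketch assuming the CLI data ends up labelled UTF-8 (a common default external encoding); the sample bytes and the helper `try_gsub` are hypothetical.

```ruby
# Hypothetical CLI output: curly-quote bytes 0x93/0x94 labelled as UTF-8.
content = "\x93hello\x94".dup.force_encoding(Encoding::UTF_8)

# 0x93 cannot start a UTF-8 character, so the string is not valid UTF-8.
puts content.valid_encoding?   # prints "false"

# Attempt a regexp substitution and report the error instead of crashing.
def try_gsub(str)
  str.gsub(/hello/, 'bye')
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

puts try_gsub(content)   # prints "ArgumentError: invalid byte sequence in UTF-8"
```

In Ruby 1.8.7 strings were plain byte arrays with no such check, which is why the same substitutions used to run silently.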
Here is the code we were using to remove the non-ASCII characters, which
is now breaking. It fails at the first line.
content.gsub!( "\221", '')
content.gsub!( "\222", '')
content.gsub!( "\223", '')
content.gsub!( "\224", '')
content.gsub!( "\246", '')
content.gsub!( "\247", '')
content.gsub!( "\237", '')
content.gsub!( "\377", '')
content.gsub!( "\226", '')
content.gsub!( "\227", '')
content.gsub!( "\\000", "?")
content.gsub!( "\\001", "?")
content.gsub!( "\FB01", "")
content.gsub!(/[\x80-\xFF]/,'')
content.gsub!(/[\x00-\x08]/,'')
content.gsub!(/[\x0B-\x0C]/,'')
content.gsub!(/[\x0E-\x1F]/,'')
I just cannot figure out how to fix this problem, and any help would be
greatly appreciated.
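One common way to get the 1.8.7 behaviour back is to stop treating the data as UTF-8 and work on raw bytes, as 1.8 effectively did. The sketch below assumes the goal is the same as the code above: remove every byte from 0x80 to 0xFF and the control bytes 0x00-0x08, 0x0B-0x0C and 0x0E-0x1F, keeping tab, LF and CR. The method name `strip_non_ascii` is hypothetical. The `n` flag on the first regexp marks it as binary, so `\x80-\xFF` means raw byte values.

```ruby
# Strip non-ASCII and most control bytes, mirroring the Ruby 1.8.7 code.
# Assumption: the input encoding is unknown, so it is handled as raw bytes.
def strip_non_ascii(raw)
  # Copy and relabel as BINARY: no UTF-8 validation, so no ArgumentError.
  bytes = raw.dup.force_encoding(Encoding::BINARY)

  # Kept from the original: literal backslash text \000 / \001 becomes "?".
  bytes.gsub!('\000', '?')
  bytes.gsub!('\001', '?')

  # Drop every byte from 0x80 to 0xFF (the /n flag makes this a binary regexp).
  bytes.gsub!(/[\x80-\xFF]/n, '')

  # Drop control bytes except tab (0x09), LF (0x0A) and CR (0x0D).
  bytes.gsub!(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, '')

  # Only 7-bit ASCII remains, which is always valid UTF-8.
  bytes.force_encoding(Encoding::UTF_8)
end

cleaned = strip_non_ascii("caf\xE9 \x93ok\x94\tend\x01\r\n")
p cleaned                  # => "caf ok\tend\r\n"
p cleaned.valid_encoding?  # => true
```

If the data does turn out to be Windows-1252, transcoding instead of stripping would keep the quotes and dashes, e.g. `raw.dup.force_encoding('Windows-1252').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')`. Also note that `"\FB01"` in the original is not a Unicode escape (Ruby 1.9 writes that as `"\uFB01"`); the unknown escape `\F` is read as a plain `F`, so that line removes the literal text `FB01`. In the byte-based sketch the UTF-8 bytes of U+FB01 are all above 0x7F and are dropped anyway.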