remove non binary characters from string


Junkone

Hi,
When I run this SQL, I get an error:
insert into stocksymbols
(symbolname,symbol2exchange,company_name,symbol2sector,symbol2industry,
stocksummary )
values('CHINA',16,'CDC Corp.',null ,null,'CDC Corporation (CDC)
is a global provider of enterprise software, mobile services and
applications, and Internet and media services, with facilities in the
People"s Republic of China, North America, Europe and Australia.
The Company operates in four segments: Software, Business Services,
Mobile Services and Applications, and Internet and Media. During the
year ended December 31, 2005, the Company reorganized these segments
into two core business units: CDC Software and China.com Inc. In
February 2006, CDC acquired the assets of JRG Software, Inc. and
Horizon Companies, Inc. In April 2006, the Company acquired c360
Solutions, Inc. In June 2005, CDC acquired Unitedcrest Investments
Limited (which holds Shenzhen KK Technology Ltd.) In April 2005, the
Company changed its name from chinadotcom corporation to CDC
Corporation. In April 2005, CDC"s 81%-owned subsidiary changed its
name from hongkong.com Corporation to China.com Inc. More from Reuters
»')
ERROR: invalid byte sequence for encoding "UNICODE": 0xbb
(SQLSTATE 22021, raised in wchar.c line 1310, report_invalid_encoding)


How do I clean up the data before inserting it into the database? I tried
str.dump, but it ends up escaping my strings. For example,
values('CHINA',16 became values("'CHINA'","16" ......

How can I clean it up without escaping the characters?
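A small illustration of why String#dump gives that result: it is meant to produce a Ruby string literal, not an SQL one, so it wraps values in double quotes and escapes for Ruby syntax. The sample values below are made up from the post and are only for demonstration.

```ruby
# String#dump returns a Ruby string *literal*: it adds surrounding double
# quotes and backslash-escapes characters for Ruby, not for SQL.
dumped = "CHINA".dump     # the double quotes become part of the value
quoted = "People's".dump  # the single quote that SQL cares about is untouched
arrow  = "\u00BB".dump    # a non-ASCII character turns into a \u escape

puts dumped   # prints "CHINA" (quotes included)
puts quoted   # prints "People's" (quotes included)
puts arrow
```

So dump both adds quoting SQL doesn't want and skips the quoting SQL does need; it is the wrong tool for building queries.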

Seede
 

David Vallner


Hmm, this doesn't quite have enough details to go by, but...

<inane>
You can't have nonbinary characters in a string - they're all binary ;;P
</inane>
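What the subject line is really after is removing byte sequences that are not valid in the database's encoding. A minimal sketch for Ruby 2.1 or later, assuming the text is meant to be UTF-8 but carries a stray Latin-1 byte; the sample string is a made-up stand-in for the end of the posted data. Note that this throws the offending character away entirely, so it only suits cases where losing it is acceptable.

```ruby
# Assumption: the data is supposed to be UTF-8 but ends in the single
# Latin-1 byte 0xBB, which is not a valid UTF-8 sequence on its own.
raw = String.new("More from Reuters \xBB", encoding: Encoding::UTF_8)

puts raw.valid_encoding?      # false: 0xBB cannot start a UTF-8 character

# String#scrub (Ruby 2.1+) replaces invalid byte sequences; an empty
# replacement string simply deletes them.
cleaned = raw.scrub("")

puts cleaned.valid_encoding?  # true
p cleaned                     # "More from Reuters "
```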

Are you building up the query string yourself, or using one of the methods
your database driver (most certainly) provides?

The usual pattern is to use something like db_connection.execute('insert
into womble(fluff) values(?)', 'foo'), and any self-respecting DB library
should be able to escape string data and convert its encoding by itself.
Avoid hand-rolling your own query-string escaping routines; that's a
newbie approach and tends to break when you least expect it.

I think your problem is that you're inserting the '»' character at the end
in Latin-1 (where it's the single byte 0xBB), which is not a valid byte
sequence on its own in UTF-8, the encoding the database expects (the error
names "UNICODE", which is PostgreSQL's old alias for UTF-8). If the database
driver doesn't convert encodings for you, you'll have to do it by hand
(which can get fiddly if you ever deploy the application on several
computers with different default encodings). Look at the documentation for
the 'iconv' library, which is part of the standard distribution unless
memory fails me.
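Converting keeps the character instead of deleting it. In current Ruby, Iconv is no longer in the standard library (it was dropped in Ruby 2.0), and String#encode does the same job. A minimal sketch, assuming the incoming text really is Latin-1 (ISO-8859-1), as the 0xBB byte suggests; the sample string is a made-up stand-in for the posted data.

```ruby
# Assumption: the source text is Latin-1, where 0xBB is the character U+00BB.
latin1 = String.new("More from Reuters \xBB", encoding: Encoding::ISO_8859_1)

puts latin1.valid_encoding?          # true: every byte is a valid Latin-1 character

# Transcode to UTF-8 rather than stripping the byte.
utf8 = latin1.encode(Encoding::UTF_8)

puts utf8.valid_encoding?            # true
p utf8.bytes.last(2)                 # [194, 187], i.e. 0xC2 0xBB, the UTF-8 form of U+00BB
puts utf8.end_with?("\u00BB")        # true
```

The resulting UTF-8 string can then be handed to the driver as a bound parameter rather than pasted into the SQL text.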

David Vallner
 
