question about csv.DictReader


Norman Clerman

Hello,

I have the following Python script (some of the lines are wrapped):

#! /usr/bin/env python

import csv

def dict_test_1():
    """ csv test program """

    # Open the file Holdings_EXA.csv
    HOLDING_FILE = 'Holdings_EXA.csv'
    try:
        csv_file = open(HOLDING_FILE, 'rt')
    except IOError:
        print('Problem opening {0}\nExiting'.format(HOLDING_FILE))
        exit()

    # create a dictionary reader
    try:
        csv_reader = csv.DictReader(csv_file)
    except NameError:
        print('Cannot find file {0} to create a dictionary reader \nExiting'.format(HOLDING_FILE))
        exit()

    # Print the keys in each row
    i_row = 1
    for row in csv_reader:
        print('There are {0} keys in row {1}'.format(len(row.keys()), i_row))
        print('The keys in row {0} are \n{1}'.format(i_row, row.keys()))
        i_row += 1

dict_test_1()

Here are the lines in the file Holdings_EXA.csv.
Please note that the first field in the first row is "Holdings".

"Holdings","Weighting","Type","Ticker","Style","First Bought","Shares Owned","Shares Change","Sector","Price","Day Change","Day high/low","Volume","52-Wk high/low","Country","3-Month Return","1-Year Return","3-Year Return","5-Year Return","Market Cap Mil","Currency","Morningstar Rating","YTD Return","P/E","Maturity Date","Coupon %","Yield to Maturity"
"Nestle SA","1.91","EQUITY","NESN","Large Core","1999-12-31","3732276","197810","Consumer Defensive","67.65","-","67.75-67.35","1211531","67.75-53.8","Switzerland","10.42","21.25","10.5","8.84","213475.59","CHF","2","12.92","21.69","-","-","-"
"HSBC Holdings PLC","1.75","EQUITY","HSBA","Large Value","1999-12-31","21120203","1711934","Financial Services","733.3","-1.4|-0","738.8-731","7839724","739.9-501.2","United Kingdom","14.51","37.17","3.88","2.77","132694.66","GBP","3","13.93","15.55","-","-","-"
"Novartis AG","1.33","EQUITY","NOVN","Large Core","2003-06-30","2669523","206851","Healthcare","65.95","0.5|0.01","66-65.4","1121549","66-48.29","Switzerland","15.1","36.5","6.16","8.53","158671.66","CHF","4","16.7","17.76","-","-","-"
"Roche Holding AG","1.31","EQUITY","ROG","Large Growth","2003-05-31","817830","59352","Healthcare","214.8","1.4|0.01","215.2-213.1","684173","220.4-148.4","Switzerland","17.45","37.95","7.78","4.09","34000","CHF","3","18.09","19.05","-","-","-"

Finally, here are the results of running the script:


norm@lima:~/python/overlap$ python dict_test_1.py
There are 27 keys in row 1
The keys in row 1 are
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
There are 27 keys in row 2
The keys in row 2 are
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
There are 27 keys in row 3
The keys in row 3 are
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
There are 27 keys in row 4
The keys in row 4 are
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
norm@lima:~/python/overlap$


Can anyone explain the presence of the characters "\xref\xbb\xbf" before the first field contents "Holdings" ?

Thanks,
Norm
 

MRAB

Some Microsoft Windows programs indicate that a text file contains
text encoded as UTF-8 by writing a signature, the three bytes EF BB BF,
at its start. (Does the file also have "\r\n" line endings? Presumably
it was created on a Windows system.)

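The effect can be reproduced in memory. This is a minimal sketch assuming Python 3 (the original script ran under Python 2, which is why the key appears above as raw bytes); the sample bytes and the helper name first_keys are hypothetical, modeled on the first two columns of Holdings_EXA.csv. It also shows why the key keeps its double quotes: the signature sits before the opening quote, so the csv module sees an unquoted field and keeps the quote characters literally.

```python
import csv
import io

# Hypothetical sample: the first two columns of Holdings_EXA.csv, preceded by
# the UTF-8 signature EF BB BF that some Windows programs write.
RAW = b'\xef\xbb\xbf"Holdings","Weighting"\r\n"Nestle SA","1.91"\r\n'

def first_keys(raw):
    """Decode with plain 'utf-8' and return the DictReader field names."""
    text = raw.decode('utf-8')  # the signature survives as the character U+FEFF
    # U+FEFF now precedes the opening quote, so the csv module reads the
    # first field as unquoted and keeps its double quotes literally.
    reader = csv.DictReader(io.StringIO(text, newline=''))
    return reader.fieldnames

print(first_keys(RAW))  # prints ['\ufeff"Holdings"', 'Weighting']
```

Changing the decode call to use 'utf-8-sig' instead makes the same function return ['Holdings', 'Weighting'].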
Try opening the file with the "utf-8-sig" encoding instead; this will
drop the signature if present.
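A minimal sketch of that suggestion, assuming Python 3 (in Python 2, the built-in open() has no encoding argument and the csv module works on byte strings). read_rows is a hypothetical helper, and the demo writes a small temporary file instead of using the real Holdings_EXA.csv. Because 'utf-8-sig' only skips the signature when it is present, the same code should also read files saved without one.

```python
import csv
import os
import tempfile

def read_rows(path):
    """Return the rows of a CSV file as dicts, dropping a UTF-8 signature.

    'utf-8-sig' skips a leading EF BB BF if present and otherwise decodes
    like plain 'utf-8'; newline='' is what the csv docs ask for in Python 3.
    """
    with open(path, newline='', encoding='utf-8-sig') as csv_file:
        return list(csv.DictReader(csv_file))

# Demo: write a small hypothetical file that starts with the signature.
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xef\xbb\xbf"Holdings","Weighting"\r\n"Nestle SA","1.91"\r\n')
try:
    rows = read_rows(path)
finally:
    os.remove(path)

# Same report as the original script; the first key is now plain 'Holdings'.
for i_row, row in enumerate(rows, start=1):
    print('There are {0} keys in row {1}'.format(len(row), i_row))
    print('The keys in row {0} are\n{1}'.format(i_row, list(row)))
```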
 

Tim Chase

Can anyone explain the presence of the characters "\xref\xbb\xbf"
before the first field contents "Holdings" ?

(you mean "\xef", not "\xref")

This is a byte order mark (BOM), which you can read about at [1]. In
this case, it marks the file as UTF-8 encoded. Certain programs
insert these, though a BOM matters more with the UTF-16 or UTF-32
encodings, where the byte order (endianness) actually affects how the
bytes are decoded. I believe Notepad and Visual Studio on Win32 were
both offenders when it came to inserting unbidden BOMs.
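Python's codecs module exposes the standard BOM byte sequences as constants, which confirms that the "\xef\xbb\xbf" in the output is exactly the UTF-8 BOM. The helper sniff_bom below is hypothetical, a sketch rather than a full encoding detector: it checks the UTF-32 marks first because the UTF-32-LE mark (FF FE 00 00) begins with the UTF-16-LE mark (FF FE), and it assumes FF FE 00 00 means UTF-32-LE even though UTF-16-LE text starting with a NUL character would look the same.

```python
import codecs

# Known BOMs, longest first so UTF-32-LE is not mistaken for UTF-16-LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]

def sniff_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(codecs.BOM_UTF8)                        # prints b'\xef\xbb\xbf'
print(sniff_bom(b'\xef\xbb\xbf"Holdings"'))   # prints ('utf-8', 3)
```

The returned encoding names do not expect a BOM, so the caller would decode data[bom_length:] with them.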

-tkc

[1]
http://en.wikipedia.org/wiki/Byte_order_mark
 
