Python : CSV to Dictionary
This post is about writing a CSV reader that generates dictionaries from a CSV file. The reader accepts the path to a CSV file and the Python types corresponding to the CSV columns as input parameters. The output is a list of dictionaries, one per row, generated from the specified file. By the end of this post you will have a good understanding of lists, dictionaries and zip() in Python.
import csv
def read_csv(file, types):
    with open(file, 'r', encoding='utf-8-sig') as f:  # utf-8-sig skips a leading BOM
        rows = csv.reader(f)
        head = next(rows)  # the first line is the header
        records = []
        for row_num, row in enumerate(rows):
            try:
                record = dict(zip(head,
                    [func(value) for func, value in zip(types, row)]))
                records.append(record)
            except ValueError as ve:
                print('Ignored row {} - {} in {} due to {}'
                      .format(row_num, row, file, ve))
    return records
Calling the above function:
import pprint
records = read_csv('stocks/stocks.csv', [str, str, int, float])
pprint.pprint(records)
Now let us dissect the above code line by line. The function first opens the file and passes the stream to csv.reader(). We are using encoding='utf-8-sig' to ignore the BOM. The first line is the header, and it is grabbed using next(). To number the remaining rows we are using enumerate(), which counts the data rows starting from 0. We use these row numbers to show a meaningful error when we fail to convert a value. Before dissecting the next line, let us understand how zip() works.
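The header and row-numbering steps just described can be tried on their own first. This is a minimal sketch that assumes an in-memory string (via io.StringIO) in place of a real file; the sample data is made up for illustration.

```python
import csv
import io

# In-memory stand-in for a CSV file (illustrative data, not the stocks file)
sample = io.StringIO("Name,Shares\nHPQ,100\nIBM,50\n")

rows = csv.reader(sample)
head = next(rows)                 # consumes the header line
numbered = list(enumerate(rows))  # remaining rows, counted from 0

print(head)      # ['Name', 'Shares']
print(numbered)  # [(0, ['HPQ', '100']), (1, ['IBM', '50'])]
```

If the numbers should match line numbers in the file instead, enumerate(rows, start=2) is one option, since the header occupies line 1 (assuming no field spans several lines). With that covered, here is zip() in action.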
names = ['John', 'Bond', 'Gavin']
ages = [33, 32, 23]
name_age = zip(names, ages)
for name, age in name_age:
    print(name, age)
#output
John 33
Bond 32
Gavin 23
zip() combines two lists by pairing up their corresponding elements as tuples. In Python 3 it returns an iterator rather than a list, but passing it to list() shows that the pairs look like this.
[('John', 33), ('Bond', 32), ('Gavin', 23)]
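Two details of zip() are worth knowing before it is used in the reader: in Python 3 it returns a one-shot iterator, and it stops as soon as the shorter input runs out. The sketch below shows both, using a deliberately shortened ages list as an assumption for illustration.

```python
names = ['John', 'Bond', 'Gavin']
ages = [33, 32]  # deliberately one element short

pairs = zip(names, ages)
first_pass = list(pairs)   # 'Gavin' has no partner and is dropped
second_pass = list(pairs)  # the iterator is already exhausted

print(first_pass)   # [('John', 33), ('Bond', 32)]
print(second_pass)  # []
```

For read_csv this means a row with fewer fields than there are types does not raise an error; the resulting dictionary simply has fewer keys.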
Now back to line 10 of our code. Let us consider a snapshot of our input CSV file, shown below.
Name,Date,Shares,Price
HPQ,7/11/2007,100,32.2
IBM,7/12/2007,50,91.9
GE,7/13/2007,150,83.44
CAT,7/14/2007,200,51.23
MSFT,7/15/2007,95,40.37
HPE,7/16/2007,50,65.1
AFL,7/17/2007,100,70.44
Line 10 contains this list comprehension.
[func(value) for func, value in zip(types, row)]
For the first data row, zip(types, row) yields the following pairs. Note that csv.reader() returns every field as a string.
[(str, 'HPQ'), (str, '7/11/2007'), (int, '100'), (float, '32.2')]
It's important to remember that the str, int and float we are passing are the actual types and not strings.
Now [func(value) for func, value in zip(types, row)] calls each type on its paired value and generates the following output for that row.
['HPQ', '7/11/2007', 100, 32.2]
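The conversion step can be tried in isolation. The following sketch applies the same comprehension to one well-formed row and to one hypothetical malformed row (the 'N/A' value is an assumption for illustration), which shows why read_csv wraps this step in try/except.

```python
types = [str, str, int, float]

good_row = ['HPQ', '7/11/2007', '100', '32.2']
converted = [func(value) for func, value in zip(types, good_row)]
print(converted)  # ['HPQ', '7/11/2007', 100, 32.2]

bad_row = ['IBM', '7/12/2007', 'N/A', '91.9']  # hypothetical malformed row
try:
    [func(value) for func, value in zip(types, bad_row)]
    error = None
except ValueError as ve:
    error = str(ve)  # int('N/A') fails, so the whole row is rejected

print(error)
```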
So we have a single list after proper type conversion. It's important that we handle exceptions here, because a value that cannot be converted raises a ValueError. Now we apply zip() to this list together with the headers we grabbed earlier, as shown below.
record = dict(zip(head,[func(value) for func, value in zip(types, row)]))
Before dict() is applied, zip(head, ...) generates pairs like these for each row.
[('Name','HPQ'), ('Date','7/11/2007'), ('Shares',100), ('Price',32.2)]
Now we convert these pairs to a dictionary using dict(), which generates the following output for a single row.
{'Name': 'HPQ', 'Date': '7/11/2007', 'Shares': 100, 'Price': 32.2}
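As a small standalone sketch of this last step, the header and the converted values for one row can be paired and turned into a dictionary directly. The variable name values is an assumption used here for the already-converted list.

```python
head = ['Name', 'Date', 'Shares', 'Price']
values = ['HPQ', '7/11/2007', 100, 32.2]  # one row after type conversion

record = dict(zip(head, values))
print(record['Shares'])  # 100

# An equivalent dictionary comprehension builds the same mapping
same_record = {key: value for key, value in zip(head, values)}
print(record == same_record)  # True
```

Because dictionary keys are unique, a header with a repeated column name would keep only the value from the last such column.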
In line 11 we append the dictionary for each row to a list called records. The final output for the above CSV file looks like this.
[{'Date': '7/11/2007', 'Name': 'HPQ', 'Price': 32.2, 'Shares': 100},
{'Date': '7/12/2007', 'Name': 'IBM', 'Price': 91.9, 'Shares': 50},
{'Date': '7/13/2007', 'Name': 'GE', 'Price': 83.44, 'Shares': 150},
{'Date': '7/14/2007', 'Name': 'CAT', 'Price': 51.23, 'Shares': 200},
{'Date': '7/15/2007', 'Name': 'MSFT', 'Price': 40.37, 'Shares': 95},
{'Date': '7/16/2007', 'Name': 'HPE', 'Price': 65.1, 'Shares': 50},
{'Date': '7/17/2007', 'Name': 'AFL', 'Price': 70.44, 'Shares': 100}]
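To experiment without a file on disk, the same logic can be run against an in-memory stream. This is only a sketch: read_csv_stream is a hypothetical variant of read_csv that takes an already-open text stream and, instead of printing, also returns the numbers of the rows it skipped. The row containing 'fifty' is made up to trigger a conversion failure.

```python
import csv
import io


def read_csv_stream(stream, types):
    """Hypothetical variant of read_csv that reads from an open text stream."""
    rows = csv.reader(stream)
    head = next(rows)
    records = []
    skipped = []
    for row_num, row in enumerate(rows):
        try:
            converted = [func(value) for func, value in zip(types, row)]
        except ValueError:
            skipped.append(row_num)  # remember which rows failed conversion
            continue
        records.append(dict(zip(head, converted)))
    return records, skipped


data = io.StringIO(
    "Name,Date,Shares,Price\n"
    "HPQ,7/11/2007,100,32.2\n"
    "IBM,7/12/2007,fifty,91.9\n"  # 'fifty' cannot be converted by int()
    "GE,7/13/2007,150,83.44\n"
)
records, skipped = read_csv_stream(data, [str, str, int, float])
print(records)
print(skipped)  # [1]
```

Returning the skipped row numbers rather than printing them is a design choice made for this sketch so that the caller can decide how to report bad rows.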
So we have successfully converted a CSV file to a list of Python dictionaries.
Coding is fun, enjoy…