I needed a batch converter to turn non-xml style Excel spreadsheets (all the older versions) into delimited text files. If you run it without supplying a sheet name, it will list all the sheets available. You can also optionally define a delimiter to use instead of the default comma. So here’s my code along with part of the xlrd module which I tweaked a little.
#!/usr/bin/env python
character_encoding = 'latin_1'
import sys, re, xlrd
def get_row(bk, sh, rowx):
result = []
dmode = bk.datemode
ctys = sh.row_types(rowx)
cvals = sh.row_values(rowx)
for colx in range(sh.ncols):
cty = ctys[colx]
cval = cvals[colx]
cxfx = ''
if cty == xlrd.XL_CELL_DATE:
try:
showval = xlrd.xldate_as_tuple(cval, dmode)
except xlrd.XLDateError:
e1, e2 = sys.exc_info()[:2]
showval = "%s:%s" % (e1.__name__, e2)
cty = xlrd.XL_CELL_ERROR
elif cty == xlrd.XL_CELL_ERROR:
showval = xlrd.error_text_from_code.get(cval, '' % cval)
else:
showval = cval
result.append(showval)
return result
# print out sheet names
if len(sys.argv) > 2:
if sys.argv[1] == "--help" or sys.argv[1] == "-h":
print 'Usage: %s ' % sys.argv[0]
sys.exit(1)
elif len(sys.argv) > 3:
delimeter = sys.argv[3]
else:
delimeter = ','
# parse spreadsheet
book = xlrd.open_workbook(sys.argv[1])
for num in range(book.nsheets):
sheet = book.sheet_by_index(num)
if isinstance(sheet.name, unicode):
sheetname = sheet.name.encode(character_encoding,'replace')
findname = sys.argv[2].encode(character_encoding,'replace')
else:
sheetname = str(sheet.name)
findname = str(sys.argv[2])
if re.match(findname, sheetname):
for x in range(sheet.nrows):
row = get_row(book, sheet, x)
print delimeter.join(map(str, row))
elif len(sys.argv) == 2:
if sys.argv[1] == "--help" or sys.argv[1] == "-h":
print 'Usage: %s ' % sys.argv[0]
else:
book = book = xlrd.open_workbook(sys.argv[1])
for num in range(book.nsheets):
sheet = book.sheet_by_index(num)
print sheet.name
elif len(sys.argv) < 2:
print 'Usage: %s ' % sys.argv[0]
sys.exit(1)
I find it easiest to search a spreadsheet for a specific character to use as a delimiter, and if it doesn’t appear anywhere in the document then it’s good to go without wrapping the data into quotation marks or anything else. This program relies on just one module — xlrd. I did build it for a special purpose where all the spreadsheets that pass through it will all have similar data, so I know a comma will never show, but it will work on just about any xls you throw at it.