Convert Excel spreadsheets to CSV in Linux without OpenOffice

Posted in python

I needed a batch converter to turn non-xml style Excel spreadsheets (all the older versions) into delimited text files. If you run it without supplying a sheet name, it will list all the sheets available. You can also optionally define a delimiter to use instead of the default comma. So here’s my code along with part of the xlrd module which I tweaked a little.

#!/usr/bin/env python

character_encoding = 'latin_1'
import sys, re, xlrd

def get_row(bk, sh, rowx):
    result = []
    dmode = bk.datemode
    ctys = sh.row_types(rowx)
    cvals = sh.row_values(rowx)
    for colx in range(sh.ncols):
        cty = ctys[colx]
        cval = cvals[colx]
        cxfx = ''
        if cty == xlrd.XL_CELL_DATE:
            try:
                showval = xlrd.xldate_as_tuple(cval, dmode)
            except xlrd.XLDateError:
                e1, e2 = sys.exc_info()[:2]
                showval = "%s:%s" % (e1.__name__, e2)
                cty = xlrd.XL_CELL_ERROR
        elif cty == xlrd.XL_CELL_ERROR:
            showval = xlrd.error_text_from_code.get(cval, '' % cval)
        else:
            showval = cval
        result.append(showval)
    return result

# print out sheet names
if len(sys.argv) > 2:
    if sys.argv[1] == "--help" or sys.argv[1] == "-h":
        print 'Usage: %s   ' % sys.argv[0]
        sys.exit(1)
    elif len(sys.argv) > 3:
        delimeter = sys.argv[3]
    else:
        delimeter = ','

    # parse spreadsheet
    book = xlrd.open_workbook(sys.argv[1])

    for num in range(book.nsheets):
        sheet = book.sheet_by_index(num)

        if isinstance(sheet.name, unicode):
            sheetname = sheet.name.encode(character_encoding,'replace')
            findname = sys.argv[2].encode(character_encoding,'replace')
        else:
            sheetname = str(sheet.name)
            findname = str(sys.argv[2])

        if re.match(findname, sheetname):
            for x in range(sheet.nrows):
                row = get_row(book, sheet, x)
                print delimeter.join(map(str, row))

elif len(sys.argv) == 2:
    if sys.argv[1] == "--help" or sys.argv[1] == "-h":
        print 'Usage: %s   ' % sys.argv[0]
    else:
        book = book = xlrd.open_workbook(sys.argv[1])
        for num in range(book.nsheets):
            sheet = book.sheet_by_index(num)
            print sheet.name

elif len(sys.argv) < 2:
        print 'Usage: %s   ' % sys.argv[0]
        sys.exit(1)

I find it easiest to search a spreadsheet for a specific character to use as a delimiter, and if it doesn’t appear anywhere in the document then it’s good to go without wrapping the data into quotation marks or anything else. This program relies on just one module — xlrd. I did build it for a special purpose where all the spreadsheets that pass through it will all have similar data, so I know a comma will never show, but it will work on just about any xls you throw at it.

Posted by admica   @   25 September 2009

Related Posts

0 Comments

No comments yet. Be the first to leave a comment !
Leave a Comment

Name

Email

Website

Previous Post
« System wide proxy settings in Windows
Next Post
Fuel Injector Spray Timing for Reduced Hydrocarbon Emissions and Higher BSFC »
Powered by Wordpress   |   Lunated designed by ZenVerse