Reading Data From OpenXML Spreadsheets

During one of my assignments I had the task to read information out of a spreadsheet created with Excel 2007 (.xlsx). I choose to use the Open XML Format SDK 2.0 (april 2009 ctp) to read data from the spreadsheet.

As I Googled for the standard way of retrieving data of different type from individual cells I could not find how to interpret data from a cell with the format of "tt:mm:ss" but could not find any information on how to do it.

So after a little debugging looking into what the different cells looked like when different types of data was put into the cells I creates a small piece of code describing what to do.

Even though the example code doesn't look like it, every cell can have a style index, which also describes things like alignment and borders. But describing that will be a post of it's own if I even needs to figure out how to do that.

using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(Path, false))
    WorkbookPart wb = spreadsheet.WorkbookPart;
    List<cellformats> formats = new List<cellformats>(wb.WorkbookStylesPart.Stylesheet.Descendants<cellformats>());
    foreach (var sheet in wb.Workbook.Descendants<sheet>())
        WorksheetPart wp = (WorksheetPart)spreadsheet.WorkbookPart.GetPartById(sheet.Id);
        SharedStringTablePart sstp = wb.GetPartsOfType<sharedstringtablepart>().First();
        SharedStringItems = sstp.SharedStringTable.Elements<sharedstringitem>().ToArray();
        foreach (var row in wp.Worksheet.Descendants<row>())
            List<cell> cells = new List<cell>(row.Descendants<cell>());
            for (int i = 0; i < cells.Count; i++)
                if (cells[i].CellValue == null)
                    // What ever to to if data is null
                    if (cells[i].DataType != null &&
                    cells[i].DataType.HasValue &&
                    cells[i].DataType.Value == CellValues.SharedString)
                        // look up the actual string value in the shared string table
                        SharedStringItem ssi = SharedStringItems[int.Parse(cells[i].CellValue.Text)];
                        // Do what you please with the string
                    else if (cells[i].StyleIndex != null &&
                        // look up the style used for the cell
                        int formatStyleIndex = Convert.ToInt32(cells[i].StyleIndex.Value);
                        CellFormat cf = formats[0].Descendants<cellformat>().ToList()[formatStyleIndex];

                        // now you can check the NumberFormatId
                        if (cf.NumberFormatId == 21) // 21 is the “tt:mm:ss” format
                            // and the culture seem to be english-US 
                            System.Globalization.CultureInfo ci = new System.Globalization.CultureInfo("EN-US");
                            // one second is
                            double second = Convert.ToDouble("1.1574074074074073E-5", ci.NumberFormat);
                            // get what’s in the cell
                            CellValue customValue = cells[i].CellValue;
                            // convert that value to a double
                            double cellSeconds = Convert.ToDouble(customValue.InnerText, ci.NumberFormat);
                            // and calculate the total amount of seconds
                            // this is an integer to compensate for the small deviation that can occur with doubles
                            int totalSeconds = (int)(cellSeconds / second);
                            // and from that you can make a TimeSpan representing “tt:mm:ss:”
                            TimeSpan ts = new TimeSpan(0, 0, totalSeconds);
                        else if (cf.NumberFormatId == 0) // 0 seems to be just a number
                            // Do stuff