Some initial figures from using XMLReader with file streaming, based on its implementation in the listWorksheetNames() and listWorksheetInfo() methods.

Version 1.7.8:

Format | listWorksheetNames() Time (s) | listWorksheetNames() Memory after call (MB) | listWorksheetNames() Peak memory usage (MB) | listWorksheetInfo() Time (s) | listWorksheetInfo() Memory after call (MB) | listWorksheetInfo() Peak memory usage (MB) |
---|---|---|---|---|---|---|
Excel 2007 .xlsx | 0.0114 | 1.25 | 1.25 | 4.1998 | 2.50 | 8.75 |
Open/Libre Office .ods | 0.1404 | 0.50 | 5.75 | 0.5296 | 0.50 | 5.75 |
Gnumeric | 3.4176 | 1.00 | 38.75 | 5.2841 | 1.00 | 38.75 |

Current Development Code:

Format | listWorksheetNames() Time (s) | listWorksheetNames() Memory after call (MB) | listWorksheetNames() Peak memory usage (MB) | listWorksheetInfo() Time (s) | listWorksheetInfo() Memory after call (MB) | listWorksheetInfo() Peak memory usage (MB) |
---|---|---|---|---|---|---|
Excel 2007 .xlsx | 0.0113 | 1.25 | 1.25 | 3.2238 | 1.25 | 1.25 |
Open/Libre Office .ods | 0.1194 | 0.50 | 0.50 | 0.2543 | 0.50 | 0.50 |
Gnumeric | 0.0077 | 0.75 | 0.75 | 3.1676 | 0.75 | 0.75 |

Testing was done against a relatively small spreadsheet, comprising 2 worksheets, each with 16370 rows by 9 columns.
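
For anyone wanting to take comparable measurements on their own files, here is a minimal sketch of one way to capture time, memory after the call, and peak memory. The `profileCall()` helper and its array keys are hypothetical names for illustration, not part of PHPExcel, and passing `true` to the memory functions (to report memory allocated from the system) is an assumption about how such figures might be gathered, not a description of how the numbers above were produced.

```php
<?php
// Hypothetical helper (not part of PHPExcel): run a callable once and
// report elapsed time plus memory figures in megabytes.
function profileCall(callable $fn)
{
    $start = microtime(true);
    $fn();
    $elapsed = microtime(true) - $start;

    return [
        'time_s'          => round($elapsed, 4),
        // true = memory allocated from the system, not just what PHP is using
        'memory_after_mb' => memory_get_usage(true) / 1048576,
        'peak_memory_mb'  => memory_get_peak_usage(true) / 1048576,
    ];
}

// Stand-in workload; in practice the closure would wrap the reader call
// being measured.
$result = profileCall(function () {
    $buffer = str_repeat('x', 1000000);
});
print_r($result);
```

Note that peak memory is tracked per PHP process, so each method and file format is best measured in a separate run; otherwise an earlier, heavier call would mask the peak of a later one.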
While the main load() code won't be as performant as these methods, I hope that the peak memory savings from not loading the file itself into memory will be every bit as good.
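
To illustrate the kind of streaming read this refers to, here is a minimal sketch using PHP's XMLReader to pull worksheet names out of workbook XML one node at a time, instead of building the whole document in memory. The `listSheetNamesFromXml()` function is a hypothetical helper written for this example, not the PHPExcel implementation, and the inline XML is a cut-down stand-in for a real `xl/workbook.xml`.

```php
<?php
// Hypothetical helper (not PHPExcel's code): walk the XML node by node and
// collect the "name" attribute of every sheet element encountered.
function listSheetNamesFromXml(XMLReader $reader)
{
    $names = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'sheet') {
            $names[] = $reader->getAttribute('name');
        }
    }
    $reader->close();

    return $names;
}

// Cut-down sample standing in for the xl/workbook.xml part of an .xlsx file.
$xml = <<<XML
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheets>
    <sheet name="Sheet1" sheetId="1"/>
    <sheet name="Sheet2" sheetId="2"/>
  </sheets>
</workbook>
XML;

$reader = new XMLReader();
$reader->XML($xml);
print_r(listSheetNamesFromXml($reader));
```

For a real .xlsx file, the reader could instead be pointed at the archive member through PHP's zip stream wrapper, e.g. `$reader->open('zip://' . $path . '#xl/workbook.xml')`, assuming the zip extension is available and `$path` holds the file location. That way neither the archive nor the XML part has to be read into a string first.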
I'm still hoping that I can get at least the Excel2007 Reader converted to working with this method in time for the 1.7.9 release.