- Excel_Performant Reader via Interfaces for Memory Efficiency.
- Without using any external libraries.
- Optimised for Range extraction.

- Yet another Excel reader?
- Starting with .NET 8 as the performant runtime (see Benchmarks)
- .NET 9 gives an extra 5% boost
- .NET 10 gives another 5% over .NET 9 ;-)
Let's take each of the above elements and explain:
- Open Large 2007 (Onwards) XLSX file formats and XLSB (BIFF12) in V3.##
- Zip Deflate format Only
- Try to be as fast as possible, i.e.
- Forward only Lazy loading
- Only "Quick" decipher / convert of the cell(s) types to ease GC pressure
- No attempt at "creating / using" datatables with headers etc.
- Use `IEnumerable`s with initial offset starts (Row / Column)
- Allow `CancellationToken`s to be used to allow page transitioning cancellation (More on this later)
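The forward-only, lazy approach with start offsets and cancellation can be sketched with nothing but the BCL. This is an illustration only, not the library's implementation: `LazyRowSketch` and `ReadRows` are hypothetical names, the XML is a simplified SpreadsheetML-like shape, and columns are counted by position rather than by the cell reference attribute.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Xml;

// Hypothetical sketch: not part of the library's public API.
public static class LazyRowSketch
{
    // Streams rows forward-only. Rows before startRow and cells before
    // startColumn (both 0-based, positional) are read past but never stored.
    public static IEnumerable<IReadOnlyList<string>> ReadRows(
        TextReader xml, int startRow = 0, int startColumn = 0,
        CancellationToken cancellationToken = default)
    {
        using var reader = XmlReader.Create(xml);
        List<string> cells = null; // null while the current row is being skipped
        int rowIndex = -1, columnIndex = -1;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "row")
            {
                // Check once per row so a page transition can abort a long read.
                cancellationToken.ThrowIfCancellationRequested();
                rowIndex++;
                columnIndex = -1;
                cells = rowIndex >= startRow ? new List<string>() : null;
                if (reader.IsEmptyElement && cells != null) yield return cells; // <row/>
            }
            else if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "c")
            {
                columnIndex++;
                // Placeholder keeps empty cells (<c/>) aligned with their column.
                if (cells != null && columnIndex >= startColumn) cells.Add(string.Empty);
            }
            else if (reader.NodeType == XmlNodeType.Text
                     && cells != null && columnIndex >= startColumn && cells.Count > 0)
            {
                cells[cells.Count - 1] = reader.Value;
            }
            else if (reader.NodeType == XmlNodeType.EndElement
                     && reader.LocalName == "row" && cells != null)
            {
                yield return cells;
            }
        }
    }
}
```

Because the method is an iterator, nothing is parsed until the caller enumerates, and abandoning the enumeration early disposes the reader.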
- Now the fastest in Real world usage 2025-11-19 onwards
- Q: There are others that are faster
- A: Agreed, but then
- They do not have range extraction.
- Or optionally allow the use of the OS's TempFile System to store massive sheets
- Or re-use of already extracted (massive) sheets
- Or allow multiple sheets to be read at the same time
- because others use global memory to represent the current row
- Or have a single access into the Zip Excel file
- Read only
- Therefore no calculations or updates to formula calls
- Will use the DotNet core functionality by default
- But, if your target deployment allows for the use of native performant binaries, then via the use of interfaces these will be pluggable
- i.e. Using `Zlib.Net` for getting the data streams out of the compressed Excel file faster. (Or `SharpZipLib` / `PowerPlayZipper`)
- A faster / slimmer implementation for xml stream reading (i.e. TurboXml)
- Allow the implementation of different source files (i.e. XLSB)
- Q: Why?
- A: As mentioned above, this is to allow a developer to replace with external nugets that might perform better XML speed etc.
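As a rough illustration of the pluggable idea, the sketch below hides zip access behind a small interface with a BCL-backed default. The names `IZipEntrySource`, `BclZipEntrySource` and `WorkbookSketch` are hypothetical and are not the library's actual interfaces; a `Zlib.Net` or `SharpZipLib` backed class would implement the same contract.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical abstraction: the library's real interface names may differ.
public interface IZipEntrySource : IDisposable
{
    Stream OpenEntry(string entryName);
}

// Default implementation that relies only on the BCL's ZipArchive.
public sealed class BclZipEntrySource : IZipEntrySource
{
    private readonly ZipArchive _archive;

    public BclZipEntrySource(Stream zipStream) =>
        _archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: false);

    public Stream OpenEntry(string entryName)
    {
        var entry = _archive.GetEntry(entryName)
            ?? throw new FileNotFoundException($"Entry '{entryName}' not found.");
        return entry.Open();
    }

    public void Dispose() => _archive.Dispose();
}

public static class WorkbookSketch
{
    // Depends only on the interface, so a faster zip backend can be swapped in
    // without touching the reading code.
    public static string ReadEntryText(IZipEntrySource source, string entryName)
    {
        using var stream = source.OpenEntry(entryName);
        using var reader = new StreamReader(stream);
        return reader.ReadToEnd();
    }
}
```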
- The reason for this project is to handle very large XLSX files (i.e. > 500K rows with > 180 columns per sheet, with multiple sheets of this size)
- For `ETL` validation scenarios, i.e. make sure that the user-modified data that has been transferred has interaction rules applied, before moving onto the `T` and `L` stages
- Try not to hit / store in the LOH
- No internal .NET memory of previously loaded sheets / rows.
- Q: It appears that this uses more memory than other implementations
- A: Currently yes, but it is being optimised for Range Extraction,
- AND for allowing multiple rows (with cell data) to be stored in memory at the same time (i.e. via a `ToList()` call);
- AND to allow multiple sheets to be read at the same time (unlike some of the others that use a single global memory to represent a row)
- And it appears that the current benchmarks do not extract unless a `ToString` and a check on the result is used (otherwise the JIT removes the unassigned dead code)
- And the memory used will actually be used in the ETL pipeline anyway, so it's just being truthful
- As hinted by the above statements, this is targeted at memory-restricted environments (i.e. ASP.NET VMs)
- Use the OS's "Temp File" caching, so if memory is tight the owner app will not have to worry about OOM exceptions, or fall back to swap-disk speeds.
- Only unzip the sheet(s) when they are asked for
- Only load the shared strings up to the current request number
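One way to lean on the OS temp file system, sketched here with BCL calls only, is to inflate a sheet entry into a delete-on-close temp file the first time it is requested. `TempFileSketch` and `ExtractToTempFile` are hypothetical names, and the buffer size and file options are assumptions rather than the library's settings.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical sketch: not part of the library's public API.
public static class TempFileSketch
{
    // Inflates one zip entry into a temp file and returns it rewound to the start.
    // FileOptions.DeleteOnClose removes the file when the stream is disposed.
    public static FileStream ExtractToTempFile(ZipArchiveEntry entry)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        var temp = new FileStream(
            path, FileMode.CreateNew, FileAccess.ReadWrite, FileShare.None,
            bufferSize: 81920,
            FileOptions.DeleteOnClose | FileOptions.SequentialScan);
        try
        {
            using (var source = entry.Open())
            {
                source.CopyTo(temp); // the sheet bytes live on disk, not in managed memory
            }
            temp.Position = 0;
            return temp;
        }
        catch
        {
            temp.Dispose(); // also deletes the partially written file
            throw;
        }
    }
}
```

The caller owns the returned stream; disposing it is what tidies up the temp file.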
- Q: Sometimes the `Async` `await`s add too much overhead
- A: True, that is why there are also the equivalent base interfaces that perform the same functionality without the need for the `async await` overheads.
- This is to allow the large files to be aborted
- Make "most" of the .NET Core APIs asynchronous `Task`s
- Got to tidy up those `Temp File`s, and release the `FileStream`s
- CellValue instances are returned to users
- They must be thread-safe (multiple readers possible)
- Each "Cell Type" / "Cell Instance" / "Row Instance" (string, numeric, boolean, datetime, error) has different lifecycle requirements
- A single sheet instance will not be thread-safe, because the xml reader is locked (forward only) to the sheet in use.
- But you can open the sheet more than once, and have different threads running over it,
- And you can have parallel threads access the Excel file
- Just remember to set `Options{ AccessExcelFileInForwardOnlyMode = false}`
- i.e. Ones that contain formulas:
<definedName name="Prices">OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),1)</definedName>
- A POCO / Type populator (Extensions can be written for that later)
- Totally beyond the scope of this project remit
| Badge | Area |
|---|---|
| | Release build and tests |
- ✅ Setup this GitHub
- ✅ Create the main project
- ✅ Add Unit Test project
- ✅ Add simple Test Data
- ✅ Use Net Core Interface(s)
  - ✅ Use `ZipArchive`
  - ✅ Use `XDocument`
- ✅ Implement Open / Dispose (Async)
  - ✅ Sheet Names
  - ✅ Shared Strings
- ✅ Implement Sheet loading (unzip and be ready for use)
  - ✅ Use `XDocument` as POC only
- ✅ Implement Row extraction
  - ✅ Skip
  - ✅ Delayed read - until a cell is actually needed
  - ✅ Deal with Null / Empty cells (Utilise sparse array?)
  - ✅ Keep last used offset (i.e. no need to reload sheet if the next range API `startRow` call is later)
- ✅ Benchmarks
  - ✅ Add Other "Excel readers" to the Benchmark project(s)
  - Now With `Sylvan.Data.Excel`
  - Now With `XlsxHelper`
- ✅ More UnitTests
- ✅ Add `IEnumerable`s and benchmark
- ✅ Implement `XmlReader.Create` for
- ✅ More Benchmarks
  - Now With `FastExcel`
  - ✅ Some Profiling Enhancements
- ✅ Better Storage of the SharedStrings
- ✅ Cell object type
- ✅ Use internal `ZipEntry` rented buffer
- ✅ Investigation into the smallest function 💪
- ✅ Optimise for `CellConversion.None` 💪
- ✅ Parallel Sheet threads Access
- ✅ Nuget
  - ✅ Beta etc.
  - Released as Nuget V1.yyMM.dd -> `1.2511.14`
- ✅ Add `IEnumerable`s All the way down ⤵️
- ✅ Nuget
  - ✅ Manual workflow deploy Release
  - ✅ Manual workflow deploy Beta
- ✅ Read "definedName"s (Ranges / Cell / Value / Dynamic)
- ✅ Deal with blank rows in a sheet
- ✅ Deal with Empty cells in a row
- ✅ Implement Sheet scoping of "definedName"s
- ✅ Implement Row extraction
- ✅ Implement RangeExtraction
- ✅ Add Benchmarks for "Excel readers" That perform Range Extraction
  - ✅ `ClosedXML` Version="0.105.0"
  - ✅ `EPPlus_LPGL` Version="4.5.3.13"
  - ⚠️ `FastExcel` Version="3.0.13" -> Fails on Range Extraction
  - ✅ `FreeSpire.XLS` Version="14.2.0"
  - ✅ `Aspose.Cells` Version="25.11.0"
- ⚠️ Extend benchmarks to cover the other large file types
  - It appears that most of the others do not like the `pivot-tables` file.!! 🤯
  - Performance on 2025-11-28
- ✅ Investigate memory usage(s) 🧑‍💻
  - ✅ Sacrificed a little speed ➡️ Performance on 2025-12-07
- ✅ Release as Nuget V2.2512-10
- ⛓️‍💥 Breaking Change(s)
  - `FileType` has been removed, and Open via the Public class type
  - `IXmlReaderHelpers` has become `IOpenXmlReaderHelpers`, with slightly different methods
  - `IXmlWorkBookReader` has become `IOpenXmlWorkBookReader`
  - `IXmlSheetReader` has become `IOpenXmlSheetReader`
  - Removal of the Conversion options `Number###`
  - Changed `GetAllCells` to return `IReadOnlyList<ICell?>?`
    - Watch out for those null rows!
- ✅ Branch and beta yml
  - ✅ Convert test data in xlsb format
- ✅ Implement Open / Dispose (Async)
  - ✅ Sheet Names
  - ✅ Shared Strings
- ✅ Implement Sheet loading
- ✅ Implement Row extraction
  - ✅ Skip
  - ✅ Delayed read - until a cell is actually needed
  - ✅ Deal with Null / Empty cells
- ✅ Cell object type
- ✅ Benchmarks
  - ✅ Add "Excel readers" That support XLSB Extraction
  - ✅ 🚶‍➡️ 1st Pass Performance on 2025-12-20
  - ✅ 2nd Pass Performance on 2025-12-21
- ✅ Read "definedName"s (Ranges / Cell / Value / Dynamic)
  - ✅ Read from global
- ✅ Strongly-typed accessors (`AsInt32`, `AsDateTime`, etc)
  - Slightly slower, but less memory pressure for `xlsb`
  - ✅ 2026-01-02
- ✅ Parallel Sheet threads Access
  - ✅ Multiple times (with locking)
- ✅ Release as Nuget V3.yyMM.dd
  - Released RC1 as Nuget `V3.2601.04-RC1`
- ✅ Investigate Performance and edge cases, then Release as Stable
  - Big Performance improvements 2026-01-11
  - Released V3 as Nuget `V3.2601.11`
  - Remove some `AggressiveOptimization` and allow `i-cache` to do its job
  - Implement "Hot-Paths" for cell type access
  - Reduce some memory allocations for ReadOnly CellCollections
  - ⛓️‍💥 Breaking Change(s)
    - Removal of `GetSheetFileName(int offsetSheetId);`
    - Removal of `GetDefinedRange` via `int sheetId`
    - Removal of `Index` property from `ISheet`
    - Internal Creation of WorkBooks
    - Internal implementation of `IOpenXmlWorkBookReader::GetSheetNames` now returns the relative path to the "Sheet Name"
    - `CellValue` is now a `class`, therefore no need to use `.Value`
    - `ICell.CellValue` is now nullable
- ✅ Cell object type
  - ✅ "Best Effort" `Operator` based conversion
  - ✅ TryGet`Type` will return `out type`, if stored as that type.
  - ✅ Unit Tests
- ✅ Performance
  - ✅ Use `ValueTask` and reduce memory allocations in some hot paths
  - Fix fallout from `CellValue` now being a `class`
  - ✅ `ArrayPool` support has been added to ThreadStringBuilderPool using ArrayPool.
  - ✅ Release-specific optimizations added
    - ✅ EnableTrimAnalyzer: true
    - ✅ TieredCompilation: true
    - ✅ TieredCompilationQuickJit: true
    - ✅ TieredCompilationQuickJitForLoops: true
- ✅ Implement `System.DBNull` return option, for empty cells
  - ✅ Implement `INullRow` return option, for empty rows
  - ✅ Update tests to use `INullRow` detection
- ✅ Implement `GetCell###(string columnLetters, ...)` #8
- 2026-05-05
- Return null when `EndValue` is spotted in the xml #22
- Return null for not found SheetId #23
- Changed CellValue to an abstract base class and removed all [FieldOffset(0)] fields.
- Introduced internal sealed class `CellValue<T> : CellValue` to store values of type T without boxing for value types.
- ⛓️‍💥 Breaking Change(s)
  - Cell base type will resolve to decimal first before attempting double. #20
  - `Cell` and `CellValue` converted from `class` to `readonly struct` to eliminate object-per-cell overhead.
  - `IRow`, `ISheet`, and `IExcel_PRIME` interfaces updated to return structs directly, avoiding boxing.
  - `GetAllCells` now returns `ArraySegment<Cell>` to provide zero-allocation access to pooled cell arrays.
- Memory & Allocation Optimizations
  - ✅ Significant reduction in memory allocations (up to 4x) for large files by eliminating object-per-cell overhead.
  - ✅ Implemented `ArrayPool<Cell>` for row-level cell storage.
  - ✅ Optimized `Cell` struct layout to exactly 32 bytes (half a cache line) for improved performance.
- Performance Improvements
  - ✅ Optimized numeric parsing: restored custom `TryDecimalParse` priority while maintaining `double` storage for non-integers.
  - ✅ Reduced async overhead by using `ValueTask<Cell>` in sheet reading loops.
  - ✅ Improved parallel throughput by making `SharedString` loaders thread-safe at the instance level rather than static.
  - ✅ Added `AggressiveInlining` and `AggressiveOptimization` to all high-frequency methods.
  - ✅ IAsyncEnumerable stream processing
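The pooled row storage idea can be illustrated roughly as follows. This is a sketch under assumptions, not the library's code: `DemoCell` and `PooledRow` are hypothetical stand-ins for the real `Cell` and row types, and only `ArrayPool<T>` and `ArraySegment<T>` are real BCL types.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for the library's Cell struct.
public readonly struct DemoCell
{
    public DemoCell(double value) => Value = value;
    public double Value { get; }
}

// Hypothetical row type that rents its cell storage from the shared pool.
public sealed class PooledRow : IDisposable
{
    private readonly DemoCell[] _buffer;
    private readonly int _count;
    private bool _disposed;

    public PooledRow(ReadOnlySpan<double> values)
    {
        // Rent may hand back a larger array than requested, so track the real count.
        _buffer = ArrayPool<DemoCell>.Shared.Rent(values.Length);
        _count = values.Length;
        for (int i = 0; i < _count; i++) _buffer[i] = new DemoCell(values[i]);
    }

    // A view over the used part of the rented array; no copy is made.
    public ArraySegment<DemoCell> Cells
    {
        get
        {
            if (_disposed) throw new ObjectDisposedException(nameof(PooledRow));
            return new ArraySegment<DemoCell>(_buffer, 0, _count);
        }
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        ArrayPool<DemoCell>.Shared.Return(_buffer); // the segment must not be used after this
    }
}
```

The trade-off of this pattern is that a segment is only valid until the row is disposed, so callers that need the values for longer have to copy them out first.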
- ✅ Cell object type
  - ✅ Store cell style type (see Options enum)
  - ✅ Unit Tests
- ✅ Implement reading of the styles to determine the default `DateTime` / `DateOnly` / `TimeOnly` formats #19
  - Fixup `Ecma376StandardProvider`
  - Fix the `StylesExtractor`'s
  - Add more `cellStyles`
  - Make the CellValue default to the lowest type rather than sticking with `decimal`
- Code-Level Optimizations
  - ✅ Implement `ISpanFormattable` in CellValue
  - ✅ Optimisation in the Xlsb workflow
  - ✅ Return to the usage of the FieldOffsets to store the BCL type to prevent boxing in the hot paths
  - ✅ Usage of the fast converters, i.e. our ToDecimal is 3 times faster than Convert.ToDecimal #20
- Advanced Scenarios
  - ✅ Enable `PublishTrimmed=true` with trim warnings resolved
  - ✅ Native AOT compilation testing
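The `FieldOffset` technique mentioned under Code-Level Optimizations can be sketched as an explicit-layout struct whose value fields overlap, with a separate tag recording which one is live. `DemoValue` and `DemoKind` are hypothetical names; the real `CellValue` layout, supported types and conversion rules are not shown in this document and may differ.

```csharp
using System;
using System.Runtime.InteropServices;

public enum DemoKind : byte { Empty, Boolean, Int64, Double }

// Hypothetical sketch: the three value fields share offset 0, so storing a
// number never allocates or boxes. Only value types may overlap like this.
[StructLayout(LayoutKind.Explicit)]
public readonly struct DemoValue
{
    [FieldOffset(0)] private readonly bool _boolean;
    [FieldOffset(0)] private readonly long _int64;
    [FieldOffset(0)] private readonly double _double;
    [FieldOffset(8)] private readonly DemoKind _kind; // tag sits after the 8-byte payload

    private DemoValue(bool value) : this() { _boolean = value; _kind = DemoKind.Boolean; }
    private DemoValue(long value) : this() { _int64 = value; _kind = DemoKind.Int64; }
    private DemoValue(double value) : this() { _double = value; _kind = DemoKind.Double; }

    public static DemoValue FromBoolean(bool value) => new DemoValue(value);
    public static DemoValue FromInt64(long value) => new DemoValue(value);
    public static DemoValue FromDouble(double value) => new DemoValue(value);

    public DemoKind Kind => _kind;

    // Succeeds only when the value was stored as that exact type.
    public bool TryGetInt64(out long value)
    {
        value = _kind == DemoKind.Int64 ? _int64 : 0L;
        return _kind == DemoKind.Int64;
    }

    // Best-effort conversion that switches on the tag instead of unboxing.
    public double AsDouble() => _kind switch
    {
        DemoKind.Double => _double,
        DemoKind.Int64 => _int64,
        DemoKind.Boolean => _boolean ? 1d : 0d,
        _ => 0d,
    };
}
```

Reading a field other than the one recorded in the tag would reinterpret the same bytes, which is why every accessor checks `_kind` first.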
- Bug Fixes
  - Implement reading of the styles to determine the default `DateTime` / `DateOnly` / `TimeOnly` formats #19
  - `AsDecimal` method had an issue where it produced incorrect precision, but only with default options #20
  - Attempting to use "SkipRows" on a sheet that has null rows to start with caused an infinite loop #27
  - When opening the source file, use "Sharing Mode" to allow it to be opened by other things! (i.e. 2 instances of this!) #28
  - Update BugTesting
  - Resolved several compiler warnings and potential resource leaks.
  - Excel itself treats named ranges in a case-insensitive manner #34
- ⛓️‍💥 Breaking Change(s)
  - None yet.
- Exercise the Implementation of Interfaces for other Libs (Xml / Zip)
  - Separate Nuget(s) ?
  - Benchmarks
    - e.g. search usages of `Class PoolingArrayBufferWriter<T>`
- [ ] Investigate a different way of storing the Shared strings to the Filesystem, when they are in the MBs
  - e.g. Search for `Class FileBufferingWriter`
- [ ] Investigate possibility of using "Pipelining" to get data for Next row / cell population after yield?
  - Locking
  - How to deal with rows that are completely blank
  - [ ] fibres?
- [ ] Indicate that things may be `Hidden`
  - Sheet
  - Row
  - Column
  - Cell ?
- [ ] Indicate that things may be `Readonly`
  - Sheet
  - Row
  - Column
  - Cell ?
- [ ] More ideas to be added later, Please suggest... ;-)