Python: high performance backend #8

@imagovrn

More Efficient Python Implementation

The current flatdata-py implementation is pure Python. So far we have used it only for processing smaller datasets and for inspection and debugging, and we have noticed that it is quite slow on large datasets. It would be useful to have an implementation whose performance is not too far from that of the C++ one. To achieve that, we could do the following:

  • Benchmark the two implementations on the same data to measure the gap, and monitor the benchmarks in CI (see Performance benchmarks #9).
  • Optimize the pure-Python implementation.
  • Introduce parallel processing in the pure-Python implementation (or ease integration with a library that would do it for us, such as dask).
  • As an alternative approach, create a flatdata-py-ext implementation that builds and uses binary extensions to improve performance.
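
As a starting point for the benchmarking step, here is a minimal sketch of a timing harness that runs several candidate implementations over the same buffer. Everything in it is an assumption made for illustration: the record layout (two little-endian uint32 fields) is not the flatdata format, and `make_data`, `sum_per_record`, `sum_bulk` and `benchmark` are hypothetical names, not part of flatdata-py. The two readers only stand in for "slow path" and "optimized path".

```python
import struct
import timeit

# Hypothetical fixed-width record: two little-endian uint32 fields.
# This is an illustrative layout, NOT the flatdata binary format.
RECORD = struct.Struct("<II")


def make_data(n):
    """Build a buffer of n records so every candidate reads identical input."""
    return b"".join(RECORD.pack(i, i * 2) for i in range(n))


def sum_per_record(buf):
    """Stand-in for the slow path: one unpack call per record."""
    total = 0
    for offset in range(0, len(buf), RECORD.size):
        a, b = RECORD.unpack_from(buf, offset)
        total += a + b
    return total


def sum_bulk(buf):
    """Stand-in for an optimized path: iterate records via iter_unpack."""
    return sum(a + b for a, b in RECORD.iter_unpack(buf))


def benchmark(funcs, buf, repeat=3, number=5):
    """Return the best observed seconds per call for each named function."""
    results = {}
    for name, func in funcs.items():
        timings = timeit.repeat(lambda f=func: f(buf), repeat=repeat, number=number)
        # The minimum is the least noisy estimate of the achievable time.
        results[name] = min(timings) / number
    return results
```

Calling `benchmark({"per_record": sum_per_record, "bulk": sum_bulk}, make_data(100_000))` returns a dictionary of timings that a CI job could store and compare across commits. A real comparison against the C++ implementation would need both sides to read the same archive, which this sketch does not attempt.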
