Apache Arrow - In-Memory Columnar Data Structure
Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de-facto standard for columnar in-memory processing and interchange. Here’s how it works.
Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits:
Arrow isn’t a standalone piece of software but rather a component used to accelerate analytics within a particular system and to allow Arrow-enabled systems to exchange data with low overhead. It is sufficiently flexible to support most complex data models.
Performance Advantage of Columnar In-Memory
Advantages of a Common Data Layer
Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits:
- A columnar memory-layout permitting O(1) random access. The layout is highly cache-efficient in analytics workloads and permits SIMD optimizations with modern processors. Developers can create very fast algorithms which process Arrow data structures.
- Efficient and fast data interchange between systems without the serialization costs associated with other systems like Thrift, Avro, and Protocol Buffers.
- A flexible structured data model supporting complex types that handles flat tables as well as real-world JSON-like data engineering workloads.
Arrow isn’t a standalone piece of software but rather a component used to accelerate analytics within a particular system and to allow Arrow-enabled systems to exchange data with low overhead. It is sufficiently flexible to support most complex data models.
Performance Advantage of Columnar In-Memory
Advantages of a Common Data Layer
Details:
Comments
Post a Comment