ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly.
Apache Iceberg is a high-performance, open-source format for large analytic tables. It allows the use of SQL tables for big data, enabling engines like Spark, Trino, Flink, Presto, and Hive to ...