Storage Efficiency – Modern Data Warehouses




Advances in storage management include integration with data and analytics tools and with AI operations (AIOps) platforms.

They also include ultra-low-latency, power-efficient, high-density storage infrastructure for edge data centers.

Consumption-based, pay-for-use subscription licenses can replace capital expenditure (CAPEX) with operating expenditure (OPEX), directing spending toward hybrid cloud and edge infrastructure with cloud-native benefits.

Cloud vendor offerings show that storage capacity growth is shifting toward consumption-based models. The goal is to reduce the time spent on IT storage administration, support, and maintenance and to make that work easier, which points toward storage as a service (STaaS) and artificial intelligence for IT operations (AIOps). More than 40 percent of on-premises IT storage administration, support, and maintenance is expected to be replaced by STaaS and AIOps, up from less than 10 percent in 2022. External enterprise storage arrays deployed to support primary storage workloads are likewise expected to adopt nonvolatile memory express over fabrics (NVMe-oF) more widely, compared with fewer than 10 percent in 2022.

Data compression is an important technique that affects performance and must be considered in a modern data warehouse. Lossless compression is used for sensitive data, such as financial or health-care data, that must be preserved in its original form. It uses efficient encoding schemes to represent the data so that the original can be reconstructed exactly.
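To make the lossless property concrete, here is a minimal sketch using zlib from the Python standard library. The record contents are hypothetical and only illustrate the round trip; the article does not prescribe a particular codec.

```python
import zlib

# Hypothetical financial-style record; field names and values are illustrative only.
record = b"account=12345;balance=1049.37;currency=USD\n" * 1000

compressed = zlib.compress(record, level=6)
restored = zlib.decompress(compressed)

# Lossless: the restored bytes are identical to the original bytes.
assert restored == record

ratio = len(record) / len(compressed)
print(f"original={len(record)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.1f}x")
```

The ratio seen here depends heavily on how repetitive the input is, so it should not be read as typical for real warehouse data.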

Lossy compression is the opposite of lossless compression: it discards some of the data and is meant for non-sensitive data that does not need to be preserved in its original form. The amount of data removed is typically small.
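One simple way to illustrate the lossy trade-off is quantization: round the values before compressing them, accept a small bounded error, and get a smaller result. The sketch below assumes made-up sensor readings and a precision of one decimal place; both are assumptions for illustration, not something the article specifies.

```python
import math
import struct
import zlib

# Hypothetical sensor readings (illustrative values only).
readings = [20.0 + 5.0 * math.sin(i / 50.0) + 0.001 * (i % 7) for i in range(10_000)]

def pack(values):
    """Serialize a list of floats as little-endian 8-byte doubles."""
    return struct.pack(f"<{len(values)}d", *values)

# Lossy step: round to one decimal place, discarding fine detail for good.
quantized = [round(v, 1) for v in readings]

exact_size = len(zlib.compress(pack(readings)))
lossy_size = len(zlib.compress(pack(quantized)))
max_error = max(abs(a - b) for a, b in zip(readings, quantized))

print(f"exact={exact_size} bytes, lossy={lossy_size} bytes, max error={max_error:.3f}")
```

The original readings cannot be recovered from the quantized values, which is why this approach is unsuitable for data that must be preserved exactly.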

The type of data, compression ratio, frequency, total time required, I/O effort to compress and decompress, and the compression algorithm itself are all important factors in choosing which type of compression to use. For example, in I/O-intensive applications such as image or text processing, compression can provide a 35 to 60 percent gain in performance. Compression shifts part of the data-processing load from I/O to the CPU, so for CPU-intensive applications the performance gain from compression is often negligible. Hadoop clusters are shared resources, so reducing the I/O load of one application frees I/O bandwidth for the other applications that share it.
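The factors above can be measured rather than guessed. The following sketch times three standard-library codecs (zlib, bz2, lzma) on one hypothetical sample and reports ratio against CPU time. The sample data and the choice of codecs are assumptions; Hadoop deployments commonly use other codecs that are not in the Python standard library.

```python
import bz2
import lzma
import time
import zlib

def measure(name, compress, decompress, data):
    """Return compression ratio and wall-clock timings for one codec."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = decompress(packed)
    decompress_s = time.perf_counter() - start

    assert unpacked == data  # every codec here is lossless
    return {
        "codec": name,
        "ratio": len(data) / len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }

# Hypothetical CSV-like sample; real results depend on the actual data.
sample = b"2024-01-01,store_17,sku_000123,3,19.99\n" * 20_000

results = [
    measure("zlib", zlib.compress, zlib.decompress, sample),
    measure("bz2", bz2.compress, bz2.decompress, sample),
    measure("lzma", lzma.compress, lzma.decompress, sample),
]

for r in results:
    print(f"{r['codec']:>5}: ratio={r['ratio']:.1f}x "
          f"compress={r['compress_s']:.4f}s decompress={r['decompress_s']:.4f}s")
```

Running this on a representative slice of the real workload shows whether the I/O saved by a higher ratio is worth the extra CPU time.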

In some use cases, data compression is undesirable and can even reduce performance, for example with custom binary input files. In other cases, such as sequence files and derivative files, compression is always preferable. It is also preferable for the intermediate files used when shuffling and sorting data.

In storage administration, maintenance and support, along with their risks and associated costs, can be replaced by AIOps and STaaS management capabilities. However, storage subject-matter experts (SMEs) then need to be reskilled for software and AI/ML development initiatives.

In life-cycle management with centralized, cloud-native control and data services capabilities, quality metrics can be replaced with metric-based SLA sourcing. In-line compression and deduplication can each be turned on or off at the level of an individual logical unit number (LUN).
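As a rough illustration of per-LUN data reduction, the toy model below stores fixed-size blocks and applies in-line deduplication and compression only when the corresponding switch is on for that LUN. The `Lun` class, the 4 KiB block size, and the use of SHA-256 and zlib are all assumptions made for this sketch. Real arrays implement these features very differently.

```python
import hashlib
import zlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed block size for this sketch

@dataclass
class Lun:
    """Toy logical unit with per-LUN data-reduction switches (hypothetical model)."""
    name: str
    dedup: bool = True
    compress: bool = True
    blocks: dict = field(default_factory=dict)  # key -> (is_compressed, payload)
    layout: list = field(default_factory=list)  # ordered block keys

    def write(self, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            # With dedup on, identical blocks share one content-derived key;
            # with dedup off, every written block gets its own key.
            if self.dedup:
                key = hashlib.sha256(block).hexdigest()
            else:
                key = f"blk-{len(self.layout)}"
            if key not in self.blocks:
                payload = zlib.compress(block) if self.compress else block
                self.blocks[key] = (self.compress, payload)
            self.layout.append(key)

    def read(self) -> bytes:
        parts = []
        for key in self.layout:
            is_compressed, payload = self.blocks[key]
            parts.append(zlib.decompress(payload) if is_compressed else payload)
        return b"".join(parts)

    def stored_bytes(self) -> int:
        return sum(len(payload) for _, payload in self.blocks.values())

# Sixteen blocks, but only two distinct block contents.
data = (b"A" * BLOCK_SIZE) * 8 + (b"B" * BLOCK_SIZE) * 8

reduced = Lun("lun0")                            # both features on
raw = Lun("lun1", dedup=False, compress=False)   # both features off
reduced.write(data)
raw.write(data)

print(f"{reduced.name}: {reduced.stored_bytes()} bytes stored")
print(f"{raw.name}: {raw.stored_bytes()} bytes stored")
```

Because each stored block records whether it was compressed, the switches can be changed between writes without breaking reads of earlier data.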
