Performance
Performance is about ensuring all the system resources are used optimally in the process of meeting the application’s requirements. Last but not least, the cost of database performance is often measured in terms of how quickly data are fetched from and returned to their respective applications.
Multiple layers that might impact DW performance are:
Physical layer, like disks, RAM, OS, Server processes Data schema layer, like tables, container, data types, volumes, and location Network layer performance, like network speed, etc.
Application layer on-premises, off-premises, or hybrid, size of application Modern data warehouses have the same definition of performance as traditional warehouses. There are three factors of DW performance:
1. Workload of the data warehouse is the combination of number and complexity of online transactions, regular batches, ad-hoc jobs, and configuration of server settings, like frequency of logs, back-ups, etc. Workloads can be predicted with confidence after analyzing historical data. The workload can be planned and optimized depending on the patterns. However, DBAs (database administrators) don’t have historical data in the case of a new server setup. Workload has a major influence on performance.
2. Throughput is the measure of the DW’s ability to carry out the amount of processing involved with the workload, size of server, and type of job. This can impact overall throughput, including I/O, CPU speed, size of RAM, parallel processing capabilities of server, and efficiency of operating system.
3. Contention for system resources. This is the condition in which two or more components of the workload are competing to use the same resources. Common examples are multiple users writing/updating a single row, resulting in long waits. As the contention increases throughput decreases for a traditional DW. A modern DW handles this by choosing scalable cluster architectures to distribute workload and avoid contention. Databases use automatic sharding and replication to improve availability and performance while reducing contention.
There are ways to improve performance by providing indexes. Network speed, increasing number of buffers and buffer size, partitioning, range, and list, database tuning, hashing, and joining eliminate system-wide and application-specific bottlenecks.
There are multiple dimensions of performance: storage, I/O, and processing.