Understanding MPP (Massively Parallel Processing) in Databases


I'm 专注雪碧, a blogger on 靠谱客. This article introduces MPP (Massively Parallel Processing) in databases; I'm sharing it here in the hope that it serves as a useful reference.

Overview

While storage and computing power have come a long way in the last several decades, the unfortunate reality is that they haven’t kept up with modern data storage and analysis needs.

MPP databases address this problem by spreading the required processing across several nodes, so that large datasets can be analyzed efficiently.

MPP databases are usually columnar, which allows analytical queries to be processed faster.
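
To see why a columnar layout helps analytical queries, here is a minimal sketch in plain Python. The table, its column names (`order_id`, `region`, `amount`), and the helper functions are all made up for illustration; real MPP engines store columns in their own on-disk formats rather than in Python lists.

```python
# Hypothetical sales table; the data is invented for this example.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def total_amount_row_store(records):
    """Row layout: every full record is visited to read a single field."""
    return sum(record["amount"] for record in records)


def to_column_store(records):
    """Column layout: each column is kept as its own contiguous list."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def total_amount_column_store(columns):
    """An aggregate over one column touches only that column's values."""
    return sum(columns["amount"])


columns = to_column_store(rows)
print(total_amount_row_store(rows))        # 250.0
print(total_amount_column_store(columns))  # 250.0
```

Both layouts give the same total. The difference is that the column version reads only the `amount` values, which is the access pattern analytical queries tend to have.
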
Massively Parallel

Massively parallel is the term for using a large number of computer processors (or separate computers) to simultaneously perform a set of coordinated computations in parallel.
Massively Parallel Processing

Massively Parallel Processing (MPP) is a processing architecture designed to handle the coordinated processing of program operations by multiple processors.

This coordinated processing can work on different parts of a program, with each processor using its own operating system and memory. This allows MPP databases to handle massive amounts of data and provide much faster analytics based on large datasets.

The intuition is simple: splitting a simple but large task into multiple buckets and processing those buckets at the same time is much faster than having one person work through everything alone, no matter how skilled that person is.

Simply put, an MPP database is a type of database or data warehouse where the data and processing power are split up among several different nodes (servers), with one leader node and one or many compute nodes.

MPP databases can scale horizontally by adding more compute nodes, rather than having to upgrade to more and more expensive individual servers (scaling vertically).
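
As a rough illustration of how data might be split among nodes, the sketch below assigns rows to compute nodes by hashing a distribution key. The function names, the choice of `zlib.crc32`, and the simple modulo scheme are assumptions made for this example, not how any particular MPP database does it.

```python
import zlib


def node_for_key(key, num_nodes):
    """Pick a compute node for a row by hashing its distribution key."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in
    # hash() for strings, so the same key always lands on the same node.
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(keys, num_nodes):
    """Return one list of keys per node."""
    nodes = [[] for _ in range(num_nodes)]
    for key in keys:
        nodes[node_for_key(key, num_nodes)].append(key)
    return nodes


customer_ids = range(1000)  # made-up distribution keys
for num_nodes in (2, 4, 8):
    sizes = [len(part) for part in distribute(customer_ids, num_nodes)]
    print(num_nodes, "nodes ->", sizes)
```

With more nodes, each node holds a smaller share of the rows, which is the point of scaling horizontally. Note that with plain modulo hashing, changing `num_nodes` moves many keys to a different node, so growing a cluster this way would imply redistributing data.
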
One Leader Node

The leader node tells all the other nodes what to do and sorts the final tally.
Many Compute Nodes

The compute nodes deal with all the data: they run the queries and, in a word-counting example, count up the words.
Approaches to MPP
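
The leader/compute split can be sketched with a small word count. This is only a simulation under stated assumptions: threads stand in for separate compute nodes, and `compute_node_count`, `leader_word_count`, and the sample lines are hypothetical names and data invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_node_count(chunk):
    """Work done on one compute node: count words in its own slice of data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def leader_word_count(lines, num_nodes=3):
    """Leader: split the data, hand each slice to a node, merge the results."""
    # Round-robin split so every simulated node gets a similar share.
    chunks = [lines[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(compute_node_count, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    # Sort the final tally: most frequent words first.
    return total.most_common()


lines = [
    "the leader node plans the query",
    "the compute nodes run the query",
    "the leader merges the results",
]
tally = leader_word_count(lines)
print(tally[0])  # ('the', 6)
```

The compute function never sees the whole dataset, and the leader never counts words itself; it only splits the work and merges the partial counts.
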

There are several types of MPP database architectures, each with their own benefits:
- Grid computing
  
  Grid computing opportunistically uses the processing power of many computers in distributed, diverse administrative domains whenever a computer is available.
  
  It uses multiple computers in distributed networks, drawing on resources based on their availability. This reduces the cost of server space, but it also limits bandwidth and capacity at peak times or when there are too many requests.
- Computer clustering
  
  Links the available power into nodes that can connect with each other to handle multiple tasks at once.
Massively Parallel (Processor) at the Micro Level of the Computer
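
The opportunistic behaviour described for grid computing can be sketched as a tiny simulation. Everything here is hypothetical: the `run_grid` function, the machine names, and the tick-based availability model are assumptions made to keep the example deterministic, not a description of any real grid scheduler.

```python
from collections import deque


def run_grid(tasks, availability):
    """Simulate opportunistic scheduling on a grid.

    availability[t] is the set of machine names that are idle at tick t.
    Each idle machine takes one pending task per tick.
    Returns (schedule, leftover_tasks).
    """
    pending = deque(tasks)
    schedule = []  # entries are (tick, machine, task)
    for tick, idle_machines in enumerate(availability):
        for machine in sorted(idle_machines):
            if not pending:
                return schedule, []
            schedule.append((tick, machine, pending.popleft()))
    return schedule, list(pending)


tasks = ["t1", "t2", "t3", "t4", "t5"]
availability = [{"a", "b"}, set(), {"c"}, {"a"}]
schedule, leftover = run_grid(tasks, availability)
print(schedule)  # [(0, 'a', 't1'), (0, 'b', 't2'), (2, 'c', 't3'), (3, 'a', 't4')]
print(leftover)  # ['t5']
```

No machine is idle at tick 1, so nothing runs, and `t5` is still waiting at the end. That is the capacity limit the grid description mentions: work only progresses when someone else's machine happens to be free.
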

Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors.
Summary

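After going through the material, MPP turns out to have two meanings: Massively Parallel Processing and Massively Parallel Processor.

The former is an architecture at the business-logic level, specific to databases.

The latter is a composite architecture at the computer hardware level.

Massively Parallel is the root term: it refers to processing large amounts of data in parallel, and is a form of scaling horizontally.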