Overview
While storage and computing power have come a long way in the last several decades, the unfortunate reality is that they haven’t kept up with modern data storage and analysis needs.
MPP databases address this problem by distributing the required processing across several different nodes, so that large datasets can be analyzed efficiently.
MPP databases are usually columnar, which allows analytical queries to be processed faster because a query only has to read the columns it actually uses.
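To make the columnar point concrete, here is a minimal sketch in Python. The table, its column names, and the helper functions are all made up for illustration; real column stores add compression and on-disk layouts that are not shown here.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The table and its column names are hypothetical.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def total_amount_row_store(records):
    """Row store: every full record is visited just to read one field."""
    return sum(record["amount"] for record in records)


def to_column_store(records):
    """Pivot the records so each column is kept as its own list."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_column_store(columns):
    """Column store: an aggregate such as SUM(amount) reads one column only."""
    return sum(columns["amount"])


columns = to_column_store(rows)
```

Both functions return the same total; the difference is how much data has to be read to get there, which is what matters once the table is far wider and longer than this toy one.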
-
Massively Parallel
Massively parallel is the term for using a large number of computer processors (or separate computers) to simultaneously perform a set of coordinated computations in parallel.
-
Massively Parallel Processing
Massively Parallel Processing (MPP) is a storage structure designed to handle the coordinated processing of program operations by multiple processors.
This coordinated processing can work on different parts of a program, with each processor using its own operating system and memory. This allows MPP databases to handle massive amounts of data and provide much faster analytics based on large datasets.
The idea behind MPP is that splitting a simple but large task into multiple buckets and processing those buckets at the same time is much faster than one person working alone, no matter how skilled that person is.
Simply put, an MPP database is a type of database or data warehouse where the data and processing power are split up among several different nodes (servers), with one leader node and one or many compute nodes.
MPP databases can scale horizontally by adding more compute nodes, rather than having to upgrade to more and more expensive individual servers (scaling vertically).
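One common way to split data among nodes is to hash a distribution key, sketched below. This is an illustration under assumptions, not how any particular MPP product does it: the function names, the `customer-N` keys, and the choice of CRC32 as the hash are all hypothetical.

```python
import zlib


def node_for_key(key: str, node_count: int) -> int:
    """Pick a node for a row by hashing its distribution key."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(keys, node_count):
    """Return one list of keys per node."""
    nodes = [[] for _ in range(node_count)]
    for key in keys:
        nodes[node_for_key(key, node_count)].append(key)
    return nodes


# Hypothetical distribution keys.
keys = [f"customer-{i}" for i in range(1000)]

# Scaling horizontally: the same data spread over 2 nodes, then over 4.
two_nodes = distribute(keys, 2)
four_nodes = distribute(keys, 4)
```

With more nodes, each node holds a smaller share of the rows, so each one has less to scan per query. Note that this naive modulo scheme moves many rows when the node count changes; that trade-off is outside the scope of this sketch.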
-
One Leader Node
The leader node tells all the other nodes what to do and sorts out the final tally.
-
Many Compute Nodes
The compute nodes deal with all the data: they run the queries and, in a word-count analogy, count up the words.
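The division of labor between the leader and the compute nodes can be sketched with a word count. This is a toy model under assumptions: threads stand in for separate servers, and `leader_node` and `compute_node` are hypothetical names, not part of any real database API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_node(lines):
    """Work done by one compute node: count the words in its own slice."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def leader_node(lines, node_count):
    """Split the work, hand it to the compute nodes, and merge the results."""
    # Round-robin split so each node gets a similar share of the lines.
    slices = [lines[i::node_count] for i in range(node_count)]
    # Threads stand in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)
    # The leader sorts the final tally: highest count first, then by word.
    return sorted(total.items(), key=lambda item: (-item[1], item[0]))


lines = [
    "the leader splits the work",
    "the compute nodes count the words",
    "the leader merges the counts",
]
tally = leader_node(lines, node_count=2)
```

The tally does not depend on how many compute nodes are used; only the amount of work each node does changes.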
-
Approaches to MPP
There are several types of MPP database architectures, each with its own benefits:
-
Grid computing
Grid computing uses multiple computers in distributed networks, spread across diverse administrative domains. Their processing power is used opportunistically, whenever a computer is available.
This architecture reduces costs for server space, but it also limits bandwidth and capacity at peak times or when there are too many requests.
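The opportunistic behavior described above can be sketched as a tiny scheduler. Everything here is hypothetical (the machine names, the availability flags, and the `schedule` function); it only illustrates why capacity is limited when few machines happen to be free.

```python
from collections import deque


def schedule(tasks, machines_available):
    """Assign tasks to whichever machines are currently available.

    tasks: list of task names.
    machines_available: dict mapping machine name -> bool (hypothetical flag).
    Returns (assignments, waiting): tasks placed now and tasks still queued.
    """
    queue = deque(tasks)
    assignments = {}
    for machine, available in machines_available.items():
        if not queue:
            break
        if available:
            assignments[machine] = queue.popleft()
    return assignments, list(queue)


assignments, waiting = schedule(
    ["t1", "t2", "t3"],
    {"lab-pc": True, "office-pc": False, "spare-server": True},
)
```

Here two tasks are placed on the two free machines and the third has to wait, which is the peak-time limitation in miniature.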
-
Computer clustering
Computer clustering links the available power into nodes that can connect with each other to handle multiple tasks at once.
-
Massively Parallel (Processor) at the Hardware Level
Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors.
-
Summary
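After surveying the material, MPP turns out to have two meanings: Massively Parallel Processing and Massively Parallel Processor.
The former is an architecture at the business-logic level, applied to databases.
The latter is a composite architecture at the hardware level.
"Massively parallel" is the root term for both: it refers to processing large amounts of data in parallel, which is a form of horizontal scaling.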