在Ubuntu上安装Hadoop

100 阅读 0 评论 66 点赞

我是靠谱客的博主魁梧小丸子，这篇文章主要介绍在Ubuntu上安装Hadoop，现在分享给大家，希望可以做个参考。

In this lesson, we will see how we can get started with Apache Hadoop by installing it on our Ubuntu machine. Installing and running Apache Hadoop can be tricky and that’s why we’ll try to keep this lesson as simple and informative as possible.

在本课程中，我们将了解如何通过在Ubuntu计算机上安装Apache Hadoop来入门。安装和运行Apache Hadoop可能很棘手，这就是为什么我们将尽量简化本课并提供更多信息的原因。

In this installation guide, we will make use of Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:

在此安装指南中，我们将使用Ubuntu 17.10（GNU / Linux 4.13.0-37-generic x86_64）计算机：

ubuntu machine version

Ubuntu Version

Ubuntu版本

Also, if you just want quickly explore Hadoop, read CloudEra Hadoop VMWare Single Node Environment Setup.

另外，如果您只是想快速探索Hadoop，请阅读CloudEra Hadoop VMWare单节点环境设置。

在Ubuntu上安装Hadoop的前提条件 (Prerequisite for Installing Hadoop on Ubuntu)

Before we can start installing Hadoop, we need to update Ubuntu with the latest software patches available:

在开始安装Hadoop之前，我们需要使用可用的最新软件补丁更新Ubuntu：

复制代码

1
2
sudo apt-get update && sudo apt-get -y dist-upgrade

Next, we need to install Java on the machine as Java is the main Prerequisite to run Hadoop. Java 6 and above versions are supported for Hadoop. Let’s install Java 8 for this lesson:

接下来，我们需要在计算机上安装Java，因为Java是运行Hadoop的主要前提条件。 Hadoop支持Java 6及更高版本。让我们为此课程安装Java 8：

复制代码

1
2
sudo apt-get -y install openjdk-8-jdk-headless

To install Hadoop, make a directory and move inside it:

要安装Hadoop，请创建一个目录并在其中移动：

复制代码

1
2
mkdir jd-hadoop && cd jd-hadoop

在Ubuntu上安装Hadoop (Installing Hadoop on Ubuntu)

Now that we’re ready with the basic setup for Hadoop on our Ubuntu machine, let’s download Hadoop installation files so that we can work on its configuration as well:

现在我们已经准备好在我们的Ubuntu机器上进行Hadoop的基本设置，让我们下载Hadoop安装文件，以便我们也可以对其进行配置：

复制代码

1
2
wget https://mirror.cc.columbia.edu/pub/software/apache/hadoop/common/hadoop-3.0.1/hadoop-3.0.1.tar.gz

We’re going to use the Hadoop 3.0.1 version for Hadoop. Find the latest version for Hadoop here. Once the file is downloaded, run the following command to unzip the file:

我们将针对Hadoop使用Hadoop 3.0.1版本。在此处找到Hadoop的最新版本。下载文件后，运行以下命令解压缩文件：

复制代码

1
2
tar xvzf hadoop-3.0.1.tar.gz

This might take few moments as the archive is big in size. At this moment, Hadoop should be unarchived in your current directory:

由于归档文件很大，因此可能需要一些时间。此时，Hadoop应该在当前目录中取消归档：

hadoop installer unarchived

Hadoop Unarchived

Hadoop未存档

添加Hadoop用户帐户 (Adding Hadoop user account)

We will create a separate Hadoop user on our machine to keep HDFS separate from our original file system. We can first create a User group on our machine:

我们将在计算机上创建一个单独的Hadoop用户，以使HDFS与原始文件系统保持独立。我们首先可以在我们的机器上创建一个用户组：

复制代码

1
2
addgroup hadoop

You should see something like this:

您应该会看到以下内容：

adding hadoop user on ubuntu

Ubuntu Adding User Group

Ubuntu添加用户组

Now we can add a new user to this group:

现在我们可以向该组添加新用户：

复制代码

1
2
useradd -G hadoop jdhadoopuser

Note that I am running all commands as a root user. Now, we have a user called jdhadoopuser in the hadoop group.

请注意，我以root用户身份运行所有命令。现在，在hadoop组中有一个名为jdhadoopuser的用户。

Finally, we’ll provide root access to jdhadoopuser user. To do this, open the /etc/sudoers file with this command:

最后，我们将为jdhadoopuser用户提供root访问权限。为此，请使用以下命令打开/etc/sudoers文件：

复制代码

1
2
sudo visudo

Now, enter this as the last line in the file:

现在，将其作为文件的最后一行输入：

复制代码

1
2
jdhadoopuser ALL=(ALL) ALL

As of now, file should look like this:

到目前为止，文件应如下所示：

adding hadoop user to sudo users

Making root user

成为root用户

Hadoop单节点设置：独立模式 (Hadoop Single Node Setup: Standalone Mode)

Hadoop on a Single Node means that Hadoop will run as a single Java process. This mode is usually used only in debugging environments and not for production use. With this mode, we can run simple Map R programs which process a smaller amount of data.

单个节点上的Hadoop意味着Hadoop将作为单个Java进程运行。此模式通常仅在调试环境中使用，而不用于生产环境。使用这种模式，我们可以运行简单的Map R程序，以处理少量数据。

Rename the hadoop archive as currently present to hadoop only:

将hadoop存档重命名为当前存在，仅用于hadoop ：

复制代码

1
2
mv /root/jd-hadoop/hadoop-3.0.1 /root/jd-hadoop/hadoop

Now, provide ownership of this directory to the jdhadoopuser.

现在，将此目录的所有权提供给jdhadoopuser 。

复制代码

1
2
chown -R jdhadoopuser:hadoop /root/jd-hadoop/hadoop

A better location for Hadoop will be the /usr/local/ directory, so let’s move it there:

对于Hadoop来说，更好的位置是/ usr / local /目录，所以我们将其移到那里：

复制代码

1
2
3
mv hadoop /usr/local/
cd /usr/local/

Now, edit the .bashrc file to add Hadoop and Java to path using this command:

现在，使用以下命令编辑.bashrc文件以将Hadoop和Java添加到路径：

复制代码

1
2
vi ~/.bashrc

Add these lines to the end of the .bashrc file:

将这些行添加到.bashrc文件的末尾：

复制代码

1
2
3
4
5
6
# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export PATH=$PATH:$HADOOP_HOME/bin

Now, it is time to tell Hadoop as well where Java is present. We can do this by providing this path in hadoop-env.sh file. In separate Hadoop installations, the location of this file can be different. To find where this file is, run the following command right outside the hadoop directory:

现在，是时候告诉Hadoop Java在哪里了。我们可以通过在hadoop-env.sh文件中提供此路径来做到这一点。在单独的Hadoop安装中，此文件的位置可以不同。要查找此文件的位置，请在hadoop目录外部运行以下命令：

复制代码

1
2
find hadoop/ -name hadoop-env.sh

When I visit the directory I am shown, I can see the needed file present there:

当我访问显示的目录时，可以在此处看到所需的文件：

hadoop env file

Hadoop Env file

Hadoop Env文件

Now, edit the file:
现在，编辑文件：

复制代码

1
2
vi hadoop-env.sh

On the last line, enter the following and save it:

在最后一行，输入以下内容并保存：

复制代码

1
2
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

在Ubuntu上测试Hadoop安装 (Testing Hadoop Installation on Ubuntu)

We can test Hadoop installation by executing a sample application now which comes pre-made with Hadoop, a word counter example JAR. Just execute the following command:

我们可以通过执行一个示例应用程序来测试Hadoop的安装，该示例应用程序是Hadoop（单词计数器示例JAR）预先制成的。只需执行以下命令：

复制代码

1
2
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.1.jar wordcount /usr/local/hadoop/README.txt /root/jd-hadoop/Output

Once you execute the following command, we see the file part-r-00000 as an output:

执行以下命令后，我们将文件part-r-00000视为输出：

hadoop install on ubuntu and running simple program

Output file

输出文件

If you want, you can see the content of this file with following command:
如果需要，可以使用以下命令查看此文件的内容：

复制代码

1
2
cat part-r-00000

Now that this example ran, this means that Hadoop has been successfully installed on your system!

现在已经运行了该示例，这意味着Hadoop已成功安装在您的系统上！

结论 (Conclusion)

In this lesson, we saw how we can install Apache Hadoop on an Ubuntu server and start executing sample programs with it. Read more Big Data Posts to gain deeper knowledge of available Big Data tools and processing frameworks.

在本课程中，我们了解了如何在Ubuntu服务器上安装Apache Hadoop并开始使用它执行示例程序。阅读更多有关大数据的文章，以更深入地了解可用的大数据工具和处理框架。