[Beginner Series] HBase (Java) API, based on HBase 1.1.2


I'm 繁荣荷花, a blogger on 靠谱客. This article introduces the HBase Java client API, based on HBase 1.1.2, and is shared here in the hope that it is a useful reference.

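For more background on HBase, see my HBase introductory series:
Link: https://blog.csdn.net/java_soldier/article/details/80708346
Our cluster was recently upgraded with Kerberos authentication enabled, so every application had to be adapted. That was a good occasion to review the HBase client API; the code is below.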

First, the main API classes.

HBaseConfiguration

org.apache.hadoop.hbase.HBaseConfiguration
Adds HBase configuration files to a Configuration

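We usually obtain a Configuration through HBaseConfiguration.create() and then set a few parameters on it, such as the ZooKeeper quorum, the client port, and whether Kerberos authentication is enabled: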

Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.quorum","bd36,bd37,bd38,bd66,bd67");
configuration.set("hbase.zookeeper.property.clientPort","2181");
configuration.set("zookeeper.znode.parent", "/hbase-unsecure");
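Since the motivation for this review was enabling Kerberos, here is a minimal sketch of the extra client-side settings and the keytab login, assuming a standard Hadoop/HBase Kerberos setup. The principal names and keytab path are placeholders (assumptions), and enableKerberos and loginFromKeytab are hypothetical helper names. On HDP clusters with security enabled the parent znode is often /hbase-secure rather than /hbase-unsecure; check the cluster's hbase-site.xml for the real values.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginSketch {
    // Adds client-side Kerberos settings to an existing configuration.
    // hbasePrincipal is an assumption, e.g. "hbase/_HOST@EXAMPLE.COM"; take the real value from hbase-site.xml.
    public static Configuration enableKerberos(Configuration conf, String hbasePrincipal) {
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hbase.security.authentication", "kerberos");
        conf.set("hbase.master.kerberos.principal", hbasePrincipal);
        conf.set("hbase.regionserver.kerberos.principal", hbasePrincipal);
        return conf;
    }

    // Logs in from a keytab; call this before ConnectionFactory.createConnection(conf).
    // userPrincipal and keytabPath are placeholders (assumptions), e.g. "app@EXAMPLE.COM"
    // and "/etc/security/keytabs/app.keytab".
    public static void loginFromKeytab(Configuration conf, String userPrincipal, String keytabPath)
            throws IOException {
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(userPrincipal, keytabPath);
    }
}
```

The order matters: configure and log in first, then create the Connection, so that the connection is opened as the logged-in user.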

Connection

org.apache.hadoop.hbase.client.Connection
A cluster connection encapsulating lower level individual connections to actual servers and a connection to zookeeper. Connections are instantiated through the ConnectionFactory class. The lifecycle of the connection is managed by the caller, who has to close() the connection to release the resources.
The connection object contains logic to find the master, locate regions out on the cluster, keeps a cache of locations and then knows how to re-calibrate after they move. The individual connections to servers, meta cache, zookeeper connection, etc are all shared by the Table and Admin instances obtained from this connection.
Connection creation is a heavy-weight operation. Connection implementations are thread-safe, so that the client can create a connection once, and share it with different threads. Table and Admin instances, on the other hand, are light-weight and are not thread-safe. Typically, a single connection per client application is instantiated and every thread will obtain its own Table instance. Caching or pooling of Table and Admin is not recommended.

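The official description is long; in one sentence, a Connection is what you use to talk to HBase, and it is created through the factory class ConnectionFactory (the factory pattern):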

Connection connection = ConnectionFactory.createConnection(configuration);
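The documentation above recommends one shared, thread-safe Connection per application and a separate light-weight Table per thread. A minimal sketch of that pattern, assuming the table jiangtao:test from the full example already holds rows rowkey_0 and rowkey_1; clientConfiguration is a hypothetical helper name and the thread count is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionSketch {
    // Hypothetical helper: the same settings as the snippet above, parameterized by quorum.
    public static Configuration clientConfiguration(String quorum) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", quorum);
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("zookeeper.znode.parent", "/hbase-unsecure");
        return conf;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // One heavy-weight, thread-safe Connection shared by all threads.
        try (Connection connection =
                 ConnectionFactory.createConnection(clientConfiguration("bd36,bd37,bd38,bd66,bd67"))) {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 2; i++) {
                final String row = "rowkey_" + i;
                results.add(pool.submit(() -> {
                    // Table is not thread-safe: each task opens and closes its own instance.
                    try (Table table = connection.getTable(TableName.valueOf("jiangtao:test"))) {
                        Result r = table.get(new Get(Bytes.toBytes(row)));
                        return row + " -> " + Bytes.toString(r.getValue(Bytes.toBytes("i"), Bytes.toBytes("username")));
                    }
                }));
            }
            for (Future<String> f : results) {
                System.out.println(f.get()); // wait for every task before the connection is closed
            }
        } finally {
            pool.shutdown();
        }
    }
}
```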

Admin

org.apache.hadoop.hbase.client.Admin
Admin can be used to create, drop, list, enable and disable and otherwise modify tables, as well as perform other administrative operations.
Since:0.99.0

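This is the interface for creating, dropping, enabling and disabling tables and similar administrative operations. Older code created an HBaseAdmin directly, which is deprecated; prefer the newer Admin interface. An Admin is obtained from the connection: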

Admin admin = connection.getAdmin();
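Admin is light-weight and Closeable, so a common pattern is to obtain it for one task and close it right afterwards. A minimal sketch that lists every table with its enabled/disabled state, assuming an already open Connection; printTables is a hypothetical helper name.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

public class AdminSketch {
    // Prints each table name together with its state.
    public static void printTables(Connection connection) throws IOException {
        // try-with-resources closes the Admin when the listing is done.
        try (Admin admin = connection.getAdmin()) {
            for (TableName name : admin.listTableNames()) {
                String state = admin.isTableEnabled(name) ? "enabled" : "disabled";
                System.out.println(name.getNameAsString() + " (" + state + ")");
            }
        }
    }
}
```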

TableName

org.apache.hadoop.hbase.TableName

This class describes a table name; it converts our string (the table name) into the form HBase understands:

TableName tname = TableName.valueOf(tablename);
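Table names may carry a namespace, written namespace:table (the full example uses jiangtao:test). A small sketch of how TableName splits such names; a name without a namespace belongs to the namespace called default.

```java
import org.apache.hadoop.hbase.TableName;

public class TableNameSketch {
    public static void main(String[] args) {
        // "namespace:qualifier" form, as used by the full example
        TableName withNs = TableName.valueOf("jiangtao:test");
        // Equivalent two-argument form
        TableName sameName = TableName.valueOf("jiangtao", "test");
        // A name without a namespace falls into the "default" namespace
        TableName noNs = TableName.valueOf("test");

        System.out.println(withNs.getNamespaceAsString()); // jiangtao
        System.out.println(withNs.getQualifierAsString()); // test
        System.out.println(withNs.equals(sameName));       // true
        System.out.println(noNs.getNamespaceAsString());   // default
    }
}
```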

HTableDescriptor

org.apache.hadoop.hbase.HTableDescriptor
HTableDescriptor contains the details about an HBase table such as the descriptors of all the column families, is the table a catalog table, hbase:meta , if the table is read only, the maximum size of the memstore, when the region split should occur, coprocessors associated with it etc…
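Note, however, that this class is deprecated in newer releases. The notice below comes from the HBase 2.x documentation; in 1.1.2, which this article targets, HTableDescriptor is still the standard API: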
Deprecated.
As of release 2.0.0, this will be removed in HBase 3.0.0. Use TableDescriptorBuilder to build HTableDescriptor.

This is the table descriptor class:

HTableDescriptor tDescriptor = new HTableDescriptor(tname);
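For HBase 2.x clients, where HTableDescriptor is deprecated as noted above, the same table definition can be built with TableDescriptorBuilder. This sketch assumes the HBase 2.x client library (it will not compile against 1.1.2); describe is a hypothetical helper name.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class TableDescriptorSketch {
    // Builds an immutable TableDescriptor with one column family per name (HBase 2.x API).
    public static TableDescriptor describe(String tableName, String... families) {
        TableDescriptorBuilder builder = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName));
        for (String cf : families) {
            // of(String) creates a family descriptor with default settings
            builder.setColumnFamily(ColumnFamilyDescriptorBuilder.of(cf));
        }
        return builder.build();
    }
}
```

In 2.x the result is passed to admin.createTable(descriptor), the same way HTableDescriptor is used in 1.x.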

HColumnDescriptor

org.apache.hadoop.hbase.HColumnDescriptor
An HColumnDescriptor contains information about a column family such as the number of versions, compression settings, etc. It is used as input when creating a table or adding a column.

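This is the column-family descriptor class. It holds settings such as the number of versions and the compression type, and it is used when creating a table or adding a column family: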

HColumnDescriptor family = new HColumnDescriptor(cf);
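A sketch of setting the two properties mentioned above, version count and compression, on a 1.x column family. The values (3 versions, GZ) are illustrative assumptions: GZ is picked only because it typically works without extra native libraries, whereas codecs such as Snappy must be installed on the region servers. describe is a hypothetical helper name.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.compress.Compression;

public class FamilyDescriptorSketch {
    // Builds a table descriptor whose family "i" keeps 3 versions and is GZ-compressed.
    public static HTableDescriptor describe() {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("jiangtao:test"));
        HColumnDescriptor family = new HColumnDescriptor("i");
        family.setMaxVersions(3);                             // keep up to 3 versions per cell
        family.setCompressionType(Compression.Algorithm.GZ); // compress the family's store files
        table.addFamily(family);
        return table;
    }
}
```

The descriptor is then passed to admin.createTable(...) as in the full example.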

Put

org.apache.hadoop.hbase.client.Put
Used to perform Put operations for a single row.
To perform a Put, instantiate a Put object with the row to insert to, and for each column to be inserted, execute addColumn(byte[], byte[], byte[]), or addColumn(byte[], byte[], long, byte[]) if setting the timestamp.

Data can be written one row at a time or in batches. For a batch, create a List and add the Put objects to it:

Table table = connection.getTable(tableName);
List<Put> batPut = new ArrayList<Put>();
// i is the loop index when inserting several rows
Put put = new Put(Bytes.toBytes("rowkey_"+i)); // the rowkey to insert
put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("username"), Bytes.toBytes("un_"+i)); // column family, qualifier, value
batPut.add(put);
table.put(batPut);
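The Put documentation above mentions the overload that sets an explicit timestamp. A sketch of one row built with both addColumn overloads, using the column family i from this article; buildRow and the timestamp value are hypothetical. Note that age is stored as a 4-byte int here, so it has to be read back with Bytes.toInt rather than Bytes.toString.

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutSketch {
    static final byte[] CF_I = Bytes.toBytes("i");

    // Builds one row: username gets the server-assigned timestamp, age gets an explicit one.
    public static Put buildRow(int i, long ts) {
        Put put = new Put(Bytes.toBytes("rowkey_" + i));
        put.addColumn(CF_I, Bytes.toBytes("username"), Bytes.toBytes("un_" + i));
        put.addColumn(CF_I, Bytes.toBytes("age"), ts, Bytes.toBytes(30)); // int stored as 4 bytes
        return put;
    }
}
```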

Get

org.apache.hadoop.hbase.client.Get
Used to perform Get operations on a single row.
To get everything for a row, instantiate a Get object with the row to get. To further narrow the scope of what to Get, use the methods below.
To get all columns from specific families, execute addFamily for each family to retrieve.
To get specific columns, execute addColumn for each column to retrieve.
To only retrieve columns within a specific range of version timestamps, execute setTimeRange.
To only retrieve columns with a specific timestamp, execute setTimestamp.
To limit the number of versions of each column to be returned, execute setMaxVersions.
To add a filter, call setFilter.

Get wraps the request parameters, such as the rowkey and filters:

List<Get> gets = new ArrayList<Get>(); // collect the requests for a batch get
Get get = new Get(Bytes.toBytes("rowkey_"+i)); // the rowkey to query
gets.add(get);
Result[] results = table.get(gets); // the results come back as a Result[]
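The Get documentation lists several ways to narrow a request. A sketch that combines addColumn, addFamily, setMaxVersions and setTimeRange on one Get, written against the 1.x API where the last two declare IOException; narrowGet and the chosen values are hypothetical. Asking for 3 versions only returns more than one if the family was created to keep that many.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.util.Bytes;

public class GetSketch {
    // Builds a Get for rowkey_i limited to i:username plus all of family j.
    public static Get narrowGet(int i) {
        Get get = new Get(Bytes.toBytes("rowkey_" + i));
        get.addColumn(Bytes.toBytes("i"), Bytes.toBytes("username")); // only this column of family i
        get.addFamily(Bytes.toBytes("j"));                            // every column of family j
        try {
            get.setMaxVersions(3);                            // up to 3 versions per column
            get.setTimeRange(0L, System.currentTimeMillis()); // timestamps in [0, now)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return get;
    }
}
```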

Result

org.apache.hadoop.hbase.client.Result
Single row result of a Get or Scan query.
This class is NOT THREAD SAFE.
Convenience methods are available that return various Map structures and values directly.
To get a complete mapping of all cells in the Result, which can include multiple families and multiple versions, use getMap().
To get a mapping of each family to its columns (qualifiers and values), including only the latest version of each, use getNoVersionMap(). To get a mapping of qualifiers to latest values for an individual family use getFamilyMap(byte[]).
To get the latest value for a specific family and qualifier use getValue(byte[], byte[]). A Result is backed by an array of Cell objects, each representing an HBase cell defined by the row, family, qualifier, timestamp, and value.
The underlying Cell objects can be accessed through the method listCells(). This will create a List from the internal Cell []. Better is to exploit the fact that a new Result instance is a primed CellScanner; just call advance() and current() to iterate over Cells as you would any CellScanner. Call cellScanner() to reset should you need to iterate the same Result over again (CellScanners are one-shot). If you need to overwrite a Result with another Result instance – as in the old ‘mapred’ RecordReader next invocations – then create an empty Result with the null constructor and in then use copyFrom(Result)

A non-thread-safe class that wraps the result returned by HBase:

Result[] results = table.get(gets);
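Besides iterating over cells, the convenience method getValue(family, qualifier) returns the latest value of one column, or null when the column is absent. A sketch that reads the username and age columns written by the full example (age as an int); describe is a hypothetical helper name.

```java
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ResultSketch {
    static final byte[] CF_I = Bytes.toBytes("i");

    // Formats one row; username is a String, age was written with Bytes.toBytes(int).
    public static String describe(Result result) {
        String username = Bytes.toString(result.getValue(CF_I, Bytes.toBytes("username")));
        byte[] ageBytes = result.getValue(CF_I, Bytes.toBytes("age"));
        // getValue returns null for a missing column, so guard before decoding
        String age = ageBytes == null ? "n/a" : String.valueOf(Bytes.toInt(ageBytes));
        return Bytes.toString(result.getRow()) + ": username=" + username + ", age=" + age;
    }
}
```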

CellScanner

org.apache.hadoop.hbase.CellScanner

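An iterator over the Cells of a Result; obtain it with result.cellScanner() and loop with advance() and current():

// result is one element of the Result[] returned by table.get(gets)
CellScanner cellScanner = result.cellScanner();
while(cellScanner.advance()){
    Cell cell = cellScanner.current();
    // Read the fields out of the cell with the CellUtil utility class and print them
    String family = Bytes.toString(CellUtil.cloneFamily(cell));
    String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
    String rowkey = Bytes.toString(CellUtil.cloneRow(cell));
    String value = Bytes.toString(CellUtil.cloneValue(cell));
    System.out.println("rowkey:"+rowkey+",columnfamily:"+family+",qualifier:"+qualifier+",value:"+value);
}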

Cell

org.apache.hadoop.hbase.Cell
The unit of storage in HBase consisting of the following fields:
1) row
2) column family
3) column qualifier
4) timestamp
5) type
6) MVCC version
7) value

In short, a Cell is the smallest unit of a result set.
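To see the fields listed above on real data, this sketch formats row, family, qualifier, timestamp and value for every cell of a Result, iterating over rawCells() instead of a CellScanner. It assumes all values are strings; format is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CellFieldsSketch {
    // Returns one "row/family:qualifier@timestamp=value" line per cell.
    public static List<String> format(Result result) {
        List<String> lines = new ArrayList<>();
        if (result.isEmpty()) {
            return lines; // no cells, e.g. the rowkey was not found
        }
        for (Cell cell : result.rawCells()) {
            lines.add(Bytes.toString(CellUtil.cloneRow(cell)) + "/"
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + "@"
                    + cell.getTimestamp() + "="
                    + Bytes.toString(CellUtil.cloneValue(cell)));
        }
        return lines;
    }
}
```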

Below is the complete code:

package com.jiangtao.asiainfo;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.NavigableMap;
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
/**
 * HBase operation: create a table
 * 1. HBaseConfiguration.create(): obtain the configuration conf
 * 2. conf.set(): set ZooKeeper and other parameters (Kerberos authentication, etc.)
 * 3. ConnectionFactory.createConnection(configuration): obtain the connection conn
 * 4. conn.getAdmin(): obtain Admin, the class for table operations (HBaseAdmin is deprecated)
 * 5. Create a TableName describing the table name: TableName tname = TableName.valueOf(tablename);
 * 6. Create the table descriptor: HTableDescriptor tDescriptor = new HTableDescriptor(tname);
 * 7. Create a column-family descriptor: HColumnDescriptor family = new HColumnDescriptor(cf);
 * 8. Add the column-family descriptor to the table descriptor: tDescriptor.addFamily(family);
 * 9. Call admin to create the table: admin.createTable(tDescriptor);
 *
 * HBase operation: insert data
 * 1. HBaseConfiguration.create(): obtain the configuration conf
 * 2. conf.set(): set ZooKeeper and other parameters (Kerberos authentication, etc.)
 * 3. ConnectionFactory.createConnection(configuration): obtain the connection conn
 * 4. Create a TableName describing the table name: TableName tname = TableName.valueOf(tablename);
 * 5. Get the table object from conn: Table table = connection.getTable(tableName);
 * 6.1. Single-row insert: table.put(Put)
 * 6.2. Batch insert, collecting the Put objects in a list: List<Put> batPut = new ArrayList<Put>();
 *      Put put = new Put(Bytes.toBytes("rowkey_" + i)); // the rowkey to insert
 *      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("username"), Bytes.toBytes("un_" + i)); // family, qualifier, value
 *      batPut.add(put);
 *      table.put(batPut);
 *
 * HBase operation: read data
 * 1. HBaseConfiguration.create(): obtain the configuration conf
 * 2. conf.set(): set ZooKeeper and other parameters (Kerberos authentication, etc.)
 * 3. ConnectionFactory.createConnection(configuration): obtain the connection conn
 * 4. Create a TableName describing the table name: TableName tname = TableName.valueOf(tablename);
 * 5. Get the table object from conn: Table table = connection.getTable(tableName);
 * 6. Collect the requests for a batch get: List<Get> gets = new ArrayList<Get>();
 *      Get get = new Get(Bytes.toBytes("rowkey_" + i)); // the rowkey to query
 *      gets.add(get);
 * 7. Result[] results = table.get(gets); // the results come back as a Result[]
 * 8. Iterate over the cells with CellScanner cellScanner = result.cellScanner();
 *      while (cellScanner.advance()) {
 *          Cell cell = cellScanner.current();
 *          // use the CellUtil utility class to extract each field from the cell
 *          String family = Bytes.toString(CellUtil.cloneFamily(cell));
 *          String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
 *          String rowkey = Bytes.toString(CellUtil.cloneRow(cell));
 *          String value = Bytes.toString(CellUtil.cloneValue(cell));
 *          System.out.println("rowkey:" + rowkey + ",columnfamily:" + family + ",qualifier:" + qualifier + ",value:" + value);
 *      }
 *
 * @author jiangtao
 */
public class HbaseTest {
    public Connection connection;
    // HBaseConfiguration.create() automatically loads hbase-site.xml from the application's classpath
    public static Configuration configuration = HBaseConfiguration.create();
    public Table table;
    public Admin admin;
    public HBaseAdmin ad;

    public HbaseTest() throws Exception{
        //ad = new HBaseAdmin(configuration); // deprecated; use Admin instead
        configuration.set("hbase.zookeeper.quorum","bd36,bd37,bd38,bd66,bd67");
        configuration.set("hbase.zookeeper.property.clientPort","2181");
        configuration.set("zookeeper.znode.parent", "/hbase-unsecure");
        // Initialize the connection
        connection = ConnectionFactory.createConnection(configuration);
        admin = connection.getAdmin();
    }
    // Create a table
    public void createTable(String tablename,String... cf1) throws Exception{
        // Get an Admin; it is Closeable, so try-with-resources releases it when done
        try (Admin admin = connection.getAdmin()) {
            // TableName describes the table name, e.g. bd17:mytable (namespace:table)
            TableName tname = TableName.valueOf(tablename);
            // Check whether the table already exists
            if(admin.tableExists(tname)){
                System.out.println("Table "+tablename+" already exists");
                return;
            }
            // HTableDescriptor describes the table
            HTableDescriptor tDescriptor = new HTableDescriptor(tname);
            // Add the column families
            for(String cf:cf1){
                HColumnDescriptor family = new HColumnDescriptor(cf);
                tDescriptor.addFamily(family);
            }
            // Create the table through admin
            admin.createTable(tDescriptor);
            System.out.println("Table "+tablename+" created successfully");
        }
    }
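    // Drop a table (it must be disabled first)
    public void deleteTable(String tablename) throws Exception{
        try (Admin admin = connection.getAdmin()) {
            TableName tName = TableName.valueOf(tablename);
            if(admin.tableExists(tName)){
                // disableTable fails on a table that is already disabled
                if(admin.isTableEnabled(tName)){
                    admin.disableTable(tName);
                }
                admin.deleteTable(tName);
                System.out.println("Table "+tablename+" dropped successfully!");
            }else{
                System.out.println("Table "+tablename+" does not exist.");
            }
        }
    }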
    // Insert data into the table with Put
    public void putData(String table_name) throws Exception{
        TableName tableName = TableName.valueOf(table_name);
        // Table must be closed after use
        try (Table table = connection.getTable(tableName)) {
            Random random = new Random();
            List<Put> batPut = new ArrayList<Put>();
            for(int i=0;i<10;i++){
                // The Put constructor takes the rowkey, here rowkey_i
                // (Bytes is a utility class that converts between Java primitive types and byte arrays)
                Put put = new Put(Bytes.toBytes("rowkey_"+i));
                put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("username"), Bytes.toBytes("un_"+i));
                // age is stored as a 4-byte int; read it back with Bytes.toInt
                put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("age"), Bytes.toBytes(random.nextInt(50)+1));
                put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("birthday"), Bytes.toBytes("20170"+i+"01"));
                put.addColumn(Bytes.toBytes("j"), Bytes.toBytes("phone"), Bytes.toBytes("phone_"+i));
                put.addColumn(Bytes.toBytes("j"), Bytes.toBytes("email"), Bytes.toBytes("email_"+i));
                // Single-record alternative: table.put(put);
                batPut.add(put);
            }
            // Batch put: all rows in one call
            table.put(batPut);
        }
        System.out.println("Data inserted into the table successfully!");
    }
    // Decode a value for printing: "age" was written with Bytes.toBytes(int), the other columns as Strings
    private static String decode(String qualifier, byte[] value) {
        return "age".equals(qualifier) ? String.valueOf(Bytes.toInt(value)) : Bytes.toString(value);
    }

    // Read data with Get
    public void getData(String table_Name) throws Exception{
        TableName tableName = TableName.valueOf(table_Name);
        // Table is light-weight but not thread-safe, and must be closed after use
        try (Table table = connection.getTable(tableName)) {
            // Build the Get objects
            List<Get> gets = new ArrayList<Get>();
            for(int i=0;i<5;i++){
                Get get = new Get(Bytes.toBytes("rowkey_"+i));
                gets.add(get);
            }
            Result[] results = table.get(gets);
            // Read the results row by row
            for(Result result:results){
                // Skip rowkeys that were not found
                if(result.isEmpty()){
                    continue;
                }
                // Approach 1: walk the full map, family -> qualifier -> timestamp -> value
                NavigableMap<byte[],NavigableMap<byte[],NavigableMap<Long,byte[]>>> maps = result.getMap();
                for(byte[] cf:maps.keySet()){
                    NavigableMap<byte[],NavigableMap<Long,byte[]>> valueWithColumnQualifier = maps.get(cf);
                    for(byte[] columnQualifier:valueWithColumnQualifier.keySet()){
                        String qualifier = Bytes.toString(columnQualifier);
                        NavigableMap<Long,byte[]> valueWithTimeStamp = valueWithColumnQualifier.get(columnQualifier);
                        for(Long ts:valueWithTimeStamp.keySet()){
                            System.out.println("rowkey:"+Bytes.toString(result.getRow())+",columnFamily:"+Bytes.toString(cf)
                                    +",columnQualifier:"+qualifier+",timestamp:"+new Date(ts)
                                    +",value:"+decode(qualifier, valueWithTimeStamp.get(ts)));
                        }
                    }
                }
                // Approach 2: fetch a value directly by column family and qualifier
                System.out.println("rowkey:"+Bytes.toString(result.getRow())+",columnfamily:i,columnqualifier:username,value:"
                        +Bytes.toString(result.getValue(Bytes.toBytes("i"), Bytes.toBytes("username"))));
                System.out.println("rowkey:"+Bytes.toString(result.getRow())+",columnfamily:i,columnqualifier:age,value:"
                        +Bytes.toInt(result.getValue(Bytes.toBytes("i"), Bytes.toBytes("age"))));
                // Approach 3: iterate over the cells with a CellScanner
                CellScanner cellScanner = result.cellScanner();
                while(cellScanner.advance()){
                    Cell cell = cellScanner.current();
                    // Use the CellUtil utility class to extract each field from the cell
                    String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    String rowkey = Bytes.toString(CellUtil.cloneRow(cell));
                    String value = decode(qualifier, CellUtil.cloneValue(cell));
                    System.out.println("rowkey:"+rowkey+",columnfamily:"+family+",qualifier:"+qualifier+",value:"+value);
                }
            }
        }
    }
    // Close the resources
    public void cleanUp() throws Exception{
        if(admin != null){
            admin.close();
        }
        connection.close();
    }

    public static void main(String[] args) throws Exception {
        HbaseTest hbaseTest = new HbaseTest();
        // The namespace "jiangtao" must already exist, otherwise createTable fails
        hbaseTest.createTable("jiangtao:test", "i","j");
        hbaseTest.putData("jiangtao:test");
        hbaseTest.getData("jiangtao:test");
        hbaseTest.cleanUp();
    }
}
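The main method creates jiangtao:test, and creating a table in a namespace that does not exist fails with a NamespaceNotFoundException. A minimal sketch that creates the namespace first, assuming the connecting user is allowed to create namespaces (with Kerberos enabled this normally requires admin rights); ensureNamespaceFor and namespaceOf are hypothetical helper names.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

public class NamespaceSketch {
    // Extracts the namespace part of "namespace:table"; plain names map to "default".
    public static String namespaceOf(String tableName) {
        return TableName.valueOf(tableName).getNamespaceAsString();
    }

    // Creates the table's namespace if it is missing.
    public static void ensureNamespaceFor(Connection connection, String tableName) throws IOException {
        String namespace = namespaceOf(tableName);
        try (Admin admin = connection.getAdmin()) {
            for (NamespaceDescriptor existing : admin.listNamespaceDescriptors()) {
                if (existing.getName().equals(namespace)) {
                    return; // already there, nothing to do
                }
            }
            admin.createNamespace(NamespaceDescriptor.create(namespace).build());
        }
    }
}
```

For example, calling NamespaceSketch.ensureNamespaceFor(hbaseTest.connection, "jiangtao:test") before hbaseTest.createTable(...) would make the main method work on a fresh cluster.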