Overview
Under the hood, Hive runs its computation as MapReduce jobs; the goal here is to switch that execution over to Spark Core by running Hive's tables through Spark SQL.
Configuration steps
1. On a cluster without high availability, simply copy core-site.xml from the Hadoop installation directory into Spark's conf directory.
2. Copy hive-site.xml from the Hive installation directory into Spark's conf directory.
Note:
For a high-availability cluster, you need to copy both core-site.xml and hdfs-site.xml from the Hadoop installation into Spark's conf directory.
Restarting the cluster after these changes is recommended; the copy commands are sketched below.
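A minimal sketch of those copies, assuming HADOOP_HOME, HIVE_HOME and SPARK_HOME point at the respective installation directories (adjust to your own paths):
cp $HADOOP_HOME/etc/hadoop/core-site.xml $SPARK_HOME/conf/
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
# high-availability clusters additionally need:
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $SPARK_HOME/conf/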
Everything below runs through the spark-sql CLI, which lives in the bin directory of the Spark installation.
Start it:
./spark-sql --master spark://hdp-1:7077 --executor-memory 512m --total-executor-cores 2 --jars /root/mysql-connector-java-5.1.39.jar --driver-class-path /root/mysql-connector-java-5.1.39.jar
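About the flags: --master points at the standalone Spark master, --executor-memory and --total-executor-cores cap the session's resources, and --jars plus --driver-class-path put the MySQL JDBC driver on the executor and driver classpaths, which is needed when hive-site.xml stores the metastore in MySQL (an assumption about this particular setup). For a quick sanity check you can also run a one-off statement non-interactively with -e:
./spark-sql --master spark://hdp-1:7077 --executor-memory 512m --total-executor-cores 2 --jars /root/mysql-connector-java-5.1.39.jar --driver-class-path /root/mysql-connector-java-5.1.39.jar -e "show databases;"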
Basic operations:
1. Create a table (columns are space-delimited, matching the data file used below):
create table person1(id int,name string,age int) row format delimited fields terminated by ' ';
Log output:
19/11/20 18:26:09 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:26:09 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:26:09 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:26:09 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:26:09 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:26:09 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:26:10 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:26:10 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:26:10 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:26:10 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:26:10 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:26:10 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:26:10 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:26:10 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:26:11 INFO metastore.HiveMetaStore: 0: create_table: Table(tableName:person1, dbName:default, owner:root, createTime:1574245564, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:age, type:int, comment:null)], location:hdfs://hdp-1:9000/user/hive/warehouse/person1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"id","type":"integer","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"age","type":"integer","nullable":true,"metadata":{}}]}, spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.4.4}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
19/11/20 18:26:11 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=create_table: Table(tableName:person1, dbName:default, owner:root, createTime:1574245564, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:age, type:int, comment:null)], location:hdfs://hdp-1:9000/user/hive/warehouse/person1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"id","type":"integer","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"age","type":"integer","nullable":true,"metadata":{}}]}, spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.4.4}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
19/11/20 18:26:11 WARN metastore.HiveMetaStore: Location: hdfs://hdp-1:9000/user/hive/warehouse/person1 specified for non-external table:person1
19/11/20 18:26:11 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://hdp-1:9000/user/hive/warehouse/person1
Time taken: 9.634 seconds
19/11/20 18:26:13 INFO thriftserver.SparkSQLCLIDriver: Time taken: 9.634 seconds
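Before loading data, you can verify the new table from the same session; show tables and desc are standard HiveQL and work unchanged in spark-sql:
show tables;
desc person1;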
2. Load data (from the local filesystem):
Data source:
vi person.txt
1 blue 20
2 yellow 25
3 red 18
4 blacke 10
5 orange 15
6 white 23
7 green 9
load data local inpath '/root/person.txt' into table person1;
Log output:
19/11/20 18:29:11 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:29:11 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:29:11 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:29:11 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:29:11 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:29:11 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:29:12 INFO spark.ContextCleaner: Cleaned accumulator 0
19/11/20 18:29:13 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:29:13 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:29:13 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:29:13 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:29:13 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:29:13 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:29:13 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:29:13 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:29:13 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/11/20 18:29:20 INFO metadata.Hive: Renaming src: file:/root/person.txt, dest: hdfs://hdp-1:9000/user/hive/warehouse/person1/person.txt, Status:true
19/11/20 18:29:20 INFO metastore.HiveMetaStore: 0: alter_table: db=default tbl=person1 newtbl=person1
19/11/20 18:29:20 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=alter_table: db=default tbl=person1 newtbl=person1
19/11/20 18:29:20 INFO hive.log: Updating table stats fast for person1
19/11/20 18:29:20 INFO hive.log: Updated size of table person1 to 0
19/11/20 18:29:21 INFO metastore.HiveMetaStore: 0: get_database: global_temp
19/11/20 18:29:21 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
19/11/20 18:29:21 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Time taken: 9.933 seconds
19/11/20 18:29:21 INFO thriftserver.SparkSQLCLIDriver: Time taken: 9.933 seconds
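If the file already sits on HDFS, drop the local keyword; this standard HiveQL variant moves (rather than copies) the file into the table's warehouse directory. The HDFS path here is hypothetical:
load data inpath '/data/person.txt' into table person1;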
3. Query:
-- select all rows
select * from person1;
Query result:
1 blue 20
2 yellow 25
3 red 18
4 blacke 10
5 orange 15
6 white 23
7 green 9
Log output:
19/11/20 18:32:25 INFO spark.ContextCleaner: Cleaned accumulator 1
19/11/20 18:32:26 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:32:26 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:32:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 282.5 KB, free 413.6 MB)
19/11/20 18:32:29 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.3 KB, free 413.6 MB)
19/11/20 18:32:29 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hdp-4:36130 (size: 24.3 KB, free: 413.9 MB)
19/11/20 18:32:29 INFO spark.SparkContext: Created broadcast 0 from
19/11/20 18:32:31 INFO mapred.FileInputFormat: Total input paths to process : 1
19/11/20 18:32:31 INFO spark.SparkContext: Starting job: processCmd at CliDriver.java:376
19/11/20 18:32:32 INFO scheduler.DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 2 output partitions
19/11/20 18:32:32 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
19/11/20 18:32:32 INFO scheduler.DAGScheduler: Parents of final stage: List()
19/11/20 18:32:32 INFO scheduler.DAGScheduler: Missing parents: List()
19/11/20 18:32:32 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376), which has no missing parents
19/11/20 18:32:32 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.2 KB, free 413.6 MB)
19/11/20 18:32:32 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.4 KB, free 413.6 MB)
19/11/20 18:32:32 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hdp-4:36130 (size: 4.4 KB, free: 413.9 MB)
19/11/20 18:32:32 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
19/11/20 18:32:32 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1))
19/11/20 18:32:33 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/11/20 18:32:33 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.150.154, executor 0, partition 0, ANY, 7920 bytes)
19/11/20 18:32:33 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.150.152, executor 1, partition 1, ANY, 7920 bytes)
19/11/20 18:32:40 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.150.152:36828 (size: 4.4 KB, free: 117.0 MB)
19/11/20 18:32:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.150.152:36828 (size: 24.3 KB, free: 116.9 MB)
19/11/20 18:32:58 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.150.154:37766 (size: 4.4 KB, free: 117.0 MB)
19/11/20 18:33:13 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.150.154:37766 (size: 24.3 KB, free: 116.9 MB)
19/11/20 18:33:40 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 67156 ms on 192.168.150.152 (executor 1) (1/2)
19/11/20 18:33:48 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 75538 ms on 192.168.150.154 (executor 0) (2/2)
19/11/20 18:33:48 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/11/20 18:33:49 INFO scheduler.DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 76.188 s
19/11/20 18:33:49 INFO scheduler.DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 77.314474 s
1 blue 20
2 yellow 25
3 red 18
4 blacke 10
5 orange 15
6 white 23
7 green 9
Time taken: 84.766 seconds, Fetched 7 row(s)
19/11/20 18:33:50 INFO thriftserver.SparkSQLCLIDriver: Time taken: 84.766 seconds, Fetched 7 row(s)
-- names and ages of everyone older than 20, sorted by age
select name,age from person1 where age > 20 order by age;
Query result:
white 23
yellow 25
Log output:
19/11/20 18:35:11 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:35:11 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:35:16 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on hdp-4:36130 in memory (size: 4.4 KB, free: 413.9 MB)
19/11/20 18:35:16 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.150.154:37766 in memory (size: 4.4 KB, free: 116.9 MB)
19/11/20 18:35:18 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.150.152:36828 in memory (size: 4.4 KB, free: 116.9 MB)
19/11/20 18:35:18 INFO spark.ContextCleaner: Cleaned accumulator 28
19/11/20 18:35:20 INFO codegen.CodeGenerator: Code generated in 3545.120711 ms
19/11/20 18:35:21 INFO codegen.CodeGenerator: Code generated in 15.930455 ms
19/11/20 18:35:21 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 282.4 KB, free 413.3 MB)
19/11/20 18:35:21 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.3 KB, free 413.3 MB)
19/11/20 18:35:21 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hdp-4:36130 (size: 24.3 KB, free: 413.9 MB)
19/11/20 18:35:21 INFO spark.SparkContext: Created broadcast 2 from
19/11/20 18:35:22 INFO codegen.CodeGenerator: Code generated in 196.279074 ms
19/11/20 18:35:22 INFO mapred.FileInputFormat: Total input paths to process : 1
19/11/20 18:35:23 INFO spark.SparkContext: Starting job: processCmd at CliDriver.java:376
19/11/20 18:35:23 INFO scheduler.DAGScheduler: Got job 1 (processCmd at CliDriver.java:376) with 2 output partitions
19/11/20 18:35:23 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (processCmd at CliDriver.java:376)
19/11/20 18:35:23 INFO scheduler.DAGScheduler: Parents of final stage: List()
19/11/20 18:35:23 INFO scheduler.DAGScheduler: Missing parents: List()
19/11/20 18:35:23 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[12] at processCmd at CliDriver.java:376), which has no missing parents
19/11/20 18:35:23 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 11.4 KB, free 413.3 MB)
19/11/20 18:35:23 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 5.8 KB, free 413.3 MB)
19/11/20 18:35:23 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hdp-4:36130 (size: 5.8 KB, free: 413.9 MB)
19/11/20 18:35:23 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1161
19/11/20 18:35:23 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[12] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1))
19/11/20 18:35:23 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
19/11/20 18:35:23 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, 192.168.150.154, executor 0, partition 0, ANY, 7920 bytes)
19/11/20 18:35:23 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, 192.168.150.152, executor 1, partition 1, ANY, 7920 bytes)
19/11/20 18:35:23 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.150.152:36828 (size: 5.8 KB, free: 116.9 MB)
19/11/20 18:35:24 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.150.154:37766 (size: 5.8 KB, free: 116.9 MB)
19/11/20 18:35:24 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.150.154:37766 (size: 24.3 KB, free: 116.9 MB)
19/11/20 18:35:24 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.150.152:36828 (size: 24.3 KB, free: 116.9 MB)
19/11/20 18:35:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 1574 ms on 192.168.150.154 (executor 0) (1/2)
19/11/20 18:35:25 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 2134 ms on 192.168.150.152 (executor 1) (2/2)
19/11/20 18:35:25 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
19/11/20 18:35:25 INFO scheduler.DAGScheduler: ResultStage 1 (processCmd at CliDriver.java:376) finished in 2.152 s
19/11/20 18:35:25 INFO scheduler.DAGScheduler: Job 1 finished: processCmd at CliDriver.java:376, took 2.174645 s
19/11/20 18:35:25 INFO spark.SparkContext: Starting job: processCmd at CliDriver.java:376
19/11/20 18:35:25 INFO scheduler.DAGScheduler: Registering RDD 13 (processCmd at CliDriver.java:376)
19/11/20 18:35:25 INFO scheduler.DAGScheduler: Got job 2 (processCmd at CliDriver.java:376) with 2 output partitions
19/11/20 18:35:25 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (processCmd at CliDriver.java:376)
19/11/20 18:35:25 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
19/11/20 18:35:25 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 2)
19/11/20 18:35:26 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[13] at processCmd at CliDriver.java:376), which has no missing parents
19/11/20 18:35:26 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 13.4 KB, free 413.3 MB)
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 54
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 49
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 41
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 42
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 43
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 53
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 47
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 50
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 58
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 52
19/11/20 18:35:26 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 7.0 KB, free 413.3 MB)
19/11/20 18:35:26 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hdp-4:36130 (size: 7.0 KB, free: 413.9 MB)
19/11/20 18:35:26 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1161
19/11/20 18:35:26 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on hdp-4:36130 in memory (size: 5.8 KB, free: 413.9 MB)
19/11/20 18:35:26 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.150.152:36828 in memory (size: 5.8 KB, free: 116.9 MB)
19/11/20 18:35:26 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[13] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1))
19/11/20 18:35:26 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
19/11/20 18:35:26 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, 192.168.150.152, executor 1, partition 0, ANY, 7909 bytes)
19/11/20 18:35:26 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, 192.168.150.154, executor 0, partition 1, ANY, 7909 bytes)
19/11/20 18:35:26 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.150.154:37766 in memory (size: 5.8 KB, free: 116.9 MB)
19/11/20 18:35:26 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.150.152:36828 (size: 7.0 KB, free: 116.9 MB)
19/11/20 18:35:26 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.150.154:37766 (size: 7.0 KB, free: 116.9 MB)
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 61
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 46
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 40
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 37
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 38
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 59
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 45
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 57
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 56
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 60
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 44
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 39
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 55
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 51
19/11/20 18:35:26 INFO spark.ContextCleaner: Cleaned accumulator 48
19/11/20 18:35:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 3275 ms on 192.168.150.152 (executor 1) (1/2)
19/11/20 18:35:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 3279 ms on 192.168.150.154 (executor 0) (2/2)
19/11/20 18:35:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
19/11/20 18:35:29 INFO scheduler.DAGScheduler: ShuffleMapStage 2 (processCmd at CliDriver.java:376) finished in 3.409 s
19/11/20 18:35:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
19/11/20 18:35:29 INFO scheduler.DAGScheduler: running: Set()
19/11/20 18:35:29 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 3)
19/11/20 18:35:29 INFO scheduler.DAGScheduler: failed: Set()
19/11/20 18:35:29 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[16] at processCmd at CliDriver.java:376), which has no missing parents
19/11/20 18:35:29 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 14.6 KB, free 413.3 MB)
19/11/20 18:35:29 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 7.7 KB, free 413.3 MB)
19/11/20 18:35:29 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on hdp-4:36130 (size: 7.7 KB, free: 413.9 MB)
19/11/20 18:35:29 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1161
19/11/20 18:35:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (MapPartitionsRDD[16] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1))
19/11/20 18:35:29 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
19/11/20 18:35:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 6, 192.168.150.152, executor 1, partition 1, NODE_LOCAL, 7771 bytes)
19/11/20 18:35:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 7, 192.168.150.154, executor 0, partition 0, NODE_LOCAL, 7771 bytes)
19/11/20 18:35:29 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.150.154:37766 (size: 7.7 KB, free: 116.9 MB)
19/11/20 18:35:29 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.150.152:36828 (size: 7.7 KB, free: 116.9 MB)
19/11/20 18:35:32 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.150.154:58098
19/11/20 18:35:32 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.150.152:36110
19/11/20 18:35:38 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 6) in 9009 ms on 192.168.150.152 (executor 1) (1/2)
19/11/20 18:35:38 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 7) in 9009 ms on 192.168.150.154 (executor 0) (2/2)
19/11/20 18:35:38 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
19/11/20 18:35:38 INFO scheduler.DAGScheduler: ResultStage 3 (processCmd at CliDriver.java:376) finished in 9.044 s
19/11/20 18:35:38 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at CliDriver.java:376, took 12.726366 s
white 23
yellow 25
Time taken: 28.984 seconds, Fetched 2 row(s)
19/11/20 18:35:38 INFO thriftserver.SparkSQLCLIDriver: Time taken: 28.984 seconds, Fetched 2 row(s)
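Other HiveQL queries run the same way; for example, a simple aggregate (log output omitted) should report the 7 rows loaded above:
select count(*) from person1;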
4. Drop the table. Because person1 is a managed table (tableType:MANAGED_TABLE in the create log), dropping it removes both the metastore entry and the warehouse directory on HDFS, as the log below confirms:
drop table person1;
Log output:
19/11/20 18:39:02 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:39:02 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_database: default
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=person1
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=person1
19/11/20 18:39:03 INFO metastore.HiveMetaStore: 0: drop_table : db=default tbl=person1
19/11/20 18:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=drop_table : db=default tbl=person1
19/11/20 18:39:03 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:03 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/11/20 18:39:04 INFO metastore.hivemetastoressimpl: deleting hdfs://hdp-1:9000/user/hive/warehouse/person1
19/11/20 18:39:05 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
19/11/20 18:39:05 INFO metastore.hivemetastoressimpl: Deleted the diretory hdfs://hdp-1:9000/user/hive/warehouse/person1
Time taken: 2.705 seconds
19/11/20 18:39:05 INFO thriftserver.SparkSQLCLIDriver: Time taken: 2.705 seconds
Conclusion
That's the complete Hive-on-Spark walkthrough collected by 平淡香烟; hopefully it helps you work through your own Hive-on-Spark problems.