
Learning Sqoop

1. Introduction

Official site: https://sqoop.apache.org/

There are two major lines: Sqoop1 and Sqoop2.

Note: the latest stable release is 1.4.7 (download, documentation). The latest Sqoop2 release is 1.99.7 (download, documentation). 1.99.7 is not compatible with 1.4.7 and is not feature-complete; it is not intended for production deployment. In short, Sqoop2 should not be used in production.

Sqoop => "SQL to Hadoop". Sqoop is an open-source tool used mainly to move data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL): it can import data from a traditional database into HDFS, and export data from HDFS back into a relational database.

History: the project started in 2009 as a third-party module for Hadoop; later, to simplify deployment and speed up iteration, Sqoop was spun out as an independent Apache project. Apache has since retired the Sqoop project (it was moved to the Apache Attic in 2021).

2. How Sqoop Works

Import and export commands are translated into MapReduce jobs for execution.

The core of this is customizing MapReduce's InputFormat and OutputFormat.
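A concrete way to see the MapReduce translation: an import is a map-only job in which each mapper reads one range of the source table. A sketch, reusing the connection details from the examples later in this document (--split-by column and mapper count are illustrative):

# Sqoop first runs SELECT MIN(id), MAX(id) FROM student, cuts that range into
# --num-mappers splits, and each map task issues a bounded query such as
#   SELECT id, num, name FROM student WHERE id >= lo AND id < hi
./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--table student \
--split-by id \
--num-mappers 4 \
--target-dir /user/sqoop_split_demo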

3. Installation and Deployment

3.1 Download the Package

Download from: https://archive.apache.org/dist/sqoop/


Upload the package to the server and extract it; you should see the following layout:

[root@node1 sqoop]# pwd
/soft/sqoop
[root@node1 sqoop]# ls
bin        CHANGELOG.txt  conf  ivy      lib          NOTICE.txt   README.txt       sqoop-patch-review.py  src
build.xml  COMPILING.txt  docs  ivy.xml  LICENSE.txt  pom-old.xml  sqoop-1.4.6.jar  sqoop-test-1.4.6.jar   testdata
[root@node1 sqoop]# 

3.2 Edit the Configuration File

# Copy the template configuration file, keeping the template as a backup
[root@node1 conf]# cp sqoop-env-template.sh sqoop-env.sh
# Edit the configuration file
[root@node1 conf]# vim sqoop-env.sh

# Configure the Hadoop and Hive locations first
export HADOOP_COMMON_HOME=/soft/hadoop-3.3.0
export HADOOP_MAPRED_HOME=/soft/hadoop-3.3.0
export HIVE_HOME=/soft/hive
# ZOOCFGDIR and HBASE_HOME can be left unset for now

3.3 Copy the JDBC Driver
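Sqoop picks up JDBC drivers from its lib directory, so the MySQL connector jar must be placed there. A minimal sketch, assuming the jar has already been downloaded to /root (the path and driver version are illustrative):

# copy the MySQL JDBC driver into Sqoop's lib directory
cp /root/mysql-connector-java-5.1.37.jar /soft/sqoop/lib/

After copying, the jar should be visible in lib: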

[root@node1 lib]# ls *mysql*
mysql-connector-java-5.1.37.jar
[root@node1 lib]# pwd
/soft/sqoop/lib
[root@node1 lib]#

3.4 Verify Sqoop

Run the command ./sqoop help:

[root@node1 bin]# ./sqoop help
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2022-11-03 22:37:32,708 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
[root@node1 bin]#

Test the MySQL connection:

[root@node1 bin]# ./sqoop list-databases --connect jdbc:mysql://192.168.141.155/sqoop --username root --password hadoop
..............................
information_schema
ahs
cm
hive3
mysql
performance_schema
sqoop
sys

OK, the environment is working.

4. Importing Data (import)

4.1 Option Reference

Official user guide: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html

Import: from a relational database (MySQL, Oracle) into the big-data cluster (HDFS, Hive, HBase), using the import keyword.

Option                    Meaning
--connect                 JDBC connection URL
--username                JDBC username
--password                JDBC password
--table                   source table in MySQL
--target-dir              target output directory on HDFS
--delete-target-dir       delete the target directory before the import if it already exists
--fields-terminated-by    field delimiter
--query                   SQL query to import; used instead of --table, and must contain $CONDITIONS;
                          quote the SQL with single quotes; with double quotes, $CONDITIONS must be escaped manually
--where                   WHERE filter; conflicts with the query given in --query

4.2 Prepare the Data

drop table if exists student;
create table student (
    id   int auto_increment primary key,
    num  int,
    name varchar(20)
) charset = utf8;
INSERT INTO student(num, name) VALUES (95001, '李勇');
INSERT INTO student(num, name) VALUES (95002, '刘晨');
INSERT INTO student(num, name) VALUES (95003, '王敏');
INSERT INTO student(num, name) VALUES (95004, '张立');
INSERT INTO student(num, name) VALUES (95005, '刘刚');
INSERT INTO student(num, name) VALUES (95006, '孙庆');
INSERT INTO student(num, name) VALUES (95007, '易思玲');
INSERT INTO student(num, name) VALUES (95008, '李娜');
INSERT INTO student(num, name) VALUES (95009, '梦圆圆');
INSERT INTO student(num, name) VALUES (95010, '孔小涛');
.......
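These statements can be run from the shell; a sketch, assuming they were saved to a file named student.sql (the file name is hypothetical):

# load the sample rows into the sqoop database
mysql -uroot -phadoop sqoop < student.sql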

4.3 Full Table Import

[root@node1 bin]# ./sqoop  import \
--connect jdbc:mysql://192.168.141.155/sqoop  \
--username root --password hadoop  \
--table student \
--target-dir /user/sqoop_mysql_to_hdfs  \
--delete-target-dir  \
--num-mappers 1 \
--fields-terminated-by "\t"Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2022-11-03 23:15:35,237 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
2022-11-03 23:15:35,304 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2022-11-03 23:15:35,389 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2022-11-03 23:15:35,389 INFO tool.CodeGenTool: Beginning code generation
2022-11-03 23:15:35,652 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
2022-11-03 23:15:35,671 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
2022-11-03 23:15:35,676 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /soft/hadoop-3.3.0
Note: /tmp/sqoop-root/compile/14cc91c986370cd735b8a241a21c6874/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2022-11-03 23:15:36,758 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/14cc91c986370cd735b8a241a21c6874/student.jar
2022-11-03 23:15:37,458 INFO tool.ImportTool: Destination directory /user/sqoop_mysql_to_hdfs is not present, hence not deleting.
2022-11-03 23:15:37,458 WARN manager.MySQLManager: It looks like you are importing from mysql.
2022-11-03 23:15:37,458 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2022-11-03 23:15:37,458 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2022-11-03 23:15:37,458 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2022-11-03 23:15:37,462 INFO mapreduce.ImportJobBase: Beginning import of student
2022-11-03 23:15:37,463 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2022-11-03 23:15:37,472 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2022-11-03 23:15:37,488 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2022-11-03 23:15:37,573 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.141.151:8032
2022-11-03 23:15:37,910 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1663767415605_0024
2022-11-03 23:15:39,401 INFO db.DBInputFormat: Using read commited transaction isolation
2022-11-03 23:15:39,455 INFO mapreduce.JobSubmitter: number of splits:1
2022-11-03 23:15:39,566 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1663767415605_0024
2022-11-03 23:15:39,566 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-11-03 23:15:39,726 INFO conf.Configuration: resource-types.xml not found
2022-11-03 23:15:39,726 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-11-03 23:15:39,996 INFO impl.YarnClientImpl: Submitted application application_1663767415605_0024
2022-11-03 23:15:40,033 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1663767415605_0024/
2022-11-03 23:15:40,034 INFO mapreduce.Job: Running job: job_1663767415605_0024
2022-11-03 23:15:49,194 INFO mapreduce.Job: Job job_1663767415605_0024 running in uber mode : false
2022-11-03 23:15:49,195 INFO mapreduce.Job:  map 0% reduce 0%
2022-11-03 23:16:05,319 INFO mapreduce.Job:  map 100% reduce 0%
2022-11-03 23:16:05,331 INFO mapreduce.Job: Job job_1663767415605_0024 completed successfully
2022-11-03 23:16:05,421 INFO mapreduce.Job: Counters: 33
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=273519
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=737
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=13522
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=13522
                Total vcore-milliseconds taken by all map tasks=13522
                Total megabyte-milliseconds taken by all map tasks=13846528
        Map-Reduce Framework
                Map input records=44
                Map output records=44
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=67
                CPU time spent (ms)=810
                Physical memory (bytes) snapshot=199610368
                Virtual memory (bytes) snapshot=2785349632
                Total committed heap usage (bytes)=148897792
                Peak Map Physical memory (bytes)=199610368
                Peak Map Virtual memory (bytes)=2785349632
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=737
2022-11-03 23:16:05,428 INFO mapreduce.ImportJobBase: Transferred 737 bytes in 27.9275 seconds (26.3898 bytes/sec)
2022-11-03 23:16:05,433 INFO mapreduce.ImportJobBase: Retrieved 44 records.

The log shows the import succeeded.

Verify the imported data on HDFS:
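For example, from the shell (a single map task writes a single part file; the exact file name may vary):

# print the imported file
hdfs dfs -cat /user/sqoop_mysql_to_hdfs/part-m-00000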


The data was imported successfully.

4.4 Import with a Query

[root@node1 bin]# ./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--target-dir /user/sqoop_mysql_to_hdfs \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select id , name from student ;'
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2022-11-03 23:33:57,756 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
2022-11-03 23:33:57,822 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2022-11-03 23:33:57,897 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2022-11-03 23:33:57,897 INFO tool.CodeGenTool: Beginning code generation
2022-11-03 23:33:57,902 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id , name from student ;] must contain '$CONDITIONS' in WHERE clause.
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

The query must contain $CONDITIONS, so the command is corrected to:

[root@node1 bin]# ./sqoop import --connect jdbc:mysql://192.168.141.155/sqoop --username root --password hadoop --target-dir /user/sqoop_mysql_to_hdfs --delete-target-dir --num-mappers 1 --fields-terminated-by "\t" --query 'select id , name from student where 1=1 and $CONDITIONS;'

Note: quote the SQL in --query with single quotes; if you use double quotes, $CONDITIONS must be escaped manually.
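A sketch of the double-quoted form (the same job as above, only the quoting changes); the backslash keeps the shell from expanding $CONDITIONS before Sqoop sees it:

./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--target-dir /user/sqoop_mysql_to_hdfs \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "select id , name from student where 1=1 and \$CONDITIONS;"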

Verification: the import succeeded.


4.5 Selecting Columns to Import

The new option here is:

--columns id,name

[root@node1 bin]# ./sqoop  import \
--connect jdbc:mysql://192.168.141.155/sqoop  \
--username root --password hadoop  \
--table student \
--columns id,name  \
--target-dir /user/sqoop_mysql_to_hdfs  \
--delete-target-dir  \
--num-mappers 1 \
--fields-terminated-by "\t"

4.6 Filtering with a WHERE Condition

[root@node1 bin]# ./sqoop  import \
--connect jdbc:mysql://192.168.141.155/sqoop  \
--username root --password hadoop  \
--table student \
--columns id,name  \
--where "id=1"  \
--target-dir /user/sqoop_mysql_to_hdfs  \
--delete-target-dir  \
--num-mappers 1 \
--fields-terminated-by "\t"

4.7 Import into a Hive Table

[root@node1 bin]# ./sqoop  import \
--connect jdbc:mysql://192.168.141.155/sqoop  \
--username root --password hadoop  \
--table student \
--columns id,name  \
--hive-import  \
--hive-overwrite \
--num-mappers 1 \
--fields-terminated-by "\t" \
--hive-table sqooptest_mysql_tohive 

Verification: the import succeeded.
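A quick check from the Hive CLI, as a sketch (assumes hive is on the PATH):

# query the imported Hive table
hive -e "select * from sqooptest_mysql_tohive limit 5;"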


5. Exporting Data (export)

Export: from HDFS/Hive/HBase into a relational database (MySQL, Oracle).

Create a target table:

create table student2 like student;
[root@node1 bin]# ./sqoop  export \
--connect "jdbc:mysql://192.168.141.155/sqoop?useUnicode=true&characterEncoding=utf-8"  \
--username root --password hadoop  \
--table student2 \
--columns id,name  \
--num-mappers 1 \
--input-fields-terminated-by "\t" \
--export-dir /user/sqoop_mysql_to_hdfs 

Note: without ?useUnicode=true&characterEncoding=utf-8, non-ASCII data written to MySQL may come out garbled.

Verification: correct.
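For example, a sketch using the MySQL client with the same credentials as above:

# confirm the exported rows landed in student2
mysql -uroot -phadoop sqoop -e "select * from student2 limit 5;"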

6. Miscellaneous

Compressed Output

Add the codec parameter to the Sqoop command line, typically together with --compress, which turns compression on:

--compression-codec 

Snappy compression:

--compression-codec org.apache.hadoop.io.compress.SnappyCodec

Gzip compression:

--compression-codec org.apache.hadoop.io.compress.GzipCodec

Bzip2 compression:

--compression-codec org.apache.hadoop.io.compress.BZip2Codec
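A complete sketch combining these options (connection settings reused from the earlier examples; the target directory is illustrative):

# full-table import with gzip-compressed output
./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--table student \
--target-dir /user/sqoop_mysql_compressed \
--delete-target-dir \
--num-mappers 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.GzipCodec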
