admin 管理员组

文章数量: 887016

RDBMS和HDFS, HIVE, HBASE的迁移工具Sqoop

Sqoop 简介

Apache Sqoop 项目旨在协助 RDBMS 与 Hadoop 之间进行高效的大数据交流。用户可以在 Sqoop 的帮助下,轻松地把关系型数据库的数据导入到 Hadoop 与其相关的系统 ( 如 HBase 和 Hive) 中;同时也可以把数据从 Hadoop 系统里抽取并导出到关系型数据库里。除了这些主要的功能外,Sqoop 也提供了一些诸如查看数据库表等实用的小工具。

Sqoop 支持的数据库

理论上,Sqoop 支持任何一款支持 JDBC 规范的数据库,如 DB2、MySQL 等。在使用 Sqoop 连接关系型数据库前,首先需要把相关的 JDBC 驱动拷贝到 $SQOOP_HOME/lib 文件夹下,然后在“connect”参数后指定好数据库连接的 url,如“--connect jdbc:db2://localhost:50000/SAMPLE”。 对于 DB2 数据库来说,Sqoop 目前支持 DB2 的绝大多数数据类型,而且 Sqoop 的大多数工具也能在 DB2 上较好地运行。

Sqoop 支持的文件类型

Sqoop 能够将 DB2 数据库的数据导入到 HDFS 上,并保存为多种文件类型。常见的有定界文本类型,Avro 二进制类型以及 SequenceFiles 类型。在本文里,统一用定界文本类型。
Available commands:codegen            Generate code to interact with database recordscreate-hive-table  Import a table definition into Hiveeval               Evaluate a SQL statement and display the resultsexport             Export an HDFS directory to a database tablehelp               List available commandsimport             Import a table from a database to HDFSimport-all-tables  Import tables from a database to HDFSlist-databases     List available databases on a serverlist-tables        List available tables in a databaseversion            Display version informationSee 'sqoop help COMMAND' for information on a specific command.



List RDBMS Databases;  ./sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -password root123;

List RDBMS Tables;
./sqoop list-tables --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123;

Import RDBMS Tables;  ./sqoop import --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123 --table USERS;

hadoop@feixu-master1:/usr/local/sqoop/bin$ ./sqoop import --connect jdbc:mysql://192.168.56.102:3306/feixu --username root --password root123 --table RATES_BK;
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
13/12/20 05:03:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/12/20 05:03:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/12/20 05:03:52 INFO tool.CodeGenTool: Beginning code generation
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/12/20 05:03:53 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.jar
13/12/20 05:03:53 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/12/20 05:03:53 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/12/20 05:03:53 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/12/20 05:03:53 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 INFO mapreduce.ImportJobBase: Beginning import of RATES_BK
13/12/20 05:03:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`USER_ID`), MAX(`USER_ID`) FROM `RATES_BK`
13/12/20 05:03:57 INFO mapred.JobClient: Running job: job_201312200120_0010
13/12/20 05:03:58 INFO mapred.JobClient: map 0% reduce 0%
13/12/20 05:04:13 INFO mapred.JobClient: map 25% reduce 0%
13/12/20 05:04:14 INFO mapred.JobClient: map 50% reduce 0%
13/12/20 05:04:15 INFO mapred.JobClient: map 100% reduce 0%
13/12/20 05:04:17 INFO mapred.JobClient: Job complete: job_201312200120_0010
13/12/20 05:04:17 INFO mapred.JobClient: Counters: 18
13/12/20 05:04:17 INFO mapred.JobClient: Job Counters
13/12/20 05:04:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=36108
13/12/20 05:04:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient: Launched map tasks=4
13/12/20 05:04:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/12/20 05:04:17 INFO mapred.JobClient: File Output Format Counters
13/12/20 05:04:17 INFO mapred.JobClient: Bytes Written=216525
13/12/20 05:04:17 INFO mapred.JobClient: FileSystemCounters
13/12/20 05:04:17 INFO mapred.JobClient: HDFS_BYTES_READ=440
13/12/20 05:04:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=268158
13/12/20 05:04:17 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=216525
13/12/20 05:04:17 INFO mapred.JobClient: File Input Format Counters
13/12/20 05:04:17 INFO mapred.JobClient: Bytes Read=0
13/12/20 05:04:17 INFO mapred.JobClient: Map-Reduce Framework
13/12/20 05:04:17 INFO mapred.JobClient: Map input records=10000
13/12/20 05:04:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=152514560
13/12/20 05:04:17 INFO mapred.JobClient: Spilled Records=0
13/12/20 05:04:17 INFO mapred.JobClient: CPU time spent (ms)=3350
13/12/20 05:04:17 INFO mapred.JobClient: Total committed heap usage (bytes)=37519360
13/12/20 05:04:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1391800320
13/12/20 05:04:17 INFO mapred.JobClient: Map output records=10000
13/12/20 05:04:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=440
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Transferred 211.4502 KB in 23.3888 seconds (9.0407 KB/sec)
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Retrieved 10000 records.


本文标签: RDBMS和HDFS Hive HBASE的迁移工具Sqoop