Tag Archives: mysql

Exporting MySQL query results using the CSV engine

MySQL Storage Engines

MySQL supports a quite large number of different storage engines. Each has different characteristics (transactional or non-transactional, foreign key support etc) and table file format. Some of them show very special behaviour and I believe that are worth noting because they can enable a DBA to easily perform various tasks:

CSV
Stores data in the very popular CSV format that can also be processed by common spreadsheet software. There are limitations though.
Federated
This engine can be used for setting up a table representation from a remote MySQL server. You’ll need to supply a connection string in the format:

mysql://user_name[:password]@host_name[:port_num]/db_name/tbl_name
Blackhole
Does not store any data, but checks & executes DML and DDL statements consistently. Can be used for setting up a relay server for replication.

For this post, I am using the CSV engine in combination with the Federated engine. By storing the query results in a table which uses the CSV engine we have actually performed a quite easy and cool export. In addition, using the Federated engine enables us to combine results from a remote server. In our example, each database stores different records based on values of the foreign key field user_id, let’s say because we are implementing sharding and we need aggregated results. Thus, without using any external programming language with a MySQL API to combine resultsets and then write the rows in CSV format we simply let MySQL server do some magic.

Database & Table setup

First we setup the 2 databases we need, named federated_1 & federated_2.

CREATE DATABASE federated_1;
CREATE DATABASE federated_2;

We create a table named t2 in federated_2 database with the following structure:

USE federated_2
CREATE TABLE `t2` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  `description` VARCHAR(50) NOT NULL,
  `user_id` INT(10) UNSIGNED NOT NULL,
  `date_inserted` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

… and a table named t1 in federated_1 database with the same structure as federated_2.t2

USE federated_1
CREATE TABLE `t1` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  `description` VARCHAR(50) NOT NULL,
  `user_id` INT(10) UNSIGNED NOT NULL,
  `date_inserted` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Federated engine tables need a proper CONNECTION table option value. Here we use localhost but it can be any remote host. t2_remote has the same structure but no indexes:

CREATE TABLE `t2_remote` (
  `id` int(11) NOT NULL,
  `description` varchar(50) NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `date_inserted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP 
) ENGINE=FEDERATED DEFAULT CHARSET=utf8 CONNECTION='mysql://root:1234@localhost:3306/federated_2/t2'

We insert some sample data to federated_1.t1 and federated_2.t2 tables. Notice that we are concatenating the database name for the description field so that we can easily recognize local and remote records:

-- Insert some random sample data in table t1
USE federated_1
INSERT INTO t1(id, description, user_id) VALUES(NULL, CONCAT(DATABASE(), "_", MD5(CAST(RAND() AS CHAR))), 2);
-- you can repeat this INSERT command as many times as you want, it'll just double the records each time
INSERT INTO t1(id, description, user_id) 
SELECT NULL, CONCAT(DATABASE(), "_", MD5(CAST(RAND() AS CHAR))), 2 FROM t1;
INSERT INTO t1(id, description, user_id) 
SELECT NULL, CONCAT(DATABASE(), "_", MD5(CAST(RAND() AS CHAR))), 2 FROM t1;

-- Insert some random sample data in table t2
USE federated_2
INSERT INTO t2(id, description, user_id) VALUES(NULL, CONCAT(DATABASE(), "_", MD5(CAST(RAND() AS CHAR))), 1);
-- you can repeat this INSERT command as many times as you want, it'll just double the records each time
INSERT INTO t2(id, description, user_id) 
SELECT NULL, CONCAT(DATABASE(), "_", MD5(CAST(RAND() AS CHAR))), 1 FROM t2;
INSERT INTO t2(id, description, user_id) 
SELECT NULL, CONCAT(DATABASE(), "_", MD5(CAST(RAND() AS CHAR))), 1 FROM t2;

Getting the job done

We’ll need a UNION of the records of the 2 tables for our “report” and we are creating a table which uses the CSV storage engine from that resultset. Notice the limitations of the CSV engine in the comments. I am using DATE_FORMAT to show that you can use MySQL functions to transform your data at your will before exporting:

-- We must explicitly specify column definitions because of the CSV engine limitations:
-- 1) not NULL-able columns
-- 2) no indexes
-- 3) no AUTO_INCREMENT columns
CREATE TABLE table_combined(id INT UNSIGNED NOT NULL, description VARCHAR(50) NOT NULL, user_id INT NOT NULL, date_inserted CHAR(14) NOT NULL) ENGINE=CSV
SELECT id, description, user_id, DATE_FORMAT(date_inserted,"%Y%m%d%H%i%s") AS date_inserted FROM `t1` 
UNION ALL 
SELECT id, description, user_id, DATE_FORMAT(date_inserted,"%Y%m%d%H%i%s") AS date_inserted FROM `t2_remote` ORDER BY description;

Results

Now if you list files in the MySQL data directory of the federated_1 database:

root@host-1:/var/lib/mysql/federated_1# ls -al table_combined.*
-rw-rw---- 1 mysql mysql 35 Nov 5 07:15 table_combined.CSM
-rw-rw---- 1 mysql mysql 19420 Nov 5 07:15 table_combined.CSV
-rw-rw---- 1 mysql mysql 8684 Nov 5 07:15 table_combined.frm

If you open the table_combined.CSV file in an editor you’ll see the following contents:

4,"federated_1_275b5bf71cc676897ecb6ffa11b72eb3",2,"2012-11-04 18:17:04"
2,"federated_1_890e1409f7a007139670d5f1925b9bb0",2,"2012-11-04 18:17:02"
1,"federated_1_97b64f11bca0e439e8f485428f4eb760",2,"2012-11-04 18:16:29"
3,"federated_1_c08581747c9321659e5da35639c08736",2,"2012-11-04 18:17:04"
366,"federated_2_04106e4c4f37cfd6ce7a86217b9eec99",1,"2012-11-04 18:23:34"
...

…and we’re ready! We saw how MySQL server storage engines can help us get resultsets from remote servers and export them in a friendly format without needing a scripting language.

Advertisements
Tagged , , ,
%d bloggers like this: