Once you have your new database schema ready, you sometimes wish to insert random data just to see how things are going. For example, to test the performance of specific SELECT or INSERT statements or to help you decide better on the index usage. Some may be rendered useless or excessive by such a process or you may discover that e.g. a composite index is better suited for your needs.
Moreover, you could decide on the storage engine you should use (although for this there are many other parameters which involve application requirements as well, for instance the need for transactional logic makes imperative the use of InnoDB) or get a rough estimate of the disk space your DB will consume over time. One could even use this technique combined with the mysqlslap utility to test your server configuration, although it can generate test data itself.
So let’s take it step-by-step.
- Create 2 sample tables, one “holding data” and a second one which will serve as a lookup table:
CREATE TABLE random_test(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, random_refer_id INT NOT NULL,sometext VARCHAR(255), test_date timestamp); CREATE TABLE random_lookup(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, lookup_value VARCHAR(100));
- Fill the lookup table with data first:
INSERT INTO random_lookup(lookup_value) VALUES( LPAD('', 100, MD5 ( CAST( RAND() AS CHAR ) ) ) );
As you can see, I use the LPAD function along with RAND functions. Easy to guess, we will be using the
RANDfunction quite frequently in this example; it returns a random floating-point number between 0 and 1, thus we must use
CASTto change the returned value to a string type for
MD5. Of course are character set will be constrained by the hexadecimal representation produced by
Now that we have a single random record in our table, we can start adding a lot more. The proper tool for this task is the INSERT INTO … SELECT statement.
INSERT INTO random_lookup(lookup_value) SELECT LPAD( '', 100, MD5( CAST( RAND() AS CHAR ) ) ) FROM random_lookup;
… and we just addressed the very fundamental issue of fast random data generation by using the table records themselves. By running the latter
INSERTstatement N times you will generate 2N records.
- Now we need to find the maximum
random_lookup.idvalue which not surprisingly will be equal to 2N, but one
SELECTto verify can do no harm, thus:
SELECT MAX(id) AS max_lookup_id FROM random_lookup;
INSERT INTO random_test(random_refer_id, sometext, test_date) VALUES( FLOOR(RAND() * ([max_lookup_id] - 1) + 1), LPAD( '', 255, MD5( CAST( RAND() AS CHAR ) ) ), FROM_UNIXTIME(CURRENT_TIMESTAMP + FLOOR(RAND() * 2500000)) ); -- step #1 INSERT INTO random_test(random_refer_id, sometext, test_date) SELECT FLOOR(RAND() * ([max_lookup_id] - 1) + 1), LPAD( '', 255, MD5( CAST( RAND() AS CHAR ) ) ), FROM_UNIXTIME(CURRENT_TIMESTAMP + FLOOR(RAND() * 2500000)) FROM random_test; -- step #2
Of course [max_lookup_id] is to be replaced with the maximum
random_lookup.id. You repeat the second step as many times as the records you want to generate.