Generating Random Data with MySQL

Once you have your new database schema ready, you sometimes wish to insert random data just to see how things are going. For example, to test the performance of specific SELECT or INSERT statements or to help you decide better on the index usage. Some may be rendered useless or excessive by such a process or you may discover that e.g. a composite index is better suited for your needs.

Moreover, you could decide on the storage engine you should use (although for this there are many other parameters which involve application requirements as well, for instance the need for transactional logic makes imperative the use of InnoDB) or get a rough estimate of the disk space your DB will consume over time. One could even use this technique combined with the mysqlslap utility to test your server configuration, although it can generate test data itself.

So let’s take it step-by-step.

  1. Create 2 sample tables, one “holding data” and a second one which will serve as a lookup table:
    CREATE TABLE random_test(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, random_refer_id INT NOT NULL,sometext VARCHAR(255), test_date timestamp);
    CREATE TABLE random_lookup(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, lookup_value VARCHAR(100));
    
  2. Fill the lookup table with data first:
    INSERT INTO random_lookup(lookup_value) VALUES( LPAD( '', 100, MD5( CAST( RAND() AS CHAR ) ) ) );
    

    As you can see, I use the LPAD function along with RAND
    and the MD5 functions. Easy to guess, we will be using the RAND function quite frequently in this example; it returns a random floating-point number between 0 and 1, thus we must use CAST to change the returned value to a string input for MD5 and LPAD.

    OK now we have a single random record in our table, but we need a lot more. The proper tool for this task is the INSERT INTO … SELECT statement.

    INSERT INTO random_lookup(lookup_value) SELECT LPAD( '', 100, MD5( CAST( RAND() AS CHAR ) ) ) FROM random_lookup;
    

    … and we just addressed the very fundamental issue of fast random data generation by using the table records themselves. By running the latter INSERT statement N times you will generate 2N records!

  3. Now we need to find the maximum random_lookup.ID value which not surprisingly will be equal to 2N, but one SELECT will not harm, thus:
    SELECT MAX(id) AS max_lookup_id FROM random_lookup;
    
  4. INSERT INTO random_test(random_refer_id, sometext, test_date) 
    VALUES(
    FLOOR(RAND() * ([max_lookup_id] - 1) + 1),
    LPAD( '', 255, MD5( CAST( RAND() AS CHAR ) ) ),
    FROM_UNIXTIME(CURRENT_TIMESTAMP + FLOOR(RAND() * 2500000))
    ); -- step #1
    
    INSERT INTO random_test(random_refer_id, sometext, test_date) 
    SELECT 
    FLOOR(RAND() * ([max_lookup_id] - 1) + 1),
    LPAD( '', 255, MD5( CAST( RAND() AS CHAR ) ) ),
    FROM_UNIXTIME(CURRENT_TIMESTAMP + FLOOR(RAND() * 2500000))
    FROM random_test; -- step #2
    

    Of course [max_lookup_id] is to be replaced with the maximum random_lookup.ID . You repeat the second step as many times as the records you want to generate.

I hope this gave you an idea on random data generation. In addition, I am currently working on a tool for fun that will generate random data automatically for a given database schema.

Advertisements
Tagged ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: