Transferring data between MySQL and MongoDB

In this project, I had to grandfather in the existing mySQL database. While it’s certainly do-able to use mySQL with Node.js/Meteor, the re-architecture elected use of MongoDB.

There are of course many ways to move data between the two databases.  Exporting from MySQL to a CSV file, and then importing that into MongoDB is one way.  The problem for me, in this particular case, was one of minimalism.  By their nature, SQL databases can have columns that are empty/null.  MongoDB allows removal of such cruft– if a field is empty, it simply doesn’t need to be included.  This allows for smaller data sizes, and cleaner documents.

So, I opted to write a quick and dirty Python script that would only transfer a mySQL record’s field and value if it wasn’t null. Not shown here, but easy to add would be a translation table– rather than keep mySQL’s column names, they can be renamed prior to insert into mongodb. That was what I did, but left it out here to make the direct transfer clearer.  

For MongoDB, I used easy_install to install pymongo.  MySQL offers a Python Connector.  And, it’s documentation.  For OS X, at least, it’s very easy to install.  Further, its API is basic and straight forward.

Here is an example script (not optimized, or the ‘best’ way, just a quick and dirty/”get the job done” sort of thing):

First, the necessary imports:

import mysql.connector
from pymongo import MongoClient

Followed by initializing the database handles:

client     = MongoClient()
db         = client.test
collection = db.payment

cnx        = mysql.connector.connect( 
                                     user     ='xxx', 
                                     password ='xxx', 
                                     database ='xxx')
cursor     = cnx.cursor()

Next, a tuple containing all the mySQL column names:

row_titles = (
               'payment_id',   'payment_cust_id',
               'payment_desc', 'payment_refer', 
               'payment_amt',  'payment_dt'


The query into mySQL is straight forward:

query = ( "SELECT " 
          "payment_dbid, payment_customer_bid, "  
          "payment_desc_txt, payment_refer_txt, "
          "payment_amt, payment_dt "
          "FROM payment" )

cursor.execute( query )

So now, the cursor is loaded with data. The strategy is: cycle through each field (row_titles) of each mySQL record (row). If a field’s value is null, skip it and move on to the next one, only transferring populated fields to mongodb.

#mongo client specifically requires python dict
cus = dict()

#custom record id rather than mongodb default hash id                                          
cid = 0                                                 

#cycle through each mySQL row
for ( row ) in cursor:
    cid        += 1   #increment id
    cus['_id'] = cid                                    
    #check if current row is null
    for i in range( 0, len( row ) ):
        if row[i] == None:
            #if the record is null, skip it                  
            #conversion to string
            row_title      = "".join( row_titles[i] )  
            #conversion to string 
            field          = str( row[i] ) 
            #add current record's field's title and value             
            cus[row_title] = field
   #we've completed processing this row, insert it into mongoldb      
    id = collection.insert_one( cus ).inserted_id

    #just for debug purposes, show me running row count
    print id 

#shut down the mysql connection

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s