Improving Performance by Batching Azure Table Storage Inserts

Author: Brian Swan <brian.swan@microsoft.com>

Date: Wednesday, March 21, 2012, 12:45:43 PM

Tags: Tutorial


    Note: This article pertains to the CodePlex SDK initially released in late 2009. The Windows Azure team has since released a newer version of the Azure SDK for PHP on GitHub. Please refer to the Windows Azure PHP Developer Center for documentation on this more recent version of the SDK.

    Please stay tuned and come back here regularly as we are working on refreshing the tutorials to deliver up-to-date and useful content for our PHP developers.


    This is a short post to share the results of a little investigation inspired by a comment on a post I wrote about using SQL Azure for handling session data. The commenter reported that SQL Azure seemed to be faster than Azure Table Storage for handling session data. My experiments show that SQL Azure and Table Storage have very similar performance for single writes (YMMV), so I can't verify or refute that claim. However, I got to wondering which is faster for inserting and retrieving many "rows" of data. I know that Table Storage is supposed to be faster, but I wondered how much faster. So I wrote a two-part PHP script. The first part does the following:

    1. Connects to SQL Azure.
    2. Inserts 100 rows into an existing table.
    3. Retrieves the 100 rows.

    Here's the code:

    
    $conn = sqlsrv_connect(SQLAZURE_SERVER_ID.".database.windows.net,1433",
                           array("UID"=>SQLAZURE_USER."@".SQLAZURE_SERVER_ID,
                                 "PWD"=>SQLAZURE_PASSWORD,
                                 "Database"=>SQLAZURE_DB,
                                 "ReturnDatesAsStrings"=>true));

    // Insert 100 rows.
    for($i = 0; $i < 100; $i++)
    {
        $id = $i;
        $data = "GolferMessage".$i;
        $params = array($id, $data);
        $stmt1 = sqlsrv_query($conn, "INSERT INTO Table_1 (id, data) VALUES (?,?)", $params);
        if($stmt1 === false)
            die(print_r(sqlsrv_errors()));
    }

    // Retrieve the 100 rows.
    $stmt2 = sqlsrv_query($conn, "SELECT id, data, timestamp FROM Table_1");
    while($row = sqlsrv_fetch_array($stmt2, SQLSRV_FETCH_ASSOC))
    {
        // No processing needed; just iterate through the result set.
    }
    

    Note: The code above uses the SQL Server Driver for PHP to connect to SQL Azure.
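    The timing code isn't shown above, but for reference, here is a minimal sketch of one way to time each part of the script with microtime(); the timeIt() helper and its output format are just an illustration of mine, not something the SQL Server Driver for PHP or the Azure SDK provides:

    // Illustrative timing helper (not part of the script above).
    function timeIt($label, $fn)
    {
        $start = microtime(true);               // high-resolution start time
        $fn();                                  // run the code being measured
        $elapsed = microtime(true) - $start;
        printf("%s: %.3f seconds\n", $label, $elapsed);
    }

    // Example usage: wrap the code above in a closure and time it.
    // timeIt("SQL Azure: 100 inserts + select", function() use ($conn) { /* code above */ });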

    The second part of the script does the equivalent for Table Storage:

    1. Connects to Azure Storage.
    2. Inserts 100 entities to an existing table.
    3. Retrieves the 100 entities.

    Here's the code:

    $tableStorageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', STORAGE_ACCOUNT_NAME, STORAGE_ACCOUNT_KEY);

    // Start a batch; the insertEntity() calls below are queued and sent when commit() is called.
    $batch = $tableStorageClient->startBatch();
    for($i = 0; $i < 100; $i++)
    {
        $name = $i;
        $message = "GolferMessage".$i;
        $mbEntry = new MessageBoardEntry();
        $mbEntry->golferName = $name;
        $mbEntry->golferMessage = $message;
        $tableStorageClient->insertEntity('MessageBoardEntry', $mbEntry);
    }
    $batch->commit();

    // Retrieve the 100 entities.
    $messages = $tableStorageClient->retrieveEntities("MessageBoardEntry", null, "MessageBoardEntry");
    foreach($messages as $message)
    {
        // No processing needed; just iterate through the result set.
    }
    

    Note: The code above uses the Windows Azure SDK for PHP to connect to Azure Storage.
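    The MessageBoardEntry class isn't shown above (it's part of the complete script attached to this post). For context, a custom entity in this SDK extends Microsoft_WindowsAzure_Storage_TableEntity and marks its properties with @azure docblock annotations. Here's a minimal sketch of what such a class might look like; the 'messages' partition key and the uniqid()-based row key are illustrative choices, not necessarily what the attached script uses:

    // Sketch of a custom entity class for the CodePlex SDK (illustrative only).
    class MessageBoardEntry extends Microsoft_WindowsAzure_Storage_TableEntity
    {
        /**
         * @azure golferName
         */
        public $golferName;

        /**
         * @azure golferMessage
         */
        public $golferMessage;

        public function __construct()
        {
            // One shared partition key; uniqid() gives each entity a distinct row key.
            parent::__construct('messages', uniqid());
        }
    }

    Note that all entities in a single batch must share the same partition key, which is why every entity here gets the same one.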

    The result of the test was that Table Storage was consistently 4 to 5 times faster than SQL Azure (again, YMMV). The key, however, was to use the $tableStorageClient->startBatch() and $batch->commit() methods with Table Storage. Without using batches, Table Storage opens and closes a new HTTP connection for each write, which results in slower performance than SQL Azure (which keeps a connection open for writes). When using batches with Table Storage, the connection is kept open for all writes.
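    For comparison, the non-batched version of the insert loop is the same code without the startBatch() and commit() calls:

    // Without batching, each insertEntity() call opens and closes its own HTTP connection.
    for($i = 0; $i < 100; $i++)
    {
        $mbEntry = new MessageBoardEntry();
        $mbEntry->golferName = $i;
        $mbEntry->golferMessage = "GolferMessage".$i;
        $tableStorageClient->insertEntity('MessageBoardEntry', $mbEntry);
    }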

    Note: Many thanks to Maarten Balliauw, who suggested I try batching when I was perplexed by the results of my tests. I expected Table Storage to be faster, but because I didn't know about batches for Table Storage, I wasn't getting the results I expected.

    The complete script (with set up/tear down of database and Table) is attached in case you want to try for yourself.

    Thanks.

    -Brian

 