Making Your Google App Engine Backend Scalable


Managing Backend Scalability In Google App Engine

Google App Engine (GAE) is a cloud computing platform for developing and hosting web apps. It is designed to scale web apps automatically as the number of requests increases.

In GAE, regular requests have a short deadline of 30 seconds. Longer-running processes must rely on App Engine Backends, which allow processes to run continuously and consume more memory. Backends, along with cron jobs, are useful for regular maintenance tasks that process existing data.

However, as your data grows to hundreds of thousands or millions of records, scalability becomes an issue. GAE provides features that let your application scale with the number of requests, but without a proper implementation, your backend tasks will take an impractical amount of time to process all your data.

A simple way to handle processing of any amount of data is to use the GAE Task Queue. The Task Queue lets you split the work into many small tasks that run asynchronously. Processed this way, large amounts of data finish faster and the job is easier to scale.

In one of our ecommerce projects, it was important that our Product records were updated daily. As the number of products grew from thousands to tens of thousands, I noticed that the time required to process everything became too long: it took around 4 days to finish updating 70,000 products. To reduce this delay, I had to improve the implementation of the product updates. Here is what I did to make the product update task more scalable:

  1. Created a task queue in queue.xml
    
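    <queue-entries>
      <queue>
        <name>product-update</name>
        <rate>3/s</rate>
        <bucket-size>20</bucket-size>
      </queue>
    </queue-entries>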
    

    The bucket size caps how many tasks the queue can dispatch in a single burst, and the rate is how fast this bucket is refilled. Together, these two settings determine how quickly the queue hands tasks to your workers, so you must consider both when looking for a cost-effective solution to the scalability issue.

  2. Created a backend dedicated to running these tasks. If it is going to run all day, it’s best to configure it as a resident (non-dynamic) instance to avoid startup and shutdown delays.
    
    <backends>
      <backend name="productupdate">
        <options>
          <dynamic>false</dynamic>
        </options>
      </backend>
    </backends>
    
    
  3. Redesigned the product update function so that it splits the products to update into smaller chunks.
    1. Product records are given “last checked” dates and are retrieved in batches, the oldest ones first. These records are updated and divided into smaller lists.
    2. Each small list is passed to a task designed to process it, and the task is pushed into the queue.
       
      // Assumes: import static com.google.appengine.api.taskqueue.TaskOptions.Builder.withUrl;
      public final String queueProductsForPriceCheck(HttpServletRequest request, HttpServletResponse response) throws IOException {
          ...
          Queue queue = QueueFactory.getQueue("product-update");
          List<Product> productList = productService.findProductForUpdate();
          List<Product> subList = new ArrayList<Product>();

          for (Product product : productList) {
              // Stamp the product as checked so later runs pick up older records first.
              product.setPriceCheckDate(new Date());
              productService.update(product);

              subList.add(product);

              // Submit the list for queueing once it is full, then start a new one.
              if (subList.size() == PRODUCTS_PER_QUEUE) {
                  queueProductList(queue, subList);
                  subList = new ArrayList<Product>();
              }
          } // end of for loop

          // Queue the remaining products, if any.
          if (!subList.isEmpty()) {
              queueProductList(queue, subList);
          }

          ...
      }

      private void queueProductList(Queue queue, List<Product> subList) throws IOException {
          ByteArrayOutputStream output = new ByteArrayOutputStream();
          ObjectOutputStream objout = new ObjectOutputStream(output);
          objout.writeObject(subList);
          objout.close(); // flush buffered object data before reading the bytes
          // Route the task to the "productupdate" backend through the Host header.
          queue.add(withUrl("/tracking/checkProduct")
                  .payload(output.toByteArray())
                  .header("Host", BackendServiceFactory.getBackendService().getBackendAddress("productupdate")));
      }

      To save a few milliseconds and a few database retrievals, we pass serializable Product objects as byte arrays instead of passing keys to the task.
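The receiving side of this hand-off is not shown in the post. As a rough sketch only, the chunking and the byte-array round trip could look like the code below. It uses plain JDK classes so it runs without the App Engine SDK. `ChunkPayloadSketch`, `Item`, `partition`, `toBytes`, and `fromBytes` are hypothetical names made up for illustration, and `Item` merely stands in for the real Product class. In an actual task handler, the bytes would presumably come from the request body instead of a local variable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ChunkPayloadSketch {

    /** Hypothetical stand-in for the Product entity; it must be Serializable. */
    public static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;

        public Item(String name) {
            this.name = name;
        }
    }

    /** Splits a list into consecutive chunks of at most chunkSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the subList view so each chunk is an independent ArrayList.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    /** Serializes one chunk into the byte array used as the task payload. */
    public static byte[] toBytes(List<? extends Serializable> chunk) throws IOException {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        // Closing the stream flushes any buffered data into the byte array.
        try (ObjectOutputStream objout = new ObjectOutputStream(output)) {
            objout.writeObject(new ArrayList<>(chunk));
        }
        return output.toByteArray();
    }

    /** Restores a chunk from the payload bytes, as a task handler would. */
    @SuppressWarnings("unchecked")
    public static <T> List<T> fromBytes(byte[] payload) throws IOException, ClassNotFoundException {
        try (ObjectInputStream objin = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return (List<T>) objin.readObject();
        }
    }
}
```

The trade-off of passing whole objects is payload size: task payloads are limited in size, so the chunk size has to stay small enough for the serialized list to fit.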

Once this is all running, you can follow the progress of the queue in the Task Queue page. You might find that the tasks are not being run quickly enough. This could be caused by the specifications of your backends, and this is where the actual scaling occurs. You have three options to improve performance as needed:



  1. Increase the hardware specifications of your App Engine instance. This is done by changing the class of the backend with the <class> tag in your backend configuration. There are four classes available (B1, B2, B4, and B8), each with a different cost to consider.
  2. Increase the number of instances running. This is done by adding the <instances> tag to your backend configuration.

    <backends>
      <backend name="productupdate">
        <class>B8</class>
        <instances>3</instances>
        <options>
          <dynamic>false</dynamic>
        </options>
      </backend>
    </backends>

  3. Optimize your code. This option can be difficult, but once you are running multiple powerful instances, even a bit of optimization can be cost-efficient.
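
To get a feel for how the queue settings and the instance count bound the total run time, here is a back-of-the-envelope sketch. Several numbers are assumptions rather than facts from this project: 50 products per task (the real value of PRODUCTS_PER_QUEUE is not given), the 4-day baseline being the work of a single worker, and perfectly even parallelism across instances. The class and method names are hypothetical.

```java
public class BackendThroughputEstimate {

    /** Number of tasks needed when each task carries at most itemsPerTask items. */
    public static long taskCount(long totalItems, int itemsPerTask) {
        return (totalItems + itemsPerTask - 1) / itemsPerTask;
    }

    /**
     * Lower bound, in seconds, on the time the queue needs just to dispatch
     * all tasks: the first bucketSize tasks may go out in one burst, and the
     * rest are released at ratePerSecond.
     */
    public static long dispatchSeconds(long tasks, int bucketSize, int ratePerSecond) {
        long throttled = Math.max(0, tasks - bucketSize);
        return (throttled + ratePerSecond - 1) / ratePerSecond;
    }

    /** Idealized run time when the baseline work is split evenly across instances. */
    public static long processingSeconds(long singleWorkerSeconds, int instances) {
        return (singleWorkerSeconds + instances - 1) / instances;
    }

    public static void main(String[] args) {
        long totalItems = 70_000;               // from the article
        int itemsPerTask = 50;                  // assumption, see lead-in
        long baselineSeconds = 4L * 24 * 3600;  // "around 4 days", assumed single worker

        long tasks = taskCount(totalItems, itemsPerTask);
        System.out.println("tasks: " + tasks);
        System.out.println("dispatch seconds at 3/s, bucket 20: " + dispatchSeconds(tasks, 20, 3));
        System.out.println("processing seconds on 3 instances: " + processingSeconds(baselineSeconds, 3));
    }
}
```

Under these assumptions the queue can hand out all 1,400 tasks in under eight minutes, so the queue rate is not the bottleneck. The processing time dominates, which is why a faster instance class, more instances, or faster code are the levers that matter.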

These are just a few of the scaling techniques I discovered in the few months I’ve used Google App Engine. Got tips and tricks? I’d love to hear them! Please comment below.
