Cloud OCR with Google Vision API with Spring Boot

Posted in Java, Technology Center

Google has released its Google Vision API, which lets developers use advanced AI techniques to analyze images. The API can recognize objects, logos, and text, so developers can integrate visual recognition into their applications to deliver a better user experience. Applications the API can support include emotion detection (to tell whether a person is happy or sad), detection of inappropriate (nude, etc.) images, detection of company logos, and even street sign translation.

Our recent implementation uses this API to perform OCR (Optical Character Recognition) so that documents can be classified automatically. Based on our sample documents, the Google Vision API has done a pretty good job of recognizing text in scanned documents.

The infrastructure of our main application runs on Microsoft Azure, so we need to call the Google Cloud API from an external server. When an application is deployed on Google Compute Engine or App Engine, authentication is enabled automatically through the corresponding service account. When the application is hosted in an external environment, additional authentication steps are needed.

Below are the steps we performed to use the Google API from a Spring Boot application hosted on another provider (Azure, AWS, or anywhere other than Google). While the application runs on an external provider, the files/images are stored in Google Cloud Storage to reduce data transfer between servers.

Step 1: Set up authentication using a downloaded key file.
Follow the steps recommended in the documentation here: https://cloud.google.com/vision/docs/auth-template/cloud-api-auth. Since we are accessing the API from an external server, we need a credential that works outside Google's infrastructure. You will need to download the JSON key file (a service account key), because the Application Default Credentials used in the code below read it to authenticate.

Step 2: Whitelist domain name and authenticate domain ownership.
You will need to add your domain to the whitelist. This setting is in your Google Cloud Platform console under API Manager > Credentials. You will also need to verify domain ownership as instructed by Google, either by adding DNS settings or by uploading specific files.

Once the tasks above have been completed successfully, create a sample program as shown below:

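import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.vision.v1.Vision;
import com.google.api.services.vision.v1.VisionScopes;
import com.google.api.services.vision.v1.model.AnnotateImageRequest;
import com.google.api.services.vision.v1.model.AnnotateImageResponse;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesRequest;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesResponse;
import com.google.api.services.vision.v1.model.EntityAnnotation;
import com.google.api.services.vision.v1.model.Feature;
import com.google.api.services.vision.v1.model.Image;
import com.google.api.services.vision.v1.model.ImageSource;
import com.google.common.collect.ImmutableList;

public class GVision {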
	private static final String APPLICATION_NAME = "Ideyatech-Sample/1.0";
	private static final int MAX_RESULTS = 6;
	private final Logger logger = LoggerFactory.getLogger(this.getClass());
	private Vision vision;

	public GVision() {
		vision = authenticateGoogleAPI();
	}

	/**
	 * Connects to the Vision API using Application Default Credentials.
	 */
	private Vision authenticateGoogleAPI() {
		try {
			GoogleCredential credential = GoogleCredential.getApplicationDefault().createScoped(VisionScopes.all());
			JsonFactory jsonFactory = JacksonFactory.getDefaultInstance();
			return new Vision.Builder(GoogleNetHttpTransport.newTrustedTransport(), jsonFactory, credential)
					.setApplicationName(APPLICATION_NAME).build();
		} catch (IOException e) {
			logger.error("Unable to access Google Vision API", e);
		} catch (GeneralSecurityException e) {
			logger.error("Unable to authenticate with Google Vision API", e);
		}
		return vision;
	}

	/**
	 * Gets up to {@code maxResults} text for an image stored at
	 * {@code uri}.
	 */
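	public List<EntityAnnotation> doOCR(String uri) throws Exception {

		// Retry authentication if the constructor could not connect,
		// and keep the resulting client for later calls.
		if (vision == null)
			vision = authenticateGoogleAPI();
		if (vision == null)
			throw new IllegalStateException("Google Vision API client is not available");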

		AnnotateImageRequest request = new AnnotateImageRequest()
				.setImage(new Image().setSource(new ImageSource().setGcsImageUri(uri)))
				.setFeatures(ImmutableList.of(new Feature().setType("TEXT_DETECTION").setMaxResults(MAX_RESULTS)));
		Vision.Images.Annotate annotate;
		try {
			annotate = vision.images()
					.annotate(new BatchAnnotateImagesRequest().setRequests(ImmutableList.of(request)));
			BatchAnnotateImagesResponse batchResponse = annotate.execute();
			assert batchResponse.getResponses().size() == 1;
			AnnotateImageResponse response = batchResponse.getResponses().get(0);
			if (response.getError() != null) {
				logger.error("Failed to process document ["+uri+"]");
				logger.error(response.getError().getMessage());
				throw new Exception(response.getError().getMessage());
			} else {
				return response.getTextAnnotations();				
			}
		} catch (IOException e) {
			logger.error("Failed to process document ["+uri+"]",e);
			throw e;
		}
	}
}

The class above establishes a connection with the Google Cloud Vision API when it is instantiated. Afterwards, you can invoke the doOCR method, passing the “gs” path of the file to process. Below is a sample call to the class:

    private GVision vision = new GVision();
...
    String gs = "gs://" + getBucket() + "/" +getName();
    vision.doOCR(gs);

where getBucket() and getName() return the Cloud Storage bucket and file name.
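The path concatenation in the sample call can be wrapped in a small helper that rejects obviously malformed input before a request is sent. This is only a minimal sketch: the GcsUri class and its of method are hypothetical names introduced here, not part of any Google library.

```java
/** Hypothetical helper that builds the "gs" path passed to doOCR. */
final class GcsUri {

    private GcsUri() {
    }

    /**
     * Builds gs://bucket/objectName.
     * Throws IllegalArgumentException when either part is missing
     * or when the bucket contains a slash.
     */
    static String of(String bucket, String objectName) {
        if (bucket == null || bucket.isEmpty() || bucket.contains("/")) {
            throw new IllegalArgumentException("Invalid bucket name: " + bucket);
        }
        if (objectName == null || objectName.isEmpty()) {
            throw new IllegalArgumentException("Object name must not be empty");
        }
        // Object names may legitimately contain "/", so they are used as given.
        return "gs://" + bucket + "/" + objectName;
    }
}
```

With such a helper, the sample call would become vision.doOCR(GcsUri.of(getBucket(), getName())).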

If you encounter an error about your credential file, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the credential file you downloaded.
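Application Default Credentials, as used by GoogleCredential.getApplicationDefault(), read the key file path from the GOOGLE_APPLICATION_CREDENTIALS environment variable. A quick check at startup can make a missing or wrong path easier to diagnose. The sketch below uses only the JDK; the CredentialCheck class and describeProblem method are hypothetical names, and the check only confirms that the file is readable, not that its contents are valid.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

/** Hypothetical startup check for the credential environment variable. */
final class CredentialCheck {

    static final String ENV_NAME = "GOOGLE_APPLICATION_CREDENTIALS";

    private CredentialCheck() {
    }

    /**
     * Returns a description of the problem, or an empty Optional when the
     * variable is set and points to a readable regular file.
     * The environment is passed in so the check can be tested without
     * touching the real process environment.
     */
    static Optional<String> describeProblem(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            return Optional.of(ENV_NAME + " is not set");
        }
        Path path = Paths.get(value);
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            return Optional.of(ENV_NAME + " does not point to a readable file: " + path);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Print the problem, if any, using the real process environment.
        describeProblem(System.getenv()).ifPresent(System.err::println);
    }
}
```

Calling describeProblem(System.getenv()) before constructing GVision would surface the configuration error early instead of on the first OCR request.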

In addition, you may want to trigger the OCR operation automatically when a new file is uploaded to your cloud storage. To do this, you will need to set up valid SSL access so that Google can send a notification to your application, as documented here: https://cloud.google.com/storage/docs/object-change-notification. Take note that the SSL certificate must be signed by a CA and cannot be self-signed. Here is a sample command to enable the object listener:

gsutil notification watchbucket -t <secret code> https://<your domain>/storage/docu-notify gs://<your bucket>/

This registers a notification that calls the specified URL whenever changes occur in the Google Cloud Storage bucket. The secret code passed with -t is a security token that Google sends back in the X-Goog-Channel-Token header (bound to channelToken in the controller below), where it is used to authenticate the request.
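Because the token is the only thing that authenticates the incoming request, the receiving side may want to compare it in a way that does not stop at the first differing character. The sketch below is one way to do that with the JDK's MessageDigest.isEqual; the ChannelTokens class and matches method are hypothetical names, and how the expected token is stored and loaded is left open.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical helper for checking the channel token sent by Google. */
final class ChannelTokens {

    private ChannelTokens() {
    }

    /**
     * Returns true only when both tokens are present and identical.
     * MessageDigest.isEqual is used instead of String.equals because, on
     * current JDKs, it does not return early at the first mismatching byte.
     */
    static boolean matches(String expected, String received) {
        if (expected == null || received == null) {
            return false;
        }
        byte[] expectedBytes = expected.getBytes(StandardCharsets.UTF_8);
        byte[] receivedBytes = received.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expectedBytes, receivedBytes);
    }
}
```

In a request handler, a call such as ChannelTokens.matches(API_KEY, channelToken) could take the place of a plain equals check.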

Afterwards, you will need to create a controller that listens for the calls from Google, as shown below:

	@RequestMapping(value = "/storage/docu-notify", method = RequestMethod.POST)
	public String notify(
			@RequestBody ChangeNotification req,
			@RequestHeader("X-Goog-Resource-State") String resourceState,
			@RequestHeader("X-Goog-Resource-Id") String resourceId,
			@RequestHeader("X-Goog-Channel-Token") String channelToken,
			@RequestHeader("X-Goog-Channel-Id") String channelId			
			) {
		// Do not log the channel token: it is the shared secret for this endpoint.
		logger.info("Notification initiated. " +
				"Resource ID=" + resourceId + ":" +
				"Channel ID="+ channelId
		);
		if (API_KEY.equals(channelToken)) {
			if ("exists".equals(resourceState)) {
				DocuMeta meta = new DocuMeta();
				meta.setBucket(req.getBucket());
				meta.setName(req.getName());
				meta.setStatus("new");
				logger.info("Receiving new document [" + req.getName() + "].");
				repository.save(meta);
			} else if ("not_exists".equals(resourceState)) {
				repository.delete(repository.findByName(req.getName()));
				logger.info("Removing document [" + req.getName() + "].");
			} else {
				logger.warn("Invalid resource state [" + resourceState + "].");
			}
			return "success";
		} else {
			logger.warn("Invalid key parameter.");
			return "";
		}
	}

The sample code above is invoked by Google whenever a change occurs in Cloud Storage. When invoked for a new object, the method saves the bucket and name into a MongoDB database so the OCR function can process the document later; when an object is removed, it deletes the matching record.
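The controller treats every resource state other than "exists" and "not_exists" as invalid. Object change notifications also send a "sync" state when the channel is first created, so that state will show up as a warning in the log. One way to make the states explicit is to map the header to an enum first. This is a sketch only; the ResourceState enum and fromHeader method are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical mapping of the X-Goog-Resource-State header values. */
enum ResourceState {
    SYNC,        // sent once when the notification channel is created
    EXISTS,      // an object was added or updated
    NOT_EXISTS,  // an object was deleted
    UNKNOWN;     // anything else, including a missing header

    /** Parses the header value, ignoring case and surrounding whitespace. */
    static ResourceState fromHeader(String headerValue) {
        if (headerValue == null) {
            return UNKNOWN;
        }
        switch (headerValue.trim().toLowerCase(Locale.ROOT)) {
            case "sync":
                return SYNC;
            case "exists":
                return EXISTS;
            case "not_exists":
                return NOT_EXISTS;
            default:
                return UNKNOWN;
        }
    }
}
```

A handler could then switch on ResourceState.fromHeader(resourceState) and simply acknowledge SYNC without logging a warning.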

Hope these snippets of code guide you in your Vision API project.
