Autocomplete engine in Go: let’s build it

Reading Time: 4 minutes

Some time ago I worked on a small autocomplete web service for fun and curiosity. Part of it consists pretty much of what I’m going to talk about in this post.

We’re gonna build a small completion suggester in Go.

A couple of assumptions I’ll have for this experiment are:

  • we don’t care about sub-word matching
  • we want a little typo tolerance
  • we’ll do case-insensitive matching

The prefix-based map

We’re gonna take advantage of PrefixMap, a small map implementation of mine that helps with prefix-based key matching. You may have seen this kind of map called Radix Tree, Trie or Prefix Tree as specified on Wikipedia, but I couldn’t find a naming convention for this kind of tree in the Go land. Maybe you can suggest a better name in the comments.

Anyway, the prefix-based map will be our main data source for suggestions. We’ll use it to store all the suggestions we want to match through our engine.

The Levenshtein distance

We’ll use the Levenshtein distance as a metric to compute how far the typed in string is from the matches that we’ve found. In particular, we’ll define our own similarity metric as:

[code]
levenshtein(match,substr)
1.0 – —————————
max(|match|,|substr|)
[/code]

Where substr is the typed in string and match is the candidate match we have found.

A similarity of 1.0 means that substr and match are equal.

The problem

We have a list of strings that the input will potentially match against. We want to find a few candidate matches that honor a certain similarity threshold we define.

This is a very simplified version of what happens when you start typing text into an auto-completion-enabled input field. Every time you type in, the string you’ve typed in so far gets evaluated against a data source to find the most relevant suggestions for you. And all of this has to happen pretty fast to be effective. You want your auto-completion service to be faster than the user who’s typing so that he can save time and select the suggestion rather than typing it all.

For this particular reason, the implementation of the Prefix Map we’re gonna use will be able to efficiently find all the values for the given prefix. This will save us from having to populate a more traditional map with all possible prefixes for a given key in advance. Instead, thanks to the tree-like structure values will be stored into the map, we’ll be able to just traverse the values in the tree that share a common prefix.

An example solution

For this specific example, our data source is going to be a list of world countries. Our target users will have to select the right country so they will start typing it in and we’ll provide a few suggestions to save them typing.

The autocomplete code

First of all, let’s start from the low-hanging fruits.

We’ve defined our concept of similarity so we’ll start by writing a function for it.

As you can see, the function accepts the ld parameter as one of the inputs. That’s the Levenshtein Distance computed between the words we want to know the similarity of.

Now, let’s populate our PrefixMap with the country list we have. For the purpose of the exercise I’m gonna use the list of country names in English provided by this online service that I’ve turned into a Go slice. You can find a gist for it here.

Now that we have everything we need, let’s work on the main component of our little program: the matching code.

...
        values := datasource.GetByPrefix(strings.ToLower(input))
	results := make([]*Match, 0, len(values))
	for _, v := range values {
		value := v.(string)
		s := similarity(len(value), len(input), LevenshteinDistance(value, input))
		if s >= similarityInput {
			m := &Match{value, s}
			results = append(results, m)
		}
	}

	fmt.Printf("Result for target similarity: %.2f\n", similarityInput)
	PrintMatches(results)
...

We’re taking advantage of the GetByPrefix method from PrefixMap. GetByPrefix will return a flattened collection of all the values in the map that belong to the specified prefix. Pretty handy, isn’t it?

A further filtering I’m applying there, as you can see, is the similarity verification step. I’m going through the list of matches that we have retrieved to filter them according to the similarity input we have received as input in our program.

You can find the full example implementation on GitHub.

The output

This is really it. Here’s a few example invocations of the little code we’ve just written:

$ go run autocompleter.go -similarity 0.3 itlaly

Result for target similarity: 0.30
match: 	Italy	similarity: 0.67
$ go run autocompleter.go -similarity 0.2 united

Result for target similarity: 0.20
match: 	United Arab Emirates	similarity: 0.25	
match: 	United Kingdom	similarity: 0.36	
match: 	United States	similarity: 0.38
$ go run autocompleter.go France

Result for target similarity: 0.30
match: 	France	similarity: 1.00	
match: 	Metropolitan France	similarity: 0.32	

 

Further improvements

As you have seen, our little program allows for little typos in some situations, except when it occurs at the beginning of our input. This is because we’re using a PrefixMap which will not match anything at all if we start with the wrong prefix, of course. An improvement in this sense would probably be to fall back to a full-text search when no matches have been found for the specified similarity in the first pass.

Hopefully this post has been somewhat useful for you, or at least entertaining. Let me know if you need any clarification or if you have to recommend a better approach to achieve this.

Gogoa – Cocoa bindings for Go

Reading Time: 2 minutes

I don’t actually know why but I was wondering how easy it is to integrate Cocoa with Go. Well [SPOILER] looks like it’s super easy!
The first comfortable piece I encountered, actually, was Go 1.3 changelog where it states:

Finally, the go command now supports packages that import Objective-C files (suffixed .m) through cgo.

Since I’m now with Go 1.4.2 I thought: “This should be even easier now!”. Reading through the cgo reference looks like interoperability is getting stronger with Go, despite this comment on SO.

Hahaha… Go’s biggest weakpoint… interoperability. –  Matt JoinerJun 12 ’11 at 13:54

Gimme the code

I know, too much words and no code so far. Turns out that showing a Cocoa window with Go may be as easy as the following:

If you think this is too clean to be true you are kind of right. I actually started this project on GitHub which performs all the ugly bits underneath. I’d really love you to help me get this a little further.

How does it work?

There’s this go tool called cgo which does most of the magic by itself when you build a hybrid project like this. The reference is rather comprehensive but I’d like to highlight a few issues I encountered while wrapping the code up.

1. Keep import “C” after the C statements

Let’s take my application.go as an example:

As you can see import "C" is preceded by three directives as per the cgo standard. Make sure the import "C" directive is right after the actual C directives otherwise your code won’t build. This is because C is a pseudo package generated also using the code you provide through those preceding directives.

2. Always specify the compiler directives

Every file exposing the Objective-C bindings should specify the following flags at least.

// #cgo CFLAGS: -x objective-c
// #cgo LDFLAGS: -framework Cocoa

//#include "g_application.h"

import "C"

Otherwise you would end up in a long long list of compilation errors. At least this is what happened to me.

3. Always void*

If you end up working with pointers to ObjectiveC classes always convert them into void* before returning them back to Go. If you don’t, passing them back to C/ObjectiveC might become painful or you may end up with errors like the following:

struct size calculation error off=8 bytesize=0

At least this is what occurred to me. You may help me see the light with this.

4. Keep the non-Go code in separate files

Although cgo allows for the whole non-Go code to go in the comments, please, don’t do that. You’ll end up in unreadable code. Check out my repository for a possible solution.

Now what?

I don’t know actually. It really took me 1 hour to get this code compiling and running just for the sake of seeing a Cocoa Window. Didn’t expect much more.
Also, I’ve been doing some work with bindings in the past (Qt – Android – JNI, Cocoa – Qt – C++) and it always gets painful when the main framework is not run in the main thread. I don’t know if this will ever the case with Go. I’m not even sure how far this can be pushed especially when it’s about goroutines and defer‘ed stuff. But, despite not so pleasant past experiences I’d like experimenting this with Go and Cocoa as well.

A question for you

How would you write tests for those bindings?

Please, contribute

If you are any interested in this project, please, shout out loud. That would really help me experiment a bit more. Even if you just want to complain about something terribly wrong I did, please, do!

Cheers.