Efficient Trie implementation for unicode strings

Posted by U Mad on Programmers
Published on 2012-07-05T11:25:42Z

I have been looking for an efficient String trie implementation. Mostly I have found code like this:

Reference implementation in Java (per Wikipedia)
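That style of implementation generally looks something like the following (a simplified sketch of the array-per-node approach; class and field names are mine, not from the Wikipedia article):

```java
// Sketch of the classic array-per-node trie: every node allocates a
// fixed 256-slot child array, indexed directly by character value.
class ArrayTrie {
    private static final int R = 256;      // extended-ASCII alphabet size

    private static class Node {
        Node[] next = new Node[R];         // 256 references per node
        boolean isWord;                    // true if a key ends here
    }

    private final Node root = new Node();

    void insert(String key) {
        Node x = root;
        for (int i = 0; i < key.length(); i++) {
            int c = key.charAt(i);         // silently breaks for chars >= 256
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.isWord = true;
    }

    boolean contains(String key) {
        Node x = root;
        for (int i = 0; i < key.length(); i++) {
            int c = key.charAt(i);
            if (c >= R || x.next[c] == null) return false;
            x = x.next[c];
        }
        return x.isWord;
    }
}
```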

I dislike these implementations for two main reasons:

  1. They support only 256 characters (extended ASCII). I need to cover alphabets like Cyrillic.
  2. They are extremely memory inefficient.

Each node contains an array of 256 references, which is 2048 bytes on a 64-bit JVM with 8-byte references (half that with compressed oops). Each of these nodes can have up to 256 subnodes, each carrying its own 2 KB array. So a full trie for every two-character ASCII string already needs over half a megabyte just for the arrays in nodes with children (257 nodes × 2 KB). Three-character strings? Well over 100 MB just for arrays in nodes. And so on.

Of course I don't intend to store all 16 million three-character strings in my trie, so most of that space is simply wasted. The arrays are overwhelmingly null references, since their capacity far exceeds the number of keys actually inserted. And Unicode makes it worse: a Java char has 65,536 possible values, not 256, so the per-node arrays would grow 256-fold.
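One common first fix for both problems is a map-based child table, so each node only pays for the children it actually has and any char value works as a key. A minimal sketch (names are illustrative, not from any particular library):

```java
import java.util.HashMap;
import java.util.Map;

// Trie variant where children live in a small HashMap instead of a
// fixed 256-slot array: memory scales with inserted keys, and any
// char (Cyrillic included) is a valid edge label.
class MapTrie {
    private static class Node {
        Map<Character, Node> next = new HashMap<>(2); // small initial capacity
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String key) {
        Node x = root;
        for (int i = 0; i < key.length(); i++) {
            x = x.next.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        x.isWord = true;
    }

    boolean contains(String key) {
        Node x = root;
        for (int i = 0; i < key.length(); i++) {
            x = x.next.get(key.charAt(i));
            if (x == null) return false;
        }
        return x.isWord;
    }
}
```

The trade-off is per-entry HashMap overhead and slower child lookup than a direct array index, so it is not free either.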

Is there any hope of making an efficient trie for strings? I have considered a couple of improvements over these types of implementations:

  • Instead of an array of references, each node could hold an array of a primitive integer type that indexes into a shared array of node references, sized close to the actual number of nodes.
  • I could break strings into 4-bit parts, which would allow node arrays of size 16 at the cost of a deeper tree.
