n-grams from text in PostgreSQL

Posted by harshsinghal on Stack Overflow See other posts from Stack Overflow or by harshsinghal
Published on 2010-06-15T12:59:55Z Indexed on 2010/06/15 13:02 UTC
Read the original article Hit count: 280

Filed under:

sql

|

postgresql

|

text-mining

I am looking to create n-grams from text column in PostgreSQL. I currently split(on white-space) data(sentences) in a text column to an array.

select regexp_split_to_array(sentenceData,E'\s+') from tableName

Once I have this array, how do I go about:

Creating a loop to find n-grams, and write each to a row in another table

Using unnest I can obtain all the elements of all the arrays on separate rows, and maybe I can then think of a way to get n-grams from a single column, but I'd loose the sentence boundaries which I wise to preserve.

Sample SQL code for PostgreSQL to emulate the above scenario

create table tableName(sentenceData text);

INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');

INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');

INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');

select regexp_split_to_array(sentenceData,E'\s+') from tableName;

select unnest(regexp_split_to_array(sentenceData,E'\s+')) from tableName;

© Stack Overflow or respective owner

Related posts about sql

SQL SERVER – Concat Strings in SQL Server using T-SQL – SQL in Sixty Seconds #035 – Video

as seen on SQL Authority - Search for 'SQL Authority'
Concatenating string is one of the most common tasks in SQL Server and every developer has to come across it. We have to concat the string when we have to see the display full name of the person by first name and last name. In this video we will see various methods to concatenate the strings. SQL… >>> More
SQL SERVER – Concat Function in SQL Server – SQL Concatenation

as seen on SQL Authority - Search for 'SQL Authority'
Earlier this week, I was delivering Advanced BI training on the subject of “SQL Server 2008 R2″. I had great time delivering the session. During the session, we talked about SQL Server 2010 Denali. Suddenly one of the attendees suggested his displeasure for the product. He said, even though… >>> More
Error with SQL Server Setup 2012 on Windows 2012

as seen on Server Fault - Search for 'Server Fault'
I am trying to install SQL Server on Windows 2012. I was able to finally get the wizard up and running after making some changes on the server, but now it fails no matter what I do with the following error: TITLE: SQL Server Setup failure. SQL Server Setup has encountered the following error: … >>> More
How can I detect which version of SQL (eg SQL 2008 or SQL Azure)

as seen on Stack Overflow - Search for 'Stack Overflow'
I need to detect which version of SQL I am dealing with to perorm various tasks, I need specifically detect if I am on SQL 2008 or SQL Azure. How can I do this with detection code written in SQL? >>> More
Nested SQL Select statement fails on SQL Server 2000, ok on SQL Server 2005

as seen on Stack Overflow - Search for 'Stack Overflow'
Here is the query: INSERT INTO @TempTable SELECT UserID, Name, Address1 = (SELECT TOP 1 [Address] FROM (SELECT TOP 1 [Address] FROM [UserAddress] ua INNER JOIN UserAddressOrder uo ON ua.UserID = uo.UserID WHERE ua.UserID = u.UserID ORDER BY uo.AddressOrder ASC) q ORDER BY AddressOrder… >>> More

Related posts about postgresql

Postgresql fails to start on Ubuntu 10.04.4 LTS

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I installed postgresql 9.2 from add-apt-repository ppa:pitti/postgresql using apt-get install postgresql-9.2 At the end of the install and every time I try to launch postgresql by using the following command /etc/init.d/postgresql start or service postgresql start I get this error: Error:… >>> More
can't install psycopg2 in my env on mac os x lion

as seen on Server Fault - Search for 'Server Fault'
I tried install psycopg2 via pip in my virtual env, but got this error: ld: library not found for -lpq (full log here: http://pastebin.com/XdmGyJ4u ) I tried install postgres 9.1 from .dmg and via port, (gksks)iMac-Alexander:~ lorddaedra$ locate libpq /Developer/SDKs/MacOSX10.7.sdk/usr/include/libpq /Developer/SDKs/MacOSX10… >>> More
Postgresql has broken apt-get on Ubuntu

as seen on Super User - Search for 'Super User'
On ubuntu 12.04, whenever I try to install a package using apt-get I'm greeted by: The following packages have unmet dependencies: postgresql-9.1 : Depends: postgresql-client-9.1 but it is not going to be instal led E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a so lution)… >>> More
Installing PostgreSQL on FreeBSD (with ports)

as seen on Server Fault - Search for 'Server Fault'
Hey everyone, I am trying to install (using ports) PostgreSQL on a virtual server, running FreeBSD. My one question is this: Which of the following should I install? postgresql-contrib postgresql-docs postgresql-jdbc postgresql-libpgeasy postgresql-libpq++ postgresql-libpqxx postgresql-odbc … >>> More
Strange permission errors in new PostgreSQL installation

as seen on Server Fault - Search for 'Server Fault'
A freshly installed PostgreSQL (with configuration overwritten) won't start: $ sudo service postgresql start * Starting PostgreSQL 9.1 database server * Error: could not read /etc/postgresql/9.1/main/postgresql.conf: Permission denied Looks like it should be able to read it though: $ ls -l postgresql… >>> More