PDXPUG August: Tsearch2 and Materialized Views

selenamarie's picture

This month's PDXPUG meeting featured Lloyd Albin from SCHARP, the Statistical Center for HIV/AIDS Research & Prevention located near Seattle, WA. Lloyd was nice enough to drive down for the evening and tell us about some of the work he's done over the last six months.

We had a couple announcements - we need talk ideas for future meetings! Please get in touch with Selena if you have ideas for a talk. Tom Raney offered to give a talk on his Visual Planner tool next month, and Len Shapiro offered to give a talk in January or February on teaching database theory using PostgreSQL.

Also, PostgreSQL West is coming up. Get your talks submitted now! See: http://www.postgresqlconference.org/west08/

Lloyd started off with a description of some of the problems his group encountered that led them to use Tsearch. The researchers and financial group both needed to be able to search through most of the textual data, but didn't necessarily know ahead of time which tables or even the columns that they'd be most interested in. So, Lloyd helped construct search tables from the original tables that contained all of the text and references back to the original tables.

SCHARP currently has about 1.5 TB of data spread across 6 PostgreSQL instances, BerkeleyDB files and spreadsheets.

Using a series of triggers and materialized views (generated nightly to improve search speed), the search tables improved the search speed from 8-minutes per query to about 1/2 a second. Lloyd chose to use GIST indexes instead of GIN for speed, and is using the built-in dictionaries for now. He thinks in the future that they will have to generate custom dictionaries as their data set grows.

Lloyd got permission to share his PL/Perl code and table structures. All that information is attached to this post.

AttachmentSize
TSearch2.ppt1.09 MB
sql_source.zip20.25 KB
TSearch2.pptx1.11 MB

Comments

Thank you for your help on

igorma's picture

Thank you for your help on this.
---------------------
my regards

PostgreSQL and a HUGE public health application

sreeberk's picture

RE: The database you mention in your blog

Does this database include information on population, family planning, and/or maternal health? (Specifically globally).

Thank you for your help on this.

Warm Regards,
Amita

Back to top