Using hashing to group similar records

Posted by Neil Dobson on Stack Overflow
Published on 2010-05-22T01:30:41Z Indexed on 2010/05/22 1:40 UTC

Filed under: database-design | hash

I work for a fulfillment company, and we pack and ship many orders from our warehouse to customers. To improve efficiency, we would like to group identical orders and pack them in the most efficient way. By identical I mean having the same number of order lines, containing the same SKUs and the same order quantities.

To achieve this I was thinking about hashing each order. We can then group by hash to quickly see which orders are the same.
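A minimal sketch of this idea in Python, purely illustrative and not tied to the actual .NET or PostgreSQL systems. It assumes an order can be reduced to a list of (SKU, quantity) pairs; the names `order_hash` and `group_identical` and the sample data are hypothetical. The lines are sorted before hashing so that the order in which lines were entered does not affect the result.

```python
import hashlib
import json
from collections import defaultdict


def order_hash(lines):
    """Return a hex digest for an order given as (sku, quantity) pairs."""
    # Sort the lines so that entry order does not change the hash,
    # then serialize to JSON to get an unambiguous canonical string.
    canonical = json.dumps(sorted(lines), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_identical(orders):
    """Map each order hash to the list of order ids that share it."""
    groups = defaultdict(list)
    for order_id, lines in orders.items():
        groups[order_hash(lines)].append(order_id)
    return dict(groups)


# Hypothetical sample data: orders 1001 and 1002 contain the same lines
# in a different sequence; order 1003 has different quantities.
orders = {
    1001: [("SKU-A", 2), ("SKU-B", 1)],
    1002: [("SKU-B", 1), ("SKU-A", 2)],
    1003: [("SKU-A", 1), ("SKU-B", 2)],
}
groups = group_identical(orders)
```

Two different orders could in principle collide on the same hash, so a cautious implementation would still compare the actual lines within a group before packing them together.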

We are moving from an Access database to a PostgreSQL database, and we have .NET-based systems for data loading and general order processing, so we can either do the hashing during data loading or hand this task over to the DB.

My first question: should the hashing be managed by the DB, possibly using triggers, or should the hash be created on the fly using a view or something similar?

Secondly, would it be best to calculate a hash for each order line and then combine these into an order-level hash for grouping, or should I just use a trigger on all CRUD operations on the order lines table that recalculates a single hash for the entire order and stores the value in the orders table?
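For the first of these two options, a rough sketch in Python of how per-line hashes might be combined into an order-level hash. The function names `line_hash` and `combine_line_hashes` are hypothetical, and an order line is again assumed to be just a SKU and a quantity. The line digests are sorted before combining, so the result does not depend on line order, and duplicate lines are preserved (a plain XOR of the digests would cancel them out).

```python
import hashlib
import json


def line_hash(sku, qty):
    """Return a fixed-length digest for a single order line."""
    # JSON encoding keeps the SKU and quantity unambiguous in the input.
    payload = json.dumps([sku, qty], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).digest()


def combine_line_hashes(line_hashes):
    """Combine per-line digests into one order-level hex digest."""
    combined = hashlib.sha256()
    # Sorting makes the result independent of the order of the lines.
    for digest in sorted(line_hashes):
        combined.update(digest)
    return combined.hexdigest()


# Hypothetical example: the same two lines in a different sequence.
order_a = [line_hash("SKU-A", 2), line_hash("SKU-B", 1)]
order_b = [line_hash("SKU-B", 1), line_hash("SKU-A", 2)]
# A third order where one quantity differs.
order_c = [line_hash("SKU-A", 2), line_hash("SKU-B", 3)]

hash_a = combine_line_hashes(order_a)
hash_b = combine_line_hashes(order_b)
hash_c = combine_line_hashes(order_c)
```

With stored per-line hashes, a change to one line only requires rehashing that line and recombining, whereas the single-hash option rehashes every line of the order on each change. Which is cheaper in practice depends on how often lines change after loading.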

TIA

© Stack Overflow or respective owner
