I have a database that contains a history of product sales. For example the following table
CREATE TABLE SalesHistoryTable (
OrderID, // Order Number Unique to all orders
ProductID, // Product ID can be used as a Key to look up product info in another table
Price, // Price of the product per unit at the time of the order
Quantity, // quantity of the product for the order
Total, // total cost of the order for the product. (Price * Quantity)
Date, // Date of the order
StoreID, // The store that created the Order
PRIMARY KEY(OrderID));
The table will eventually have millions of transactions. From this, profiles can be created for products in different geographical regions (based on the StoreID). Creating these profiles can be very time consuming as a database query. For example.
SELECT ProductID, StoreID,
SUM(Total) AS Total,
SUM(Quantity) QTY,
SUM(Total)/SUM(Quantity) AS AvgPrice
FROM SalesHistoryTable
GROUP BY ProductID, StoreID;
The above query could be used to get the Information based on products for any particular store. You could then determine which store has sold the most, has made the most money, and on average sells for the most/least. This would be very costly to use as a normal query run anytime. What are some design descisions in order to allow these types of queries to run faster assuming storage size isn’t an issue. For example, I could create another Table with duplicate information.
Store ID (Key), Product ID, TotalCost, QTY, AvgPrice
And provide a trigger so that when a new order is received, the entry for that store is updated in a new table. The cost for the update is almost nothing.
What should be considered when given the above scenario?