Search Results

Search found 478 results on 20 pages for 'warehouse'.

Page 4 of 20

  • Google or the global data warehouse: knowledge sharing or ownership of the world market?

    Google or the global data warehouse: knowledge sharing or ownership of the world market? Google has announced that an additional data set, named World Bank, is now available in its Public Data Explorer tool. Through this tool you can find worldwide information on agriculture, electricity consumption per capita, CO2 emissions per capita, the number of internet users, and more. Various charts then give you statistics on all the capitals, letting you compare the different indicators specific to ...

    Read the article

  • Data warehouse design: fact tables and dimension tables

    - by morpheous
    I am building a poor man's data warehouse using an RDBMS. I have identified the key 'attributes' to be recorded as:

      - sex (true/false)
      - demographic classification (A, B, C etc.)
      - place of birth
      - date of birth
      - weight (recorded daily): the fact that is being recorded

    My requirements are to be able to run 'OLAP' queries that allow me to 'slice and dice', 'drill up/down' the data and, generally, view the data from different perspectives.

    After reading up on this topic area, the general consensus seems to be that this is best implemented using dimension tables rather than normalized tables. Assuming that this assertion is true (i.e. the solution is best implemented using fact and dimension tables), I would like some help with the design of these tables.

    'Natural' (or obvious) dimensions, which have hierarchical attributes, are:

      - date dimension
      - geographical location

    However, I am struggling with how to model the following fields:

      - sex (true/false)
      - demographic classification (A, B, C etc.)

    The reason I am struggling with these fields is that:

      - They have no obvious hierarchical attributes which would aid aggregation (as far as I am aware), which suggests they should be in a fact table.
      - They are mostly static or change very rarely, which suggests they should be in a dimension table.

    Maybe the heuristic I am using above is too crude? I will give some examples of the type of analysis I would like to carry out on the data warehouse; hopefully that will clarify things further. I would like to aggregate and analyze the data by sex and demographic classification, e.g. answer questions like:

      - How do male and female weights compare across different demographic classifications?
      - Which demographic classification (male AND female) shows the most increase in weight this quarter?

    Can anyone clarify whether sex and demographic classification are part of the fact table, or whether they are (as I suspect) dimension tables? Also, assuming they are dimension tables, could someone elaborate on the table structures (i.e. the fields)? The 'obvious' schema:

        CREATE TABLE sex_type (is_male int);
        CREATE TABLE demographic_category (id int, name varchar(4));

    may not be the correct one.
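
    A minimal star-schema sketch of one way to handle this, treating sex and demographic classification as their own small dimension tables; all table and column names below are illustrative assumptions, not taken from the original post (generic SQL, column-level REFERENCES as in SQL Server / PostgreSQL):

        CREATE TABLE dim_date (
            date_key         INT PRIMARY KEY,      -- e.g. 20100401
            full_date        DATE NOT NULL,
            calendar_year    INT  NOT NULL,
            calendar_quarter INT  NOT NULL,
            calendar_month   INT  NOT NULL
        );

        CREATE TABLE dim_location (
            location_key INT PRIMARY KEY,
            country      VARCHAR(50) NOT NULL,
            region       VARCHAR(50),
            city         VARCHAR(50)
        );

        CREATE TABLE dim_sex (
            sex_key     INT PRIMARY KEY,
            description VARCHAR(10) NOT NULL        -- 'Male' / 'Female'
        );

        CREATE TABLE dim_demographic (
            demographic_key INT PRIMARY KEY,
            code            VARCHAR(4) NOT NULL     -- 'A', 'B', 'C', ...
        );

        -- One row per person per day; descriptive attributes live in the dimensions.
        CREATE TABLE fact_weight (
            date_key        INT NOT NULL REFERENCES dim_date (date_key),
            location_key    INT NOT NULL REFERENCES dim_location (location_key),
            sex_key         INT NOT NULL REFERENCES dim_sex (sex_key),
            demographic_key INT NOT NULL REFERENCES dim_demographic (demographic_key),
            weight_kg       DECIMAL(6,2) NOT NULL
        );

        -- "How do male and female weights compare across demographic classifications?"
        SELECT s.description, d.code, AVG(f.weight_kg) AS avg_weight
        FROM fact_weight f
        JOIN dim_sex s         ON s.sex_key = f.sex_key
        JOIN dim_demographic d ON d.demographic_key = f.demographic_key
        GROUP BY s.description, d.code;

    With this layout the two awkward attributes behave like any other dimension: no hierarchy is needed for them, they simply become grouping columns.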

    Read the article

  • What is a Manhattan database?

    - by Adnan Anwar
    A friend of mine was interviewing for a data warehouse and Business Objects role, and he was asked about the Manhattan database. I have Googled "Manhattan database" and even searched for it on Bing and Yahoo, but have found no relevant information. Any help would be greatly appreciated!

    Read the article

  • ETL Operation - Return Primary Key

    - by user302254
    I am using Talend to populate a data warehouse. My job writes customer data to a dimension table and transaction data to the fact table. The surrogate key (p_key) on the fact table is auto-incrementing. When I insert a new customer, I need my fact table to reflect the id of the related customer. As I mentioned, my p_key is auto-incrementing, so I can't just insert an arbitrary value for the p_key. Any thoughts on how I can insert a row into my dimension table and still retrieve the primary key to reference in my fact record? Thanks.
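
    One common pattern, sketched here in SQL Server T-SQL (MySQL's LAST_INSERT_ID() and Oracle's INSERT ... RETURNING serve the same purpose), is to capture the key generated by the dimension insert and reuse it in the fact insert; the table and column names are illustrative assumptions:

        -- Assumes dim_customer.customer_key is an IDENTITY (auto-increment) column.
        DECLARE @customer_key INT;

        INSERT INTO dim_customer (customer_natural_id, customer_name)
        VALUES ('C-1001', 'Acme Ltd');

        -- Key generated by the insert above, scoped to this batch.
        SET @customer_key = SCOPE_IDENTITY();

        -- Reference that key in the related fact row.
        INSERT INTO fact_transaction (customer_key, transaction_date, amount)
        VALUES (@customer_key, '2010-04-01', 125.50);

    The same idea applies in Talend: perform the dimension insert first, capture or look up the generated key, and only then write the fact row.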

    Read the article

  • Continuous Integration with Oracle Products

    - by Lee Gathercole
    Hi, I'm currently working on a data warehouse project using an Oracle Database, Oracle Data Integrator, Oracle Warehouse Builder and some Jython thrown in for good measure, all of which is held within TFS. My background is .NET, and prior to this project I was seeing a lot of promise in CI. I'm not suggesting that the testing element of CI is feasible in this instance, but I would like to implement a stable deployment strategy. What I'm trying to understand is whether or not I can build some NAnt scripts that will allow me to deploy ODI\OWB\Oracle DB code to any given environment at any point. Has anyone tried this before? Are there more appropriate tools out there that lend themselves better to this sort of toolset? Am I just a crazy horse to be even contemplating this? Any views would be greatly appreciated. Thanks Lee

    Read the article

  • Metadata requirements for developers

    - by Paul James
    I'm tasked with providing a list of the metadata requirements our data warehouse developers might need. This is not the business metadata (nice descriptions etc.), but rather the data required for change management (also known as impact assessment), data lineage, etc. I've seen the article Meta Meta Data Data by Ralph Kimball, but as I'm not the first person to do this I'm throwing it to the SO community. The actual question is this: what metadata do data warehouse developers require to design, develop and manage change in ETL routines? PS: I'm trying to keep the answer platform agnostic, but for context this is an Oracle database with PL/SQL and DataStage.

    Read the article

  • How do you verify the correct data is in a data mart?

    - by blockcipher
    I'm working on a data warehouse and I'm trying to figure out how best to verify that data from our data cleansing (normalized) database makes it into our data marts correctly. I've done some searches, but the results so far talk more about ensuring things like constraints are in place and that you need to do data validation during the ETL process (e.g. dates are valid, etc.). The dimensions were pretty easy, as I could either leverage the primary key or write a very simple and verifiable query to get the data. The fact tables are more complex. Any thoughts? We're trying to make this very easy for a subject matter expert to run a couple of queries, see some data from both the data cleansing database and the data marts, and visually compare the two to ensure they are correct.
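
    A minimal sketch of the kind of reconciliation query a subject matter expert could run, comparing row counts and a control total per day between the cleansed source and a fact table (FULL OUTER JOIN flavour as in SQL Server / PostgreSQL; all schema, table and column names are illustrative assumptions):

        -- Rows appear only where the two sides disagree (or one side is missing).
        SELECT COALESCE(s.order_date, f.order_date) AS order_date,
               s.row_count    AS source_rows,
               f.row_count    AS fact_rows,
               s.total_amount AS source_amount,
               f.total_amount AS fact_amount
        FROM (
            SELECT order_date, COUNT(*) AS row_count, SUM(amount) AS total_amount
            FROM cleansed.orders
            GROUP BY order_date
        ) s
        FULL OUTER JOIN (
            SELECT d.full_date AS order_date, COUNT(*) AS row_count, SUM(fo.amount) AS total_amount
            FROM mart.fact_orders fo
            JOIN mart.dim_date d ON d.date_key = fo.date_key
            GROUP BY d.full_date
        ) f ON f.order_date = s.order_date
        WHERE s.row_count IS NULL
           OR f.row_count IS NULL
           OR s.row_count <> f.row_count
           OR s.total_amount <> f.total_amount;

    An empty result set means counts and totals match for every day; anything returned points the expert straight at the day that needs a closer look.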

    Read the article

  • NoSQL for filesystem storage organization and replication?

    - by wheaties
    We've been discussing the design of a data warehouse strategy within our group for meeting testing, reproducibility, and data syncing requirements. One of the suggested ideas is to adopt a NoSQL approach using an existing tool rather than try to re-implement a whole lot of the same on a file system. I don't know if a NoSQL approach is even the best approach to what we're trying to accomplish, but perhaps if I describe what we need/want, you all can help.

      - Most of our files are large, 50+ GB in size, held in a proprietary, third-party format. We need to be able to access each file by a name/date/source/time/artifact combination; essentially a key-value pair style look-up.
      - When we query for a file, we don't want to have to load all of it into memory. They're really too large and would swamp our server. We want to be able to somehow get a reference to the file and then use a proprietary, third-party API to ingest portions of it.
      - We want to easily add, remove, and export files from storage.
      - We'd like to set up automatic file replication between two servers (we can write a script for this), i.e. sync the contents of one server with another. We don't need a distributed system where it only appears as if we have one server; we'd like complete replication.
      - We also have other smaller files that have a tree-type relationship with the big files. One file's content will point to the next, and so on. It's not a "spoked wheel," it's a full-blown tree.
      - We'd prefer a Python, C or C++ API to work with a system like this, but most of us are experienced with a variety of languages. We don't mind as long as it works, gets the job done, and saves us time.

    What do you think? Is there something out there like this?

    Read the article

  • Non-relational database modeling tool?

    - by Angel Escobedo
    Hey guys, please recommend some tools you have used successfully for DW, data mart, BI and non-relational modeling, for example for the automatic creation of snowflake schemas, dimension and fact tables. Which tools feel familiar when working with the diagrams and surrogate keys, and have the option to export to, or connect to, SQL Server 2008? Thanks

    Read the article

  • MDX performance vs. T-SQL

    - by SubPortal
    I have a database containing tables with more than 600 million records and a set of stored procedures that perform complex search operations on the database. The performance of the stored procedures is very slow, even with suitable indexes on the tables. The design of the database is a normal relational design. I want to change the database design to be multidimensional and use MDX queries instead of the traditional T-SQL queries, but the question is: is an MDX query better than a traditional T-SQL query with regard to performance? And if so, to what extent will that improve the performance of the queries? Thanks for any help.

    Read the article

  • open source business intelligence solutions

    - by opensas
    Which open source business intelligence solution would you recommend? All I need is to build some cubes and let the end user play with dimensions, filter data, sort, etc., and once that's done, be able to export it to Excel. I'd like the solution to be as simple and easy on resources as possible, and also as open source as possible; I've heard that many of the solutions available have a lot of restrictions when it comes to their community versions. I'd like to hear your advice and the pros/cons of each alternative, to help me choose the right tool, and it would be great if you could point me to some basic demos and tutorials to get started. Thanks a lot. PS: I'm using SQL Server databases; they aren't huge databases (in general less than a million records) and I don't necessarily have to work on "live" data. Some useful links:

      http://en.wikipedia.org/wiki/Business_intelligence_tools#Open_source_free_products
      http://www.manageability.org/blog/stuff/open-source-java-business-intelligence
      http://www.jaspersoft.com/jasperanalysis
      http://community.pentaho.com/projects/bi_platform/
      http://community.pentaho.com/faq/platform_licensing.php
      http://www.eclipse.org/birt/phoenix/
      http://www.spagoworld.org/xwiki/bin/view/SpagoWorld/
      http://docs.google.com/viewer?a=v&q=cache:vhsqMQXwCUkJ:www.ow2.org/xwiki/bin/download/Activities/EuropeLocalChapterWebinars/ELCWebinarOSBI.pdf+open+source+business+intelligence&hl=en&pid=bl&srcid=ADGEESgpJJ2MqaKprJQOF2jX2UXCZQjg_asv8d7EVYtq0Vma-e-tR1tFxS-I0SOW0IhJC5acYc94rkDOrgP1WckCp_vk4qhKqR9y2Klp_u9cL8hlXoKoUpMkpAd5wabu61A4W0y15E5P&sig=AHIEtbRJ5FAI-3YK-qtayPjKkF_CwOgZag

    Read the article

  • Performance problem with a data warehouse with lots of indexes

    - by Lieven Cardoen
    Our product administers tests to some 350 candidates at the same time. At the end of the test, results for each candidate are moved to a data warehouse that has a lot of indexes on it. For each test there are some 400 records to be entered in the data warehouse, so 400 x 350 is a lot of records. If there aren't many records in the data warehouse, all goes well. But if there are already lots of records in the data warehouse, then a lot of the inserts fail... Is there a way to have indexes that are only rebuilt at the end of the day, or isn't that the real problem? Or how would you solve this?
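
    If the warehouse is SQL Server, one commonly used pattern is to disable the non-clustered indexes before a large load and rebuild them once afterwards, instead of maintaining them row by row; a minimal T-SQL sketch with illustrative table and index names (the post doesn't say which RDBMS is in use):

        -- Before the bulk load: stop maintaining the non-clustered indexes.
        ALTER INDEX IX_Results_Candidate ON dbo.Results DISABLE;
        ALTER INDEX IX_Results_TestDate  ON dbo.Results DISABLE;

        -- ... load the 400 x 350 result rows here ...

        -- After the load (e.g. end of day): rebuild them in one pass.
        ALTER INDEX IX_Results_Candidate ON dbo.Results REBUILD;
        ALTER INDEX IX_Results_TestDate  ON dbo.Results REBUILD;

    Note that a disabled index is unavailable to queries until it is rebuilt, so this only fits if reporting can wait until after the load; inserts failing outright is also worth diagnosing on its own, since index maintenance normally slows inserts rather than making them fail.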

    Read the article

  • How to improve performance of non-scalar aggregations on denormalized tables

    - by The Lazy DBA
    Suppose we have a denormalized table with about 80 columns that grows at a rate of ~10 million rows (about 5 GB) per month. We currently have 3 1/2 years of data (~400M rows, ~200GB). We create a clustered index, to best suit retrieving data from the table, on the following columns that serve as our primary key:

        [FileDate] ASC, [Region] ASC, [KeyValue1] ASC, [KeyValue2] ASC

    Because when we query the table we always have the entire primary key, these queries always result in clustered index seeks and are therefore very fast, and fragmentation is kept to a minimum. However, we do have a situation where we want to get the most recent FileDate for every Region, typically for reports, i.e.

        SELECT [Region], MAX([FileDate]) AS [FileDate]
        FROM HugeTable
        GROUP BY [Region]

    The "best" solution I can come up with is to create a non-clustered index on Region. Although it means an additional index insert per row during loads, the hit is minimal (we load 4 times per day, so fewer than 100,000 additional index inserts per load). Since the table is also partitioned by FileDate, results for our query come back quickly enough (200 ms or so), and that result set is cached until the next load. However, I'm guessing that someone with more data warehousing experience might have a solution that's more optimal, as this, for some reason, doesn't "feel right".
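
    One alternative that comes up in this situation is to maintain a tiny "latest FileDate per Region" summary table as part of each load, so the report never touches the 400M-row table at all; a T-SQL flavoured sketch, where HugeTable_NewBatch stands in for whatever staging table holds the rows of the current load (both names are illustrative):

        -- One row per Region holding the latest FileDate seen so far.
        CREATE TABLE dbo.RegionLatestFile (
            Region   VARCHAR(20) NOT NULL PRIMARY KEY,
            FileDate DATE        NOT NULL
        );

        -- Run once at the end of each of the 4 daily loads.
        MERGE dbo.RegionLatestFile AS target
        USING (
            SELECT Region, MAX(FileDate) AS FileDate
            FROM dbo.HugeTable_NewBatch            -- rows belonging to this load only
            GROUP BY Region
        ) AS src
        ON target.Region = src.Region
        WHEN MATCHED AND src.FileDate > target.FileDate THEN
            UPDATE SET FileDate = src.FileDate
        WHEN NOT MATCHED THEN
            INSERT (Region, FileDate) VALUES (src.Region, src.FileDate);

    The report then reads the summary table directly, and the extra non-clustered index on the big table is no longer needed for this particular query.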

    Read the article

  • Informatica mapping examples

    - by user223541
    I want to develop a generic mapping for handling database errors in Informatica. Can anyone give me any examples of such mappings? Also, can you suggest some resources for Informatica sample mappings (live websites or links)?

    Read the article

  • best way to statistically detect anomalies in data

    - by reinier
    Hi, our webapp collects a huge amount of data about user actions, network business, database load, etc. All the data is stored in warehouses and we have quite a lot of interesting views on this data. If something odd happens, chances are it shows up somewhere in the data. However, to manually detect whether something out of the ordinary is going on, one has to continually look through this data and hunt for oddities. My question: what is the best way to detect changes in dynamic data which can be seen as 'out of the ordinary'? Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go? Any pointers would be great! EDIT: to clarify, the data for example shows a daily curve of database load. This curve typically looks similar to the curve from yesterday; over time the curve might change slowly. It would be nice if a warning could go off when the day-to-day change in the curve falls outside some bounds. R
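
    As a concrete baseline (simpler than a Bayesian filter), a rolling z-score per hour of day is often enough to flag "out of the ordinary" points; a PostgreSQL-flavoured sketch against an assumed metrics_hourly table (table and column names are illustrative):

        -- Flag hours where today's load is more than 3 standard deviations away
        -- from the mean of the same hour over the previous 30 days.
        WITH history AS (
            SELECT hour_of_day,
                   AVG(db_load)    AS mean_load,
                   STDDEV(db_load) AS stddev_load
            FROM   metrics_hourly
            WHERE  metric_date BETWEEN CURRENT_DATE - 30 AND CURRENT_DATE - 1
            GROUP  BY hour_of_day
        )
        SELECT t.hour_of_day,
               t.db_load,
               h.mean_load,
               (t.db_load - h.mean_load) / NULLIF(h.stddev_load, 0) AS z_score
        FROM   metrics_hourly t
        JOIN   history h ON h.hour_of_day = t.hour_of_day
        WHERE  t.metric_date = CURRENT_DATE
          AND  ABS(t.db_load - h.mean_load) > 3 * h.stddev_load;

    Comparing each hour to the same hour on previous days keeps the normal daily shape of the curve from triggering alerts; only deviations from that shape do.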

    Read the article

  • database design suggestion needed

    - by JMSA
    I need to design a table for daily sales of pharmaceutical products. There are hundreds of types of products available {Name, Code}. Thousands of sales-persons are employed to sell those products {Name, Code}. They collect products from different depots {Name, Code}. They work in different Areas - Zones - Markets - Outlets, etc. {all have names and codes}. Each product has various types of prices {Production Price, Trade Price, Business Price, Discount Price, etc.}, and sales-persons are free to choose from those combinations to estimate the sales price. The problem is that daily sales require a huge amount of data entry. Within a couple of years there may be gigabytes of data (if not terabytes). If I need to show daily, weekly, monthly, quarterly and yearly sales reports, there will be various types of SQL queries I shall need. This is my initial design:

        Product {ID, Code, Name, IsActive}
        ProductXYZPriceHistory {ID, ProductID, Date, EffectDate, Price, IsCurrent}
        SalesPerson {ID, Code, Name, JoinDate, and so on..., IsActive}
        SalesPersonSalesAraeaHistory {ID, SalesPersonID, SalesAreaID, IsCurrent}
        Depot {ID, Code, Name, IsActive}
        Outlet {ID, Code, Name, AreaID, IsActive}
        AreaHierarchy {ID, Code, Name, PrentID, AreaLevel, IsActive}
        DailySales {ID, ProductID, SalesPersonID, OutletID, Date, PriceID, SalesPrice, Discount, etc...}

    Now, apart from indexing, how can I normalize my DailySales table to get a fine-grained design that I shall not need to change for years to come? Please show me a sample design of only the DailySales data-entry table (from which all types of reports would be queried) on the basis of the above information. I don't need detailed design advice; I just need advice regarding the DailySales table. Is there any way to break this particular table up to achieve granularity?
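
    A sketch of one way to read the question: keep DailySales at the finest natural grain (one row per product per salesperson per outlet per day) and let every descriptive attribute live in the surrounding tables. Names follow the post where they exist; the added Quantity column and the data types are illustrative assumptions (T-SQL flavoured):

        CREATE TABLE DailySales (
            ID            BIGINT        NOT NULL PRIMARY KEY,
            SalesDate     DATE          NOT NULL,
            ProductID     INT           NOT NULL REFERENCES Product (ID),
            SalesPersonID INT           NOT NULL REFERENCES SalesPerson (ID),
            OutletID      INT           NOT NULL REFERENCES Outlet (ID),
            PriceID       INT           NOT NULL REFERENCES ProductXYZPriceHistory (ID),
            Quantity      INT           NOT NULL,
            SalesPrice    DECIMAL(12,2) NOT NULL,   -- unit price actually charged
            Discount      DECIMAL(12,2) NOT NULL DEFAULT 0
        );

        -- Weekly/monthly/quarterly/yearly reports are then roll-ups of this grain,
        -- e.g. monthly net sales per area:
        SELECT a.Name AS Area,
               YEAR(ds.SalesDate)  AS SalesYear,
               MONTH(ds.SalesDate) AS SalesMonth,
               SUM(ds.Quantity * ds.SalesPrice - ds.Discount) AS NetSales
        FROM DailySales ds
        JOIN Outlet o        ON o.ID = ds.OutletID
        JOIN AreaHierarchy a ON a.ID = o.AreaID
        GROUP BY a.Name, YEAR(ds.SalesDate), MONTH(ds.SalesDate);

    Because the grain is already one row per sale line, there isn't much left to break out of DailySales itself; growth is usually handled by partitioning or archiving on SalesDate rather than by changing the table's shape.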

    Read the article

  • Many-To-Many dimensional model

    - by Mevdiven
    Folks, I have a dimension table called DIM_FILE which holds information about the files we receive from customers. Each file has detail records, which constitute my FACT table, CUST_DETAIL. In the main process, a file goes through several stages and each stage tags a status to it. Long story short, I have a many-to-many relationship. Any ideas around star schema dimensional modeling? A customer record only belongs to a single file, and a file can have multiple statuses.

        FACT: CustID, FileID, AmountDue
        DIM_FILE: FileID, FileName, DateReceived
        FILE_STATUS: FileID, StatusDateTime, StatusCode
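
    The textbook answer for a many-to-many like this is a bridge table between DIM_FILE and a small status dimension; a minimal sketch under that assumption, keeping the post's names where possible (DIM_STATUS and BRIDGE_FILE_STATUS are illustrative additions, and DIM_FILE.FileID is assumed to be its primary key):

        -- Status becomes its own small dimension.
        CREATE TABLE DIM_STATUS (
            StatusKey  INT PRIMARY KEY,
            StatusCode VARCHAR(10) NOT NULL
        );

        -- Bridge table: one row per (file, status) pair, i.e. the many-to-many.
        CREATE TABLE BRIDGE_FILE_STATUS (
            FileID         INT      NOT NULL REFERENCES DIM_FILE (FileID),
            StatusKey      INT      NOT NULL REFERENCES DIM_STATUS (StatusKey),
            StatusDateTime DATETIME NOT NULL,
            PRIMARY KEY (FileID, StatusKey, StatusDateTime)
        );

        -- The fact keeps its single FileID, so its grain is unchanged;
        -- status questions go through the bridge.
        SELECT s.StatusCode, SUM(f.AmountDue) AS AmountDue
        FROM FACT f
        JOIN BRIDGE_FILE_STATUS b ON b.FileID = f.FileID
        JOIN DIM_STATUS s         ON s.StatusKey = b.StatusKey
        GROUP BY s.StatusCode;

    The usual caveat applies: a file with several statuses contributes its AmountDue once per status in a query like this, so totals across statuses can double-count unless a weighting factor or a "current status" flag is added to the bridge.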

    Read the article

  • Linked Measure Groups and Local Dimensions

    - by ekoner
    Mulling over something I've been reading up on. According to Chris Webb, "a linked measure group can only be used with dimensions from the same database as the source measure group." I took this to mean that as long as two cubes share a database, a linked measure group can be used with a dimension. So I created a new cube and added a local measure group, a local dimension and a linked measure group. However, I can't create a relationship between the linked measure group and the local dimension, even though they are within the same database. I get the message below:

        Regular relationships in the current database between non-linked (local) dimensions and linked measure groups cannot be edited. These relationships can only be created through the wizard. This dialog can be used to delete these relationships.

    I see that I can go to the original cube and add the dimension there, but does the message above mean I have an alternative? I just know it's going to be something simple and trivial! Thanks for reading.

    Read the article
