Search Results

Search found 487 results on 20 pages for 'etl instrumentation'.

Page 5/20

  • How to extract data from Google Analytics and build a data warehouse (webhouse) from it?

    - by nkaur301
    I have clickstream data such as referring URL, top landing pages, top exit pages, and metrics such as page views, number of visits, and bounces, all in Google Analytics. I am required to build a data warehouse from scratch (which I believe is known as a web-house) from this data. My questions are: 1) Is it possible? Data increases every day (some in terms of metrics or measures such as visits, and some in terms of new referring sites), so how would the process of loading the warehouse work? 2) What ETL tool would help me achieve this? Pentaho, I believe, has a way to pull data out of Google Analytics - has anyone used it? How does that process go? Any references or links would be appreciated in addition to answers.
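
    A minimal sketch of the incremental ("one day at a time") load asked about in question 1, assuming a hypothetical IAnalyticsSource wrapper around whatever does the extraction (a Pentaho step, Google's reporting API, a CSV export) - the interface and table names are illustrative, not a real library:

        using System;
        using System.Collections.Generic;
        using System.Data.SqlClient;

        public class PageStats
        {
            public DateTime Date;
            public string ReferrerUrl;
            public string LandingPage;
            public int Visits;
            public int PageViews;
            public int Bounces;
        }

        public interface IAnalyticsSource
        {
            IEnumerable<PageStats> FetchDay(DateTime date);   // hypothetical extraction call
        }

        public class WebhouseLoader
        {
            public void LoadDay(IAnalyticsSource source, DateTime date, string connectionString)
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    foreach (PageStats stat in source.FetchDay(date))
                    {
                        // Resolve the referrer/page dimension rows, then add the additive
                        // measures to the fact table keyed by date. Because each run loads
                        // only one day, daily growth does not slow the nightly load.
                        var cmd = new SqlCommand(
                            "INSERT INTO FactPageStats (DateKey, ReferrerKey, PageKey, Visits, PageViews, Bounces) " +
                            "VALUES (@d, @r, @p, @v, @pv, @b)", conn);
                        cmd.Parameters.AddWithValue("@d", stat.Date);
                        cmd.Parameters.AddWithValue("@r", GetReferrerKey(conn, stat.ReferrerUrl));
                        cmd.Parameters.AddWithValue("@p", GetPageKey(conn, stat.LandingPage));
                        cmd.Parameters.AddWithValue("@v", stat.Visits);
                        cmd.Parameters.AddWithValue("@pv", stat.PageViews);
                        cmd.Parameters.AddWithValue("@b", stat.Bounces);
                        cmd.ExecuteNonQuery();
                    }
                }
            }

            // Stubs standing in for "look up or insert the dimension member, return its surrogate key".
            private int GetReferrerKey(SqlConnection conn, string url) { return 0; }
            private int GetPageKey(SqlConnection conn, string page) { return 0; }
        }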

    Read the article

  • Empty data problem - data layer or DAL?

    - by luckyluke
    I am designing a new app now and giving the following question a lot of thought. I consume a lot of data from the warehouse, and the entities have a lot of dictionary-based values (currency, country, tax-whatever data) - dimensions. I cannot be assured, though, that there won't be nulls. So I am thinking: create an empty value in each of the dictionaries with a special keyID, i.e. -1; have the ETL (SSIS) do the correct stuff and insert -1 where it needs to; let the DAL know that -1 is special (static const or whatever); and don't bother checking for nullness of dictionary entries in the code, because THEY will always have a value. But maybe I should be thinking: import the data AS IS; let the DAL do the thinking using the empty record pattern; and still don't care in the code, because the business layer will have what it needs from the DAL. I think this is more of an approach thing, but maybe I am missing something important here... What do you think? Am I clear? Please don't confuse it with the empty record problem - I do use the emptyCustomer thing all the time, and other defaults too.
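
    A minimal sketch of the first option (the "unknown member" row with keyID -1), assuming a hypothetical Currency dimension; the -1 row itself would be inserted once by the SSIS load, and names here are illustrative:

        using System.Collections.Generic;
        using System.Linq;

        public static class DimensionKeys
        {
            public const int Unknown = -1;   // matches the special row loaded by the ETL
        }

        public class Currency
        {
            public int CurrencyKey { get; set; }
            public string IsoCode { get; set; }
        }

        public class CurrencyRepository
        {
            private readonly Dictionary<int, Currency> _byKey;

            public CurrencyRepository(IEnumerable<Currency> all)
            {
                _byKey = all.ToDictionary(c => c.CurrencyKey);
            }

            // Callers always get a Currency back; nulls from the source were mapped to -1
            // during the load, so business code never has to check for null entries.
            public Currency Get(int? currencyKey)
            {
                return _byKey[currencyKey ?? DimensionKeys.Unknown];
            }
        }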

    Read the article

  • Handling primary key duplicates in a data warehouse load

    - by Meff
    I'm currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don't load duplicate rows, I've put a primary key on the fact table, which is the transaction ID. I've encountered a problem with transactions being reversed: in the transactional database this is done via a status, which I pick up, so I can work out whether the transaction is being completed or rolled back and load a reversal row in the warehouse. However, the reversal row will have the same transaction ID, and so I get a primary key violation. I've solved this for now by negating the primary key, so transaction ID 1 would be a payment, and transaction ID -1 (in the warehouse only) would be the reversal. I have considered an alternative of adding a BIT column, where 0 is normal and 1 is a reversal, then making the PK the transaction ID plus the BIT column. My question is: is this good practice, and has anyone else encountered anything like this? For reference, this is a payment processing system, so values will not be modified - there will only ever be transactions and reversals.
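
    A sketch of the composite-key alternative described above, as it would look on the ETL side: the fact grain becomes (TransactionId, IsReversal) rather than a negated transaction ID, mirroring a primary key on (TransactionID, ReversalFlag) in the warehouse. Names are illustrative:

        using System;
        using System.Collections.Generic;

        struct FactKey : IEquatable<FactKey>
        {
            public long TransactionId;
            public bool IsReversal;        // the proposed BIT column: 0 = payment, 1 = reversal

            public bool Equals(FactKey other)
            {
                return TransactionId == other.TransactionId && IsReversal == other.IsReversal;
            }

            public override bool Equals(object obj)
            {
                return obj is FactKey && Equals((FactKey)obj);
            }

            public override int GetHashCode()
            {
                return TransactionId.GetHashCode() ^ (IsReversal ? 1 : 0);
            }
        }

        class LoadDeduplicator
        {
            private readonly HashSet<FactKey> _seen = new HashSet<FactKey>();

            // Returns true the first time a given (transaction, reversal) pair is seen, so each
            // transaction ID can load at most one payment row and one reversal row.
            public bool ShouldLoad(long transactionId, bool isReversal)
            {
                return _seen.Add(new FactKey { TransactionId = transactionId, IsReversal = isReversal });
            }
        }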

    Read the article

  • VS 2010 Profiling Problem with Signed Assemblies

    - by Binder
    I have a website that uses AjaxControlToolkit.dll and Log4Net.dll. When I try to run the performance profiling tool in VS 2010 on it, it gives me the following warning: "AjaxControlToolkit.dll is signed and instrumenting it will invalidate its signature. If you proceed without a post-instrument event to re-sign the binary it may not load correctly". Now, if I choose the option to continue without re-signing, the profiling starts but the assembly doesn't load and I get an ASP.NET exception.

    Read the article

  • Is there an equivalent to Java's ClassFileTransformer in .NET? (a way to replace a class)

    - by Alix
    I've been searching for this for quite a while with no luck so far. Is there an equivalent to Java's ClassFileTransformer in .NET? Basically, I want to create a class CustomClassFileTransformer (which in Java would implement the interface ClassFileTransformer) that gets called whenever a class is loaded and is allowed to tweak it and replace it with the tweaked version. I know there are frameworks that do similar things, but I was looking for something more straightforward, like implementing my own ClassFileTransformer. Is it possible? EDIT #1. More details about why I need this: Basically, I have a C# application and I need to monitor the instructions it wants to run in order to detect read or write operations to fields (the Ldfld and Stfld operations) and insert some instructions before the read/write takes place. I know how to do this (except for the part where I need to be invoked to replace the class): for every method whose code I want to monitor, I must (1) get the method's MethodBody using MethodBase.GetMethodBody(); (2) transform it to a byte array with MethodBody.GetILAsByteArray() - the byte[] it returns contains the bytecode; (3) analyse the bytecode as explained here, possibly inserting new instructions or deleting/modifying existing ones by changing the contents of the array; and (4) create a new method and use the new bytecode to create its body, with MethodBuilder.CreateMethodBody(byte[] il, int count), where il is the array with the bytecode. I put all these tweaked methods in a new class and use the new class to replace the one that was originally going to be loaded. An alternative to replacing classes would be somehow getting notified whenever a method is invoked. Then I'd replace the call to that method with a call to my own tweaked method, which I would tweak only the first time it is invoked and then put in a dictionary for future uses, to reduce overhead (for future calls I'd just look up the method and invoke it; I wouldn't need to analyse the bytecode again). I'm currently investigating ways to do this, and LinFu looks pretty interesting, but if there were something like a ClassFileTransformer it would be much simpler: I'd just rewrite the class, replace it, and let the code run without monitoring anything. An additional note: the classes may be sealed. I want to be able to replace any kind of class; I cannot impose restrictions on their attributes. EDIT #2. Why I need to do this at runtime: I need to monitor everything that is going on so that I can detect every access to data. This applies to the code of library classes as well. However, I cannot know in advance which classes are going to be used, and even if I knew every possible class that might get loaded, it would be a huge performance hit to tweak all of them instead of waiting to see whether they actually get invoked or not. POSSIBLE (BUT PRETTY HARDCORE) SOLUTION. In case anyone is interested (and I see the question has been faved, so I guess someone is), this is what I'm looking at right now. Basically I'd have to implement the profiling API and register for the events that I'm interested in - in my case, whenever a JIT compilation starts. An extract from the blog post: In your ICorProfilerCallback2::ModuleLoadFinished callback, you call ICorProfilerInfo2::GetModuleMetadata to get a pointer to a metadata interface on that module. QI for the metadata interface you want. Search MSDN for "IMetaDataImport", and grope through the table of contents to find topics on the metadata interfaces.
Once you're in metadata-land, you have access to all the types in the module, including their fields and function prototypes. You may need to parse metadata signatures, and this signature parser may be of use to you. In your ICorProfilerCallback2::JITCompilationStarted callback, you may use ICorProfilerInfo2::GetILFunctionBody to inspect the original IL, and ICorProfilerInfo2::GetILFunctionBodyAllocator and then ICorProfilerInfo2::SetILFunctionBody to replace that IL with your own. The great news: I get notified when a JIT compilation starts and I can replace the bytecode right there, without having to worry about replacing the class, etc. The not-so-great news: you cannot invoke managed code from the API's callback methods, which makes sense but means I'm on my own parsing the IL code, etc., as opposed to being able to use Cecil, which would have been a breeze. I don't think there's a simpler way to do this without using AOP frameworks (such as PostSharp). If anyone has any other idea, please let me know. I'm not marking the question as answered yet.
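
    A minimal sketch of steps (1)-(2) from the list above, pulling a method's IL as a byte array and flagging the field-access opcodes. This only illustrates the inspection side: a real rewriter must walk the IL instruction by instruction (operand bytes can coincide with these values) before emitting a patched body.

        using System;
        using System.Reflection;
        using System.Reflection.Emit;

        static class IlInspector
        {
            public static byte[] GetIl(MethodInfo method)
            {
                MethodBody body = method.GetMethodBody();
                return body == null ? new byte[0] : body.GetILAsByteArray();   // abstract/extern methods have no body
            }

            public static void ReportFieldAccess(MethodInfo method)
            {
                byte[] il = GetIl(method);
                // Single-byte opcodes: ldfld = 0x7B, stfld = 0x7D (see OpCodes.Ldfld.Value / OpCodes.Stfld.Value).
                for (int i = 0; i < il.Length; i++)
                {
                    if (il[i] == (byte)OpCodes.Ldfld.Value || il[i] == (byte)OpCodes.Stfld.Value)
                        Console.WriteLine("possible field access at IL offset {0}", i);
                }
            }
        }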

    Read the article

  • Agilent E4426B signal generator locks up during multiple GPIB *SAV operations

    - by aspiehler
    I have a test fixture with an Agilent E4426B RF signal generator connected to a PC via a National Instruments Ethernet-to-GPIB bridge. My software is attempting to sanitize the instrument by presetting it and then saving the current state to all of the memory locations writable via the standard SCPI command "*SAV x,y". The loop works to a point, but eventually the instrument responds with an error and continuously displays the "L" icon on the front display and a "Remote preset" message at the bottom. At that point it won't respond to any more remote commands, and I have to either cycle power or press LOCAL, then PRESET, at which point it takes about 3 minutes to finish presetting. At that point the "L" icon is still present, and the next GPIB command sent to the instrument causes it to report a -113 error (undefined header) in the instrument error queue. I fired up NI Spy to see what was happening, and found that the error was happening at the same point in the loop - "*SAV 6,2" in this case. From NI Spy:

        Send (0, 0x0017, "*SAV 6,2", 8 (0x8), NLend (0,0x01))
        Process ID: 0x00000520  Thread ID: 0x00000518
        ibsta: 0xc168  iberr: 6  ibcntl: 2 (0x2)

    And here's the code from the instrument driver:

        int CHP_E4426b::Erase()
        {
            if ((m_StatusCode = Initialize()) != GPIB_SUCCESS)   // basically just sends "*RST"
                return m_StatusCode;
            m_SaveState = "*SAV %d, %d";
            for (int i = 0; i < 10; i++)
                for (int j = 0; j < 100; j++)
                {
                    sprintf(m_CmdString, m_SaveState, j, i);
                    if ((m_StatusCode = Send(m_CmdString, strlen(m_CmdString))) != GPIB_SUCCESS)
                        return m_StatusCode;
                }
            return GPIB_SUCCESS;
        }

    I tried putting a small Sleep() delay (10-20 ms) at the end of the inner loop, and to my surprise it caused the error to show up earlier rather than later: 10 ms caused the loop to error out at 44,1 and 20 ms was even sooner. I've already eliminated faulty cabling or the instrument as the culprit. This same type of sequence works without any error on a higher-end signal generator, so I'm tempted to chalk this up to a bug in the instrument's firmware.

    Read the article

  • How mature is java.lang.instrument?

    - by Juan Tamayo
    Hi Everyone, I'll be working on a project for instrumenting a relatively complex java application, and I'm planning to use java.lang.instrument to hook into the JVM and redefine classes before they're loaded. What is your take on this package? Is it well supported across JVMs? Does it work well with Hotspot? Thanks!

    Read the article

  • Is there an equivalent to Java's ClassFileTransformer in .NET?

    - by Alix
    I've been searching for this for quite a while with no luck so far. Is there an equivalent to Java's ClassFileTransformer in .NET? Basically, I want to create a class CustomClassFileTransformer (which in Java would implement the interface ClassFileTransformer) that gets called whenever a class is loaded, and is allowed to tweak it and replace it with the tweaked version. I know there are frameworks that do similar things, but I was looking for something more straightforward, like implementing my own ClassFileTransformer. Is it possible?

    Read the article

  • Instrumenting a string

    - by George Polevoy
    Back in my C++ days I crafted a library that enabled a string representation of the computation history. Given a math expression like: TScalar Compute(TScalar a, TScalar b, TScalar c) { return ( a + b ) * c; } I could render its string representation: r = Compute(VerbalScalar("a", 1), VerbalScalar("b", 2), VerbalScalar("c", 3)); Assert.AreEqual(9, r.Value); Assert.AreEqual("(a+b)*c==(1+2)*3", r.History ); C++ operator overloading allowed for substitution of a simple type with a complex self-tracking entity holding an internal tree representation of everything happening to the objects. Now I would like to have the same possibility for .NET strings, only instead of variable names I would like to see the stack traces of all the places in code that affected a string. And I want it to work with existing code and existing compiled assemblies. I also want all this to hook into the Visual Studio debugger, so I could set a breakpoint and see everything that happened to a string. Which technology would allow this kind of thing? I know it sounds like utopia, but I think Visual Studio's code coverage tools actually do the same kind of job while instrumenting the assemblies.
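
    For comparison, a minimal sketch of the VerbalScalar-style approach applied to strings in C#: a wrapper type that records a stack trace per operation. This is illustrative only - it requires calling code to use the wrapper, so it does not satisfy the "existing compiled assemblies" requirement, which would need IL rewriting or a profiler as discussed in the ClassFileTransformer question above.

        using System;
        using System.Collections.Generic;
        using System.Diagnostics;

        public sealed class TrackedString
        {
            private readonly List<string> _history = new List<string>();

            public string Value { get; private set; }
            public IEnumerable<string> History { get { return _history; } }

            public TrackedString(string value)
            {
                Value = value;
                Record("created");
            }

            public TrackedString Append(string suffix)
            {
                Value += suffix;
                Record("append \"" + suffix + "\"");
                return this;
            }

            private void Record(string operation)
            {
                // Capture where in the calling code the mutation happened (skip Record and its caller's frame).
                _history.Add(operation + " at " + new StackTrace(2, true));
            }
        }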

    Read the article

  • convert char[] to String in btrace

    - by usovmv
    Hi folks! I'm profiling an application with BTrace (https://btrace.dev.java.net) and have run into a limitation. I'm trying to get the name of the current java.lang.Thread. Normally you would call getName(), but that's forbidden in BTrace scripts (any calls except those on BTraceUtils). Is there any way to get a String from a char[]? The original task is to check whether the thread's name contains a substring and only then log the tracing info (to reduce output). Thanks, Max.

    Read the article

  • Best way to produce automated exports in tab-delimited form from Teradata?

    - by Cade Roux
    I would like to be able to produce a file by running a command or batch which basically exports a table or view (SELECT * FROM tbl) in text form (default conversions to text for dates, numbers, etc. are fine), tab-delimited, with NULLs converted to empty fields (i.e. a NULL column would have no space between tab characters), with appropriate line termination (CRLF or Windows), and preferably also with column headings. This is the same export I can get in SQL Assistant 12.0 by choosing the export option, using the tab delimiter, setting my NULL value to '' and including column headings. I have been unable to find the right combination of options - the closest I have gotten is by building a single column with CAST and '09'XC, but the rows still have a leading 2-byte length indicator in most settings I have tried. I would prefer not to have to build large strings for the various different tables.
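
    If a small command-line utility is acceptable instead of a pure export script, here is a hedged sketch of doing it from C# over ODBC, assuming a Teradata ODBC DSN named "TD" (illustrative). NULLs become empty fields, columns are tab-separated, headings come first, and StreamWriter emits CRLF line endings by default on Windows:

        using System;
        using System.Data.Odbc;
        using System.IO;

        static class TabExport
        {
            public static void Export(string table, string path)
            {
                using (var conn = new OdbcConnection("DSN=TD;"))
                using (var writer = new StreamWriter(path))
                {
                    conn.Open();
                    var cmd = new OdbcCommand("SELECT * FROM " + table, conn);
                    using (OdbcDataReader reader = cmd.ExecuteReader())
                    {
                        // Column headings first.
                        var names = new string[reader.FieldCount];
                        for (int i = 0; i < reader.FieldCount; i++) names[i] = reader.GetName(i);
                        writer.WriteLine(string.Join("\t", names));

                        var fields = new string[reader.FieldCount];
                        while (reader.Read())
                        {
                            for (int i = 0; i < reader.FieldCount; i++)
                                fields[i] = reader.IsDBNull(i) ? "" : Convert.ToString(reader.GetValue(i));
                            writer.WriteLine(string.Join("\t", fields));
                        }
                    }
                }
            }
        }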

    Read the article

  • Best Practise to populate Fact and Dimension Tables from Transactional Flat DB

    - by alex25
    Hi! I want to populate a star schema / cube in SSIS / SSAS. I have prepared all my dimension tables and my fact table, primary keys, etc. The source is a 'flat' (item-level) table, and my problem now is how to split it up and get it from that one table into the respective tables. I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?! Thanks, alexl

    Read the article

  • Sybase: how can I remove non-printable characters from CHAR or VARCHAR fields with SQL?

    - by Kenny Drobnack
    I'm working with a Sybase database that seems to have non-printable characters in some of the string fields, and this is throwing off some of our processing code. At first glance it seemed to be only newlines and carriage returns, but we also have an ASCII code 27 in there - an ESC character - plus some accented characters and other oddities. I have no direct access to change the database, so changing the bad data isn't an option yet. For now I have to make do with just filtering it out. We're trying to export the table data from one database and load it into a database used by another application in a nightly batch process. Ideally, I'd like a function to which I can pass a list of characters and just have Sybase return the data with those characters removed. I'd like to keep it something we could do in plain SQL if possible. Something like this, to remove characters that are ASCII 0 - 31: select str_replace(FIELD1, (0-31), NULL) as FIELD1, str_replace(FIELD2, (0-31), NULL) as FIELD2 from TABLE So far, str_replace is the nearest I can find, but it only allows replacing one string with another - no support for character ranges, so it won't let me do the above. We're running Sybase ASE 12.5 on Unix servers.
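
    As an alternative to doing it in plain SQL: if the nightly batch process that moves the data can do the filtering instead of Sybase, a sketch like this strips ASCII 0-31 from each exported field before it is loaded into the target database (the class and method names are illustrative):

        using System.Text;

        static class FieldCleaner
        {
            public static string StripControlChars(string value)
            {
                if (string.IsNullOrEmpty(value)) return value;

                var sb = new StringBuilder(value.Length);
                foreach (char c in value)
                {
                    if (c >= 32)          // drop ASCII 0-31 (newlines, CR, ESC, etc.)
                        sb.Append(c);
                }
                return sb.ToString();
            }
        }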

    Read the article

  • SSIS 2005 - How to Import a Fixed Width Flat File?

    - by Greg
    I have a flat file that looks something like this: junk I don't care about \n \n column names\n val1 val2 val3\n val1 val2 val3\n column names \n val1 val2 val3\n I only care about the lines with values. These value lines are all in fixed-width format and have the same line length. The other junk lines and column-name lines can have any line width. When I try the flat file fixed-width option or the ragged-right option, the preview looks all wrong. Any ideas what the easiest way to get this into SSIS is?
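
    A sketch of one way this is often handled: read the file in a Script Component (or a small pre-processing step), keep only the lines that have the fixed value-line length, and slice each one by position. The line length and column widths below are hypothetical placeholders for the real layout:

        using System;
        using System.IO;

        static class FixedWidthFilter
        {
            const int ValueLineLength = 18;   // hypothetical: length of a "val1 val2 val3" line

            public static void Parse(string path)
            {
                foreach (string line in File.ReadAllLines(path))
                {
                    // Junk lines and column-name lines have a different length, so skip them.
                    if (line.Length != ValueLineLength) continue;

                    string val1 = line.Substring(0, 6).Trim();
                    string val2 = line.Substring(6, 6).Trim();
                    string val3 = line.Substring(12, 6).Trim();
                    Console.WriteLine("{0} | {1} | {2}", val1, val2, val3);
                }
            }
        }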

    Read the article

  • transforming binary data using ssis and sql server 2008

    - by Rick
    Hello all - I have a task to import/transform and extract zipped binary files that contain both text data and embedded binary data. Within the files is data that is relational in nature and needs to be processed into a defined database structure. Currently I have a single-threaded C# app that essentially grabs all the files from the directory (currently there are 13K files of varying sizes) and extracts the data on a single thread, inserting to the database line by line. As you can imagine, this is a very slow process, and unacceptable. There are several different parsing routines used depending on the header record in the file. There are potentially up to a million rows per file when all the data is extracted to the row level of detail. A follow-on task is to parse those rows into their appropriate tables based on their content, i.e. the textual content has to be parsed further into "buckets" of like data in the database. That about sums up the big picture. Now for the problem task list. How do I iterate through a packet of data using SSIS? In the app, the file is decompressed and then parsed using streams and byte arrays, and routed to the required parsing routine based on the header data of each packet; there is bit swapping involved as well. Should I wrap the app code up into script task(s) and let them do the custom processing? The data is separated by year, and the SQL Server tables are partitioned by year as well. I need to be able to "catch" bad file data as well, and most likely process it by hand. Should I simply load the zipped file into SQL as a blob and parse the file with T-SQL? Would it be multi-threaded if done that way? I'm not sure how to do the parsing involved here in T-SQL. Which do you think would be faster? Potentially the data that is currently processed via files could come to us via a socket. Can SSIS collect that data in real time? How would I go about setting that up? Processing these new files from the directories will become a daily task. I can manage the data once I get it into SQL Server; getting it there in a timely fashion seems to be the long pole in the tent for me. I would appreciate any comments or suggestions from the group. Rick
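
    One way to attack the throughput problem from the existing C# side (a hedged sketch, not the poster's actual parser): process files in parallel and replace the line-by-line inserts with SqlBulkCopy into a staging table. The table name, column, and degree of parallelism are illustrative, and ParseFile is a stub standing in for the existing decompression/bit-swapping logic:

        using System.Data;
        using System.Data.SqlClient;
        using System.IO;
        using System.Threading.Tasks;

        static class BulkFileLoader
        {
            public static void LoadAll(string directory, string connectionString)
            {
                Parallel.ForEach(Directory.GetFiles(directory),
                    new ParallelOptions { MaxDegreeOfParallelism = 4 },
                    file =>
                    {
                        DataTable rows = ParseFile(file);
                        using (var bulk = new SqlBulkCopy(connectionString))
                        {
                            bulk.DestinationTableName = "dbo.StagingRows";
                            bulk.BatchSize = 10000;
                            bulk.WriteToServer(rows);   // one bulk insert per file instead of row-by-row
                        }
                    });
            }

            static DataTable ParseFile(string path)
            {
                // Placeholder for the packet parser described above; it would return the
                // row-level detail extracted from the decompressed file.
                var table = new DataTable();
                table.Columns.Add("RawLine", typeof(string));
                return table;
            }
        }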

    Read the article

  • Common Lisp condition system for transfer of control

    - by Ken
    I'll admit right up front that the following is a pretty terrible description of what I want to do. Apologies in advance. Please ask questions to help me explain. :-) I've written ETLs in other languages that consist of individual operations that look something like: // in class CountOperation IEnumerable<Row> Execute(IEnumerable<Row> rows) { var count = 0; foreach (var row in rows) { row["record number"] = count++; yield return row; } } Then you string a number of these operations together, and call The Dispatcher, which is responsible for calling Operations and pushing data between them. I'm trying to do something similar in Common Lisp, and I want to use the same basic structure, i.e., each operation is defined like a normal function that inputs a list and outputs a list, but lazily. I can define-condition a condition (have-value) to use for yield-like behavior, and I can run it in a single loop, and it works great. I'm defining the operations the same way, looping through the inputs: (defun count-records (rows) (loop for count from 0 for row in rows do (signal 'have-value :value `(:count ,count @,row)))) The trouble is if I want to string together several operations, and run them. My first attempt at writing a dispatcher for these looks something like: (let ((next-op ...)) ;; pick an op from the set of all ops (loop (handler-bind ((have-value (...))) ;; records output from operation (setq next-op ...) ;; pick a new next-op (call next-op))) But restarts have only dynamic extent: each operation will have the same restart names. The restart isn't a Lisp object I can store, to store the state of a function: it's something you call by name (symbol) inside the handler block, not a continuation you can store for later use. Is it possible to do something like I want here? Or am I better off just making each operation function explicitly look at its input queue, and explicitly place values on the output queue?

    Read the article

  • C# Importing Large Volume of Data from CSV to Database

    - by guazz
    What's the most efficient method to load large volumes of data from CSV (3 million+ rows) into a database? The data needs to be formatted (e.g. the name column needs to be split into first name and last name, etc.). I need to do this as efficiently as possible, i.e. there are time constraints. I am siding with the option of reading, transforming and loading the data row by row using a C# application. Is this ideal? If not, what are my options? Should I use multithreading?
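
    A sketch of the "transform in C#, bulk load in batches" approach for a file of this size. The column layout (name, email), target table, and naive comma split are assumptions for illustration; SqlBulkCopy avoids issuing 3 million individual INSERTs, which is usually the dominant cost:

        using System.Data;
        using System.Data.SqlClient;
        using System.IO;

        static class CsvLoader
        {
            public static void Load(string csvPath, string connectionString)
            {
                DataTable batch = NewBatch();
                using (var reader = new StreamReader(csvPath))
                using (var bulk = new SqlBulkCopy(connectionString) { DestinationTableName = "dbo.People", BatchSize = 50000 })
                {
                    reader.ReadLine();                       // skip the header row (assumption)
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        string[] cols = line.Split(',');     // naive split; a real CSV parser handles quoting
                        string[] name = cols[0].Split(new[] { ' ' }, 2);
                        batch.Rows.Add(name[0], name.Length > 1 ? name[1] : "", cols[1]);

                        if (batch.Rows.Count == 50000)
                        {
                            bulk.WriteToServer(batch);
                            batch.Clear();
                        }
                    }
                    if (batch.Rows.Count > 0) bulk.WriteToServer(batch);
                }
            }

            static DataTable NewBatch()
            {
                var t = new DataTable();
                t.Columns.Add("FirstName", typeof(string));
                t.Columns.Add("LastName", typeof(string));
                t.Columns.Add("Email", typeof(string));
                return t;
            }
        }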

    Read the article

  • Import de-normalized relational data from Excel into SQL Server

    - by roryf
    I need to import data from an Excel spreadsheet into SQL Server, but the data isn't in a relational/normalized format, so the import wizard isn't going to cut it (as far as I know). The data is in this format:

        Category      SubCategory      Name         Description
        Category#1    SubCategory#1    Product#1    Description#1
        Category#1    SubCategory#1    Product#2    Description#2
        Category#1    SubCategory#2    Product#3    Description#3
        Category#1    SubCategory#2    Product#4    Description#4
        Category#2    SubCategory#3    Product#5    Description#5

    (apologies, I'm lacking the inventiveness to come up with 'real' data at this time in the morning...) Each row contains a unique product, but the category structure is duplicated. I want to import this data into three tables: Category, SubCategory and Product (I know SubCategory should really be contained within Category; the DB was not my design). I need a way to import unique rows based on the Category and then SubCategory columns, and then, when importing the other columns into Product, obtain a reference to the SubCategory based on name. Short of scripting this, is there any way to do it using the import wizard or some other tool?
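
    In case the scripted route ends up being necessary, here is a sketch of it: read the sheet into a DataTable (e.g. via an OLE DB provider or the wizard into a staging table), dedupe Category and SubCategory by name, then insert Products with the looked-up SubCategory key. The insert stubs stand in for simple parameterized INSERTs returning the new identity; a staging table plus INSERT ... SELECT DISTINCT achieves the same thing in T-SQL:

        using System.Collections.Generic;
        using System.Data;

        static class DenormalizedImporter
        {
            public static void Import(DataTable sheet /* columns: Category, SubCategory, Name, Description */)
            {
                var categoryIds = new Dictionary<string, int>();
                var subCategoryIds = new Dictionary<string, int>();

                foreach (DataRow row in sheet.Rows)
                {
                    string category = (string)row["Category"];
                    string subCategory = (string)row["SubCategory"];

                    int categoryId;
                    if (!categoryIds.TryGetValue(category, out categoryId))
                    {
                        categoryId = InsertCategory(category);               // first time we see this category
                        categoryIds.Add(category, categoryId);
                    }

                    int subCategoryId;
                    if (!subCategoryIds.TryGetValue(subCategory, out subCategoryId))
                    {
                        subCategoryId = InsertSubCategory(subCategory, categoryId);
                        subCategoryIds.Add(subCategory, subCategoryId);
                    }

                    InsertProduct((string)row["Name"], (string)row["Description"], subCategoryId);
                }
            }

            // Stubs standing in for parameterized INSERTs that return SCOPE_IDENTITY().
            static int InsertCategory(string name) { return 0; }
            static int InsertSubCategory(string name, int categoryId) { return 0; }
            static void InsertProduct(string name, string description, int subCategoryId) { }
        }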

    Read the article
