Search Results

Search found 6908 results on 277 pages for 'knowledge capture'.

Page 31/277 | < Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38  | Next Page >

  • Why can't we capture the design of software more effectively?

    - by Ira Baxter
    As engineers, we all "design" artifacts (buildings, programs, circuits, molecules...). That's an activity (design-the-verb) that produces some kind of result (design-the-noun). I think we all agree that design-the-noun is a different entity than the artifact itself. A key activity in the software business (indeed, in any business where the resulting product artifact needs to be enhanced) is to understand the "design (the-noun)". Yet we seem, as a community, to be pretty much complete failures at recording it, as evidenced by the amount of effort people put into rediscovering facts about their code base. Ask somebody to show you the design of their code and see what you get. I think of a design for software as having: An explicit specification for what the software is supposed to do and how well it does it An explicit version of the code (this part is easy, everybody has it) An explanation for how each part of the code serves to achieve the specification A rationale as to why the code is the way it is (e.g., why a particualr choice rather than another) What is NOT a design is a particular perspective on the code. For example [not to pick specifically on] UML diagrams are not designs. Rather, they are properties you can derive from the code, or arguably, properties you wish you could derive from the code. But as a general rule, you can't derive the code from UML. Why is it that after 50+ years of building software, why don't we have regular ways to express this? My personal opinion is that we don't have good ways to express this. Even if we do, most of the community seems so focused on getting "code" that design-the-noun gets lost anyway. (IMHO, until design becomes the purpose of engineering, with the artifact extracted from the design, we're not going to get around this). What have you seen as means for recording designs (in the sense I have described it)? Explicit references to papers would be good. Why do you think specific and general means have not been succesful? How can we change this?

    Read the article

  • How to capture a Header or Trailer Count Value in a Flat File and Assign to a Variable

    - by Compudicted
    Recently I had several questions concerning how to process files that carry a header and trailer in them. Typically those files are a product of data extract from non Microsoft products e.g. Oracle database encompassing various tables data where every row starts with an identifier. For example such a file data record could look like: HDR,INTF_01,OUT,TEST,3/9/2011 11:23 B1,121156789,DATA TEST DATA,2011-03-09 10:00:00,Y,TEST 18 10:00:44,2011-07-18 10:00:44,Y B2,TEST DATA,2011-03-18 10:00:44,Y B3,LEG 1 TEST DATA,TRAN TEST,N B4,LEG 2 TEST DATA,TRAN TEST,Y FTR,4,TEST END,3/9/2011 11:27 A developer is normally able to break the records using a Conditional Split Transformation component by employing an expression similar to Output1 -- SUBSTRING(Output1,1,2) == "B1" and so on, but often a verification is required after this step to check if the number of data records read corresponds to the number specified in the trailer record of the file. This portion sometimes stumbles some people so I decided to share what I came up with. As an aside, I want to mention that the approach I use is slightly more portable than some others I saw because I use a separate DFT that can be copied and pasted into a new SSIS package designer surface or re-used within the same package again and it can survive several trailer/footer records (!). See how a ready DFT can look: The first step is to create a Flat File Connection Manager and make sure you get the row split into columns like this: After you are done with the Flat File connection, move onto adding an aggregate which is in use to simply assign a value to a variable (here the aggregate is used to handle the possibility of multiple footers/headers): The next step is adding a Script Transformation as destination that requires very little coding. First, some variable setup: and finally the code: As you can see it is important to place your code into the appropriate routine in the script, otherwise the end result may not be as expected. As the last step you would use the regular Script Component to compare the variable value obtained from the DFT above to a package variable value obtained say via a Row Count component to determine if the file being processed has the right number of rows.

    Read the article

  • Capture a Query Executed By An Application Or User Against a SQL Server Database in Less Than a Minute

    - by Compudicted
    At times a Database Administrator, or even a developer is required to wear a spy’s hat. This necessity oftentimes is dictated by a need to take a glimpse into a black-box application for reasons varying from a performance issue to an unauthorized access to data or resources, or as in my most recent case, a closed source custom application that was abandoned by a deserted contractor without source code. It may not be news or unknown to most IT people that SQL Server has always provided means of back-door access to everything connecting to its database. This indispensible tool is SQL Server Profiler. This “gem” is always quietly sitting in the Start – Programs – SQL Server <product version> – Performance Tools folder (yes, it is for performance analysis mostly, but not limited to) ready to help you! So, to the action, let’s start it up. Once ready click on the File – New Trace button, or using Ctrl-N with your keyboard. The standard connection dialog you have seen in SSMS comes up where you connect the standard way: One side note here, you will be able to connect only if your account belongs to the sysadmin or alter trace fixed server role. Upon a successful connection you must be able to see this initial dialog: At this stage I will give a hint: you will have a wide variety of predefined templates: But to shorten your time to results you would need to opt for using the TSQL_Grouped template. Now you need to set it up. In some cases, you will know the principal’s login name (account) that needs to be monitored in advance, and in some (like in mine), you will not. But it is VERY helpful to monitor just a particular account to minimize the amount of results returned. So if you know it you can already go to the Event Section tab, then click the Column Filters button which would bring a dialog below where you key in the account being monitored without any mask (or whildcard):  If you do not know the principal name then you will need to poke around and look around for things like a config file where (typically!) the connection string is fully exposed. That was the case in my situation, an application had an app.config (XML) file with the connection string in it not encrypted: This made my endeavor very easy. So after I entered the account to monitor I clicked on Run button and also started my black-box application. Voilà, in a under a minute of time I had the SQL statement captured:

    Read the article

  • Capture text typed by user on website and allow user to email a link to someone else so that they can view the message [on hold]

    - by Dano007
    Using the following html, css,jquery, js, what would be the best approach to achieving this. A visitor hits the website. The website page is displaying html and a css3 animation. The visitor is given the option to enter text freely into a text box, they enter an email address and hit send. The person receiving the message gets an email with a link, when they click the link it takes them to the webpage with the animation and the custom message their friend entered. Is this easy to achieve, what would be the best approach? anyone know of existing code I could use? Thanks

    Read the article

  • What other tool is using my hotkey?

    - by Sammy
    I use Greenshot for screenshots, and it's been nagging about some other software tool using the same hotkey. I started receiving this warning message about two days ago. It shows up each time I reboot and log on to Windows. The hotkey(s) "Ctrl + Shift + PrintScreen" could not be registered. This problem is probably caused by another tool claiming usage of the same hotkey(s)! You could either change your hotkey settings or deactivate/change the software making use of the hotkey(s). What's this all about? The only software I recently installed is CPU-Z Core Temp Speed Fan HD Tune Epson Print CD NetStress What I would like to know is how to find out what other tool is causing this conflict? Do I really have to uninstall each program, one by one, until there is no conflict anymore? I see no option for customizing any hotkeys in CPU-Z, and according to docs there are only a few keyboard shortcuts. These are F5 through F9, but they are no hotkeys. There is nothing in Core Temp, and from what I can see... nothing in Speed Fan. Is any of these programs known to use Ctrl + Shift + PrintScreen hotkey for screenshots? I am actually suspecting the Dropbox client. I think I saw a warning recently coming from Dropbox program, something to do with hotkeys or keyboard shortcuts. I see that it has an option for sharing screenshots under Preferences menu, but I see no option for hotkeys. Core Temp actually also has an option for taking screenshots (F9) but it's just that - a keyboard shortcut, not a hotkey. And again, there's no option actually for changing this setting in Options/Settings menu. How do you resolve this type of conflicts? Are there any general methods you can use to pinpoint the second conflicting software? Like... is there some Windows registry key that holds the hotkeys? Or is it just down to mere luck and trial and error? Addendum I forgot to mention, when I do use the Ctrl + Shift + PrintScreen hotkey, what happens is that the Greenshot context menu shows up, asking me where I want to save the screenshot. So it appears to be working. But I am still getting the darn warning every time I reboot and log on to Windows?! I actually tried changing the key bindings in Greenshot preferences, but after a reboot it seems to have returned back to the settings I had previously. Update I can't see any hotkey conflicts in the Widnows Hotkey Explorer. The aforementioned hotkey is reserved by Greenshot, and I don't see any other program using the same hotkey binding. But when I went into Greenshot preferences, this is what I discovered. As you can see it's the Greenshot itself that uses the same hotkey twice! I guess that's why no other program was listed above as using this hotkey. But how can Greenshot be so stupid to use the same hotkey more than once? I didn't do this! It's not my fault... I'm not that stupid. This is what it's set to right now: Capture full screen: Ctrl + Skift + Prntscrn Capture window: Alt + Prntscrn Capture region: Ctrl + Prntscrn Capture last region: Skift + Prntscrn Capture Internet Explorer: Ctrl + Skift + Prntscrn And this is my preferred setting: Capture full screen: Prntscrn Capture window: Alt + Prntscrn Capture region: Ctrl + Prntscrn Capture last region: Capture Internet Explorer: I don't use any hotkey for "last region" and IE. But when I set this to my liking, as listed here, Greenshot gives me the same warning message, even as I tab through the hotkey entry fields. Sometimes it even gives me the warning when I just click Cancel button. This is really crazy! On the side note... You might have noticed that I have "update check" set to 0 (zero). This is because, in my experience, Greenshot changes all or only some of my preferences back to default settings whenever it automatically updates to a new version. So I opted to stay off updates to get rid of the problem. It has done so for the past three updates or so. I hoped to receive a new update that would fix the issue, but I think it still reverts back to default settings after each update to a new version, including setting default hotkeys. Update 2 I'll give you just one example of how Greenshot behaves. This is the dialog I have in front of me right now. As you can see, I have removed the last two hotkeys and changed the first one to my own liking. While I was clicking in the fields and removing the two hotkeys I was getting the warning message. So let's say I click in the "capture last region" field. Then I get this: Note that none of the entries include "Ctrl + Shift + PrintScreen" that it's warning about. Now I will change all the hotkeys so I get something like this: So now I'm using QWERTY letters for binding, like Ctrl+Alt+Q, Ctrl+Alt+W and so on. As far as I know no Windows program is using these. While I was clicking through the different fields it was giving me the warning. Now when I try to click OK to save the changes, it once again gives me a warning about "ctrl + shift + printscreen". Update 3 After setting the above key bindings (QWERTY) and saving changes, and then rebooting, the conflict seems to have been resolved. I was then able to set following key bindings. Capture full screen: Prntscrn Capture window: Alt + Prntscrn Capture region: Ctrl + Prntscrn I was not prompted with the warning message this time. Perhaps changing key binding required a system reboot? Sounds far fetched but that appears to be the case. I'm still not sure what caused this conflict, but I know for sure that it started after installing aforementioned programs. It might just have to do with Greenshot itself, and not some other program. Like I said, I know from experience that Greenshot likes to mess with users' settings after each update. I wouldn't be surprised if it was actually silently updated, even though I have specified not to check for updates, then it changed the key bindings back to defaults and caused a conflict with the hotkeys that were registered with the operating system previously. I rarely reboot the system, so that could have added to the conflict. Next time if I see this I will run Hotkey Explorer immediately and see if there is another program causing the conflict.

    Read the article

  • MySQL Triggers - How to capture external web variables? (eg. web username, ip)

    - by Ken
    Hi, I'm looking to create an audit trail for my PHP web app's database (eg. capture inserts, updates, deletes). MySQL triggers seem to be just the thing -- but how do I capture the IP address and the web username (as opposed to the mysql username, localhost) of the user who invoked the trigger? Thanks so much. -Ken P.S. I'm working with this example code I found: DROP TRIGGER IF EXISTS history_trigger $$ CREATE TRIGGER history_trigger BEFORE UPDATE ON clients FOR EACH ROW BEGIN IF OLD.first_name != NEW.first_name THEN INSERT INTO history_clients ( client_id , col , value , user_id , edit_time ) VALUES ( NEW.client_id, 'first_name', NEW.first_name, NEW.editor_id, NEW.last_mod ); END IF; IF OLD.last_name != NEW.last_name THEN INSERT INTO history_clients ( client_id , col , value , user_id , edit_time ) VALUES ( NEW.client_id, 'last_name', NEW.last_name, NEW.editor_id, NEW.last_mod ); END IF; END; $$

    Read the article

  • Good way to capture/replay sessions from Apache Log?

    - by Mark Harrison
    For performance testing, I would like to capture some traffic from a production server and use that as a basis to replay the request to a test server in order to simulate a realistic load in our development environment. These are all stateless queries, so no issues regarding cookies, sessions, etc. The Apache log timestamps everything down to a 1 second resolution, but that's not fine enough granularity for our peak times. What's the best way to capture more fine-grained timestamps for replay? And is there some ab-like load generating program that can use this data to replicate load?

    Read the article

  • How does one capture H.323 voice traffic on a VOIP network?

    - by Chris Holmes
    What I am trying to do is capture the WAV data of a phone conversation on a VOIP network using SharpPCap/PCap.Net. We are using the H.323 recommendation and my understanding is that voice data is located in the RTP packets. However, there is no way to heuristically determine if a UDP packet is a RTP packet, so we have to do more work before we can capture the data. The H.323 recommendation apparently uses a lot of traffic on specific TCP ports to negotiate the call before the WAV data is sent via RTP. However, I am having very little luck determining what data is actually sent on those TCP ports, when it is sent, what the packets look like, how to handle it, etc. If anyone has any information on how to go about this I'd really appreciate it. My Google-Fu seems to be failing me on this one.

    Read the article

  • Mouse Wheel Scroll - How can I capture the time interval between start and stop of scrolling?

    - by Rahat
    Is there any way to capture the time interval between mouse wheel scroll start and stop? Actually I want to capture the interval between the scrolling start and stop when I very quickly scroll the mouse wheel. I have already looked at MouseWheel event but it don't fulfill my requirement. In senes that it always gives a value of Delta 120 or -120 but i want to call a function depending on the speed of the mouse scroll for example when i scroll the mouse normally i want to perform function 1 and when i scrolled the mouse very quickly i want to perform the function 2. In other words is there any way to distinguish between the mouse scroll high and normal speed. Any advice will be appreciated.

    Read the article

  • How to include named capture groups in java regex?

    - by jrummell
    I'm new to regex in Java and I can't figure out how to include named capture groups in an expression. I'm writing a ScrewTurn Image Converter for Confluence's Universal Wiki Converter. This is what I have: String image = "\\[image(?<align>auto)?\\|\\|{UP\\(((?<namespace>\\w+)\\.)?(?<pagename>[\\w-]+)\\)}(?<filename>[\\w- ]+\\.[\\w]+)\\]"; Pattern imagePattern = Pattern.compile(image, Pattern.CASE_INSENSITIVE); It's throwing this exception in Pattern.comiple(): java.util.regex.PatternSyntaxException: Unknown look-behind group near index 19 \[image(?<align>auto)?\|\|{UP\(((?<namespace>\w+)\.)?(?<pagename>[\w-]+)\)}(?<filename>[\w- ]+\.[\w]+)\] ^ I've used named capture groups like this before in C# (?<namedgroup>asdf), but not in Java. What am I missing?

    Read the article

  • What is the best strategy for a streaming screen capture application ?

    - by The_AlienCoder
    Ok Ive figured out how to do a screen capture of my desktop using Windows Media Encoder 9 + SDK and saving it as a movie on my hardrive. But now I want to take my project further and stream a realtime screen capture of my desktop . The idea is that users accessing my asp.net website will be able to view whatever Im doing on my desktop in real time via the video streaming from my computer. I really dont need the source code just the best strategy, procedure and server requirements.How will users access the streaming video on my website? Will it work on my godaddy shared hosting account or at what point will I need an upgrade? Will I be saving the stream to a file on the server for users to access or is there a way to stream directly ? I just need someone to point me in the right direction...

    Read the article

  • Is there a way to pass a regex capture to a block in Ruby?

    - by Gordon Fontenot
    I have a hash with a regex for the key and a block for the value. Something like the following: { 'test (.+?)' => { puts $1 } } Not exactly like that, obviously, since the block is being stored as a Proc, but that's the idea. I then have a regex match later on that looks a lot like this hash.each do |pattern, action| if /#{pattern}/i.match(string) action.call end end The idea was to store the block away in the hash to make it a bit easier for me to expand upon in the future, but now the regex capture doesn't get passed to the block. Is there a way to do this cleanly that would support any number of captures I put in the regex (as in, some regex patterns may have 1 capture, others may have 3)?

    Read the article

  • In web project can we write core services layer without knowledge of UI ?

    - by Silent Warrior
    I am working on web project. We are using flex as UI layer. My question is often we are writing core service layer separately from web/UI layer so we can reuse same services for different UI layer/technology. So practically is it possible to reuse same core layer services without any changes/addition in API with different kind of UI technologies/layers. For e.g. same core service layer with UI technology which supports synchronized request response (e.g. jsp etc.) and non synchronize or event driven UI technology (e.g Ajax, Flex, GWT etc.) or with multiple devices like (computers, mobiles, pdas etc.). Personally I feel its very tough to write core service layer without any knowledge of UI. Looking for thoughts from other people.

    Read the article

  • What is the best software to capture full-screen 3h programming session in Windows?

    - by Hugo S Ferreira
    Hi, I'm planning a laboratorial experiment to assess behavior of groups when programming using some tools under study. For that, I'll need to capture their entire screen to disk. Mostly, what will be displayed is code, so I'm not to worried with image quality. However, it's paramount that the team is not able to stop the recording by accident, and the tool should be rebust enough to hold at least 3h of video. If possible, it would be nice for researchers in other rooms to "watch" the video as it is recording. Actually, this last requirement reminded me that I could use a VNC recording software, and install a VNC client in each laboratory computer. Anyway, what is your experience with this? Which software do you recommend? Thanks.

    Read the article

  • How should I capture Linux kernel panic stack traces?

    - by Alnitak
    What's current best practice to capture full kernel stack traces on a Linux system (RHEL 5.x, kernel 2.6.18) that occasionally panics in a device driver? I'm used to the "old" SunOS way of doing things - crash dumps get written to swap, and on reboot the dump gets retrieved in the local file system. man 8 crash refers to diskdump, but that appears to be unsupported. and/or deprecated. I've played with kdump, but it's unclear whether I can get a stack trace from that. Triggering a panic via Magic SysRq didn't create one. It also seems wasteful to reserve so much memory (128MB) just for a kexec crash recovery kernel.

    Read the article

  • Capture the build number for a remote-triggered Hudson job?

    - by EMiller
    I have a very simple inhouse web app from which certain Hudson builds (on another server) can be triggered remotely. I have no problem triggering the builds, but I don't know how to capture the associated build number for later reference. I'm using the buildWithParameters trigger, and the actual result of that call is just a mess of HTML - I don't believe it gives me back the build number. I started down the path of pulling the whole build list for the job (via the api), and then attempting to reconcile that list against my records - but that's much more complicated than I'd like it to be. I also considered sleeping for a few seconds after launching the job, and then grabbing the latestBuild from the Hudson api - but I'm sure that's going to go wrong at some point (someone will fire off two jobs quickly, and I'll get the association wrong).

    Read the article

  • How do I capture and playback http web requests against multiple web servers?

    - by KevM
    My overall goal is to not interrupt a production system while capturing HTTP Posts to a web application so that I can reverse engineer the telemetry coming from a closed application. I have control over the transmitter of the HTTP Posts but not the receiving web application. It seems like I need a request "forking" proxy. Sort of a reverse proxy that pushes the request to 2 endpoints, a master and slave, only relaying the response from the master endpoint back to the requester. I am not a server geek so something like this may exist but I don't know the term of art for what I am looking for. Another possibility could be a simple logging proxy. Capture a log of the web requests. Rewrite the log to target my "slave" web application. Playback the log with curl or something. Thank you for your assistance.

    Read the article

  • BitBlt ignores CAPTUREBLT and seems to always capture a cached copy of the target...

    - by Jake Petroules
    I am trying to capture screenshots using the BitBlt function. However, every single time I capture a screenshot, the non-client area NEVER changes no matter what I do. It's as if it's getting some cached copy of it. The client area is captured correctly. If I close and then re-open the window, and take a screenshot, the non-client area will be captured as it is. Any subsequent captures after moving/resizing the window have no effect on the captured screenshot. Again, the client area will be correct. Furthermore, the CAPTUREBLT flag seems to do absolutely nothing at all. I notice no change with or without it. Here is my capture code: QPixmap WindowManagerUtils::grabWindow(WId windowId, GrabWindowFlags flags, int x, int y, int w, int h) { RECT r; switch (flags) { case WindowManagerUtils::GrabWindowRect: GetWindowRect(windowId, &r); break; case WindowManagerUtils::GrabClientRect: GetClientRect(windowId, &r); break; case WindowManagerUtils::GrabScreenWindow: GetWindowRect(windowId, &r); return QPixmap::grabWindow(QApplication::desktop()->winId(), r.left, r.top, r.right - r.left, r.bottom - r.top); case WindowManagerUtils::GrabScreenClient: GetClientRect(windowId, &r); return QPixmap::grabWindow(QApplication::desktop()->winId(), r.left, r.top, r.right - r.left, r.bottom - r.top); default: return QPixmap(); } if (w < 0) { w = r.right - r.left; } if (h < 0) { h = r.bottom - r.top; } #ifdef Q_WS_WINCE_WM if (qt_wince_is_pocket_pc()) { QWidget *widget = QWidget::find(winId); if (qobject_cast<QDesktopWidget*>(widget)) { RECT rect = {0,0,0,0}; AdjustWindowRectEx(&rect, WS_BORDER | WS_CAPTION, FALSE, 0); int magicNumber = qt_wince_is_high_dpi() ? 4 : 2; y += rect.top - magicNumber; } } #endif // Before we start creating objects, let's make CERTAIN of the following so we don't have a mess Q_ASSERT(flags == WindowManagerUtils::GrabWindowRect || flags == WindowManagerUtils::GrabClientRect); // Create and setup bitmap HDC display_dc = NULL; if (flags == WindowManagerUtils::GrabWindowRect) { display_dc = GetWindowDC(NULL); } else if (flags == WindowManagerUtils::GrabClientRect) { display_dc = GetDC(NULL); } HDC bitmap_dc = CreateCompatibleDC(display_dc); HBITMAP bitmap = CreateCompatibleBitmap(display_dc, w, h); HGDIOBJ null_bitmap = SelectObject(bitmap_dc, bitmap); // copy data HDC window_dc = NULL; if (flags == WindowManagerUtils::GrabWindowRect) { window_dc = GetWindowDC(windowId); } else if (flags == WindowManagerUtils::GrabClientRect) { window_dc = GetDC(windowId); } DWORD ropFlags = SRCCOPY; #ifndef Q_WS_WINCE ropFlags = ropFlags | CAPTUREBLT; #endif BitBlt(bitmap_dc, 0, 0, w, h, window_dc, x, y, ropFlags); // clean up all but bitmap ReleaseDC(windowId, window_dc); SelectObject(bitmap_dc, null_bitmap); DeleteDC(bitmap_dc); QPixmap pixmap = QPixmap::fromWinHBITMAP(bitmap); DeleteObject(bitmap); ReleaseDC(NULL, display_dc); return pixmap; } Most of this code comes from Qt's QWidget::grabWindow function, as I wanted to make some changes so it'd be more flexible. Qt's documentation states that: The grabWindow() function grabs pixels from the screen, not from the window, i.e. if there is another window partially or entirely over the one you grab, you get pixels from the overlying window, too. However, I experience the exact opposite... regardless of the CAPTUREBLT flag. I've tried everything I can think of... nothing works. Any ideas?

    Read the article

  • How to increase my "advanced" knowledge of PHP further? (quickly)

    - by Kerry
    I have been working with PHP for years and gotten a very good grasp of the language, created many advanced and not-so-advanced systems that are working very well. The problem I'm running into is that I only learn when I find a need for something that I haven't learned before. This causes me to look up solutions and other code that handles the problem, and so I will learn about a new function or structure that I hadn't seen before. It is in this way that I have learned many of my better techniques (such as studying classes put out by Amazon, Google or other major companies). The main problem with this is the concept of not being able to learn something if you don't know it exists. For instance, it took me several months of programming to learn about the empty() function, and I simply would check the string length using strlen() to check for empty values. I'm now getting into building bigger and bigger systems, and I've started to read blogs like highscalability.com and been researching MySQL replication and server data for scaling. I know that structure of your code is very important to make full systems work. After reading a recent blog about reddit's structure, it made me question if there is some standard or "accepted systems" out there. I have looked into frameworks (I've used Kohana, which I regretted, but decided that PHP frameworks were not for me) and I prefer my own library of functions rather than having a framework. My current structure is a mix between WordPress, Kohana and my own knowledge. The ways I can see as being potentially beneficial are: Read blogs Read tutorials Work with someone else Read a book What would be the best way(s) to "get to the next level" the level of being a very good system developer?

    Read the article

  • SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

    - by pinaldave
    Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base  Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

    Read the article

< Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38  | Next Page >