Search Results

Search found 65872 results on 2635 pages for 'core data migration'.

  • SQL SERVER – 5 Tips for Improving Your Data with expressor Studio

    - by pinaldave
    It’s no secret that bad data leads to bad decisions and poor results. However, how do you prevent dirty data from taking up residency in your data store? Some might argue that it’s the responsibility of the person sending you the data. While that may be true, in practice it rarely holds up: no matter how many times you ask, you will get the data however the sender decides to provide it. So now you have bad data. What constitutes bad data? There are quite a few valid answers, for example invalid date values, inappropriate characters, wrong data, and values that exceed a preset threshold. While it is certainly possible to write your own scripts and custom SQL to identify and deal with these data anomalies, that effort often takes too long and becomes difficult to maintain. Instead, leveraging an ETL tool like expressor Studio makes the data cleansing process much easier and faster. Below are some tips for leveraging expressor to get your data into tip-top shape.
Tip 1: Build reusable data objects with embedded cleansing rules. One of the new features in expressor Studio 3.2 is the ability to define constraints at the metadata level. Using expressor’s concept of Semantic Types, you can define reusable data objects that have embedded logic, such as constraints for dealing with dirty data. Once defined, they can be saved as a shared atomic type and then reapplied to other data attributes in other schemas. As you can see in the figure above, I’ve defined a constraint on zip code. I can then save the constraint rules I defined for zip code as a shared atomic type called, for example, zip_type. The next time I get a different data source with a schema that also contains a zip code field, I can simply apply the shared atomic type (shown below) and the previously defined constraints will be applied automatically.
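Nothing below is expressor's API; it is a hypothetical Python analogy of the idea in Tip 1: define a constraint once as a named type and reuse it across schemas. The AtomicType class, both schemas and the ZIP rule (5 digits with an optional ZIP+4 suffix) are assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomicType:
    """Hypothetical stand-in for a shared atomic type: a name plus a constraint."""
    name: str
    pattern: str  # regular expression the whole value must match

    def check(self, value: str) -> bool:
        # fullmatch anchors the pattern at both ends of the value
        return re.fullmatch(self.pattern, value) is not None

# Assumed rule for illustration: 5-digit US ZIP code, optional ZIP+4 suffix
zip_type = AtomicType("zip_type", r"\d{5}(-\d{4})?")

# Two unrelated (hypothetical) schemas reuse the same shared type
customers_schema = {"name": None, "zip": zip_type}
suppliers_schema = {"company": None, "postal_zip": zip_type}

def violations(record: dict, schema: dict) -> list:
    """Return the fields whose values break their type's constraint."""
    return [field for field, typ in schema.items()
            if typ is not None and not typ.check(str(record.get(field, "")))]
```

Changing zip_type's pattern in one place changes the rule for every schema that uses it, which is the benefit Tip 1 describes.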
Tip 2: Unlock the power of regular expressions in Semantic Types. Another powerful feature introduced in expressor Studio 3.2 is the option to use regular expressions as a constraint. A regular expression is used to identify patterns within data. The patterns could be something as simple as a date format or something much more complex, such as a street address. For example, I could define that a valid IP address should be made up of 4 numbers, each 0 to 255, separated by periods. So 192.168.23.123 would be a valid IP address, whereas 888.777.0.123 would not. How can I account for this using regular expressions? A very simple regular expression that looks for any 4 groups of 1 to 3 digits separated by periods would be:
    ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$
This simple pattern only counts digits, so it still accepts 888.777.0.123. The following is an exact check for truly valid IP addresses as defined above:
    ^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$
In expressor, we would enter this regular expression as a constraint, as shown in the figure. Here we select the corrective action to be ‘Escalate’, meaning that the expressor Dataflow operator will decide what to do. Some of the options include rejecting the offending record, skipping it, or aborting the dataflow.
Tip 3: Email pattern expressions that might come in handy. In the example schema that I am using, there’s a field for email. Email addresses are often entered incorrectly because people are trying to avoid spam.
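A quick way to see the difference between the two IP patterns from Tip 2 is to run both through Python's standard re module. The patterns are copied verbatim; the extra test addresses 10.0.0.256 and 01.2.3.4 are my own additions. The loose pattern accepts every candidate because it only counts digits, while the strict one accepts only 192.168.23.123 (it also rejects the leading zero in 01).

```python
import re

# The two patterns from Tip 2, copied verbatim
LOOSE_IP = r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$"
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
STRICT_IP = rf"^{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}$"

def check(address: str) -> tuple:
    """Return (loose_match, strict_match) for one candidate address."""
    return (re.match(LOOSE_IP, address) is not None,
            re.match(STRICT_IP, address) is not None)

for candidate in ["192.168.23.123", "888.777.0.123", "10.0.0.256", "01.2.3.4"]:
    print(candidate, check(candidate))
```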
While there are a lot of different ways to define what constitutes a valid email address, a quick search online yields a couple of really useful regular expressions for validating email addresses. This one is short and sweet (its character classes are uppercase only, so apply it case-insensitively):
    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
(Source: http://www.regular-expressions.info/) This one is more specific about which characters are allowed:
    ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
(Source: http://regexlib.com/REDetails.aspx?regexp_id=26) Both limit the top-level domain to 2 to 4 letters, so they reject addresses under longer domains such as .museum.
Tip 4: Reject “dirty data” for analysis or further processing. Yet another feature introduced in expressor Studio 3.2 is the ability to reject records based on constraint violations. To capture rejected records on input, simply specify Reject Record in the Error Handling setting for the Read File operator. Then attach a Write File operator to the reject port of the Read File operator, as shown in the figure. Next, in the Write File operator, you can configure the expressor operator in a similar way to the Read File. The key difference is that the schema needs to be derived from the upstream operator, as shown below. Once configured, expressor will output rejected records to the file you specified. In addition to the rejected records, expressor also captures some diagnostic information that is helpful in identifying why each record was rejected. This makes diagnosing errors much easier!
Tip 5: Use a Filter or Transform after the initial cleansing to finish the job. Sometimes you may want to predicate the data cleansing on a more complex set of conditions. For example, I may only be interested in processing records for males over the age of 25 in certain zip codes. Using an expressor Filter operator, you can define the conditional logic that separates the records of interest from the others.
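Tips 3 to 5 describe a pipeline: validate, send failures to a reject output with diagnostic information, then filter what remains. The following is a hypothetical Python sketch of that flow, not expressor's implementation. It uses the short email pattern from Tip 3 with re.IGNORECASE (its classes are uppercase only) and fullmatch, so the whole field must be an address. The field names email, gender, age and zip and the reject_reason key are assumptions.

```python
import re

# Short email pattern from Tip 3 (regular-expressions.info), compiled
# case-insensitively because its character classes are uppercase only
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

def read_with_rejects(records):
    """Split records into (accepted, rejected); each reject carries a reason."""
    accepted, rejected = [], []
    for rec in records:
        email = rec.get("email", "")
        if EMAIL.fullmatch(email):
            accepted.append(rec)
        else:
            # Diagnostic information, loosely mirroring a reject port's output
            rejected.append({**rec, "reject_reason": f"email fails pattern: {email!r}"})
    return accepted, rejected

def keep(rec, zips):
    """Tip 5 example condition: males over 25 in selected zip codes."""
    return rec["gender"] == "M" and rec["age"] > 25 and rec["zip"] in zips
```

Usage: `accepted, rejected = read_with_rejects(rows)`, then `[r for r in accepted if keep(r, {"02134"})]`; the rejected list can be written out for analysis.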
Alternatively, the expressor Transform operator can be used to alter the input value via a user-defined algorithm or transformation. It also supports conditional logic, and data can be rejected based on constraint violations. However, the best tip I can leave you with is not to constrain your solution design approach: expressor operators can be combined in many different ways to achieve the desired results. For example, in the expressor Dataflow below, I can post-process the rejected data from the Filter that did not meet my predefined criteria and, if successful, Funnel it back into the flow so that it gets written to the target table. I continue to be impressed that expressor offers all this functionality as part of its FREE expressor Studio desktop ETL tool, which you can download from here. Their Studio ETL tool is absolutely free, and they are very open about saying that if you want to deploy their software on a dedicated Windows Server, you need to purchase their server software, whose pricing is posted on their website.
Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

  • SQL SERVER – Core Concepts – Elasticity, Scalability and ACID Properties – Exploring NuoDB an Elastically Scalable Database System

    - by pinaldave
    I have recently been exploring the elasticity and scalability attributes of databases. You can see that in my earlier blog posts about NuoDB, where I wanted to look at elasticity and scalability concepts. The concepts are very interesting and intriguing as well. I have discussed these concepts with my friend Joyti M, and together we have come up with this interesting read. The goal of this article is to answer the following simple questions: What is elasticity? What is scalability? How do ACID properties differ from NoSQL concepts? What are the prevailing problems in current database system architectures? Why is NuoDB an innovative and welcome change in the database paradigm?
Elasticity
In its everyday sense, the word describes something that stretches and contracts without breaking, and honestly it does a decent job of holding things together over the years as a person grows and contracts. Within the tech world, and specifically for software systems (databases, application servers), it has come to mean the ability to stretch resources on demand without reaching the breaking point. What are resources in this context? Resources are the usual suspects, RAM/CPU/IO/bandwidth, in the form of a container (a process or a bunch of processes combined as modules). When it comes to increasing resources, the simplest idea that comes to mind is adding another container, which means adding a brand-new physical node. Adding a new node raises two questions: 1) Can we add another node to our software system? 2) If yes, does adding a new node cause downtime for the system? Let us assume we have added a new node and see what the system now needs: balancing incoming requests across multiple nodes; synchronizing shared state across multiple nodes; and identifying a “downstate” and taking resolution action to bring the node back to “upstate”. Well, adding a new node has its advantages as well.
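Two of the needs listed above, balancing incoming requests and handling “downstate”/“upstate”, plus absorbing a new node without downtime, can be sketched in a few lines. This is a hypothetical round-robin balancer in Python, not NuoDB's design. It deliberately leaves out shared-state synchronization, which is the hard part a real distributed database has to solve.

```python
from itertools import count

class RoundRobinBalancer:
    """Hypothetical balancer: spreads requests over nodes; nodes can join live."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ticket = count()  # ever-increasing request counter

    def add_node(self, node):
        # Elastic growth: the new node takes traffic from the next request on,
        # with no restart of the balancer or of the existing nodes
        self.nodes.append(node)
        self.healthy.add(node)

    def mark_down(self, node):
        self.healthy.discard(node)  # "downstate": stop routing to it

    def mark_up(self, node):
        self.healthy.add(node)  # back to "upstate"

    def route(self):
        """Pick the next healthy node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[next(self._ticket) % len(self.nodes)]
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes")
```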
Here are a few of the positive points: throughput can increase nearly linearly as nodes are added across the system, and application response times improve as interactions between the intermediate layers improve.
Now, let us put the above concepts in the perspective of a database. When we say “running out of resources” or “the application is bound to resources”, the resources can be CPU, memory or bandwidth. The regular approach to “gain scalability” in a database is to look for bottlenecks and increase the bottlenecked resource. When memory is the bottleneck, we look at the data buffers, locks, query plans or indexes. After a point even this is not enough, as there needs to be an efficient way of managing such a large workload on a “single machine” across memory-bound and CPU-bound workloads (the right kind of scheduling). We next move on to either read/write separation of the workload or functionality-based sharding, so that we still have control of the individual pieces. But this requires lots of planning and changes in client systems, which must know where to go to read and update, and reporting applications must “aggregate the data” in an intelligent way. What we ideally need is an intelligent layer that lets us do these things without having to manage, monitor and distribute the workload ourselves.
Scalability
In the context of databases/applications, scalability means three main things:
1) The ability to handle normal loads without pressure, e.g. X users at Y utilization of resources (CPU, memory, bandwidth) on Z kind of hardware (a 4-processor, 32 GB machine with 15000 RPM SATA drives and a 1 Gbps network switch) with T throughput.
2) The ability to scale up to an expected peak load, which is greater than the normal load, with acceptable response times.
3) The ability to provide acceptable response times across the system, e.g.
Response time in S milliseconds (or an agreed-upon unit of measure) 90% of the time.
The Issue: Need for Scale
In normal cases, one can plan load tests covering normal, peak and stress scenarios to ensure specific hardware meets the needs. With help from hardware and software partners and best practices, bottlenecks can be identified and the requisite resources added to the system. Unfortunately, this vertical scaling is expensive and difficult to achieve, and most operations people need the ability to scale horizontally. Scaling horizontally helps achieve better throughput because there are physical limits to adding resources (memory, CPU, bandwidth and storage) to one machine indefinitely. Today we have different options to achieve scalability:
Read & Write Separation
The idea here is to do the actual writes to one store and configure slaves that receive the latest data with acceptable delays. Slaves can be used to balance out reads. We can also explore functional separation or sharding. We can separate data operations by a specific identifier (e.g. region, year, month) and consolidate the data for reporting purposes. The major disadvantage of functional separation shows up when the schema or workload pattern changes. As requirements grow, one still needs to deal with the need for scale manually, by providing an abstraction in the middle-tier code.
Using NoSQL Solutions
The idea is generally to flatten out the structures so that all values retrieved together live in the same store, and to provide a flexible schema. The issue with these stores is that they mostly compromise on consistency (no ACID guarantees), and one has to use a non-SQL dialect to work with the store. The other major issue is education around NoSQL solutions. Would one really want to make these compromises on the ability to connect and retrieve data in a simple SQL manner, and learn other skill sets? Or, for that matter, give up on ACID guarantees and start dealing with consistency issues?
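To make the middle-tier abstraction concrete, here is a hypothetical Python router. Writes go to a shard's primary, reads rotate over its replicas (which may lag, as described above), shards are chosen by one identifier (region here), and reporting has to touch every shard. The shard layout and names are invented for illustration. The point is that every client or middle tier has to carry this knowledge itself.

```python
class Router:
    """Hypothetical middle-tier router for read/write separation plus sharding.

    shards: {region: {"primary": str, "replicas": [str, ...]}}
    """

    def __init__(self, shards):
        self.shards = shards
        self._next = {region: 0 for region in shards}  # per-shard read cursor

    def shard_for(self, region):
        # Functional separation by a specific identifier (here: region)
        if region not in self.shards:
            raise KeyError(f"no shard for region {region!r}")
        return self.shards[region]

    def for_write(self, region):
        # All writes land on the shard's single primary
        return self.shard_for(region)["primary"]

    def for_read(self, region):
        # Reads rotate over replicas; fall back to the primary if there are none
        shard = self.shard_for(region)
        replicas = shard["replicas"] or [shard["primary"]]
        i = self._next[region] % len(replicas)
        self._next[region] += 1
        return replicas[i]

    def for_report(self):
        # Reporting must consolidate ("aggregate") across every shard
        return [s["replicas"][0] if s["replicas"] else s["primary"]
                for s in self.shards.values()]
```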
Hybrid Deployment: Mac, Linux, Cloud, and Windows
One of the challenges we see today across on-premises vs. cloud infrastructure is a difference in abilities. Take, for example, SQL Azure: it is wonderful in its concepts of resource throttling (as it is a shared deployment) and its ability to scale using federation. However, the same abilities are not available on premises. This is not a mistake, mind you, but a compromise between the sweet spot of workloads, customer requirements and the operational SLAs the team can support. In today’s world it is imperative that databases are available across operating systems, which are a commodity and used by developers of all hues.
An Ideal Database Ability List
1) A system that scales linearly (increase in throughput with reasonable response times) as resources are added.
2) A system that does not compromise on ACID guarantees or require developers to learn new paradigms.
3) A system that does not force a new way of interacting with the database through a non-SQL dialect.
4) A system that does not force-fit its own mechanisms for providing availability across its various modules.
Well, NuoDB is the first database that has all of the above abilities and much more. In future articles I will cover my hands-on experience with it.
Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

  • Problem when trying to configure enterprise library 5.0 (Data Access Application Block)

    - by Phil
    Hi there, Stack Overflow. I am running into some problems while trying to get the DAAB from Enterprise Library 5.0 running. I have followed the steps as per the tutorial, but am getting errors:
    1) Download / install Enterprise Library.
    2) Add references to the blocks I need (Common / Data).
    3) Imports:
        Imports Microsoft.Practices.EnterpriseLibrary.Common
        Imports Microsoft.Practices.EnterpriseLibrary.Data
    4) Through the Enterprise Library configuration tool, I open the web.config from my site. I then click Blocks, then Add data settings..., fill in my details and save / close.
    5) I then (thinking setup is complete) try to get an instance of the database via:
        Dim db As Database = DatabaseFactory.CreateDatabase()
    6) I compile and receive the following error:
        Could not load file or assembly 'Microsoft.Practices.EnterpriseLibrary.Data, Version=5.0.414.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) (C:\site\web.config line 4)
    Line 4 of my web.config was generated by the config tool and is:
        <section name="dataConfiguration" type="Microsoft.Practices.EnterpriseLibrary.Data.Configuration.DatabaseSettings, Microsoft.Practices.EnterpriseLibrary.Data, Version=5.0.414.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" requirePermission="true" />
    Am I missing a required step? Have I done the steps in the wrong order? Have I made a mistake? Thanks a lot for the assistance.

  • Handling incremental Data Modeling Changes in Functional Programming

    - by Adam Gent
    Most of the problems I have to solve in my job as a developer have to do with data modeling. For example, in an OOP web application world I often have to change the data properties of an object to meet new requirements. If I'm lucky, I don't even need to programmatically add new "behavior" code (functions, methods). Instead, I can declaratively add validation and even UI options by annotating the property (Java). In functional programming it seems that adding new data properties requires lots of code changes because of pattern matching and data constructors (Haskell, ML). How do I minimize this problem? This seems to be a recognized problem, as Xavier Leroy states nicely on page 24 of "Objects and Classes vs. Modules". To summarize for those who don't have a PostScript viewer, it basically says FP languages are better than OOP languages for adding new behavior over data objects, but OOP languages are better for adding new data objects/properties. Are there any design patterns used in FP languages to help mitigate this problem? I have read Philip Wadler's recommendation of using monads to help with this modularity problem, but I'm not sure I understand how.

  • How to sort data in a table data structure in Java?

    - by rgksugan
    I need to sort data based on the third column of the table data structure. I tried based on the answers to the following question, but my sorting does not work. Please help me with this. Here is my code:
        Object[] data = new Object[y];
        rst.beforeFirst();
        while (rst.next()) {
            int p_id = Integer.parseInt(rst.getString(1));
            String sw2 = "select sum(quantity) from tbl_order_detail where product_id=" + p_id;
            rst1 = stmt1.executeQuery(sw2);
            rst1.next();
            String sw3 = "select max(order_date) from tbl_order where tbl_order.`Order_ID` in (select tbl_order_detail.`Order_ID` from tbl_order_detail where product_id=" + p_id + ")";
            rst2 = stmt2.executeQuery(sw3);
            rst2.next();
            data[i] = new Object[]{new String(rst.getString(2)), new String(rst.getString(3)), new Integer(rst1.getString(1)), new String(rst2.getString(1))};
            i++;
        }
        ColumnComparator cc = new ColumnComparator(2);
        Arrays.sort(data, cc);
        if (i == 0) {
            table.addCell("");
            table.addCell("");
            table.addCell("");
            table.addCell("");
        } else {
            for (int j = 0; j < y; j++) {
                Object[] theRow = (Object[]) data[j];
                table.addCell((String) theRow[0]);
                table.addCell((String) theRow[1]);
                table.addCell((String) theRow[2]);
                table.addCell((String) theRow[3]);
            }
        }
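The following is a minimal Python sketch of the intended result: sort rows by the third column (index 2) as a number. It mirrors two things worth checking in the Java above. First, since ColumnComparator is not shown, it is only likely rather than certain that null slots cause trouble: if y is larger than the number of rows read, the array slots past i stay null and reach the comparator. Second, theRow[2] holds an Integer, so the (String) cast in the output loop throws a ClassCastException. The sample rows in the test are made up.

```python
from operator import itemgetter

def sort_by_third_column(rows, descending=False):
    """Sort filled rows by column index 2 (the third column).

    None placeholders (unused slots of a preallocated array) are dropped
    first, since they cannot be compared with real rows.
    """
    filled = [row for row in rows if row is not None]
    return sorted(filled, key=itemgetter(2), reverse=descending)

def to_cells(rows):
    """Flatten rows into display strings; str() handles the numeric column."""
    return [str(value) for row in rows for value in row]
```

Keeping the quantity as a number means 12 sorts after 3, whereas comparing the text "12" with "3" would put it first.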

  • Classifying captured data in unknown format?

    - by monch1962
    I've got a large set of captured data (potentially hundreds of thousands of records), and I need to be able to break it down so that I can both classify it and produce "typical" data myself. Let me explain further. If I have the following strings of data: 132T339G1P112S 164T897F5A498S 144T989B9B223T 155T928X9Z554T ... you might start to infer the following: possibly all strings are 14 characters long; the 4th, 8th, 10th and 14th characters may always be letters, while the rest are digits; the first character may always be a '1'; the 4th character may always be the letter 'T'; the 14th character may be limited to 'S' or 'T'; and so on. As you get more and more samples of real data, some of these "rules" might disappear: if you see a 15-character string, then you have evidence that the first "rule" is incorrect. However, given a sufficiently large sample of strings that are exactly 14 characters long, you can start to assume that "all strings are 14 characters long" and assign a numeric figure to your degree of confidence (with an appropriate set of assumptions around the fact that you're seeing a suitably random sample of all possible captured data). As you can probably tell, a human can do a lot of this classification by eye, but I'm not aware of libraries or algorithms that would allow a computer to do it. Given a set of captured data (significantly more complex than the above), are there libraries that I can apply in my code to do this sort of classification for me, identifying "rules" with a given degree of confidence? As a next step, I need to be able to take those rules and use them to create my own data that conforms to them. I assume this is a significantly easier step than the classification, but I've never had to perform a task like this before, so I'm really not sure how complex it is.
At a guess, Python or Java (or possibly Perl or R) are the "common" languages most likely to have these sorts of libraries, and maybe some of the bioinformatics libraries do this sort of thing. I really don't care which language I have to use; I need to solve the problem in whatever way I can. Any pointer to information would be very useful. As you can probably tell, I'm struggling to describe this problem clearly, and there may be a set of appropriate keywords I can plug into Google that will point me towards the solution. Thanks in advance.
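As a starting point before reaching for a library, here is a hypothetical Python sketch of both steps on the sample strings above. It records the characters seen at each position, generalises them into a character class, and draws new strings from those classes. The max_literal=2 cutoff is an arbitrary assumption that happens to reproduce the rules listed in the question ('1' first, 'T' at position 4, 'S' or 'T' at position 14). For the confidence figure, the sketch uses the statistical "rule of three": if a rule held in all n independent samples, about 3/n is an approximate 95% upper bound on how often it fails.

```python
import random
import string

def infer_profile(samples):
    """Observed string lengths plus the set of characters seen at each position."""
    lengths = {len(s) for s in samples}
    allowed = [set() for _ in range(max(lengths))]
    for s in samples:
        for pos, ch in enumerate(s):
            allowed[pos].add(ch)
    return lengths, allowed

def char_class(chars, max_literal=2):
    """Generalise the characters seen at one position into a pool to draw from.

    Heuristic (an assumption; tune it for real data): keep the literal set when
    at most `max_literal` distinct characters were seen, otherwise widen to the
    character class they all share.
    """
    if len(chars) <= max_literal:
        return "".join(sorted(chars))
    if all(c.isdigit() for c in chars):
        return string.digits
    if all(c in string.ascii_uppercase for c in chars):
        return string.ascii_uppercase
    return "".join(sorted(chars))

def generate(samples, rng=random):
    """Create one new string that conforms to the inferred per-position rules."""
    lengths, allowed = infer_profile(samples)
    length = rng.choice(sorted(lengths))
    return "".join(rng.choice(char_class(chars)) for chars in allowed[:length])

def violation_rate_upper_bound(n):
    """Rule of three: approximate 95% upper bound on a rule's failure rate
    after it held in all n independent samples."""
    return 3.0 / n
```

Search terms that may help with the library question include "grammar induction" and "regular expression inference"; the sketch above covers only fixed-position rules.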

  • Is Core Animation causing my subviews to call -drawRect for every single frame?

    - by mystify
    I made a nice UIView subclass that paints all its content in -drawRect:, because people said that's good. That view is a subview of another view, which is being animated with Core Animation: it's scaled down, rotated and moved. However, I encountered this: -drawRect: seems to get called trillions of times during the animation, and performance sucks. Is that normal, or did I probably do something wrong?

  • Best practice? - Array/Dictionary as a Core Data Entity Attribute

    - by Run Loop
    I am new to Core Data. I have noticed that collection types are not available as attribute types, and I would like to know the most efficient way of storing array/dictionary-type data as an attribute (e.g. the elements that make up an address, like street, city, etc., do not require a separate entity and are more conveniently stored as a dictionary/array than as separate attributes/fields). Thank you.

  • Lost all data on Windows XP after blue screen

    - by Barb
    I got a blue screen and was trying to boot from my OS disk. Frankly, I was unsure exactly how to do this. I was trying everything and booted in partition mode. Finally, I booted from the disk, ran chkdsk /r and was able to log into Windows. But all of my files and pictures are gone. I have no backup, and I'm sick to think that I lost the last seven years of pictures of my kids. What can I do?

  • Data Protection Manager System Protection Backups Failing

    - by TrueDuality
    I'm just starting to set up DPM 2010 in a test environment with a domain controller and a file server. Everything seems to be working fairly well, and I can get all of my backup jobs to succeed except for the "Computer\System Protection" backups. Both servers are running fully up-to-date 64-bit Windows Server 2008 R2 Enterprise with Service Pack 1. The error being reported is: DPM cannot create a backup because Windows Server Backup (WSB) on the protected computer encountered an error (WSB Event ID: 517, WSB Error Code: 0x8078001D). (ID 30229 Details: Internal error code: 0x809909FB) This Microsoft Knowledge Base article describes the issue perfectly and provides a hotfix. I downloaded the hotfix, moved it onto the affected server, attempted to run it and received the following error: The update is not applicable to your computer. I've verified that I have indeed downloaded the 64-bit version. According to this thread, the hotfix was rolled into Service Pack 1, yet I'm still experiencing the issue. Both machines do have the Windows Server Backup feature installed. Can anybody point me in the right direction? What am I missing?

  • Apache mod_wsgi error: ImportError: No module named django.core.handlers.wsgi

    - by bigmac
    I am using Python 2.7 with mod_python 3.3.1 and mod_wsgi 3.3. I get an Internal Server Error and this stack trace in the Apache logs:
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242] import django.core.handlers.wsgi
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242] ImportError: No module named django.core.handlers.wsgi
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242] mod_wsgi (pid=4463): Target WSGI script '/home/one/codebase/campman/wsgi_handler.py' cannot be loaded as Python module.
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242] mod_wsgi (pid=4463): Exception occurred processing WSGI script '/home/one/codebase/campman/wsgi_handler.py'.
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242] Traceback (most recent call last):
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242]   File "/home/one/codebase/campman/wsgi_handler.py", line 13, in <module>
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242]     import django.core.handlers.wsgi
        [Thu Apr 21 10:25:37 2011] [error] [client 83.244.243.242] ImportError: No module named django.core.handlers.wsgi
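A useful first step with this kind of ImportError is to confirm which interpreter and sys.path mod_wsgi is actually using, since they may differ from the ones in an interactive shell. The following is a minimal diagnostic sketch to call from the WSGI script, not a fix. It is written for Python 3 (importlib.util.find_spec does not exist in the question's Python 2.7). It writes to sys.stderr, which mod_wsgi normally sends to the Apache error log.

```python
import importlib.util
import sys

def module_importable(name):
    """True if the dotted module `name` can be found on the current sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports parent packages, so a missing "django" raises here
        return False

def report(name="django.core.handlers.wsgi"):
    """Log what the interpreter running this script can actually see."""
    print("interpreter:", sys.executable, file=sys.stderr)
    print("version:", sys.version.split()[0], file=sys.stderr)
    for entry in sys.path:
        print("  path:", entry, file=sys.stderr)
    print(name, "importable:", module_importable(name), file=sys.stderr)
```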

  • Server hang - data loss on reboot, post mortem analysis

    - by rovangju
    A development server I'm responsible for (ext3 on RAID 5 with Debian Squeeze) froze up over the weekend and I was forced to reset it: it was unresponsive to KVM/physical keyboard access, no eth devices were responding, etc. Not even the backup process ran (figures, the one time I don't check for confirmation). After the reset, it turns out that every trace of disk I/O activity that should have happened over a period of ~24 hours is completely gone. The log files have a big gap in the dates and times. It is as if the writes were never committed to disk; no processes seem to have run. Luckily it was a weekend, nothing of value would have been lost, and I don't suspect a hack. What can I do post mortem about this event to prevent it from ever happening again? I've seen this happen before on a completely different machine running FreeBSD. I am rounding up the disk-checking tools right now, but there must be more going on!
    Mount options: /dev/sda1 on / type ext3 (rw,errors=remount-ro)
    Kernel: Linux dev 2.6.32-5-686-bigmem
    Disk/Inodes: 13%/3%
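For the post mortem, it can help to measure the gap precisely across every log rather than by eye. The following small Python sketch assumes classic syslog-style lines beginning with a timestamp such as 'Apr 21 10:25:37'. That prefix has no year, so the caller passes one. The sketch reports consecutive entries more than an hour apart; the one-hour threshold is an arbitrary default.

```python
from datetime import datetime, timedelta

def find_gaps(lines, year, threshold=timedelta(hours=1)):
    """Return (before, after) timestamp pairs separated by more than threshold.

    Assumes each line starts with a 15-character syslog stamp like
    'Apr 21 10:25:37'; the year is supplied by the caller.
    """
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
        except ValueError:
            continue  # skip lines without a parsable timestamp
    stamps.sort()
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]
```

Usage: `find_gaps(open("/var/log/syslog"), 2011)`; merging several log files into one list first shows whether every service went quiet at the same moment.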

  • Smss.exe - setting any core affinity breaks RDP on Windows 7 / Windows Server 2008 R2

    - by Hetman
    I have tried to set the core affinity of smss.exe so that it does not run on one critical core on Windows 7 and Windows Server 2008 R2. It turns out that setting the core affinity to anything (even the full mask that smss.exe already has) appears to work, but prevents users from RDP'ing into the machine until it is restarted. Users who are already logged in may continue to use their sessions. This behaviour does not occur on Windows 8/Windows Server 2012. Does anyone know why it is happening?
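Process affinity on Windows is expressed as a bitmask in which bit i allows the process to run on logical processor i. The following small Python sketch shows the arithmetic behind "not on one critical core" versus "the full mask smss.exe already has". The 8-core machine and the choice of core 7 are hypothetical.

```python
def affinity_mask(cpu_count, exclude=()):
    """Bit i set means the process may run on logical processor i."""
    mask = (1 << cpu_count) - 1  # full mask, e.g. 8 cores -> 0xFF
    for core in exclude:
        mask &= ~(1 << core)  # clear the bit of the reserved core
    return mask

def cores_in(mask):
    """List the logical processors a mask allows."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]
```

For example, keeping smss.exe off core 7 of an 8-core machine means `affinity_mask(8, exclude=[7])`, i.e. 0x7F, while re-applying the full mask corresponds to `affinity_mask(8)`, i.e. 0xFF.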

  • 7-Zip on multi-core computers

    - by Peter Mortensen
    Does 7-Zip take advantage of multiprocessor or multi-core systems? For example, would there be close to a 16-times speed-up on a 16-core system, assuming no disk or memory bottlenecks? Or is it limited to 2 threads (a 2-times speed-up on systems with more than one CPU or core)?
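Whether 16 cores can give anything close to a 16-times speed-up depends on how much of the work can run in parallel, which Amdahl's law captures. The following is a Python sketch. The parallel fractions in the example are illustrative assumptions, not measurements of 7-Zip.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: ideal speed-up when only part of the work parallelises."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)
```

For example, if 90% of the work parallelises, 16 workers give about 6.4x (`amdahl_speedup(0.9, 16)`), and only a fully parallel workload reaches 16x.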
