Search Results

Search found 60939 results on 2438 pages for 'data quality'.

Page 6/2438 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

Textures quality issues with Libgdx

- by user1876708

I have drawn several vector objets and characters ( in Adobe Illustrator ) for my game on Android. They are all scalable at any size without any quality losses ( of course it's vector ^^ ). I have tried to simulate my gameboard directly on Illustrator just before setting my assests on libdgx to implement them in my game. I set all the objects at the good size, so that they fit perfectly on my XHDPI device I am running my test on. As you can see it works great ( for me at least ^^ ), the PNG quality is good for me, as expected ! So I have edited all my PNG at this size, set my assets on libgdx and build my game apk. And here is a screenshot of my gameboard ( don't pay attention at the differences of placing and objects, but check at the objets presents on both screenshot ). As you can see, I have a loss of my PNG quality in the game. It can be seen clealry on the hedgehog PNG, but also ( but not as obvious ) on the mushroom ( check at the outline ) and the hole PNG. If you really pay attention, on every objects, you can see pixels that are not visible on my first screenshot. And I just can't figure out why this is happening Oo If you have any ideas, you are very welcome ! Thanks. PS : You can check more clearly the 2 gameboard on this two links ( look at them at 100%, display at high resolution ) : Good quality link, from Illustrator Poor quality link, from the game Second phase of tests : We display an object ( the hedgehog ) on our main menu screen to see how it looks like. The things is that it looks like he is suppose to, which means, high quality with no pixels. The hedgehog PNG is coming from an atlas : layer.addActor(hedgehog); No loss of quality with this method So we think the problem is comming from the method we are using to display it on our gameboard : blocks[9][3] = new Block(TextureUtils.hedgehog, new Vector2(9, 3)); the block is getting the size from the vector we are associating to it, but we have a loss of quality with this method.

Read the article
In which fields does quality of the software product matter as much as the completion time?

- by Nav

Someone told me that if the software product meets clients expectations, it is good quality. But I've worked with Interaction Designers (the same kind of people who made Gmail's interface and usability so cool!), and I've loved working with them because even though they came up with hundreds of changes in requirements, and emphasised on many many subtle details, when the software was complete, I could look at the product and say WOW! The current place I work, the only thing that matters is completing the project on time. As long as it works and as long as the client says it's ok, nobody bothers to improve it. I'm not talking about gold-plating, but I believe that for a programmer to enjoy his (well, maybe her too ;) ) job, they should be able to proudly say that "Hey, I made that software" and that comes only when the product is of good quality. Apart from your opinions on this, I'd also like to know which fields (Eg. Aerospace, Finance etc.) could I find companies (or you could mention the company name) where the quality of a product is as important as completing the project on time?

Read the article
Big Data – Final Wrap and What Next – Day 21 of 21

- by Pinal Dave

In yesterday’s blog post we explored various resources related to learning Big Data and in this blog post we will wrap up this 21 day series on Big Data. I have been exploring various terms and technology related to Big Data this entire month. It was indeed fun to write about Big Data in 21 days but the subject of Big Data is much bigger and larger than someone can cover it in 21 days. My first goal was to write about the basics and I think we have got that one covered pretty well. During this 21 days I have received many questions and answers related to Big Data. I have covered a few of the questions in this series and a few more I will be covering in the next coming months. Now after understanding Big Data basics. I am personally going to do a list of the things next. I thought I will share the same with you as this will give you a good idea how to continue the journey of the Big Data. Build a schedule to read various Apache documentations Watch all Pluralsight Courses Explore HortonWorks Sandbox Start building presentation about Big Data – this is a great way to learn something new Present in User Groups Meetings on Big Data Topics Write more blog posts about Big Data I am going to continue learning about Big Data – I want you to continue learning Big Data. Please leave a comment how you are going to continue learning about Big Data. I will publish all the informative comments on this blog with due credit. I want to end this series with the infographic by UMUC. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Partner Webcast - Focus on Oracle Data Profiling and Data Quality 11g

- by lukasz.romaszewski(at)oracle.com

Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-ansi-language:RO;} Partner Webcast Focus on Oracle Data Profiling and Data Quality 11g February 24th, 12am CET Oracle offers an integrated suite Data Quality software architected to discover and correct today's data quality problems and establish a platform prepared for tomorrow's yet unknown data challenges. Oracle Data Profiling provides data investigation, discovery, and profiling in support of quality, migration, integration, stewardship, and governance initiatives. It includes a broad range of features that expand upon basic profiling, including automated monitoring, business-rule validation, and trend analysis. Oracle Data Quality for Data Integrator provides cleansing, standardization, matching, address validation, location enrichment, and linking functions for global customer data and operational business data. It ensures that data adheres to established standards that are adaptable to fit each organization's specific needs. Both single - and double - byte data are processed in local languages to provide a unique and centralized view of customers, products and services. During this in-person briefing, Data Integration Solution Specialists will be providing a technical overview and a walkthrough. Agenda · Oracle Data Integration Strategy overview · A focus on Oracle Data Profiling and Oracle Data Quality for Data Integrator: o Oracle Data Profiling o Oracle Data Quality for Data Integrator o Live demoo Q&A Delivery Format This FREE online LIVE eSeminar will be delivered over the Web and Conference Call. Registrations received less than 24hours prior to start time may not receive confirmation to attend. To register , click here. For any questions please contact [email protected].

Read the article
SQL SERVER – Configuring Interactive Cleansing Suggestion Min Score for Suggestions in Data Quality Services (DQS) – Sensitivity of Suggestion

- by pinaldave

Earlier I talked about what kind of questions, I do not like when I get asked. Today we will go over the question which I like when I get asked the same. One of the reader practices various steps in my earlier blog post Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS. While reading the blog post he noticed that Data Quality Services is not providing very helpful suggestions. He wrote an email to me about it. Let us go over his email. “Pinal, I noticed in one of your images that DQS is not providing very helpful suggestions. First of all DQS should be able to make intelligent guesses and make the necessary correction by itself. If it cannot do the same, in that case, it should give us intelligent suggestions but in the image included here, I see the suggestions are not there as well. Why is it so? Would you please tell me how to increase the numbers of suggestion? I do understand this may not be preferable solution in many case but all the business cases go on it depends. There are cases when the high sensitivity required and there are cases when higher sensitivities are not required. I would like to seek your help here. –Sriram MD” This is indeed a great question. I see that Sriram understands that every system is different and every application has a different need. I will not have to tell him this most important concept. The question is about how to change the sensitivity of suggestions for correction in DQS. Well, this option is available under the configuration tab in the DQS client. Once you click on Configuration you will see the following screen. Click the Tab of General Settings. You will see the section of Interactive Cleansing. Under this second there is the first option of “Min score for suggestions”. As this is set to 0.7 every suggestion which matches 0.7 probabilities or higher probability are displayed under the suggestion tab. You can see in the following image that there is no suggestion as the min score for suggestions is set to 0.7 and there is no record which qualifies to that much confidence. Now let us change the value of Min Score for suggestion to 0.5. The lower value increased the confidence of DQS to give further suggestion to values which are over 0.5. However, in our case the suggestions which it provides are also accurate. This may not be true for your sample. Every sample is different so you should manually review it before approving them. I guess, this is a simple blog post to demonstrate how to change the confidence value for the suggestions which Data Quality Services provides. Use this feature with care and always tune it according to your datasets and record diversity. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

Read the article
Migrating test cases & defects from Quality Center to TFS 2008/2010

- by stackoverflowuser

Tool that can be used to migrate (or even better..synchronize) test cases and bugs between: TFS 2008 and Quality Center 9.2 (or later) TFS 2010 and Quality Center 9.2 (or later) I am aware of the following tools: Test Case Migrator (Excel/MHT) Tool TFS Bug Item Synchronizer 2.2 for Quality Center Also shai raiten mentions on his blog about QC 2 Team System 2010 migration tool that he has been working on and its done. But could not find any link for downloading the tool. http://blogs.microsoft.co.il/blogs/shair/archive/2009/12/31/quality-center-migration-to-team-system-2010-done.aspx Before jumping on coding with TFS SDK and QC components to come up with my own tool I need some inputs from the stackoverflow community.

Read the article
Ensuring quality of your software and code

- by Filip Ekberg

When I usually write code I follow some guidelines to ensure that my code has a certain standard and I as any other developer try to ensure that my code and software is of quality. Try to focus on the programming and not the understanding of the domain or any other pre-programming steps. These are the following steps I live by: Writing unit tests Make it fail ( no code ) Make it Work ( working code ) Analysing abstraction Extracting methods Exteract interfaces Refactoring In addition to the above which is a part of refactoring, I also try to refactor the code with good tools such as ReSharper, CodeRush or others. The question; What is the next step? Commenting the code is trivial and shouldn't even have to be mentioned, but updated comments and xml-comments where it's needed / everywhere is something that I try to have. But all the above helps he ensure that other developers might understand my code, that the code has some sort of quality and follows naming standards. It does however not ensure any product quality. I am looking for tools for post-development quality ensurance, such as profilers and how one would use these tools to increase product quality.

Read the article
Big Data – Buzz Words: What is HDFS – Day 8 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is MapReduce. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – HDFS. What is HDFS ? HDFS stands for Hadoop Distributed File System and it is a primary storage system used by Hadoop. It provides high performance access to data across Hadoop clusters. It is usually deployed on low-cost commodity hardware. In commodity hardware deployment server failures are very common. Due to the same reason HDFS is built to have high fault tolerance. The data transfer rate between compute nodes in HDFS is very high, which leads to reduced risk of failure. HDFS creates smaller pieces of the big data and distributes it on different nodes. It also copies each smaller piece to multiple times on different nodes. Hence when any node with the data crashes the system is automatically able to use the data from a different node and continue the process. This is the key feature of the HDFS system. Architecture of HDFS The architecture of the HDFS is master/slave architecture. An HDFS cluster always consists of single NameNode. This single NameNode is a master server and it manages the file system as well regulates access to various files. In additional to NameNode there are multiple DataNodes. There is always one DataNode for each data server. In HDFS a big file is split into one or more blocks and those blocks are stored in a set of DataNodes. The primary task of the NameNode is to open, close or rename files and directory and regulate access to the file system, whereas the primary task of the DataNode is read and write to the file systems. DataNode is also responsible for the creation, deletion or replication of the data based on the instruction from NameNode. In reality, NameNode and DataNode are software designed to run on commodity machine build in Java language. Visual Representation of HDFS Architecture Let us understand how HDFS works with the help of the diagram. Client APP or HDFS Client connects to NameSpace as well as DataNode. Client App access to the DataNode is regulated by NameSpace Node. NameSpace Node allows Client App to connect to the DataNode based by allowing the connection to the DataNode directly. A big data file is divided into multiple data blocks (let us assume that those data chunks are A,B,C and D. Client App will later on write data blocks directly to the DataNode. Client App does not have to directly write to all the node. It just has to write to any one of the node and NameNode will decide on which other DataNode it will have to replicate the data. In our example Client App directly writes to DataNode 1 and detained 3. However, data chunks are automatically replicated to other nodes. All the information like in which DataNode which data block is placed is written back to NameNode. High Availability During Disaster Now as multiple DataNode have same data blocks in the case of any DataNode which faces the disaster, the entire process will continue as other DataNode will assume the role to serve the specific data block which was on the failed node. This system provides very high tolerance to disaster and provides high availability. If you notice there is only single NameNode in our architecture. If that node fails our entire Hadoop Application will stop performing as it is a single node where we store all the metadata. As this node is very critical, it is usually replicated on another clustered as well as on another data rack. Though, that replicated node is not operational in architecture, it has all the necessary data to perform the task of the NameNode in the case of the NameNode fails. The entire Hadoop architecture is built to function smoothly even there are node failures or hardware malfunction. It is built on the simple concept that data is so big it is impossible to have come up with a single piece of the hardware which can manage it properly. We need lots of commodity (cheap) hardware to manage our big data and hardware failure is part of the commodity servers. To reduce the impact of hardware failure Hadoop architecture is built to overcome the limitation of the non-functioning hardware. Tomorrow In tomorrow’s blog post we will discuss the importance of the relational database in Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Oracle Data Integration 12c: Perspectives of Industry Experts, Customers and Partners

- by Irem Radzik

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 As you may have seen from our recent blog posts on Oracle Data Integrator 12c and Oracle GoldenGate 12c, we are very excited to share with you the great new features the 12c release brings to Oracle’s data integration solutions. And, fortunately we are not alone in this sentiment. Since the press announcement October 17th, which incorporates our customers' and experts' testimonials, we have seen positive comments in leading technology publications and social media as well. Here are some examples: In CIO and PCWorld you can find Joab Jackson’s article, Oracle Data Integrator 12c ready for real-time analysis, where wrote about the tight integration between Oracle Data Integrator and Oracle GoldenGate . He noted “Heeding the call from enterprise customers who clamor for more immediacy in their data-driven reports, Oracle has updated its data-integration software portfolio so that it can more rapidly deliver data to data warehouses and analysis applications.” Integration Developer News’ Vance McCarthy wrote the article Oracle Ships ‘Future Proofs’ Integration Tools for Traditional, Cloud, Big Data, Real-Time Projects and mentioned that “Oracle Data Integrator 12c and Oracle GoldenGate 12c sport a wide range of improvements to let devs more easily deliver data integration for cloud, analytics, big data and other new projects that leverage multiple datasets for business.“ InformationWeek’s Doug Henschen gave a great overview to several key features including the new flow-based UI in Oracle Data Integrator. Doug said “Oracle Data Integrator 12c introduces a complete makeover of the job-building experience, while real-time oriented GoldenGate 12c introduces performance gains “. In Database Trends and Applications’ article Oracle Strengthens Data Integration with Release of Oracle Data Integrator 12c and Oracle GoldenGate 12c highlighted the productivity aspect of the new solution with his remarks: “tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c enables developers to leverage Oracle GoldenGate’s low overhead, real-time change data capture completely within the Oracle Data Integrator Studio without additional training”. We are also thrilled about what our customers and partners have to say about our products and the new release. And we are equally excited to share those perspectives with you in our upcoming launch video webcast on November 12th. SolarWorld Industries America’s Senior Database Manager, Russ Toyama will join our executives in our studio in Redwood Shores to discuss GoldenGate’s core benefits and the new release, while Surren Partharb, CTO of Strategic Technology Services for BT, and Mark Rittman, CTO of Rittman Mead, will provide their comments via the interviews conducted in the UK. This interactive panel discussion in the video webcast will unveil the new release with the expertise of our development executives and the great insight from our customers and partners. In addition, our product experts will be available online to answer chat questions. This is really a great opportunity to learn how Oracle's data integration offering has changed the integration and replication technology space with the new release, and established itself as the new leader. If you have not registered for this free event yet, you can do so via this link. We will run the live event at 8am PT/4pm GMT, followed by a replay of the event with live chat for Q&A at 10am PT/6pm GMT. The replay will be available on-demand for those who register but cannot attend either session on November 12th. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";}

Read the article
Enterprise Data Quality - New and Improved on Oracle Technology Network

- by Mala Narasimharajan

Looking for Enterprise Data Quality technical and developer resources on your projects? Wondering where the best place is to go for finding the latest documentations, downloads and even code samples and libraries? Check out the new and improved Oracle Technical Network pages for Oracle Enterprise Data Quality. This section features developer forums as well for EDQ and Master Data Management so that you can connect with other technical professionals who have submitted concerns or posted tips and tricks and learn from them. Here are the links to bookmark: Oracle Technology Network website * NEW * Installation Guide for Enterprise Data Quality Address Verification Enterprise Data Quality Forum For more information on Oracle's software offerings for data quality and master data management visit: http://www.oracle.com/us/products/applications/master-data-management/index.html http://www.oracle.com/us/products/middleware/data-integration/enterprise-data-quality/overview/index.html

Read the article
New Feature in ODI 11.1.1.6: Enterprise Data Quality Integration

- by Julien Testut

Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Oracle Data Integrator 11.1.1.6.0 introduces a new Open Tool called EnterpriseDataQuality which allows ODI users to invoke an Oracle Enterprise Data Quality Job from a Package. This post will give you an overview of this new feature. Oracle Enterprise Data Quality (OEDQ) provides organizations with an integrated suite of data quality tools that offer an end-to-end solution to measure, improve, and manage the quality of data from any domain, including customer and product data. The addition of the EnterpriseDataQuality Open Tool extends the inline Data Quality capabilities of Oracle Data Integrator with Oracle Enterprise Data Quality powerful data profiling, cleansing, matching, and monitoring capabilities. The EnterpriseDataQuality Open Tool can invoke any OEDQ Job stored in a Project. This Open Tool connects to an OEDQ server using a JMX (Java Management Extensions) interface. Once installed, this Open Tool will be found under Plugins in the Package Toolbox area: This EnterpriseDataQuality Open Tool takes a couple of parameters as inputs such as the Enterprise Data Quality Job and Project names, the OEDQ hostname and JMX port etc. With the EnterpriseDataQuality Open Tool, ODI customers can now incorporate their Oracle Enterprise Data Quality processes within their Data Integration workflows. You will find instructions about how to use the Enterprise Data Quality Open Tool in the Oracle Data Integrator documentation at: Using the EnterpriseDataQuality Open Tool.You can find an overview of all the new features introduced in ODI 11.1.1.6 in the following document: ODI 11.1.1.6 New Features Overview.

Read the article
photshop: why low quality jpg saved as high quality increases the file size?

- by Alex Angelico

i have a low quality background about 85kb. If I open the file in Photoshop and save it, I have to save this image as 100% quality, the file size increases to 640kb. This doesn't make sense, the image si already compressed, the quality cannot be better than the source, so the 100% quality while saving should produce THE SAME FILE OPENED. The problem is, I have this background and want to add a logo image over it, and then save it. But If I do so, i have to save this with 10% quality, and the logo looks horrible. If I save this to 100% quality, the file size is huge... How can I achive this?

Read the article
New Big Data Appliance Security Features

- by mgubar

The Oracle Big Data Appliance (BDA) is an engineered system for big data processing. It greatly simplifies the deployment of an optimized Hadoop Cluster – whether that cluster is used for batch or real-time processing. The vast majority of BDA customers are integrating the appliance with their Oracle Databases and they have certain expectations – especially around security. Oracle Database customers have benefited from a rich set of security features: encryption, redaction, data masking, database firewall, label based access control – and much, much more. They want similar capabilities with their Hadoop cluster. Unfortunately, Hadoop wasn’t developed with security in mind. By default, a Hadoop cluster is insecure – the antithesis of an Oracle Database. Some critical security features have been implemented – but even those capabilities are arduous to setup and configure. Oracle believes that a key element of an optimized appliance is that its data should be secure. Therefore, by default the BDA delivers the “AAA of security”: authentication, authorization and auditing. Security Starts at Authentication A successful security strategy is predicated on strong authentication – for both users and software services. Consider the default configuration for a newly installed Oracle Database; it’s been a long time since you had a legitimate chance at accessing the database using the credentials “system/manager” or “scott/tiger”. The default Oracle Database policy is to lock accounts thereby restricting access; administrators must consciously grant access to users. Default Authentication in Hadoop By default, a Hadoop cluster fails the authentication test. For example, it is easy for a malicious user to masquerade as any other user on the system. Consider the following scenario that illustrates how a user can access any data on a Hadoop cluster by masquerading as a more privileged user. In our scenario, the Hadoop cluster contains sensitive salary information in the file /user/hrdata/salaries.txt. When logged in as the hr user, you can see the following files. Notice, we’re using the Hadoop command line utilities for accessing the data: $ hadoop fs -ls /user/hrdataFound 1 items-rw-r--r-- 1 oracle supergroup 70 2013-10-31 10:38 /user/hrdata/salaries.txt$ hadoop fs -cat /user/hrdata/salaries.txtTom Brady,11000000Tom Hanks,5000000Bob Smith,250000Oprah,300000000 User DrEvil has access to the cluster – and can see that there is an interesting folder called “hrdata”. $ hadoop fs -ls /user Found 1 items drwx------ - hr supergroup 0 2013-10-31 10:38 /user/hrdata However, DrEvil cannot view the contents of the folder due to lack of access privileges: $ hadoop fs -ls /user/hrdata ls: Permission denied: user=drevil, access=READ_EXECUTE, inode="/user/hrdata":oracle:supergroup:drwx------ Accessing this data will not be a problem for DrEvil. He knows that the hr user owns the data by looking at the folder’s ACLs. To overcome this challenge, he will simply masquerade as the hr user. On his local machine, he adds the hr user, assigns that user a password, and then accesses the data on the Hadoop cluster: $ sudo useradd hr $ sudo passwd $ su hr $ hadoop fs -cat /user/hrdata/salaries.txt Tom Brady,11000000 Tom Hanks,5000000 Bob Smith,250000 Oprah,300000000 Hadoop has not authenticated the user; it trusts that the identity that has been presented is indeed the hr user. Therefore, sensitive data has been easily compromised. Clearly, the default security policy is inappropriate and dangerous to many organizations storing critical data in HDFS. Big Data Appliance Provides Secure Authentication The BDA provides secure authentication to the Hadoop cluster by default – preventing the type of masquerading described above. It accomplishes this thru Kerberos integration. Figure 1: Kerberos Integration The Key Distribution Center (KDC) is a server that has two components: an authentication server and a ticket granting service. The authentication server validates the identity of the user and service. Once authenticated, a client must request a ticket from the ticket granting service – allowing it to access the BDA’s NameNode, JobTracker, etc. At installation, you simply point the BDA to an external KDC or automatically install a highly available KDC on the BDA itself. Kerberos will then provide strong authentication for not just the end user – but also for important Hadoop services running on the appliance. You can now guarantee that users are who they claim to be – and rogue services (like fake data nodes) are not added to the system. It is common for organizations to want to leverage existing LDAP servers for common user and group management. Kerberos integrates with LDAP servers – allowing the principals and encryption keys to be stored in the common repository. This simplifies the deployment and administration of the secure environment. Authorize Access to Sensitive Data Kerberos-based authentication ensures secure access to the system and the establishment of a trusted identity – a prerequisite for any authorization scheme. Once this identity is established, you need to authorize access to the data. HDFS will authorize access to files using ACLs with the authorization specification applied using classic Linux-style commands like chmod and chown (e.g. hadoop fs -chown oracle:oracle /user/hrdata changes the ownership of the /user/hrdata folder to oracle). Authorization is applied at the user or group level – utilizing group membership found in the Linux environment (i.e. /etc/group) or in the LDAP server. For SQL-based data stores – like Hive and Impala – finer grained access control is required. Access to databases, tables, columns, etc. must be controlled. And, you want to leverage roles to facilitate administration. Apache Sentry is a new project that delivers fine grained access control; both Cloudera and Oracle are the project’s founding members. Sentry satisfies the following three authorization requirements: Secure Authorization: the ability to control access to data and/or privileges on data for authenticated users. Fine-Grained Authorization: the ability to give users access to a subset of the data (e.g. column) in a database Role-Based Authorization: the ability to create/apply template-based privileges based on functional roles. With Sentry, “all”, “select” or “insert” privileges are granted to an object. The descendants of that object automatically inherit that privilege. A collection of privileges across many objects may be aggregated into a role – and users/groups are then assigned that role. This leads to simplified administration of security across the system. Figure 2: Object Hierarchy – granting a privilege on the database object will be inherited by its tables and views. Sentry is currently used by both Hive and Impala – but it is a framework that other data sources can leverage when offering fine-grained authorization. For example, one can expect Sentry to deliver authorization capabilities to Cloudera Search in the near future. Audit Hadoop Cluster Activity Auditing is a critical component to a secure system and is oftentimes required for SOX, PCI and other regulations. The BDA integrates with Oracle Audit Vault and Database Firewall – tracking different types of activity taking place on the cluster: Figure 3: Monitored Hadoop services. At the lowest level, every operation that accesses data in HDFS is captured. The HDFS audit log identifies the user who accessed the file, the time that file was accessed, the type of access (read, write, delete, list, etc.) and whether or not that file access was successful. The other auditing features include: MapReduce: correlate the MapReduce job that accessed the file Oozie: describes who ran what as part of a workflow Hive: captures changes were made to the Hive metadata The audit data is captured in the Audit Vault Server – which integrates audit activity from a variety of sources, adding databases (Oracle, DB2, SQL Server) and operating systems to activity from the BDA. Figure 4: Consolidated audit data across the enterprise. Once the data is in the Audit Vault server, you can leverage a rich set of prebuilt and custom reports to monitor all the activity in the enterprise. In addition, alerts may be defined to trigger violations of audit policies. Conclusion Security cannot be considered an afterthought in big data deployments. Across most organizations, Hadoop is managing sensitive data that must be protected; it is not simply crunching publicly available information used for search applications. The BDA provides a strong security foundation – ensuring users are only allowed to view authorized data and that data access is audited in a consolidated framework.

Read the article
How to present a stable data model in a public API that allows internal data structures to be changed without breaking the public view of the data?

- by Max Palmer

I am in the process of developing an application that allows users to write C# scripts. These scripts allow users to call selected methods and to access and manipulate data in a document. This works well, however, in the development version, scripts access the document's (internal) data structures directly. This means that if we were to change the internal data model/structure, there is a good chance that someone's script will no longer compile. We obviously want to prevent this breaking change from happening, but still want to allow the user to write sensible C# code (whilst not restricting how we develop our internal data model as a result). We therefore need to decouple our scripting API and its data structures from our internal methods and data structures. We've a few ideas as to how we might allow the user to access a what is effectively a stable public version of the document's internal data*, but I wanted to throw the question out there to someone who might have some real experience of this problem. NB our internal document's data structure is quite complex and it could be quite difficult to wrap. We know we want to expose as little as possible in our public API, especially as once it's out there, it's out there for good. Can anyone help? How do scripting languages / APIs decouple their public API and data structures from their internal data structures? Is there no real alternative to having to write a complex interaction layer? If we need to do this, what's a good approach or pattern for wrapping complex data structures that include nested objects, including collections? I've looked at the API facade pattern, which looks like it's trying to address these kinds of issues, but are there alternatives? *One idea is to build a data facade that is kept stable across versions of our application. The facade exposes a set of facade data objects that are used in the script code. These maintain backwards compatibility and wrap access to our internal document's data model.

Read the article
Why Cornell University Chose Oracle Data Masking

- by Troy Kitch

One of the eight Ivy League schools, Cornell University found itself in the unfortunate position of having to inform over 45,000 University community members that their personal information had been breached when a laptop was stolen. To ensure this wouldn’t happen again, Cornell took steps to ensure that data used for non-production purposes is de-identified with Oracle Data Masking. A recent podcast highlights why organizations like Cornell are choosing Oracle Data Masking to irreversibly de-identify production data for use in non-production environments. Organizations often copy production data, that contains sensitive information, into non-production environments so they can test applications and systems using “real world” information. Data in non-production has increasingly become a target of cyber criminals and can be lost or stolen due to weak security controls and unmonitored access. Similar to production environments, data breaches in non-production environments can cost millions of dollars to remediate and cause irreparable harm to reputation and brand. Cornell’s applications and databases help carry out the administrative and academic mission of the university. They are running Oracle PeopleSoft Campus Solutions that include highly sensitive faculty, student, alumni, and prospective student data. This data is supported and accessed by a diverse set of developers and functional staff distributed across the university. Several years ago, Cornell experienced a data breach when an employee’s laptop was stolen. Centrally stored backup information indicated there was sensitive data on the laptop. With no way of knowing what the criminal intended, the university had to spend significant resources reviewing data, setting up service centers to handle constituent concerns, and provide free credit checks and identity theft protection services—all of which cost money and took time away from other projects. To avoid this issue in the future Cornell came up with several options; one of which was to sanitize the testing and training environments. “The project management team was brought in and they developed a project plan and implementation schedule; part of which was to evaluate competing products in the market-space and figure out which one would work best for us. In the end we chose Oracle’s solution based on its architecture and its functionality.” – Tony Damiani, Database Administration and Business Intelligence, Cornell University The key goals of the project were to mask the elements that were identifiable as sensitive in a consistent and efficient manner, but still support all the previous activities in the non-production environments. Tony concludes, “What we saw was a very minimal impact on performance. The masking process added an additional three hours to our refresh window, but it was well worth that time to secure the environment and remove the sensitive data. I think some other key points you can keep in mind here is that there was zero impact on the production environment. Oracle Data Masking works in non-production environments only. Additionally, the risk of exposure has been significantly reduced and the impact to business was minimal.” With Oracle Data Masking organizations like Cornell can: Make application data securely available in non-production environments Prevent application developers and testers from seeing production data Use an extensible template library and policies for data masking automation Gain the benefits of referential integrity so that applications continue to work Listen to the podcast to hear the complete interview. Learn more about Oracle Data Masking by registering to watch this SANS Institute Webcast and view this short demo.

Read the article
Removing Barriers to Create Effective Data Models

After years of creating and maintaining data models, I have started to notice common barriers that decrease the accuracy and usefulness of models. In my opinion, the main causes of these barriers are the lack of knowledge and communication from within a company. The lack of knowledge in regards to data models or data modeling can take many forms. Company Culture Knowledge Whether documented or undocumented, existing business rules of a company can affect how data is modeled. For example, if a company only allows 1 assigned person per customer to be able to manipulate a customer’s record then then a data model that includes an associated table that joins customers and employee’s would be unneeded because that would allow for the possibility of multiple employees to handle a customer because of the potential for a many to many relationship between Customers and Employees. Technical Knowledge Depending on the data modeler’s proficiency in modeling data they can inadvertently cause issues and/or complications with a design without even noticing. It is important that companies share data modeling responsibilities so that the models are developed from multiple perspectives of a system, company and the original problem. In addition, the tools that a company selects to create data models can also affect the accuracy of the model if designer are not familiar with the tools or the tools are too complex to use for the designer. Existing System Knowledge In order for a data modeler to model data for an existing system so that new changes can be applied to a system then they need to at least know the basic concepts of a system so that they can work within it. This will promote reusability of data and prevent the chance of duplicating data. Project Knowledge This should be pretty obvious, but it is very hard to create an accurate data model without knowing what data needs to be modeled. I have always found it strange that I have been asked to start modeling data prior to a client formalizing any requirements. Usually when this happens I have to make several iterations to a model, and the client still does not know exactly what they want. In addition additional issues can arise when certain stakeholders of a project are not consulted prior to the design or after the project is over because it can cause miss understandings and confusion by the end user as well as possibly not solving the original problem for which a project is intended to solve. One common thread between each type of knowledge is that they can all be avoided through the use of good communication. For example, if a modeler is new to a company then they should ask older employees about any business specific rules that may be documented or undocumented that must be applied to projects in general. Furthermore, if a modeler is not really familiar with a specific data modeling software then they need to speak up and ask for help form other employees or their manager. This will not only help the modeler in the project, but also help them in future projects that they do for the company. Additionally, if a project is not clearly defined prior to a data modeler being assigned the modeling project then it is their responsibility to communicate with the other stakeholders to clarify any part of a project that is unclear so that the data model that is created is accurately aligned with a project.

Read the article
How often do you use data structures (ie Binary Trees, Linked Lists) in your jobs/side projects?

- by Chris2021

It seems to me that, for everyday use, more primitive data structures like arrays get the job done just as well as a binary tree would. My question is how common is to use these structures when writing code for projects at work or projects that you pursue in your free time? I understand the better insertion time/deletion time/sorting time for certain structures but would that really matter that much if you were working with a relatively small amount of data?

Read the article
Why would you use data structures (ie Binary Trees, Linked Lists) in your jobs/side projects? [closed]

- by Chris2021

It seems to me that, for everyday use, more primitive data structures like arrays get the job done just as well as a binary tree would. My question is how common is to use these structures when writing code for projects at work or projects that you pursue in your free time? I understand the better insertion time/deletion time/sorting time for certain structures but would that really matter that much if you were working with a relatively small amount of data?

Read the article
Queued Loadtest to remove Concurrency issues using Shared Data Service in OpenScript

- by stefan.thieme(at)oracle.com

Queued Processing to remove Concurrency issues in Loadtest ScriptsSome scripts act on information returned by the server, e.g. act on first item in the returned list of pending tasks/actions. This may lead to concurrency issues if the virtual users simulated in a load test scenario are not synchronized in some way.As the load test cases should be carried out in a comparable and straight forward manner simply cancel a transaction in case a collision occurs is clearly not an option. In case you increase the number of virtual users this approach would lead to a high number of requests for the early steps in your transaction (e.g. login, retrieve list of action points, assign an action point to the virtual user) but later steps would be rarely visited successfully or at all, depending on the application logic.A way to tackle this problem is to enqueue the virtual users in a Shared Data Service queue. Only the first virtual user in this queue will be allowed to carry out the critical steps (retrieve list of action points, assign an action point to the virtual user) in your transaction at any one time.Once a virtual user has passed the critical path it will dequeue himself from the head of the queue and continue with his actions. This does theoretically allow virtual users to run in parallel all steps of the transaction which are not part of the critical path.In practice it has been seen this is rarely the case, though it does not allow adding more than N users to perform a transaction without causing delays due to virtual users waiting in the queue. N being the time of the total transaction divided by the sum of the time of all critical steps in this transaction.While this problem can be circumvented by allowing multiple queues to act on individual segments of the list of actions, e.g. per country filter, ends with 0..9 filter, etc.This would require additional handling of these additional queues of slots for the virtual users at the head of the queue in order to maintain the mutually exclusive access to the first element in the list returned by the server at any one time of the load test. Such an improved handling of multiple queues and/or multiple slots is above the subject of this paper.Shared Data Services Pre-RequisitesStart WebLogic Server to host Shared Data ServicesYou will have to make sure that your WebLogic server is installed and started. Shared Data Services may not work if you installed only the minimal installation package for OpenScript. If however you installed the default package including OLT and OTM, you may follow the instructions below to start and verify WebLogic installation.To start the WebLogic Server deployed underneath of Oracle Load Testing and/or Oracle Test Manager you can go to your Start menu, Oracle Application Testing Suite and select the Restart Oracle Application Testing Suite Application Service entry from the Tools submenu.To verify the service has been started you can run the Microsoft Management Console for Services by Selecting Run from the Start Menu and entering services.msc. Look for the entry that reads Oracle Application Testing Suite Application Service, once it has changed it status from Starting to Started you can proceed to verify the login. Please note that this may take several minutes, I would say up to 10 minutes depending on the strength of your CPU horse-power.Verify WebLogic Server user credentialsYou will have to make sure that your WebLogic Server is installed and started. Next open the Oracle WebLogic Server Adminstration Console on http://localhost:8088/console.It may take a while until the application is deployed and started. It may display the following until the Administration Console has been deployed on the fly.Afterwards you can login using the username oats and the password that you selected during install time for your Application Testing Suite administrative purposes.This will bring up the Home page of you WebLogic Server. You have actually verified that you are able to login with these credentials already. However if you want to check the details, navigate to Security Realms, myrealm, Users and Groups tab.Here you could add users to your WebLogic Server which could be used in the later steps. Details on the Groups required for such a custom user to work are exceeding this quick overview and have to be selected with the WebLogic Server Adminstration Guide in mind.Shared Data Services pre-requisites for Load testingOpenScript Preferences have to be set to enable Encryption and provide a default Shared Data Service Connection for Playback.These are pre-requisites you want to use for load testing with Shared Data Services.Please note that the usage of the Connection Parameters (individual directive in the script) for Shared Data Services did not playback reliably in the current version 9.20.0370 of Oracle Load Testing (OLT) and encryption of credentials still seemed to be mandatory as well.General Encryption settingsSelect OpenScript Preferences from the View menu and navigate to the General, Encryption entry in the tree on the left. Select the Encrypt script data option from the list and enter the same password that you used for securing your WebLogic Server Administration Console.Enable global shared data access credentialsSelect OpenScript Preferences from the View menu and navigate to the Playback, Shared Data entry in the tree on the left. Enable the global shared data access credentials and enter the Address, User name and Password determined for your WebLogic Server to host Shared Data Services.Please note, that you may want to replace the localhost in Address with the hosts realname in case you plan to run load tests with Loadtest Agents running on remote systems.Queued Processing of TransactionsEnable Shared Data Services Module in Script PropertiesThe Shared Data Services Module has to be enabled for each Script that wants to employ the Shared Data Service Queue functionality in OpenScript. It can be enabled under the Script menu selecting Script Properties. On the Script Properties Dialog select the Modules section and check Shared Data to enable Shared Data Service Module for your script. Checking the Shared Data Services option will effectively add a line to your script code that adds the sharedData ScriptService to your script class of IteratingVUserScript.@ScriptService oracle.oats.scripting.modules.sharedData.api.SharedDataService sharedData;Record your scriptRecord your script as usual and then add the following things for Queue handling in the Initialize code block, before the first step and after the last step of your critical path and in the Finalize code block.The java code to be added at individual locations is explained in the following sections in full detail.Create a Shared Data Queue in InitializeTo create a Shared Data Queue go to the Java view of your script and enter the following statements to the initialize() code block.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);This will create an instantiation of the Shared Data Queue object named queueA which is maintained for upto 120 minutes.If you want to use the code for multiple scripts, make sure to use a different queue name for each one here and in the subsequent steps. You may even consider to use a dynamic queueName based on filters of your result list being concurrently accessed.Prepare a unique id for each IterationIn order to keep track of individual virtual users in our queue we need to create a unique identifier from the virtual user id and the used username right after retrieving the next record from our databank file.getDatabank("Usernames").getNextDatabankRecord();getVariables().set("usernameValue1","VU_{{@vuid}}_{{@iterationnum}}_{{db.Usernames.Username}}_{{@timestamp}}_{{@random(10000)}}");String usernameValue = getVariables().get("usernameValue1");info("Now running virtual user " + usernameValue);As you can see from the above code block, we have set the OpenScript variable usernameValue1 to VU_{{@vuid}}_{{@iterationnum}}_{{db.Usernames.Username}}_{{@timestamp}}_{{@random(10000)}} which is a concatenation of the virtual user id and the iterationnumber for general uniqueness; as well as the username from our databank, the timestamp and a random number for making it further unique and ease spotting of errors.Not all of these fields are actually required to make it really unique, but adding the queue name may also be considered to help troubleshoot multiple queues.The value is then retrieved with the getVariables.get() method call and assigned to the usernameValue String used throughout the script.Please note that moving the getDatabank("Usernames").getNextDatabankRecord(); call to the initialize block was later considered to remove concurrency of multiple virtual users running with the same userid and therefor accessing the same "My Inbox" in step 6. This will effectively give each virtual user a userid from the databank file. Make sure you have enough userids to remove this second hurdle.Enqueue and attend Queue before Critical PathTo maintain the right order of virtual users being allowed into the critical path of the transaction the following pseudo step has to be added in front of the first critical step. In the case of this example this is right in front of the step where we retrieve the list of actions from which we select the first to be assigned to us.beginStep("[0] Waiting in the Queue", 0);{info("Enqueued virtual user " + usernameValue + " at the end of queueA");sharedData.offerLast("queueA", usernameValue);info("Wait until the user is the first in queueA");String queueValue1 = null;do {// we wait for at least 0.7 seconds before we check the head of the// queue. This is the time it takes one user to move through the// critical path, i.e. pass steps [5] Enter country and [6] Assign// to meThread.sleep(700);queueValue1 = (String) sharedData.peekFirst("queueA");info("The first user in queueA is currently: '" + queueValue1 + "' " + queueValue1.getClass() + " length " + queueValue1.length() );info("The current user is '"+ usernameValue + "' " + usernameValue.getClass() + " length " + usernameValue.length() + ": indexOf " + usernameValue.indexOf(queueValue1) + " equals " + usernameValue.equals(queueValue1) );} while ( queueValue1.indexOf(usernameValue) < 0 );info("Now the user is the first in queueA");}endStep();This will enqueue the username to the tail of our Queue. It will will wait for at least 700 milliseconds, the time it takes for one user to exit the critical path and then compare the head of our queue with it's username. This last step will be repeated while the two are not equal (indexOf less than zero). If they are equal the indexOf will yield a value of zero or larger and we will perform the critical steps.Dequeue after Critical PathAfter the virtual user has left the critical path and complete its last step the following code block needs to dequeue the virtual user. In the case of our example this is right after the action has been actually assigned to the virtual user. This will allow the next virtual user to retrieve the list of actions still available and in turn let him make his selection/assignment.info("Get and remove the current user from the head of queueA");String pollValue1 = (String) sharedData.pollFirst("queueA");The current user is removed from the head of the queue. The next one will now be able to match his username against the head of the queue.Clear and Destroy Queue for FinishWhen the script has completed, it should clear and destroy the queue. This code block can be put in the finish block of your script and/or in a separate script in order to clear and remove the queue in case you have spotted an error or want to reset the queue for some reason.info("Clear queueA");sharedData.clearQueue("queueA");info("Destroy queueA");sharedData.destroyQueue("queueA");The users waiting in queueA are cleared and the queue is destroyed. If you have scripts still executing they will be caught in a loop.I found it better to maintain a separate Reset Queue script which contained only the following code in the initialize() block. I use to call this script to make sure the queue is cleared in between multiple Loadtest runs. This script could also even be added as the first in a larger scenario, which would execute it only once at very start of the Loadtest and make sure the queues do not contain any stale entries.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);info("Clear queueA");sharedData.clearQueue("queueA");This will create a Shared Data Queue instance of queueA and clear all entries from this queue.Monitoring QueueWhile creating the scripts it was useful to monitor the contents, i.e. the current first user in the Queue. The following code block will make sure the Shared Data Queue is accessible in the initialize() block.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);In the run() block the following code will continuously monitor the first element of the Queue and write an informational message with the current username Value to the Result window.info("Monitor the first users in queueA");String queueValue1 = null;do {queueValue1 = (String) sharedData.peekFirst("queueA");if (queueValue1 != null)info("The first user in queueA is currently: '" + queueValue1 + "' " + queueValue1.getClass() + " length " + queueValue1.length() );} while ( true );This script can be run from OpenScript parallel to a loadtest performed by the Oracle Load Test.However it is not recommend to run this in a production loadtest as the performance impact is unknown. Accessing the Queue's head with the peekFirst() method has been reported with about 2 seconds response time by both OpenScript and OTL. It is advised to log a Service Request to see if this could be lowered in future releases of Application Testing Suite, as the pollFirst() and even offerLast() writing to the tail of the Queue usually returned after an average 0.1 seconds.Debugging QueueWhile debugging the scripts the following was useful to remove single entries from its head, i.e. the current first user in the Queue. The following code block will make sure the Shared Data Queue is accessible in the initialize() block.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);In the run() block the following code will remove the first element of the Queue and write an informational message with the current username Value to the Result window.info("Get and remove the current user from the head of queueA");String pollValue1 = (String) sharedData.pollFirst("queueA");info("The first user in queueA was currently: '" + pollValue1 + "' " + pollValue1.getClass() + " length " + pollValue1.length() );ReferencesOracle Functional Testing OpenScript User's Guide Version 9.20 [E15488-05]Chapter 17 Using the Shared Data Modulehttp://download.oracle.com/otn/nt/apptesting/oats-docs-9.21.0030.zipOracle Fusion Middleware Oracle WebLogic Server Administration Console Online Help 11g Release 1 (10.3.4) [E13952-04]Administration Console Online Help - Manage users and groupshttp://download.oracle.com/docs/cd/E17904_01/apirefs.1111/e13952/taskhelp/security/ManageUsersAndGroups.htm

Read the article
Oracle Enterprise Data Quality: Ever Integration-ready

- by Mala Narasimharajan

It is closing in on a year now since Oracle’s acquisition of Datanomic, and the addition of Oracle Enterprise Data Quality (EDQ) to the Oracle software family. The big move has caused some big shifts in emphasis and some very encouraging excitement from the field. To give an illustration, combined with a shameless promotion of how EDQ can help to give quick insights into your data, I did a quick Phrase Profile of the subject field of emails to the Global EDQ mailing list since it was set up last September. The results revealed a very clear theme: Integration, Integration, Integration! As well as the important Siebel and Oracle Data Integrator (ODI) integrations, we have been asked about integration with a huge variety of Oracle applications, including EBS, Peoplesoft, CRM on Demand, Fusion, DRM, Endeca, RightNow, and more - and we have not stood still! While it would not have been possible to develop specific pre-integrations with all of the above within a year, we have developed a package of feature-rich out-of-the-box web services and batch processes that can be plugged into any application or middleware technology with ease. And with Siebel, they work out of the box. Oracle Enterprise Data Quality version 9.0.4 includes the Customer Data Services (CDS) pack – a ready set of standard processes with standard interfaces, to provide integrated: Address verification and cleansing Individual matching Organization matching The services can are suitable for either Batch or Real-Time processing, and are enabled for international data, with simple configuration options driving the set of locale-specific dictionaries that are used. For example, large dictionaries are provided to support international name transcription and variant matching, including highly specialized handling for Arabic, Japanese, Chinese and Korean data. In total across all locales, CDS includes well over a million dictionary entries. Excerpt from EDQ’s CDS Individual Name Standardization Dictionary CDS has been developed to replace the OEM of Informatica Identity Resolution (IIR) for attached Data Quality on the Oracle price list, but does this in a way that creates a ‘best of both worlds’ situation for customers, who can harness not only the out-of-the-box functionality of pre-packaged matching and standardization services, but also the flexibility of OEDQ if they want to customize the interfaces or the process logic, without having to learn more than one product. From a competitive point of view, we believe this stands us in good stead against our key competitors, including Informatica, who have separate ‘Identity Resolution’ and general DQ products, and IBM, who provide limited out-of-the-box capabilities (with a steep learning curve) in both their QualityStage data quality and Initiate matching products. Here is a brief guide to the main services provided in the pack: Address Verification and Standardization EDQ’s CDS Address Cleaning Process The Address Verification and Standardization service uses EDQ Address Verification (an OEM of Loqate software) to verify and clean addresses in either real-time or batch. The Address Verification processor is wrapped in an EDQ process – this adds significant capabilities over calling the underlying Address Verification API directly, specifically: Country-specific thresholds to determine when to accept the verification result (and therefore to change the input address) based on the confidence level of the API Optimization of address verification by pre-standardizing data where required Formatting of output addresses into the input address fields normally used by applications Adding descriptions of the address verification and geocoding return codes The process can then be used to provide real-time and batch address cleansing in any application; such as a simple web page calling address cleaning and geocoding as part of a check on individual data. Duplicate Prevention Unlike Informatica Identity Resolution (IIR), EDQ uses stateless services for duplicate prevention to avoid issues caused by complex replication and synchronization of large volume customer data. When a record is added or updated in an application, the EDQ Cluster Key Generation service is called, and returns a number of key values. These are used to select other records (‘candidates’) that may match in the application data (which has been pre-seeded with keys using the same service). The ‘driving record’ (the new or updated record) is then presented along with all selected candidates to the EDQ Matching Service, which decides which of the candidates are a good match with the driving record, and scores them according to the strength of match. In this model, complex multi-locale EDQ techniques can be used to generate the keys and ensure that the right balance between performance and matching effectiveness is maintained, while ensuring that the application retains control of data integrity and transactional commits. The process is explained below: EDQ Duplicate Prevention Architecture Note that where the integration is with a hub, there may be an additional call to the Cluster Key Generation service if the master record has changed due to merges with other records (and therefore needs to have new key values generated before commit). Batch Matching In order to allow customers to use different match rules in batch to real-time, separate matching templates are provided for batch matching. For example, some customers want to minimize intervention in key user flows (such as adding new customers) in front end applications, but to conduct a more exhaustive match on a regular basis in the back office. The batch matching jobs are also used when migrating data between systems, and in this case normally a more precise (and automated) type of matching is required, in order to minimize the review work performed by Data Stewards. In batch matching, data is captured into EDQ using its standard interfaces, and records are standardized, clustered and matched in an EDQ job before matches are written out. As with all EDQ jobs, batch matching may be called from Oracle Data Integrator (ODI) if required. When working with Siebel CRM (or master data in Siebel UCM), Siebel’s Data Quality Manager is used to instigate batch jobs, and a shared staging database is used to write records for matching and to consume match results. The CDS batch matching processes automatically adjust to Siebel’s ‘Full Match’ (match all records against each other) and ‘Incremental Match’ (match a subset of records against all of their selected candidates) modes. The Future The Customer Data Services Pack is an important part of the Oracle strategy for EDQ, offering a clear path to making Data Quality Assurance an integral part of enterprise applications, and providing a strong value proposition for adopting EDQ. We are planning various additions and improvements, including: An out-of-the-box Data Quality Dashboard Even more comprehensive international data handling Address search (suggesting multiple results) Integrated address matching The EDQ Customer Data Services Pack is part of the Enterprise Data Quality Media Pack, available for download at http://www.oracle.com/technetwork/middleware/oedq/downloads/index.html.

Read the article
New version of SQL Server Data Tools is now available

- by jamiet

If you don’t follow the SQL Server Data Tools (SSDT) blog then you may not know that two days ago an updated version of SSDT was released (and by SSDT I mean the database projects, not the SSIS/SSRS/SSAS stuff) along with a new version of the SSDT Power Tools. This release incorporates a an updated version of the SQL Server Data Tier Application Framework (aka DAC Framework, aka DacFX) which you can read about on Adam Mahood’s blog post SQL Server Data-Tier Application Framework (September 2012) Available. DacFX is essentially all the gubbins that you need to extract and publish .dacpacs and according to Adam’s post it incorporates a new feature that I think is very interesting indeed: Extract DACPAC with data – Creates a database snapshot file (.dacpac) from a live SQL Server or Windows Azure SQL Database that contains data from user tables in addition to the database schema. These packages can be published to a new or existing SQL Server or Windows Azure SQL Database using the SqlPackage.exe Publish action. Data contained in package replaces the existing data in the target database. In short, .dacpacs can now include data as well as schema. I’m very excited about this because one of my long-standing complaints about SSDT (and its many forebears) is that whilst it has great support for declarative development of schema it does not provide anything similar for data – if you want to deploy data from your SSDT projects then you have to write Post-Deployment MERGE scripts. This new feature for .dacpacs does not change that situation yet however it is a very important pre-requisite so I am hoping that a feature to provide declaration of data (in addition to declaration of schema which we have today) is going to light up in SSDT in the not too distant future. Read more about the latest SSDT, Power Tools & DacFX releases at: Now available: SQL Server Data Tools - September 2012 update! by Janet Yeilding New SSDT Power Tools! Now for both Visual Studio 2010 and Visual Studio 2012 by Sarah McDevitt SQL Server Data-Tier Application Framework (September 2012) Available by Adam Mahood @Jamiet

Read the article
How Oracle Data Integration Customers Differentiate Their Business in Competitive Markets

- by Irem Radzik

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 With data being a central force in driving innovation and competing effectively, data integration has become a key IT approach to remove silos and ensure working with consistent and trusted data. Especially with the release of 12c version, Oracle Data Integrator and Oracle GoldenGate offer easy-to-use and high-performance solutions that help companies with their critical data initiatives, including big data analytics, moving to cloud architectures, modernizing and connecting transactional systems and more. In a recent press release we announced the great momentum and analyst recognition Oracle Data Integration products have achieved in the data integration and replication market. In this press release we described some of the key new features of Oracle Data Integrator 12c and Oracle GoldenGate 12c. In addition, a few from our 4500+ customers explained how Oracle’s data integration platform helped them achieve their business goals. In this blog post I would like to go over what these customers shared about their experience. Land O’Lakes is one of America’s premier member-owned cooperatives, and offers an extensive line of agricultural supplies, as well as production and business services. Rich Bellefeuille, manager, ETL & data warehouse for Land O’Lakes told us how GoldenGate helped them modernize their critical ERP system without impacting service and how they are moving to new projects with Oracle Data Integrator 12c: “With Oracle GoldenGate 11g, we've been able to migrate our enterprise-wide implementation of Oracle’s JD Edwards EnterpriseOne, ERP system, to a new database and application server platform with minimal downtime to our business. Using Oracle GoldenGate 11g we reduced database migration time from nearly 30 hours to less than 30 minutes. Given our quick success, we are considering expansion of our Oracle GoldenGate 12c footprint. We are also in the midst of deploying a solution leveraging Oracle Data Integrator 12c to manage our pricing data to handle orders more effectively and provide a better relationship with our clients. We feel we are gaining higher productivity and flexibility with Oracle's data integration products." ICON, a global provider of outsourced development services to the pharmaceutical, biotechnology and medical device industries, highlighted the competitive advantage that a solid data integration foundation brings. Diarmaid O’Reilly, enterprise data warehouse manager, ICON plc said “Oracle Data Integrator enables us to align clinical trials intelligence with the information needs of our sponsors. It helps differentiate ICON’s services in an increasingly competitive drug-development industry." You can find more info on ICON's implementation here. A popular use case for Oracle GoldenGate’s real-time data integration is offloading operational reporting from critical transaction processing systems. SolarWorld, one of the world’s largest solar-technology producers and the largest U.S. solar panel manufacturer, implemented Oracle GoldenGate for real-time data integration of manufacturing data for fast analysis. Russ Toyama, U.S. senior database administrator for SolarWorld told us real-time data helps their operations and GoldenGate’s solution supports high performance of their manufacturing systems: “We use Oracle GoldenGate for real-time data integration into our decision support system, which performs real-time analysis for manufacturing operations to continuously improve product quality, yield and efficiency. With reliable and low-impact data movement capabilities, Oracle GoldenGate also helps ensure that our critical manufacturing systems are stable and operate with high performance." You can watch the full interview with SolarWorld's Russ Toyama here. Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Starwood Hotels and Resorts is one of the many customers that found out how well Oracle Data Integration products work with Oracle Exadata. Gordon Light, senior director of information technology for StarWood Hotels, says they had notable performance gain in loading Oracle Exadata reporting environment: “We leverage Oracle GoldenGate to replicate data from our central reservations systems and other OLTP databases – significantly decreasing the overall ETL duration. Moving forward, we plan to use Oracle GoldenGate to help the company achieve near-real-time reporting.”You can listen about Starwood Hotels' implementation here. Many companies combine the power of Oracle GoldenGate with Oracle Data Integrator to have a single, integrated data integration platform for variety of use cases across the enterprise. Ufone is another good example of that. The leading mobile communications service provider of Pakistan has improved customer service using timely customer data in its data warehouse. Atif Aslam, head of management information systems for Ufone says: “Oracle Data Integrator and Oracle GoldenGate help us integrate information from various systems and provide up-to-date and real-time CRM data updates hourly, rather than daily. The applications have simplified data warehouse operations and allowed business users to make faster and better informed decisions to protect revenue in the fast-moving Pakistani telecommunications market.” You can read more about Ufone's use case here. In our Oracle Data Integration 12c launch webcast back in November we also heard from BT’s CTO Surren Parthab about their use of GoldenGate for moving to private cloud architecture. Surren also shared his perspectives on Oracle Data Integrator 12c and Oracle GoldenGate 12c releases. You can watch the video here. These are only a few examples of leading companies that have made data integration and real-time data access a key part of their data governance and IT modernization initiatives. They have seen real improvements in how their businesses operate and differentiate in today’s competitive markets. You can read about other customer examples in our Ebook: The Path to the Future and access resources including white papers, data sheets, podcasts and more via our Oracle Data Integration resource kit. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

Read the article
Data Security Through Structure, Procedures, Policies, and Governance

Security Structure and Procedures One of the easiest ways to implement security is through the use of structure, in particular the structure in which data is stored. The preferred method for this through the use of User Roles, these Roles allow for specific access to be granted based on what role a user plays in relation to the data that they are manipulating. Typical data access actions are defined by the CRUD Principle. CRUD Principle: Create New Data Read Existing Data Update Existing Data Delete Existing Data Based on the actions assigned to a role assigned, User can manipulate data as they need to preform daily business operations. An example of this can be seen in a hospital where doctors have been assigned Create, Read, Update, and Delete access to their patient’s prescriptions so that a doctor can prescribe and adjust any existing prescriptions as necessary. However, a nurse will only have Read access on the patient’s prescriptions so that they will know what medicines to give to the patients. If you notice, they do not have access to prescribe new prescriptions, update or delete existing prescriptions because only the patient’s doctor has access to preform those actions. With User Roles comes responsibility, companies need to constantly monitor data access to ensure that the proper roles have the most appropriate access levels to ensure users are not exposed to inappropriate data. In addition this also protects rouge employees from gaining access to critical business information that could be destroyed, altered or stolen. It is important that all data access is monitored because of this threat. Security Governance Current Data Governance laws regarding security Health Insurance Portability and Accountability Act (HIPAA) Sarbanes-Oxley Act Database Breach Notification Act The US Department of Health and Human Services defines HIIPAA as a Privacy Rule. This legislation protects the privacy of individually identifiable health information. Currently, HIPAA sets the national standards for securing electronically protected health records. Additionally, its confidentiality provisions protect identifiable information being used to analyze patient safety events and improve patient safety. In 2002 after the wake of the Enron and World Com Financial scandals Senator Paul Sarbanes and Representative Michael Oxley lead the creation of the Sarbanes-Oxley Act. This act administered by the Securities and Exchange Commission (SEC) dramatically altered corporate financial practices and data governance. In addition, it also set specific deadlines for compliance. The Sarbanes-Oxley is not a set of standard business rules and does not specify how a company should retain its records; In fact, this act outlines which pieces of data are to be stored as well as the storage duration. The Database Breach Notification Act requires companies, in the event of a data breach containing personally identifiable information, to notify all California residents whose information was stored on the compromised system at the time of the event, according to Gregory Manter. He further explains that this act is only California legislation. However, it does affect “any person or business that conducts business in California, and that owns or licenses computerized data that includes personal information,” regardless of where the compromised data is located. This will force any business that maintains at least limited interactions with California residents will find themselves subject to the Act’s provisions. Security Policies All companies must work in accordance with the appropriate city, county, state, and federal laws. One way to ensure that a company is legally compliant is to enforce security policies that adhere to the appropriate legislation in their area or areas that they service. These types of polices need to be mandated by a company’s Security Officer. For smaller companies, these policies need to come from executives, Directors, and Owners.

Read the article
Tips on a tool to measure code quality?

- by Cristi Diaconescu

I'm looking for a tool that can provide code quality metrics. For instance it could report very long functions (spaghetti code) very complex classes (which could contain do-it-all code) ... While we're on the (subjective:-) subject of code quality, what other code metrics would you suggest? I'm targetting C#/.NET code, but I'm sure this could extend to most programming languages.

Read the article
Excel Help: Data Input Help

- by B-Ballerl

Everyday I download data from a site that will have rows each filled with individual data for clients. I'm able to input the data into excel as a whole but after that I'm having trouble figuring out how to put it into a chart. For example Web visits time. So say Client 1 stayed for 5 min increasing his total time on the site to 20 min and Client 2 stayed for 0 min keeping his time of 10 min and they were both registered on new years eve, and R1's last login was today and R2's was yesterday. (R for some reason repersents Client, no idea why...). Client 3 hasn't been on since he registered keeping his total at 4 min So my data would look something like this for Today (20110104) R1,20101231,20110104,20 R2,20101231,20110103,10 R3,20101231,20101231,4 And this for the day before (201101030), R1,20101231,20110102,15 R2,20101231,20110103,10 R3,20101231,20101231,4 I get about 200+ client rows each day where even the names of the Client list are changing. Is it possible to import the data each day and fill it in a excel sheet where the Client number is off on the left hand side in a table, and the amount of time (Whole Number ex. 4) each day it spends on the site extend to the right under it's specific date see Picture? I've manage to create a manual sheet but have been unsucessful at getting excel to do any of it for me. Here are two pictures:

Read the article

Search Results

Search found 60939 results on 2438 pages for 'data quality'.

Page 6/2438 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

- by user1876708

- by Nav

- by Pinal Dave

- by lukasz.romaszewski(at)oracle.com

- by pinaldave

- by stackoverflowuser

- by Filip Ekberg

- by Pinal Dave

- by Irem Radzik

- by Mala Narasimharajan

- by Julien Testut

- by Alex Angelico

- by mgubar

- by Max Palmer

- by Troy Kitch

- by Chris2021

- by Chris2021

- by stefan.thieme(at)oracle.com

- by Mala Narasimharajan

- by jamiet

- by Irem Radzik

- by Cristi Diaconescu

- by B-Ballerl

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >