Search Results

Search found 11338 results on 454 pages for 'big dave'.

Page 18/454 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25  | Next Page >

  • SQL SERVER – Using expressor Composite Types to Enforce Business Rules

    - by pinaldave
    One of the features that distinguish the expressor Data Integration Platform from other products in the data integration space is its concept of composite types, which provide an effective and easily reusable way to clearly define the structure and characteristics of data within your application.  An important feature of the composite type approach is that it allows you to easily adjust the content of a record to its ultimate purpose.  For example, a record used to update a row in a database table is easily defined to include only the minimum set of columns, that is, a value for the key column and values for only those columns that need to be updated. Much like a class in higher level programming languages, you can also use the composite type as a way to enforce business rules onto your data by encapsulating a datum’s name, data type, and constraints (for example, maximum, minimum, or acceptable values) as a single entity, which ensures that your data can not assume an invalid value.  To what extent you use this functionality is a decision you make when designing your application; the expressor design paradigm does not force this approach on you. Let’s take a look at how these features are used.  Suppose you want to create a group of applications that maintain the employee table in your human resources database. Your table might have a structure similar to the HumanResources.Employee table in the AdventureWorks database.  This table includes two columns, EmployeID and rowguid, that are maintained by the relational database management system; you cannot provide values for these columns when inserting new rows into the table. Additionally, there are columns such as VacationHours and SickLeaveHours that you might choose to update for all employees on a monthly basis, which justifies creation of a dedicated application. By creating distinct composite types for the read, insert and update operations against this table, you can more easily manage this table’s content. When developing this application within expressor Studio, your first task is to create a schema artifact for the database table.  This process is completely driven by a wizard, only requiring that you select the desired database schema and table.  The resulting schema artifact defines the mapping of result set records to a record within the expressor data integration application.  The structure of the record within the expressor application is a composite type that is given the default name CompositeType1.  As you can see in the following figure, all columns from the table are included in the result set and mapped to an identically named attribute in the default composite type. If you are developing an application that needs to read this table, perhaps to prepare a year-end report of employees by department, you would probably not be interested in the data in the rowguid and ModifiedDate columns.  A typical approach would be to drop this unwanted data in a downstream operator.  But using an alternative composite type provides a better approach in which the unwanted data never enters your application. While working in expressor  Studio’s schema editor, simply create a second composite type within the same schema artifact, which you could name ReadTable, and remove the attributes corresponding to the unwanted columns. The value of an alternative composite type is even more apparent when you want to insert into or update the table.  In the composite type used to insert rows, remove the attributes corresponding to the EmployeeID primary key and rowguid uniqueidentifier columns since these values are provided by the relational database management system. And to update just the VacationHours and SickLeaveHours columns, use a composite type that includes only the attributes corresponding to the EmployeeID, VacationHours, SickLeaveHours and ModifiedDate columns. By specifying this schema artifact and composite type in a Write Table operator, your upstream application need only deal with the four required attributes and there is no risk of unintentionally overwriting a value in a column that does not need to be updated. Now, what about the option to use the composite type to enforce business rules?  If you review the composition of the default composite type CompositeType1, you will note that the constraints defined for many of the attributes mirror the table column specifications.  For example, the maximum number of characters in the NationaIDNumber, LoginID and Title attributes is equivalent to the maximum width of the target column, and the size of the MaritalStatus and Gender attributes is limited to a single character as required by the table column definition.  If your application code leads to a violation of these constraints, an error will be raised.  The expressor design paradigm then allows you to handle the error in a way suitable for your application.  For example, a string value could be truncated or a numeric value could be rounded. Moreover, you have the option of specifying additional constraints that support business rules unrelated to the table definition. Let’s assume that the only acceptable values for marital status are S, M, and D.  Within the schema editor, double-click on the MaritalStatus attribute to open the Edit Attribute window.  Then click the Allowed Values checkbox and enter the acceptable values into the Constraint Value text box. The schema editor is updated accordingly. There is one more option that the expressor semantic type paradigm supports.  Since the MaritalStatus attribute now clearly specifies how this type of information should be represented (a single character limited to S, M or D), you can convert this attribute definition into a shared type, which will allow you to quickly incorporate this definition into another composite type or into the description of an output record from a transform operator. Again, double-click on the MaritalStatus attribute and in the Edit Attribute window, click Convert, which opens the Share Local Semantic Type window that you use to name this shared type.  There’s no requirement that you give the shared type the same name as the attribute from which it was derived.  You should supply a name that makes it obvious what the shared type represents. In this posting, I’ve overviewed the expressor semantic type paradigm and shown how it can be used to make your application development process more productive.  The beauty of this feature is that you choose when and to what extent you utilize the functionality, but I’m certain that if you opt to follow this approach your efforts will become more efficient and your work will progress more quickly.  As always, I encourage you to download and evaluate expressor Studio for your current and future data integration needs. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, Pinal Dave, PostADay, SQL, SQL Authority, SQL Documentation, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

    Read the article

  • Windows Azure Recipe: Big Data

    - by Clint Edmonson
    As the name implies, what we’re talking about here is the explosion of electronic data that comes from huge volumes of transactions, devices, and sensors being captured by businesses today. This data often comes in unstructured formats and/or too fast for us to effectively process in real time. Collectively, we call these the 4 big data V’s: Volume, Velocity, Variety, and Variability. These qualities make this type of data best managed by NoSQL systems like Hadoop, rather than by conventional Relational Database Management System (RDBMS). We know that there are patterns hidden inside this data that might provide competitive insight into market trends.  The key is knowing when and how to leverage these “No SQL” tools combined with traditional business such as SQL-based relational databases and warehouses and other business intelligence tools. Drivers Petabyte scale data collection and storage Business intelligence and insight Solution The sketch below shows one of many big data solutions using Hadoop’s unique highly scalable storage and parallel processing capabilities combined with Microsoft Office’s Business Intelligence Components to access the data in the cluster. Ingredients Hadoop – this big data industry heavyweight provides both large scale data storage infrastructure and a highly parallelized map-reduce processing engine to crunch through the data efficiently. Here are the key pieces of the environment: Pig - a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Mahout - a machine learning library with algorithms for clustering, classification and batch based collaborative filtering that are implemented on top of Apache Hadoop using the map/reduce paradigm. Hive - data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage. Directly accessible to Microsoft Office and other consumers via add-ins and the Hive ODBC data driver. Pegasus - a Peta-scale graph mining system that runs in parallel, distributed manner on top of Hadoop and that provides algorithms for important graph mining tasks such as Degree, PageRank, Random Walk with Restart (RWR), Radius, and Connected Components. Sqoop - a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases. Flume - a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large log data amounts to HDFS. Database – directly accessible to Hadoop via the Sqoop based Microsoft SQL Server Connector for Apache Hadoop, data can be efficiently transferred to traditional relational data stores for replication, reporting, or other needs. Reporting – provides easily consumable reporting when combined with a database being fed from the Hadoop environment. Training These links point to online Windows Azure training labs where you can learn more about the individual ingredients described above. Hadoop Learning Resources (20+ tutorials and labs) Huge collection of resources for learning about all aspects of Apache Hadoop-based development on Windows Azure and the Hadoop and Windows Azure Ecosystems SQL Azure (7 labs) Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data. See my Windows Azure Resource Guide for more guidance on how to get started, including links web portals, training kits, samples, and blogs related to Windows Azure.

    Read the article

  • Is there something better than a StringBuilder for big blocks of SQL in the code

    - by Eduardo Molteni
    I'm just tired of making a big SQL statement, test it, and then paste the SQL into the code and adding all the sqlstmt.append(" at the beginning and the ") at the end. It's 2011, isn't there a better way the handle a big chunk of strings inside code? Please: don't suggest stored procedures or ORMs. edit Found the answer using XML literals and CData. Thanks to all the people that actually tried to answer the question without questioning me for not using ORM, SPs and using VB edit 2 the question leave me thinking that languages could try to make a better effort for using inline SQL with color syntax, etc. It will be cheaper that developing Linq2SQL. Just something like: dim sql = <sql> SELECT * ... </sql>

    Read the article

  • Skills to Focus on to land Big 5 Software Engineer Position

    - by Megadeth.Metallica
    Guys, I'm in my penultimate quarter of grad school and have a software engineering internship lined up at a big 5 tech company. I have dabbled a lot recently in Python and am average at Java. I want to prepare myself for coding interviews when I apply for new grad positions at the Big 5 tech companies when I graduate at the end of this year. Since I want to have a good shot at all 5 companies (Amazon,Google,Yahoo,Microsoft and Apple) - Should I focus my time and effort on mastering and improving my Java. Or is my time better spent checking out other languages and tools ( Attracted to RoR, Clojure, Git, C# ) I am planning to spend my spring break implementing all the common algorithms and Data structure out of my algorithms textbook in Java.

    Read the article

  • Skynet Big Data Demo Using Hexbug Spider Robot, Raspberry Pi, and Java SE Embedded (Part 4)

    - by hinkmond
    Here's the first sign of life of a Hexbug Spider Robot converted to become a Skynet Big Data model T-1. Yes, this is T-1 the precursor to the Cyberdyne Systems T-101 (and you know where that will lead to...) It is demonstrating a heartbeat using a simple Java SE Embedded program to drive it. See: Skynet Model T-1 Heartbeat It's alive!!! Well, almost alive. At least there's a pulse. We'll program more to its actions next, and then finally connect it to Skynet Big Data to do more advanced stuff, like hunt for Sara Connor. Java SE Embedded programming makes it simple to create the first model in the long line of T-XXX robots to take on the world. Raspberry Pi makes connecting it all together on one simple device, easy. Next post, I'll show how the wires are connected to drive the T-1 robot. Hinkmond

    Read the article

  • how to get accepted at a big company like google [on hold]

    - by prof
    I'm 18 Years old; I started teaching myself programming when I was twelve. I've developed many projects in PHP, Javascript, Ruby, Ruby on Rails. I know a very little about C, C++, Objective C and extending PHP with extensions created in C Programming Language. Now I'm working as a freelance Web Developer with a very low salary :(, My Dream is to get a good career with very high salary so I thought of Big Companies like Google Or Microsoft. My Question is How to get Accepted on those big Companies ? What Pre-requests they want And do you need to finish collage education ?

    Read the article

  • SQLAuthority Book Review – DBA Survivor: Become a Rock Star DBA

    - by pinaldave
    DBA Survivor: Become a Rock Star DBA – Thomas LaRock Link to Amazon Link to Flipkart First of all, I thank all my readers when I wrote that I could not get this book in any local book stores, because they offered me to send a copy of this good book. A very special mention goes to Sripada and Jayesh for they gave so much effort in finding my home address and sending me the hard copy. Before, I did not have the copy of the book, but now I have two of it already! It surprises me how my readers were able to find my home address, which I have not publicly shared. Quick Review: This is indeed a one easy-to-read and fun book. We all work day and night with technology yet we should not forget to show our love and care for our family at home. For our souls that starve for peace and guidance, this one book is the “it” book for all the technology enthusiasts. Though this book was specifically written for DBAs, the reach is not limited to DBAs only because the lessons incorporated in it actually applies to all. This is one of the most motivating technical books I have read. Detailed Review: Let us go over a few questions first: Who wants to be as famous as rockstars in the field of Database Administration? How can one learn what it takes to become a top notch software developer? If you are a beginner in your field, how will you go to next level? Your boss may be very kind or like Dilbert’s Boss, what will you do? How do you keep growing when Eco-system around you does not support you? You are almost at top but there is someone else at the TOP, what do you do and how do you avoid office politics? As a database developer what should be your basic responsibility? and many more… I was able to completely read book in one sitting and I loved it. Before I continue with my opinion, I want to echo the opinion of Kevin Kline who has written the Forward of the book. He has truly suggested that “You hold in your hands a collection of insights and wisdom on the topic of database administration gained through many years of hard-won experience, long nights of study, and direct mentorship under some of the industry’s most talented database professionals and information technology (IT) experts.” Today, IT field is getting bigger and better, while talking about terabytes of the database becomes “more” normal every single day. The gods and demigods of database professionals are taking care of these large scale databases and are carefully maintaining them. In this world, there are only a few beginnings on the first step. There are many experts in different technology fields who are asked to address the issues with databases. There is YOU and ME, who is just new to this work. So we ask ourselves WHERE to begin and HOW to begin. We adore and follow the religion of our rockstars, but oftentimes we really have no idea about their background and their struggles. Every rockstar has his success story which needs to be digested before learning his tricks and tips. This book starts with the same note and teaches the two most important lessons for anybody who wants to be a DBA Rockstar –  to focus on their single goal of learning and to excel the technology. The story starts with three simple guidelines – Get Prepared, Get Trained, Get Certified. Once a person learns the skills, and then, it would be about time that he needs to enrich or to improve those skills you have learned. I am sure that the right opportunity will come finding themselves and they will not have to go run behind it. However, the real challenge for any person is the first day or first week. A new employee, no matter how much experienced he is, sometimes has no clue about what should one do at new job. Chapter 2 and chapter 3 precisely talk about what one should do as soon as the new job begins. It is also written with keeping the fact in focus that each job can be very much different but there are few infrastructure setups and programming concepts are the same. Learning basics of database was really interesting. I like to focus on the roots of any technology. It is important to understand the structure of the database before suggesting what indexes needs to be created, the same way this book covers the most essential knowledge one must learn by most database developers. I think the title of the fourth chapter is my favorite sentence in this book. I can see that I will be saying this again and again in the future – “A Development Server Is a Production Server to a Developer“. I have worked in the software industry for almost 8 years now and I have seen so many developers sitting on their chairs and waiting for instructions from their lead about how to improve the code or what to do the next. When I talk to them, I suggest that the experiment with their server and try various techniques. I think they all should understand that for them, a development server is their production server and needs to pay proper attention to the code from the beginning. There should be NO any inappropriate code from the beginning. One has to fully focus and give their best, if they are not sure they should ask but should do something and stay active. Chapter 5 and 6 talks about two essential skills for any developer and database administration – what are the ethics of developers when they are working with production server and how to support software which is running on the production server. I have met many people who know the theory by heart but when put in front of keyboard they do not know where to start. The first thing they do opening the browser and searching online, instead of opening SQL Server Management Studio. This can very well happen to anybody who is experienced as well. Chapter 5 and 6 addresses that situation as well includes the handy scripts which can solve almost all the basic trouble shooting issues. “Where’s the Buffet?” By far, this is the best chapter in this book. If you have ever met me, you would know that I love food. I think after reading this chapter, I felt Thomas has written this just keeping me in mind. I think there will be many other people who feel the same way, too. Even my wife who read this chapter thought this was specifically written for me. I will not talk any more about this chapter as this is one must read chapter. And of course this is about real ‘FOOD‘. I am an SQL Server Trainer and Consultant and I totally agree with the point made in the chapter 8 of this book. Yes, it says here that what is necessary to train employees and people. Millions of dollars worth the labor is continuously done in the world which has faults and incorrect. Once something goes wrong, very expensive consultant comes in and fixes the problem. This whole cycle which can be stopped and improved if proper training is done. There is plenty of free trainings available as well, if one cannot afford paid training. “Connect. Learn. Share” – I think this is a great summary and bird’s eye view of this book. Networking is the key. Everything which is discussed in this book can be taken to next level if one properly uses this tips and continuously grow with it. Connecting with others, helping learn each other and building the good knowledge sharing environment should be the goal of everyone. Before I end the review I want to share a real experience. I have personally met one DBA who has worked in a single department in a company for so long that when he was put in a different department in his company due to closing that department, he could not adjust and quit the job despite the same people and company around him. Adjusting in the new environment gets much tougher as one person gets more and more experienced. This book precisely addresses the same issue along with their solutions. I just cannot stop comparing the book with my personal journey. I found so many things which are coincidently in the book is written as how we developer and DBA think. I must express special thanks to Thomas for taking time in his personal life and write this book for us. This book is indeed a book for everybody who wants to grow healthy in the tough and competitive environment. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Book Review, SQLAuthority News, SQLServer, T SQL, Technology

    Read the article

  • SQL SERVER – Guest Post – Jonathan Kehayias – Wait Type – Day 16 of 28

    - by pinaldave
    Jonathan Kehayias (Blog | Twitter) is a MCITP Database Administrator and Developer, who got started in SQL Server in 2004 as a database developer and report writer in the natural gas industry. After spending two and a half years working in TSQL, in late 2006, he transitioned to the role of SQL Database Administrator. His primary passion is performance tuning, where he frequently rewrites queries for better performance and performs in depth analysis of index implementation and usage. Jonathan blogs regularly on SQLBlog, and was a coauthor of Professional SQL Server 2008 Internals and Troubleshooting. On a personal note, I think Jonathan is extremely positive person. In every conversation with him I have found that he is always eager to help and encourage. Every time he finds something needs to be approved, he has contacted me without hesitation and guided me to improve, change and learn. During all the time, he has not lost his focus to help larger community. I am honored that he has accepted to provide his views on complex subject of Wait Types and Queues. Currently I am reading his series on Extended Events. Here is the guest blog post by Jonathan: SQL Server troubleshooting is all about correlating related pieces of information together to indentify where exactly the root cause of a problem lies. In my daily work as a DBA, I generally get phone calls like, “So and so application is slow, what’s wrong with the SQL Server.” One of the funny things about the letters DBA is that they go so well with Default Blame Acceptor, and I really wish that I knew exactly who the first person was that pointed that out to me, because it really fits at times. A lot of times when I get this call, the problem isn’t related to SQL Server at all, but every now and then in my initial quick checks, something pops up that makes me start looking at things further. The SQL Server is slow, we see a number of tasks waiting on ASYNC_IO_COMPLETION, IO_COMPLETION, or PAGEIOLATCH_* waits in sys.dm_exec_requests and sys.dm_exec_waiting_tasks. These are also some of the highest wait types in sys.dm_os_wait_stats for the server, so it would appear that we have a disk I/O bottleneck on the machine. A quick check of sys.dm_io_virtual_file_stats() and tempdb shows a high write stall rate, while our user databases show high read stall rates on the data files. A quick check of some performance counters and Page Life Expectancy on the server is bouncing up and down in the 50-150 range, the Free Page counter consistently hits zero, and the Free List Stalls/sec counter keeps jumping over 10, but Buffer Cache Hit Ratio is 98-99%. Where exactly is the problem? In this case, which happens to be based on a real scenario I faced a few years back, the problem may not be a disk bottleneck at all; it may very well be a memory pressure issue on the server. A quick check of the system spec’s and it is a dual duo core server with 8GB RAM running SQL Server 2005 SP1 x64 on Windows Server 2003 R2 x64. Max Server memory is configured at 6GB and we think that this should be enough to handle the workload; or is it? This is a unique scenario because there are a couple of things happening inside of this system, and they all relate to what the root cause of the performance problem is on the system. If we were to query sys.dm_exec_query_stats for the TOP 10 queries, by max_physical_reads, max_logical_reads, and max_worker_time, we may be able to find some queries that were using excessive I/O and possibly CPU against the system in their worst single execution. We can also CROSS APPLY to sys.dm_exec_sql_text() and see the statement text, and also CROSS APPLY sys.dm_exec_query_plan() to get the execution plan stored in cache. Ok, quick check, the plans are pretty big, I see some large index seeks, that estimate 2.8GB of data movement between operators, but everything looks like it is optimized the best it can be. Nothing really stands out in the code, and the indexing looks correct, and I should have enough memory to handle this in cache, so it must be a disk I/O problem right? Not exactly! If we were to look at how much memory the plan cache is taking by querying sys.dm_os_memory_clerks for the CACHESTORE_SQLCP and CACHESTORE_OBJCP clerks we might be surprised at what we find. In SQL Server 2005 RTM and SP1, the plan cache was allowed to take up to 75% of the memory under 8GB. I’ll give you a second to go back and read that again. Yes, you read it correctly, it says 75% of the memory under 8GB, but you don’t have to take my word for it, you can validate this by reading Changes in Caching Behavior between SQL Server 2000, SQL Server 2005 RTM and SQL Server 2005 SP2. In this scenario the application uses an entirely adhoc workload against SQL Server and this leads to plan cache bloat, and up to 4.5GB of our 6GB of memory for SQL can be consumed by the plan cache in SQL Server 2005 SP1. This in turn reduces the size of the buffer cache to just 1.5GB, causing our 2.8GB of data movement in this expensive plan to cause complete flushing of the buffer cache, not just once initially, but then another time during the queries execution, resulting in excessive physical I/O from disk. Keep in mind that this is not the only query executing at the time this occurs. Remember the output of sys.dm_io_virtual_file_stats() showed high read stalls on the data files for our user databases versus higher write stalls for tempdb? The memory pressure is also forcing heavier use of tempdb to handle sorting and hashing in the environment as well. The real clue here is the Memory counters for the instance; Page Life Expectancy, Free List Pages, and Free List Stalls/sec. The fact that Page Life Expectancy is fluctuating between 50 and 150 constantly is a sign that the buffer cache is experiencing constant churn of data, once every minute to two and a half minutes. If you add to the Page Life Expectancy counter, the consistent bottoming out of Free List Pages along with Free List Stalls/sec consistently spiking over 10, and you have the perfect memory pressure scenario. All of sudden it may not be that our disk subsystem is the problem, but is instead an innocent bystander and victim. Side Note: The Page Life Expectancy counter dropping briefly and then returning to normal operating values intermittently is not necessarily a sign that the server is under memory pressure. The Books Online and a number of other references will tell you that this counter should remain on average above 300 which is the time in seconds a page will remain in cache before being flushed or aged out. This number, which equates to just five minutes, is incredibly low for modern systems and most published documents pre-date the predominance of 64 bit computing and easy availability to larger amounts of memory in SQL Servers. As food for thought, consider that my personal laptop has more memory in it than most SQL Servers did at the time those numbers were posted. I would argue that today, a system churning the buffer cache every five minutes is in need of some serious tuning or a hardware upgrade. Back to our problem and its investigation: There are two things really wrong with this server; first the plan cache is excessively consuming memory and bloated in size and we need to look at that and second we need to evaluate upgrading the memory to accommodate the workload being performed. In the case of the server I was working on there were a lot of single use plans found in sys.dm_exec_cached_plans (where usecounts=1). Single use plans waste space in the plan cache, especially when they are adhoc plans for statements that had concatenated filter criteria that is not likely to reoccur with any frequency.  SQL Server 2005 doesn’t natively have a way to evict a single plan from cache like SQL Server 2008 does, but MVP Kalen Delaney, showed a hack to evict a single plan by creating a plan guide for the statement and then dropping that plan guide in her blog post Geek City: Clearing a Single Plan from Cache. We could put that hack in place in a job to automate cleaning out all the single use plans periodically, minimizing the size of the plan cache, but a better solution would be to fix the application so that it uses proper parameterized calls to the database. You didn’t write the app, and you can’t change its design? Ok, well you could try to force parameterization to occur by creating and keeping plan guides in place, or we can try forcing parameterization at the database level by using ALTER DATABASE <dbname> SET PARAMETERIZATION FORCED and that might help. If neither of these help, we could periodically dump the plan cache for that database, as discussed as being a problem in Kalen’s blog post referenced above; not an ideal scenario. The other option is to increase the memory on the server to 16GB or 32GB, if the hardware allows it, which will increase the size of the plan cache as well as the buffer cache. In SQL Server 2005 SP1, on a system with 16GB of memory, if we set max server memory to 14GB the plan cache could use at most 9GB  [(8GB*.75)+(6GB*.5)=(6+3)=9GB], leaving 5GB for the buffer cache.  If we went to 32GB of memory and set max server memory to 28GB, the plan cache could use at most 16GB [(8*.75)+(20*.5)=(6+10)=16GB], leaving 12GB for the buffer cache. Thankfully we have SQL Server 2005 Service Pack 2, 3, and 4 these days which include the changes in plan cache sizing discussed in the Changes to Caching Behavior between SQL Server 2000, SQL Server 2005 RTM and SQL Server 2005 SP2 blog post. In real life, when I was troubleshooting this problem, I spent a week trying to chase down the cause of the disk I/O bottleneck with our Server Admin and SAN Admin, and there wasn’t much that could be done immediately there, so I finally asked if we could increase the memory on the server to 16GB, which did fix the problem. It wasn’t until I had this same problem occur on another system that I actually figured out how to really troubleshoot this down to the root cause.  I couldn’t believe the size of the plan cache on the server with 16GB of memory when I actually learned about this and went back to look at it. SQL Server is constantly telling a story to anyone that will listen. As the DBA, you have to sit back and listen to all that it’s telling you and then evaluate the big picture and how all the data you can gather from SQL about performance relate to each other. One of the greatest tools out there is actually a free in the form of Diagnostic Scripts for SQL Server 2005 and 2008, created by MVP Glenn Alan Berry. Glenn’s scripts collect a majority of the information that SQL has to offer for rapid troubleshooting of problems, and he includes a lot of notes about what the outputs of each individual query might be telling you. When I read Pinal’s blog post SQL SERVER – ASYNC_IO_COMPLETION – Wait Type – Day 11 of 28, I noticed that he referenced Checking Memory Related Performance Counters in his post, but there was no real explanation about why checking memory counters is so important when looking at an I/O related wait type. I thought I’d chat with him briefly on Google Talk/Twitter DM and point this out, and offer a couple of other points I noted, so that he could add the information to his blog post if he found it useful.  Instead he asked that I write a guest blog for this. I am honored to be a guest blogger, and to be able to share this kind of information with the community. The information contained in this blog post is a glimpse at how I do troubleshooting almost every day of the week in my own environment. SQL Server provides us with a lot of information about how it is running, and where it may be having problems, it is up to us to play detective and find out how all that information comes together to tell us what’s really the problem. This blog post is written by Jonathan Kehayias (Blog | Twitter). Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: MVP, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

    Read the article

  • SQL SERVER – Import CSV into Database – Transferring File Content into a Database Table using CSVexpress

    - by pinaldave
    One of the most common data integration tasks I run into is a desire to move data from a file into a database table.  Generally the user is familiar with his data, the structure of the file, and the database table, but is unfamiliar with data integration tools and therefore views this task as something that is difficult.  What these users really need is a point and click approach that minimizes the learning curve for the data integration tool.  This is what CSVexpress (www.CSVexpress.com) is all about!  It is based on expressor Studio, a data integration tool I’ve been reviewing over the last several months. With CSVexpress, moving data between data sources can be as simple as providing the database connection details, describing the structure of the incoming and outgoing data and then connecting two pre-programmed operators.   There’s no need to learn the intricacies of the data integration tool or to write code.  Let’s look at an example. Suppose I have a comma separated value data file with data similar to the following, which is a listing of terminated employees that includes their hiring and termination date, department, job description, and final salary. EMP_ID,STRT_DATE,END_DATE,JOB_ID,DEPT_ID,SALARY 102,13-JAN-93,24-JUL-98 17:00,Programmer,60,"$85,000" 101,21-SEP-89,27-OCT-93 17:00,Account Representative,110,"$65,000" 103,28-OCT-93,15-MAR-97 17:00,Account Manager,110,"$75,000" 304,17-FEB-96,19-DEC-99 17:00,Marketing,20,"$45,000" 333,24-MAR-98,31-DEC-99 17:00,Data Entry Clerk,50,"$35,000" 100,17-SEP-87,17-JUN-93 17:00,Administrative Assistant,90,"$40,000" 334,24-MAR-98,31-DEC-98 17:00,Sales Representative,80,"$40,000" 400,01-JAN-99,31-DEC-99 17:00,Sales Manager,80,"$55,000" Notice the concise format used for the date values, the fact that the termination date includes both date and time information, and that the salary is clearly identified as money by the dollar sign and digit grouping.  In moving this data to a database table I want to express the dates using a format that includes the century since it’s obvious that this listing could include employees who left the company in both the 20th and 21st centuries, and I want the salary to be stored as a decimal value without the currency symbol and grouping character.  Most data integration tools would require coding within a transformation operation to effect these changes, but not expressor Studio.  Directives for these modifications are included in the description of the incoming data. Besides starting the expressor Studio tool and opening a project, the first step is to create connection artifacts, which describe to expressor where data is stored.  For this example, two connection artifacts are required: a file connection, which encapsulates the file system location of my file; and a database connection, which encapsulates the database connection information.  With expressor Studio, I use wizards to create these artifacts. First click New Connection > File Connection in the Home tab of expressor Studio’s ribbon bar, which starts the File Connection wizard.  In the first window, I enter the path to the directory that contains the input file.  Note that the file connection artifact only specifies the file system location, not the name of the file. Then I click Next and enter a meaningful name for this connection artifact; clicking Finish closes the wizard and saves the artifact. To create the Database Connection artifact, I must know the location of, or instance name, of the target database and have the credentials of an account with sufficient privileges to write to the target table.  To use expressor Studio’s features to the fullest, this account should also have the authority to create a table. I click the New Connection > Database Connection in the Home tab of expressor Studio’s ribbon bar, which starts the Database Connection wizard.  expressor Studio includes high-performance drivers for many relational database management systems, so I can simply make a selection from the “Supplied database drivers” drop down control.  If my desired RDBMS isn’t listed, I can optionally use an existing ODBC DSN by selecting the “Existing DSN” radio button. In the following window, I enter the connection details.  With Microsoft SQL Server, I may choose to use Windows Authentication rather than rather than account credentials.  After clicking Next, I enter a meaningful name for this connection artifact and clicking Finish closes the wizard and saves the artifact. Now I create a schema artifact, which describes the structure of the file data.  When expressor reads a file, all data fields are typed as strings.  In some use cases this may be exactly what is needed and there is no need to edit the schema artifact.  But in this example, editing the schema artifact will be used to specify how the data should be transformed; that is, reformat the dates to include century designations, change the employee and job ID’s to integers, and convert the salary to a decimal value. Again a wizard is used to create the schema artifact.  I click New Schema > Delimited Schema in the Home tab of expressor Studio’s ribbon bar, which starts the Database Connection wizard.  In the first window, I click Get Data from File, which then displays a listing of the file connections in the project.  When I click on the file connection I previously created, a browse window opens to this file system location; I then select the file and click Open, which imports 10 lines from the file into the wizard. I now view the file’s content and confirm that the appropriate delimiter characters are selected in the “Field Delimiter” and “Record Delimiter” drop down controls; then I click Next. Since the input file includes a header row, I can easily indicate that fields in the file should be identified through the corresponding header value by clicking “Set All Names from Selected Row. “ Alternatively, I could enter a different identifier into the Field Details > Name text box.  I click Next and enter a meaningful name for this schema artifact; clicking Finish closes the wizard and saves the artifact. Now I open the schema artifact in the schema editor.  When I first view the schema’s content, I note that the types of all attributes in the Semantic Type (the right-hand panel) are strings and that the attribute names are the same as the field names in the data file.  To change an attribute’s name and type, I highlight the attribute and click Edit in the Attributes grouping on the Schema > Edit tab of the editor’s ribbon bar.  This opens the Edit Attribute window; I can change the attribute name and select the desired type from the “Data type” drop down control.  In this example, I change the name of each attribute to the name of the corresponding database table column (EmployeeID, StartingDate, TerminationDate, JobDescription, DepartmentID, and FinalSalary).  Then for the EmployeeID and DepartmentID attributes, I select Integer as the data type, for the StartingDate and TerminationDate attributes, I select Datetime as the data type, and for the FinalSalary attribute, I select the Decimal type. But I can do much more in the schema editor.  For the datetime attributes, I can set a constraint that ensures that the data adheres to some predetermined specifications; a starting date must be later than January 1, 1980 (the date on which the company began operations) and a termination date must be earlier than 11:59 PM on December 31, 1999.  I simply select the appropriate constraint and enter the value (1980-01-01 00:00 as the starting date and 1999-12-31 11:59 as the termination date). As a last step in setting up these datetime conversions, I edit the mapping, describing the format of each datetime type in the source file. I highlight the mapping line for the StartingDate attribute and click Edit Mapping in the Mappings grouping on the Schema > Edit tab of the editor’s ribbon bar.  This opens the Edit Mapping window in which I either enter, or select, a format that describes how the datetime values are represented in the file.  Note the use of Y01 as the syntax for the year.  This syntax is the indicator to expressor Studio to derive the century by setting any year later than 01 to the 20th century and any year before 01 to the 21st century.  As each datetime value is read from the file, the year values are transformed into century and year values. For the TerminationDate attribute, my format also indicates that the datetime value includes hours and minutes. And now to the Salary attribute. I open its mapping and in the Edit Mapping window select the Currency tab and the “Use currency” check box.  This indicates that the file data will include the dollar sign (or in Europe the Pound or Euro sign), which should be removed. And on the Grouping tab, I select the “Use grouping” checkbox and enter 3 into the “Group size” text box, a comma into the “Grouping character” text box, and a decimal point into the “Decimal separator” character text box. These entries allow the string to be properly converted into a decimal value. By making these entries into the schema that describes my input file, I’ve specified how I want the data transformed prior to writing to the database table and completely removed the requirement for coding within the data integration application itself. Assembling the data integration application is simple.  Onto the canvas I drag the Read File and Write Table operators, connecting the output of the Read File operator to the input of the Write Table operator. Next, I select the Read File operator and its Properties panel opens on the right-hand side of expressor Studio.  For each property, I can select an appropriate entry from the corresponding drop down control.  Clicking on the button to the right of the “File name” text box opens the file system location specified in the file connection artifact, allowing me to select the appropriate input file.  I indicate also that the first row in the file, the header row, should be skipped, and that any record that fails one of the datetime constraints should be skipped. I then select the Write Table operator and in its Properties panel specify the database connection, normal for the “Mode,” and the “Truncate” and “Create Missing Table” options.  If my target table does not yet exist, expressor will create the table using the information encapsulated in the schema artifact assigned to the operator. The last task needed to complete the application is to create the schema artifact used by the Write Table operator.  This is extremely easy as another wizard is capable of using the schema artifact assigned to the Read Table operator to create a schema artifact for the Write Table operator.  In the Write Table Properties panel, I click the drop down control to the right of the “Schema” property and select “New Table Schema from Upstream Output…” from the drop down menu. The wizard first displays the table description and in its second screen asks me to select the database connection artifact that specifies the RDBMS in which the target table will exist.  The wizard then connects to the RDBMS and retrieves a list of database schemas from which I make a selection.  The fourth screen gives me the opportunity to fine tune the table’s description.  In this example, I set the width of the JobDescription column to a maximum of 40 characters and select money as the type of the LastSalary column.  I also provide the name for the table. This completes development of the application.  The entire application was created through the use of wizards and the required data transformations specified through simple constraints and specifications rather than through coding.  To develop this application, I only needed a basic understanding of expressor Studio, a level of expertise that can be gained by working through a few introductory tutorials.  expressor Studio is as close to a point and click data integration tool as one could want and I urge you to try this product if you have a need to move data between files or from files to database tables. Check out CSVexpress in more detail.  It offers a few basic video tutorials and a preview of expressor Studio 3.5, which will support the reading and writing of data into Salesforce.com. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Documentation, SQL Download, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

    Read the article

  • Globacom and mCentric Deploy BDA and NoSQL Database to analyze network traffic 40x faster

    - by Jean-Pierre Dijcks
    In a fast evolving market, speed is of the essence. mCentric and Globacom leveraged Big Data Appliance, Oracle NoSQL Database to save over 35,000 Call-Processing minutes daily and analyze network traffic 40x faster.  Here are some highlights from the profile: Why Oracle “Oracle Big Data Appliance works well for very large amounts of structured and unstructured data. It is the most agile events-storage system for our collect-it-now and analyze-it-later set of business requirements. Moreover, choosing a prebuilt solution drastically reduced implementation time. We got the big data benefits without needing to assemble and tune a custom-built system, and without the hidden costs required to maintain a large number of servers in our data center. A single support license covers both the hardware and the integrated software, and we have one central point of contact for support,” said Sanjib Roy, CTO, Globacom. Implementation Process It took only five days for Oracle partner mCentric to deploy Oracle Big Data Appliance, perform the software install and configuration, certification, and resiliency testing. The entire process—from site planning to phase-I, go-live—was executed in just over ten weeks, well ahead of the four months allocated to complete the project. Oracle partner mCentric leveraged Oracle Advanced Customer Support Services’ implementation methodology to ensure configurations are tailored for peak performance, all patches are applied, and software and communications are consistently tested using proven methodologies and best practices. Read the entire profile here.

    Read the article

  • SQL SERVER – Securing TRUNCATE Permissions in SQL Server

    - by pinaldave
    Download the Script of this article from here. On December 11, 2010, Vinod Kumar, a Databases & BI technology evangelist from Microsoft Corporation, graced Ahmedabad by spending some time with the Community during the Community Tech Days (CTD) event. As he was running through a few demos, Vinod asked the audience one of the most fundamental and common interview questions – “What is the difference between a DELETE and TRUNCATE?“ Ahmedabad SQL Server User Group Expert Nakul Vachhrajani has come up with excellent solutions of the same. I must congratulate Nakul for this excellent solution and as a encouragement to User Group member, I am publishing the same article over here. Nakul Vachhrajani is a Software Specialist and systems development professional with Patni Computer Systems Limited. He has functional experience spanning legacy code deprecation, system design, documentation, development, implementation, testing, maintenance and support of complex systems, providing business intelligence solutions, database administration, performance tuning, optimization, product management, release engineering, process definition and implementation. He has comprehensive grasp on Database Administration, Development and Implementation with MS SQL Server and C, C++, Visual C++/C#. He has about 6 years of total experience in information technology. Nakul is an member of the Ahmedabad and Gandhinagar SQL Server User Groups, and actively contributes to the community by actively participating in multiple forums and websites like SQLAuthority.com, BeyondRelational.com, SQLServerCentral.com and many others. Please note: The opinions expressed herein are Nakul own personal opinions and do not represent his employer’s view in anyway. All data from everywhere here on Earth go through a series of  four distinct operations, identified by the words: CREATE, READ, UPDATE and DELETE, or simply, CRUD. Putting in Microsoft SQL Server terms, is the process goes like this: INSERT, SELECT, UPDATE and DELETE/TRUNCATE. Quite a few interesting responses were received and evaluated live during the session. To summarize them, the most important similarity that came out was that both DELETE and TRUNCATE participate in transactions. The major differences (not all) that came out of the exercise were: DELETE: DELETE supports a WHERE clause DELETE removes rows from a table, row-by-row Because DELETE moves row-by-row, it acquires a row-level lock Depending upon the recovery model of the database, DELETE is a fully-logged operation. Because DELETE moves row-by-row, it can fire off triggers TRUNCATE: TRUNCATE does not support a WHERE clause TRUNCATE works by directly removing the individual data pages of a table TRUNCATE directly occupies a table-level lock. (Because a lock is acquired, and because TRUNCATE can also participate in a transaction, it has to be a logged operation) TRUNCATE is, therefore, a minimally-logged operation; again, this depends upon the recovery model of the database Triggers are not fired when TRUNCATE is used (because individual row deletions are not logged) Finally, Vinod popped the big homework question that must be critically analyzed: “We know that we can restrict a DELETE operation to a particular user, but how can we restrict the TRUNCATE operation to a particular user?” After returning home and having a nice cup of coffee, I noticed that my gray cells immediately started to work. Below was the result of my research. As what is always said, the devil is in the details. Upon looking at the Permissions section for the TRUNCATE statement in Books On Line, the following jumps right out: “The minimum permission required is ALTER on table_name. TRUNCATE TABLE permissions default to the table owner, members of the sysadmin fixed server role, and the db_owner and db_ddladmin fixed database roles, and are not transferable. However, you can incorporate the TRUNCATE TABLE statement within a module, such as a stored procedure, and grant appropriate permissions to the module using the EXECUTE AS clause.“ Now, what does this mean? Unlike DELETE, one cannot directly assign permissions to a user/set of users allowing or revoking TRUNCATE rights. However, there is a way to circumvent this. It is important to recall that in Microsoft SQL Server, database engine security surrounds the concept of a “securable”, which is any object like a table, stored procedure, trigger, etc. Rights are assigned to a principal on a securable. Refer to the image below (taken from the SQL Server Books On Line). urable”, which is any object like a table, stored procedure, trigger, etc. Rights are assigned to a principal on a securable. Refer to the image below (taken from the SQL Server Books On Line). SETTING UP THE ENVIRONMENT – (01A_Truncate Table Permissions.sql) Script Provided at the end of the article. By the end of this demo, one will be able to do all the CRUD operations, except the TRUNCATE, and the other will only be able to execute the TRUNCATE. All you will need for this test is any edition of SQL Server 2008. (With minor changes, these scripts can be made to work with SQL 2005.) We begin by creating the following: 1.       A test database 2.        Two database roles: associated logins and users 3.       Switch over to the test database and create a test table. Then, add some data into it. I am using row constructors, which is new to SQL 2008. Creating the modules that will be used to enforce permissions 1.       We have already created one of the modules that we will be assigning permissions to. That module is the table: TruncatePermissionsTest 2.       We will now create two stored procedures; one is for the DELETE operation and the other for the TRUNCATE operation. Please note that for all practical purposes, the end result is the same – all data from the table TruncatePermissionsTest is removed Assigning the permissions Now comes the most important part of the demonstration – assigning permissions. A permissions matrix can be worked out as under: To apply the security rights, we use the GRANT and DENY clauses, as under: That’s it! We are now ready for our big test! THE TEST (01B_Truncate Table Test Queries.sql) Script Provided at the end of the article. I will now need two separate SSMS connections, one with the login AllowedTruncate and the other with the login RestrictedTruncate. Running the test is simple; all that’s required is to run through the script – 01B_Truncate Table Test Queries.sql. What I will demonstrate here via screen-shots is the behavior of SQL Server when logged in as the AllowedTruncate user. There are a few other combinations than what are highlighted here. I will leave the reader the right to explore the behavior of the RestrictedTruncate user and these additional scenarios, as a form of self-study. 1.       Testing SELECT permissions 2.       Testing TRUNCATE permissions (Remember, “deny by default”?) 3.       Trying to circumvent security by trying to TRUNCATE the table using the stored procedure Hence, we have now proved that a user can indeed be assigned permissions to specifically assign TRUNCATE permissions. I also hope that the above has sparked curiosity towards putting some security around the probably “destructive” operations of DELETE and TRUNCATE. I would like to wish each and every one of the readers a very happy and secure time with Microsoft SQL Server. (Please find the scripts – 01A_Truncate Table Permissions.sql and 01B_Truncate Table Test Queries.sql that have been used in this demonstration. Please note that these scripts contain purely test-level code only. These scripts must not, at any cost, be used in the reader’s production environments). 01A_Truncate Table Permissions.sql /* ***************************************************************************************************************** Developed By          : Nakul Vachhrajani Functionality         : This demo is focused on how to allow only TRUNCATE permissions to a particular user How to Use            : 1. Run through, step-by-step through the sequence till Step 08 to create a test database 2. Switch over to the "Truncate Table Test Queries.sql" and execute it step-by-step in two different SSMS windows, one where you have logged in as 'RestrictedTruncate', and the other as 'AllowedTruncate' 3. Come back to "Truncate Table Permissions.sql" 4. Execute Step 10 to cleanup! Modifications         : December 13, 2010 - NAV - Updated to add a security matrix and improve code readability when applying security December 12, 2010 - NAV - Created ***************************************************************************************************************** */ -- Step 01: Create a new test database CREATE DATABASE TruncateTestDB GO USE TruncateTestDB GO -- Step 02: Add roles and users to demonstrate the security of the Truncate operation -- 2a. Create the new roles CREATE ROLE AllowedTruncateRole; GO CREATE ROLE RestrictedTruncateRole; GO -- 2b. Create new logins CREATE LOGIN AllowedTruncate WITH PASSWORD = 'truncate@2010', CHECK_POLICY = ON GO CREATE LOGIN RestrictedTruncate WITH PASSWORD = 'truncate@2010', CHECK_POLICY = ON GO -- 2c. Create new Users using the roles and logins created aboave CREATE USER TruncateUser FOR LOGIN AllowedTruncate WITH DEFAULT_SCHEMA = dbo GO CREATE USER NoTruncateUser FOR LOGIN RestrictedTruncate WITH DEFAULT_SCHEMA = dbo GO -- 2d. Add the newly created login to the newly created role sp_addrolemember 'AllowedTruncateRole','TruncateUser' GO sp_addrolemember 'RestrictedTruncateRole','NoTruncateUser' GO -- Step 03: Change over to the test database USE TruncateTestDB GO -- Step 04: Create a test table within the test databse CREATE TABLE TruncatePermissionsTest (Id INT IDENTITY(1,1), Name NVARCHAR(50)) GO -- Step 05: Populate the required data INSERT INTO TruncatePermissionsTest VALUES (N'Delhi'), (N'Mumbai'), (N'Ahmedabad') GO -- Step 06: Encapsulate the DELETE within another module CREATE PROCEDURE proc_DeleteMyTable WITH EXECUTE AS SELF AS DELETE FROM TruncateTestDB..TruncatePermissionsTest GO -- Step 07: Encapsulate the TRUNCATE within another module CREATE PROCEDURE proc_TruncateMyTable WITH EXECUTE AS SELF AS TRUNCATE TABLE TruncateTestDB..TruncatePermissionsTest GO -- Step 08: Apply Security /* *****************************SECURITY MATRIX*************************************** =================================================================================== Object                   | Permissions |                 Login |             | AllowedTruncate   |   RestrictedTruncate |             |User:NoTruncateUser|   User:TruncateUser =================================================================================== TruncatePermissionsTest  | SELECT,     |      GRANT        |      (Default) | INSERT,     |                   | | UPDATE,     |                   | | DELETE      |                   | -------------------------+-------------+-------------------+----------------------- TruncatePermissionsTest  | ALTER       |      DENY         |      (Default) -------------------------+-------------+----*/----------------+----------------------- proc_DeleteMyTable | EXECUTE | GRANT | DENY -------------------------+-------------+-------------------+----------------------- proc_TruncateMyTable | EXECUTE | DENY | GRANT -------------------------+-------------+-------------------+----------------------- *****************************SECURITY MATRIX*************************************** */ /* Table: TruncatePermissionsTest*/ GRANT SELECT, INSERT, UPDATE, DELETE ON TruncateTestDB..TruncatePermissionsTest TO NoTruncateUser GO DENY ALTER ON TruncateTestDB..TruncatePermissionsTest TO NoTruncateUser GO /* Procedure: proc_DeleteMyTable*/ GRANT EXECUTE ON TruncateTestDB..proc_DeleteMyTable TO NoTruncateUser GO DENY EXECUTE ON TruncateTestDB..proc_DeleteMyTable TO TruncateUser GO /* Procedure: proc_TruncateMyTable*/ DENY EXECUTE ON TruncateTestDB..proc_TruncateMyTable TO NoTruncateUser GO GRANT EXECUTE ON TruncateTestDB..proc_TruncateMyTable TO TruncateUser GO -- Step 09: Test --Switch over to the "Truncate Table Test Queries.sql" and execute it step-by-step in two different SSMS windows: --    1. one where you have logged in as 'RestrictedTruncate', and --    2. the other as 'AllowedTruncate' -- Step 10: Cleanup sp_droprolemember 'AllowedTruncateRole','TruncateUser' GO sp_droprolemember 'RestrictedTruncateRole','NoTruncateUser' GO DROP USER TruncateUser GO DROP USER NoTruncateUser GO DROP LOGIN AllowedTruncate GO DROP LOGIN RestrictedTruncate GO DROP ROLE AllowedTruncateRole GO DROP ROLE RestrictedTruncateRole GO USE MASTER GO DROP DATABASE TruncateTestDB GO 01B_Truncate Table Test Queries.sql /* ***************************************************************************************************************** Developed By          : Nakul Vachhrajani Functionality         : This demo is focused on how to allow only TRUNCATE permissions to a particular user How to Use            : 1. Switch over to this from "Truncate Table Permissions.sql", Step #09 2. Execute this step-by-step in two different SSMS windows a. One where you have logged in as 'RestrictedTruncate', and b. The other as 'AllowedTruncate' 3. Return back to "Truncate Table Permissions.sql" 4. Execute Step 10 to cleanup! Modifications         : December 12, 2010 - NAV - Created ***************************************************************************************************************** */ -- Step 09A: Switch to the test database USE TruncateTestDB GO -- Step 09B: Ensure that we have valid data SELECT * FROM TruncatePermissionsTest GO -- (Expected: Following error will occur if logged in as "AllowedTruncate") -- Msg 229, Level 14, State 5, Line 1 -- The SELECT permission was denied on the object 'TruncatePermissionsTest', database 'TruncateTestDB', schema 'dbo'. --Step 09C: Attempt to Truncate Data from the table without using the stored procedure TRUNCATE TABLE TruncatePermissionsTest GO -- (Expected: Following error will occur) --  Msg 1088, Level 16, State 7, Line 2 --  Cannot find the object "TruncatePermissionsTest" because it does not exist or you do not have permissions. -- Step 09D:Regenerate Test Data INSERT INTO TruncatePermissionsTest VALUES (N'London'), (N'Paris'), (N'Berlin') GO -- (Expected: Following error will occur if logged in as "AllowedTruncate") -- Msg 229, Level 14, State 5, Line 1 -- The INSERT permission was denied on the object 'TruncatePermissionsTest', database 'TruncateTestDB', schema 'dbo'. --Step 09E: Attempt to Truncate Data from the table using the stored procedure EXEC proc_TruncateMyTable GO -- (Expected: Will execute successfully with 'AllowedTruncate' user, will error out as under with 'RestrictedTruncate') -- Msg 229, Level 14, State 5, Procedure proc_TruncateMyTable, Line 1 -- The EXECUTE permission was denied on the object 'proc_TruncateMyTable', database 'TruncateTestDB', schema 'dbo'. -- Step 09F:Regenerate Test Data INSERT INTO TruncatePermissionsTest VALUES (N'Madrid'), (N'Rome'), (N'Athens') GO --Step 09G: Attempt to Delete Data from the table without using the stored procedure DELETE FROM TruncatePermissionsTest GO -- (Expected: Following error will occur if logged in as "AllowedTruncate") -- Msg 229, Level 14, State 5, Line 2 -- The DELETE permission was denied on the object 'TruncatePermissionsTest', database 'TruncateTestDB', schema 'dbo'. -- Step 09H:Regenerate Test Data INSERT INTO TruncatePermissionsTest VALUES (N'Spain'), (N'Italy'), (N'Greece') GO --Step 09I: Attempt to Delete Data from the table using the stored procedure EXEC proc_DeleteMyTable GO -- (Expected: Following error will occur if logged in as "AllowedTruncate") -- Msg 229, Level 14, State 5, Procedure proc_DeleteMyTable, Line 1 -- The EXECUTE permission was denied on the object 'proc_DeleteMyTable', database 'TruncateTestDB', schema 'dbo'. --Step 09J: Close this SSMS window and return back to "Truncate Table Permissions.sql" Thank you Nakul to take up the challenge and prove that Ahmedabad and Gandhinagar SQL Server User Group has talent to solve difficult problems. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Pinal Dave, Readers Contribution, Readers Question, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Security, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • MapRedux - PowerShell and Big Data

    - by Dittenhafer Solutions
    MapRedux – #PowerShell and #Big Data Have you been hearing about “big data”, “map reduce” and other large scale computing terms over the past couple of years and been curious to dig into more detail? Have you read some of the Apache Hadoop online documentation and unfortunately concluded that it wasn't feasible to setup a “test” hadoop environment on your machine? More recently, I have read about some of Microsoft’s work to enable Hadoop on the Azure cloud. Being a "Microsoft"-leaning technologist, I am more inclinded to be successful with experimentation when on the Windows platform. Of course, it is not that I am "religious" about one set of technologies other another, but rather more experienced. Anyway, within the past couple of weeks I have been thinking about PowerShell a bit more as the 2012 PowerShell Scripting Games approach and it occured to me that PowerShell's support for Windows Remote Management (WinRM), and some other inherent features of PowerShell might lend themselves particularly well to a simple implementation of the MapReduce framework. I fired up my PowerShell ISE and started writing just to see where it would take me. Quite simply, the ScriptBlock feature combined with the ability of Invoke-Command to create remote jobs on networked servers provides much of the plumbing of a distributed computing environment. There are some limiting factors of course. Microsoft provided some default settings which prevent PowerShell from taking over a network without administrative approval first. But even with just one adjustment, a given Windows-based machine can become a node in a MapReduce-style distributed computing environment. Ok, so enough introduction. Let's talk about the code. First, any machine that will participate as a remote "node" will need WinRM enabled for remote access, as shown below. This is not exactly practical for hundreds of intended nodes, but for one (or five) machines in a test environment it does just fine. C:> winrm quickconfig WinRM is not set up to receive requests on this machine. The following changes must be made: Set the WinRM service type to auto start. Start the WinRM service. Make these changes [y/n]? y Alternatively, you could take the approach described in the Remotely enable PSRemoting post from the TechNet forum and use PowerShell to create remote scheduled tasks that will call Enable-PSRemoting on each intended node. Invoke-MapRedux Moving on, now that you have one or more remote "nodes" enabled, you can consider the actual Map and Reduce algorithms. Consider the following snippet: $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose Invoke-MapRedux takes an instance of a MapReduceItem which references the Map and Reduce scriptblocks, an array of computer names which are the remote nodes, and the initial data set to be processed. As simple as that, you can start working with concepts of big data and the MapReduce paradigm. Now, how did we get there? I have published the initial version of my PsMapRedux PowerShell Module on GitHub. The PsMapRedux module provides the Invoke-MapRedux function described above. Feel free to browse the underlying code and even contribute to the project! In a later post, I plan to show some of the inner workings of the module, but for now let's move on to how the Map and Reduce functions are defined. Map Both the Map and Reduce functions need to follow a prescribed prototype. The prototype for a Map function in the MapRedux module is as follows. A simple scriptblock that takes one PsObject parameter and returns a hashtable. It is important to note that the PsObject $dataset parameter is a MapRedux custom object that has a "Data" property which offers an array of data to be processed by the Map function. $aMap = { Param ( [PsObject] $dataset ) # Indicate the job is running on the remote node. Write-Host ($env:computername + "::Map"); # The hashtable to return $list = @{}; # ... Perform the mapping work and prepare the $list hashtable result with your custom PSObject... # ... The $dataset has a single 'Data' property which contains an array of data rows # which is a subset of the originally submitted data set. # Return the hashtable (Key, PSObject) Write-Output $list; } Reduce Likewise, with the Reduce function a simple prototype must be followed which takes a $key and a result $dataset from the MapRedux's partitioning function (which joins the Map results by key). Again, the $dataset is a MapRedux custom object that has a "Data" property as described in the Map section. $aReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) # The hashtable to return $redux = @{}; # Return Write-Output $redux; } All Together Now When everything is put together in a short example script, you implement your Map and Reduce functions, query for some starting data, build the MapReduxItem via New-MapReduxItem and call Invoke-MapRedux to get the process started: # Import the MapRedux and SQL Server providers Import-Module "MapRedux" Import-Module “sqlps” -DisableNameChecking # Query the database for a dataset Set-Location SQLSERVER:\sql\dbserver1\default\databases\myDb $query = "SELECT MyKey, Date, Value1 FROM BigData ORDER BY MyKey"; Write-Host "Query: $query" $dataset = Invoke-SqlCmd -query $query # Build the Map function $MyMap = { Param ( [PsObject] $dataset ) Write-Host ($env:computername + "::Map"); $list = @{}; foreach($row in $dataset.Data) { # Write-Host ("Key: " + $row.MyKey.ToString()); if($list.ContainsKey($row.MyKey) -eq $true) { $s = $list.Item($row.MyKey); $s.Sum += $row.Value1; $s.Count++; } else { $s = New-Object PSObject; $s | Add-Member -Type NoteProperty -Name MyKey -Value $row.MyKey; $s | Add-Member -type NoteProperty -Name Sum -Value $row.Value1; $list.Add($row.MyKey, $s); } } Write-Output $list; } $MyReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) $redux = @{}; $count = 0; foreach($s in $dataset.Data) { $sum += $s.Sum; $count += 1; } # Reduce $redux.Add($s.MyKey, $sum / $count); # Return Write-Output $redux; } # Create the item data $Mr = New-MapReduxItem "My Test MapReduce Job" $MyMap $MyReduce # Array of processing nodes... $MyNodes = ("node1", "node2", "node3", "node4", "localhost") # Run the Map Reduce routine... $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose # Show the results Set-Location C:\ $MyMrResults | Out-GridView Conclusion I hope you have seen through this article that PowerShell has a significant infrastructure available for distributed computing. While it does take some code to expose a MapReduce-style framework, much of the work is already done and PowerShell could prove to be the the easiest platform to develop and run big data jobs in your corporate data center, potentially in the Azure cloud, or certainly as an academic excerise at home or school. Follow me on Twitter to stay up to date on the continuing progress of my Powershell MapRedux module, and thanks for reading! Daniel

    Read the article

  • Data Integration 12c Raising the Big Data Roof at Oracle OpenWorld

    - by Tanu Sood
    Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} Author: Dain Hansen, Director, Oracle It was an exciting OpenWorld 2013 for us in the Data Integration track. Our theme this year was all about ‘being future ready’ - previewing one of our biggest releases this year: Oracle Data Integration 12c. Just this week we followed up with this preview by announcing the general availability of 12c release for Oracle’s key data integration products: Oracle Data Integrator 12c and Oracle GoldenGate 12c. The new release delivers extreme performance, increase IT productivity, and simplify deployment, while helping IT organizations to keep pace with new data-oriented technology trends including cloud computing, big data analytics, real-time business intelligence. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} Mark Hurd's keynote on day one set the tone for the Data Integration sessions. Mark focused on big data analytics and the changing consumer expectations. Especially real-time insight is a key theme for Oracle overall and data integration products. In Mark Hurd's keynote we heard from key customers, such as Airbus and Thomson Reuters, how real-time analysis of operational data including machine data creates value, in some cases even saves lives. Thomas Kurian gave a deeper look into Oracle's big data and fast data solutions. In the initial lead Data Integration track session - Brad Adelberg, VP of Development, presented Oracle’s Data Integration 12c product strategy based on key trends from the initial OpenWorld keynotes. Brad talked about how Oracle's data integration products address the new data integration requirements that evolved with cloud computing, big data, and changing consumer expectations and how they set the key themes in our products’ road map. Brad explained why and how fast-time to value, high-performance and future-ready solutions is the top focus areas for product development. If you were not able to attend OpenWorld or this session I recommend reading the white paper: Five New Data Integration Requirements and How to Meet them with Oracle Data Integration, which provides an in-depth look into how Oracle addresses the new trends in the DI market. Following Brad’s session, Nick Wagner provided in depth review of Oracle GoldenGate’s latest features and roadmap. Nick discussed how Oracle GoldenGate’s tight integration with Oracle Database sets the product apart from the competition. We also heard that heterogeneity of the product is still a major focus for GoldenGate’s development and there will be more news on that front when there is a major release. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} After GoldenGate’s product strategy session, Denis Gray from the PM team presented Oracle Data Integrator’s product strategy session, talking about the latest and greatest on ODI. Another good session was delivered by long-time GoldenGate users, Comcast.  Jason Hurd and Amit Patel of Comcast talked about the various use cases they deploy Oracle GoldenGate throughout their enterprise, from database upgrades, feeding reporting systems, to active-active database synchronization.  The Comcast team shared many good tips on how to use GoldenGate for both zero downtime upgrades and active-active replication with conflict management requirement. One of our other important goals we had this year for the Data Integration track at OpenWorld was hearing from our customers. We ended day 1 on just that, with a wonderful award ceremony for Oracle Excellence Awards for Oracle Fusion Middleware Innovation. The ceremony was held in the Yerba Buena Center for the Arts. Congratulations to Royal Bank of Scotland and Yalumba Wine Company, the winners in the Data Integration category. You can find more information on the award and the winners in our previous blog post: 2013 Oracle Excellence Awards for Fusion Middleware Innovation… Selected for their innovation use of Oracle’s Data Integration products; the winners for the Data Integration Category are Royal Bank of Scotland and The Yalumba Wine Company. Congratulations!!! Royal Bank of Scotland’s Market and International Banking division provides clients across the globe with seamless trading and competitive pricing, underpinned by a deep knowledge of risk management across the full spectrum of financial products. They handle millions of transactions daily to keep the lifeblood of their clients’ businesses flowing – whether through payment management solutions or through bespoke trade finance solutions. Royal Bank of Scotland is leveraging Oracle GoldenGate and Oracle Data Integrator along with Oracle Business Intelligence Enterprise Edition and the Oracle Database for a variety of solutions. Mainly, Oracle GoldenGate and Oracle Data Integrator are used to feed their data warehouse – providing a real-time data integration solution that feeds transactional data to their analytics system in minutes to enable improved decision making with timely, accurate data for their business users. Oracle Data Integrator’s in-database transformation capabilities and its ability to integrate with Oracle GoldenGate for real-time data capture is the foundation of this implementation. This solution makes it such that changes happening in the analytics systems are available the same day they are deployed on the operational system with 100% data quality guaranteed. Additionally, the solution has helped to reduce their operational database size from 150GB to 10GB. Impressive! Now what if I told you this solution was built in 3 months and had a less than 6 month return on investment? That’s outstanding! The Yalumba Wine Company is situated in the Barossa Valley of Australia. It is the oldest family owned winery in Australia with a unique way of aging their wines in specially crafted 100 liter barrels. Did you know that “Yalumba” is Aboriginal for “all the land around”? The Yalumba Wine Company is growing rapidly, and was in need of introducing a more modern standard to the existing manufacturing processes to meet globalization demands, overall time-to-market, and better operational efficiency objectives of product development. The Yalumba Wine Company worked with a partner, Bristlecone to develop a unique solution whereby Oracle Data Integrator is leveraged to pull data from Salesforce.com and JD Edwards, in addition to their other pre-existing source systems, for consumption into their data warehouse. They have emphasized the overall ease of developing integration workflows with Oracle Data Integrator. The solution has brought better visibility for the business users, shorter data loading and transformation performance to their data warehouse with rapid incorporation of new data sources, and a solid future-proof foundation for their organization. Moving forward, they plan on leveraging more from Oracle’s Data Integration portfolio. Terrific! In addition to these two customers on Tuesday we featured many other important Oracle Data Integrator and Oracle GoldenGate customers. On Tuesday the GoldenGate panel included: Land O’Lakes, Smuckers, and Veolia Water. Besides giving us yummy nutrition and healthy water, these companies have another aspect in common. They all use GoldenGate to boost their ERP application. Please read the recap by Irem Radzik. On Wednesday, the ODI Panel included: Barry Ralston and Ryan Weber of Infinity Insurance, Paul Stracke of Paychex Inc., and Ian Wall of Vertex Pharmaceuticals for a session filled with interesting projects, use cases and approaches to leveraging Oracle Data Integrator. Please read the recap by Sandrine Riley for more. Thanks to everyone who joined with us and we hope to stay connected! To hear more about our Data Integration12c products join us in an upcoming webcast to learn more. Follow us www.twitter.com/ORCLGoldenGate or goto our website at www.oracle.com/goto/dataintegration

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part II

    - by dbayard
    Part II – Solving Big Problems with Oracle R Enterprise In the first post in this series (see https://blogs.oracle.com/R/entry/solving_big_problems_with_oracle), we showed how you can use R to perform historical rate of return calculations against investment data sourced from a spreadsheet.  We demonstrated the calculations against sample data for a small set of accounts.  While this worked fine, in the real-world the problem is much bigger because the amount of data is much bigger.  So much bigger that our approach in the previous post won’t scale to meet the real-world needs. From our previous post, here are the challenges we need to conquer: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In this post, we will show how we moved from sample data environment to working with full-scale data.  This post is based on actual work we did for a financial services customer during a recent proof-of-concept. Getting started with the Database At this point, we have some sample data and our IRR function.  We were at a similar point in our customer proof-of-concept exercise- we had sample data but we did not have the full customer data yet.  So our database was empty.  But, this was easily rectified by leveraging the transparency features of Oracle R Enterprise (see https://blogs.oracle.com/R/entry/analyzing_big_data_using_the).  The following code shows how we took our sample data SimpleMWRRData and easily turned it into a new Oracle database table called IRR_DATA via ore.create().  The code also shows how we can access the database table IRR_DATA as if it was a normal R data.frame named IRR_DATA. If we go to sql*plus, we can also check out our new IRR_DATA table: At this point, we now have our sample data loaded in the database as a normal Oracle table called IRR_DATA.  So, we now proceeded to test our R function working with database data. As our first test, we retrieved the data from a single account from the IRR_DATA table, pull it into local R memory, then call our IRR function.  This worked.  No SQL coding required! Going from Crawling to Walking Now that we have shown using our R code with database-resident data for a single account, we wanted to experiment with doing this for multiple accounts.  In other words, we wanted to implement the split-apply-combine technique we discussed in our first post in this series.  Fortunately, Oracle R Enterprise provides a very scalable way to do this with a function called ore.groupApply().  You can read more about ore.groupApply() here: https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1 Here is an example of how we ask ORE to take our IRR_DATA table in the database, split it by the ACCOUNT column, apply a function that calls our SimpleMWRR() calculation, and then combine the results. (If you are following along at home, be sure to have installed our myIRR package on your database server via  “R CMD INSTALL myIRR”). The interesting thing about ore.groupApply is that the calculation is not actually performed in my desktop R environment from which I am running.  What actually happens is that ore.groupApply uses the Oracle database to perform the work.  And the Oracle database is what actually splits the IRR_DATA table by ACCOUNT.  Then the Oracle database takes the data for each account and sends it to an embedded R engine running on the database server to apply our R function.  Then the Oracle database combines all the individual results from the calls to the R function. This is significant because now the embedded R engine only needs to deal with the data for a single account at a time.  Regardless of whether we have 20 accounts or 1 million accounts or more, the R engine that performs the calculation does not care.  Given that normal R has a finite amount of memory to hold data, the ore.groupApply approach overcomes the R memory scalability problem since we only need to fit the data from a single account in R memory (not all of the data for all of the accounts). Additionally, the IRR_DATA does not need to be sent from the database to my desktop R program.  Even though I am invoking ore.groupApply from my desktop R program, because the actual SimpleMWRR calculation is run by the embedded R engine on the database server, the IRR_DATA does not need to leave the database server- this is both a performance benefit because network transmission of large amounts of data take time and a security benefit because it is harder to protect private data once you start shipping around your intranet. Another benefit, which we will discuss in a few paragraphs, is the ability to leverage Oracle database parallelism to run these calculations for dozens of accounts at once. From Walking to Running ore.groupApply is rather nice, but it still has the drawback that I run this from a desktop R instance.  This is not ideal for integrating into typical operational processes like nightly data warehouse refreshes or monthly statement generation.  But, this is not an issue for ORE.  Oracle R Enterprise lets us run this from the database using regular SQL, which is easily integrated into standard operations.  That is extremely exciting and the way we actually did these calculations in the customer proof. As part of Oracle R Enterprise, it provides a SQL equivalent to ore.groupApply which it refers to as “rqGroupEval”.  To use rqGroupEval via SQL, there is a bit of simple setup needed.  Basically, the Oracle Database needs to know the structure of the input table and the grouping column, which we are able to define using the database’s pipeline table function mechanisms. Here is the setup script: At this point, our initial setup of rqGroupEval is done for the IRR_DATA table.  The next step is to define our R function to the database.  We do that via a call to ORE’s rqScriptCreate. Now we can test it.  The SQL you use to run rqGroupEval uses the Oracle database pipeline table function syntax.  The first argument to irr_dataGroupEval is a cursor defining our input.  You can add additional where clauses and subqueries to this cursor as appropriate.  The second argument is any additional inputs to the R function.  The third argument is the text of a dummy select statement.  The dummy select statement is used by the database to identify the columns and datatypes to expect the R function to return.  The fourth argument is the column of the input table to split/group by.  The final argument is the name of the R function as you defined it when you called rqScriptCreate(). The Real-World Results In our real customer proof-of-concept, we had more sophisticated calculation requirements than shown in this simplified blog example.  For instance, we had to perform the rate of return calculations for 5 separate time periods, so the R code was enhanced to do so.  In addition, some accounts needed a time-weighted rate of return to be calculated, so we extended our approach and added an R function to do that.  And finally, there were also a few more real-world data irregularities that we needed to account for, so we added logic to our R functions to deal with those exceptions.  For the full-scale customer test, we loaded the customer data onto a Half-Rack Exadata X2-2 Database Machine.  As our half-rack had 48 physical cores (and 96 threads if you consider hyperthreading), we wanted to take advantage of that CPU horsepower to speed up our calculations.  To do so with ORE, it is as simple as leveraging the Oracle Database Parallel Query features.  Let’s look at the SQL used in the customer proof: Notice that we use a parallel hint on the cursor that is the input to our rqGroupEval function.  That is all we need to do to enable Oracle to use parallel R engines. Here are a few screenshots of what this SQL looked like in the Real-Time SQL Monitor when we ran this during the proof of concept (hint: you might need to right-click on these images to be able to view the images full-screen to see the entire image): From the above, you can notice a few things (numbers 1 thru 5 below correspond with highlighted numbers on the images above.  You may need to right click on the above images and view the images full-screen to see the entire image): The SQL completed in 110 seconds (1.8minutes) We calculated rate of returns for 5 time periods for each of 911k accounts (the number of actual rows returned by the IRRSTAGEGROUPEVAL operation) We accessed 103m rows of detailed cash flow/market value data (the number of actual rows returned by the IRR_STAGE2 operation) We ran with 72 degrees of parallelism spread across 4 database servers Most of our 110seconds was spent in the “External Procedure call” event On average, we performed 8,200 executions of our R function per second (110s/911k accounts) On average, each execution was passed 110 rows of data (103m detail rows/911k accounts) On average, we did 41,000 single time period rate of return calculations per second (each of the 8,200 executions of our R function did rate of return calculations for 5 time periods) On average, we processed over 900,000 rows of database data in R per second (103m detail rows/110s) R + Oracle R Enterprise: Best of R + Best of Oracle Database This blog post series started by describing a real customer problem: how to perform a lot of calculations on a lot of data in a short period of time.  While standard R proved to be a very good fit for writing the necessary calculations, the challenge of working with a lot of data in a short period of time remained. This blog post series showed how Oracle R Enterprise enables R to be used in conjunction with the Oracle Database to overcome the data volume and performance issues (as well as simplifying the operations and security issues).  It also showed that we could calculate 5 time periods of rate of returns for almost a million individual accounts in less than 2 minutes. In a future post, we will take the same R function and show how Oracle R Connector for Hadoop can be used in the Hadoop world.  In that next post, instead of having our data in an Oracle database, our data will live in Hadoop and we will how to use the Oracle R Connector for Hadoop and other Oracle Big Data Connectors to move data between Hadoop, R, and the Oracle Database easily.

    Read the article

  • Splitting big request in multiple small ajax requests

    - by Ionut
    I am unsure regarding the scalability of the following model. I have no experience at all with large systems, big number of requests and so on but I'm trying to build some features considering scalability first. In my scenario there is a user page which contains data for: User's details (name, location, workplace ...) User's activity (blog posts, comments...) User statistics (rating, number of friends...) In order to show all this on the same page, for a request there will be at least 3 different database queries on the back-end. In some cases, I imagine that those queries will be running quite a wile, therefore the user experience may suffer while waiting between requests. This is why I decided to run only step 1 (User's details) as a normal request. After the response is received, two ajax requests are sent for steps 2 and 3. When those responses are received, I only place the data in the destined wrappers. For me at least this makes more sense. However there are 3 requests instead of one for every user page view. Will this affect the system on the long term? I'm assuming that this kind of approach requires more resources but is this trade of UX for resources a good dial or should I stick to one plain big request?

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part I

    - by dbayard
    Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set. Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls.  R is great for doing advanced computations.  Sometimes you need to do advanced computations on large amounts of data, subject to rigorous enterprise-level concerns.  This blog post shows how Oracle R Enterprise enables R plus the Oracle Database enabled us to do some pretty sophisticated calculations across 1 million accounts (each with many detailed records) in minutes. The problem: A financial services customer of mine has a need to calculate the historical internal rate of return (IRR) for its customers’ portfolios.  This information is needed for customer statements and the online web application.  In the past, they had solved this with a home-grown application that pulled trade and account data out of their data warehouse and ran the calculations.  But this home-grown application was not able to do this fast enough, plus it was a challenge for them to write and maintain the code that did the IRR calculation. IRR – a problem that R is good at solving: Internal Rate of Return is an interesting calculation in that in most real-world scenarios it is impractical to calculate exactly.  Rather, IRR is a calculation where approximation techniques need to be used.  In this blog post, we will discuss calculating the “money weighted rate of return” but in the actual customer proof of concept we used R to calculate both money weighted rate of returns and time weighted rate of returns.  You can learn more about the money weighted rate of returns here: http://www.wikinvest.com/wiki/Money-weighted_return First Steps- Calculating IRR in R We will start with calculating the IRR in standalone/desktop R.  In our second post, we will show how to take this desktop R function, deploy it to an Oracle Database, and make it work at real-world scale.  The first step we did was to get some sample data.  For a historical IRR calculation, you have a balances and cash flows.  In our case, the customer provided us with several accounts worth of sample data in Microsoft Excel.      The above figure shows part of the spreadsheet of sample data.  The data provides balances and cash flows for a sample account (BMV=beginning market value. FLOW=cash flow in/out of account. EMV=ending market value). Once we had the sample spreadsheet, the next step we did was to read the Excel data into R.  This is something that R does well.  R offers multiple ways to work with spreadsheet data.  For instance, one could save the spreadsheet as a .csv file.  In our case, the customer provided a spreadsheet file containing multiple sheets where each sheet provided data for a different sample account.  To handle this easily, we took advantage of the RODBC package which allowed us to read the Excel data sheet-by-sheet without having to create individual .csv files.  We wrote ourselves a little helper function called getsheet() around the RODBC package.  Then we loaded all of the sample accounts into a data.frame called SimpleMWRRData. Writing the IRR function At this point, it was time to write the money weighted rate of return (MWRR) function itself.  The definition of MWRR is easily found on the internet or if you are old school you can look in an investment performance text book.  In the customer proof, we based our calculations off the ones defined in the The Handbook of Investment Performance: A User’s Guide by David Spaulding since this is the reference book used by the customer.  (One of the nice things we found during the course of this proof-of-concept is that by using R to write our IRR functions we could easily incorporate the specific variations and business rules of the customer into the calculation.) The key thing with calculating IRR is the need to solve a complex equation with a numerical approximation technique.  For IRR, you need to find the value of the rate of return (r) that sets the Net Present Value of all the flows in and out of the account to zero.  With R, we solve this by defining our NPV function: where bmv is the beginning market value, cf is a vector of cash flows, t is a vector of time (relative to the beginning), emv is the ending market value, and tend is the ending time. Since solving for r is a one-dimensional optimization problem, we decided to take advantage of R’s optimize method (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html). The optimize method can be used to find a minimum or maximum; to find the value of r where our npv function is closest to zero, we wrapped our npv function inside the abs function and asked optimize to find the minimum.  Here is an example of using optimize: where low and high are scalars that indicate the range to search for an answer.   To test this out, we need to set values for bmv, cf, t, emv, tend, low, and high.  We will set low and high to some reasonable defaults. For example, this account had a negative 2.2% money weighted rate of return. Enhancing and Packaging the IRR function With numerical approximation methods like optimize, sometimes you will not be able to find an answer with your initial set of inputs.  To account for this, our approach was to first try to find an answer for r within a narrow range, then if we did not find an answer, try calling optimize() again with a broader range.  See the R help page on optimize()  for more details about the search range and its algorithm. At this point, we can now write a simplified version of our MWRR function.  (Our real-world version is  more sophisticated in that it calculates rate of returns for 5 different time periods [since inception, last quarter, year-to-date, last year, year before last year] in a single invocation.  In our actual customer proof, we also defined time-weighted rate of return calculations.  The beauty of R is that it was very easy to add these enhancements and additional calculations to our IRR package.)To simplify code deployment, we then created a new package of our IRR functions and sample data.  For this blog post, we only need to include our SimpleMWRR function and our SimpleMWRRData sample data.  We created the shell of the package by calling: To turn this package skeleton into something usable, at a minimum you need to edit the SimpleMWRR.Rd and SimpleMWRRData.Rd files in the \man subdirectory.  In those files, you need to at least provide a value for the “title” section. Once that is done, you can change directory to the IRR directory and type at the command-line: The myIRR package for this blog post (which has both SimpleMWRR source and SimpleMWRRData sample data) is downloadable from here: myIRR package Testing the myIRR package Here is an example of testing our IRR function once it was converted to an installable package: Calculating IRR for All the Accounts So far, we have shown how to calculate IRR for a single account.  The real-world issue is how do you calculate IRR for all of the accounts?This is the kind of situation where we can leverage the “Split-Apply-Combine” approach (see http://www.cscs.umich.edu/~crshalizi/weblog/815.html).  Given that our sample data can fit in memory, one easy approach is to use R’s “by” function.  (Other approaches to Split-Apply-Combine such as plyr can also be used.  See http://4dpiecharts.com/2011/12/16/a-quick-primer-on-split-apply-combine-problems/). Here is an example showing the use of “by” to calculate the money weighted rate of return for each account in our sample data set.  Recap and Next Steps At this point, you’ve seen the power of R being used to calculate IRR.  There were several good things: R could easily work with the spreadsheets of sample data we were given R’s optimize() function provided a nice way to solve for IRR- it was both fast and allowed us to avoid having to code our own iterative approximation algorithm R was a convenient language to express the customer-specific variations, business-rules, and exceptions that often occur in real-world calculations- these could be easily added to our IRR functions The Split-Apply-Combine technique can be used to perform calculations of IRR for multiple accounts at once. However, there are several challenges yet to be conquered at this point in our story: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In our next blog post in this series, we will show you how Oracle R Enterprise solved these challenges.

    Read the article

  • When you’re on a high, start something big

    - by BuckWoody
    Most days are pretty average – we have some highs, some lows, and just regular old work to do. But some days the sun is shining, your co-workers are especially nice, and everything just falls into place. You really *enjoy* what you do. Don’t let that moment pass. All of us have “big” projects that we need to tackle. Things that are going to take a long time, and a lot of money. Those kinds of data projects take a LOT of planning, and many times we put that off just to get to the day’s work. I’ve found that the “high” moments are the perfect time to take on these big projects. I’m more focused, and more importantly, more positive. And as the quote goes, “whether you think you can or you think you can’t, you’re probably right.” You’ll find a way to make it happen if you’re in a positive mood. Now – having those “great days” is actually something you can influence, but I’ll save that topic for a future post. I have a project to work on. :) Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Go Big or Go Special

    - by Ajarn Mark Caldwell
    Watching Shark Tank tonight and the first presentation was by Mango Mango Preserves and it highlighted an interesting contrast in business trends today and how to capitalize on opportunities.  <Spoiler Alert> Even though every one of the sharks was raving about the product samples they tried, with two of them going for second and third servings, none of them made a deal to invest in the company.</Spoiler>  In fact, one of the sharks, Kevin O’Leary, kept ripping into the owners with statements to the effect that he thinks they are headed over a financial cliff because he felt their costs were way out of line and would be their downfall if they didn’t take action to radically cut costs. He said that he had previously owned a jams and jellies business and knew the cost ratios that you had to have to make it work.  I don’t doubt he knows exactly what he’s talking about and is 100% accurate…for doing business his way, which I’ll call “Go Big”.  But there’s a whole other way to do business today that would be ideal for these ladies to pursue. As I understand it, based on his level of success in various businesses and the fact that he is even in a position to be investing in other companies, Kevin’s approach is to go mass market (Go Big) and make hundreds of millions of dollars in sales (or something along that scale) while squeezing out every ounce of cost that you can to produce an acceptable margin.  But there is a very different way of making a very successful business these days, which is all about building a passionate and loyal community of customers that are rooting for your success and even actively trying to help you succeed by promoting your product or company (Go Special).  This capitalizes on the power of social media, niche marketing, and The Long Tail.  One of the most prolific writers about capitalizing on this trend is Seth Godin, and I hope that the founders of Mango Mango pick up a couple of his books (probably Purple Cow and Tribes would be good starts) or at least read his blog.  I think the adoration expressed by all of the sharks for the product is the biggest hint that they have a remarkable product and that they are perfect for this type of business approach. Both are completely valid business models, and it may certainly be that the scale at which Kevin O’Leary wants to conduct business where he invests his money is well beyond the long tail, but that doesn’t mean that there is not still a lot of money to be made there.  I wish them the best of luck with their endeavors!

    Read the article

  • Is it wise to store a big lump of json on a database row

    - by Ieyasu Sawada
    I have this project which stores product details from amazon into the database. Just to give you an idea on how big it is: [{"title":"Genetic Engineering (Opposing Viewpoints)","short_title":"Genetic Engineering ...","brand":"","condition":"","sales_rank":"7171426","binding":"Book","item_detail_url":"http://localhost/wordpress/product/?asin=0737705124","node_list":"Books > Science & Math > Biological Sciences > Biotechnology","node_category":"Books","subcat":"","model_number":"","item_url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=128","details_url":"http://localhost/wordpress/product/?asin=0737705124","large_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/large-notfound.png","medium_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/medium-notfound.png","small_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/small-notfound.png","thumbnail_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/thumbnail-notfound.png","tiny_img":"http://localhost/wordpress/wp-content/plugins/ecom/img/tiny-notfound.png","swatch_img":"http://localhost/wordpress/wp-content/plugins/ecom/img/swatch-notfound.png","total_images":"6","amount":"33.70","currency":"$","long_currency":"USD","price":"$33.70","price_type":"List Price","show_price_type":"0","stars_url":"","product_review":"","rating":"","yellow_star_class":"","white_star_class":"","rating_text":" of 5","reviews_url":"","review_label":"","reviews_label":"Read all ","review_count":"","create_review_url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=132","create_review_label":"Write a review","buy_url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=19186","add_to_cart_action":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/add_to_cart.php","asin":"0737705124","status":"Only 7 left in stock.","snippet_condition":"in_stock","status_class":"ninstck","customer_images":["http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/51M2vvFvs2BL.jpg","http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/31FIM-YIUrL.jpg","http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/51M2vvFvs2BL.jpg","http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/51M2vvFvs2BL.jpg"],"disclaimer":"","item_attributes":[{"attr":"Author","value":"Greenhaven Press"},{"attr":"Binding","value":"Hardcover"},{"attr":"EAN","value":"9780737705126"},{"attr":"Edition","value":"1"},{"attr":"ISBN","value":"0737705124"},{"attr":"Label","value":"Greenhaven Press"},{"attr":"Manufacturer","value":"Greenhaven Press"},{"attr":"NumberOfItems","value":"1"},{"attr":"NumberOfPages","value":"224"},{"attr":"ProductGroup","value":"Book"},{"attr":"ProductTypeName","value":"ABIS_BOOK"},{"attr":"PublicationDate","value":"2000-06"},{"attr":"Publisher","value":"Greenhaven Press"},{"attr":"SKU","value":"G0737705124I2N00"},{"attr":"Studio","value":"Greenhaven Press"},{"attr":"Title","value":"Genetic Engineering (Opposing Viewpoints)"}],"customer_review_url":"http://localhost/wordpress/wp-content/ecom-customer-reviews/0737705124.html","flickr_results":["http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/5105560852_06c7d06f14_m.jpg"],"freebase_text":"No around the web data available yet","freebase_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/freebase-notfound.jpg","ebay_related_items":[{"title":"Genetic Engineering (Introducing Issues With Opposing Viewpoints), , Good Book","image":"http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/140.jpg","url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=12165","currency_id":"$","current_price":"26.2"},{"title":"Genetic Engineering Opposing Viewpoints by DAVID BENDER - 1964 Hardcover","image":"http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/140.jpg","url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=130","currency_id":"AUD","current_price":"11.99"}],"no_follow":"rel=\"nofollow\"","new_tab":"target=\"_blank\"","related_products":[],"super_saver_shipping":"","shipping_availability":"","total_offers":"7","added_to_cart":""}] So the structure for the table is: asin title details (the product details in json) Will the performance suffer if I have to store like 10,000 products? Is there any other way of doing this? I'm thinking of the following, but the current setup is really the most convenient one since I also have to use the data on the client side: store the product details in a file. So something like ASIN123.json store the product details in one big file. (I'm guessing it will be a drag to extract data from this file) store each of the fields in the details in its own table field Thanks in advance!

    Read the article

  • Pack of resources in one big file with XNA

    - by Cristian
    Is it possible to pack all the little .xnb files into one big file? Given the level of abstraction of the XNA Framework I though this would come out of the box but I can't find any well integrated solution. So far the best candidate is XnaZip but in addition to having to compile the resources in a post-build event, and a little trouble porting the game to XBOX I have to rename all the references to resources I have already implemented.

    Read the article

  • It's the Freedom You Big Dummy

    <b>Daniweb:</b> "No one has given his life for Linux but certainly there have been sacrifices. But, like their armed soldier counterparts, it isn't about the sacrifice, it's the freedom you big dummy."

    Read the article

  • Venez nous voir au Forum Oracle Big Data le 5 avril !

    - by Kinoa
    Le Big Data vient de plus en plus souvent au devant de la scène et vous souhaitez en apprendre davantage ? Générés à partir des réseaux sociaux, de capteurs numériques et autres équipements mobiles, les Big Data - autrement dits, d'énormes volumes de données - constituent une mine d'informations précieuses sur vos activités et les comportements de vos clients. Votre challenge aujourd’hui consiste à gérer l’acquisition, l’organisation et la compréhension de ces volumes de données non structurées, et à les intégrer dans votre système d’information. Vous avez des questions ? Ca vous parait complexe ? Alors le Forum Oracle Bid Data organisé par Oracle et Intel est fait pour vous !   Nous aborderons plusieurs points : Accélération du déploiement de Big Data par l'approche intégrée du hardware et du software Mise à disposition de tous les outils nécessaires au processus complet, de l'acquisition des données à la restitution Intégration de Big Data dans votre système d'information pour fournir aux utilisateurs la quintessence de l'information Nous vous avons concocté un programme des plus alléchant pour cette journée du 5 avril : 9h00 Accueil et remise des badges 9h30 Big Data : The Industry View. Are you ready ?Johan Hendrickx, Core Technology Director, Oracle EMEA Keynote : Big Data – Are you ready ? George Lumpkin, Vice President of DW Product Management, Oracle Corporation Acquisition des données dans votre Big Dataavec Hadoop et Oracle NoSQL Pause Organisez et structurez l'information au sein de votre Big Data avec Big Data Connectors et Oracle Data Integrator Tirez parti des analyses des données de votre Big Dataavec Oracle Endeca et Oracle Business Intelligence 13h00 Cocktail déjeunatoire Le nombre de places est limité, pensez à vous inscrire dès maintenant. Lieu :  Maison de la Chimie28 B, rue Saint Dominique 75007 Paris

    Read the article

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25  | Next Page >