Search Results

Search found 58440 results on 2338 pages for 'data cleansing'.

Page 527/2338 | < Previous Page | 523 524 525 526 527 528 529 530 531 532 533 534 | Next Page >

SQL SERVER – A Puzzle – Fun with NULL – Fix Error 8117

- by pinaldave

During my 8 years of career, I have been involved in many interviews. Quite often, I act as the interview. If I am the interviewer, I ask many questions – from easy questions to difficult ones. When I am the interviewee, I frequently get an opportunity to ask the interviewer some questions back. Regardless of the my capacity in attending the interview, I always make it a point to ask the interviewer at least one question. What is NULL? It’s always fun to ask this question during interviews, because in every interview, I get a different answer. NULL is often confused with false, absence of value or infinite value. Honestly, NULL is a very interesting subject as it bases its behavior in server settings. There are a few properties of NULL that are universal, but the knowledge about these properties is not known in a universal sense. Let us run this simple puzzle. Run the following T-SQL script: SELECT SUM(data) FROM (SELECT NULL AS data) t It will return the following error: Msg 8117, Level 16, State 1, Line 1 Operand data type NULL is invalid for sum operator. Now the error makes it very clear that NULL is invalid for sum Operator. Frequently enough, I have showed this simple query to many folks whom I came across. I asked them if they could modify the subquery and return the result as NULL. Here is what I expected: Even though this is a very simple looking query, so far I’ve got the correct answer from only 10% of the people to whom I have asked this question. It was common for me to receive this kind of answer – convert the NULL to some data type. However, doing so usually returns the value as 0 or the integer they passed. SELECT SUM(data) FROM (SELECT ISNULL(NULL,0) AS data) t I usually see many people modifying the outer query to get desired NULL result, but that is not allowed in this simple puzzle. This small puzzle made me wonder how many people have a clear understanding about NULL. Well, here is the answer to my simple puzzle. Just CAST NULL AS INT and it will return the final result as NULL: SELECT SUM(data) FROM (SELECT CAST(NULL AS INT) AS data) t Now that you know the answer, don’t you think it was very simple indeed? This blog post is especially dedicated to my friend Madhivanan who has written an excellent blog post about NULL. I am confident that after reading the blog post from Madhivanan, you will have no confusion regarding NULL in the future. Read: NULL, NULL, NULL and nothing but NULL. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – SSMS: Disk Usage Report

- by Pinal Dave

Let us start with humor! I think we the series on various reports, we come to a logical point. We covered all the reports at server level. This means the reports we saw were targeted towards activities that are related to instance level operations. These are mostly like how a doctor diagnoses a patient. At this point I am reminded of a dialog which I read somewhere: Patient: Doc, It hurts when I touch my head. Doc: Ok, go on. What else have you experienced? Patient: It hurts even when I touch my eye, it hurts when I touch my arms, it even hurts when I touch my feet, etc. Doc: Hmmm … Patient: I feel it hurts when I touch anywhere in my body. Doc: Ahh … now I get it. You need a plaster to your finger John. Sometimes the server level gives an indicator to what is happening in the system, but we need to get to the root cause for a specific database. So, this is the first blog in series where we would start discussing about database level reports. To launch database level reports, expand selected server in Object Explorer, expand the Databases folder, and then right-click any database for which we want to look at reports. From the menu, select Reports, then Standard Reports, and then any of database level reports. In this blog, we would talk about four “disk” reports because they are similar: Disk Usage Disk Usage by Top Tables Disk Usage by Table Disk Usage by Partition Disk Usage This report shows multiple information about the database. Let us discuss them one by one. We have divided the output into 5 different sections. Section 1 shows the high level summary of the database. It shows the space used by database files (mdf and ldf). Under the hood, the report uses, various DMVs and DBCC Commands, it is using sys.data_spaces and DBCC SHOWFILESTATS. Section 2 and 3 are pie charts. One for data file allocation and another for the transaction log file. Pie chart for “Data Files Space Usage (%)” shows space consumed data, indexes, allocated to the SQL Server database, and unallocated space which is allocated to the SQL Server database but not yet filled with anything. “Transaction Log Space Usage (%)” used DBCC SQLPERF (LOGSPACE) and shows how much empty space we have in the physical transaction log file. Section 4 shows the data from Default Trace and looks at Event IDs 92, 93, 94, 95 which are for “Data File Auto Grow”, “Log File Auto Grow”, “Data File Auto Shrink” and “Log File Auto Shrink” respectively. Here is an expanded view for that section. If default trace is not enabled, then this section would be replaced by the message “Trace Log is disabled” as highlighted below. Section 5 of the report uses DBCC SHOWFILESTATS to get information. Here is the enhanced version of that section. This shows the physical layout of the file. In case you have In-Memory Objects in the database (from SQL Server 2014), then report would show information about those as well. Here is the screenshot taken for a different database, which has In-Memory table. I have highlighted new things which are only shown for in-memory database. The new sections which are highlighted above are using sys.dm_db_xtp_checkpoint_files, sys.database_files and sys.data_spaces. The new type for in-memory OLTP is ‘FX’ in sys.data_space. The next set of reports is targeted to get information about a table and its storage. These reports can answer questions like: Which is the biggest table in the database? How many rows we have in table? Is there any table which has a lot of reserved space but its unused? Which partition of the table is having more data? Disk Usage by Top Tables This report provides detailed data on the utilization of disk space by top 1000 tables within the Database. The report does not provide data for memory optimized tables. Disk Usage by Table This report is same as earlier report with few difference. First Report shows only 1000 rows First Report does order by values in DMV sys.dm_db_partition_stats whereas second one does it based on name of the table. Both of the reports have interactive sort facility. We can click on any column header and change the sorting order of data. Disk Usage by Partition This report shows the distribution of the data in table based on partition in the table. This is so similar to previous output with the partition details now. Here is the query taken from profiler. SELECT row_number() OVER (ORDER BY a1.used_page_count DESC, a1.index_id) AS row_number , (dense_rank() OVER (ORDER BY a5.name, a2.name))%2 AS l1 , a1.OBJECT_ID , a5.name AS [schema] , a2.name , a1.index_id , a3.name AS index_name , a3.type_desc , a1.partition_number , a1.used_page_count * 8 AS total_used_pages , a1.reserved_page_count * 8 AS total_reserved_pages , a1.row_count FROM sys.dm_db_partition_stats a1 INNER JOIN sys.all_objects a2 ON ( a1.OBJECT_ID = a2.OBJECT_ID) AND a1.OBJECT_ID NOT IN (SELECT OBJECT_ID FROM sys.tables WHERE is_memory_optimized = 1) INNER JOIN sys.schemas a5 ON (a5.schema_id = a2.schema_id) LEFT OUTER JOIN sys.indexes a3 ON ( (a1.OBJECT_ID = a3.OBJECT_ID) AND (a1.index_id = a3.index_id) ) WHERE (SELECT MAX(DISTINCT partition_number) FROM sys.dm_db_partition_stats a4 WHERE (a4.OBJECT_ID = a1.OBJECT_ID)) >= 1 AND a2.TYPE <> N'S' AND a2.TYPE <> N'IT' ORDER BY a5.name ASC, a2.name ASC, a1.index_id, a1.used_page_count DESC, a1.partition_number Using all of the above reports, you should be able to get the usage of database files and also space used by tables. I think this is too much disk information for a single blog and I hope you have used them in the past to get data. Do let me know if you found anything interesting using these reports in your environments. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
Much Ado About Nothing: Stub Objects

- by user9154181

The Solaris 11 link-editor (ld) contains support for a new type of object that we call a stub object. A stub object is a shared object, built entirely from mapfiles, that supplies the same linking interface as the real object, while containing no code or data. Stub objects cannot be executed — the runtime linker will kill any process that attempts to load one. However, you can link to a stub object as a dependency, allowing the stub to act as a proxy for the real version of the object. You may well wonder if there is a point to producing an object that contains nothing but linking interface. As it turns out, stub objects are very useful for building large bodies of code such as Solaris. In the last year, we've had considerable success in applying them to one of our oldest and thorniest build problems. In this discussion, I will describe how we came to invent these objects, and how we apply them to building Solaris. This posting explains where the idea for stub objects came from, and details our long and twisty journey from hallway idea to standard link-editor feature. I expect that these details are mainly of interest to those who work on Solaris and its makefiles, those who have done so in the past, and those who work with other similar bodies of code. A subsequent posting will omit the history and background details, and instead discuss how to build and use stub objects. If you are mainly interested in what stub objects are, and don't care about the underlying software war stories, I encourage you to skip ahead. The Long Road To Stubs This all started for me with an email discussion in May of 2008, regarding a change request that was filed in 2002, entitled: 4631488 lib/Makefile is too patient: .WAITs should be reduced This CR encapsulates a number of cronic issues with Solaris builds: We build Solaris with a parallel make (dmake) that tries to build as much of the code base in parallel as possible. There is a lot of code to build, and we've long made use of parallelized builds to get the job done quicker. This is even more important in today's world of massively multicore hardware. Solaris contains a large number of executables and shared objects. Executables depend on shared objects, and shared objects can depend on each other. Before you can build an object, you need to ensure that the objects it needs have been built. This implies a need for serialization, which is in direct opposition to the desire to build everying in parallel. To accurately build objects in the right order requires an accurate set of make rules defining the things that depend on each other. This sounds simple, but the reality is quite complex. In practice, having programmers explicitly specify these dependencies is a losing strategy: It's really hard to get right. It's really easy to get it wrong and never know it because things build anyway. Even if you get it right, it won't stay that way, because dependencies between objects can change over time, and make cannot help you detect such drifing. You won't know that you got it wrong until the builds break. That can be a long time after the change that triggered the breakage happened, making it hard to connect the cause and the effect. Usually this happens just before a release, when the pressure is on, its hard to think calmly, and there is no time for deep fixes. As a poor compromise, the libraries in core Solaris were built using a set of grossly incomplete hand written rules, supplemented with a number of dmake .WAIT directives used to group the libraries into sets of non-interacting groups that can be built in parallel because we think they don't depend on each other. From time to time, someone will suggest that we could analyze the built objects themselves to determine their dependencies and then generate make rules based on those relationships. This is possible, but but there are complications that limit the usefulness of that approach: To analyze an object, you have to build it first. This is a classic chicken and egg scenario. You could analyze the results of a previous build, but then you're not necessarily going to get accurate rules for the current code. It should be possible to build the code without having a built workspace available. The analysis will take time, and remember that we're constantly trying to make builds faster, not slower. By definition, such an approach will always be approximate, and therefore only incremantally more accurate than the hand written rules described above. The hand written rules are fast and cheap, while this idea is slow and complex, so we stayed with the hand written approach. Solaris was built that way, essentially forever, because these are genuinely difficult problems that had no easy answer. The makefiles were full of build races in which the right outcomes happened reliably for years until a new machine or a change in build server workload upset the accidental balance of things. After figuring out what had happened, you'd mutter "How did that ever work?", add another incomplete and soon to be inaccurate make dependency rule to the system, and move on. This was not a satisfying solution, as we tend to be perfectionists in the Solaris group, but we didn't have a better answer. It worked well enough, approximately. And so it went for years. We needed a different approach — a new idea to cut the Gordian Knot. In that discussion from May 2008, my fellow linker-alien Rod Evans had the initial spark that lead us to a game changing series of realizations: The link-editor is used to link objects together, but it only uses the ELF metadata in the object, consisting of symbol tables, ELF versioning sections, and similar data. Notably, it does not look at, or understand, the machine code that makes an object useful at runtime. If you had an object that only contained the ELF metadata for a dependency, but not the code or data, the link-editor would find it equally useful for linking, and would never know the difference. Call it a stub object. In the core Solaris OS, we require all objects to be built with a link-editor mapfile that describes all of its publically available functions and data. Could we build a stub object using the mapfile for the real object? It ought to be very fast to build stub objects, as there are no input objects to process. Unlike the real object, stub objects would not actually require any dependencies, and so, all of the stubs for the entire system could be built in parallel. When building the real objects, one could link against the stub objects instead of the real dependencies. This means that all the real objects can be built built in parallel too, without any serialization. We could replace a system that requires perfect makefile rules with a system that requires no ordering rules whatsoever. The results would be considerably more robust. We immediately realized that this idea had potential, but also that there were many details to sort out, lots of work to do, and that perhaps it wouldn't really pan out. As is often the case, it would be necessary to do the work and see how it turned out. Following that conversation, I set about trying to build a stub object. We determined that a faithful stub has to do the following: Present the same set of global symbols, with the same ELF versioning, as the real object. Functions are simple — it suffices to have a symbol of the right type, possibly, but not necessarily, referencing a null function in its text segment. Copy relocations make data more complicated to stub. The possibility of a copy relocation means that when you create a stub, the data symbols must have the actual size of the real data. Any error in this will go uncaught at link time, and will cause tragic failures at runtime that are very hard to diagnose. For reasons too obscure to go into here, involving tentative symbols, it is also important that the data reside in bss, or not, matching its placement in the real object. If the real object has more than one symbol pointing at the same data item, we call these aliased symbols. All data symbols in the stub object must exhibit the same aliasing as the real object. We imagined the stub library feature working as follows: A command line option to ld tells it to produce a stub rather than a real object. In this mode, only mapfiles are examined, and any object or shared libraries on the command line are are ignored. The extra information needed (function or data, size, and bss details) would be added to the mapfile. When building the real object instead of the stub, the extra information for building stubs would be validated against the resulting object to ensure that they match. In exploring these ideas, I immediately run headfirst into the reality of the original mapfile syntax, a subject that I would later write about as The Problem(s) With Solaris SVR4 Link-Editor Mapfiles. The idea of extending that poor language was a non-starter. Until a better mapfile syntax became available, which seemed unlikely in 2008, the solution could not involve extentions to the mapfile syntax. Instead, we cooked up the idea (hack) of augmenting mapfiles with stylized comments that would carry the necessary information. A typical definition might look like: # DATA(i386) __iob 0x3c0 # DATA(amd64,sparcv9) __iob 0xa00 # DATA(sparc) __iob 0x140 iob; A further problem then became clear: If we can't extend the mapfile syntax, then there's no good way to extend ld with an option to produce stub objects, and to validate them against the real objects. The idea of having ld read comments in a mapfile and parse them for content is an unacceptable hack. The entire point of comments is that they are strictly for the human reader, and explicitly ignored by the tool. Taking all of these speed bumps into account, I made a new plan: A perl script reads the mapfiles, generates some small C glue code to produce empty functions and data definitions, compiles and links the stub object from the generated glue code, and then deletes the generated glue code. Another perl script used after both objects have been built, to compare the real and stub objects, using data from elfdump, and validate that they present the same linking interface. By June 2008, I had written the above, and generated a stub object for libc. It was a useful prototype process to go through, and it allowed me to explore the ideas at a deep level. Ultimately though, the result was unsatisfactory as a basis for real product. There were so many issues: The use of stylized comments were fine for a prototype, but not close to professional enough for shipping product. The idea of having to document and support it was a large concern. The ideal solution for stub objects really does involve having the link-editor accept the same arguments used to build the real object, augmented with a single extra command line option. Any other solution, such as our prototype script, will require makefiles to be modified in deeper ways to support building stubs, and so, will raise barriers to converting existing code. A validation script that rederives what the linker knew when it built an object will always be at a disadvantage relative to the actual linker that did the work. A stub object should be identifyable as such. In the prototype, there was no tag or other metadata that would let you know that they weren't real objects. Being able to identify a stub object in this way means that the file command can tell you what it is, and that the runtime linker can refuse to try and run a program that loads one. At that point, we needed to apply this prototype to building Solaris. As you might imagine, the task of modifying all the makefiles in the core Solaris code base in order to do this is a massive task, and not something you'd enter into lightly. The quality of the prototype just wasn't good enough to justify that sort of time commitment, so I tabled the project, putting it on my list of long term things to think about, and moved on to other work. It would sit there for a couple of years. Semi-coincidentally, one of the projects I tacked after that was to create a new mapfile syntax for the Solaris link-editor. We had wanted to do something about the old mapfile syntax for many years. Others before me had done some paper designs, and a great deal of thought had already gone into the features it should, and should not have, but for various reasons things had never moved beyond the idea stage. When I joined Sun in late 2005, I got involved in reviewing those things and thinking about the problem. Now in 2008, fresh from relearning for the Nth time why the old mapfile syntax was a huge impediment to linker progress, it seemed like the right time to tackle the mapfile issue. Paving the way for proper stub object support was not the driving force behind that effort, but I certainly had them in mind as I moved forward. The new mapfile syntax, which we call version 2, integrated into Nevada build snv_135 in in February 2010: 6916788 ld version 2 mapfile syntax PSARC/2009/688 Human readable and extensible ld mapfile syntax In order to prove that the new mapfile syntax was adequate for general purpose use, I had also done an overhaul of the ON consolidation to convert all mapfiles to use the new syntax, and put checks in place that would ensure that no use of the old syntax would creep back in. That work went back into snv_144 in June 2010: 6916796 OSnet mapfiles should use version 2 link-editor syntax That was a big putback, modifying 517 files, adding 18 new files, and removing 110 old ones. I would have done this putback anyway, as the work was already done, and the benefits of human readable syntax are obvious. However, among the justifications listed in CR 6916796 was this We anticipate adding additional features to the new mapfile language that will be applicable to ON, and which will require all sharable object mapfiles to use the new syntax. I never explained what those additional features were, and no one asked. It was premature to say so, but this was a reference to stub objects. By that point, I had already put together a working prototype link-editor with the necessary support for stub objects. I was pleased to find that building stubs was indeed very fast. On my desktop system (Ultra 24), an amd64 stub for libc can can be built in a fraction of a second: % ptime ld -64 -z stub -o stubs/libc.so.1 -G -hlibc.so.1 \ -ztext -zdefs -Bdirect ... real 0.019708910 user 0.010101680 sys 0.008528431 In order to go from prototype to integrated link-editor feature, I knew that I would need to prove that stub objects were valuable. And to do that, I knew that I'd have to switch the Solaris ON consolidation to use stub objects and evaluate the outcome. And in order to do that experiment, ON would first need to be converted to version 2 mapfiles. Sub-mission accomplished. Normally when you design a new feature, you can devise reasonably small tests to show it works, and then deploy it incrementally, letting it prove its value as it goes. The entire point of stub objects however was to demonstrate that they could be successfully applied to an extremely large and complex code base, and specifically to solve the Solaris build issues detailed above. There was no way to finesse the matter — in order to move ahead, I would have to successfully use stub objects to build the entire ON consolidation and demonstrate their value. In software, the need to boil the ocean can often be a warning sign that things are trending in the wrong direction. Conversely, sometimes progress demands that you build something large and new all at once. A big win, or a big loss — sometimes all you can do is try it and see what happens. And so, I spent some time staring at ON makefiles trying to get a handle on how things work, and how they'd have to change. It's a big and messy world, full of complex interactions, unspecified dependencies, special cases, and knowledge of arcane makefile features... ...and so, I backed away, put it down for a few months and did other work... ...until the fall, when I felt like it was time to stop thinking and pondering (some would say stalling) and get on with it. Without stubs, the following gives a simplified high level view of how Solaris is built: An initially empty directory known as the proto, and referenced via the ROOT makefile macro is established to receive the files that make up the Solaris distribution. A top level setup rule creates the proto area, and performs operations needed to initialize the workspace so that the main build operations can be launched, such as copying needed header files into the proto area. Parallel builds are launched to build the kernel (usr/src/uts), libraries (usr/src/lib), and commands. The install makefile target builds each item and delivers a copy to the proto area. All libraries and executables link against the objects previously installed in the proto, implying the need to synchronize the order in which things are built. Subsequent passes run lint, and do packaging. Given this structure, the additions to use stub objects are: A new second proto area is established, known as the stub proto and referenced via the STUBROOT makefile macro. The stub proto has the same structure as the real proto, but is used to hold stub objects. All files in the real proto are delivered as part of the Solaris product. In contrast, the stub proto is used to build the product, and then thrown away. A new target is added to library Makefiles called stub. This rule builds the stub objects. The ld command is designed so that you can build a stub object using the same ld command line you'd use to build the real object, with the addition of a single -z stub option. This means that the makefile rules for building the stub objects are very similar to those used to build the real objects, and many existing makefile definitions can be shared between them. A new target is added to the Makefiles called stubinstall which delivers the stub objects built by the stub rule into the stub proto. These rules reuse much of existing plumbing used by the existing install rule. The setup rule runs stubinstall over the entire lib subtree as part of its initialization. All libraries and executables link against the objects in the stub proto rather than the main proto, and can therefore be built in parallel without any synchronization. There was no small way to try this that would yield meaningful results. I would have to take a leap of faith and edit approximately 1850 makefiles and 300 mapfiles first, trusting that it would all work out. Once the editing was done, I'd type make and see what happened. This took about 6 weeks to do, and there were many dark days when I'd question the entire project, or struggle to understand some of the many twisted and complex situations I'd uncover in the makefiles. I even found a couple of new issues that required changes to the new stub object related code I'd added to ld. With a substantial amount of encouragement and help from some key people in the Solaris group, I eventually got the editing done and stub objects for the entire workspace built. I found that my desktop system could build all the stub objects in the workspace in roughly a minute. This was great news, as it meant that use of the feature is effectively free — no one was likely to notice or care about the cost of building them. After another week of typing make, fixing whatever failed, and doing it again, I succeeded in getting a complete build! The next step was to remove all of the make rules and .WAIT statements dedicated to controlling the order in which libraries under usr/src/lib are built. This came together pretty quickly, and after a few more speed bumps, I had a workspace that built cleanly and looked like something you might actually be able to integrate someday. This was a significant milestone, but there was still much left to do. I turned to doing full nightly builds. Every type of build (open, closed, OpenSolaris, export, domestic) had to be tried. Each type failed in a new and unique way, requiring some thinking and rework. As things came together, I became aware of things that could have been done better, simpler, or cleaner, and those things also required some rethinking, the seeking of wisdom from others, and some rework. After another couple of weeks, it was in close to final form. My focus turned towards the end game and integration. This was a huge workspace, and needed to go back soon, before changes in the gate would made merging increasingly difficult. At this point, I knew that the stub objects had greatly simplified the makefile logic and uncovered a number of race conditions, some of which had been there for years. I assumed that the builds were faster too, so I did some builds intended to quantify the speedup in build time that resulted from this approach. It had never occurred to me that there might not be one. And so, I was very surprised to find that the wall clock build times for a stock ON workspace were essentially identical to the times for my stub library enabled version! This is why it is important to always measure, and not just to assume. One can tell from first principles, based on all those removed dependency rules in the library makefile, that the stub object version of ON gives dmake considerably more opportunities to overlap library construction. Some hypothesis were proposed, and shot down: Could we have disabled dmakes parallel feature? No, a quick check showed things being build in parallel. It was suggested that we might be I/O bound, and so, the threads would be mostly idle. That's a plausible explanation, but system stats didn't really support it. Plus, the timing between the stub and non-stub cases were just too suspiciously identical. Are our machines already handling as much parallelism as they are capable of, and unable to exploit these additional opportunities? Once again, we didn't see the evidence to back this up. Eventually, a more plausible and obvious reason emerged: We build the libraries and commands (usr/src/lib, usr/src/cmd) in parallel with the kernel (usr/src/uts). The kernel is the long leg in that race, and so, wall clock measurements of build time are essentially showing how long it takes to build uts. Although it would have been nice to post a huge speedup immediately, we can take solace in knowing that stub objects simplify the makefiles and reduce the possibility of race conditions. The next step in reducing build time should be to find ways to reduce or overlap the uts part of the builds. When that leg of the build becomes shorter, then the increased parallelism in the libs and commands will pay additional dividends. Until then, we'll just have to settle for simpler and more robust. And so, I integrated the link-editor support for creating stub objects into snv_153 (November 2010) with 6993877 ld should produce stub objects PSARC/2010/397 ELF Stub Objects followed by the work to convert the ON consolidation in snv_161 (February 2011) with 7009826 OSnet should use stub objects 4631488 lib/Makefile is too patient: .WAITs should be reduced This was a huge putback, with 2108 modified files, 8 new files, and 2 removed files. Due to the size, I was allowed a window after snv_160 closed in which to do the putback. It went pretty smoothly for something this big, a few more preexisting race conditions would be discovered and addressed over the next few weeks, and things have been quiet since then. Conclusions and Looking Forward Solaris has been built with stub objects since February. The fact that developers no longer specify the order in which libraries are built has been a big success, and we've eliminated an entire class of build error. That's not to say that there are no build races left in the ON makefiles, but we've taken a substantial bite out of the problem while generally simplifying and improving things. The introduction of a stub proto area has also opened some interesting new possibilities for other build improvements. As this article has become quite long, and as those uses do not involve stub objects, I will defer that discussion to a future article.

Read the article
A New Threat To Web Applications: Connection String Parameter Pollution (CSPP)

- by eric.maurice

Hi, this is Shaomin Wang. I am a security analyst in Oracle's Security Alerts Group. My primary responsibility is to evaluate the security vulnerabilities reported externally by security researchers on Oracle Fusion Middleware and to ensure timely resolution through the Critical Patch Update. Today, I am going to talk about a serious type of attack: Connection String Parameter Pollution (CSPP). Earlier this year, at the Black Hat DC 2010 Conference, two Spanish security researchers, Jose Palazon and Chema Alonso, unveiled a new class of security vulnerabilities, which target insecure dynamic connections between web applications and databases. The attack called Connection String Parameter Pollution (CSPP) exploits specifically the semicolon delimited database connection strings that are constructed dynamically based on the user inputs from web applications. CSPP, if carried out successfully, can be used to steal user identities and hijack web credentials. CSPP is a high risk attack because of the relative ease with which it can be carried out (low access complexity) and the potential results it can have (high impact). In today's blog, we are going to first look at what connection strings are and then review the different ways connection string injections can be leveraged by malicious hackers. We will then discuss how CSPP differs from traditional connection string injection, and the measures organizations can take to prevent this kind of attacks. In web applications, a connection string is a set of values that specifies information to connect to backend data repositories, in most cases, databases. The connection string is passed to a provider or driver to initiate a connection. Vendors or manufacturers write their own providers for different databases. Since there are many different providers and each provider has multiple ways to make a connection, there are many different ways to write a connection string. Here are some examples of connection strings from Oracle Data Provider for .Net/ODP.Net: Oracle Data Provider for .Net / ODP.Net; Manufacturer: Oracle; Type: .NET Framework Class Library: - Using TNS Data Source = orcl; User ID = myUsername; Password = myPassword; - Using integrated security Data Source = orcl; Integrated Security = SSPI; - Using the Easy Connect Naming Method Data Source = username/password@//myserver:1521/my.server.com - Specifying Pooling parameters Data Source=myOracleDB; User Id=myUsername; Password=myPassword; Min Pool Size=10; Connection Lifetime=120; Connection Timeout=60; Incr Pool Size=5; Decr Pool Size=2; There are many variations of the connection strings, but the majority of connection strings are key value pairs delimited by semicolons. Attacks on connection strings are not new (see for example, this SANS White Paper on Securing SQL Connection String). Connection strings are vulnerable to injection attacks when dynamic string concatenation is used to build connection strings based on user input. When the user input is not validated or filtered, and malicious text or characters are not properly escaped, an attacker can potentially access sensitive data or resources. For a number of years now, vendors, including Oracle, have created connection string builder class tools to help developers generate valid connection strings and potentially prevent this kind of vulnerability. Unfortunately, not all application developers use these utilities because they are not aware of the danger posed by this kind of attacks. So how are Connection String parameter Pollution (CSPP) attacks different from traditional Connection String Injection attacks? First, let's look at what parameter pollution attacks are. Parameter pollution is a technique, which typically involves appending repeating parameters to the request strings to attack the receiving end. Much of the public attention around parameter pollution was initiated as a result of a presentation on HTTP Parameter Pollution attacks by Stefano Di Paola and Luca Carettoni delivered at the 2009 Appsec OWASP Conference in Poland. In HTTP Parameter Pollution attacks, an attacker submits additional parameters in HTTP GET/POST to a web application, and if these parameters have the same name as an existing parameter, the web application may react in different ways depends on how the web application and web server deal with multiple parameters with the same name. When applied to connections strings, the rule for the majority of database providers is the "last one wins" algorithm. If a KEYWORD=VALUE pair occurs more than once in the connection string, the value associated with the LAST occurrence is used. This opens the door to some serious attacks. By way of example, in a web application, a user enters username and password; a subsequent connection string is generated to connect to the back end database. Data Source = myDataSource; Initial Catalog = db; Integrated Security = no; User ID = myUsername; Password = XXX; In the password field, if the attacker enters "xxx; Integrated Security = true", the connection string becomes, Data Source = myDataSource; Initial Catalog = db; Integrated Security = no; User ID = myUsername; Password = XXX; Intergrated Security = true; Under the "last one wins" principle, the web application will then try to connect to the database using the operating system account under which the application is running to bypass normal authentication. CSPP poses serious risks for unprepared organizations. It can be particularly dangerous if an Enterprise Systems Management web front-end is compromised, because attackers can then gain access to control panels to configure databases, systems accounts, etc. Fortunately, organizations can take steps to prevent this kind of attacks. CSPP falls into the Injection category of attacks like Cross Site Scripting or SQL Injection, which are made possible when inputs from users are not properly escaped or sanitized. Escaping is a technique used to ensure that characters (mostly from user inputs) are treated as data, not as characters, that is relevant to the interpreter's parser. Software developers need to become aware of the danger of these attacks and learn about the defenses mechanism they need to introduce in their code. As well, software vendors need to provide templates or classes to facilitate coding and eliminate developers' guesswork for protecting against such vulnerabilities. Oracle has introduced the OracleConnectionStringBuilder class in Oracle Data Provider for .NET. Using this class, developers can employ a configuration file to provide the connection string and/or dynamically set the values through key/value pairs. It makes creating connection strings less error-prone and easier to manager, and ultimately using the OracleConnectionStringBuilder class provides better security against injection into connection strings. For More Information: - The OracleConnectionStringBuilder is located at http://download.oracle.com/docs/cd/B28359_01/win.111/b28375/OracleConnectionStringBuilderClass.htm - Oracle has developed a publicly available course on preventing SQL Injections. The Server Technologies Curriculum course "Defending Against SQL Injection Attacks!" is located at http://st-curriculum.oracle.com/tutorial/SQLInjection/index.htm - The OWASP web site also provides a number of useful resources. It is located at http://www.owasp.org/index.php/Main_Page

Read the article
Customer Support Spotlight: Clemson University

- by cwarticki

I've begun a Customer Support Spotlight series that highlights our wonderful customers and Oracle loyalists. A week ago I visited Clemson University. As I travel to visit and educate our customers, I provide many useful tips/tricks and support best practices (as found on my blog and twitter). Most of all, I always discover an Oracle gem who deserves recognition for their hard work and advocacy. Meet George Manley. George is a Storage Engineer who has worked in Clemson's Data Center all through college, partially in the Hardware Architecture group and partially in the Storage group. George and the rest of the Storage Team work with most all of the storage technologies that they have here at Clemson. This includes a wide array of different vendors' disk arrays, with the most of them being Oracle/Sun 2540's. He also works with SAM/QFS, ACSLS, and our SL8500 Tape Libraries (all three Oracle/Sun products). (pictured L to R, Matt Schoger (Oracle), Mark Flores (Oracle) and George Manley) George was kind enough to take us for a data center tour. It was amazing. I rarely get to see the inside of data centers, and this one was massive. Clemson Computing and Information Technology’s physical resources include the main data center located in the Information Technology Center at the Innovation Campus and Technology Park. The core of Clemson’s computing infrastructure, the data center has 21,000 sq ft of raised floor and is powered by a 14MW substation. The ITC power capacity is 4.5MW. The data center is the home of both enterprise and HPC systems, and is staffed by CCIT staff on a 24 hour basis from a state of the art network operations center within the ITC. A smaller business continuance data center is located on the main campus. The data center serves a wide variety of purposes including HPC (supercomputing) resources which are shared with other Universities throughout the state, the state's medicaid processing system, and nearly all other needs for Clemson University. Yes, that's no typo (14,256 cores and 37TB of memory!!! Thanks for the tour George and thank you very much for your time. The tour was fantastic. I enjoyed getting to know your team and I look forward to many successes from Clemson using Oracle products. -Chris WartickiGlobal Customer Management

Read the article
Optimizing AES modes on Solaris for Intel Westmere

- by danx

Optimizing AES modes on Solaris for Intel Westmere Review AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used. However, AES by itself has a weakness. AES encryption isn't usually used by itself because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can be easily attacked with "dictionaries" of common blocks of text and allows one to more-easily discern the content of the unknown cryptotext. This mode of encryption is called "Electronic Code Book" (ECB), because one in theory can keep a "code book" of all known cryptotext and plaintext results to cipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks is still possible. Here's a diagram of encrypting input data using AES ECB mode: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ IV >----->(XOR) +------------->(XOR) +---> . . . . | | | | | | | | \/ | \/ | AESKey-->(AES Encryption) | AESKey-->(AES Encryption) | | | | | | | | | \/ | \/ | CipherTextOutput ------+ CipherTextOutput -------+ Block 1 Block 2 The steps for CBC encryption are: Start with a 16-byte Initialization Vector (IV), choosen randomly. XOR the IV with the first block of input plaintext Encrypt the result with AES using a user-provided key. The result is the first 16-bytes of output cryptotext. Use the cryptotext (instead of the IV) of the previous block to XOR with the next input block of plaintext Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it also starts with a 16-byte IV. However, for subsequent blocks, the IV is just incremented by one. Also, the IV ix XORed with the AES encryption result (not the plain text input). Here's an illustration: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ IV >----->(XOR) IV + 1 >---->(XOR) IV + 2 ---> . . . . | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 Optimization Which of these modes can be parallelized? ECB encryption/decryption can be parallelized because it does more than plain AES encryption and decryption, as mentioned above. CBC encryption can't be parallelized because it depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning. CTR encryption and decryption can be parallelized because the input to each block is known--it's just the IV incremented by one for each subsequent block. So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized with the exception of CBC encryption. How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for memory to be loaded or stored to or from CPU registers. Since the software is written to encrypt/decrypt the next data block where pipeline stalls usually occurs, we can avoid stalls and crypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory. Other Optimizations Besides interleaving, other optimizations performed are Loading the entire key schedule into the 128-bit %xmm registers. This is done once for per 4-block of data (since 4 blocks of data is processed, when present). The following is loaded: the entire "key schedule" (user input key preprocessed for encryption and decryption). This takes 11, 13, or 15 registers, for AES-128, AES-192, and AES-256, respectively The input data is loaded into another %xmm register The same register contains the output result after encrypting/decrypting Using SSSE 4 instructions (AESNI). Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AESNI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are: pxor exclusive or (very useful), movdqu load/store a %xmm register from/to memory, pshufb shuffle bytes for byte swapping, pclmulqdq carry-less multiply for GCM mode Combining AES encryption/decryption with CBC or CTR modes processing. Instead of loading input data twice (once for AES encryption/decryption, and again for modes (CTR or CBC, for example) processing, the input data is loaded once as both AES and modes operations occur at in the same function Performance Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale which is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded Intel AESNI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at-a-time. « For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slighty faster than AES-256, as expected. The second table shows more detail timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. » The results shown are the percentage improvement as shown by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AESNI assembly! The keysize (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data. Availability This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AESNI (for example, Intel Westmere and Sandy Bridge, microprocessor architectures). The easiest way to determine if AES-NI is available is with the isainfo(1) command. For example, $ isainfo -v 64-bit amd64 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this software. Solaris libraries and kernel automatically determine if it's running on AESNI-capable machines and execute the correctly-tuned software for the current microprocessor. Summary Maximum throughput of AES cipher modes can be achieved by combining AES encryption with modes processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions. References "Block cipher modes of operation", Wikipedia Good overview of AES modes (ECB, CBC, CTR, etc.) "Advanced Encryption Standard", Wikipedia "Current Modes" describes NIST-approved block cipher modes (ECB,CBC, CFB, OFB, CCM, GCM)

Read the article
Performance problems loading XML with SSIS, an alternative way!

- by AtulThakor

I recently needed to load several thousand XML files into a SQL database, I created an SSIS package which was created as followed: Using a foreach container to loop through a directory and load each file path into a variable, the “Import XML” dataflow would then load each XML file into a SQL table. Running this, it took approximately 1 second to load each file which seemed a massive amount of time to parse the XML and load the data, speaking to my colleague Martin Croft, he suggested the use of T-SQL Bulk Insert and OpenRowset, so we adjusted the package as followed: The same foreach container was used but instead the following SQL command was executed (this is an expression): "INSERT INTO MyTable(FileDate) SELECT CAST(bulkcolumn AS XML) FROM OPENROWSET( BULK '" + @[User::CurrentFile] + "', SINGLE_BLOB ) AS x" Using this method we managed to load approximately 20 records per second, much faster…for data loading! For what we wanted to achieve this was perfect but I’ll leave you with the following points when making your own decision on which solution you decide to choose! Openrowset Method Much faster to get the data into SQL You’ll need to parse or create a view over the XML data to allow the data to be more usable(another post on this!) Not able to apply validation/transformation against the data when loading it The SQL Server service account will need permission to the file No schema validation when loading files SSIS Slower (in our case) Schema validation Allows you to apply transformations/joins to the data Permissions should be less of a problem Data can be loaded into the final form through the package When using a schema validation errors can fail the package (I’ll do another post on this)

Read the article
Book Review (Book 10) - The Information: A History, a Theory, a Flood

- by BuckWoody

This is a continuation of the books I challenged myself to read to help my career - one a month, for year. You can read my first book review here, and the entire list is here. The book I chose for March 2012 was: The Information: A History, a Theory, a Flood by James Gleick. I was traveling at the end of last month so I’m a bit late posting this review here. Why I chose this book: My personal belief about computing is this: All computing technology is simply re-arranging data. We take data in, we manipulate it, and we send it back out. That’s computing. I had heard from some folks about this book and it’s treatment of data. I heard that it dealt with the basics of data - and the semantics of data, information and so on. It also deals with the earliest forms of history of information, which fascinates me. It’s similar I was told, to GEB which a favorite book of mine as well, so that was a bonus. Some folks I talked to liked it, some didn’t - so I thought I would check it out. What I learned: I liked the book. It was longer than I thought - took quite a while to read, even though I tend to read quickly. This is the kind of book you take your time with. It does in fact deal with the earliest forms of human interaction and the basics of data. I learned, for instance, that the genesis of the binary communication system is based in the invention of telegraph (far-writing) codes, and that the earliest forms of communication were expensive. In fact, many ciphers were invented not to hide military secrets, but to compress information. A sort of early “lol-speak” to keep the cost of transmitting data low! I think the comparison with GEB is a bit over-reaching. GEB is far more specific, fanciful and so on. In fact, this book felt more like something fro Richard Dawkins, and tended to wander around the subject quite a bit. I imagine the author doing his research and writing each chapter as a book that followed on from the last one. This is what possibly bothered those who tended not to like it, I think. Towards the middle of the book, I think the author tended to be a bit too fragmented even for me. He began to delve into memes, biology and more - I think he might have been better off breaking that off into another work. The existentialism just seemed jarring. All in all, I liked the book. I recommend it to any technical professional, specifically ones involved with data technology in specific. And isn’t that all of us? :)

Read the article
How one decision can turn web services to hell

- by DigiMortal

In this posting I will show you how one stupid decision may turn developers life to hell. There is a project where bunch of complex applications exchange data frequently and it is very hard to change something without additional expenses. Well, one analyst thought that string is silver bullet of web services. Read what happened. Bad bad mistake In the early stages of integration project there was analyst who also established architecture and technical design for web services. There was one very bad mistake this analyst made: All data must be converted to strings before exchange! Yes, that’s correct, this was the requirement. All integers, decimals and dates are coming in and going out as strings. There was also explanation for this requirement: This way we can avoid data type conversion errors! Well, this guy works somewhere else already and I hope he works in some burger restaurant – far away from computers. Consequences If you first look at this requirement it may seem like little annoying piece of crap you can easily survive. But let’s see the real consequences one stupid decision can cause: hell load of data conversions are done by receiving applications and SSIS packages, SSIS packages are not error prone and they depend heavily on strings they get from different services, there are more than one format per type that is used in different services, for larger amounts of data all these conversion tasks slow down the work of integration packages, practically all developers have been in hurry with some SSIS import tasks and some fields that are not used in different calculations in SSAS cube are imported without data conversions (by example, some prices are strings in format “1.021 $”). The most painful problem for developers is the part of data conversions because they don’t expect that there is such a stupid requirement stated and therefore they are not able to estimate the time their tasks take on these web services. Also developers must be prepared for cases when suddenly some service sends data that is not in acceptable format and they must solve the problems ASAP. This puts unexpected load on developers and they are not very happy with it because they can’t understand why they have to live with this horror if it is possible to fix. What to do if you see something like this? Well, explain the problem to customer and demand special tasks to project schedule to get this mess solved before going on with new developments. It is cheaper to solve the problems now that later.

Read the article
I have an apache process that takes 98% CPU. How can I find what apache call it runs?

- by Nir

As you can see below, a single Apache process hangs and takes large amount of CPU resources. How can I find what http call this apache process runs? PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12554 www-data 20 0 776m 285m 199m R 97 3.7 67:15.84 apache2 14580 www-data 20 0 748m 372m 314m S 4 4.8 0:13.60 apache2 12561 www-data 20 0 784m 416m 322m S 3 5.4 0:58.10 apache2 12592 www-data 20 0 785m 427m 332m S 2 5.6 0:57.06 apache2

Read the article
Database Security Events in April

- by Troy Kitch

Wed, Apr 18, Executive Oracle Database Security Round Table - Tampa, FL Tue, Apr 24, ISC(2) Leadership Regional Event Series - San Diego, CA April 24 - May 17, Independent Oracle Users Group Enterprise Data at Risk Seminar Series Tue, Apr 24 IOUG Enterprise Data at Risk Seminar Series - Toronto Wed, Apr 25 IOUG Enterprise Data at Risk Seminar Series - New York Thu, Apr 26 IOUG Enterprise Data at Risk Seminar Series - Boston Thu, Apr 26 ISC(2) Leadership Regional Event Series - San Jose, CA

Read the article
The SPARC SuperCluster

- by Karoly Vegh

Oracle has been providing a lead in the Engineered Systems business for quite a while now, in accordance with the motto "Hardware and Software Engineered to Work Together." Indeed it is hard to find a better definition of these systems. Allow me to summarize the idea. It is: Build a compute platform optimized to run your technologies Develop application aware, intelligently caching storage components Take an impressively fast network technology interconnecting it with the compute nodes Tune the application to scale with the nodes to yet unseen performance Reduce the amount of data moving via compression Provide this all in a pre-integrated single product with a single-pane management interface All these ideas have been around in IT for quite some time now. The real Oracle advantage is adding the last one to put these all together. Oracle has built quite a portfolio of Engineered Systems, to run its technologies - and run those like they never ran before. In this post I'll focus on one of them that serves as a consolidation demigod, a multi-purpose engineered system. As you probably have guessed, I am talking about the SPARC SuperCluster. It has many great features inherited from its predecessors, and it adds several new ones. Allow me to pick out and elaborate about some of the most interesting ones from a technological point of view. I. It is the SPARC SuperCluster T4-4. That is, as compute nodes, it includes SPARC T4-4 servers that we learned to appreciate and respect for their features: The SPARC T4 CPUs: Each CPU has 8 cores, each core runs 8 threads. The SPARC T4-4 servers have 4 sockets. That is, a single compute node can in parallel, simultaneously execute 256 threads. Now, a full-rack SPARC SuperCluster has 4 of these servers on board. Remember the keyword demigod. While retaining the forerunner SPARC T3's exceptional throughput, the SPARC T4 CPUs raise the bar with single performance too - a humble 5x better one than their ancestors. actually, the SPARC T4 CPU cores run in both single-threaded and multi-threaded mode, and switch between these two on-the-fly, fulfilling not only single-threaded OR multi-threaded applications' needs, but even mixed requirements (like in database workloads!). Data security, anyone? Every SPARC T4 CPU core has a built-in encryption engine, that is, encryption algorithms cast into silicon. A PCI controller right on the chip for customers who need I/O performance. Built-in, no-cost Virtualization: Oracle VM for SPARC (the former LDoms or Logical Domains) is not a server-emulation virtualization technology but rather a serverpartitioning one, the hypervisor runs in the server firmware, and all the VMs' HW resources (I/O, CPU, memory) are accessed natively, without performance overhead. This enables customers to run a number of Solaris 10 and Solaris 11 VMs separated, independent of each other within a physical server II. For Database performance, it includes Exadata Storage Cells - one of the main reasons why the Exadata Database Machine performs at diabolic speed. What makes them important? They provide DB backend storage for your Oracle Databases to run on the SPARC SuperCluster, that is what they are built and tuned for DB performance. These storage cells are SQL-aware. That is, if a SPARC T4 database compute node executes a query, it doesn't simply request tons of raw datablocks from the storage, filters the received data, and throws away most of it where the statement doesn't apply, but provides the SQL query to the storage node too. The storage cell software speaks SQL, that is, it is able to prefilter and through that transfer only the relevant data. With this, the traffic between database nodes and storage cells is reduced immensely. Less I/O is a good thing - as they say, all the CPUs of the world do one thing just as fast as any other - and that is waiting for I/O. They don't only pre-filter, but also provide data preprocessing features - e.g. if a DB-node requests an aggregate of data, they can calculate it, and handover only the results, not the whole set. Again, less data to transfer. They support the magical HCC, (Hybrid Columnar Compression). That is, data can be stored in a precompressed form on the storage. Less data to transfer. Of course one can't simply rely on disks for performance, there is Flash Storage included there for caching. III. The low latency, high-speed backbone network: InfiniBand, that interconnects all the members with: Real High Speed: 40 Gbit/s. Full Duplex, of course. Oh, and a really low latency. RDMA. Remote Direct Memory Access. This technology allows the DB nodes to do exactly that. Remotely, directly placing SQL commands into the Memory of the storage cells. Dodging all the network-stack bottlenecks, avoiding overhead, placing requests directly into the process queue. You can also run IP over InfiniBand if you please - that's the way the compute nodes can communicate with each other. IV. Including a general-purpose storage too: the ZFSSA, which is a unified storage, providing NAS and SAN access too, with the following features: NFS over RDMA over InfiniBand. Nothing is faster network-filesystem-wise. All the ZFS features onboard, hybrid storage pools, compression, deduplication, snapshot, replication, NFS and CIFS shares Storageheads in a HA-Cluster configuration providing availability of the data DTrace Live Analytics in a web-based Administration UI Being a general purpose application data storage for your non-database applications running on the SPARC SuperCluster over whichever protocol they prefer, easily replicating, snapshotting, cloning data for them. There's a lot of great technology included in Oracle's SPARC SuperCluster, we have talked its interior through. As for external scalability: you can start with a half- of full- rack SPARC SuperCluster, and scale out to several racks - that is, stacking not separate full-rack SPARC SuperClusters, but extending always one large instance of the size of several full-racks. Yes, over InfiniBand network. Add racks as you grow. What technologies shall run on it? SPARC SuperCluster is a general purpose scaleout consolidation/cloud environment. You can run Oracle Databases with RAC scaling, or Oracle Weblogic (end enjoy the SPARC T4's advantages to run Java). Remember, Oracle technologies have been integrated with the Oracle Engineered Systems - this is the Oracle on Oracle advantage. But you can run other software environments such as SAP if you please too. Run any application that runs on Oracle Solaris 10 or Solaris 11. Separate them in Virtual Machines, or even Oracle Solaris Zones, monitor and manage those from a central UI. Here the key takeaways once again: The SPARC SuperCluster: Is a pre-integrated Engineered System Contains SPARC T4-4 servers with built-in virtualization, cryptography, dynamic threading Contains the Exadata storage cells that intelligently offload the burden of the DB-nodes Contains a highly available ZFS Storage Appliance, that provides SAN/NAS storage in a unified way Combines all these elements over a high-speed, low-latency backbone network implemented with InfiniBand Can grow from a single half-rack to several full-rack size Supports the consolidation of hundreds of applications To summarize: All these technologies are great by themselves, but the real value is like in every other Oracle Engineered System: Integration. All these technologies are tuned to perform together. Together they are way more than the sum of all - and a careful and actually very time consuming integration process is necessary to orchestrate all these for performance. The SPARC SuperCluster's goal is to enable infrastructure operations and offer a pre-integrated solution that can be architected and delivered in hours instead of months of evaluations and tests. The tedious and most importantly time and resource consuming part of the work - testing and evaluating - has been done. Now go, provide services. -- charlie

Read the article
Hype and LINQ

- by Tony Davis

"Tired of querying in antiquated SQL?" I blinked in astonishment when I saw this headline on the LinqPad site. Warming to its theme, the site suggests that what we need is to "kiss goodbye to SSMS", and instead use LINQ, a modern query language! Elsewhere, there is an article entitled "Why LINQ beats SQL". The designers of LINQ, along with many DBAs, would, I'm sure, cringe with embarrassment at the suggestion that LINQ and SQL are, in any sense, competitive ways of doing the same thing. In fact what LINQ really is, at last, is an efficient, declarative language for C# and VB programmers to access or manipulate data in objects, local data stores, ORMs, web services, data repositories, and, yes, even relational databases. The fact is that LINQ is essentially declarative programming in a .NET language, and so in many ways encourages developers into a "SQL-like" mindset, even though they are not directly writing SQL. In place of imperative logic and loops, it uses various expressions, operators and declarative logic to build up an "expression tree" describing only what data is required, not the operations to be performed to get it. This expression tree is then parsed by the language compiler, and the result, when used against a relational database, is a SQL string that, while perhaps not always perfect, is often correctly parameterized and certainly no less "optimal" than what is achieved when a developer applies blunt, imperative logic to the SQL language. From a developer standpoint, it is a mistake to consider LINQ simply as a substitute means of querying SQL Server. The strength of LINQ is that that can be used to access any data source, for which a LINQ provider exists. Microsoft supplies built-in providers to access not just SQL Server, but also XML documents, .NET objects, ADO.NET datasets, and Entity Framework elements. LINQ-to-Objects is particularly interesting in that it allows a declarative means to access and manipulate arrays, collections and so on. Furthermore, as Michael Sorens points out in his excellent article on LINQ, there a whole host of third-party LINQ providers, that offers a simple way to get at data in Excel, Google, Flickr and much more, without having to learn a new interface or language. Of course, the need to be generic enough to deal with a range of data sources, from something as mundane as a text file to as esoteric as a relational database, means that LINQ is a compromise and so has inherent limitations. However, it is a powerful and beautifully compact language and one that, at least in its "query syntax" guise, is accessible to developers and DBAs alike. Perhaps there is still hope that LINQ can fulfill Phil Factor's lobster-induced fantasy of a language that will allow us to "treat all data objects, whether Word files, Excel files, XML, relational databases, text files, HTML files, registry files, LDAPs, Outlook and so on, in the same logical way, as linked databases, and extract the metadata, create the entities and relationships in the same way, and use the same SQL syntax to interrogate, create, read, write and update them." Cheers, Tony.

Read the article
Characteristics of a Web service that promote reusability and change

Characteristics of a Web service that promote reusability and change: Standardized Data Exchange Formats (XML, JSON) Standardized communication protocols (Soap, Rest) Promotes Loosely Coupled Systems Standardized Data Exchange Formats (XML, JSON) XML W3.org defines Extensible Markup Language (XML) as a simplistic text format derived from SGML. XML was designed to solve challenges found in large-scale electronic publishing. In addition, XML is playing an important role in the exchange of data primarily focusing on data exchange on the web. JSON JavaScript Object Notation (JSON) is a human-readable text-based standard designed for data interchange. This format is used for serializing and transmitting data over a network connection in a structured format. The primary use of JSON is to transmit data between a server and web application. JSON is an alternative to XML. Standardized communication protocols (Soap, Rest) Soap W3Scools.com defines SOAP as a simple XML-based protocol. This protocol lets applications exchange data over HTTP. SOAP provides a way to communicate between applications running on different operating systems, with different technologies and programming languages. Rest In 2007, Stefan Tilkov defines Representational State Transfer (REST) as a set of principles that outlines how Web standards are supposed to be used. Using REST in an application will ensure that it exploits the Web’s architecture to its benefit. Promotes Loosely Coupled Systems “Loose coupling as an approach to interconnecting the components in a system or network so that those components, also called elements, depend on each other to the least extent practicable. Coupling refers to the degree of direct knowledge that one element has of another.” (TechTarget.com, 2007) “Loosely coupled system can be easily broken down into definable elements. The extent of coupling in a system can be measured by mapping the maximum number of element changes that can occur without adverse effects. Examples of such changes include adding elements, removing elements, renaming elements, reconfiguring elements, modifying internal element characteristics and rearranging the way in which elements are interconnected.” (TechTarget.com, 2007) References: W3C. (2011). Extensible Markup Language (XML). Retrieved from W3.org: http://www.w3.org/XML/ W3Scools.com. (2011). SOAP Introduction. Retrieved from W3Scools.com: http://www.w3schools.com/soap/soap_intro.asp Tilkov, Stefan. (2007). A Brief Introduction to REST. Retrieved from Infoq.com: http://www.infoq.com/articles/rest-introduction TechTarget.com. (2011). loose coupling. Retrieved from TechTarget.com: http://searchnetworking.techtarget.com/definition/loose-coupling

Read the article
PHP may be executing as a "privileged" group and user, which could be a serious security vulnerability

- by Martin

I ran some security tests on a Ubuntu 12.04 Server, and I've got these warnings : PHP may be executing as a "privileged" group, which could be a serious security vulnerability. PHP may be executing as a "privileged" user, which could be a serious security vulnerability. In /etc/apache2/envvars, I have this: export APACHE_RUN_USER=www-data export APACHE_RUN_GROUP=www-data And all files in /var/www are having these user/group: www-data:www-data Am I setting this correctly? What should I do to fix this problem?

Read the article
Azure Diagnostics: The Bad, The Ugly, and a Better Way

- by jasont

If you’re a .Net web developer today, no doubt you’ve enjoyed watching Windows Azure grow up over the past couple of years. The platform has scaled, stabilized (mostly), and added on a slew of great (and sometimes overdue) features. What was once just an endpoint to host a solution, developers today have tremendous flexibility and options in the platform. Organizations are building new solutions and offerings on the platform, and others have, or are in the process of, migrating existing applications out of their own data centers into the Azure cloud. Whether new application development or migrating legacy, every development shop and IT organization needs to monitor their applications in the cloud, the same as they do on premises. Azure Diagnostics has some capabilities, but what I constantly hear from users is that it’s either (a) not enough, or (b) too cumbersome to set up. Today, Stackify is happy to announce that we fully support Azure deployments, just the same as your on-premises deployments. Let’s take a look below and compare and contrast the options. Azure Diagnostics Let’s crack open the Windows Azure documentation on Azure Diagnostics and see just how easy it is to use. The high level steps are: Step 1: Import the Diagnostics Oh, I’ve already deployed my app without the diagnostics module. Guess I can’t do anything until I do this and re-deploy. Step 2: Configure the Diagnostics (and multiple sub-steps) Do I want it all? Or just pieces of it? Whoops, forgot to include a specific performance counter, I guess I’ll have to deploy again. Wait a minute… I have to specifically code these performance counters into my role’s OnStart() method, compile and deploy again? And query and consume it myself? Step 3: (Optional) Permanently store diagnostic data Lucky for me, Azure storage has gotten pretty cheap. But how often should I move the data into storage? I want to see real-time data, so I guess that’s out now as well. Step 4: (Optional) View stored diagnostic data Optional? Of course I want to see it. Conveniently, Microsoft recommends 3 tools to do this with. Un-conveniently, none of these are web based and they all just give you access to raw data, and very little charting or real-time intelligence. Just….. data. Nevermind that one product seems to have gotten stale since a recent acquisition, and doesn’t even have screenshots! So, let’s summarize: lots of diagnostics data is available, but think realistically. Think Dev Ops. What happens when you are in the middle of a major production performance issue and you don’t have the diagnostics you need? You are redeploying an application (and thankfully you have a great branching strategy, so you feel perfectly safe just willy-nilly launching code into prod, don’t you?) to get data, then shipping it to storage, and then digging through that data to find a needle in a haystack. Would you like to be able to troubleshoot a performance issue in the middle of the night, or on a weekend, from your iPad or home computer’s web browser? Forget it: the best you get is this spark line in the Azure portal. If it’s real pointy, you probably have an issue; but since there is no alert based on a threshold your customers have likely already let you know. And high CPU, Memory, I/O, or Network doesn’t tell you anything about where the problem is. The Better Way – Stackify Stackify supports application and server monitoring in real time, all through a great web interface. All of the things that Azure Diagnostics provides, Stackify provides for your on-premises deployments, and you don’t need to know ahead of time that you’ll need it. It’s always there, it’s always on. Azure deployments are essentially no different than on-premises. It’s a Windows Server (or Linux) in the cloud. It’s behind a different firewall than your corporate servers. That’s it. Stackify can provide the same powerful tools to your Azure deployments in two simple steps. Step 1 Add a startup task to your web or worker role and deploy. If you can’t deploy and need it right now, no worries! Remote Desktop to the Azure instance and you can execute a Powershell script to download / install Stackify. Step 2 Log in to your account at www.stackify.com and begin monitoring as much as you want, as often as you want and see the results instantly. WMI? It’s there Event Viewer? You’ve got it. File System Access? Yes, please! Would love to make sure my web.config is correct. IIS / App Pool Info? Yep. You can even restart it. Running Services? All of them. Start and Stop them to your heart’s content. SQL Database access? You bet’cha. Alerts and Notification? Of course! You should know before your customers let you know. … and so much more. Conclusion Microsoft has shown, consistently, that they love developers, developers, developers. What every developer needs to realize from this is that they’ve given you a canvas, which is exactly what Azure is. It’s great infrastructure that is readily available, easy to manage, and fairly cost effective. However, the tooling is your responsibility. What you get, at best, is bare bones. App and server diagnostics should be available when you need them. While we, as developers, try to plan for and think of everything ahead of time, there will come times where we need to get data that just isn’t available. And having to go through a lot of cumbersome steps to get that data, and then have to find a friendlier way to consume it…. well, that just doesn’t make a lot of sense to me. I’d rather spend my time writing and developing features and completing bug fixes for my applications, than to be writing code to monitor and diagnose.

Read the article
Is it important for reflection-based serialization maintain consistent field ordering?

- by Matchlighter

I just finished writing a packet builder that dynamically loads data into a data stream for eventual network transmission. Each builder operates by finding fields in a given class (and its superclasses) that are marked with a @data annotation. When I finishing my implementation, I remembered that getFields() does not return results in any specific order. Should reflection-based methods for serializing arbitrary data (like my packets) attempt to preserve a specific field ordering (such as alphabetical), and if so, how?

Read the article
The need for user-defined index types

- by Greg Low

Since the removal of the 8KB limit on serialization, the ability to define new data types using SQL CLR integration is now almost at a usable level, apart from one key omission: indexes. We have no ability to create our own types of index to support our data types. As a good example of this, consider that when Microsoft introduced the geometry and geography (spatial) data types, they did so as system CLR data types but also needed to introduce a spatial index as a new type of index. Those of us that...(read more)

Read the article
Isn't MVC anti OOP?

- by m3th0dman

The main idea behind OOP is to unify data and behavior in a single entity - the object. In procedural programming there is data and separately algorithms modifying the data. In the Model-View-Controller pattern the data and the logic/algorithms are placed in distinct entities, the model and the controller respectively. In an equivalent OOP approach shouldn't the model and the controller be placed in the same logical entity?

Read the article
logrotate isn't rotating a particular log file (and i think it should be)

- by Max Williams

Hi all. For a particular app, i have log files in two places. One of the places has just one log file that i want to use with logrotate, for the other location i want to use logrotate on all log files in that folder. I've set up an entry called millionaire-staging in /etc/logrotate.d and have been testing it by calling logrotate -f millionaire-staging. Here's my entry: #/etc/logrotate.d/millionaire-staging compress rotate 1000 dateext missingok sharedscripts copytruncate /var/www/apps/test.millionaire/log/staging.log { weekly } /var/www/apps/test.millionaire/shared/log/*log { size 40M } So, for the first folder, i want to rotate weekly (this seems to have worked fine). For the other, i want to rotate only when the log files get bigger than 40 meg. When i look in that folder (using the same locator as in the logrotate config), i can see a file in there that's 54M and which hasn't been rotated: $ ls -lh /var/www/apps/test.millionaire/shared/log/*log -rw-r--r-- 1 www-data root 33M 2010-12-29 15:00 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.access-log -rw-r--r-- 1 www-data root 54M 2010-09-10 16:57 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.debug-log -rw-r--r-- 1 www-data root 53K 2010-12-14 15:48 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.error-log -rw-r--r-- 1 www-data root 3.8M 2010-12-29 14:30 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.ssl.access-log -rw-r--r-- 1 www-data root 16K 2010-12-17 15:00 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.ssl.error-log -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 /var/www/apps/test.millionaire/shared/log/unicorn.stderr.log -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 /var/www/apps/test.millionaire/shared/log/unicorn.stdout.log Some of the other log files in that folder have been rotated though: $ ls -lh /var/www/apps/test.millionaire/shared/log total 91M -rw-r--r-- 1 www-data root 33M 2010-12-29 15:05 test.millionaire.charanga.com.access-log -rw-r--r-- 1 www-data root 54M 2010-09-10 16:57 test.millionaire.charanga.com.debug-log -rw-r--r-- 1 www-data root 53K 2010-12-14 15:48 test.millionaire.charanga.com.error-log -rw-r--r-- 1 www-data root 3.8M 2010-12-29 14:30 test.millionaire.charanga.com.ssl.access-log -rw-r--r-- 1 www-data root 16K 2010-12-17 15:00 test.millionaire.charanga.com.ssl.error-log -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 unicorn.stderr.log -rw-r--r-- 1 deploy deploy 41K 2010-12-29 11:03 unicorn.stderr.log-20101229.gz -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 unicorn.stdout.log -rw-r--r-- 1 deploy deploy 1.1K 2010-10-15 11:05 unicorn.stdout.log-20101229.gz I think what might have happened is that i first ran this config with a pattern matching *.log, and that means it only rotated the two files that ended in .log (as opposed to -log). Then, when i changed the config and ran it again, it won't do any more since it think's its already had its weekly run, or something. Can anyone see what i'm doing wrong? Is it to do with those top folders being owned by root rather than deploy do you think? thanks, max

Read the article
logrotate isn't rotating a particular log file (and i think it should be)

- by Max Williams

Hi all. For a particular app, i have log files in two places. One of the places has just one log file that i want to use with logrotate, for the other location i want to use logrotate on all log files in that folder. I've set up an entry called millionaire-staging in /etc/logrotate.d and have been testing it by calling logrotate -f millionaire-staging. Here's my entry: #/etc/logrotate.d/millionaire-staging compress rotate 1000 dateext missingok sharedscripts copytruncate /var/www/apps/test.millionaire/log/staging.log { weekly } /var/www/apps/test.millionaire/shared/log/*log { size 40M } So, for the first folder, i want to rotate weekly (this seems to have worked fine). For the other, i want to rotate only when the log files get bigger than 40 meg. When i look in that folder (using the same locator as in the logrotate config), i can see a file in there that's 54M and which hasn't been rotated: $ ls -lh /var/www/apps/test.millionaire/shared/log/*log -rw-r--r-- 1 www-data root 33M 2010-12-29 15:00 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.access-log -rw-r--r-- 1 www-data root 54M 2010-09-10 16:57 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.debug-log -rw-r--r-- 1 www-data root 53K 2010-12-14 15:48 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.error-log -rw-r--r-- 1 www-data root 3.8M 2010-12-29 14:30 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.ssl.access-log -rw-r--r-- 1 www-data root 16K 2010-12-17 15:00 /var/www/apps/test.millionaire/shared/log/test.millionaire.charanga.com.ssl.error-log -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 /var/www/apps/test.millionaire/shared/log/unicorn.stderr.log -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 /var/www/apps/test.millionaire/shared/log/unicorn.stdout.log Some of the other log files in that folder have been rotated though: $ ls -lh /var/www/apps/test.millionaire/shared/log total 91M -rw-r--r-- 1 www-data root 33M 2010-12-29 15:05 test.millionaire.charanga.com.access-log -rw-r--r-- 1 www-data root 54M 2010-09-10 16:57 test.millionaire.charanga.com.debug-log -rw-r--r-- 1 www-data root 53K 2010-12-14 15:48 test.millionaire.charanga.com.error-log -rw-r--r-- 1 www-data root 3.8M 2010-12-29 14:30 test.millionaire.charanga.com.ssl.access-log -rw-r--r-- 1 www-data root 16K 2010-12-17 15:00 test.millionaire.charanga.com.ssl.error-log -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 unicorn.stderr.log -rw-r--r-- 1 deploy deploy 41K 2010-12-29 11:03 unicorn.stderr.log-20101229.gz -rw-r--r-- 1 deploy deploy 0 2010-12-29 14:49 unicorn.stdout.log -rw-r--r-- 1 deploy deploy 1.1K 2010-10-15 11:05 unicorn.stdout.log-20101229.gz I think what might have happened is that i first ran this config with a pattern matching *.log, and that means it only rotated the two files that ended in .log (as opposed to -log). Then, when i changed the config and ran it again, it won't do any more since it think's its already had its weekly run, or something. Can anyone see what i'm doing wrong? Is it to do with those top folders being owned by root rather than deploy do you think? thanks, max

Read the article
laptop crashed: why?

- by sds

my linux (ubuntu 12.04) laptop crashed, and I am trying to figure out why. # last sds pts/4 :0 Tue Sep 4 10:01 still logged in sds pts/3 :0 Tue Sep 4 10:00 still logged in reboot system boot 3.2.0-29-generic Tue Sep 4 09:43 - 11:23 (01:40) sds pts/8 :0 Mon Sep 3 14:23 - crash (19:19) this seems to indicate a crash at 09:42 (= 14:23+19:19). as per another question, I looked at /var/log: auth.log: Sep 4 09:17:02 t520sds CRON[32744]: pam_unix(cron:session): session closed for user root Sep 4 09:43:17 t520sds lightdm: pam_unix(lightdm:session): session opened for user lightdm by (uid=0) no messages file syslog: Sep 4 09:24:19 t520sds kernel: [219104.819975] CPU0: Package power limit normal Sep 4 09:43:16 t520sds kernel: imklog 5.8.6, log source = /proc/kmsg started. kern.log: Sep 4 09:24:19 t520sds kernel: [219104.819969] CPU1: Package power limit normal Sep 4 09:24:19 t520sds kernel: [219104.819971] CPU2: Package power limit normal Sep 4 09:24:19 t520sds kernel: [219104.819974] CPU3: Package power limit normal Sep 4 09:24:19 t520sds kernel: [219104.819975] CPU0: Package power limit normal Sep 4 09:43:16 t520sds kernel: imklog 5.8.6, log source = /proc/kmsg started. Sep 4 09:43:16 t520sds kernel: [ 0.000000] Initializing cgroup subsys cpuset Sep 4 09:43:16 t520sds kernel: [ 0.000000] Initializing cgroup subsys cpu I had a computation running until 9:24, but the system crashed 18 minutes later! kern.log has many pages of these: Sep 4 09:43:16 t520sds kernel: [ 0.000000] total RAM covered: 8086M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 64K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 128K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 256K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 512K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 1M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 2M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 4M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 8M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 16M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] *BAD*gran_size: 64K chunk_size: 32M num_reg: 10 lose cover RAM: -16M Sep 4 09:43:16 t520sds kernel: [ 0.000000] *BAD*gran_size: 64K chunk_size: 64M num_reg: 10 lose cover RAM: -16M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 128M num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 256M num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 512M num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 1G num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] *BAD*gran_size: 64K chunk_size: 2G num_reg: 10 lose cover RAM: -1G does this mean that my RAM is bad?! it also says Sep 4 09:43:16 t520sds kernel: [ 2.944123] EXT4-fs (sda1): INFO: recovery required on readonly filesystem Sep 4 09:43:16 t520sds kernel: [ 2.944126] EXT4-fs (sda1): write access will be enabled during recovery Sep 4 09:43:16 t520sds kernel: [ 3.088001] firewire_core: created device fw0: GUID f0def1ff8fbd7dff, S400 Sep 4 09:43:16 t520sds kernel: [ 8.929243] EXT4-fs (sda1): orphan cleanup on readonly fs Sep 4 09:43:16 t520sds kernel: [ 8.929249] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 658984 ... Sep 4 09:43:16 t520sds kernel: [ 9.343266] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 525343 Sep 4 09:43:16 t520sds kernel: [ 9.343270] EXT4-fs (sda1): 56 orphan inodes deleted Sep 4 09:43:16 t520sds kernel: [ 9.343271] EXT4-fs (sda1): recovery complete Sep 4 09:43:16 t520sds kernel: [ 9.645799] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) does this mean my HD is bad? As per FaultyHardware, I tried smartctl -l selftest, which uncovered no errors: smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-30-generic] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: Seagate Momentus 7200.4 Device Model: ST9500420AS Serial Number: 5VJE81YK LU WWN Device Id: 5 000c50 0440defe3 Firmware Version: 0003LVM1 User Capacity: 500,107,862,016 bytes [500 GB] Sector Size: 512 bytes logical/physical Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 4 Local Time is: Mon Sep 10 16:40:04 2012 EDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 0) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 109) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x103b) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 117 099 034 Pre-fail Always - 162843537 3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 571 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 17210154023 9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 174362787320258 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 571 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 188 Command_Timeout 0x0032 100 100 000 Old_age Always - 1 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0 190 Airflow_Temperature_Cel 0x0022 061 043 045 Old_age Always In_the_past 39 (0 11 44 26) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 84 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 20 193 Load_Cycle_Count 0x0032 099 099 000 Old_age Always - 2434 194 Temperature_Celsius 0x0022 039 057 000 Old_age Always - 39 (0 15 0 0) 195 Hardware_ECC_Recovered 0x001a 041 041 000 Old_age Always - 162843537 196 Reallocated_Event_Count 0x000f 095 095 030 Pre-fail Always - 4540 (61955, 0) 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 4545 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Googling for the messages proved inconclusive, I can't even figure out whether the messages are routine or catastrophic. So, what do I do now?

Read the article
Building an MVC application using QuickBooks

- by dataintegration

RSSBus ADO.NET Providers can be used from many tools and IDEs. In this article we show how to connect to QuickBooks from an MVC3 project using the RSSBus ADO.NET Provider for QuickBooks. Although this example uses the QuickBooks Data Provider, the same process can be used with any of our ADO.NET Providers. Creating the Model Step 1: Download and install the QuickBooks Data Provider from RSSBus. Step 2: Create a new MVC3 project in Visual Studio. Add a data model to the Models folder using the ADO.NET Entity Data Model wizard. Step 3: Create a new RSSBus QuickBooks Data Source by clicking "New Connection", specify the connection string options, and click Next. Step 4: Select all the tables and views you need, and click Finish to create the data model. Step 5: Right click on the entity diagram and select 'Add Code Generation Item'. Choose the 'ADO.NET DbContext Generator'. Creating the Controller and the Views Step 6: Add a new controller to the Controllers folder. Give it a meaningful name, such as ReceivePaymentsController. Also, make sure the template selected is 'Controller with empty read/write actions'. Before adding new methods to the Controller, create views for your model. We will add the List, Create, and Delete views. Step 7: Right click on the Views folder and go to Add -> View. Here create a new view for each: List, Create, and Delete templates. Make sure to also associate your Model with the new views. Step 10: Now that the views are ready, go back and edit the RecievePayment controller. Update your code to handle the Index, Create, and Delete methods. Sample Project We are including a sample project that shows how to use the QuickBooks Data Provider in an MVC3 application. You may download the C# project here or download the VB.NET project here. You will also need to install the QuickBooks ADO.NET Data Provider to run the demo. You can download a free trial here. To use this demo, you will also need to modify the connection string in the 'web.config'.

Read the article
Function Folding in #PowerQuery

- by Darren Gosbell

Originally posted on: http://geekswithblogs.net/darrengosbell/archive/2014/05/16/function-folding-in-powerquery.aspxLooking at a typical Power Query query you will noticed that it's made up of a number of small steps. As an example take a look at the query I did in my previous post about joining a fact table to a slowly changing dimension. It was roughly built up of the following steps: Get all records from the fact table Get all records from the dimension table do an outer join between these two tables on the business key (resulting in an increase in the row count as there are multiple records in the dimension table for each business key) Filter out the excess rows introduced in step 3 remove extra columns that are not required in the final result set. If Power Query was to execute a query like this literally, following the same steps in the same order it would not be overly efficient. Particularly if your two source tables were quite large. However Power Query has a feature called function folding where it can take a number of these small steps and push them down to the data source. The degree of function folding that can be performed depends on the data source, As you might expect, relational data sources like SQL Server, Oracle and Teradata support folding, but so do some of the other sources like OData, Exchange and Active Directory. To explore how this works I took the data from my previous post and loaded it into a SQL database. Then I converted my Power Query expression to source it's data from that database. Below is the resulting Power Query which I edited by hand so that the whole thing can be shown in a single expression: let SqlSource = Sql.Database("localhost", "PowerQueryTest"), BU = SqlSource{[Schema="dbo",Item="BU"]}[Data], Fact = SqlSource{[Schema="dbo",Item="fact"]}[Data], Source = Table.NestedJoin(Fact,{"BU_Code"},BU,{"BU_Code"},"NewColumn"), LeftJoin = Table.ExpandTableColumn(Source, "NewColumn" , {"BU_Key", "StartDate", "EndDate"} , {"BU_Key", "StartDate", "EndDate"}), BetweenFilter = Table.SelectRows(LeftJoin, each (([Date] >= [StartDate]) and ([Date] <= [EndDate])) ), RemovedColumns = Table.RemoveColumns(BetweenFilter,{"StartDate", "EndDate"}) in RemovedColumns If the above query was run step by step in a literal fashion you would expect it to run two queries against the SQL database doing "SELECT * …" from both tables. However a profiler trace shows just the following single SQL query: select [_].[BU_Code], [_].[Date], [_].[Amount], [_].[BU_Key] from ( select [$Outer].[BU_Code], [$Outer].[Date], [$Outer].[Amount], [$Inner].[BU_Key], [$Inner].[StartDate], [$Inner].[EndDate] from [dbo].[fact] as [$Outer] left outer join ( select [_].[BU_Key] as [BU_Key], [_].[BU_Code] as [BU_Code2], [_].[BU_Name] as [BU_Name], [_].[StartDate] as [StartDate], [_].[EndDate] as [EndDate] from [dbo].[BU] as [_] ) as [$Inner] on ([$Outer].[BU_Code] = [$Inner].[BU_Code2] or [$Outer].[BU_Code] is null and [$Inner].[BU_Code2] is null) ) as [_] where [_].[Date] >= [_].[StartDate] and [_].[Date] <= [_].[EndDate] The resulting query is a little strange, you can probably tell that it was generated programmatically. But if you look closely you'll notice that every single part of the Power Query formula has been pushed down to SQL Server. Power Query itself ends up just constructing the query and passing the results back to Excel, it does not do any of the data transformation steps itself. So now you can feel a bit more comfortable showing Power Query to your less technical Colleagues knowing that the tool will do it's best fold all the small steps in Power Query down the most efficient query that it can against the source systems.

Read the article
parallel_for_each from amp.h – part 1

- by Daniel Moth

This posts assumes that you've read my other C++ AMP posts on index<N> and extent<N>, as well as about the restrict modifier. It also assumes you are familiar with C++ lambdas (if not, follow my links to C++ documentation). Basic structure and parameters Now we are ready for part 1 of the description of the new overload for the concurrency::parallel_for_each function. The basic new parallel_for_each method signature returns void and accepts two parameters: a grid<N> (think of it as an alias to extent) a restrict(direct3d) lambda, whose signature is such that it returns void and accepts an index of the same rank as the grid So it looks something like this (with generous returns for more palatable formatting) assuming we are dealing with a 2-dimensional space: // some_code_A parallel_for_each( g, // g is of type grid<2> [ ](index<2> idx) restrict(direct3d) { // kernel code } ); // some_code_B The parallel_for_each will execute the body of the lambda (which must have the restrict modifier), on the GPU. We also call the lambda body the "kernel". The kernel will be executed multiple times, once per scheduled GPU thread. The only difference in each execution is the value of the index object (aka as the GPU thread ID in this context) that gets passed to your kernel code. The number of GPU threads (and the values of each index) is determined by the grid object you pass, as described next. You know that grid is simply a wrapper on extent. In this context, one way to think about it is that the extent generates a number of index objects. So for the example above, if your grid was setup by some_code_A as follows: extent<2> e(2,3); grid<2> g(e); ...then given that: e.size()==6, e[0]==2, and e[1]=3 ...the six index<2> objects it generates (and hence the values that your lambda would receive) are: (0,0) (1,0) (0,1) (1,1) (0,2) (1,2) So what the above means is that the lambda body with the algorithm that you wrote will get executed 6 times and the index<2> object you receive each time will have one of the values just listed above (of course, each one will only appear once, the order is indeterminate, and they are likely to call your code at the same exact time). Obviously, in real GPU programming, you'd typically be scheduling thousands if not millions of threads, not just 6. If you've been following along you should be thinking: "that is all fine and makes sense, but what can I do in the kernel since I passed nothing else meaningful to it, and it is not returning any values out to me?" Passing data in and out It is a good question, and in data parallel algorithms indeed you typically want to pass some data in, perform some operation, and then typically return some results out. The way you pass data into the kernel, is by capturing variables in the lambda (again, if you are not familiar with them, follow the links about C++ lambdas), and the way you use data after the kernel is done executing is simply by using those same variables. In the example above, the lambda was written in a fairly useless way with an empty capture list: [ ](index<2> idx) restrict(direct3d), where the empty square brackets means that no variables were captured. If instead I write it like this [&](index<2> idx) restrict(direct3d), then all variables in the some_code_A region are made available to the lambda by reference, but as soon as I try to use any of those variables in the lambda, I will receive a compiler error. This has to do with one of the direct3d restrictions, where only one type can be capture by reference: objects of the new concurrency::array class that I'll introduce in the next post (suffice for now to think of it as a container of data). If I write the lambda line like this [=](index<2> idx) restrict(direct3d), all variables in the some_code_A region are made available to the lambda by value. This works for some types (e.g. an integer), but not for all, as per the restrictions for direct3d. In particular, no useful data classes work except for one new type we introduce with C++ AMP: objects of the new concurrency::array_view class, that I'll introduce in the post after next. Also note that if you capture some variable by value, you could use it as input to your algorithm, but you wouldn’t be able to observe changes to it after the parallel_for_each call (e.g. in some_code_B region since it was passed by value) – the exception to this rule is the array_view since (as we'll see in a future post) it is a wrapper for data, not a container. Finally, for completeness, you can write your lambda, e.g. like this [av, &ar](index<2> idx) restrict(direct3d) where av is a variable of type array_view and ar is a variable of type array - the point being you can be very specific about what variables you capture and how. So it looks like from a large data perspective you can only capture array and array_view objects in the lambda (that is how you pass data to your kernel) and then use the many threads that call your code (each with a unique index) to perform some operation. You can also capture some limited types by value, as input only. When the last thread completes execution of your lambda, the data in the array_view or array are ready to be used in the some_code_B region. We'll talk more about all this in future posts… (a)synchronous Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality. That's all for now, we'll revisit the parallel_for_each description, once we introduce properly array and array_view – coming next. Comments about this post by Daniel Moth welcome at the original blog.

Read the article

Search Results

Search found 58440 results on 2338 pages for 'data cleansing'.

Page 527/2338 | < Previous Page | 523 524 525 526 527 528 529 530 531 532 533 534 | Next Page >

- by pinaldave

- by Pinal Dave

- by user9154181

- by eric.maurice

- by cwarticki

- by danx

- by AtulThakor

- by BuckWoody

- by DigiMortal

- by Nir

- by Troy Kitch

- by Karoly Vegh

- by Tony Davis

- by Martin

- by jasont

- by Matchlighter

- by Greg Low

- by m3th0dman

- by Max Williams

- by Max Williams

- by sds

- by dataintegration

- by Darren Gosbell

- by Daniel Moth

< Previous Page | 523 524 525 526 527 528 529 530 531 532 533 534 | Next Page >