Search Results

Search found 6634 results on 266 pages for 'fast fashion'.

Page 118/266 | < Previous Page | 114 115 116 117 118 119 120 121 122 123 124 125 | Next Page >

Defining Discovery: Core Concepts

- by Joe Lamantia

Discovery tools have had a referencable working definition since at least 2001, when Ben Shneiderman published 'Inventing Discovery Tools: Combining Information Visualization with Data Mining'. Dr. Shneiderman suggested the combination of the two distinct fields of data mining and information visualization could manifest as new category of tools for discovery, an understanding that remains essentially unaltered over ten years later. An industry analyst report titled Visual Discovery Tools: Market Segmentation and Product Positioning from March of this year, for example, reads, "Visual discovery tools are designed for visual data exploration, analysis and lightweight data mining." Tools should follow from the activities people undertake (a foundational tenet of activity centered design), however, and Dr. Shneiderman does not in fact describe or define discovery activity or capability. As I read it, discovery is assumed to be the implied sum of the separate fields of visualization and data mining as they were then understood. As a working definition that catalyzes a field of product prototyping, it's adequate in the short term. In the long term, it makes the boundaries of discovery both derived and temporary, and leaves a substantial gap in the landscape of core concepts around discovery, making consensus on the nature of most aspects of discovery difficult or impossible to reach. I think this definitional gap is a major reason that discovery is still an ambiguous product landscape. To help close that gap, I'm suggesting a few definitions of four core aspects of discovery. These come out of our sustained research into discovery needs and practices, and have the goal of clarifying the relationship between discvoery and other analytical categories. They are suggested, but should be internally coherent and consistent. Discovery activity is: "Purposeful sense making activity that intends to arrive at new insights and understanding through exploration and analysis (and for these we have specific defintions as well) of all types and sources of data." Discovery capability is: "The ability of people and organizations to purposefully realize valuable insights that address the full spectrum of business questions and problems by engaging effectively with all types and sources of data." Discovery tools: "Enhance individual and organizational ability to realize novel insights by augmenting and accelerating human sense making to allow engagement with all types of data at all useful scales." Discovery environments: "Enable organizations to undertake effective discovery efforts for all business purposes and perspectives, in an empirical and cooperative fashion." Note: applicability to a world of Big data is assumed - thus the refs to all scales / types / sources - rather than stated explicitly. I like that Big Data doesn't have to be written into this core set of definitions, b/c I think it's a transitional label - the new version of Web 2.0 - and goes away over time. References and Resources: Inventing Discovery Tools Visual Discovery Tools: Market Segmentation and Product Positioning Logic versus usage: the case for activity-centered design A Taxonomy of Enterprise Search and Discovery

Read the article
Windows Azure Use Case: High-Performance Computing (HPC)

- by BuckWoody

This is one in a series of posts on when and where to use a distributed architecture design in your organization's computing needs. You can find the main post here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx Description: High-Performance Computing (also called Technical Computing) at its most simplistic is a layout of computer workloads where a “head node” accepts work requests, and parses them out to “worker nodes'”. This is useful in cases such as scientific simulations, drug research, MatLab work and where other large compute loads are required. It’s not the immediate-result type computing many are used to; instead, a “job” or group of work requests is sent to a cluster of computers and the worker nodes work on individual parts of the calculations and return the work to the scheduler or head node for the requestor in a batch-request fashion. This is typical to the way that many mainframe computing use-cases work. You can use commodity-based computers to create an HPC Cluster, such as the Linux application called Beowulf, and Microsoft has a server product for HPC using standard computers, called the Windows Compute Cluster that you can read more about here. The issue with HPC (from any vendor) that some organization have is the amount of compute nodes they need. Having too many results in excess infrastructure, including computers, buildings, storage, heat and so on. Having too few means that the work is slower, and takes longer to return a result to the calling application. Unless there is a consistent level of work requested, predicting the number of nodes is problematic. Implementation: Recently, Microsoft announced an internal partnership between the HPC group (Now called the Technical Computing Group) and Windows Azure. You now have two options for implementing an HPC environment using Windows. You can extend the current infrastructure you have for HPC by adding in Compute Nodes in Windows Azure, using a “Broker Node”. You can then purchase time for adding machines, and then stop paying for them when the work is completed. This is a common pattern in groups that have a constant need for HPC, but need to “burst” that load count under certain conditions. The second option is to install only a Head Node and a Broker Node onsite, and host all Compute Nodes in Windows Azure. This is often the pattern for organizations that need HPC on a scheduled and periodic basis, such as financial analysis or actuarial table calculations. References: Blog entry on Hybrid HPC with Windows Azure: http://blogs.msdn.com/b/ignitionshowcase/archive/2010/12/13/high-performance-computing-on-premise-and-in-the-windows-azure-cloud.aspx Links for further research on HPC, includes Windows Azure information: http://blogs.msdn.com/b/ncdevguy/archive/2011/02/16/handy-links-for-hpc-and-azure.aspx

Read the article
SQL SERVER – Sharing your ETL Resources Across Applications with Ease

- by pinaldave

Frequently an organization will find that the same resources are used in multiple ETL applications, for example, the same database, general purpose processing logic, or file system locations. Creating an easy way to reuse these resources across multiple applications would increase efficiency and reduce errors. Moreover, not every ETL developer has the same skill set, and it is likely that one developer will be more adept at writing code while another is more comfortable configuring database connections. Real productivity gains will come when these developers are able to work independently while still making their work available to others assigned to the same project. These are the benefits of a centralized version control system. Of course, most version control systems could be used to store and serve files, but the real need is to store and serve entire ETL applications so that each developer’s ongoing work can immediately benefit from another developer’s completed work. In other words, the version control system needs to be tightly integrated with the tools used to develop the ETL application. The following screen shot shows such a tool. Desktop ETL tool that tightly integrates with a central version control system Developers can checkout or commit entire projects or just a single artifact. Each artifact may be managed independently so if you need to go back to an earlier version of one artifact, changes you may have made to other artifacts are not lost. By being tightly integrated into the graphical environment used to create and edit the project artifacts, it is extremely easy and straight-forward to move your files to and from the version control system and there is no dependency on another vendor’s version control system. The built in version control system is optimized for managing the artifacts of ETL applications. It is equally important that the version control system supports all of the actions one typically performs such as rollbacks, locking and unlocking of files, and the ability to resolve conflicts. Note that this particular ETL tool also has the capability to switch back and forth between multiple version control systems. It also needs to be easy to determine the status of an artifact. Not just that it has been committed or modified, but when and by whom. Generally you must query the version control system for this information, but having it displayed within the development environment is more desirable. Who’s ETL tool works in this fashion? Last month I mentioned the data integration solution offered by expressor software. The version control features I described in this post are all available in their just released expressor 3.1 Standard Edition through the integration of their expressor Studio development environment with a centralized metadata repository and version control system. You can download their Studio application, which is free, or evaluate the full Standard Edition on your own hardware. It may be worth your time. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
VSDB to SSDT part 4 : Redistributable database deployment package with SqlPackage.exe

- by Etienne Giust

The goal here is to use SSDT SqlPackage to deploy the output of a Visual Studio 2012 Database project… a bit in the same fashion that was detailed here : http://geekswithblogs.net/80n/archive/2012/09/12/vsdb-to-ssdt-part-3--command-line-deployment-with-sqlpackage.exe.aspx The difference is we want to do it on an environment where Visual Studio 2012 and SSDT are not installed. This might be the case of your Production server. Package structure So, to get started you need to create a folder named “DeploymentSSDTRedistributable”. This folder will have the following structure : The dacpac and dll files are the outputs of your Visual Studio 2012 Database project. If your database project references another database project, you need to put their dacpac and dll here too, otherwise deployment will not work. The publish.xml file is the publish configuration suitable for your target environment. It holds connexion strings, SQLVARS parameters and deployment options. Review it carefully. The SqlDacRuntime folder (an arbitrary chosen name) will hold the SqlPackage executable and supporting libraries Contents of the SqlDacRuntime folder Here is what you need to put in the SqlDacRuntime folder : You will be able to find these files in the following locations, on a machine with Visual Studio 2012 Ultimate installed : C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin : SqlPackage.exe Microsoft.Data.Tools.Schema.Sql.dll Microsoft.Data.Tools.Utilities.dll Microsoft.SqlServer.Dac.dll C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.SqlServer.TransactSql.ScriptDom\v4.0_11.0.0.0__89845dcd8080cc91 Microsoft.SqlServer.TransactSql.ScriptDom.dll Deploying Now take your DeploymentSSDTRedistributable deployment package to your remote machine. In a standard command window, place yourself inside the DeploymentSSDTRedistributable folder. You can first perform a check of what will be updated in the target database. The DeployReport task of SqlPackage.exe will help you do that. The following command will output an xml of the changes: "SqlDacRuntime/SqlPackage.exe" /Action:DeployReport /SourceFile:./Our.Database.dacpac /Profile:./Release.publish.xml /OutputPath:./ChangesToDeploy.xml You might get some warnings on Log and Data file like I did. You can ignore them. Also, the tool is warning about data loss when removing a column from a table. By default, the publish.xml options will prevent you from deploying when data loss is occuring (see the BlockOnPossibleDataLoss inside the publish.xml file). Before actual deployment, take time to carefully review the changes to be applied in the ChangesToDeploy.xml file. When you are satisfied, you can deploy your changes with the following command : "SqlDacRuntime/SqlPackage.exe" /Action:Publish /SourceFile:./Our.Database.dacpac /Profile:./Release.publish.xml Et voilà ! Your dacpac file has been deployed to your database. I’ve been testing this on a SQL 2008 Server (not R2) but it should work on 2005, 2008 R2 and 2012 as well. Many thanks to Anuj Chaudhary for his article on the subject : http://www.anujchaudhary.com/2012/08/sqlpackageexe-automating-ssdt-deployment.html

Read the article
Algorithm for spreading labels in a visually appealing and intuitive way

- by mac

Short version Is there a design pattern for distributing vehicle labels in a non-overlapping fashion, placing them as close as possible to the vehicle they refer to? If not, is any of the method I suggest viable? How would you implement this yourself? Extended version In the game I'm writing I have a bird-eye vision of my airborne vehicles. I also have next to each of the vehicles a small label with key-data about the vehicle. This is an actual screenshot: Now, since the vehicles could be flying at different altitudes, their icons could overlap. However I would like to never have their labels overlapping (or a label from vehicle 'A' overlap the icon of vehicle 'B'). Currently, I can detect collisions between sprites and I simply push away the offending label in a direction opposite to the otherwise-overlapped sprite. This works in most situations, but when the airspace get crowded, the label can get pushed very far away from its vehicle, even if there was an alternate "smarter" alternative. For example I get: B - label A -----------label C - label where it would be better (= label closer to the vehicle) to get: B - label label - A C - label EDIT: It also has to be considered that beside the overlapping vehicles case, there might be other configurations in which vehicles'labels could overlap (the ASCII-art examples show for example three very close vehicles in which the label of A would overlap the icon of B and C). I have two ideas on how to improve the present situation, but before spending time implementing them, I thought to turn to the community for advice (after all it seems like a "common enough problem" that a design pattern for it could exist). For what it's worth, here's the two ideas I was thinking to: Slot-isation of label space In this scenario I would divide all the screen into "slots" for the labels. Then, each vehicle would always have its label placed in the closest empty one (empty = no other sprites at that location. Spiralling search From the location of the vehicle on the screen, I would try to place the label at increasing angles and then at increasing radiuses, until a non-overlapping location is found. Something down the line of: try 0°, 10px try 10°, 10px try 20°, 10px ... try 350°, 10px try 0°, 20px try 10°, 20px ...

Read the article
PTLQueue : a scalable bounded-capacity MPMC queue

- by Dave

Title: Fast concurrent MPMC queue -- I've used the following concurrent queue algorithm enough that it warrants a blog entry. I'll sketch out the design of a fast and scalable multiple-producer multiple-consumer (MPSC) concurrent queue called PTLQueue. The queue has bounded capacity and is implemented via a circular array. Bounded capacity can be a useful property if there's a mismatch between producer rates and consumer rates where an unbounded queue might otherwise result in excessive memory consumption by virtue of the container nodes that -- in some queue implementations -- are used to hold values. A bounded-capacity queue can provide flow control between components. Beware, however, that bounded collections can also result in resource deadlock if abused. The put() and take() operators are partial and wait for the collection to become non-full or non-empty, respectively. Put() and take() do not allocate memory, and are not vulnerable to the ABA pathologies. The PTLQueue algorithm can be implemented equally well in C/C++ and Java. Partial operators are often more convenient than total methods. In many use cases if the preconditions aren't met, there's nothing else useful the thread can do, so it may as well wait via a partial method. An exception is in the case of work-stealing queues where a thief might scan a set of queues from which it could potentially steal. Total methods return ASAP with a success-failure indication. (It's tempting to describe a queue or API as blocking or non-blocking instead of partial or total, but non-blocking is already an overloaded concurrency term. Perhaps waiting/non-waiting or patient/impatient might be better terms). It's also trivial to construct partial operators by busy-waiting via total operators, but such constructs may be less efficient than an operator explicitly and intentionally designed to wait. A PTLQueue instance contains an array of slots, where each slot has volatile Turn and MailBox fields. The array has power-of-two length allowing mod/div operations to be replaced by masking. We assume sensible padding and alignment to reduce the impact of false sharing. (On x86 I recommend 128-byte alignment and padding because of the adjacent-sector prefetch facility). Each queue also has PutCursor and TakeCursor cursor variables, each of which should be sequestered as the sole occupant of a cache line or sector. You can opt to use 64-bit integers if concerned about wrap-around aliasing in the cursor variables. Put(null) is considered illegal, but the caller or implementation can easily check for and convert null to a distinguished non-null proxy value if null happens to be a value you'd like to pass. Take() will accordingly convert the proxy value back to null. An advantage of PTLQueue is that you can use atomic fetch-and-increment for the partial methods. We initialize each slot at index I with (Turn=I, MailBox=null). Both cursors are initially 0. All shared variables are considered "volatile" and atomics such as CAS and AtomicFetchAndIncrement are presumed to have bidirectional fence semantics. Finally T is the templated type. I've sketched out a total tryTake() method below that allows the caller to poll the queue. tryPut() has an analogous construction. Zebra stripping : alternating row colors for nice-looking code listings. See also google code "prettify" : https://code.google.com/p/google-code-prettify/ Prettify is a javascript module that yields the HTML/CSS/JS equivalent of pretty-print. -- pre:nth-child(odd) { background-color:#ff0000; } pre:nth-child(even) { background-color:#0000ff; } border-left: 11px solid #ccc; margin: 1.7em 0 1.7em 0.3em; background-color:#BFB; font-size:12px; line-height:65%; " // PTLQueue : Put(v) : // producer : partial method - waits as necessary assert v != null assert Mask = 1 && (Mask & (Mask+1)) == 0 // Document invariants // doorway step // Obtain a sequence number -- ticket // As a practical concern the ticket value is temporally unique // The ticket also identifies and selects a slot auto tkt = AtomicFetchIncrement (&PutCursor, 1) slot * s = &Slots[tkt & Mask] // waiting phase : // wait for slot's generation to match the tkt value assigned to this put() invocation. // The "generation" is implicitly encoded as the upper bits in the cursor // above those used to specify the index : tkt div (Mask+1) // The generation serves as an epoch number to identify a cohort of threads // accessing disjoint slots while s-Turn != tkt : Pause assert s-MailBox == null s-MailBox = v // deposit and pass message Take() : // consumer : partial method - waits as necessary auto tkt = AtomicFetchIncrement (&TakeCursor,1) slot * s = &Slots[tkt & Mask] // 2-stage waiting : // First wait for turn for our generation // Acquire exclusive "take" access to slot's MailBox field // Then wait for the slot to become occupied while s-Turn != tkt : Pause // Concurrency in this section of code is now reduced to just 1 producer thread // vs 1 consumer thread. // For a given queue and slot, there will be most one Take() operation running // in this section. // Consumer waits for producer to arrive and make slot non-empty // Extract message; clear mailbox; advance Turn indicator // We have an obvious happens-before relation : // Put(m) happens-before corresponding Take() that returns that same "m" for T v = s-MailBox if v != null : s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 // unlock slot to admit next producer and consumer return v Pause tryTake() : // total method - returns ASAP with failure indication for auto tkt = TakeCursor slot * s = &Slots[tkt & Mask] if s-Turn != tkt : return null T v = s-MailBox // presumptive return value if v == null : return null // ratify tkt and v values and commit by advancing cursor if CAS (&TakeCursor, tkt, tkt+1) != tkt : continue s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 return v The basic idea derives from the Partitioned Ticket Lock "PTL" (US20120240126-A1) and the MultiLane Concurrent Bag (US8689237). The latter is essentially a circular ring-buffer where the elements themselves are queues or concurrent collections. You can think of the PTLQueue as a partitioned ticket lock "PTL" augmented to pass values from lock to unlock via the slots. Alternatively, you could conceptualize of PTLQueue as a degenerate MultiLane bag where each slot or "lane" consists of a simple single-word MailBox instead of a general queue. Each lane in PTLQueue also has a private Turn field which acts like the Turn (Grant) variables found in PTL. Turn enforces strict FIFO ordering and restricts concurrency on the slot mailbox field to at most one simultaneous put() and take() operation. PTL uses a single "ticket" variable and per-slot Turn (grant) fields while MultiLane has distinct PutCursor and TakeCursor cursors and abstract per-slot sub-queues. Both PTL and MultiLane advance their cursor and ticket variables with atomic fetch-and-increment. PTLQueue borrows from both PTL and MultiLane and has distinct put and take cursors and per-slot Turn fields. Instead of a per-slot queues, PTLQueue uses a simple single-word MailBox field. PutCursor and TakeCursor act like a pair of ticket locks, conferring "put" and "take" access to a given slot. PutCursor, for instance, assigns an incoming put() request to a slot and serves as a PTL "Ticket" to acquire "put" permission to that slot's MailBox field. To better explain the operation of PTLQueue we deconstruct the operation of put() and take() as follows. Put() first increments PutCursor obtaining a new unique ticket. That ticket value also identifies a slot. Put() next waits for that slot's Turn field to match that ticket value. This is tantamount to using a PTL to acquire "put" permission on the slot's MailBox field. Finally, having obtained exclusive "put" permission on the slot, put() stores the message value into the slot's MailBox. Take() similarly advances TakeCursor, identifying a slot, and then acquires and secures "take" permission on a slot by waiting for Turn. Take() then waits for the slot's MailBox to become non-empty, extracts the message, and clears MailBox. Finally, take() advances the slot's Turn field, which releases both "put" and "take" access to the slot's MailBox. Note the asymmetry : put() acquires "put" access to the slot, but take() releases that lock. At any given time, for a given slot in a PTLQueue, at most one thread has "put" access and at most one thread has "take" access. This restricts concurrency from general MPMC to 1-vs-1. We have 2 ticket locks -- one for put() and one for take() -- each with its own "ticket" variable in the form of the corresponding cursor, but they share a single "Grant" egress variable in the form of the slot's Turn variable. Advancing the PutCursor, for instance, serves two purposes. First, we obtain a unique ticket which identifies a slot. Second, incrementing the cursor is the doorway protocol step to acquire the per-slot mutual exclusion "put" lock. The cursors and operations to increment those cursors serve double-duty : slot-selection and ticket assignment for locking the slot's MailBox field. At any given time a slot MailBox field can be in one of the following states: empty with no pending operations -- neutral state; empty with one or more waiting take() operations pending -- deficit; occupied with no pending operations; occupied with one or more waiting put() operations -- surplus; empty with a pending put() or pending put() and take() operations -- transitional; or occupied with a pending take() or pending put() and take() operations -- transitional. The partial put() and take() operators can be implemented with an atomic fetch-and-increment operation, which may confer a performance advantage over a CAS-based loop. In addition we have independent PutCursor and TakeCursor cursors. Critically, a put() operation modifies PutCursor but does not access the TakeCursor and a take() operation modifies the TakeCursor cursor but does not access the PutCursor. This acts to reduce coherence traffic relative to some other queue designs. It's worth noting that slow threads or obstruction in one slot (or "lane") does not impede or obstruct operations in other slots -- this gives us some degree of obstruction isolation. PTLQueue is not lock-free, however. The implementation above is expressed with polite busy-waiting (Pause) but it's trivial to implement per-slot parking and unparking to deschedule waiting threads. It's also easy to convert the queue to a more general deque by replacing the PutCursor and TakeCursor cursors with Left/Front and Right/Back cursors that can move either direction. Specifically, to push and pop from the "left" side of the deque we would decrement and increment the Left cursor, respectively, and to push and pop from the "right" side of the deque we would increment and decrement the Right cursor, respectively. We used a variation of PTLQueue for message passing in our recent OPODIS 2013 paper. ul { list-style:none; padding-left:0; padding:0; margin:0; margin-left:0; } ul#myTagID { padding: 0px; margin: 0px; list-style:none; margin-left:0;} -- -- There's quite a bit of related literature in this area. I'll call out a few relevant references: Wilson's NYU Courant Institute UltraComputer dissertation from 1988 is classic and the canonical starting point : Operating System Data Structures for Shared-Memory MIMD Machines with Fetch-and-Add. Regarding provenance and priority, I think PTLQueue or queues effectively equivalent to PTLQueue have been independently rediscovered a number of times. See CB-Queue and BNPBV, below, for instance. But Wilson's dissertation anticipates the basic idea and seems to predate all the others. Gottlieb et al : Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors Orozco et al : CB-Queue in Toward high-throughput algorithms on many-core architectures which appeared in TACO 2012. Meneghin et al : BNPVB family in Performance evaluation of inter-thread communication mechanisms on multicore/multithreaded architecture Dmitry Vyukov : bounded MPMC queue (highly recommended) Alex Otenko : US8607249 (highly related). John Mellor-Crummey : Concurrent queues: Practical fetch-and-phi algorithms. Technical Report 229, Department of Computer Science, University of Rochester Thomasson : FIFO Distributed Bakery Algorithm (very similar to PTLQueue). Scott and Scherer : Dual Data Structures I'll propose an optimization left as an exercise for the reader. Say we wanted to reduce memory usage by eliminating inter-slot padding. Such padding is usually "dark" memory and otherwise unused and wasted. But eliminating the padding leaves us at risk of increased false sharing. Furthermore lets say it was usually the case that the PutCursor and TakeCursor were numerically close to each other. (That's true in some use cases). We might still reduce false sharing by incrementing the cursors by some value other than 1 that is not trivially small and is coprime with the number of slots. Alternatively, we might increment the cursor by one and mask as usual, resulting in a logical index. We then use that logical index value to index into a permutation table, yielding an effective index for use in the slot array. The permutation table would be constructed so that nearby logical indices would map to more distant effective indices. (Open question: what should that permutation look like? Possibly some perversion of a Gray code or De Bruijn sequence might be suitable). As an aside, say we need to busy-wait for some condition as follows : "while C == 0 : Pause". Lets say that C is usually non-zero, so we typically don't wait. But when C happens to be 0 we'll have to spin for some period, possibly brief. We can arrange for the code to be more machine-friendly with respect to the branch predictors by transforming the loop into : "if C == 0 : for { Pause; if C != 0 : break; }". Critically, we want to restructure the loop so there's one branch that controls entry and another that controls loop exit. A concern is that your compiler or JIT might be clever enough to transform this back to "while C == 0 : Pause". You can sometimes avoid this by inserting a call to a some type of very cheap "opaque" method that the compiler can't elide or reorder. On Solaris, for instance, you could use :"if C == 0 : { gethrtime(); for { Pause; if C != 0 : break; }}". It's worth noting the obvious duality between locks and queues. If you have strict FIFO lock implementation with local spinning and succession by direct handoff such as MCS or CLH,then you can usually transform that lock into a queue. Hidden commentary and annotations - invisible : * And of course there's a well-known duality between queues and locks, but I'll leave that topic for another blog post. * Compare and contrast : PTLQ vs PTL and MultiLane * Equivalent : Turn; seq; sequence; pos; position; ticket * Put = Lock; Deposit Take = identify and reserve slot; wait; extract & clear; unlock * conceptualize : Distinct PutLock and TakeLock implemented as ticket lock or PTL Distinct arrival cursors but share per-slot "Turn" variable provides exclusive role-based access to slot's mailbox field put() acquires exclusive access to a slot for purposes of "deposit" assigns slot round-robin and then acquires deposit access rights/perms to that slot take() acquires exclusive access to slot for purposes of "withdrawal" assigns slot round-robin and then acquires withdrawal access rights/perms to that slot At any given time, only one thread can have withdrawal access to a slot at any given time, only one thread can have deposit access to a slot Permissible for T1 to have deposit access and T2 to simultaneously have withdrawal access * round-robin for the purposes of; role-based; access mode; access role mailslot; mailbox; allocate/assign/identify slot rights; permission; license; access permission; * PTL/Ticket hybrid Asymmetric usage ; owner oblivious lock-unlock pairing K-exclusion add Grant cursor pass message m from lock to unlock via Slots[] array Cursor performs 2 functions : + PTL ticket + Assigns request to slot in round-robin fashion Deconstruct protocol : explication put() : allocate slot in round-robin fashion acquire PTL for "put" access store message into slot associated with PTL index take() : Acquire PTL for "take" access // doorway step seq = fetchAdd (&Grant, 1) s = &Slots[seq & Mask] // waiting phase while s-Turn != seq : pause Extract : wait for s-mailbox to be full v = s-mailbox s-mailbox = null Release PTL for both "put" and "take" access s-Turn = seq + Mask + 1 * Slot round-robin assignment and lock "doorway" protocol leverage the same cursor and FetchAdd operation on that cursor FetchAdd (&Cursor,1) + round-robin slot assignment and dispersal + PTL/ticket lock "doorway" step waiting phase is via "Turn" field in slot * PTLQueue uses 2 cursors -- put and take. Acquire "put" access to slot via PTL-like lock Acquire "take" access to slot via PTL-like lock 2 locks : put and take -- at most one thread can access slot's mailbox Both locks use same "turn" field Like multilane : 2 cursors : put and take slot is simple 1-capacity mailbox instead of queue Borrow per-slot turn/grant from PTL Provides strict FIFO Lock slot : put-vs-put take-vs-take at most one put accesses slot at any one time at most one put accesses take at any one time reduction to 1-vs-1 instead of N-vs-M concurrency Per slot locks for put/take Release put/take by advancing turn * is instrumental in ... * P-V Semaphore vs lock vs K-exclusion * See also : FastQueues-excerpt.java dice-etc/queue-mpmc-bounded-blocking-circular-xadd/ * PTLQueue is the same as PTLQB - identical * Expedient return; ASAP; prompt; immediately * Lamport's Bakery algorithm : doorway step then waiting phase Threads arriving at doorway obtain a unique ticket number Threads enter in ticket order * In the terminology of Reed and Kanodia a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer It can also be thought of as an optimization of Lamport's bakery lock was designed for fault-tolerance rather than performance Instead of spinning on the release counter, processors using a bakery lock repeatedly examine the tickets of their peers --

Read the article
SQL SERVER – SSIS Look Up Component – Cache Mode – Notes from the Field #028

- by Pinal Dave

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Similarly most of the people think lookup component is just component which does look up for additional information and does not pay much attention to it. Due to the same reason they do not pay attention to the same and eventually get very bad performance. Linchpin People are database coaches and wellness experts for a data driven world. In this 28th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to write a good lookup component with Cache Mode. In SQL Server Integration Services, the lookup component is one of the most frequently used tools for data validation and completion. The lookup component is provided as a means to virtually join one set of data to another to validate and/or retrieve missing values. Properly configured, it is reliable and reasonably fast. Among the many settings available on the lookup component, one of the most critical is the cache mode. This selection will determine whether and how the distinct lookup values are cached during package execution. It is critical to know how cache modes affect the result of the lookup and the performance of the package, as choosing the wrong setting can lead to poorly performing packages, and in some cases, incorrect results. Full Cache The full cache mode setting is the default cache mode selection in the SSIS lookup transformation. Like the name implies, full cache mode will cause the lookup transformation to retrieve and store in SSIS cache the entire set of data from the specified lookup location. As a result, the data flow in which the lookup transformation resides will not start processing any data buffers until all of the rows from the lookup query have been cached in SSIS. The most commonly used cache mode is the full cache setting, and for good reason. The full cache setting has the most practical applications, and should be considered the go-to cache setting when dealing with an untested set of data. With a moderately sized set of reference data, a lookup transformation using full cache mode usually performs well. Full cache mode does not require multiple round trips to the database, since the entire reference result set is cached prior to data flow execution. There are a few potential gotchas to be aware of when using full cache mode. First, you can see some performance issues – memory pressure in particular – when using full cache mode against large sets of reference data. If the table you use for the lookup is very large (either deep or wide, or perhaps both), there’s going to be a performance cost associated with retrieving and caching all of that data. Also, keep in mind that when doing a lookup on character data, full cache mode will always do a case-sensitive (and in some cases, space-sensitive) string comparison even if your database is set to a case-insensitive collation. This is because the in-memory lookup uses a .NET string comparison (which is case- and space-sensitive) as opposed to a database string comparison (which may be case sensitive, depending on collation). There’s a relatively easy workaround in which you can use the UPPER() or LOWER() function in the pipeline data and the reference data to ensure that case differences do not impact the success of your lookup operation. Again, neither of these present a reason to avoid full cache mode, but should be used to determine whether full cache mode should be used in a given situation. Full cache mode is ideally useful when one or all of the following conditions exist: The size of the reference data set is small to moderately sized The size of the pipeline data set (the data you are comparing to the lookup table) is large, is unknown at design time, or is unpredictable Each distinct key value(s) in the pipeline data set is expected to be found multiple times in that set of data Partial Cache When using the partial cache setting, lookup values will still be cached, but only as each distinct value is encountered in the data flow. Initially, each distinct value will be retrieved individually from the specified source, and then cached. To be clear, this is a row-by-row lookup for each distinct key value(s). This is a less frequently used cache setting because it addresses a narrower set of scenarios. Because each distinct key value(s) combination requires a relational round trip to the lookup source, performance can be an issue, especially with a large pipeline data set to be compared to the lookup data set. If you have, for example, a million records from your pipeline data source, you have the potential for doing a million lookup queries against your lookup data source (depending on the number of distinct values in the key column(s)). Therefore, one has to be keenly aware of the expected row count and value distribution of the pipeline data to safely use partial cache mode. Using partial cache mode is ideally suited for the conditions below: The size of the data in the pipeline (more specifically, the number of distinct key column) is relatively small The size of the lookup data is too large to effectively store in cache The lookup source is well indexed to allow for fast retrieval of row-by-row values No Cache As you might guess, selecting no cache mode will not add any values to the lookup cache in SSIS. As a result, every single row in the pipeline data set will require a query against the lookup source. Since no data is cached, it is possible to save a small amount of overhead in SSIS memory in cases where key values are not reused. In the real world, I don’t see a lot of use of the no cache setting, but I can imagine some edge cases where it might be useful. As such, it’s critical to know your data before choosing this option. Obviously, performance will be an issue with anything other than small sets of data, as the no cache setting requires row-by-row processing of all of the data in the pipeline. I would recommend considering the no cache mode only when all of the below conditions are true: The reference data set is too large to reasonably be loaded into SSIS memory The pipeline data set is small and is not expected to grow There are expected to be very few or no duplicates of the key values(s) in the pipeline data set (i.e., there would be no benefit from caching these values) Conclusion The cache mode, an often-overlooked setting on the SSIS lookup component, represents an important design decision in your SSIS data flow. Choosing the right lookup cache mode directly impacts the fidelity of your results and the performance of package execution. Know how this selection impacts your ETL loads, and you’ll end up with more reliable, faster packages. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

Read the article
C# Performance Pitfall – Interop Scenarios Change the Rules

- by Reed

C# and .NET, overall, really do have fantastic performance in my opinion. That being said, the performance characteristics dramatically differ from native programming, and take some relearning if you’re used to doing performance optimization in most other languages, especially C, C++, and similar. However, there are times when revisiting tricks learned in native code play a critical role in performance optimization in C#. I recently ran across a nasty scenario that illustrated to me how dangerous following any fixed rules for optimization can be… The rules in C# when optimizing code are very different than C or C++. Often, they’re exactly backwards. For example, in C and C++, lifting a variable out of loops in order to avoid memory allocations often can have huge advantages. If some function within a call graph is allocating memory dynamically, and that gets called in a loop, it can dramatically slow down a routine. This can be a tricky bottleneck to track down, even with a profiler. Looking at the memory allocation graph is usually the key for spotting this routine, as it’s often “hidden” deep in call graph. For example, while optimizing some of my scientific routines, I ran into a situation where I had a loop similar to: for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i]); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This loop was at a fairly high level in the call graph, and often could take many hours to complete, depending on the input data. As such, any performance optimization we could achieve would be greatly appreciated by our users. After a fair bit of profiling, I noticed that a couple of function calls down the call graph (inside of ProcessElement), there was some code that effectively was doing: // Allocate some data required DataStructure* data = new DataStructure(num); // Call into a subroutine that passed around and manipulated this data highly CallSubroutine(data); // Read and use some values from here double values = data->Foo; // Cleanup delete data; // ... return bar; Normally, if “DataStructure” was a simple data type, I could just allocate it on the stack. However, it’s constructor, internally, allocated it’s own memory using new, so this wouldn’t eliminate the problem. In this case, however, I could change the call signatures to allow the pointer to the data structure to be passed into ProcessElement and through the call graph, allowing the inner routine to reuse the same “data” memory instead of allocating. At the highest level, my code effectively changed to something like: DataStructure* data = new DataStructure(numberToProcess); for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i], data); } delete data; Granted, this dramatically reduced the maintainability of the code, so it wasn’t something I wanted to do unless there was a significant benefit. In this case, after profiling the new version, I found that it increased the overall performance dramatically – my main test case went from 35 minutes runtime down to 21 minutes. This was such a significant improvement, I felt it was worth the reduction in maintainability. In C and C++, it’s generally a good idea (for performance) to: Reduce the number of memory allocations as much as possible, Use fewer, larger memory allocations instead of many smaller ones, and Allocate as high up the call stack as possible, and reuse memory I’ve seen many people try to make similar optimizations in C# code. For good or bad, this is typically not a good idea. The garbage collector in .NET completely changes the rules here. In C#, reallocating memory in a loop is not always a bad idea. In this scenario, for example, I may have been much better off leaving the original code alone. The reason for this is the garbage collector. The GC in .NET is incredibly effective, and leaving the allocation deep inside the call stack has some huge advantages. First and foremost, it tends to make the code more maintainable – passing around object references tends to couple the methods together more than necessary, and overall increase the complexity of the code. This is something that should be avoided unless there is a significant reason. Second, (unlike C and C++) memory allocation of a single object in C# is normally cheap and fast. Finally, and most critically, there is a large advantage to having short lived objects. If you lift a variable out of the loop and reuse the memory, its much more likely that object will get promoted to Gen1 (or worse, Gen2). This can cause expensive compaction operations to be required, and also lead to (at least temporary) memory fragmentation as well as more costly collections later. As such, I’ve found that it’s often (though not always) faster to leave memory allocations where you’d naturally place them – deep inside of the call graph, inside of the loops. This causes the objects to stay very short lived, which in turn increases the efficiency of the garbage collector, and can dramatically improve the overall performance of the routine as a whole. In C#, I tend to: Keep variable declarations in the tightest scope possible Declare and allocate objects at usage While this tends to cause some of the same goals (reducing unnecessary allocations, etc), the goal here is a bit different – it’s about keeping the objects rooted for as little time as possible in order to (attempt) to keep them completely in Gen0, or worst case, Gen1. It also has the huge advantage of keeping the code very maintainable – objects are used and “released” as soon as possible, which keeps the code very clean. It does, however, often have the side effect of causing more allocations to occur, but keeping the objects rooted for a much shorter time. Now – nowhere here am I suggesting that these rules are hard, fast rules that are always true. That being said, my time spent optimizing over the years encourages me to naturally write code that follows the above guidelines, then profile and adjust as necessary. In my current project, however, I ran across one of those nasty little pitfalls that’s something to keep in mind – interop changes the rules. In this case, I was dealing with an API that, internally, used some COM objects. In this case, these COM objects were leading to native allocations (most likely C++) occurring in a loop deep in my call graph. Even though I was writing nice, clean managed code, the normal managed code rules for performance no longer apply. After profiling to find the bottleneck in my code, I realized that my inner loop, a innocuous looking block of C# code, was effectively causing a set of native memory allocations in every iteration. This required going back to a “native programming” mindset for optimization. Lifting these variables and reusing them took a 1:10 routine down to 0:20 – again, a very worthwhile improvement. Overall, the lessons here are: Always profile if you suspect a performance problem – don’t assume any rule is correct, or any code is efficient just because it looks like it should be Remember to check memory allocations when profiling, not just CPU cycles Interop scenarios often cause managed code to act very differently than “normal” managed code. Native code can be hidden very cleverly inside of managed wrappers

Read the article
Nashorn, the rhino in the room

- by costlow

Nashorn is a new runtime within JDK 8 that allows developers to run code written in JavaScript and call back and forth with Java. One advantage to the Nashorn scripting engine is that is allows for quick prototyping of functionality or basic shell scripts that use Java libraries. The previous JavaScript runtime, named Rhino, was introduced in JDK 6 (released 2006, end of public updates Feb 2013). Keeping tradition amongst the global developer community, "Nashorn" is the German word for rhino. The Java platform and runtime is an intentional home to many languages beyond the Java language itself. OpenJDK’s Da Vinci Machine helps coordinate work amongst language developers and tool designers and has helped different languages by introducing the Invoke Dynamic instruction in Java 7 (2011), which resulted in two major benefits: speeding up execution of dynamic code, and providing the groundwork for Java 8’s lambda executions. Many of these improvements are discussed at the JVM Language Summit, where language and tool designers get together to discuss experiences and issues related to building these complex components. There are a number of benefits to running JavaScript applications on JDK 8’s Nashorn technology beyond writing scripts quickly: Interoperability with Java and JavaScript libraries. Scripts do not need to be compiled. Fast execution and multi-threading of JavaScript running in Java’s JRE. The ability to remotely debug applications using an IDE like NetBeans, Eclipse, or IntelliJ (instructions on the Nashorn blog). Automatic integration with Java monitoring tools, such as performance, health, and SIEM. In the remainder of this blog post, I will explain how to use Nashorn and the benefit from those features. Nashorn execution environment The Nashorn scripting engine is included in all versions of Java SE 8, both the JDK and the JRE. Unlike Java code, scripts written in nashorn are interpreted and do not need to be compiled before execution. Developers and users can access it in two ways: Users running JavaScript applications can call the binary directly:jre8/bin/jjs This mechanism can also be used in shell scripts by specifying a shebang like #!/usr/bin/jjs Developers can use the API and obtain a ScriptEngine through:ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn"); When using a ScriptEngine, please understand that they execute code. Avoid running untrusted scripts or passing in untrusted/unvalidated inputs. During compilation, consider isolating access to the ScriptEngine and using Type Annotations to only allow @Untainted String arguments. One noteworthy difference between JavaScript executed in or outside of a web browser is that certain objects will not be available. For example when run outside a browser, there is no access to a document object or DOM tree. Other than that, all syntax, semantics, and capabilities are present. Examples of Java and JavaScript The Nashorn script engine allows developers of all experience levels the ability to write and run code that takes advantage of both languages. The specific dialect is ECMAScript 5.1 as identified by the User Guide and its standards definition through ECMA international. In addition to the example below, Benjamin Winterberg has a very well written Java 8 Nashorn Tutorial that provides a large number of code samples in both languages. Basic Operations A basic Hello World application written to run on Nashorn would look like this: #!/usr/bin/jjs print("Hello World"); The first line is a standard script indication, so that Linux or Unix systems can run the script through Nashorn. On Windows where scripts are not as common, you would run the script like: jjs helloWorld.js. Receiving Arguments In order to receive program arguments your jjs invocation needs to use the -scripting flag and a double-dash to separate which arguments are for jjs and which are for the script itself:jjs -scripting print.js -- "This will print" #!/usr/bin/jjs var whatYouSaid = $ARG.length==0 ? "You did not say anything" : $ARG[0] print(whatYouSaid); Interoperability with Java libraries (including 3rd party dependencies) Another goal of Nashorn was to allow for quick scriptable prototypes, allowing access into Java types and any libraries. Resources operate in the context of the script (either in-line with the script or as separate threads) so if you open network sockets and your script terminates, those sockets will be released and available for your next run. Your code can access Java types the same as regular Java classes. The “import statements” are written somewhat differently to accommodate for language. There is a choice of two styles: For standard classes, just name the class: var ServerSocket = java.net.ServerSocket For arrays or other items, use Java.type: var ByteArray = Java.type("byte[]")You could technically do this for all. The same technique will allow your script to use Java types from any library or 3rd party component and quickly prototype items. Building a user interface One major difference between JavaScript inside and outside of a web browser is the availability of a DOM object for rendering views. When run outside of the browser, JavaScript has full control to construct the entire user interface with pre-fabricated UI controls, charts, or components. The example below is a variation from the Nashorn and JavaFX guide to show how items work together. Nashorn has a -fx flag to make the user interface components available. With the example script below, just specify: jjs -fx -scripting fx.js -- "My title" #!/usr/bin/jjs -fx var Button = javafx.scene.control.Button; var StackPane = javafx.scene.layout.StackPane; var Scene = javafx.scene.Scene; var clickCounter=0; $STAGE.title = $ARG.length>0 ? $ARG[0] : "You didn't provide a title"; var button = new Button(); button.text = "Say 'Hello World'"; button.onAction = myFunctionForButtonClicking; var root = new StackPane(); root.children.add(button); $STAGE.scene = new Scene(root, 300, 250); $STAGE.show(); function myFunctionForButtonClicking(){ var text = "Click Counter: " + clickCounter; button.setText(text); clickCounter++; print(text); } For a more advanced post on using Nashorn to build a high-performing UI, see JavaFX with Nashorn Canvas example. Interoperable with frameworks like Node, Backbone, or Facebook React The major benefit of any language is the interoperability gained by people and systems that can read, write, and use it for interactions. Because Nashorn is built for the ECMAScript specification, developers familiar with JavaScript frameworks can write their code and then have system administrators deploy and monitor the applications the same as any other Java application. A number of projects are also running Node applications on Nashorn through Project Avatar and the supported modules. In addition to the previously mentioned Nashorn tutorial, Benjamin has also written a post about Using Backbone.js with Nashorn. To show the multi-language power of the Java Runtime, there is another interesting example that unites Facebook React and Clojure on JDK 8’s Nashorn. Summary Nashorn provides a simple and fast way of executing JavaScript applications and bridging between the best of each language. By making the full range of Java libraries to JavaScript applications, and the quick prototyping style of JavaScript to Java applications, developers are free to work as they see fit. Software Architects and System Administrators can take advantage of one runtime and leverage any work that they have done to tune, monitor, and certify their systems. Additional information is available within: The Nashorn Users’ Guide Java Magazine’s article "Next Generation JavaScript Engine for the JVM." The Nashorn team’s primary blog or a very helpful collection of Nashorn links.

Read the article
Oracle Flashback Technologies - Overview

- by Sridhar_R-Oracle

Oracle Flashback Technologies - IntroductionIn his May 29th 2014 blog, my colleague Joe Meeks introduced Oracle Maximum Availability Architecture (MAA) and discussed both planned and unplanned outages. Let’s take a closer look at unplanned outages. These can be caused by physical failures (e.g., server, storage, network, file deletion, physical corruption, site failures) or by logical failures – cases where all components and files are physically available, but data is incorrect or corrupt. These logical failures are usually caused by human errors or application logic errors. This blog series focuses on these logical errors – what causes them and how to address and recover from them using Oracle Database Flashback. In this introductory blog post, I’ll provide an overview of the Oracle Database Flashback technologies and will discuss the features in detail in future blog posts. Let’s get started. We are all human beings (unless a machine is reading this), and making mistakes is a part of what we do…often what we do best! We “fat finger”, we spill drinks on keyboards, unplug the wrong cables, etc. In addition, many of us, in our lives as DBAs or developers, must have observed, caused, or corrected one or more of the following unpleasant events: Accidentally updated a table with wrong values !! Performed a batch update that went wrong - due to logical errors in the code !! Dropped a table !! How do DBAs typically recover from these types of errors? First, data needs to be restored and recovered to the point-in-time when the error occurred (incomplete or point-in-time recovery). Moreover, depending on the type of fault, it’s possible that some services – or even the entire database – would have to be taken down during the recovery process.Apart from error conditions, there are other questions that need to be addressed as part of the investigation. For example, what did the data look like in the morning, prior to the error? What were the various changes to the row(s) between two timestamps? Who performed the transaction and how can it be reversed? Oracle Database includes built-in Flashback technologies, with features that address these challenges and questions, and enable you to perform faster, easier, and convenient recovery from logical corruptions. HistoryFlashback Query, the first Flashback Technology, was introduced in Oracle 9i. It provides a simple, powerful and completely non-disruptive mechanism for data verification and recovery from logical errors, and enables users to view the state of data at a previous point in time.Flashback Technologies were further enhanced in Oracle 10g, to provide fast, easy recovery at the database, table, row, and even at a transaction level.Oracle Database 11g introduced an innovative method to manage and query long-term historical data with Flashback Data Archive. The 11g release also introduced Flashback Transaction, which provides an easy, one-step operation to back out a transaction. Oracle Database versions 11.2.0.2 and beyond further enhanced the performance of these features. Note that all the features listed here work without requiring any kind of restore operation.In addition, Flashback features are fully supported with the new multi-tenant capabilities introduced with Oracle Database 12c, Flashback Features Oracle Flashback Database enables point-in-time-recovery of the entire database without requiring a traditional restore and recovery operation. It rewinds the entire database to a specified point in time in the past by undoing all the changes that were made since that time.Oracle Flashback Table enables an entire table or a set of tables to be recovered to a point in time in the past.Oracle Flashback Drop enables accidentally dropped tables and all dependent objects to be restored.Oracle Flashback Query enables data to be viewed at a point-in-time in the past. This feature can be used to view and reconstruct data that was lost due to unintentional change(s) or deletion(s). This feature can also be used to build self-service error correction into applications, empowering end-users to undo and correct their errors.Oracle Flashback Version Query offers the ability to query the historical changes to data between two points in time or system change numbers (SCN) Oracle Flashback Transaction Query enables changes to be examined at the transaction level. This capability can be used to diagnose problems, perform analysis, audit transactions, and even revert the transaction by undoing SQLOracle Flashback Transaction is a procedure used to back-out a transaction and its dependent transactions.Flashback technologies eliminate the need for a traditional restore and recovery process to fix logical corruptions or make enquiries. Using these technologies, you can recover from the error in the same amount of time it took to generate the error. All the Flashback features can be accessed either via SQL command line (or) via Enterprise Manager. Most of the Flashback technologies depend on the available UNDO to retrieve older data. The following table describes the various Flashback technologies: their purpose, dependencies and situations where each individual technology can be used. Example Syntax Error investigation related:The purpose is to investigate what went wrong and what the values were at certain points in timeFlashback Queries ( select .. as of SCN | Timestamp ) - Helps to see the value of a row/set of rows at a point in timeFlashback Version Queries ( select .. versions between SCN | Timestamp and SCN | Timestamp) - Helps determine how the value evolved between certain SCNs or between timestamps Flashback Transaction Queries (select .. XID=) - Helps to understand how the transaction caused the changes.Error correction related:The purpose is to fix the error and correct the problems,Flashback Table (flashback table .. to SCN | Timestamp) - To rewind the table to a particular timestamp or SCN to reverse unwanted updates Flashback Drop (flashback table .. to before drop ) - To undrop or undelete a table Flashback Database (flashback database to SCN | Restore Point ) - This is the rewind button for Oracle databases. You can revert the entire database to a particular point in time. It is a fast way to perform a PITR (point-in-time recovery). Flashback Transaction (DBMS_FLASHBACK.TRANSACTION_BACKOUT(XID..)) - To reverse a transaction and its related transactions Advanced use cases Flashback technology is integrated into Oracle Recovery Manager (RMAN) and Oracle Data Guard. So, apart from the basic use cases mentioned above, the following use cases are addressed using Oracle Flashback. Block Media recovery by RMAN - to perform block level recovery Snapshot Standby - where the standby is temporarily converted to a read/write environment for testing, backup, or migration purposes Re-instate old primary in a Data Guard environment – this avoids the need to restore an old backup and perform a recovery to make it a new standby. Guaranteed Restore Points - to bring back the entire database to an older point-in-time in a guaranteed way. and so on..I hope this introductory overview helps you understand how Flashback features can be used to investigate and recover from logical errors. As mentioned earlier, I will take a deeper-dive into to some of the critical Flashback features in my upcoming blogs and address common use cases.

Read the article
The SPARC SuperCluster

- by Karoly Vegh

Oracle has been providing a lead in the Engineered Systems business for quite a while now, in accordance with the motto "Hardware and Software Engineered to Work Together." Indeed it is hard to find a better definition of these systems. Allow me to summarize the idea. It is: Build a compute platform optimized to run your technologies Develop application aware, intelligently caching storage components Take an impressively fast network technology interconnecting it with the compute nodes Tune the application to scale with the nodes to yet unseen performance Reduce the amount of data moving via compression Provide this all in a pre-integrated single product with a single-pane management interface All these ideas have been around in IT for quite some time now. The real Oracle advantage is adding the last one to put these all together. Oracle has built quite a portfolio of Engineered Systems, to run its technologies - and run those like they never ran before. In this post I'll focus on one of them that serves as a consolidation demigod, a multi-purpose engineered system. As you probably have guessed, I am talking about the SPARC SuperCluster. It has many great features inherited from its predecessors, and it adds several new ones. Allow me to pick out and elaborate about some of the most interesting ones from a technological point of view. I. It is the SPARC SuperCluster T4-4. That is, as compute nodes, it includes SPARC T4-4 servers that we learned to appreciate and respect for their features: The SPARC T4 CPUs: Each CPU has 8 cores, each core runs 8 threads. The SPARC T4-4 servers have 4 sockets. That is, a single compute node can in parallel, simultaneously execute 256 threads. Now, a full-rack SPARC SuperCluster has 4 of these servers on board. Remember the keyword demigod. While retaining the forerunner SPARC T3's exceptional throughput, the SPARC T4 CPUs raise the bar with single performance too - a humble 5x better one than their ancestors. actually, the SPARC T4 CPU cores run in both single-threaded and multi-threaded mode, and switch between these two on-the-fly, fulfilling not only single-threaded OR multi-threaded applications' needs, but even mixed requirements (like in database workloads!). Data security, anyone? Every SPARC T4 CPU core has a built-in encryption engine, that is, encryption algorithms cast into silicon. A PCI controller right on the chip for customers who need I/O performance. Built-in, no-cost Virtualization: Oracle VM for SPARC (the former LDoms or Logical Domains) is not a server-emulation virtualization technology but rather a serverpartitioning one, the hypervisor runs in the server firmware, and all the VMs' HW resources (I/O, CPU, memory) are accessed natively, without performance overhead. This enables customers to run a number of Solaris 10 and Solaris 11 VMs separated, independent of each other within a physical server II. For Database performance, it includes Exadata Storage Cells - one of the main reasons why the Exadata Database Machine performs at diabolic speed. What makes them important? They provide DB backend storage for your Oracle Databases to run on the SPARC SuperCluster, that is what they are built and tuned for DB performance. These storage cells are SQL-aware. That is, if a SPARC T4 database compute node executes a query, it doesn't simply request tons of raw datablocks from the storage, filters the received data, and throws away most of it where the statement doesn't apply, but provides the SQL query to the storage node too. The storage cell software speaks SQL, that is, it is able to prefilter and through that transfer only the relevant data. With this, the traffic between database nodes and storage cells is reduced immensely. Less I/O is a good thing - as they say, all the CPUs of the world do one thing just as fast as any other - and that is waiting for I/O. They don't only pre-filter, but also provide data preprocessing features - e.g. if a DB-node requests an aggregate of data, they can calculate it, and handover only the results, not the whole set. Again, less data to transfer. They support the magical HCC, (Hybrid Columnar Compression). That is, data can be stored in a precompressed form on the storage. Less data to transfer. Of course one can't simply rely on disks for performance, there is Flash Storage included there for caching. III. The low latency, high-speed backbone network: InfiniBand, that interconnects all the members with: Real High Speed: 40 Gbit/s. Full Duplex, of course. Oh, and a really low latency. RDMA. Remote Direct Memory Access. This technology allows the DB nodes to do exactly that. Remotely, directly placing SQL commands into the Memory of the storage cells. Dodging all the network-stack bottlenecks, avoiding overhead, placing requests directly into the process queue. You can also run IP over InfiniBand if you please - that's the way the compute nodes can communicate with each other. IV. Including a general-purpose storage too: the ZFSSA, which is a unified storage, providing NAS and SAN access too, with the following features: NFS over RDMA over InfiniBand. Nothing is faster network-filesystem-wise. All the ZFS features onboard, hybrid storage pools, compression, deduplication, snapshot, replication, NFS and CIFS shares Storageheads in a HA-Cluster configuration providing availability of the data DTrace Live Analytics in a web-based Administration UI Being a general purpose application data storage for your non-database applications running on the SPARC SuperCluster over whichever protocol they prefer, easily replicating, snapshotting, cloning data for them. There's a lot of great technology included in Oracle's SPARC SuperCluster, we have talked its interior through. As for external scalability: you can start with a half- of full- rack SPARC SuperCluster, and scale out to several racks - that is, stacking not separate full-rack SPARC SuperClusters, but extending always one large instance of the size of several full-racks. Yes, over InfiniBand network. Add racks as you grow. What technologies shall run on it? SPARC SuperCluster is a general purpose scaleout consolidation/cloud environment. You can run Oracle Databases with RAC scaling, or Oracle Weblogic (end enjoy the SPARC T4's advantages to run Java). Remember, Oracle technologies have been integrated with the Oracle Engineered Systems - this is the Oracle on Oracle advantage. But you can run other software environments such as SAP if you please too. Run any application that runs on Oracle Solaris 10 or Solaris 11. Separate them in Virtual Machines, or even Oracle Solaris Zones, monitor and manage those from a central UI. Here the key takeaways once again: The SPARC SuperCluster: Is a pre-integrated Engineered System Contains SPARC T4-4 servers with built-in virtualization, cryptography, dynamic threading Contains the Exadata storage cells that intelligently offload the burden of the DB-nodes Contains a highly available ZFS Storage Appliance, that provides SAN/NAS storage in a unified way Combines all these elements over a high-speed, low-latency backbone network implemented with InfiniBand Can grow from a single half-rack to several full-rack size Supports the consolidation of hundreds of applications To summarize: All these technologies are great by themselves, but the real value is like in every other Oracle Engineered System: Integration. All these technologies are tuned to perform together. Together they are way more than the sum of all - and a careful and actually very time consuming integration process is necessary to orchestrate all these for performance. The SPARC SuperCluster's goal is to enable infrastructure operations and offer a pre-integrated solution that can be architected and delivered in hours instead of months of evaluations and tests. The tedious and most importantly time and resource consuming part of the work - testing and evaluating - has been done. Now go, provide services. -- charlie

Read the article
In the Cloud, Everything Costs Money

- by BuckWoody

I’ve been teaching my daughter about budgeting. I’ve explained that most of the time the money coming in is from only one or two sources – and you can only change that from time to time. The money going out, however, is to many locations, and it changes all the time. She’s made a simple debits and credits spreadsheet, and I’m having her research each part of the budget. Her eyes grow wide when she finds out everything has a cost – the house, gas for the lawnmower, dishes, water for showers, food, electricity to run the fridge, a new fridge when that one breaks, everything has a cost. She asked me “how do you pay for all this?” It’s a sentiment many adults have looking at their own budgets – and one reason that some folks don’t even make a budget. It’s hard to face up to the realities of how much it costs to do what we want to do. When we design a computing solution, it’s interesting to set up a similar budget, because we don’t always consider all of the costs associated with it. I’ve seen design sessions where the new software or servers are considered, but the “sunk” costs of personnel, networking, maintenance, increased storage, new sizes for backups and offsite storage and so on are not added in. They are already on premises, so they are assumed to be paid for already. When you move to a distributed architecture, you'll see more costs directly reflected. Store something, pay for that storage. If the system is deployed and no one is using it, you’re still paying for it. As you watch those costs rise, you might be tempted to think that a distributed architecture costs more than an on-premises one. And you might be right – for some solutions. I’ve worked with a few clients where moving to a distributed architecture doesn’t make financial sense – so we didn’t implement it. I still designed the system in a distributed fashion, however, so that when it does make sense there isn’t much re-architecting to do. In other cases, however, if you consider all of the on-premises costs and compare those accurately to operating a system in the cloud, the distributed system is much cheaper. Again, I never recommend that you take a “here-or-there-only” mentality – I think a hybrid distributed system is usually best – but each solution is different. There simply is no “one size fits all” to architecting a solution. As you design your solution, cost out each element. You might find that using a hybrid approach saves you money in one design and not in another. It’s a brave new world indeed. So yes, in the cloud, everything costs money. But an on-premises solution also costs money – it’s just that “dad” (the company) is paying for it and we don’t always see it. When we go out on our own in the cloud, we need to ensure that we consider all of the costs.

Read the article
VSDB to SSDT Part 1 : Converting projects and trimming excess files

- by Etienne Giust

Visual Studio 2012 introduces a change regarding Database Projects : they now use the SSDT technology, which means old VS2010 database projects (VSDB projects) need to be converted. Hopefully, VS2012 does that for you and it is quite painless, but in my case some unnecessary artifacts from the old project were left in place. Also, when reopening the solution, database projects appeared unconverted even if I had converted them in the previous session and saved the solution. Converting the project(s) When opening your Visual Studio 2010 solution with Visual Studio 2012, every standard project should be converted by default, but Visual Studio will ask you about your database projects : “Functional changes required Visual Studio will automatically make functional changes to the following projects in order to open them. The project behavior will change as a result. You will be able to open these projects in this version and Visual Studio 2010 SP1.” If you accept, your project is converted. And it should compile with no errors right away except if you have dependencies to dbschema files which are no longer supported. The output of a SSDT project is a dacpac file which replaces the dbschema file you were accustomed to. References to dacpac files can be added to SSDT projects in the same fashion references to dbschema could be added to VSDB projects. Cleaning up You will notice that your project file is now a sqlproj file but the old dbproj is still here. In fact at that point you can still reopen the solution in Visual Studio 2010 and everything should show up. If like me you plan on using VS2012 exclusively, you can get rid of the following files which are still on your disk and in your source control : the dbproj and dbproj.vspscc files Properties/Database.sqlcmdvars Properties/Database.sqldeployment Properties/Database.sqlpermissions Properties/Database.sqlsettings You might wonder where the information which used to be in the Properties files is now stored. Permissions : a Permissions.sql was created at the root level of your project. Note that when you create a new database project and import a database using the Schema Compare capabilities from Visual Studio, imported table and stored procedure definition files will hold the permission information (along with constraints and, indexes) SQLVars : they are defined inside the publish.xml files Deployment : they are also in the publish.xml files Settings : I was unable to find where those are now. I suppose they are not defined anymore But Visual Studio still says my database projects should be converted ! I had this error upon closing and then re-opening the solution : my database projects would appear unconverted even though I did all the necessary steps previously. Easy solution : remove those projects from the solution and add them again (the sqlproj files). More For those who run into problems when converting from VSDB to SSDT, I suggest reading the following post : http://blogs.msdn.com/b/ssdt/archive/2011/11/21/top-vsdb-gt-ssdt-project-conversion-issues.aspx Also interesting, is a side by side comparison of VSDB and SSDT project features : http://blogs.msdn.com/b/ssdt/archive/2011/11/21/sql-server-data-tools-ctp4-vs-vs2010-database-projects.aspx

Read the article
Comparison of Extreme Programming (XP) to Traditional Programming Methodologies

The comparison of extreme programming (XP) to traditional programming methodologies can find similarities between the historic biblical battle between David and Goliath. Goliath of Gath is a Philistine warrior renowned for his size, strength and battle tested skills. Much like Goliath, traditional methodologies are known to be cumbersome due to large amounts of documentation, and time consuming do to the time needed to gather all the information. However, traditional methodologies have been widely accepted by the software development community for years because of its attention to detail regarding project development and maintenance. David is a male Israelite teenager, who was small, fearless, and untrained in any type of formal combat. In a similar fashion, extreme programming focuses more on code over documentation so that time is spent on developing the project and not on cumbersome documentation of a project. Typically, project managers and developers are fearless when they start this type of project because they usually start with little to no documentation, and they expect to be given changes to be implemented at the start of every new project iteration. Because of the lack of need or desire for documentation in extreme programming projects they appear to act as if there is no formal process involved in developing an extreme programming project. This is a misnomer, because of the consistent development iterations and interaction with clients and users the quickly takes form because each iteration allows the project to be refined as the customer needs and desires change. Ravikant Agarwal and David Umphress documented a new approach to extreme programming called personal extreme programming (PXP) at the ACM Southeast Regional Conference in 2008. PXP is the application of extreme programming core concepts in a single developer team environment. PXP focuses on how to adjust the main concepts and practices of extreme programming that is typically centered in a group environment and how they can be altered to be beneficial for a single developer environment. Suzanne Smith and Sara Stoecklin are both advocates of extreme programming according to the Journal of Computing Sciences in Colleges and in fact they feel that it should receive more attention in introductory programming classes to allow students to better understand the software development process. Reasons why extreme programming is a good thing: Developers get to do more of what they love, Develop. Traditional software development methodologies tend to add additional demands on a project by requiring all requirements and project specifications to be fully defined prior to the start of the implementation phase of a project. A standard 40 hour work week. With limiting the work week to only 40 hours prevents developers from getting burned out on projects.

Read the article
Leveraging Social Networks for Retail

- by David Dorf

For retailers, social media is all about B2C2C. That is, Business to Consumer to Consumer, or more specifically, retailer to influencer to consumer. Traditional marketing targeted mass media, trying to expose the message to as many people as possible. While effective, this approach has never been very efficient, with high costs for relatively low penetration. Then it was thought that marketers should focus their efforts on a relative few super-influencers that would then sway the masses. History shows a few successes with this approach but lacked any consistency or predictability. After all, if super-influencers were easy to find, most campaigns would easily go viral. Alas, research shows that most wide-spread trends were the result of several fortunate events, including some luck. So do people exert influence over each other when it comes to purchase decisions? Of course they do, all the time. But that influence is usually limited to a small set of friends and specific specialization. For instance, although I have 165 friends on Facebook, I am only able to influence my close friends and family on PC purchases, and I have no sway at all for fashion purchases. People trust my knowledge on technology, but nobody asks my advice on shoes. How then should retailers leverage social networks in order to reinforce brand image and push promotions? Two obvious ways are Like and Share. Online advertisements or wall-postings receive more clicks when the viewer sees that friends have "liked" the posting. That's our modern-day version of word-of-mouth advertising. Statistics show that endorsements from friends make it more likely a person will engage. If my friends and I liked it, then I might also "share" (or "retweet" in the case of Twitter) it with other friends. In that case the retailer has paid for X showings of the advertisement, but sharing has pushed it to an additional Y people at no cost. And further, the implicit endorsement by the sharer makes it more likely the recipient will engage. So a good first step is to find people active in social networks that will Like and Share in order to exert influence. Its still tough to go viral, but doubling engagement is still a big step in the right direction. More complex social graph analysis would be a second step, but I'll leave that topic for another day. If you're interested in the academic side of social dynamics, I suggest reading Duncan Watts' work.

Read the article
DBCC CHECKDB on VVLDB and latches (Or: My Pain is Your Gain)

- by Argenis

Does your CHECKDB hurt, Argenis? There is a classic blog series by Paul Randal [blog|twitter] called “CHECKDB From Every Angle” which is pretty much mandatory reading for anybody who’s even remotely considering going for the MCM certification, or its replacement (the Microsoft Certified Solutions Master: Data Platform – makes my fingers hurt just from typing it). Of particular interest is the post “Consistency Options for a VLDB” – on it, Paul provides solid, timeless advice (I use the word “timeless” because it was written in 2007, and it all applies today!) on how to perform checks on very large databases. Well, here I was trying to figure out how to make CHECKDB run faster on a restored copy of one of our databases, which happens to exceed 7TB in size. The whole thing was taking several days on multiple systems, regardless of the storage used – SAS, SATA or even SSD…and I actually didn’t pay much attention to how long it was taking, or even bothered to look at the reasons why - as long as it was finishing okay and found no consistency errors. Yes – I know. That was a huge mistake, as corruption found in a database several days after taking place could only allow for further spread of the corruption – and potentially large data loss. In the last two weeks I increased my attention towards this problem, as we noticed that CHECKDB was taking EVEN LONGER on brand new all-flash storage in the SAN! I couldn’t really explain it, and were almost ready to blame the storage vendor. The vendor told us that they could initially see the server driving decent I/O – around 450Mb/sec, and then it would settle at a very slow rate of 10Mb/sec or so. “Hum”, I thought – “CHECKDB is just not pushing the I/O subsystem hard enough”. Perfmon confirmed the vendor’s observations. Dreaded @BlobEater What was CHECKDB doing all the time while doing so little I/O? Eating Blobs. It turns out that CHECKDB was taking an extremely long time on one of our frankentables, which happens to be have 35 billion rows (yup, with a b) and sucks up several terabytes of space in the database. We do have a project ongoing to purge/split/partition this table, so it’s just a matter of time before we deal with it. But the reality today is that CHECKDB is coming to a screeching halt in performance when dealing with this particular table. Checking sys.dm_os_waiting_tasks and sys.dm_os_latch_stats showed that LATCH_EX (DBCC_OBJECT_METADATA) was by far the top wait type. I remembered hearing recently about that wait from another post that Paul Randal made, but that was related to computed-column indexes, and in fact, Paul himself reminded me of his article via twitter. But alas, our pathologic table had no non-clustered indexes on computed columns. I knew that latches are used by the database engine to do internal synchronization – but how could I help speed this up? After all, this is stuff that doesn’t have a lot of knobs to tweak. (There’s a fantastic level 500 talk by Bob Ward from Microsoft CSS [blog|twitter] called “Inside SQL Server Latches” given at PASS 2010 – and you can check it out here. DISCLAIMER: I assume no responsibility for any brain melting that might ensue from watching Bob’s talk!) Failed Hypotheses Earlier on this week I flew down to Palo Alto, CA, to visit our Headquarters – and after having a great time with my Monkey peers, I was relaxing on the plane back to Seattle watching a great talk by SQL Server MVP and fellow MCM Maciej Pilecki [twitter] called “Masterclass: A Day in the Life of a Database Transaction” where he discusses many different topics related to transaction management inside SQL Server. Very good stuff, and when I got home it was a little late – that slow DBCC CHECKDB that I had been dealing with was way in the back of my head. As I was looking at the problem at hand earlier on this week, I thought “How about I set the database to read-only?” I remembered one of the things Maciej had (jokingly) said in his talk: “if you don’t want locking and blocking, set the database to read-only” (or something to that effect, pardon my loose memory). I immediately killed the CHECKDB which had been running painfully for days, and set the database to read-only mode. Then I ran DBCC CHECKDB against it. It started going really fast (even a bit faster than before), and then throttled down again to around 10Mb/sec. All sorts of expletives went through my head at the time. Sure enough, the same latching scenario was present. Oh well. I even spent some time trying to figure out if NUMA was hurting performance. Folks on Twitter made suggestions in this regard (thanks, Lonny! [twitter]) …Eureka? This past Friday I was still scratching my head about the whole thing; I was ready to start profiling with XPERF to see if I could figure out which part of the engine was to blame and then get Microsoft to look at the evidence. After getting a bunch of good news I’ll blog about separately, I sat down for a figurative smack down with CHECKDB before the weekend. And then the light bulb went on. A sparse column. I thought that I couldn’t possibly be experiencing the same scenario that Paul blogged about back in March showing extreme latching with non-clustered indexes on computed columns. Did I even have a non-clustered index on my sparse column? As it turns out, I did. I had one filtered non-clustered index – with the sparse column as the index key (and only column). To prove that this was the problem, I went and setup a test. Yup, that'll do it The repro is very simple for this issue: I tested it on the latest public builds of SQL Server 2008 R2 SP2 (CU6) and SQL Server 2012 SP1 (CU4). First, create a test database and a test table, which only needs to contain a sparse column: CREATE DATABASE SparseColTest; GO USE SparseColTest; GO CREATE TABLE testTable (testCol smalldatetime SPARSE NULL); GO INSERT INTO testTable (testCol) VALUES (NULL); GO 1000000 That’s 1 million rows, and even though you’re inserting NULLs, that’s going to take a while. In my laptop, it took 3 minutes and 31 seconds. Next, we run DBCC CHECKDB against the database: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; This runs extremely fast, as least on my test rig – 198 milliseconds. Now let’s create a filtered non-clustered index on the sparse column: CREATE NONCLUSTERED INDEX [badBadIndex] ON testTable (testCol) WHERE testCol IS NOT NULL; With the index in place now, let’s run DBCC CHECKDB one more time: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; In my test system this statement completed in 11433 milliseconds. 11.43 full seconds. Quite the jump from 198 milliseconds. I went ahead and dropped the filtered non-clustered indexes on the restored copy of our production database, and ran CHECKDB against that. We went down from 7+ days to 19 hours and 20 minutes. Cue the “Argenis is not impressed” meme, please, Mr. LaRock. My pain is your gain, folks. Go check to see if you have any of such indexes – they’re likely causing your consistency checks to run very, very slow. Happy CHECKDBing, -Argenis ps: I plan to file a Connect item for this issue – I consider it a pretty serious bug in the engine. After all, filtered indexes were invented BECAUSE of the sparse column feature – and it makes a lot of sense to use them together. Watch this space and my twitter timeline for a link.

Read the article
Social Analytics and the Customer

- by David Dorf

Many successful retailers put the customer at the center of everything they do, so its important that the customer is modeled correctly across all their systems. The path to omni-channel starts and ends with the customer so at ARTS, our next big project is focused on ensuring a consistent representation of customers across our transactional data model, datawarehouse model, and XML schemas. Further, we've started a new whitepaper that describes how Big Data and Social Media Analytics should be leveraged by retailers to add and additional level of customer insight. Let's start by taking a closer look at the meaning of social analytics. Here's my definition: Social Analytics, in the retail context, describes the analysis of data obtained from social media sources in an effort to better comprehend and interact with the community of consumers. This discipline seeks to understand what’s being said by the community about brands and products (“monitoring”), as well as understand the behaviors of those in the community (“profiling”). The results are used to enforce the brand image, improve product decisions, and better focus marketing, all of which lead to increased sales. To help illustrate the facets of social analytics, I drew the diagram below which was originally published by Retail Touchpoints. There are lots of tools on the market that allow retailers to monitor social media for brand and product mentions. These include analysis of sentiment, reach, share of voice, engagement, etc. When your brand is mentioned, good or bad, its an opportunity to engage with the customer and possibly lead to a sale. Because products are not always unique, its much more difficult to monitor product mentions, but detecting product trends early can help a retailer make better merchandising decisions, especially in fashion. Once a retailer understands what's being said, the next step is learn more about who's saying it. That involves profiling customers beyond simple demographics to understand their motivations. Much can be learned from patterns, and even more when customers voluntarily share their data. Knowing that a customer is passionate about, for example, mountain biking allows the retailer to make relevant offers on helmets, ask for opinions on hydration, and help spread marketing messages. Social analytics has many facets that benefit retailers, some of which are easy but many of which are hard. Its important for the CMO and CIO to work closely together to plan for these capabilities and monitor the maturity of tools on the market. This is an area that will separate winners from losers.

Read the article
Are Chromebooks the New Netbooks, and What Does That Mean?

- by Chris Hoffman

Netbooks — small, cheap, slow laptops — were once very popular. They fell out of favor — people bought them because they seemed cheap and portable, but the actual experience was lackluster. Most netbooks now sit unused. Windows netbooks have vanished from stores today, but there’s a new super-cheap laptop — the Chromebook. Chromebook sales numbers are impressive, but their usage statistics tell a different story. Are Chromebooks just the new netbook? The Problem With Netbooks Netbooks seemed appealing, especially in an age before tablets and lightweight ultrabooks. You could buy a netbook for $200 or so and have a portable device that let you get on the Internet. The name “netbook” spelled that out — it was a portable device for getting on the ‘net. They weren’t really that great. The original netbook was a lightweight Asus Eee PC that ran Linux alone and had a small amount of fast flash storage. Netbooks eventually ran heavier Windows XP operating systems — Windows Vista was out, but it was just too bloated to run on netbooks. Manufacturers added slow magnetic hard drives, bloatware, and even DVD drives! They couldn’t run most Windows software very well. The build quality was poor and their keyboards were tiny and cramped. People liked the idea of a lightweight device that let them get on the Internet and loved the cheap price, but the actual experience wasn’t great. Chromebook Sales Chromebook sales numbers seem surprisingly high. NPD reported that Chromebooks were 21% of all notebooks sold in the US in 2013. If you combine laptop and tablet sales into a single statistic, Chromebooks were 9.6% of all those devices sold. That’s 2/3 as many Chromebooks sold as iPads in the US! Of Amazon’s best-selling laptop computers, two of the top three are Chromebooks. These definitely look like successful products. Unlike netbooks, Chromebooks are taking off in a big way in the education market. Many schools are buying Chromebooks for their students instead of more expensive Windows laptops. They’re easier to manage and lock down than Windows laptops, but — more importantly for cash-strapped schools — they’re very cheap. Netbooks never had this sort of momentum in schools. Chromebook Usage Statistics Here’s where the rosy picture of Chromebooks starts to become more realistic. StatCounter’s browser usage statistics show how widely used different operating systems are. For example, Windows 7 has the highest share with 35.71% of web activity in April, 2014. The chart doesn’t even show Chrome OS at all, although there is an “Other” number near the bottom. Click the Download Data link to download a CSV file and we can view more detailed information. Chrome OS only accounted for 0.38% of web usage in April, 2014. Desktop Linux, which people often shrug at, accounted for 1.52% in the same month. To its credit, Chrome OS usage has increased. Chromebooks were widely mocked back in November, 2013 when the sales numbers came out. After all, they only accounted for 0.11% of web usage globally in November, 2013! But Chrome OS numbers have been improving: Nov, 2013: 0.11% Dec, 2013: 0.22% Jan, 2014: 0.31% Feb, 2014: 0.35% Mar, 2014: 0.36% Apr, 2014: 0.38% Chrome OS is climbing, but it’s definitely still in the “Other” category. It isn’t as high as we’d expect to see it with those types of sales numbers. Chromebooks vs. Netbooks Chromebooks are more limited devices than traditional PCs. You can do quite a few things, but you have to do it all using Chrome or Chrome apps. Most people won’t be enabling developer mode and installing a Linux desktop. You don’t have access to the powerful desktop software available for Windows and even Mac OS X. On the other hand, these Chromebooks are less compromised than netbooks in many ways. They come with a lightweight operating system designed for portable, mobile devices. They don’t come packed with any bloatware, like the bloatware you’ll find on competing Windows PCs and the original netbooks. They’re cheaper because the manufacturer doesn’t have to pay for a Windows license. There’s no need for antivirus software weighing the operating system down. They’re larger than the original netbooks, with many of them being 11.6-inches instead of the original 8-inch bodies many older netbooks came with. They have larger, more comfortable keyboards and fast solid-state storage. Really, Chromebooks are what netbooks wanted to be. People didn’t buy netbooks to use typical Windows software — they just wanted a lightweight PC. Of course, for many people, the real successor to netbooks is tablets. If all you want is a portable device to throw in a bag so you can get online, maybe a tablet is better. Where Does This Leave Chromebooks? So, are Chromebooks the new netbooks? It’s a bit early to answer that question. Chromebooks are definitely not out of the competition — their sales look good and their usage share is increasing. On the other hand, Chrome OS is still pretty far behind. They’re not catching fire like tablets did. Maybe netbooks were just before their time and Chromebooks were what they were always meant to be. Just as Microsoft’s Windows XP tablets failed, Windows XP netbooks also failed. Tablets took off with a more refined operating system on better hardware years later. “Netbooks” — or Chromebooks — are now taking off with a more purpose-built operating system on better hardware, too. It’s hard to count Chromebooks out because they provide a much better experience than netbooks ever did. If you’re one of the people who wants to use old Windows desktop apps on your portable laptop, you may think netbooks were better — but most people don’t want that. But maybe people either want a full desktop PC experience or a full mobile tablet experience. Is there a place for a laptop with a keyboard that can only view websites? We’ll have to wait and see. Image Credit: Kevin Jarret on Flickr, Clive Darra on Flickr, Sean Freese on Flickr

Read the article
Informed TDD – Kata “To Roman Numerals”

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/05/28/informed-tdd-ndash-kata-ldquoto-roman-numeralsrdquo.aspxIn a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”. I like to respond with this article to his questions. There´s more to say than fits into a commentary. Mocks and TDD I don´t see in how far TDD is avoiding or opposed to mocks. TDD and mocks are orthogonal. TDD is about pocess, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource. True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that as I tried to explain. I don´t use mocks as proxies for “expensive” resource. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion. But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata. ITDD for “To Roman Numerals” gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines. Now here is, how I would do this kata differently. 1. Analyse A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled. “Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals” like written in this “requirement document” is not enough to start software development. Coding should only begin, if the interface between the “system under development” and its context is clear. If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s you fellow developer. Designing the interface is a task of it´s own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP) which not only should hold for software functional units but also for tasks or activities. In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions. Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning. Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explain in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically. 2. Solve The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody. Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too. Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution. Here´s my solution approach: The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members. The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list. The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV which looks like a contradiction to the above rule? It is not – if you accept roman “digits” not to be limited to be single characters only. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution. All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values: {1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M} Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors. If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a two phase process: Find all the factors Translate the factors found Compile the roman representation Translation is just a look-up. Finding, though, needs some calculation: Find the highest remaining factor fitting in the value Remember and subtract it from the value Repeat with remaining value and remaining factors Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction. With this solution in hand I finally can do what TDD advocates: find and prioritize test cases. As I can see from the small process description above, there are two aspects to test: Test the translation Test the compilation Test finding the factors Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious. Testing the compilation is trivial. Testing factor finding, though, is a tad more complicated. I can think of several steps: First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M). Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]). Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly. I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process. 3. Implement First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say: Next I implement the API: The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to most quickly satisfy these tests, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution: My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one. I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition: As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right. Here´s the implementation to satisfy the test: It´s as simple as possible. Right how TDD wants me to do it: KISS. Now for the second equivalence partition: translating multiple factors. (It´a pattern: if you need to do something repeatedly separate the tests for doing it once and doing it multiple times.) In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science: Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it. Having two tests I find more important. Now for the next low hanging fruit: compilation. It´s even simpler than translation. A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose… Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions: Again, I´m faking the implementation first: I focus on just the first test case. No looping yet. Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat. That´s left for a drill down with a test of the fake function: There are two main equivalence partitions, I guess: either the first factor is appropriate or some next. The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.) And the first of the equivalence partitions on the higher level also is satisfied: Great, I can move on. Now for more than a single factor: Interestingly not just one test becomes green now, but all of them. Great! You might say, then I must have done not the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than a recursion of which I had thought briefly during the problem solving phase. And by the way: Also the acceptance tests went green: Mission accomplished. At least functionality wise. Now I´ve to tidy up things a bit. TDD calls for refactoring. Not uch refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness. At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”. First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant. Which leads to a small conversion in Find_factors(): And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain. However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”. This then is my final test class: And this is the final production code: Test coverage as reported by NCrunch is 100%: Reflexion Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet. But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as it is often the case. That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design. I also could have implemented it bottom-up, since I knew some bottom of the solution. That´s the leaves of the functional decomposition tree. Where things became fuzzy, since the design did not cover any more details as with Find_factors(), I repeated the process in the small, so to speak: fake some top level, endure red high level tests, while first solving a simpler problem. Using scaffolding tests (to be thrown away at the end) brought two advantages: Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them. I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API. The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find what, what there is. Then devise a solution. Then code the solution, manifest the solution in code. Writing tests first is a good practice. But it should not be taken dogmatic. And above all it should not be overloaded with purposes. And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons. PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

Read the article
Internal Mutation of Persistent Data Structures

- by Greg Ros

To clarify, when I mean use the terms persistent and immutable on a data structure, I mean that: The state of the data structure remains unchanged for its lifetime. It always holds the same data, and the same operations always produce the same results. The data structure allows Add, Remove, and similar methods that return new objects of its kind, modified as instructed, that may or may not share some of the data of the original object. However, while a data structure may seem to the user as persistent, it may do other things under the hood. To be sure, all data structures are, internally, at least somewhere, based on mutable storage. If I were to base a persistent vector on an array, and copy it whenever Add is invoked, it would still be persistent, as long as I modify only locally created arrays. However, sometimes, you can greatly increase performance by mutating a data structure under the hood. In more, say, insidious, dangerous, and destructive ways. Ways that might leave the abstraction untouched, not letting the user know anything has changed about the data structure, but being critical in the implementation level. For example, let's say that we have a class called ArrayVector implemented using an array. Whenever you invoke Add, you get a ArrayVector build on top of a newly allocated array that has an additional item. A sequence of such updates will involve n array copies and allocations. Here is an illustration: However, let's say we implement a lazy mechanism that stores all sorts of updates -- such as Add, Set, and others in a queue. In this case, each update requires constant time (adding an item to a queue), and no array allocation is involved. When a user tries to get an item in the array, all the queued modifications are applied under the hood, requiring a single array allocation and copy (since we know exactly what data the final array will hold, and how big it will be). Future get operations will be performed on an empty cache, so they will take a single operation. But in order to implement this, we need to 'switch' or mutate the internal array to the new one, and empty the cache -- a very dangerous action. However, considering that in many circumstances (most updates are going to occur in sequence, after all), this can save a lot of time and memory, it might be worth it -- you will need to ensure exclusive access to the internal state, of course. This isn't a question about the efficacy of such a data structure. It's a more general question. Is it ever acceptable to mutate the internal state of a supposedly persistent or immutable object in destructive and dangerous ways? Does performance justify it? Would you still be able to call it immutable? Oh, and could you implement this sort of laziness without mutating the data structure in the specified fashion?

Read the article
Profiling Startup Of VS2012 – SpeedTrace Profiler

- by Alois Kraus

SpeedTrace is a relatively unknown profiler made a company called Ipcas. A single professional license does cost 449€+VAT. For the test I did use SpeedTrace 4.5 which is currently Beta. Although it is cheaper than dotTrace it has by far the most options to influence how profiling does work. First you need to create a tracing project which does configure tracing for one process type. You can start the application directly from the profiler or (much more interesting) it does attach to a specific process when it is started. For this you need to check “Trace the specified …” radio button and enter the process name in the “Process Name of the Trace” edit box. You can even selectively enable tracing for processes with a specific command line. Then you need to activate the trace project by pressing the Activate Project button and you are ready to start VS as usual. If you want to profile the next 10 VS instances that you start you can set the Number of Processes counter to e.g. 10. This is immensely helpful if you are trying to profile only the next 5 started processes. As you can see there are many more tabs which do allow to influence tracing in a much more sophisticated way. SpeedTrace is the only profiler which does not rely entirely on the profiling Api of .NET. Instead it does modify the IL code (instrumentation on the fly) to write tracing information to disc which can later be analyzed. This approach is not only very fast but it does give you unprecedented analysis capabilities. Once the traces are collected they do show up in your workspace where you can open the trace viewer. I do skip the other windows because this view is by far the most useful one. You can sort the methods not only by Wall Clock time but also by CPU consumption and wait time which none of the other products support in their views at the same time. If you want to optimize for CPU consumption sort by CPU time. If you want to find out where most time is spent you need Clock Total time and Clock Waiting. There you can directly see if the method did take long because it did wait on something or it did really execute stuff that did take so long. Once you have found a method you want to drill deeper you can double click on a method to get to the Caller/Callee view which is similar to the JetBrains Method Grid view. But this time you do see much more. In the middle is the clicked method. Above are the methods that call you and below are the methods that you do directly call. Normally you would then start digging deeper to find the end of the chain where the slow method worth optimizing is located. But there is a shortcut. You can press the magic button to calculate the aggregation of all called methods. This is displayed in the lower left window where you can see each method call and how long it did take. There you can also sort to see if this call stack does only contain methods (e.g. WCF connect calls which you cannot make faster) not worth optimizing. YourKit has a similar feature where it is called Callees List. In the Functions tab you have in the context menu also many other useful analysis options One really outstanding feature is the View Call History Drilldown. When you select this one you get not a sum of all method invocations but a list with the duration of each method call. This is not surprising since SpeedTrace does use tracing to get its timings. There you can get many useful graphs how this method did behave over time. Did it become slower at some point in time or was only the first call slow? The diagrams and the list will tell you that. That is all fine but what should I do when one method call was slow? I want to see from where it was coming from. No problem select the method in the list hit F10 and you get the call stack. This is a life saver if you e.g. search for serialization problems. Today Serializers are used everywhere. You want to find out from where the 5s XmlSerializer.Deserialize call did come from? Hit F10 and you get the call stack which did invoke the 5s Deserialize call. The CPU timeline tab is also useful to find out where long pauses or excessive CPU consumption did happen. Click in the graph to get the Thread Stacks window where you can get a quick overview what all threads were doing at this time. This does look like the Stack Traces feature in YourKit. Only this time you get the last called method first which helps to quickly see what all threads were executing at this moment. YourKit does generate a rather long list which can be hard to go through when you have many threads. The thread list in the middle does not give you call stacks or anything like that but you see which methods were found most often executing code by the profiler which is a good indication for methods consuming most CPU time. This does sound too good to be true? I have not told you the best part yet. The best thing about this profiler is the staff behind it. When I do see a crash or some other odd behavior I send a mail to Ipcas and I do get usually the next day a mail that the problem has been fixed and a download link to the new version. The guys at Ipcas are even so helpful to log in to your machine via a Citrix Client to help you to get started profiling your actual application you want to profile. After a 2h telco I was converted from a hater to a believer of this tool. The fast response time might also have something to do with the fact that they are actively working on 4.5 to get out of the door. But still the support is by far the best I have encountered so far. The only downside is that you should instrument your assemblies including the .NET Framework to get most accurate numbers. You can profile without doing it but then you will see very high JIT times in your process which can severely affect the correctness of the measured timings. If you do not care about exact numbers you can also enable in the main UI in the Data Trace tab logging of method arguments of primitive types. If you need to know what files at which times were opened by your application you can find it out without a debugger. Since SpeedTrace does read huge trace files in its reader you should perhaps use a 64 bit machine to be able to analyze bigger traces as well. The memory consumption of the trace reader is too high for my taste. But they did promise for the next version to come up with something much improved.

Read the article
The Future of Project Management is Social

- by Natalia Rachelson

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} A guest post by Kazim Isfahani, Director, Product Marketing, Oracle Rapid Ascent. Breakneck Speed. Lightning Fast. Perhaps even overwhelming. No matter which set of adjectives we use to describe it, social media’s rise into the enterprise mainstream has been unprecedented. Indeed, the big 4 social media powerhouses (Facebook, Google+, LinkedIn, and Twitter), have nearly 2 Billion users between them. You may be asking (as you should really) “That’s all well and good for the consumer, but for me at my company, what’s your point? Beyond the fact that I can check and post updates, that is.” Good question, kind sir. Impact of Social and Collaboration on Project Management I’ll dovetail this discussion to the project management realm, since that’s what I’m writing about. Speed is a big challenge for project-driven organizations. Anything that can help speed up project delivery - be it a new product introduction effort or a geographical expansion project - fast is a good thing. So where does this whole social thing fit particularly since there are already a host of tools to help with traditional project execution? The fact is companies have seen improvements in their productivity by deploying departmental collaboration and other social-oriented solutions. McKinsey’s survey on social tools shows we have reached critical scale: 72% of respondents report that their companies use at least one and over 40% say they are using social networks and blogs. We don’t hear as much about the impact of social media technologies at the project and project manager level, but that does not mean there is none. Consider the new hire. The type of individual entering the workforce and executing on projects is a generation of worker expecting visually appealing, easy to use and easy to understand technology meshing hand-in-hand with business processes. Consider the project manager. The social era has enhanced the role that the project manager must play. Today’s project manager must be a supreme communicator, an influencer, a sympathizer, a negotiator, and still manage to keep all stakeholders in the loop on project progress. Social tools play a significant role in this effort. Now consider the impact to the project team. The way that a project team functions has changed, with newer, social oriented technologies making the process of information dissemination and team communications much more fluid. It’s clear that a shift is occurring where “social” is intersecting with project management. The Rise of Social Project Management We refer to the melding of project management and social networking as Social Project Management. Social Project Management is based upon the philosophy that the project team is one part of an integrated whole, and that valuable and unique abilities exist within the larger organization. For this reason, Social Project Management systems should be integrated into the collaborative platform(s) of an organization, allowing communication to proceed outside the project boundaries. What makes social project management "social" is an implicit awareness where distributed teams build connected links in ways that were previously restricted to teams that were co-located. Just as critical, Social Project Management embraces the vision of seamless online collaboration within a project team, but also provides for, (and enhances) the use of rigorous project management techniques. Social Project Management acknowledges that projects (particularly large projects) are a social activity - people doing work with people, for other people, with commitments to yet other people. The more people (larger projects), the more interpersonal the interactions, and the more social affects the project. The Epitome of Social - Fusion Project Portfolio Management If I take this one level further to discuss Fusion Project Portfolio Management, the notion of Social Project Management is on full display. With Fusion Project Portfolio Management, project team members have a single place for interaction on projects and access to any other resources working within the Fusion ERP applications. This allows team members the opportunity to be informed with greater participation and provide better information. The application’s the visual appeal, and highly graphical nature makes it easy to navigate information. The project activity stream adds to the intuitive user experience. The goal of productivity is pervasive throughout Fusion Project Portfolio Management. Field research conducted with Oracle customers and partners showed that users needed a way to stay in the context of their core transactions and yet easily access social networking tools. This is manifested in the application so when a user executes a business process, they not only have the transactional application at their fingertips, but also have things like e-mail, SMS, text, instant messaging, chat – all providing a number of different ways to interact with people and/or groups of people, both internal and external to the project and enterprise. But in the end, connecting people is relatively easy. The larger issue is finding a way to serve up relevant, system-generated, actionable information, in real time, which will allow for more streamlined execution on key business processes. Fusion Project Portfolio Management’s design concept enables users to create project communities, establish discussion threads, manage event calendars as well as deliver project based work spaces to organize communications within the context of a project – all within a secure business environment. We’d love to hear from you and get your thoughts and ideas about how Social Project Management is impacting your organization. To learn more about Oracle Fusion Project Portfolio Management, please visit this link

Read the article
Refactoring Part 1 : Intuitive Investments

- by Wes McClure

Fear, it’s what turns maintaining applications into a nightmare. Technology moves on, teams move on, someone is left to operate the application, what was green is now perceived brown. Eventually the business will evolve and changes will need to be made. The approach to those changes often dictates the long term viability of the application. Fear of change, lack of passion and a lack of interest in understanding the domain often leads to a paranoia to do anything that doesn’t involve duct tape and bailing twine. Don’t get me wrong, those have a place in the short term viability of a project but they don’t have a place in the long term. Add to it “us versus them” in regards to the original team and those that maintain it, internal politics and other factors and you have a recipe for disaster. This results in code that quickly becomes unmanageable. Even the most clever of designs will eventually become sub optimal and debt will amount that exponentially makes changes difficult. This is where refactoring comes in, and it’s something I’m very passionate about. Refactoring is about improving the process whereby we make change, it’s an exponential investment in the process of change. Without it we will incur exponential complexity that halts productivity. Investments, especially in the long term, require intuition and reflection. How can we tackle new development effectively via evolving the original design and paying off debt that has been incurred? The longer we wait to ask and answer this question, the more it will cost us. Small requests don’t warrant big changes, but realizing when changes now will pay off in the long term, and especially in the short term, is valuable. I have done my fair share of maintaining applications and continuously refactoring as needed, but recently I’ve begun work on a project that hasn’t had much debt, if any, paid down in years. This is the first in a series of blog posts to try to capture the process which is largely driven by intuition of smaller refactorings from other projects. Signs that refactoring could help: Testability How can decreasing test time not pay dividends? One of the first things I found was that a very important piece often takes 30+ minutes to test. I can only imagine how much time this has cost historically, but more importantly the time it might cost in the coming weeks: I estimate at least 10-20 hours per person! This is simply unacceptable for almost any situation. As it turns out, about 6 hours of working with this part of the application and I was able to cut the time down to under 30 seconds! In less than the lost time of one week, I was able to fix the problem for all future weeks! If we can’t test fast then we can’t change fast, nor with confidence. Code is used by end users and it’s also used by developers, consider your own needs in terms of the code base. Adding logic to enable/disable features during testing can help decouple parts of an application and lead to massive improvements. What exactly is so wrong about test code in real code? Often, these become features for operators and sometimes end users. If you cannot run an integration test within a test runner in your IDE, it’s time to refactor. Readability Are variables named meaningfully via a ubiquitous language? Is the code segmented functionally or behaviorally so as to minimize the complexity of any one area? Are aspects properly segmented to avoid confusion (security, logging, transactions, translations, dependency management etc) Is the code declarative (what) or imperative (how)? What matters, not how. LINQ is a great abstraction of the what, not how, of collection manipulation. The Reactive framework is a great example of the what, not how, of managing streams of data. Are constants abstracted and named, or are they just inline? Do people constantly bitch about the code/design? If the code is hard to understand, it will be hard to change with confidence. It’s a large undertaking if the original designers didn’t pay much attention to readability and as such will never be done to “completion.” Make sure not to go over board, instead use this as you change an application, not in lieu of changes (like with testability). Complexity Simplicity will never be achieved, it’s highly subjective. That said, a lot of code can be significantly simplified, tidy it up as you go. Refactoring will often converge upon a simplification step after enough time, keep an eye out for this. Understandability In the process of changing code, one often gains a better understanding of it. Refactoring code is a good way to learn how it works. However, it’s usually best in combination with other reasons, in effect killing two birds with one stone. Often this is done when readability is poor, in which case understandability is usually poor as well. In the large undertaking we are making with this legacy application, we will be replacing it. Therefore, understanding all of its features is important and this refactoring technique will come in very handy. Unused code How can deleting things not help? This is a freebie in refactoring, it’s very easy to detect with modern tools, especially in statically typed languages. We have VCS for a reason, if in doubt, delete it out (ok that was cheesy)! If you don’t know where to start when refactoring, this is an excellent starting point! Duplication Do not pray and sacrifice to the anti-duplication gods, there are excellent examples where consolidated code is a horrible idea, usually with divergent domains. That said, mediocre developers live by copy/paste. Other times features converge and aren’t combined. Tools for finding similar code are great in the example of copy/paste problems. Knowledge of the domain helps identify convergent concepts that often lead to convergent solutions and will give intuition for where to look for conceptual repetition. 80/20 and the Boy Scouts It’s often said that 80% of the time 20% of the application is used most. These tend to be the parts that are changed. There are also parts of the code where 80% of the time is spent changing 20% (probably for all the refactoring smells above). I focus on these areas any time I make a change and follow the philosophy of the Boy Scout in cleaning up more than I messed up. If I spend 2 hours changing an application, in the 20%, I’ll always spend at least 15 minutes cleaning it or nearby areas. This gives a huge productivity edge on developers that don’t. Ironically after a short period of time the 20% shrinks enough that we don’t have to spend 80% of our time there and can move on to other areas. Refactoring is highly subjective, never attempt to refactor to completion! Learn to be comfortable with leaving one part of the application in a better state than others. It’s an evolution, not a revolution. These are some simple areas to look into when making changes and can help get one started in the process. I’ve often found that refactoring is a convergent process towards simplicity that sometimes spans a few hours but often can lead to massive simplifications over the timespan of weeks and months of regular development.

Read the article
New channels for Exadata 11.2.3.1.1

- by Rene Kundersma

With the release of Exadata 11.2.3.1.0 back in April 2012 Oracle has deprecated the minimal pack for the Exadata Database Servers (compute nodes). From that release the Linux Database Server updates will be done using ULN and YUM. For the 11.2.3.1.0 release the ULN exadata_dbserver_11.2.3.1.0_x86_64_base channel was made available and Exadata operators could subscribe their system to it via linux.oracle.com. With the new 11.2.3.1.1 release two additional channels are added: a 'latest' channel (exadata_dbserver_11.2_x86_64_latest) a 'patch' channel (exadata_dbserver_11.2_x86_64_patch) The patch channel has the new or updated packages updated in 11.2.3.1.1 from the base channel. The latest channel has all the packages from 11.2.3.1.0 base and patch channels combined. From here there are three possible situations a Database Server can be in before it can be updated to 11.2.3.1.1: Database Server is on Exadata release < 11.2.3.1.0 Database Server is patched to 11.2.3.1.0 Database Server is freshly imaged to 11.2.3.1.0 In order to bring a Database Server to 11.2.3.1.1 for all three cases the same approach for updating can be used (using YUM), but there are some minor differences: For Database Servers on a release < 11.2.3.1.0 the following high-level steps need to be performed: Subscribe to el5_x86_64_addons, ol5_x86_64_latest and exadata_dbserver_11.2_x86_64_latest Create local repository Point Database Server to the local repository* install the update * during this process a one-time action needs to be done (details in the README) For Database Servers patched to 11.2.3.1.0: Subscribe to patch channel exadata_dbserver_11.2_x86_64_patch Create local repository Point Database Server to the local repository Update the system For Database Servers freshly imaged to 11.2.3.1.0: Subscribe to patch channel exadata_dbserver_11.2_x86_64_patch Create local repository Point Database Server to the local repository Update the system The difference between 'situation 2' (Database Server is patched to 11.2.3.1.0) and 'situation 3' (Database Server is freshly imaged to 11.2.3.1.0) is that in situation 2 the existing Exadata-computenode.repo file needs to be edited while in situation 3 this file is not existing and needs to be created or copied. Another difference is that you will end up with more OFA packages installed in situation 2. This is because none are removed during the updating process. The YUM update functionality with the new channels is a great enhancements to the Database Server update procedure. As usual, the updates can be done in a rolling fashion so no database service downtime is required. For detailed and up-to-date instructions always see the patch README's 1466459.1 patch 13998727 888828.1 Rene Kundersma

Read the article
Organizing Git repositories with common nested sub-modules

- by André Caron

I'm a big fan of Git sub-modules. I like to be able to track a dependency along with its version, so that you can roll-back to a previous version of your project and have the corresponding version of the dependency to build safely and cleanly. Moreover, it's easier to release our libraries as open source projects as the history for libraries is separate from that of the applications that depend on them (and which are not going to be open sourced). I'm setting up workflow for multiple projects at work, and I was wondering how it would be if we took this approach a bit of an extreme instead of having a single monolithic project. I quickly realized there is a potential can of worms in really using sub-modules. Supposing a pair of applications: studio and player, and dependent libraries core, graph and network, where dependencies are as follows: core is standalone graph depends on core (sub-module at ./libs/core) network depdends on core (sub-module at ./libs/core) studio depends on graph and network (sub-modules at ./libs/graph and ./libs/network) player depends on graph and network (sub-modules at ./libs/graph and ./libs/network) Suppose that we're using CMake and that each of these projects has unit tests and all the works. Each project (including studio and player) must be able to be compiled standalone to perform code metrics, unit testing, etc. The thing is, a recursive git submodule fetch, then you get the following directory structure: studio/ studio/libs/ (sub-module depth: 1) studio/libs/graph/ studio/libs/graph/libs/ (sub-module depth: 2) studio/libs/graph/libs/core/ studio/libs/network/ studio/libs/network/libs/ (sub-module depth: 2) studio/libs/network/libs/core/ Notice that core is cloned twice in the studio project. Aside from this wasting disk space, I have a build system problem because I'm building core twice and I potentially get two different versions of core. Question How do I organize sub-modules so that I get the versioned dependency and standalone build without getting multiple copies of common nested sub-modules? Possible solution If the the library dependency is somewhat of a suggestion (i.e. in a "known to work with version X" or "only version X is officially supported" fashion) and potential dependent applications or libraries are responsible for building with whatever version they like, then I could imagine the following scenario: Have the build system for graph and network tell them where to find core (e.g. via a compiler include path). Define two build targets, "standalone" and "dependency", where "standalone" is based on "dependency" and adds the include path to point to the local core sub-module. Introduce an extra dependency: studio on core. Then, studio builds core, sets the include path to its own copy of the core sub-module, then builds graph and network in "dependency" mode. The resulting folder structure looks like: studio/ studio/libs/ (sub-module depth: 1) studio/libs/core/ studio/libs/graph/ studio/libs/graph/libs/ (empty folder, sub-modules not fetched) studio/libs/network/ studio/libs/network/libs/ (empty folder, sub-modules not fetched) However, this requires some build system magic (I'm pretty confident this can be done with CMake) and a bit of manual work on the part of version updates (updating graph might also require updating core and network to get a compatible version of core in all projects). Any thoughts on this?

Read the article

Search Results

Search found 6634 results on 266 pages for 'fast fashion'.

Page 118/266 | < Previous Page | 114 115 116 117 118 119 120 121 122 123 124 125 | Next Page >

- by Joe Lamantia

- by BuckWoody

- by pinaldave

- by Etienne Giust

- by mac

- by Dave

- by Pinal Dave

- by Reed

- by costlow

- by Sridhar_R-Oracle

- by Karoly Vegh

- by BuckWoody

- by Etienne Giust

- by David Dorf

- by Argenis

- by David Dorf

- by Chris Hoffman

- by Ralf Westphal

- by Greg Ros

- by Alois Kraus

- by Natalia Rachelson

- by Wes McClure

- by Rene Kundersma

- by André Caron

< Previous Page | 114 115 116 117 118 119 120 121 122 123 124 125 | Next Page >