dunn less - Page 306 - Developer IT

How To Remove People and Objects From Photographs In Photoshop

- by Eric Z Goodnight

You might think that it’s a complicated process to remove objects from photographs. But really Photoshop makes it quite simple, even when removing all traces of a person from digital photographs. Read on to see just how easy it is. Photoshop was originally created to be an image editing program, and it excels at it. With hardly any Photoshop experience, any beginner can begin removing objects or people from their photos. Have some friends that photobombed an otherwise great pic? Tell them to say their farewells, because here’s how to get rid of them with Photoshop! Tools for Removing Objects Removing an object is not really “magical” work. Your goal is basically to cover up the information you don’t want in an image with information you do want. In this sample image, we want to remove the cigar smoking man, and leave the geisha. Here’s a couple of the tools that can be useful to work with when attempting this kind of task. Clone Stamp and Pattern Stamp Tool: Samples parts of your image from your background, and allows you to paint into your image with your mouse or stylus. Eraser and Brush Tools: Paint flat colors and shapes, and erase cloned layers of image information. Basic, down and dirty photo editing tools. Pen, Quick Selection, Lasso, and Crop tools: Select, isolate, and remove parts of your image with these selection tools. All useful in their own way. Some, like the pen tool, are nightmarishly tough on beginners. Remove a Person with the Clone Stamp Tool (Video) The video above uses the Clone Stamp tool to sample and paint with the background texture. It’s a simple tool to use, although it can be confusing, possibly counter-intuitive. Here’s some pointers, in addition to the video above. Select shortcut key to choose the Clone tool stamp from the Tools Panel. Always create a copy of your background layer before doing heavy edits by right clicking on the background in your Layers Panel and selecting “Duplicate.” Hold with the Clone Tool selected, and click anywhere in your image to sample that area. When you’re sampling an area, your cursor is “Aligned” with your sample area. When you paint, your sample area moves. You can turn the “Aligned” setting off by clicking the in the Options Panel at the top of your screen if you want. Change your brush size and hardness as shown in the video by right-clicking in your image. Use your lasso to copy and paste pieces of your image in order to cover up any parts that seem appropriate. Photoshop Magic with the “Content-Aware Fill” One of the hallmark features of CS5 is the “Content-Aware Fill.” Content aware fill can be an excellent shortcut to removing objects and even people in Photoshop, but it is somewhat limited, and can get confused. Here’s a basic rundown on how it works. Select an object using your Lasso tool, shortcut key . The Lasso works fine as this selection can be rough. Navigate to Edit > Fill, and select “Content-Aware,” as illustrated above, from the pull-down menu. It’s surprisingly simple. After some processing, Photoshop has done the work of removing the object for you. It takes a few moments, and it is not perfect, so be prepared to touch it up with some Copy-Paste, or some Clone stamp action. Content Aware Fill Has Its Limits Keep in mind that the Content Aware Fill is meant to be used with other techniques in mind. It doesn’t always perform perfectly, but can give you a great starting point. Take this image for instance. It is actually plausible to hide this figure and make this image look like he was never there at all. With a selection made with the Lasso tool, navigate to Edit > Fill and select “Content Aware” again. The result is surprisingly good, but as you can see, worthy of some touch up. With a result like this one, you’ll have to get your hands dirty with copy-paste to create believable lines in the background. With many photographs, Content Aware Fill will simply get confused and give you results you won’t be happy with. Additional Touch Up for Bad Background Textures with the Pattern Stamp Tool For the perfectionist, cleaning up the lumpy looking textures that the Clone Stamp can leave is fairly simple using the Pattern Stamp Tool. Sample an piece of your image with your Marquee Tool, shortcut key . Navigate to Edit > Define Pattern to create a new Pattern from your selection. Click OK to continue. Click and hold down on the Clone Stamp tool in your Tools Panel until you can select the Pattern Stamp Tool. Pick your new pattern from the Options at the top of your screen, in the Options Panel. Then simply right click in your image in order to pick as soft a brush as possible to paint with. Paint into your image until your background is as smooth as you want it to be, making your painted out object more and more invisible. If you get lines from your repeated texture, experiment turning the on and off and paint over them. In addition to this, simple use of the Crop Tool, shortcut , can recompose an image, making it look as if it never had another object in it at all. Combine these techniques to find a method that works best for your images. Have questions or comments concerning Graphics, Photos, Filetypes, or Photoshop? Send your questions to [email protected], and they may be featured in a future How-To Geek Graphics article. Image Credits: Geisha Kyoto Gion by Todd Laracuenta via Wikipedia, used under Creative Commons. Moai Rano raraku by Aurbina, in Public Domain. Chris Young visits Wrigley by TonyTheTiger, via Wikipedia, used under Creative Commons. Latest Features How-To Geek ETC Ask How-To Geek: How Can I Monitor My Bandwidth Usage? Internet Explorer 9 RC Now Available: Here’s the Most Interesting New Stuff Here’s a Super Simple Trick to Defeating Fake Anti-Virus Malware How to Change the Default Application for Android Tasks Stop Believing TV’s Lies: The Real Truth About "Enhancing" Images The How-To Geek Valentine’s Day Gift Guide CyanogenMod Updates; Rolls out Android 2.3 to the Less Fortunate MyPaint is an Open-Source Graphics App for Digital Painters Can the Birds and Pigs Really Be Friends in the End? [Angry Birds Video] Add the 2D Version of the New Unity Interface to Ubuntu 10.10 and 11.04 MightyMintyBoost Is a 3-in-1 Gadget Charger Watson Ties Against Human Jeopardy Opponents

Read the article

SQL Server Split() Function

- by HighAltitudeCoder

Title goes here Ever wanted a dbo.Split() function, but not had the time to debug it completely? Let me guess - you are probably working on a stored procedure with 50 or more parameters; two or three of them are parameters of differing types, while the other 47 or so all of the same type (id1, id2, id3, id4, id5...). Worse, you've found several other similar stored procedures with the ONLY DIFFERENCE being the number of like parameters taped to the end of the parameter list. If this is the situation you find yourself in now, you may be wondering, "why am I working with three different copies of what is basically the same stored procedure, and why am I having to maintain changes in three different places? Can't I have one stored procedure that accomplishes the job of all three? My answer to you: YES! Here is the Split() function I've created. /****************************************************************************** Split.sql ******************************************************************************/ /****************************************************************************** Split a delimited string into sub-components and return them as a table. Parameter 1: Input string which is to be split into parts. Parameter 2: Delimiter which determines the split points in input string. Works with space or spaces as delimiter. Split() is apostrophe-safe. SYNTAX: SELECT * FROM Split('Dvorak,Debussy,Chopin,Holst', ',') SELECT * FROM Split('Denver|Seattle|San Diego|New York', '|') SELECT * FROM Split('Denver is the super-awesomest city of them all.', ' ') ******************************************************************************/ USE AdventureWorks GO IF EXISTS (SELECT * FROM sysobjects WHERE xtype = 'TF' AND name = 'Split' ) BEGIN DROP FUNCTION Split END GO CREATE FUNCTION Split ( @InputString VARCHAR(8000), @Delimiter VARCHAR(50) ) RETURNS @Items TABLE ( Item VARCHAR(8000) ) AS BEGIN IF @Delimiter = ' ' BEGIN SET @Delimiter = ',' SET @InputString = REPLACE(@InputString, ' ', @Delimiter) END IF (@Delimiter IS NULL OR @Delimiter = '') SET @Delimiter = ',' --INSERT INTO @Items VALUES (@Delimiter) -- Diagnostic --INSERT INTO @Items VALUES (@InputString) -- Diagnostic DECLARE @Item VARCHAR(8000) DECLARE @ItemList VARCHAR(8000) DECLARE @DelimIndex INT SET @ItemList = @InputString SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0) WHILE (@DelimIndex != 0) BEGIN SET @Item = SUBSTRING(@ItemList, 0, @DelimIndex) INSERT INTO @Items VALUES (@Item) -- Set @ItemList = @ItemList minus one less item SET @ItemList = SUBSTRING(@ItemList, @DelimIndex+1, LEN(@ItemList)-@DelimIndex) SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0) END -- End WHILE IF @Item IS NOT NULL -- At least one delimiter was encountered in @InputString BEGIN SET @Item = @ItemList INSERT INTO @Items VALUES (@Item) END -- No delimiters were encountered in @InputString, so just return @InputString ELSE INSERT INTO @Items VALUES (@InputString) RETURN END -- End Function GO ---- Set Permissions --GRANT SELECT ON Split TO UserRole1 --GRANT SELECT ON Split TO UserRole2 --GO The syntax is basically as follows: SELECT <fields> FROM Table 1 JOIN Table 2 ON ... JOIN Table 3 ON ... WHERE LOGICAL CONDITION A AND LOGICAL CONDITION B AND LOGICAL CONDITION C AND TABLE2.Id IN (SELECT * FROM Split(@IdList, ',')) @IdList is a parameter passed into the stored procedure, and the comma (',') is the delimiter you have chosen to split the parameter list on. You can also use it like this: SELECT <fields> FROM Table 1 JOIN Table 2 ON ... JOIN Table 3 ON ... WHERE LOGICAL CONDITION A AND LOGICAL CONDITION B AND LOGICAL CONDITION C HAVING COUNT(SELECT * FROM Split(@IdList, ',') Similarly, it can be used in other aggregate functions at run-time: SELECT MIN(SELECT * FROM Split(@IdList, ','), <fields> FROM Table 1 JOIN Table 2 ON ... JOIN Table 3 ON ... WHERE LOGICAL CONDITION A AND LOGICAL CONDITION B AND LOGICAL CONDITION C GROUP BY <fields> Now that I've (hopefully effectively) explained the benefits to using this function and implementing it in one or more of your database objects, let me warn you of a caveat that you are likely to encounter. You may have a team member who waits until the right moment to ask you a pointed question: "Doesn't this function just do the same thing as using the IN function? Why didn't you just use that instead? In other words, why bother with this function?" What's happening is, one or more team members has failed to understand the reason for implementing this kind of function in the first place. (Note: this is THE MOST IMPORTANT ASPECT OF THIS POST). Allow me to outline a few pros to implementing this function, so you may effectively parry this question. Touche. 1) Code consolidation. You don't have to maintain what is basically the same code and logic, but with varying numbers of the same parameter in several SQL objects. I'm not going to go into the cons related to using this function, because the afore mentioned team member is probably more than adept at pointing these out. Remember, the real positive contribution is ou are decreasing the liklihood that your team fails to update all (x) duplicate copies of what are basically the same stored procedure, and so on... This is the classic downside to duplicate code. It is a virus, and you should kill it. You might be better off rejecting your team member's question, and responding with your own: "Would you rather maintain the same logic in multiple different stored procedures, and hope that the team doesn't forget to always update all of them at the same time?". In his head, he might be thinking "yes, I would like to maintain several different copies of the same stored procedure", although you probably will not get such a direct response. 2) Added flexibility - you can use the Split function elsewhere, and for splitting your data in different ways. Plus, you can use any kind of delimiter you wish. How can you know today the ways in which you might want to examine your data tomorrow? Segue to my next point. 3) Because the function takes a delimiter parameter, you can split the data in any number of ways. This greatly increases the utility of such a function and enables your team to work with the data in a variety of different ways in the future. You can split on a single char, symbol, word, or group of words. You can split on spaces. (The list goes on... test it out). Finally, you can dynamically define the behavior of a stored procedure (or other SQL object) at run time, through the use of this function. Rather than have several objects that accomplish almost the same thing, why not have only one instead?

Read the article

Move penetrating OBB out of another OBB to resolve collision

- by Milo

I'm working on collision resolution for my game. I just need a good way to get an object out of another object if it gets stuck. In this case a car. Here is a typical scenario. The red car is in the green object. How do I correctly get it out so the car can slide along the edge of the object as it should. I tried: if(buildings.size() > 0) { Entity e = buildings.get(0); Vector2D vel = new Vector2D(); vel.x = vehicle.getVelocity().x; vel.y = vehicle.getVelocity().y; vel.normalize(); while(vehicle.getRect().overlaps(e.getRect())) { vehicle.setCenter(vehicle.getCenterX() - vel.x * 0.1f, vehicle.getCenterY() - vel.y * 0.1f); } colided = true; } But that does not work too well. Is there some sort of vector I could calculate to use as the vector to move the car away from the object? Thanks Here is my OBB2D class: public class OBB2D { // Corners of the box, where 0 is the lower left. private Vector2D corner[] = new Vector2D[4]; private Vector2D center = new Vector2D(); private Vector2D extents = new Vector2D(); private RectF boundingRect = new RectF(); private float angle; //Two edges of the box extended away from corner[0]. private Vector2D axis[] = new Vector2D[2]; private double origin[] = new double[2]; public OBB2D(Vector2D center, float w, float h, float angle) { set(center,w,h,angle); } public OBB2D(float left, float top, float width, float height) { set(new Vector2D(left + (width / 2), top + (height / 2)),width,height,0.0f); } public void set(Vector2D center,float w, float h,float angle) { Vector2D X = new Vector2D( (float)Math.cos(angle), (float)Math.sin(angle)); Vector2D Y = new Vector2D((float)-Math.sin(angle), (float)Math.cos(angle)); X = X.multiply( w / 2); Y = Y.multiply( h / 2); corner[0] = center.subtract(X).subtract(Y); corner[1] = center.add(X).subtract(Y); corner[2] = center.add(X).add(Y); corner[3] = center.subtract(X).add(Y); computeAxes(); extents.x = w / 2; extents.y = h / 2; computeDimensions(center,angle); } private void computeDimensions(Vector2D center,float angle) { this.center.x = center.x; this.center.y = center.y; this.angle = angle; boundingRect.left = Math.min(Math.min(corner[0].x, corner[3].x), Math.min(corner[1].x, corner[2].x)); boundingRect.top = Math.min(Math.min(corner[0].y, corner[1].y),Math.min(corner[2].y, corner[3].y)); boundingRect.right = Math.max(Math.max(corner[1].x, corner[2].x), Math.max(corner[0].x, corner[3].x)); boundingRect.bottom = Math.max(Math.max(corner[2].y, corner[3].y),Math.max(corner[0].y, corner[1].y)); } public void set(RectF rect) { set(new Vector2D(rect.centerX(),rect.centerY()),rect.width(),rect.height(),0.0f); } // Returns true if other overlaps one dimension of this. private boolean overlaps1Way(OBB2D other) { for (int a = 0; a < axis.length; ++a) { double t = other.corner[0].dot(axis[a]); // Find the extent of box 2 on axis a double tMin = t; double tMax = t; for (int c = 1; c < corner.length; ++c) { t = other.corner[c].dot(axis[a]); if (t < tMin) { tMin = t; } else if (t > tMax) { tMax = t; } } // We have to subtract off the origin // See if [tMin, tMax] intersects [0, 1] if ((tMin > 1 + origin[a]) || (tMax < origin[a])) { // There was no intersection along this dimension; // the boxes cannot possibly overlap. return false; } } // There was no dimension along which there is no intersection. // Therefore the boxes overlap. return true; } //Updates the axes after the corners move. Assumes the //corners actually form a rectangle. private void computeAxes() { axis[0] = corner[1].subtract(corner[0]); axis[1] = corner[3].subtract(corner[0]); // Make the length of each axis 1/edge length so we know any // dot product must be less than 1 to fall within the edge. for (int a = 0; a < axis.length; ++a) { axis[a] = axis[a].divide((axis[a].length() * axis[a].length())); origin[a] = corner[0].dot(axis[a]); } } public void moveTo(Vector2D center) { Vector2D centroid = (corner[0].add(corner[1]).add(corner[2]).add(corner[3])).divide(4.0f); Vector2D translation = center.subtract(centroid); for (int c = 0; c < 4; ++c) { corner[c] = corner[c].add(translation); } computeAxes(); computeDimensions(center,angle); } // Returns true if the intersection of the boxes is non-empty. public boolean overlaps(OBB2D other) { if(right() < other.left()) { return false; } if(bottom() < other.top()) { return false; } if(left() > other.right()) { return false; } if(top() > other.bottom()) { return false; } if(other.getAngle() == 0.0f && getAngle() == 0.0f) { return true; } return overlaps1Way(other) && other.overlaps1Way(this); } public Vector2D getCenter() { return center; } public float getWidth() { return extents.x * 2; } public float getHeight() { return extents.y * 2; } public void setAngle(float angle) { set(center,getWidth(),getHeight(),angle); } public float getAngle() { return angle; } public void setSize(float w,float h) { set(center,w,h,angle); } public float left() { return boundingRect.left; } public float right() { return boundingRect.right; } public float bottom() { return boundingRect.bottom; } public float top() { return boundingRect.top; } public RectF getBoundingRect() { return boundingRect; } public boolean overlaps(float left, float top, float right, float bottom) { if(right() < left) { return false; } if(bottom() < top) { return false; } if(left() > right) { return false; } if(top() > bottom) { return false; } return true; } };

Read the article

General monitoring for SQL Server Analysis Services using Performance Monitor

- by Testas

A recent customer engagement required a setup of a monitoring solution for SSAS, due to the time restrictions placed upon this, native Windows Performance Monitor (Perfmon) and SQL Server Profiler Monitoring Tools was used as using a third party tool would have meant the customer providing an additional monitoring server that was not available.I wanted to outline the performance monitoring counters that was used to monitor the system on which SSAS was running. Due to the slow query performance that was occurring during certain scenarios, perfmon was used to establish if any pressure was being placed on the Disk, CPU or Memory subsystem when concurrent connections access the same query, and Profiler to pinpoint how the query was being managed within SSAS, profiler I will leave for another blogThis guide is not designed to provide a definitive list of what should be used when monitoring SSAS, different situations may require the addition or removal of counters as presented by the situation. However I hope that it serves as a good basis for starting your monitoring of SSAS. I would also like to acknowledge Chris Webb’s awesome chapters from “Expert Cube Development” that also helped shape my monitoring strategy:http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!6657.entrySimulating ConnectionsTo simulate the additional connections to the SSAS server whilst monitoring, I used ascmd to simulate multiple connections to the typical and worse performing queries that were identified by the customer. A similar sript can be downloaded from codeplex at http://www.codeplex.com/SQLSrvAnalysisSrvcs. File name: ASCMD_StressTestingScripts.zip. Performance MonitorWithin performance monitor, a counter log was created that contained the list of counters below. The important point to note when running the counter log is that the RUN AS property within the counter log properties should be changed to an account that has rights to the SSAS instance when monitoring MSAS counters. Failure to do so means that the counter log runs under the system account, no errors or warning are given while running the counter log, and it is not until you need to view the MSAS counters that they will not be displayed if run under the default account that has no right to SSAS. If your connection simulation takes hours, this could prove quite frustrating if not done beforehand JThe counters used…… Object Counter Instance Justification System Processor Queue legnth N/A Indicates how many threads are waiting for execution against the processor. If this counter is consistently higher than around 5 when processor utilization approaches 100%, then this is a good indication that there is more work (active threads) available (ready for execution) than the machine's processors are able to handle. System Context Switches/sec N/A Measures how frequently the processor has to switch from user- to kernel-mode to handle a request from a thread running in user mode. The heavier the workload running on your machine, the higher this counter will generally be, but over long term the value of this counter should remain fairly constant. If this counter suddenly starts increasing however, it may be an indicating of a malfunctioning device, especially if the Processor\Interrupts/sec\(_Total) counter on your machine shows a similar unexplained increase Process % Processor Time sqlservr Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process % Processor Time msmdsrv Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process Working Set sqlservr If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Process Working Set msmdsrv If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Processor % Processor Time _Total and individual cores measures the total utilization of your processor by all running processes. If multi-proc then be mindful only an average is provided Processor % Privileged Time _Total To see how the OS is handling basic IO requests. If kernel mode utilization is high, your machine is likely underpowered as it's too busy handling basic OS housekeeping functions to be able to effectively run other applications. Processor % User Time _Total To see how the applications is interacting from a processor perspective, a high percentage utilisation determine that the server is dealing with too many apps and may require increasing thje hardware or scaling out Processor Interrupts/sec _Total The average rate, in incidents per second, at which the processor received and serviced hardware interrupts. Shoulr be consistant over time but a sudden unexplained increase could indicate a device malfunction which can be confirmed using the System\Context Switches/sec counter Memory Pages/sec N/A Indicates the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays, this is the primary counter to watch for indication of possible insufficient RAM to meet your server's needs. A good idea here is to configure a perfmon alert that triggers when the number of pages per second exceeds 50 per paging disk on your system. May also want to see the configuration of the page file on the Server Memory Available Mbytes N/A is the amount of physical memory, in bytes, available to processes running on the computer. if this counter is greater than 10% of the actual RAM in your machine then you probably have more than enough RAM. monitor it regularly to see if any downward trend develops, and set an alert to trigger if it drops below 2% of the installed RAM. Physical Disk Disk Transfers/sec for each physical disk If it goes above 10 disk I/Os per second then you've got poor response time for your disk. Physical Disk Idle Time _total If Disk Transfers/sec is above 25 disk I/Os per second use this counter. which measures the percent time that your hard disk is idle during the measurement interval, and if you see this counter fall below 20% then you've likely got read/write requests queuing up for your disk which is unable to service these requests in a timely fashion. Physical Disk Disk queue legnth For the OLAP and SQL physical disk A value that is consistently less than 2 means that the disk system is handling the IO requests against the physical disk Network Interface Bytes Total/sec For the NIC Should be monitored over a period of time to see if there is anb increase/decrease in network utilisation Network Interface Current Bandwidth For the NIC is an estimate of the current bandwidth of the network interface in bits per second (BPS). MSAS 2005: Memory Memory Limit High KB N/A Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Limit Low KB N/A Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Usage KB N/A Displays the memory usage of the server process. MSAS 2005: Memory File Store KB N/A Displays the amount of memory that is reserved for the Cache. Note if total memory limit in the msmdsrv.ini is set to 0, no memory is reserved for the cache MSAS 2005: Storage Engine Query Queries from Cache Direct / sec N/A Displays the rate of queries answered from the cache directly MSAS 2005: Storage Engine Query Queries from Cache Filtered / Sec N/A Displays the Rate of queries answered by filtering existing cache entry. MSAS 2005: Storage Engine Query Queries from File / Sec N/A Displays the Rate of queries answered from files. MSAS 2005: Storage Engine Query Average time /query N/A Displays the average time of a query MSAS 2005: Connection Current connections N/A Displays the number of connections against the SSAS instance MSAS 2005: Connection Requests / sec N/A Displays the rate of query requests per second MSAS 2005: Locks Current Lock Waits N/A Displays thhe number of connections waiting on a lock MSAS 2005: Threads Query Pool job queue Length N/A The number of queries in the job queue MSAS 2005:Proc Aggregations Temp file bytes written/sec N/A Shows the number of bytes of data processed in a temporary file MSAS 2005:Proc Aggregations Temp file rows written/sec N/A Shows the number of bytes of data processed in a temporary file

Read the article

Adding RSS to tags in Orchard

- by Bertrand Le Roy

A year ago, I wrote a scary post about RSS in Orchard. RSS was one of the first features we implemented in our CMS, and it has stood the test of time rather well, but the post was explaining things at a level that was probably too abstract whereas my readers were expecting something a little more practical. Well, this post is going to correct this by showing how I built a module that adds RSS feeds for each tag on the site. Hopefully it will show that it's not very complicated in practice, and also that the infrastructure is pretty well thought out. In order to provide RSS, we need to do two things: generate the XML for the feed, and inject the address of that feed into the existing tag listing page, in order to make the feed discoverable. Let's start with the discoverability part. One might be tempted to replace the controller or the view that are responsible for the listing of the items under a tag. Fortunately, there is no need to do any of that, and we can be a lot less obtrusive. Instead, we can implement a filter: public class TagRssFilter : FilterProvider, IResultFilter .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } On this filter, we can implement the OnResultExecuting method and simply check whether the current request is targeting the list of items under a tag. If that is the case, we can just register our new feed: public void OnResultExecuting(ResultExecutingContext filterContext) { var routeValues = filterContext.RouteData.Values; if (routeValues["area"] as string == "Orchard.Tags" && routeValues["controller"] as string == "Home" && routeValues["action"] as string == "Search") { var tag = routeValues["tagName"] as string; if (! string.IsNullOrWhiteSpace(tag)) { var workContext = _wca.GetContext(); _feedManager.Register( workContext.CurrentSite + " – " + tag, "rss", new RouteValueDictionary { { "tag", tag } } ); } } } The registration of the new feed is just specifying the title of the feed, its format (RSS) and the parameters that it will need (the tag). _wca and _feedManager are just instances of IWorkContextAccessor and IFeedManager that Orchard injected for us. That is all that's needed to get the following tag to be added to the head of our page, without touching an existing controller or view: <link rel="alternate" type="application/rss+xml" title="VuLu - Science" href="/rss?tag=Science"/> Nifty. Of course, if we navigate to the URL of that feed, we'll get a 404. This is because no implementation of IFeedQueryProvider knows about the tag parameter yet. Let's build one that does: public class TagFeedQuery : IFeedQueryProvider, IFeedQuery IFeedQueryProvider has one method, Match, that we can implement to take over any feed request that has a tag parameter: public FeedQueryMatch Match(FeedContext context) { var tagName = context.ValueProvider.GetValue("tag"); if (tagName == null) return null; return new FeedQueryMatch { FeedQuery = this, Priority = -5 }; } This is just saying that if there is a tag parameter, we will handle it. All that remains to be done is the actual building of the feed now that we have accepted to handle it. This is done by implementing the Execute method of the IFeedQuery interface: public void Execute(FeedContext context) { var tagValue = context.ValueProvider.GetValue("tag"); if (tagValue == null) return; var tagName = (string)tagValue.ConvertTo(typeof (string)); var tag = _tagService.GetTagByName(tagName); if (tag == null) return; var site = _services.WorkContext.CurrentSite; var link = new XElement("link"); context.Response.Element.SetElementValue("title", site.SiteName + " - " + tagName); context.Response.Element.Add(link); context.Response.Element.SetElementValue("description", site.SiteName + " - " + tagName); context.Response.Contextualize(requestContext => link.Add(GetTagUrl(tagName, requestContext))); var items = _tagService.GetTaggedContentItems(tag.Id, 0, 20); foreach (var item in items) { context.Builder.AddItem(context, item.ContentItem); } } This code is resolving the tag content item from its name and then gets content items tagged with it, using the tag services provided by the Orchard.Tags module. Then we add those items to the feed. And that is it. To summarize, we handled the request unobtrusively in order to inject the feed's link, then handled requests for feeds with a tag parameter and generated the list of items for that tag. It remains fairly simple and still it is able to handle arbitrary content types. That makes me quite happy about our little piece of over-engineered code from last year. The full code for this can be found in the Vandelay.TagCloud module: http://orchardproject.net/gallery/List/Modules/ Orchard.Module.Vandelay.TagCloud/1.2

Read the article

Using the BAM Interceptor with Continuation

- by Charles Young

Originally posted on: http://geekswithblogs.net/cyoung/archive/2014/06/02/using-the-bam-interceptor-with-continuation.aspxI’ve recently been resurrecting some code written several years ago that makes extensive use of the BAM Interceptor provided as part of BizTalk Server’s BAM event observation library. In doing this, I noticed an issue with continuations. Essentially, whenever I tried to configure one or more continuations for an activity, the BAM Interceptor failed to complete the activity correctly. Careful inspection of my code confirmed that I was initializing and invoking the BAM interceptor correctly, so I was mystified. However, I eventually found the problem. It is a logical error in the BAM Interceptor code itself. The BAM Interceptor provides a useful mechanism for implementing dynamic tracking. It supports configurable ‘track points’. These are grouped into named ‘locations’. BAM uses the term ‘step’ as a synonym for ‘location’. Each track point defines a BAM action such as starting an activity, extracting a data item, enabling a continuation, etc. Each step defines a collection of track points. Understanding Steps The BAM Interceptor provides an abstract model for handling configuration of steps. It doesn’t, however, define any specific configuration mechanism (e.g., config files, SSO, etc.) It is up to the developer to decide how to store, manage and retrieve configuration data. At run time, this configuration is used to register track points which then drive the BAM Interceptor. The full semantics of a step are not immediately clear from Microsoft’s documentation. They represent a point in a business activity where BAM tracking occurs. They are named locations in the code. What is less obvious is that they always represent either the full tracking work for a given activity or a discrete fragment of that work which commences with the start of a new activity or the continuation of an existing activity. The BAM Interceptor enforces this by throwing an error if no ‘start new’ or ‘continue’ track point is registered for a named location. This constraint implies that each step must marked with an ‘end activity’ track point. One of the peculiarities of BAM semantics is that when an activity is continued under a correlated ID, you must first mark the current activity as ‘ended’ in order to ensure the right housekeeping is done in the database. If you re-start an ended activity under the same ID, you will leave the BAM import tables in an inconsistent state. A step, therefore, always represents an entire unit of work for a given activity or continuation ID. For activities with continuation, each unit of work is termed a ‘fragment’. Instance and Fragment State Internally, the BAM Interceptor maintains state data at two levels. First, it represents the overall state of the activity using a ‘trace instance’ token. This token contains the name and ID of the activity together with a couple of state flags. The second level of state represents a ‘trace fragment’. As we have seen, a fragment of an activity corresponds directly to the notion of a ‘step’. It is the unit of work done at a named location, and it must be bounded by start and end, or continue and end, actions. When handling continuations, the BAM Interceptor differentiates between ‘root’ fragments and other fragments. Very simply, a root fragment represents the start of an activity. Other fragments represent continuations. This is where the logic breaks down. The BAM Interceptor loses state integrity for root fragments when continuations are defined. Initialization Microsoft’s BAM Interceptor code supports the initialization of BAM Interceptors from track point configuration data. The process starts by populating an Activity Interceptor Configuration object with an array of track points. These can belong to different steps (aka ‘locations’) and can be registered in any order. Once it is populated with track points, the Activity Interceptor Configuration is used to initialise the BAM Interceptor. The BAM Interceptor sets up a hash table of array lists. Each step is represented by an array list, and each array list contains an ordered set of track points. The BAM Interceptor represents track points as ‘executable’ components. When the OnStep method of the BAM Interceptor is called for a given step, the corresponding list of track points is retrieved and each track point is executed in turn. Each track point retrieves any required data using a call back mechanism and then serializes a BAM trace fragment object representing a specific action (e.g., start, update, enable continuation, stop, etc.). The serialised trace fragment is then handed off to a BAM event stream (buffered or direct) which takes the appropriate action. The Root of the Problem The logic breaks down in the Activity Interceptor Configuration. Each Activity Interceptor Configuration is initialised with an instance of a ‘trace instance’ token. This provides the basic metadata for the activity as a whole. It contains the activity name and ID together with state flags indicating if the activity ID is a root (i.e., not a continuation fragment) and if it is completed. This single token is then shared by all trace actions for all steps registered with the Activity Interceptor Configuration. Each trace instance token is automatically initialised to represent a root fragment. However, if you subsequently register a ‘continuation’ step with the Activity Interceptor Configuration, the ‘root’ flag is set to false at the point the ‘continue’ track point is registered for that step. If you use a ‘reflector’ tool to inspect the code for the ActivityInterceptorConfiguration class, you can see the flag being set in one of the overloads of the RegisterContinue method. This makes no sense. The trace instance token is shared across all the track points registered with the Activity Interceptor Configuration. The Activity Interceptor Configuration is designed to hold track points for multiple steps. The ‘root’ flag is clearly meant to be initialised to ‘true’ for the preliminary root fragment and then subsequently set to false at the point that a continuation step is processed. Instead, if the Activity Interceptor Configuration contains a continuation step, it is changed to ‘false’ before the root fragment is processed. This is clearly an error in logic. The problem causes havoc when the BAM Interceptor is used with continuation. Effectively the root step is no longer processed correctly, and the ultimate effect is that the continued activity never completes! This has nothing to do with the root and the continuation being in the same process. It is due to a fundamental mistake of setting the ‘root’ flag to false for a continuation before the root fragment is processed. The Workaround Fortunately, it is easy to work around the bug. The trick is to ensure that you create a new Activity Interceptor Configuration object for each individual step. This may mean filtering your configuration data to extract the track points for a single step or grouping the configured track points into individual steps and the creating a separate Activity Interceptor Configuration for each group. In my case, the first approach was required. Here is what the amended code looks like: // Because of a logic error in Microsoft's code, a separate ActivityInterceptorConfiguration must be used // for each location. The following code extracts only those track points for a given step name (location). var trackPointGroup = from ResolutionService.TrackPoint tp in bamActivity.TrackPoints where (string)tp.Location == bamStepName select tp; var bamActivityInterceptorConfig = new Microsoft.BizTalk.Bam.EventObservation.ActivityInterceptorConfiguration(activityName); foreach (var trackPoint in trackPointGroup) { switch (trackPoint.Type) { case TrackPointType.Start: bamActivityInterceptorConfig.RegisterStartNew(trackPoint.Location, trackPoint.ExtractionInfo); break; etc… I’m using LINQ to filter a list of track points for those entries that correspond to a given step and then registering only those track points on a new instance of the ActivityInterceptorConfiguration class. As soon as I re-wrote the code to do this, activities with continuations started to complete correctly.

Read the article

Finding the normal of OBB face with an OBB penetrating

- by Milo

Below is an illustration: I have an OBB in an OBB (see below for OBB2D code if needed). What I need to determine is, what face it is in, and what direction do I point the normal? The goal is to get the OBB out of the OBB so the normal needs to face outward of the OBB. How could I go about: Finding what face the line is penetrating given the 4 corners of the OBB and the class below: if we define dx=x2-x1 and dy=y2-y1, then the normals are (-dy, dx) and (dy, -dx). Which normal points outward of the OBB? Thanks public class OBB2D { // Corners of the box, where 0 is the lower left. private Vector2D corner[] = new Vector2D[4]; private Vector2D center = new Vector2D(); private Vector2D extents = new Vector2D(); private RectF boundingRect = new RectF(); private float angle; //Two edges of the box extended away from corner[0]. private Vector2D axis[] = new Vector2D[2]; private double origin[] = new double[2]; public OBB2D(Vector2D center, float w, float h, float angle) { set(center,w,h,angle); } public OBB2D(float left, float top, float width, float height) { set(new Vector2D(left + (width / 2), top + (height / 2)),width,height,0.0f); } public void set(Vector2D center,float w, float h,float angle) { Vector2D X = new Vector2D( (float)Math.cos(angle), (float)Math.sin(angle)); Vector2D Y = new Vector2D((float)-Math.sin(angle), (float)Math.cos(angle)); X = X.multiply( w / 2); Y = Y.multiply( h / 2); corner[0] = center.subtract(X).subtract(Y); corner[1] = center.add(X).subtract(Y); corner[2] = center.add(X).add(Y); corner[3] = center.subtract(X).add(Y); computeAxes(); extents.x = w / 2; extents.y = h / 2; computeDimensions(center,angle); } private void computeDimensions(Vector2D center,float angle) { this.center.x = center.x; this.center.y = center.y; this.angle = angle; boundingRect.left = Math.min(Math.min(corner[0].x, corner[3].x), Math.min(corner[1].x, corner[2].x)); boundingRect.top = Math.min(Math.min(corner[0].y, corner[1].y),Math.min(corner[2].y, corner[3].y)); boundingRect.right = Math.max(Math.max(corner[1].x, corner[2].x), Math.max(corner[0].x, corner[3].x)); boundingRect.bottom = Math.max(Math.max(corner[2].y, corner[3].y),Math.max(corner[0].y, corner[1].y)); } public void set(RectF rect) { set(new Vector2D(rect.centerX(),rect.centerY()),rect.width(),rect.height(),0.0f); } // Returns true if other overlaps one dimension of this. private boolean overlaps1Way(OBB2D other) { for (int a = 0; a < axis.length; ++a) { double t = other.corner[0].dot(axis[a]); // Find the extent of box 2 on axis a double tMin = t; double tMax = t; for (int c = 1; c < corner.length; ++c) { t = other.corner[c].dot(axis[a]); if (t < tMin) { tMin = t; } else if (t > tMax) { tMax = t; } } // We have to subtract off the origin // See if [tMin, tMax] intersects [0, 1] if ((tMin > 1 + origin[a]) || (tMax < origin[a])) { // There was no intersection along this dimension; // the boxes cannot possibly overlap. return false; } } // There was no dimension along which there is no intersection. // Therefore the boxes overlap. return true; } //Updates the axes after the corners move. Assumes the //corners actually form a rectangle. private void computeAxes() { axis[0] = corner[1].subtract(corner[0]); axis[1] = corner[3].subtract(corner[0]); // Make the length of each axis 1/edge length so we know any // dot product must be less than 1 to fall within the edge. for (int a = 0; a < axis.length; ++a) { axis[a] = axis[a].divide((axis[a].length() * axis[a].length())); origin[a] = corner[0].dot(axis[a]); } } public void moveTo(Vector2D center) { Vector2D centroid = (corner[0].add(corner[1]).add(corner[2]).add(corner[3])).divide(4.0f); Vector2D translation = center.subtract(centroid); for (int c = 0; c < 4; ++c) { corner[c] = corner[c].add(translation); } computeAxes(); computeDimensions(center,angle); } // Returns true if the intersection of the boxes is non-empty. public boolean overlaps(OBB2D other) { if(right() < other.left()) { return false; } if(bottom() < other.top()) { return false; } if(left() > other.right()) { return false; } if(top() > other.bottom()) { return false; } if(other.getAngle() == 0.0f && getAngle() == 0.0f) { return true; } return overlaps1Way(other) && other.overlaps1Way(this); } public Vector2D getCenter() { return center; } public float getWidth() { return extents.x * 2; } public float getHeight() { return extents.y * 2; } public void setAngle(float angle) { set(center,getWidth(),getHeight(),angle); } public float getAngle() { return angle; } public void setSize(float w,float h) { set(center,w,h,angle); } public float left() { return boundingRect.left; } public float right() { return boundingRect.right; } public float bottom() { return boundingRect.bottom; } public float top() { return boundingRect.top; } public RectF getBoundingRect() { return boundingRect; } public boolean overlaps(float left, float top, float right, float bottom) { if(right() < left) { return false; } if(bottom() < top) { return false; } if(left() > right) { return false; } if(top() > bottom) { return false; } return true; } };

Read the article

Running TeamCity from Amazon EC2 - Cloud based scalable build and continuous Integration

- by RoyOsherove

I’ve been having fun playing with the amazon EC2 cloud service. I set up a server running TeamCity, and an image of a server that just runs a TeamCity agent. I also setup TeamCity to automatically instantiate agents on EC2 and shut them down based upon availability of free agents. Here’s how I did it: The first step was setting up the teamcity server. Create an account on amazon EC2 (BTW, amazon’s sites works better in IE than it does in chrome.. who knew!?) Open the EC2 dashboard, and click “Launch Instance” . From the “Quick Start” tab I selected from the list: “Getting Started on Microsoft Windows Server 2008 (AMI Id: ami-c5e40dac)” . it’s good enough to just run teamcity. In the instance details, I used the default (Small instance, 1.7 GB mem). You might want to choose a close availability zone based on where you are. We want to “Launch instances” so click continue. Select the default kernel, RAM disk and all. No need to enable monitoring for now (you can do that later). click continue. If you don’t have a key pair, you will be prompted to create one. Once you do, select it in the list. Now you’ll be prompted to create a security group. I named mine “TC” as in “TeamCity”. each group is a bunch of settings on which ports can be let through into and out of a hosted machine. keep it as the default settings. We will change them later. Click continue, review and then click “Launch”. Now you’ll be able to see the new instance in the running instances list on your site. Now, you need to install stuff on that instance (TeamCity!) . To do that, you’ll need to Remote desktop into that instance. To do that, we’ll get the admin password for that instance: Check it on the list, and click “Instance Actions” - “Get Windows Admin Password”. You might have to wait about 10 minutes or so for the password to be generated for you. Once you have the password, you will remote desktop (start-run-‘mstsc’) into the instance. It’s address is a dns address shown below the list under “Public DNS”. it looks something like: ec2-256-226-194-91.compute-1.amazonaws.com Once you’re inside the instance – you’ll need to open IE (it is in hardened mode so you’ll have to relax its security settings to download stuff). I first downloaded chrome and using chrome I downloaded TeamCity. Note that the download speed is FAST. several MBs per second. To be able to see TeamCity from the outside, you will need to open the advanced firewall settings inside the remote machine, and add incoming and outgoing rules for port 80 (HTTP). Once you do that, you should be able to see the machine from the outside. If you still can’t, see the next step. I also enabled ports 9090 since I will use this machine to create an agent image later as well. Now configure the security group (TC) to enable talking to agents: IN the EC2 dashboard click on “Security Groups” and select your group. To add a rule, click on the empty list under the ‘protocol’ header. select TCP. from and ‘to’ ports are 9090. source ip is 0.0.0.0/0 (every ip is allowed). click “Save. Also make sure you can see “HTTP” tcp 80 in that list. if you can’t see it, add it or you won’t be able to browse to the machine’s teamcity server home page. I also set an elastic IP for the machine: so I always have the same IP for the machine instance. Allocate and set one through the”Elastic IP” link on the EC2 dashboard. you should now have a working instance of teamcity. Now let’s create an agent image. Repeat steps 1-9, but this time, make sure you select a machine that fits what an agent might do. I selected Instance type – Hihg-CPU medium machine, that is much faster. On that machine, I installed what I needed (VS 2010, PostSharp etc..). downloading VS 2010 from MSDN (2 GB took less than 10 min!) Now, instead of installing teamcity, browse using the browser to the teamcity homepage (from within the remote machine). go to the Administration page, and click the upper right link “Install agents”. Install the agent on he local machine – set it to the IP or DNS of the running TeamCity server. That way you’ll be able to check their connectivity live before making this machine your official agent image to reuse. Once the agent is installed, see that the TC server can see it and use it. see steps 13-14 above if they can’t. Once it works, you can take steps to make this image your agent image to be reused. next, here is a copy-paste of several steps to take from http://confluence.jetbrains.net/display/TCD5/Setting+Up+TeamCity+for+Amazon+EC2 Configure system so that agent it is started on machine boot (and make sure TeamCity server is accessible on machine boot). Test the setup by rebooting machine and checking that the agent connects normally to the server. Prepare the Image for bundling: Remove any temporary/history information in the system. Stop the agent (under Windows stop the service but leave it in Automatic startup type) Delete content agent logs and temp directories (not necessary) Delete "<Agent Home>/conf/amazon-*" file (not necessary) Change config/buildAgent.properties to remove properties: name, serverAddress, authToken (not necessary) Now, we need to: Make AMI from the running instance. Configure TeamCity EC2 support on TeamCity server. Making an AMI: Check the instance of the agent in the EC2 dashboard instance list, and select instance actions->Create Image (EBS AMI) you’ll see the image pending in the APIs list in the EC2 dashboard. this could take 30 minutes or more. meanwhile we can configure the could support in the teamcity server. COPY THE AMI ID to the clipboard (looks like ami-a88aa4ce) Configuring TeamCity for Cloud: In TeamCity, click on “Agents” and then on “Cloud” tab. this is where you will control your cloud agents. to configure new cloud agents based on APIs, click on the right link to the “configuration page” Create a new profile and select AMazon EC2 as cloud type. Use your AMI ID that you copied to the clipboard into the “Images” field. Select an availability zone that is the same as the one your instance is running on for best communication perf between them make sure you select the ‘TC’ security group hopefully, that should be it, and teamcity will try to instantiate new instances on demand. Note that it may take around 10 minutes for an agent to become available to teamcity from the time it’s started.

Read the article

European Interoperability Framework - a new beginning?

- by trond-arne.undheim

The most controversial document in the history of the European Commission's IT policy is out. EIF is here, wrapped in the Communication "Towards interoperability for European public services", and including the new feature European Interoperability Strategy (EIS), arguably a higher strategic take on the same topic. Leaving EIS aside for a moment, the EIF controversy has been around IPR, defining open standards and about the proper terminology around standardization deliverables. Today, as the document finally emerges, what is the verdict? First of all, to be fair to those among you who do not spend your lives in the intricate labyrinths of Commission IT policy documents on interoperability, let's define what we are talking about. According to the Communication: "An interoperability framework is an agreed approach to interoperability for organisations that want to collaborate to provide joint delivery of public services. Within its scope of applicability, it specifies common elements such as vocabulary, concepts, principles, policies, guidelines, recommendations, standards, specifications and practices." The Good - EIF reconfirms that "The Digital Agenda can only take off if interoperability based on standards and open platforms is ensured" and also confirms that "The positive effect of open specifications is also demonstrated by the Internet ecosystem." - EIF takes a productive and pragmatic stance on openness: "In the context of the EIF, openness is the willingness of persons, organisations or other members of a community of interest to share knowledge and stimulate debate within that community, the ultimate goal being to advance knowledge and the use of this knowledge to solve problems" (p.11). "If the openness principle is applied in full: - All stakeholders have the same possibility of contributing to the development of the specification and public review is part of the decision-making process; - The specification is available for everybody to study; - Intellectual property rights related to the specification are licensed on FRAND terms or on a royalty-free basis in a way that allows implementation in both proprietary and open source software" (p. 26). - EIF is a formal Commission document. The former EIF 1.0 was a semi-formal deliverable from the PEGSCO, a working group of Member State representatives. - EIF tackles interoperability head-on and takes a clear stance: "Recommendation 22. When establishing European public services, public administrations should prefer open specifications, taking due account of the coverage of functional needs, maturity and market support." - The Commission will continue to support the National Interoperability Framework Observatory (NIFO), reconfirming the importance of coordinating such approaches across borders. - The Commission will align its internal interoperability strategy with the EIS through the eCommission initiative. - One cannot stress the importance of using open standards enough, whether in the context of open source or non-open source software. The EIF seems to have picked up on this fact: What does the EIF says about the relation between open specifications and open source software? The EIF introduces, as one of the characteristics of an open specification, the requirement that IPRs related to the specification have to be licensed on FRAND terms or on a royalty-free basis in a way that allows implementation in both proprietary and open source software. In this way, companies working under various business models can compete on an equal footing when providing solutions to public administrations while administrations that implement the standard in their own software (software that they own) can share such software with others under an open source licence if they so decide. - EIF is now among the center pieces of the Digital Agenda (even though this demands extensive inter-agency coordination in the Commission): "The EIS and the EIF will be maintained under the ISA Programme and kept in line with the results of other relevant Digital Agenda actions on interoperability and standards such as the ones on the reform of rules on implementation of ICT standards in Europe to allow use of certain ICT fora and consortia standards, on issuing guidelines on essential intellectual property rights and licensing conditions in standard-setting, including for ex-ante disclosure, and on providing guidance on the link between ICT standardisation and public procurement to help public authorities to use standards to promote efficiency and reduce lock-in.(Communication, p.7)" All in all, quite a few good things have happened to the document in the two years it has been on the shelf or was being re-written, depending on your perspective, in any case, awaiting the storms to calm. The Bad - While a certain pragmatism is required, and governments cannot migrate to full openness overnight, EIF gives a bit too much room for governments not to apply the openness principle in full. Plenty of reasons are given, which should maybe have been put as challenges to be overcome: "However, public administrations may decide to use less open specifications, if open specifications do not exist or do not meet functional interoperability needs. In all cases, specifications should be mature and sufficiently supported by the market, except if used in the context of creating innovative solutions". - EIF does not use the internationally established terminology: open standards. Rather, the EIF introduces the notion of "formalised specification". How do "formalised specifications" relate to "standards"? According to the FAQ provided: The word "standard" has a specific meaning in Europe as defined by Directive 98/34/EC. Only technical specifications approved by a recognised standardisation body can be called a standard. Many ICT systems rely on the use of specifications developed by other organisations such as a forum or consortium. The EIF introduces the notion of "formalised specification", which is either a standard pursuant to Directive 98/34/EC or a specification established by ICT fora and consortia. The term "open specification" used in the EIF, on the one hand, avoids terminological confusion with the Directive and, on the other, states the main features that comply with the basic principle of openness laid down in the EIF for European Public Services. Well, this may be somewhat true, but in reality, Europe is 30 year behind in terminology. Unless the European Standardization Reform gets completed in the next few months, most Member States will likely conclude that they will go on referencing and using standards beyond those created by the three European endorsed monopolists of standardization, CEN, CENELEC and ETSI. Who can afford to begin following the strict Brussels rules for what they can call open standards when, in reality, standards stemming from global standardization organizations, so-called fora/consortia, dominate in the IT industry. What exactly is EIF saying? Does it encourage Member States to go on using non-ESO standards as long as they call it something else? I guess I am all for it, although it is a bit cumbersome, no? Why was there so much interest around the EIF? The FAQ attempts to explain: Some Member States have begun to adopt policies to achieve interoperability for their public services. These actions have had a significant impact on the ecosystem built around the provision of such services, e.g. providers of ICT goods and services, standardisation bodies, industry fora and consortia, etc... The Commission identified a clear need for action at European level to ensure that actions by individual Member States would not create new electronic barriers that would hinder the development of interoperable European public services. As a result, all stakeholders involved in the delivery of electronic public services in Europe have expressed their opinions on how to increase interoperability for public services provided by the different public administrations in Europe. Well, it does not take two years to read 50 consultation documents, and the EU Standardization Reform is not yet completed, so, more pragmatically, you finally had to release the document. Ok, let's leave some of that aside because the document is out and some people are happy (and others definitely not). The Verdict Considering the controversy, the delays, the lobbying, and the interests at stake both in the EU, in Member States and among vendors large and small, this document is pretty impressive. As with a good wine that has not yet come to full maturity, let's say that it seems to be coming in in the 85-88/100 range, but only a more fine-grained analysis, enjoyment in good company, and ultimately, implementation, will tell. The European Commission has today adopted a significant interoperability initiative to encourage public administrations across the EU to maximise the social and economic potential of information and communication technologies. Today, we should rally around this achievement. Tomorrow, let's sit down and figure out what it means for the future.

Read the article

Exploring the Excel Services REST API

- by jamiet

Over the last few years Analysis Services guru Chris Webb and I have been on something of a crusade to enable better access to data that is locked up in countless Excel workbooks that litter the hard drives of enterprise PCs. The most prominent manifestation of that crusade up to now has been a forum thread that Chris began on Microsoft Answers entitled Excel Web App API? Chris began that thread with: I was wondering whether there was an API for the Excel Web App? Specifically, I was wondering if it was possible (or if it will be possible in the future) to expose data in a spreadsheet in the Excel Web App as an OData feed, in the way that it is possible with Excel Services? Up to recently the last 10 words of that paragraph "in the way that it is possible with Excel Services" had completely washed over me however a comment on my recent blog post Thoughts on ExcelMashup.com (and a rant) by Josh Booker in which Josh said: Excel Services is a service application built for sharepoint 2010 which exposes a REST API for excel documents. We're looking forward to pros like you giving it a try now that Office365 makes sharepoint more easily accessible. Can't wait for your future blog about using REST API to load data from Excel on Offce 365 in SSIS. made me think that perhaps the Excel Services REST API is something I should be looking into and indeed that is what I have been doing over the past few days. And you know what? I'm rather impressed with some of what Excel Services' REST API has to offer. Unfortunately Excel Services' REST API also has one debilitating aspect that renders this blog post much less useful than it otherwise would be; namely that it is not publicly available from the Excel Web App on SkyDrive. Therefore all I can do in this blog post is show you screenshots of what the REST API provides in Sharepoint rather than linking you directly to those REST resources; that's a great shame because one of the benefits of a REST API is that it is easily and ubiquitously demonstrable from a web browser. Instead I am hosting a workbook on Sharepoint in Office 365 because that does include Excel Services' REST API but, again, all I can do is show you screenshots. N.B. If anyone out there knows how to make Office-365-hosted spreadsheets publicly-accessible (i.e. without requiring a username/password) please do let me know (because knowing which forum on which to ask the question is an exercise in futility). In order to demonstrate Excel Services' REST API I needed some decent data and for that I used the World Tourism Organization Statistics Database and Yearbook - United Nations World Tourism Organization dataset hosted on Azure Datamarket (its free, by the way); this dataset "provides comprehensive information on international tourism worldwide and offers a selection of the latest available statistics on international tourist arrivals, tourism receipts and expenditure" and you can explore the data for yourself here. If you want to play along at home by viewing the data as it exists in Excel then it can be viewed here. Let's dive in. The root of Excel Services' REST API is the model resource which resides at: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model Note that this is true for every workbook hosted in a Sharepoint document library - each Excel workbook is a RESTful resource. (Update: Mark Stacey on Twitter tells me that "It's turned off by default in onpremise Sharepoint (1 tickbox to turn on though)". Thanks Mark!) The data is provided as an ATOM feed but I have Firefox's feed reading ability turned on so you don't see the underlying XML goo. As you can see there are four top level resources, Ranges, Charts, Tables and PivotTables; exploring one of those resources is where things get interesting. Let's take a look at the Tables Resource: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables Our workbook contains only one table, called ‘Table1’ (to reiterate, you can explore this table yourself here). Viewing that table via the REST API is pretty easy, we simply append the name of the table onto our previous URI: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1') As you can see, that quite simply gives us a representation of the data in that table. What you cannot see from this screenshot is that this is pure HTML that is being served up; that is all well and good but actually we can do more interesting things. If we specify that the data should be returned not as HTML but as: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1')?$format=image then that data comes back as a pure image and can be used in any web page where you would ordinarily use images. This is the thing that I really like about Excel Services’ REST API – we can embed an image in any web page but instead of being a copy of the data, that image is actually live – if the underlying data in the workbook were to change then hitting refresh will show a new image. Pretty cool, no? The same is true of any Charts or Pivot Tables in your workbook - those can be embedded as images too and if the underlying data changes, boom, the image in your web page changes too. There is a lot of data in the workbook so the image returned by that previous URI is too large to show here so instead let’s take a look at a different resource, this time a range: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15') That URI returns cells A1 to C15 from a worksheet called “Data”: And if we ask for that as an image again: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15')?$format=image Were this image resource not behind a username/password then this would be a live image of the data in the workbook as opposed to one that I had to copy and upload elsewhere. Nonetheless I hope this little wrinkle doesn't detract from the inate value of what I am trying to articulate here; that an existing image in a web page can be changed on-the-fly simply by inserting some data into an Excel workbook. I for one think that that is very cool indeed! I think that's enough in the way of demo for now as this shows what is possible using Excel Services' REST API. Of course, not all features work quite how I would like and here is a bulleted list of some of my more negative feedback: The URIs are pig-ugly. Are "_vti_bin" & "ExcelRest.aspx" really necessary as part of the URI? Would this not be better: http://server/Documents/TourismExpenditureInMillionsOfUSD.xlsx/Model/Tables(‘Table1’) That URI provides the necessary addressability and is a lot easier to remember. Discoverability of these resources is not easy, we essentially have to handcrank a URI ourselves. Take the example of embedding a chart into a blog post - would it not be better if I could browse first through the document library to an Excel workbook and THEN through the workbook to the chart/range/table that I am interested in? Call it a wizard if you like. That would be really cool and would, I am sure, promote this feature and cut down on the copy-and-paste disease that the REST API is meant to alleviate. The resources that I demonstrated can be returned as feeds as well as images or HTML simply by changing the format parameter to ?$format=atom however for some inexplicable reason they don't return OData and no-one on the Excel Services team can tell me why (believe me, I have asked). $format is an OData parameter however other useful parameters such as $top and $filter are not supported. It would be nice if they were. Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this and was one of the motivating factors for Chris Webb's forum post that I linked to above. None of this works for Excel workbooks hosted on SkyDrive This blog post is as long as it needs to be for a short introduction so I'll stop now. If you want to know more than I recommend checking out a few links: Excel Services REST API documentation on MSDNSo what does REST on Excel Services look like??? by Shahar PrishExcel Services in SharePoint 2010 REST API Syntax by Christian Stich. Any thoughts? Let's hear them in the comments section below! @Jamiet

Read the article

Diagnosing ADF Mobile iOS deployment problems

- by Chris Muir

From time to time I encounter customers who have taken possession of a brand new Apple Mac, have that excited "I've just spent more on a computer then I ever wanted to but it's okay" crazy gleam in their eye, but on pre-loading all the necessary software for Oracle's ADF Mobile to start their mobile campaign, following Oracle's setup instructions and deploying their first app to Apple's XCode iPhone Simulator they hit this error message in the JDeveloper Log-Deployment window: [01:36:46 PM] Deployment cancelled. [01:36:46 PM] ---- Deployment incomplete ----. [01:36:46 PM] Failed to build the iOS application bundle. [01:36:46 PM] Deployment failed due to one or more errors returned by '/Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild'. The following is a summary of the returned error(s): Command-line execution failed (Return code: 69) "Oh, return code 69, I know that well" I hear you say. Admittedly the error code is less than useful besides drawing some titters from the peanut gallery. Before explaining what's gone wrong, I think it's useful to teach customers how to diagnose these issues themselves. When ADF Mobile commences a deployment, be it to Apple's iOS or Google's Android platforms, JDeveloper and ADF Mobile do a good job in the Log window of showing you what the deployment process entails. In the case of deploying to iOS the log window will literally include the XCode commands executed to complete the deployment cycle. As example here's the log output that was produced before the error message was raised.... take the opportunity to read this line by line and note the command line calls highlighted in blue: (Note some of the following lines have been split over multiple lines to suit reading on this blog, each original line is preceded by a timestamp. Ensure to check the exact commands from JDev) [01:36:33 PM] Target platform is (iOS). [01:36:33 PM] Beginning deployment of ADF Mobile application 'LayoutDemo' to iOS using profile 'IOS_MOBILE_NATIVE_archive1'. [01:36:34 PM] Command-line executed: [/Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild, -version] [01:36:34 PM] Command-line execution succeeded. [01:36:34 PM] Running dependency analysis... [01:36:34 PM] Building... [01:36:34 PM] Deploying 3 profiles... [01:36:35 PM] Wrote Archive Module to /Users/chris/fmw/jdeveloper/jdev/extensions/ oracle.adf.mobile/Samples/PublicSamples/LayoutDemo/ApplicationController/ deploy/ApplicationController.jar [01:36:35 PM] WARNING: No Resource Catalog enabled ADF components found to package [01:36:36 PM] Wrote Archive Module to /Users/chris/fmw/jdeveloper/jdev/extensions/ oracle.adf.mobile/Samples/PublicSamples/LayoutDemo/ViewController/ deploy/ViewController.jar [01:36:36 PM] Verifying existence of the .adf source directory of the ADF Mobile application... [01:36:36 PM] Verifying Application Controller project exists... [01:36:36 PM] Verifying application dependencies... [01:36:36 PM] The application may not function correctly because the following dependent libraries are missing: /Users/chris/jdev/jdeveloper/jdeveloper/jdev/extensions/oracle.adf.mobile/ lib/adfmf.springboard.jar [01:36:36 PM] Verifying project dependencies... [01:36:36 PM] Validating application XML files... [01:36:36 PM] Validating XML files in project ApplicationController... [01:36:36 PM] Validating XML files in project ViewController... [01:36:40 PM] Copying common javascript files... [01:36:41 PM] Copying FARs to the ADF Mobile Framework application... [01:36:41 PM] Extracting Feature Archive file, "ApplicationController.jar" to deployment folder, "ApplicationController". [01:36:42 PM] Extracting Feature Archive file, "ViewController.jar" to deployment folder, "ViewController". [01:36:42 PM] Deploying skinning files... [01:36:43 PM] Copying the CVM SDK files built for the x86 processor... [01:36:43 PM] Copying the CVM JDK files built for the x86 processor... [01:36:43 PM] Command-line executed: [cp, -R, -p, /Users/chris/fmw/jdeveloper/jdev/extensions/oracle.adf.mobile/iOS/jvmti/x86/, /Users/chris/fmw/jdeveloper/jdev/extensions/oracle.adf.mobile/ Samples/PublicSamples/ LayoutDemo/deploy/IOS_MOBILE_NATIVE_archive1/temporary_xcode_project/lib] [01:36:43 PM] Command-line execution succeeded. [01:36:43 PM] Command-line executed: [cp, -R, -p, /Users/chris/fmw/jdeveloper/jdev/extensions/oracle.adf.mobile/iOS/jvmti/jar/, /Users/chris/fmw/jdeveloper/jdev/extensions/oracle.adf.mobile/Samples/ PublicSamples/LayoutDemo/deploy/IOS_MOBILE_NATIVE_archive1/ temporary_xcode_project/lib] [01:36:43 PM] Command-line execution succeeded. [01:36:43 PM] Copying security related files to the ADF Mobile Framework application... [01:36:44 PM] Command-line executed from path: /Users/chris/fmw/jdeveloper/jdev/extensions/oracle.adf.mobile/Samples/ PublicSamples/LayoutDemo/deploy/IOS_MOBILE_NATIVE_archive1/temporary_xcode_project/ [01:36:44 PM] Command-line executed: /Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild clean install -configuration Debug -sdk /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/ Developer/SDKs/iPhoneSimulator6.1.sdk DSTROOT=/Users/chris/fmw/jdeveloper/jdev/extensions/oracle.adf.mobile/Samples/ PublicSamples/LayoutDemo/deploy/IOS_MOBILE_NATIVE_archive1/Destination_Root/ IPHONEOS_DEPLOYMENT_TARGET=5.0 TARGETED_DEVICE_FAMILY=1,2 PRODUCT_NAME=LayoutDemo ADD_SETTINGS_BUNDLE=NO As you can see when we move from JDeveloper undertaking its work, it then passes the code off in the last few lines for Apple's XCode to assemble and deploy the required .ipa file. From the original error message which followed this complaining about xcodebuild failing with return code 69, we can quickly see the exact command line used to call xcodebuild. As this is the exact command line call with all its options, you're free to open a Terminal window in Mac OSX and execute the same command by simply copying and pasting the command line. And via this you'll then find out what return code actually 69 means. Unfortunately it's not that exciting. For Macs that have just been installed and configured with XCode, XCode (and for that matter iTunes) which is required by ADF Mobile to deploy must have been run at least once before hand on your brand new Mac (to be clear that's once ever, not once every restart). On doing so you will be presented with a license agreement from Apple that you must accept. Only once you've done this will the command line calls work. They're currently failing as you haven't accepted the legal terms and conditions. (arguably you an also accept the terms and conditions from the command line too, but ADF Mobile cannot do this on your behalf, so it's just easier to open the tools and confirm the legal requirements that way). Putting aside the error code and its meaning, watching the log window, watching what commands are executed, learning what they do, this will assist you to diagnose issues yourself and solve these sort of issues more relatively quickly. From my perspective as an Oracle Product Manager, it allows me to say "this is the stuff you don't need to worry about when you use ADF Mobile when it's configured correctly" .... as you can see my salesman qualities shine through. For anyone who is happily using ADF Mobile on a Mac and wondering why you didn't hit these issues, it's quite likely that you already accepted the license conditions before deploying via ADF Mobile. For instance, though I'm not a fan of iTunes itself, iTunes was one of the first things I loaded on my Mac to access my Justin Bieber albums. Image courtesy of winnond / FreeDigitalPhotos.net

Read the article

6 Ways to Free Up Hard Drive Space Used by Windows System Files

- by Chris Hoffman

We’ve previously covered the standard ways to free up space on Windows. But if you have a small solid-state drive and really want more hard space, there are geekier ways to reclaim hard drive space. Not all of these tips are recommended — in fact, if you have more than enough hard drive space, following these tips may actually be a bad idea. There’s a tradeoff to changing all of these settings. Erase Windows Update Uninstall Files Windows allows you to uninstall patches you install from Windows Update. This is helpful if an update ever causes a problem — but how often do you need to uninstall an update, anyway? And will you really ever need to uninstall updates you’ve installed several years ago? These uninstall files are probably just wasting space on your hard drive. A recent update released for Windows 7 allows you to erase Windows Update files from the Windows Disk Cleanup tool. Open Disk Cleanup, click Clean up system files, check the Windows Update Cleanup option, and click OK. If you don’t see this option, run Windows Update and install the available updates. Remove the Recovery Partition Windows computers generally come with recovery partitions that allow you to reset your computer back to its factory default state without juggling discs. The recovery partition allows you to reinstall Windows or use the Refresh and Reset your PC features. These partitions take up a lot of space as they need to contain a complete system image. On Microsoft’s Surface Pro, the recovery partition takes up about 8-10 GB. On other computers, it may be even larger as it needs to contain all the bloatware the manufacturer included. Windows 8 makes it easy to copy the recovery partition to removable media and remove it from your hard drive. If you do this, you’ll need to insert the removable media whenever you want to refresh or reset your PC. On older Windows 7 computers, you could delete the recovery partition using a partition manager — but ensure you have recovery media ready if you ever need to install Windows. If you prefer to install Windows from scratch instead of using your manufacturer’s recovery partition, you can just insert a standard Window disc if you ever want to reinstall Windows. Disable the Hibernation File Windows creates a hidden hibernation file at C:\hiberfil.sys. Whenever you hibernate the computer, Windows saves the contents of your RAM to the hibernation file and shuts down the computer. When it boots up again, it reads the contents of the file into memory and restores your computer to the state it was in. As this file needs to contain much of the contents of your RAM, it’s 75% of the size of your installed RAM. If you have 12 GB of memory, that means this file takes about 9 GB of space. On a laptop, you probably don’t want to disable hibernation. However, if you have a desktop with a small solid-state drive, you may want to disable hibernation to recover the space. When you disable hibernation, Windows will delete the hibernation file. You can’t move this file off the system drive, as it needs to be on C:\ so Windows can read it at boot. Note that this file and the paging file are marked as “protected operating system files” and aren’t visible by default. Shrink the Paging File The Windows paging file, also known as the page file, is a file Windows uses if your computer’s available RAM ever fills up. Windows will then “page out” data to disk, ensuring there’s always available memory for applications — even if there isn’t enough physical RAM. The paging file is located at C:\pagefile.sys by default. You can shrink it or disable it if you’re really crunched for space, but we don’t recommend disabling it as that can cause problems if your computer ever needs some paging space. On our computer with 12 GB of RAM, the paging file takes up 12 GB of hard drive space by default. If you have a lot of RAM, you can certainly decrease the size — we’d probably be fine with 2 GB or even less. However, this depends on the programs you use and how much memory they require. The paging file can also be moved to another drive — for example, you could move it from a small SSD to a slower, larger hard drive. It will be slower if Windows ever needs to use the paging file, but it won’t use important SSD space. Configure System Restore Windows seems to use about 10 GB of hard drive space for “System Protection” by default. This space is used for System Restore snapshots, allowing you to restore previous versions of system files if you ever run into a system problem. If you need to free up space, you could reduce the amount of space allocated to system restore or even disable it entirely. Of course, if you disable it entirely, you’ll be unable to use system restore if you ever need it. You’d have to reinstall Windows, perform a Refresh or Reset, or fix any problems manually. Tweak Your Windows Installer Disc Want to really start stripping down Windows, ripping out components that are installed by default? You can do this with a tool designed for modifying Windows installer discs, such as WinReducer for Windows 8 or RT Se7en Lite for Windows 7. These tools allow you to create a customized installation disc, slipstreaming in updates and configuring default options. You can also use them to remove components from the Windows disc, shrinking the size of the resulting Windows installation. This isn’t recommended as you could cause problems with your Windows installation by removing important features. But it’s certainly an option if you want to make Windows as tiny as possible. Most Windows users can benefit from removing Windows Update uninstallation files, so it’s good to see that Microsoft finally gave Windows 7 users the ability to quickly and easily erase these files. However, if you have more than enough hard drive space, you should probably leave well enough alone and let Windows manage the rest of these settings on its own. Image Credit: Yutaka Tsutano on Flickr

Read the article

Taking the training wheels off: Accelerating the Business with Oracle IAM by Brian Mozinski (Accenture)

- by Greg Jensen

Today, technical requirements for IAM are evolving rapidly, and the bar is continuously raised for high performance IAM solutions as organizations look to roll out high volume use cases on the back of legacy systems. Existing solutions were often designed and architected to support offline transactions and manual processes, and the business owners today demand globally scalable infrastructure to support the growth their business cases are expected to deliver. To help IAM practitioners address these challenges and make their organizations and themselves more successful, this series we will outline the: • Taking the training wheels off: Accelerating the Business with Oracle IAM The explosive growth in expectations for IAM infrastructure, and the business cases they support to gain investment in new security programs. • "Necessity is the mother of invention": Technical solutions developed in the field Well proven tricks of the trade, used by IAM guru’s to maximize your solution while addressing the requirements of global organizations. • The Art & Science of Performance Tuning of Oracle IAM 11gR2 Real world examples of performance tuning with Oracle IAM • No Where to go but up: Extending the benefits of accelerated IAM Anything is possible, compelling new solutions organizations are unlocking with accelerated Oracle IAM Let’s get started … by talking about the changing dynamics driving these discussions. Big Companies are getting bigger everyday, and increasingly organizations operate across state lines, multiple times zones, and in many countries or continents at the same time. No longer is midnight to 6am a safe time to take down the system for upgrades, to run recon’s and import or update user accounts and attributes. Further IT organizations are operating as shared services with SLA’s similar to telephone carrier levels expected by their “clients”. Workers are moved in and out of roles on a weekly, daily, or even hourly rate and IAM is expected to support those rapid changes. End users registering for services during business hours in Singapore are expected their access to be green-lighted in custom apps hosted in Portugal within the hour. Many of the expectations of asynchronous systems and batched updates are not adequate and the number and types of users is growing. When organizations acted more like independent teams at functional or geographic levels it was manageable to have processes that relied on a handful of people who knew how to make things work …. Knew how to get you access to the key systems to get your job done. Today everyone is expected to do more with less, the finance administrator previously supporting their local Atlanta sales office might now be asked to help close the books for the Johannesburg team, and access certification process once completed monthly by Joan on the 3rd floor is now done by a shared pool of resources in Sao Paulo. Fragmented processes that rely on institutional knowledge to get access to systems and get work done quickly break down in these scenarios. Highly robust processes that have automated workflows for connected or disconnected systems give organizations the dynamic flexibility to share work across these lines and cut costs or increase productivity. As the IT industry computing paradigms continue to change with the passing of time, and as mature or proven approaches become clear, it is normal for organizations to adjust accordingly. Businesses must manage identity in an increasingly hybrid world in which legacy on-premises IAM infrastructures are extended or replaced to support more and more interconnected and interdependent services to a wider range of users. The old legacy IAM implementation models we had relied on to manage identities no longer apply. End users expect to self-request access to services from their tablet, get supervisor approval over mobile devices and email, and launch the application even if is hosted on the cloud, or run by a partner, vendor, or service provider. While user expectations are higher, they are also simpler … logging into custom desktop apps to request approvals, or going through email or paper based processes for certification is unacceptable. Users expect security to operate within the paradigm of the application … i.e. feel like the application they are using. Citizen and customer facing applications have evolved from every where, with custom applications, 3rd party tools, and merging in from acquired entities or 3rd party OEM’s resold to expand your portfolio of services. These all have their own user stores, authentication models, user lifecycles, session management, etc. Often the designers/developers are no longer accessible and the documentation is limited. Bringing together underlying directories to scale for growth, and improve user experience is critical for revenue … but also for operations. Job functions are more dynamic.... take the Olympics for example. Endless organizations from corporations broadcasting, endorsing, or marketing through the event … to non-profit athletic foundations and public/government entities for athletes and public safety, all operate simultaneously on the world stage. Each organization needs to spin up short-term teams, often dealing with proprietary information from hot ads to racing strategies or security plans. IAM is expected to enable team’s to spin up, enable new applications, protect privacy, and secure critical infrastructure. Then it needs to be disabled just as quickly as users go back to their previous responsibilities. On a more technical level … Optimized system directory; tuning guidelines and parameters are needed by businesses today. Business’s need to be making the right choices (virtual directories) and considerations via choosing the correct architectural patterns (virtual, direct, replicated, and tuning), challenge is that business need to assess and chose the correct architectural patters (centralized, virtualized, and distributed) Today's Business organizations have very complex heterogeneous enterprises that contain diverse and multifaceted information. With today's ever changing global landscape, the strategic end goal in challenging times for business is business agility. The business of identity management requires enterprise's to be more agile and more responsive than ever before. The continued proliferation of networking devices (PC, tablet, PDA's, notebooks, etc.) has caused the number of devices and users to be granted access to these devices to grow exponentially. Business needs to deploy an IAM system that can account for the demands for authentication and authorizations to these devices. Increased innovation is forcing business and organizations to centralize their identity management services. Access management needs to handle traditional web based access as well as handle new innovations around mobile, as well as address insufficient governance processes which can lead to rouge identity accounts, which can then become a source of vulnerabilities within a business’s identity platform. Risk based decisions are providing challenges to business, for an adaptive risk model to make proper access decisions via standard Web single sign on for internal and external customers,. Organizations have to move beyond simple login and passwords to address trusted relationship questions such as: Is this a trusted customer, client, or citizen? Is this a trusted employee, vendor, or partner? Is this a trusted device? Without a solid technological foundation, organizational performance, collaboration, constituent services, or any other organizational processes will languish. A Single server location presents not only network concerns for distributed user base, but identity challenges. The network risks are centered on latency of the long trip that the traffic has to take. Other risks are a performance around availability and if the single identity server is lost, all access is lost. As you can see, there are many reasons why performance tuning IAM will have a substantial impact on the success of your organization. In our next installment in the series we roll up our sleeves and get into detailed tuning techniques used everyday by thought leaders in the field implementing Oracle Identity & Access Management Solutions.

Read the article

C#: LINQ vs foreach - Round 1.

- by James Michael Hare

So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool. But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ? The answer is not really clear-cut. There are two sides to any code cost arguments: performance and maintainability. The first of these is obvious and quantifiable. Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better. Unfortunately, this is not always a good measure. Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages. In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost. Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library. Well, before we discuss any further, let's look at some sample code and the numbers. First, let's take a look at the for loop and the LINQ expression. This is just a simple find comparison: // find implemented via LINQ public static bool FindViaLinq(IEnumerable<int> list, int target) { return list.Any(item => item == target); } // find implemented via standard iteration public static bool FindViaIteration(IEnumerable<int> list, int target) { foreach (var i in list) { if (i == target) { return true; } } return false; } Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention. You don't have to actually analyze the behavior of the loop to determine what it's doing. So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data. For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches. List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower Any 10 26 0.00046 30.00% Iteration 10 20 0.00023 - Any 100 116 0.00201 18.37% Iteration 100 98 0.00118 - Any 1000 1058 0.01853 16.78% Iteration 1000 906 0.01155 - Any 10,000 10,383 0.18189 17.41% Iteration 10,000 8843 0.11362 - Any 100,000 104,004 1.8297 18.27% Iteration 100,000 87,941 1.13163 - The LINQ expression is running about 17% slower for average size collections and worse for smaller collections. Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this. So what about other LINQ expressions? After all, Any() is one of the more trivial ones. I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ: // Linq form public static int GetTargetPosition1(IEnumerable<int> list, int target) { return list.TakeWhile(item => item != target).Count(); } // traditionally iterative form public static int GetTargetPosition2(IEnumerable<int> list, int target) { int count = 0; foreach (var i in list) { if(i == target) { break; } ++count; } return count; } Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt. So I ran these through the same test data: List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Iteration 100,000 81635 0.81635 - Wow! I expected some overhead due to the state machines iterators produce, but 90% slower? That seems a little heavy to me. So then I thought, well, what if TakeWhile() is not the right tool for the job? The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate. So what if that back and forth with the iterator state machine is the problem? Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way. This mostly matches Any(). // a new method that uses linq but evaluates the count in a closure. public static int TakeWhileViaLinq2(IEnumerable<int> list, int target) { int count = 0; list.Any(item => { if(item == target) { return true; } ++count; return false; }); return count; } Now how does this one compare? List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Any w/Closure 10 23 0.00023 28% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Any w/Closure 100 116 0.00116 27% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Any w/Closure 1000 1101 0.01101 33% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Any w/Closure 10,000 10802 0.10802 32% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Any w/Closure 100,000 108378 1.08378 33% Iteration 100,000 81635 0.81635 - Much better! It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do. So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm. But this is true of any choice of algorithm or collection in general. Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully. For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>. That's really not that bad at all in the grand scheme of things. Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay. And if your typical list is 1000 items or less we're talking only microseconds worth of difference. It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off. So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly. If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off. And if I ever do need that last millisecond of performance? Well then I'll optimize JUST THAT problem spot. To me it's worth it for the readability, speed-to-market, and maintainability.

Read the article

Try a sample: Using the counter predicate for event sampling

- by extended_events

Extended Events offers a rich filtering mechanism, called predicates, that allows you to reduce the number of events you collect by specifying criteria that will be applied during event collection. (You can find more information about predicates in Using SQL Server 2008 Extended Events (by Jonathan Kehayias)) By evaluating predicates early in the event firing sequence we can reduce the performance impact of collecting events by stopping event collection when the criteria are not met. You can specify predicates on both event fields and on a special object called a predicate source. Predicate sources are similar to action in that they typically are related to some type of global information available from the server. You will find that many of the actions available in Extended Events have equivalent predicate sources, but actions and predicates sources are not the same thing. Applying predicates, whether on a field or predicate source, is very similar to what you are used to in T-SQL in terms of how they work; you pick some field/source and compare it to a value, for example, session_id = 52. There is one predicate source that merits special attention though, not just for its special use, but for how the order of predicate evaluation impacts the behavior you see. I’m referring to the counter predicate source. The counter predicate source gives you a way to sample a subset of events that otherwise meet the criteria of the predicate; for example you could collect every other event, or only every tenth event. Simple CountingThe counter predicate source works by creating an in memory counter that increments every time the predicate statement is evaluated. Here is a simple example with my favorite event, sql_statement_completed, that only collects the second statement that is run. (OK, that’s not much of a sample, but this is for demonstration purposes. Here is the session definition: CREATE EVENT SESSION counter_test ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.counter = 2)ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS) You can find general information about the session DDL syntax in BOL and from Pedro’s post Introduction to Extended Events. The important part here is the WHERE statement that defines that I only what the event where package0.count = 2; in other words, only the second instance of the event. Notice that I need to provide the package name along with the predicate source. You don’t need to provide the package name if you’re using event fields, only for predicate sources. Let’s say I run the following test queries: -- Run three statements to test the sessionSELECT 'This is the first statement'GOSELECT 'This is the second statement'GOSELECT 'This is the third statement';GO Once you return the event data from the ring buffer and parse the XML (see my earlier post on reading event data) you should see something like this: event_name sql_text sql_statement_completed SELECT ‘This is the second statement’ You can see that only the second statement from the test was actually collected. (Feel free to try this yourself. Check out what happens if you remove the WHERE statement from your session. Go ahead, I’ll wait.) Percentage Sampling OK, so that wasn’t particularly interesting, but you can probably see that this could be interesting, for example, lets say I need a 25% sample of the statements executed on my server for some type of QA analysis, that might be more interesting than just the second statement. All comparisons of predicates are handled using an object called a predicate comparator; the simple comparisons such as equals, greater than, etc. are mapped to the common mathematical symbols you know and love (eg. = and >), but to do the less common comparisons you will need to use the predicate comparators directly. You would probably look to the MOD operation to do this type sampling; we would too, but we don’t call it MOD, we call it divides_by_uint64. This comparator evaluates whether one number is divisible by another with no remainder. The general syntax for using a predicate comparator is pred_comp(field, value), field is always first and value is always second. So lets take a look at how the session changes to answer our new question of 25% sampling: CREATE EVENT SESSION counter_test_25 ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.divides_by_uint64(package0.counter,4))ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS)GO Here I’ve replaced the simple equivalency check with the divides_by_uint64 comparator to check if the counter is evenly divisible by 4, which gives us back every fourth record. I’ll leave it as an exercise for the reader to test this session. Why order matters I indicated at the start of this post that order matters when it comes to the counter predicate – it does. Like most other predicate systems, Extended Events evaluates the predicate statement from left to right; as soon as the predicate statement is proven false we abandon evaluation of the remainder of the statement. The counter predicate source is only incremented when it is evaluated so whether or not the counter is incremented will depend on where it is in the predicate statement and whether a previous criteria made the predicate false or not. Here is a generic example: Pred1: (WHERE statement_1 AND package0.counter = 2)Pred2: (WHERE package0.counter = 2 AND statement_1) Let’s say I cause a number of events as follows and examine what happens to the counter predicate source. Iteration Statement Pred1 Counter Pred2 Counter A Not statement_1 0 1 B statement_1 1 2 C Not statement_1 1 3 D statement_1 2 4 As you can see, in the case of Pred1, statement_1 is evaluated first, when it fails (A & C) predicate evaluation is stopped and the counter is not incremented. With Pred2 the counter is evaluated first, so it is incremented on every iteration of the event and the remaining parts of the predicate are then evaluated. In this example, Pred1 would return an event for D while Pred2 would return an event for B. But wait, there is an interesting side-effect here; consider Pred2 if I had run my statements in the following order: Not statement_1 Not statement_1 statement_1 statement_1 In this case I would never get an event back from the system because the point at which counter=2, the rest of the predicate evaluates as false so the event is not returned. If you’re using the counter target for sampling and you’re not getting the expected events, or any events, check the order of the predicate criteria. As a general rule I’d suggest that the counter criteria should be the last element of your predicate statement since that will assure that your sampling rate will apply to the set of event records defined by the rest of your predicate. Aside: I’m interested in hearing about uses for putting the counter predicate criteria earlier in the predicate statement. If you have one, post it in a comment to share with the class. - Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article

Underwriting in a New Frontier: Spurring Innovation

- by [email protected]

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 st1\:*{behavior:url(#ieooui) } /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif";} Susan Keuer, product strategy manager for Oracle Insurance, shares her experiences and insight from the 2010 Association of Home Office Underwriters (AHOU) Annual Conference, April 11-14, in San Antonio, Texas How can I be more innovative in underwriting? It's a common question I hear from insurance carriers, producers and others, so it was no surprise that it was the key theme at the recent 2010 AHOU Annual Conference. This year's event drew more than 900 insurance professionals involved in the underwriting process across life and annuities, property and casualty and reinsurance from around the globe, including the U.S., Canada, Australia, Bahamas, and more, to San Antonio - a Texas city where innovation transformed a series of downtown drainage canals into its premiere River Walk tourist destination. CNN's Medical Correspondent Dr. Sanjay Gupta kicked off the conference with a phenomenal opening session that drove home the theme of the conference, "Underwriting in a New Frontier: Spurring Innovation." Drawing from his own experience as a neurosurgeon treating critically injured medical patients in the field in Iraq, Gupta inspired audience members to think outside the box during the underwriting process. He shared a compelling story of operating on a soldier who had suffered a head-related trauma in a field hospital. With minimal supplies available Gupta used a Black and Decker saw to operate on the soldier's head and reduce pressure on his swelling brain. Drawing from this example, Gupta encouraged underwriters to think creatively, be innovative, and consider new tools and sources of information, such as social networking sites, during the underwriting process. So as you are looking at risk take into consideration all resources you have available. Gupta also stressed the concept of IKIGAI - noting that individuals who believe that their life is worth living are less likely to die than are their counterparts without this belief. How does one quantify this approach to life or thought process when evaluating risk? Could this be something to consider as a "category" in the near future? How can this same belief in your own work spur innovation? The role of technology was a hot topic of discussion throughout the conference. Sessions delved into the latest in underwriting software to the rise of social media and how it is being increasingly integrated into underwriting process and solutions. In one session a trio of panelists representing the carrier, producer and vendor communities stressed the importance to underwriters of leveraging new technology and the plethora of online information sources, which all could be used to accurately, honestly and consistently evaluate the risk throughout the underwriting process. Another focused on the explosion of social media noting: 1. Social media is growing exponentially - About eight percent of Americans used social media five years ago. Today about 46 percent of Americans do so, with 85 percent of financial services professionals using social media in their work. 2. It will impact your business - Underwriters reconfirmed over and over that they are increasingly using "free" tools that are available in cyberspace in lieu of more costly solutions, such as inspection reports conducted by individuals in the field. 3. Information is instantly available on the Web, anytime, anywhere - LinkedIn was mentioned as a way to connect to peers in the underwriting community and producers alike. Many carriers and agents also are using Facebook to promote their company to customers - and as a point-of-entry to allow them to perform some functionality - such as accessing product marketing information versus directing users to go to the carrier's own proprietary website. Other carriers have released their tight brand marketing to allow their producers to drive more business to their personal Facebook site where they offer innovative tools such as Application Capture or asking medical information in a more relaxed fashion. Other key topics at the conference included the economy, ongoing industry consolidation, real-estate valuations as an asset and input into the underwriting process, and producer trends. All stressed a "back to basics" approach for low cost, term products. Finally, Connie Merritt, RN, PHN, entertained the large group of atttendees with audience-engaging insight on how to "Tame the Lions in Your Life - Dealing with Complainers, Bullies, Grump and Curmudgeon." Merritt noted "we are too busy for our own good." She shared how her overachieving personality had impacted her life. Audience members then were asked to pick red, yellow, blue, or green shapes, without knowing that each one represented a specific personality trait. For example, those who picked blue were the peacemakers. Those who choose yellow were social - the hint was to "Be Quiet Longer." She then offered these "lion taming" steps: 1. Admit It 2. Accept It 3. Let Go 4. Be Present (which paralleled Gupta's IKIGAI concept) When thinking about underwriting I encourage you to be present in the moment and think creatively, but don't be afraid to look ahead to the future and be an innovator. I hope to see you at next year's AHOU Annual Conference, May 1-4, 2011 at The Mirage in Las Vegas, Nev. Susan Keuer is the product strategy manager for new business underwriting. She brings more than 20 years of insurance industry experience working with leading insurance carriers and technology companies to her role on the product strategy team for life/annuities solutions within the Oracle Insurance Global Business Unit

Read the article

A New Threat To Web Applications: Connection String Parameter Pollution (CSPP)

- by eric.maurice

Hi, this is Shaomin Wang. I am a security analyst in Oracle's Security Alerts Group. My primary responsibility is to evaluate the security vulnerabilities reported externally by security researchers on Oracle Fusion Middleware and to ensure timely resolution through the Critical Patch Update. Today, I am going to talk about a serious type of attack: Connection String Parameter Pollution (CSPP). Earlier this year, at the Black Hat DC 2010 Conference, two Spanish security researchers, Jose Palazon and Chema Alonso, unveiled a new class of security vulnerabilities, which target insecure dynamic connections between web applications and databases. The attack called Connection String Parameter Pollution (CSPP) exploits specifically the semicolon delimited database connection strings that are constructed dynamically based on the user inputs from web applications. CSPP, if carried out successfully, can be used to steal user identities and hijack web credentials. CSPP is a high risk attack because of the relative ease with which it can be carried out (low access complexity) and the potential results it can have (high impact). In today's blog, we are going to first look at what connection strings are and then review the different ways connection string injections can be leveraged by malicious hackers. We will then discuss how CSPP differs from traditional connection string injection, and the measures organizations can take to prevent this kind of attacks. In web applications, a connection string is a set of values that specifies information to connect to backend data repositories, in most cases, databases. The connection string is passed to a provider or driver to initiate a connection. Vendors or manufacturers write their own providers for different databases. Since there are many different providers and each provider has multiple ways to make a connection, there are many different ways to write a connection string. Here are some examples of connection strings from Oracle Data Provider for .Net/ODP.Net: Oracle Data Provider for .Net / ODP.Net; Manufacturer: Oracle; Type: .NET Framework Class Library: - Using TNS Data Source = orcl; User ID = myUsername; Password = myPassword; - Using integrated security Data Source = orcl; Integrated Security = SSPI; - Using the Easy Connect Naming Method Data Source = username/password@//myserver:1521/my.server.com - Specifying Pooling parameters Data Source=myOracleDB; User Id=myUsername; Password=myPassword; Min Pool Size=10; Connection Lifetime=120; Connection Timeout=60; Incr Pool Size=5; Decr Pool Size=2; There are many variations of the connection strings, but the majority of connection strings are key value pairs delimited by semicolons. Attacks on connection strings are not new (see for example, this SANS White Paper on Securing SQL Connection String). Connection strings are vulnerable to injection attacks when dynamic string concatenation is used to build connection strings based on user input. When the user input is not validated or filtered, and malicious text or characters are not properly escaped, an attacker can potentially access sensitive data or resources. For a number of years now, vendors, including Oracle, have created connection string builder class tools to help developers generate valid connection strings and potentially prevent this kind of vulnerability. Unfortunately, not all application developers use these utilities because they are not aware of the danger posed by this kind of attacks. So how are Connection String parameter Pollution (CSPP) attacks different from traditional Connection String Injection attacks? First, let's look at what parameter pollution attacks are. Parameter pollution is a technique, which typically involves appending repeating parameters to the request strings to attack the receiving end. Much of the public attention around parameter pollution was initiated as a result of a presentation on HTTP Parameter Pollution attacks by Stefano Di Paola and Luca Carettoni delivered at the 2009 Appsec OWASP Conference in Poland. In HTTP Parameter Pollution attacks, an attacker submits additional parameters in HTTP GET/POST to a web application, and if these parameters have the same name as an existing parameter, the web application may react in different ways depends on how the web application and web server deal with multiple parameters with the same name. When applied to connections strings, the rule for the majority of database providers is the "last one wins" algorithm. If a KEYWORD=VALUE pair occurs more than once in the connection string, the value associated with the LAST occurrence is used. This opens the door to some serious attacks. By way of example, in a web application, a user enters username and password; a subsequent connection string is generated to connect to the back end database. Data Source = myDataSource; Initial Catalog = db; Integrated Security = no; User ID = myUsername; Password = XXX; In the password field, if the attacker enters "xxx; Integrated Security = true", the connection string becomes, Data Source = myDataSource; Initial Catalog = db; Integrated Security = no; User ID = myUsername; Password = XXX; Intergrated Security = true; Under the "last one wins" principle, the web application will then try to connect to the database using the operating system account under which the application is running to bypass normal authentication. CSPP poses serious risks for unprepared organizations. It can be particularly dangerous if an Enterprise Systems Management web front-end is compromised, because attackers can then gain access to control panels to configure databases, systems accounts, etc. Fortunately, organizations can take steps to prevent this kind of attacks. CSPP falls into the Injection category of attacks like Cross Site Scripting or SQL Injection, which are made possible when inputs from users are not properly escaped or sanitized. Escaping is a technique used to ensure that characters (mostly from user inputs) are treated as data, not as characters, that is relevant to the interpreter's parser. Software developers need to become aware of the danger of these attacks and learn about the defenses mechanism they need to introduce in their code. As well, software vendors need to provide templates or classes to facilitate coding and eliminate developers' guesswork for protecting against such vulnerabilities. Oracle has introduced the OracleConnectionStringBuilder class in Oracle Data Provider for .NET. Using this class, developers can employ a configuration file to provide the connection string and/or dynamically set the values through key/value pairs. It makes creating connection strings less error-prone and easier to manager, and ultimately using the OracleConnectionStringBuilder class provides better security against injection into connection strings. For More Information: - The OracleConnectionStringBuilder is located at http://download.oracle.com/docs/cd/B28359_01/win.111/b28375/OracleConnectionStringBuilderClass.htm - Oracle has developed a publicly available course on preventing SQL Injections. The Server Technologies Curriculum course "Defending Against SQL Injection Attacks!" is located at http://st-curriculum.oracle.com/tutorial/SQLInjection/index.htm - The OWASP web site also provides a number of useful resources. It is located at http://www.owasp.org/index.php/Main_Page

Read the article

The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article

The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article

Personal Development : Time, Planning , Repairs & Maintenance

- by Rajesh Pillai

Personal Development : Time, Planning, Repairs & Maintenance These are just my thoughts, but some you may find something interesting in it. Please think over it. We may know many things, but still we always keeps procrastinating it. I have written this as I have heard many people coming back and saying they don’t have time to do things they like. These are my thoughts buy may be useful to someone else too. Certain things in life needs periodic repairs and maintenance. To cite some examples , your CAR, your HOUSE, your personal laptop/desktop, your health etc. Likewise there are certain other things in professional life that requires repair/ maintenance /or some kind of polishing, so that you always stay on top of it. But they are not always obvious. Some of them are - Improving your communication skills - Increasing your vocabulary - Upgrading your technical skills - Pursuing your hobby - Increasing your knowledge/awareness etc… etc… And then there are certain things that we are always short of…. one is TIME. We all know TIME is one of the most precious things in life and yet we all are very miserable at managing it. Remember you can only manage it and not control it. You can only control which you own or which you create. In theory time is infinite. So, there should be abundant of it. But remember one thing, you know this, it’s not reversible. Once it has elapsed you cannot live it again. Think over it. So, how do find that golden 25th hour every day. To find the 25th hour you need to reflect back on your current daily activities. Analyze them and see where you are spending most of your time and is it really important. Even the 8 hours that you spent in the office, is it spent fruitfully. At the end of the day is the 8 precious hour that you spent was worth it. Just reflect back on your activities. Did you learn something? If yes did you make a point to NOTE IT. If you didn’t NOTED it then was the time you spent really worth it. Just ponder over it. Some calculations of your daily activities where most of the time is spent. Let’s start (in no particular order though) - Sleep (6.5 hours) [Remember you only require 6 good hours of sleep every day]. Some may thing it is 8, but it’s a myth. o To achive 6 hours of sleep and be in good health you can practice 15 minutes of daily meditation. So effectively you can round it to 6.5 hours. - Morning chores(2 hours) : Some may need to prepare breakfast and all other things. - Office commuting (avg. to and fro 3 hours) - Office Work (avg 9.5 hours) Total Hours: 21 hours effective time which is spent irrespective of what you do. There may be some variations here and there. Still you have 3 hours EXTRA. Where do these 3 hours go? If you can find it, then you may get that golden 25th hour out of these 3 hours. Let’s discount 2 hours for contingencies, still you have 1 hour with you. If you can’t find it then you are living a direction less life. As you can see, the 25th Hour lies within the 24 hours of the day. It’s upto each one of us to find and make use of it. Now what can you do with that 25th hour i.e. 1 hour extra of your life. Imagine the possibility. Again some calculations 1 hour daily * 30 days = 30 hours every month 30 hours pm * 12 month = 360 hours every year. 360 hours every year seems very promising. Let’s add some contingencies, say, let’s be optimistic and say 50 % contingency. Still you have 180 hours every year. That leaves with 30 minutes every day of extra time. That’s hell a lot of time, if you could manage it. These may sound like a high talk [yes, it is, unless you apply these simple rules and rationalize your everyday living and stop procrastinating]. NOTE: I haven’t taken weekend, holidays and leaves into account. So, that leaves us with a lot of buffer time. You can meet family friends, relatives, other tasks, and yet have these 180 pure hours of joy every year. Do whatever you want to do with it. So, how important is this 180 hours per year to you? Just think over it. You may use it the way you like - 50 hours [pursue your hobby like drawing, crafting, learn dance, learn juggling, learn swimming, travelling hmm.. anything you like doing and you didn’t had time to do it.] - 30 hours you can learn a new programming language or technology (i.e. you can get comfortable with it) - 50 hours [improve existing skills] - 20 hours [improve you communication skill]. Do some light reading. - 30 hours [YOU DECIDE WHAT TO DO]? So, if you had done this for one year you would have learnt a new programming language, upgraded existing skills, improved you communication etc.. If you had done this for two years.. imagine the level of personal development or growth which you may have attained….. If you had done this for three years….. NOW I think I don’t need to mention this… So, you still have TIME, as they say TIME is infinite. So, make judicious use of this precious thing. And never ever comeback saying “I don’t have time”. So, if you are RICH in TIME, everything else will be automatically taken care of, as those things may just be a byproduct of how you spend your time… So, happy TIMING your TIME everyday.

Read the article

Optimizing AES modes on Solaris for Intel Westmere

- by danx

Optimizing AES modes on Solaris for Intel Westmere Review AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used. However, AES by itself has a weakness. AES encryption isn't usually used by itself because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can be easily attacked with "dictionaries" of common blocks of text and allows one to more-easily discern the content of the unknown cryptotext. This mode of encryption is called "Electronic Code Book" (ECB), because one in theory can keep a "code book" of all known cryptotext and plaintext results to cipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks is still possible. Here's a diagram of encrypting input data using AES ECB mode: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ IV >----->(XOR) +------------->(XOR) +---> . . . . | | | | | | | | \/ | \/ | AESKey-->(AES Encryption) | AESKey-->(AES Encryption) | | | | | | | | | \/ | \/ | CipherTextOutput ------+ CipherTextOutput -------+ Block 1 Block 2 The steps for CBC encryption are: Start with a 16-byte Initialization Vector (IV), choosen randomly. XOR the IV with the first block of input plaintext Encrypt the result with AES using a user-provided key. The result is the first 16-bytes of output cryptotext. Use the cryptotext (instead of the IV) of the previous block to XOR with the next input block of plaintext Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it also starts with a 16-byte IV. However, for subsequent blocks, the IV is just incremented by one. Also, the IV ix XORed with the AES encryption result (not the plain text input). Here's an illustration: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ IV >----->(XOR) IV + 1 >---->(XOR) IV + 2 ---> . . . . | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 Optimization Which of these modes can be parallelized? ECB encryption/decryption can be parallelized because it does more than plain AES encryption and decryption, as mentioned above. CBC encryption can't be parallelized because it depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning. CTR encryption and decryption can be parallelized because the input to each block is known--it's just the IV incremented by one for each subsequent block. So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized with the exception of CBC encryption. How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for memory to be loaded or stored to or from CPU registers. Since the software is written to encrypt/decrypt the next data block where pipeline stalls usually occurs, we can avoid stalls and crypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory. Other Optimizations Besides interleaving, other optimizations performed are Loading the entire key schedule into the 128-bit %xmm registers. This is done once for per 4-block of data (since 4 blocks of data is processed, when present). The following is loaded: the entire "key schedule" (user input key preprocessed for encryption and decryption). This takes 11, 13, or 15 registers, for AES-128, AES-192, and AES-256, respectively The input data is loaded into another %xmm register The same register contains the output result after encrypting/decrypting Using SSSE 4 instructions (AESNI). Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AESNI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are: pxor exclusive or (very useful), movdqu load/store a %xmm register from/to memory, pshufb shuffle bytes for byte swapping, pclmulqdq carry-less multiply for GCM mode Combining AES encryption/decryption with CBC or CTR modes processing. Instead of loading input data twice (once for AES encryption/decryption, and again for modes (CTR or CBC, for example) processing, the input data is loaded once as both AES and modes operations occur at in the same function Performance Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale which is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded Intel AESNI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at-a-time. « For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slighty faster than AES-256, as expected. The second table shows more detail timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. » The results shown are the percentage improvement as shown by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AESNI assembly! The keysize (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data. Availability This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AESNI (for example, Intel Westmere and Sandy Bridge, microprocessor architectures). The easiest way to determine if AES-NI is available is with the isainfo(1) command. For example, $ isainfo -v 64-bit amd64 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this software. Solaris libraries and kernel automatically determine if it's running on AESNI-capable machines and execute the correctly-tuned software for the current microprocessor. Summary Maximum throughput of AES cipher modes can be achieved by combining AES encryption with modes processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions. References "Block cipher modes of operation", Wikipedia Good overview of AES modes (ECB, CBC, CTR, etc.) "Advanced Encryption Standard", Wikipedia "Current Modes" describes NIST-approved block cipher modes (ECB,CBC, CFB, OFB, CCM, GCM)

Read the article

The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article

The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

- by Bertrand Le Roy

We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

Read the article

Avoid the “Social Silo” - Learn Why and How

- by Brian Dayton

Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} I’m not going to spend any more real estate than needed on this—social media is big. Facebook hit the Billion user mark in October, that’s 1 out of every 7 humans on the planet. This past Summer (in the Northern hemisphere) Twitter passed the 400 Million Tweet/day mark. The list of social properties and data points goes on and on. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} With social your customer, prospect, or constituent has pervasive access—through mobile—to a global audience, the ability to influence friends, friends of friends, and even people they will never meet. They also have the unique opportunity to forge a deeper relationship with your business—telling you what they like, what they don’t like, how you can help, and what they’d like to see more of. Are you listening? Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} What’s the Bottom Line for Business? Businesses need to be where their customers are—on social properties. They need to be available and responsive in those channels—24x7x365. They need to engage and communicate in new ways—sometimes in less than 140 characters and with empathy, not a 1-way megaphone. Finally, businesses need to look at social as an extension of their existing business practices. Not as a silo’d communication channel limited to marketing. Social Can’t Be a Silo – Learn Why @ Oracle CloudWorld When a business is on social networks they represent the whole business. That’s how a customer, constituent, partner or potential candidate sees it. Those organizations that have moved on the opportunity to build closer relationships through social marketing have already made the first step. Social Selling, Service, eCommerce, and Recruiting are external-facing opportunities that leading organizations are moving on right now. This strategy, one of weaving social into and across your business processes—and leveraging social concepts and technologies for internal collaboration—is something you can learn about during an Oracle CloudWorld event in a city near you. You’ll hear and see social relationship management concepts, best-practices, and recommendations woven into topics, discussions, and demonstrations throughout the event—from Marketing and Sales to Service and Human Resources. Stay Tuned and Avoid Potholes By all indications social is here to stay but it’s moving fast and social business strategies are evolving rapidly. At Oracle CloudWorld you’ll also get the opportunity to learn how to avoid some of the potholes on the road to #socialbusiness. Stay tuned to this blog. In future posts I’ll cover some of those potholes including the challenges of Social@Scale and Parallel Processes. Jump-start your social business strategy or learn how to refine and expand what you’re doing already at Oracle CloudWorld. Want to learn more about what Oracle is doing in social? Check out www.oracle.com/social or, if you're looking for a quick read my co-worker, Pat Ma, has a great post on this blog summarizing some popular Social Relationship Management use cases.

Read the article

Using Node.js as an accelerator for WCF REST services

- by Elton Stoneman

Node.js is a server-side JavaScript platform "for easily building fast, scalable network applications". It's built on Google's V8 JavaScript engine and uses an (almost) entirely async event-driven processing model, running in a single thread. If you're new to Node and your reaction is "why would I want to run JavaScript on the server side?", this is the headline answer: in 150 lines of JavaScript you can build a Node.js app which works as an accelerator for WCF REST services*. It can double your messages-per-second throughput, halve your CPU workload and use one-fifth of the memory footprint, compared to the WCF services direct. Well, it can if: 1) your WCF services are first-class HTTP citizens, honouring client cache ETag headers in request and response; 2) your services do a reasonable amount of work to build a response; 3) your data is read more often than it's written. In one of my projects I have a set of REST services in WCF which deal with data that only gets updated weekly, but which can be read hundreds of times an hour. The services issue ETags and will return a 304 if the client sends a request with the current ETag, which means in the most common scenario the client uses its local cached copy. But when the weekly update happens, then all the client caches are invalidated and they all need the same new data. Then the service will get hundreds of requests with old ETags, and they go through the full service stack to build the same response for each, taking up threads and processing time. Part of that processing means going off to a database on a separate cloud, which introduces more latency and downtime potential. We can use ASP.NET output caching with WCF to solve the repeated processing problem, but the server will still be thread-bound on incoming requests, and to get the current ETags reliably needs a database call per request. The accelerator solves that by running as a proxy - all client calls come into the proxy, and the proxy routes calls to the underlying REST service. We could use Node as a straight passthrough proxy and expect some benefit, as the server would be less thread-bound, but we would still have one WCF and one database call per proxy call. But add some smart caching logic to the proxy, and share ETags between Node and WCF (so the proxy doesn't even need to call the servcie to get the current ETag), and the underlying service will only be invoked when data has changed, and then only once - all subsequent client requests will be served from the proxy cache. I've built this as a sample up on GitHub: NodeWcfAccelerator on sixeyed.codegallery. Here's how the architecture looks: The code is very simple. The Node proxy runs on port 8010 and all client requests target the proxy. If the client request has an ETag header then the proxy looks up the ETag in the tag cache to see if it is current - the sample uses memcached to share ETags between .NET and Node. If the ETag from the client matches the current server tag, the proxy sends a 304 response with an empty body to the client, telling it to use its own cached version of the data. If the ETag from the client is stale, the proxy looks for a local cached version of the response, checking for a file named after the current ETag. If that file exists, its contents are returned to the client as the body in a 200 response, which includes the current ETag in the header. If the proxy does not have a local cached file for the service response, it calls the service, and writes the WCF response to the local cache file, and to the body of a 200 response for the client. So the WCF service is only troubled if both client and proxy have stale (or no) caches. The only (vaguely) clever bit in the sample is using the ETag cache, so the proxy can serve cached requests without any communication with the underlying service, which it does completely generically, so the proxy has no notion of what it is serving or what the services it proxies are doing. The relative path from the URL is used as the lookup key, so there's no shared key-generation logic between .NET and Node, and when WCF stores a tag it also stores the "read" URL against the ETag so it can be used for a reverse lookup, e.g: Key Value /WcfSampleService/PersonService.svc/rest/fetch/3 "28cd4796-76b8-451b-adfd-75cb50a50fa6" "28cd4796-76b8-451b-adfd-75cb50a50fa6" /WcfSampleService/PersonService.svc/rest/fetch/3 In Node we read the cache using the incoming URL path as the key and we know that "28cd4796-76b8-451b-adfd-75cb50a50fa6" is the current ETag; we look for a local cached response in /caches/28cd4796-76b8-451b-adfd-75cb50a50fa6.body (and the corresponding .header file which contains the original service response headers, so the proxy response is exactly the same as the underlying service). When the data is updated, we need to invalidate the ETag cache – which is why we need the reverse lookup in the cache. In the WCF update service, we don't need to know the URL of the related read service - we fetch the entity from the database, do a reverse lookup on the tag cache using the old ETag to get the read URL, update the new ETag against the URL, store the new reverse lookup and delete the old one. Running Apache Bench against the two endpoints gives the headline performance comparison. Making 1000 requests with concurrency of 100, and not sending any ETag headers in the requests, with the Node proxy I get 102 requests handled per second, average response time of 975 milliseconds with 90% of responses served within 850 milliseconds; going direct to WCF with the same parameters, I get 53 requests handled per second, mean response time of 1853 milliseconds, with 90% of response served within 3260 milliseconds. Informally monitoring server usage during the tests, Node maxed at 20% CPU and 20Mb memory; IIS maxed at 60% CPU and 100Mb memory. Note that the sample WCF service does a database read and sleeps for 250 milliseconds to simulate a moderate processing load, so this is *not* a baseline Node-vs-WCF comparison, but for similar scenarios where the service call is expensive but applicable to numerous clients for a long timespan, the performance boost from the accelerator is considerable. * - actually, the accelerator will work nicely for any HTTP request, where the URL (path + querystring) uniquely identifies a resource. In the sample, there is an assumption that the ETag is a GUID wrapped in double-quotes (e.g. "28cd4796-76b8-451b-adfd-75cb50a50fa6") – which is the default for WCF services. I use that assumption to name the cache files uniquely, but it is a trivial change to adapt to other ETag formats.

Search Results

Search found 8250 results on 330 pages for 'dunn less'.

Page 306/330 | < Previous Page | 302 303 304 305 306 307 308 309 310 311 312 313 | Next Page >

- by Eric Z Goodnight

- by HighAltitudeCoder

- by Milo

- by Testas

- by Bertrand Le Roy

- by Charles Young

- by Milo

- by RoyOsherove

- by trond-arne.undheim

- by jamiet

- by Chris Muir

- by Chris Hoffman

- by Greg Jensen

- by James Michael Hare

- by extended_events

- by [email protected]

- by eric.maurice

- by Rob Farley

- by Rob Farley

- by Rajesh Pillai

- by danx

- by BuckWoody

- by Bertrand Le Roy

- by Brian Dayton

- by Elton Stoneman

< Previous Page | 302 303 304 305 306 307 308 309 310 311 312 313 | Next Page >