Search Results

Search found 11513 results on 461 pages for 'level 2'.

Page 112/461 | < Previous Page | 108 109 110 111 112 113 114 115 116 117 118 119  | Next Page >

  • Thread placement policies on NUMA systems - update

    - by Dave
    In a prior blog entry I noted that Solaris used a "maximum dispersal" placement policy to assign nascent threads to their initial processors. The general idea is that threads should be placed as far away from each other as possible in the resource topology in order to reduce resource contention between concurrently running threads. This policy assumes that resource contention -- pipelines, memory channel contention, destructive interference in the shared caches, etc -- will likely outweigh (a) any potential communication benefits we might achieve by packing our threads more densely onto a subset of the NUMA nodes, and (b) benefits of NUMA affinity between memory allocated by one thread and accessed by other threads. We want our threads spread widely over the system and not packed together. Conceptually, when placing a new thread, the kernel picks the least loaded node NUMA node (the node with lowest aggregate load average), and then the least loaded core on that node, etc. Furthermore, the kernel places threads onto resources -- sockets, cores, pipelines, etc -- without regard to the thread's process membership. That is, initial placement is process-agnostic. Keep reading, though. This description is incorrect. On Solaris 10 on a SPARC T5440 with 4 x T2+ NUMA nodes, if the system is otherwise unloaded and we launch a process that creates 20 compute-bound concurrent threads, then typically we'll see a perfect balance with 5 threads on each node. We see similar behavior on an 8-node x86 x4800 system, where each node has 8 cores and each core is 2-way hyperthreaded. So far so good; this behavior seems in agreement with the policy I described in the 1st paragraph. I recently tried the same experiment on a 4-node T4-4 running Solaris 11. Both the T5440 and T4-4 are 4-node systems that expose 256 logical thread contexts. To my surprise, all 20 threads were placed onto just one NUMA node while the other 3 nodes remained completely idle. I checked the usual suspects such as processor sets inadvertently left around by colleagues, processors left offline, and power management policies, but the system was configured normally. I then launched multiple concurrent instances of the process, and, interestingly, all the threads from the 1st process landed on one node, all the threads from the 2nd process landed on another node, and so on. This happened even if I interleaved thread creating between the processes, so I was relatively sure the effect didn't related to thread creation time, but rather that placement was a function of process membership. I this point I consulted the Solaris sources and talked with folks in the Solaris group. The new Solaris 11 behavior is intentional. The kernel is no longer using a simple maximum dispersal policy, and thread placement is process membership-aware. Now, even if other nodes are completely unloaded, the kernel will still try to pack new threads onto the home lgroup (socket) of the primordial thread until the load average of that node reaches 50%, after which it will pick the next least loaded node as the process's new favorite node for placement. On the T4-4 we have 64 logical thread contexts (strands) per socket (lgroup), so if we launch 48 concurrent threads we will find 32 placed on one node and 16 on some other node. If we launch 64 threads we'll find 32 and 32. That means we can end up with our threads clustered on a small subset of the nodes in a way that's quite different that what we've seen on Solaris 10. So we have a policy that allows process-aware packing but reverts to spreading threads onto other nodes if a node becomes too saturated. It turns out this policy was enabled in Solaris 10, but certain bugs suppressed the mixed packing/spreading behavior. There are configuration variables in /etc/system that allow us to dial the affinity between nascent threads and their primordial thread up and down: see lgrp_expand_proc_thresh, specifically. In the OpenSolaris source code the key routine is mpo_update_tunables(). This method reads the /etc/system variables and sets up some global variables that will subsequently be used by the dispatcher, which calls lgrp_choose() in lgrp.c to place nascent threads. Lgrp_expand_proc_thresh controls how loaded an lgroup must be before we'll consider homing a process's threads to another lgroup. Tune this value lower to have it spread your process's threads out more. To recap, the 'new' policy is as follows. Threads from the same process are packed onto a subset of the strands of a socket (50% for T-series). Once that socket reaches the 50% threshold the kernel then picks another preferred socket for that process. Threads from unrelated processes are spread across sockets. More precisely, different processes may have different preferred sockets (lgroups). Beware that I've simplified and elided details for the purposes of explication. The truth is in the code. Remarks: It's worth noting that initial thread placement is just that. If there's a gross imbalance between the load on different nodes then the kernel will migrate threads to achieve a better and more even distribution over the set of available nodes. Once a thread runs and gains some affinity for a node, however, it becomes "stickier" under the assumption that the thread has residual cache residency on that node, and that memory allocated by that thread resides on that node given the default "first-touch" page-level NUMA allocation policy. Exactly how the various policies interact and which have precedence under what circumstances could the topic of a future blog entry. The scheduler is work-conserving. The x4800 mentioned above is an interesting system. Each of the 8 sockets houses an Intel 7500-series processor. Each processor has 3 coherent QPI links and the system is arranged as a glueless 8-socket twisted ladder "mobius" topology. Nodes are either 1 or 2 hops distant over the QPI links. As an aside the mapping of logical CPUIDs to physical resources is rather interesting on Solaris/x4800. On SPARC/Solaris the CPUID layout is strictly geographic, with the highest order bits identifying the socket, the next lower bits identifying the core within that socket, following by the pipeline (if present) and finally the logical thread context ("strand") on the core. But on Solaris on the x4800 the CPUID layout is as follows. [6:6] identifies the hyperthread on a core; bits [5:3] identify the socket, or package in Intel terminology; bits [2:0] identify the core within a socket. Such low-level details should be of interest only if you're binding threads -- a bad idea, the kernel typically handles placement best -- or if you're writing NUMA-aware code that's aware of the ambient placement and makes decisions accordingly. Solaris introduced the so-called critical-threads mechanism, which is expressed by putting a thread into the FX scheduling class at priority 60. The critical-threads mechanism applies to placement on cores, not on sockets, however. That is, it's an intra-socket policy, not an inter-socket policy. Solaris 11 introduces the Power Aware Dispatcher (PAD) which packs threads instead of spreading them out in an attempt to be able to keep sockets or cores at lower power levels. Maximum dispersal may be good for performance but is anathema to power management. PAD is off by default, but power management polices constitute yet another confounding factor with respect to scheduling and dispatching. If your threads communicate heavily -- one thread reads cache lines last written by some other thread -- then the new dense packing policy may improve performance by reducing traffic on the coherent interconnect. On the other hand if your threads in your process communicate rarely, then it's possible the new packing policy might result on contention on shared computing resources. Unfortunately there's no simple litmus test that says whether packing or spreading is optimal in a given situation. The answer varies by system load, application, number of threads, and platform hardware characteristics. Currently we don't have the necessary tools and sensoria to decide at runtime, so we're reduced to an empirical approach where we run trials and try to decide on a placement policy. The situation is quite frustrating. Relatedly, it's often hard to determine just the right level of concurrency to optimize throughput. (Understanding constructive vs destructive interference in the shared caches would be a good start. We could augment the lines with a small tag field indicating which strand last installed or accessed a line. Given that, we could augment the CPU with performance counters for misses where a thread evicts a line it installed vs misses where a thread displaces a line installed by some other thread.)

    Read the article

  • Is there a good resource for learning Rails in depth? [closed]

    - by Kocheez
    I've been developing rails applications for about 6 months now (I was originally a java developer) and I'm getting familiar enough with building applications that I want to take my rails knowledge to the next level. The majority of books and learning materials I've found deal mostly with "how to use rails" rather than "how it works". I was wondering if there are any good resources for getting a really in depth understanding of the framework, such as how modules and classes are loaded, the underlying architecture, how servers interact, etc... Any tips on learning more would be greatly appreciated

    Read the article

  • BizTalk ESB Toolkit: Core Components and Examples

    - by Rajesh Charagandla
    The BizTalk ESB Toolkit 2.0 provides a stable and powerful platform for services that can change as fast as your business needs. The main purpose of an enterprise service bus (ESB) to is to provide a common mediation layer (the “bus”) through which all services connect. By doing so, not only can many of the problems of point-to-point service connectivity be resolved, but a new level of agile service delivery can be achieved. Author: Jon Flanders This Document can be download from here.

    Read the article

  • .BasePermissions enumerations options in Microsoft.SharePoint.SPRoleDefinition

    - by steve schofield
    I was trying to add a new permission level to Sharepoint 2007 and needed to know all the enumeration options.    Here is the list...  Specify one of the following enumeration values and try again. The possible enumeration values are "EmptyMask, ViewListItems, AddListItems, EditListItems, DeleteListItems, ApproveItems, OpenItems, ViewVersions, DeleteVersions, CancelCheckout, ManagePersonalViews, ManageLists, ViewFormPages, Open, ViewPages, AddAndCustomizePages, ApplyThemeAndBorder, ApplyStyleSheets, ViewUsageData, CreateSSCSite, ManageSubwebs, CreateGroups, ManagePermissions,BrowseDirectories, BrowseUserInfo, AddDelPrivateWebParts, UpdatePersonalWebParts, ManageWeb, UseClientIntegration, UseRemoteAPIs, ManageAlerts, CreateAlerts, EditMyUserInfo, EnumeratePermissions, FullMask"."

    Read the article

  • Difference between Windows and Linux development environments?

    - by Ryan
    I have an interview coming up soon for a Business Analyst position and the recruiter mentioned some feedback from a prior candidate that was interviewed who said the interviewers asked him what the difference between a Windows and Linux development environment was. Are there some high level things I need to be aware of from a business point of view when working with a development team or designing an application on Windows vs Linux?

    Read the article

  • SQL University: Parallelism Week - Part 3, Settings and Options

    - by Adam Machanic
    Congratulations! You've made it back for the the third and final installment of Parallelism Week here at SQL University . So far we've covered the fundamentals of multitasking vs. parallel processing and delved into how parallel query plans actually work . Today we'll take a look at the settings and options that influence intra-query parallelism and discuss how best to set things up in various situations. Instance-Level Configuration Your database server probably has more than one logical processor....(read more)

    Read the article

  • Can LittleBigPlanet2's engine be used for other ?

    - by Bill
    LittleBigPlanet2 just came out. I've worked with the original LBP level editor a bit and really enjoyed it. I've read that LBP2's featureset in the game is much richer; is it possible to use these advanced features to create different sorts of game other than just a regular platformer? I imagine that something along the lines of a Breakout clone would definitely be manageable, but I'm interested in hearing more about the capabilities of the platform.

    Read the article

  • Oracle’s Sun Server X4-8 with Built-in Elastic Computing

    - by kgee
    We are excited to announce the release of Oracle's new 8-socket server, Sun Server X4-8. It’s the most flexible 8-socket x86 server Oracle has ever designed, and also the most powerful. Not only does it use the fastest Intel® Xeon® E7 v2 processors, but also its memory, I/O and storage subsystems are all designed for maximum performance and throughput. Like its predecessor, the Sun Server X4-8 uses a “glueless” design that allows for maximum performance for Oracle Database, while also reducing power consumption and improving reliability. The specs are pretty impressive. Sun Server X4-8 supports 120 cores (or 240 threads), 6 TB memory, 9.6 TB HDD capacity or 3.2 TB SSD capacity, contains 16 PCIe Gen 3 I/O expansion slots, and allows for up to 6.4 TB Sun Flash Accelerator F80 PCIe Cards. The Sun Server X4-8 is also the most dense x86 server with its 5U chassis, allowing 60% higher rack-level core and DIMM slot density than the competition.  There has been a lot of innovation in Oracle’s x86 product line, but the latest and most significant is a capability called elastic computing. This new capability is built into each Sun Server X4-8.   Elastic computing starts with the Intel processor. While Intel provides a wide range of processors each with a fixed combination of core count, operational frequency, and power consumption, customers have been forced to make tradeoffs when they select a particular processor. They have had to make educated guesses on which particular processor (core count/frequency/cache size) will be best suited for the workload they intend to execute on the server.Oracle and Intel worked jointly to define a new processor, the Intel Xeon E7-8895 v2 for the Sun Server X4-8, that has unique characteristics and effectively combines the capabilities of three different Xeon processors into a single processor. Oracle system design engineers worked closely with Oracle’s operating system development teams to achieve the ability to vary the core count and operating frequency of the Xeon E7-8895 v2 processor with time without the need for a system level reboot.  Along with the new processor, enhancements have been made to the system BIOS, Oracle Solaris, and Oracle Linux, which allow the processors in the system to dynamically clock up to faster speeds as cores are disabled and to reach higher maximum turbo frequencies for the remaining active cores. One customer, a stock market trading company, will take advantage of the elastic computing capability of Sun Server X4-8 by repurposing servers between daytime stock trading activity and nighttime stock portfolio processing, daily, to achieve maximum performance of each workload.To learn more about Sun Server X4-8, you can find more details including the data sheet and white papers here.Josh Rosen is a Principal Product Manager for Oracle’s x86 servers, focusing on Oracle’s operating systems and software. He previously spent more than a decade as a developer and architect of system management software. Josh has worked on system management for many of Oracle's hardware products ranging from the earliest blade systems to the latest Oracle x86 servers.

    Read the article

  • Friday Fun: Hexep

    - by Asian Angel
    This week’s game starts off simple enough, but will quickly challenge your problem solving skills as you work to fill in the hex chains with color on each level. Do you have the patience and skill to succeed at this wicked brain-teaser or will you end up screaming in frustration and defeat? How To Play DVDs on Windows 8 6 Start Menu Replacements for Windows 8 What Is the Purpose of the “Do Not Cover This Hole” Hole on Hard Drives?

    Read the article

  • IntelliSense for Razor Hosting in non-Web Applications

    - by Rick Strahl
    When I posted my Razor Hosting article a couple of weeks ago I got a number of questions on how to get IntelliSense to work inside of Visual Studio while editing your templates. The answer to this question is mainly dependent on how Visual Studio recognizes assemblies, so a little background is required. If you open a template just on its own as a standalone file by clicking on it say in Explorer, Visual Studio will open up with the template in the editor, but you won’t get any IntelliSense on any of your related assemblies that you might be using by default. It’ll give Intellisense on base System namespace, but not on your imported assembly types. This makes sense: Visual Studio has no idea what the assembly associations for the single file are. There are two options available to you to make IntelliSense work for templates: Add the templates as included files to your non-Web project Add a BIN folder to your template’s folder and add all assemblies required there Including Templates in your Host Project By including templates into your Razor hosting project, Visual Studio will pick up the project’s assembly references and make IntelliSense available for any of the custom types in your project and on your templates. To see this work I moved the \Templates folder from the samples from the Debug\Bin folder into the project root and included the templates in the WinForm sample project. Here’s what this looks like in Visual Studio after the templates have been included:   Notice that I take my original example and type cast the Context object to the specific type that it actually represents – namely CustomContext – by using a simple code block: @{ CustomContext Model = Context as CustomContext; } After that assignment my Model local variable is in scope and IntelliSense works as expected. Note that you also will need to add any namespaces with the using command in this case: @using RazorHostingWinForm which has to be defined at the very top of a Razor document. BTW, while you can only pass in a single Context 'parameter’ to the template with the default template I’ve provided realize that you can also assign a complex object to Context. For example you could have a container object that references a variety of other objects which you can then cast to the appropriate types as needed: @{ ContextContainer container = Context as ContextContainer; CustomContext Model = container.Model; CustomDAO DAO = container.DAO; } and so forth. IntelliSense for your Custom Template Notice also that you can get IntelliSense for the top level template by specifying an inherits tag at the top of the document: @inherits RazorHosting.RazorTemplateFolderHost By specifying the above you can then get IntelliSense on your base template’s properties. For example, in my base template there are Request and Response objects. This is very useful especially if you end up creating custom templates that include your custom business objects as you can get effectively see full IntelliSense from the ‘page’ level down. For Html Help Builder for example, I’d have a Help object on the page and assuming I have the references available I can see all the way into that Help object without even having to do anything fancy. Note that the @inherits key is a GREAT and easy way to override the base template you normally specify as the default template. It allows you to create a custom template and as long as it inherits from the base template it’ll work properly. Since the last post I’ve also made some changes in the base template that allow hooking up some simple initialization logic so it gets much more easy to create custom templates and hook up custom objects with an IntializeTemplate() hook function that gets called with the Context and a Configuration object. These objects are objects you can pass in at runtime from your host application and then assign to custom properties on your template. For example the default implementation for RazorTemplateFolderHost does this: public override void InitializeTemplate(object context, object configurationData) { // Pick up configuration data and stuff into Request object RazorFolderHostTemplateConfiguration config = configurationData as RazorFolderHostTemplateConfiguration; this.Request.TemplatePath = config.TemplatePath; this.Request.TemplateRelativePath = config.TemplateRelativePath; // Just use the entire ConfigData as the model, but in theory // configData could contain many objects or values to set on // template properties this.Model = config.ConfigData as TModel; } to set up a strongly typed Model and the Request object. You can do much more complex hookups here of course and create complex base template pages that contain all the objects that you need in your code with strong typing. Adding a Bin folder to your Template’s Root Path Including templates in your host project works if you own the project and you’re the only one modifying the templates. However, if you are distributing the Razor engine as a templating/scripting solution as part of your application or development tool the original project is likely not available and so that approach is not practical. Another option you have is to add a Bin folder and add all the related assemblies into it. You can also add a Web.Config file with assembly references for any GAC’d assembly references that need to be associated with the templates. Between the web.config and bin folder Visual Studio can figure out how to provide IntelliSense. The Bin folder should contain: The RazorHosting.dll Your host project’s EXE or DLL – renamed to .dll if it’s an .exe Any external (bin folder) dependent assemblies Note that you most likely also want a reference to the host project if it contains references that are going to be used in templates. Visual Studio doesn’t recognize an EXE reference so you have to rename the EXE to DLL to make it work. Apparently the binary signature of EXE and DLL files are identical and it just works – learn something new everyday… For GAC assembly references you can add a web.config file to your template root. The Web.config file then should contain any full assembly references to GAC components: <configuration> <system.web> <compilation debug="true"> <assemblies> <add assembly="System.Web.Mvc, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" /> <add assembly="System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" /> <add assembly="System.Web.Extensions, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" /> </assemblies> </compilation> </system.web> </configuration> And with that you should get full IntelliSense. Note that if you add a BIN folder and you also have the templates in your Visual Studio project Visual Studio will complain about reference conflicts as it’s effectively seeing both the project references and the ones in the bin folder. So it’s probably a good idea to use one or the other but not both at the same time :-) Seeing IntelliSense in your Razor templates is a big help for users of your templates. If you’re shipping an application level scripting solution especially it’ll be real useful for your template consumers/users to be able to get some quick help on creating customized templates – after all that’s what templates are all about – easy customization. Making sure that everything is referenced in your bin folder and web.config is a good idea and it’s great to see that Visual Studio (and presumably WebMatrix/Visual Web Developer as well) will be able to pick up your custom IntelliSense in Razor templates.© Rick Strahl, West Wind Technologies, 2005-2011Posted in Razor  

    Read the article

  • Kickstarter "last minute cold feet"

    - by mm24
    today I scheduled the publication of a video on kickstarter requesting approximately 5.000 $ in order to complete the iPhone shooter game I started 1 year ago after quitting my job. I invested more than 20.000$ in the game so far (for artwork, music, legal and accountant expenses) and I am now getting cold feet about my decision of publishing the video. The game is "nearly finished", in other words: the game mechanics are working but I still have some bugs to fix. Once I will have finished this (I hope will take me 1 or 2 weeks) I plan to start working on the actual level balancing (e.g. deciding the order of appearence of enemies for each level and balancing the number of hitpoints and strenght of bullets that the enemies have). Reasons for not publishing the video are: fear that the concept can be copied easily: the game is a shooter game set in a different environment (its a pretty cool one, believe me :)) and I am worried that someone might copy* the idea (I know, its the usual "I am worried story.."). A shooter game is one of the easiest game to implement and hence there will be hundreds game developer able to copy it by just adapting their existing code and changing graphics (not as straightforward). It took me one year to develop this because I was inexperienced plus there are approximately 6/7 months of work from the illustrator and there are 8 unique music tracks composed. The soundtrack of the video is the soundtrack of the game wich is not yet published and has not been deposited to a music society. I did create legally valid timestamps for the tracks and I am considering uploading the album on iTunes before publishing the video so I can have a certain publication date. But overall I am a bit scared and worried because I have never done this before and even the simple act of publishing an album requires me to read a long contract from the "aggregator company") which, even if I do have contracts with the musicians do worry me as I am not a U.S. resident and I am not familiar with the U.S. law system Reasons for publishing the video are: I almost run out of money (but this is not a real reason as I should have enough for one more month of development time) ...I kind of need extra money as, even if I do have money for 1 month of development I do not have money for marketing and for other expenses (e.g. accountant) It will create a fan base I could get some useful feedback from a wider range of beta testers It might create some pre-release buzz in case some blogger or game magazine likes the concept Anyone has had similar experiences? Is there a real risk that someone will copy the concept and implement it in a couple of months? Will the Kickstarter campaing be a good pre-release exposure for the gmae? Any refrences of similar projects/situations? Is it realistic that someone like ROVIO will copy the idea straight away?

    Read the article

  • GDD-BR 2010 [2G] So What's A Web App? Introduction to the Chrome Web Store

    GDD-BR 2010 [2G] So What's A Web App? Introduction to the Chrome Web Store Speaker: Eric Bidelman Track: Chrome and HTML5 Time slot: G [16:30 - 17:15] Room: 2 Level: 151 What does it mean to install a web app? This session will give an overview of how to build a beautiful application for the Chrome Web Store, monetize, and distribute it to 70 million users! From: GoogleDevelopers Views: 146 3 ratings Time: 38:57 More in Science & Technology

    Read the article

  • APress Deal of the Day - 1/Aug/2013 - Pro SQL Server 2012 Reporting Services

    - by TATWORTH
    Originally posted on: http://geekswithblogs.net/TATWORTH/archive/2013/08/01/apress-deal-of-the-day---1aug2013---pro-sql.aspxToday's $10 deal of the day from APress at http://www.apress.com/9781430238102 is Pro SQL Server 2012 Reporting Services."Pro SQL Server 2012 Reporting Services opens the door to delivering customizable, web-enabled reports across your business using Microsoft’s enterprise-level reporting platform."

    Read the article

  • Query Tuning Mastery at PASS Summit 2012: The Demos

    - by Adam Machanic
    For the second year in a row, I was asked to deliver a 500-level "Query Tuning Mastery" talk in room 6E of the Washington State Convention Center, for the PASS Summit. ( Here's some information about last year's talk, on workspace memory. ) And for the second year in a row, I had to deliver said talk at 10:15 in the morning, in a room used as overflow for the keynote, following a keynote speaker that didn't stop speaking on time. Frustrating! Last Thursday, after very, very quickly setting up and...(read more)

    Read the article

  • Taking the fear out of a Cloud initiative through the use of security tools

    - by user736511
    Typical employees, constituents, and business owners  interact with online services at a level where their knowledge of back-end systems is low, and most of the times, there is no interest in knowing the systems' architecture.  Most application administrators, while partially responsible for these systems' upkeep, have very low interactions with them, at least at an operational, platform level.  Of greatest interest to these groups is the consistent, reliable, and manageable operation of the interfaces with which they communicate.  Introducing the "Cloud" topic in any evolving architecture automatically raises the concerns for data and identity security simply because of the perception that when owning the silicon, enterprises are not able to manage its content.  But is this really true?   In the majority of traditional architectures, data and applications that access it are physically distant from the organization that owns it.  It may reside in a shared data center, or a geographically convenient location that spans large organizations' connectivity capabilities.  In the end, very often, the model of a "traditional" architecture is fairly close to the "new" Cloud architecture.  Most notable difference is that by nature, a Cloud setup uses security as a core function, and not as a necessary add-on. Therefore, following best practices, one can say that data can be safer in the Cloud than in traditional, stove-piped environments where data access is segmented and difficult to audit. The caveat is, of course, what "best practices" consist of, and here is where Oracle's security tools are perfectly suited for the task.  Since Oracle's model is to support very large organizations, it is fundamentally concerned about distributed applications, databases etc and their security, and the related Identity Management Products, or DB Security options reflect that concept.  In the end, consumers of applications and their data are to be served more safely in a controlled Cloud environment, while realizing the many cost savings associated with it. Having very fast resources to serve them (such as the Exa* platform) makes the concept even more attractive.  Finally, if a Cloud strategy does not seem feasible, consider the pros and cons of a traditional vs. a Cloud architecture.  Using the exact same criteria and business goals/traditions, and with Oracle's technology, you might be hard pressed to justify maintaining the technical status quo on security alone. For additional information please visit Oracle's Cloud Security page at: http://www.oracle.com/us/technologies/cloud/cloud-security-428855.html

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • SQL SERVER – Transaction Log Full – Transaction Log Larger than Data File – Notes from Fields #001

    - by Pinal Dave
    I am very excited to announce a new series on this blog – Notes from Fields. I have been blogging for almost 7 years on this blog and it has been a wonderful experience. Though, I have extensive experience with SQL and Databases, it is always a good idea that we consult experts for their advice and opinion. Following the same thought process, I have started this new series of Notes from Fields. In this series we will have notes from various experts in the database world. My friends at Linchpin People have graciously decided to support me in my new initiation.  Linchpin People are database coaches and wellness experts for a data driven world. In this very first episode of the Notes from Fields series database expert Tim Radney (partner at Linchpin People) explains a very common issue DBA and Developer faces in their career, when database logs fills up your hard-drive or your database log is larger than your data file. Read the experience of Tim in his own words. As a consultant, I encounter a number of common issues with clients.  One of the more common things I encounter is finding a user database in the FULL recovery model that does not make a regular transaction log backups or ever had a transaction log backup. When I find this, usually the transaction log is several times larger than the data file. Finding this issue is very significant to me in that it allows to me to discuss service level agreements with the client. I get to ask questions such as, are nightly full backups sufficient or do they need point in time recovery.  This conversation has now signed with the customer and gets them to thinking about their disaster recovery and high availability solutions. This issue is also very prominent on SQL Server forums and usually has the title of “Help, my transaction log has filled up my disk” or “Help, my transaction log is many times the size of my database”. In cases where the client only needs the previous full nights backup, I am able to change the recovery model to SIMPLE and shrink the transaction log using DBCC SHRINKFILE (2,1) or by specifying the transaction log file name by using DBCC SHRINKFILE (file_name, target_size). When the client needs point in time recovery then in most cases I will still end up switching the client to the SIMPLE recovery model to truncate the transaction log followed by a full backup. I will then schedule a SQL Agent job to make the regular transaction log backups with an interval determined by the client to meet their service level agreements. It should also be noted that typically when I find an overgrown transaction log the virtual log file count is also out of control. I clean up will always take that into account as well.  That is a subject for a future blog post. If your SQL Server is facing any issue we can Fix Your SQL Server. Additional reading: Monitoring SQL Server Database Transaction Log Space Growth – DBCC SQLPERF(logspace)  SQL SERVER – How to Stop Growing Log File Too Big Shrinking Truncate Log File – Log Full Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Backup and Restore, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Oracle Launches New Oracle Database 12c Administrator Certifications

    - by Brandye Barrington
    Today Oracle University announces the release of new Oracle Database 12c Administrator certifications. The new Oracle Database 12c certifications emphasize the foundational and advanced skills needed by Database Administrators and will prepare DBAs to leverage powerful new management and consolidation capabilities, resulting in an even more valuable credential for customers and partners. ORACLE CERTIFIED ASSOCIATE (OCA)  The Oracle Certified Associate (OCA) for Oracle Database 12c objectives measure IT professionals' mastery of day-to-day administration skills and their ability to manage the challenges they're likely to encounter on the job. This credential focuses on SQL skills, operational administration of the Oracle Database including performance and space management, and installing, patching and upgrading the Oracle Database. Earning the OCA credential requires successful completion of two exams: 1Z0-061 - Oracle Database 12c: SQL Fundamentals and 1Z0-062 - Oracle Database 12c: Installation and Administration. The OCA certification track also allows for several alternate exams which can be substituted for 1Z0-061. ORACLE CERTIFIED PROFESSIONAL (OCP) Building on the competencies in the Oracle Database 12c OCA certification, the Oracle Certified Professional (OCP) for Oracle Database 12c certification includes advanced knowledge and skills required of top-performing database administrators. The OCP credential focuses on developing and implementing backup and recovery strategies, designing consolidation strategies to exploit multitenant container and pluggable databases, and thorough understanding how CDB/PDBs fit into the DBaaS cloud-computing model. Today, Oracle is releasing 1Z0-060 - Upgrade to Oracle Database 12c, which allows Oracle Certified Professionals with credentials in Oracle 9i, Oracle Database 10g or Oracle Database 11g to upgrade to Oracle Database 12c with a single exam. The upgrade exam focuses on designing consolidation strategies to exploit multitenant container and pluggable databases, implementing Oracle 12c feature-rich ILM support, optimizing SQL execution using dynamic swapping of sub plans, implementing real-time data redaction within databases, as well as exploiting many additional performance, backup and recovery, security and partitioning enhancements. The exam also includes a thorough review of core DBA skills. Visit the OCP certification track for more details on the new upgrade exam as well as alternate certification paths. ORACLE CERTIFIED MASTER (OCM) The Oracle Certified Master (OCM) for Oracle Database 12c - a very challenging and elite top-level certification - certifies the most highly skilled and experienced database experts. Further information on the 12c OCM level will be announced as exam development concludes. To date, there have been more than 1.6 million Oracle certifications granted worldwide. Explore these certification tracks, exam requirements and objectives, and start toward earning your exciting new Oracle Database 12c certification credentials from Oracle.

    Read the article

  • Database design and performance impact

    - by Craige
    I have a database design issue that I'm not quite sure how to approach, nor if the benefits out weigh the costs. I'm hoping some P.SE members can give some feedback on my suggested design, as well as any similar experiences they may have came across. As it goes, I am building an application that has large reporting demands. Speed is an important issue, as there will be peak usages throughout the year. This application/database has a multiple-level, many-to-many relationship. eg object a object b object c object d object b has relationship to object a object c has relationship to object b, a object d has relationship to object c, b, a Theoretically, this could go on for unlimited levels, though logic dictates it could only go so far. My idea here, to speed up reporting, would be to create a syndicate table that acts as a global many-to-many join table. In this table (with the given example), one might see: +----------+-----------+---------+ | child_id | parent_id | type_id | +----------+-----------+---------+ | b | a | 1 | | c | b | 2 | | c | a | 3 | | d | c | 4 | | d | b | 5 | | d | a | 6 | +----------+-----------+---------+ Where a, b, c and d would translate to their respective ID's in their respective tables. So, for ease of reporting all of a which exist on object d, one could query SELECT * FROM `syndicates` ... JOINS TO child and parent tables ... WHERE parent_id=a and type_id=6; rather than having a query with a join to each level up the chain. The Problem This table grows exponentially, and in a given year, could easily grow past 20,000 records for one client. Given multiple clients over multiple years, this table will VERY quickly explode to millions of records and beyond. Now, the database will, in time, be partitioned across multiple servers, but I would like (as most would) to keep the number of servers as low as possible while still offering flexibility. Also writes and updates would be exponentially longer (though possibly not noticeable to the end user) as there would be multiple inserts/updates/scans on this table to keep it in sync. Am I going in the right direction here, or am I way off track. What would you do in a similar situation? This solution seems overly complex, but allows the greatest flexibility and fastest read-operations. Sidenote 1 - This structure allows me to add new levels to the tree easily. Sidenote 2 - The database querying for this database is done through an ORM framework.

    Read the article

  • Very very slow transfer speeds between Windows 7 and samba server running on Ubuntu 11.10/12.04 minimal

    - by kuzyt
    As mentioned in the title I tried transferring files between Windows 7 and the samba server running on both Ubuntu 11.10 and 12.04 but both showed very slow transfer speeds. Can someone please guide me in the right direction to debug this problem ? wget --output-document=/dev/null http://tokyo1.linode.com/100MB-tokyo.bin --2012-08-21 22:02:17-- http://tokyo1.linode.com/100MB-tokyo.bin Resolving tokyo1.linode.com (tokyo1.linode.com)... 106.187.33.12 Connecting to tokyo1.linode.com (tokyo1.linode.com)|106.187.33.12|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 104857600 (100M) [application/octet-stream] Saving to: `/dev/null' 8% [=============> ] 8,923,980 64.8K/s eta 15m 0s wlan0 IEEE 802.11abgn ESSID:"TNET" Mode:Managed Frequency:2.462 GHz Access Point: 58:6D:8F:26:20:7A Bit Rate=117 Mb/s Tx-Power=20 dBm Retry long limit:7 RTS thr:off Fragment thr:off Power Management:off Link Quality=57/70 Signal level=-53 dBm Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0 Tx excessive retries:101 Invalid misc:2448 Missed beacon:0 03:00.0 Network controller: Atheros Communications Inc. AR9300 Wireless LAN adaptor (rev 01) Subsystem: Atheros Communications Inc. Device 3112 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+ Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=128K] Expansion ROM at fea20000 [disabled] [size=64K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <2us, L1 <64us ClockPM- Surprise- LLActRep- BwNot- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn- Capabilities: [140 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01 Status: NegoPending- InProgress- Capabilities: [300 v1] Device Serial Number 00-00-00-00-00-00-00-00 Kernel driver in use: ath9k Kernel modules: ath9k

    Read the article

  • iPhone development using AS3 (Resources)

    - by woodscreative
    I've just realeased my first game developed for the iPhone using AS3 and the iPhone Packager http://itunes.apple.com/us/app/snapshot-paintball/id407362440?mt=8&uo=4 I want to take the game to the next level but I am not using the native iPhone SDK so I need some other ideas, I am fresh to iPhone development and it's hard to find good resources, any AS3 developers out there willing to share some links? Highscore frameworks and best practices, connecting to Facebook, ui classes/gestures. Thanks.

    Read the article

  • Ledge grab and climb in Unity3D

    - by BallzOfSteel
    I just started on a new project. In this project one of the main gameplay mechanics is that you can grab a ledge on certain points in a level and hang on to it. Now my question, since I've been wrestling with this for quite a while now. How could I actually implement this? I have tried it with animations, but it's just really ugly since the player will snap to a certain point where the animation starts.

    Read the article

< Previous Page | 108 109 110 111 112 113 114 115 116 117 118 119  | Next Page >