Search Results

Search found 31356 results on 1255 pages for 'database backups'.

Page 644/1255 | < Previous Page | 640 641 642 643 644 645 646 647 648 649 650 651  | Next Page >

  • SQL University: Parallelism Week - Part 3, Settings and Options

    - by Adam Machanic
    Congratulations! You've made it back for the the third and final installment of Parallelism Week here at SQL University . So far we've covered the fundamentals of multitasking vs. parallel processing and delved into how parallel query plans actually work . Today we'll take a look at the settings and options that influence intra-query parallelism and discuss how best to set things up in various situations. Instance-Level Configuration Your database server probably has more than one logical processor....(read more)

    Read the article

  • Ghost team foundation build controllers

    - by Martin Hinshelwood
    Quite often after an upgrade there are things left over. Most of the time they are easy to delete, but sometimes it takes a little effort. Even rarer are those times when something just will not go away no matter how much you try. We have had a ghost team build controller hanging around for a while now, and it had defeated my best efforts to get rid of it. The build controller was from our old TFS server from before our TFS 2010 beta 2 upgrade and was really starting to annoy me. Every time I try to delete it I get the message: Controller cannot be deleted because there are build in progress -Manage Build Controller dialog   Figure: Deleting a ghost controller does not always work. I ended up checking all of our 172 Team Projects for the build that was queued, but did not find anything. Jim Lamb pointed me to the “tbl_BuildQueue” table in the team Project Collection database and sure enough there was the nasty little beggar. Figure: The ghost build was easily spotted Adam Cogan asked me: “Why did you suspect this one?” Well, there are a number of things that led me to suspect it: QueueId is very low: Look at the other items, they are in the thousands not single digits ControllerId: I know there is only one legitimate controller, and I am assuming that 6 relates to “zzUnicorn” DefinitionId: This is a very low number and I looked it up in “tbl_BuildDefinition” and it did not exist QueueTime: As we did not upgrade to TFS 2010 until late 2009 a date of 2008 for a queued build is very suspect Status: A status of 2 means that it is still queued This build must have been queued long ago when we were using TFS 2008, probably a beta, and it never got cleaned up. As controllers are new in TFS 2010 it would have created the “zzUnicorn” controller to handle any build servers that already exist. I had previously deleted the Agent, but leaving the controller just looks untidy. Now that the ghost build has been identified there are two options: Delete the row I would not recommend ever deleting anything from the database to achieve something in TFS. It is really not supported. Set the Status to cancelled (Recommended) This is the best option as TFS will then clean it up itself So I set the Status of this build to 2 (cancelled) and sure enough it disappeared after a couple of minutes and I was then able to then delete the “zzUnicorn” controller. Figure: Almost completely clean Now all I have to do is get rid of that untidy “zzBunyip” agent, but that will require rewriting one of our build scripts which will have to wait for now.   Technorati Tags: ALM,TFBS,TFS 2010

    Read the article

  • System.Data.SQLite

    - by csharp-source.net
    System.Data.SQLite is an enhanced version of the original SQLite database engine. It is a complete drop-in replacement for the original sqlite3.dll (you can even rename it to sqlite3.dll). It has no linker dependency on the .NET runtime so it can be distributed independently of .NET, yet embedded in the binary is a complete ADO.NET 2.0 provider for full managed development.

    Read the article

  • Using TPL and PLINQ to raise performance of feed aggregator

    - by DigiMortal
    In this posting I will show you how to use Task Parallel Library (TPL) and PLINQ features to boost performance of simple RSS-feed aggregator. I will use here only very basic .NET classes that almost every developer starts from when learning parallel programming. Of course, we will also measure how every optimization affects performance of feed aggregator. Feed aggregator Our feed aggregator works as follows: Load list of blogs Download RSS-feed Parse feed XML Add new posts to database Our feed aggregator is run by task scheduler after every 15 minutes by example. We will start our journey with serial implementation of feed aggregator. Second step is to use task parallelism and parallelize feeds downloading and parsing. And our last step is to use data parallelism to parallelize database operations. We will use Stopwatch class to measure how much time it takes for aggregator to download and insert all posts from all registered blogs. After every run we empty posts table in database. Serial aggregation Before doing parallel stuff let’s take a look at serial implementation of feed aggregator. All tasks happen one after other. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();           for (var index = 0; index <blogs.Count; index++)         {              ImportFeed(blogs[index]);         }     }       private void ImportFeed(BlogDto blog)     {         if(blog == null)             return;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                 }       private void ImportRssFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = RssFeed.Create(uri);           foreach (var item in feed.Channel.Items)         {             SaveRssFeedItem(item, blog.Id, blog.CreatedById);         }     }       private void ImportAtomFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           foreach (var item in feed.Entries)         {             SaveAtomFeedEntry(item, blog.Id, blog.CreatedById);         }     } } Serial implementation of feed aggregator downloads and inserts all posts with 25.46 seconds. Task parallelism Task parallelism means that separate tasks are run in parallel. You can find out more about task parallelism from MSDN page Task Parallelism (Task Parallel Library) and Wikipedia page Task parallelism. Although finding parts of code that can run safely in parallel without synchronization issues is not easy task we are lucky this time. Feeds import and parsing is perfect candidate for parallel tasks. We can safely parallelize feeds import because importing tasks doesn’t share any resources and therefore they don’t also need any synchronization. After getting the list of blogs we iterate through the collection and start new TPL task for each blog feed aggregation. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();                var tasks = new Task[blogs.Count];           for (var index = 0; index <blogs.Count; index++)         {             tasks[index] = new Task(ImportFeed, blogs[index]);             tasks[index].Start();         }           Task.WaitAll(tasks);     }       private void ImportFeed(object blogObject)     {         if(blogObject == null)             return;         var blog = (BlogDto)blogObject;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                }       private void ImportRssFeed(BlogDto blog)     {          var uri = new Uri(blog.RssUrl);          var feed = RssFeed.Create(uri);           foreach (var item in feed.Channel.Items)          {              SaveRssFeedItem(item, blog.Id, blog.CreatedById);          }     }     private void ImportAtomFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           foreach (var item in feed.Entries)         {             SaveAtomFeedEntry(item, blog.Id, blog.CreatedById);         }     } } You should notice first signs of the power of TPL. We made only minor changes to our code to parallelize blog feeds aggregating. On my machine this modification gives some performance boost – time is now 17.57 seconds. Data parallelism There is one more way how to parallelize activities. Previous section introduced task or operation based parallelism, this section introduces data based parallelism. By MSDN page Data Parallelism (Task Parallel Library) data parallelism refers to scenario in which the same operation is performed concurrently on elements in a source collection or array. In our code we have independent collections we can process in parallel – imported feed entries. As checking for feed entry existence and inserting it if it is missing from database doesn’t affect other entries the imported feed entries collection is ideal candidate for parallelization. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();                var tasks = new Task[blogs.Count];           for (var index = 0; index <blogs.Count; index++)         {             tasks[index] = new Task(ImportFeed, blogs[index]);             tasks[index].Start();         }           Task.WaitAll(tasks);     }       private void ImportFeed(object blogObject)     {         if(blogObject == null)             return;         var blog = (BlogDto)blogObject;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                }       private void ImportRssFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = RssFeed.Create(uri);           feed.Channel.Items.AsParallel().ForAll(a =>         {             SaveRssFeedItem(a, blog.Id, blog.CreatedById);         });      }        private void ImportAtomFeed(BlogDto blog)      {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           feed.Entries.AsParallel().ForAll(a =>         {              SaveAtomFeedEntry(a, blog.Id, blog.CreatedById);         });      } } We did small change again and as the result we parallelized checking and saving of feed items. This change was data centric as we applied same operation to all elements in collection. On my machine I got better performance again. Time is now 11.22 seconds. Results Let’s visualize our measurement results (numbers are given in seconds). As we can see then with task parallelism feed aggregation takes about 25% less time than in original case. When adding data parallelism to task parallelism our aggregation takes about 2.3 times less time than in original case. More about TPL and PLINQ Adding parallelism to your application can be very challenging task. You have to carefully find out parts of your code where you can safely go to parallel processing and even then you have to measure the effects of parallel processing to find out if parallel code performs better. If you are not careful then troubles you will face later are worse than ones you have seen before (imagine error that occurs by average only once per 10000 code runs). Parallel programming is something that is hard to ignore. Effective programs are able to use multiple cores of processors. Using TPL you can also set degree of parallelism so your application doesn’t use all computing cores and leaves one or more of them free for host system and other processes. And there are many more things in TPL that make it easier for you to start and go on with parallel programming. In next major version all .NET languages will have built-in support for parallel programming. There will be also new language constructs that support parallel programming. Currently you can download Visual Studio Async to get some idea about what is coming. Conclusion Parallel programming is very challenging but good tools offered by Visual Studio and .NET Framework make it way easier for us. In this posting we started with feed aggregator that imports feed items on serial mode. With two steps we parallelized feed importing and entries inserting gaining 2.3 times raise in performance. Although this number is specific to my test environment it shows clearly that parallel programming may raise the performance of your application significantly.

    Read the article

  • Daily tech links for .net and related technologies - Apr 1-3, 2010

    - by SanjeevAgarwal
    Daily tech links for .net and related technologies - Apr 1-3, 2010 Web Development Cleaner HTML Markup with ASP.NET 4 Web Forms - Client IDs - ScottGu Using jQuery and OData to Insert a Database Record - Stephen Walter Apple vs. Microsoft – A Website Usability Study Mastering ASP.NET MVC 2.0: Preview - TekPub Web Design UX Lessons Learned From Offline Experiences - Jon Phillips 5 Steps Toward jQuery Mastery - Dave Ward 20 jQuery Cheatsheets, Docs and References for Every Occasion - Paul Andrew 11...(read more)

    Read the article

  • SQL SERVER – Detecting guest User Permissions – guest User Access Status

    - by pinaldave
    Earlier I wrote the blog post SQL SERVER – Disable Guest Account – Serious Security Issue, and I got many comments asking questions related to the guest user. Here are the comments of Manoj: 1) How do we know if the uest user is enabled or disabled? 2) What is the default for guest user in SQL Server? Default settings for guest user When SQL Server is installed by default, the guest user is disabled for security reasons. If the guest user is not properly configured, it can create a major security issue. You can read more about this here. Identify guest user status There are multiple ways to identify guest user status: Using SQL Server Management Studio (SSMS) You can expand the database node >> Security >> Users. If you see the RED arrow pointing downward, it means that the guest user is disabled. Using sys.sysusers Here is a simple script. If you notice column dbaccess as 1, it means that the guest user is enabled and has access to the database. SELECT name, hasdbaccess FROM sys.sysusers WHERE name = 'guest' Using sys.database_principals and sys.server_permissions This script is valid in SQL Server 2005 and a later version. This is my default method recently. SELECT name, permission_name, state_desc FROM sys.database_principals dp INNER JOIN sys.server_permissions sp ON dp.principal_id = sp.grantee_principal_id WHERE name = 'guest' AND permission_name = 'CONNECT' Using sp_helprotect Just run the following stored procedure which will give you all the permissions associated with the user. sp_helprotect @username = 'guest' Disable Guest Account REVOKE CONNECT FROM guest Additionally, the guest account cannot be disabled in master and tempdb; it is always enabled. There is a special need for this. Let me ask a question back at you: In which scenario do you think this will be useful to keep the guest, and what will the additional configuration go along with the scenario? Note: Special mention to Imran Mohammed for being always there when users need help. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Security, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Red Gate Rolls Out the Red Carpet for SQL Server Users

    SQL in the City, the unique event for database developers and administrators organized by Red Gate Software, hits the streets of London and Seattle this fall. Now in its fourth year, it features presentations by some of the world’s top SQL Server speakers. Can 41,000 DBAs really be wrong? Join 41,000 other DBAs who are following the new series from the DBA Team: the 5 Worst Days in a DBA’s Life. Part 3, As Corrupt As It Gets, is out now – read it here.

    Read the article

  • ODI 11g – How to Load Using Partition Exchange

    - by David Allan
    Here we will look at how to load large volumes of data efficiently into the Oracle database using a mixture of CTAS and partition exchange loading. The example we will leverage was posted by Mark Rittman a couple of years back on Interval Partitioning, you can find that posting here. The best thing about ODI is that you can encapsulate all those ‘how to’ blog posts and scripts into templates that can be reused – the templates are of course Knowledge Modules. The interface design to mimic Mark's posting is shown below; The IKM I have constructed performs a simple series of steps to perform a CTAS to create the stage table to use in the exchange, then lock the partition (to ensure it exists, it will be created if it doesn’t) then exchange the partition in the target table. You can find the IKM Oracle PEL.xml file here. The IKM performs the follows steps and is meant to illustrate what can be done; So when you use the IKM in an interface you configure the options for hints (for parallelism levels etc), initial extent size, next extent size and the partition variable;   The KM has an option where the name of the partition can be passed in, so if you know the name of the partition then set the variable to the name, if you have interval partitioning you probably don’t know the name, so you can use the FOR clause. In my example I set the variable to use the date value of the source data FOR (TO_DATE(''01-FEB-2010'',''dd-MON-yyyy'')) Using a variable lets me invoke the scenario many times loading different partitions of the same target table. Below you can see where this is defined within ODI, I had to double single-quote the strings since this is placed inside the execute immediate tasks in the KM; Note also this example interface uses the LKM Oracle to Oracle (datapump), so this illustration uses a lot of the high performing Oracle database capabilities – it uses Data Pump to unload, then a CreateTableAsSelect (CTAS) is executed on the external table based on top of the Data Pump export. This table is then exchanged in the target. The IKM and illustrations above are using ODI 11.1.1.6 which was needed to get around some bugs in earlier releases with how the variable is handled...as far as I remember.

    Read the article

  • The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

    - by Bertrand Le Roy
    We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

    Read the article

  • Announcing Entity Framework Code-First (CTP 5 release)

    In this article, Scott provides a detailed coverage of Entity Framework Code-First CTP 5 release and the features included with the build. He begins with the steps required to install EF Code First. Scott then examines the usage of EF Code First to create a model layer for the Northwind sample database in a series of steps. Towards the end of the article, Scott examines the usage of UI Validation and few addtional EF Code First Improvements shipped with CTP 5.

    Read the article

  • Available OPN Specialization Categories & Specializations

    - by [email protected]
    DatabaseOracle Data Warehousing Specialization CriteriaOracle Database 11g Specialization CriteriaOracle Enterprise Linux Specialization CriteriaOracle Enterprise Manager Specialization Criteria MiddlewareOracle Service-Oriented Architecture Specialization CriteriaOracle Business Intelligence Foundation Specialization CriteriaOracle Enterprise ManagerOracle Enterprise Manager Specialization CriteriaOracle Enterprise LinuxOracle Enterprise Linux Specialization CriteriaAll Available Specializationshttps://competencycenter.oracle.com/opncc/glp_list.cc

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • T-SQL Tuesday #31 - Logging Tricks with CONTEXT_INFO

    - by Most Valuable Yak (Rob Volk)
    This month's T-SQL Tuesday is being hosted by Aaron Nelson [b | t], fellow Atlantan (the city in Georgia, not the famous sunken city, or the resort in the Bahamas) and covers the topic of logging (the recording of information, not the harvesting of trees) and maintains the fine T-SQL Tuesday tradition begun by Adam Machanic [b | t] (the SQL Server guru, not the guy who fixes cars, check the spelling again, there will be a quiz later). This is a trick I learned from Fernando Guerrero [b | t] waaaaaay back during the PASS Summit 2004 in sunny, hurricane-infested Orlando, during his session on Secret SQL Server (not sure if that's the correct title, and I haven't used parentheses in this paragraph yet).  CONTEXT_INFO is a neat little feature that's existed since SQL Server 2000 and perhaps even earlier.  It lets you assign data to the current session/connection, and maintains that data until you disconnect or change it.  In addition to the CONTEXT_INFO() function, you can also query the context_info column in sys.dm_exec_sessions, or even sysprocesses if you're still running SQL Server 2000, if you need to see it for another session. While you're limited to 128 bytes, one big advantage that CONTEXT_INFO has is that it's independent of any transactions.  If you've ever logged to a table in a transaction and then lost messages when it rolled back, you can understand how aggravating it can be.  CONTEXT_INFO also survives across multiple SQL batches (GO separators) in the same connection, so for those of you who were going to suggest "just log to a table variable, they don't get rolled back":  HA-HA, I GOT YOU!  Since GO starts a new batch all variable declarations are lost. Here's a simple example I recently used at work.  I had to test database mirroring configurations for disaster recovery scenarios and measure the network throughput.  I also needed to log how long it took for the script to run and include the mirror settings for the database in question.  I decided to use AdventureWorks as my database model, and Adam Machanic's Big Adventure script to provide a fairly large workload that's repeatable and easily scalable.  My test would consist of several copies of AdventureWorks running the Big Adventure script while I mirrored the databases (or not). Since Adam's script contains several batches, I decided CONTEXT_INFO would have to be used.  As it turns out, I only needed to grab the start time at the beginning, I could get the rest of the data at the end of the process.   The code is pretty small: declare @time binary(128)=cast(getdate() as binary(8)) set context_info @time   ... rest of Big Adventure code ...   go use master; insert mirror_test(server,role,partner,db,state,safety,start,duration) select @@servername, mirroring_role_desc, mirroring_partner_instance, db_name(database_id), mirroring_state_desc, mirroring_safety_level_desc, cast(cast(context_info() as binary(8)) as datetime), datediff(s,cast(cast(context_info() as binary(8)) as datetime),getdate()) from sys.database_mirroring where db_name(database_id) like 'Adv%';   I declared @time as a binary(128) since CONTEXT_INFO is defined that way.  I couldn't convert GETDATE() to binary(128) as it would pad the first 120 bytes as 0x00.  To keep the CAST functions simple and avoid using SUBSTRING, I decided to CAST GETDATE() as binary(8) and let SQL Server do the implicit conversion.  It's not the safest way perhaps, but it works on my machine. :) As I mentioned earlier, you can query system views for sessions and get their CONTEXT_INFO.  With a little boilerplate code this can be used to monitor long-running procedures, in case you need to kill a process, or are just curious  how long certain parts take.  In this example, I added code to Adam's Big Adventure script to set CONTEXT_INFO messages at strategic places I want to monitor.  (His code is in UPPERCASE as it was in the original, mine is all lowercase): declare @msg binary(128) set @msg=cast('Altering bigProduct.ProductID' as binary(128)) set context_info @msg go ALTER TABLE bigProduct ALTER COLUMN ProductID INT NOT NULL GO set context_info 0x0 go declare @msg1 binary(128) set @msg1=cast('Adding pk_bigProduct Constraint' as binary(128)) set context_info @msg1 go ALTER TABLE bigProduct ADD CONSTRAINT pk_bigProduct PRIMARY KEY (ProductID) GO set context_info 0x0 go declare @msg2 binary(128) set @msg2=cast('Altering bigTransactionHistory.TransactionID' as binary(128)) set context_info @msg2 go ALTER TABLE bigTransactionHistory ALTER COLUMN TransactionID INT NOT NULL GO set context_info 0x0 go declare @msg3 binary(128) set @msg3=cast('Adding pk_bigTransactionHistory Constraint' as binary(128)) set context_info @msg3 go ALTER TABLE bigTransactionHistory ADD CONSTRAINT pk_bigTransactionHistory PRIMARY KEY NONCLUSTERED(TransactionID) GO set context_info 0x0 go declare @msg4 binary(128) set @msg4=cast('Creating IX_ProductId_TransactionDate Index' as binary(128)) set context_info @msg4 go CREATE NONCLUSTERED INDEX IX_ProductId_TransactionDate ON bigTransactionHistory(ProductId,TransactionDate) INCLUDE(Quantity,ActualCost) GO set context_info 0x0   This doesn't include the entire script, only those portions that altered a table or created an index.  One annoyance is that SET CONTEXT_INFO requires a literal or variable, you can't use an expression.  And since GO starts a new batch I need to declare a variable in each one.  And of course I have to use CAST because it won't implicitly convert varchar to binary.  And even though context_info is a nullable column, you can't SET CONTEXT_INFO NULL, so I have to use SET CONTEXT_INFO 0x0 to clear the message after the statement completes.  And if you're thinking of turning this into a UDF, you can't, although a stored procedure would work. So what does all this aggravation get you?  As the code runs, if I want to see which stage the session is at, I can run the following (assuming SPID 51 is the one I want): select CAST(context_info as varchar(128)) from sys.dm_exec_sessions where session_id=51   Since SQL Server 2005 introduced the new system and dynamic management views (DMVs) there's not as much need for tagging a session with these kinds of messages.  You can get the session start time and currently executing statement from them, and neatly presented if you use Adam's sp_whoisactive utility (and you absolutely should be using it).  Of course you can always use xp_cmdshell, a CLR function, or some other tricks to log information outside of a SQL transaction.  All the same, I've used this trick to monitor long-running reports at a previous job, and I still think CONTEXT_INFO is a great feature, especially if you're still using SQL Server 2000 or want to supplement your instrumentation.  If you'd like an exercise, consider adding the system time to the messages in the last example, and an automated job to query and parse it from the system tables.  That would let you track how long each statement ran without having to run Profiler. #TSQL2sDay

    Read the article

  • Oracle Optimized Solutions at Oracle OpenWorld 2012

    - by ferhatSF
    Have you registered for Oracle OpenWorld 2012 in San Francisco from September 30 to October 4? Visit the Oracle OpenWorld 2012 site today for registration and more information. Come join us to hear how Oracle Optimized Solutions can help you save money, reduce integration risks, and improve user productivity. Oracle Optimized Solutions are designed, pre-tested, tuned and fully documented architectures for optimal performance and availability. They provide written guidelines to help size, configure, purchase and deploy enterprise solutions that address common IT problems. Built with flexibility in mind, Oracle Optimized Solutions can be deployed as complete solutions or easily tailored to meet your specific needs - they are proven to save money, reduce integration risks and improve user productivity. Here is a preview of the planned Oracle OpenWorld sessions(*) on Oracle Optimized Solutions. October 1, 2012 Monday Time Session ID Title Location 12:15 PM CON7916 Accelerate Oracle E-Business Suite Deployment with SPARC SuperCluster Moscone West - 2001 03:15 PM GEN9691 General Session: Accelerate Your Business with the Oracle Hardware Advantage Moscone North - Hall D 04:45 PM CON4821 Building a Flexible Enterprise Cloud Infrastructure on Oracle SPARC Systems Moscone West - 2001 October 2, 2012 Tuesday Time Session ID Title Location 10:15 AM CON4561 Backup-and-Recovery Best Practices with Oracle Engineered Systems Products Moscone South - 252 11:45 AM CON3851 Optimizing JD Edwards EnterpriseOne on SPARC T4 Servers for Best Performance Moscone West - 2000 01:15 PM GEN11472 General Session: Breakthrough Efficiency in Private Cloud Infrastructure Moscone West - 3014 01:15 PM CON4600 Extreme Storage Scale and Efficiency: Lessons from a 100,000-Person Organization Moscone South - 252 05:00 PM CON9465 Next-Generation Directory: Oracle Unified Directory Moscone West - 3008 05:00 PM CON4088 Accelerate Your SAP Landscape with the Oracle SPARC SuperCluster Moscone West - 2001 05:00 PM CON7743 High-Performance Security for Oracle Applications Using SPARC T4 Systems Moscone West - 2000 05:00 PM CON3857 Archive Strategies for 100 Percent Data Availability Moscone South - 270 October 3, 2012 Wednesday Time Session ID Title Location 10:15 AM CON6528 Configure Oracle Hybrid Columnar Compression to Optimize Query Database Performance up to 10x Moscone South - 252 11:45 AM CON2590 Breakthrough in Private Cloud Management on SPARC T-Series Servers Moscone South - 270 01:15 PM CON4289 Oracle Optimized Solution for Siebel CRM at ACCOR Moscone West - 2000 05:00 PM CON7570 Improve PeopleSoft HCM Performance and Reliability with SPARC SuperCluster Moscone South - 252 * Schedule subject to change In addition, there will be Oracle Optimized Solutions Hands-On-Labs sessions planned. Please enroll ahead of time as space is limited: Oracle Optimized Solutions: Hands on Labs in Oracle OpenWorld Place: Marriott Marquis - Salon 14/15 Date and Time Session ID Title Monday October 1, 2012 01:45 PM HOL9868 Enterprise Cloud Infrastructure for SPARC with Oracle Enterprise Manager Ops Center 12c Monday October 1, 2012 03:15 PM HOL9907 Oracle Virtual Desktop Infrastructure Performance and Tablet Mobility Wednesday October 3, 2012 05:00 PM HOL9870 x86 Enterprise Cloud Infrastructure with Oracle VM 3.x and Sun ZFS Storage Appliance Thursday October 4, 2012 11:15 AM HOL9869 0 to Database Backup and Recovery in 60 Minutes Oracle Optimized Solutions executives and experts will also be at hand for discussions and follow ups. And don’t forget to catch live demonstrations of our complete Oracle Optimized Solutions while at Oracle OpenWorld 2012 in San Francisco. We recommend the use of the Schedule Builder tool to plan your visit to the conference and for pre-enrollment in sessions of your interest. We hope to see you there!

    Read the article

  • Log & monitor mysql databases on servers

    - by user3215
    How MySQL databases logged and monitored on ubuntu servers in real time?. I checked /var/log/mysql.log and found it empty. EDIT 1: The log was not enabled in the mysql configuration file. Now it logs and I could see the logs in the file /var/log/mysql/mysql.log But this could not be sufficient to gather additional information about the database logs. Is there any other way or any popular open source tool?

    Read the article

  • SQLBeat Episode 11 – Ted the Fred Krueger Halloween SQL

    - by SQLBeat
    In this episode of the SQLBeat Podcast I speak conversationally (Ok I will just say I converse) with Ted Krueger about Elm Street, where he works as a DBA who stores nightmares in SQL Server database tables. The joke about it being BLOB storage is only one of several that may scare you away from this Halloween Special. If you like listening to two SQL guys talking about the bands they used to be in, rainbow trout and video games, come on in. Bwaaaa Haaaa Haaa…..Ok I will stop. Download the MP3

    Read the article

  • SQLAuthority News – Presented Soft Skill Session on Presentation Skills at SQL Bangalore on May 3, 2014

    - by Pinal Dave
    I have presented on various database technologies for almost 10 years now. SQL, Database and NoSQL have been part of my life. Earlier this month, I had the opportunity to present on the topic Performing an Effective Presentation. I must say it was blast to prepare as well as present this session. This event was part of the SQL Bangalore community. If you are in Bangalore, you must be part of this group. SQL Bangalore is a wonderful community and we always have a great response when we present on technology. It is SQL User Group and we discuss everything SQL there. This month we had SQL Server 2014 theme and we had a community launch of SQL Server. We have the best of the best speakers presenting on SQL Server 2014 technology. The event had amazing speakers and each of them did justice to the subject. You can read about this over here. In this session I told a story from my life. I talked about who inspired me and how I learned to speak in public. I told stories about two legends  who have inspired me. There is no video recording of this session. If you want to get resources from this session, please sign up my newsletter at http://bit.ly/sqllearn. Well, I had a great time at this event. We had over 250 people showed up at this event and had a grand  time together. I personally enjoyed a session of Amit Benerjee, Balmukund Lakhani and Vinod Kumar. Ken and Surabh also entertained the audience. Overall, this was a grand event and if you were in Bangalore and did not make it to this event. You did miss out on a few things. Here are a few photos of this event. SQL Bangalore UG Nupur, Chandra, Shaivi, Balmukund, Amit, Vinod [captions This] SQL Bangalore UG Audience Pinal Dave presenting at SQL UG in Bangalore Here are few of the slides from this presentation: Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLAuthority News, T SQL

    Read the article

  • SQL Server data platform upgrade - Why upgrade and how best you can reduce pre & post upgrade problems?

    - by ssqa.net
    SQL Server upgrade, let it be database(s) or instance(s) or both the process and procedures must follow best practices in order to reduce any problems that may occur even after the platform is upgraded. The success of any project relies upon the simpler methods of implementation and a process to reduce the complexity in testing to ensure a successful outcome. Also the topic has been a popular topic that .... read more from here ......(read more)

    Read the article

< Previous Page | 640 641 642 643 644 645 646 647 648 649 650 651  | Next Page >