Search Results

Search found 10711 results on 429 pages for 'blog spotlight'.

Page 74/429 | < Previous Page | 70 71 72 73 74 75 76 77 78 79 80 81 | Next Page >

Using Subjects to Deploy Queries Dynamically

- by Roman Schindlauer

In the previous blog posting, we showed how to construct and deploy query fragments to a StreamInsight server, and how to re-use them later. In today’s posting we’ll integrate this pattern into a method of dynamically composing a new query with an existing one. The construct that enables this scenario in StreamInsight V2.1 is a Subject. A Subject lets me create a junction element in an existing query that I can tap into while the query is running. To set this up as an end-to-end example, let’s first define a stream simulator as our data source: var generator = myApp.DefineObservable( (TimeSpan t) => Observable.Interval(t).Select(_ => new SourcePayload())); This ‘generator’ produces a new instance of SourcePayload with a period of t (system time) as an IObservable. SourcePayload happens to have a property of type double as its payload data. Let’s also define a sink for our example—an IObserver of double values that writes to the console: var console = myApp.DefineObserver( (string label) => Observer.Create<double>(e => Console.WriteLine("{0}: {1}", label, e))) .Deploy("ConsoleSink"); The observer takes a string as parameter which is used as a label on the console, so that we can distinguish the output of different sink instances. Note that we also deploy this observer, so that we can retrieve it later from the server from a different process. Remember how we defined the aggregation as an IQStreamable function in the previous article? We will use that as well: var avg = myApp .DefineStreamable((IQStreamable<SourcePayload> s, TimeSpan w) => from win in s.TumblingWindow(w) select win.Avg(e => e.Value)) .Deploy("AverageQuery"); Then we define the Subject, which acts as an observable sequence as well as an observer. Thus, we can feed a single source into the Subject and have multiple consumers—that can come and go at runtime—on the other side: var subject = myApp.CreateSubject("Subject", () => new Subject<SourcePayload>()); Subject are always deployed automatically. Their name is used to retrieve them from a (potentially) different process (see below). Note that the Subject as we defined it here doesn’t know anything about temporal streams. It is merely a sequence of SourcePayloads, without any notion of StreamInsight point events or CTIs. So in order to compose a temporal query on top of the Subject, we need to 'promote' the sequence of SourcePayloads into an IQStreamable of point events, including CTIs: var stream = subject.ToPointStreamable( e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e), AdvanceTimeSettings.StrictlyIncreasingStartTime); In a later posting we will show how to use Subjects that have more awareness of time and can be used as a junction between QStreamables instead of IQbservables. Having turned the Subject into a temporal stream, we can now define the aggregate on this stream. We will use the IQStreamable entity avg that we defined above: var longAverages = avg(stream, TimeSpan.FromSeconds(5)); In order to run the query, we need to bind it to a sink, and bind the subject to the source: var standardQuery = longAverages .Bind(console("5sec average")) .With(generator(TimeSpan.FromMilliseconds(300)).Bind(subject)); Lastly, we start the process: standardQuery.Run("StandardProcess"); Now we have a simple query running end-to-end, producing results. What follows next is the crucial part of tapping into the Subject and adding another query that runs in parallel, using the same query definition (the “AverageQuery”) but with a different window length. We are assuming that we connected to the same StreamInsight server from a different process or even client, and thus have to retrieve the previously deployed entities through their names: // simulate the addition of a 'fast' query from a separate server connection, // by retrieving the aggregation query fragment // (instead of simply using the 'avg' object) var averageQuery = myApp .GetStreamable<IQStreamable<SourcePayload>, TimeSpan, double>("AverageQuery"); // retrieve the input sequence as a subject var inputSequence = myApp .GetSubject<SourcePayload, SourcePayload>("Subject"); // retrieve the registered sink var sink = myApp.GetObserver<string, double>("ConsoleSink"); // turn the sequence into a temporal stream var stream2 = inputSequence.ToPointStreamable( e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e), AdvanceTimeSettings.StrictlyIncreasingStartTime); // apply the query, now with a different window length var shortAverages = averageQuery(stream2, TimeSpan.FromSeconds(1)); // bind new sink to query and run it var fastQuery = shortAverages .Bind(sink("1sec average")) .Run("FastProcess"); The attached solution demonstrates the sample end-to-end. Regards, The StreamInsight Team

Read the article
wrong return value with jquery ajax and codeigniter

- by matthew

I do not know if this is specifically a jquery problem, actually I think it has to mostly do with my logic in the php code. What Im trying to do is make a voting system that when the user clicks on the vote up or vote down link in the web page, it triggers an ajax call to a php function that first updates the database with with the required value, on success of the database updating the another function is called just to get the required updated html for the that particular post that the user has voted on. (hope I havnt lost you). The problem I think deals with specifically with this one line of code. When I make the call it only returns the value of $row-beer_down and completly ignores everything else in that line. Funny thing is the same php code to display the html view works perfectly before the ajax function updates it. echo "<p>Score " . $row->beer_up + $row->beer_down . "</p>"; so here is the code to hope you can help as I have absolutely no idea how to fix this. here is the view file where it generates the page. This part is the query ajax function. <script type="text/javascript"> $(function() { $(".vote").click(function(){ var id = $(this).attr("id"); var name = $(this).attr("name"); var dataString = 'id='+ id ; var parent = $(this); if(name=='up') { $.ajax({ type: "POST", url: "http://127.0.0.1/CodeIgniter/blog/add_vote/" + id, data: dataString, cache: false, success: function(html) { //parent.html(html); $("." + id).load('http://127.0.0.1/CodeIgniter/blog/get_post/' + id).fadeIn("slow"); } }); } else { $.ajax({ type: "POST", url: "http://127.0.0.1/CodeIgniter/blog/minus_vote/" + id, data: dataString, cache: false, success: function(html) { //parent.html(html); $("." + id).load('http://127.0.0.1/CodeIgniter/blog/get_post/' + id).fadeIn("slow"); } }); } return false; }); }); </script> Here is the html and php part of the page to display the post. div id="post_container"> <?php //echo $this->table->generate($records); ?> <?php foreach($records->result() as $row) { ?> <?php echo "<div id=\"post\" class=\"" . $row->id . "\">"; ?> <h2><?php echo $row->id ?></h2> <?php echo "<img src=\"" . base_url() . $dir . $row->account_id . "/" . $row->file_name . "\">" ?> <p><?php echo $row->content ?></p> <p><?php echo $row->user_name ?> On <?php echo $row->time ?></p> <p>Score <?php echo $row->beer_up + $row->beer_down ?></p> <?php echo anchor('blog/add_vote/' . $row->id, 'Up Vote', array('class' => 'vote', 'id' => $row->id, 'name' => 'up')); echo anchor('blog/minus_vote/' . $row->id, 'Down Vote', array('class' => 'vote', 'id' => $row->id, 'name' => 'down')); echo anchor('blog/comments/' . $row->id, 'View Comments'); ?> </div> <?php } ?> here is the function the ajax calls when it is successfull: function get_post() { $this->db->where('id', $this->uri->segment(3)); $records = $this->db->get('post'); $dir = "/uploads/user_uploads/"; foreach($records->result() as $row) { echo "<div id=\"post\" class=\"" . $row->id . "\">"; echo "<h2>" . $row->id . "</h2>"; echo "<img src=\"" . base_url() . $dir . $row->account_id . "/" . $row->file_name . "\">"; echo "<p>" . $row->content . "</p>"; echo "<p>" . $row->user_name . " On " . $row->time . "</p>"; echo "<p>Score " . $row->beer_up + $row->beer_down . "</p>"; echo "<p>Up score" . $row->beer_up . "beer down" . $row->beer_down . "</p>"; echo anchor('blog/comments/' . $row->id, 'View Comments'); echo "</div>"; }

Read the article
BNF – how to read syntax?

- by Piotr Rodak

A few days ago I read post of Jen McCown (blog) about her idea of blogging about random articles from Books Online. I think this is a great idea, even if Jen says that it’s not exciting or sexy. I noticed that many of the questions that appear on forums and other media arise from pure fact that people asking questions didn’t bother to read and understand the manual – Books Online. Jen came up with a brilliant, concise acronym that describes very well the category of posts about Books Online – RTFM365. I take liberty of tagging this post with the same acronym. I often come across questions of type – ‘Hey, i am trying to create a table, but I am getting an error’. The error often says that the syntax is invalid. 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT DEFAULT Guid_Default NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); 5 The answer is usually(1), ‘Ok, let me check it out.. Ah yes – you have to put name of the DEFAULT constraint before the type of constraint: 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); Why many people stumble on syntax errors? Is the syntax poorly documented? No, the issue is, that correct syntax of the CREATE TABLE statement is documented very well in Books Online and is.. intimidating. Many people can be taken aback by the rather complex block of code that describes all intricacies of the statement. However, I don’t know better way of defining syntax of the statement or command. The notation that is used to describe syntax in Books Online is a form of Backus-Naur notatiion, called BNF for short sometimes. This is a notation that was invented around 50 years ago, and some say that even earlier, around 400 BC – would you believe? Originally it was used to define syntax of, rather ancient now, ALGOL programming language (in 1950’s, not in ancient India). If you look closer at the definition of the BNF, it turns out that the principles of this syntax are pretty simple. Here are a few bullet points: italic_text is a placeholder for your identifier <italic_text_in_angle_brackets> is a definition which is described further. [everything in square brackets] is optional {everything in curly brackets} is obligatory everything | separated | by | operator is an alternative ::= “assigns” definition to an identifier Yes, it looks like these six simple points give you the key to understand even the most complicated syntax definitions in Books Online. Books Online contain an article about syntax conventions – have you ever read it? Let’s have a look at fragment of the CREATE TABLE statement: 1 CREATE TABLE 2 [ database_name . [ schema_name ] . | schema_name . ] table_name 3 ( { <column_definition> | <computed_column_definition> 4 | <column_set_definition> } 5 [ <table_constraint> ] [ ,...n ] ) 6 [ ON { partition_scheme_name ( partition_column_name ) | filegroup 7 | "default" } ] 8 [ { TEXTIMAGE_ON { filegroup | "default" } ] 9 [ FILESTREAM_ON { partition_scheme_name | filegroup 10 | "default" } ] 11 [ WITH ( <table_option> [ ,...n ] ) ] 12 [ ; ] Let’s look at line 2 of the above snippet: This line uses rules 3 and 5 from the list. So you know that you can create table which has specified one of the following. just name – table will be created in default user schema schema name and table name – table will be created in specified schema database name, schema name and table name – table will be created in specified database, in specified schema database name, .., table name – table will be created in specified database, in default schema of the user. Note that this single line of the notation describes each of the naming schemes in deterministic way. The ‘optionality’ of the schema_name element is nested within database_name.. section. You can use either database_name and optional schema name, or just schema name – this is specified by the pipe character ‘|’. The error that user gets with execution of the first script fragment in this post is as follows: Msg 156, Level 15, State 1, Line 2 Incorrect syntax near the keyword 'DEFAULT'. Ok, let’s have a look how to find out the correct syntax. Line number 3 of the BNF fragment above contains reference to <column_definition>. Since column_definition is in angle brackets, we know that this is a reference to notion described further in the code. And indeed, the very next fragment of BNF contains syntax of the column definition. 1 <column_definition> ::= 2 column_name <data_type> 3 [ FILESTREAM ] 4 [ COLLATE collation_name ] 5 [ NULL | NOT NULL ] 6 [ 7 [ CONSTRAINT constraint_name ] DEFAULT constant_expression ] 8 | [ IDENTITY [ ( seed ,increment ) ] [ NOT FOR REPLICATION ] 9 ] 10 [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ] 11 [ SPARSE ] Look at line 7 in the above fragment. It says, that the column can have a DEFAULT constraint which, if you want to name it, has to be prepended with [CONSTRAINT constraint_name] sequence. The name of the constraint is optional, but I strongly recommend you to make the effort of coming up with some meaningful name yourself. So the correct syntax of the CREATE TABLE statement from the beginning of the article is like this: 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); That is practically everything you should know about BNF. I encourage you to study the syntax definitions for various statements and commands in Books Online, you can find really interesting things hidden there. Technorati Tags: SQL Server,t-sql,BNF,syntax (1) No, my answer usually is a question – ‘What error message? What does it say?’. You’d be surprised to know how many people think I can go through time and space and look at their screen at the moment they received the error.

Read the article
SQL Server and Hyper-V Dynamic Memory Part 3

- by SQLOS Team

In parts 1 and 2 of this series we looked at the basics of Hyper-V Dynamic Memory and SQL Server memory management. In this part Serdar looks at configuration guidelines for SQL Server memory management. Part 3: Configuration Guidelines for Hyper-V Dynamic Memory and SQL Server Now that we understand SQL Server Memory Management and Hyper-V Dynamic Memory basics, let’s take a look at general configuration guidelines in order to utilize benefits of Hyper-V Dynamic Memory in your SQL Server VMs. Requirements Host Operating System Requirements Hyper-V Dynamic Memory feature is introduced with Windows Server 2008 R2 SP1. Therefore in order to use Dynamic Memory for your virtual machines, you need to have Windows Server 2008 R2 SP1 or Microsoft Hyper-V Server 2008 R2 SP1 in your Hyper-V host. Guest Operating System Requirements In addition to this Dynamic Memory is only supported in Standard, Web, Enterprise and Datacenter editions of windows running inside VMs. Make sure that your VM is running one of these editions. For additional requirements on each operating system see “Dynamic Memory Configuration Guidelines” here. SQL Server Requirements All versions of SQL Server support Hyper-V Dynamic Memory. However, only certain editions of SQL Server are aware of dynamically changing system memory. To have a truly dynamic environment for your SQL Server VMs make sure that you are running one of the SQL Server editions listed below: · SQL Server 2005 Enterprise · SQL Server 2008 Enterprise / Datacenter Editions · SQL Server 2008 R2 Enterprise / Datacenter Editions Configuration guidelines for other versions of SQL Server are covered below in the FAQ section. Guidelines for configuring Dynamic Memory Parameters Here is how to configure Dynamic Memory for your SQL VMs in a nutshell: Hyper-V Dynamic Memory Parameter Recommendation Startup RAM 1 GB + SQL Min Server Memory Maximum RAM > SQL Max Server Memory Memory Buffer % 5 Memory Weight Based on performance needs Startup RAM In order to ensure that your SQL Server VMs can start correctly, ensure that Startup RAM is higher than configured SQL Min Server Memory for your VMs. Otherwise SQL Server service will need to do paging in order to start since it will not be able to see enough memory during startup. Also note that Startup Memory will always be reserved for your VMs. This will guarantee a certain level of performance for your SQL Servers, however setting this too high will limit the consolidation benefits you’ll get out of your virtualization environment. Maximum RAM This one is obvious. If you’ve configured SQL Max Server Memory for your SQL Server, make sure that Dynamic Memory Maximum RAM configuration is higher than this value. Otherwise your SQL Server will not grow to memory values higher than the value configured for Dynamic Memory. Memory Buffer % Memory buffer configuration is used to provision file cache to virtual machines in order to improve performance. Due to the fact that SQL Server is managing its own buffer pool, Memory Buffer setting should be configured to the lowest value possible, 5%. Configuring a higher memory buffer will prevent low resource notifications from Windows Memory Manager and it will prevent reclaiming memory from SQL Server VMs. Memory Weight Memory weight configuration defines the importance of memory to a VM. Configure higher values for the VMs that have higher performance requirements. VMs with higher memory weight will have more memory under high memory pressure conditions on your host. Questions and Answers Q1 – Which SQL Server memory model is best for Dynamic Memory? The best SQL Server model for Dynamic Memory is “Locked Page Memory Model”. This memory model ensures that SQL Server memory is never paged out and it’s also adaptive to dynamically changing memory in the system. This will be extremely useful when Dynamic Memory is attempting to remove memory from SQL Server VMs ensuring no SQL Server memory is paged out. You can find instructions on configuring “Locked Page Memory Model” for your SQL Servers here. Q2 – What about other SQL Server Editions, how should I configure Dynamic Memory for them? Other editions of SQL Server do not adapt to dynamically changing environments. They will determine how much memory they should allocate during startup and don’t change this value afterwards. Therefore make sure that you configure a higher startup memory for your VM because that will be all the memory that SQL Server utilize Tune Maximum Memory and Memory Buffer based on the other workloads running on the system. If there are no other workloads consider using Static Memory for these editions. Q3 – What if I have multiple SQL Server instances in a VM? Having multiple SQL Server instances in a VM is not a general recommendation for predictable performance, manageability and isolation. In order to achieve a predictable behavior make sure that you configure SQL Min Server Memory and SQL Max Server Memory for each instance in the VM. And make sure that: · Dynamic Memory Startup Memory is greater than the sum of SQL Min Server Memory values for the instances in the VM · Dynamic Memory Maximum Memory is greater than the sum of SQL Max Server Memory values for the instances in the VM Q4 – I’m using Large Page Memory Model for my SQL Server. Can I still use Dynamic Memory? The short answer is no. SQL Server does not dynamically change its memory size when configured with Large Page Memory Model. In virtualized environments Hyper-V provides large page support by default. Most of the time, Large Page Memory Model doesn’t bring any benefits to a SQL Server if it’s running in virtualized environments. Q5 – How do I monitor SQL performance when I’m trying Dynamic Memory on my VMs? Use the performance counters below to monitor memory performance for SQL Server: Process - Working Set: This counter is available in the VM via process performance counters. It represents the actual amount of physical memory being used by SQL Server process in the VM. SQL Server – Buffer Cache Hit Ratio: This counter is available in the VM via SQL Server counters. This represents the paging being done by SQL Server. A rate of 90% or higher is desirable. Conclusion These blog posts are a quick start to a story that will be developing more in the near future. We’re still continuing our testing and investigations to provide more detailed configuration guidelines with example performance numbers with a white paper in the upcoming months. Now it’s time to give SQL Server and Hyper-V Dynamic Memory a try. Use this guidelines to kick-start your environment. See what you think about it and let us know of your experiences. - Serdar Sutay Originally posted at http://blogs.msdn.com/b/sqlosteam/

Read the article
TechEd 2010 Day One – How I Travel

- by BuckWoody

Normally when I blog on the first day of a conference, well, there hasn’t been a first day yet. So I talk about the value of a conference or some other facet. And normally in my (non-conference) blogs, I show you how I have learned to be a data professional – things I’ve learned how to do over the years. But in all that time, I don’t think I’ve ever talked about a big part of my job – traveling. I’ve traveled a lot throughout the years, when I’ve taught, gone to conferences, consulted and in my current role assisting Microsoft customers with large-scale database system designs. So I’ll share a few thoughts about what I do. Keep in mind that I travel for short durations, just a day or so, and sometimes I travel internationally. For those I prepare differently – what I’m talking about here is what I do for a multi-day, same-country trip. Hopefully you find it useful. I’ll tag a few other travelers I know to add their thoughts. Preparing for Travel When I’m notified of a trip, I begin researching the location. I find the flights, hotel and (if I have to) a car to use while I’m away. We have an in-house system we use to book the travel, but when I travel not-for-Microsoft I use Expedia and Kayak to find what I need. Traveling on Sunday and Friday is the worst. I have to do it sometimes (like this week) and it’s always a bad idea. But you can blunt the impact by booking as early as you can stand it. That means I have to be up super-early, but the flights are normally on time. I stay flexible, and always have a backup plan in case the flights are delayed or canceled. For the hotel, I tend to go on the cheaper side, and I look for older hotels that have been renovated, or quirky ones. For instance, in Boise, ID recently I stayed at a 60’s-themed (think Mad-Men) hotel that was very cool. Always I go on the less expensive side – I find the “luxury” hotels nail me for Internet, food, everything. The cheaper places include all kinds of things, and even have breakfasts, shuttles and all kinds of things that start to add up. I even call ahead to make sure there’s an iron and ironing board available, since I’ll need those when I get there. I find any way I can not to get a car. I use mass-transit wherever possible, and try to make friends and pay their gas to take me places. In a pinch, I’ll use a taxi. It ends up being cheaper, faster, and less stressful all around. Packing Over the years I’ve learned never to check luggage whenever I can. To do that, I lay out everything I want to take with me on the bed, and then try and make sure I’m really going to use it. I wear a dark wool set of pants, which I can clean and wear in hot and cold climates. I bring undies and socks of course, and for most places I have to wear “dress up” shirts. I bring at least two print T-Shirts in case I want to dress down for something while I’m gone, but I only bring one set of shoes. All the clothes are rolled as tightly as possible as I learned in the military. Then I use those to cushion the electronics I take. For toiletries I bring a shaver, toothpaste and toothbrush, D/O and a small brush. Everything else the hotel will provide. For entertainment, I take a small Zune, a full PC-Headset (so I can make IP calls on the road) and my laptop. I don’t take books or anything else – everything is electronic. I use E-books (downloaded from our Library), Audio-Books (on the Zune) and I also bring along a Kaossilator (more here) to play music in the hotel room or even on the plane without being heard. If I can, I pack into one roll-on bag. There’s not a lot better than this one, but I also have a Bag I was given as a prize for something or other here at Microsoft. Either way, I like something with less pockets and more big, open compartments. Everything gets rolled up and packed in, with all of the wires and charges in small bags my wife made for me. The laptop (and anything I don’t want gate-checked) goes on top or in an outside pouch so I can grab it quickly if I have to gate-check the bag. As much as I can, I try to go in one bag. When I can’t (like this week) I use this bag since it can expand, roll up, crush and even be put away later. It’s super-heavy canvas and worth the price. This allows me to not check a bag. Journey Logistics The day of the trip, I have everything ready since I’m getting up early. I pack a few small snacks inside a plastic large-mouth water bottle, which protects the snacks and lets me get water in the terminal. I bring along those little powdered drink mixes to add to the water. At the airport, I make a beeline for the power-outlets. I charge up my laptop and phone, and download all my e-mails so I can work on them off-line in the air. I don’t travel as often as I used to – just every month or so now, so I don’t have a membership to an airline club. If I travel much more, I’ll invest in one again – they are WELL worth the money, for the wifi, food and quiet if for nothing else. I print out my logistics on paper and put that in my pocket – flight numbers, hotel addresses and phones for everything. That way if I have to make a change, I don’t have to boot up anything or even have power to be able to roll with the punches if things change. Working While Away While I’m away I realize I’m going to be swamped with things at the conference or with my clients. So I turn on Out-Of-Office notifications to let people know I won’t be as responsive, and I keep my Outlook calendar up to date so my co-workers know what I’m up to. I even update it with hotel and phone info in case they really need to reach me. I share my calendar with my wife so my family knows what I’m doing as well. I check my e-mail during breaks, but I only respond to them in the evening or early morning at the hotel. I tweet during conferences. The point is to be as present as possible during the event or when I’m at the clients. Both deserve it. So those are my initial thoughts. I’ll tag Brent Ozar, Brad McGeHee and Paul Randal, and they can tag whomever they wish. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Replication Services as ETL extraction tool

- by jorg

In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool: “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!” Well I have tried it out and I must say it worked well. I was able to let replication services do work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following: Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward. Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out of the box functionality. While configuring the snapshot replication two SQL Agent Jobs are created, one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are asynchronous which means that if you execute them they immediately report back if the job started successfully or not, they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it. Configure snapshot replication The first step is to create a publication on the database you want to replicate. Connect to SQL Server Management Studio and right-click Replication, choose for New.. Publication… The New Publication Wizard appears, click Next Choose your “source” database and click Next Choose Snapshot publication and click Next You can now select tables and other objects that you want to publish Expand Tables and select the tables that are needed in your ETL process In the next screen you can add filters on the selected tables which can be very useful. Think about selecting only the last x days of data for example. Its possible to filter out rows and/or columns. In this example I did not apply any filters. Schedule the Snapshot Agent to run at a desired time, by doing this a SQL Agent Job is created which we need to execute from a SSIS package later on. Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button. In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines! On the next screen choose to create the publication at the end of the wizard Give the publication a name (SnapshotTest) and complete the wizard The publication is created and the articles (tables in this case) are added Now the publication is created successfully its time to create a new subscription for this publication. Expand the Replication folder in SSMS and right click Local Subscriptions, choose New Subscriptions The New Subscription Wizard appears Select the publisher on which you just created your publication and select the database and publication (SnapshotTest) You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor. Of course we need a database for the subscription and fortunately the Wizard can create it for you. Choose for New database Give the database the desired name, set the desired options and click OK You can now add multiple SQL Server Subscribers which is not necessary in this case but can be very useful. You now need to set the security settings for the Distribution Agent. Click on the …. button Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here Click Next Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on. Initialize the subscription at first synchronization Select the first box to create the subscription when finishing this wizard Complete the wizard by clicking Finish The subscription will be created In SSMS you see a new database is created, the subscriber. There are no tables or other objects in the database available yet because the replication jobs did not ran yet. Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot: Rename this job to “CreateSnapshot” Now search for the job that distributes the snapshot: Rename this job to “DistributeSnapshot” Create an SSIS package that executes the snapshot replication We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started the package needs to wait until its finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent Jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back if the job started succesfully or not, it does not report if the job actually completed with success or failure. This is because these jobs are asynchronous. The SSIS package I’ve created does the following: It runs the CreateSnapshot job It checks every 5 seconds if the job is completed with a for loop When the CreateSnapshot job is completed it starts the DistributeSnapshot job And again it waits until the snapshot is delivered before the package will finish successfully Quite simple and the package is ready to use as standalone extract mechanism. After executing the package the replicated tables are added to the subscriber database and are filled with data: Download the SSIS package here (SSIS 2008) Conclusion In this example I only replicated 5 tables, I could create a SSIS package that does the same in approximately the same amount of time. But if I replicated all the 70+ AdventureWorks tables I would save a lot of time and boring work! With replication services you also benefit from the feature that schema changes are applied automatically which means your entire extract phase wont break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good. Disadvantages of using snapshot replication as extraction tool is the limitation on source systems. You can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will invoke a lot of tables think about replication services, it would save you a lot of time and thanks to the Extract SSIS package I’ve created you can perfectly fit it in your usual SSIS ETL process.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article
Developing a Cost Model for Cloud Applications

- by BuckWoody

Note - please pay attention to the date of this post. As much as I attempt to make the information below accurate, the nature of distributed computing means that components, units and pricing will change over time. The definitive costs for Microsoft Windows Azure and SQL Azure are located here, and are more accurate than anything you will see in this post: http://www.microsoft.com/windowsazure/offers/ When writing software that is run on a Platform-as-a-Service (PaaS) offering like Windows Azure / SQL Azure, one of the questions you must answer is how much the system will cost. I will not discuss the comparisons between on-premise costs (which are nigh impossible to calculate accurately) versus cloud costs, but instead focus on creating a general model for estimating costs for a given application. You should be aware that there are (at this writing) two billing mechanisms for Windows and SQL Azure: “Pay-as-you-go” or consumption, and “Subscription” or commitment. Conceptually, you can consider the former a pay-as-you-go cell phone plan, where you pay by the unit used (at a slightly higher rate) and the latter as a standard cell phone plan where you commit to a contract and thus pay lower rates. In this post I’ll stick with the pay-as-you-go mechanism for simplicity, which should be the maximum cost you would pay. From there you may be able to get a lower cost if you use the other mechanism. In any case, the model you create should hold. Developing a good cost model is essential. As a developer or architect, you’ll most certainly be asked how much something will cost, and you need to have a reliable way to estimate that. Businesses and Organizations have been used to paying for servers, software licenses, and other infrastructure as an up-front cost, and power, people to the systems and so on as an ongoing (and sometimes not factored) cost. When presented with a new paradigm like distributed computing, they may not understand the true cost/value proposition, and that’s where the architect and developer can guide the conversation to make a choice based on features of the application versus the true costs. The two big buckets of use-types for these applications are customer-based and steady-state. In the customer-based use type, each successful use of the program results in a sale or income for your organization. Perhaps you’ve written an application that provides the spot-price of foo, and your customer pays for the use of that application. In that case, once you’ve estimated your cost for a successful traversal of the application, you can build that into the price you charge the user. It’s a standard restaurant model, where the price of the meal is determined by the cost of making it, plus any profit you can make. In the second use-type, the application will be used by a more-or-less constant number of processes or users and no direct revenue is attached to the system. A typical example is a customer-tracking system used by the employees within your company. In this case, the cost model is often created “in reverse” - meaning that you pilot the application, monitor the use (and costs) and that cost is held steady. This is where the comparison with an on-premise system becomes necessary, even though it is more difficult to estimate those on-premise true costs. For instance, do you know exactly how much cost the air conditioning is because you have a team of system administrators? This may sound trivial, but that, along with the insurance for the building, the wiring, and every other part of the system is in fact a cost to the business. There are three primary methods that I’ve been successful with in estimating the cost. None are perfect, all are demand-driven. The general process is to lay out a matrix of: components units cost per unit and then multiply that times the usage of the system, based on which components you use in the program. That sounds a bit simplistic, but using those metrics in a calculation becomes more detailed. In all of the methods that follow, you need to know your application. The components for a PaaS include computing instances, storage, transactions, bandwidth and in the case of SQL Azure, database size. In most cases, architects start with the first model and progress through the other methods to gain accuracy. Simple Estimation The simplest way to calculate costs is to architect the application (even UML or on-paper, no coding involved) and then estimate which of the components you’ll use, and how much of each will be used. Microsoft provides two tools to do this - one is a simple slider-application located here: http://www.microsoft.com/windowsazure/pricing-calculator/ The other is a tool you download to create an “Return on Investment” (ROI) spreadsheet, which has the advantage of leading you through various questions to estimate what you plan to use, located here: https://roianalyst.alinean.com/msft/AutoLogin.do?d=176318219048082115 You can also just create a spreadsheet yourself with a structure like this: Program Element Azure Component Unit of Measure Cost Per Unit Estimated Use of Component Total Cost Per Component Cumulative Cost Of course, the consideration with this model is that it is difficult to predict a system that is not running or hasn’t even been developed. Which brings us to the next model type. Measure and Project A more accurate model is to actually write the code for the application, using the Software Development Kit (SDK) which can run entirely disconnected from Azure. The code should be instrumented to estimate the use of the application components, logging to a local file on the development system. A series of unit and integration tests should be run, which will create load on the test system. You can use standard development concepts to track this usage, and even use Windows Performance Monitor counters. The best place to start with this method is to use the Windows Azure Diagnostics subsystem in your code, which you can read more about here: http://blogs.msdn.com/b/sumitm/archive/2009/11/18/introducing-windows-azure-diagnostics.aspx This set of API’s greatly simplifies tracking the application, and in fact you can use this information for more than just a cost model. After you have the tracking logs, you can plug the numbers into ay of the tools above, which should give a representative cost or in some cases a unit cost. The consideration with this model is that the SDK fabric is not a one-to-one comparison with performance on the actual Windows Azure fabric. Those differences are usually smaller, but they do need to be considered. Also, you may not be able to accurately predict the load on the system, which might lead to an architectural change, which changes the model. This leads us to the next, most accurate method for a cost model. Sample and Estimate Using standard statistical and other predictive math, once the application is deployed you will get a bill each month from Microsoft for your Azure usage. The bill is quite detailed, and you can export the data from it to do analysis, and using methods like regression and so on project out into the future what the costs will be. I normally advise that the architect also extrapolate a unit cost from those metrics as well. This is the information that should be reported back to the executives that pay the bills: the past cost, future projected costs, and unit cost “per click” or “per transaction”, as your case warrants. The challenge here is in the model itself - statistical methods are not foolproof, and the larger the sample (in this case I recommend the entire population, not a smaller sample) is key. References and Tools Articles: http://blogs.msdn.com/b/patrick_butler_monterde/archive/2010/02/10/windows-azure-billing-overview.aspx http://technet.microsoft.com/en-us/magazine/gg213848.aspx http://blog.codingoutloud.com/2011/06/05/azure-faq-how-much-will-it-cost-me-to-run-my-application-on-windows-azure/ http://blogs.msdn.com/b/johnalioto/archive/2010/08/25/10054193.aspx http://geekswithblogs.net/iupdateable/archive/2010/02/08/qampa-how-can-i-calculate-the-tco-and-roi-when.aspx Other Tools: http://cloud-assessment.com/ http://communities.quest.com/community/cloud_tools

Read the article
Merge replication stopping without errors in SQL 2008 R2

- by Rob Farley

A non-SQL MVP friend of mine, who also happens to be a client, asked me for some help again last week. I was planning on writing this up even before Rob Volk (@sql_r) listed his T-SQL Tuesday topic for this month. Earlier in the year, I (well, LobsterPot Solutions, although I’d been the person mostly involved) had helped out with a merge replication problem. The Merge Agent on the subscriber was just stopping every time, shortly after it started. With no errors anywhere – not in the Windows Event Log, the SQL Agent logs, not anywhere. We’d managed to get the system working again, but didn’t have a good reason about what had happened, and last week, the problem occurred again. I asked him about writing up the experience in a blog post, largely because of the red herrings that we encountered. It was an interesting experience for me, also because I didn’t end up touching my computer the whole time – just tapping on my phone via Twitter and Live Msgr. You see, the thing with replication is that a useful troubleshooting option is to reinitialise the thing. We’d done that last time, and it had started to work again – eventually. I say eventually, because the link being used between the sites is relatively slow, and it took a long while for the initialisation to finish. Meanwhile, we’d been doing some investigation into what the problem could be, and were suitably pleased when the problem disappeared. So I got a message saying that a replication problem had occurred again. Reinitialising wasn’t going to be an option this time either. In this scenario, the subscriber having the problem happened to be in a different domain to the publisher. The other subscribers (within the domain) were fine, just this one in a different domain had the problem. Part of the problem seemed to be a log file that wasn’t being backed up properly. They’d been trying to back up to a backup device that had a corruption, and the log file was growing. Turned out, this wasn’t related to the problem, but of course, any time you’re troubleshooting and you see something untoward, you wonder. Having got past that problem, my next thought was that perhaps there was a problem with the account being used. But the other subscribers were using the same account, without any problems. The client pointed out that that it was almost exactly six months since the last failure (later shown to be a complete red herring). It sounded like something might’ve expired. Checking through certificates and trusts showed no sign of anything, and besides, there wasn’t a problem running a command-prompt window using the account in question, from the subscriber box. ...except that when he ran the sqlcmd –E –S servername command I recommended, it failed with a Named Pipes error. I’ve seen problems with firewalls rejecting connections via Named Pipes but letting TCP/IP through, so I got him to look into SQL Configuration Manager to see what kind of connection was being preferred... Everything seemed fine. And strangely, he could connect via Management Studio. Turned out, he had a typo in the servername of the sqlcmd command. That particular red herring must’ve been reflected in his cheeks as he told me. During the time, I also pinged a friend of mine to find out who I should ask, and Ted Kruger (@onpnt) ‘s name came up. Ted (and thanks again, Ted – really) reconfirmed some of my thoughts around the idea of an account expiring, and also suggesting bumping up the logging to level 4 (2 is Verbose, 4 is undocumented ridiculousness). I’d just told the client to push the logging up to level 2, but the log file wasn’t appearing. Checking permissions showed that the user did have permission on the folder, but still no file was appearing. Then it was noticed that the user had been switched earlier as part of the troubleshooting, and switching it back to the real user caused the log file to appear. Still no errors. A lot more information being pushed out, but still no errors. Ted suggested making sure the FQDNs were okay from both ends, in case the servers were unable to talk to each other. DNS problems can lead to hassles which can stop replication from working. No luck there either – it was all working fine. Another server started to report a problem as well. These two boxes were both SQL 2008 R2 (SP1), while the others, still working, were SQL 2005. Around this time, the client tried an idea that I’d shown him a few years ago – using a Profiler trace to see what was being called on the servers. It turned out that the last call being made on the publisher was sp_MSenumschemachange. A quick interwebs search on that showed a problem that exists in SQL Server 2008 R2, when stored procedures have more than 4000 characters. Running that stored procedure (with the same parameters) manually on SQL 2005 listed three stored procedures, the first of which did indeed have more than 4000 characters. Still no error though, and the problem as listed at http://support.microsoft.com/kb/2539378 describes an error that should occur in the Event log. However, this problem is the type of thing that is fixed by a reinitialisation (because it doesn’t need to send the procedure change across as a transaction). And a look in the change history of the long stored procs (you all keep them, right?), showed that the problem from six months earlier could well have been down to this too. Applying SP2 (with sufficient paranoia about backups and how to get back out again if necessary) fixed the problem. The stored proc changes went through immediately after the service pack was applied, and it’s been running happily since. The funny thing is that I didn’t solve the problem. He had put the Profiler trace on the server, and had done the search that found a forum post pointing at this particular problem. I’d asked Ted too, and although he’d given some useful information, nothing that he’d come up with had actually been the solution either. Sometimes, asking for help is the most useful thing you can do. Often though, you don’t end up getting the help from the person you asked – the sounding board is actually what you need. @rob_farley

Read the article
Pete Muir Interview on CDI 1.1

- by reza_rahman

The 109th episode of the Java Spotlight podcast features an interview with CDI 1.1 spec lead Pete Muir of JBoss/Red Hat. Pete talks with Roger Brinkley about the backdrop to CDI, his work at JBoss, the features in CDI 1.1 and what to expect in the future. What's going on behind the scenes and the possible contents for CDI 1.1+ are particularly insightful. You can listen to the full interview here.

Read the article
Three Hidden Extensibility Gems in ASP.NET 4

- by Latest Microsoft Blogs

ASP.NET 4 introduces a few new extensibility APIs that live the hermit lifestyle away from the public eye. They’re not exactly hidden - they are well documented on MSDN - but they aren’t well publicized. It’s about time we shine a spotlight on them. PreApplicationStartMethodAttribute Read More......(read more)

Read the article
How I Work: Staying Productive Whilst Traveling

- by BuckWoody

I travel a lot. Not like some folks that are gone every week, mind you, although in the last month I’ve been to: Cambridge, UK; Anchorage, AK; San Jose, CA; Copenhagen, DK, Boston, MA; and I’m currently en-route to Anaheim, CA. While this many places in a month is a bit unusual for me, I would say I travel frequently. I’ve travelled most of my 28+ years in IT, and at one time was a consultant traveling weekly. With that much time away from my primary work location, I have to find ways to stay productive. Some might say “just rest – take a nap!” – but I’m not able to do that. For one thing, I’m a very light sleeper and I’ve never slept on a plane - even a 30+ hour trip to New Zealand in Business Class - so that just isn’t option. I also am not always in the plane, of course. There’s the hotel, the taxi/bus/train, the airport and then all that over again when I arrive. Since my regular jobs have many demands, I have to get work done. Note: No, I’m not always focused on work. I need downtime just like everyone else. Sometimes I just think, watch a movie or listen to tunes – and I give myself permission to do that anytime – sometimes the whole trip. I have too fewheartbeats left in life to only focus on work – it’s just not that important, and neither am I. Some of these tasks are letters to friends and family, or other personal things. What I’m talking about here is a plan, not some task list I have to follow. When I get to the location I’m traveling to, I always build in as much time as I can to ensure I enjoy those sights and the people I’m with. I would find traveling to be a waste if not for that. The Unrealistic Expectation As I would evaluate the trip I was taking – say a 6-8 hour flight – I would expect to get 10-12 hours of work done. After all, there’s the time at the airport, the taxi and so on, and then of course the time in the air with all of the room, power, internet and everything else I needed to get my work done. I would pile up tasks at home, pack my bags, and head happily to the magical land of the TSA. Right. On return from the trip, I had accomplished little, had more e-mails and other work that had piled up, and I was tired, hungry, and unorganized. This had to change. So, I decided to do three things: Segment my work Set realistic expectations Plan accordingly Segmenting By Available Resources The first task was to decide what kind of work I could do in each location – if any. I found that I was dependent on a few things to get work done, such as power, the Internet, and a place to sit down. Before I fly, I take some time at home to get all of the work I’d like to accomplish while away segmented into these areas, and print that out on paper, which goes in my suit-coat pocket along with a mechanical pencil. I print my tickets, and I’m all set for the adventure ahead. Then I simply do each kind of work whenever I’m in that situation. No power There are certain times when I don’t have power available. But not only that, I might not even be able to use most of my electronics. So I now schedule as many phone calls as I can for the taxi/bus/train ride and the airports as I can. I have a paper notebook (Moleskine, of course) and a pencil and I print out any notes or numbers I need prior to the trip. Once I’m airborne or at the airport, I work on my laptop. I check and respond to e-mails, create slides, write code, do architecture, whatever I can. If I can’t use any electronics, or once the power runs out, I schedule time for reading. I can read at the airport or anywhere, actually, even in-flight or any other transport. I “read with a pencil”, meaning I take a lot of notes, which I liketo put in OneNote, but since in most cases I don’t have power, I use the Moleskine to do that. Speaking of which, sometimes as I’m thinking I come up with new topics, ideas, blog posts, or things to teach in my classes. Once again I take out the notebook and write it down. All of these notes get a check-mark when I get back to the office and transfer the writing to OneNote. I’ve tried those “smart pens” and so on to automate this, but it just never works out. Pencil and paper are just fine. As I mentioned, sometime I just need to think. I’ll do nothing, and let my mind wander, thinking of nothing in particular, or some math problem or science question I’m interested in. My only issue with this is that I communicate tothink, and I don’t want to drive people crazy by being that guy that won’t shut up, so I think in a different way. Power, but no Internet or Phone If I have power but no Internet or phone, I focus on the laptop and the tablet as before, and I also recharge my other gadgets. Power, Internet, Phone and a Place to Work At first I thought that when I arrived at the hotel or event I could get the same amount of work done that I do at the office. Not so. There’s simply too many distractions, things you need, or other issues that allow this. Of course, Ican work on any device, read, think, write or whatever, but I am simply not as productive as I am in my home office. So I plan for about 25-50% as much work getting done in this environment as I think I could really do. I’ve done some measurements, and this holds out to be true almost every time. The key is that I re-set my expectations (and my co-worker’s expectations as well) that this is the case. I use the Out-Of-Office notices to let people know that I’m just not going to be 100% at this time – it’s hard for everyone, but it’s more honest and realistic, and I’d rather they know that – and that I realize that – than to let them think I’m totally available. Because I’m not – I’m traveling. I don’t tend to put too much detail, because after all I don’t necessarily want to let people know when I’m not home :) but I do think it’s important to let people that depend on my know that I’ll get back with them later. I hope this helps you think through your own methodology of staying productive when you travel. Or perhaps you just go offline, and don’t worry about any of this – good for you! That’s completely valid as well. (Oh, and yes, I wrote this at 35K feet, on Alaska Airlines on a trip. :) Practice what you preach, Buck.)

Read the article
Convincing my coworkers to use Hudson CI

- by in0de

Im really aware of some benefits of using Hudson as CI server. But, im facing the problem to convince my coworkers to install and use it. To put some context, we are developing two different products (one is an enterprise search engine based on Apache Solr) and several enterprise search projects. We are facing a lot of versioning issues and i think Hudson will solve this problems. They argued about its productivity and learning curve What Hudson's benefits would you spotlight?

Read the article
Don't Miss Oracle UPK at the Oracle Applications Virtual Tradeshow

- by di.seghposs(at)oracle.com

Be sure to visit the Oracle Applications Virtual Tradeshow - Spotlight on Customer Success - February 3, 2011. If you are considering using Oracle UPK for a project or an upgrade, this is an event you don't want to miss. Hear how the City and County of San Francisco used Oracle UPK for their successful PeopleSoft upgrade. Get a chance to meet the experts and listen to 20+ customers share their success with Oracle Applications. Register Now!

Read the article
What is SEO? Discover Cheap, Easy and Effective Methods to Get on Page 1 of the Search Engines

SEO is a very important and powerful method to help your website achieve a page 1 ranking. With many websites not even being found and indexed by the major search engines, let alone making it onto page 1, the use of SEO has taken the spotlight and should not be ignored if you wish to give yourself the best chance of being successful.

Read the article
apache performance improvements and maxclients

- by updog

I know this has been asked a few (thousand) times around the internet but I was hoping someone whose in the know might be able to comment on my particular setup. I have a web server hosting one site (php/codeigniter) with a wordpress blog in a sub directory. The server has 2GB RAM, 3GHz CPU and I have offloaded the static assets to CloudFlare which is has reduced bandwidth for the actual server by almost 75%. The problem I have is when an email campaign is sent out that links to the site or blog, it slows down. Below is my settings in apache2.conf. Average apache process size is 80M and there is 1.5GB available for apache. <IfModule mpm_prefork_module> StartServers 8 MinSpareServers 5 MaxSpareServers 20 MaxClients 20 MaxRequestsPerChild 2000 </IfModule> I have already setup and installed apc and built some caching into the site and used w3totalcache on the blog. The number of concurrent users is around 2-300 when there is a campaign, are there any further optimisations before

Read the article
I Can't Get Ruby on Rails + Passenger + Apache to Work

- by Luke Crowe

I'm sorry if this is a stupid question, but I can't get Ruby on Rails to work on my Apache server. I'm using Phusion Passenger (mod_rails, mod_rack) for app deployment. Here is my RoR-specific configuration code in my website's Apache configuration file: Alias /rails /var/www/syyborg.com/ruby/blog/public <Directory /var/www/syyborg.com/ruby/blog/public Options FollowSymLinks AllowOverride None Order Allow,Deny Allow from All </Directory RailsBaseURI /rails Again, I really have very little knowledge of this kind of thing; I have never set up a server from scratch before. Anyways, my rails app, as you can see, is located at /var/www/syyborg.com/ruby/blog/. I am trying to access it from http://[my domain, syyborg.com]/rails. However, when I try to load the site, I get a "403 Forbidden" error. Any help would be greatly appreciated, and I can provide further details if they are required. Thanks in advance!

Read the article
Redirecting to a folder

- by RN

Apache2 Plesk 9.x I have a website www.example.com and my blog is on www.example.com/blog I have no content on www.example.com as of now So I want all requests for example.com to be redirected to www.example.com/blog How should I do that ? Is this something I can do in Apache? I am using the GoDaddy DNS server. Not sure if it matters- but I have multiple domains hosted n the same server. And I am using Plesk to manage my virtual hosts.

Read the article
Set WordPress permalinks directly in httpd.conf?

- by songdogtech

Is is possible to configure WordPress permalinks directly in Apache httpd.conf? I have a server situation (Apache 2.2.3 CentOS PHP5.1.6) where I can't use .htaccess for performance reasons, but can use httpd.conf. The admin says that mod_rewrite is enabled, but AllowOverride is not, and I can't change those settings. And I need to restrict the permalinks to just the "blog" directory. This is what would go in .htaccess but needs to go into httpd.conf: <IfModule mod_rewrite.c> RewriteEngine On RewriteBase /blog/ RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /blog/index.php [L] </IfModule> Thanks...

Read the article
How to redirect from one web directory to another with Apache

- by RN

Apache2 Plesk 9.x I have a website www.example.com and my blog is on www.example.com/blog I have no content on www.example.com as of now So I want all requests for example.com to be redirected to www.example.com/blog How should I do that ? Is this something I can do in Apache? I am using the GoDaddy DNS server. Not sure if it matters- but I have multiple domains hosted n the same server. And I am using Plesk to manage my virtual hosts.

Read the article
ISPConfig - Unexisting subdomain address goes to an existing one

- by xperator

I am running Nginx/ISPConfig setup for about 6-7 months. Never had a problem and everything is smooth. But I just noticed that if browse to "blab.example.com", the page opens one of my wordpress blogs on the other domain. No matter what name I use for subdomain, Anything that I enter randomly "b53ks.example.com" still goes to that blog page. I have 3 or 4 different domain names and websites on the same server. But I think I misconfigured somewhere and that might be the cause of this. Lets say I have these domains: example-1.com, example-2.com, another-example.com If I go to anything.example-1.com or serverfault.example-2.com, or google.another-example.com the returned page is my blog at blog.example-1.com Note : I didn't set any subdomain in ISPConfig. Instead, I used "Add new website" for making a subdomain.

Read the article
timeout with apache & php w/ each virtual host has his own user process

- by acemtp

I have 10 unix users in /home/. Each user is for a specific subdomain for example user www in /home/www/public_html is for www.mywebsite. blog in /home/blog/public_html is for blog.mywebsite. 90% is php and 10% ror for the moment i use apache + fastcgi that use SuexecUserGroup to setup the process with the good user. it seems to works but i have a strange behavior where after a few hours/days, the server stop answering (timeout) but the cpu load is still very low (it's a big server), the apache status display lot of "W" Sending Reply states but there's still 50 idle workers so it should be able to answer. in the older server (lot of slower) we add only one user and using mod_php and we never had this issue. is there another way to do that without fastcgi and SuexecUserGroup or do you know what's going wrong?

Read the article
Is my website running on an iPhone?

- by Stefano Borini

Provocative question, but in any case, this is what it would appear... unless there's some other reason, of course. In my cystat wordpress log, I obtained the following entry IP Browser OS Date Method Type URL 127.0.0.1 Safari 419.3 iPhone July 30, 2009 7:39 pm GET BLOG /blog/ The IP address is the IP of the visiting client. It's clear that this is not possible. Why do I get 127.0.0.1 as IP? cystat bug? some weird network trick I am not aware of? or is my website really running on an iPhone, and the guy at the applestore is reading my blog ?

Read the article

Search Results

Search found 10711 results on 429 pages for 'blog spotlight'.

Page 74/429 | < Previous Page | 70 71 72 73 74 75 76 77 78 79 80 81 | Next Page >

- by Roman Schindlauer

- by matthew

- by Piotr Rodak

- by SQLOS Team

- by BuckWoody

- by jorg

- by Rob Farley

- by Rob Farley

- by Dejan Sarka

- by BuckWoody

- by Rob Farley

- by reza_rahman

- by Latest Microsoft Blogs

- by BuckWoody

- by in0de

- by di.seghposs(at)oracle.com

- by updog

- by Luke Crowe

- by RN

- by songdogtech

- by RN

- by xperator

- by acemtp

- by Stefano Borini

< Previous Page | 70 71 72 73 74 75 76 77 78 79 80 81 | Next Page >