Search Results

Search found 36788 results on 1472 pages for 'sql 2008'.

Page 245/1472 | < Previous Page | 241 242 243 244 245 246 247 248 249 250 251 252  | Next Page >

  • Send email from different domains to different external IP's on a single server

    - by user140429
    I have set up a windows 2008 R2 server to route email from Exchange 2010 using SMTP Server in IIS. I have 3 seperate domains and would like to route each one through a different internal and external IP for (IP Reputation etc), at the minute it is only using the primary IP on the server to route email externally. Is this at all possible using SMTP Server in IIS, or is there any other software available to do this?

    Read the article

  • Forms authentication for main site, Windows auth for subfolder

    - by John D
    Hi all, On my Windows 2008 R2 server with IIS 7.5 I would like to have my ASP.NET website running with forms authentication, while protecting a subfolder with the basic Windows authentication. I have done this on Windows 2003 with IIS 6 for years, but I simply can't get it to work with IIS 7.5. Your input would be highly appreciated :)

    Read the article

  • config.nt not exist in windows server 2008 64bit

    - by user1853266
    i have a server with high traffic , it's about 20K request/sec at peak time and 5~10k request/sec at normal time but i have a serious problem i install nginx in windows server 2008 standard edition 64bit , but i get this error massage 2012/11/26 05:29:23 [error] 2496#2004: *3976 maximum number of descriptors supported by select() is 1024 while reading client request line, client: X.X.X.X, server: 0.0.0.0:8080 when i search , i found this problem is about dos application file handle limit , and i can be changed it on c:\windir\system32\Config.nt but config.nt not exist i also hear , in 64Bit os version , this file not exist so how can i change file handle in windows server 2008 - 64Bit ?

    Read the article

  • DNS Setting keeps changing on me

    - by Chiggins
    So on my Windows Server 2008 box, I have a DNS server installed on it. For some reason, every ten minutes or so, the Host (A) address for the computer keeps on changing to its internal private IP address. I want it to have its public address for Active Directory purposes, but it keeps changing itself back to the private IP address. Any idea as to why, and how to change it? If it makes a difference, this is an Amazon EC2 server. Thanks

    Read the article

  • Hyper-V Server Management from Windows 8.1

    - by David Mackintosh
    Is there a way to manage a Hyper-V cluster that runs on a Windows Server 2008 R2 Datacenter from a Windows 8.1 Pro workstation? I've downloaded and installed the Remote Server Admin Tools for 8.1 (http://www.microsoft.com/en-us/download/details.aspx?id=39296), but when I enable the Hyper-V manager, it tells me that I can't manage Hyper-V servers on 2K8 or 2K8R2. Related, the Failover Cluster Manager barfs with a similar error. How do I manage my old servers with my new workstation?

    Read the article

  • cygwin running commands with switches when ssh in

    - by Troz123
    When I SSH into a Windows 2008 R2 box and try to run a command with switches like /cygdrive/c/directory/Reports.exe /ReportID=1 /DateRange=LastWeek the command just hangs and never finishes. I can see the Reports.exe spawn a process under the user but never finishes. If I RDP into the box open cygwin terminal and run the exact command it works. Any reason why I can't run the command when I SSH in?

    Read the article

  • Why web application on https stops working after windows update?

    - by Rychu
    On Windows Server 2008 R2 I have several web applications on IIS7. One of these applications has https binding with SSL certificate. However after every windows update this one application stops working. Browser says that server is unavailable. It starts working again when I simply open IIS manager, select that https binding, click edit and without changing anything click OK. Why is this happening?

    Read the article

  • How do I find a domain server on my client right away?

    - by SantaC
    This is a problem that I noticed. I have a Windows 2008 R2 Server and joined it to my windows 7 client. Now when I am trying to reach my "share" that i created in the Win2008 server, it does not show up at the Network tab in Windows 7. Instead, the only way I found it was to manually type in \\myserverlocation in the run prompt. Is there any way to find my share right way without doing this?

    Read the article

  • Why must user be logged in for impersonation to work?

    - by user16011
    My Windows Server 2008 server hosts an ASP.net application that uses impersonation. The application works as long as the user being impersonated remains logged on to the server. However, when the user logs off, clients can no longer view the web pages. They get a cryptic error instead. How can I configure the server to work without the impersonated user remaining logged on? Thanks in advance.

    Read the article

  • Access a remote active directory

    - by theXs
    I'm currently trying to query a remote Active Directory on a Windows Server 2008 R2. However, I'm not able to query the directory if I enter the following string in the cmd line: dsquery user -name m* -s ip:389 -u -p Furthermore, I tried to access the directory with: ldap://: but it didn't work either. I received the following error message: The server is not operational. Is there an option with which I can enable the remote access of an Active Directory?

    Read the article

  • Why Remote Desktop Sessions show client internal IP address? [closed]

    - by Varp
    I have Windows Server 2008 r2 with static ip address on WAN interface. I connecting to the server from home from my laptop. Laptop at home is behind nat box. When i connecting to the server from home in Remote Desktop Session Manager i see in client status dialog a local ip address of client behind the nat box not WAN ip address of nat box. I suppose i must see the WAN ip address of the nat box in Remote Desktop Session Manager, isnt it?

    Read the article

  • Increase in Virtual Memory [closed]

    - by Philly
    If we increase the virtual memory of windows server 2008 will it improve the performance on the application. upto what size can we increase the virtual memory in 32bit server. What are disadvantages and advantages of it? we are running bulk jobs(all imports and exports) everyday at 3pm.. because of that we are getting some system out of memory exceptions and and application is going down. Any suggestions. Thanks

    Read the article

  • Investigation: Can different combinations of components effect Dataflow performance?

    - by jamiet
    Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation!  The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test:     One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category         CSPL Split by Month of Birth and Income Category       Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category         CSPL Split by Month of Birth 1& 2       Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

    Read the article

  • Oracle SQL Developer v3.2.1 Now Available

    - by thatjeffsmith
    Oracle SQL Developer version 3.2.1 is now available. I recommend that everyone now upgrade to this release. It features more than 200 bug fixes, tweaks, and polish applied to the 3.2 edition. The high profile bug fixes submitted by customers and users on our forums are listed in all their glory for your review. I want to highlight a few of the changes though, as I recognize many of you lack the time and/or patience to ‘read the docs.’ That would include me, which is why I enjoy writing these kinds of blog posts. I’m lazy – just like you! No more artificial line breaks between CREATE OR REPLACE and your PL/SQL In versions 3.2 and older, when you pull up your stored procedural objects in our editor, you would see a line break inserted between the CREATE OR REPLACE and then the body of your code. In version 3.2.1, we have removed the line break. 3.1 3.2.1 Trivia Did You Know? The database doesn’t store the ‘CREATE’ or ‘CREATE OR REPLACE’ bit of your PL/SQL code in the database. If we look at the USER_SOURCE view, we can see that the code begins with the object name. So the CREATE OR REPLACE bit is ‘artificial’ The intent is to give you the code necessary to recreate your object – and have it ‘compile’ into the database. We pretty much HAVE to add the ‘CREATE OR REPLACE.’ From now on it will appear inline with the first line of your code. Exporting Tables & Views When exporting data from your tables or views, previous versions of SQL Developer presented a 3 step wizard. It allows you to choose your columns and apply data filters for what is exported. This was kind of redundant. The grids already allowed you to select your columns and apply filters. Wouldn’t it be more intuitive AND efficient to just make the grids behave in a What You See Is What You Get (WYSIWYG) fashion? In version 3.2.1, that is exactly what will happen. The wizard now only has two steps and the grid will export the data and columns as defined in the visible grid. Let the grid properties define what is actually exported! And here is what is pasted into my worksheet: "BREWERY"|"CITY" "3 Brewers Restaurant Micro-Brewery"|"Toronto" "Amsterdam Brewing Co."|"Toronto" "Ball Brewing Company Ltd."|"Toronto" "Big Ram Brewing Company"|"Toronto" "Black Creek Historic Brewery"|"Toronto" "Black Oak Brewing"|"Toronto" "C'est What?"|"Toronto" "Cool Beer Brewing Company"|"Toronto" "Denison's Brewing"|"Toronto" "Duggan's Brewery"|"Toronto" "Feathers"|"Toronto" "Fermentations! - Danforth"|"Toronto" "Fermentations! - Mount Pleasant"|"Toronto" "Granite Brewery & Restaurant"|"Toronto" "Labatt's Breweries of Canada"|"Toronto" "Mill Street Brew Pub"|"Toronto" "Mill Street Brewery"|"Toronto" "Molson Breweries of Canada"|"Toronto" "Molson Brewery at Air Canada Centre"|"Toronto" "Pioneer Brewery Ltd."|"Toronto" "Post-Production Bistro"|"Toronto" "Rotterdam Brewing"|"Toronto" "Steam Whistle Brewing"|"Toronto" "Strand Brasserie"|"Toronto" "Upper Canada Brewing"|"Toronto" JUST what I wanted And One Last Thing Speaking of export, sometimes I want to send data to Excel. And sometimes I want to send multiple objects to Excel – to a single Excel file that is. In version 3.2.1 you can now do that. Let’s export the bulk of the HR schema to Excel, with each table going to it’s own workbook in the same worksheet. Select many tables, put them in in a single Excel worksheet If you try this in previous versions of SQL Developer it will just write the first table to the Excel file. This is one of the bugs we addressed in v3.2.1. Here is what the output Excel file looks like now: Many tables - Many workbooks in an Excel Worksheet I have a sneaky suspicion that this will be a frequently used feature going forward. Excel seems to be the cornerstone of many of our popular features. Imagine that!

    Read the article

  • Installed Service Pack 3 for SQL Server 2005 but it does not show up when selecting @@version

    - by Blegger
    We installed Service Pack 3 for SQL Server 2005 Standard Edition (64-bit). In the Add or Remove Programs menu we have the following entries under Microsoft SQL Server 2005 (64-bit): Service Pack 3 for SQL Server Integration Services 64-bit) ENU (KB955706) Service Pack 3 for SQL Server Notification Services 2005 (64-bit) ENU (KB955706) Service Pack 3 for SQL Server Database Services 2005 (64-bit) ENU (KB955706) Service Pack 3 for SQL Server Tools and Workstation Components 2005 (64-bit) ENU (KB955706) But when I run the following query on the server: select @@version I get this result: Microsoft SQL Server 2005 - 9.00.4035.00 (X64) Nov 24 2008 16:17:31 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2) Why does it not display that it is running Service Pack 3?

    Read the article

  • Using SQL Execution Plans to discover the Swedish alphabet

    - by Rob Farley
    SQL Server is quite remarkable in a bunch of ways. In this post, I’m using the way that the Query Optimizer handles LIKE to keep it SARGable, the Execution Plans that result, Collations, and PowerShell to come up with the Swedish alphabet. SARGability is the ability to seek for items in an index according to a particular set of criteria. If you don’t have SARGability in play, you need to scan the whole index (or table if you don’t have an index). For example, I can find myself in the phonebook easily, because it’s sorted by LastName and I can find Farley in there by moving to the Fs, and so on. I can’t find everyone in my suburb easily, because the phonebook isn’t sorted that way. I can’t even find people who have six letters in their last name, because also the book is sorted by LastName, it’s not sorted by LEN(LastName). This is all stuff I’ve looked at before, including in the talk I gave at SQLBits in October 2010. If I try to find everyone who’s names start with F, I can do that using a query a bit like: SELECT LastName FROM dbo.PhoneBook WHERE LEFT(LastName,1) = 'F'; Unfortunately, the Query Optimizer doesn’t realise that all the entries that satisfy LEFT(LastName,1) = 'F' will be together, and it has to scan the whole table to find them. But if I write: SELECT LastName FROM dbo.PhoneBook WHERE LastName LIKE 'F%'; then SQL is smart enough to understand this, and performs an Index Seek instead. To see why, I look further into the plan, in particular, the properties of the Index Seek operator. The ToolTip shows me what I’m after: You’ll see that it does a Seek to find any entries that are at least F, but not yet G. There’s an extra Predicate in there (a Residual Predicate if you like), which checks that each LastName is really LIKE F% – I suppose it doesn’t consider that the Seek Predicate is quite enough – but most of the benefit is seen by its working out the Seek Predicate, filtering to just the “at least F but not yet G” section of the data. This got me curious though, particularly about where the G comes from, and whether I could leverage it to create the Swedish alphabet. I know that in the Swedish language, there are three extra letters that appear at the end of the alphabet. One of them is ä that appears in the word Västerås. It turns out that Västerås is quite hard to find in an index when you’re looking it up in a Swedish map. I talked about this briefly in my five-minute talk on Collation from SQLPASS (the one which was slightly less than serious). So by looking at the plan, I can work out what the next letter is in the alphabet of the collation used by the column. In other words, if my alphabet were Swedish, I’d be able to tell what the next letter after F is – just in case it’s not G. It turns out it is… Yes, the Swedish letter after F is G. But I worked this out by using a copy of my PhoneBook table that used the Finnish_Swedish_CI_AI collation. I couldn’t find how the Query Optimizer calculates the G, and my friend Paul White (@SQL_Kiwi) tells me that it’s frustratingly internal to the QO. He’s particularly smart, even if he is from New Zealand. To investigate further, I decided to do some PowerShell, leveraging the Get-SqlPlan function that I blogged about recently (make sure you also have the SqlServerCmdletSnapin100 snap-in added). I started by indicating that I was going to use Finnish_Swedish_CI_AI as my collation of choice, and that I’d start whichever letter cam straight after the number 9. I figure that this is a cheat’s way of guessing the first letter of the alphabet (but it doesn’t actually work in Unicode – luckily I’m using varchar not nvarchar. Actually, there are a few aspects of this code that only work using ASCII, so apologies if you were wanting to apply it to Greek, Japanese, etc). I also initialised my $alphabet variable. $collation = 'Finnish_Swedish_CI_AI'; $firstletter = '9'; $alphabet = ''; Now I created the table for my test. A single field would do, and putting a Clustered Index on it would suffice for the Seeks. Invoke-Sqlcmd -server . -data tempdb -query "create table dbo.collation_test (col varchar(10) collate $collation primary key);" Now I get into the looping. $c = $firstletter; $stillgoing = $true; while ($stillgoing) { I construct the query I want, seeking for entries which start with whatever $c has reached, and get the plan for it: $query = "select col from dbo.collation_test where col like '$($c)%';"; [xml] $pl = get-sqlplan $query "." "tempdb"; At this point, my $pl variable is a scary piece of XML, representing the execution plan. A bit of hunting through it showed me that the EndRange element contained what I was after, and that if it contained NULL, then I was done. $stillgoing = ($pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange -ne $null); Now I could grab the value out of it (which came with apostrophes that needed stripping), and append that to my $alphabet variable.   if ($stillgoing)   {  $c=$pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange.RangeExpressions.ScalarOperator.ScalarString.Replace("'","");     $alphabet += $c;   } Finally, finishing the loop, dropping the table, and showing my alphabet! } Invoke-Sqlcmd -server . -data tempdb -query "drop table dbo.collation_test;"; $alphabet; When I run all this, I see that the Swedish alphabet is ABCDEFGHIJKLMNOPQRSTUVXYZÅÄÖ, which matches what I see at Wikipedia. Interesting to see that the letters on the end are still there, even with Case Insensitivity. Turns out they’re not just “letters with accents”, they’re letters in their own right. I’m sure you gave up reading long ago, and really aren’t that fazed about the idea of doing this using PowerShell. I chose PowerShell because I’d already come up with an easy way of grabbing the estimated plan for a query, and PowerShell does allow for easy navigation of XML. I find the most interesting aspect of this as the fact that the Query Optimizer uses the next letter of the alphabet to maintain the SARGability of LIKE. I’m hoping they do something similar for a whole bunch of operations. Oh, and the fact that you know how to find stuff in the IKEA catalogue. Footnote: If you are interested in whether this works in other languages, you might want to consider the following screenshot, which shows that in principle, it should work with Japanese. It might be a bit harder to run this in PowerShell though, as I’m not sure how it translates. In Hiragana, the Japanese alphabet starts ?, ?, ?, ?, ?, ...

    Read the article

  • Fraud Detection with the SQL Server Suite Part 2

    - by Dejan Sarka
    This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

    Read the article

  • Querying the SSIS Catalog? Here’s a handy query!

    - by jamiet
    I’ve been working on a SQL Server Integration Services (SSIS) solution for about 6 months now and I’ve learnt many many things that I intend to share on this blog just as soon as I get the time. Here’s a very short starter-for-ten… I’ve found the following query to be utterly invaluable when interrogating the SSIS Catalog to discover what is going on in my executions: SELECT event_message_id,MESSAGE,package_name,event_name,message_source_name,package_path,execution_path,message_type,message_source_typeFROM   (       SELECT  em.*       FROM    SSISDB.catalog.event_messages em       WHERE   em.operation_id = (SELECT MAX(execution_id) FROM SSISDB.catalog.executions)           AND event_name NOT LIKE '%Validate%'       )q/* Put in whatever WHERE predicates you might like*/--WHERE event_name = 'OnError'--WHERE package_name = 'Package.dtsx'--WHERE execution_path LIKE '%<some executable>%'ORDER BY message_time DESC Know it. Learn it. Love it. @jamiet

    Read the article

< Previous Page | 241 242 243 244 245 246 247 248 249 250 251 252  | Next Page >