Search Results

Search found 51 results on 3 pages for 'dejan pelzel'.

Page 1/3 | 1 2 3  | Next Page >

  • Reading binary data from serial port using Dejan TComport Delphi component

    - by johnma
    Apologies for this question but I am a bit of a noob with Delphi. I am using Dejan TComport component to get data from a serial port. A box of equipment connected to the port sends about 100 bytes of binary data to the serial port. What I want to do is extract the bytes as numerical values into an array so that I can perform calculations on them. TComport has a method Read(buffer,Count) which reads DATA from input buffer. function Read(var Buffer; Count: Integer): Integer; The help says the Buffer variable must be large enough to hold Count bytes but does not provide any example of how to use this function. I can see that the Count variable holds the number of bytes received but I can't find a way to access the bytes in Buffer. TComport also has a methord Readstr which reads data from input buffer into a STRING variable. function ReadStr(var Str: String; Count: Integer): Integer; Again the Count variable shows the number of bytes received and I can use Memo1.Text:=str to display some information but obviously Memo1 has problems displaying the control characters. I have tried various ways to try and extract the byte data from Str but so far without success. I am sure it must be easy. Here's hoping.

    Read the article

  • Data Modeling Resources

    - by Dejan Sarka
    You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

    Read the article

  • Data Quality and Master Data Management Resources

    - by Dejan Sarka
    Many companies or organizations do regular data cleansing. When you cleanse the data, the data quality goes up to some higher level. The data quality level is determined by the amount of work invested in the cleansing. As time passes, the data quality deteriorates, and you need to repeat the cleansing process. If you spend an equal amount of effort as you did with the previous cleansing, you can expect the same level of data quality as you had after the previous cleansing. And then the data quality deteriorates over time again, and the cleansing process starts over and over again. The idea of Data Quality Services is to mitigate the cleansing process. While the amount of time you need to spend on cleansing decreases, you will achieve higher and higher levels of data quality. While cleansing, you learn what types of errors to expect, discover error patterns, find domains of correct values, etc. You don’t throw away this knowledge. You store it and use it to find and correct the same issues automatically during your next cleansing process. The following figure shows this graphically. The idea of master data management, which you can perform with Master Data Services (MDS), is to prevent data quality from deteriorating. Once you reach a particular quality level, the MDS application—together with the defined policies, people, and master data management processes—allow you to maintain this level permanently. This idea is shown in the following picture. OK, now you know what DQS and MDS are about. You can imagine the importance on maintaining the data quality. Here are some resources that help you preparing and executing the data quality (DQ) and master data management (MDM) activities. Books Dejan Sarka and Davide Mauri: Data Quality and Master Data Management with Microsoft SQL Server 2008 R2 – a general introduction to MDM, MDS, and data profiling. Matching explained in depth. Dejan Sarka, Matija Lah and Grega Jerkic: MCTS Self-Paced Training Kit (Exam 70-463): Building Data Warehouses with Microsoft SQL Server 2012 – I wrote quite a few chapters about DQ and MDM, and introduced also SQL Server 2012 DQS. Thomas Redman: Data Quality: The Field Guide – you should start with this book. Thomas Redman is the father of DQ and MDM. Tyler Graham: Microsoft SQL Server 2012 Master Data Services – MDS in depth from a product team mate. Arkady Maydanchik: Data Quality Assessment – data profiling in depth. Tamraparni Dasu, Theodore Johnson: Exploratory Data Mining and Data Cleaning – advanced data profiling with data mining. Forthcoming presentations I am presenting a DQS and MDM seminar at PASS SQL Rally Amsterdam 2013: Wednesday, November 6th, 2013: Enterprise Information Management with SQL Server 2012 – a good kick start to your first DQ and / or MDM project. Courses Data Quality and Master Data Management with SQL Server 2012 – I wrote a 2-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Start improving the quality of your data now!

    Read the article

  • SQL Saturday #274 Slovenia

    - by Dejan Sarka
    Yes, here it is SQL Saturday #274 is coming to Slovenia (#sqlsatSlovenia). The event will take place on Saturday, December 21st, at company pixi* labs, Informacijske tehnologije, d.o.o. Poslovna cona A 2 SI-4208 Šencur This company generously offered to host the event. We, the whole Slovenian SQL Server community, are very grateful for this. At this time, a call for speakers went out, and we are already getting the first proposals. We are especially happy that we will get possibility to show the foreign speakers how beautiful Slovenia and especially the capital Ljubljana is in December. Expect a lot of partying right on the streets, no matter of weather. Be prepared, we have slightly weird customs when it comes to drinks. For example, our regular special discount offer is not three drinks for the price of two; it is six drinks for the price of five. If you are a speaker or want to become one, consider sending a proposal. Since most of the sessions will be held in English and you don’t want to speak, consider coming as a visitor as well. Or maybe you would be interested to become a sponsor. Although we are targeting a low budgeted event, any kind of sponsorship is very welcome. Please feel free to contact the organizers if you are interested to become a sponsor: Matija Lah – [email protected], Mladen Prajdic - [email protected], or Dejan Sarka  - [email protected]. Looking forward to see you all!

    Read the article

  • Data Mining Resources

    - by Dejan Sarka
    There are many different types of analyses, each one with its own pros and cons. Relational reports have a predefined structure, and end users cannot change it. They are simple to use for end users. Reports can use real-time data and snapshots of data to show the state of a report at specific points in time. One of the drawbacks is that report authoring is limited to IT pros and advanced users. Any kind of dynamic restructuring is very limited. If real-time data is used for a report, the report has a negative impact on the performance of the source system. Processing of the reports might be slow because the data comes from relational database management systems, which are not optimized for reporting only. If you create a semantic model of your data, your end users can create ad-hoc report structures. However, the development is more complex because a developer is needed to create these semantic models. For OLAP, you typically use specialized database management systems. You get lightning speed of analyses. End users can use rich and thin clients to interactively change the structure of the report. Typically, they do it graphically. However, the development of an OLAP system is many times quite complex. It involves the preparation and maintenance of an enterprise data warehouse and OLAP cubes. In order to exploit the possibility of real-time restructuring of reports, the users must be both active and educated. The data is usually stale, as it is loaded into data warehouses and OLAP cubes with a scheduled process. With data mining, a structure is not selected in advance; it searches for the structure. As a result, data mining can give you the most valuable results because you can discover patterns you did not expect. A data mining model structure is limited only by the attributes that you use to train the model. One of the drawbacks is that a lot of knowledge is needed for a successful data mining project. End users have to understand the results. Subject matter experts and IT professionals need to understand business problem thoroughly. The development might be sometimes even more complex than the development of OLAP cubes. Each type of analysis has its own place in an enterprise system. SQL Server has tools for all kinds of analyses. However, data mining is the most advanced way of analyzing the data; this is the “I” in BI. In order to get the most out of it, you need to learn quite a lot. In this blog post, I am gathering together resources for learning, including forthcoming events. Books Multiple authors: SQL Server MVP Deep Dives – I wrote an introductory data mining chapter there. Erik Veerman, Teo Lachev and Dejan Sarka: MCTS Self-Paced Training Kit (Exam 70-448): Microsoft SQL Server 2008 - Business Intelligence Development and Maintenance – you can find a good overview of a complete BI solution, including data mining, in this book. Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat: Data Mining with Microsoft SQL Server 2008 – can’t miss this book if you want to mine your data with SQL Server tools. Michael Berry, Gordon Linoff: Mastering Data Mining: The Art and Science of Customer Relationship Management – data mining from both, business and technical perspective. Dorian Pyle: Data Preparation for Data Mining – an in-depth book about data preparation. Thomas and Ronald Wonnacott: Introductory Statistics – if you thought that you could get away without statistics, then you are not serious about data mining. Jiawei Han and Micheline Kamber: Data Mining Concepts and Techniques – in-depth explanation of the most popular data mining algorithms. Michael Berry and Gordon Linoff: Data Mining Techniques – another book that explains data mining algorithms, more fro a business perspective. Paolo Guidici: Applied Data Mining – very mathematical book, only if you enjoy statistics and mathematics in general. Forthcoming presentations I am presenting two data mining related sessions during the PASS Summit in Charlotte, NC: Wednesday, October 16th, 2013 - Fraud Detection: Notes from the Field – I am showing how to use data mining for a specific business problem. The presentation is based on real-life projects. Friday, October 18th: Excel 2013 Advanced Analytics – I am focusing on Excel Data Mining Add-ins, and how to use them together with Power Pivot and other add-ins. This is the most you can get out of Excel. Sinergija 2013, Belgrade, Serbia Tuesday, October 22nd: Excel 2013 Analytics to the Max – another presentation focusing on the most advanced analytics you can get in Excel. SQL Rally Amsterdam, Netherlands Thursday, November 7th: Advanced Analytics in Excel 2013 – and again I am presenting about data mining in Excel. Why three different titles for the same presentation? I don’t know, I guess I forgot the name I proposed every time right after I sent the proposal. Courses Data Mining with SQL Server 2012 – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. OK, now you know: no more excuses, start learning data mining, get the most out of your data

    Read the article

  • Is a backlink with a duplicate description and title from a news site bad for SEO?

    - by Dejan Pelzel
    I have a blog with over a thousand posts. I have posted some of those to a news aggregator site and included the same preview photo and description that I used for it on my own site and the link to the post on my site. Since the site is mainly videos and images, the description was usually a complete match of 4-6 lines of text. It now looks that I have been affected by panda and since I am not doing any bad stuff, I suspect it might be due to duplicate content. For example, when I search the title of my posts, sometimes my site is not even returned, but the news aggregator site is. Could this be the problem with panda?

    Read the article

  • Recovery from URL structure change?

    - by Dejan Pelzel
    in July this year, we have changed the URL structure of the website from: Post: domain.com/blog/post/986/dance/heart-beats-dance-video-by-chinatsu/ Category: domain.com/blog/index/cosplay/ to Post: domain.com/dance/heart-beats-dance-video-by-chinatsu-986/ Category: domain.com/cosplay/ Everything was (supposedly) properly redirected with 301 redirects and it first seemed that the traffic returned after a couple of days, but it has now been close to 2 months and things keep going worse although Google is slowly indexing the changes. What is worrying me even more is that the Pages crawled per day from Webmaster Tools started drastically dropping a few days ago and has just reached a new low in months (from over 2000 to 700). Should I be worried or will things sort out eventually?

    Read the article

  • Can thousands of backlinks from the same site harm PageRank?

    - by Dejan Pelzel
    I just noticed that one particular site has almost 7000 backlinks linking back to our website. The site is something like a news aggregator and for each post they created around 20 (sometimes much more) backlinks back to our page and they basically linked over 400 pages. I am beginning to get concerned that this amount of links might harm our page. They seem to have more backlinks to our page than all the other pages combined and more backlinks that our website has pages. We have seen a massive negative effect going on for quite a while and the PageRank seems to have dropped to None (Not even 0). But I am not sure when and why exactly that happened seeing that PageRank updates take quite a while to appear. The site linking to us is otherwise pretty reputable and doesn't seem to be having any problems with their rank. (PR 6 actually) I was thinking of using the Google disavow tool for this site, but I don't want to make things even worse. Do you think these are harmful? If so, how do I fix this? Thanks :)

    Read the article

  • Worth changing the URL structure to incorporate keywords?

    - by Dejan Pelzel
    I am migrating my blog from PHP to ASP.NET and while recoding the whole website, I figured I might as well improve the URL structure. This is how an url looks like now: example.com/blog/post/755/hakurei-reimu-cosplay-from-touhou-by-kishigami-hana and this is hould it will look after the change (cosplay being the dynamic main keyword of the post): example.com/blog/cosplay/hakurei-reimu-cosplay-from-touhou-by-kishigami-hana-755/ The website is a bit more than a half year old and receives around 650k page views a month, mainly from search traffic. Of course everything would be redirected with 301 redirects. Do you think it is worth changing to a new URL structure, or will it harm the ranking in the long run?

    Read the article

  • Working with Temporal Data in SQL Server

    - by Dejan Sarka
    My third Pluralsight course, Working with Temporal Data in SQL Server, is published. I am really proud on the second part of the course, where I discuss optimization of temporal queries. This was a nearly impossible task for decades. First solutions appeared only lately. I present all together six solutions (and one more that is not a solution), and I invented four of them. http://pluralsight.com/training/Courses/TableOfContents/working-with-temporal-data-sql-server

    Read the article

  • Connecting MS SQL using freetds and unixodbc: isql - no default driver specified

    - by Dejan
    I am trying to connect to the MS SQL database using freetds and unixodbc. I have read various guides how to do it, but no one works fine for me. When I try to connect to the database using isql tool, I get the following error: $ isql -v TS username password [IM002][unixODBC][Driver Manager]Data source name not found, and no default driver specified [ISQL]ERROR: Could not SQLConnect Have anybody already successfully established the connection to the MS SQL database using freetds and unixodbc on Ubuntu 12.04? I would really appreciate some help. Below is the procedure I used to configure the freetds and unixodbc. Thanks for your help in advance! Procedure First, I have installed the following packages sudo apt-get unixodbc unixodbc-dev freetds-dev tdsodbc and configured freetds as follows: --- /etc/freetds/freetds.conf --- [TS] host = SERVER port = 1433 tds version = 7.0 client charset = UTF-8 Using tsql tool I can successfully connect to the database by executing tsql -S TS -U username -P password As I need an odbc connection I configured odbcinst.ini as follows: --- /etc/odbcinst.ini --- [FreeTDS] Description = FreeTDS Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so FileUsage = 1 CPTimeout = CPResuse = client charset = utf-8 and odbc.ini as follows: --- /etc/odbc.ini --- [TS] Description = "test" Driver = FreeTDS Servername = SERVER Server = SERVER Port = 1433 Database = DBNAME Trace = No Trying to connect to the database using isql tool with such a configuration results the following error: $ isql -v TS username password [IM002][unixODBC][Driver Manager]Data source name not found, and no default driver specified [ISQL]ERROR: Could not SQLConnect

    Read the article

  • Logical and Physical Modeling for Analytical Applications

    - by Dejan Sarka
    I am proud to announce that my first course for Pluralsight is released. The course title is Logical and Physical Modeling for Analytical Applications. Here is the description of the course. A bad data model leads to an application that does not perform well. Therefore, when developing an application, you should create a good data model from the start. However, even the best logical model can’t help when the physical implementation is bad. It is also important to know how SQL Server stores and accesses data, and how to optimize the data access. Database optimization starts by splitting transactional and analytical applications. In this course, you learn how to support analytical applications with logical design, get understanding of the problems with data access for queries that deal with large amounts of data, and learn about SQL Server optimizations that help solving these problems. Enjoy the course!

    Read the article

  • Fraud Detection with the SQL Server Suite Part 2

    - by Dejan Sarka
    This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

    Read the article

  • Fraud Detection with the SQL Server Suite Part 1

    - by Dejan Sarka
    While working on different fraud detection projects, I developed my own approach to the solution for this problem. In my PASS Summit 2013 session I am introducing this approach. I also wrote a whitepaper on the same topic, which was generously reviewed by my friend Matija Lah. In order to spread this knowledge faster, I am starting a series of blog posts which will at the end make the whole whitepaper. Abstract With the massive usage of credit cards and web applications for banking and payment processing, the number of fraudulent transactions is growing rapidly and on a global scale. Several fraud detection algorithms are available within a variety of different products. In this paper, we focus on using the Microsoft SQL Server suite for this purpose. In addition, we will explain our original approach to solving the problem by introducing a continuous learning procedure. Our preferred type of service is mentoring; it allows us to perform the work and consulting together with transferring the knowledge onto the customer, thus making it possible for a customer to continue to learn independently. This paper is based on practical experience with different projects covering online banking and credit card usage. Introduction A fraud is a criminal or deceptive activity with the intention of achieving financial or some other gain. Fraud can appear in multiple business areas. You can find a detailed overview of the business domains where fraud can take place in Sahin Y., & Duman E. (2011), Detecting Credit Card Fraud by Decision Trees and Support Vector Machines, Proceedings of the International MultiConference of Engineers and Computer Scientists 2011 Vol 1. Hong Kong: IMECS. Dealing with frauds includes fraud prevention and fraud detection. Fraud prevention is a proactive mechanism, which tries to disable frauds by using previous knowledge. Fraud detection is a reactive mechanism with the goal of detecting suspicious behavior when a fraudster surpasses the fraud prevention mechanism. A fraud detection mechanism checks every transaction and assigns a weight in terms of probability between 0 and 1 that represents a score for evaluating whether a transaction is fraudulent or not. A fraud detection mechanism cannot detect frauds with a probability of 100%; therefore, manual transaction checking must also be available. With fraud detection, this manual part can focus on the most suspicious transactions. This way, an unchanged number of supervisors can detect significantly more frauds than could be achieved with traditional methods of selecting which transactions to check, for example with random sampling. There are two principal data mining techniques available both in general data mining as well as in specific fraud detection techniques: supervised or directed and unsupervised or undirected. Supervised techniques or data mining models use previous knowledge. Typically, existing transactions are marked with a flag denoting whether a particular transaction is fraudulent or not. Customers at some point in time do report frauds, and the transactional system should be capable of accepting such a flag. Supervised data mining algorithms try to explain the value of this flag by using different input variables. When the patterns and rules that lead to frauds are learned through the model training process, they can be used for prediction of the fraud flag on new incoming transactions. Unsupervised techniques analyze data without prior knowledge, without the fraud flag; they try to find transactions which do not resemble other transactions, i.e. outliers. In both cases, there should be more frauds in the data set selected for checking by using the data mining knowledge compared to selecting the data set with simpler methods; this is known as the lift of a model. Typically, we compare the lift with random sampling. The supervised methods typically give a much better lift than the unsupervised ones. However, we must use the unsupervised ones when we do not have any previous knowledge. Furthermore, unsupervised methods are useful for controlling whether the supervised models are still efficient. Accuracy of the predictions drops over time. Patterns of credit card usage, for example, change over time. In addition, fraudsters continuously learn as well. Therefore, it is important to check the efficiency of the predictive models with the undirected ones. When the difference between the lift of the supervised models and the lift of the unsupervised models drops, it is time to refine the supervised models. However, the unsupervised models can become obsolete as well. It is also important to measure the overall efficiency of both, supervised and unsupervised models, over time. We can compare the number of predicted frauds with the total number of frauds that include predicted and reported occurrences. For measuring behavior across time, specific analytical databases called data warehouses (DW) and on-line analytical processing (OLAP) systems can be employed. By controlling the supervised models with unsupervised ones and by using an OLAP system or DW reports to control both, a continuous learning infrastructure can be established. There are many difficulties in developing a fraud detection system. As has already been mentioned, fraudsters continuously learn, and the patterns change. The exchange of experiences and ideas can be very limited due to privacy concerns. In addition, both data sets and results might be censored, as the companies generally do not want to publically expose actual fraudulent behaviors. Therefore it can be quite difficult if not impossible to cross-evaluate the models using data from different companies and different business areas. This fact stresses the importance of continuous learning even more. Finally, the number of frauds in the total number of transactions is small, typically much less than 1% of transactions is fraudulent. Some predictive data mining algorithms do not give good results when the target state is represented with a very low frequency. Data preparation techniques like oversampling and undersampling can help overcome the shortcomings of many algorithms. SQL Server suite includes all of the software required to create, deploy any maintain a fraud detection infrastructure. The Database Engine is the relational database management system (RDBMS), which supports all activity needed for data preparation and for data warehouses. SQL Server Analysis Services (SSAS) supports OLAP and data mining (in version 2012, you need to install SSAS in multidimensional and data mining mode; this was the only mode in previous versions of SSAS, while SSAS 2012 also supports the tabular mode, which does not include data mining). Additional products from the suite can be useful as well. SQL Server Integration Services (SSIS) is a tool for developing extract transform–load (ETL) applications. SSIS is typically used for loading a DW, and in addition, it can use SSAS data mining models for building intelligent data flows. SQL Server Reporting Services (SSRS) is useful for presenting the results in a variety of reports. Data Quality Services (DQS) mitigate the occasional data cleansing process by maintaining a knowledge base. Master Data Services is an application that helps companies maintaining a central, authoritative source of their master data, i.e. the most important data to any organization. For an overview of the SQL Server business intelligence (BI) part of the suite that includes Database Engine, SSAS and SSRS, please refer to Veerman E., Lachev T., & Sarka D. (2009). MCTS Self-Paced Training Kit (Exam 70-448): Microsoft® SQL Server® 2008 Business Intelligence Development and Maintenance. MS Press. For an overview of the enterprise information management (EIM) part that includes SSIS, DQS and MDS, please refer to Sarka D., Lah M., & Jerkic G. (2012). Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft® SQL Server® 2012. O'Reilly. For details about SSAS data mining, please refer to MacLennan J., Tang Z., & Crivat B. (2009). Data Mining with Microsoft SQL Server 2008. Wiley. SQL Server Data Mining Add-ins for Office, a free download for Office versions 2007, 2010 and 2013, bring the power of data mining to Excel, enabling advanced analytics in Excel. Together with PowerPivot for Excel, which is also freely downloadable and can be used in Excel 2010, is already included in Excel 2013. It brings OLAP functionalities directly into Excel, making it possible for an advanced analyst to build a complete learning infrastructure using a familiar tool. This way, many more people, including employees in subsidiaries, can contribute to the learning process by examining local transactions and quickly identifying new patterns.

    Read the article

  • nano syntax highlighting not working for all languages

    - by Dejan
    I have a funny situation where I am unable to add custom highlighting definitions to my nano text editor. The funny thing is that the predefined work like a charm and can be edited. But I have created a new one for js with $ sudo touch js.nanorc $ sudo nano js.nanorc my current js.nanorc looks like this: syntax "JavaScript" "\.js$" color blue "\<[-+]?([1-9][0-9]*|0[0-7]*|0x[0-9a-fA-F]+)([uU][lL]?|[lL][uU]?)?\>" color blue "\<[-+]?([0-9]+\.[0-9]*|[0-9]*\.[0-9]+)([EePp][+-]?[0-9]+)?[fFlL]?" color blue "\<[-+]?([0-9]+[EePp][+-]?[0-9]+)[fFlL]?" color brightblue "[A-Za-z_][A-Za-z0-9_]*[[:space:]]*[(]" color black "[(]" color cyan "\<(break|case|catch|continue|default|delete|do|else|finally)\>" color cyan "\<(for|function|get|if|in|instanceof|new|return|set|switch)\>" color cyan "\<(switch|this|throw|try|typeof|var|void|while|with)\>" color cyan "\<(null|undefined|NaN)\>" color brightcyan "\<(true|false)\>" color green "\<(Array|Boolean|Date|Enumerator|Error|Function|Math)\>" color green "\<(Number|Object|RegExp|String)\>" color red "[-+/*=<>!~%?:&|]" color magenta "/[^*]([^/]|(\\/))*[^\\]/[gim]*" color yellow ""(\\.|[^"])*"|'(\\.|[^'])*'" color magenta "\\[0-7][0-7]?[0-7]?|\\x[0-9a-fA-F]+|\\[bfnrt'"\?\\]" color brightblack "(^|[[:space:]])//.*" color brightblack start="/\*" end="\*/" color brightwhite,cyan "TODO:?" color ,green "[[:space:]]+$" color ,red " +" If anyone can see the problem then please tel me

    Read the article

  • Suggestions on mail servers

    - by Dejan.S
    Hi. I got a app sending massive newsletters and so far I been using the regular smtp within the iis7. I been looking at mail servers and There are plenty out there and I wonder what your tips and experiences are on this? What mail servers are good and easy to use with windows server 2008, iis7.. Thanks

    Read the article

  • How to increase video memory in libvirt/KVM gui?

    - by Dejan
    In the 'Virtual Hardware details', it lists the model as 'cirrus' with 9MB of RAM. The RAM field cannot be changed, but how to increate the video RAM? My host OS is RH6 and gust OS is Fedora16. EDIT: From guest OS, when I run xvinfo it displays 'no adaptors present'. I was trying to play a video using gstreamers xvimagesink plugin (XFree86 video output plugin using Xv extension). The problem is that xvimagesink is using hardware acceleration for video performance and hence the error Could not initialize Xv output. I guess I'll have to configure hardware acceleration for the guest.

    Read the article

  • Change my MX record on my server to google MX?

    - by Dejan.S
    Hi. I got a windows server 2008 where I host a site, now I decided to have the email on google apps. I did add the MX records I get from them to my DNS settings on the server but with no luck. I recently started doing server stuff so I did like this. Server Manager / Roles / DNS Server / DNS / SERVERNAME / MYDOMAIN / Forward Lookup / New MX Host or child domain: What goes here? FQDN: here is my domain name, i think because I named the ns my domain? FQDN MX: here is the google MX record I got from them MSP: 10 I have no Idea where I go wrong but I thought I would ask you guys if any of you can maybe give me some tips on what to look for or any newbee mistake I do that you see from this little info. I really appreciate all help I could get on this.

    Read the article

  • Can I monitor active user count on my iis sites?

    - by Dejan.S
    We are having problems with performance on our server that host our websites that the processor gets upp to 90%. I would like to monitor the amount of users active on your sites that are published on the iis. My question, is this possible? is there any software for this? EDIT current (like this second) visitor count on all the active websites on our iis REASON FOR THIS if i can get the visitor amount on the days the CPU is not overloaded and and compare it to the days it is then i atleast know that this CAN be a reason why this is happening and i can take it from there. Otherwise i can focus on the code on the sites, or maybe google crawler is causing this, there are manythings that can cause this you know? for me this is just a simple way of troubleshooting.

    Read the article

  • Windows server 2008, Dns. I'm confused

    - by Dejan.S
    Hi. I recently setup a window server 2008 server at work. Keep in mind I have never worked with it before:). Background story is I try to host a couple of sites on the server through iis7, I got domains (currently hosted at other hosters for the moment). I want to point the domain NS to my server on all of them. I have read how to setup a DNS on the server, so far so good. now my dns is companyname.com in the server manager I got DNS / companyname.com in there I got ns.comanyname.com, in there I got Host(A) with the server ip now this is where I get confused about how things work with DNS, NS & Host(A). I dont know how to assign(so to speak) the Host(A) to one of my webapps hosted on the iis7, because that is the pointer right?. To leave an example to work with lets say, Hosted.com is hosted on my iis7, on port 81. You don't understand how great full I would be if somebody could explain this confusion. EDIT: Do I need to create a DNS for every site hosted on my server? Or just make a A Host/Record? Thanks guys

    Read the article

  • cisco ePC 3208 router and captive portal

    - by Dejan Milosevic
    Ok, i have cisco ePC 3208 router, and cable internet goes in router via cable and router is emiting wireless. It can work without computer and it is working fine. Now i want to have capiteve portal with home page for my buisness and user logins. Is it possible that i use computer as gateway for captive portal, so when user goes to wireless it will redirected to computer local server for authorization and then passed by if user and pass is good, or i need another router or wifi acess point?

    Read the article

  • Copying compressed files from Server 2008 R2 network share to XP client via VPN fails

    - by Dejan Janjuševic
    At the first sight the question looks similar to this one. I have experienced an odd behavior while trying to copy a certain file from Windows Server 2008 R2 network share to Windows XP Professional client via VPN. The VPN was set up using RRAS on the server machine. I will try to provide as much informations as possible in order to make the issue more clear. When trying to copy the compressed file sized ~2.5 MB (via Explorer or CMD, doesn't matter), the process stalls after some 20%, producing an error message after few seconds: Cannot copy filename: The specified network name is no longer available. If i start the command ping -t 192.168.2.1 (where the IP address specified belongs to the server) side by side with the copy command, I can clearly see that the ping command times out for few seconds as the copy process stalls. When this happens all network activities are frozen. After a few seconds, the network recovers, ping continues to run normally, however the copy process stands still before it displays the above error message. Copying other files (I tried 4-5 files), of which some are larger and some are smaller, succeeds. Seems to me that I can copy all uncompressed files. As soon as I try to copy an archive, the process freezes. Even a 707 KB large archive can't be copied. I can only reproduce this behavior on 2 machines, both Windows XP Professional, one is w/ SP2 and the other w/ SP3. Other XP clients don't have this problem, neither do Windows 7 clients. If I connect to the server using Remote Desktop Connection without using VPN from either of these 2 machines (using the same user account), I can copy anything I want normally, even these "problematic" files. Does anyone have any clue about what could possibly be going on?

    Read the article

  • Prevent Firefox from going to first search result

    - by Dejan
    When I type some terms in the address bar (not the search box!) and hit enter, Firefox searches for those terms on Google, and, depending on some logic either takes me to the search results page or takes me to the first search result. Now, I want it to always take me to the search results page (like Chrome does). Is this possible? And, yes, I am aware that the search box does exactly that, but I'm using it for some other search engine. So, another solution would be to add additional search box, that can also work for me.

    Read the article

  • spam check, spam score how to?

    - by Dejan.S
    Hi. I am doing a app that is sending email and need to have a spam checker on the outgoing email. I have been looking for this a while now, I can not seem to find a good solution. I would like to use something like the spamassassin. Do you guys got any examples how to do the spamassassin with asp.net (coding example, not example to setup the actual spamassassin, that is done)? Or examples on other ways to do this to get a score? Any ideas on how to do this just let me know. I would be very thankful for this Thanks

    Read the article

  • HttpPostAttribute not found? worked a second ago

    - by Dejan.S
    Hi I just opened up a project I done in MVC a while back, fired right up I looked at in the browser and now all the sudden it just wont find the [HttpPost] & [HttpPostAttribute]. What can be the problem? errormessage The type or namespace name 'HttpPost' could not be found (are you missing a using directive or an assembly reference?)

    Read the article

1 2 3  | Next Page >