Search Results

Search found 58475 results on 2339 pages for 'data mining'.

Page 5/2339 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Bridging Two Worlds: Big Data and Enterprise Data

    - by Dain C. Hansen
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} The big data world is all the vogue in today’s IT conversations. It’s a world of volume, velocity, variety – tantalizing us with its untapped potential. It’s a world of transformational game-changing technologies that have already begun to alter the information management landscape. One of the reasons that big data is so compelling is that it’s a universal challenge that impacts every one of us. Whether it is healthcare, financial, manufacturing, government, retail - big data presents a pressing problem for many industries: how can so much information be processed so quickly to deliver the ‘bigger’ picture? With big data we’re tapping into new information that didn’t exist before: social data, weblogs, sensor data, complex content, and more. What also makes big data revolutionary is that it turns traditional information architecture on its head, putting into question commonly accepted notions of where and how data should be aggregated processed, analyzed, and stored. This is where Hadoop and NoSQL come in – new technologies which solve new problems for managing unstructured data. And now for some worst practices that I'd recommend that you please not follow: Worst Practice Lesson 1: Throw away everything that you already know about data management, data integration tools, and start completely over. One shouldn’t forget what’s already running in today’s IT. Today’s Business Analytics, Data Warehouses, Business Applications (ERP, CRM, SCM, HCM), and even many social, mobile, cloud applications still rely almost exclusively on structured data – or what we’d like to call enterprise data. This dilemma is what today’s IT leaders are up against: what are the best ways to bridge enterprise data with big data? And what are the best strategies for dealing with the complexities of these two unique worlds? Worst Practice Lesson 2: Throw away all of your existing business applications … because they don’t run on big data yet. Bridging the two worlds of big data and enterprise data means considering solutions that are complete, based on emerging Hadoop technologies (as well as traditional), and are poised for success through integrated design tools, integrated platforms that connect to your existing business applications, as well as and support real-time analytics. Leveraging these types of best practices translates to improved productivity, lowered TCO, IT optimization, and better business insights. Worst Practice Lesson 3: Separate out [and keep separate] your big data sandboxes from all the current enterprise IT systems. Don’t mix sand among playgrounds. We didn't tell you that you wouldn't get dirty doing this. Correlation between the two worlds is key. The real advantage to analyzing big data comes when you can correlate it with the existing data in your data warehouse or your current applications to make sense of the larger patterns. If you have not followed these worst practices 1-3 then you qualify for the first step of our journey: bridging the two worlds of enterprise data and big data. Over the next several weeks we’ll be discussing this topic along with several others around big data as it relates to data integration. We welcome you to join us in the conversation by following us on twitter on #BridgingBigData or download our latest white paper and resource kit: Big Data and Enterprise Data: Bridging Two Worlds.

    Read the article

  • SQL SERVER – Data Sources and Data Sets in Reporting Services SSRS

    - by Pinal Dave
    This example is from the Beginning SSRS by Kathi Kellenberger. Supporting files are available with a free download from the www.Joes2Pros.com web site. This example is from the Beginning SSRS. Supporting files are available with a free download from the www.Joes2Pros.com web site. Connecting to Your Data? When I was a child, the telephone book was an important part of my life. Maybe I was just a nerd, but I enjoyed getting a new book every year to page through to learn about the businesses in my small town or to discover where some of my school acquaintances lived. It was also the source of maps to my town’s neighborhoods and the towns that surrounded me. To make a phone call, I would need a telephone number. In order to find a telephone number, I had to know how to use the telephone book. That seems pretty simple, but it resembles connecting to any data. You have to know where the data is and how to interact with it. A data source is the connection information that the report uses to connect to the database. You have two choices when creating a data source, whether to embed it in the report or to make it a shared resource usable by many reports. Data Sources and Data Sets A few basic terms will make the upcoming choses make more sense. What database on what server do you want to connect to? It would be better to just ask… “what is your data source?” The connection you need to make to get your reports data is called a data source. If you connected to a data source (like the JProCo database) there may be hundreds of tables. You probably only want data from just a few tables. This means you want to write a specific query against this data source. A query on a data source to get just the records you need for an SSRS report is called a Data Set. Creating a local Data Source You can connect embed a connection from your report directly to your JProCo database which (let’s say) is installed on a server named Reno. If you move JProCo to a new server named Tampa then you need to update the Data Set. If you have 10 reports in one project that were all pointing to the JProCo database on the Reno server then they would all need to be updated at once. It’s possible to make a project level Data Source and have each report use that. This means one change can fix all 10 reports at once. This would be called a Shared Data Source. Creating a Shared Data Source The best advice I can give you is to create shared data sources. The reason I recommend this is that if a database moves to a new server you will have just one place in Report Manager to make the server name change. That one change will update the connection information in all the reports that use that data source. To get started, you will start with a fresh project. Go to Start > All Programs > SQL Server 2012 > Microsoft SQL Server Data Tools to launch SSDT. Once SSDT is running, click New Project to create a new project. Once the New Project dialog box appears, fill in the form, as shown in. Be sure to select Report Server Project this time – not the wizard. Click OK to dismiss the New Project dialog box. You should now have an empty project, as shown in the Solution Explorer. A report is meant to show you data. Where is the data? The first task is to create a Shared Data Source. Right-click on the Shared Data Sources folder and choose Add New Data Source. The Shared Data Source Properties dialog box will launch where you can fill in a name for the data source. By default, it is named DataSource1. The best practice is to give the data source a more meaningful name. It is possible that you will have projects with more than one data source and, by naming them, you can tell one from another. Type the name JProCo for the data source name and click the Edit button to configure the database connection properties. If you take a look at the types of data sources you can choose, you will see that SSRS works with many data platforms including Oracle, XML, and Teradata. Make sure SQL Server is selected before continuing. For this post, I am assuming that you are using a local SQL Server and that you can use your Windows account to log in to the SQL Server. If, for some reason you must use SQL Server Authentication, choose that option and fill in your SQL Server account credentials. Otherwise, just accept Windows Authentication. If your database server was installed locally and with the default instance, just type in Localhost for the Server name. Select the JProCo database from the database list. At this point, the connection properties should look like. If you have installed a named instance of SQL Server, you will have to specify the server name like this: Localhost\InstanceName, replacing the InstanceName with whatever your instance name is. If you are not sure about the named instance, launch the SQL Server Configuration Manager found at Start > All Programs > Microsoft SQL Server 2012 > Configuration Tools. If you have a named instance, the name will be shown in parentheses. A default instance of SQL Server will display MSSQLSERVER; a named instance will display the name chosen during installation. Once you get the connection properties filled in, click OK to dismiss the Connection Properties dialog box and OK again to dismiss the Shared Data Source properties. You now have a data source in the Solution Explorer. What’s next I really need to thank Kathi Kellenberger and Rick Morelan for sharing this material for this 5 day series of posts on SSRS. To get really comfortable with SSRS you will get to know the different SSDT windows, Build reports on your own (without the wizards),  Add report headers and footers, Accept user input,  create levels, charts, or even maps for visual appeal. You might be surprise to know a small 230 page book starts from the very beginning and covers the steps to do all these items. Beginning SSRS 2012 is a small easy to follow book so you can learn SSRS for less than $20. See Joes2Pros.com for more on this and other books. If you want to learn SSRS in easy to simple words – I strongly recommend you to get Beginning SSRS book from Joes 2 Pros. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Reporting Services, SSRS

    Read the article

  • Social Analytics in your current data

    - by Dan McGrath
    By now everyone is aware of the massive boom in social-networking (Twitter, Facebook, LinkedIn) and obviously a big part of its business model revolves around being able to mine this data to create information that can be used to make money for someone. Gartner has identified 'Social Analytics' as one of the top 10 strategic technologies for 2011. Has anyone looked at their existing data structures to determine if they could extract a social graph and then perform further data mining against this? How does it fit in with your other strategic development strategies? What information are you trying to extract from the data? Take for example, a bank. They could conceivably determine a social graph through account relationships and transactions. Obviously there would be open edges on the graph where funds enter/leave the institute, but that shouldn't detract from the usefulness of the data. I'm looking for actual examples with the answers, as well as why/how they did it. References to other sites will be greatly appreciated. Note: I'm not at all referring to mining data out of actual social networks.

    Read the article

  • PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

    - by belvoir
    Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi real-time basis (some-one is bound to ask what semi real-time means and the answer is as frequently as I reasonably can but I will be pragmatic, as a benchmark lets say we are hoping for every 15min) and feed it into a data-warehouse. How much data? At peak times we are talking approx 80-100k rows per min hitting the OLTP side, off-peak this will drop significantly to 15-20k. The most frequently updated rows are ~64 bytes each but there are various tables etc so the data is quite diverse and can range up to 4000 bytes per row. The OLTP is active 24x5.5. Best Solution? From what I can piece together the most practical solution is as follows: Create a TRIGGER to write all DML activity to a rotating CSV log file Perform whatever transformations are required Use the native DW data pump tool to efficiently pump the transformed CSV into the DW Why this approach? TRIGGERS allow selective tables to be targeted rather than being system wide + output is configurable (i.e. into a CSV) and are relatively easy to write and deploy. SLONY uses similar approach and overhead is acceptable CSV easy and fast to transform Easy to pump CSV into the DW Alternatives considered .... Using native logging (http://www.postgresql.org/docs/8.3/static/runtime-config-logging.html). Problem with this is it looked very verbose relative to what I needed and was a little trickier to parse and transform. However it could be faster as I presume there is less overhead compared to a TRIGGER. Certainly it would make the admin easier as it is system wide but again, I don't need some of the tables (some are used for persistent storage of JMS messages which I do not want to log) Querying the data directly via an ETL tool such as Talend and pumping it into the DW ... problem is the OLTP schema would need tweaked to support this and that has many negative side-effects Using a tweaked/hacked SLONY - SLONY does a good job of logging and migrating changes to a slave so the conceptual framework is there but the proposed solution just seems easier and cleaner Using the WAL Has anyone done this before? Want to share your thoughts?

    Read the article

  • Reference Data Management and Master Data: Are Relation ?

    - by Mala Narasimharajan
    Submitted By:  Rahul Kamath  Oracle Data Relationship Management (DRM) has always been extremely powerful as an Enterprise Master Data Management (MDM) solution that can help manage changes to master data in a way that influences enterprise structure, whether it be mastering chart of accounts to enable financial transformation, or revamping organization structures to drive business transformation and operational efficiencies, or restructuring sales territories to enable equitable distribution of leads to sales teams following the acquisition of new products, or adding additional cost centers to enable fine grain control over expenses. Increasingly, DRM is also being utilized by Oracle customers for reference data management, an emerging solution space that deserves some explanation. What is reference data? How does it relate to Master Data? Reference data is a close cousin of master data. While master data is challenged with problems of unique identification, may be more rapidly changing, requires consensus building across stakeholders and lends structure to business transactions, reference data is simpler, more slowly changing, but has semantic content that is used to categorize or group other information assets – including master data – and gives them contextual value. In fact, the creation of a new master data element may require new reference data to be created. For example, when a European company acquires a US business, chances are that they will now need to adapt their product line taxonomy to include a new category to describe the newly acquired US product line. Further, the cross-border transaction will also result in a revised geo hierarchy. The addition of new products represents changes to master data while changes to product categories and geo hierarchy are examples of reference data changes.1 The following table contains an illustrative list of examples of reference data by type. Reference data types may include types and codes, business taxonomies, complex relationships & cross-domain mappings or standards. Types & Codes Taxonomies Relationships / Mappings Standards Transaction Codes Industry Classification Categories and Codes, e.g., North America Industry Classification System (NAICS) Product / Segment; Product / Geo Calendars (e.g., Gregorian, Fiscal, Manufacturing, Retail, ISO8601) Lookup Tables (e.g., Gender, Marital Status, etc.) Product Categories City à State à Postal Codes Currency Codes (e.g., ISO) Status Codes Sales Territories (e.g., Geo, Industry Verticals, Named Accounts, Federal/State/Local/Defense) Customer / Market Segment; Business Unit / Channel Country Codes (e.g., ISO 3166, UN) Role Codes Market Segments Country Codes / Currency Codes / Financial Accounts Date/Time, Time Zones (e.g., ISO 8601) Domain Values Universal Standard Products and Services Classification (UNSPSC), eCl@ss International Classification of Diseases (ICD) e.g., ICD9 à IC10 mappings Tax Rates Why manage reference data? Reference data carries contextual value and meaning and therefore its use can drive business logic that helps execute a business process, create a desired application behavior or provide meaningful segmentation to analyze transaction data. Further, mapping reference data often requires human judgment. Sample Use Cases of Reference Data Management Healthcare: Diagnostic Codes The reference data challenges in the healthcare industry offer a case in point. Part of being HIPAA compliant requires medical practitioners to transition diagnosis codes from ICD-9 to ICD-10, a medical coding scheme used to classify diseases, signs and symptoms, causes, etc. The transition to ICD-10 has a significant impact on business processes, procedures, contracts, and IT systems. Since both code sets ICD-9 and ICD-10 offer diagnosis codes of very different levels of granularity, human judgment is required to map ICD-9 codes to ICD-10. The process requires collaboration and consensus building among stakeholders much in the same way as does master data management. Moreover, to build reports to understand utilization, frequency and quality of diagnoses, medical practitioners may need to “cross-walk” mappings -- either forward to ICD-10 or backwards to ICD-9 depending upon the reporting time horizon. Spend Management: Product, Service & Supplier Codes Similarly, as an enterprise looks to rationalize suppliers and leverage their spend, conforming supplier codes, as well as product and service codes requires supporting multiple classification schemes that may include industry standards (e.g., UNSPSC, eCl@ss) or enterprise taxonomies. Aberdeen Group estimates that 90% of companies rely on spreadsheets and manual reviews to aggregate, classify and analyze spend data, and that data management activities account for 12-15% of the sourcing cycle and consume 30-50% of a commodity manager’s time. Creating a common map across the extended enterprise to rationalize codes across procurement, accounts payable, general ledger, credit card, procurement card (P-card) as well as ACH and bank systems can cut sourcing costs, improve compliance, lower inventory stock, and free up talent to focus on value added tasks. Change Management: Point of Sales Transaction Codes and Product Codes In the specialty finance industry, enterprises are confronted with usury laws – governed at the state and local level – that regulate financial product innovation as it relates to consumer loans, check cashing and pawn lending. To comply, it is important to demonstrate that transactions booked at the point of sale are posted against valid product codes that were on offer at the time of booking the sale. Since new products are being released at a steady stream, it is important to ensure timely and accurate mapping of point-of-sale transaction codes with the appropriate product and GL codes to comply with the changing regulations. Multi-National Companies: Industry Classification Schemes As companies grow and expand across geographies, a typical challenge they encounter with reference data represents reconciling various versions of industry classification schemes in use across nations. While the United States, Mexico and Canada conform to the North American Industry Classification System (NAICS) standard, European Union countries choose different variants of the NACE industry classification scheme. Multi-national companies must manage the individual national NACE schemes and reconcile the differences across countries. Enterprises must invest in a reference data change management application to address the challenge of distributing reference data changes to downstream applications and assess which applications were impacted by a given change. References 1 Master Data versus Reference Data, Malcolm Chisholm, April 1, 2006.

    Read the article

  • Focus on Oracle Data Profiling and Data Quality 11g - 24/Fev/11

    - by Claudia Costa
    Thursday 24th February, 11am GMTOracle offers an integrated suite Data Quality software architected to discover and correct today's data quality problems and establish a platform prepared for tomorrow's yet unknown data challenges.Oracle Data Profiling provides data investigation, discovery, and profiling in support of quality, migration, integration, stewardship, and governance initiatives. It includes a broad range of features that expand upon basic profiling, including automated monitoring, business-rule validation, and trend analysis.Oracle Data Quality for Data Integrator provides cleansing, standardization, matching, address validation, location enrichment, and linking functions for global customer data and operational business data.It ensures that data adheres to established standards that are adaptable to fit each organization's specific needs. Both single - and double - byte data are processed in local languages to provide a unique and centralized view of customers, products and services.  During this in-person briefing, Data Integration Solution Specialists will be providing a technical overview and a walkthrough.Agenda Oracle Data Integration Strategy overview A focus on Oracle Data Profiling and Oracle Data Quality for Data Integrator: Oracle Data Profiling Oracle Data Quality for Data Integrator Live demo Q&A  This FREE online LIVE eSeminar will be delivered over the Web and Conference Call. Registrations received less than 24hours prior to start time may not receive confirmation to attend.To register click here.For any questions please contact [email protected]

    Read the article

  • Extending WCF Data Service to synthesize missing data on request

    - by Schneider
    I have got a WCF Data Service based on a LINQ to SQL data provider. I am making a query "get me all the records between two dates". The problem is that I want to synthesize two extra records such that I always get records that fall on the start and end dates, plus all the ones in between which come from the database. Is there a way to "intercept" the request so I can synthesize these records and return them to the client? Thanks

    Read the article

  • Tackling Big Data Analytics with Oracle Data Integrator

    - by Irem Radzik
    v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";}  By Mike Eisterer  The term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and documents that can be mined for useful information.  Companies are facing emerging technologies, increasing data volumes, numerous data varieties and the processing power needed to efficiently analyze data which changes with high velocity. Oracle offers the broadest and most integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships Oracle Data Integrator Enterprise Edition(ODI) is critical to any enterprise big data strategy. ODI and the Oracle Data Connectors provide native access to Hadoop, leveraging such technologies as MapReduce, HDFS and Hive. Alongside with ODI’s metadata driven approach for extracting, loading and transforming data; companies may now integrate their existing data with big data technologies and deliver timely and trusted data to their analytic and decision support platforms. In this session, you’ll learn about ODI and Oracle Big Data Connectors and how, coupled together, they provide the critical integration with multiple big data platforms. Tackling Big Data Analytics with Oracle Data Integrator October 1, 2012 12:15 PM at MOSCONE WEST – 3005 For other data integration sessions at OpenWorld, please check our Focus-On document.  If you are not able to attend OpenWorld, please check out our latest resources for Data Integration.

    Read the article

  • BCP???!????????????:Oracle Data Guard ????

    - by Shinobu FUJINAMI
    ??????????????????????????????????????????????????·????????????????? ??????DG???????????????????????Disk Group???Down Grade????????????????????????????????????????? Oracle ? DG ??Data Guard????????????Oracle Data Guard ???????????????????????????????·??????????????????·???????????????????????????????????????????????????????????????? BCP(??????)????????????????????????????????? Oracle Data Guard ??? Oracle Data Guard ????????KROWN??????·????(KDS) ? Data Guard ??????????????????????????????????????????????????????( KROWN??????·????(KDS) ???????? ) ????·???????????? - ???????? Data Guard Data Guard ?????????BCP ????????????????? Data Guard ???????????????????????????????????? - ???????????????????????????? Data Guard ???????·??????(????????)???????????·??????????·??????2?????????????????·?????????????????? ???????????????????????????????·????????????????????????? - Data Guard >> ??????????? ??????????? Data Guard ???????????ASM ? RAC ??????????????????????????? Data Guard ??? Oracle Database ?????????????????  - DataGuard ??????????????????? (11gR1/11gR2) ???????????????????????????????????????????????????  Data Guard ??? Oracle Database ????????????????? - [DataGuard 11g] ?????·?????????????·???? 11g ????????????·?????????????·????????????????? ??????·??????????????????????????????????? ??·???????????? -  Data Guard >> ??????????? ???????????(?????·?????)?????????(????·?????)?????????/??????·???????????????????????? ??????????????????????? ??????????????????????????????? ???·????????????  - Data Guard >> ???? ????????????????????????????????? Data Guard ???????????????????????????????????? ?????????????????????DataGuard??????????????????????????????? ?????DataGuard???????????????????????????????Data Guard ???????????????????????·????????????????????????????? ???????????????????????????????????????????????????????- Data Guard >> ???? ??????????????? ?????????????????????????????????????????????????????????????? ????????????????????????????????? - Data Guard >> ??????????? ??????????????? ?????README, PSR ???????????????????????????????????????????????????????????????????????????????????????????? Oracle Data Guard ? Oracle9i ???????????????????????????????????Oracle Database 10g ???????????·??????? Data Guard ?????????????????????????????????????????????????????????????????Oracle Database 11g ??????·?????·????????????????????????????????????Oracle Data Guard ????????????????????????????????????????

    Read the article

  • Fraud Detection with the SQL Server Suite Part 1

    - by Dejan Sarka
    While working on different fraud detection projects, I developed my own approach to the solution for this problem. In my PASS Summit 2013 session I am introducing this approach. I also wrote a whitepaper on the same topic, which was generously reviewed by my friend Matija Lah. In order to spread this knowledge faster, I am starting a series of blog posts which will at the end make the whole whitepaper. Abstract With the massive usage of credit cards and web applications for banking and payment processing, the number of fraudulent transactions is growing rapidly and on a global scale. Several fraud detection algorithms are available within a variety of different products. In this paper, we focus on using the Microsoft SQL Server suite for this purpose. In addition, we will explain our original approach to solving the problem by introducing a continuous learning procedure. Our preferred type of service is mentoring; it allows us to perform the work and consulting together with transferring the knowledge onto the customer, thus making it possible for a customer to continue to learn independently. This paper is based on practical experience with different projects covering online banking and credit card usage. Introduction A fraud is a criminal or deceptive activity with the intention of achieving financial or some other gain. Fraud can appear in multiple business areas. You can find a detailed overview of the business domains where fraud can take place in Sahin Y., & Duman E. (2011), Detecting Credit Card Fraud by Decision Trees and Support Vector Machines, Proceedings of the International MultiConference of Engineers and Computer Scientists 2011 Vol 1. Hong Kong: IMECS. Dealing with frauds includes fraud prevention and fraud detection. Fraud prevention is a proactive mechanism, which tries to disable frauds by using previous knowledge. Fraud detection is a reactive mechanism with the goal of detecting suspicious behavior when a fraudster surpasses the fraud prevention mechanism. A fraud detection mechanism checks every transaction and assigns a weight in terms of probability between 0 and 1 that represents a score for evaluating whether a transaction is fraudulent or not. A fraud detection mechanism cannot detect frauds with a probability of 100%; therefore, manual transaction checking must also be available. With fraud detection, this manual part can focus on the most suspicious transactions. This way, an unchanged number of supervisors can detect significantly more frauds than could be achieved with traditional methods of selecting which transactions to check, for example with random sampling. There are two principal data mining techniques available both in general data mining as well as in specific fraud detection techniques: supervised or directed and unsupervised or undirected. Supervised techniques or data mining models use previous knowledge. Typically, existing transactions are marked with a flag denoting whether a particular transaction is fraudulent or not. Customers at some point in time do report frauds, and the transactional system should be capable of accepting such a flag. Supervised data mining algorithms try to explain the value of this flag by using different input variables. When the patterns and rules that lead to frauds are learned through the model training process, they can be used for prediction of the fraud flag on new incoming transactions. Unsupervised techniques analyze data without prior knowledge, without the fraud flag; they try to find transactions which do not resemble other transactions, i.e. outliers. In both cases, there should be more frauds in the data set selected for checking by using the data mining knowledge compared to selecting the data set with simpler methods; this is known as the lift of a model. Typically, we compare the lift with random sampling. The supervised methods typically give a much better lift than the unsupervised ones. However, we must use the unsupervised ones when we do not have any previous knowledge. Furthermore, unsupervised methods are useful for controlling whether the supervised models are still efficient. Accuracy of the predictions drops over time. Patterns of credit card usage, for example, change over time. In addition, fraudsters continuously learn as well. Therefore, it is important to check the efficiency of the predictive models with the undirected ones. When the difference between the lift of the supervised models and the lift of the unsupervised models drops, it is time to refine the supervised models. However, the unsupervised models can become obsolete as well. It is also important to measure the overall efficiency of both, supervised and unsupervised models, over time. We can compare the number of predicted frauds with the total number of frauds that include predicted and reported occurrences. For measuring behavior across time, specific analytical databases called data warehouses (DW) and on-line analytical processing (OLAP) systems can be employed. By controlling the supervised models with unsupervised ones and by using an OLAP system or DW reports to control both, a continuous learning infrastructure can be established. There are many difficulties in developing a fraud detection system. As has already been mentioned, fraudsters continuously learn, and the patterns change. The exchange of experiences and ideas can be very limited due to privacy concerns. In addition, both data sets and results might be censored, as the companies generally do not want to publically expose actual fraudulent behaviors. Therefore it can be quite difficult if not impossible to cross-evaluate the models using data from different companies and different business areas. This fact stresses the importance of continuous learning even more. Finally, the number of frauds in the total number of transactions is small, typically much less than 1% of transactions is fraudulent. Some predictive data mining algorithms do not give good results when the target state is represented with a very low frequency. Data preparation techniques like oversampling and undersampling can help overcome the shortcomings of many algorithms. SQL Server suite includes all of the software required to create, deploy any maintain a fraud detection infrastructure. The Database Engine is the relational database management system (RDBMS), which supports all activity needed for data preparation and for data warehouses. SQL Server Analysis Services (SSAS) supports OLAP and data mining (in version 2012, you need to install SSAS in multidimensional and data mining mode; this was the only mode in previous versions of SSAS, while SSAS 2012 also supports the tabular mode, which does not include data mining). Additional products from the suite can be useful as well. SQL Server Integration Services (SSIS) is a tool for developing extract transform–load (ETL) applications. SSIS is typically used for loading a DW, and in addition, it can use SSAS data mining models for building intelligent data flows. SQL Server Reporting Services (SSRS) is useful for presenting the results in a variety of reports. Data Quality Services (DQS) mitigate the occasional data cleansing process by maintaining a knowledge base. Master Data Services is an application that helps companies maintaining a central, authoritative source of their master data, i.e. the most important data to any organization. For an overview of the SQL Server business intelligence (BI) part of the suite that includes Database Engine, SSAS and SSRS, please refer to Veerman E., Lachev T., & Sarka D. (2009). MCTS Self-Paced Training Kit (Exam 70-448): Microsoft® SQL Server® 2008 Business Intelligence Development and Maintenance. MS Press. For an overview of the enterprise information management (EIM) part that includes SSIS, DQS and MDS, please refer to Sarka D., Lah M., & Jerkic G. (2012). Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft® SQL Server® 2012. O'Reilly. For details about SSAS data mining, please refer to MacLennan J., Tang Z., & Crivat B. (2009). Data Mining with Microsoft SQL Server 2008. Wiley. SQL Server Data Mining Add-ins for Office, a free download for Office versions 2007, 2010 and 2013, bring the power of data mining to Excel, enabling advanced analytics in Excel. Together with PowerPivot for Excel, which is also freely downloadable and can be used in Excel 2010, is already included in Excel 2013. It brings OLAP functionalities directly into Excel, making it possible for an advanced analyst to build a complete learning infrastructure using a familiar tool. This way, many more people, including employees in subsidiaries, can contribute to the learning process by examining local transactions and quickly identifying new patterns.

    Read the article

  • Extracting data from internet

    - by Ankiov Spetsnaz
    I would like to extract data from internet like www.mozenda.com does but I want to write my own program to do that. Specific data I'm looking for is various event data. Based on my research, I think custom web crawler is my answer but I Would like to confirm the answer and see if there are any suggestion to make custom web crawlers if web crawler indeed is an answer. Personally, I would prefer Java and I'm planning on using Glassfish technology if that matters...

    Read the article

  • Core Data Model Design Question - Changing "Live" Objects also Changes Saved Objects

    - by mwt
    I'm working on my first Core Data project (on iPhone) and am really liking it. Core Data is cool stuff. I am, however, running into a design difficulty that I'm not sure how to solve, although I imagine it's a fairly common situation. It concerns the data model. For the sake of clarity, I'll use an imaginary football game app as an example to illustrate my question. Say that there are NSMO's called Downs and Plays. Plays function like templates to be used by Downs. The user creates Plays (for example, Bootleg, Button Hook, Slant Route, Sweep, etc.) and fills in the various properties. Plays have a to-many relationship with Downs. For each Down, the user decides which Play to use. When the Down is executed, it uses the Play as its template. After each down is run, it is stored in history. The program remembers all the Downs ever played. So far, so good. This is all working fine. The question I have concerns what happens when the user wants to change the details of a Play. Let's say it originally involved a pass to the left, but the user now wants it to be a pass to the right. Making that change, however, not only affects all the future executions of that Play, but also changes the details of the Plays stored in history. The record of Downs gets "polluted," in effect, because the Play template has been changed. I have been rolling around several possible fixes to this situation, but I imagine the geniuses of SO know much more about how to handle this than I do. Still, the potential fixes I've come up with are: 1) "Versioning" of Plays. Each change to a Play template actually creates a new, separate Play object with the same name (as far as the user can tell). Underneath the hood, however, it is actually a different Play. This would work, AFAICT, but seems like it could potentially lead to a wild proliferation of Play objects, esp. if the user keeps switching back and forth between several versions of the same Play (creating object after object each time the user switches). Yes, the app could check for pre-existing, identical Plays, but... it just seems like a mess. 2) Have Downs, upon saving, record the details of the Play they used, but not as a Play object. This just seems ridiculous, given that the Play object is there to hold those just those details. 3) Recognize that Play objects are actually fulfilling 2 functions: one to be a template for a Down, and the other to record what template was used. These 2 functions have a different relationship with a Down. The first (template) has a to-many relationship. But the second (record) has a one-to-one relationship. This would mean creating a second object, something like "Play-Template" which would retain the to-many relationship with Downs. Play objects would get reconfigured to have a one-to-one relationship with Downs. A Down would use a Play-Template object for execution, but use the new kind of Play object to store what template was used. It is this change from a to-many relationship to a one-to-one relationship that represents the crux of the problem. Even writing this question out has helped me get clearer. I think something like solution 3 is the answer. However if anyone has a better idea or even just a confirmation that I'm on the right track, that would be helpful. (Remember, I'm not really making a football game, it's just faster/easier to use a metaphor everyone understands.) Thanks.

    Read the article

  • Data mining logs to locate a bug

    - by gooli
    I'm working on a data distribution application which receives data from a source and distributes that data to multiple target application. After successfully distributing several messages each second for 8 days, it missed a single message and did not deliver it properly to the clients. As I was looking at the logs I tried to find something there that was special for the time the miss happend - either in the data, its rate or some other condition but couldn't find anything. Is there any data mining technique I can use to identify how that specific event differs from other events?

    Read the article

  • Data mining textbook

    - by lmsasu
    If you followed a DM course, which textbook was used? I know about Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) and this poll. What did you effectively use?

    Read the article

  • Data mining google's web search results?

    - by cheesebunz
    Currently, i have a google web search. If a user searches starbucks, I would only want to retrieve the company or product information, not some other weird links like blog pages, using javascript, is it possible to do so? if yes, how am i able to do it? Kind of a newbie in the data mining part..thanks! Added my coding for download for clearer understanding : http://www.mediafire.com/?mzgo233kngm

    Read the article

  • Modifying a HTML page to fix several "bugs" add a function to next/previous on a option dropdown

    - by Dennis Sylvian
    SOF, I've got a few problems plaguing me at the moment and am wondering if anyone could assist me with them. I'm trying to get Next Class | Previous Class to act as buttons so that when Next Class is clicked it will go to the next item in the dropdown list and for previous it would go to back one. There used to be a scroll bar that allowed me to scroll the main window left and right, it's missing because (I think it was to do with the scroll left and scroll right function) The footer at the bottom doesn't show correctly on mobile devices; for some reason it appears completely differently to as it does on a computer. The "bar" practically and the Scroll Left and Scroll buttons don't appear at all on mobile devices. The scroll left button is unable to be clicked for some reason, I'm unsure what I've done wrong. Refreshing the page resets the horizontal scroll position to far left (I'm pretty sure this relates to the scroll bar) I want to also find a way so that on mobile devices the the header will not show the placeholder image, however I can't work out what CSS media tag(s) I should be using. Latest: http://jsfiddle.net/pwv7u/ Smaller HTML <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>DATA DATA DATA DATA DATA DATA DATA DATA</title> <style type="text/css"> <!-- @import url("nstyle.css"); --> </style> <script src="jquery.min.js" type="text/javascript"></script> <script type="text/javascript"> $(document).ready( function() { for (var i=0;i<($("table").children().length);i++){ if(readCookie(i)) $($($("table").children()[i]).children()[(readCookie(i))]).toggleClass('selected').siblings().removeClass('selected'); } $("tr").click(function(){ $(this).toggleClass('selected').siblings().removeClass('selected'); if(readCookie($(this).parent().index())){ if(readCookie($(this).parent().index())==$(this).index()) eraseCookie($(this).parent().index()); else{ eraseCookie($(this).parent().index()); createCookie($(this).parent().index(),$(this).index(),1); } } else createCookie($(this).parent().index(),$(this).index(),1); }); // gather CLASS info var selector = $('.class-selector').on('change', function(){ var id = this.value; if (id!==''){ scrollToAnchor(id); } }); $('a[id^="CLASS"]').each(function(){ var id = this.id, option = $('<option>',{ value: this.id, text:this.id }); selector.append(option); }); function scrollToAnchor(aid) { var aTag = $("a[id='" + aid + "']"); $('html,body').animate({ scrollTop: aTag.offset().top - 80 }, 1); } $("a.TOPJS").click(function () { scrollToAnchor('TOP'); }); $("a.KEYJS").click(function () { scrollToAnchor('KEY'); }); $("a.def").click(function () { $('#container').animate({ "scrollLeft": "-=204" }, 200); }); $("a.abc").click(function () { $("#container").animate({ "scrollLeft": "+=204" }, 200); }); function createCookie(name,value,days) { var expires; if (days) { var date = new Date(); date.setMilliseconds(0); date.setSeconds(0); date.setMinutes(0); date.setHours(0); date.setDate(date.getDate()+days); expires = "; expires="+date.toGMTString(); } else expires = ""; document.cookie = name+"="+value+expires+"; path=/"; } function readCookie(name) { var nameEQ = name + "="; var ca = document.cookie.split(';'); for(var i=0;i < ca.length;i++) { var c = ca[i]; while (c.charAt(0)==' ') c = c.substring(1,c.length); if (c.indexOf(nameEQ) === 0) return c.substring(nameEQ.length,c.length); } return null; } function eraseCookie(name) { createCookie(name,"",-1); } }); </script> </head> <body> <div id="header_container"> <div id="header"> <a href="http://site.x/" target="_blank"><img src="http://placehold.it/300x80"></a> <select class="class-selector"> <option value="">-select class-</option> </select> <div class="classcycler"> <a href="#TOP"><font color=#EFEFEF>Next Class</font></a> <font color=red>|</font> <a href="#TOP"><font color=#EFEFEF>Previous Class</font></a> </div> <div id="header1"> Semi-Transparent Image <a href="#TOP"><font color=#EFEFEF>Up to Top</font></a> | <a href="#KEY"><font color=#EFEFEF>Down to Key</font></a> </div> </div> </div> <a id="TOP"></a> <div id="container"> <table id="gradient-style"> <tbody> <thead> <tr> <th scope="col"><a id="CLASS1"></a>Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class<br>Test 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class Data 1</th> <th scope="col">Class 1<br>Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1<br>Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1</th> <th scope="col">Class 1 Class 1</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> (data text)</th> <th scope="col">title text</th> <th scope="col">text</th> <th scope="col">text</th> <th scope="col">title text</th> <th scope="col">title text</th> </tr> </thead> <tr class="ft3"><td>testing data</td><td>testing data</td><td>test</td><td>class b</td><td>test4</td><td><div align="left">data</div></td><td><div align="left"> </div></td><td><div align="left"></div></td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>test</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><tr> <tr class="f3"><td>test</td><td>test</td><td>test</td><td>class a</td><td>test2</td><td><div align="left"> </div></td><td><div align="left"></div></td><td><div align="left"></div></td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>test</td><tr> <thead> <tr> <th scope="col"><a id="CLASS2"></a>Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class<br>Test 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class Data 2</th> <th scope="col">Class 2<br>Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2<br>Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2</th> <th scope="col">Class 2 Class 2</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> data text</th> <th scope="col">title text<br> (data text)</th> <th scope="col">title text</th> <th scope="col">text</th> <th scope="col">text</th> <th scope="col">title text</th> <th scope="col">title text</th> </tr> </thead> <tr class="ft3"><td>testing data</td><td>testing data</td><td>test</td><td>class f</td><td>test2</td><td><div align="left">data</div></td><td><div align="left"></div></td><td><div align="left">data</div></td><td>test</td><td>test</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><tr> <tr><td>test</td><td>testing data</td><td>test</td><td>class f</td><td>test4</td><td><div align="left">data</div></td><td><div align="left"></div></td><td><div align="left"></div></td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><tr> <tr class="f3"><td>test</td><td>testing data</td><td>testing data</td><td>class d</td><td>test5</td><td><div align="left">data</div></td><td><div align="left"> </div></td><td><div align="left">data</div></td><td>test</td><td>test</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><tr> <tr><td>testing data</td><td>test</td><td>test</td><td>class f</td><td>test5</td><td><div align="left"></div></td><td><div align="left"></div></td><td><div align="left">data</div></td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>test</td><td>test</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>test</td><td>testing data</td><tr> <tr class="f2"><td>test</td><td>test</td><td>testing data</td><td>class a</td><td>test1</td><td><div align="left">data</div></td><td><div align="left"> </div></td><td><div align="left">data</div></td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>test</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>testing data</td><td>test</td><td>testing data</td><td>testing data</td><td>test</td><tr> </tbody> <tfoot> <tr> <th class="alt" colspan="34" scope="col"><a id="KEY"></a><img src="http://placehold.it/300x50"></th> </tr> <tr> <td colspan="34"><em><b>DATA DATA</b> - DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA </em></td> </tr> <tr> <td class="alt" colspan="34"><em><b>DAT DATA</b> - DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA </em></td> </tr> </tfoot> </table> </div> <div id="footer_container"> <div id="footer"> <a href="http://site.x/" target="_blank"><img src="http://placehold.it/300x80"></a> <div class="footleft"> <a class="def" href="javascript: void(0);"><font color="#EFEFEF">Scroll Left</font></a> </div> <div id="footer1"> <font color="darkblue">Semi-Transparent Image</font> <i>Copyright &copy; 2013 <a href="http://site.x/" target="_blank" style="text-decoration: none"><font color=#ADD8E6>site</font></a>.</i> </div> <div id="footer2"> <i>All Rights Reserved.</i> </div> <div class="footright"> <a class="abc" href="javascript: void(0);"><font color="#EFEFEF">Scroll Right</font></a> </div> </div> </div> </body> </html> CSS gradient-style * { white-space: nowrap; } #header .class-selector { top: 10px; left: 20px; position: fixed; } #header .classcycler { top: 45px; left: 20px; position: fixed; font-size:20px; } body { line-height: 1.6em; background-color: #535353; overflow-x: scroll; } #gradient-style { font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif; font-size: 12px; margin: 0px; width: 100%; text-align: center; border-collapse: collapse; } #gradient-style th { font-size: 13px; font-weight: normal; line-height:250%; padding-left: 5px; padding-right: 5px; background: #535353 url('table-images/gradhead.png') repeat-x; border-top: 1px solid #fff; border-bottom: 1px solid #fff; color: #ffffff; } #gradient-style th.alt { font-family: "Times New Roman", Serif; text-align: left; padding: 10px; font-size: 26px; } #gradient-style td { padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #fff; border-left: 1px solid #fff; border-right: 1px solid #fff; color: #00000; border-top: 1px solid #fff; background: #FFF url('table-images/gradback.png') repeat-x; } #gradient-style tr.ft3 td { color: #00000; background: #99cde7 url('table-images/gradoverallstudent.png') repeat-x; font-weight: bold; } #gradient-style tr.f1 td { color: #00000; background: #99cde7 url('table-images/gradbeststudent.png') repeat-x; } #gradient-style tr.f2 td { color: #00000; background: #b7e2b6 url('table-images/gradmostattentedstudent.png') repeat-x; } #gradient-style tr.f3 td { color: #00000; background: #a9cd6c url('table-images/gradleastlatestudtent.png') repeat-x; } #gradient-style tfoot tr td { background: #6FA275; font-size: 12px; color: #000; padding: 10; text-align: left; } #gradient-style tbody tr:hover td, #gradient-style tbody tr.selected td { background: #d0dafd url('table-images/gradhover.png') repeat-x; color: #339; } body { margin: 0; padding: 0; } #header_container { background: #000000 url('table-images/gradhead.png') repeat-x; border: 0px solid #666; height: 80px; left: 0; position: fixed; width: 100%; top: 0; } #header { position: relative; margin: 0 auto; width: 500px; height: 100%; text-align: center; color: #0c0aad; } #header1 { position: absolute; width: 125%; top: 50px; } #container { margin: 0 auto; overflow: auto; padding: 80px 0; width: 100%; } #content { } #footer_container { background: #000000 url('table-images/gradhead.png') repeat-x; border: 0px solid #666; bottom: 0; height: 95px; left: 0; position: fixed; width: 100%; } #footer { position: relative; margin: 0 auto; height: 100%; text-align: center; color: #FFF; } #footer1 { position: absolute; width: 103%; top: 50px; } #footer2 { position: absolute; width: 110%; top: 70px; } #footer .footleft { top: 45px; left: 2%; position: absolute; font-size:20px; } #footer .footright { top: 45px; right: 2%; position: absolute; font-size:20px; }

    Read the article

  • Shrinking TCP Window Size to 0 on Cisco ASA

    - by Brent
    Having an issue with any large file transfer that crosses our Cisco ASA unit come to an eventual pause. Setup Test1: Server A, FileZilla Client <- 1GBPS - Cisco ASA <- 1 GBPS - Server B, FileZilla Server TCP Window size on large transfers will drop to 0 after around 30 seconds of a large file transfer. RDP session then becomes unresponsive for a minute or two and then is sporadic. After a minute or two, the FTP transfer resumes, but at 1-2 MB/s. When the FTP transfer is over, the responsiveness of the RDP session returns to normal. Test2: Server C in same network as Server B, FileZilla Client <- local network - Server B, FileZilla Server File will transfer at 30+ MB/s. Details ASA: 5520 running 8.3(1) with ASDM 6.3(1) Windows: Server 2003 R2 SP2 with latest patches Server: VMs running on HP C3000 blade chasis FileZilla: 3.3.5.1, latest stable build Transfer: 20 GB SQL .BAK file Protocol: Active FTP over tcp/20, tcp/21 Switches: Cisco Small Business 2048 Gigabit running latest 2.0.0.8 VMware: 4.1 HP: Flex-10 3.15, latest version Notes All servers are VMs. Thoughts Pretty sure the ASA is at fault since a transfer between VMs on the same network will not show a shrinking Window size. Our ASA is pretty vanilla. No major changes made to any of the settings. It has a bunch of NAT and ACLs. Wireshark Sample No. Time Source Destination Protocol Info 234905 73.916986 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131981791 Win=65535 Len=0 234906 73.917220 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234907 73.917224 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234908 73.917231 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131984551 Win=64155 Len=0 234909 73.917463 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234910 73.917467 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234911 73.917469 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234912 73.917476 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131988691 Win=60015 Len=0 234913 73.917706 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234914 73.917710 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234915 73.917715 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131991451 Win=57255 Len=0 234916 73.917949 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234917 73.917953 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234918 73.917958 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131994211 Win=54495 Len=0 234919 73.918193 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234920 73.918197 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234921 73.918202 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131996971 Win=51735 Len=0 234922 73.918435 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234923 73.918440 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234924 73.918445 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131999731 Win=48975 Len=0 234925 73.918679 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234926 73.918684 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234927 73.918689 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132002491 Win=46215 Len=0 234928 73.918922 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234929 73.918927 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234930 73.918932 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132005251 Win=43455 Len=0 234931 73.919165 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234932 73.919169 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234933 73.919174 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132008011 Win=40695 Len=0 234934 73.919408 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234935 73.919413 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234936 73.919418 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132010771 Win=37935 Len=0 234937 73.919652 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234938 73.919656 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234939 73.919661 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132013531 Win=35175 Len=0 234940 73.919895 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234941 73.919899 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234942 73.919904 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132016291 Win=32415 Len=0 234943 73.920138 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234944 73.920142 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234945 73.920147 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132019051 Win=29655 Len=0 234946 73.920381 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234947 73.920386 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234948 73.920391 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132021811 Win=26895 Len=0 234949 73.920625 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234950 73.920629 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234951 73.920632 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234952 73.920638 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132025951 Win=22755 Len=0 234953 73.920868 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234954 73.920871 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234955 73.920876 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132028711 Win=19995 Len=0 234956 73.921111 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234957 73.921115 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234958 73.921120 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132031471 Win=17235 Len=0 234959 73.921356 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234960 73.921362 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234961 73.921370 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132034231 Win=14475 Len=0 234962 73.921598 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234963 73.921606 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234964 73.921613 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132036991 Win=11715 Len=0 234965 73.921841 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234966 73.921848 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234967 73.921855 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132039751 Win=8955 Len=0 234968 73.922085 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234969 73.922092 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234970 73.922099 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132042511 Win=6195 Len=0 234971 73.922328 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234972 73.922335 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234973 73.922342 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132045271 Win=3435 Len=0 234974 73.922571 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234975 73.922579 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234976 73.922586 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132048031 Win=675 Len=0 234981 75.866453 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 675 bytes 234985 76.020168 1.1.1.1 2.2.2.2 TCP [TCP ZeroWindow] ftp-data ivecon-port [ACK] Seq=1 Ack=132048706 Win=0 Len=0 234989 76.771633 2.2.2.2 1.1.1.1 TCP [TCP ZeroWindowProbe] ivecon-port ftp-data [ACK] Seq=132048706 Ack=1 Win=65535 Len=1 234990 76.771648 1.1.1.1 2.2.2.2 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] ftp-data ivecon-port [ACK] Seq=1 Ack=132048706 Win=0 Len=0 234997 78.279701 2.2.2.2 1.1.1.1 TCP [TCP ZeroWindowProbe] ivecon-port ftp-data [ACK] Seq=132048706 Ack=1 Win=65535 Len=1 234998 78.279714 1.1.1.1 2.2.2.2 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] ftp-data ivecon-port [ACK] Seq=1 Ack=132048706 Win=0 Len=0

    Read the article

  • Shrinking Windows Size to 0 on Cisco ASA

    - by Brent
    Having an issue with any large file transfer that crosses our Cisco ASA unit come to an eventual pause. Setup Test1: Server A, FileZilla Client <- 1GBPS - Cisco ASA <- 1 GBPS - Server B, FileZilla Server TCP Window size on large transfers will drop to 0 after around 30 seconds of a large file transfer. RDP session then becomes unresponsive for a minute or two and then is sporadic. After a minute or two, the FTP transfer resumes, but at 1-2 MB/s. When the FTP transfer is over, the responsiveness of the RDP session returns to normal. Test2: Server C in same network as Server B, FileZilla Client <- local network - Server B, FileZilla Server File will transfer at 30+ MB/s. Details ASA: 5520 running 8.3(1) with ASDM 6.3(1) Windows: Server 2003 R2 SP2 with latest patches Server: VMs running on HP C3000 blade chasis FileZilla: 3.3.5.1, latest stable build Transfer: 20 GB SQL .BAK file Protocol: Active FTP over tcp/20, tcp/21 Switches: Cisco Small Business 2048 Gigabit running latest 2.0.0.8 VMware: 4.1 HP: Flex-10 3.15, latest version Notes All servers are VMs. Thoughts Pretty sure the ASA is at fault since a transfer between VMs on the same network will not show a shrinking Window size. Our ASA is pretty vanilla. No major changes made to any of the settings. It has a bunch of NAT and ACLs. Wireshark Sample No. Time Source Destination Protocol Info 234905 73.916986 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131981791 Win=65535 Len=0 234906 73.917220 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234907 73.917224 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234908 73.917231 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131984551 Win=64155 Len=0 234909 73.917463 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234910 73.917467 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234911 73.917469 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234912 73.917476 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131988691 Win=60015 Len=0 234913 73.917706 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234914 73.917710 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234915 73.917715 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131991451 Win=57255 Len=0 234916 73.917949 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234917 73.917953 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234918 73.917958 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131994211 Win=54495 Len=0 234919 73.918193 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234920 73.918197 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234921 73.918202 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131996971 Win=51735 Len=0 234922 73.918435 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234923 73.918440 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234924 73.918445 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=131999731 Win=48975 Len=0 234925 73.918679 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234926 73.918684 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234927 73.918689 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132002491 Win=46215 Len=0 234928 73.918922 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234929 73.918927 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234930 73.918932 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132005251 Win=43455 Len=0 234931 73.919165 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234932 73.919169 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234933 73.919174 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132008011 Win=40695 Len=0 234934 73.919408 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234935 73.919413 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234936 73.919418 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132010771 Win=37935 Len=0 234937 73.919652 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234938 73.919656 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234939 73.919661 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132013531 Win=35175 Len=0 234940 73.919895 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234941 73.919899 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234942 73.919904 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132016291 Win=32415 Len=0 234943 73.920138 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234944 73.920142 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234945 73.920147 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132019051 Win=29655 Len=0 234946 73.920381 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234947 73.920386 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234948 73.920391 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132021811 Win=26895 Len=0 234949 73.920625 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234950 73.920629 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234951 73.920632 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234952 73.920638 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132025951 Win=22755 Len=0 234953 73.920868 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234954 73.920871 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234955 73.920876 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132028711 Win=19995 Len=0 234956 73.921111 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234957 73.921115 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234958 73.921120 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132031471 Win=17235 Len=0 234959 73.921356 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234960 73.921362 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234961 73.921370 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132034231 Win=14475 Len=0 234962 73.921598 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234963 73.921606 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234964 73.921613 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132036991 Win=11715 Len=0 234965 73.921841 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234966 73.921848 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234967 73.921855 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132039751 Win=8955 Len=0 234968 73.922085 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234969 73.922092 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234970 73.922099 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132042511 Win=6195 Len=0 234971 73.922328 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234972 73.922335 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234973 73.922342 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132045271 Win=3435 Len=0 234974 73.922571 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234975 73.922579 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 1380 bytes 234976 73.922586 1.1.1.1 2.2.2.2 TCP ftp-data ivecon-port [ACK] Seq=1 Ack=132048031 Win=675 Len=0 234981 75.866453 2.2.2.2 1.1.1.1 FTP-DATA FTP Data: 675 bytes 234985 76.020168 1.1.1.1 2.2.2.2 TCP [TCP ZeroWindow] ftp-data ivecon-port [ACK] Seq=1 Ack=132048706 Win=0 Len=0 234989 76.771633 2.2.2.2 1.1.1.1 TCP [TCP ZeroWindowProbe] ivecon-port ftp-data [ACK] Seq=132048706 Ack=1 Win=65535 Len=1 234990 76.771648 1.1.1.1 2.2.2.2 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] ftp-data ivecon-port [ACK] Seq=1 Ack=132048706 Win=0 Len=0 234997 78.279701 2.2.2.2 1.1.1.1 TCP [TCP ZeroWindowProbe] ivecon-port ftp-data [ACK] Seq=132048706 Ack=1 Win=65535 Len=1 234998 78.279714 1.1.1.1 2.2.2.2 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] ftp-data ivecon-port [ACK] Seq=1 Ack=132048706 Win=0 Len=0

    Read the article

  • Big Data Appliance X4-2 Release Announcement

    - by Jean-Pierre Dijcks
    Today we are announcing the release of the 3rd generation Big Data Appliance. Read the Press Release here. Software Focus The focus for this 3rd generation of Big Data Appliance is: Comprehensive and Open - Big Data Appliance now includes all Cloudera Software, including Back-up and Disaster Recovery (BDR), Search, Impala, Navigator as well as the previously included components (like CDH, HBase and Cloudera Manager) and Oracle NoSQL Database (CE or EE). Lower TCO then DIY Hadoop Systems Simplified Operations while providing an open platform for the organization Comprehensive security including the new Audit Vault and Database Firewall software, Apache Sentry and Kerberos configured out-of-the-box Hardware Update A good place to start is to quickly review the hardware differences (no price changes!). On a per node basis the following is a comparison between old and new (X3-2) hardware: Big Data Appliance X3-2 Big Data Appliance X4-2 CPU 2 x 8-Core Intel® Xeon® E5-2660 (2.2 GHz) 2 x 8-Core Intel® Xeon® E5-2650 V2 (2.6 GHz) Memory 64GB 64GB Disk 12 x 3TB High Capacity SAS 12 x 4TB High Capacity SAS InfiniBand 40Gb/sec 40Gb/sec Ethernet 10Gb/sec 10Gb/sec For all the details on the environmentals and other useful information, review the data sheet for Big Data Appliance X4-2. The larger disks give BDA X4-2 33% more capacity over the previous generation while adding faster CPUs. Memory for BDA is expandable to 512 GB per node and can be done on a per-node basis, for example for NameNodes or for HBase region servers, or for NoSQL Database nodes. Software Details More details in terms of software and the current versions (note BDA follows a three monthly update cycle for Cloudera and other software): Big Data Appliance 2.2 Software Stack Big Data Appliance 2.3 Software Stack Linux Oracle Linux 5.8 with UEK 1 Oracle Linux 6.4 with UEK 2 JDK JDK 6 JDK 7 Cloudera CDH CDH 4.3 CDH 4.4 Cloudera Manager CM 4.6 CM 4.7 And like we said at the beginning it is important to understand that all other Cloudera components are now included in the price of Oracle Big Data Appliance. They are fully supported by Oracle and available for all BDA customers. For more information: Big Data Appliance Data Sheet Big Data Connectors Data Sheet Oracle NoSQL Database Data Sheet (CE | EE) Oracle Advanced Analytics Data Sheet

    Read the article

  • Review: Data Modeling 101

    I just recently read “Data Modeling 101”by Scott W. Ambler where he gave an overview of fundamental data modeling skills. I think this article was excellent for anyone who was just starting to learn or refresh their skills in regards to the modeling of data.  Scott defines data modeling as the act of exploring data oriented structures.  He goes on to explain about how data models are actually used by defining three different types of models. Types of Data Models Conceptual Data Model  Logical Data Model (LDMs) Physical Data Model(PDMs) He further expands on modeling by exploring common data modeling notations because there are no industry standards for the practice of data modeling. Scott then defines how to actually model data by expanding on entities, attributes, identities, and relationships which are the basic building blocks of data models. In addition he discusses the value of normalization for redundancy and demoralization for performance. Finally, he discuss ways in which Developers and DBAs can become better data modelers through the use of practice, and seeking guidance from more experienced data modelers.

    Read the article

  • HTML5 data-* (custom data attribute)

    - by Renso
    Goal: Store custom data with the data attribute on any DOM element and retrieve it. Previously under HTML4 we used to use classes to store custom data, something to the affect of <input class="account void limit-5000 over-4999" /> and then have to parse the data out of the class In a book published by Peter-Paul Koch in 2007, ppk on JavaScript, he explains why and how to use custom attributes to make data more accessible to JavaScript, using name-value pairs. Accessing a custom attribute account-limit=5000 is much easier and more intuitive than trying to parse it out of a class, Plus, what if the class name for example "color-5" has a representative class definition in a CSS stylesheet that hides it away or worse some JavaScript plugin that automatically adds 5000 to it, or something crazy like that, just because it is a valid class name. As you can see there are quite a few reasons why using classes is a bad design and why it was important to define custom data attributes in HTML5. Syntax: You define the data attribute by simply prefixing any data item you want to store with any HTML element with "data-". For example to store our customers account data with a hidden input element: <input type="hidden" data-account="void" data-limit=5000 data-over=4999  /> How to access the data: account  -     element.dataset.account limit    -     element.dataset.limit You can also access it by using the more traditional get/setAttribute method or if using jQuery $('#element').attr('data-account','void') Browser support: All except for IE. There is an IE hack around this at http://gist.github.com/362081. Special Note: Be AWARE, do not use upper-case when defining your data elements as it is all converted to lower-case when reading it, so: data-myAccount="A1234" will not be found when you read it with: element.dataset.myAccount Use only lowercase when reading so this will work: element.dataset.myaccount

    Read the article

  • Master Data Management – A Foundation for Big Data Analysis

    - by Manouj Tahiliani
    While Master Data Management has crossed the proverbial chasm and is on its way to becoming mainstream, businesses are being hammered by a new megatrend called Big Data. Big Data is characterized by massive volumes, its high frequency, the variety of less structured data sources such as email, sensors, smart meters, social networks, and Weblogs, and the need to analyze vast amounts of data to determine value to improve upon management decisions. Businesses that have embraced MDM to get a single, enriched and unified view of Master data by resolving semantic discrepancies and augmenting the explicit master data information from within the enterprise with implicit data from outside the enterprise like social profiles will have a leg up in embracing Big Data solutions. This is especially true for large and medium-sized businesses in industries like Retail, Communications, Financial Services, etc that would find it very challenging to get comprehensive analytical coverage and derive long-term success without resolving the limitations of the heterogeneous topology that leads to disparate, fragmented and incomplete master data. For analytical success from Big Data or in other words ROI from Big Data Investments, businesses need to acquire, organize and analyze the deluge of data to make better decisions. There will need to be a coexistence of structured and unstructured data and to maintain a tight link between the two to extract maximum insights. MDM is the catalyst that helps maintain that tight linkage by providing an understanding about the identity, characteristics of Persons, Companies, Products, Suppliers, etc. associated with the Big Data and thereby help accelerate ROI. In my next post I will discuss about patterns for co-existing Big Data Solutions and MDM. Feel free to provide comments and thoughts on above as well as Integration or Architectural patterns.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >