Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.

Page 41/172 | < Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48  | Next Page >

  • USB drive errors after airport scan

    - by bobobobo
    Well, I just got a new PNY usb drive and it passed through an airport scanner yesterday. For some reason, I wrote to it and then tried to read from it today, and it gave me a corrupted error! chkdsk reports errors like: Bad links in lost chain at cluster 1179 corrected. Lost chain cross-linked at cluster 1200. Orphan truncated. Lost chain cross-linked at cluster 1228. Orphan truncated. Lost chain cross-linked at cluster 1236. Orphan truncated. Lost chain cross-linked at cluster 1237. Orphan truncated. Lost chain cross-linked at cluster 1244. Orphan truncated. Lost chain cross-linked at cluster 1250. Orphan truncated. Lost chain cross-linked at cluster 1266. Orphan truncated. Lost chain cross-linked at cluster 1278. Orphan truncated. etc. What is this from? Could it possibly be from the airport scanner, or is it likely a defective USB chip? How can I check the chip to see if I should just return/throw it away or continue to use it?

    Read the article

  • How can I cluster short messages [Tweets] based on topic ? [Topic Based Clustering]

    - by Jagira
    Hello, I am planning an application which will make clusters of short messages/tweets based on topics. The number of topics will be limited like Sports [ NBA, NFL, Cricket, Soccer ], Entertainment [ movies, music ] and so on... I can think of two approaches to this Ask for users to tag questions like Stackoverflow does. Users can select tags from a predefined list of tags. Then on server side I will cluster them on based of tags. Pros:- Simple design. Less complexity in code. Cons:- Choices for users will be restricted. Clusters will not be dynamic. If a new event occurs, the predefined tags will miss it. Take the message, delete the stopwords [ predefined in a dictionary ] and apply some clustering algorithm to make a cluster and depending on its popularity, display the cluster. The cluster will be maintained according to its sustained popularity. New messages will be skimmed and assigned to corresponding clusters. Pros:- Dynamic clustering based on the popularity of the event/accident. Cons:- Increased complexity. More server resources required. I would like to know whether there are any other approaches to this problem. Or are there any ways of improving the above mentioned methods? Also suggest some good clustering algorithms.I think "K-Nearest Clustering" algorithm is apt for this situation.

    Read the article

  • Can ASM method-visitors be used with interfaces?

    - by Olaf Mertens
    I need to write a tool that lists the classes that call methods of specified interfaces. It will be used as part of the build process of a large java application consisting of many modules. The goal is to automatically document the dependencies between certain java modules. I found several tools for dependency analysis, but they don't work on the method level, just for packages or jars. Finally I found ASM, that seems to do what I need. The following code prints the method dependencies of all class files in a given directory: import java.io.*; import java.util.*; import org.objectweb.asm.ClassReader; public class Test { public static void main(String[] args) throws Exception { File dir = new File(args[0]); List<File> classFiles = new LinkedList<File>(); findClassFiles(classFiles, dir); for (File classFile : classFiles) { InputStream input = new FileInputStream(classFile); new ClassReader(input).accept(new MyClassVisitor(), 0); input.close(); } } private static void findClassFiles(List<File> list, File dir) { for (File file : dir.listFiles()) { if (file.isDirectory()) { findClassFiles(list, file); } else if (file.getName().endsWith(".class")) { list.add(file); } } } } import org.objectweb.asm.MethodVisitor; import org.objectweb.asm.commons.EmptyVisitor; public class MyClassVisitor extends EmptyVisitor { private String className; @Override public void visit(int version, int access, String name, String signature, String superName, String[] interfaces) { this.className = name; } @Override public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) { System.out.println(className + "." + name); return new MyMethodVisitor(); } } import org.objectweb.asm.commons.EmptyVisitor; public class MyMethodVisitor extends EmptyVisitor { @Override public void visitMethodInsn(int opcode, String owner, String name, String desc) { String key = owner + "." + name; System.out.println(" " + key); } } The Problem: The code works for regular classes only! If the class file contains an interface, visitMethod is called, but not visitMethodInsn. I don't get any info about the callers of interface methods. Any ideas?

    Read the article

  • Clustering for Mere Mortals (Pt2)

    - by Geoff N. Hiten
    Planning. I could stop there and let that be the entirety post #2 in this series.  Planning is the single most important element in building a cluster and the Laptop Demo Cluster is no exception.  One of the more awkward parts of actually creating a cluster is coordinating information between Windows Clustering and SQL Clustering.  The dialog boxes show up hours apart, but still have to have matching and consistent information. Excel seems to be a good tool for tracking these settings.  My workbook has four pages: Systems, Storage, Network, and Service Accounts.  The systems page looks like this:   Name Role Software Location East Physical Cluster Node 1 Windows Server 2008 R2 Enterprise Laptop VM West Physical Cluster Node 2 Windows Server 2008 R2 Enterprise Laptop VM North Physical Cluster Node 3 (Future Reserved) Windows Server 2008 R2 Enterprise Laptop VM MicroCluster Cluster Management Interface N/A Laptop VM SQL01 High-Performance High-Security Instance SQL Server 2008 Enterprise Edition x64 SP1 Laptop VM SQL02 High-Performance Standard-Security Instance SQL Server 2008 Enterprise Edition x64 SP1 Laptop VM SQL03 Standard-Performance High-Security Instance SQL Server 2008 Enterprise Edition x64 SP1 Laptop VM Note that everything that has a computer name is listed here, whether physical or virtual. Storage looks like this: Storage Name Instance Purpose Volume Path Size (GB) LUN ID Speed Quorum MicroCluster Cluster Quorum Quorum Q: 2     SQL01Anchor SQL01 Instance Anchor SQL01Anchor L: 2     SQL02Anchor SQL02 Instance Anchor SQL02Anchor M: 2     SQL01Data1 SQL01 SQL Data SQL01Data1 L:\MountPoints\SQL01Data1 2     SQL02Data1 SQL02 SQL Data SQL02Data1 M:\MountPoints\SQL02Data1       Starting at the left is the name used in the storage array.  It is important to rename resources at each level, whether it is Storage, LUN, Volume, or disk folder.  Otherwise, troubleshooting things gets complex and difficult.  You want to be able to glance at a resource at any level and see where it comes from and what it is connected to. Networking is the same way:   System Network VLAN  IP Subnet Mask Gateway DNS1 DNS2 East Public Cluster1 10.97.230.x(DHCP) 255.255.255.0 10.97.230.1 10.97.230.1 10.97.230.1 East Heartbeat Cluster2   255.255.255.0       West Public Cluster1 10.97.230.x(DHCP) 255.255.255.0 10.97.230.1 10.97.230.1 10.97.230.1 West Heartbeat Cluster2   255.255.255.0       North Public Cluster1 10.97.230.x(DHCP) 255.255.255.0 10.97.230.1 10.97.230.1 10.97.230.1 North Heartbeat Cluster2   255.255.255.0       SQL01 Public Cluster1 10.97.230.x(DHCP) 255.255.255.0       SQL02 Public Cluster1 10.97.230.x(DHCP) 255.255.255.0       One hallmark of a poorly planned and implemented cluster is a bunch of "Local Network Connection #n" entries in the network settings page.  That lets me know that somebody didn't care about the long-term supportabaility of the cluster.  This can be critically important with Hyper-V Clusters and their high NIC counts.  Final page:   Instance Service Name Account Password Domain OU SQL01 SQL Server SVCSQL01 Baseline22 MicroAD Service Accounts SQL01 SQL Agent SVCSQL01 Baseline22 MicroAD Service Accounts SQL02 SQL Server SVC_SQL02 Baseline22 MicroAD Service Accounts SQL02 SQL Agent SVC_SQL02 Baseline22 MicroAD Service Accounts SQL03 (Future) SQL Server SVC_SQL03 Baseline22 MicroAD Service Accounts SQL03 (Future) SQL Agent SVC_SQL03 Baseline22 MicroAD Service Accounts             Installation Account           administrator            Yes.  I write down the account information.  I secure the file via NTFS, but I don't want to fumble around looking for passwords when it comes time to rebuild a node. Always fill out the workbook COMPLETELY before installing anything.  The whole point is to have everything you need at your fingertips before you begin.  The install experience is so much better and more productive with this information in place.

    Read the article

  • Oracle Fusion Supply Chain Management (SCM) Designs May Improve End User Productivity

    - by Applications User Experience
    By Applications User Experience on March 10, 2011 Michele Molnar, Senior Usability Engineer, Applications User Experience The Challenge: The SCM User Experience team, in close collaboration with product management and strategy, completely redesigned the user experience for Oracle Fusion applications. One of the goals of this redesign was to increase end user productivity by applying design patterns and guidelines and incorporating findings from extensive usability research. But a question remained: How do we know that the Oracle Fusion designs will actually increase end user productivity? The Test: To answer this question, the SCM Usability Engineers compared Oracle Fusion designs to their corresponding existing Oracle applications using the workflow time analysis method. The workflow time analysis method breaks tasks into a sequence of operators. By applying standard time estimates for all of the operators in the task, an estimate of the overall task time can be calculated. The workflow time analysis method has been recently adopted by the Applications User Experience group for use in predicting end user productivity. Using this method, a design can be tested and refined as needed to improve productivity even before the design is coded. For the study, we selected some of our recent designs for Oracle Fusion Product Information Management (PIM). The designs encompassed tasks performed by Product Managers to create, manage, and define products for their organization. (See Figure 1 for an example.) In applying this method, the SCM Usability Engineers collaborated with Product Management to compare the new Oracle Fusion Applications designs against Oracle’s existing applications. Together, we performed the following activities: Identified the five most frequently performed tasks Created detailed task scenarios that provided the context for each task Conducted task walkthroughs Analyzed and documented the steps and flow required to complete each task Applied standard time estimates to the operators in each task to estimate the overall task completion time Figure 1. The interactions on each Oracle Fusion Product Information Management screen were documented, as indicated by the red highlighting. The task scenario and script provided the context for each task.  The Results: The workflow time analysis method predicted that the Oracle Fusion Applications designs would result in productivity gains in each task, ranging from 8% to 62%, with an overall productivity gain of 43%. All other factors being equal, the new designs should enable these tasks to be completed in about half the time it takes with existing Oracle Applications. Further analysis revealed that these performance gains would be achieved by reducing the number of clicks and screens needed to complete the tasks. Conclusions: Using the workflow time analysis method, we can expect the Oracle Fusion Applications redesign to succeed in improving end user productivity. The workflow time analysis method appears to be an effective and efficient tool for testing, refining, and retesting designs to optimize productivity. The workflow time analysis method does not replace usability testing with end users, but it can be used as an early predictor of design productivity even before designs are coded. We are planning to conduct usability tests later in the development cycle to compare actual end user data with the workflow time analysis results. Such results can potentially be used to validate the productivity improvement predictions. Used together, the workflow time analysis method and usability testing will enable us to continue creating, evaluating, and delivering Oracle Fusion designs that exceed the expectations of our end users, both in the quality of the user experience and in productivity. (For more information about studying productivity, refer to the Measuring User Productivity blog.)

    Read the article

  • UEC - Can the Cluster Controller and Storage Controller be seperate systems?

    - by Jeremy Hajek
    My department is implementing an Ubuntu Enterprise Cloud. I have done the testing and am quite comfortable with the 4 pieces, CC/SC, CLC, WS, NC. Looking at various documents below it appears the the Storage Controller and Cluster Controller (eucalyptus-sc and eucalyptus-cc) are always installed on the same system. My question is this: can I install the storage controller and the cluster controller on separate systems? http://open.eucalyptus.com/wiki/EucalyptusAdvanced_v2.0 the picture indicates that cc and sc are two different machines http://www.canonical.com/sites/default/files/active/Whitepaper-UbuntuEnterpriseCloudArchitecture-v1.pdf P.10 1st paragraph uses the word "machine(s)" http://software.intel.com/file/31966 P. 8 indicates the same separate architecture BUT... https://help.ubuntu.com/community/UEC/PackageInstallSeparate indicates below that the SC and CC are to be on the same system.

    Read the article

  • Easy way to update apache on a server cluster with shared NFS conf?

    - by Simon
    we have a server setup where a server cluster connected with a db/files/conf server shared by nfs serve our sites, behind an Elastic Load Balancer at Amazon EC2. The setup works correctly, but keeping it up to date is becoming like hell, because the apache/php conf that webservers use is shared through NFS. So, if we try to run an apt-get upgrade on a server on the cluster, it will abort it due to the webserver is not able to write back the configuration to the nfs server. Every time we want to update the machines, or install a package like php-curl, we need to create a new ami, so the changes will reflect on the new launched amis. Could it be another way of doing the things simpler? Thanks in advance!

    Read the article

  • Will the VMs on the ESXi cluster be running after disconnection from the network?

    - by John
    What will happen if I disconnect the ESXi cluster configured with HA from the switch(I need to change the power source on the main switch) and there is no management port redundancy? I'm going to disable the host monitoring and VM monitoring within HA settings, and connect to the switch after it boots up. Will the virtual machines be running if I disconnect the hosts from the network so they will not see any other ESXi host? I hope everything will work fine and the hosts will join the cluster after they connect to the network again, but I would like to be sure ..

    Read the article

  • Configure Oracle SOA JMSAdatper to Work with WLS JMS Topics

    - by fip
    The WebLogic JMS Topic are typically running in a WLS cluster. So as your SOA composites that receive these Topic messages. In some situation, the two clusters are the same while in others they are sepearate. The composites in SOA cluster are subscribers to the JMS Topic in WebLogic cluster. As nature of JMS Topic is meant to distribute the same copy of messages to all its subscribers, two questions arise immediately when it comes to load balancing the JMS Topic messages against the SOA composites: How to assure all of the SOA cluster members receive different messages instead of the same (duplicate) messages, even though the SOA cluster members are all subscribers to the Topic? How to make sure the messages are evenly distributed (load balanced) to SOA cluster members? Here we will walk through how to configure the JMS Topic, the JmsAdapter connection factory, as well as the composite so that the JMS Topic messages will be evenly distributed to same composite running off different SOA cluster nodes without causing duplication. 2. The typical configuration In this typical configuration, we achieve the load balancing of JMS Topic messages to JmsAdapters by configuring a partitioned distributed topic along with sharable subscriptions. You can reference the documentation for explanation of PDT. And this blog posting does a very good job to visually explain how this combination of configurations would message load balancing among clients of JMS Topics. Our job is to apply this configuration in the context of SOA JMS Adapters. To do so would involve the following steps: Step A. Configure JMS Topic to be UDD and PDT, at the WebLogic cluster that house the JMS Topic Step B. Configure JCA Connection Factory with proper ServerProperties at the SOA cluster Step C. Reference the JCA Connection Factory and define a durable subscriber name, at composite's JmsAdapter (or the *.jca file) Here are more details of each step: Step A. Configure JMS Topic to be UDD and PDT, You do this at the WebLogic cluster that house the JMS Topic. You can follow the instructions at Administration Console Online Help to create a Uniform Distributed Topic. If you use WebLogic Console, then at the same administration screen you can specify "Distribution Type" to be "Uniform", and the Forwarding policy to "Partitioned", which would make the JMS Topic Uniform Distributed Destination and a Partitioned Distributed Topic, respectively Step B: Configure ServerProperties of JCA Connection Factory You do this step at the SOA cluster. This step is to make the JmsAdapter that connect to the JMS Topic through this JCA Connection Factory as a certain type of "client". When you configure the JCA Connection Factory for the JmsAdapter, you define the list of properties in FactoryProperties field, in a semi colon separated list: ClientID=myClient;ClientIDPolicy=UNRESTRICTED;SubscriptionSharingPolicy=SHARABLE;TopicMessageDistributionAll=false You can refer to Chapter 8.4.10 Accessing Distributed Destinations (Queues and Topics) on the WebLogic Server JMS of the Adapter User Guide for the meaning of these properties. Please note: Except for ClientID, other properties such as the ClientIDPolicy=UNRESTRICTED, SubscriptionSharingPolicy=SHARABLE and TopicMessageDistributionAll=false are all default settings for the JmsAdapter's connection factory. Therefore you do NOT have to explicitly specify them explicitly. All you need to do is the specify the ClientID. The ClientID is different from the subscriber ID that we are to discuss in the later steps. To make it simple, you just need to remember you need to specify the client ID and make it unique per connection factory. Here is the example setting: Step C. Reference the JCA Connection Factory and define a durable subscriber name, at composite's JmsAdapter (or the *.jca file) In the following example, the value 'MySubscriberID-1' was given as the value of property 'DurableSubscriber': <adapter-config name="subscribe" adapter="JMS Adapter" wsdlLocation="subscribe.wsdl" xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata"> <connection-factory location="eis/wls/MyTestUDDTopic" UIJmsProvider="WLSJMS" UIConnectionName="ateam-hq24b"/> <endpoint-activation portType="Consume_Message_ptt" operation="Consume_Message"> <activation-spec className="oracle.tip.adapter.jms.inbound.JmsConsumeActivationSpec"> <property name="DurableSubscriber" value="MySubscriberID-1"/> <property name="PayloadType" value="TextMessage"/> <property name="UseMessageListener" value="false"/> <property name="DestinationName" value="jms/MyTestUDDTopic"/> </activation-spec> </endpoint-activation> </adapter-config> You can set the durable subscriber name either at composite's JmsAdapter wizard,or by directly editing the JmsAdapter's *.jca file within the Composite project. 2.The "atypical" configurations: For some systems, there may be restrictions that do not allow the afore mentioned "typical" configurations be applied. For examples, some deployments may be required to configure the JMS Topic to be Replicated Distributed Topic rather than Partition Distributed Topic. We would like to discuss those scenarios here: Configuration A: The JMS Topic is NOT PDT In this case, you need to define the message selector 'NOT JMS_WL_DDForwarded' in the adapter's *.jca file, to filter out those "replicated" messages. Configuration B. The ClientIDPolicy=RESTRICTED In this case, you need separate factories for different composites. More accurately, you need separate factories for different *.jca file of JmsAdapter. References: Managing Durable Subscription WebLogic JMS Partitioned Distributed Topics and Shared Subscriptions JMS Troubleshooting: Configuring JMS Message Logging: Advanced Programming with Distributed Destinations Using the JMS Destination Availability Helper API

    Read the article

  • Grontmij|Carl Bro A/S Relies on Telerik Reporting for Data Presentation and Analysis of Critical Bus

    Grontmij | Carl Bro A/S, an international company providing consultancy services in the fields of building, transportation, water, environment, energy and industry is using Telerik Reporting to save coding time and build an expandable  solution with swift performance and rich users interface. The main objective was to design and develop a web application that would provide users with an overview of construction budgets, contacts and all documents related to the properties and buildings they managed....Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • Should we exclude code for the code coverage analysis?

    - by romaintaz
    I'm working on several applications, mainly legacy ones. Currently, their code coverage is quite low: generally between 10 and 50%. Since several weeks, we have recurrent discussions with the Bangalore teams (main part of the development is made offshore in India) regarding the exclusions of packages or classes for Cobertura (our code coverage tool, even if we are currently migrating to JaCoCo). Their point of view is the following: as they will not write any unit tests on some layers of the application (1), these layers should be simply excluded from the code coverage measure. In others words, they want to limit the code coverage measure to the code that is tested or should be tested. Also, when they work on unit test for a complex class, the benefits - purely in term of code coverage - will be unnoticed due in a large application. Reducing the scope of the code coverage will make this kind of effort more visible... The interest of this approach is that we will have a code coverage measure that indicates the current status of the part of the application we consider as testable. However, my point of view is that we are somehow faking the figures. This solution is an easy way to reach higher level of code coverage without any effort. Another point that bothers me is the following: if we show a coverage increase from one week to another, how can we tell if this good news is due to the good work of the developers, or simply due to new exclusions? In addition, we will not be able to know exactly what is considered in the code coverage measure. For example, if I have a 10,000 lines of code application with 40% of code coverage, I can deduct that 40% of my code base is tested (2). But what happen if we set exclusions? If the code coverage is now 60%, what can I deduct exactly? That 60% of my "important" code base is tested? How can I As far as I am concerned, I prefer to keep the "real" code coverage value, even if we can't be cheerful about it. In addition, thanks to Sonar, we can easily navigate in our code base and know, for any module / package / class, its own code coverage. But of course, the global code coverage will still be low. What is your opinion on that subject? How do you do on your projects? Thanks. (1) These layers are generally related to the UI / Java beans, etc. (2) I know that's not true. In fact, it only means that 40% of my code base

    Read the article

  • How do I overcome paralysis by analysis when coding?

    - by LuxuryMode
    When I start a new project, I often times immediately start thinking about the details of implementation. "Where am I gonna put the DataBaseHandler? How should I use it? Should classes that want to use it extend from some Abstract superclass..? Should I an interface? What level of abstraction am I going to use in my class that contains methods for sending requests and parsing data?" I end up stalling for a long time because I want to code for extensibility and reusability. But I feel it almost impossible to get past thinking about how to implement perfectly. And then, if I try to just say "screw it, just get it done!", I hit a brick wall pretty quickly because my code isn't organized, I mixed levels of abstractions, etc. What are some techniques/methods you have for launching into a new project while also setting up a logical/modular structure that will scale well?

    Read the article

  • CPU Usage in Very Large Coherence Clusters

    - by jpurdy
    When sizing Coherence installations, one of the complicating factors is that these installations (by their very nature) tend to be application-specific, with some being large, memory-intensive caches, with others acting as I/O-intensive transaction-processing platforms, and still others performing CPU-intensive calculations across the data grid. Regardless of the primary resource requirements, Coherence sizing calculations are inherently empirical, in that there are so many permutations that a simple spreadsheet approach to sizing is rarely optimal (though it can provide a good starting estimate). So we typically recommend measuring actual resource usage (primarily CPU cycles, network bandwidth and memory) at a given load, and then extrapolating from those measurements. Of course there may be multiple types of load, and these may have varying degrees of correlation -- for example, an increased request rate may drive up the number of objects "pinned" in memory at any point, but the increase may be less than linear if those objects are naturally shared by concurrent requests. But for most reasonably-designed applications, a linear resource model will be reasonably accurate for most levels of scale. However, at extreme scale, sizing becomes a bit more complicated as certain cluster management operations -- while very infrequent -- become increasingly critical. This is because certain operations do not naturally tend to scale out. In a small cluster, sizing is primarily driven by the request rate, required cache size, or other application-driven metrics. In larger clusters (e.g. those with hundreds of cluster members), certain infrastructure tasks become intensive, in particular those related to members joining and leaving the cluster, such as introducing new cluster members to the rest of the cluster, or publishing the location of partitions during rebalancing. These tasks have a strong tendency to require all updates to be routed via a single member for the sake of cluster stability and data integrity. Fortunately that member is dynamically assigned in Coherence, so it is not a single point of failure, but it may still become a single point of bottleneck (until the cluster finishes its reconfiguration, at which point this member will have a similar load to the rest of the members). The most common cause of scaling issues in large clusters is disabling multicast (by configuring well-known addresses, aka WKA). This obviously impacts network usage, but it also has a large impact on CPU usage, primarily since the senior member must directly communicate certain messages with every other cluster member, and this communication requires significant CPU time. In particular, the need to notify the rest of the cluster about membership changes and corresponding partition reassignments adds stress to the senior member. Given that portions of the network stack may tend to be single-threaded (both in Coherence and the underlying OS), this may be even more problematic on servers with poor single-threaded performance. As a result of this, some extremely large clusters may be configured with a smaller number of partitions than ideal. This results in the size of each partition being increased. When a cache server fails, the other servers will use their fractional backups to recover the state of that server (and take over responsibility for their backed-up portion of that state). The finest granularity of this recovery is a single partition, and the single service thread can not accept new requests during this recovery. Ordinarily, recovery is practically instantaneous (it is roughly equivalent to the time required to iterate over a set of backup backing map entries and move them to the primary backing map in the same JVM). But certain factors can increase this duration drastically (to several seconds): large partitions, sufficiently slow single-threaded CPU performance, many or expensive indexes to rebuild, etc. The solution of course is to mitigate each of those factors but in many cases this may be challenging. Larger clusters also lead to the temptation to place more load on the available hardware resources, spreading CPU resources thin. As an example, while we've long been aware of how garbage collection can cause significant pauses, it usually isn't viewed as a major consumer of CPU (in terms of overall system throughput). Typically, the use of a concurrent collector allows greater responsiveness by minimizing pause times, at the cost of reducing system throughput. However, at a recent engagement, we were forced to turn off the concurrent collector and use a traditional parallel "stop the world" collector to reduce CPU usage to an acceptable level. In summary, there are some less obvious factors that may result in excessive CPU consumption in a larger cluster, so it is even more critical to test at full scale, even though allocating sufficient hardware may often be much more difficult for these large clusters.

    Read the article

  • Are there any off the shelf solutions for feature use analysis?

    - by Riviera
    I write a set of productivity tools that sells online and have tens of thousands of users. While we do get very good feedback, this tens to come from only the most vocal users, so we fear we might be missing the big picture. We would like to know if there is any off the shelf (or nearly so) solution to capture usage of different features and to report usage patterns and trends over time. Note: These tools are native apps, not web-based. I know about Google Analytics and the like. They're great, but I'm looking for native code solutions.

    Read the article

  • Is there extensible structured file analyzer, like network analysis tools?

    - by ???
    There are many network analysis tools like Wireshark, Sniffer Pro, Omnipeak which can dump the packet data in structured manner. I'm just writing my own file analyzer for general purpose, which can dump JPEG, PNG, EXE, ELF, ASN.1 DER encoded files, etc. in tree style. There are so many file formats in the world that I can't handle them all. So I'm wondering if there's some software already there, with pluggable architecture and a large established file format repository?

    Read the article

  • How to obtain the first cluster of the directory's data in FAT using C# (or at least C++) and Win32A

    - by DarkWalker
    So I have a FAT drive, lets say H: and a directory 'work' (full path 'H:\work'). I need to get the NUMBER of the first cluster of that directory. The number of the first cluster is 2-bytes value, that is stored in the 26th and 27th bytes of the folder enty (wich is 32 bytes). Lets say I am doing it with file, NOT a directory. I can use code like this: static public string GetDirectoryPtr(string dir) { IntPtr ptr = CreateFile(@"H:\Work\dover.docx", GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, IntPtr.Zero, OPEN_EXISTING, 0,//FILE_FLAG_BACKUP_SEMANTICS, IntPtr.Zero); try { const uint bytesToRead = 2; byte[] readbuffer = new byte[bytesToRead]; if (ptr.ToInt32() == -1) return String.Format("Error: cannot open direcotory {0}", dir); if (SetFilePointer(ptr, 26, 0, 0) == -1) return String.Format("Error: unable to set file pointer on file {0}", ptr); uint read = 0; // real count of read bytes if (!ReadFile(ptr, readbuffer, bytesToRead, out read, 0)) return String.Format("cant read from file {0}. Error #{1}", ptr, Marshal.GetLastWin32Error()); int result = readbuffer[0] + 16 * 16 * readbuffer[1]; return result.ToString();//ASCIIEncoding.ASCII.GetString(readbuffer); } finally { CloseHandle(ptr); } } And it will return some number, like 19 (quite real to me, this is the only file on the disk). But I DONT need a file, I need a folder. So I am puttin FILE_FLAG_BACKUP_SEMANTICS param for CreateFile call... and dont know what to do next =) msdn is very clear on this issue http://msdn.microsoft.com/en-us/library/aa365258(v=VS.85).aspx It sounds to me like: "There is no way you can get a number of the folder's first cluster". The most desperate thing is that my tutor said smth like "You are going to obtain this or you wont pass this course". The true reason why he is so sure this is possible is because for 10 years (or may be more) he recieved the folder's first cluster number as a HASH of the folder's addres (and I was stupid enough to point this to him, so now I cant do it the same way) PS: This is the most spupid task I have ever had!!! This value is not really used anythere in program, it is only fcking pointless integer.

    Read the article

  • How to find and fix performance problems in ORM powered applications

    - by FransBouma
    Once in a while we get requests about how to fix performance problems with our framework. As it comes down to following the same steps and looking into the same things every single time, I decided to write a blogpost about it instead, so more people can learn from this and solve performance problems in their O/R mapper powered applications. In some parts it's focused on LLBLGen Pro but it's also usable for other O/R mapping frameworks, as the vast majority of performance problems in O/R mapper powered applications are not specific for a certain O/R mapper framework. Too often, the developer looks at the wrong part of the application, trying to fix what isn't a problem in that part, and getting frustrated that 'things are so slow with <insert your favorite framework X here>'. I'm in the O/R mapper business for a long time now (almost 10 years, full time) and as it's a small world, we O/R mapper developers know almost all tricks to pull off by now: we all know what to do to make task ABC faster and what compromises (because there are almost always compromises) to deal with if we decide to make ABC faster that way. Some O/R mapper frameworks are faster in X, others in Y, but you can be sure the difference is mainly a result of a compromise some developers are willing to deal with and others aren't. That's why the O/R mapper frameworks on the market today are different in many ways, even though they all fetch and save entities from and to a database. I'm not suggesting there's no room for improvement in today's O/R mapper frameworks, there always is, but it's not a matter of 'the slowness of the application is caused by the O/R mapper' anymore. Perhaps query generation can be optimized a bit here, row materialization can be optimized a bit there, but it's mainly coming down to milliseconds. Still worth it if you're a framework developer, but it's not much compared to the time spend inside databases and in user code: if a complete fetch takes 40ms or 50ms (from call to entity object collection), it won't make a difference for your application as that 10ms difference won't be noticed. That's why it's very important to find the real locations of the problems so developers can fix them properly and don't get frustrated because their quest to get a fast, performing application failed. Performance tuning basics and rules Finding and fixing performance problems in any application is a strict procedure with four prescribed steps: isolate, analyze, interpret and fix, in that order. It's key that you don't skip a step nor make assumptions: these steps help you find the reason of a problem which seems to be there, and how to fix it or leave it as-is. Skipping a step, or when you assume things will be bad/slow without doing analysis will lead to the path of premature optimization and won't actually solve your problems, only create new ones. The most important rule of finding and fixing performance problems in software is that you have to understand what 'performance problem' actually means. Most developers will say "when a piece of software / code is slow, you have a performance problem". But is that actually the case? If I write a Linq query which will aggregate, group and sort 5 million rows from several tables to produce a resultset of 10 rows, it might take more than a couple of milliseconds before that resultset is ready to be consumed by other logic. If I solely look at the Linq query, the code consuming the resultset of the 10 rows and then look at the time it takes to complete the whole procedure, it will appear to me to be slow: all that time taken to produce and consume 10 rows? But if you look closer, if you analyze and interpret the situation, you'll see it does a tremendous amount of work, and in that light it might even be extremely fast. With every performance problem you encounter, always do realize that what you're trying to solve is perhaps not a technical problem at all, but a perception problem. The second most important rule you have to understand is based on the old saying "Penny wise, Pound Foolish": the part which takes e.g. 5% of the total time T for a given task isn't worth optimizing if you have another part which takes a much larger part of the total time T for that same given task. Optimizing parts which are relatively insignificant for the total time taken is not going to bring you better results overall, even if you totally optimize that part away. This is the core reason why analysis of the complete set of application parts which participate in a given task is key to being successful in solving performance problems: No analysis -> no problem -> no solution. One warning up front: hunting for performance will always include making compromises. Fast software can be made maintainable, but if you want to squeeze as much performance out of your software, you will inevitably be faced with the dilemma of compromising one or more from the group {readability, maintainability, features} for the extra performance you think you'll gain. It's then up to you to decide whether it's worth it. In almost all cases it's not. The reason for this is simple: the vast majority of performance problems can be solved by implementing the proper algorithms, the ones with proven Big O-characteristics so you know the performance you'll get plus you know the algorithm will work. The time taken by the algorithm implementing code is inevitable: you already implemented the best algorithm. You might find some optimizations on the technical level but in general these are minor. Let's look at the four steps to see how they guide us through the quest to find and fix performance problems. Isolate The first thing you need to do is to isolate the areas in your application which are assumed to be slow. For example, if your application is a web application and a given page is taking several seconds or even minutes to load, it's a good candidate to check out. It's important to start with the isolate step because it allows you to focus on a single code path per area with a clear begin and end and ignore the rest. The rest of the steps are taken per identified problematic area. Keep in mind that isolation focuses on tasks in an application, not code snippets. A task is something that's started in your application by either another task or the user, or another program, and has a beginning and an end. You can see a task as a piece of functionality offered by your application.  Analyze Once you've determined the problem areas, you have to perform analysis on the code paths of each area, to see where the performance problems occur and which areas are not the problem. This is a multi-layered effort: an application which uses an O/R mapper typically consists of multiple parts: there's likely some kind of interface (web, webservice, windows etc.), a part which controls the interface and business logic, the O/R mapper part and the RDBMS, all connected with either a network or inter-process connections provided by the OS or other means. Each of these parts, including the connectivity plumbing, eat up a part of the total time it takes to complete a task, e.g. load a webpage with all orders of a given customer X. To understand which parts participate in the task / area we're investigating and how much they contribute to the total time taken to complete the task, analysis of each participating task is essential. Start with the code you wrote which starts the task, analyze the code and track the path it follows through your application. What does the code do along the way, verify whether it's correct or not. Analyze whether you have implemented the right algorithms in your code for this particular area. Remember we're looking at one area at a time, which means we're ignoring all other code paths, just the code path of the current problematic area, from begin to end and back. Don't dig in and start optimizing at the code level just yet. We're just analyzing. If your analysis reveals big architectural stupidity, it's perhaps a good idea to rethink the architecture at this point. For the rest, we're analyzing which means we collect data about what could be wrong, for each participating part of the complete application. Reviewing the code you wrote is a good tool to get deeper understanding of what is going on for a given task but ultimately it lacks precision and overview what really happens: humans aren't good code interpreters, computers are. We therefore need to utilize tools to get deeper understanding about which parts contribute how much time to the total task, triggered by which other parts and for example how many times are they called. There are two different kind of tools which are necessary: .NET profilers and O/R mapper / RDBMS profilers. .NET profiling .NET profilers (e.g. dotTrace by JetBrains or Ants by Red Gate software) show exactly which pieces of code are called, how many times they're called, and the time it took to run that piece of code, at the method level and sometimes even at the line level. The .NET profilers are essential tools for understanding whether the time taken to complete a given task / area in your application is consumed by .NET code, where exactly in your code, the path to that code, how many times that code was called by other code and thus reveals where hotspots are located: the areas where a solution can be found. Importantly, they also reveal which areas can be left alone: remember our penny wise pound foolish saying: if a profiler reveals that a group of methods are fast, or don't contribute much to the total time taken for a given task, ignore them. Even if the code in them is perhaps complex and looks like a candidate for optimization: you can work all day on that, it won't matter.  As we're focusing on a single area of the application, it's best to start profiling right before you actually activate the task/area. Most .NET profilers support this by starting the application without starting the profiling procedure just yet. You navigate to the particular part which is slow, start profiling in the profiler, in your application you perform the actions which are considered slow, and afterwards you get a snapshot in the profiler. The snapshot contains the data collected by the profiler during the slow action, so most data is produced by code in the area to investigate. This is important, because it allows you to stay focused on a single area. O/R mapper and RDBMS profiling .NET profilers give you a good insight in the .NET side of things, but not in the RDBMS side of the application. As this article is about O/R mapper powered applications, we're also looking at databases, and the software making it possible to consume the database in your application: the O/R mapper. To understand which parts of the O/R mapper and database participate how much to the total time taken for task T, we need different tools. There are two kind of tools focusing on O/R mappers and database performance profiling: O/R mapper profilers and RDBMS profilers. For O/R mapper profilers, you can look at LLBLGen Prof by hibernating rhinos or the Linq to Sql/LLBLGen Pro profiler by Huagati. Hibernating rhinos also have profilers for other O/R mappers like NHibernate (NHProf) and Entity Framework (EFProf) and work the same as LLBLGen Prof. For RDBMS profilers, you have to look whether the RDBMS vendor has a profiler. For example for SQL Server, the profiler is shipped with SQL Server, for Oracle it's build into the RDBMS, however there are also 3rd party tools. Which tool you're using isn't really important, what's important is that you get insight in which queries are executed during the task / area we're currently focused on and how long they took. Here, the O/R mapper profilers have an advantage as they collect the time it took to execute the query from the application's perspective so they also collect the time it took to transport data across the network. This is important because a query which returns a massive resultset or a resultset with large blob/clob/ntext/image fields takes more time to get transported across the network than a small resultset and a database profiler doesn't take this into account most of the time. Another tool to use in this case, which is more low level and not all O/R mappers support it (though LLBLGen Pro and NHibernate as well do) is tracing: most O/R mappers offer some form of tracing or logging system which you can use to collect the SQL generated and executed and often also other activity behind the scenes. While tracing can produce a tremendous amount of data in some cases, it also gives insight in what's going on. Interpret After we've completed the analysis step it's time to look at the data we've collected. We've done code reviews to see whether we've done anything stupid and which parts actually take place and if the proper algorithms have been implemented. We've done .NET profiling to see which parts are choke points and how much time they contribute to the total time taken to complete the task we're investigating. We've performed O/R mapper profiling and RDBMS profiling to see which queries were executed during the task, how many queries were generated and executed and how long they took to complete, including network transportation. All this data reveals two things: which parts are big contributors to the total time taken and which parts are irrelevant. Both aspects are very important. The parts which are irrelevant (i.e. don't contribute significantly to the total time taken) can be ignored from now on, we won't look at them. The parts which contribute a lot to the total time taken are important to look at. We now have to first look at the .NET profiler results, to see whether the time taken is consumed in our own code, in .NET framework code, in the O/R mapper itself or somewhere else. For example if most of the time is consumed by DbCommand.ExecuteReader, the time it took to complete the task is depending on the time the data is fetched from the database. If there was just 1 query executed, according to tracing or O/R mapper profilers / RDBMS profilers, check whether that query is optimal, uses indexes or has to deal with a lot of data. Interpret means that you follow the path from begin to end through the data collected and determine where, along the path, the most time is contributed. It also means that you have to check whether this was expected or is totally unexpected. My previous example of the 10 row resultset of a query which groups millions of rows will likely reveal that a long time is spend inside the database and almost no time is spend in the .NET code, meaning the RDBMS part contributes the most to the total time taken, the rest is compared to that time, irrelevant. Considering the vastness of the source data set, it's expected this will take some time. However, does it need tweaking? Perhaps all possible tweaks are already in place. In the interpret step you then have to decide that further action in this area is necessary or not, based on what the analysis results show: if the analysis results were unexpected and in the area where the most time is contributed to the total time taken is room for improvement, action should be taken. If not, you can only accept the situation and move on. In all cases, document your decision together with the analysis you've done. If you decide that the perceived performance problem is actually expected due to the nature of the task performed, it's essential that in the future when someone else looks at the application and starts asking questions you can answer them properly and new analysis is only necessary if situations changed. Fix After interpreting the analysis results you've concluded that some areas need adjustment. This is the fix step: you're actively correcting the performance problem with proper action targeted at the real cause. In many cases related to O/R mapper powered applications it means you'll use different features of the O/R mapper to achieve the same goal, or apply optimizations at the RDBMS level. It could also mean you apply caching inside your application (compromise memory consumption over performance) to avoid unnecessary re-querying data and re-consuming the results. After applying a change, it's key you re-do the analysis and interpretation steps: compare the results and expectations with what you had before, to see whether your actions had any effect or whether it moved the problem to a different part of the application. Don't fall into the trap to do partly analysis: do the full analysis again: .NET profiling and O/R mapper / RDBMS profiling. It might very well be that the changes you've made make one part faster but another part significantly slower, in such a way that the overall problem hasn't changed at all. Performance tuning is dealing with compromises and making choices: to use one feature over the other, to accept a higher memory footprint, to go away from the strict-OO path and execute queries directly onto the RDBMS, these are choices and compromises which will cross your path if you want to fix performance problems with respect to O/R mappers or data-access and databases in general. In most cases it's not a big issue: alternatives are often good choices too and the compromises aren't that hard to deal with. What is important is that you document why you made a choice, a compromise: which analysis data, which interpretation led you to the choice made. This is key for good maintainability in the years to come. Most common performance problems with O/R mappers Below is an incomplete list of common performance problems related to data-access / O/R mappers / RDBMS code. It will help you with fixing the hotspots you found in the interpretation step. SELECT N+1: (Lazy-loading specific). Lazy loading triggered performance bottlenecks. Consider a list of Orders bound to a grid. You have a Field mapped onto a related field in Order, Customer.CompanyName. Showing this column in the grid will make the grid fetch (indirectly) for each row the Customer row. This means you'll get for the single list not 1 query (for the orders) but 1+(the number of orders shown) queries. To solve this: use eager loading using a prefetch path to fetch the customers with the orders. SELECT N+1 is easy to spot with an O/R mapper profiler or RDBMS profiler: if you see a lot of identical queries executed at once, you have this problem. Prefetch paths using many path nodes or sorting, or limiting. Eager loading problem. Prefetch paths can help with performance, but as 1 query is fetched per node, it can be the number of data fetched in a child node is bigger than you think. Also consider that data in every node is merged on the client within the parent. This is fast, but it also can take some time if you fetch massive amounts of entities. If you keep fetches small, you can use tuning parameters like the ParameterizedPrefetchPathThreshold setting to get more optimal queries. Deep inheritance hierarchies of type Target Per Entity/Type. If you use inheritance of type Target per Entity / Type (each type in the inheritance hierarchy is mapped onto its own table/view), fetches will join subtype- and supertype tables in many cases, which can lead to a lot of performance problems if the hierarchy has many types. With this problem, keep inheritance to a minimum if possible, or switch to a hierarchy of type Target Per Hierarchy, which means all entities in the inheritance hierarchy are mapped onto the same table/view. Of course this has its own set of drawbacks, but it's a compromise you might want to take. Fetching massive amounts of data by fetching large lists of entities. LLBLGen Pro supports paging (and limiting the # of rows returned), which is often key to process through large sets of data. Use paging on the RDBMS if possible (so a query is executed which returns only the rows in the page requested). When using paging in a web application, be sure that you switch server-side paging on on the datasourcecontrol used. In this case, paging on the grid alone is not enough: this can lead to fetching a lot of data which is then loaded into the grid and paged there. Keep note that analyzing queries for paging could lead to the false assumption that paging doesn't occur, e.g. when the query contains a field of type ntext/image/clob/blob and DISTINCT can't be applied while it should have (e.g. due to a join): the datareader will do DISTINCT filtering on the client. this is a little slower but it does perform paging functionality on the data-reader so it won't fetch all rows even if the query suggests it does. Fetch massive amounts of data because blob/clob/ntext/image fields aren't excluded. LLBLGen Pro supports field exclusion for queries. You can exclude fields (also in prefetch paths) per query to avoid fetching all fields of an entity, e.g. when you don't need them for the logic consuming the resultset. Excluding fields can greatly reduce the amount of time spend on data-transport across the network. Use this optimization if you see that there's a big difference between query execution time on the RDBMS and the time reported by the .NET profiler for the ExecuteReader method call. Doing client-side aggregates/scalar calculations by consuming a lot of data. If possible, try to formulate a scalar query or group by query using the projection system or GetScalar functionality of LLBLGen Pro to do data consumption on the RDBMS server. It's far more efficient to process data on the RDBMS server than to first load it all in memory, then traverse the data in-memory to calculate a value. Using .ToList() constructs inside linq queries. It might be you use .ToList() somewhere in a Linq query which makes the query be run partially in-memory. Example: var q = from c in metaData.Customers.ToList() where c.Country=="Norway" select c; This will actually fetch all customers in-memory and do an in-memory filtering, as the linq query is defined on an IEnumerable<T>, and not on the IQueryable<T>. Linq is nice, but it can often be a bit unclear where some parts of a Linq query might run. Fetching all entities to delete into memory first. To delete a set of entities it's rather inefficient to first fetch them all into memory and then delete them one by one. It's more efficient to execute a DELETE FROM ... WHERE query on the database directly to delete the entities in one go. LLBLGen Pro supports this feature, and so do some other O/R mappers. It's not always possible to do this operation in the context of an O/R mapper however: if an O/R mapper relies on a cache, these kind of operations are likely not supported because they make it impossible to track whether an entity is actually removed from the DB and thus can be removed from the cache. Fetching all entities to update with an expression into memory first. Similar to the previous point: it is more efficient to update a set of entities directly with a single UPDATE query using an expression instead of fetching the entities into memory first and then updating the entities in a loop, and afterwards saving them. It might however be a compromise you don't want to take as it is working around the idea of having an object graph in memory which is manipulated and instead makes the code fully aware there's a RDBMS somewhere. Conclusion Performance tuning is almost always about compromises and making choices. It's also about knowing where to look and how the systems in play behave and should behave. The four steps I provided should help you stay focused on the real problem and lead you towards the solution. Knowing how to optimally use the systems participating in your own code (.NET framework, O/R mapper, RDBMS, network/services) is key for success as well as knowing what's going on inside the application you built. I hope you'll find this guide useful in tracking down performance problems and dealing with them in a useful way.  

    Read the article

  • Big Data – Is Big Data Relevant to me? – Big Data Questionnaires – Guest Post by Vinod Kumar

    - by Pinal Dave
    This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions of SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect. Let us read the blog post in Vinod’s own voice. I think the series from Pinal is a good one for anyone planning to start on Big Data journey from the basics. In my daily customer interactions this buzz of “Big Data” always comes up, I react generally saying – “Sir, do you really have a ‘Big Data’ problem or do you have a big Data problem?” Generally, there is a silence in the air when I ask this question. Data is everywhere in organizations – be it big data, small data, all data and for few it is bad data which is same as no data :). Wow, don’t discount me as someone who opposes “Big Data”, I am a big supporter as much as I am a critic of the abuse of this term by the people. In this post, I wanted to let my mind flow so that you can also think in the direction I want you to see these concepts. In any case, this is not an exhaustive dump of what is in my mind – but you will surely get the drift how I am going to question Big Data terms from customers!!! Is Big Data Relevant to me? Many of my customers talk to me like blank whiteboard with no idea – “why Big Data”. They want to jump into the bandwagon of technology and they want to decipher insights from their unexplored data a.k.a. unstructured data with structured data. So what are these industry scenario’s that come to mind? Here are some of them: Financials Fraud detection: Banks and Credit cards are monitoring your spending habits on real-time basis. Customer Segmentation: applies in every industry from Banking to Retail to Aviation to Utility and others where they deal with end customer who consume their products and services. Customer Sentiment Analysis: Responding to negative brand perception on social or amplify the positive perception. Sales and Marketing Campaign: Understand the impact and get closer to customer delight. Call Center Analysis: attempt to take unstructured voice recordings and analyze them for content and sentiment. Medical Reduce Re-admissions: How to build a proactive follow-up engagements with patients. Patient Monitoring: How to track Inpatient, Out-Patient, Emergency Visits, Intensive Care Units etc. Preventive Care: Disease identification and Risk stratification is a very crucial business function for medical. Claims fraud detection: There is no precise dollars that one can put here, but this is a big thing for the medical field. Retail Customer Sentiment Analysis, Customer Care Centers, Campaign Management. Supply Chain Analysis: Every sensors and RFID data can be tracked for warehouse space optimization. Location based marketing: Based on where a check-in happens retail stores can be optimize their marketing. Telecom Price optimization and Plans, Finding Customer churn, Customer loyalty programs Call Detail Record (CDR) Analysis, Network optimizations, User Location analysis Customer Behavior Analysis Insurance Fraud Detection & Analysis, Pricing based on customer Sentiment Analysis, Loyalty Management Agents Analysis, Customer Value Management This list can go on to other areas like Utility, Manufacturing, Travel, ITES etc. So as you can see, there are obviously interesting use cases for each of these industry verticals. These are just representative list. Where to start? A lot of times I try to quiz customers on a number of dimensions before starting a Big Data conversation. Are you getting the data you need the way you want it and in a timely manner? Can you get in and analyze the data you need? How quickly is IT to respond to your BI Requests? How easily can you get at the data that you need to run your business/department/project? How are you currently measuring your business? Can you get the data you need to react WITHIN THE QUARTER to impact behaviors to meet your numbers or is it always “rear-view mirror?” How are you measuring: The Brand Customer Sentiment Your Competition Your Pricing Your performance Supply Chain Efficiencies Predictive product / service positioning What are your key challenges of driving collaboration across your global business?  What the challenges in innovation? What challenges are you facing in getting more information out of your data? Note: Garbage-in is Garbage-out. Hold good for all reporting / analytics requirements Big Data POCs? A number of customers get into the realm of setting a small team to work on Big Data – well it is a great start from an understanding point of view, but I tend to ask a number of other questions to such customers. Some of these common questions are: To what degree is your advanced analytics (natural language processing, sentiment analysis, predictive analytics and classification) paired with your Big Data’s efforts? Do you have dedicated resources exploring the possibilities of advanced analytics in Big Data for your business line? Do you plan to employ machine learning technology while doing Advanced Analytics? How is Social Media being monitored in your organization? What is your ability to scale in terms of storage and processing power? Do you have a system in place to sort incoming data in near real time by potential value, data quality, and use frequency? Do you use event-driven architecture to manage incoming data? Do you have specialized data services that can accommodate different formats, security, and the management requirements of multiple data sources? Is your organization currently using or considering in-memory analytics? To what degree are you able to correlate data from your Big Data infrastructure with that from your enterprise data warehouse? Have you extended the role of Data Stewards to include ownership of big data components? Do you prioritize data quality based on the source system (that is Facebook/Twitter data has lower quality thresholds than radio frequency identification (RFID) for a tracking system)? Do your retention policies consider the different legal responsibilities for storing Big Data for a specific amount of time? Do Data Scientists work in close collaboration with Data Stewards to ensure data quality? How is access to attributes of Big Data being given out in the organization? Are roles related to Big Data (Advanced Analyst, Data Scientist) clearly defined? How involved is risk management in the Big Data governance process? Is there a set of documented policies regarding Big Data governance? Is there an enforcement mechanism or approach to ensure that policies are followed? Who is the key sponsor for your Big Data governance program? (The CIO is best) Do you have defined policies surrounding the use of social media data for potential employees and customers, as well as the use of customer Geo-location data? How accessible are complex analytic routines to your user base? What is the level of involvement with outside vendors and third parties in regard to the planning and execution of Big Data projects? What programming technologies are utilized by your data warehouse/BI staff when working with Big Data? These are some of the important questions I ask each customer who is actively evaluating Big Data trends for their organizations. These questions give you a sense of direction where to start, what to use, how to secure, how to analyze and more. Sign off Any Big data is analysis is incomplete without a compelling story. The best way to understand this is to watch Hans Rosling – Gapminder (2:17 to 6:06) videos about the third world myths. Don’t get overwhelmed with the Big Data buzz word, the destination to what your data speaks is important. In this blog post, we did not particularly look at any Big Data technologies. This is a set of questionnaire one needs to keep in mind as they embark their journey of Big Data. I did write some of the basics in my blog: Big Data – Big Hype yet Big Opportunity. Do let me know if these questions make sense?  Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • What are the ways to build a failover cluster?

    - by light
    I have a task where I need to build a failover cluster in two cases: first with servers on Red Hat Enterprise 5.1 and second with SUSE Linux Enterprise 11 SP1. Both cases have SAN. I know there are many ways to build failover cluster, but I can’t find out more, so I need next: The ways to build it? I know only virtualization. Any good book or resource to broad my mind? I’ll be glad to hear any suggestion. Thanks! EDIT #1: failover of servers with bussiness application on it. EDIT #2: will be great to hear summary about solutions with SLES servers? EDIT #3: So if I understand correctly, in my cases the main ways are to use internal solutions or virtualization. So now I have additional questions: Does manufacturer of blades provide some solution? For example HP or IBM. (Without virtualization) Do I need additional server to control "heartbeat" between main and redundant servers? (Virtualization) For example I have several physical servers with VMs. Do I need additional server to control availability of VMs and to move VMs to another physical server in the case their physical server failure? Sorry for my poor English. EDIT #4: Failover of VM or OS on physical server. In both cases will be used SAN , it's not specified, but I think with file system image on it. I started to think that my question is stupid and I need to remake it.

    Read the article

  • Tiff Analyzer

    - by Kevin
    I am writing a program to convert some data, mainly a bunch of Tiff images. Some of the Tiffs seems to have a minor problem with them. They show up fine in some viewers (Irfanview, client's old system) but not in others (Client's new system, Window's picture and fax viewer). I have manually looked at the binary data and all the tags seem ok. Can anyone recommend an app that can analyze it and tell me what, if anything, is wrong with it? Also, for clarity sake, I'm only converting the data about the images which is stored seperately in a database and copying the images, I'm not editting the images myself, so I'm pretty sure I'm not messing them up. UDPATE: For anyone interested, here are the tags from a good and bad file: BAD Tag Type Length Value 256 Image Width SHORT 1 1652 257 Image Length SHORT 1 704 258 Bits Per Sample SHORT 1 1 259 Compression SHORT 1 4 262 Photometric SHORT 1 0 266 Fill Order SHORT 1 1 273 Strip Offsets LONG 1 210 (d2 Hex) 274 Orientation SHORT 1 3 277 Samples Per Pixel SHORT 1 1 278 Rows Per Strip SHORT 1 450 279 Strip Byte Counts LONG 1 7264 (1c60 Hex) 282 X Resolution RATIONAL 1 <194 200 / 1 = 200.000 283 Y Resolution RATIONAL 1 <202 200 / 1 = 200.000 284 Planar Configuration SHORT 1 1 296 Resolution Unit SHORT 1 2 Good Tag Type Length Value 254 New Subfile Type LONG 1 0 (0 Hex) 256 Image Width SHORT 1 1193 257 Image Length SHORT 1 788 258 Bits Per Sample SHORT 1 1 259 Compression SHORT 1 4 262 Photometric SHORT 1 0 266 Fill Order SHORT 1 1 270 Image Description ASCII 45 256 273 Strip Offsets LONG 1 1118 (45e Hex) 274 Orientation SHORT 1 1 277 Samples Per Pixel SHORT 1 1 278 Rows Per Strip LONG 1 788 (314 Hex) 279 Strip Byte Counts LONG 1 496 (1f0 Hex) 280 Min Sample Value SHORT 1 0 281 Max Sample Value SHORT 1 1 282 X Resolution RATIONAL 1 <301 200 / 1 = 200.000 283 Y Resolution RATIONAL 1 <309 200 / 1 = 200.000 284 Planar Configuration SHORT 1 1 293 Group 4 Options LONG 1 0 (0 Hex) 296 Resolution Unit SHORT 1 2

    Read the article

  • Prestashop compared to Zen-Cart and osCommerce

    - by Viet
    I'm considering Prestashop for a new project. It seems to be younger than Zen-Cart and osCommerce. Since I just found it by Google, I'd like to gather comments and experience and comparison of Prestashop to established "brands" like Zen-Cart and osCommerce

    Read the article

< Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48  | Next Page >