Search Results

Search found 11362 results on 455 pages for 'big o analysis'.

Page 6/455 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • SQL Server Analysis Services, DNS, AD, Kerberos, Connection Issues

    - by ScaleOvenStove
    Running into a very weird issue. Converting servers to Windows 2008/SQL 2008. Have a server, SERVER_A, brand new, setup with Win2k8,Sql2k8 - works. Have a Server SERVER_B, running Windows2003/SQL2005. I want to migrate from SERVER_B to SERVER_A. I have all db's, cubes, etc setup on SERVER_A and it is mimicking functionality. Since users are using Excel to connect to SSAS, they connection string has SERVER_B in it. What I want to do, is, change DNS on the network to point SERVER_B (by name) at the ip of SERVER_A. I have successfully done this with another server, SERVER_C, but I need to do it with SERVER_B. What I have found is that with SERVER_C, after changing DNS, had to remove SERVER_C from AD and then it worked. I could connect to SERVER_C (DB), SERVER_C (SSAS Default Instance) and SERVER_C (SSAS Named instance) and it all was actually connecting to SERVER_A I tried to do the same with with SERVER_B, and no luck. Changed DNS, removed from AD, and it wouldn't connect. Found out that there were some SPN's in AD set up, so removed those and tried again. I then could connect to SERVER_B (DB), SERVER_B (SSAS Named Instance), but not SERVER_B (SSAS Default Instance). I could connect to SERVER_B (SSAS Default Intance WITH the Port #), but I need to be able to connect without the port number. I am at a loss to as why I can't connect to the default instance without a port #. Not sure if it is SPN's in AD, or another AD issue, or something else. Pretty sure it isnt something on the server (because SERVER_C works!) Any insight or suggestions would be greatly helpful!!

    Read the article

  • apache log analysis tools for multiple virtual hosts?

    - by shreddd
    I am interested in trying to get a side by side usage comparison of all the virtual hosts being served up by my apache server. In the simplest case, I want to see a list (or bar chart) with each virtual host and the number of requests/traffic on that site. I've been playing around with webalyzer and awstats but I haven't been able to compare multiple virtual hosts in the same infographic. Anyone have any suggestions on tools for doing this (or how I might use the above tools to do so)?

    Read the article

  • Text tagging/analysis tool for Mac

    - by Mark Porter
    I'm a doctoral student doing research in the humanities. As part of my research I have gathered together a lot of interview text. To analyse this data I want to be able to easily tag sections of text with keywords (the tags need to be able to overlap, and perhaps be organised hierarchically) and later be able to collate those sections from across multiple files. I need to be able to do this on a Mac. It feels like a simple task but I can't find any software for doing it that isn't either horribly clunky or a massive overkill worth hundreds of pounds. Is there any good software for doing this, or are there any good ways of doing it with other software?

    Read the article

  • Running a simple integration scenario using the Oracle Big Data Connectors on Hadoop/HDFS cluster

    - by hamsun
    Between the elephant ( the tradional image of the Hadoop framework) and the Oracle Iron Man (Big Data..) an english setter could be seen as the link to the right data Data, Data, Data, we are living in a world where data technology based on popular applications , search engines, Webservers, rich sms messages, email clients, weather forecasts and so on, have a predominant role in our life. More and more technologies are used to analyze/track our behavior, try to detect patterns, to propose us "the best/right user experience" from the Google Ad services, to Telco companies or large consumer sites (like Amazon:) ). The more we use all these technologies, the more we generate data, and thus there is a need of huge data marts and specific hardware/software servers (as the Exadata servers) in order to treat/analyze/understand the trends and offer new services to the users. Some of these "data feeds" are raw, unstructured data, and cannot be processed effectively by normal SQL queries. Large scale distributed processing was an emerging infrastructure need and the solution seemed to be the "collocation of compute nodes with the data", which in turn leaded to MapReduce parallel patterns and the development of the Hadoop framework, which is based on MapReduce and a distributed file system (HDFS) that runs on larger clusters of rather inexpensive servers. Several Oracle products are using the distributed / aggregation pattern for data calculation ( Coherence, NoSql, times ten ) so once that you are familiar with one of these technologies, lets says with coherence aggregators, you will find the whole Hadoop, MapReduce concept very similar. Oracle Big Data Appliance is based on the Cloudera Distribution (CDH), and the Oracle Big Data Connectors can be plugged on a Hadoop cluster running the CDH distribution or equivalent Hadoop clusters. In this paper, a "lab like" implementation of this concept is done on a single Linux X64 server, running an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0, and a single node Apache hadoop-1.2.1 HDFS cluster, using the SQL connector for HDFS. The whole setup is fairly simple: Install on a Linux x64 server ( or virtual box appliance) an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 server Get the Apache Hadoop distribution from: http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-1.2.1. Get the Oracle Big Data Connectors from: http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html?ssSourceSiteId=ocomen. Check the java version of your Linux server with the command: java -version java version "1.7.0_40" Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) Decompress the hadoop hadoop-1.2.1.tar.gz file to /u01/hadoop-1.2.1 Modify your .bash_profile export HADOOP_HOME=/u01/hadoop-1.2.1 export PATH=$PATH:$HADOOP_HOME/bin export HIVE_HOME=/u01/hive-0.11.0 export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin (also see my sample .bash_profile) Set up ssh trust for Hadoop process, this is a mandatory step, in our case we have to establish a "local trust" as will are using a single node configuration copy the new public keys to the list of authorized keys connect and test the ssh setup to your localhost: We will run a "pseudo-Hadoop cluster", in what is called "local standalone mode", all the Hadoop java components are running in one Java process, this is enough for our demo purposes. We need to "fine tune" some Hadoop configuration files, we have to go at our $HADOOP_HOME/conf, and modify the files: core-site.xml hdfs-site.xml mapred-site.xml check that the hadoop binaries are referenced correctly from the command line by executing: hadoop -version As Hadoop is managing our "clustered HDFS" file system we have to create "the mount point" and format it , the mount point will be declared to core-site.xml as: The layout under the /u01/hadoop-1.2.1/data will be created and used by other hadoop components (MapReduce = /mapred/...) HDFS is using the /dfs/... layout structure format the HDFS hadoop file system: Start the java components for the HDFS system As an additional check, you can use the GUI Hadoop browsers to check the content of your HDFS configurations: Once our HDFS Hadoop setup is done you can use the HDFS file system to store data ( big data : )), and plug them back and forth to Oracle Databases by the means of the Big Data Connectors ( which is the next configuration step). You can create / use a Hive db, but in our case we will make a simple integration of "raw data" , through the creation of an External Table to a local Oracle instance ( on the same Linux box, we run the Hadoop HDFS one node cluster and one Oracle DB). Download some public "big data", I use the site: http://france.meteofrance.com/france/observations, from where I can get *.csv files for my big data simulations :). Here is the data layout of my example file: Download the Big Data Connector from the OTN (oraosch-2.2.0.zip), unzip it to your local file system (see picture below) Modify your environment in order to access the connector libraries , and make the following test: [oracle@dg1 bin]$./hdfs_stream Usage: hdfs_stream locationFile [oracle@dg1 bin]$ Load the data to the Hadoop hdfs file system: hadoop fs -mkdir bgtest_data hadoop fs -put obsFrance.txt bgtest_data/obsFrance.txt hadoop fs -ls /user/oracle/bgtest_data/obsFrance.txt [oracle@dg1 bg-data-raw]$ hadoop fs -ls /user/oracle/bgtest_data/obsFrance.txt Found 1 items -rw-r--r-- 1 oracle supergroup 54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt [oracle@dg1 bg-data-raw]$hadoop fs -ls hdfs:///user/oracle/bgtest_data/obsFrance.txt Found 1 items -rw-r--r-- 1 oracle supergroup 54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt Check the content of the HDFS with the browser UI: Start the Oracle database, and run the following script in order to create the Oracle database user, the Oracle directories for the Oracle Big Data Connector (dg1 it’s my own db id replace accordingly yours): #!/bin/bash export ORAENV_ASK=NO export ORACLE_SID=dg1 . oraenv sqlplus /nolog <<EOF CONNECT / AS sysdba; CREATE OR REPLACE DIRECTORY osch_bin_path AS '/u01/orahdfs-2.2.0/bin'; CREATE USER BGUSER IDENTIFIED BY oracle; GRANT CREATE SESSION, CREATE TABLE TO BGUSER; GRANT EXECUTE ON sys.utl_file TO BGUSER; GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO BGUSER; CREATE OR REPLACE DIRECTORY BGT_LOG_DIR as '/u01/BG_TEST/logs'; GRANT READ, WRITE ON DIRECTORY BGT_LOG_DIR to BGUSER; CREATE OR REPLACE DIRECTORY BGT_DATA_DIR as '/u01/BG_TEST/data'; GRANT READ, WRITE ON DIRECTORY BGT_DATA_DIR to BGUSER; EOF Put the following in a file named t3.sh and make it executable, hadoop jar $OSCH_HOME/jlib/orahdfs.jar \ oracle.hadoop.exttab.ExternalTable \ -D oracle.hadoop.exttab.tableName=BGTEST_DP_XTAB \ -D oracle.hadoop.exttab.defaultDirectory=BGT_DATA_DIR \ -D oracle.hadoop.exttab.dataPaths="hdfs:///user/oracle/bgtest_data/obsFrance.txt" \ -D oracle.hadoop.exttab.columnCount=7 \ -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/dg1 \ -D oracle.hadoop.connection.user=BGUSER \ -D oracle.hadoop.exttab.printStackTrace=true \ -createTable --noexecute then test the creation fo the external table with it: [oracle@dg1 samples]$ ./t3.sh ./t3.sh: line 2: /u01/orahdfs-2.2.0: Is a directory Oracle SQL Connector for HDFS Release 2.2.0 - Production Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved. Enter Database Password:] The create table command was not executed. The following table would be created. CREATE TABLE "BGUSER"."BGTEST_DP_XTAB" ( "C1" VARCHAR2(4000), "C2" VARCHAR2(4000), "C3" VARCHAR2(4000), "C4" VARCHAR2(4000), "C5" VARCHAR2(4000), "C6" VARCHAR2(4000), "C7" VARCHAR2(4000) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY "BGT_DATA_DIR" ACCESS PARAMETERS ( RECORDS DELIMITED BY 0X'0A' CHARACTERSET AL32UTF8 STRING SIZES ARE IN CHARACTERS PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream' FIELDS TERMINATED BY 0X'2C' MISSING FIELD VALUES ARE NULL ( "C1" CHAR(4000), "C2" CHAR(4000), "C3" CHAR(4000), "C4" CHAR(4000), "C5" CHAR(4000), "C6" CHAR(4000), "C7" CHAR(4000) ) ) LOCATION ( 'osch-20131022081035-74-1' ) ) PARALLEL REJECT LIMIT UNLIMITED; The following location files would be created. osch-20131022081035-74-1 contains 1 URI, 54103 bytes 54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt Then remove the --noexecute flag and create the external Oracle table for the Hadoop data. Check the results: The create table command succeeded. CREATE TABLE "BGUSER"."BGTEST_DP_XTAB" ( "C1" VARCHAR2(4000), "C2" VARCHAR2(4000), "C3" VARCHAR2(4000), "C4" VARCHAR2(4000), "C5" VARCHAR2(4000), "C6" VARCHAR2(4000), "C7" VARCHAR2(4000) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY "BGT_DATA_DIR" ACCESS PARAMETERS ( RECORDS DELIMITED BY 0X'0A' CHARACTERSET AL32UTF8 STRING SIZES ARE IN CHARACTERS PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream' FIELDS TERMINATED BY 0X'2C' MISSING FIELD VALUES ARE NULL ( "C1" CHAR(4000), "C2" CHAR(4000), "C3" CHAR(4000), "C4" CHAR(4000), "C5" CHAR(4000), "C6" CHAR(4000), "C7" CHAR(4000) ) ) LOCATION ( 'osch-20131022081719-3239-1' ) ) PARALLEL REJECT LIMIT UNLIMITED; The following location files were created. osch-20131022081719-3239-1 contains 1 URI, 54103 bytes 54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt This is the view from the SQL Developer: and finally the number of lines in the oracle table, imported from our Hadoop HDFS cluster SQL select count(*) from "BGUSER"."BGTEST_DP_XTAB"; COUNT(*) ---------- 1151 In a next post we will integrate data from a Hive database, and try some ODI integrations with the ODI Big Data connector. Our simplistic approach is just a step to show you how these unstructured data world can be integrated to Oracle infrastructure. Hadoop, BigData, NoSql are great technologies, they are widely used and Oracle is offering a large integration infrastructure based on these services. Oracle University presents a complete curriculum on all the Oracle related technologies: NoSQL: Introduction to Oracle NoSQL Database Using Oracle NoSQL Database Big Data: Introduction to Big Data Oracle Big Data Essentials Oracle Big Data Overview Oracle Data Integrator: Oracle Data Integrator 12c: New Features Oracle Data Integrator 11g: Integration and Administration Oracle Data Integrator: Administration and Development Oracle Data Integrator 11g: Advanced Integration and Development Oracle Coherence 12c: Oracle Coherence 12c: New Features Oracle Coherence 12c: Share and Manage Data in Clusters Oracle Coherence 12c: Oracle GoldenGate 11g: Fundamentals for Oracle Oracle GoldenGate 11g: Fundamentals for SQL Server Oracle GoldenGate 11g Fundamentals for Oracle Oracle GoldenGate 11g Fundamentals for DB2 Oracle GoldenGate 11g Fundamentals for Teradata Oracle GoldenGate 11g Fundamentals for HP NonStop Oracle GoldenGate 11g Management Pack: Overview Oracle GoldenGate 11g Troubleshooting and Tuning Oracle GoldenGate 11g: Advanced Configuration for Oracle Other Resources: Apache Hadoop : http://hadoop.apache.org/ is the homepage for these technologies. "Hadoop Definitive Guide 3rdEdition" by Tom White is a classical lecture for people who want to know more about Hadoop , and some active "googling " will also give you some more references. About the author: Eugene Simos is based in France and joined Oracle through the BEA-Weblogic Acquisition, where he worked for the Professional Service, Support, end Education for major accounts across the EMEA Region. He worked in the banking sector, ATT, Telco companies giving him extensive experience on production environments. Eugen currently specializes in Oracle Fusion Middleware teaching an array of courses on Weblogic/Webcenter, Content,BPM /SOA/Identity-Security/GoldenGate/Virtualisation/Unified Comm Suite) throughout the EMEA region.

    Read the article

  • Big Data – Data Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the operational database in Big Data Story. In this article we will understand what is Hive and HQL in Big Data Story. Yahoo started working on PIG (we will understand that in the next blog post) for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. Similarly Facebook started deploying their warehouse solutions on Hadoop which has resulted in HIVE. The reason for going with HIVE is because the traditional warehousing solutions are getting very expensive. What is HIVE? Hive is a datawarehouseing infrastructure for Hadoop. The primary responsibility is to provide data summarization, query and analysis. It  supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of HIVE is that it supports SQL-Like access to structured data which is known as HiveQL (or HQL) as well as big data analysis with the help of MapReduce. Hive is not built to get a quick response to queries but it it is built for data mining applications. Data mining applications can take from several minutes to several hours to analysis the data and HIVE is primarily used there. HIVE Organization The data are organized in three different formats in HIVE. Tables: They are very similar to RDBMS tables and contains rows and tables. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystems. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system. Hive also has metastore which stores all the metadata. It is a relational database containing various information related to Hive Schema (column types, owners, key-value data, statistics etc.). We can use MySQL database over here. What is HiveSQL (HQL)? Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily. Create and manage tables and partitions Support various Relational, Arithmetic and Logical Operators Evaluate functions Download the contents of a table to a local directory or result of queries to HDFS directory Here is the example of the HQL Query: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above query, you can see they are very similar to SQL like queries. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Pig. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big-O of PHP functions?

    - by Kendall Hopkins
    After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes. //very slow for large $prime_array $prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... ); $result_array = array(); foreach( $array_of_number => $number ) { $result_array[$number] = in_array( $number, $large_prime_array ); } //still decent performance for large $prime_array $prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL, 11 => NULL, 13 => NULL, .... 104729 => NULL, ... ); foreach( $array_of_number => $number ) { $result_array[$number] = array_key_exists( $number, $large_prime_array ); } This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(logn)). So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question... I was wondering if there was a list of the theoretical (or practical) big O times for all* the PHP built in functions. *or at least the interesting ones For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

    Read the article

  • List of Big-O for PHP functions?

    - by Kendall Hopkins
    After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes. //very slow for large $prime_array $prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... ); $result_array = array(); foreach( $array_of_number => $number ) { $result_array[$number] = in_array( $number, $large_prime_array ); } //still decent performance for large $prime_array $prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL, 11 => NULL, 13 => NULL, .... 104729 => NULL, ... ); foreach( $array_of_number => $number ) { $result_array[$number] = array_key_exists( $number, $large_prime_array ); } This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(logn)). So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question... I was wondering if there was a list of the theoretical (or practical) big O times for all* the PHP built in functions. *or at least the interesting ones For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

    Read the article

  • Using Emacs for big big projects

    - by ignatius
    Hello, Maybe is a often repeated question here, but i can't find anything similar with the search. The point is that i like to use Emacs for my personal projects, usually very small applications using C or python, but i was wondering how to use it also for my work, in which we have project with about 10k files of source code, so is veeeery big (actually i am using source insight, that is very nice tool, but only for windows), questions are: Searching: Which is the most convenient way to search a string within the whole project? Navigating throught the function: I mean something like putting the cursor over a function, define, var, and going to the definition Refactoring Also if you have any experience with this and want to share your thoughts i will consider it highly interesting. Br

    Read the article

  • The Business case for Big Data

    - by jasonw
    The Business Case for Big Data Part 1 What's the Big Deal Okay, so a new buzz word is emerging. It's gone beyond just a buzzword now, and I think it is going to change the landscape of retail, financial services, healthcare....everything. Let me spend a moment to talk about what i'm going to talk about. Massive amounts of data are being collected every second, more than ever imaginable, and the size of this data is more than can be practically managed by today’s current strategies and technologies. There is a revolution at hand centering on this groundswell of data and it will change how we execute our businesses through greater efficiencies, new revenue discovery and even enable innovation. It is the revolution of Big Data. This is more than just a new buzzword is being tossed around technology circles.This blog series for Big Data will explain this new wave of technology and provide a roadmap for businesses to take advantage of this growing trend. Cases for Big Data There is a growing list of use cases for big data. We naturally think of Marketing as the low hanging fruit. Many projects look to analyze twitter feeds to find new ways to do marketing. I think of a great example from a TED speech that I recently saw on data visualization from Facebook from my masters studies at University of Virginia. We can see when the most likely time for breaks-ups occurs by looking at status changes and updates on users Walls. This is the intersection of Big Data, Analytics and traditional structured data. Ted Video Marketers can use this to sell more stuff. I really like the following piece on looking at twitter feeds to measure mood. The following company was bought by a hedge fund. They could predict how the S&P was going to do within three days at an 85% accuracy. Link to the article Here we see a convergence of predictive analytics and Big Data. So, we'll look at a lot of these business cases and start talking about what this means for the business. It's more than just finding ways to use Hadoop + NoSql and we'll talk about that too. How do I start in Big Data? That's what is coming next post.

    Read the article

  • Forbes Article on Big Data and Java Embedded Technology

    - by hinkmond
    Whoa, cool! Forbes magazine has an online article about what I've been blogging about all this time: Big Data and Java Embedded Technology, tying it all together with a big bow, connecting small devices to the data center. See: Billions of Java Embedded Devices Here's a quote: By the end of the decade we could see tens of billions of new Internet-connected devices... with billions of Internet- connected devices generating Big Data, are the next big thing. ... That’s why Oracle has put together an ecosystem of solutions for this new, Big Data-oriented device-to-data center world: secure, powerful, and adaptable embedded Java for intelligent devices, integrated middleware... This is the next big thing. Java SE Embedded Technology is something to watch for in the new year. Start developing for it now to get a head-start... Hinkmond

    Read the article

  • Unlock the Value of Big Data

    - by Mike.Hallett(at)Oracle-BI&EPM
    Partners should read this comprehensive new e-book to get advice from Oracle and industry leaders on how you can use big data to generate new business insights and make better decisions for your customers. “Big data represents an opportunity averaging 14% of current revenue.” —From the Oracle big data e-book, Meeting the Challenge of Big Data You’ll gain instant access to: Straightforward approaches for acquiring, organizing, and analyzing data Architectures and tools needed to integrate new data with your existing investments Survey data revealing how leading companies are using big data, so you can benchmark your progress Expert resources such as white papers, analyst videos, 3-D demos, and more If you want to be ready for the data deluge, Meeting the Challenge of Big Data is a must-read. Register today for the e-book and read it on your computer or Apple iPad.  

    Read the article

  • Should static analysis warnings fail the CI build?

    - by Cara
    Our team is investigating various options for static analysis in our project, and have mixed opinions about whether we want our Continuous Integration build to fail because of warnings from static analysis. The argument against failing the build is that there are often exceptions to the rules, and attempting to work around them just to make the build succeed reduces productivity. A better approach would be to generate reports with the build, and regularly dedicate developer time to addressing the reported issues. The counter-argument is that it is easy for the technical debt to build up if the bugs are not addressed immediately. Also, if the build fails when a potential bug is introduced, the amount of time required to fix it is reduced. What are your thoughts?

    Read the article

  • Configuring Team System Code Analysis via a FxCop rules file

    - by Ian G
    Is there anyway to configure the code analysis rules in Visual Studio Team System to match those in an FxCop configuration file and keep them in sync automatically? Not all the developers on the team have TS so keeping the rules we are currently running in an FxCop file is required so everyone can run the same set, but it would nice for those with to be able to run them in the IDE. We're introducing static analysis to an existing project so turning on everything now isn't a useful option. (We are not using Foundation Server for source control, if that makes any difference.)

    Read the article

  • Static source code analysis with LLVM

    - by Phong
    I recently discover the LLVM (low level virtual machine) project, and from what I have heard It can be used to performed static analysis on a source code. I would like to know if it is possible to extract the different function call through function pointer (find the caller function and the callee function) in a program. I could find the kind of information in the website so it would be really helpful if you could tell me if such an library already exist in LLVM or can you point me to the good direction on how to build it myself (existing source code, reference, tutorial, example...). EDIT: With my analysis I actually want to extract caller/callee function call. In the case of a function pointer, I would like to return a set of possible callee. both caller and callee must be define in the source code (this does not include third party function in a library).

    Read the article

  • <= vs < when proving big-o notation

    - by user600197
    We just started learning big-o in class. I understand the general concept that f(x) is big-o of g(x) if there exists two constants c,k such that for all xk |f(x)|<=c|g(x)|. I had a question whether or not it is required that we include the <= to sign or whether it is just sufficient to put the < sign? For example: suppose f(x)=17x+11 and we are to prove that this is O(x^2). Then if we take c=28 and xk=1 we know that 17x+11<=28x^2. So since we know that x will always be greater than 1 this implies that 28x^2 will always be greater than 17x+11. So, do we really need to include the equal sign (<=) or is it okay if we just write (<)? Thanks in advance.

    Read the article

  • android spectrum analysis of streaming input

    - by TheBeeKeeper
    for a school project I am trying to make an android application that, once started, will perform a spectrum analysis of live audio received from the microphone or a bluetooth headset. I know I should be using FFT, and have been looking at moonblink's open source audio analyzer ( http://code.google.com/p/moonblink/wiki/Audalyzer ) but am not familiar with android development, and his code is turning out to be too difficult for me to work with. So I suppose my questions are, are there any easier java based, or open source android apps that do spectrum analysis I can reference? Or is there any helpful information that can be given, such as; steps that need be taken to get the microphone input, put it into an fft algorithm, then display a graph of frequency and pitch over time from its output? Any help would be appreciated, thanks.

    Read the article

  • Simple Big O with lg(n) proof

    - by halohunter
    I'm attempting to guess and prove the Big O for: f(n) = n^3 - 7n^2 + nlg(n) + 10 I guess that big O is n^3 as it is the term with the largest order of growth However, I'm having trouble proving it. My unsuccesful attempt follows: f(n) <= cg(n) f(n) <= n^3 - 7n^2 + nlg(n) + 10 <= cn^3 f(n) <= n^3 + (n^3)*lg(n) + 10n^3 <= cn^3 f(n) <= N^3(11 + lg(n)) <= cn^3 so 11 + lg(n) = c But this can't be right because c must be constant. What am I doing wrong?

    Read the article

  • Principles of Big Data By Jules J Berman, O&rsquo;Reilly Media Book Review

    - by Compudicted
    Originally posted on: http://geekswithblogs.net/Compudicted/archive/2013/11/04/principles-of-big-data-by-jules-j-berman-orsquoreilly-media.aspx A fantastic book! Must be part, if not yet, of the fundamentals of the Big Data as a field of science. Highly recommend to those who are into the Big Data practice. Yet, I confess this book is one of my best reads this year and for a number of reasons: The book is full of wisdom, intimate insight, historical facts and real life examples to how Big Data projects get conceived, operate and sadly, yes, sometimes die. But not only that, the book is most importantly is filled with valuable advice, accurate and even overwhelming amount of reference (from the positive side), and the author does not event stop there: there are numerous technical excerpts, links and examples allowing to quickly accomplish many daunting tasks or make you aware of what one needs to perform as a data practitioner (excuse my use of the word practitioner, I just did not find a better substitute to it to trying to reference all who face Big Data). Be aware that Jules Berman’s background is in medicine, naturally, this book discusses this subject a lot as it is very dear to the author’s heart I believe, this does not make this book any less significant however, quite the opposite, I trust if there is an area in science or practice where the biggest benefits can be ripped from Big Data projects it is indeed the medical science, let’s make Cancer history! On a personal note, for me as a database, BI professional it has helped to understand better the motives behind Big Data initiatives, their underwater rivers and high altitude winds that divert or propel them forward. Additionally, I was impressed by the depth and number of mining algorithms covered in it. I must tell this made me very curious and tempting to find out more about these indispensable attributes of Big Data so sure I will be trying stretching my wallet to acquire several books that go more in depth on several most popular of them. My favorite parts of the book, well, all of them actually, but especially chapter 9: Analysis, it is just very close to my heart. But the real reason is it let me see what I do with data from a different angle. And then the next - “Special Considerations”, they are just two logical parts. The writing language is of this book is very acceptable for all levels, I had no technical problem reading it in ebook format on my 8” tablet or a large screen monitor. If I would be asked to say at least something negative I have to state I had a feeling initially that the book’s first part reads like an academic material relaxing the reader as the book progresses forward. I admit I am impressed with Jules’ abilities to use several programming languages and OSS tools, bravo! And I agree, it is not too, too hard to grasp at least the principals of a modern programming language, which seems becomes a defacto knowledge standard item for any modern human being. So grab a copy of this book, read it end to end and make yourself shielded from making mistakes at any stage of your Big Data initiative, by the way this book also helps build better future Big Data projects. Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Blogger Program.

    Read the article

  • Oracle Big Data Software Downloads

    - by Mike.Hallett(at)Oracle-BI&EPM
    Companies have been making business decisions for decades based on transactional data stored in relational databases. Beyond that critical data, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and photographs that can be mined for useful information. Oracle offers a broad integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data Connectors Downloads here, includes: Oracle SQL Connector for Hadoop Distributed File System Release 2.1.0 Oracle Loader for Hadoop Release 2.1.0 Oracle Data Integrator Companion 11g Oracle R Connector for Hadoop v 2.1 Oracle Big Data Documentation The Oracle Big Data solution offers an integrated portfolio of products to help you organize and analyze your diverse data sources alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data, Release 2.2.0 - E41604_01 zip (27.4 MB) Integrated Software and Big Data Connectors User's Guide HTML PDF Oracle Data Integrator (ODI) Application Adapter for Hadoop Apache Hadoop is designed to handle and process data that is typically from data sources that are non-relational and data volumes that are beyond what is handled by relational databases. Typical processing in Hadoop includes data validation and transformations that are programmed as MapReduce jobs. Designing and implementing a MapReduce job usually requires expert programming knowledge. However, when you use Oracle Data Integrator with the Application Adapter for Hadoop, you do not need to write MapReduce jobs. Oracle Data Integrator uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs. Employing familiar and easy-to-use tools and pre-configured knowledge modules (KMs), the application adapter provides the following capabilities: Loading data into Hadoop from the local file system and HDFS Performing validation and transformation of data within Hadoop Loading processed data from Hadoop to an Oracle database for further processing and generating reports Oracle Database Loader for Hadoop Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. It pre-partitions the data if necessary and transforms it into a database-ready format. Oracle Loader for Hadoop is a Java MapReduce application that balances the data across reducers to help maximize performance. Oracle R Connector for Hadoop Oracle R Connector for Hadoop is a collection of R packages that provide: Interfaces to work with Hive tables, the Apache Hadoop compute infrastructure, the local R environment, and Oracle database tables Predictive analytic techniques, written in R or Java as Hadoop MapReduce jobs, that can be applied to data in HDFS files You install and load this package as you would any other R package. Using simple R functions, you can perform tasks such as: Access and transform HDFS data using a Hive-enabled transparency layer Use the R language for writing mappers and reducers Copy data between R memory, the local file system, HDFS, Hive, and Oracle databases Schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations Oracle SQL Connector for Hadoop Distributed File System Using Oracle SQL Connector for HDFS, you can use an Oracle Database to access and analyze data residing in Hadoop in these formats: Data Pump files in HDFS Delimited text files in HDFS Hive tables For other file formats, such as JSON files, you can stage the input in Hive tables before using Oracle SQL Connector for HDFS. Oracle SQL Connector for HDFS uses external tables to provide Oracle Database with read access to Hive tables, and to delimited text files and Data Pump files in HDFS. Related Documentation Cloudera's Distribution Including Apache Hadoop Library HTML Oracle R Enterprise HTML Oracle NoSQL Database HTML Recent Blog Posts Big Data Appliance vs. DIY Price Comparison Big Data: Architecture Overview Big Data: Achieve the Impossible in Real-Time Big Data: Vertical Behavioral Analytics Big Data: In-Memory MapReduce Flume and Hive for Log Analytics Building Workflows in Oozie

    Read the article

  • Rawr Code Clone Analysis&ndash;Part 0

    - by Dylan Smith
    Code Clone Analysis is a cool new feature in Visual Studio 11 (vNext).  It analyzes all the code in your solution and attempts to identify blocks of code that are similar, and thus candidates for refactoring to eliminate the duplication.  The power lies in the fact that the blocks of code don't need to be identical for Code Clone to identify them, it will report Exact, Strong, Medium and Weak matches indicating how similar the blocks of code in question are.   People that know me know that I'm anal enthusiastic about both writing clean code, and taking old crappy code and making it suck less. So the possibilities for this feature have me pretty excited if it works well - and thats a big if that I'm hoping to explore over the next few blog posts. I'm going to grab the Rawr source code from CodePlex (a World Of Warcraft gear calculator engine program), run Code Clone Analysis against it, then go through the results one-by-one and refactor where appropriate blogging along the way.  My goals with this blog series are twofold: Evaluate and demonstrate Code Clone Analysis Provide some concrete examples of refactoring code to eliminate duplication and improve the code-base Here are the initial results:   Code Clone Analysis has found: 129 Exact Matches 201 Strong Matches 300 Medium Matches 193 Weak Matches Also indicated is that there was a total of 45,181 potentially duplicated lines of code that could be eliminated through refactoring.  Considering the entire solution only has 109,763 lines of code, if true, the duplicates lines of code number is pretty significant. In the next post we’ll start examining some of the individual results and determine if they really do indicate a potential refactoring.

    Read the article

  • Marshalling a big-endian byte collection into a struct in order to pull out values

    - by Pat
    There is an insightful question about reading a C/C++ data structure in C# from a byte array, but I cannot get the code to work for my collection of big-endian (network byte order) bytes. (EDIT: Note that my real struct has more than just one field.) Is there a way to marshal the bytes into a big-endian version of the structure and then pull out the values in the endianness of the framework (that of the host, which is usually little-endian)? This should summarize what I'm looking for (LE=LittleEndian, BE=BigEndian): void Main() { var leBytes = new byte[] {1, 0}; var beBytes = new byte[] {0, 1}; Foo fooLe = ByteArrayToStructure<Foo>(leBytes); Foo fooBe = ByteArrayToStructureBigEndian<Foo>(beBytes); Assert.AreEqual(fooLe, fooBe); } [StructLayout(LayoutKind.Explicit, Size=2)] public struct Foo { [FieldOffset(0)] public ushort firstUshort; } T ByteArrayToStructure<T>(byte[] bytes) where T: struct { GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned); T stuff = (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(),typeof(T)); handle.Free(); return stuff; } T ByteArrayToStructureBigEndian<T>(byte[] bytes) where T: struct { ??? }

    Read the article

  • Convert a raw string to an array of big-endian words with Ruby

    - by Zag zag..
    Hello, I would like to convert a raw string to an array of big-endian words. As example, here is a JavaScript function that do it well (by Paul Johnston): /* * Convert a raw string to an array of big-endian words * Characters >255 have their high-byte silently ignored. */ function rstr2binb(input) { var output = Array(input.length >> 2); for(var i = 0; i < output.length; i++) output[i] = 0; for(var i = 0; i < input.length * 8; i += 8) output[i>>5] |= (input.charCodeAt(i / 8) & 0xFF) << (24 - i % 32); return output; } I believe the Ruby equivalent can be String#unpack(format). However, I don't know what should be the correct format parameter. Thank you for any help. Regards

    Read the article

  • The Sensemaking Spectrum for Business Analytics: Translating from Data to Business Through Analysis

    - by Joe Lamantia
    One of the most compelling outcomes of our strategic research efforts over the past several years is a growing vocabulary that articulates our cumulative understanding of the deep structure of the domains of discovery and business analytics. Modes are one example of the deep structure we’ve found.  After looking at discovery activities across a very wide range of industries, question types, business needs, and problem solving approaches, we've identified distinct and recurring kinds of sensemaking activity, independent of context.  We label these activities Modes: Explore, compare, and comprehend are three of the nine recognizable modes.  Modes describe *how* people go about realizing insights.  (Read more about the programmatic research and formal academic grounding and discussion of the modes here: https://www.researchgate.net/publication/235971352_A_Taxonomy_of_Enterprise_Search_and_Discovery) By analogy to languages, modes are the 'verbs' of discovery activity.  When applied to the practical questions of product strategy and development, the modes of discovery allow one to identify what kinds of analytical activity a product, platform, or solution needs to support across a spread of usage scenarios, and then make concrete and well-informed decisions about every aspect of the solution, from high-level capabilities, to which specific types of information visualizations better enable these scenarios for the types of data users will analyze. The modes are a powerful generative tool for product making, but if you've spent time with young children, or had a really bad hangover (or both at the same time...), you understand the difficult of communicating using only verbs.  So I'm happy to share that we've found traction on another facet of the deep structure of discovery and business analytics.  Continuing the language analogy, we've identified some of the ‘nouns’ in the language of discovery: specifically, the consistently recurring aspects of a business that people are looking for insight into.  We call these discovery Subjects, since they identify *what* people focus on during discovery efforts, rather than *how* they go about discovery as with the Modes. Defining the collection of Subjects people repeatedly focus on allows us to understand and articulate sense making needs and activity in more specific, consistent, and complete fashion.  In combination with the Modes, we can use Subjects to concretely identify and define scenarios that describe people’s analytical needs and goals.  For example, a scenario such as ‘Explore [a Mode] the attrition rates [a Measure, one type of Subject] of our largest customers [Entities, another type of Subject] clearly captures the nature of the activity — exploration of trends vs. deep analysis of underlying factors — and the central focus — attrition rates for customers above a certain set of size criteria — from which follow many of the specifics needed to address this scenario in terms of data, analytical tools, and methods. We can also use Subjects to translate effectively between the different perspectives that shape discovery efforts, reducing ambiguity and increasing impact on both sides the perspective divide.  For example, from the language of business, which often motivates analytical work by asking questions in business terms, to the perspective of analysis.  The question posed to a Data Scientist or analyst may be something like “Why are sales of our new kinds of potato chips to our largest customers fluctuating unexpectedly this year?” or “Where can innovate, by expanding our product portfolio to meet unmet needs?”.  Analysts translate questions and beliefs like these into one or more empirical discovery efforts that more formally and granularly indicate the plan, methods, tools, and desired outcomes of analysis.  From the perspective of analysis this second question might become, “Which customer needs of type ‘A', identified and measured in terms of ‘B’, that are not directly or indirectly addressed by any of our current products, offer 'X' potential for ‘Y' positive return on the investment ‘Z' required to launch a new offering, in time frame ‘W’?  And how do these compare to each other?”.  Translation also happens from the perspective of analysis to the perspective of data; in terms of availability, quality, completeness, format, volume, etc. By implication, we are proposing that most working organizations — small and large, for profit and non-profit, domestic and international, and in the majority of industries — can be described for analytical purposes using this collection of Subjects.  This is a bold claim, but simplified articulation of complexity is one of the primary goals of sensemaking frameworks such as this one.  (And, yes, this is in fact a framework for making sense of sensemaking as a category of activity - but we’re not considering the recursive aspects of this exercise at the moment.) Compellingly, we can place the collection of subjects on a single continuum — we call it the Sensemaking Spectrum — that simply and coherently illustrates some of the most important relationships between the different types of Subjects, and also illuminates several of the fundamental dynamics shaping business analytics as a domain.  As a corollary, the Sensemaking Spectrum also suggests innovation opportunities for products and services related to business analytics. The first illustration below shows Subjects arrayed along the Sensemaking Spectrum; the second illustration presents examples of each kind of Subject.  Subjects appear in colors ranging from blue to reddish-orange, reflecting their place along the Spectrum, which indicates whether a Subject addresses more the viewpoint of systems and data (Data centric and blue), or people (User centric and orange).  This axis is shown explicitly above the Spectrum.  Annotations suggest how Subjects align with the three significant perspectives of Data, Analysis, and Business that shape business analytics activity.  This rendering makes explicit the translation and bridging function of Analysts as a role, and analysis as an activity. Subjects are best understood as fuzzy categories [http://georgelakoff.files.wordpress.com/2011/01/hedges-a-study-in-meaning-criteria-and-the-logic-of-fuzzy-concepts-journal-of-philosophical-logic-2-lakoff-19731.pdf], rather than tightly defined buckets.  For each Subject, we suggest some of the most common examples: Entities may be physical things such as named products, or locations (a building, or a city); they could be Concepts, such as satisfaction; or they could be Relationships between entities, such as the variety of possible connections that define linkage in social networks.  Likewise, Events may indicate a time and place in the dictionary sense; or they may be Transactions involving named entities; or take the form of Signals, such as ‘some Measure had some value at some time’ - what many enterprises understand as alerts.   The central story of the Spectrum is that though consumers of analytical insights (represented here by the Business perspective) need to work in terms of Subjects that are directly meaningful to their perspective — such as Themes, Plans, and Goals — the working realities of data (condition, structure, availability, completeness, cost) and the changing nature of most discovery efforts make direct engagement with source data in this fashion impossible.  Accordingly, business analytics as a domain is structured around the fundamental assumption that sense making depends on analytical transformation of data.  Analytical activity incrementally synthesizes more complex and larger scope Subjects from data in its starting condition, accumulating insight (and value) by moving through a progression of stages in which increasingly meaningful Subjects are iteratively synthesized from the data, and recombined with other Subjects.  The end goal of  ‘laddering’ successive transformations is to enable sense making from the business perspective, rather than the analytical perspective.Synthesis through laddering is typically accomplished by specialized Analysts using dedicated tools and methods. Beginning with some motivating question such as seeking opportunities to increase the efficiency (a Theme) of fulfillment processes to reach some level of profitability by the end of the year (Plan), Analysts will iteratively wrangle and transform source data Records, Values and Attributes into recognizable Entities, such as Products, that can be combined with Measures or other data into the Events (shipment of orders) that indicate the workings of the business.  More complex Subjects (to the right of the Spectrum) are composed of or make reference to less complex Subjects: a business Process such as Fulfillment will include Activities such as confirming, packing, and then shipping orders.  These Activities occur within or are conducted by organizational units such as teams of staff or partner firms (Networks), composed of Entities which are structured via Relationships, such as supplier and buyer.  The fulfillment process will involve other types of Entities, such as the products or services the business provides.  The success of the fulfillment process overall may be judged according to a sophisticated operating efficiency Model, which includes tiered Measures of business activity and health for the transactions and activities included.  All of this may be interpreted through an understanding of the operational domain of the businesses supply chain (a Domain).   We'll discuss the Spectrum in more depth in succeeding posts.

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >