Search Results

Search found 63386 results on 2536 pages for 'data structure'.


  • Fast Data - Big Data's achilles heel

    - by thegreeneman
    At OOW 2013, in their keynote, Mark Hurd and Thomas Kurian discussed Oracle's Fast Data software solution stack and a number of customers deploying Oracle's Big Data / Fast Data solutions, in particular Oracle's NoSQL Database. Since that time, there have been a large number of requests seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions. Fast Data is a software solution stack that deals with one aspect of Big Data: high velocity. The software in the Fast Data solution stack involves three key pieces and their integration: Oracle Event Processing, Oracle Coherence, and Oracle NoSQL Database. All three of these technologies address a high-throughput, low-latency data management requirement. Oracle Event Processing enables continuous queries to filter the Big Data fire hose, enables intelligent chaining of events to real-time service invocation, and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time, so that SQL statements can look for triggers on events such as a breach of a weighted moving average on a real-time data stream. Oracle Coherence is a distributed grid caching solution that provides very low latency access to cached data when the data is too big to fit into a single process, so it is spread around a grid architecture to provide memory-latency-speed access. It also has some special capabilities to deploy remote behavioral execution for "near data" processing. The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent low latency reads. For example, large sensor networks may be generating data that needs to be captured while analysts are simultaneously extracting the data using range-based queries for upstream analytics.
Another example might be storing cookies from user web sessions for ultra low latency user profile management, while also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis. Understand how NoSQL plays a critical role in Big Data capture and enrichment while simultaneously providing a low latency and scalable data management infrastructure through clustered, always-on, parallel processing in a shared-nothing architecture. Learn how easily a NoSQL cluster can be deployed to provide essential services in industry-specific Fast Data solutions. See these technologies work together in a demonstration highlighting the salient features of these Fast Data enabling technologies in a location-based personalization service. The question then becomes: how do these things work together to deliver an end-to-end Fast Data solution? The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together. You may need the memory latencies of the Coherence cache but simply have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme speed cache overflow and retrieval. Here is a great reference on how these two technologies are integrated and work together: Coherence & Oracle NoSQL Database. On the stream processing side, the situation is similar to the Coherence case. As your sliding windows get larger, holding all the data in the stream can become difficult, and out-of-band data may need to be offloaded into persistent storage. OEP needs an extreme speed database like Oracle NoSQL Database to help it continue to perform for the real-time loop while dealing with persistent spill in the data stream. Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together: OEP & Oracle NoSQL Database.
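
    The sliding-window trigger described above can be illustrated with a rough sketch. This is plain Python, not Oracle Event Processing's query language; the class name, the linear weighting scheme, and the threshold are all assumptions made for illustration.

```python
from collections import deque


class SlidingWindowWMA:
    """Time-based sliding window with a linearly weighted moving average.

    Hypothetical helper for illustration; not part of any Oracle product.
    """

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Add an event; return True if the weighted average breaches the threshold."""
        self.events.append((timestamp, value))
        # Evict events that have slid out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            self.events.popleft()
        return self.weighted_average() > self.threshold

    def weighted_average(self):
        if not self.events:
            return 0.0
        # Linear weights: the newest event gets the largest weight.
        weights = range(1, len(self.events) + 1)
        total = sum(w * v for w, (_, v) in zip(weights, self.events))
        return total / sum(weights)
```

    In a real deployment the evicted events are the ones that would be spilled to persistent storage rather than simply dropped.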

    Read the article

  • Oracle Announces Oracle Big Data Appliance X3-2 and Enhanced Oracle Big Data Connectors

    - by jgelhaus
    Enables Customers to Easily Harness the Business Value of Big Data at Lower Cost Engineered System Simplifies Big Data for the Enterprise Oracle Big Data Appliance X3-2 hardware features the latest 8-core Intel® Xeon E5-2600 series of processors, and compared with previous generation, the 18 compute and storage servers with 648 TB raw storage now offer: 33 percent more processing power with 288 CPU cores; 33 percent more memory per node with 1.1 TB of main memory; and up to a 30 percent reduction in power and cooling Oracle Big Data Appliance X3-2 further simplifies implementation and management of big data by integrating all the hardware and software required to acquire, organize and analyze big data. It includes: Support for CDH4.1 including software upgrades developed collaboratively with Cloudera to simplify NameNode High Availability in Hadoop, eliminating the single point of failure in a Hadoop cluster; Oracle NoSQL Database Community Edition 2.0, the latest version that brings better Hadoop integration, elastic scaling and new APIs, including JSON and C support; The Oracle Enterprise Manager plug-in for Big Data Appliance that complements Cloudera Manager to enable users to more easily manage a Hadoop cluster; Updated distributions of Oracle Linux and Oracle Java Development Kit; An updated distribution of open source R, optimized to work with high performance multi-threaded math libraries Read More   Data sheet: Oracle Big Data Appliance X3-2 Oracle Big Data Appliance: Datacenter Network Integration Big Data and Natural Language: Extracting Insight From Text Thomson Reuters Discusses Oracle's Big Data Platform Connectors Integrate Hadoop with Oracle Big Data Ecosystem Oracle Big Data Connectors is a suite of software built by Oracle to integrate Apache Hadoop with Oracle Database, Oracle Data Integrator, and Oracle R Distribution. Enhancements to Oracle Big Data Connectors extend these data integration capabilities. 
With updates to every connector, this release includes: Oracle SQL Connector for Hadoop Distributed File System, for high performance SQL queries on Hadoop data from Oracle Database, enhanced with increased automation and querying of Hive tables and now supported within the Oracle Data Integrator Application Adapter for Hadoop; Transparent access to the Hive Query language from R and introduction of new analytic techniques executing natively in Hadoop, enabling R developers to be more productive by increasing access to Hadoop in the R environment. Read More Data sheet: Oracle Big Data Connectors High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

    Read the article

  • Best approach to accessing multiple data source in a web application

    - by ced
    I have a base web application developed with .NET technologies (ASP.NET), used on our LAN by 30 users simultaneously. From this web application I have developed two verticalizations used by online users. In the future I expect hundreds of simultaneous users. Our company has different locations. Each site uses its own database. The web application needs to retrieve information from all existing databases. Currently there are 3 databases, but new offices may be added in the future. My question then is: what is the best strategy for a web application to retrieve information from different databases (which have the same schema), where the main objectives are data access performance and high fault tolerance? Are there case studies in the literature that I can take as an example? Do you know some good documents to study? Do you have any tips for implementing this efficiently? Intuitively I would say that two possible strategies are: perform queries against the different sources in real time and aggregate the data on the fly; or create a repository that contains the union of the entities of interest and perform queries directly on the repository.
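
    The first strategy (query every site in real time and aggregate on the fly) might look roughly like the sketch below. It uses Python with SQLite files standing in for the per-site databases; the function names and the idea of returning the list of failed sites are assumptions for illustration, not something from the question.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def query_site(db_path, sql):
    """Run one query against one site's database (opened inside the worker thread)."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def query_all_sites(db_paths, sql):
    """Fan the same query out to every site in parallel and merge the rows.

    db_paths maps a site name to its database path. Returns (rows, failed_sites),
    so one unreachable site degrades the result instead of failing the request.
    """
    rows, failed = [], []
    with ThreadPoolExecutor(max_workers=len(db_paths)) as pool:
        futures = {site: pool.submit(query_site, path, sql)
                   for site, path in db_paths.items()}
        for site, future in futures.items():
            try:
                # Prefix every row with the site it came from.
                rows.extend((site, *row) for row in future.result())
            except sqlite3.Error:
                failed.append(site)
    return rows, failed
```

    The trade-off against the second strategy is that every request pays the latency of the slowest site, whereas a consolidated repository answers from one place but can serve stale data.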

    Read the article

  • Ideal data structure/techniques for storing generic scheduler data in C#

    - by GraemeMiller
    I am trying to implement a generic scheduler object in C# 4 which will output a table in HTML. The basic aim is to show some object along with various attributes, and whether it was doing something in a given time period. The scheduler will output a table displaying the headers: Detail Field 1 ... N | Date 1 ... N. I want to initialise the table with a start date and an end date to create the date range (ideally it could also handle other time periods, e.g. hours, but that isn't vital). I then want to provide a generic object which will have associated events. Where an object has events within the period, I want a table cell to be marked. E.g. with the header Name | Height | Weight | 1/1/2011 | 2/1/2011 | 3/1/2011 ... 31/1/2011, the rows might be Ben | 5.11 | 75 | X X X and Bill | 5.7 | 83 | X X. So I created a scheduler with start date 1/1/2011 and end date 31/1/2011. I'd like to give it my person objects (already sorted) and tell it which fields I want displayed (Name, Height, Weight). Each person has events which have a start date and an end date. Some events will start or end outside the range, but they should still be shown on the relevant dates. Ideally I'd like to be able to provide it with, say, a class booking object as well, so I'm trying to keep it generic. I have seen JavaScript implementations of similar things. What would a good data structure be for this? Any thoughts on techniques I could use to make it generic? I am not great with generics, so any tips are appreciated.
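
    One possible shape for such a scheduler is sketched below in Python rather than C#, purely to show the data structure. The names Event, build_schedule, and events_of are hypothetical. Genericity comes from passing the field names and an event accessor instead of hard-coding a person type.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    start: date
    end: date


def build_schedule(items, fields, events_of, start, end):
    """Return (header, rows): the detail fields followed by one column per day.

    items: already-sorted objects; fields: attribute names to display;
    events_of: function returning the events of an item.
    """
    days = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    header = list(fields) + [d.isoformat() for d in days]
    rows = []
    for item in items:
        details = [getattr(item, f) for f in fields]
        # A day is marked if any event overlaps it, even when the event
        # starts before or ends after the displayed range.
        marks = ["X" if any(e.start <= d <= e.end for e in events_of(item)) else ""
                 for d in days]
        rows.append(details + marks)
    return header, rows
```

    Rendering the HTML table is then a separate step over header and rows. In C#, the same idea would likely take a generic type parameter plus delegates for the field values and the events.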

    Read the article

  • Is this project Structure Valid?

    - by rafuru
    I have a dilemma. At university we learn to create modular software (in Java), but this modularity is explained using a single project with packages (one package for business logic, another for DAOs, another for the model, and a last one for the frontend). At work, however, we use the following structure, which I will try to explain. First we create a Java library project in which the model (entity classes) is created in a package. Next we create an EJB named DAOS and, using the NetBeans wizard, we store the DAO interfaces in the library project in another package; these interfaces are implemented in the DAOS bean. The next part is the business logic: we create a business EJB for each group of functions and, again using the wizard, store its interface in the Java library project in another package; the interface is then implemented in the business bean. The final part (for the backend) is a bean that I suggested: a Facade bean that gathers every method of the business beans in a single bean. It too has an interface that is created in our library project and implemented in the bean. The last step is to call the facade module from the web project. But I don't know how valid or viable this is; maybe I'm doing everything wrong and don't even know it! So I want to ask your opinion about this.

    Read the article

  • SQL SERVER – Introduction to Big Data – Guest Post

    - by pinaldave
    Big Data: such a big term, and everybody talks about it nowadays. It is the word in the database world. In one conversation I asked my friend Jasjeet Singh the same question: what is Big Data? He instantly came up with a very effective write-up. Jasjeet works as a Technical Manager with Koenig Solutions. He leads the SQL domain and holds rich IT industry experience. Koenig is a 19-year-old IT training company that offers several certification choices. Some of its courses include SharePoint Training, Project Management certifications, Microsoft Trainings, Business Intelligence programs, and Web Design and Development courses. Big Data, as the name suggests, is about data that is BIG in nature. The data is big in terms of size, and it is difficult to manage such enormous data with the relational database management systems that are popular these days. Big Data is not just about being large in size; it is also about the variety of data that differs in form or type. Some examples of Big Data are: scientific data related to weather and atmosphere, genetics, etc.; data collected by various medical procedures, such as radiology, CT scans, and MRI; data related to the Global Positioning System; pictures and videos; radio frequency data; and data that may vary very rapidly, like stock exchange information. Apart from the difficulties in managing and storing such data, it is difficult to query, analyze, and visualize it. The characteristics of Big Data can be defined by four Vs. Volume: a large volume of data that may span petabytes, exabytes, and beyond. However, what volume of data counts as Big Data varies from organization to organization. Variety: as discussed above, Big Data is not limited to relational information or structured data. It can also include unstructured data like pictures, videos, text, and audio. Velocity: the speed at which data changes. The higher the velocity, the more efficient the system must be to capture and analyze the data. Missing any important point may lead to wrong analysis or may even result in loss. Veracity: recently added as the fourth V, it generally means truthfulness or adherence to the truth. In terms of Big Data, it is more of a challenge than a characteristic. It is difficult to ascertain the truth from an enormous amount of data, especially data with high velocity. There is always a chance of imprecise and uncertain data, and cleaning such data before it is analyzed is a challenging task. Big Data can be considered the next big thing in the IT sector in terms of innovation and development. If appropriate technologies are developed to analyze and use the information, it can be the driving force for almost all industrial segments, including retail, manufacturing, service, finance, and healthcare. This will help them automate business decisions, increase productivity, and innovate and develop new products. Thanks to Jasjeet Singh for an excellent write-up. Reference: Pinal Dave (http://blog.SQLAuthority.com)

    Read the article

  • Creating a Corporate Data Hub

    - by BuckWoody
    The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use: a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub”, a codename for a project that, ironically, actually does what the codename says. In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases, made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative. Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share, or other location internally to store the data. But then you have to figure out who owns it and how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally, which brings up all kinds of security issues. Finally, in many cases our user community wants us to combine data from internal sources with external data, bringing up the security, connection string, and exploration issues all over again. Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you, and only to you and your organization if you wish. The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then assign data areas and import data into them. From there you make them discoverable, and then you have multiple options by which you or your users can access that data. You’re then able, if you wish, to combine that data with other data in one location. So how does all that work? What about security? Is it really that easy? 
And can you really move the data definition off to the Subject Matter Experts (SME’s) that know the particular data stack better than the IT team does? Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage. After the data is entered in, and you’ve assigned meta-data to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets.     Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: http://msdn.microsoft.com/en-us/library/hh921924  You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options. You can check out the offering here: http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx and you can read the documentation here: http://msdn.microsoft.com/en-us/library/hh921938

    Read the article

  • Optimal Data Structure for our own API

    - by vermiculus
    I'm in the early stages of writing an Emacs major mode for the Stack Exchange network; if you use Emacs regularly, this will benefit you in the end. In order to minimize the number of calls made to Stack Exchange's API (capped at 10000 per IP per day) and to just be a generally responsible citizen, I want to cache the information I receive from the network and store it in memory, waiting to be accessed again. I'm really stuck as to what data structure to store this information in. Obviously, it is going to be a list. However, as with any data structure, the choice must be determined by what data is being stored and how it will be accessed. I would like to be able to store all of this information in a single symbol such as stack-api/cache. So, without further ado, stack-api/cache is a list of conses keyed by last update: `(<csite> <csite> <csite>) where <csite> would be (1362501715 . <site>). At this point, all we've done is define a simple association list. Of course, we must go deeper. Each <site> is a list of the API parameter (unique) followed by a list of questions: `("codereview" <cquestion> <cquestion> <cquestion>). Each <cquestion> is, you guessed it, a cons of a question with its last update time: `(1362501715 . <question>) `(1362501720 . <question>). Each <question> is a cons of a question structure and a list of answers (again, consed with their last update time): `(<question-structure> <canswer> <canswer> <canswer>) and `(1362501715 . <answer-structure>). This data structure is likely most accurately described as a tree, but I don't know if there's a better way to do this considering the language, Emacs Lisp (which isn't all that different from the Lisp you know and love at all). The explicit conses are likely unnecessary, but they help my brain wrap around it better. I'm pretty sure a <csite>, for example, would just turn into (<epoch-time> <api-param> <cquestion> <cquestion> ...) 
Concerns: Does storing data in a potentially huge structure like this have any performance trade-offs for the system? I would like to avoid storing extraneous data, but I've done what I could and I don't think the dataset is that large in the first place (for normal use) since it's all just human-readable text in reasonable proportion. (I'm planning on culling old data using the times at the head of the list; each inherits its last-update time from its children and so-on down the tree. To what extent this cull should take place: I'm not sure.) Does storing data like this have any performance trade-offs for that which must use it? That is, will set and retrieve operations suffer from the size of the list? Do you have any other suggestions as to what a better structure might look like?
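
    The tree with inherited update times and culling could be modeled roughly as below. This is a Python sketch for illustration only (the question targets Emacs Lisp), and the Node class and its method names are hypothetical. It keys children in a dictionary rather than an association list, which is one way to keep lookups from slowing down as the cache grows; Emacs Lisp offers hash tables via make-hash-table for the same purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cached item (site, question, or answer) plus the time it was last updated."""
    data: object
    updated: int  # epoch seconds
    children: dict = field(default_factory=dict)  # key -> Node

    def last_update(self):
        # A node inherits the newest update time found anywhere beneath it.
        return max([self.updated] + [c.last_update() for c in self.children.values()])

    def cull(self, cutoff):
        """Recursively drop children whose whole subtree is older than cutoff."""
        self.children = {k: c for k, c in self.children.items()
                         if c.last_update() >= cutoff}
        for child in self.children.values():
            child.cull(cutoff)
```

    This version recomputes last_update on every call, which is fine for a small cache; a larger one might store the inherited time on each node and update it on insert.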

    Read the article

  • Data Structure Behind Amazon S3s Keys (Filtering Data Structure)

    - by dimo414
    I'd like to implement a data structure similar to the lookup functionality of Amazon S3. For those of you who don't know what I'm talking about, Amazon S3 stores all files at the root, but allows you to look up groups of files by common prefixes in their names, thereby replicating the power of a directory tree without its complexity. The catch is that both lookup and filter operations are O(1) (or close enough that even on very large buckets, S3's disk equivalents, both operations might as well be O(1)). So in short, I'm looking for a data structure that functions like a hash map, with the added benefit of efficient (at the very least not O(n)) filtering. The best I can come up with is extending HashMap so that it also contains a (sorted) list of contents, doing a binary search for the range that matches the prefix, and returning that set. This seems slow to me, but I can't think of any other way to do it. Does anyone know either how Amazon does it, or a better way to implement this data structure?
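
    The approach the asker describes (a hash map plus a sorted key list searched by binary search) can be sketched as follows. This is Python for illustration; PrefixMap is a hypothetical name, and nothing here claims to reflect how Amazon S3 is actually implemented.

```python
import bisect


class PrefixMap:
    """Hash map for exact lookups plus a sorted key list for prefix queries."""

    def __init__(self):
        self._values = {}
        self._keys = []  # kept sorted at all times

    def put(self, key, value):
        if key not in self._values:
            # Binary search finds the slot; the list insert itself is O(n).
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values[key]  # O(1) on average

    def keys_with_prefix(self, prefix):
        # Keys sharing a prefix are contiguous in sorted order, so binary
        # search for the first candidate and walk until the prefix stops matching.
        result = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            result.append(self._keys[i])
            i += 1
        return result
```

    A prefix query costs O(log n + k) for k matches, so it is not as slow as it may seem. If inserts dominate, a trie or a balanced tree would avoid the O(n) list insertion.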

    Read the article

  • Discover the solution for exploring structured and unstructured data

    - by David lefranc
    Explore and discover information… We are offering a discovery workshop that lets you explore any type of data using the Oracle Endeca solution. When: 7 December 2012, 9:30 am to 12:30 pm. Where: Oracle, 15 Boulevard Charles de Gaulle, 92715 Colombes. To register: [email protected] Designed for business users, this half-day workshop lets you discover Oracle Endeca Information Discovery (OEID) in order to: understand and explore any information coming from different sources (Big Data, social networks, forums, surveys, blogs...); and discover how and why OEID complements traditional BI solutions. Through simple, fast navigation, you will discover how easy it is to find answers to unforeseen questions using OEID without prior training. Use search and guided navigation to see how structured and unstructured information can be quickly brought together to reveal hidden value. Explore all your data in any format and from any source, including social media, documents, files, and more. Discover and explore your data without a repository, so that users can be autonomous and analyze their own data quickly. Develop a strategy to increase the value of the company's data while reducing the total cost of ownership. Discover the incredible performance of Endeca on Oracle Exalytics, the in-memory machine. Agenda: after an introduction to the Oracle Endeca Information Discovery solution, followed by a workshop, you will see how easy it is to: use guided navigation and the search engine to explore structured and unstructured data; quickly integrate new data sources such as social media; build new user interfaces while discovering information; and respond quickly to the changing needs of businesses and data environments.

    Read the article

  • Free workshop: Discover the solution for exploring structured and unstructured data

    - by David lefranc
    Explore and discover information… We are offering a discovery workshop that lets you explore any type of data using the Oracle Endeca Information Discovery solution. When: 7 December 2012, 9:30 am to 12:30 pm. Where: Oracle, 15 Boulevard Charles de Gaulle, 92715 Colombes. To register: [email protected] Designed for business users, this half-day workshop lets you discover Oracle Endeca Information Discovery (OEID) in order to: understand and explore any information coming from different sources (Big Data, social networks, forums, surveys, blogs...); and discover how and why OEID complements traditional BI solutions. Through simple, fast navigation, you will discover how easy it is to find answers to unforeseen questions using OEID without prior training. Use search and guided navigation to see how structured and unstructured information can be quickly brought together to reveal hidden value. Explore all your data in any format and from any source, including social media, documents, files, and more. Discover and explore your data without a repository, so that users can be autonomous and analyze their own data quickly. Develop a strategy to increase the value of the company's data while reducing the total cost of ownership. Discover the incredible performance of Endeca on Oracle Exalytics, the in-memory machine. Agenda: after an introduction to the Oracle Endeca Information Discovery solution, followed by a workshop, you will see how easy it is to: use guided navigation and the search engine to explore structured and unstructured data; quickly integrate new data sources such as social media; build new user interfaces while discovering information; and respond quickly to the changing needs of businesses and data environments. When: 7 December 2012, 9:30 am to 12:30 pm. Where: Oracle, 15 Boulevard Charles de Gaulle, 92715 Colombes.

    Read the article

  • Accessing SQL Data Services via ADO.NET Data Service Client Library

    - by Mehmet Aras
    Is this possible? Basically I would like to use SQL Data Services REST interface and let the ADO.NET Data Service Client library handle communication details and generate the entities that I can use. I looked at the samples in February release of Azure services kit but the samples in there are using HttpWebRequest and HttpWebResponse to consume SQL Data Services RESTfully. I was hoping to use ADO.NET Data Service Client library to abstract low-level details away.

    Read the article

  • Bridging Two Worlds: Big Data and Enterprise Data

    - by Dain C. Hansen
    The big data world is all the vogue in today’s IT conversations. It’s a world of volume, velocity, and variety, tantalizing us with its untapped potential. It’s a world of transformational game-changing technologies that have already begun to alter the information management landscape. One of the reasons that big data is so compelling is that it’s a universal challenge that impacts every one of us. Whether in healthcare, finance, manufacturing, government, or retail, big data presents a pressing problem for many industries: how can so much information be processed so quickly to deliver the ‘bigger’ picture? With big data we’re tapping into new information that didn’t exist before: social data, weblogs, sensor data, complex content, and more. What also makes big data revolutionary is that it turns traditional information architecture on its head, putting into question commonly accepted notions of where and how data should be aggregated, processed, analyzed, and stored. This is where Hadoop and NoSQL come in: new technologies which solve new problems for managing unstructured data. 
And now for some worst practices that I'd recommend that you please not follow: Worst Practice Lesson 1: Throw away everything that you already know about data management, data integration tools, and start completely over. One shouldn’t forget what’s already running in today’s IT. Today’s Business Analytics, Data Warehouses, Business Applications (ERP, CRM, SCM, HCM), and even many social, mobile, cloud applications still rely almost exclusively on structured data – or what we’d like to call enterprise data. This dilemma is what today’s IT leaders are up against: what are the best ways to bridge enterprise data with big data? And what are the best strategies for dealing with the complexities of these two unique worlds? Worst Practice Lesson 2: Throw away all of your existing business applications … because they don’t run on big data yet. Bridging the two worlds of big data and enterprise data means considering solutions that are complete, based on emerging Hadoop technologies (as well as traditional), and are poised for success through integrated design tools and integrated platforms that connect to your existing business applications and support real-time analytics. Leveraging these types of best practices translates to improved productivity, lowered TCO, IT optimization, and better business insights. Worst Practice Lesson 3: Separate out [and keep separate] your big data sandboxes from all the current enterprise IT systems. Don’t mix sand among playgrounds. We didn't tell you that you wouldn't get dirty doing this. Correlation between the two worlds is key. The real advantage to analyzing big data comes when you can correlate it with the existing data in your data warehouse or your current applications to make sense of the larger patterns. If you have not followed these worst practices 1-3 then you qualify for the first step of our journey: bridging the two worlds of enterprise data and big data. 
Over the next several weeks we’ll be discussing this topic along with several others around big data as it relates to data integration. We welcome you to join us in the conversation by following us on Twitter at #BridgingBigData or by downloading our latest white paper and resource kit: Big Data and Enterprise Data: Bridging Two Worlds.

    Read the article

  • SQL SERVER – Data Sources and Data Sets in Reporting Services SSRS

    - by Pinal Dave
    This example is from the Beginning SSRS by Kathi Kellenberger. Supporting files are available with a free download from the www.Joes2Pros.com web site. Connecting to Your Data? When I was a child, the telephone book was an important part of my life. Maybe I was just a nerd, but I enjoyed getting a new book every year to page through to learn about the businesses in my small town or to discover where some of my school acquaintances lived. It was also the source of maps to my town’s neighborhoods and the towns that surrounded me. To make a phone call, I would need a telephone number. In order to find a telephone number, I had to know how to use the telephone book. That seems pretty simple, but it resembles connecting to any data. You have to know where the data is and how to interact with it. A data source is the connection information that the report uses to connect to the database. You have two choices when creating a data source: embed it in the report, or make it a shared resource usable by many reports. Data Sources and Data Sets A few basic terms will make the upcoming choices make more sense. What database on what server do you want to connect to? It would be better to just ask… “what is your data source?” The connection you need to make to get your report's data is called a data source. If you connected to a data source (like the JProCo database) there may be hundreds of tables. You probably only want data from just a few tables. This means you want to write a specific query against this data source. A query on a data source to get just the records you need for an SSRS report is called a Data Set. Creating a local Data Source You can embed a connection from your report directly to your JProCo database which (let’s say) is installed on a server named Reno. 
If you move JProCo to a new server named Tampa then you need to update the Data Set. If you have 10 reports in one project that were all pointing to the JProCo database on the Reno server then they would all need to be updated at once. It’s possible to make a project level Data Source and have each report use that. This means one change can fix all 10 reports at once. This would be called a Shared Data Source. Creating a Shared Data Source The best advice I can give you is to create shared data sources. The reason I recommend this is that if a database moves to a new server you will have just one place in Report Manager to make the server name change. That one change will update the connection information in all the reports that use that data source. To get started, you will start with a fresh project. Go to Start > All Programs > SQL Server 2012 > Microsoft SQL Server Data Tools to launch SSDT. Once SSDT is running, click New Project to create a new project. Once the New Project dialog box appears, fill in the form, as shown in. Be sure to select Report Server Project this time – not the wizard. Click OK to dismiss the New Project dialog box. You should now have an empty project, as shown in the Solution Explorer. A report is meant to show you data. Where is the data? The first task is to create a Shared Data Source. Right-click on the Shared Data Sources folder and choose Add New Data Source. The Shared Data Source Properties dialog box will launch where you can fill in a name for the data source. By default, it is named DataSource1. The best practice is to give the data source a more meaningful name. It is possible that you will have projects with more than one data source and, by naming them, you can tell one from another. Type the name JProCo for the data source name and click the Edit button to configure the database connection properties. 
If you take a look at the types of data sources you can choose, you will see that SSRS works with many data platforms including Oracle, XML, and Teradata. Make sure SQL Server is selected before continuing. For this post, I am assuming that you are using a local SQL Server and that you can use your Windows account to log in to the SQL Server. If, for some reason you must use SQL Server Authentication, choose that option and fill in your SQL Server account credentials. Otherwise, just accept Windows Authentication. If your database server was installed locally and with the default instance, just type in Localhost for the Server name. Select the JProCo database from the database list. At this point, the connection properties should look like. If you have installed a named instance of SQL Server, you will have to specify the server name like this: Localhost\InstanceName, replacing the InstanceName with whatever your instance name is. If you are not sure about the named instance, launch the SQL Server Configuration Manager found at Start > All Programs > Microsoft SQL Server 2012 > Configuration Tools. If you have a named instance, the name will be shown in parentheses. A default instance of SQL Server will display MSSQLSERVER; a named instance will display the name chosen during installation. Once you get the connection properties filled in, click OK to dismiss the Connection Properties dialog box and OK again to dismiss the Shared Data Source properties. You now have a data source in the Solution Explorer. What’s next I really need to thank Kathi Kellenberger and Rick Morelan for sharing this material for this 5 day series of posts on SSRS. To get really comfortable with SSRS you will get to know the different SSDT windows, Build reports on your own (without the wizards),  Add report headers and footers, Accept user input,  create levels, charts, or even maps for visual appeal. 
You might be surprised to know that a small 230-page book starts from the very beginning and covers the steps to do all these items. Beginning SSRS 2012 is a small, easy-to-follow book, so you can learn SSRS for less than $20. See Joes2Pros.com for more on this and other books. If you want to learn SSRS in simple, easy words, I strongly recommend that you get the Beginning SSRS book from Joes 2 Pros. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Reporting Services, SSRS

    Read the article

  • Merging MySQL Structure and Data

    - by Shahid
    I have a MySQL database running on a deployment machine which also contains data. Then I have another MySQL database which has evolved in terms of STRUCTURE + DATA for some time. I need a way to merge the changes (ONLY) for both structure and data to the DB in deployment machine without disturbing the existing data. Does anyone know of a tool available which can do this safely? I have had a look at a few comparison tools but I need a tool which can automate the merge operation. Note also that most of the data in the tables is in BINARY so I can't use many file comparison tools. Does anyone know of a solution to this? Thanks

    Read the article

  • How to structure classes in the filesystem?

    - by da_b0uncer
    I have a few (view) classes: Table, Tree, PagingColumn, SelectionColumn, SparkLineColumn, TimeColumn. Currently they're flat under app/view like this: app/view/Table app/view/Tree app/view/PagingColumn ... I thought about restructuring it, because the Trees and Tables use the columns, but there are some columns that only work in a tree, some that work in both trees and tables, and in the future there will probably be some that only work in tables; I don't know. My first idea was like this: app/view/Table app/view/Tree app/view/column/PagingColumn app/view/column/SelectionColumn app/view/column/SparkLineColumn app/view/column/TimeColumn But since the SelectionColumn is explicitly for trees, I fear that future developers could get the idea of misusing it. But how do I restructure it properly? Like this: app/view/table/panel/Table app/view/tree/panel/Tree app/view/tree/column/PagingColumn app/view/tree/column/SelectionColumn app/view/column/SparkLineColumn app/view/column/TimeColumn Or like this: app/view/Table app/view/Tree app/view/column/SparkLineColumn app/view/column/TimeColumn app/view/column/tree/PagingColumn app/view/column/tree/SelectionColumn

    Read the article

  • Better data structure for a game like Bubble Witch

    - by CrociDB
    I'm implementing a bubble-witch-like game (http://www.king.com/games/puzzle-games/bubble-witch/), and I was thinking on what's the better way to store the "bubbles" and to work with. I thought of using graphs, but that might be too complex for a trivial thing. Thought of a matrix, just like a tile map, but that might get too 'workaroundy'. I don't know. I'll be doing in Flash/AS3, though. Thanks. :)
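
    One common way to make the matrix idea less 'workaroundy' is to treat the board as a staggered (hexagonal) grid in which odd rows are shifted half a cell to the right. The sketch below is in Python for brevity, but it ports directly to AS3; the layout convention, the `neighbors` and `same_color_cluster` names, and the sample board are assumptions for illustration, not anything taken from the actual game.

```python
def neighbors(col, row):
    """Six neighbours of a cell in a staggered grid (odd rows shifted right)."""
    shift = row % 2  # 1 on odd rows, 0 on even rows
    return [
        (col - 1, row), (col + 1, row),                  # same row
        (col - 1 + shift, row - 1), (col + shift, row - 1),  # row above
        (col - 1 + shift, row + 1), (col + shift, row + 1),  # row below
    ]


def same_color_cluster(grid, col, row):
    """Flood-fill the connected bubbles that share the colour at (col, row)."""
    color = grid[row][col]
    if color is None:  # empty cell: nothing to match
        return set()
    seen = {(col, row)}
    stack = [(col, row)]
    while stack:
        c, r = stack.pop()
        for nc, nr in neighbors(c, r):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if inside and (nc, nr) not in seen and grid[nr][nc] == color:
                seen.add((nc, nr))
                stack.append((nc, nr))
    return seen


# Hypothetical 3x3 board; None marks an empty cell.
GRID = [
    ["R", "R", "B"],
    ["B", "R", "B"],
    ["G", None, "B"],
]
```

    With this layout, popping a group of three or more is a matter of checking `len(same_color_cluster(GRID, col, row)) >= 3`, so a full graph structure is probably not needed.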

    Read the article

  • Direct3d - Code structure

    - by marcg11
    I'm learning DirectX in a master's degree and they taught us to have a GraphicsLayer class which is the one connecting with the Direct3D library. That way this class is completely independent from the other classes (my game classes), meaning changing the renderer to OpenGL wouldn't require much effort but only changing the graphicLayer. This class has its LoadAssets and Paint methods, but I have a question: they told us to load all the assets inside this class. This means all these calls will be in the loadAssets method: D3DXCreateTextureFromFileEx(g_pD3DDevice,"tiles.png",0,0,1,0,D3DFMT_UNKNOWN,D3DPOOL_DEFAULT,D3DX_FILTER_NONE,D3DX_FILTER_NONE,NULL,NULL,NULL,&texTiles); // And more resources to load //... texTiles as you see is a LPDIRECT3DTEXTURE9 instance which is declared in the graphicLayer.h. So my question is, how do you manage all the resources? Do I have to declare in the .h all my game textures even if I'm not using them? How would you load only those resources that are in a scene and draw them in a well-structured way?

    Read the article

  • map data structure in pacman

    - by Sam Fisher
    I am trying to make a Pac-Man game in C# using GDI+. I have done some basic work, and I have previously replicated games like copter-it and minesweeper. But I am confused about how to implement the map in Pac-Man: which data structure should I use so that I can move AI-controlled objects and check collisions with walls? I thought of a 2D array of ints, but that didn't make sense to me. Looking for some help. Thanks.
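
    A tile grid is a common choice for this kind of maze, so the 2D-array idea may be closer to workable than it first seems. The following is a minimal sketch in Python (the same shape carries over to a C# 2D array); the `LEVEL` layout, the `#` wall marker, and the helper names are all hypothetical.

```python
# Hypothetical level: '#' is a wall, '.' is a walkable tile.
LEVEL = [
    "#######",
    "#..#..#",
    "#.....#",
    "#######",
]
WALL = "#"


def is_wall(level, col, row):
    """Treat anything outside the grid as a wall so callers need no bounds checks."""
    if row < 0 or row >= len(level) or col < 0 or col >= len(level[row]):
        return True
    return level[row][col] == WALL


def open_neighbors(level, col, row):
    """Walkable tiles directly up, down, left and right of (col, row)."""
    steps = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    return [
        (col + dx, row + dy)
        for dx, dy in steps
        if not is_wall(level, col + dx, row + dy)
    ]
```

    Collision checks then reduce to `is_wall` on the tile an object is about to enter, and `open_neighbors` gives an AI-controlled object the moves available at each tile, which is also the adjacency information a path search would need.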

    Read the article

  • Fast set indexing data structure for superset retrieval

    - by Asterios
    I am given a set of sets: {{a,b}, {a,b,c}, {a,c}, {a,c,f}} I would like to have a data structure to index those sets such that the following "lookup" is executed fast: find all supersets of a given set. For example, given the set {a,c} the structure would return {{a,b,c}, {a,c,f}, {a,c}} but not {a,b}. Any suggestions? Could this be done with a smart trie-like data structure storing sets after a proper sorting? This data structure is going to be queried a lot. Thus, I'm searching for a structure that might be expensive to build but rather fast to query.
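
    One simple baseline to compare a trie-like structure against is an inverted index: map each element to the ids of the stored sets that contain it, then answer a superset query by intersecting those id sets. The Python sketch below assumes hashable elements; the `SupersetIndex` class and its method names are illustrative only, and it is not the trie approach asked about.

```python
from collections import defaultdict


class SupersetIndex:
    """Inverted index: element -> ids of the stored sets containing it."""

    def __init__(self, sets):
        self.sets = [frozenset(s) for s in sets]
        self.postings = defaultdict(set)
        for set_id, members in enumerate(self.sets):
            for item in members:
                self.postings[item].add(set_id)

    def supersets(self, query):
        """Return every stored set that contains all elements of `query`."""
        query = frozenset(query)
        if not query:  # every set is a superset of the empty set
            return list(self.sets)
        # Intersect the smallest posting sets first to shrink the candidates early.
        ordered = sorted((self.postings.get(x, set()) for x in query), key=len)
        ids = set(ordered[0])
        for posting in ordered[1:]:
            ids &= posting
            if not ids:
                break
        return [self.sets[i] for i in sorted(ids)]


index = SupersetIndex([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "c", "f"}])
result = index.supersets({"a", "c"})
```

    For the example in the question, `result` holds {a,b,c}, {a,c} and {a,c,f} but not {a,b}. Building the index costs one pass over all elements, and a query costs roughly the size of the posting sets it touches.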

    Read the article

  • PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

    - by belvoir
    Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi real-time basis (some-one is bound to ask what semi real-time means and the answer is as frequently as I reasonably can but I will be pragmatic, as a benchmark lets say we are hoping for every 15min) and feed it into a data-warehouse. How much data? At peak times we are talking approx 80-100k rows per min hitting the OLTP side, off-peak this will drop significantly to 15-20k. The most frequently updated rows are ~64 bytes each but there are various tables etc so the data is quite diverse and can range up to 4000 bytes per row. The OLTP is active 24x5.5. Best Solution? From what I can piece together the most practical solution is as follows: Create a TRIGGER to write all DML activity to a rotating CSV log file Perform whatever transformations are required Use the native DW data pump tool to efficiently pump the transformed CSV into the DW Why this approach? TRIGGERS allow selective tables to be targeted rather than being system wide + output is configurable (i.e. into a CSV) and are relatively easy to write and deploy. SLONY uses similar approach and overhead is acceptable CSV easy and fast to transform Easy to pump CSV into the DW Alternatives considered .... Using native logging (http://www.postgresql.org/docs/8.3/static/runtime-config-logging.html). Problem with this is it looked very verbose relative to what I needed and was a little trickier to parse and transform. However it could be faster as I presume there is less overhead compared to a TRIGGER. Certainly it would make the admin easier as it is system wide but again, I don't need some of the tables (some are used for persistent storage of JMS messages which I do not want to log) Querying the data directly via an ETL tool such as Talend and pumping it into the DW ... 
problem is the OLTP schema would need tweaked to support this and that has many negative side-effects Using a tweaked/hacked SLONY - SLONY does a good job of logging and migrating changes to a slave so the conceptual framework is there but the proposed solution just seems easier and cleaner Using the WAL Has anyone done this before? Want to share your thoughts?

    Read the article

  • Reference Data Management and Master Data: Are Relation ?

    - by Mala Narasimharajan
    Submitted By:  Rahul Kamath  Oracle Data Relationship Management (DRM) has always been extremely powerful as an Enterprise Master Data Management (MDM) solution that can help manage changes to master data in a way that influences enterprise structure, whether it be mastering chart of accounts to enable financial transformation, or revamping organization structures to drive business transformation and operational efficiencies, or restructuring sales territories to enable equitable distribution of leads to sales teams following the acquisition of new products, or adding additional cost centers to enable fine grain control over expenses. Increasingly, DRM is also being utilized by Oracle customers for reference data management, an emerging solution space that deserves some explanation. What is reference data? How does it relate to Master Data? Reference data is a close cousin of master data. While master data is challenged with problems of unique identification, may be more rapidly changing, requires consensus building across stakeholders and lends structure to business transactions, reference data is simpler, more slowly changing, but has semantic content that is used to categorize or group other information assets – including master data – and gives them contextual value. In fact, the creation of a new master data element may require new reference data to be created. For example, when a European company acquires a US business, chances are that they will now need to adapt their product line taxonomy to include a new category to describe the newly acquired US product line. Further, the cross-border transaction will also result in a revised geo hierarchy. The addition of new products represents changes to master data while changes to product categories and geo hierarchy are examples of reference data changes.1 The following table contains an illustrative list of examples of reference data by type. 
Reference data types may include types and codes, business taxonomies, complex relationships & cross-domain mappings or standards.

Types & Codes | Taxonomies | Relationships / Mappings | Standards
Transaction Codes | Industry Classification Categories and Codes, e.g., North America Industry Classification System (NAICS) | Product / Segment; Product / Geo | Calendars (e.g., Gregorian, Fiscal, Manufacturing, Retail, ISO8601)
Lookup Tables (e.g., Gender, Marital Status, etc.) | Product Categories | City → State → Postal Codes | Currency Codes (e.g., ISO)
Status Codes | Sales Territories (e.g., Geo, Industry Verticals, Named Accounts, Federal/State/Local/Defense) | Customer / Market Segment; Business Unit / Channel | Country Codes (e.g., ISO 3166, UN)
Role Codes | Market Segments | Country Codes / Currency Codes / Financial Accounts | Date/Time, Time Zones (e.g., ISO 8601)
Domain Values | Universal Standard Products and Services Classification (UNSPSC), eCl@ss | International Classification of Diseases (ICD) e.g., ICD9 → ICD10 mappings | Tax Rates

Why manage reference data? Reference data carries contextual value and meaning and therefore its use can drive business logic that helps execute a business process, create a desired application behavior or provide meaningful segmentation to analyze transaction data. Further, mapping reference data often requires human judgment. Sample Use Cases of Reference Data Management Healthcare: Diagnostic Codes The reference data challenges in the healthcare industry offer a case in point. Part of being HIPAA compliant requires medical practitioners to transition diagnosis codes from ICD-9 to ICD-10, a medical coding scheme used to classify diseases, signs and symptoms, causes, etc. The transition to ICD-10 has a significant impact on business processes, procedures, contracts, and IT systems. Since both code sets ICD-9 and ICD-10 offer diagnosis codes of very different levels of granularity, human judgment is required to map ICD-9 codes to ICD-10. 
The process requires collaboration and consensus building among stakeholders much in the same way as does master data management. Moreover, to build reports to understand utilization, frequency and quality of diagnoses, medical practitioners may need to “cross-walk” mappings -- either forward to ICD-10 or backwards to ICD-9 depending upon the reporting time horizon. Spend Management: Product, Service & Supplier Codes Similarly, as an enterprise looks to rationalize suppliers and leverage their spend, conforming supplier codes, as well as product and service codes requires supporting multiple classification schemes that may include industry standards (e.g., UNSPSC, eCl@ss) or enterprise taxonomies. Aberdeen Group estimates that 90% of companies rely on spreadsheets and manual reviews to aggregate, classify and analyze spend data, and that data management activities account for 12-15% of the sourcing cycle and consume 30-50% of a commodity manager’s time. Creating a common map across the extended enterprise to rationalize codes across procurement, accounts payable, general ledger, credit card, procurement card (P-card) as well as ACH and bank systems can cut sourcing costs, improve compliance, lower inventory stock, and free up talent to focus on value added tasks. Change Management: Point of Sales Transaction Codes and Product Codes In the specialty finance industry, enterprises are confronted with usury laws – governed at the state and local level – that regulate financial product innovation as it relates to consumer loans, check cashing and pawn lending. To comply, it is important to demonstrate that transactions booked at the point of sale are posted against valid product codes that were on offer at the time of booking the sale. Since new products are being released at a steady stream, it is important to ensure timely and accurate mapping of point-of-sale transaction codes with the appropriate product and GL codes to comply with the changing regulations. 
Multi-National Companies: Industry Classification Schemes As companies grow and expand across geographies, a typical challenge they encounter with reference data represents reconciling various versions of industry classification schemes in use across nations. While the United States, Mexico and Canada conform to the North American Industry Classification System (NAICS) standard, European Union countries choose different variants of the NACE industry classification scheme. Multi-national companies must manage the individual national NACE schemes and reconcile the differences across countries. Enterprises must invest in a reference data change management application to address the challenge of distributing reference data changes to downstream applications and assess which applications were impacted by a given change. References 1 Master Data versus Reference Data, Malcolm Chisholm, April 1, 2006.

    Read the article

  • Focus on Oracle Data Profiling and Data Quality 11g - 24/Fev/11

    - by Claudia Costa
    Thursday 24th February, 11am GMT. Oracle offers an integrated suite of Data Quality software architected to discover and correct today's data quality problems and establish a platform prepared for tomorrow's yet unknown data challenges. Oracle Data Profiling provides data investigation, discovery, and profiling in support of quality, migration, integration, stewardship, and governance initiatives. It includes a broad range of features that expand upon basic profiling, including automated monitoring, business-rule validation, and trend analysis. Oracle Data Quality for Data Integrator provides cleansing, standardization, matching, address validation, location enrichment, and linking functions for global customer data and operational business data. It ensures that data adheres to established standards that are adaptable to fit each organization's specific needs. Both single-byte and double-byte data are processed in local languages to provide a unique and centralized view of customers, products and services. During this in-person briefing, Data Integration Solution Specialists will be providing a technical overview and a walkthrough. Agenda: Oracle Data Integration Strategy overview; a focus on Oracle Data Profiling and Oracle Data Quality for Data Integrator; live demo; Q&A. This FREE online LIVE eSeminar will be delivered over the Web and Conference Call. Registrations received less than 24 hours prior to start time may not receive confirmation to attend. To register click here. For any questions please contact [email protected]

    Read the article

  • Data structure for pattern matching.

    - by alvonellos
    Let's say you have an input file with many entries like these: date, ticker, open, high, low, close, <and some other values> And you want to execute a pattern matching routine on the entries(rows) in that file, using a candlestick pattern, for example. (See, Doji) And that pattern can appear on any uniform time interval (let t = 1s, 5s, 10s, 1d, 7d, 2w, 2y, and so on...). Say a pattern matching routine can take an arbitrary number of rows to perform an analysis and contain an arbitrary number of subpatterns. In other words, some patterns may require 4 entries to operate on. Say also that the routine (may) later have to find and classify extrema (local and global maxima and minima as well as inflection points) for the ticker over a closed interval, for example, you could say that a cubic function (x^3) has the extrema on the interval [-1, 1]. (See link) What would be the most natural choice in terms of a data structure? What about an interface that conforms a Ticker object containing one row of data to a collection of Ticker so that an arbitrary pattern can be applied to the data. What's the first thing that comes to mind? I chose a doubly-linked circular linked list that has the following methods: push_front() push_back() pop_front() pop_back() [] //overloaded, can be used with negative parameters But that data structure seems very clumsy, since so much pushing and popping is going on, I have to make a deep copy of the data structure before running an analysis on it. So, I don't know if I made my question very clear -- but the main points are: What kind of data structures should be considered when analyzing sequential data points to conform to a pattern that does NOT require random access? What kind of data structures should be considered when classifying extrema of a set of data points?
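
    For sequential, window-at-a-time matching, one option that avoids both the manual push/pop bookkeeping and the deep copy is a fixed-size sliding window over the rows. The Python sketch below uses `collections.deque` with `maxlen` for that; the `Bar` record, the `scan` helper, and in particular the doji rule (body no larger than 10% of the high-low range) are assumptions made for illustration, not a standard definition.

```python
from collections import deque, namedtuple

# One row of the input file; the field names follow the column list in the question.
Bar = namedtuple("Bar", "date ticker open high low close")


def is_doji(window, tolerance=0.1):
    """Assumed rule: the last bar's body is at most `tolerance` of its range."""
    bar = window[-1]
    span = bar.high - bar.low
    return span > 0 and abs(bar.close - bar.open) <= tolerance * span


def scan(bars, pattern, size):
    """Slide a window of `size` bars over the stream and record matching dates."""
    window = deque(maxlen=size)  # the oldest bar drops out automatically
    hits = []
    for bar in bars:
        window.append(bar)
        if len(window) == size and pattern(window):
            hits.append(window[-1].date)
    return hits


# Hypothetical sample rows.
bars = [
    Bar("d1", "X", 10.0, 12.0, 9.0, 11.5),
    Bar("d2", "X", 11.0, 13.0, 10.0, 11.1),
    Bar("d3", "X", 11.0, 12.0, 10.0, 12.0),
]
```

    A pattern that needs four entries would simply be called with `size=4` and inspect `window[0]` through `window[3]`; because the window only ever holds the rows in view, nothing has to be copied before an analysis runs. Running the same scan on pre-aggregated bars (5s, 1d, and so on) covers the different time intervals.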

    Read the article
