Search Results

Search found 61623 results on 2465 pages for 'data storage'.

Page 10/2465 | < Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17  | Next Page >

  • Internal Mutation of Persistent Data Structures

    - by Greg Ros
    To clarify, when I mean use the terms persistent and immutable on a data structure, I mean that: The state of the data structure remains unchanged for its lifetime. It always holds the same data, and the same operations always produce the same results. The data structure allows Add, Remove, and similar methods that return new objects of its kind, modified as instructed, that may or may not share some of the data of the original object. However, while a data structure may seem to the user as persistent, it may do other things under the hood. To be sure, all data structures are, internally, at least somewhere, based on mutable storage. If I were to base a persistent vector on an array, and copy it whenever Add is invoked, it would still be persistent, as long as I modify only locally created arrays. However, sometimes, you can greatly increase performance by mutating a data structure under the hood. In more, say, insidious, dangerous, and destructive ways. Ways that might leave the abstraction untouched, not letting the user know anything has changed about the data structure, but being critical in the implementation level. For example, let's say that we have a class called ArrayVector implemented using an array. Whenever you invoke Add, you get a ArrayVector build on top of a newly allocated array that has an additional item. A sequence of such updates will involve n array copies and allocations. Here is an illustration: However, let's say we implement a lazy mechanism that stores all sorts of updates -- such as Add, Set, and others in a queue. In this case, each update requires constant time (adding an item to a queue), and no array allocation is involved. When a user tries to get an item in the array, all the queued modifications are applied under the hood, requiring a single array allocation and copy (since we know exactly what data the final array will hold, and how big it will be). Future get operations will be performed on an empty cache, so they will take a single operation. But in order to implement this, we need to 'switch' or mutate the internal array to the new one, and empty the cache -- a very dangerous action. However, considering that in many circumstances (most updates are going to occur in sequence, after all), this can save a lot of time and memory, it might be worth it -- you will need to ensure exclusive access to the internal state, of course. This isn't a question about the efficacy of such a data structure. It's a more general question. Is it ever acceptable to mutate the internal state of a supposedly persistent or immutable object in destructive and dangerous ways? Does performance justify it? Would you still be able to call it immutable? Oh, and could you implement this sort of laziness without mutating the data structure in the specified fashion?

    Read the article

  • Sabre Manages Fast Data Growth with Oracle Data Integration Products

    - by Irem Radzik
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} Last year at OpenWorld we announced Sabre Holding as a winner of the Fusion Middleware Innovation Awards. The Sabre team did an excellent job at leveraging cutting edge technologies for managing rapid data growth and exponential scalability demands they have experienced in the travel industry. Today we announced the details and specific benefits of Sabre’s new real-time data integration solution in a press release. Please take a look if you haven’t seen it yet. Sabre Holdings Deploys Oracle Data Integrator and Oracle GoldenGate to Support Rapid Customer Growth There are 3 different areas of benefits Sabre achieved by using Oracle Data Integration products: Manages 7X increase in data sources for the enterprise data warehouse Reduced infrastructure complexity Decreased time to market for new products and services by 30 percent. This simply shows that using latest technologies helps the companies to innovate robust solutions against today’s key data management challenges. And the benefit of using a next generation data integration technology is not only seen in the IT operations, but also in the business side. A better data integration solution for the enterprise data warehouse delivered the platform they need to accelerate how they service their customers, improving their competitive advantage. Tomorrow I will give another great example of innovation with next generation data integration from Oracle. We will be discussing the Fusion Middleware Innovation Awards 2012 winners and their results with using Oracle’s data integration products.

    Read the article

  • Is there still a place for tape storage?

    - by Jon Ericson
    We've backed up our data on LTO tapes for years and it's a real comfort to know we have everything on tape. A sister project and one of our data providers have both moved to 100% disk storage because the cost of disk has dropped so much. When we propose systems to potential customers these days we tend to downplay or not mention our use of tape systems for data storage since it might seem outdated. I feel more comfortable with having data saved in two separate formats: disks and tape. In addition, once data is securely written to tape, I feel (perhaps naively) that it's been permanently saved. Not having to rely on a RAID controller to be able to read back data is another plus for me. Do you see a place for tape backup these days?

    Read the article

  • implementing dynamic query handler on historical data

    - by user2390183
    EDIT : Refined question to focus on the core issue Context: I have historical data about property (house) sales collected from various sources in a centralized/cloud data source (assume info collection is handled by a third party) Planning to develop an application to query and retrieve data from this centralized data source Example Queries: Simple : for given XYZ post code, what is average house price for 3 bed room house? Complex: What is estimated price for an house at "DD,Some Street,XYZ Post Code" (worked out from average values of historic data filtered by various characteristics of the house: house post code, no of bed rooms, total area, and other deeper insights like house building type, year of built, features)? In addition to average price, the application should support other property info ** maximum, or minimum price..etc and trend (graph) on a selected property attribute over a period of time**. Hence, the queries should not enforce the search based on a primary key or few fixed fields In other words, queries can be What is the change in 3 Bed Room house price (irrespective of location) over last 30 days? What kind of properties we can get for X price (irrespective of location or house type) The challenge I have is identifying the domain (BI/ Data Analytical or DB Design or DB Query Interface or DW related or something else) this problem (dynamic query on historic data) belong to, so that I can do further exploration My findings so far I could be wrong on the following, so please correct me if you think so I briefly read about BI/Data Analytics - I think it is heavy weight solution for my problem and has scalability issues. DB Design - As I understand RDBMS works well if you know Data model at design time. I am expecting attributes about property or other entity (user) that am going to bring in, would evolve quickly. hence maintenance would be an issue. As I am going to have multiple users executing query at same time, performance would be a bottleneck Other options like Graph DB (http://www.tinkerpop.com/) seems to be bit complex (they are good. but using those tools meant for generic purpose, make me think like assembly programming to solve my problem ) BigData related solution are to analyse data from multiple unrelated domains So, Any suggestion on the space this problem fit in ? (Especially if you have design/implementation experience of back-end for property listing or similar portals)

    Read the article

  • Algorithms for Data Redundancy and Failover for distributed storage system?

    - by kennetham
    I'm building a distributed storage system that works with different storage sizes. For instance, my storage devices have sizes of 50GB, 70GB, 150GB, 250GB, 1000GB, 5 storage systems in one system. My application will store any files to the storage system. Question: How can I build a distributed storage with the idea of data redundancy and fail-over to store documents, videos, any type of files at the same time ensuring that should one of any storage devices fail, there would be another copy of these files on another storage device. However, the concern is, 50GB of storage can only store this maximum number of files as compared to 70GB, 150GB etc. With one storage in mind, bringing 5 storage systems like a cloud storage, is there any logical way to distribute or store the files through my application? How do I ensure data redundancy through different storage sizes? Is there any algorithm to collate multiple blob files into a single file archive? What is the best solution for one cloud storage with multiple different storage sizes? I open this topic with the objective of discussing the best way to implement this idea, assuming simplicity, what are the issues of this implementation, performance measurements and discussion of the limitations.

    Read the article

  • AngularJS dealing with large data sets (Strategy)

    - by Brian
    I am working on developing a personal temperature logging viewer based on my rasppi curl'ing data into my web server's api. Temperatures are taken every 2 seconds and I can have several temperature sensors posting data. Needless to say I will have a lot of data to handle even within the scope of an hour. I have implemented a very simple paging api from the server so the server doesn't timeout and is currently only returning data in 1000 units per call, then paging through the data. I had the idea to intially show say the last 20 minutes of data from a sensor (or all sensors depending on user choices), then allowing the user to select other timeframes from which to show data. The issue comes in when you want to view all sensors or an extended time period (say 24 hours). Is there a best practice of handling this large amount of data? Would it be useful to load those first 20 minutes into the live view and then cache into local storage something like the last 24 hours? I haven't been able to find a decent idea of this in use yet even though there are a lot of ways to take this problem. I am just looking for some suggestions as to what might provide a good balance between good performance and not caching the entire data set on the client side (as beyond a week of data this might not be feasible).

    Read the article

  • I need some help creating a non-binary tree (or some other data structure that will better solve my problem)

    - by EDO
    I have about ten lists of numbers and some strings. Each list has about <= 30K lines. Each line on a list has a distinct number. I need to build an efficient way of finding all the lines in each list that has the same 'control' number (or key for dB guys) and comparing what is in their string parts. I am writing this in Java. I have thought about using trees but my brain cells are about burnt now. I need some help.

    Read the article

  • replacing data.frame element-wise operations with data.table (that used rowname)

    - by Harold
    So lets say I have the following data.frames: df1 <- data.frame(y = 1:10, z = rnorm(10), row.names = letters[1:10]) df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), z = rnorm(10), row.names = letters[1:10]) And perhaps the "equivalent" data.tables: dt1 <- data.table(x = rownames(df1), df1, key = 'x') dt2 <- data.table(x = rownames(df2), df2, key = 'x') If I want to do element-wise operations between df1 and df2, they look something like dfRes <- df1 / df2 And rownames() is preserved: R> head(dfRes) y z a 0.5 3.1405463 b 1.0 1.2925200 c 1.5 1.4137930 d 2.0 -0.5532855 e 2.5 -0.0998303 f 1.2 -1.6236294 My poor understanding of data.table says the same operation should look like this: dtRes <- dt1[, !'x', with = F] / dt2[, !'x', with = F] dtRes[, x := dt1[,x,]] setkey(dtRes, x) (setkey optional) Is there a more data.table-esque way of doing this? As a slightly related aside, more generally, I would have other columns such as factors in each data.table and I would like to omit those columns while doing the element-wise operations, but still have them in the result. Does this make sense? Thanks!

    Read the article

  • PHP - post data ends when '&' is in data.

    - by Phil Jackson
    Hi all, im posting data using jquery/ajax and PHP at the backend. Problem being, when I input something like 'Jack & Jill went up the hill' im only recieving 'Jack' when it gets to the backend. I have thrown an error at the frontend before that data is sent which alerts 'Jack & Jill went up the hill'. When I put die(print_r($_POST)); at the very top of my index page im only getting [key] => Jack how can I be loosing the data? I thought It may have been my filter; <?php function filter( $data ) { $data = trim( htmlentities( strip_tags( mb_convert_encoding( $data, 'HTML-ENTITIES', "UTF-8") ) ) ); if ( get_magic_quotes_gpc() ) { $data = stripslashes( $data ); } //$data = mysql_real_escape_string( $data ); return $data; } echo "<xmp>" . filter("you & me") . "</xmp>"; ?> but that returns fine in the test above you &amp; me which is in place after I added die(print_r($_POST));. Can anyone think of how and why this is happening? Any help much appreciated. Regards, Phil.

    Read the article

  • How to fake Azure Table Storage in .NET for Unit Testing?

    - by Erick T
    I am working on a system that uses Azure Table Storage. In other systems (e.g., SQL, File based, etc), I can write a fake that allows me to test my data persistence logic. However, I can't see an easy way to create a fake for the Azure Table Service. I could create a new IIS project that behaves the same way, but that isn't a good way to write a unit test, it is more of an integration test. Any thoughts on how to unit test data access code that uses the Azure Table Storage client? Thanks, Erick

    Read the article

  • Cannot remove storage account because of lease, but I already deleted the server [closed]

    - by djechelon
    I recently created a temporary virtual server on Azure. Then I deleted it. I wanted to delete the storage account associated with it because I didn't need it any more. The problem is that the VHD file is still associated to a non-existing virtual machine!! If I try to delete the VHD from Virtual Machines\Disks I get the Delete button greyed and the table tells me it's still associated with the old VM. If I go to storage administration and try to delete the blob from vhds/ directory I get there is an active lease. I've read on Azure forums that, in these case, one should try to force releasing the lease from the blob. I followed their instructions and downloaded their script, but running it failed. The script detected that the disk is associated to a Virtual Machine and can't be deleted. The problem is that I'm 1000000% sure that I already deleted the VM. In fact, I currently only have a single VM that has its own HD and is up and running fine! What can I do to delete that storage account that is probably sucking money from my pocket?

    Read the article

  • What is the best private cloud storage setup

    - by vdrmrt
    I need to create a private cloud and I'm searching for the best setup. These are my 2 most important requirements 1. Disk and system redundant 2. Price / GB as low as possible The system is going to be used as backup setup which will receive data 24/7 over SFTP and rsync. High throughput is not that important. I'm planning to use glusterfs and consumer grade 4TB hard-drives. I have worked out 3 possible setups 3 servers with 11 4TB HDD Setup up a replica 3 glusterfs and setup each hard drive as a separate ext4 brick. Total capacity: 44TB HDD / TB ratio of 0.75 (33HDD / 44TB) 2 servers with 11 4TB HDD The 11 hard-drives are combined in a RAIDZ3 ZFS storage pool. With a replica 2 gluster setup. Total capacity: 32TB (+ zfs compression) HDD / TB ratio of 0.68 (22HDD / 32TB) 3 servers with 11 4TB consumer hard-drives Setup up a replica 3 glusterfs and setup each hard-drive as a separate zfs storage pool and export each pool as a brick. Total capacity: 32TB (+ zfs compression) HDD / TB ratio of 0.68 (22HDD / 32TB) (Cheapest) My remarks and concerns: If a hard drive fails which setup will recover the quickest? In my opinion setup 1 and 3 because there only the contents of 1 hard-drive needs to be copied over the network. Instead of setup 2 were the hard-drive needs te be reconstructed by reading the parity of all the other harddrives in the system. Will a zfs pool on 1 harddrive give me extra protection against for example bit rot? With setup 1 and 3 I can loose 2 systems and still be up and running with setup 2 I can only loose 1 system. When I use ZFS I can enable compression which will give me some extra storage.

    Read the article

  • New whitepaper, “Why Oracle Sun ZFS Storage Appliance for Oracle Databases?” now available.

    - by Cinzia Mascanzoni
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Databases are the backbone of today’s modern business providing transaction integrity for key business systems such as payment engines or providing the core of analytical data for decision-making. These diverse use cases require a flexible, high performance and highly available storage platform. The ZFS Storage Appliance is ideally suited with its architecture providing a platform flexible enough to meet the ever-changing availability, capacity and performance requirements from the business. In this just published white paper the authors provide both business and technical evidence of the suitability of the Oracle ZSF Storage Appliance as primary storage for Oracle Database 11gR2 environments. Click here to download the whitepaper.

    Read the article

  • Recommended storage scheme for home server? (LVM/JBOD/RAID 5...)

    - by j-g-faustus
    Are there any guidelines for which storage scheme(s) makes most sense for a multiple-disk home server? I am assuming a separate boot/OS disk (so bootability is not a concern, this is for data storage only) and 4-6 storage disks of 1-2 TB each, for a total storage capacity in the range 4-12 TB. The file system is ext4, I expect there will be only one big partition spanning all disks. As far as I can tell, the alternatives are individual disks pros: works with any combination of disk sizes; losing a disk loses only the data on that disk; no need for volume management. cons: data management is clumsy when logical units (like a "movies" folder) are larger than the capacity of any single drive. JBOD span pros: can merge disks of any size. cons: losing a disk loses all data on all disks LVM pros: can merge disks of any size; relatively simple to add and remove disks. cons: losing a disk loses all data on all disks RAID 0 pros: speed cons: losing one drive loses all data; disks must be same size RAID 5 pros: data survives losing one disk cons: gives up one disk worth of capacity; disks must be same size RAID 6 pros: data survives losing two disks cons: gives up two disks worth of capacity; disks must be same size I'm primarily considering either LVM or JBOD span simply because it will let me reuse older, smaller-capacity disks when I upgrade the system. The runner-up is RAID 0 for speed. I'm planning on having full backups to a separate system, so I expect the extra redundancy from RAID levels 5 or 6 won't be important. Is this a fair representation of the alternatives? Are there other considerations or alternatives I have missed? And what would you recommend?

    Read the article

  • SQL Developer Debugging, Watches, Smart Data, & Data

    - by thatjeffsmith
    After presenting the SQL Developer PL/SQL debugger for about an hour yesterday at KScope12 in San Antonio, my boss came up and asked, “Now, would you really want to know what the Smart Data panel does?” Apparently I had ‘made up’ my own story about what that panel’s intent is based on my experience with it. Not good Jeff, not good. It was a very small point of my presentation, but I probably should have read the docs. The Smart Data tab displays information about variables, using your Debugger: Smart Data preferences. You can also specify these preferences by right-clicking in the Smart Data window and selecting Preferences. Debugger Smart Data Preferences, control number of variables to display The Smart Data panel auto-inspects the last X accessed variables. So if you have a program with 26 variables, instead of showing you all 26, it will just show you the last two variables that were referenced in your program. If you were to click on the ‘Data’ debug panel, you’ll see EVERYTHING. And if you only want to see a very specific set of values, then you should use Watches. The Smart Data Panel As I step through the code, the variables being tracked change as they are referenced. Only the most recent ones display. This is controlled by the ‘Maximum Locations to Remember’ preference. Step through the code, see the latest variables accessed The Data Panel All variables are displayed. Might be information overload on large PL/SQL programs where you have many dozens or even hundreds of variables to track. Shows everything all the time Watches Watches are added manually and only show what you ask for. Data on Demand – add a watch to track a specific variable Remember, you can interact with your data If you want to do more than just watch, you can mouse-right on a data element, and change the value of the variable as the program is running. This is one of the primary benefits to debugging over using DBMS_OUTPUT to track what’s happening in your program. Change the values while the program is running to test your ‘What if?’ scenarios

    Read the article

  • SQL – Biggest Concerns in a Data-Driven World

    - by Pinal Dave
    The ongoing chaos over Government Agency’s snooping has ignited a heated debate on privacy of personal data and its use by government and/or other institutions. It has created a feeling of disapproval and distrust among users. This incident proves to be a lesson for companies that are looking to leverage their business using a data driven approach. According to analysts, the goal of gathering personal information should be to deliver benefits to both the parties – the user as well as the data collector(government or business). Using data the right way is crucial, and companies need to deploy the right software applications and systems to ensure that their efforts are well-directed. However, there are various issues plaguing analysts regarding available software, which are highlighted below. According to a InformationWeek 2013 Survey of Analytics, Business Intelligence and Information Management where 541 business technology professionals contributed as respondents, it was discovered that the biggest concern was deemed to be the scarcity of expertise and high costs associated with the same. This concern was voiced by as many as 38% of the participants. A close second came out to be the issue of data warehouse appliance platforms being expensive, with 33% of those present believing it to be a huge roadblock. Another revelation made in this respect was that 31% professionals weren’t even sure how Data Analytics can create business opportunities for them. Another 17% shared that they found data platform technologies such as Hadoop and NoSQL technologies hard to learn. These results clearly pointed out that there are awareness and expertise issues that also need much attention. Unless the demand-supply gap of Business Intelligence professionals well versed in data analysis technologies is met, this divide is going to affect how companies make the most of their BI campaigns. One of the key action points that can be taken to salvage the situation, is to provide training on Data Analytics concepts. Koenig Solutions offer courses on many such technologies including a course on MCSE SQL Server 2012: BI Platform. So it’s time to brush up your skills and get down to work in a data driven world that awaits you ahead. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Store XML data in Core Data

    - by ct2k7
    Hi, is there any easy way of store XML data into core data? Currently, my app just pulls the values from the XML file directly, however, this isn't efficient for XML files which holds over 100 entries, thus storing the data in Core Data would be the best option. XML file is called/downloaded/parsed ever time the app opens. With the Core Data, the XML data would be downloaded ever 3600 seconds or so, and refresh the current data in the core data, to reduce the loading time when opening the app. Any ideas on how I can do this? Having reviewed the developer documentation, it doesn't look very tasty.

    Read the article

  • Core Data vs. SQLitePersistentObjects

    - by Macatomy
    I'm creating an iPhone app and I'm trying to choose between 2 solutions for a persistent store. Core Data, or SQLitePersistentObjects. Basically, all my app needs is a way to store an array of model objects and then load them again to display in a UITableView. Its nothing too complicated. Core Data seems to have a much higher learning curve than the simple to use SQLitePersistentObjects. Are there any obvious benefits of using Core Data over SQLitePersistentObjects in my case?

    Read the article

  • Where can I find free and open data?

    - by kitsune
    Sooner or later, coders will feel the need to have access to "open data" in one of their projects, from knowing a city's zip to a more obscure information such as the axial tilt of Pluto. I know data.un.org which offers access to the UN's extensive array of databases that deal with human development and other socio-economic issues. The other usual suspects are NASA and the USGS for planetary data. There's an article at readwriteweb with more links. infochimps.org seems to stand out. Personally, I need to find historic commodity prices, stock values and other financial data. All these data sets seem to cost money however. Clarification To clarify, I'm interested in all kinds of open data, because sooner or later, I know I will be in a situation where I could need it. I will try to edit this answer and include the suggestions in a structured manners. A link for financial data was hidden in that readwriteweb article, doh! It's called opentick.com. Looks good so far! Update I stumbled over semantic data in another question of mine on here. There is opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). A project called UMBEL provides a light-weight, distilled version of opencyc. Umbel has semantic data in rdf/owl/skos n3 syntax. The Worldbank also released a very nice API. It offers data from the last 50 years for about 200 countries

    Read the article

  • Why are data structures so important in interviews?

    - by Vamsi Emani
    I am a newbie into the corporate world recently graduated in computers. I am a java/groovy developer. I am a quick learner and I can learn new frameworks, APIs or even programming languages within considerably short amount of time. Albeit that, I must confess that I was not so strong in data structures when I graduated out of college. Through out the campus placements during my graduation, I've witnessed that most of the biggie tech companies like Amazon, Microsoft etc focused mainly on data structures. It appears as if data structures is the only thing that they expect from a graduate. Adding to this, I see that there is this general perspective that a good programmer is necessarily a one with good knowledge about data structures. To be honest, I felt bad about that. I write good code. I follow standard design patterns of coding, I do use data structures but at the superficial level as in java exposed APIs like ArrayLists, LinkedLists etc. But the companies usually focused on the intricate aspects of Data Structures like pointer based memory manipulation and time complexities. Probably because of my java-ish background, Back then, I understood code efficiency and logic only when talked in terms of Object Oriented Programming like Objects, instances, etc but I never drilled down into the level of bits and bytes. I did not want people to look down upon me for this knowledge deficit of mine in Data Structures. So really why all this emphasis on Data Structures? Does, Not having knowledge in Data Structures really effect one's career in programming? Or is the knowledge in this subject really a sufficient basis to differentiate a good and a bad programmer?

    Read the article

  • Data structure for pattern matching.

    - by alvonellos
    Let's say you have an input file with many entries like these: date, ticker, open, high, low, close, <and some other values> And you want to execute a pattern matching routine on the entries(rows) in that file, using a candlestick pattern, for example. (See, Doji) And that pattern can appear on any uniform time interval (let t = 1s, 5s, 10s, 1d, 7d, 2w, 2y, and so on...). Say a pattern matching routine can take an arbitrary number of rows to perform an analysis and contain an arbitrary number of subpatterns. In other words, some patterns may require 4 entries to operate on. Say also that the routine (may) later have to find and classify extrema (local and global maxima and minima as well as inflection points) for the ticker over a closed interval, for example, you could say that a cubic function (x^3) has the extrema on the interval [-1, 1]. (See link) What would be the most natural choice in terms of a data structure? What about an interface that conforms a Ticker object containing one row of data to a collection of Ticker so that an arbitrary pattern can be applied to the data. What's the first thing that comes to mind? I chose a doubly-linked circular linked list that has the following methods: push_front() push_back() pop_front() pop_back() [] //overloaded, can be used with negative parameters But that data structure seems very clumsy, since so much pushing and popping is going on, I have to make a deep copy of the data structure before running an analysis on it. So, I don't know if I made my question very clear -- but the main points are: What kind of data structures should be considered when analyzing sequential data points to conform to a pattern that does NOT require random access? What kind of data structures should be considered when classifying extrema of a set of data points?

    Read the article

  • Five Key Strategies in Master Data Management

    - by david.butler(at)oracle.com
    Here is a very interesting Profit Magazine article on MDM: A recent customer survey reveals the deleterious effects of data fragmentation. by Trevor Naidoo, December 2010   Across industries and geographies, IT organizations have grown in complexity, whether due to mergers and acquisitions, or decentralized systems supporting functional or departmental requirements. With systems architected over time to support unique, one-off process needs, they are becoming costly to maintain, and the Internet has only further added to the complexity. Data fragmentation has become a key inhibitor in delivering flexible, user-friendly systems. The Oracle Insight team conducted a survey assessing customers' master data management (MDM) capabilities over the past two years to get a sense of where they are in terms of their capabilities. The responses, by 27 respondents from six different industries, reveal five key areas in which customers need to improve their data management in order to get better financial results. 1. Less than 15 percent of organizations surveyed understand the sources and quality of their master data, and have a roadmap to address missing data domains. Examples of the types of master data domains referred to are customer, supplier, product, financial and site. Many organizations have multiple sources of master data with varying degrees of data quality in each source -- customer data stored in the customer relationship management system is inconsistent with customer data stored in the order management system. Imagine not knowing how many places you stored your customer information, and whether a customer's address was the most up to date in each source. In fact, more than 55 percent of the respondents in the survey manage their data quality on an ad-hoc basis. It is important for organizations to document their inventory of data sources and then profile these data sources to ensure that there is a consistent definition of key data entities throughout the organization. Some questions to ask are: How do we define a customer? What is a product? How do we define a site? The goal is to strive for one common repository for master data that acts as a cross reference for all other sources and ensures consistent, high-quality master data throughout the organization. 2. Only 18 percent of respondents have an enterprise data management strategy to ensure that data is treated as an asset to the organization. Most respondents handle data at the department or functional level and do not have an enterprise view of their master data. The sales department may track all their interactions with customers as they move through the sales cycle, the service department is tracking their interactions with the same customers independently, and the finance department also has a different perspective on the same customer. The salesperson may not be aware that the customer she is trying to sell to is experiencing issues with existing products purchased, or that the customer is behind on previous invoices. The lack of a data strategy makes it difficult for business users to turn data into information via reports. Without the key building blocks in place, it is difficult to create key linkages between customer, product, site, supplier and financial data. These linkages make it possible to understand patterns. A well-defined data management strategy is aligned to the business strategy and helps create the governance needed to ensure that data stewardship is in place and data integrity is intact. 3. Almost 60 percent of respondents have no strategy to integrate data across operational applications. Many respondents have several disparate sources of data with no strategy to keep them in sync with each other. Even though there is no clear strategy to integrate the data (see #2 above), the data needs to be synced and cross-referenced to keep the business processes running. About 55 percent of respondents said they perform this integration on an ad hoc basis, and in many cases, it is done manually with the help of Microsoft Excel spreadsheets. For example, a salesperson needs a report on global sales for a specific product, but the product has different product numbers in different countries. Typically, an analyst will pull all the data into Excel, manually create a cross reference for that product, and then aggregate the sales. The exact same procedure has to be followed if the same report is needed the following month. A well-defined consolidation strategy will ensure that a central cross-reference is maintained with updates in any one application being propagated to all the other systems, so that data is synchronized and up to date. This can be done in real time or in batch mode using integration technology. 4. Approximately 50 percent of respondents spend manual efforts cleansing and normalizing data. Information stored in various systems usually follows different standards and formats, making it difficult to match the data. A customer's address can be stored in different ways using a variety of abbreviations -- for example, "av" or "ave" for avenue. Similarly, a product's attributes can be stored in a number of different ways; for example, a size attribute can be stored in inches and can also be entered as "'' ". These types of variations make it difficult to match up data from different sources. Today, most customers rely on manual, heroic efforts to match, cleanse, and de-duplicate data -- clearly not a scalable, sustainable model. To solve this challenge, organizations need the ability to standardize data for customers, products, sites, suppliers and financial accounts; however, less than 10 percent of respondents have technology in place to automatically resolve duplicates. It is no wonder, therefore, that we get communications about products we don't own, at addresses we don't reside, and using channels (like direct mail) we don't like. An all-too-common example of a potential challenge follows: Customers end up receiving duplicate communications, which not only impacts customer satisfaction, but also incurs additional mailing costs. Cleansing, normalizing, and standardizing data will help address most of these issues. 5. Only 10 percent of respondents have the ability to share data that was mastered in a master data hub. Close to 60 percent of respondents have efforts in place that profile, standardize and cleanse data manually, and the output of these efforts are stored in spreadsheets in various parts of the organization. This valuable information is not easily shared with the rest of the organization and, more importantly, this enriched information cannot be sent back to the source systems so that the data is fixed at the source. A key benefit of a master data management strategy is not only to clean the data, but to also share the data back to the source systems as well as other systems that need the information. Aside from the source systems, another key beneficiary of this data is the business intelligence system. Having clean master data as input to business intelligence systems provides more accurate and enhanced reporting.  Characteristics of Stellar MDM When deciding on the right master data management technology, organizations should look for solutions that have four main characteristics: enterprise-grade MDM performance complete technology that can be rapidly deployed and addresses multiple business issues end-to-end MDM process management with data quality monitoring and assurance pre-built MDM business relevant applications with data stores and workflows These master data management capabilities will aid in moving closer to a best-practice maturity level, delivering tremendous efficiencies and savings as well as revenue growth opportunities as a result of better understanding your customers.  Trevor Naidoo is a senior director in Industry Strategy and Insight at Oracle. 

    Read the article

  • Address Regulatory Mandates for Data Encryption Without Changing Your Applications

    - by Troy Kitch
    The Payment Card Industry Data Security Standard, US state-level data breach laws, and numerous data privacy regulations worldwide all call for data encryption to protect personally identifiable information (PII). However encrypting PII data in applications requires costly and complex application changes. Fortunately, since this data typically resides in the application database, using Oracle Advanced Security, PII can be encrypted transparently by the Oracle database without any application changes. In this ISACA webinar, learn how Oracle Advanced Security offers complete encryption for data at rest, in transit, and on backups, along with built-in key management to help organizations meet regulatory requirements and save money. You will also hear from TransUnion Interactive, the consumer subsidiary of TransUnion, a global leader in credit and information management, which maintains credit histories on an estimated 500 million consumers across the globe, about how they addressed PCI DSS encryption requirements using Oracle Database 11g with Oracle Advanced Security. Register to watch the webinar now.

    Read the article

  • Using Hadooop (HDInsight) with Microsoft - Two (OK, Three) Options

    - by BuckWoody
    Microsoft has many tools for “Big Data”. In fact, you need many tools – there’s no product called “Big Data Solution” in a shrink-wrapped box – if you find one, you probably shouldn’t buy it. It’s tempting to want a single tool that handles everything in a problem domain, but with large, complex data, that isn’t a reality. You’ll mix and match several systems, open and closed source, to solve a given problem. But there are tools that help with handling data at large, complex scales. Normally the best way to do this is to break up the data into parts, and then put the calculation engines for that chunk of data right on the node where the data is stored. These systems are in a family called “Distributed File and Compute”. Microsoft has a couple of these, including the High Performance Computing edition of Windows Server. Recently we partnered with Hortonworks to bring the Apache Foundation’s release of Hadoop to Windows. And as it turns out, there are actually two (technically three) ways you can use it. (There’s a more detailed set of information here: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx, I’ll cover the options at a general level below)  First Option: Windows Azure HDInsight Service  Your first option is that you can simply log on to a Hadoop control node and begin to run Pig or Hive statements against data that you have stored in Windows Azure. There’s nothing to set up (although you can configure things where needed), and you can send the commands, get the output of the job(s), and stop using the service when you are done – and repeat the process later if you wish. (There are also connectors to run jobs from Microsoft Excel, but that’s another post)   This option is useful when you have a periodic burst of work for a Hadoop workload, or the data collection has been happening into Windows Azure storage anyway. That might be from a web application, the logs from a web application, telemetrics (remote sensor input), and other modes of constant collection.   You can read more about this option here:  http://blogs.msdn.com/b/windowsazure/archive/2012/10/24/getting-started-with-windows-azure-hdinsight-service.aspx Second Option: Microsoft HDInsight Server Your second option is to use the Hadoop Distribution for on-premises Windows called Microsoft HDInsight Server. You set up the Name Node(s), Job Tracker(s), and Data Node(s), among other components, and you have control over the entire ecostructure.   This option is useful if you want to  have complete control over the system, leave it running all the time, or you have a huge quantity of data that you have to bulk-load constantly – something that isn’t going to be practical with a network transfer or disk-mailing scheme. You can read more about this option here: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx Third Option (unsupported): Installation on Windows Azure Virtual Machines  Although unsupported, you could simply use a Windows Azure Virtual Machine (we support both Windows and Linux servers) and install Hadoop yourself – it’s open-source, so there’s nothing preventing you from doing that.   Aside from being unsupported, there are other issues you’ll run into with this approach – primarily involving performance and the amount of configuration you’ll need to do to access the data nodes properly. But for a single-node installation (where all components run on one system) such as learning, demos, training and the like, this isn’t a bad option. Did I mention that’s unsupported? :) You can learn more about Windows Azure Virtual Machines here: http://www.windowsazure.com/en-us/home/scenarios/virtual-machines/ And more about Hadoop and the installation/configuration (on Linux) here: http://en.wikipedia.org/wiki/Apache_Hadoop And more about the HDInsight installation here: http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW Choosing the right option Since you have two or three routes you can go, the best thing to do is evaluate the need you have, and place the workload where it makes the most sense.  My suggestion is to install the HDInsight Server locally on a test system, and play around with it. Read up on the best ways to use Hadoop for a given workload, understand the parts, write a little Pig and Hive, and get your feet wet. Then sign up for a test account on HDInsight Service, and see how that leverages what you know. If you're a true tinkerer, go ahead and try the VM route as well. Oh - there’s another great reference on the Windows Azure HDInsight that just came out, here: http://blogs.msdn.com/b/brunoterkaly/archive/2012/11/16/hadoop-on-azure-introduction.aspx  

    Read the article

  • Oracle - A Leader in Gartner's MQ for Master Data Management for Customer Data

    - by Mala Narasimharajan
      The Gartner MQ report for Master Data Management of Customer Data Solutions is released and we're proud to say that Oracle is in the leaders' quadrant.  Here's a snippet from the report itself:  " “Oracle has a strong, though complex, portfolio of domain-specific MDM products that include prepackaged data models. Gartner estimates that Oracle now has over 1,500 licensed MDM customers, including 650 customers managing customer data. The MDM portfolio includes three products that address MDM of customer data solution needs: Oracle Fusion Customer Hub (FCH), Oracle CDH and Oracle Siebel UCM. These three MDM products are positioned for different segments of the market and Oracle is progressively moving all three products onto a common MDM technology platform..." (Gartner, Oct 18, 2012)  For more information on Oracle's solutions for customer data in Master Data Management, click here.  

    Read the article

< Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17  | Next Page >