Search Results

Search found 60391 results on 2416 pages for 'data generation'.

Page 447/2416 | < Previous Page | 443 444 445 446 447 448 449 450 451 452 453 454  | Next Page >

  • Oracle Endeca Information Discovery 3.1 is Now Available

    - by p.anda
    Oracle Endeca Information Discovery (OEID) 3.1 is a major release that incorporates significant new self-service discovery capabilities for business users. These include agile data mashup, extended support for unstructured analytics, and an even tighter integration with Oracle BI This release is available for download from: Oracle Delivery Cloud Oracle Technology Network Some of the what's new highlights ... Self-service data mashup... enables access to a wider variety of personal and trusted enterprise data sources. Blend multiple data sets in a single app. Agile discovery dashboards... allows users to easily create, configure, and securely share discovery dashboards with intelligent defaults, intuitive wizards and drag-and-drop configuration. Deeper unstructured analysis ... enables users to enrich text using term extraction and whitelist tagging while the data is live. Enhanced integration with OBI... provides easier wizards for data selection and enables OBI Server as a self-service data source. Enterprise-class data discovery... offers faster performance, a trusted data connection library, improved auditing and increased data connectivity for Hadoop, web content and Oracle Data Integrator. Find out more ... visit the OEID Overview page to download the What's New and related Data Sheet PDF documents. Have questions or want to share details for Oracle Endeca Information Discovery?  The MOS Communities is a great first stop to visit and you can stop-by at MOS OEID Community.

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part I

    - by dbayard
    Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set. Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls.  R is great for doing advanced computations.  Sometimes you need to do advanced computations on large amounts of data, subject to rigorous enterprise-level concerns.  This blog post shows how Oracle R Enterprise enables R plus the Oracle Database enabled us to do some pretty sophisticated calculations across 1 million accounts (each with many detailed records) in minutes. The problem: A financial services customer of mine has a need to calculate the historical internal rate of return (IRR) for its customers’ portfolios.  This information is needed for customer statements and the online web application.  In the past, they had solved this with a home-grown application that pulled trade and account data out of their data warehouse and ran the calculations.  But this home-grown application was not able to do this fast enough, plus it was a challenge for them to write and maintain the code that did the IRR calculation. IRR – a problem that R is good at solving: Internal Rate of Return is an interesting calculation in that in most real-world scenarios it is impractical to calculate exactly.  Rather, IRR is a calculation where approximation techniques need to be used.  In this blog post, we will discuss calculating the “money weighted rate of return” but in the actual customer proof of concept we used R to calculate both money weighted rate of returns and time weighted rate of returns.  You can learn more about the money weighted rate of returns here: http://www.wikinvest.com/wiki/Money-weighted_return First Steps- Calculating IRR in R We will start with calculating the IRR in standalone/desktop R.  In our second post, we will show how to take this desktop R function, deploy it to an Oracle Database, and make it work at real-world scale.  The first step we did was to get some sample data.  For a historical IRR calculation, you have a balances and cash flows.  In our case, the customer provided us with several accounts worth of sample data in Microsoft Excel.      The above figure shows part of the spreadsheet of sample data.  The data provides balances and cash flows for a sample account (BMV=beginning market value. FLOW=cash flow in/out of account. EMV=ending market value). Once we had the sample spreadsheet, the next step we did was to read the Excel data into R.  This is something that R does well.  R offers multiple ways to work with spreadsheet data.  For instance, one could save the spreadsheet as a .csv file.  In our case, the customer provided a spreadsheet file containing multiple sheets where each sheet provided data for a different sample account.  To handle this easily, we took advantage of the RODBC package which allowed us to read the Excel data sheet-by-sheet without having to create individual .csv files.  We wrote ourselves a little helper function called getsheet() around the RODBC package.  Then we loaded all of the sample accounts into a data.frame called SimpleMWRRData. Writing the IRR function At this point, it was time to write the money weighted rate of return (MWRR) function itself.  The definition of MWRR is easily found on the internet or if you are old school you can look in an investment performance text book.  In the customer proof, we based our calculations off the ones defined in the The Handbook of Investment Performance: A User’s Guide by David Spaulding since this is the reference book used by the customer.  (One of the nice things we found during the course of this proof-of-concept is that by using R to write our IRR functions we could easily incorporate the specific variations and business rules of the customer into the calculation.) The key thing with calculating IRR is the need to solve a complex equation with a numerical approximation technique.  For IRR, you need to find the value of the rate of return (r) that sets the Net Present Value of all the flows in and out of the account to zero.  With R, we solve this by defining our NPV function: where bmv is the beginning market value, cf is a vector of cash flows, t is a vector of time (relative to the beginning), emv is the ending market value, and tend is the ending time. Since solving for r is a one-dimensional optimization problem, we decided to take advantage of R’s optimize method (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html). The optimize method can be used to find a minimum or maximum; to find the value of r where our npv function is closest to zero, we wrapped our npv function inside the abs function and asked optimize to find the minimum.  Here is an example of using optimize: where low and high are scalars that indicate the range to search for an answer.   To test this out, we need to set values for bmv, cf, t, emv, tend, low, and high.  We will set low and high to some reasonable defaults. For example, this account had a negative 2.2% money weighted rate of return. Enhancing and Packaging the IRR function With numerical approximation methods like optimize, sometimes you will not be able to find an answer with your initial set of inputs.  To account for this, our approach was to first try to find an answer for r within a narrow range, then if we did not find an answer, try calling optimize() again with a broader range.  See the R help page on optimize()  for more details about the search range and its algorithm. At this point, we can now write a simplified version of our MWRR function.  (Our real-world version is  more sophisticated in that it calculates rate of returns for 5 different time periods [since inception, last quarter, year-to-date, last year, year before last year] in a single invocation.  In our actual customer proof, we also defined time-weighted rate of return calculations.  The beauty of R is that it was very easy to add these enhancements and additional calculations to our IRR package.)To simplify code deployment, we then created a new package of our IRR functions and sample data.  For this blog post, we only need to include our SimpleMWRR function and our SimpleMWRRData sample data.  We created the shell of the package by calling: To turn this package skeleton into something usable, at a minimum you need to edit the SimpleMWRR.Rd and SimpleMWRRData.Rd files in the \man subdirectory.  In those files, you need to at least provide a value for the “title” section. Once that is done, you can change directory to the IRR directory and type at the command-line: The myIRR package for this blog post (which has both SimpleMWRR source and SimpleMWRRData sample data) is downloadable from here: myIRR package Testing the myIRR package Here is an example of testing our IRR function once it was converted to an installable package: Calculating IRR for All the Accounts So far, we have shown how to calculate IRR for a single account.  The real-world issue is how do you calculate IRR for all of the accounts?This is the kind of situation where we can leverage the “Split-Apply-Combine” approach (see http://www.cscs.umich.edu/~crshalizi/weblog/815.html).  Given that our sample data can fit in memory, one easy approach is to use R’s “by” function.  (Other approaches to Split-Apply-Combine such as plyr can also be used.  See http://4dpiecharts.com/2011/12/16/a-quick-primer-on-split-apply-combine-problems/). Here is an example showing the use of “by” to calculate the money weighted rate of return for each account in our sample data set.  Recap and Next Steps At this point, you’ve seen the power of R being used to calculate IRR.  There were several good things: R could easily work with the spreadsheets of sample data we were given R’s optimize() function provided a nice way to solve for IRR- it was both fast and allowed us to avoid having to code our own iterative approximation algorithm R was a convenient language to express the customer-specific variations, business-rules, and exceptions that often occur in real-world calculations- these could be easily added to our IRR functions The Split-Apply-Combine technique can be used to perform calculations of IRR for multiple accounts at once. However, there are several challenges yet to be conquered at this point in our story: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In our next blog post in this series, we will show you how Oracle R Enterprise solved these challenges.

    Read the article

  • What data to send when tracking clicks with Google Analytics events (and how)?

    - by user359650
    When tracking clicks on links, there are 3 items I'm interested in: link location in the page by grabbing the id of the closest parent: to see influence of location on click-through link text: to see influence of text on click-through link href attribute value: to see where people go when leaving my website The problem when using Google Analytics to track those clicks is that events only have 3 available text fields, one of which being the category, which if you use to store one of the above items will create a mess in your Event reporting because you will have as many categories as item values. Therefore if you assign a predefined value to the category (e.g. clicks), then you're left with only 2 event fields (action, label) to store 3 items (location, text, href). That in itself isn't the end of the world because you can concatenate 2 items into 1 event field, then use the reporting or the API to filter things out. Accordingly what I plan on doing is this: category: clicks action: {location_on_page} ¦ {text} label: {href} where {__} are variable values related to the clicked links With this I can easily create some reports directly via the GUI: downloads: include only events where label ends with .pdf click outs to particular domains: include only events where label contains domain And for more complex tasks I need to export the data (or use the API): influence of location on clicks: for each location in the design, count number of events that have that location in the action, then corroborate with pageviews of the corresponding pages. Whilst this looks good I'm wondering if there is a better approach, hence the following questions: Q1: Can you foresee any particular issues with this particular setup (e.g. things I won't be able to report on)? Q2: Can you think of other data that would be interesting to include in the event?

    Read the article

  • Distinguishing the Transport Layer from the Session Layer

    In the OSI communications model, the Session layer manages the creation and removal of the association between two communicating network end points. This can also be called a connection. A connection is maintained while the network end points are communicated between each other in a conversation or session of some unknown length of time. Some connections and sessions only need to last long enough to send a message or a piece of data in one direction. In addition, some sessions may last longer, this typically occurs when one or both of the network end points are  able to terminate it the connection. The transport layer ensures that messages/data are delivered without errors, in sequence, and with no data content losses or data content duplications. It relieves the Session Layer from any concern with the transfer of data between them and their peers. Examples: Session Layer This layer acts like a manager and asks one of its employees (Transport Layer) to move a piece of data/message from one network end-point to another. Transportation Layer This layer takes the request of the Session Layer and moves the data as insturcted, it also double checks the data for any missing or corrupted information after it has been moved. If for some reason the new data does not match the old data then the Transport player will attempt to move the data gain until the data at both locations is in sync.

    Read the article

  • Huawei e303c data-card not working for Ubuntu 11.04?

    - by Umashankar
    Cheers to you. I got a problem in making a Mobile-BroadBand connection in Ubuntu 11.04, using 'Huawei e303c' usb data-card. I'm using Tata Docomo 3G sim-card (India, circle: Maharastra). My observations: 1.) I installed the device's driver 'Mobile-Partner For Linux'(which came up with the device). But it is not detecting my device. 2.) In Network Manager, Adding a Mobile-BroadBand connection is not able to detect the device (with or without the device's driver installed). 3.)I tried softwares like usb_modeswitch, gnomeppp, wvdial, sakis3G and followed their guidelines. These too didn't work. 4.) Without the driver, the system is able to identify the device (Mobile-Partner icon comes-up, that leads to driver setup files). But after installing the driver, nothing comes-up there. 5.) In all the above cases, when 'lsusb' cmd is fired, the prompt shows the connected data card (as 'DEVICE_ID:VENDOR_ID Huawei Technologies Ltd.,'). This is my problem. Give a solution to get my device connected. -Umash

    Read the article

  • File permission issues after setting up an amazon ec2 instance

    - by Pardoner
    I've set up an amazon ec2 instance and I'm have some file permission issues. I've created myself a new user and added myself to the following groups: adm:x:4:me,ubuntu www-data:x:33:me,www-data ssh:x:108:me admin:x:111:me ubuntu:x:1000:www-data,me me:x:1001:me but when I cd /var/www I can't do simple commands without doing sudo first. So I chmod -R www-data:www-data /var/www to ensure that I'm in the owning group but I still have to type sudo for everything. If I sudo su www-data it works fine. Since I'm in the www-data group shouldn't I have the same privilages as www-data? One strange thing I'm noticing is that when I ls -l it list the owner but not the group names. Could this possibly be part of the issue? Is is posible for a directory to not be part of a group? drwxr-xr-x 4 www-data 4.0K Oct 24 16:39 . drwxr-xr-x 14 root 4.0K Oct 10 16:58 .. drwxrwxr-x 9 www-data 4.0K Oct 23 04:03 admin.mywebsite.com drwxrwxr-x 2 www-data 4.0K Oct 4 00:29 mywebsite.com drwxrwxr-x 9 www-data 4.0K Oct 23 04:03 staging.mywebsite.com

    Read the article

  • Should I cache the data or hit the database?

    - by JD01
    I have not worked with any caching mechanisms and was wondering what my options are in the .net world for the following scenario. We basically have a a REST Service where the user passes an ID of a Category (think folder) and this category may have lots of sub categories and each of the sub categories could have 1000 of media containers (think file reference objects) which contain information about a file that may be on a NAS or SAN server (files are videos in this case). The relationship between these categories is stored in a database together with some permission rules and meta data about the sub categories. So from a UI perspective we have a lazy loaded tree control which is driven by the user by clicking on each sub folder (think of Windows explorer). Once they come to a URL of the video file, they then can watch the video. The number of users could grow into the 1000s and the sub categories and videos could be in the 10000s as the system grows. The question is should we carry on the way it is currently working where each request hits the database or should we think about caching the data? We are on using IIS 6/7 and Asp.net.

    Read the article

  • Where to put business logic in MVC design?

    - by BriskLabs Pakistan
    I have created a simple MVC java application that adds records through data forms to a database. my app collects data, it also validates it and stores it. This is because the data is being sourced online from different users. the data is mostly numeric in nature. now on the numeric data being stored into database (SQL server) , i wish that my app should be able to perform computations... and display it. the user is not interested in how computations are done so they must be encapsulated. the user must only be able to view the simple computed data which for example A column data - B Column data / C column data etc... and just display it to the user... i know how to write stored procedures for same but i want a 3 tier app I want the data, that I put into the database as a record, worked upon by performing calculations on it. However, the original data should remain unaffected, while the new data, post-calculations, must be stored as a new entity record into the database. Where should I write the code for this background calculation? As it is the rules and business logic... in a new java beans files ?

    Read the article

  • How to handle Real Time Data from a database perspective?

    - by balexandre
    I have an idea in mind, but it still confuses me the database area. Imagine that I want to show real time data, and using one of the latest browser technologies (web sockets - even using older browsers) it is very easy to show to all observables (user browser) what everyone is doing. Remy Sharp has an example about the simplicity about this. But I still don't get the database part, how would I feed, let's imagine (using Remy game Tron) that I want to save the path for each connected user in a database and if a client wants to see what is going on with a 5 sec delay, he will see that, not only the 5 sec until that moment but the continuation in time ... how can I query a DB like that? SELECT x, y FROM run WHERE time >= DATEADD(second, -5, rundate); is not the recommended path right? and pulling this x in x time ... this is not real data feed correct? If can someone help me understand the Database point of view, I would greatly appreciate.

    Read the article

  • OpenOffice Calc: How can I count the number of different items with data pilot?

    - by manu
    Hi all, I have a rather long spreadsheet with historical information of issues solved by some user on a colaborative environment. The spreadsheet have the following (relevant) columns date, week no., project, author id, etc... The week no. is calculated from the date, is basically the year concatenated with the week number within that year; for instance, both 2009-02-18 and 2009-02-20 yield the week number 200908 - the 8th week of year 2009; and 2009-02-23 yields 200909 - the 9th week of year 2009. I need to count how many different users (given by author id) contributed to some project, on a weekly basis. I have setup a data pilot with the week as Row Field, the project as the Column Field, and count-author as the Data Field. However, this counts the author id as different instances. This is not what I need. I need to count how many different users contributed to each project on a weekly basis. I expect to get something like: projects week Project1 Project2 Project3 200901 10 2 200902 2 7 Each inner cell containing how many different users contributed. With the count-author configuration, what I get is how many contributions (total) got the project on that week. Is there a way to tell OpenOffice Calc to do what I want?

    Read the article

  • Help parsing long (3.5mil lines) text file, line by line and storing data, need a strategy

    - by Jarrod
    This is a question about solving a particular problem I am struggling with, I am parsing a long list of text data, line by line for a business app in PHP (cron script on the CLI). The file follows the format: HD: Some text here {text here too} DC: A description here DC: the description continues here DC: and it ends here. DT: 2012-08-01 HD: Next header here {supplemental text} ... this repeats over and over for a few hundred megs I have to read each line, parse out the HD: line and grab the text on this line. I then compare this text against data stored in a database. When a match is found, I want to then record the following DC: lines that succeed the matched HD:. Pseudo code: while ( the_file_pointer_isnt_end_of_file) { line = getCurrentLineFromFile title = parseTitleFrom(line) matched = searchForMatchInDB(line) if ( matched ) { recordTheDCLines // <- Best way to do this? } } My problem is that because I am reading line by line, what is the best way to trigger the script to start saving DC lines, and then when they are finished save them to the database? I have a vague idea, but have yet to properly implement it. I would love to hear the communities ideas\suggestions! Thank you.

    Read the article

  • FILESTREAM in SQL Server 2008 R2

    - by CatherineRussell
    Much data is unstructured, such as text documents, images, and videos. This unstructured data is often stored outside the database, separate from its structured data. This separation can cause data management complexities. Or, if the data is associated with structured storage, the file streaming capabilities and performance can be limited. FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data. FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing. FILESTREAM data is not encrypted even when transparent data encryption is enabled. To read more, go to: http://technet.microsoft.com/en-us/library/bb933993.aspx

    Read the article

  • How to deal with data on the model specific to the technology being used?

    - by user1620696
    There are some cases where some of the data on a class of the domain model of an application seems to be dependent on the technology being used. One example of this is the following: suppose we are building one application in .NET such that there's the need of an Employee class. Suppose further that we are going to implement relational database, then the Employee has a primary key right? So that the classe would be something like public class Employee { public int EmployeeID { get; set; } public string Name { get; set; } ... } Now, that EmployeeID is dependent on the technology right? That's something that has to do with the way we've choose to persist our data. Should we write down a class independent of such things? If we do it this way, how should we work? I think I would need to map all the time between domain model and persistence specific types, but I'm not sure.

    Read the article

  • What sort of data should be sent for mouse-based movement in a multiplayer game?

    - by Daniel
    I'm new to the Multiplayer Rodeo here so please bear with me... I am just getting started and I'm trying to figure out how to deal with movement. I've looked at the question Best way to implement mouse-based movement in MMOG which gives me a pretty good idea, but I'm still struggling with what kind of data should be sent to the server. If a player is on position [x:0, y:0] and I click with the mouse on [x:40, y:40] to start movement, what information should I send to the server? Should I calculate the position based on velocity on client side and just send the expected location? Or should I send current location and velocity and direction? When the server is updating the clients on the players' whereabouts, should the position be sent only, and the clients expected to interpolate/predict movement, or can the direction sent from the client (instead of just coordinates) be used. My concern(or confusion) is regarding the ping/lag frequency of data update and use of a predictive algorithm, as I'd like the movement to be smooth even with a high latency, and prevent ability to cheat(though that's not the top priority).

    Read the article

  • Architecture Question

    - by katie77
    I am writing a rules/eligibility Module. I have 2 sets of data, one is the customer data and the other is the customer products data. Customer data to Customer products data is one to many. Now I have to go through a set of Eligibility rules for each of this Customer product data. For each customer products data, I can say the customer is eligible for that product or decline the eligibility and should move on to the next product record. So in all the rules, I need to have access to customer and customer product data(the particular record that the rules are being executed against). Since all the rules can either approve a product or decline a product, I created an interface with those 2 methods and is implementing the this interface for all the rules. I am passing the Customer data and one product data for all the rules (because rules should be executed on each row of customer product data). An Ideal situation would be having the customer and customer product data available for the rule instead of passing them to each rule. What is the best way of doing this in-terms of architecture?

    Read the article

  • Algorithms for Data Redundancy and Failover for distributed storage system?

    - by kennetham
    I'm building a distributed storage system that works with different storage sizes. For instance, my storage devices have sizes of 50GB, 70GB, 150GB, 250GB, 1000GB, 5 storage systems in one system. My application will store any files to the storage system. Question: How can I build a distributed storage with the idea of data redundancy and fail-over to store documents, videos, any type of files at the same time ensuring that should one of any storage devices fail, there would be another copy of these files on another storage device. However, the concern is, 50GB of storage can only store this maximum number of files as compared to 70GB, 150GB etc. With one storage in mind, bringing 5 storage systems like a cloud storage, is there any logical way to distribute or store the files through my application? How do I ensure data redundancy through different storage sizes? Is there any algorithm to collate multiple blob files into a single file archive? What is the best solution for one cloud storage with multiple different storage sizes? I open this topic with the objective of discussing the best way to implement this idea, assuming simplicity, what are the issues of this implementation, performance measurements and discussion of the limitations.

    Read the article

  • Architecture for a business objects / database access layer

    - by gregmac
    For various reasons, we are writing a new business objects/data storage library. One of the requirements of this layer is to separate the logic of the business rules, and the actual data storage layer. It is possible to have multiple data storage layers that implement access to the same object - for example, a main "database" data storage source that implements most objects, and another "ldap" source that implements a User object. In this scenario, User can optionally come from an LDAP source, perhaps with slightly different functionality (eg, not possible to save/update the User object), but otherwise it is used by the application the same way. Another data storage type might be a web service, or an external database. There are two main ways we are looking at implementing this, and me and a co-worker disagree on a fundamental level which is correct. I'd like some advice on which one is the best to use. I'll try to keep my descriptions of each as neutral as possible, as I'm looking for some objective view points here. Business objects are base classes, and data storage objects inherit business objects. Client code deals with data storage objects. In this case, common business rules are inherited by each data storage object, and it is the data storage objects that are directly used by the client code. This has the implication that client code determines which data storage method to use for a given object, because it has to explicitly declare an instance to that type of object. Client code needs to explicitly know connection information for each data storage type it is using. If a data storage layer implements different functionality for a given object, client code explicitly knows about it at compile time because the object looks different. If the data storage method is changed, client code has to be updated. Business objects encapsulate data storage objects. In this case, business objects are directly used by client application. Client application passes along base connection information to business layer. Decision about which data storage method a given object uses is made by business object code. Connection information would be a chunk of data taken from a config file (client app does not really know/care about details of it), which may be a single connection string for a database, or several pieces connection strings for various data storage types. Additional data storage connection types could also be read from another spot - eg, a configuration table in a database that specifies URLs to various web services. The benefit here is that if a new data storage method is added to an existing object, a configuration setting can be set at runtime to determine which method to use, and it is completely transparent to the client applications. Client apps do not need to be modified if data storage method for a given object changes. Business objects are base classes, data source objects inherit from business objects. Client code deals primarily with base classes. This is similar to the first method, but client code declares variables of the base business object types, and Load()/Create()/etc static methods on the business objects return the appropriate data source-typed objects. The architecture of this solution is similar to the first method, but the main difference is the decision about which data storage object to use for a given business object is made by the business layer, not the client code. I know there are already existing ORM libraries that provide some of this functionality, but please discount those for now (there is the possibility that a data storage layer is implemented with one of these ORM libraries) - also note I'm deliberately not telling you what language is being used here, other than that it is strongly typed. I'm looking for some general advice here on which method is better to use (or feel free to suggest something else), and why.

    Read the article

  • Survey Data Model - How to avoid EAV and excessive denormalization?

    - by AlexDPC
    Hi everyone, My database skills are mediocre at best and I have to design a data model for survey data. I have spent some thoughts on this and right now I feel that I am stuck between some kind of EAV model and a design involving hundreds of tables, each with hundreds of columns (and thousands of records). There must be a better way to do this and I hope that the wise folks on this forum can help me. I have already searched various forums, but I couldn't really find a solution. If it has already been given elsewhere, please excuse me and provide me with a link so I can read it up. Some assumptions about the data I have to deal with: Each survey consists of 1 to n questionnaires Each questionnaire consists of 100-2,000 questions (please ignore that 2,000 questions really sound like a lot to answer...) Questions can be of various types: multiple-choice, free text, a number (like age, income, percentages, ...) Each survey involves 10-200 countries (These are not the respondents. The respondents are actually people in the countries.) Depending on the type of questionnaire, each questionnaire is answered by 100-20,000 respondents per country. A country can adapt the questionnaires for a survey, i.e. add, remove or edit questions The data for one country is gathered in a separate database in that country. There is no possibility for online integration from the start. The data for all countries has to be integrated later. This means for example, if a country has deleted a question, that data must somehow be derived from what they sent in order to achieve a uniform design across all countries I will have to write the integration and cleaning software, which will need to work with every country's data In the end the data needs to be exported to flat files, one rectangular grid per country and questionnaire. I have already discussed this topic with people from various backgrounds and have not come to a good solution yet. I mainly got two kinds of opinions. The domain experts, who are used to working with flat files (spreadsheet-style) for data processing and analysis vote for a denormalized structure with loads of tables and columns as I described above (1 table per country and questionnaire). This sounds terrible to me, because I learned that wide tables are to be avoided, it will be annoying to determine which columns are actually in a table when working with it, the database will become cluttered with hundreds of tables (or I even need to set up multiple databases, each with a similar yet a bit differetn design), etc. O-O-programmers vote for a strongly "normalized" design, which would effectively lead to a central table containing all the answers from all respondents to all questions. This table would either need to contain a column of type sql_variant type or multiple answer columns with different types to store answers of different types (multiple choice, free text, ..). The former would essentially be a EAV model. I tend to follow Joe Celko here, who strongly discourages its use (he calls it OTLT or "One True Lookup Table"). The latter would imply that each row would contain null cells for the not applicable types by design. Another alternative I could think of would be to create one table per answer type, i.e., one for multiple-choice questions, one for free text questions, etc.. That's not so generic, it would lead to a lot of union joins, I think and I would have to add a table if a new answer type is invented. Sorry for boring you with all this text and thank you for your input! Cheers, Alex PS: I asked the same question here: http://www.eggheadcafe.com/community/aspnet/13/10242616/survey-data-model--how-to-avoid-eav-and-excessive-denormalization.aspx

    Read the article

  • What scalability problems have you solved using a NoSQL data store?

    - by knorv
    NoSQL refers to non-relational data stores that break with the history of relational databases and ACID guarantees. Popular open source NoSQL data stores include: Cassandra (tabular, written in Java, used by Facebook, Twitter, Digg, Rackspace, Mahalo and Reddit) CouchDB (document, written in Erlang, used by Engine Yard and BBC) Dynomite (key-value, written in C++, used by Powerset) HBase (key-value, written in Java, used by Bing) Hypertable (tabular, written in C++, used by Baidu) Kai (key-value, written in Erlang) MemcacheDB (key-value, written in C, used by Reddit) MongoDB (document, written in C++, used by Sourceforge, Github, Electronic Arts and NY Times) Neo4j (graph, written in Java, used by Swedish Universities) Project Voldemort (key-value, written in Java, used by LinkedIn) Redis (key-value, written in C, used by Engine Yard, Github and Craigslist) Riak (key-value, written in Erlang, used by Comcast and Mochi Media) Ringo (key-value, written in Erlang, used by Nokia) Scalaris (key-value, written in Erlang, used by OnScale) ThruDB (document, written in C++, used by JunkDepot.com) Tokyo Cabinet/Tokyo Tyrant (key-value, written in C, used by Mixi.jp (Japanese social networking site)) I'd like to know about specific problems you - the SO reader - have solved using data stores and what NoSQL data store you used. Questions: What scalability problems have you used NoSQL data stores to solve? What NoSQL data store did you use? What database did you use before switching to a NoSQL data store? I'm looking for first-hand experiences, so please do not answer unless you have that.

    Read the article

  • Before data is entered, is there a way to make a grouped table graphic placeholder?

    - by Matt Winters
    I have a grouped table with 3 sections, each section with a title. The 1st and 3rd sections always have only 1 row of information so before any user data is entered, I just put some words like "Enter Data Here..." as placeholder text. This text is edited (replaced) by the user with their own actual data. No problem. The 2nd section however will contain several rows of information entered by user and I'd prefer not to enter placeholder data in row 0, having the user Edit the first row of data then Add subsequent rows. If the numberOfRowsInSection is set to 0, the title for the 3rd section comes close to the title for the 2nd section and it looks ugly. The best that I could come with, and I don't know to do it, is to have a fake graphic placeholder on the striped background (between the 2nd and third titles) that looks like a single row in the 2nd section, put "Enter Data Here..." text in the graphic, and then the first row of actual data entered and all subsequent rows will cover it up. Can anyone tell me how to do this or offer a better suggestion. Thanks.

    Read the article

  • C# .Net Serial DataReceived Event response too slow for high-rate data.

    - by Matthew
    Hi, I have set up a SerialDataReceivedEventHandler, with a forms based program in VS2008 express. My serial port is set up as follows: 115200, 8N1 Dtr and Rts enabled ReceivedBytesThreshold = 1 I have a device I am interfacing with over a BlueTooth, USB to Serial. Hyper terminal receives the data just fine at any data rate. The data is sent regularly in 22 byte long packets. This device has an adjustable rate at which data is sent. At low data rates, 10-20Hz, the code below works great, no problems. However, when I increase the data rate past 25Hz, there starts to recieve mulitple packets on one call. What I mean by this is that there should be a event trigger for every incoming packet. With higher output rates, I have tested the buffer size (BytesToRead command) immediatly when the event is called and there are multiple packets in the buffer then. I think that the event fires slowly and by the time it reaches the code, more packes have hit the buffer. One test I do is see how many time the event is trigger per second. At 10Hz, I get 10 event triggers, awesome. At 100Hz, I get something like 40 event triggers, not good. My goal for data rate is 100HZ is acceptable, 200Hz preferred, and 300Hz optimum. This should work because even at 300Hz, that is only 52800bps, less than half of the set 115200 baud rate. Anything I am over looking? public Form1() { InitializeComponent(); serialPort1.DataReceived += new SerialDataReceivedEventHandler(serialPort1_DataReceived); } private void serialPort1_DataReceived(object sender, SerialDataReceivedEventArgs e) { this.Invoke(new EventHandler(Display_Results)); } private void Display_Results(object s, EventArgs e) { serialPort1.Read(IMU, 0, serial_Port1.BytesToRead); }

    Read the article

  • SQL Server CLR stored procedures in data processing tasks - good or evil?

    - by Gart
    In short - is it a good design solution to implement most of the business logic in CLR stored procedures? I have read much about them recently but I can't figure out when they should be used, what are the best practices, are they good enough or not. For example, my business application needs to parse a large fixed-length text file, extract some numbers from each line in the file, according to these numbers apply some complex business rules (involving regex matching, pattern matching against data from many tables in the database and such), and as a result of this calculation update records in the database. There is also a GUI for the user to select the file, view the results, etc. This application seems to be a good candidate to implement the classic 3-tier architecture: the Data Layer, the Logic Layer, and the GUI layer. The Data Layer would access the database The Logic Layer would run as a WCF service and implement the business rules, interacting with the Data Layer The GUI Layer would be a means of communication between the Logic Layer and the User. Now, thinking of this design, I can see that most of the business rules may be implemented in a SQL CLR and stored in SQL Server. I might store all my raw data in the database, run the processing there, and get the results. I see some advantages and disadvantages of this solution: Pros: The business logic runs close to the data, meaning less network traffic. Process all data at once, possibly utilizing parallelizm and optimal execution plan. Cons: Scattering of the business logic: some part is here, some part is there. Questionable design solution, may encounter unknown problems. Difficult to implement a progress indicator for the processing task. I would like to hear all your opinions about SQL CLR. Does anybody use it in production? Are there any problems with such design? Is it a good thing?

    Read the article

  • How to explain to someone that a data structure should not draw itself, explaining separation of con

    - by leeand00
    I have another programmer who I'm trying to explain why it is that a UI component should not also be a data-structure. For instance say that you get a data-structure that contains a record-set from the "database", and you wish to display that record-set in a UI component within your application. According to this programmer (who will remain nameless, he's young and I'm teaching him...), we should subclass the data-structure into a class that will draw the UI component within our application!!!!!! And thus according to this logic, the record-set should manage the drawing of the UI. **Head Desk*** I know that asking a record-set to draw itself is wrong, because, if you wish to render the same data-structure on more than one type of component on your UI, you are going to have a real mess on your hands; you'll need to extend yet another class for each and every UI component that you render from the base-class of your record-set; I am well aware of the "cleanliness" of the of the MVC pattern (and by that what I really mean is you don't confuse your data (the Model) with your UI (the view) or the actions that take place on the data (the Controller more or less...okay not really the API should really handle that...and the Controller should just make as few calls to it as it can, telling it which view to render)) But it's certainly alot cleaner than using data-structures to render UI components! Is there any other advice I could send his way other than the example above? I understand that when you first learn OOP you go through "a stage" where you where just want to extend everything. Followed by a stage when you think that Design Patterns are the solution every single problem...which isn't entirely correct either...thanks Jeff. Is there a way that I can gently nudge this kid in the right direction? Do you have any more examples that might help explain my point to him?

    Read the article

  • Database not updating after UPDATE SQL statement in ASP.net

    - by Ronnie
    I currently have a problem attepting to update a record within my database. I have a webpage that displays in text boxes a users details, these details are taken from the session upon login. The aim is to update the details when the user overwrites the current text in the text boxes. I have a function that runs when the user clicks the 'Save Details' button and it appears to work, as i have tested for number of rows affected and it outputs 1. However, when checking the database, the record has not been updated and I am unsure as to why. I've have checked the SQL statement that is being processed by displaying it as a label and it looks as so: UPDATE [users] SET [email]=@email, [firstname]=@firstname, [lastname]=@lastname, [promo]=@promo WHERE ([users].[user_id] = 16) The function and other relevant code is: Sub Button1_Click(sender As Object, e As EventArgs) changeDetails(emailBox.text, firstBox.text, lastBox.text, promoBox.text) End Sub Function changeDetails(ByVal email As String, ByVal firstname As String, ByVal lastname As String, ByVal promo As String) As Integer Dim connectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0; Ole DB Services=-4; Data Source=C:\Documents an"& _ "d Settings\Paul Jarratt\My Documents\ticketoffice\datab\ticketoffice.mdb" Dim dbConnection As System.Data.IDbConnection = New System.Data.OleDb.OleDbConnection(connectionString) Dim queryString As String = "UPDATE [users] SET [email]=@email, [firstname]=@firstname, [lastname]=@lastname, "& _ "[promo]=@promo WHERE ([users].[user_id] = " + session.contents.item("ID") + ")" Dim dbCommand As System.Data.IDbCommand = New System.Data.OleDb.OleDbCommand dbCommand.CommandText = queryString dbCommand.Connection = dbConnection Dim dbParam_email As System.Data.IDataParameter = New System.Data.OleDb.OleDbParameter dbParam_email.ParameterName = "@email" dbParam_email.Value = email dbParam_email.DbType = System.Data.DbType.[String] dbCommand.Parameters.Add(dbParam_email) Dim dbParam_firstname As System.Data.IDataParameter = New System.Data.OleDb.OleDbParameter dbParam_firstname.ParameterName = "@firstname" dbParam_firstname.Value = firstname dbParam_firstname.DbType = System.Data.DbType.[String] dbCommand.Parameters.Add(dbParam_firstname) Dim dbParam_lastname As System.Data.IDataParameter = New System.Data.OleDb.OleDbParameter dbParam_lastname.ParameterName = "@lastname" dbParam_lastname.Value = lastname dbParam_lastname.DbType = System.Data.DbType.[String] dbCommand.Parameters.Add(dbParam_lastname) Dim dbParam_promo As System.Data.IDataParameter = New System.Data.OleDb.OleDbParameter dbParam_promo.ParameterName = "@promo" dbParam_promo.Value = promo dbParam_promo.DbType = System.Data.DbType.[String] dbCommand.Parameters.Add(dbParam_promo) Dim rowsAffected As Integer = 0 dbConnection.Open Try rowsAffected = dbCommand.ExecuteNonQuery Finally dbConnection.Close End Try labelTest.text = rowsAffected.ToString() if rowsAffected = 1 then labelSuccess.text = "* Your details have been updated and saved" else labelError.text = "* Your details could not be updated" end if End Function Any help would be greatly appreciated.

    Read the article

  • Ruby on Rails 2.3.5: Populating my prod and devel database with data (migration or fixture?)

    - by randombits
    I need to populate my production database app with data in particular tables. This is before anyone ever even touches the application. This data would also be required in development mode as it's required for testing against. Fixtures are normally the way to go for testing data, but what's the "best practice" for Ruby on Rails to ship this data to the live database also upon db creation? ultimately this is a two part question I suppose. 1) What's the best way to load test data into my database for development, this will be roughly 1,000 items. Is it through a migration or through fixtures? The reason this is a different answer than the question below is that in development, there's certain fields in the tables that I'd like to make random. In production, these fields would all start with the same value of 0. 2) What's the best way to bootstrap a production db with live data I need in it, is this also through a migration or fixture? I think the answer is to seed as described here: http://lptf.blogspot.com/2009/09/seed-data-in-rails-234.html but I need a way to seed for development and seed for production. Also, why bother using Fixtures if seeding is available? When does one seed and when does one use fixtures?

    Read the article

< Previous Page | 443 444 445 446 447 448 449 450 451 452 453 454  | Next Page >