Search Results

Search found 71953 results on 2879 pages for 'load data infile'.

Page 11/2879 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18  | Next Page >

  • Managing Data Dependecies of Java Classes that Load Data from the Classpath at Runtime

    - by Martin Potthast
    What is the simplest way to manage dependencies of Java classes to data files present in the classpath? More specifically: How should data dependencies be annotated? Perhaps using Java annotations (e.g., @Data)? Or rather some build entries in a build script or a properties file? Is there build tool that integrates and evaluates such information (Ant, Scons, ...)? Do you have examples? Consider the following scenario: A few lines of Ant create a Jar from my sources that includes everything found on the classpath. Then jarjar is used to remove all .class files that are not necessary to execute, say, class Foo. The problem is that all the data files that class Bar depends upon are still there in the Jar. The ideal deployment script, however, would recognize that the data files on which only class Bar depends can be removed while data files on which class Foo depends must be retained. Any hints?

    Read the article

  • Best way to implement user-powered data validation

    - by vegetables
    I run a product recommendation engine and I'm hitting a few snags. I'm looking to see if anyone has any recommendations on what I should do to minimize these issues. Here's how the site works: Users come to the site and are presented with product recommendations based on some criteria. If a user knows of a product that is not in our system, they can add it by providing the product name and manufacturer. We take that information, and: Hit one API to gather all the product meta-data (and to validate the product spelling, etc). If the product is not in this first API, we do not allow it in our system. Use the information from step 1 to hit another API for pricing information (gathered from many places online). For the sake of discussion, assume that I am searching both APIs in the most efficient/successful manner possible. For the most part, this works very well. I'd say ~80% of our data is perfectly accurate, but there are a few issues: Sometimes the pricing API (Step 2) doesn't have any information for the product. The way the pricing API is built, it will always return something (theoretically, the closest possible match), and there's no guarantee that the product name is spelled exactly the same way in both APIs, so there's no automated way of knowing if it's the right product. When the pricing API finds the right product, occasionally it has outdated, or even invalid pricing data (e.g. if it screen-scraped the wrong price from a website). Since the site was fairly small at first, I was able to manually verify every product that was added to the website. However, the site has grown to the point where this is taking several hours per day, and is just not efficient use of my time. So, my question is: Aside from hiring someone (or getting an intern) to validate all the data manually, what would be the best system of letting my userbase self-manage the data. Specifically, how can I allow users to edit the data while minimizing the risk of someone ambushing my website, or accidentally setting the data incorrectly.

    Read the article

  • data handling with javascript

    - by Vincent Warmerdam
    Python has a very neat package called pandas which allows for quick data transformation; tables, aggregation, that sort of thing. A lot of these types of functionality can also be found in the python itertools module. The plyR package in R is also very similar. Usually one woud use this functionality to produce a table which is later visualized with a plot. I am personally very fond of d3, and I would like to allow the user to first indicate what type of data aggregation he wants on the dataset before it is visualized. The visualisation in question involves making a heatmap where the user gets to select the size of the bins of the heatmap beforehand (I want d3 to project this through leaflet). I want to visually select the ideal size of the bins for the heatmap. The way I work now is that I take the dataset, aggregate it with python and then manually load it in d3. This is a process that takes a lot of human effort and I was wondering if the data aggregation can be done through the javascript of the browser. I couldn't find a package for javascript specifically built for data, suggesting (to me) that this is a bad idea and that one should not use javascript for the data handling. Is there a good module/package for javascript to handle data aggregation? Is it a good/bad idea to do the data aggregation in javascript (performance wise)?

    Read the article

  • Customizing the NUnit GUI for data-driven testing

    - by rwong
    My test project consists of a set of input data files which is fed into a piece of legacy third-party software. Since the input data files for this software are difficult to construct (not something that can be done intentionally), I am not going to add new input data files. Each input data file will be subject to a set of "test functions". Some of the test functions can be invoked independently. Other test functions represent the stages of a sequential operation - if an earlier stage fails, the subsequent stages do not need to be executed. I have experimented with the NUnit parametrized test case (TestCaseAttribute and TestCaseSourceAttribute), passing in the list of data files as test cases. I am generally satisfied with the the ability to select the input data for testing. However, I would like to see if it is possible to customize its GUI's tree structure, so that the "test functions" become the children of the "input data". For example: File #1 CheckFileTypeTest GetFileTopLevelStructureTest CompleteProcessTest StageOneTest StageTwoTest StageThreeTest File #2 CheckFileTypeTest GetFileTopLevelStructureTest CompleteProcessTest StageOneTest StageTwoTest StageThreeTest This will be useful for identifying the stage that failed during the processing of a particular input file. Is there any tips and tricks that will enable the new tree layout? Do I need to customize NUnit to get this layout?

    Read the article

  • Test Data in a Distributed System

    - by Davin Tryon
    A question that has been vexing me lately has been about how to effectively test (end-to-end) features in a distributed system. Particuarly, how to effectively manage (through time) test data for feature testing. The system in question is a typical SOA setup. The composition is done in JavaScript when call to several REST APIs. Each service is built as an independent block. Each service has some kind of persistent storage (SQL Server in most cases). The main issue at the moment is how to approach test data when testing end-to-end features. Functional end-to-end testing occurs through the UI, and it is therefore necessary for test data to be set up before the test run (this could be manual or automated testing). As is typical in a distributed system, identifiers from one service are used as a link in another service. So, some level of synchronization needs to be present in the data to effectively test. What is the best way to manage and set up this data after a successful deployment to a test environment? For example, is it better to manage this test data inside each service? Or package it together with the testing suite? Does that testing suite exist as a separate project? I'm interested in design guidance about how to store and manage this test data as the application features evolve.

    Read the article

  • Data Management Business Continuity Planning

    Business Continuity Governance In order to ensure data continuity for an organization, they need to ensure they know how to handle a data or network emergency because all systems have the potential to fail. Data Continuity Checklist: Disaster Recovery Plan/Policy Backups Redundancy Trained Staff Business Continuity Policies In order to protect data in case of any emergency a company needs to put in place a Disaster recovery plan and policies that can be executed by IT staff to ensure the continuity of the existing data and/or limit the amount of data that is not contiguous.  A disaster recovery plan is a comprehensive statement of consistent actions to be taken before, during and after a disaster, according to Geoffrey H. Wold. He also states that the primary objective of disaster recovery planning is to protect the organization in the event that all or parts of its operations and/or computer services are rendered unusable. Furthermore, companies can mandate through policies that IT must maintain redundant hardware in case of any hardware failures and redundant network connectivity incase the primary internet service provider goes down.  Additionally, they can require that all staff be trained in regards to the Disaster recovery policy to ensure that all parties evolved are knowledgeable to execute the recovery plan. Business Continuity Procedures Business continuity procedure vary from organization to origination, however there are standard procedures that most originations should follow. Standard Business Continuity Procedures Backup and Test Backups to ensure that they work Hire knowledgeable and trainable staff  Offer training on new and existing systems Regularly monitor, test, maintain, and upgrade existing system hardware and applications Maintain redundancy regarding all data, and critical business functionality

    Read the article

  • How to choose how to store data?

    - by Eldros
    Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime. - Chinese Proverb I could ask what kind of data storage I should use for my actual project, but I want to learn to fish, so I don't need to ask for a fish each time I begin a new project. So, until I used two methods to store data on my non-game project: XML files, and relational databases. I know that there is also other kind of database, of the NoSQL kind. However I wouldn't know if there is more choice available to me, or how to choose in the first place, aside arbitrary picking one. So the question is the following: How should I choose the kind of data storage for a game project? And I would be interested on the following criterion when choosing: The size of the project. The platform targeted by the game. The complexity of the data structure. Added Portability of data amongst many project. Added How often should the data be accessed Added Multiple type of data for a same application Any other point you think is of interest when deciding what to use. EDIT I know about Would it be better to use XML/JSON/Text or a database to store game content?, but thought it didn't address exactly my point. Now if I am wrong, I would gladely be shown the error in my ways.

    Read the article

  • Ubuntu Server 12.04 CPU Load

    - by zertux
    I have a Server (2x Hexa-Core Xeon E5649 2.53GHz w/HT with 32GB RAM and 20000 GB Bandwidth) running Ubuntu Server 12.04 LTS. The server runs LAMP and serves one website only, the estimated number of users is to be ~ 15,000 at the same time. At the moment i have around 2000 users online each of them runs 50 MySQL queries (small values mostly select and insert) from the beginning until the end of the session. Server CPU Load is high at this number of connections while the RAM usage is almost 1GB out of 32GB its worth mentioning that the server was running very fast with no problems at all but am concerned about the load average. http://s12.postimage.org/z7hi6mz3h/photo.png top - 03:02:43 up 9 min, 2 users, load average: 50.83, 30.14, 12.83 Tasks: 432 total, 1 running, 430 sleeping, 0 stopped, 1 zombie Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 66.5%id, 33.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 32939992k total, 3111604k used, 29828388k free, 84108k buffers Swap: 2048280k total, 0k used, 2048280k free, 1621640k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2860 root 20 0 25820 2288 1420 S 3 0.0 0:11.18 htop 1182 root 20 0 0 0 0 D 2 0.0 0:01.46 kjournald 1935 mysql 20 0 12.3g 161m 7924 S 1 0.5 102:31.45 mysqld 11 root 20 0 0 0 0 S 0 0.0 0:00.38 kworker/0:1 1822 www-data 20 0 247m 25m 4188 D 0 0.1 0:01.81 apache2 2920 www-data 20 0 0 0 0 Z 0 0.0 0:01.20 apache2 <defunct> 2942 www-data 20 0 247m 23m 3056 D 0 0.1 0:00.20 apache2 3516 www-data 20 0 247m 23m 3028 D 0 0.1 0:00.06 apache2 3521 www-data 20 0 247m 23m 3020 D 0 0.1 0:00.09 apache2 3664 www-data 20 0 247m 23m 3132 D 0 0.1 0:00.09 apache2 3674 www-data 20 0 247m 23m 3252 D 0 0.1 0:00.06 apache2 3713 www-data 20 0 247m 23m 3040 D 0 0.1 0:00.09 apache2 1 root 20 0 24328 2284 1344 S 0 0.0 0:03.09 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0 6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1 9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0 root@server:~/codes# vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 19 0 0 29684012 86112 1689844 0 0 19 590 254 231 48 0 47 5 23 0 0 29704812 86128 1697672 0 0 4 320 11100 8121 77 1 22 0 33 0 0 29671044 86156 1705308 0 0 0 5440 13190 9140 95 1 4 0 33 3 0 29670088 86160 1706288 0 0 0 32932 12275 7297 99 0 1 0 35 0 0 29693456 86188 1710724 0 0 4 676 12701 7867 98 1 1 0 ^C I have not changed any of the default configurations that comes with Ubuntu. Is this load normal for such powerful server ? is there any optimization i can make to Apache/MySQL to minimize the load ? What do you recommend ?

    Read the article

  • Plesk Postfix Mail Server 9.5.4 very heavy load, 1000s of processes

    - by Eugene van der Merwe
    Our Plesk Linux Ubuntu 64-bit mail server has extremely high load and we don't know how to isolate it. The load was okay will two weeks ago but in the last two weeks it's seriously deteriorated. The mail server has been running for years and we have had sporadic performance issues. Normally we reduce the load by turning off all SPAM checks until the problem is sorted (which sometimes resolves itself). Currently we have turned of real time block lists, SPF checking and we have attempted to turn off SpamAssassin. No matter what we do the SpamAssassin check box stays ticked in the GUI. Out of desperation we have done /etc/init.d/psa-spamassassin stop. For years we haven't been able to do SpamAssassin because it kills the server. We would like to use it but performance is more important for now. We cannot turn off Greylisting. The moment we turn off Greylisting our help desk is inandated with calls. Out of desperation we investigated truncating the Greylisting database which is now 2.5 GB big but we abandoned this after noticing turning of Greylisting doesn't improve the performance at all. We have no anti-virus. It's just more load and Dr. Web never really worked that well for us. But we'll try that if it will make a difference. We have implemented Postfix Anvil. This seems to have made the situation worse so we disabled it. We’re not sure if this is the case. Our current mail server is configured to forward all SMTP to a relay server. We did so to reduce the load. This helped a lot because outgoing queues are generally empty. We are running in an Expand configuration. The mail server has about 12 000 accounts of which maybe half are active. We have read through this document: http://www.postfix.org/STRESS_README.html but there are too many settings and we don’t know which ones to choose. Please assist urgently. We need advice on how to fix this problem before all our clients abandon is. The only clue we have is that there are 100s of these processes: 30 13205 1 0 13:18 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10027 before-queue 30 13207 1 0 11:38 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10027 before-queue 30 13208 1 0 13:18 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10026 before-remote 30 13209 1 0 11:38 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10026 before-remote 30 13213 1 0 13:18 ? 00:00:00 /usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10027 before-queue

    Read the article

  • MySQL: LOAD DATA reclaim disk space after delete

    - by Michael
    I have a DB schema composed of MYISAM tables, i am interested to delete old records from time to time from some of the tables. I know that delete does not reclaim the memory space, but as i found in a description of DELETE command, inserts may reuse the space deleted In MyISAM tables, deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. I am interested if LOAD DATA command also reuses the deleted space? UPDATE I am also interested how the index space reclaimed?

    Read the article

  • Data Masking Pack 12.1.0.3 Certified with E-Business Suite 12.1.3

    - by Elke Phelps (Oracle Development)
    I'm pleased to announce the certification of the E-Business Suite 12.1.3 Data Masking Template for the Data Masking Pack with Enterprise Manager Cloud Control 12.1.0.3. You can use the Oracle Data Masking Pack with Oracle Enterprise Manager Grid Control 12c to scramble sensitive data in cloned E-Business Suite environments.     You may scramble data in E-Business Suite cloned environments with EM12.1.0.3 using the following template: E-Business Suite 12.1.3 Data Masking Template for Data Masking Pack with EM12c (Patch 18462641) What does data masking do in E-Business Suite environments? Application data masking does the following: De-identify the data:  Scramble identifiers of individuals, also known as personally identifiable information or PII.  Examples include information such as name, account, address, location, and driver's license number. Mask sensitive data:  Mask data that, if associated with personally identifiable information (PII), would cause privacy concerns.  Examples include compensation, health and employment information.   Maintain data validity:  Provide a fully functional application.  How can EBS customers use data masking? The Oracle E-Business Suite Template for Data Masking Pack can be used in situations where confidential or regulated data needs to be shared with other non-production users who need access to some of the original data, but not necessarily every table.  Examples of non-production users include internal application developers or external business partners such as offshore testing companies, suppliers or customers.  Due to data dependencies, scrambling E-Business Suite data is not a trivial task.  The data needs to be scrubbed in such a way that allows the application to continue to function. The template works with the Oracle Data Masking Pack and Oracle Enterprise Manager to obscure sensitive E-Business Suite information that is copied from production to non-production environments.  The Oracle E-Business Suite Template for Data Masking Pack is applied to a non-production environment with the Enterprise Manager Grid Control Data Masking Pack.  When applied, the Oracle E-Business Suite Template for Data Masking Pack will create an irreversibly scrambled version of your production database for development and testing. Is there a charge for this? Yes. You must purchase licenses for the Oracle Data Masking Pack to use the Oracle E-Business Suite 12.1.3 template. The Oracle E-Business Suite 12.1.3 Template for the Data Masking Pack is included with the Oracle Data Masking Pack license.  You can contact your Oracle account manager for more details about licensing. References Additional details and requirements are provided in the following My Oracle Support Note: Using Oracle E-Business Suite Release 12.1.3 Template for the Data Masking Pack with Oracle Enterprise Manager 12.1 Data Masking Tool (Note 1481916.1) Masking Sensitive Data in the Oracle Database Real Application Testing User's Guide 11g Release 2 (11.2) Related Articles Scrambling Sensitive Data in E-Business Suite E-Business Suite 12.1.3 Data Masking Certified with Enterprise Manager 12c

    Read the article

  • SSI includes not working on Debian with Apache

    - by Mike
    I'm trying to get SSI to work on Debian running Apache, however the .shtml files are not being parsed. From a PHP file with phpinfo() I can see that the following show up in the loaded modules section: mod_mime_xattr mod_mime mod_mime_magic In /etc/apache2/mods-enabled/mime.conf I have (among other things): AddType text/html .shtml AddOutputFilter INCLUDES .shtml In /etc/apache2/sites-enabled/domain.com.conf (for the virtual host in question) I have: <Directory /home/username/public_html> Options +Includes allow from all AllowOverride All </Directory> and for good measure, I added the following as well: <Directory /> Options +Includes </directory> In the user's .htaccess file, I tried adding: Options +Includes AddType text/html shtml AddHandler server-parsed shtml Nothing seems to work. How can I even debug this? Edit: Here is the output of ls /etc/apache2/mods-enabled/ in case this helps actions.conf dav_svn.load proxy_balancer.load actions.load deflate.conf proxy.conf alias.conf deflate.load proxy_connect.load alias.load dir.conf proxy_http.load auth_basic.load dir.load proxy.load auth_digest.load env.load python.load authn_file.load fcgid.conf reqtimeout.conf authz_default.load fcgid.load reqtimeout.load authz_groupfile.load mime.conf rewrite.load authz_host.load mime.load ruby.load authz_user.load mime_magic.conf setenvif.conf autoindex.conf mime_magic.load setenvif.load autoindex.load mime-xattr.load ssl.conf cgi.load negotiation.conf ssl.load dav_fs.conf negotiation.load status.conf dav_fs.load php5.conf status.load dav.load php5.load suexec.load dav_svn.conf proxy_balancer.conf

    Read the article

  • High load average, low CPU and IO (Centos 5.7)

    - by Ben
    A Drupal 7 site with CiviCRM, after running smoothly for a year on a 1&1 VPS suddenly became unresponsive. Now pages eventually load, but can take more than a minute. Looking at resource use in Virtuozzo, the load average carries a warning, and has remained above 1. While I understand this isn't particularly high, this is a change from when the site was working. Here is a typical snapshot of top: top - 03:10:32 up 3:21, 1 user, load average: 1.16, 1.22, 1.30 Tasks: 43 total, 1 running, 42 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 0.1%sy, 0.1%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2097152k total, 1015112k used, 1082040k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached CPU idle level never seems to go much below 70%. wa is virtually always at 0. There appears to be lots of free memory. And here is some vmstat output, again showign wa at 0, plenty of free memory, and an idle CPU: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 1100872 0 0 0 0 2783 23672 0 538 1 0 99 0 0 1 0 0 1100872 0 0 0 0 0 16 0 101754 0 0 100 0 0 0 0 0 1100872 0 0 0 0 0 17 0 103133 0 0 100 0 0 0 0 0 1100872 0 0 0 0 0 1 0 102080 0 0 100 0 0 1 0 0 1100872 0 0 0 0 0 6 0 99881 0 0 100 0 0 0 0 0 1100872 0 0 0 0 0 1 0 105187 0 0 100 0 0 I've spoken to 1&1 but they don't have any ideas as to what could be causing the high load average. Instead they suggested an upgrade :) I've looked for processes that might be causing this, examined MySQL showprocesslist, and restarted the container with no result. Does anyone have more troubleshooting suggestions or insights?

    Read the article

  • Virtualbox HTTP load testing, host CPU overload issues

    - by aschuler
    I'm doing HTTP load testing benchmarks (using Apache Benchmark and Siege) on a small Java EE 1.7.0 / Tomcat 7.0.26 application running on a Debian Squeeze 6.0.4 x64 virtualized with Virtualbox 4.1.8. The computer host is Ubuntu 11.10 x64. I've modified those parameters in the Tomcat server.xml : <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="200000" redirectPort="8443" acceptCount="2000" maxThreads="150" minSpareThreads="50" /> The application executed on the server takes around 300ms. This app is running well until a certain amount of concurrent connections like those one : ab -n 500 -c 150 http://xx.xx.xx.xx:8080/myapp/ ab -n 1000 -c 50 http://xx.xx.xx.xx:8080/myapp/ siege -b -c 100 -r 20 http://xx.xx.xx.xx:8080/myapp/ A lot of socket connection timed out happens and this completly overload the host processor (but the CPU load inside the VM is normal). Doing an htop on the host, i can see that the Virtualbox processus is running under 300% CPU and never come down even after the load test is finished. (I've allocated 4 processors to the VM, if I allocate only one processor, CPU load goes under 100%). Restarting Tomcat don't do anything, i'm forced to restart the whole VM. I've tryed to launch those ab/siege commands locally on the VM and everything goes well. I first thought it was related to a linux network limit as explained here: Running some benchmarks using ab, and tomcat starts to really slow down So I've modified those TCP parameters : echo 15 > /proc/sys/net/ipv4/tcp_fin_timeout echo 30 > /proc/sys/net/ipv4/tcp_keepalive_intvl echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse It seems to be better, but it continues to overload the host CPU and output socket connections time out at a certain amount of concurrent connections. I'm wondering if this is not related to how Virtualbox handles external concurrent connections.

    Read the article

  • is it worth to use load balancer on web server/website

    - by user427969
    I have a website and a while ago, the web server of the company hosting my website was down for about a day. I consulted the company for a solution on how i can stop this from happening in future and they suggested to have a second machine and which will be connected to my current website/web server by a "load balancer" (at an additional huge cost!!!). The second machine will be replicate of the first one and so if i goes down, the other will always be running. ---- Explanation ----- My hosting company suggested that it will be a good idea to have a second machine running at the same time and both the machines will be connected by a load balancer which reduces the rist of a downtime. The second machine will be a mirror of the first and any changes to first must be replicated in the second. I don't mind spending money if it really saves my website from going down. I want to know is it worth having this "load balancer" for my purpose? My website is a 24/7 service. I cannot afford an outage of 24 hours/1 hour. I don't mind using this "load balancer" as far as it is really worth. I am not sure if its just a marketing trick of my hosting company or really a "best" solution Thanks for help. Regards

    Read the article

  • Oracle Data Warehouse and Big Data Magazine MAY Edition for Customers + Partners

    - by KLaker
    Follow us on The latest edition of our monthly data warehouse and big data magazine for Oracle customers and partners is now available. The content for this magazine is taken from the various data warehouse and big data Oracle product management blogs, Oracle press releases, videos posted on Oracle Media Network and Oracle Facebook pages. Click here to view the May Edition Please share this link http://flip.it/fKOUS to our magazine with your customers and partners This magazine is optimized for display on tablets and smartphones using the Flipboard App which is available from the Apple App store and Google Play store

    Read the article

  • Version control and data provenance in charts, slides, and marketing materials that derive from code ouput

    - by EMS
    I develop as part of a small team that mostly does research and statistics stuff. But from the output of our code, other teams often create promotional materials, slides, presentations, etc. We run into a big problem because the marketing team (non-programmers) tend to use Excel, Adobe products, or other tools to carry out their work, and just want easy-to-use data formats from us. This leads to data provenance problems. We see email chains with attachments from 6 months ago and someone is saying "Hey, who generated this data. Can you generate more of it with the recent 6 months of results added in?" I want to help the other teams effectively use version control (my team uses it reasonably well for the code, but every other team classically comes up with many excuses to avoid it). For version controlling a software project where the participants are coders, I have some reasonable understanding of best practices and what to do. But for getting a team of marketing professionals to version control marketing materials and associate metadata about the software used to generate the data for the charts, I'm a bit at a loss. Some of the goals I'd like to achieve: Data that supported a material should never be associated with a person. As in, it should never be the case that someone says "Hey Person XYZ, I see you sent me this data as an attachment 6 months ago, can you update it for me?" Rather, data should be associated with the code and code-version of any code that was used to get it, and perhaps a team of many people who may maintain that code. Then references for data updates are about executing a specific piece of code, with a known version number. I'd like this to be a process that works easily with the tech that the marketing team already uses (e.g. Excel files, Adobe file, whatever). I don't want to burden them with needing to learn a bunch of new stuff just to use version control. They are capable folks, so learning something is fine. Ideally they could use our existing version control framework, but there are some issues around that. I think knowing some general best practices will be enough though, and I can handle patching that into the way our stuff works now. Are there any goals I am failing to think about? What are the time-tested ways to do something like this?

    Read the article

  • What does it mean to treat data as an asset?

    What does it mean to treat data as an asset? When considering this concept, we must define what data is and how it can be considered an asset. Data can easily be defined as a collection of stored truths that are open to interpretation and manipulation.  Expanding on this definition, data can be viewed as a set of captured facts, measurements, and ideas used to make decisions. Furthermore, InvestorsWords.com defines asset as any item of economic value owned by an individual or corporation. Now let’s apply this definition of asset to our definition of data, and ask the following question. Can facts, measurements and ideas be items that are of economic value owned by an individual or corporation? The obvious answer is yes; data can be bought and sold like commodities or analyzed to make smarter business decisions.  We can look at the economic value of data in one of two ways. First, data can be sold as a commodity that can take the form of goods like eBooks, Training, Music, Movies, and so on. Customers are willing to pay to gain access to this data for their consumption. This directly implies that there is an economic value for data in the form of a commodity because customers see a value in obtaining it.  Secondly data can be used in making smarter business decisions that allow for companies to become more profitable and/or reduce their potential for risk in regards to how they operate.  In the past I have worked at companies where we had to analyze previous sales activities in conjunction with current activities to determine how the company was preforming for the quarter.  In addition trends can be formulated based on existing data that allow companies to forecast data so that they can make strategic business decisions based sound forecasted data. Companies that truly value their data are constantly trying to grow and upgrade their data and supporting applications because it is the life blood of a company. If we look at an eBook retailer for example, imagine if they lost all of their data. They would be in essence forced out of business because they would have nothing to sell. In turn, if we look at a company that was using data to facilitate better decision making processes and they lost all of their data then they could be losing potential revenue and/ or increasing the company’s losses by making important business decisions virtually in the dark compared to when they were made on solid data.

    Read the article

  • Data Virtualization: Federated and Hybrid

    - by Krishnamoorthy
    Data becomes useful when it can be leveraged at the right time. Not only enterprises application stores operate on large volume, velocity and variety of data. Mobile and social computing are in the need of operating in foresaid data. Replicating and transferring large swaths of data is one challenge faced in the field of data integration. However, smaller chunks of data aggregated from a variety of sources presents and even more interesting challenge in the industry. Over the past few decades, technology trends focused on best user experience, operating systems, high performance computing, high performance web sites, analysis of warehouse data, service oriented architecture, social computing, cloud computing, and big data. Operating on the ‘dark data’ becomes mandatory in the future technology trend, although, no solution can make dark data useful data in a single day. Useful data can be quantified by the facts of contextual, personalized and on time delivery. In most cases, data from a single source may not be complete the picture. Data has to be combined and computed from various sources, where data may be captured as hybrid data, meaning the combination of structured and unstructured data. Since related data is often found across disparate sources, effectively integrating these sources determines how useful this data ultimately becomes. Technology trends in 2013 are expected to focus on big data and private cloud. Consumers are not merely interested in where data is located or how data is retrieved and computed. Consumers are interested in how quick and how the data can be leveraged. In many cases, data virtualization is the right solution, and is expected to play a foundational role for SOA, Cloud integration, and Big Data. The Oracle Data Integration portfolio includes a data virtualization product called ODSI (Oracle Data Service Integrator). Unlike other data virtualization solutions, ODSI can perform both read and write operations on federated/hybrid data (RDBMS, Webservices,  delimited file and XML). The ODSI Engine is built on XQuery, hence ODSI user can perform computations on data either using XQuery or SQL. Built in data and query caching features, which reduces latency in repetitive calls. Rightly positioning ODSI, can results in a highly scalable model, reducing spend on additional hardware infrastructure.

    Read the article

  • Python what's the data structure for triple data

    - by Paul
    I've got a set of data that has three attributes, say A, B, and C, where A is kind of the index (i.e., A is used to look up the other two attributes.) What would be the best data structure for such data? I used two dictionaries, with A as the index of each. However, there's key errors when the query to the data doesn't match any instance of A.

    Read the article

  • Load balance to proxies

    - by LoveRight
    I have installed several proxy programs whose IP addresses are, for example, 127.0.0.1:8580(use http), 127.0.0.1:9050(use socks5). You may regrard them as Tor and its alternatives. You know, certain proxy programs are faster than others at times, while at other times, they would be slower. The Firefox add-in, AutoProxy and FoxyProxy Standard, can define a list of rules such as any urls matching the pattern *.google.com should be proxied to 127.0.0.1:8580 using socks5 protocol. But the rule is "static". I want *.google.com to be proxied to the fastest proxy, no matter which one. I think that is kind of load balancing. I thought I could set a rule that direct request of *.google.com to the address the load balancer listens, and the load balancer forwards the request to the fastest real proxy. I notice that tor uses socks5 protocol and some other applications use http. I feel confused that which protocol should the load balancer use. I also start to wonder about the feasibility of this solution. Any suggestions? My operating system is Windows 7 x64.

    Read the article

  • Implementing a generic repository for WCF data services

    - by cibrax
    The repository implementation I am going to discuss here is not exactly what someone would call repository in terms of DDD, but it is an abstraction layer that becomes handy at the moment of unit testing the code around this repository. In other words, you can easily create a mock to replace the real repository implementation. The WCF Data Services update for .NET 3.5 introduced a nice feature to support two way data bindings, which is very helpful for developing WPF or Silverlight based application but also for implementing the repository I am going to talk about. As part of this feature, the WCF Data Services Client library introduced a new collection DataServiceCollection<T> that implements INotifyPropertyChanged to notify the data context (DataServiceContext) about any change in the association links. This means that it is not longer necessary to manually set or remove the links in the data context when an item is added or removed from a collection. Before having this new collection, you basically used the following code to add a new item to a collection. Order order = new Order {   Name = "Foo" }; OrderItem item = new OrderItem {   Name = "bar",   UnitPrice = 10,   Qty = 1 }; var context = new OrderContext(); context.AddToOrders(order); context.AddToOrderItems(item); context.SetLink(item, "Order", order); context.SaveChanges(); Now, thanks to this new collection, everything is much simpler and similar to what you have in other ORMs like Entity Framework or L2S. Order order = new Order {   Name = "Foo" }; OrderItem item = new OrderItem {   Name = "bar",   UnitPrice = 10,   Qty = 1 }; order.Items.Add(item); var context = new OrderContext(); context.AddToOrders(order); context.SaveChanges(); In order to use this new feature, you first need to enable V2 in the data service, and then use some specific arguments in the datasvcutil tool (You can find more information about this new feature and how to use it in this post). DataSvcUtil /uri:"http://localhost:3655/MyDataService.svc/" /out:Reference.cs /dataservicecollection /version:2.0 Once you use those two arguments, the generated proxy classes will use DataServiceCollection<T> rather than a simple ObjectCollection<T>, which was the default collection in V1. There are some aspects that you need to know to use this feature correctly. 1. All the entities retrieved directly from the data context with a query track the changes and report those to the data context automatically. 2. A entity created with “new” does not track any change in the properties or associations. In order to enable change tracking in this entity, you need to do the following trick. public Order CreateOrder() {   var collection = new DataServiceCollection<Order>(this.context);   var order = new Order();   collection.Add(order);   return order; } You basically need to create a collection, and add the entity to that collection with the “Add” method to enable change tracking on that entity. 3. If you need to attach an existing entity (For example, if you created the entity with the “new” operator rather than retrieving it from the data context with a query) to a data context for tracking changes, you can use the “Load” method in the DataServiceCollection. var order = new Order {   Id = 1 }; var collection = new DataServiceCollection<Order>(this.context); collection.Load(order); In this case, the order with Id = 1 must exist on the data source exposed by the Data service. Otherwise, you will get an error because the entity did not exist. These cool extensions methods discussed by Stuart Leeks in this post to replace all the magic strings in the “Expand” operation with Expression Trees represent another feature I am going to use to implement this generic repository. Thanks to these extension methods, you could replace the following query with magic strings by a piece of code that only uses expressions. Magic strings, var customers = dataContext.Customers .Expand("Orders")         .Expand("Orders/Items") Expressions, var customers = dataContext.Customers .Expand(c => c.Orders.SubExpand(o => o.Items)) That query basically returns all the customers with their orders and order items. Ok, now that we have the automatic change tracking support and the expression support for explicitly loading entity associations, we are ready to create the repository. The interface for this repository looks like this,public interface IRepository { T Create<T>() where T : new(); void Update<T>(T entity); void Delete<T>(T entity); IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties); IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties); void Attach<T>(T entity); void SaveChanges(); } The Retrieve and RetrieveAll methods are used to execute queries against the data service context. While both methods receive an array of expressions to load associations explicitly, only the Retrieve method receives a predicate representing the “where” clause. The following code represents the final implementation of this repository.public class DataServiceRepository: IRepository { ResourceRepositoryContext context; public DataServiceRepository() : this (new DataServiceContext()) { } public DataServiceRepository(DataServiceContext context) { this.context = context; } private static string ResolveEntitySet(Type type) { var entitySetAttribute = (EntitySetAttribute)type.GetCustomAttributes(typeof(EntitySetAttribute), true).FirstOrDefault(); if (entitySetAttribute != null) return entitySetAttribute.EntitySet; return null; } public T Create<T>() where T : new() { var collection = new DataServiceCollection<T>(this.context); var entity = new T(); collection.Add(entity); return entity; } public void Update<T>(T entity) { this.context.UpdateObject(entity); } public void Delete<T>(T entity) { this.context.DeleteObject(entity); } public void Attach<T>(T entity) { var collection = new DataServiceCollection<T>(this.context); collection.Load(entity); } public IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query.Where(predicate); } public IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query; } public void SaveChanges() { this.context.SaveChanges(SaveChangesOptions.Batch); } } For instance, you can use the following code to retrieve customers with First name equal to “John”, and all their orders in a single call. repository.Retrieve<Customer>(    c => c.FirstName == “John”, //Where    c => c.Orders.SubExpand(o => o.Items)); In case, you want to have some pre-defined queries that you are going to use across several places, you can put them in an specific class. public static class CustomerQueries {   public static Expression<Func<Customer, bool>> LastNameEqualsTo(string lastName)   {     return c => c.LastName == lastName;   } } And then, use it with the repository. repository.Retrieve<Customer>(    CustomerQueries.LastNameEqualsTo("foo"),    c => c.Orders.SubExpand(o => o.Items));

    Read the article

< Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18  | Next Page >