Search Results

Search found 80052 results on 3203 pages for 'data load performance'.

Page 17/3203 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >

  • Server Benchmarking: What tools to use with my real-world test data

    - by mdemmitt
    I want to benchmark a new server using historical HTTP-request data. I have a textfile that contains one day's worth of real historical requests to a production server. What is the best tool for sending that list of requests on the server I'm testing? The tool I use should be able to configure the following: Number of threads making the requests Number of requests/second sent A list of request URLs to use when making the requests. Apache Bench seems like a close fit. However, Bench does not seem to be able to take in a list of request URLs as a parameter. What would you recommend?

    Read the article

  • replacing data.frame element-wise operations with data.table (that used rowname)

    - by Harold
    So lets say I have the following data.frames: df1 <- data.frame(y = 1:10, z = rnorm(10), row.names = letters[1:10]) df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), z = rnorm(10), row.names = letters[1:10]) And perhaps the "equivalent" data.tables: dt1 <- data.table(x = rownames(df1), df1, key = 'x') dt2 <- data.table(x = rownames(df2), df2, key = 'x') If I want to do element-wise operations between df1 and df2, they look something like dfRes <- df1 / df2 And rownames() is preserved: R> head(dfRes) y z a 0.5 3.1405463 b 1.0 1.2925200 c 1.5 1.4137930 d 2.0 -0.5532855 e 2.5 -0.0998303 f 1.2 -1.6236294 My poor understanding of data.table says the same operation should look like this: dtRes <- dt1[, !'x', with = F] / dt2[, !'x', with = F] dtRes[, x := dt1[,x,]] setkey(dtRes, x) (setkey optional) Is there a more data.table-esque way of doing this? As a slightly related aside, more generally, I would have other columns such as factors in each data.table and I would like to omit those columns while doing the element-wise operations, but still have them in the result. Does this make sense? Thanks!

    Read the article

  • a load balancing scenario using HAProxy and keepalived shows no performance advantage

    - by chakoshi
    Hi, I am trying to setup a load balanced web server scenario, using two HAproxy load balancers and two debian web servers following this guide http://www.howtoforge.com/setting-up-a-high-availability-load-balancer-with-haproxy-keepalived-on-debian-lenny. the setup is working but the results of simple performance benchmarking is not what I expected. I tried apache benchmark tool to send lots of requests to servers (one time directly testing one of the web servers and the other time testing through the load balancer) using the command "ab -n 1000000 -c 500 http://IP/index.html", but the test results shows better performance for the single server without load balancer. can any one tell me if I'm going wrong on some thing?

    Read the article

  • PHP - post data ends when '&' is in data.

    - by Phil Jackson
    Hi all, im posting data using jquery/ajax and PHP at the backend. Problem being, when I input something like 'Jack & Jill went up the hill' im only recieving 'Jack' when it gets to the backend. I have thrown an error at the frontend before that data is sent which alerts 'Jack & Jill went up the hill'. When I put die(print_r($_POST)); at the very top of my index page im only getting [key] => Jack how can I be loosing the data? I thought It may have been my filter; <?php function filter( $data ) { $data = trim( htmlentities( strip_tags( mb_convert_encoding( $data, 'HTML-ENTITIES', "UTF-8") ) ) ); if ( get_magic_quotes_gpc() ) { $data = stripslashes( $data ); } //$data = mysql_real_escape_string( $data ); return $data; } echo "<xmp>" . filter("you & me") . "</xmp>"; ?> but that returns fine in the test above you &amp; me which is in place after I added die(print_r($_POST));. Can anyone think of how and why this is happening? Any help much appreciated. Regards, Phil.

    Read the article

  • Oracle TimesTen In-Memory Database Performance on SPARC T4-2

    - by Brian
    The Oracle TimesTen In-Memory Database is optimized to run on Oracle's SPARC T4 processor platforms running Oracle Solaris 11 providing unsurpassed scalability, performance, upgradability, protection of investment and return on investment. The following demonstrate the value of combining Oracle TimesTen In-Memory Database with SPARC T4 servers and Oracle Solaris 11: On a Mobile Call Processing test, the 2-socket SPARC T4-2 server outperforms: Oracle's SPARC Enterprise M4000 server (4 x 2.66 GHz SPARC64 VII+) by 34%. Oracle's SPARC T3-4 (4 x 1.65 GHz SPARC T3) by 2.7x, or 5.4x per processor. Utilizing the TimesTen Performance Throughput Benchmark (TPTBM), the SPARC T4-2 server protects investments with: 2.1x the overall performance of a 4-socket SPARC Enterprise M4000 server in read-only mode and 1.5x the performance in update-only testing. This is 4.2x more performance per processor than the SPARC64 VII+ 2.66 GHz based system. 10x more performance per processor than the SPARC T2+ 1.4 GHz server. 1.6x better performance per processor than the SPARC T3 1.65 GHz based server. In replication testing, the two socket SPARC T4-2 server is over 3x faster than the performance of a four socket SPARC Enterprise T5440 server in both asynchronous replication environment and the highly available 2-Safe replication. This testing emphasizes parallel replication between systems. Performance Landscape Mobile Call Processing Test Performance System Processor Sockets/Cores/Threads Tps SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 218,400 M4000 SPARC64 VII+, 2.66 GHz 4 16 32 162,900 SPARC T3-4 SPARC T3, 1.65 GHz 4 64 512 80,400 TimesTen Performance Throughput Benchmark (TPTBM) Read-Only System Processor Sockets/Cores/Threads Tps SPARC T3-4 SPARC T3, 1.65 GHz 4 64 512 7.9M SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 6.5M M4000 SPARC64 VII+, 2.66 GHz 4 16 32 3.1M T5440 SPARC T2+, 1.4 GHz 4 32 256 3.1M TimesTen Performance Throughput Benchmark (TPTBM) Update-Only System Processor Sockets/Cores/Threads Tps SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 547,800 M4000 SPARC64 VII+, 2.66 GHz 4 16 32 363,800 SPARC T3-4 SPARC T3, 1.65 GHz 4 64 512 240,500 TimesTen Replication Tests System Processor Sockets/Cores/Threads Asynchronous 2-Safe SPARC T4-2 SPARC T4, 2.85 GHz 2 16 128 38,024 13,701 SPARC T5440 SPARC T2+, 1.4 GHz 4 32 256 11,621 4,615 Configuration Summary Hardware Configurations: SPARC T4-2 server 2 x SPARC T4 processors, 2.85 GHz 256 GB memory 1 x 8 Gbs FC Qlogic HBA 1 x 6 Gbs SAS HBA 4 x 300 GB internal disks Sun Storage F5100 Flash Array (40 x 24 GB flash modules) 1 x Sun Fire X4275 server configured as COMSTAR head SPARC T3-4 server 4 x SPARC T3 processors, 1.6 GHz 512 GB memory 1 x 8 Gbs FC Qlogic HBA 8 x 146 GB internal disks 1 x Sun Fire X4275 server configured as COMSTAR head SPARC Enterprise M4000 server 4 x SPARC64 VII+ processors, 2.66 GHz 128 GB memory 1 x 8 Gbs FC Qlogic HBA 1 x 6 Gbs SAS HBA 2 x 146 GB internal disks Sun Storage F5100 Flash Array (40 x 24 GB flash modules) 1 x Sun Fire X4275 server configured as COMSTAR head Software Configuration: Oracle Solaris 11 11/11 Oracle TimesTen 11.2.2.4 Benchmark Descriptions TimesTen Performance Throughput BenchMark (TPTBM) is shipped with TimesTen and measures the total throughput of the system. The workload can test read-only, update-only, delete and insert operations as required. Mobile Call Processing is a customer-based workload for processing calls made by mobile phone subscribers. The workload has a mixture of read-only, update, and insert-only transactions. The peak throughput performance is measured from multiple concurrent processes executing the transactions until a peak performance is reached via saturation of the available resources. Parallel Replication tests using both asynchronous and 2-Safe replication methods. For asynchronous replication, transactions are processed in batches to maximize the throughput capabilities of the replication server and network. In 2-Safe replication, also known as no data-loss or high availability, transactions are replicated between servers immediately emphasizing low latency. For both environments, performance is measured in the number of parallel replication servers and the maximum transactions-per-second for all concurrent processes. See Also SPARC T4-2 Server oracle.com OTN Oracle TimesTen In-Memory Database oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 1 October 2012.

    Read the article

  • Strange performance issue with Dell R7610 and LSI 2208 RAID controller

    - by GregC
    Connecting controller to any of the three PCIe x16 slots yield choppy read performance around 750 MB/sec Lowly PCIe x4 slot yields steady 1.2 GB/sec read Given same files, same Windows Server 2008 R2 OS, same RAID6 24-disk Seagate ES.2 3TB array on LSI 9286-8e, same Dell R7610 Precision Workstation with A03 BIOS, same W5000 graphics card (no other cards), same settings etc. I see super-low CPU utilization in both cases. SiSoft Sandra reports x8 at 5GT/sec in x16 slot, and x4 at 5GT/sec in x4 slot, as expected. I'd like to be able to rely on the sheer speed of x16 slots. What gives? What can I try? Any ideas? Please assist Cross-posted from http://en.community.dell.com/support-forums/desktop/f/3514/t/19526990.aspx Follow-up information We did some more performance testing with reading from 8 SSDs, connected directly (without an expander chip). This means that both SAS cables were utilized. We saw nearly double performance, but it varied from run to run: {2.0, 1.8, 1.6, and 1.4 GB/sec were observed, then performance jumped back up to 2.0}. The SSD RAID0 tests were conducted in a x16 PCIe slot, all other variables kept the same. It seems to me that we were getting double the performance of HDD-based RAID6 array. Just for reference: maximum possible read burst speed over single channel of SAS 6Gb/sec is 570 MB/sec due to 8b/10b encoding and protocol limitations (SAS cable provides four such channels).

    Read the article

  • Looking for application performance tracking software

    - by JavaRocky
    I have multiple java-based applications which produce statistics on how long method calls take. Right now the information is being written into a log file and I analyse performance that way. However with multiple apps and more monitoring requirements this is being becoming a bit overwhelming. I am looking for an application which will collect stats and graph them so I can analyse performance and be aware of performance degradation. I have looked at Solarwinds Application Performance Monitoring, however this polls periodically to gather information. My applications are totally event based and we would like to graph and track this accordingly. I almost started hacking together some scripts to produce Google Charts but surely there are applications which do this already. Suggestions?

    Read the article

  • Hyper-V performance comparisons vs physical client?

    - by rwmnau
    Are there any comparisons between Hyper-V client machines and their physical equivalent? I've looked around and can find 4000 articles about improving Hyper-V performance, but I can't find any that actually do a side-by-side comparison or give benchmarking numbers. Ideally, I'm interested in a comparison of CPU, memory, disk, and graphics performance between something like the following: Some powerful workstation (with plenty of RAM) with Windows 7 installed on it directly Same exact worksation with Hyper-V Server 2008 R2 (the bare Server role) and a full-screen Windows 7 client machine Virtual Server 2005 had performance that didn't compare at all with actual hardware, but with the advances in CPU and hardware-level virtualization, has performance improved significantly? How obvious would it be to a user of the two above scenarios that one of them was virtualized, and does anybody know of actual benchmarking of this type?

    Read the article

  • LTO 2 tape performance in LTO 3 drive

    - by hmallett
    I have a pile of LTO 2 tapes, and both an LTO 2 drive (HP Ultrium 460e), and an autoloader with an LTO 3 drive in (Tandberg T24 autoloader, with a HP drive). Performance of the LTO 2 tapes in the LTO 2 drive is adequate and consistent. HP L&TT tells me that the tapes can be read and written at 64 MB/s, which seems in line with the performance specifications of the drive. When I perform a backup (over the network) using Symantec Backup Exec, I get about 1700 MB/min backup and verify speeds, which is slower, but still adequate. Performance of the LTO 2 tapes in the LTO 3 drive in the autoloader is a different story. HP L&TT tells me that the tapes can be read at 82 MB/s and written at 49 MB/s, which seems unusual at the write speed drop, but not the end of the world. When I perform a backup (over the network) using Symantec Backup Exec though, I get about 331 MB/min backup speed and 205 MB/min verify speeds, which is not only much slower, but also much slower for reads than for writes. Notes: The comparison testing was done on the same server, SCSI card and SCSI cable, with the same backup data set and the same tape each time. The tape and drives are error-free (according to HP L&TT and Backup Exec). The SCSI card is a U160 card, which is not normally recommended for LTO 3, but we're not writing to LTO 3 tapes at LTO 3 speeds, and a U320 SCSI card is not available to me at the moment. As I'm scratching my head to determine the reason for the performance drop, my first question is: While LTO drives can write to the previous generation LTO tapes, does doing so normally incur a performance penalty?

    Read the article

  • Raid-5 Performance per spindle scaling

    - by Bill N.
    So I am stuck in a corner, I have a storage project that is limited to 24 spindles, and requires heavy random Write (the corresponding read side is purely sequential). Needs every bit of space on my Drives, ~13TB total in a n-1 raid-5, and has to go fast, over 2GB/s sort of fast. The obvious answer is to use a Stripe/Concat (Raid-0/1), or better yet a raid-10 in place of the raid-5, but that is disallowed for reasons beyond my control. So I am here asking for help in getting a sub optimal configuration to be as good as it can be. The array built on direct attached SAS-2 10K rpm drives, backed by a ARECA 18xx series controller with 4GB of cache. 64k array stripes and an 4K stripe aligned XFS File system, with 24 Allocation groups (to avoid some of the penalty for being raid 5). The heart of my question is this: In the same setup with 6 spindles/AG's I see a near disk limited performance on the write, ~100MB/s per spindle, at 12 spindles I see that drop to ~80MB/s and at 24 ~60MB/s. I would expect that with a distributed parity and matched AG's, the performance should scale with the # of spindles, or be worse at small spindle counts, but this array is doing the opposite. What am I missing ? Should Raid-5 performance scale with # of spindles ? Many thanks for your answers and any ideas, input, or guidance. --Bill Edit: Improving RAID performance The other relevant thread I was able to find, discusses some of the same issues in the answers, though it still leaves me with out an answer on the performance scaling.

    Read the article

  • Good C++ books regarding Performance?

    - by Leon
    Besides the books everyone knows about, like Meyer's 3 Effective C++/STL books, are there any other really good C++ books specifically aimed towards performance code? Maybe this is for gaming, telecommunications, finance/high frequency etc? When I say performance I mean things where a normal C++ book wouldnt bother advising because the gain in performance isn't worthwhile for 95% of C++ developers. Maybe suggestions like avoiding virtual pointers, going into great depth about inlining etc? A book going into great depth on C++ memory allocation or multithreading performance would obviously be very useful.

    Read the article

  • Best tool for monitoring backups, etc. and trending statstics from that data

    - by Randy Syring
    I have done some research on nagios, opennms, and zenoss but am not confident that I have found what I am looking for. The main driving force for me right now is being able to monitor backups. This includes mysql, mssql, and eventually some file system backups. We have a tool that wraps the backup process for these different systems and collects statistics. So, items like: number of databases backed up size of db backup file size of db backup file compressed time to make backup time to zip file I want to be able to A) have notifications if the jobs are not run according to schedule B) be able to set thresholds on the statistics which would trigger notifications C) I want to be able to trend and graph the statistics I am planning on sending this information to the monitoring application through an HTTP POST. Or, the monitoring application could pull it from a log file as well. However, we will have other processes with other "arbitrary" (from the monitoring system's perspective) statics that will want to monitor and trend, so flexibility is very important. The tool or tools should also be able to do general monitoring and trending of network interfaces, server load, etc. Once we get the backup monitoring in place, we will want to include those items as well. Thanks.

    Read the article

  • SQL Server replication and load balance

    - by Ahmed Galal
    I'm running a web service that serves a mobile app on IIS 8 and SQL Server 2014, my service has a massive load and i'm trying to improve performance, most of the load is happening on SQL. i don't think i have a bottleneck, my processor and ram is up to the max and i think my code is not that bad, am already using memcached and other stuff to avoid hitting SQL too much. i know i can always upgrade the server hardware but i already have a spare server that i would like to use, so i was thinking to split the SQL load on the 2 servers. What i was thinking of is to setup replication on the other server and do some load balancing, but am not sure how to do the load balance. I know i can adjust my code to hit the other server for some queries but i was hoping to find a solution that avoid changing my code. So my question is, What are the ways of doing load balancing between 2 SQL servers ? I would appreciate suggestions or best practices or some directions. Thanks.

    Read the article

  • Load balancing with puppet

    - by Gonçalo Queirós
    Hi there. Im trying to setup a loadbalancing system. My load balancer (nginx) has a conf file where i should list all IP's of the upstream servers. I could put the IP's on the conf manually, but this ways i would need to change the conf file every time i add/remove an upstream server. For now i came up with two different ideas, but i don't like much of neither: 1 - Have every upstream machine to use Exported Resources to create a file with it's IP..Then the load balancer server will have an "include conf_directory/*", and load all the files created by the upstrem servers. Since the load balancer is using nginx this can be done, but if i wan't latter on to configure something that doesn't have the "include" on the conf files, this solution will not work. 2 - If the config doesn't support the "include" command, then we could have again, every upstream server use the Exported Resources to create a filw with its IP, and latter on, the load balancer execute a command that would pick every file and generate the config Both versions addopt the same techinque, the difference is that version 2 is used when the server (that needs to have a conf generated) doesn't recognize a command like "include" inside its own conf. Now, my question is, is there any way to do this in a different form? I suspect that there is, since puppet is made to manage multiple servers, it seems a bit strange not have a easy way to configure load balancers.

    Read the article

  • nginx redirect what is not coming from load balancing

    - by dawez
    I have nginx on SERVER1 that is acting as load balancing between SERVER1 and SERVER2 in SERVER1 I have the upstreams for the load balancing defined as : upstream de.server.com { # similar upstreams defined also for other languages # SELF SERVER1 server 127.0.0.1:8082 weight=3 max_fails=3 fail_timeout=2; # other SERVER2 server otherserverip:8082 max_fails=3 fail_timeout=2; } The load balancing config on SERVER1 is this one: server { listen 80; server_name ~^(?<LANG>de|es|fr)\.server\.com; location / { proxy_pass http://$LANG.server.com; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; # trying to pass a variable in the header to SERVER2 proxy_set_header Is-From-Load-Balancer 1; } } Then in server 2 I have: server { listen 8082; server_name localhost; root /var/www/server.com/public; # test output values add_header testloadbalancer $http_is_from_load_balancer; add_header testloadbalancer2 not_load_bal; ## other stuff here to process the request } I can see the "testloadbalancer" in the response header is set to 1 when the request is coming from the load balancing, it is not present when from a direct access: SERVER2:8082 . I would like to bounce back to the SERVER1 all the direct requests that are sent to SERVER2, but keep the ones from the load balancing. So this should forbid direct access to SERVER2:8082 and redirect to SERVER1:80 .

    Read the article

  • SQL Server – SafePeak “Logon Trigger” Feature for Managing Data Access

    - by pinaldave
    Lately I received an interesting question about the abilities of SafePeak for SQL Server acceleration software: Q: “I would like to use SafePeak to make my CRM application faster. It is an application we bought from some vendor, after a while it became slow and we can’t reprogram it. SafePeak automated caching sounds like an easy and good solution for us. But, in my application there are many servers and different other applications services that address its main database, and some even change data, and I feel that there is a chance that some servers that during the connection process we may miss some. Is there a way to ensure that SafePeak will be aware of all connections to the SQL Server, so its cache will remain intact?” Interesting question, as I remember that SafePeak (http://www.safepeak.com/Product/SafePeak-Overview) likes that all traffic to the database will go thru it. I decided to check out the features of SafePeak latest version (2.1) and seek for an answer there. A: Indeed I found SafePeak has a feature they call “Logon Trigger” and is designed for that purpose. It is located in the user interface, under: Settings -> SQL instances management  ->  [your instance]  ->  [Logon Trigger] tab. From here you activate / deactivate it and control a white-list of enabled server IPs and Login names that SafePeak will ignore them. Click to Enlarge After activation of the “logon trigger” Safepeak server is notified by the SQL Server itself on each new opened connection. Safepeak monitors those connections and decides if there is something to do with them or not. On a typical installation SafePeak likes all application and users connections to go via SafePeak – this way it knows about data and schema updates immediately (real time). With activation of the safepeak “logon trigger”  a special CLR trigger is deployed on the SQL server and notifies Safepeak on any connection that has not arrived via SafePeak. In such cases Safepeak can act to clear and lock the cache or to ignore it. This feature enables to make sure SafePeak will be aware of all connections so SafePeak cache will maintain exactly correct all times. So even if a user, like a DBA will connect to the SQL Server not via SafePeak, SafePeak will know about it and take actions. The notification does not impact the work of that connection, the user or application still continue to do whatever they planned to do. Note: I found that activation of logon trigger in SafePeak requires that SafePeak SQL login will have the next permissions: 1) CONTROL SERVER; 2) VIEW SERVER STATE; 3) And the SQL Server instance is CLR enabled; Seeing SafePeak in action, I can say SafePeak brings fantastic resource for those who seek to get performance for SQL Server critical apps. SafePeak promises to accelerate SQL Server applications in just several hours of installation, automatic learning and some optimization configuration (no code changes!!!). If better application and database performance means better business to you – I suggest you to download and try SafePeak. The solution of SafePeak is indeed unique, and the questions I receive are very interesting. Have any more questions on SafePeak? Please leave your question as a comment and I will try to get an answer for you. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL Developer Debugging, Watches, Smart Data, & Data

    - by thatjeffsmith
    After presenting the SQL Developer PL/SQL debugger for about an hour yesterday at KScope12 in San Antonio, my boss came up and asked, “Now, would you really want to know what the Smart Data panel does?” Apparently I had ‘made up’ my own story about what that panel’s intent is based on my experience with it. Not good Jeff, not good. It was a very small point of my presentation, but I probably should have read the docs. The Smart Data tab displays information about variables, using your Debugger: Smart Data preferences. You can also specify these preferences by right-clicking in the Smart Data window and selecting Preferences. Debugger Smart Data Preferences, control number of variables to display The Smart Data panel auto-inspects the last X accessed variables. So if you have a program with 26 variables, instead of showing you all 26, it will just show you the last two variables that were referenced in your program. If you were to click on the ‘Data’ debug panel, you’ll see EVERYTHING. And if you only want to see a very specific set of values, then you should use Watches. The Smart Data Panel As I step through the code, the variables being tracked change as they are referenced. Only the most recent ones display. This is controlled by the ‘Maximum Locations to Remember’ preference. Step through the code, see the latest variables accessed The Data Panel All variables are displayed. Might be information overload on large PL/SQL programs where you have many dozens or even hundreds of variables to track. Shows everything all the time Watches Watches are added manually and only show what you ask for. Data on Demand – add a watch to track a specific variable Remember, you can interact with your data If you want to do more than just watch, you can mouse-right on a data element, and change the value of the variable as the program is running. This is one of the primary benefits to debugging over using DBMS_OUTPUT to track what’s happening in your program. Change the values while the program is running to test your ‘What if?’ scenarios

    Read the article

  • SQL – Biggest Concerns in a Data-Driven World

    - by Pinal Dave
    The ongoing chaos over Government Agency’s snooping has ignited a heated debate on privacy of personal data and its use by government and/or other institutions. It has created a feeling of disapproval and distrust among users. This incident proves to be a lesson for companies that are looking to leverage their business using a data driven approach. According to analysts, the goal of gathering personal information should be to deliver benefits to both the parties – the user as well as the data collector(government or business). Using data the right way is crucial, and companies need to deploy the right software applications and systems to ensure that their efforts are well-directed. However, there are various issues plaguing analysts regarding available software, which are highlighted below. According to a InformationWeek 2013 Survey of Analytics, Business Intelligence and Information Management where 541 business technology professionals contributed as respondents, it was discovered that the biggest concern was deemed to be the scarcity of expertise and high costs associated with the same. This concern was voiced by as many as 38% of the participants. A close second came out to be the issue of data warehouse appliance platforms being expensive, with 33% of those present believing it to be a huge roadblock. Another revelation made in this respect was that 31% professionals weren’t even sure how Data Analytics can create business opportunities for them. Another 17% shared that they found data platform technologies such as Hadoop and NoSQL technologies hard to learn. These results clearly pointed out that there are awareness and expertise issues that also need much attention. Unless the demand-supply gap of Business Intelligence professionals well versed in data analysis technologies is met, this divide is going to affect how companies make the most of their BI campaigns. One of the key action points that can be taken to salvage the situation, is to provide training on Data Analytics concepts. Koenig Solutions offer courses on many such technologies including a course on MCSE SQL Server 2012: BI Platform. So it’s time to brush up your skills and get down to work in a data driven world that awaits you ahead. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Store XML data in Core Data

    - by ct2k7
    Hi, is there any easy way of store XML data into core data? Currently, my app just pulls the values from the XML file directly, however, this isn't efficient for XML files which holds over 100 entries, thus storing the data in Core Data would be the best option. XML file is called/downloaded/parsed ever time the app opens. With the Core Data, the XML data would be downloaded ever 3600 seconds or so, and refresh the current data in the core data, to reduce the loading time when opening the app. Any ideas on how I can do this? Having reviewed the developer documentation, it doesn't look very tasty.

    Read the article

  • Where can I find free and open data?

    - by kitsune
    Sooner or later, coders will feel the need to have access to "open data" in one of their projects, from knowing a city's zip to a more obscure information such as the axial tilt of Pluto. I know data.un.org which offers access to the UN's extensive array of databases that deal with human development and other socio-economic issues. The other usual suspects are NASA and the USGS for planetary data. There's an article at readwriteweb with more links. infochimps.org seems to stand out. Personally, I need to find historic commodity prices, stock values and other financial data. All these data sets seem to cost money however. Clarification To clarify, I'm interested in all kinds of open data, because sooner or later, I know I will be in a situation where I could need it. I will try to edit this answer and include the suggestions in a structured manners. A link for financial data was hidden in that readwriteweb article, doh! It's called opentick.com. Looks good so far! Update I stumbled over semantic data in another question of mine on here. There is opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). A project called UMBEL provides a light-weight, distilled version of opencyc. Umbel has semantic data in rdf/owl/skos n3 syntax. The Worldbank also released a very nice API. It offers data from the last 50 years for about 200 countries

    Read the article

  • Temporary storage for keeping data between program iterations?

    - by mr.b
    I am working on an application that works like this: It fetches data from many sources, resulting in pool of about 500,000-1,500,000 records (depends on time/day) Data is parsed Part of data is processed in a way to compare it to pre-existing data (read from database), calculations are made, and stored in database. Resulting dataset that has to be stored in database is, however, much smaller in size (compared to original data set), and ranges from 5,000-50,000 records. This process almost always updates existing data, perhaps adds few more records. Then, data from step 2 should be kept somehow, somewhere, so that next time data is fetched, there is a data set which can be used to perform calculations, without touching pre-existing data in database. I should point out that this data can be lost, it's not irreplaceable (key information can be read from database if needed), but it would speed up the process next time. Application components can (and will be) run off different computers (in the same network), so storage has to be reachable from multiple hosts. I have considered using memcached, but I'm not quite sure should I do so, because one record is usually no smaller than 200 bytes, and if I have 1,500,000 records, I guess that it would amount to over 300 MB of memcached cache... But that doesn't seem scalable to me - what if data was 5x that amount? If it were to consume 1-2 GB of cache only to keep data in between iterations (which could easily happen)? So, the question is: which temporary storage mechanism would be most suitable for this kind of processing? I haven't considered using mysql temporary tables, as I'm not sure if they can persist between sessions, and be used by other hosts in network... Any other suggestion? Something I should consider?

    Read the article

  • Why are data structures so important in interviews?

    - by Vamsi Emani
    I am a newbie into the corporate world recently graduated in computers. I am a java/groovy developer. I am a quick learner and I can learn new frameworks, APIs or even programming languages within considerably short amount of time. Albeit that, I must confess that I was not so strong in data structures when I graduated out of college. Through out the campus placements during my graduation, I've witnessed that most of the biggie tech companies like Amazon, Microsoft etc focused mainly on data structures. It appears as if data structures is the only thing that they expect from a graduate. Adding to this, I see that there is this general perspective that a good programmer is necessarily a one with good knowledge about data structures. To be honest, I felt bad about that. I write good code. I follow standard design patterns of coding, I do use data structures but at the superficial level as in java exposed APIs like ArrayLists, LinkedLists etc. But the companies usually focused on the intricate aspects of Data Structures like pointer based memory manipulation and time complexities. Probably because of my java-ish background, Back then, I understood code efficiency and logic only when talked in terms of Object Oriented Programming like Objects, instances, etc but I never drilled down into the level of bits and bytes. I did not want people to look down upon me for this knowledge deficit of mine in Data Structures. So really why all this emphasis on Data Structures? Does, Not having knowledge in Data Structures really effect one's career in programming? Or is the knowledge in this subject really a sufficient basis to differentiate a good and a bad programmer?

    Read the article

  • Data structure for pattern matching.

    - by alvonellos
    Let's say you have an input file with many entries like these: date, ticker, open, high, low, close, <and some other values> And you want to execute a pattern matching routine on the entries(rows) in that file, using a candlestick pattern, for example. (See, Doji) And that pattern can appear on any uniform time interval (let t = 1s, 5s, 10s, 1d, 7d, 2w, 2y, and so on...). Say a pattern matching routine can take an arbitrary number of rows to perform an analysis and contain an arbitrary number of subpatterns. In other words, some patterns may require 4 entries to operate on. Say also that the routine (may) later have to find and classify extrema (local and global maxima and minima as well as inflection points) for the ticker over a closed interval, for example, you could say that a cubic function (x^3) has the extrema on the interval [-1, 1]. (See link) What would be the most natural choice in terms of a data structure? What about an interface that conforms a Ticker object containing one row of data to a collection of Ticker so that an arbitrary pattern can be applied to the data. What's the first thing that comes to mind? I chose a doubly-linked circular linked list that has the following methods: push_front() push_back() pop_front() pop_back() [] //overloaded, can be used with negative parameters But that data structure seems very clumsy, since so much pushing and popping is going on, I have to make a deep copy of the data structure before running an analysis on it. So, I don't know if I made my question very clear -- but the main points are: What kind of data structures should be considered when analyzing sequential data points to conform to a pattern that does NOT require random access? What kind of data structures should be considered when classifying extrema of a set of data points?

    Read the article

  • Five Key Strategies in Master Data Management

    - by david.butler(at)oracle.com
    Here is a very interesting Profit Magazine article on MDM: A recent customer survey reveals the deleterious effects of data fragmentation. by Trevor Naidoo, December 2010   Across industries and geographies, IT organizations have grown in complexity, whether due to mergers and acquisitions, or decentralized systems supporting functional or departmental requirements. With systems architected over time to support unique, one-off process needs, they are becoming costly to maintain, and the Internet has only further added to the complexity. Data fragmentation has become a key inhibitor in delivering flexible, user-friendly systems. The Oracle Insight team conducted a survey assessing customers' master data management (MDM) capabilities over the past two years to get a sense of where they are in terms of their capabilities. The responses, by 27 respondents from six different industries, reveal five key areas in which customers need to improve their data management in order to get better financial results. 1. Less than 15 percent of organizations surveyed understand the sources and quality of their master data, and have a roadmap to address missing data domains. Examples of the types of master data domains referred to are customer, supplier, product, financial and site. Many organizations have multiple sources of master data with varying degrees of data quality in each source -- customer data stored in the customer relationship management system is inconsistent with customer data stored in the order management system. Imagine not knowing how many places you stored your customer information, and whether a customer's address was the most up to date in each source. In fact, more than 55 percent of the respondents in the survey manage their data quality on an ad-hoc basis. It is important for organizations to document their inventory of data sources and then profile these data sources to ensure that there is a consistent definition of key data entities throughout the organization. Some questions to ask are: How do we define a customer? What is a product? How do we define a site? The goal is to strive for one common repository for master data that acts as a cross reference for all other sources and ensures consistent, high-quality master data throughout the organization. 2. Only 18 percent of respondents have an enterprise data management strategy to ensure that data is treated as an asset to the organization. Most respondents handle data at the department or functional level and do not have an enterprise view of their master data. The sales department may track all their interactions with customers as they move through the sales cycle, the service department is tracking their interactions with the same customers independently, and the finance department also has a different perspective on the same customer. The salesperson may not be aware that the customer she is trying to sell to is experiencing issues with existing products purchased, or that the customer is behind on previous invoices. The lack of a data strategy makes it difficult for business users to turn data into information via reports. Without the key building blocks in place, it is difficult to create key linkages between customer, product, site, supplier and financial data. These linkages make it possible to understand patterns. A well-defined data management strategy is aligned to the business strategy and helps create the governance needed to ensure that data stewardship is in place and data integrity is intact. 3. Almost 60 percent of respondents have no strategy to integrate data across operational applications. Many respondents have several disparate sources of data with no strategy to keep them in sync with each other. Even though there is no clear strategy to integrate the data (see #2 above), the data needs to be synced and cross-referenced to keep the business processes running. About 55 percent of respondents said they perform this integration on an ad hoc basis, and in many cases, it is done manually with the help of Microsoft Excel spreadsheets. For example, a salesperson needs a report on global sales for a specific product, but the product has different product numbers in different countries. Typically, an analyst will pull all the data into Excel, manually create a cross reference for that product, and then aggregate the sales. The exact same procedure has to be followed if the same report is needed the following month. A well-defined consolidation strategy will ensure that a central cross-reference is maintained with updates in any one application being propagated to all the other systems, so that data is synchronized and up to date. This can be done in real time or in batch mode using integration technology. 4. Approximately 50 percent of respondents spend manual efforts cleansing and normalizing data. Information stored in various systems usually follows different standards and formats, making it difficult to match the data. A customer's address can be stored in different ways using a variety of abbreviations -- for example, "av" or "ave" for avenue. Similarly, a product's attributes can be stored in a number of different ways; for example, a size attribute can be stored in inches and can also be entered as "'' ". These types of variations make it difficult to match up data from different sources. Today, most customers rely on manual, heroic efforts to match, cleanse, and de-duplicate data -- clearly not a scalable, sustainable model. To solve this challenge, organizations need the ability to standardize data for customers, products, sites, suppliers and financial accounts; however, less than 10 percent of respondents have technology in place to automatically resolve duplicates. It is no wonder, therefore, that we get communications about products we don't own, at addresses we don't reside, and using channels (like direct mail) we don't like. An all-too-common example of a potential challenge follows: Customers end up receiving duplicate communications, which not only impacts customer satisfaction, but also incurs additional mailing costs. Cleansing, normalizing, and standardizing data will help address most of these issues. 5. Only 10 percent of respondents have the ability to share data that was mastered in a master data hub. Close to 60 percent of respondents have efforts in place that profile, standardize and cleanse data manually, and the output of these efforts are stored in spreadsheets in various parts of the organization. This valuable information is not easily shared with the rest of the organization and, more importantly, this enriched information cannot be sent back to the source systems so that the data is fixed at the source. A key benefit of a master data management strategy is not only to clean the data, but to also share the data back to the source systems as well as other systems that need the information. Aside from the source systems, another key beneficiary of this data is the business intelligence system. Having clean master data as input to business intelligence systems provides more accurate and enhanced reporting.  Characteristics of Stellar MDM When deciding on the right master data management technology, organizations should look for solutions that have four main characteristics: enterprise-grade MDM performance complete technology that can be rapidly deployed and addresses multiple business issues end-to-end MDM process management with data quality monitoring and assurance pre-built MDM business relevant applications with data stores and workflows These master data management capabilities will aid in moving closer to a best-practice maturity level, delivering tremendous efficiencies and savings as well as revenue growth opportunities as a result of better understanding your customers.  Trevor Naidoo is a senior director in Industry Strategy and Insight at Oracle. 

    Read the article

< Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >