Search Results

Search found 60062 results on 2403 pages for 'data science'.


Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing

- by Compudicted

Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/06/01/building-a-data-mart-with-pentaho-data-integration-video-review.aspx

The Building a Data Mart with Pentaho Data Integration video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration; it also implements and uses the principles of Data Warehousing (I even heard the name of Ralph Kimball in the video). A viewer should be familiar with concepts such as the Star Schema and Slowly Changing Dimension types, so before watching the course I suggest skimming through the Data Warehouse concepts (if unfamiliar) or, even better, reading Ralph Kimball’s excellent The Data Warehouse Toolkit. By the way, the author goes beyond Pentaho to MySQL and MonetDB, which is real icing on the cake! Indeed, I would even suggest the course be named ‘Building a Data Warehouse with Pentaho’.

To successfully complete the course one needs to know some Linux (Ubuntu is used in the course), the vi editor and the Bash command shell, though it seems that similar requirements would also apply on Windows. Additionally, knowing some basic SQL would not hurt. As I said, MonetDB is used in this course several times; it seems to be no more complex than, say, MySQL, but based on what I read it is very well suited to fast querying of big volumes of data thanks to its column store (vertical data storage). I don’t see what else could be a barrier; the material is very digestible.

On this note, I must add that the author does not cover how to acquire the software, so here is what I found that may help:
- Pentaho: the free Community Edition should be more than enough to learn it, or even to build a POC.
- MonetDB can be downloaded (for both Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left).
- The author seems to be using Eclipse to run SQL code; one can get it from http://goo.gl/5CcuN.
- To create or edit database entities and/or schema, one can otherwise use a universal tool called SQuirreL; get it from http://squirrel-sql.sourceforge.net.

Next, I must confess Diethard is very knowledgeable in what he does and beyond. The presenter does have a noticeable accent, especially to a viewer whose mother tongue is English, but I got over it in a few chapters. I liked the rate at which the material is presented; it makes me feel I got value for every second.

In the end, my impressions are:
- Pentaho is an awesome ETL offering and is very much worth learning (I am an ETL fan and a heavy user of SSIS).
- MonetDB is nice; it tickles my fancy to get to know it better.
- Data Warehousing with the traditional tools still rocks, despite all the Big Data tool offerings (Hive, Sqoop, Pig on Hadoop).
- Chapters 2 to 6 were the most fun for me, with chapter 8 being the most difficult.

In closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quickly; likewise, the course is very well suited to any developer on a “supposed to be done yesterday” type of project. It is for a beginner to intermediate level ETL/DW developer. One would still need to learn more about Data Warehousing and Pentaho; for that I recommend the 5-star Pentaho Data Integration 4 Cookbook. Enjoy it!

Disclaimer: I received this video from the publisher for the purpose of a public review.

Read the article
Internal Mutation of Persistent Data Structures

- by Greg Ros

To clarify, when I use the terms persistent and immutable for a data structure, I mean that:
- The state of the data structure remains unchanged for its lifetime. It always holds the same data, and the same operations always produce the same results.
- The data structure allows Add, Remove, and similar methods that return new objects of its kind, modified as instructed, that may or may not share some of the data of the original object.

However, while a data structure may appear persistent to the user, it may do other things under the hood. To be sure, all data structures are, internally, at least somewhere, based on mutable storage. If I were to base a persistent vector on an array, and copy it whenever Add is invoked, it would still be persistent, as long as I modify only locally created arrays.

However, sometimes you can greatly increase performance by mutating a data structure under the hood, in more, say, insidious, dangerous, and destructive ways. Ways that leave the abstraction untouched, not letting the user know anything has changed about the data structure, but that are critical at the implementation level.

For example, let's say that we have a class called ArrayVector implemented using an array. Whenever you invoke Add, you get an ArrayVector built on top of a newly allocated array that has an additional item. A sequence of n such updates will involve n array copies and allocations.

However, let's say we implement a lazy mechanism that stores all sorts of updates -- such as Add, Set, and others -- in a queue. In this case, each update requires constant time (adding an item to a queue), and no array allocation is involved. When a user tries to get an item in the array, all the queued modifications are applied under the hood, requiring a single array allocation and copy (since we know exactly what data the final array will hold, and how big it will be). Future get operations will be performed on an empty queue, so they will take a single operation.

But in order to implement this, we need to 'switch' or mutate the internal array to the new one, and empty the queue -- a very dangerous action. However, considering that in many circumstances (most updates are going to occur in sequence, after all) this can save a lot of time and memory, it might be worth it -- you will need to ensure exclusive access to the internal state, of course.

This isn't a question about the efficacy of such a data structure. It's a more general question. Is it ever acceptable to mutate the internal state of a supposedly persistent or immutable object in destructive and dangerous ways? Does performance justify it? Would you still be able to call it immutable? Oh, and could you implement this sort of laziness without mutating the data structure in the specified fashion?
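
The lazy mechanism described in the question could be sketched roughly as follows. This is only an illustration in Python, not the poster's ArrayVector: the class name LazyVector, its method names, and the use of a lock for "exclusive access" are all assumptions made for the example. Updates are prepended to a shared linked list in constant time, and the first read applies them and swaps the internal storage.

```python
import threading


class LazyVector:
    """Hypothetical sketch: looks immutable from outside, applies queued updates on first read."""

    def __init__(self, items=()):
        self._base = tuple(items)   # materialized contents, never modified in place
        self._pending = None        # linked list of queued updates: (op, older_updates)
        self._lock = threading.Lock()

    def _derive(self, op):
        with self._lock:            # read both fields consistently
            base, pending = self._base, self._pending
        new = LazyVector.__new__(LazyVector)
        new._base = base
        new._pending = (op, pending)    # O(1): prepend to the shared queue
        new._lock = threading.Lock()
        return new

    def add(self, value):
        return self._derive(("add", value))

    def set(self, index, value):
        # Note: an invalid index is only detected when the queue is applied.
        return self._derive(("set", index, value))

    def _flush(self):
        # Caller must hold self._lock.
        if self._pending is None:
            return
        ops = []
        node = self._pending
        while node is not None:     # walk from newest to oldest update
            op, node = node
            ops.append(op)
        data = list(self._base)
        for op in reversed(ops):    # replay oldest update first
            if op[0] == "add":
                data.append(op[1])
            else:
                data[op[1]] = op[2]
        # The internal mutation the question asks about:
        self._base = tuple(data)
        self._pending = None

    def get(self, index):
        with self._lock:
            self._flush()
            return self._base[index]

    def __len__(self):
        with self._lock:
            self._flush()
            return len(self._base)
```

For example, `LazyVector([1, 2, 3]).add(4).set(0, 10)` allocates no new storage until `get` or `len` is called on the result, and the original vector still reports its original contents. Whether this still counts as "immutable" is exactly the open question posed above; the sketch only shows that the observable behaviour can stay unchanged.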

Read the article
Sabre Manages Fast Data Growth with Oracle Data Integration Products

- by Irem Radzik

Last year at OpenWorld we announced Sabre Holdings as a winner of the Fusion Middleware Innovation Awards. The Sabre team did an excellent job of leveraging cutting-edge technologies to manage the rapid data growth and exponential scalability demands they have experienced in the travel industry. Today we announced the details and specific benefits of Sabre’s new real-time data integration solution in a press release. Please take a look if you haven’t seen it yet:

Sabre Holdings Deploys Oracle Data Integrator and Oracle GoldenGate to Support Rapid Customer Growth

Sabre achieved benefits in three areas by using Oracle Data Integration products:
- Manages a 7X increase in data sources for the enterprise data warehouse
- Reduced infrastructure complexity
- Decreased time to market for new products and services by 30 percent

This shows that using the latest technologies helps companies build robust solutions to today’s key data management challenges. The benefit of using next-generation data integration technology is seen not only in IT operations, but also on the business side. A better data integration solution for the enterprise data warehouse delivered the platform they need to accelerate how they service their customers, improving their competitive advantage.
Tomorrow I will give another great example of innovation with next generation data integration from Oracle. We will be discussing the Fusion Middleware Innovation Awards 2012 winners and their results with using Oracle’s data integration products.

Read the article
Computer Science Fundamentals - Recommended books

- by contactmatt

Hey, I'm looking to see if anyone can recommend any books on the fundamentals of computer science. I obtained my associate's degree as a programmer/analyst a couple of years ago and I know a good amount about programming on the .NET framework. I'm even certified on the .NET 4 framework as a web application developer. However, since I was only able to obtain my associate's degree, my college did not cover the low-level basics and operations of computers or basic computer science. I'm really interested in learning about the low-level operations of a computer and of programming (bytes, bits, memory management, etc.). Can anyone recommend any good computer science books for someone who is decently experienced in programming? Thank you.

Read the article
implementing dynamic query handler on historical data

- by user2390183

EDIT: Refined question to focus on the core issue.

Context: I have historical data about property (house) sales, collected from various sources into a centralized/cloud data source (assume information collection is handled by a third party). I am planning to develop an application to query and retrieve data from this centralized data source.

Example queries:
- Simple: for a given post code XYZ, what is the average price of a 3-bedroom house?
- Complex: what is the estimated price of a house at "DD, Some Street, XYZ Post Code" (worked out from average values of historical data filtered by various characteristics of the house: post code, number of bedrooms, total area, and deeper attributes such as building type, year built, and features)?

In addition to the average price, the application should support other property information, such as maximum or minimum price, and the trend (graph) of a selected property attribute over a period of time. Hence, the queries should not force the search to be based on a primary key or a few fixed fields. In other words, queries can be:
- What is the change in 3-bedroom house prices (irrespective of location) over the last 30 days?
- What kind of properties can we get for price X (irrespective of location or house type)?

The challenge I have is identifying the domain (BI/data analytics, DB design, DB query interface, DW, or something else) that this problem (dynamic queries on historical data) belongs to, so that I can explore further.

My findings so far (I could be wrong on the following, so please correct me if you think so):
- BI/Data Analytics: I read about it briefly. I think it is a heavyweight solution for my problem and has scalability issues.
- DB design: as I understand it, an RDBMS works well if you know the data model at design time. I expect the attributes of a property, or of other entities (such as users) that I am going to bring in, to evolve quickly, so maintenance would be an issue. As I am going to have multiple users executing queries at the same time, performance would be a bottleneck.
- Other options like graph databases (http://www.tinkerpop.com/) seem a bit complex. (They are good, but using such general-purpose tools makes me feel like I am doing assembly programming to solve my problem.)
- Big Data solutions are meant for analysing data from multiple unrelated domains.

So, any suggestions on the space this problem fits in? (Especially if you have design/implementation experience of the back end for property listing or similar portals.)
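
To make the "no fixed fields" requirement concrete, here is a minimal sketch of one way such a query handler could look on a plain relational store. It is not a recommendation of a domain or product. The table name sales, the column names, and the function aggregate_price are all hypothetical, and SQLite is used only because it ships with Python. The caller picks the aggregate and any combination of filter fields; values are bound as parameters, and column names are checked against a whitelist before being placed into the SQL text.

```python
import sqlite3

# Hypothetical schema: these column names are assumptions for the example.
ALLOWED_COLUMNS = {"post_code", "bedrooms", "building_type", "price"}
AGGREGATES = {"avg": "AVG", "min": "MIN", "max": "MAX"}


def aggregate_price(conn, aggregate, filters):
    """Return AVG/MIN/MAX of price over rows matching arbitrary equality filters."""
    if aggregate not in AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown filter columns: {sorted(unknown)}")
    sql = f"SELECT {AGGREGATES[aggregate]}(price) FROM sales"
    if filters:
        # Column names come from the whitelist; values are bound parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return conn.execute(sql, list(filters.values())).fetchone()[0]


def make_demo_db():
    """Build a tiny in-memory table with made-up rows for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (post_code TEXT, bedrooms INTEGER, "
        "building_type TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            ("XYZ", 3, "terraced", 200000),
            ("XYZ", 3, "detached", 300000),
            ("XYZ", 2, "flat", 150000),
            ("ABC", 3, "detached", 400000),
        ],
    )
    return conn
```

With the demo rows, `aggregate_price(conn, "avg", {"post_code": "XYZ", "bedrooms": 3})` answers the "simple" example query, and dropping the post code filter answers the "irrespective of location" variant. Attributes that evolve quickly would need more than this (for example a schema that tolerates new columns), which the sketch does not address.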

Read the article
What is the biggest weakness of students graduating with degrees in Computer Science?

- by akobre01

This question is directed more toward employers and graduate student advisors/professors, but all opinions are welcome. What do you find is a common weakness of new hires and/or new grad students? Is it entirely dependent on the student and his or her university? Is there a particular skill or skillset that you wish new hires/researchers had expertise in, and how can we remedy this deficiency? I realize that this question is general and really encapsulates two questions, one about the weaknesses of new software engineers and one about the weaknesses of new researchers. However, both types of people tend to come from similar courses of study, so I'm wondering if there is any overlap. Note: I am not a professor, but I'm interested in how best to revise the undergraduate curriculum in CS.

Read the article
Computer Science Career Advice: Master's in Computer Science vs. Software Engineering?

- by Everton

Hello, I am a college student majoring in Computer Science and Applied Mathematics. As I get closer to my senior year, I have decided that continuing my studies is the best choice for me right now. I see that several universities offer a Computer Science master's degree and a Software Engineering degree. What are their pros and cons? I feel that the Computer Science master's degree seems a little too broad, while the Software Engineering degree is too restrictive. I have not yet decided between a career in software development and one in research (algorithm development, among other things). Any advice would be greatly appreciated!

Read the article
AngularJS dealing with large data sets (Strategy)

- by Brian

I am working on a personal temperature logging viewer, based on my Raspberry Pi curl'ing data into my web server's API. Temperatures are taken every 2 seconds and I can have several temperature sensors posting data. Needless to say, I will have a lot of data to handle even within the scope of an hour. I have implemented a very simple paging API on the server so that the server doesn't time out; it currently returns data in units of 1000 per call, and the client pages through the data. My idea was to initially show, say, the last 20 minutes of data from a sensor (or from all sensors, depending on user choices), and then allow the user to select other timeframes to show. The issue comes in when you want to view all sensors or an extended time period (say 24 hours). Is there a best practice for handling this large amount of data? Would it be useful to load those first 20 minutes into the live view and then cache something like the last 24 hours in local storage? I haven't been able to find a decent example of this in use yet, even though there are a lot of ways to approach this problem. I am just looking for suggestions as to what might provide a good balance between good performance and not caching the entire data set on the client side (beyond a week of data this might not be feasible).
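
The paging scheme described above (1000 readings per call, repeated until the data runs out) can be sketched as a small generator. This is only an illustration in Python rather than AngularJS, and the names iter_readings and fetch_page, as well as the offset/limit calling convention, are assumptions; the post does not say how the real API addresses pages.

```python
PAGE_SIZE = 1000  # matches the "1000 units per call" mentioned in the post


def iter_readings(fetch_page, page_size=PAGE_SIZE):
    """Yield readings one by one, requesting a new page only when needed.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of at most `limit` readings starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:   # a short page means there is no more data
            return
        offset += len(page)
```

Because the generator is lazy, a viewer can stop consuming once it has the first 20 minutes of data and resume later for a longer timeframe, without holding the whole data set in memory at once.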

Read the article
I need some help creating a non-binary tree (or some other data structure that will better solve my problem)

- by EDO

I have about ten lists of numbers and some strings. Each list has at most about 30K lines. Each line in a list has a distinct number. I need to build an efficient way of finding all the lines across the lists that have the same 'control' number (or key, for DB guys) and comparing what is in their string parts. I am writing this in Java. I have thought about using trees, but my brain cells are about burnt out now. I need some help.
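
One common alternative to a tree for this kind of lookup is a hash map keyed on the control number. The sketch below is in Python for brevity (the poster is writing Java, where a HashMap from the number to the matching lines would play the same role). The function names and the shape of the input, a mapping from list name to (number, text) pairs, are assumptions made for the example.

```python
from collections import defaultdict


def group_by_key(lists):
    """Map each control number to {list_name: text} for every list containing it.

    `lists` is assumed to look like {"list_a": [(101, "foo"), ...], ...}.
    """
    groups = defaultdict(dict)
    for name, lines in lists.items():
        for key, text in lines:
            groups[key][name] = text
    return groups


def mismatches(groups):
    """Return only the control numbers whose string parts differ between lists."""
    return {
        key: by_list
        for key, by_list in groups.items()
        if len(set(by_list.values())) > 1
    }
```

Building the map is a single pass over all lines (roughly 300K entries for ten lists of 30K lines), after which every comparison for a given control number is a direct lookup.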

Read the article
replacing data.frame element-wise operations with data.table (that used rowname)

- by Harold

So let's say I have the following data.frames:

    df1 <- data.frame(y = 1:10, z = rnorm(10), row.names = letters[1:10])
    df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), z = rnorm(10), row.names = letters[1:10])

And perhaps the "equivalent" data.tables:

    dt1 <- data.table(x = rownames(df1), df1, key = 'x')
    dt2 <- data.table(x = rownames(df2), df2, key = 'x')

If I want to do element-wise operations between df1 and df2, they look something like

    dfRes <- df1 / df2

And rownames() is preserved:

    R> head(dfRes)
        y          z
    a 0.5  3.1405463
    b 1.0  1.2925200
    c 1.5  1.4137930
    d 2.0 -0.5532855
    e 2.5 -0.0998303
    f 1.2 -1.6236294

My poor understanding of data.table says the same operation should look like this:

    dtRes <- dt1[, !'x', with = F] / dt2[, !'x', with = F]
    dtRes[, x := dt1[,x,]]
    setkey(dtRes, x)

(setkey optional)

Is there a more data.table-esque way of doing this? As a slightly related aside, more generally, I would have other columns such as factors in each data.table and I would like to omit those columns while doing the element-wise operations, but still have them in the result. Does this make sense? Thanks!

Read the article
PHP - post data ends when '&' is in data.

- by Phil Jackson

Hi all, I'm posting data using jQuery/AJAX, with PHP at the back end. The problem is that when I input something like 'Jack & Jill went up the hill', I'm only receiving 'Jack' when it gets to the back end. I have raised an alert at the front end before the data is sent, and it shows 'Jack & Jill went up the hill'. When I put die(print_r($_POST)); at the very top of my index page, I'm only getting [key] => Jack. How can I be losing the data? I thought it may have been my filter:

    <?php
    function filter( $data ) {
        $data = trim( htmlentities( strip_tags( mb_convert_encoding( $data, 'HTML-ENTITIES', "UTF-8") ) ) );
        if ( get_magic_quotes_gpc() ) {
            $data = stripslashes( $data );
        }
        //$data = mysql_real_escape_string( $data );
        return $data;
    }
    echo "<xmp>" . filter("you & me") . "</xmp>";
    ?>

but that returns fine in the test above (you & me), which is in place after I added die(print_r($_POST));. Can anyone think of how and why this is happening? Any help much appreciated. Regards, Phil.
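
A likely cause, though the post does not show the jQuery call so this is an assumption, is that the value is concatenated into the request body without URL-encoding. In a form-encoded body, '&' separates fields, so everything after it is parsed as a new field. The sketch below reproduces that effect with Python's standard urllib.parse rather than PHP; the field name key is taken from the print_r output above.

```python
from urllib.parse import parse_qs, urlencode

message = "Jack & Jill went up the hill"

# Unencoded: the '&' is treated as a field separator, so the value is cut short.
raw_body = "key=" + message
broken = parse_qs(raw_body)

# Encoded: '&' becomes %26 and survives the round trip.
encoded_body = urlencode({"key": message})
fixed = parse_qs(encoded_body)

print(broken["key"][0].strip())   # value truncated at the '&'
print(fixed["key"][0])            # full original message
```

On the jQuery side, the usual remedies are to pass the data as an object (so jQuery encodes it) or to wrap each value in encodeURIComponent before building the string.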

Read the article
How to Avoid Your Next 12-Month Science Project

- by constant

While most customers immediately understand how the magic of Oracle's Hybrid Columnar Compression, intelligent storage servers and flash memory make Exadata uniquely powerful against home-grown database systems, some people think that Exalogic is nothing more than a bunch of x86 servers, a storage appliance and an InfiniBand (IB) network, built into a single rack. After all, isn't this exactly what the High Performance Computing (HPC) world has been doing for decades?

On the surface, this may be true. And some people tried exactly that: They tried to put together their own version of Exalogic, but then they discovered there's a lot more to building a system than buying hardware and assembling it together. IT is not Ikea.

Why is that so? Could it be there's more going on behind the scenes than merely putting together a bunch of servers, a storage array and an InfiniBand network into a rack? Let's explore some of the special sauce that makes Exalogic unique and un-copyable, so you can save yourself from your next 6- to 12-month science project that distracts you from doing real work that adds value to your company.

Engineering Systems is Hard Work!

The backbone of Exalogic is its InfiniBand network: 4 times better bandwidth than even 10 Gigabit Ethernet, and only about a tenth of its latency. What a potential for increased scalability and throughput across the middleware and database layers!

But InfiniBand is a beast that needs to be tamed: It is true that Exalogic uses a standard, open-source Open Fabrics Enterprise Distribution (OFED) InfiniBand driver stack. Unfortunately, this software has been developed by the HPC community with fastest speed in mind (which is good) but, despite the name, not many other enterprise-class requirements are included (which is less good).

Here are some of the improvements that Oracle's InfiniBand development team had to add to the OFED stack to make it enterprise-ready, simply because typical HPC users didn't have the need to implement them:

- More than 100 bug fixes in the pieces that were not related to the Message Passing Interface Protocol (MPI), which is the protocol that HPC users use most of the time, but which is less useful in the enterprise.
- Performance optimizations and tuning across the whole IB stack: From Switches, Host Channel Adapters (HCAs) and drivers to low-level protocols, middleware and applications. Yes, even the standard HPC IB stack could be improved in terms of performance.
- Ethernet over IB (EoIB): Exalogic uses InfiniBand internally to reach high performance, but it needs to play nicely with datacenters around it. That's why Oracle added Ethernet over InfiniBand technology to it that allows for creating many virtual 10GBE adapters inside Exalogic's nodes that are aggregated and connected to Exalogic's IB gateway switches. While this is an open standard, it's up to the vendor to implement it. In this case, Oracle integrated the EoIB stack with Oracle's own IB to 10GBE gateway switches, and made it fully virtualized from the beginning. This means that Exalogic customers can completely rewire their server infrastructure inside the rack without having to physically pull or plug a single cable - a must-have for every cloud deployment. Anybody who wants to match this level of integration would need to add an InfiniBand switch development team to their project. Or just buy Oracle's gateway switches, which are conveniently shipped with a whole server infrastructure attached!
- IPv6 support for InfiniBand's Sockets Direct Protocol (SDP), Reliable Datagram Sockets (RDS), TCP/IP over IB (IPoIB) and EoIB protocols. Because no IPv6 = not very enterprise-class.
- HA capability for SDP. High Availability is not a big requirement for HPC, but for enterprise-class application servers it is. Every node in Exalogic's InfiniBand network is connected twice for redundancy. If any cable or port or HCA fails, there's always a replacement link ready to take over. This requires extra magic at the protocol level to work. So in addition to Weblogic's failover capabilities, Oracle implemented IB automatic path migration at the SDP level to avoid unnecessary failover operations at the middleware level.
- Security, for example spoof-protection. Another feature that is less important for traditional users of InfiniBand, but very important for enterprise customers.
- InfiniBand Partitioning and Quality-of-Service (QoS): One of the first questions we get from customers about Exalogic is: “How can we implement multi-tenancy?” The answer is to partition your IB network, which effectively creates many networks that work independently and that are protected at the lowest networking layer possible. In addition to that, QoS allows administrators to prioritize traffic flow in multi-tenancy environments so they can keep their service levels where it matters most.
- Resilient IB Fabric Management: InfiniBand is a self-managing network, so a lot of the magic lies in coming up with the right topology and in teaching the subnet manager how to properly discover and manage the network. Oracle's InfiniBand switches come with pre-integrated, highly available fabric management with seamless integration into Oracle Enterprise Manager Ops Center.

In short: Oracle elevated the OFED InfiniBand stack into an enterprise-class networking infrastructure. Many years and multiple teams of manpower went into the above improvements - this is something you can only get from Oracle, because no other InfiniBand vendor can give you these features across the whole stack!

Exabus: Because it's not About the Size of Your Network, it's How You Use it!

So let's assume that you somehow were able to get your hands on an enterprise-class IB driver stack. Or maybe you don't care and are just happy with the standard OFED one? Anyway, the next step is to actually leverage that InfiniBand performance. Here are the choices:

1. Use traditional TCP/IP on top of the InfiniBand stack,
2. Develop your own integration between your middleware and the lower-level (but faster) InfiniBand protocols.

While more bandwidth is always a good thing, it's actually the low latency that enables superior performance for your applications when running on any networking infrastructure: The lower the latency, the faster the response travels through the network and the more transactions you can close per second. The reason why InfiniBand is such a low latency technology is that it gets rid of most if not all of your traditional networking protocol stack: Data is literally beamed from one region of RAM in one server into another region of RAM in another server with no kernel/drivers/UDP/TCP or other networking stack overhead involved!

Which makes option 1 a no-go: Adding TCP/IP on top of InfiniBand is like adding training wheels to your racing bike. It may be ok in the beginning and for development, but it's not quite the performance IB was meant to deliver.

Which only leaves option 2: Integrating your middleware with fast, low-level InfiniBand protocols. And this is what Exalogic's "Exabus" technology is all about. Here are a few Exabus features that help applications leverage the performance of InfiniBand in Exalogic:

- RDMA and SDP integration at the JDBC driver level (SDP), for Oracle Weblogic (SDP), Oracle Coherence (RDMA), Oracle Tuxedo (RDMA) and the new Oracle Traffic Director (RDMA) on Exalogic. Using these protocols, middleware can communicate a lot faster with each other and the Oracle database than by using standard networking protocols,
- Seamless Integration of Ethernet over InfiniBand from Exalogic's Gateway switches into the OS,
- Oracle Weblogic optimizations for handling massive amounts of parallel transactions. Because if you have an 8-lane Autobahn, you also need to improve your ramps so you can feed it with many cars in parallel.
- Integration of Weblogic with Oracle Exadata for faster performance, optimized session management and failover.

As you see, “Exabus” is Oracle's word for describing all the InfiniBand enhancements Oracle put into Exalogic: OFED stack enhancements, protocols for faster IB access, and InfiniBand support and optimizations at the virtualization and middleware level. All working together to deliver the full potential of InfiniBand performance.

Who else has 100% control over their middleware so they can develop their own low-level protocol integration with InfiniBand? Even if you take an open source approach, you're looking at years of development work to create, test and support a whole new networking technology in your middleware!

The Extras: Less Hassle, More Productivity, Faster Time to Market

And then there are the other advantages of Engineered Systems that are true for Exalogic the same as they are for every other Engineered System:

- One simple purchasing process: No headaches due to endless RFPs and no “Will X work with Y?” uncertainties.
- Everything has been engineered together: All kinds of bugs and problems have been already fixed at the design level that would have only manifested themselves after you have built the system from scratch.
- Everything is built, tested and integrated at the factory level. Less integration pain for you, faster time to market.
- Every Exalogic machine world-wide is identical to Oracle's own machines in the lab: Instant replication of any problems you may encounter, faster time to resolution.
- Simplified patching, management and operations.
- One throat to choke: Imagine finger-pointing hell for systems that have been put together using several different vendors. Oracle's Engineered Systems have a single phone number that customers can call to get their problems solved.
For more business-centric values, read The Business Value of Engineered Systems.

Conclusion: Buy Exalogic, or get ready for a 6-12 Month Science Project

And here's the reason why it's not easy to "build your own Exalogic": There's a lot of work required to make such a system fly. In fact, anybody who is starting to "just put together a bunch of servers and an InfiniBand network" is really looking at a 6-12 month science project. And the outcome is likely to not be very enterprise-class. And it won't have Exalogic's performance either.

Because building an Engineered System is literally rocket science: It takes a lot of time, effort, resources and many iterations of design/test/analyze/fix to build such a system. That's why InfiniBand has been reserved for HPC scientists for such a long time. And only Oracle can bring the power of InfiniBand in an enterprise-class, ready-to use, pre-integrated version to customers, without the develop/integrate/support pain.

For more details, check the new Exalogic overview white paper which was updated only recently.

P.S.: Thanks to my colleagues Ola, Paul, Don and Andy for helping me put together this article!

Read the article
SQL Developer Debugging, Watches, Smart Data, & Data

- by thatjeffsmith

After presenting the SQL Developer PL/SQL debugger for about an hour yesterday at KScope12 in San Antonio, my boss came up and asked, “Now, would you really want to know what the Smart Data panel does?” Apparently I had ‘made up’ my own story about what that panel’s intent is, based on my experience with it. Not good Jeff, not good. It was a very small point of my presentation, but I probably should have read the docs. The Smart Data tab displays information about variables, using your Debugger: Smart Data preferences. You can also specify these preferences by right-clicking in the Smart Data window and selecting Preferences. (Debugger Smart Data Preferences: control the number of variables to display.) The Smart Data panel auto-inspects the last X accessed variables. So if you have a program with 26 variables, instead of showing you all 26, it will just show you the last two variables that were referenced in your program. If you were to click on the ‘Data’ debug panel, you’ll see EVERYTHING. And if you only want to see a very specific set of values, then you should use Watches. The Smart Data Panel: As I step through the code, the variables being tracked change as they are referenced. Only the most recent ones display. This is controlled by the ‘Maximum Locations to Remember’ preference. (Step through the code, see the latest variables accessed.) The Data Panel: All variables are displayed. This might be information overload on large PL/SQL programs where you have many dozens or even hundreds of variables to track. (Shows everything all the time.) Watches: Watches are added manually and only show what you ask for. (Data on Demand: add a watch to track a specific variable.) Remember, you can interact with your data: If you want to do more than just watch, you can right-click on a data element and change the value of the variable as the program is running. This is one of the primary benefits of debugging over using DBMS_OUTPUT to track what’s happening in your program.
Change the values while the program is running to test your ‘What if?’ scenarios

Read the article
SQL – Biggest Concerns in a Data-Driven World

- by Pinal Dave

The ongoing chaos over a government agency’s snooping has ignited a heated debate on the privacy of personal data and its use by government and other institutions. It has created a feeling of disapproval and distrust among users. This incident is a lesson for companies that are looking to grow their business using a data-driven approach. According to analysts, the goal of gathering personal information should be to deliver benefits to both parties: the user as well as the data collector (government or business). Using data the right way is crucial, and companies need to deploy the right software applications and systems to ensure that their efforts are well-directed. However, there are various issues plaguing analysts regarding available software, which are highlighted below. According to an InformationWeek 2013 Survey of Analytics, Business Intelligence and Information Management, in which 541 business technology professionals took part as respondents, the biggest concern was the scarcity of expertise and the high costs associated with it. This concern was voiced by as many as 38% of the participants. A close second was the expense of data warehouse appliance platforms, with 33% of respondents considering it a huge roadblock. Another revelation was that 31% of professionals weren’t even sure how Data Analytics can create business opportunities for them. Another 17% shared that they found data platform technologies such as Hadoop and NoSQL hard to learn. These results clearly show that there are awareness and expertise issues that also need much attention. Unless the demand-supply gap of Business Intelligence professionals well versed in data analysis technologies is closed, this divide is going to affect how much companies get out of their BI campaigns.
One of the key action points that can be taken to salvage the situation is to provide training on Data Analytics concepts. Koenig Solutions offers courses on many such technologies, including a course on MCSE SQL Server 2012: BI Platform. So it’s time to brush up your skills and get down to work in the data-driven world that awaits you. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Computer science undergraduate project ideas

- by Mehrdad Afshari

Hopefully, I'm going to finish my undergraduate studies next semester and I'm thinking about the topic of my final project. And yes, I've read the questions with duplicate title. I'm asking this from a bit different viewpoint, so it's not an exact dupe. I've spent at least half of my life coding stuff in different languages and frameworks so I'm not looking at this project as a way to learn much about coding and preparing for real world apps or such. I've done lots of those already. But since I have to do it to complete my degree, I felt I should spend my time doing something useful instead of throwing the whole thing out. I'm planning to make it an open source project or a hosted Web app (depending on the type) if I can make a high quality thing out of it, so I decided to ask StackOverflow what could make a useful project. Situation I've plenty of freedom about the topic. They also require 30-40 pages of text describing the project. I have the following points in mind (the more satisfied, the better): Something useful for software development Something that benefits the community Having academic value is great Shouldn't take more than a month of development (I know I'm lazy). Shouldn't be related to advanced theoretical stuff (soft computing, fuzzy logic, neural networks, ...). I've been a business-oriented software developer. It should be software oriented. While I love hacking microcontrollers and other fun embedded electronic things, I'm not really good at soldering and things like that. I'm leaning toward a Web application (think StackOverflow, PasteBin, NerdDinner, things like those). Technology It's probably going to be done in .NET (C#, F#) and Windows platform. If I really like the project (cool low level hacking), I might actually slip to C/C++. But really, C# is what I'm efficient at. 
Ideas: Programming language, parsing and compiler related stuff: Designing a domain specific programming language and compiler; Templating language compiled to C# or IL; Database tools and related code generation stuff. Web related technologies: ASP.NET MVC View engine doing something cool (don't know what exactly...); Specific-purpose, small, fast ASP.NET-based Web framework. Applications: Visual Studio plugin to integrate with Bazaar (it's too much work, I think); ASP.NET based, jQuery-powered issue tracker (and possibly, project lifecycle management as a whole - poor man's TFS). Others: Something related to GPGPU. Looking forward to great ideas! Unfortunately, I can't help on a currently existing project. I need to start my own to prevent further problems (as it's an undergrad project, nevertheless).

Read the article
What topic in computer science is this?

- by jasonbogd

Hi, I am trying to figure out what this 'topic' is called, so I can learn more about it. Basically, I'm talking about designing my application's architecture. I'm not talking about algorithms. More like -- this class should have these methods and these instance variables, and communicate with this class in this way, this class should have these responsibilities etc. Can anybody tell me what this topic is called and how I can get better at doing this? Thanks.

Read the article
Store XML data in Core Data

- by ct2k7

Hi, is there any easy way to store XML data in Core Data? Currently, my app just pulls the values from the XML file directly; however, this isn't efficient for XML files which hold over 100 entries, so storing the data in Core Data would be the best option. The XML file is called/downloaded/parsed every time the app opens. With Core Data, the XML data would be downloaded every 3600 seconds or so and would refresh the current data in Core Data, to reduce the loading time when opening the app. Any ideas on how I can do this? Having reviewed the developer documentation, it doesn't look very tasty.
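The refresh logic described in the question (parse the XML once, keep the parsed entries in a local store, and only download again after roughly 3600 seconds) can be sketched independently of Core Data. The following is a minimal, language-neutral illustration in Python, not Core Data code: it swaps in SQLite as the local store, and the element names (`entry`, `title`), the table layout and the function names are all assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

REFRESH_SECONDS = 3600  # refresh interval taken from the question

# Hypothetical feed layout; the real XML structure is not given in the post.
SAMPLE_XML = """<entries>
  <entry id="1"><title>First</title></entry>
  <entry id="2"><title>Second</title></entry>
</entries>"""

def open_store() -> sqlite3.Connection:
    """Create an in-memory store standing in for the persistent store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, title TEXT)")
    db.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value REAL)")
    return db

def needs_refresh(db: sqlite3.Connection, now: float) -> bool:
    """True if nothing was stored yet or the stored data is too old."""
    row = db.execute(
        "SELECT value FROM meta WHERE key = 'last_refresh'").fetchone()
    return row is None or now - row[0] >= REFRESH_SECONDS

def refresh(db: sqlite3.Connection, xml_text: str, now: float) -> None:
    """Parse the XML and replace the stored entries in one transaction."""
    root = ET.fromstring(xml_text)
    rows = [(e.get("id"), e.findtext("title")) for e in root.iter("entry")]
    with db:  # commits on success, rolls back on error
        db.execute("DELETE FROM entries")
        db.executemany("INSERT INTO entries VALUES (?, ?)", rows)
        db.execute(
            "INSERT OR REPLACE INTO meta VALUES ('last_refresh', ?)", (now,))

db = open_store()
if needs_refresh(db, now=1000.0):
    refresh(db, SAMPLE_XML, now=1000.0)
```

On app start, the UI would read from the local store straight away and call the download-and-refresh path only when `needs_refresh` says the data is stale. In an actual iOS app the same shape would presumably map to a parser feeding managed objects plus a stored timestamp, but that mapping is not shown here.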

Read the article
Where can I find free and open data?

- by kitsune

Sooner or later, coders will feel the need to have access to "open data" in one of their projects, from knowing a city's zip code to more obscure information such as the axial tilt of Pluto. I know data.un.org, which offers access to the UN's extensive array of databases that deal with human development and other socio-economic issues. The other usual suspects are NASA and the USGS for planetary data. There's an article at readwriteweb with more links. infochimps.org seems to stand out. Personally, I need to find historic commodity prices, stock values and other financial data. All these data sets seem to cost money, however. Clarification: To clarify, I'm interested in all kinds of open data, because sooner or later, I know I will be in a situation where I could need it. I will try to edit this answer and include the suggestions in a structured manner. A link for financial data was hidden in that readwriteweb article, doh! It's called opentick.com. Looks good so far! Update: I stumbled over semantic data in another question of mine on here. There is opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). A project called UMBEL provides a light-weight, distilled version of opencyc. UMBEL has semantic data in RDF/OWL/SKOS N3 syntax. The World Bank also released a very nice API. It offers data from the last 50 years for about 200 countries.

Read the article
Riddle to woo a girl that knows computer science [closed]

- by alex

What interesting or difficult riddle, or flirty question, would woo a girl who has a master's from Stanford? For example: If Python is C++, C++ is JAVA, and PHP is returning None, what is PHP(Python(PHP))?

Read the article
Mathematics for Computer Science Students

- by Ender

To cut a long story short, I am a CS student who has received no formal post-16 Maths education for years. Right now even my Algebra is extremely rusty and I have a couple of months to shape up my skills. I've got a couple of video lectures in my bookmarks, consisting of: Pre-Calculus, Algebra, Calculus, Probability, Introduction to Statistics, Differential Equations, Linear Algebra. My aim as of today is to be able to read the CLRS book Introduction to Algorithms and follow the Mathematical notation in it, as well as to confidently read and back up any arguments written in Mathematical notation. Aside from these video lectures, can anyone recommend any good books to help teach someone wishing to go from a low-foundation level to a more advanced level of Mathematics? Just as a note, I've taken a first-year module in Analytical Modelling, so I understand some of the basic concepts of Discrete Mathematics. EDIT: Just a note to those who are looking to learn Linear Algebra using the Video Lectures I have posted up. Peteris Krumins' Blog contains a run-through of these lectures as well as his own commentary and lecture notes, an invaluable resource for those looking to follow the lectures too.

Read the article
Temporary storage for keeping data between program iterations?

- by mr.b

I am working on an application that works like this:

1. It fetches data from many sources, resulting in a pool of about 500,000-1,500,000 records (depending on time/day).
2. Data is parsed.
3. Part of the data is processed so that it can be compared to pre-existing data (read from the database); calculations are made and the results are stored in the database. The resulting dataset that has to be stored in the database is, however, much smaller (compared to the original data set), and ranges from 5,000-50,000 records. This process almost always updates existing data, and perhaps adds a few more records.
4. Then, data from step 2 should be kept somehow, somewhere, so that next time data is fetched, there is a data set which can be used to perform calculations, without touching pre-existing data in the database.

I should point out that this data can be lost; it's not irreplaceable (key information can be read from the database if needed), but it would speed up the process next time. Application components can (and will) be run off different computers (in the same network), so storage has to be reachable from multiple hosts. I have considered using memcached, but I'm not quite sure whether I should, because one record is usually no smaller than 200 bytes, and if I have 1,500,000 records, I guess that it would amount to over 300 MB of memcached cache... But that doesn't seem scalable to me - what if the data was 5x that amount? What if it were to consume 1-2 GB of cache only to keep data in between iterations (which could easily happen)? So, the question is: which temporary storage mechanism would be most suitable for this kind of processing? I haven't considered using MySQL temporary tables, as I'm not sure if they can persist between sessions and be used by other hosts in the network... Any other suggestions? Something I should consider?
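The size estimate in the question is simple arithmetic, and a short sketch makes it easy to try other record counts. The function name and the choice of 1 MB = 1,000,000 bytes are assumptions for illustration; the figure covers raw payload only and ignores any per-item overhead (keys, metadata) that a cache such as memcached adds on top.

```python
def payload_size_mb(record_count: int, bytes_per_record: int) -> float:
    """Raw payload size in megabytes, using 1 MB = 1,000,000 bytes."""
    return record_count * bytes_per_record / 1_000_000

# Record counts from the question, plus the "5x" scenario it worries about.
for count in (500_000, 1_500_000, 5 * 1_500_000):
    print(f"{count:>9,} records x 200 B = {payload_size_mb(count, 200):,.0f} MB")
```

With the 200-byte lower bound per record, 1,500,000 records come to 300 MB and five times that amount to 1,500 MB, which matches the 1-2 GB range the question mentions.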

Read the article
Why are data structures so important in interviews?

- by Vamsi Emani

I am a newbie in the corporate world, recently graduated in computers. I am a Java/Groovy developer. I am a quick learner and I can learn new frameworks, APIs or even programming languages within a considerably short amount of time. Despite that, I must confess that I was not so strong in data structures when I graduated from college. Throughout the campus placements during my graduation, I witnessed that most of the big tech companies like Amazon, Microsoft etc. focused mainly on data structures. It appears as if data structures are the only thing that they expect from a graduate. Adding to this, I see that there is a general perception that a good programmer is necessarily one with good knowledge of data structures. To be honest, I felt bad about that. I write good code. I follow standard design patterns, and I do use data structures, but at a superficial level, as in the APIs Java exposes, like ArrayLists, LinkedLists etc. But the companies usually focused on the intricate aspects of data structures like pointer-based memory manipulation and time complexities. Probably because of my Java-ish background, back then I understood code efficiency and logic only when discussed in terms of Object Oriented Programming, like objects, instances, etc., but I never drilled down to the level of bits and bytes. I did not want people to look down upon me for this knowledge deficit of mine in data structures. So really, why all this emphasis on data structures? Does not having knowledge of data structures really affect one's career in programming? Or is knowledge of this subject really a sufficient basis to differentiate a good programmer from a bad one?

Read the article
Data structure for pattern matching.

- by alvonellos

Let's say you have an input file with many entries like these: date, ticker, open, high, low, close, <and some other values> And you want to execute a pattern matching routine on the entries(rows) in that file, using a candlestick pattern, for example. (See, Doji) And that pattern can appear on any uniform time interval (let t = 1s, 5s, 10s, 1d, 7d, 2w, 2y, and so on...). Say a pattern matching routine can take an arbitrary number of rows to perform an analysis and contain an arbitrary number of subpatterns. In other words, some patterns may require 4 entries to operate on. Say also that the routine (may) later have to find and classify extrema (local and global maxima and minima as well as inflection points) for the ticker over a closed interval, for example, you could say that a cubic function (x^3) has the extrema on the interval [-1, 1]. (See link) What would be the most natural choice in terms of a data structure? What about an interface that conforms a Ticker object containing one row of data to a collection of Ticker so that an arbitrary pattern can be applied to the data. What's the first thing that comes to mind? I chose a doubly-linked circular linked list that has the following methods: push_front() push_back() pop_front() pop_back() [] //overloaded, can be used with negative parameters But that data structure seems very clumsy, since so much pushing and popping is going on, I have to make a deep copy of the data structure before running an analysis on it. So, I don't know if I made my question very clear -- but the main points are: What kind of data structures should be considered when analyzing sequential data points to conform to a pattern that does NOT require random access? What kind of data structures should be considered when classifying extrema of a set of data points?
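One way to picture the "sequential access only" case is a fixed-size sliding window over the rows, where each pattern is just a function of the rows currently in the window. The sketch below is a rough illustration under assumptions, not a recommendation from the post: `Candle`, `is_doji`, `is_local_high`, `scan`, the 10% body tolerance and the sample rows are all made up for the example, and a `collections.deque` with `maxlen` stands in for the hand-written circular list.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence

@dataclass(frozen=True)
class Candle:
    """One row of the input file, reduced to the four price fields."""
    open: float
    high: float
    low: float
    close: float

def is_doji(window: Sequence[Candle], tolerance: float = 0.1) -> bool:
    """Newest candle has a body that is small relative to its range."""
    c = window[-1]
    span = c.high - c.low
    return span > 0 and abs(c.close - c.open) <= tolerance * span

def is_local_high(window: Sequence[Candle]) -> bool:
    """Middle of three candles has a strictly higher high than its neighbours."""
    left, mid, right = window
    return mid.high > left.high and mid.high > right.high

def scan(candles: Iterable[Candle], size: int,
         pattern: Callable[[Sequence[Candle]], bool]) -> Iterator[int]:
    """Yield the index of the newest row each time the pattern matches."""
    window: deque = deque(maxlen=size)
    for i, candle in enumerate(candles):
        window.append(candle)  # the oldest row is dropped automatically
        if len(window) == size and pattern(window):
            yield i

rows = [Candle(10, 12, 9, 11.5), Candle(11, 13, 10, 11.1), Candle(11, 12, 10, 12)]
doji_hits = list(scan(rows, 1, is_doji))
high_hits = list(scan(rows, 3, is_local_high))
```

Because the window only ever appends on one side and drops from the other, no deep copy is needed before running a pattern: each pattern reads the window and never mutates it. Patterns that need a different number of rows simply run with a different `size`.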

Read the article
Mathematics for Computer Science

- by jiewmeng

I am going into university next year. I think maths would be one of the more important aspects of computer science? I recently saw the MIT Intro to Algorithms video on YouTube and the maths required is quite hardcore. I wonder which parts of maths I need: probability, calculus, trigonometry, etc. Will the book Concrete Mathematics on Amazon (it claims to be a foundation for computer science) cover most of what's required?

Read the article
