Search Results

Search found 59554 results on 2383 pages for 'distributed data mining'.

Page 544/2383 | < Previous Page | 540 541 542 543 544 545 546 547 548 549 550 551  | Next Page >

  • Function Folding in #PowerQuery

    - by Darren Gosbell
    Originally posted on: http://geekswithblogs.net/darrengosbell/archive/2014/05/16/function-folding-in-powerquery.aspxLooking at a typical Power Query query you will noticed that it's made up of a number of small steps. As an example take a look at the query I did in my previous post about joining a fact table to a slowly changing dimension. It was roughly built up of the following steps: Get all records from the fact table Get all records from the dimension table do an outer join between these two tables on the business key (resulting in an increase in the row count as there are multiple records in the dimension table for each business key) Filter out the excess rows introduced in step 3 remove extra columns that are not required in the final result set. If Power Query was to execute a query like this literally, following the same steps in the same order it would not be overly efficient. Particularly if your two source tables were quite large. However Power Query has a feature called function folding where it can take a number of these small steps and push them down to the data source. The degree of function folding that can be performed depends on the data source, As you might expect, relational data sources like SQL Server, Oracle and Teradata support folding, but so do some of the other sources like OData, Exchange and Active Directory. To explore how this works I took the data from my previous post and loaded it into a SQL database. Then I converted my Power Query expression to source it's data from that database. Below is the resulting Power Query which I edited by hand so that the whole thing can be shown in a single expression: let     SqlSource = Sql.Database("localhost", "PowerQueryTest"),     BU = SqlSource{[Schema="dbo",Item="BU"]}[Data],     Fact = SqlSource{[Schema="dbo",Item="fact"]}[Data],     Source = Table.NestedJoin(Fact,{"BU_Code"},BU,{"BU_Code"},"NewColumn"),     LeftJoin = Table.ExpandTableColumn(Source, "NewColumn"                                   , {"BU_Key", "StartDate", "EndDate"}                                   , {"BU_Key", "StartDate", "EndDate"}),     BetweenFilter = Table.SelectRows(LeftJoin, each (([Date] >= [StartDate]) and ([Date] <= [EndDate])) ),     RemovedColumns = Table.RemoveColumns(BetweenFilter,{"StartDate", "EndDate"}) in     RemovedColumns If the above query was run step by step in a literal fashion you would expect it to run two queries against the SQL database doing "SELECT * …" from both tables. However a profiler trace shows just the following single SQL query: select [_].[BU_Code],     [_].[Date],     [_].[Amount],     [_].[BU_Key] from (     select [$Outer].[BU_Code],         [$Outer].[Date],         [$Outer].[Amount],         [$Inner].[BU_Key],         [$Inner].[StartDate],         [$Inner].[EndDate]     from [dbo].[fact] as [$Outer]     left outer join     (         select [_].[BU_Key] as [BU_Key],             [_].[BU_Code] as [BU_Code2],             [_].[BU_Name] as [BU_Name],             [_].[StartDate] as [StartDate],             [_].[EndDate] as [EndDate]         from [dbo].[BU] as [_]     ) as [$Inner] on ([$Outer].[BU_Code] = [$Inner].[BU_Code2] or [$Outer].[BU_Code] is null and [$Inner].[BU_Code2] is null) ) as [_] where [_].[Date] >= [_].[StartDate] and [_].[Date] <= [_].[EndDate] The resulting query is a little strange, you can probably tell that it was generated programmatically. But if you look closely you'll notice that every single part of the Power Query formula has been pushed down to SQL Server. Power Query itself ends up just constructing the query and passing the results back to Excel, it does not do any of the data transformation steps itself. So now you can feel a bit more comfortable showing Power Query to your less technical Colleagues knowing that the tool will do it's best fold all the  small steps in Power Query down the most efficient query that it can against the source systems.

    Read the article

  • How do I start the postgreSQL service upon boot?

    - by Homunculus Reticulli
    I am running PostgreSQL (v 8.4) on Ubuntu 10.0.4. The PG service currently starts on reboot (after I installed PG on my machine), however, I want the service to use a new data directory. Currently, after a reboot, I have to: Stop the currently running PG service manually type: /usr/local/pgsql/bin/pg_ctl start -D /my/preffered/data/directory -l /usr/local/pgsql/data/logfile Which file do I need to edit to ensure that I always have the service using the correct data folder?

    Read the article

  • parallel_for_each from amp.h – part 1

    - by Daniel Moth
    This posts assumes that you've read my other C++ AMP posts on index<N> and extent<N>, as well as about the restrict modifier. It also assumes you are familiar with C++ lambdas (if not, follow my links to C++ documentation). Basic structure and parameters Now we are ready for part 1 of the description of the new overload for the concurrency::parallel_for_each function. The basic new parallel_for_each method signature returns void and accepts two parameters: a grid<N> (think of it as an alias to extent) a restrict(direct3d) lambda, whose signature is such that it returns void and accepts an index of the same rank as the grid So it looks something like this (with generous returns for more palatable formatting) assuming we are dealing with a 2-dimensional space: // some_code_A parallel_for_each( g, // g is of type grid<2> [ ](index<2> idx) restrict(direct3d) { // kernel code } ); // some_code_B The parallel_for_each will execute the body of the lambda (which must have the restrict modifier), on the GPU. We also call the lambda body the "kernel". The kernel will be executed multiple times, once per scheduled GPU thread. The only difference in each execution is the value of the index object (aka as the GPU thread ID in this context) that gets passed to your kernel code. The number of GPU threads (and the values of each index) is determined by the grid object you pass, as described next. You know that grid is simply a wrapper on extent. In this context, one way to think about it is that the extent generates a number of index objects. So for the example above, if your grid was setup by some_code_A as follows: extent<2> e(2,3); grid<2> g(e); ...then given that: e.size()==6, e[0]==2, and e[1]=3 ...the six index<2> objects it generates (and hence the values that your lambda would receive) are:    (0,0) (1,0) (0,1) (1,1) (0,2) (1,2) So what the above means is that the lambda body with the algorithm that you wrote will get executed 6 times and the index<2> object you receive each time will have one of the values just listed above (of course, each one will only appear once, the order is indeterminate, and they are likely to call your code at the same exact time). Obviously, in real GPU programming, you'd typically be scheduling thousands if not millions of threads, not just 6. If you've been following along you should be thinking: "that is all fine and makes sense, but what can I do in the kernel since I passed nothing else meaningful to it, and it is not returning any values out to me?" Passing data in and out It is a good question, and in data parallel algorithms indeed you typically want to pass some data in, perform some operation, and then typically return some results out. The way you pass data into the kernel, is by capturing variables in the lambda (again, if you are not familiar with them, follow the links about C++ lambdas), and the way you use data after the kernel is done executing is simply by using those same variables. In the example above, the lambda was written in a fairly useless way with an empty capture list: [ ](index<2> idx) restrict(direct3d), where the empty square brackets means that no variables were captured. If instead I write it like this [&](index<2> idx) restrict(direct3d), then all variables in the some_code_A region are made available to the lambda by reference, but as soon as I try to use any of those variables in the lambda, I will receive a compiler error. This has to do with one of the direct3d restrictions, where only one type can be capture by reference: objects of the new concurrency::array class that I'll introduce in the next post (suffice for now to think of it as a container of data). If I write the lambda line like this [=](index<2> idx) restrict(direct3d), all variables in the some_code_A region are made available to the lambda by value. This works for some types (e.g. an integer), but not for all, as per the restrictions for direct3d. In particular, no useful data classes work except for one new type we introduce with C++ AMP: objects of the new concurrency::array_view class, that I'll introduce in the post after next. Also note that if you capture some variable by value, you could use it as input to your algorithm, but you wouldn’t be able to observe changes to it after the parallel_for_each call (e.g. in some_code_B region since it was passed by value) – the exception to this rule is the array_view since (as we'll see in a future post) it is a wrapper for data, not a container. Finally, for completeness, you can write your lambda, e.g. like this [av, &ar](index<2> idx) restrict(direct3d) where av is a variable of type array_view and ar is a variable of type array - the point being you can be very specific about what variables you capture and how. So it looks like from a large data perspective you can only capture array and array_view objects in the lambda (that is how you pass data to your kernel) and then use the many threads that call your code (each with a unique index) to perform some operation. You can also capture some limited types by value, as input only. When the last thread completes execution of your lambda, the data in the array_view or array are ready to be used in the some_code_B region. We'll talk more about all this in future posts… (a)synchronous Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality.   That's all for now, we'll revisit the parallel_for_each description, once we introduce properly array and array_view – coming next. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Medical Devices which supports Direct access through Bluetooth Low Energy [on hold]

    - by Suganthan
    I have went through this link and came to know that we can directly interact with BLE devices to read and write data, so I just want to know some medical device which supports direct access to third-party application (we can directly access the data from the medical device data). Is their any devices which supports direct access to the data Note: I already went through medical devices like Withings and Fitbit

    Read the article

  • Pay in the future should make you think in the present

    - by BuckWoody
    Distributed Computing - and more importantly “-as-a-Service” models of computing have a different cost model. This is something that sounds obvious on the surface but it’s often forgotten during the design and coding phase of a project. In on-premises computing, we’re used to purchasing a server and all of the hardware infrastructure and software licenses needed not only for one project, but several. This is an up-front or “sunk” cost that we consume by running code the organization needs to perform its function. Using a direct connection over wires you’ve already paid for, we don’t often have to think about bandwidth, hits on the data store or the amount of compute we use - we just know more is better. In a pay-as-you-go model, however, each of these architecture decisions has a potential cost impact. The amount of data you store, the number of times you access it, and the amount you send back all come with a charge. The offset is that you don’t buy anything at all up-front, so that sunk cost is freed up. And financial professionals know that money now is worth more than money later. Saving that up-front cost allows you to invest it in other things. It’s not just that you’re using things that now cost money - it’s that the design itself in distributed computing has a cost impact. That can be a really good thing, such as when you dynamically add capacity for paying customers. If you can tie back the cost of a series of clicks to what a user will pay to do so, you can set a profit margin that is easy to track. Here’s a case in point: Assume you are using a large instance in Windows Azure to compute some data that you retrieve from a SQL Azure database. If you don’t monitor the path of the application, you may not know what you are really using. Since you’re paying by the size of the instance, it’s best to maximize it all the time. Recently I evaluated just this situation, and found that downsizing the instance and adding another one where needed, adding a caching function to the application, moving part of the data into Windows Azure tables not only increased the speed of the application, but reduced the cost and more closely tied the cost to the profit. The key is this: from the very outset - the design - make sure you include metrics to measure for the cost/performance (sometimes these are the same) for your application. Windows Azure opens up awesome new ways of doing things, so make sure you study distributed systems architecture before you try and force in the application design you have on premises into your new application structure.

    Read the article

  • Pragmas and exceptions

    - by Darryl Gove
    The compiler pragmas: #pragma no_side_effect(routinename) #pragma does_not_write_global_data(routinename) #pragma does_not_read_global_data(routinename) are used to tell the compiler more about the routine being called, and enable it to do a better job of optimising around the routine. If a routine does not read global data, then global data does not need to be stored to memory before the call to the routine. If the routine does not write global data, then global data does not need to be reloaded after the call. The no side effect directive indicates that the routine does no I/O, does not read or write global data, and the result only depends on the input. However, these pragmas should not be used on routines that throw exceptions. The following example indicates the problem: #include <iostream extern "C" { int exceptional(int); #pragma no_side_effect(exceptional) } int exceptional(int a) { if (a==7) { throw 7; } else { return a+1; } } int a; int c=0; class myclass { public: int routine(); }; int myclass::routine() { for(a=0; a<1000; a++) { c=exceptional(c); } return 0; } int main() { myclass f; try { f.routine(); } catch(...) { std::cout << "Something happened" << a << c << std::endl; } } The routine "exceptional" is declared as having no side effects, however it can throw an exception. The no side effects directive enables the compiler to avoid storing global data back to memory, and retrieving it after the function call, so the loop containing the call to exceptional is quite tight: $ CC -O -S test.cpp ... .L77000061: /* 0x0014 38 */ call exceptional ! params = %o0 ! Result = %o0 /* 0x0018 36 */ add %i1,1,%i1 /* 0x001c */ cmp %i1,999 /* 0x0020 */ ble,pt %icc,.L77000061 /* 0x0024 */ nop However, when the program is run the result is incorrect: $ CC -O t.cpp $ ./a.out Something happend00 If the code had worked correctly, the output would have been "Something happened77" - the exception occurs on the seventh iteration. Yet, the current code produces a message that uses the original values for the variables 'a' and 'c'. The problem is that the exception handler reads global data, and due to the no side effects directive the compiler has not updated the global data before the function call. So these pragmas should not be used on routines that have the potential to throw exceptions.

    Read the article

  • What types of objects are useful in SQL CLR?

    - by Greg Low
    I've had a number of people over the years ask about whether or not a particular type of object is a good candidate for SQL CLR integration. The rules that I normally apply are as follows: Database Object Transact-SQL Managed Code Scalar UDF Generally poor performance Good option when limited or no data-access Table-valued UDF Good option if data-related Good option when limited or no data-access Stored Procedure Good option Good option when external access is required or limited data access DML...(read more)

    Read the article

  • Now Shipping! NetAdvantage for .NET 2010 Volume 3!

    The new NetAdvantage Ultimate includes all four Line of Business user interface control sets for ASP .NET, Windows Forms, WPF and Silverlight plus two advanced Data Visualization UI control sets for WPF and Silverlight. With six NetAdvantage products in one robust package, Infragistics® gives you hundreds of controls and infinite development possibilities. Unified XAML Product Strategy-Share Code, Get More Controls In the 10.3 release, Infragistics continues to deliver code parity between the XAML platforms, WPF and Silverlight. In the line of business toolsets, Infragistics introduces the new xamSchedule™, full-featured, Outlook® 2010-style schedule controls, and the new xamDataTree™, a data bound tree view that comfortably handles tens of thousands of tree nodes. Mimicking our Silverlight Drag and Drop Framework, the WPF Drag and Drop Framework CTP empowers you to add your own rich touches to your applications. Track Users' Behaviors New to all NetAdvantage Silverlight controls is the Infragistics Analytics Framework (IGAF), which empowers you to track user behavior in RIAs running on Silverlight 4. Building on the Microsoft® Silverlight Analytics Framework, with IGAF you can analyze the user's behaviors to ensure the experience you want to deliver. NetAdvantage for Windows Forms--New Office® 2010 Ribbon and Application Menu 2010 Create new experiences with Windows Forms. Now with Office 2010 styling, NetAdvantage for Windows Forms has new features such as Microsoft® Office 2010 ribbon and enhanced Infragistics.Excel to export the contents of the high performance WinGrid™ into Microsoft Excel® 2010. The new Windows Message Support enables Infragistics standalone editor controls to process numerous Windows® OS messages, allowing them to respond just like native controls to changes in the Windows environment. Create Faster Web 2.0 Experiences with NetAdvantage for ASP .NET Infragistics continues to push the envelope to deliver the fastest ASP .NET WebForms controls available on the market. Our lightning fast ASP .NET grids are now enhanced with XPS/PDF Exporting and Summary Rows. This release also includes support for jQuery Templating (as a CTP) within our WebDataGrid™ and WebDataTree™ controls allowing you to quickly cut down overall page size. Deliver Business Intelligence with Power, Flexibility and the Office 2010 Experience NetAdvantage for WPF Data Visualization and NetAdvantage for Silverlight Data Visualization help you deliver flexible, powerful and usable end user experiences in Business Intelligence applications. Both suites include the Pivot Grid that delivers the full power of online analytical processing (OLAP) to present multi-dimensional data, sliced and diced in cross-tabulated form for end users to drill down into, interact with and easily extract meaning from the data. Mapping Made Easy 10.3 marks the official release of the WPF Data Visualization xamMap™ control to map anything and everything from geographic to geo-spacial mapping data. Map layers allow you to add successive levels of detail, navigational panes for panning in all directions, color swatch panes that facilitate value scales like Choropleth shading, and scale panes allowing users to zoom-in and out. Both toolsets introduce the first of many relationship maps! With the xamOrgChart™ CTP you can map out organizational charts of up to 50K employees, competitive brackets (think World Cup) and any other relational, organizational map your application needs. http://www.infragistics.com span.fullpost {display:none;}

    Read the article

  • GlynnTucker.Cache

    - by csharp-source.net
    The GlynnTucker.Cache assembly provides a data structure for caching slow data retrievals, for example data retrieved from a database server over the network. Think of it as a Hashtable that can automatically expire its data after a set amount of time or a specified period of inactivity, on a per-object basis. It is written in C# and dual licensed under the GPL/MPL, it should work with any .NET language.

    Read the article

  • Undocumented Query Plans: The ANY Aggregate

    - by Paul White
    As usual, here’s a sample table: With some sample data: And an index that will be useful shortly: There’s a complete script to create the table and add the data at the end of this post.  There’s nothing special about the table or the data (except that I wanted to have some fun with values and data types). The Task We are asked to return distinct values of col1 and col2 , together with any one value from the thing column (it doesn’t matter which) per group.  One possible result set is shown...(read more)

    Read the article

  • Sync Google Calendar with SharePoint Calendar

    - by dataintegration
    The ADO.NET Providers for Google and SharePoint make it easy to retrieve and update data in both Google's web services and SharePoint. This article shows how the SQL interface to data makes it easy to build applications that need to move data from one source to another. The application described here is a demo Windows application that synchronizes calendar events between Google and SharePoint, but the RSSBus Providers can be used to achieve integrations on both the .NET and the Java platforms, including more sophisticated features like full automation. Getting the Events Step 1: Google accounts can have several calendars. Obtain a list of a user's Google Calendars by issuing a query to the Calendars table. For example: SELECT * FROM Calendars. Step 2: In order to get a list of the events from a given Google Calendar, issue a query to the CalendarEvents table while specifying the CalendarId from the Calendars table. The resulting events can be further filtered by using the StartDateTime or EndDateTime columns. For example: SELECT * FROM CalendarEvents WHERE (CalendarId = '[email protected]') AND (StartDateTime >= '1/1/2012') AND (StartDateTime <= '2/1/2012') Step 3: SharePoint stores data in Lists. There are various types of lists, e.g., document lists and calendar lists. A SharePoint account can have several lists of the same type. To find all the calendar lists in SharePoint, use the ListLists stored procedure and inspect the BaseTemplate column. Step 4: The SharePoint data provider models each SharPoint list as a table. Get the events in a particular calendar by querying the table with the same name as the list. The events may be filtered further by specifying the EventDate or EndDate columns. For example: SELECT * FROM Calendar WHERE (EventDate >= '1/1/2012') AND (EventDate <= '2/1/2012') Synchronizing the Events Synchronizing the events is a simple process. Once the events from Google and SharePoint are available they can be compared and synchronized based on user preference. The sample application does this based on user input, but it is easy to create one that does the synchronization automatically. The INSERT, UPDATE, and DELETE statements available in both data providers makes it easy to create, update, or delete events as needed. Pre-Built Demo Application The executable for the demo application can be downloaded here. Note that this demo is built using BETA builds of the ADO.NET Provider for Google V2 and ADO.NET Provider for SharePoint V2, and will expire in 2013. Source Code You can download the full source of the demo application here. You will need the Google ADO.NET Data Provider V2 and the SharePoint ADO.NET Data Provider V2, which can be obtained here.

    Read the article

  • Microsoft and Application Architectures

    Microsoft has dealt with several kinds of application architectures to include but not limited to desktop applications, web applications, operating systems, relational database systems, windows services, and web services. Because of the size and market share of Microsoft, virtually every modern language works with or around a Microsoft product. Some of the languages include: Visual Basic, VB.Net, C#, C++, C, ASP.net, ASP, HTML, CSS, JavaScript, Java and XML. From my experience, Microsoft strives to maintain an n-tier application standard where an application is comprised of multiple layers that perform specific functions, for example: presentation layer, business layer, data access layer are three general layers that just about every formally structured application contains. The presentation layer contains anything to do with displaying information to the screen and how it appears on the screen. The business layer is the middle man between the presentation layer and data access layer and transforms data from the data access layer in to useable information to be stored later or sent to an output device through the presentation layer. The data access layer does as its name implies, it allows the business layer to access data from a data source like MS SQL Server, XML, or another data source. One of my favorite technologies that Microsoft has come out with recently is the .Net Framework. This framework allows developers to code an application in multiple languages and compiles them in to one intermediate language called the Common Language Runtime (CLR). This allows VB and C# developers to work seamlessly together as if they were working in the same project. The only real disadvantage to using the .Net Framework is that it only natively runs on Microsoft operating systems. However, Microsoft does control a majority of the operating systems currently installed on modern computers and servers, especially with personal home computers. Given that the Microsoft .Net Framework is so flexible it is an ideal for business to develop applications around it as long as they wanted to commit to using Microsoft technologies and operating systems in the future. I have been a professional developer for about 9+ years now and have seen the .net framework work flawlessly in just about every instance I have used it. In addition, I have used it to develop web applications, mobile phone applications, desktop applications, web service applications, and windows service applications to name a few.

    Read the article

  • T-SQL Tuesday #006 Round-up!

    - by Mike C
    T-SQL Tuesday this month was all about LOB (large object) data. Thanks to all the great bloggers out there who participated! The participants this month posted some very impressive articles with information running the gamut from Reporting Services to SQL Server spatial data types to BLOB-handling in SSIS. One thing I noticed immediately was a trend toward articles about spatial data (SQL Server 2008 Geography and Geometry data types, a very fun topic to explore if you haven’t played around with...(read more)

    Read the article

  • Is Ubuntu MAAS free? Will it remain like that?

    - by Bruno Pereira
    Ubuntu MAAS, very cool, awesome in fact, looks like a unique tool for several jobs. It looks free, but part of its documentation starts already with clauses that would scare anyone with interest in it: Documentation is copy righted by Canonical; Documentation must be used only for non-commercial purposes; If documentation is distributed within the non-commercial clause you must retain copyright; It just sounds a lot for a guide on how to install MAAS + Juju + Openstack and that scares me a bit. Under what license is Ubuntu MAAS distributed and what would be the reasoning for being so worried about copyrighting a guide like that so heavily? Is Ubuntu MAAS free? Will it continue like that?

    Read the article

  • How to explain bad software to non-technical people?

    - by mtutty
    In discussing software development with non-technical people (customers, business owners, project sponsors, etc.), I often resort to analogies and metaphors. It's relatively easy and effective to use a "house" or other metaphor for describing the size and complexity of new development. However, we often inherit someone else's code or data, and this approach doesn't seem to hold up as well when trying to explain why we're gutting something that already seems to work. Of course we can point to cycle time and cost to be saved in the future but this generally means nothing to business folks. I know doctors can say "just take this pill," but I'm not sure that software devs have the same authority. Ideas? EDIT: Let me add a bit to the discussion. The specific project I'm talking about has customers that don't realize (or care) about specific aspects of the system we're retiring (i.e., they think it was just fine): The system would save a NEW RECORD every time someone updated a field The system contained tables for reference data. These tables had new records added every day, even though they were duplicates of previous records. And there was no way to tie the reference data used for a particular case at the time it was closed. This is like 99% of the data in the old system. The field NAMES also have spaces, apostrophes and other inappropriate characters in them, making everything harder to work with. In addition to the incredible amount of duplicate data, they have around 1000 XLS files with data they want added to the system. Previously, they would do a spreadsheet for each case in the database, IN ADDITION TO what they typed into the database. Getting rid of this old, unneeded information and piping in the XLS data comprises about 80% of the total project effort, and was not something we could accurately predict. I'm trying to find a concrete way to describe how bad this thing was, mostly so that the customer will understand why the migration process has been so time-consuming. The actual coding was done pretty quickly and the new system works fine, but without the old data they won't be happy. Sorry to get into the weeds, but most of the answers I've seen so far are pretty basic scope/schedule/cost things. I've been doing this for 15 years, so this really is more of a reflective, philosophical question - but without some of the details it can be difficult to really appreciate the awful beauty of this problem.

    Read the article

  • OpenWorld - Database Security Demonstrations in Moscone South Left

    - by Troy Kitch
    All this week, Oracle security experts will be giving live product demos of Oracle Database Security solutions in Moscone South Left, in the Oracle DEMOgrounds for "database." Demonstrations include Oracle Database Defense-in-Depth Security, Database Application Data Redaction, Transparent Data Encryption, Oracle Audit Vault and Database Firewall, Data Masking and Data Subsetting. Don't miss it!

    Read the article

  • Right-Time Retail Part 1

    - by David Dorf
    This is the first in a three-part series. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Right-Time Revolution Technology enables some amazing feats in retail. I can order flowers for my wife while flying 30,000 feet in the air. I can order my groceries in the subway and have them delivered later that day. I can even see how clothes look on me without setting foot in a store. Who knew that a TV, diamond necklace, or even a car would someday be as easy to purchase as a candy bar? Can technology make a mattress an impulse item? Wake-up and your back is hurting, so you rollover and grab your iPad, then a new mattress is delivered the next day. Behind the scenes the many processes are being choreographed to make the sale happen. This includes moving data between systems with the least amount for friction, which in some cases is near real-time. But real-time isn’t appropriate for all the integrations. Think about what a completely real-time retailer would look like. A consumer grabs toothpaste off the shelf, and all systems are immediately notified so that the backroom clerk comes running out and pushes the consumer aside so he can replace the toothpaste on the shelf. Such a system is not only cost prohibitive, but it’s also very inefficient and ineffectual. Retailers must balance the realities of people, processes, and systems to find the right speed of execution. That’ what “right-time retail” means. Retailers used to sell during the day and count the money and restock at night, but global expansion and the Web have complicated that simplistic viewpoint. Our 24hr society demands not only access but also speed, which constantly pushes the boundaries of our IT systems. In the last twenty years, there have been three major technology advancements that have moved us closer to real-time systems. Networking is the first technology that drove the real-time trend. As systems became connected, it became easier to move data between them. In retail we no longer had to mail the daily business report back to corporate each day as the dial-up modem could transfer the data. That was soon replaced with trickle-polling, when sale transactions were occasionally sent from stores to corporate throughout the day, often through VSAT. Then we got terrestrial networks like DSL and Ethernet that allowed the constant stream of data between stores and corporate. When corporate could see the sales transactions coming from stores, it could better plan for replenishment and promotions. That drove the need for speed into the supply chain and merchandising, but for many years those systems were stymied by the huge volumes of data. Nordstrom has 150 million SKU/Store combinations when planning (RPAS); The Gap generates 110 million price changes during end-of-season (RPM); Argos does 1.78 billion calculations executed each day for replenishment planning (AIP). These areas are now being alleviated by the second technology, storage. The typical laptop disk drive runs at 5,400rpm with PCs stepping up to 7,200rpm and servers hitting 15,000rpm. But the platters can only spin so fast, so to squeeze more performance we’ve had to rely on things like disk striping. Then solid state drives (SSDs) were introduced and prices continue to drop. (Augmenting your harddrive with a SSD is the single best PC upgrade these days.) RAM continues to be expensive, but compressing data in memory has allowed more efficient use. So a few years back, Oracle decided to build a box that incorporated all these advancements to move us closer to real-time. This family of products, often categorized as engineered systems, combines the hardware and software so that they work together to provide better performance. How much better? If Exadata powered a 747, you’d go from New York to Paris in 42 minutes, and it would carry 5,000 passengers. If Exadata powered baseball, games would last only 18 minutes and Boston’s Fenway would hold 370,000 fans. The Exa-family enables processing more data in less time. So with faster networks and storage, that brings us to the third and final ingredient. If we continue to process data in traditional ways, we won’t be able to take advantage of the faster networks and storage. Enter what Harvard calls “The Sexiest Job of the 21st Century” – the data scientist. New technologies like the Hadoop-powered Oracle Big Data Appliance, Oracle Advanced Analytics, and Oracle Endeca Information Discovery change the way in which we organize data. These technologies allow us to extract actionable information from raw data at incredible speeds, often ad-hoc. So the foundation to support the real-time enterprise exists, but how does a retailer begin to take advantage? The most visible way is through real-time marketing, but I’ll save that for part 3 and instead begin with improved integrations for the assets you already have in part 2.

    Read the article

  • Nails vs Screws (C# List vs Dictionary)

    - by MarkPearl
    General This may sound like a typical noob statement, but I’m finding out in a very real way that just because you have a solution to a problem, doesn’t necessarily mean it is the best solution. This was reiterated to me when a friend of mine suggested I look at using Dictionaries instead of Lists for a particular problem – he was right, I have always just assumed that because lists solved my problem I did not need to look elsewhere. So my new manifesto to counter this ageless problem is as follows… Look for a solution that will logically work Once you have a solution look for possible alternatives Decide why your current solution is the best approach compared to the alternatives If it is.. use it till something better comes along, if it isnt…. change What’s the difference between Lists & Dictionaries Both lists and dictionaries are used to store collections of data. Assume we had the following declarations… var dic = new Dictionary<string, long>(); var lst = new List<long>(); long data;   With a list, you simply add the item to the list and it will add the item to the end of the list. lst.Add(data); With a dictionary, you need to specify some sort of key and the data you want to add so that it can be uniquely identified. dic.Add(uniquekey, data);   Because with a dictionary you now have unique identifier, in the background they provide all sort’s of optimized algorithms to find your associated data. What this means is that if you are wanting to access your data it is a lot faster than a List. So when is it appropriate to use either class? For me, if I can guarantee that each item in my collection will have a unique identifier, then I will use Dictionaries instead of Lists as there is a considerable performance benefit when accessing each data item. If I cannot make this sort of guarantee, then by default I will use a list. I know this is all really basic, and I hope I haven’t missed some fundamental principle… If anyone would like to add their 2 cents, please feel free to do so…

    Read the article

  • M2M Solutions: The Move to Value Creation and the Internet of Things

    - by Javier Puerta
    There's a new Oracle-sponsored report available around big data, specifically machine to machine data (there will probably be more growth in m2m data than human-generated stuff like social media). Forbes published an article, Big Data Set to Explode as 40 Billion New Devices Connect to Internet, which references the report. Login to Download the M2M Solutions Report Good reading!

    Read the article

  • Batch change modified/created dates?

    - by Billiam
    I recently bought new hard drives for my NAS. This means that I'm copying all the data off the NAS, upgrading it, and then moving the data back. I've gotten as far as copying the data from the NAS, but every file's modified/created date has been changed to when it was copied (today). Is there a way, keeping in mind that I have the original data, to batch update the modified/created dates on the copied files without having to copy them over again (we're talking over a terabyte of data)?

    Read the article

  • Where ORMs blur the lines between code and data, how do you decide what logic should be a stored procedure, and what should be coded?

    - by PhonicUK
    Take the following pseudocode: CreateInvoiceAndCalculate(ItemsAndQuantities, DispatchAddress, User); And say CreateInvoice does the following: Create a new entry in an Invoices table belonging to the specified User to be sent to the given DispatchAddress. Create a new entry in an InvoiceItems table for each of the items in ItemsAndQuantities, storing the Item, the Quantity, and the cost of the item as of now (by looking it up from an Items table) Calculate the total amount of the invoice (ex shipping and taxes) and store it in the new Invoice row. At a glace you wouldn't be able to tell if this was a method in my applications code, or a stored procedure in the database that is being exposed as a function by the ORM. And to some extent it doesn't really matter. Now technically none of this is business logic. You're not making any decisions - just performing a calculation and creating records. However some may argue that because you are performing a calculation that affects the business (the total amount to be invoiced) that this isn't something that should be done in a stored procedure and instead should be in code. So for this specific example - why would it be more appropriate to do one or the other? And where do you draw the line? Or does it even particular matter as long as it's sufficiently well documented?

    Read the article

  • Basic memcached question

    - by Aadith
    I have been reading up on distributed hashing. I learnt that consistent hashing is used for distributing the keys among cache machines. I also learnt that, a key is duplicated on mutiple caches to handle failure of cache hosts. But what I have come across on memcached doesn't seem to be in alignment with all this. I read that all cache nodes are independent of each other and that if a cache goes down, requests go to DB. Theres no mention of cache miss on a host resulting in the host directing the request to another host which could either be holding the key or is nearer to the key. Can you please tell me how these two fit together? Is memcached a very preliminary form of distributed hashing which doesnt have much sophistication?

    Read the article

  • Hadoop:Only master node does the work

    - by user287722
    I've setup a Hadoop 2.2 cluster with 1 master node(namenode and secondary namenode) and 3 slave nodes(datanode and namenode on each one).All of the machines use Linux Mint 64bit. When I run my MapReduce program, writen in Java, I can only see that master node is using extra CPU and RAM. Slave nodes are not doing a thing. I've checked the logs from all of the namenodes and there is nothing wrong with the namenodes on slave nodes. Resource Manager is running and all of the slave nodes can see the Resource Manager. I used this http://n0where.net/hadoop-2-2-multi-node-cluster-setup/ tutorial to configure my nodes. Datanodes are working in terms of distributed data storing but I can't see any indication of distributed data processing. Do I have to configure the xml configuration files in some other way so all of the machines will process data while I'm running my MapReduce Job?

    Read the article

  • Raid superblock missing on single parition. Recovery needed!

    - by user171639
    Ok so I have a 2 TB raid 1 setup that has three partitions: sdc1: linux sdc2: swap sdc3: LVM for data However the LVM will no longer mount. So I thought that I would take the first drive, mount it in linux (ive done this b4), and reset the spare drive to copy the data. Normally I can mount a single drive for data recovery using: sudo su apt-get install mdadm lvm2 mdadm --assemble --scan modprobe dm-mod vgscan vgchange -ay c mount -o ro /dev/c/c /mnt Unfortunately, vgscan doesnot recognize the data partition. It appears as though the superblock on the first drive's data partition was erased while syncing with the second. So now I cannot mount that partition and the second drive is stuck in spare mode. Any ideas? Or a way to force mount the data partition just to copy the data? knoppix@Microknoppix:~$ sudo su root@Microknoppix:/home/knoppix# apt-get install mdadm lvm2 Reading package lists... Done Building dependency tree Reading state information... Done lvm2 is already the newest version. mdadm is already the newest version. 0 upgraded, 0 newly installed, 0 to remove and 551 not upgraded. root@Microknoppix:/home/knoppix# mdadm --assemble --scan mdadm: /dev/md/1 has been started with 1 drive (out of 2). mdadm: /dev/md/0 has been started with 1 drive (out of 2). root@Microknoppix:/home/knoppix# modprobe dm-mod root@Microknoppix:/home/knoppix# vgscan Reading all physical volumes. This may take a while... No volume groups found root@Microknoppix:/home/knoppix# cat /proc/mdstat Personalities : [raid1] md0 : active raid1 sdc1[2] 4193268 blocks super 1.2 [2/1] [U_] md1 : active raid1 sdc2[2] 524276 blocks super 1.2 [2/1] [U_] unused devices: <none> root@Microknoppix:/home/knoppix# mdadm -v --assemble --auto=yes /dev/md2 /dev/sdc3 mdadm: looking for devices for /dev/md2 mdadm: no recogniseable superblock on /dev/sdc3 mdadm: /dev/sdc3 has no superblock - assembly aborted root@Microknoppix:/home/knoppix# dumpe2fs /dev/md0 | grep -i superblock dumpe2fs 1.42.4 (12-Jun-2012) Primary superblock at 0, Group descriptors at 1-1 Backup superblock at 32768, Group descriptors at 32769-32769 Backup superblock at 98304, Group descriptors at 98305-98305 Backup superblock at 163840, Group descriptors at 163841-163841 Backup superblock at 229376, Group descriptors at 229377-229377 Backup superblock at 294912, Group descriptors at 294913-294913 Backup superblock at 819200, Group descriptors at 819201-819201 Backup superblock at 884736, Group descriptors at 884737-884737 root@Microknoppix:/home/knoppix# Notes: I can read the super block from the spare drive. I was gonna try and restore the superblock from one of the backups, but i dont know how or if this would work. I also heard creating a new array (mdadm --create) using the same parameters will not delete the data on the drive but i didnt want to risk it. Recommendations?

    Read the article

< Previous Page | 540 541 542 543 544 545 546 547 548 549 550 551  | Next Page >