Search Results

Search found 27 results on 2 pages for 'bigdata'.

Page 1/2 | 1 2  | Next Page >

  • BigData and Customer Experience: Happy Together

    - by Isabel F. Peñuelas
    The two big buzzes of the year may lay closer than it appears. Both concepts intersect at various points: BigData and Return of Investment of Marketing Campaigns On a recent post Big Data Is The Future Of Marketing Jeff Dachis explains very clearly how “Big data analytics finally allows marketers to identify, measure, and manage what is positively impacting their Brand”. Regression analysis applied to big data volumes coming from social media will substitute the failed attempts to justify marketing investments on social media in terms of followers and likes, he continues, “the measurement models applied by marketers on TV Campaigns don´t work on social”, we need to study the data with fresh eyes and maybe then we will start understanding and measuring brand engagemet. Social CRM and BigData The real value of Social CRM start by analyzing mass of big data from social media in order of applying social intelligence techniques that allow us to classify new customer niches and communities and define appropriated strategies to contact potential customers. Gartner Says that the Market for Social CRM is on pace to surpass $1 Billion in Revenue by Year-End 2012 but in words of Zach Hofer-Shall, Analyst at Forrester Research “Social customer relationship management is hard” (The Social CRM Arms Race Heats ). To succeed brands need three things: Investing in new social tools, investing in consultancy and investing in infrastructure for massive data storage and analysis. Neither CeX or BigData are easy and cheap wins. But what are the customer benefits of such investments? Big Data and Brand Engagement Time is the most valuable asset of todays consumers: tired of information overload, exhausted by the terabytes of offering, anxious because of not having the same fast multichannel experience with their services’ marketers or preferred goods providers than the one they found on their social media. Yes, I know you have read this before- me too. But is real. The motto of the Customer Experience philosophy of providing a consistent experience through multiple touchpoints that makes the relationship customer/brand easier and valuable finds it basis on understanding customer/s preferences and context for which BigData analysis is another imperative. In summary, I believe that using BigData Analysis in combination with appropriated CeX strategies and technologies is a promising direction for achieving: efficiency and marketing cost-savings; growing the customer base; and increasing customer conversion and retention. In a world: The Direction of Future Marketing.

    Read the article

  • How can I get started with BigData?

    - by ????? ????????
    I have a programming background, and I've done lots of database design and written lots of queries with Sql Server. I am really excited about looking at bigdata solutions. I know almost nothing about it. The way I want to learn is to sign up for a sandbox where I can try things out. questions Is there a sandbox where I can play around with hadoop? It does not have to be free. Would amazon EMR be the right path to go? What technologies should I be looking at to get started quickly? Is there a 'bigdata' dataset that is available to play with? Thank you so much for your guidance.

    Read the article

  • Agile Data Book from O'Reilly Media

    - by Compudicted
    Originally posted on: http://geekswithblogs.net/Compudicted/archive/2013/07/01/153309.aspxAs part of my ongoing self-education and approaching of some free time, yeah, both is a must for every IT person and geek! I have carefully examined the latest trends in the Computersphere with whatever tools I had at my disposal (nothing really fancy was used) and came to a conclusion that for a database pro the *hottest* topic today is undoubtedly the #BigData and all the rapidly growing and spawning ecosystem around it. Having recently immersed myself into the NoSQL world (let me tell here right away NoSQL means Not Only SQL) one book really stood out of the crowd: Book site: http://shop.oreilly.com/product/0636920025054.doDespite being a new book I am sure it will end up on the tables of many Big Data Generalists.In a few dozen words, it is primarily for two reasons:1) The author understands that a  typical business today cannot wait for a Data Scientist for too long to deliver results demanding as usual a very quick turnaround on investments (ROI), and 2) The book covers all the needed and proven modern brick and mortar offerings to get the job done by a relatively newcomer to the Big Data World.It certainly enables such a professional to grow and expand based on the acquired knowledge, and one can truly do it very fast.

    Read the article

  • L'EPITA annonce la 5ème édition de son SRS Day, une journée d'échanges sous le signe de la sécurité, du Cloud et du BigData

    L'EPITA annonce la 5ème édition de son SRS Day Une journée Système Réseaux et Sécurité sous le signe du Cloud, du BigData et de la consumérisation de l'IT Pour sa 5e édition, l'EPITA, (l'école des ingénieurs du numérique, membre de IONIS Education Group) organise le SRS Day en partenariat avec le cabinet de conseil en management et système d'information Solucom. Ce sont 6 groupes de travail composés d'étudiants de l'EPITA, en dernière année de spécialisation systèmes, réseaux et sécurité, encadrés chacun par un expert de Solucom, qui feront lors des conférences du SRS Day un état de l'art sur des domaines parmi les plus sensibles. Cette démarche s'inscrit dans la volonté ...

    Read the article

  • High PageIOLatch_SH Waits with High Drive Idle times

    - by Marty Trenouth
    We are experiencing high volume of PageIOLatch_SH waits on our database (row counts in the Billions). However it seems that our drive Idle time Percentage hovers around 50-60 percent. CPU usage is nill. The Database Tuning Advisor gives no suggestions for optimization. The query plan (actual) from the single stored procedure used on the database puts the majority of the expense on index seek (yeah I know these should be optimial) operations. Anyone have suggestions of how to increase throughput?

    Read the article

  • Low cost way to host a large table yet keep the performance scalable?

    - by Leo Liang
    I have a growing table storing time series data, 500M entries now, and 200K new records every day. The total size is around 15GB for now. My clients are querying the table via a PHP script mostly, and the size of the result set is around 10K records (not very large). select * from T where timestamp > X and timestamp < Y and additionFilters And I want this operation cheap. Currently my table is hosting in Postgres 7, on a single 16G memory Box, and I would love to see some good suggestion for me to host this in low cost and also allow me to scale up for performance if needed. The table serves: 1. Query: 90% 2. Insert: 9.9% 2. Update: 0.1% <-- very rare.

    Read the article

  • Display another field in the referenced table for multiple columns with performance issues in mind

    - by israkir
    I have a table of edge like this: ------------------------------- | id | arg1 | relation | arg2 | ------------------------------- | 1 | 1 | 3 | 4 | ------------------------------- | 2 | 2 | 6 | 5 | ------------------------------- where arg1, relation and arg2 reference to the ids of objects in another object table: -------------------- | id | object_name | -------------------- | 1 | book | -------------------- | 2 | pen | -------------------- | 3 | on | -------------------- | 4 | table | -------------------- | 5 | bag | -------------------- | 6 | in | -------------------- What I want to do is that, considering performance issues (a very big table more than 50 million of entries) display the object_name for each edge entry rather than id such as: --------------------------- | arg1 | relation | arg2 | --------------------------- | book | on | table | --------------------------- | pen | in | bag | --------------------------- What is the best select query to do this? Also, I am open to suggestions for optimizing the query - adding more index on the tables etc... EDIT: Based on the comments below: 1) @Craig Ringer: PostgreSQL version: 8.4.13 and only index is id for both tables. 2) @andrefsp: edge is almost x2 times bigger than object.

    Read the article

  • hadoop - large database query

    - by Mastergeek
    Situation: I have a Postgres DB that contains a table with several million rows and I'm trying to query all of those rows for a MapReduce job. From the research I've done on DBInputFormat, Hadoop might try and use the same query again for a new mapper and since these queries take a considerable amount of time I'd like to prevent this in one of two ways that I've thought up: 1) Limit the job to only run 1 mapper that queries the whole table and call it good. or 2) Somehow incorporate an offset in the query so that if Hadoop does try to use a new mapper it won't grab the same stuff. I feel like option (1) seems more promising, but I don't know if such a configuration is possible. Option(2) sounds nice in theory but I have no idea how I would keep track of the mappers being made and if it is at all possible to detect that and reconfigure. Help is appreciated and I'm namely looking for a way to pull all of the DB table data and not have several of the same query running because that would be a waste of time.

    Read the article

  • Using the link command to keep backups on another drive

    - by Xavier
    I have a folder that contains a not so large amount of space called /data/backup. I have been told that if I link that folder (/data/backup) to an even bigger folder area like /bigdata/backup for example, that I will be able to execute backups to the /data/backup folder. It will then just create a link, but the data will be seen in both folders and the latter one (/bigdata/backup) will contain the backup results but it will show on both folders. Since the /bigdata/backup has far more disk space then the backup will no longer fail because of space problems in the /data/backup one. Is this true?

    Read the article

  • Linux using the link command

    - by Xavier
    Here it goes. I have a folder that contains a not so large amount of space called /data/backup but I have been told that if I link that folder (/data/backup) to an even bigger folder area like /bigdata/backup for example, that I will be able to execute backups to the /data/backup folder because it will be just a link but the data will be seen in both folders and the latter one (/bigdata/backup) will contain the backup results but it will show on both folders and since the /bigdata/backup has far more disk space then the backup will no longer fail because of space problems in the /data/backup one. Is this true? Thanks Xav

    Read the article

  • why are transaction monitors on decline? or are they?

    - by mrkafk
    http://www.itjobswatch.co.uk/jobs/uk/cics.do http://www.itjobswatch.co.uk/jobs/uk/tuxedo.do Look at the demand for programmers (% of job ads that the keyword appears), first graph under the table. It seems like demand for CICS, Tuxedo has fallen from 2.5%/1% respectively to almost zero. To me, it seems bizarre: now we have more networked and internet enabled machines than ever before. And most of them are talking to some kind of database. So it would seem that use of products whose developers spent last 20-30 years working on distributing and coordinating and optimizing transactions should be on the rise. And it appears they're not. I can see a few causes but can't tell whether they are true: we forgot that concurrency and distribution are really hard, and redoing it all by ourselves, in Java, badly. Erlang killed them all. Projects nowadays have changed character, like most business software has already been built and we're all doing internet services, using stuff like Node.js, Erlang, Haskell. (I've used RabbitMQ which is written in Erlang, "but it was small specialized side project" kind of thing). BigData is the emphasis now and BigData doesn't need transactions very much (?). None of those explanations seem particularly convincing to me, which is why I'm looking for better one. Anyone?

    Read the article

  • prerequisites of learnig hadoop, can php developer learn hadoop without java experience [closed]

    - by Rishabh Mathur
    i am willing to learn hadoop as a Developer , but i am confused over the prerequisite of learning it.? is having a good experience in java programming very essential to learn hadoop? I have 4 years of experience in application development in LAMP. But i am not in touch with java programming as a part of my regular work.My objective to get into hadoop is to increase my knowledge in bigdata analysis as well as to get an oppurtunity in this domain. Any suggestions?

    Read the article

  • PASS Summit 2012: keynote and Mobile BI announcements #sqlpass

    - by Marco Russo (SQLBI)
    Today at PASS Summit 2012 there have been several announcements during the keynote. Moreover, other news have not been highlighted in the keynote but are equally if not more important for the BI community. Let’s start from the big news in the keynote (other details on SQL Server Blog): Hekaton: this is the codename for in-memory OLTP technology that will appear (I suppose) in the next release of the SQL Server relational engine. The improvement in performance and scalability is impressive and it enables new scenarios. I’m curious to see whether it can be used also to improve ETL performance and how it differs from using SSD technology. Updates on Columnstore: In the next major release of SQL Server the columnstore indexes will be updatable and it will be possible to create a clustered index with Columnstore index. This is really a great news for near real-time reporting needs! Polybase: in 2013 it will debut SQL Server 2012 Parallel Data Warehouse (PDW), which will include the Polybase technology. By using Polybase a single T-SQL query will run queries across relational data and Hadoop data. A single query language for both. Sounds really interesting for using BigData in a more integrated way with existing relational databases. And, of course, to load a data warehouse using BigData, which is the ultimate goal that we all BI Pro have, right? SQL Server 2012 SP1: the Service Pack 1 for SQL Server 2012 is available now and it enable the use of PowerPivot for SharePoint and Power View on a SharePoint 2013 installation with Excel 2013. Power View works with Multidimensional cube: the long-awaited feature of being able to use PowerPivot with Multidimensional cubes has been shown by Amir Netz in an amazing demonstration during the keynote. The interesting thing is that the data model behind was based on a many-to-many relationship (something that is not fully supported by Power View with Tabular models). Another interesting aspect is that it is Analysis Services 2012 that supports DAX queries run on a Multidimensional model, enabling the use of any future tool generating DAX queries on top of a Multidimensional model. There are still no info about availability by now, but this is *not* included in SQL Server 2012 SP1. So what about Mobile BI? Well, even if not announced during the keynote, there is a dedicated session on this topic and there are very important news in this area: iOS, Android and Microsoft mobile platforms: the commitment is to get data exploration and visualization capabilities working within June 2013. This should impact at least Power View and SharePoint/Excel Services. This is the type of UI experience we are all waiting for, in order to satisfy the requests coming from users and customers. The important news here is that native applications will be available for both iOS and Windows 8 so it seems that Android will be supported initially only through the web. Unfortunately we haven’t seen any demo, so it’s not clear what will be the offline navigation experience (and whether there will be one). But at least we know that Microsoft is working on native applications in this area. I’m not too surprised that HTML5 is not the magic bullet for all the platforms. The next PASS Business Analytics conference in 2013 seems a good place to see this in action, even if I hope we don’t have to wait other six months before seeing some demo of native BI applications on mobile platforms! Viewing Reporting Services reports on iPad is supported starting with SQL Server 2012 SP1, which has been released today. This is another good reason to install SP1 on SQL Server 2012. If you are at PASS Summit 2012, come and join me, Alberto Ferrari and Chris Webb at our book signing event tomorrow, Thursday 8 2012, at the bookstore between 12:00pm and 12:30pm, or follow one of our sessions!

    Read the article

  • Why Oracle Data Integrator for Big Data?

    - by Mala Narasimharajan
    Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well.  The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records.  These are just a few examples of what constitutes big data.   Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness.  The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory.  Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop.  Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations.  The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development.  Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

    Read the article

  • implementing dynamic query handler on historical data

    - by user2390183
    EDIT : Refined question to focus on the core issue Context: I have historical data about property (house) sales collected from various sources in a centralized/cloud data source (assume info collection is handled by a third party) Planning to develop an application to query and retrieve data from this centralized data source Example Queries: Simple : for given XYZ post code, what is average house price for 3 bed room house? Complex: What is estimated price for an house at "DD,Some Street,XYZ Post Code" (worked out from average values of historic data filtered by various characteristics of the house: house post code, no of bed rooms, total area, and other deeper insights like house building type, year of built, features)? In addition to average price, the application should support other property info ** maximum, or minimum price..etc and trend (graph) on a selected property attribute over a period of time**. Hence, the queries should not enforce the search based on a primary key or few fixed fields In other words, queries can be What is the change in 3 Bed Room house price (irrespective of location) over last 30 days? What kind of properties we can get for X price (irrespective of location or house type) The challenge I have is identifying the domain (BI/ Data Analytical or DB Design or DB Query Interface or DW related or something else) this problem (dynamic query on historic data) belong to, so that I can do further exploration My findings so far I could be wrong on the following, so please correct me if you think so I briefly read about BI/Data Analytics - I think it is heavy weight solution for my problem and has scalability issues. DB Design - As I understand RDBMS works well if you know Data model at design time. I am expecting attributes about property or other entity (user) that am going to bring in, would evolve quickly. hence maintenance would be an issue. As I am going to have multiple users executing query at same time, performance would be a bottleneck Other options like Graph DB (http://www.tinkerpop.com/) seems to be bit complex (they are good. but using those tools meant for generic purpose, make me think like assembly programming to solve my problem ) BigData related solution are to analyse data from multiple unrelated domains So, Any suggestion on the space this problem fit in ? (Especially if you have design/implementation experience of back-end for property listing or similar portals)

    Read the article

  • Big Data – Buzz Words: What is NoSQL – Day 5 of 21

    - by Pinal Dave
    In yesterday’s blog post we explored the basic architecture of Big Data . In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – NoSQL. What is NoSQL? NoSQL stands for Not Relational SQL or Not Only SQL. Lots of people think that NoSQL means there is No SQL, which is not true – they both sound same but the meaning is totally different. NoSQL does use SQL but it uses more than SQL to achieve its goal. As per Wikipedia’s NoSQL Database Definition – “A NoSQL database provides a mechanism for storage and retrieval of data that uses looser consistency models than traditional relational databases.“ Why use NoSQL? A traditional relation database usually deals with predictable structured data. Whereas as the world has moved forward with unstructured data we often see the limitations of the traditional relational database in dealing with them. For example, nowadays we have data in format of SMS, wave files, photos and video format. It is a bit difficult to manage them by using a traditional relational database. I often see people using BLOB filed to store such a data. BLOB can store the data but when we have to retrieve them or even process them the same BLOB is extremely slow in processing the unstructured data. A NoSQL database is the type of database that can handle unstructured, unorganized and unpredictable data that our business needs it. Along with the support to unstructured data, the other advantage of NoSQL Database is high performance and high availability. Eventual Consistency Additionally to note that NoSQL Database may not provided 100% ACID (Atomicity, Consistency, Isolation, Durability) compliance.  Though, NoSQL Database does not support ACID they provide eventual consistency. That means over the long period of time all updates can be expected to propagate eventually through the system and data will be consistent. Taxonomy Taxonomy is the practice of classification of things or concepts and the principles. The NoSQL taxonomy supports column store, document store, key-value stores, and graph databases. We will discuss the taxonomy in detail in later blog posts. Here are few of the examples of the each of the No SQL Category. Column: Hbase, Cassandra, Accumulo Document: MongoDB, Couchbase, Raven Key-value : Dynamo, Riak, Azure, Redis, Cache, GT.m Graph: Neo4J, Allegro, Virtuoso, Bigdata As of now there are over 150 NoSQL Database and you can read everything about them in this single link. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – Hadoop. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • The Developer's Conference Florianópolis, Brazil

    - by Tori Wieldt
    by guest blogger Yara Senger With over 2900 developers in person and another 2000 online, The Developer's Conference (TDC) in Florianópolis, Brazil, reminds us that Java is BIG in Brazil. The conference included 20 different tracks, and Java was the most popular track. Java was also a big part of the talks in the IoT, Cloud and BigData tracks. Here's my overview (in Brazilian Portguese): Several JUGs were involved in TDC Florianópolis, serving as track leads, speakers and all-around heros, including SouJava SouJava Campinas GUJava Santa Catarina JUG Vale JUG Maringá Java Bahia GOJava (Goinia) JUG Rio do Sul RS Jug (Rio Grande do Sul) and I thank them for their support and commitment. It is a vibrant and fun community! We saw that the IoT space is maturing rapidly. There are already some related to embedded in the region.  Java Evangelist Bruno Borges and Marco Antonio Maciel gave a view popular talk "Java: Tweet for Beer!" They demonstrated how to make a beer tap controlled by Java and connected to the Internet, using a visual application JavaFX with Java SE 8, running on a Rasperry Pi. Of course, they had to test the application quite throughly.   We Brazilians are training the next generation of Java developers. TDC4Kids was as big success. We made a tour with the kids in all booths and almost everybody talked about Java. Java in government managment (Betha), Java on the 2048  (Oracle), Java on the popcorn machine and Java training (Globalcode & V.Office) and of course: Java & Minecraft! OTN's Pablo Ciccarello was there to support the community.  He did several video interviews with JUG leaders and speakers (mine included). You can watch more videos on his TDC Florianópolis playlist.  Thank you, Oracle and OTN for all your support. We interacted with thousands of Java developers at The Developer's Conference Florianópolis. If you want to join us, we are planning two more conferences this year: The Developer's Conference São Paulo, July  The Developer's Conference Porto Alegre, October 

    Read the article

  • 24HOP gets off to a good start

    - by Rob Farley
    Session 11 is on as I write this – Ami Levin presenting about Primary Keys. It’s a good session. But actually, they’ve all been excellent so far, not just Ami’s. I’ve heard only good things about the content. So if you’re reading this and 24HOP is still on, then tune in and take part. If it’s finished, get yourself over to http://sqlpass.org/24hours and see if the sessions have been made available on-demand. Yes – you should be able to watch the sessions when you want to for a year. Watching live is best, because you can ask questions and have them answered during the session, but if there are ones you just couldn’t make, then watching them on-demand is a good option. Numbers have been “not bad”. At the moment it’s still the middle of the night for most Americans – about 6:30am in New York, and yet we’ve had well over a hundred at all the sessions so far, getting up to well over 300 for some sessions. And when I look through the list of names, I see a bunch of names that suggest we’re reaching people from all around the world. I’m seriously looking forward to seeing the stats about which countries have been represented in the audiences. There have been a few comments about the platform. Everyone seems to consider IBTalk an improvement on LiveMeeting, but the closed captioning has met a mixed reception. Some people are loving it, whereas other people are finding the translations leave quite a bit of space for improvement. If you have feedback on this, please feel free to drop me an email (my name with an underscore at hotmail.com, or with a dot at sqlpass.org should reach me just fine, or Twitter, etc). I don’t know how many of the sessions I’ll get to watch overnight – but I’m looking forward to seeing how things go as the day progresses. Big thanks to everyone who’s involved – the sponsors, PASS HQ team and the IBTalk folk who have stayed up overnight to facilitate, plus the moderators, the people doing the live captioning, and of course the speakers and attendees. I love how the SQL Community gets behind things like this. Earlier, the Adelaide SQL Server User Group gathered and watched Denny Lee’s session on BigData, and everyone in the group agreed that it worked really well. I took a picture of our cinema room, although you could only see a small section of the audience. @rob_farley

    Read the article

  • Oracle ENDECA Discovery 3.1 Partner Training 3-Day Workshop

    - by Mike.Hallett(at)Oracle-BI&EPM
    Normal 0 false false false EN-GB X-NONE X-NONE MicrosoftInternetExplorer4 To find out more about the ENDECA training, and to Register for this, click here. June 24-26, 2014: Oracle Reading, UK – Free to partners in EMEA. FREE of charge to OPN member Partners, this Oracle Endeca Information Discovery (OEID) 3-day bootcamp is designed to give partners an understanding of OEID’s features, and how it complements the existing Oracle Business Intelligence suite. This workshop will provide hands-on experience with Oracle Endeca Information Discovery. Topics covered will include Data Exploration with Endeca Information Discovery, Data Ingest, Project Lifecycle, Building an Endeca Server data model and advanced modeling techniques, and Working with Studio. You will also learn about working with ETL components for content acquisitions and other aspects of the project such as security. After taking this course, you will be well prepared to architect, build, demo, and implement an end-to-end Endeca Information Discovery solution. If you are a Bigdata Analytics Architect or Developer, BI or Data Warehouse Architect, developer or consultant, you don’t want to miss this 3-day workshop. Click here to Register for this. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

    Read the article

  • Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing

    - by Compudicted
    Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/06/01/building-a-data-mart-with-pentaho-data-integration-video-review.aspx The Building a Data Mart with Pentaho Data Integration Video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration, it also implements and uses the principals of the Data Warehousing (and I even heard the name of Ralph Kimball in the video). Indeed, a video watcher should be familiar with its concepts as the Star Schema, Slowly Changing Dimension types, etc. so I suggest prior to watching this course to consider skimming through the Data Warehouse concepts (if unfamiliar) or even better, read the excellent Ralph’s The Data Warehouse Tooolkit. By the way, the author expands beyond using Pentaho along to MySQL and MonetDB which is a real icing on the cake! Indeed, I even suggest the name of the course should be ‘Building a Data Warehouse with Pentaho’. To successfully complete the course one needs to know some Linux (Ubuntu used in the course), the VI editor and the Bash command shell, but it seems that similar requirements would also apply to the Weindows OS. Additionally, knowing some basic SQL would not hurt. As I had said, MonetDB is used in this course several times which seems to be not anymore complex than say MySQL, but based on what I read is very well suited for fast querying big volumes of data thanks to having a columnstore (vertical data storage). I don’t see what else can be a barrier, the material is very digestible. On this note, I must add that the author does not cover how to acquire the software, so here is what I found may help: Pentaho: the free Community Edition must be more than anyone needs to learn it. Or even go into a POC. MonetDB can be downloaded (exists for both, Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left). The author seems to be using Eclipse to run SQL code, one can get it from http://goo.gl/5CcuN. To create, or edit database entities and/or schema otherwise one can use a universal tool called SQuirreL, get it from http://squirrel-sql.sourceforge.net.   Next, I must confess Diethard is very knowledgeable in what he does and beyond. However, there will be some accent heard to the user of the course especially if one’s mother tongue language is English, but it I got over it in a few chapters. I liked the rate at which the material is being presented, it makes me feel I paid for every second Eventually, my impressions are: Pentaho is an awesome ETL offering, it is worth learning it very much (I am an ETL fan and a heavy user of SSIS) MonetDB is nice, it tickles my fancy to know it more Data Warehousing, despite all the BigData tool offerings (Hive, Scoop, Pig on Hadoop), using the traditional tools still rocks Chapters 2 to 6 were the most fun to me with chapter 8 being the most difficult.   In terms of closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quick, likewise, the course is very well suited for any developer on a “supposed to be done yesterday” type of a project. It is for a beginner to intermediate level ETL/DW developer. But one would need to learn more on Data Warehousing and Pentaho, for such I recommend the 5 star Pentaho Data Integration 4 Cookbook. Enjoy it! Disclaimer: I received this video from the publisher for the purpose of a public review.

    Read the article

  • Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing

    - by Compudicted
    Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/06/01/building-a-data-mart-with-pentaho-data-integration-video-review-again.aspx The Building a Data Mart with Pentaho Data Integration Video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration, it also implements and uses the principals of the Data Warehousing (and I even heard the name of Ralph Kimball in the video). Indeed, a video watcher should be familiar with its concepts as the Star Schema, Slowly Changing Dimension types, etc. so I suggest prior to watching this course to consider skimming through the Data Warehouse concepts (if unfamiliar) or even better, read the excellent Ralph’s The Data Warehouse Tooolkit. By the way, the author expands beyond using Pentaho along to MySQL and MonetDB which is a real icing on the cake! Indeed, I even suggest the name of the course should be ‘Building a Data Warehouse with Pentaho’. To successfully complete the course one needs to know some Linux (Ubuntu used in the course), the VI editor and the Bash command shell, but it seems that similar requirements would also apply to the Windows OS. Additionally, knowing some basic SQL would not hurt. As I had said, MonetDB is used in this course several times which seems to be not anymore complex than say MySQL, but based on what I read is very well suited for fast querying big volumes of data thanks to having a columnstore (vertical data storage). I don’t see what else can be a barrier, the material is very digestible. On this note, I must add that the author does not cover how to acquire the software, so here is what I found may help: Pentaho: the free Community Edition must be more than anyone needs to learn it. Or even go into a POC. MonetDB can be downloaded (exists for both, Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left). The author seems to be using Eclipse to run SQL code, one can get it from http://goo.gl/5CcuN. To create, or edit database entities and/or schema otherwise one can use a universal tool called SQuirreL, get it from http://squirrel-sql.sourceforge.net.   Next, I must confess Diethard is very knowledgeable in what he does and beyond. However, there will be some accent heard to the user of the course especially if one’s mother tongue language is English, but it I got over it in a few chapters. I liked the rate at which the material is being presented, it makes me feel I paid for every second Eventually, my impressions are: Pentaho is an awesome ETL offering, it is worth learning it very much (I am an ETL fan and a heavy user of SSIS) MonetDB is nice, it tickles my fancy to know it more Data Warehousing, despite all the BigData tool offerings (Hive, Scoop, Pig on Hadoop), using the traditional tools still rocks Chapters 2 to 6 were the most fun to me with chapter 8 being the most difficult.   In terms of closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quick, likewise, the course is very well suited for any developer on a “supposed to be done yesterday” type of a project. It is for a beginner to intermediate level ETL/DW developer. But one would need to learn more on Data Warehousing and Pentaho, for such I recommend the 5 star Pentaho Data Integration 4 Cookbook. Enjoy it! Disclaimer: I received this video from the publisher for the purpose of a public review.

    Read the article

  • Top tweets SOA Partner Community – October 2012

    - by JuergenKress
    Send your tweets @soacommunity #soacommunity and follow us at http://twitter.com/soacommunity SOA Community Deploying Fusion Order Demo on 11.1.1.6 by Antony Reynolds http://wp.me/p10C8u-vA leonsmiers ?Cant wait to test it >> 't waiRT @OracleSOA: Case Management patterns, session coverage from #OOW #OracleBPM #ACM #BPM http://bit.ly/OdcZL6 Danilo Schmiedel Bye bye San Francisco. #oow was a great conference in a wonderful city! Thanks! @soacommunity pic.twitter.com/lcYSe9xC OPITZ CONSULTING ?The Journey towards #Oracle #BPM @OpenWorld 2012 - Slides by @t_winterberg & H. Normann: http://ow.ly/edkWE #oow demed Full house at the SOA Customer Advisory Board! #oow12 http://instagr.am/p/QX9B8eLMLS/ Danilo Schmiedel "@whitehorsesnl: Had some great talks with the BPM guys at the DEMOgrounds. It is one of the best things at #oow" -> I agree!! @soacommunity Mark Simpson ?Fusion Middleware Global Innovation Awards: nice to pick up a soa and bpm with our customer. #oow Mark Simpson ?RT @SOASimone: #oraclesoa #oow hands on lab fully booked pic.twitter.com/pwI94Ew7 <--quick, provision some more compute power on the cloud! Oracle SOA ?Join us for BPM and Analytics: Process Dashboards. BAM, and Intelligent OptimizationMoscone South - 308#OracleBPM #OOW Oracle SOA ?Real-time public safety demo! License plate recognition and processing in London via Oracle Event Processing. #oow pic.twitter.com/WufesDBq Marc ?Nice session on customer success stories on #SOA11g on with @SOASimone Pro and cons and architectural overview. #oow pic.twitter.com/bzuhsujm Lucas Jellema Full length Keynote on Middleware #oow : http://medianetwork.oracle.com/video/player/1873556035001 … #oow_amis OracleBlogs ?Why Fusion Middleware matters to Oracle Applications and Fusion Applications customers? http://ow.ly/2stVQ0 OracleBlogs ?Open World Session - BPM, SOA and ADF Combined:Patterns learned from Fusion Applications http://ow.ly/2suhzf Ronald Luttikhuizen ?VENNSTER BLOG | Presentations at OpenWorld 2012 | http://blog.vennster.nl/2012/10/presentations-at-openworld-2012.html … Andrejus Baranovskis @dschmied @soacommunity next OOW for sure, and may be SOA community event ! @soacommunity Danilo Schmiedel ?@andrejusb Thanks Andrejus - I really enjoyed having a session with you at #oow. When is next time :-) ? @soacommunity Lionel Dubreuil ?@soacommunity #oow12 Today-1:15pm-Marriott Marquis Salon 7 Jump-starting Integration with Oracle Foundation Pack http://bit.ly/QKKJzF Ronald Luttikhuizen ?Impression from our fault handling session in OSB and SOA Suite from the audience @soacommunity @gschmutz #oow pic.twitter.com/WSg1Z89E Marc Nice session on Oracle Virtual Assembly for #SOA11g, @soacommunity Works with #exalogic but not required SOA Community ?Send your #soacommunity #oow pictures and blog posts @soacommunity or http://www.facebook.com/soacommunity Enjoy OOW ;-) Jon petter hjulstad Oracle BPM- Big leap forward in 11.1.1.7 ! Whitehorses ?Common BPM Use Cases from Oracle #bpm #oow pic.twitter.com/ofOv04EF Whitehorses ?Oracle BPM 11.1.1.7 top new features. Interesting #oow #oowbenelux pic.twitter.com/HY9QN5un SOA Community Industrialized SOA - topic of Business Technology Magazine http://wp.me/p10C8u-vi orclateamsoa ?A-Team Blog #ateam: The curious case of SOA Human tasks' automatic completion http://ow.ly/1mq6YU Simone Geib Look for this sign #oow #oraclesoa pic.twitter.com/MJsPV4PO Lucas Jellema My summary of Larry Ellison's keynote at #oow on the AMIS Blog: http://technology.amis.nl/2012/10/01/oow-2012-larry-ellisons-keynote-announcements-exa-cloud-database/ … #oow_amis gschmutz ?Join my #oow session "Five Cool Use Cases for the Spring Component" to see the power of Spring and SOA Suite combined! Moscone 310 - 3:15 PM Ronald Luttikhuizen Thanks to @soacommunity for great SOA/BPM dinner event yesterday night! #oow pic.twitter.com/v7x3i0DC OracleBlogs ?OSB, Service Callouts and OQL http://ow.ly/2sq6B2 OracleBlogs ?Cloud and On-Premises Applications Integration using Oracle Integration Adapters http://ow.ly/2sqiDy OracleBlogs ?Adapters, SOA Suite and More @Openworld 2012 http://ow.ly/2srdTg Eric Elzinga ?OSB, Service Callouts and OQL - Part 3, http://see.sc/JodzEx #oracleservicebus Donatas Valys interesting articles about soa industrialization to read #soa #industrialization http://it-republik.de/business-technology/bt-magazin-ausgaben/Industrialized-SOA-000516.html … gschmutz ?“@techsymp: 2012 Symposium Presentation Download Page Now Available! 75% of presentations published. http://www.servicetechsymposium.com ” find mine there.. Oracle BPM Customer Experience and BPM – From Efficiency to Engagement #bpm #oraclebpm #processmanagement #socialbpm http://pub.vitrue.com/Tahi SOA Community ?@soacommunity SOA Community Newsletter September 2012 http://wp.me/p10C8u-wa SOA Community again again again.... it is Oracle Open World 2012 http://wp.me/p10C8u-wk OracleBlogs ?SOA Proactive support http://ow.ly/2smrSJ demed ?@gschmutz on NoSQL at @techsymp http://lockerz.com/s/247601661 demed ?Just finished "#BigData and its impact on #SOA" talk @techsymp. Really enjoyed getting out of beaten path. #london #oep http://lockerz.com/s/247636974 OTNArchBeat ?Need help selling SOA to business stakeholders? Give them this free eBook. #soasuite http://pub.vitrue.com/hsQY SOA Community top Tweets SOA Partner Community &ndash; September 2012 http://wp.me/p10C8u-vc SOA Community Move Data into the grid for scalable, predictable response times http://wp.me/p10C8u-vv ServiceTechSymposium ?The September issue of the Service Technology Magazine is now published with six new items! Read them at http://www.servicetechmag.com Marc ?Reviewed @Packt_OracleFMW new book on SOA11g administration! Very good ! http://tinyurl.com/8pzd5ww SOA Community ?BPM Solution Catalogue&ndash;promote your process templates http://wp.me/p10C8u-vt OTNArchBeat ?BPM ADF Task forms: Checking whether the current user is in a BPM Swimlane | @ChrisKarlChan http://pub.vitrue.com/aPMG OTNArchBeat ?Cloud, automation drive new growth in SOA governance market | @JoeMcKendrick http://pub.vitrue.com/hNPv Simon Haslam ?Looking for "oak style"(!) advanced content but you're a middleware specialist? See #ukoug2012 #middlewaresunday http://2012.ukoug.org/default.asp?p=9355 … Simon Haslam ?The #ukoug2012 agenda is "go, go, go!" (as Murray would say!) http://2012.ukoug.org/agendagrid Germán Gazzoni SOA Spezial II verfügbar – Industralized SOA: Die überarbeitete und ergänzte Neuauflage des SOA Spezial Sonderhe... http://bit.ly/PAWwN9 Oracle SOA ?Flip thru new interactive "Oracle SOA Suite eBook-In the Customers Words" #middleware #soa #oraclesoa http://pub.vitrue.com/NzFZ SOA Community Follow SOA Community on Facebook http://www.facebook.com/soacommunity #soacommunity #opn SOA & BPM Partner Community For regular information on Oracle SOA Suite become a member in the SOA & BPM Partner Community for registration please visit  www.oracle.com/goto/emea/soa (OPN account required) If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Mix Forum Technorati Tags: SOA Community twitter,SOA Community,Oracle SOA,Oracle BPM,BPM Community,OPN,Jürgen Kress

    Read the article

  • From J2EE to Java EE: what has changed?

    - by Bruno.Borges
    See original @Java_EE tweet on 29 May 2014 Yeap, it has been 8 years since the term J2EE was replaced, and still some people refer to it (mostly recruiters, luckily!). But then comes the question: what has changed besides the name? Our community friend Abhishek Gupta worked on this question and provided an excellent response titled "What's in a name? Java EE? J2EE?". But let me give you a few highlights here so you don't lose yourself with YATO (yet another tab opened): J2EE used to be an infrastructure and resources provider only, requiring developers to depend on external 3rd-party frameworks to then implement application requirements or improve productivity J2EE used to require hundreds of XML lines of codes to define just a dozen of resources like EJBs, MDBs, Servlets, and so on J2EE used to support only EAR (Enterprise Archives) with a bunch of other archives like JARs and WARs just to run a simple Web application And so on, and so on! It was a great technology but still required a lot of work to get something up and running. Remember xDoclet? Remember Struts? The old days of pure Hibernate code? Or when Ajax became a trending topic and we were all implementing it with DWR Servlet? Still, we J2EE developers survived, and learned, and helped evolve the platform to a whole new level of DX (Developer Experience). A new DX for J2EE suggested a new name. One that referred to the platform as the Enterprise Edition of Java, because "Java is why we're here" quoting Bill Shannon. The release of Java EE 5 included so many features that clearly showed developers the platform was going after all those DX gaps. Radical simplification of the persistence model with the introduction of JPA Support of Annotations following the launch of Java SE 5.0 Updated XML APIs with the introduction of StAX Drastic simplification of the EJB component model (with annotations!) Convention over Configuration and Dependency Injection A few bullets you may say but that represented a whole new DX and a vision for upcoming versions. Clearly, the release of Java EE 5 helped drive the future of the platform by reducing the number of XMLs, Java Interfaces, simplified configurations, provided convention-over-configuration, etc! We then saw the release of Java EE 6 with even more great features like Managed Beans, CDI, Bean Validation, improved JSP and Servlets APIs, JASPIC, the posisbility to deploy plain WARs and so many other improvements it is difficult to list in one sentence. And we've gotta give Spring Framework some credit here: thanks to Rod Johnson and team, concepts like Dependency Injection fit perfectly into the Java EE Platform. Clearly, Spring used to be one of the most inspiring frameworks for the Java EE platform, and it is great to see things like Pivotal and Spring supporting JSR 352 Batch API standard! Cooperation to keep improving DX at maximum in the server-side Java landscape.  The master piece result of these previous releases is seen and called today as Java EE 7, which by providing a newly and improved JavaServer Faces release, with new features for Web Development like WebSockets API, improved JAX-RS, and JSON-P, but also including Batch API and so many other great improvements, has increased developer productivity and brought innovation to server-side Java developers. Java EE is not just a new name (which was introduced back in May 2006!) but a new Developer Experience for server-side Java developers. To show you why we are here and where we are going (see the Java EE 8 update), we wanted to share with you a draft of the new Java EE logos that the evangelist team created, to help you spread the word about Java EE. You can get access to these images at the Java EE Platform Facebook Album, or the Google+ Java EE Platform Album whichever is better for you, but don't forget to like and/or +1 those social network profiles :-) A message to all job recruiters: stop using J2EE and start using Java EE if you want to find great Java EE 5, Java EE 6, or Java EE 7 developers To not only save you recruiter valuable characters when tweeting that job opportunity but to also match the correct term, we invite you to replace long terms like "Java/J2EE" or even worse "#Java #J2EE #JEE" or all these awkward combinations with the only acceptable hashtag: #JavaEE. And to prove that Java EE is catching among developers and even recruiters, and that J2EE is past, let me highlight here how are the jobs trends! The image below is from Indeed.com trends page, for the following keywords: J2EE, Java/J2EE, Java/JEE, JEE. As you can see, J2EE is indeed going away, while JEE saw some increase. Perhaps because some people are just lazy to type "Java" but at the same time they are aware that J2EE (the '2') is past. We shall forgive that for a while :-) Another proof that J2EE is going away is by looking at its trending statistics at Google. People have been showing less and less interest in the term J2EE. See the chart below:  Recruiter, if you still need proof that J2EE is past, that Java EE is trending, and that other job recruiters are seeking for Java EE developers, and that the developer community is aware of the new term, perhaps these other charts can show you what term you should be using. See for example the Job Trends for Java EE at Indeed.com and notice where it started... 2006! 8 years ago :-) Last but not least, the Google Trends for Java EE term (including the still wrong but forgivable JavaEE term) shows us that the new term is catching up very well. J2EE is past. Oh, and don't worry about the curves going down. We developers like to be hipsters sometimes and today only AngularJS, NodeJS, BigData are going up. Java EE and other traditional server-side technologies such as Spring, or even from other platforms such as Ruby on Rails, PHP, Grails, are pretty much consolidated and the curves... well, they are consolidated too. So If you are a Java EE developer, drop that J2EE from your résumé, and let recruiters also know that this term is past. Embrace Java EE, and enjoy a new developer experience for server-side Java developers. Java EE on TwitterJava EE on Google+Java EE on Facebook

    Read the article

  • MapRedux - PowerShell and Big Data

    - by Dittenhafer Solutions
    MapRedux – #PowerShell and #Big Data Have you been hearing about “big data”, “map reduce” and other large scale computing terms over the past couple of years and been curious to dig into more detail? Have you read some of the Apache Hadoop online documentation and unfortunately concluded that it wasn't feasible to setup a “test” hadoop environment on your machine? More recently, I have read about some of Microsoft’s work to enable Hadoop on the Azure cloud. Being a "Microsoft"-leaning technologist, I am more inclinded to be successful with experimentation when on the Windows platform. Of course, it is not that I am "religious" about one set of technologies other another, but rather more experienced. Anyway, within the past couple of weeks I have been thinking about PowerShell a bit more as the 2012 PowerShell Scripting Games approach and it occured to me that PowerShell's support for Windows Remote Management (WinRM), and some other inherent features of PowerShell might lend themselves particularly well to a simple implementation of the MapReduce framework. I fired up my PowerShell ISE and started writing just to see where it would take me. Quite simply, the ScriptBlock feature combined with the ability of Invoke-Command to create remote jobs on networked servers provides much of the plumbing of a distributed computing environment. There are some limiting factors of course. Microsoft provided some default settings which prevent PowerShell from taking over a network without administrative approval first. But even with just one adjustment, a given Windows-based machine can become a node in a MapReduce-style distributed computing environment. Ok, so enough introduction. Let's talk about the code. First, any machine that will participate as a remote "node" will need WinRM enabled for remote access, as shown below. This is not exactly practical for hundreds of intended nodes, but for one (or five) machines in a test environment it does just fine. C:> winrm quickconfig WinRM is not set up to receive requests on this machine. The following changes must be made: Set the WinRM service type to auto start. Start the WinRM service. Make these changes [y/n]? y Alternatively, you could take the approach described in the Remotely enable PSRemoting post from the TechNet forum and use PowerShell to create remote scheduled tasks that will call Enable-PSRemoting on each intended node. Invoke-MapRedux Moving on, now that you have one or more remote "nodes" enabled, you can consider the actual Map and Reduce algorithms. Consider the following snippet: $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose Invoke-MapRedux takes an instance of a MapReduceItem which references the Map and Reduce scriptblocks, an array of computer names which are the remote nodes, and the initial data set to be processed. As simple as that, you can start working with concepts of big data and the MapReduce paradigm. Now, how did we get there? I have published the initial version of my PsMapRedux PowerShell Module on GitHub. The PsMapRedux module provides the Invoke-MapRedux function described above. Feel free to browse the underlying code and even contribute to the project! In a later post, I plan to show some of the inner workings of the module, but for now let's move on to how the Map and Reduce functions are defined. Map Both the Map and Reduce functions need to follow a prescribed prototype. The prototype for a Map function in the MapRedux module is as follows. A simple scriptblock that takes one PsObject parameter and returns a hashtable. It is important to note that the PsObject $dataset parameter is a MapRedux custom object that has a "Data" property which offers an array of data to be processed by the Map function. $aMap = { Param ( [PsObject] $dataset ) # Indicate the job is running on the remote node. Write-Host ($env:computername + "::Map"); # The hashtable to return $list = @{}; # ... Perform the mapping work and prepare the $list hashtable result with your custom PSObject... # ... The $dataset has a single 'Data' property which contains an array of data rows # which is a subset of the originally submitted data set. # Return the hashtable (Key, PSObject) Write-Output $list; } Reduce Likewise, with the Reduce function a simple prototype must be followed which takes a $key and a result $dataset from the MapRedux's partitioning function (which joins the Map results by key). Again, the $dataset is a MapRedux custom object that has a "Data" property as described in the Map section. $aReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) # The hashtable to return $redux = @{}; # Return Write-Output $redux; } All Together Now When everything is put together in a short example script, you implement your Map and Reduce functions, query for some starting data, build the MapReduxItem via New-MapReduxItem and call Invoke-MapRedux to get the process started: # Import the MapRedux and SQL Server providers Import-Module "MapRedux" Import-Module “sqlps” -DisableNameChecking # Query the database for a dataset Set-Location SQLSERVER:\sql\dbserver1\default\databases\myDb $query = "SELECT MyKey, Date, Value1 FROM BigData ORDER BY MyKey"; Write-Host "Query: $query" $dataset = Invoke-SqlCmd -query $query # Build the Map function $MyMap = { Param ( [PsObject] $dataset ) Write-Host ($env:computername + "::Map"); $list = @{}; foreach($row in $dataset.Data) { # Write-Host ("Key: " + $row.MyKey.ToString()); if($list.ContainsKey($row.MyKey) -eq $true) { $s = $list.Item($row.MyKey); $s.Sum += $row.Value1; $s.Count++; } else { $s = New-Object PSObject; $s | Add-Member -Type NoteProperty -Name MyKey -Value $row.MyKey; $s | Add-Member -type NoteProperty -Name Sum -Value $row.Value1; $list.Add($row.MyKey, $s); } } Write-Output $list; } $MyReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) $redux = @{}; $count = 0; foreach($s in $dataset.Data) { $sum += $s.Sum; $count += 1; } # Reduce $redux.Add($s.MyKey, $sum / $count); # Return Write-Output $redux; } # Create the item data $Mr = New-MapReduxItem "My Test MapReduce Job" $MyMap $MyReduce # Array of processing nodes... $MyNodes = ("node1", "node2", "node3", "node4", "localhost") # Run the Map Reduce routine... $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose # Show the results Set-Location C:\ $MyMrResults | Out-GridView Conclusion I hope you have seen through this article that PowerShell has a significant infrastructure available for distributed computing. While it does take some code to expose a MapReduce-style framework, much of the work is already done and PowerShell could prove to be the the easiest platform to develop and run big data jobs in your corporate data center, potentially in the Azure cloud, or certainly as an academic excerise at home or school. Follow me on Twitter to stay up to date on the continuing progress of my Powershell MapRedux module, and thanks for reading! Daniel

    Read the article

1 2  | Next Page >