Search Results

Search found 58965 results on 2359 pages for 'ssis data transformations'.

Page 64/2359 | < Previous Page | 60 61 62 63 64 65 66 67 68 69 70 71  | Next Page >

  • Algorithm detect repeating/similiar strings in a corpus of data -- say email subjects, in Python

    - by RizwanK
    I'm downloading a long list of my email subject lines , with the intent of finding email lists that I was a member of years ago, and would want to purge them from my Gmail account (which is getting pretty slow.) I'm specifically thinking of newsletters that often come from the same address, and repeat the product/service/group's name in the subject. I'm aware that I could search/sort by the common occurrence of items from a particular email address (and I intend to), but I'd like to correlate that data with repeating subject lines.... Now, many subject lines would fail a string match, but "Google Friends : Our latest news" "Google Friends : What we're doing today" are more similar to each other than a random subject line, as is: "Virgin Airlines has a great sale today" "Take a flight with Virgin Airlines" So -- how can I start to automagically extract trends/examples of strings that may be more similar. Approaches I've considered and discarded ('because there must be some better way'): Extracting all the possible substrings and ordering them by how often they show up, and manually selecting relevant ones Stripping off the first word or two and then count the occurrence of each sub string Comparing Levenshtein distance between entries Some sort of string similarity index ... Most of these were rejected for massive inefficiency or likelyhood of a vast amount of manual intervention required. I guess I need some sort of fuzzy string matching..? In the end, I can think of kludgy ways of doing this, but I'm looking for something more generic so I've added to my set of tools rather than special casing for this data set. After this, I'd be matching the occurring of particular subject strings with 'From' addresses - I'm not sure if there's a good way of building a data structure that represents how likely/not two messages are part of the 'same email list' or by filtering all my email subjects/from addresses into pools of likely 'related' emails and not -- but that's a problem to solve after this one. Any guidance would be appreciated.

    Read the article

  • What changes in Core Data after a save?

    - by Splash6
    I have a Core Data based mac application that is working perfectly well until I save a file. When I save a file it seems like something changes in core data because my original fetch request no longer fetches anything. This is the fetch request that works before saving but returns an empty array after saving. NSEntityDescription *outputCellEntityDescription = [NSEntityDescription entityForName:@"OutputCell" inManagedObjectContext:[[self document] managedObjectContext]]; NSFetchRequest *outputCellRequest = [[[NSFetchRequest alloc] init] autorelease]; [outputCellRequest setEntity:outputCellEntityDescription]; NSPredicate *outputCellPredicate = [NSPredicate predicateWithFormat:@"(cellTitle = %@)", outputCellTitle]; [outputCellRequest setPredicate:outputCellPredicate]; NSError *outputCellError = nil; NSArray *outputCellArray = [[[self document] managedObjectContext] executeFetchRequest:outputCellRequest error:&outputCellError]; I have checked with [[[self document] managedObjectContext] registeredObjects] to see that the object still exists after the save and nothing seems to have changed and the object still exists. It is probably something fairly basic but does anyone know what I might be doing wrong? If not can anyone give me any pointers to what might be different in the Core Data model after a save so I might have some clues why the fetch request stops working after saving?

    Read the article

  • C++: best way to implement globally scoped data

    - by bobobobo
    I'd like to make program-wide data in a C++ program, without running into pesky LNK2005 errors when all the source files #includes this "global variable repository" file. I have 2 ways to do it in C++, and I'm asking which way is better. The easiest way to do it in C# is just public static members. C#: public static class DataContainer { public static Object data1 ; public static Object data2 ; } In C++ you can do the same thing C++ global data way#1: class DataContainer { public: static Object data1 ; static Object data2 ; } ; Object DataContainer::data1 ; Object DataContainer::data2 ; However there's also extern C++ global data way #2: class DataContainer { public: Object data1 ; Object data2 ; } ; extern DataContainer * dataContainer ; // instantiate in .cpp file In C++ which is better, or possibly another way which I haven't thought about? The solution has to not cause LNK2005 "object already defined" errors.

    Read the article

  • strange memory error when deleting object from Core Data

    - by llloydxmas
    I have an application that downloads an xml file, parses the file, and creates core data objects while doing so. In the parse code I have a function called 'emptydatacontext' that removes all items from Core Data before creating replacements from the xml file. This method looks like this: -(void) emptyDataContext { NSMutableArray* mutableFetchResults = [CoreDataHelper getObjectsFromContext:@"Condition" :@"road" :NO :managedObjectContext]; NSFetchRequest * allCon = [[NSFetchRequest alloc] init]; [allCon setEntity:[NSEntityDescription entityForName:@"Condition" inManagedObjectContext:managedObjectContext]]; NSError * error = nil; NSArray * conditions = [managedObjectContext executeFetchRequest:allCon error:&error]; [allCon release]; for (NSManagedObject * condition in conditions) { [managedObjectContext deleteObject:condition]; } } The first time this runs it deletes all objects and functions as it should - creating new objects from the xml file. I created a 'update' button that starts the exact same process of retrieving the file the preceeding as it did the first time. All is well until its time to delete the core data objects again. This 'deleteObject' call creates a "EXC_BAD_ACCESS" error each time. This only happens on the second time through. See this image for the debugger window as it appears when walking through the deletion FOR loop on the second iteration. Conditions is the fetched array of 7 objects with the objects below. Condition should be an individual condition. link text As you can see 'condition' does not match any of the objects in the 'conditions' array. I'm sure this is why I'm getting the memory access errors. Just not sure why this fetch (or the FOR) is returning a wrong reference. All the code that successfully performes this function on the first iteration is used in the second but with very different results. Thanks in advance for the help!

    Read the article

  • MVC and jQuery data retrieval.

    - by user337542
    Hello, I am using mvc and jQuery and I am trying to display someone's profile with some additional institutions that the person belongs to. I am new to this but I've done something like this in ProfileControler: public ActionResult Institutions(int id) { var inst = fr.getInstitutions(id); return Json(inst); } getInstitutions(id) returns Institution objects(with Name, City, Post Code etc.) Then in a certain View I am trying to get the data with jQuery and display them as follows: $(document).ready(function () { $.post("/Profile/Institutions", { id: <%= Model.Profile.userProfileID %> }, function (data) { $.each(data, function () { var new_div = $("<div>"); var new_label = $("<label>"); new_label.html(this.City); var new_input_b = $("<input>"); new_input_b.attr("type", "button"); new_div.append(new_label); new_div.append(new_input_b); $("#institutions").append(new_div); }); }); }); $("#institutions") is a div where i want to display all of the results. .post works correct for sure because certain institutions are retrieved from database, and passed to the view as Json result. But then I am affraid it wont itterate with .each. Any help, coments or pointing in some direction would be much appriciated

    Read the article

  • JTable data only shown after scrolling

    - by Christian 'fuzi' Orgler
    I wrote a method, that creates my DefaultTableModel and there I'm going to add my records. When I set the model to my JTable, the data rows are blank. After scrolling the data gets displayed correct. How can I avoid this and display the data from the first moment? EDIT: I imported the javax.swing.table.DefaultTableModel -- is this correct? private DefaultTableModel _dtm; private void loadTable(Vector<Member> members) { loadTableModel(); try { lbl_state.setText("Please wait"); for (Member actMember : members) { String gender = ""; if (actMember.getGender() == MemberView.MEMBER_MALE) { gender = "männlich"; } else { gender = "weiblich"; } _dtm.addRow(new Object[]{ actMember.getNname(), actMember.getVname(), actMember.getCity(), actMember.getStreet(), actMember.getPlz(), actMember.getMail(), actMember.getPhonenumber(), actMember.getBirthdayString(), actMember.getStartDateString(), gender, actMember.getBankname(), actMember.getAccountnumber(), actMember.getBanknumber(), actMember.getGroup().toString(), (actMember.hasAccess() ? "JA" : "NEIN"), actMember.getWriteDateString(), (actMember.hasDrinkAbo() ? "JA" : "NEIN") }); } } catch (Exception ex) { System.err.println(ex.getMessage()); } tbl_results.setModel(_dtm); } private void loadTableModel() { _dtm = new DefaultTableModel(new Object[]{"Nachname", "Vorname", "Ort", "Straße", "PLZ", "E-Mail", "Telefon", "Geburtsdatum", "Beitrittsdatum", "Geschlecht", "Bankname", "Kontonummer", "Bankleitzahl", "Gruppe", "hat Zugriff", "Einschreibdatum", "Getränkeabo"}, 0); tbl_results.setAutoResizeMode(JTable.AUTO_RESIZE_ALL_COLUMNS); }

    Read the article

  • NSFetchedResultsController on secondary UITableView - how to query data?

    - by Jason
    I am creating a core-data based Navigation iPhone app with multiple screens. Let's say it is a flash-card application. The data model is very simple, with only two entities: Language, and CardSet. There is a one-to-many relationship between the Language entity and the CardSet entities, so each Language may contain multiple CardSets. In other words, Language has a one-to-many relationship Language.cardSets which points to the list of CardSets, and CardSet has a relationship CardSet.language which points to the Language. There are two screens: (1) An initial TableView screen, which displays the list of languages; and (2) a secondary TableView screen, which displays the list of CardSets in the Language. In the initial screen, which lists the languages, I am using NSFetchedResultsController to keep the list of languages up-to-date. The screen passes the Language selected to the secondary screen. On the secondary screen, I am trying to figure out whether I should again use an NSFetchedResultsController to maintain the list of CardSets, or if I should work through Language.cardSets to simply pull the list out of the object model. The latter makes the most sense programatically because I already have the Language - but then it would not automatically be updated on changes. I have looked at the NSFetchedResultsController documentation, and it seems like I can easily create predicates based on attributes - but not relationships. I.e., I can create the following NSFetchedResultsController: NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name LIKE[c] 'Chuck Norris'"]; How can I access my data through the direct relationship - Language.cardSets - and also have the table auto-update using NSFetchedResultsController? Is this possible?

    Read the article

  • Can't find momd file: Core Data problems

    - by thekevinscott
    Aw geez! I screwed something up! I'm a Core Data noob, working on my first iOS app. After much Stack Overflowing I'm using this code: NSString *path = [[NSBundle mainBundle] pathForResource:@"CoreData" ofType:@"momd"]; if (!path) { path = [[NSBundle mainBundle] pathForResource:@"CoreData" ofType:@"mom"]; } NSAssert(path != nil, @"Unable to find Resource in main bundle"); CoreData is the name of my app. I've tried to put in initial data into the app by finding the path to the sqlite file in my iPhone simulator, and then going and inserting into that sqlite file. But at some point, I moved the sqlite (thinking it would create a fresh copy), deleted the app from the simulator, and the sqlite file is gone. I'm not sure if I'm leaving out some part of the process (this was a few hours ago) but the end result is that everything is screwed up. How do I resubstantiate this sqlite / momd file? "Clean" and "Clean all targets" are grayed out. I'm happy to post the relevant code from my app that would help shed some light on this problem but there's tons of code relating to Core Data which I don't understand, so I'm not sure what part to post! Any help is greatly appreciated.

    Read the article

  • Import & modify date data in MATLAB

    - by niko
    I have a .csv file with records written in the following form: 2010-04-20 15:15:00,"8.9915176259e+00","8.8562623697e+00" 2010-04-20 15:30:00,"8.5718021723e+00","8.6633827160e+00" 2010-04-20 15:45:00,"8.4484844117e+00","8.4336586330e+00" 2010-04-20 16:00:00,"1.1106980342e+01","8.4333062208e+00" 2010-04-20 16:15:00,"9.0643470589e+00","8.6885660103e+00" 2010-04-20 16:30:00,"8.2133517943e+00","8.2677822671e+00" 2010-04-20 16:45:00,"8.2499419380e+00","8.1523501983e+00" 2010-04-20 17:00:00,"8.2948492278e+00","8.2884797924e+00" From these data I would like to make clusters - I would like to add a column with number indicating the hour - so in case of the first row a value 15 has to be added in a new row. The first problem is that calling a function [numData, textData, rawData] = xlsread('testData.csv') creates an empty matrix numData and one-column textData and rawData structures. Is it possible to create any template which recognizes a yyyy, MM, dd, hh, mm, ss values from the data above? What I would basically like to do with these data is to categorize the values by hours so from the example row of input: 2010-04-20 15:15:00,"8.9915176259e+00","8.8562623697e+00" update 1: in Matlab the line above is recognized as a string: '2010-04-26 13:00:00,"1.0428104753e+00","2.3456394130e+00"' I would want this to be the output: 15, 8.9915176259e+00, 8.8562623697e+00 update 1: a string has to be parsed Does anyone know how to parse a string and retrieve a timestamp, value1 (1.0428104753e+00) and value2 (2.3456394130e+00) from it as separate values?

    Read the article

  • Uniquing with Existing Core Data Entities

    - by warrenm
    I'm using Core Data to store a lot (1000s) of items. A pair of properties on each item are used to determine uniqueness, so when a new item comes in, I compare it against the existing items before inserting it. Since the incoming data is in the form of an RSS feed, there are often many duplicates, and the cost of the uniquing step is O(N^2), which has become significant. Right now, I create a set of existing items before iterating over the list of (possible) new items. My theory is that on the first iteration, all the items will be faulted in, and assuming we aren't pressed for memory, most of those items will remain resident over the course of the iteration. I see my options thusly: Use string comparison for uniquing, iterating over all "new" items and comparing to all existing items (Current approach) Use a predicate to filter the set of existing items against the properties of the "new" items. Use a predicate with Core Data to determine uniqueness of each "new" item (without retrieving the set of existing items). Is option 3 likely to be faster than my current approach? Do you know of a better way?

    Read the article

  • WCF data services - Limiting related objects returned based on critera

    - by Mike Morley
    I have an object graph consisting of a base employee object, and a set of related message objects. I am able to return the employee objects based on search criteria on the employee properties (eg team) etc. However, if I expand on the messages, I get the full collection of messages back. I would like to be able to either take the top n messages (i.e. restrict to 10 most recent) or ideally use a date range on the message objects to limit how many are brought back. So far I have not been able to figure out a way of doing this: I get an error if I attempt to filter on properties on the message (&$filter=employee/message/StartDate gives an error "No property 'StartDate' exists in type 'System.Data.Objects.DataClasses.EntityCollection`1). Attempting to use Top on the message related object doesn't work either. I have also tried using a WebGet extension that takes a string list of employee IDs. That works until the list gets too long, and then fails due to the URL getting too long (it might be possible to setup a paging mechanism on this approach)... Unfortunately the UI control I am using requires the data to be in a fairly specific hierarchical shape, so I can't easily come at this from starting on the message side and working backwards. Outside of making multiple calls does anyone know of a method to accomplish this with wcf data services? Thanks! M.

    Read the article

  • Multidimensional data structure?

    - by Austin Truong
    I need a multidimensional data structure with a row and a column. Must be able to insert elements any location in the data structure. Example: {A , B} I want to insert C in between A and B. {A, C, B}. Dynamic: I do not know the size of the data structure. Another example: I know the [row][col] of where I want to insert the element. EX. insert("A", 1, 5), where A is the element to be inserted, 1 is the row, 5 is the column. EDIT I want to be able to insert like this. static void Main(string[] args) { Program p = new Program(); List<string> list = new List<string>(); list.Insert(1, "HELLO"); list.Insert(5, "RAWR"); for (int i = 0; i < list.Count; i++) { Console.WriteLine(list[i]); } Console.ReadKey(); } And of course this crashes with an out of bounds error. So in a sense I will have a user who will choose which ROW and COL to insert the element to.

    Read the article

  • Compact data structure for storing a large set of integral values

    - by Odrade
    I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple: Add a new value Iterate over all of the values. There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives. I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions. EDIT: To answer your questions: No, the items don't need to be sorted By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this. There could be a decent number of these sets in memory at once (~100).

    Read the article

  • Display data FROM Sheet1 on Sheet2 based on conditional value

    - by Shawn
    Imagine a worksheet with 30 pieces of information, such as: A1= Start Date A2 = End Date A3 = Resource Name A4 = Cost .... A30 = Whatever B1 = 1/1/2010 B2 = 2/15/2010 B3 = Joe Smith B4 = $10,000.00 ... B30 = Blah Blah Now imagine a third column, C. The purpose of the third column is to determine WHICH report that row of data needs to appear in. C1 = Report 1 C2 = Report 1 and Report 2 C3 = Report 4 and Report 7 C4 = Report 1 and Report 5 ... C30 = Report 2 Each report is on Sheets 2, 3, 4, 5 and so on (depending on how many I decide to create). As you can see form my example above, some data may need to appear in multiple reports. For example, the data in Row 3 (Resource Name: Joe Smith) needs to appear in Report 4 and Report 7. That is to say, it needs to DYNAMICALLY appear on two additional worksheets. If I change the values in column C, then the reports need to update automatically. How can I create the worksheets which will serve as the reports such that they only diaplay the rows which have been "flagged" to be displayed in that report? Thanks!

    Read the article

  • Python, web log data mining for frequent patterns

    - by descent
    Hello! I need to develop a tool for web log data mining. Having many sequences of urls, requested in a particular user session (retrieved from web-application logs), I need to figure out the patterns of usage and groups (clusters) of users of the website. I am new to Data Mining, and now examining Google a lot. Found some useful info, i.e. querying Frequent Pattern Mining in Web Log Data seems to point to almost exactly similar studies. So my questions are: Are there any python-based tools that do what I need or at least smth similar? Can Orange toolkit be of any help? Can reading the book Programming Collective Intelligence be of any help? What to Google for, what to read, which relatively simple algorithms to use best? I am very limited in time (to around a week), so any help would be extremely precious. What I need is to point me into the right direction and the advice of how to accomplish the task in the shortest time. Thanks in advance!

    Read the article

  • Save many-to-one relationship from JSON into Core Data

    - by Snow Crash
    I'm wanting to save a Many-to-one relationship parsed from JSON into Core Data. The code that parses the JSON and does the insert into Core Data looks like this: for (NSDictionary *thisRecipe in recipes) { Recipe *recipe = [NSEntityDescription insertNewObjectForEntityForName:@"Recipe" inManagedObjectContext:insertionContext]; recipe.title = [thisRecipe objectForKey:@"Title"]; NSDictionary *ingredientsForRecipe = [thisRecipe objectForKey:@"Ingredients"]; NSArray *ingredientsArray = [ingredientsForRecipe objectForKey:@"Results"]; for (NSDictionary *thisIngredient in ingredientsArray) { Ingredient *ingredient = [NSEntityDescription insertNewObjectForEntityForName:@"Ingredient" inManagedObjectContext:insertionContext]; ingredient.name = [thisIngredient objectForKey:@"Name"]; } } NSSet *ingredientsSet = [NSSet ingredientsArray]; [recipe setIngredients:ingredientsSet]; Notes: "setIngredients" is a Core Data generated accessor method. There is a many-to-one relationship between Ingredients and Recipe However, when I run this I get the following error: NSCFDictionary managedObjectContext]: unrecognized selector sent to instance If I remove the last line (i.e. [recipe setIngredients:ingredientsSet];) then, taking a peek at the SQLite database, I see the Recipe and Ingredients have been stored but no relationship has been created between Recipe and Ingredients Any suggestions as to how to ensure the relationship is stored correctly?

    Read the article

  • Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

    - by Reed
    In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine.  If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine.  The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off dataData Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection.  One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism.  In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library.  This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple.  Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter.  Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel.  Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition.  Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns.  We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change.  Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine.  Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method.  In this case, our for loop iteration block becomes a delegate creating via a lambda expression.  This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea.  There is a balance to be struck when writing parallel code.  We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce.  In this case, we have an image of data – most likely hundreds of pixels in both dimensions.  By just parallelizing our first loop, each row of pixels can be run as a single task.  With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks.  This introduces extra overhead with no extra gain, and will actually reduce our overall performance.  This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop.  I could have just as easily partitioned the inner loop.  However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels).  My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop.  If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance.  If we’re iterating through an IList<T> or an array, this is a typical approach.  However, there are other iteration statements common in C#.  In many situations, we’ll use foreach instead of a for loop.  This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation.  Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change.  Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist.  If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression.  This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel.  The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library.  These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism.  Iterating through collections and performing “work” is a very common pattern in nearly every application.  When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls.  Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

    Read the article

  • ERROR: Attempted to read or write protected memory. This is often an indication that other memory is corrupt

    - by SPSamL
    I get this error after having edited a few pages in SharePoint 2010. I have to do an IISReset on both front ends to get this to resolve. I don't know how to fix it or even what else to supply here, but please let me know as the resets now happen several times per day. Log Name: Application Source: ASP.NET 2.0.50727.0 Date: 1/26/2011 11:12:48 AM Event ID: 1309 Task Category: Web Event Level: Warning Keywords: Classic User: N/A Computer: PINTSPSFE02.samcstl.org Description: Event code: 3005 Event message: An unhandled exception has occurred. Event time: 1/26/2011 11:12:48 AM Event time (UTC): 1/26/2011 5:12:48 PM Event ID: c52fb336b7f147a3913fff3617a99d57 Event sequence: 4965 Event occurrence: 2178 Event detail code: 0 Application information: Application domain: /LM/W3SVC/1449762715/ROOT-2-129405348166941887 Trust level: WSS_Minimal Application Virtual Path: / Application Path: C:\inetpub\wwwroot\wss\VirtualDirectories\80\ Machine name: PINTSPSFE02 Process information: Process ID: 5928 Process name: w3wp.exe Account name: SAMC\MossAppPool Exception information: Exception type: AccessViolationException Exception message: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. Request information: Request URL: http://mosscluster/Pages/Home.aspx Request path: /Pages/Home.aspx User host address: 10.3.60.26 User: SAMC\BARNMD Is authenticated: True Authentication Type: NTLM Thread account name: SAMC\MossAppPool Thread information: Thread ID: 110 Thread account name: SAMC\MossAppPool Is impersonating: False Stack trace: at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) Custom event details: Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="ASP.NET 2.0.50727.0" /> <EventID Qualifiers="32768">1309</EventID> <Level>3</Level> <Task>3</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2011-01-26T17:12:48.000000000Z" /> <EventRecordID>35834</EventRecordID> <Channel>Application</Channel> <Computer>PINTSPSFE02.samcstl.org</Computer> <Security /> </System> <EventData> <Data>3005</Data> <Data>An unhandled exception has occurred.</Data> <Data>1/26/2011 11:12:48 AM</Data> <Data>1/26/2011 5:12:48 PM</Data> <Data>c52fb336b7f147a3913fff3617a99d57</Data> <Data>4965</Data> <Data>2178</Data> <Data>0</Data> <Data>/LM/W3SVC/1449762715/ROOT-2-129405348166941887</Data> <Data>WSS_Minimal</Data> <Data>/</Data> <Data>C:\inetpub\wwwroot\wss\VirtualDirectories\80\</Data> <Data>PINTSPSFE02</Data> <Data> </Data> <Data>5928</Data> <Data>w3wp.exe</Data> <Data>SAMC\MossAppPool</Data> <Data>AccessViolationException</Data> <Data></Data> <Data>http://mosscluster/Pages/Home.aspx</Data> <Data>/Pages/Home.aspx</Data> <Data>10.3.60.26</Data> <Data>SAMC\BARNMD</Data> <Data>True</Data> <Data>NTLM</Data> <Data>SAMC\MossAppPool</Data> <Data>110</Data> <Data>SAMC\MossAppPool</Data> <Data>False</Data> <Data> at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) </Data> </EventData> </Event>

    Read the article

  • We are hiring (take a minute to read this, is not another BS talk ;) )

    - by gsusx
    I really wanted to wait until our new website was out to blog about this but I hope you can put up with the ugly website for a few more days J. Tellago keeps growing and, after a quick break at the beginning of the year, we are back in hiring mode J. We are currently expanding our teams in the United States and Argentina and have various positions open in the following categories. .NET developers: If you are an exceptional .NET programmer with a passion for creating great software solutions working...(read more)

    Read the article

  • Oracle Data Integration Solutions and the Oracle EXADATA Database Machine

    - by João Vilanova
    Oracle's data integration solutions provide a complete, open and integrated solution for building, deploying, and managing real-time data-centric architectures in operational and analytical environments. Fully integrated with and optimized for the Oracle Exadata Database Machine, Oracle's data integration solutions take data integration to the next level and delivers extremeperformance and scalability for all the enterprise data movement and transformation needs. Easy-to-use, open and standards-based Oracle's data integration solutions dramatically improve productivity, provide unparalleled efficiency, and lower the cost of ownership.You can watch a video about this subject, after clicking on the link below.DIS for EXADATA Video

    Read the article

  • BIWA Wednesday TechCast Series - Opposition to Data Warehouse Initiatives

    - by jenny.gelhausen
    BIWA Wednesday TechCast Series - 19th Event! Opposition to Data Warehouse Initiatives Please join us for this webcast on Wednesday, March 24, 12 noon Eastern or check your local area's time Webcast is open to clients, prospects and partners. No matter how good your technology and technical skills, organizational issues can derail a data warehousing or BI project. Therefore BIWA presents a vital topic that crosses product boundaries: organizational resistance to data warehouse initiatives - how to recognize it and what to do about it. Many a DW/BI professional has been surprised by organizational resistance to DW/BI initiatives. Yet real organizational imperatives may be behind this apparently irrational behavior. Based on in-depth interviews with IT professionals, industry consultants, and power users, our speaker Bruce Jenks will present his research findings about what drives organizational resistance to data warehouse initiatives. The talk will cover specific behaviors that can signal organizational resistance to a data warehouse program and what organizations have done to address such resistance. Presenter: Bruce Jenks of Dun and Bradstreet Bruce Jenks has over 20 years experience in data warehousing and business intelligence, much of it as a consultant to large organizations spanning the US. Bruce's data warehousing clients have included firms such as Sprint, Gallo Wines, Southern California Edison, The Gap, and Safeway. He started his data warehousing career at Metaphor Computers, a pioneering DW/BI firm from which a number of industry luminaries sprang including Ralph Kimball (author of The Data Warehouse Toolkit ). Bruce continued his data warehousing career at HP, Stanford University and other firms. Bruce is currently completing his doctorate in business administration at Golden Gate University, and today's material arises from his doctoral research. He is also a principal consultant for Dun and Bradstreet. Audio Dial-In: 866 682 4770 Audio Meeting ID: 1683901 Audio Meeting Passcode: 334451 Web Conference: Please register at https://www1.gotomeeting.com/register/807185273 After you register you will be provided with a link to the TechCast. Invitation to Speakers: All BIWA members and Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) may submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community. Submit your BIWA TechCast abstract today! BIWA is a worldwide forum with over 2000 members who are business intelligence, warehousing and analytics professionals. BIWA presents information, experiences and best practices in successfully deploying Oracle Database-centric BI, Data Warehousing, and Analytics products, features and Options--the Oracle Database "BIWA" platform. Attendance Information & Replays at the BIWA website: oraclebiwa.org var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); try { var pageTracker = _gat._getTracker("UA-13185312-1"); pageTracker._trackPageview(); } catch(err) {}

    Read the article

  • SQL Monitor’s data repository

    - by Chris Lambrou
    As one of the developers of SQL Monitor, I often get requests passed on by our support people from customers who are looking to dip into SQL Monitor’s own data repository, in order to pull out bits of information that they’re interested in. Since there’s clearly interest out there in playing around directly with the data repository, I thought I’d write some blog posts to start to describe how it all works. The hardest part for me is knowing where to begin, since the schema of the data repository is pretty big. Hmmm… I guess it’s tricky for anyone to write anything but the most trivial of queries against the data repository without understanding the hierarchy of monitored objects, so perhaps my first post should start there. I always imagine that whenever a customer fires up SSMS and starts to explore their SQL Monitor data repository database, they become immediately bewildered by the schema – that was certainly my experience when I did so for the first time. The following query shows the number of different object types in the data repository schema: SELECT type_desc, COUNT(*) AS [count] FROM sys.objects GROUP BY type_desc ORDER BY type_desc;  type_desccount 1DEFAULT_CONSTRAINT63 2FOREIGN_KEY_CONSTRAINT181 3INTERNAL_TABLE3 4PRIMARY_KEY_CONSTRAINT190 5SERVICE_QUEUE3 6SQL_INLINE_TABLE_VALUED_FUNCTION381 7SQL_SCALAR_FUNCTION2 8SQL_STORED_PROCEDURE100 9SYSTEM_TABLE41 10UNIQUE_CONSTRAINT54 11USER_TABLE193 12VIEW124 With 193 tables, 124 views, 100 stored procedures and 381 table valued functions, that’s quite a hefty schema, and when you browse through it using SSMS, it can be a bit daunting at first. So, where to begin? Well, let’s narrow things down a bit and only look at the tables belonging to the data schema. That’s where all of the collected monitoring data is stored by SQL Monitor. The following query gives us the names of those tables: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' ORDER BY sch.name, obj.name; This query still returns 110 tables. I won’t show them all here, but let’s have a look at the first few of them:  name 1data.Cluster_Keys 2data.Cluster_Machine_ClockSkew_UnstableSamples 3data.Cluster_Machine_Cluster_StableSamples 4data.Cluster_Machine_Keys 5data.Cluster_Machine_LogicalDisk_Capacity_StableSamples 6data.Cluster_Machine_LogicalDisk_Keys 7data.Cluster_Machine_LogicalDisk_Sightings 8data.Cluster_Machine_LogicalDisk_UnstableSamples 9data.Cluster_Machine_LogicalDisk_Volume_StableSamples 10data.Cluster_Machine_Memory_Capacity_StableSamples 11data.Cluster_Machine_Memory_UnstableSamples 12data.Cluster_Machine_Network_Capacity_StableSamples 13data.Cluster_Machine_Network_Keys 14data.Cluster_Machine_Network_Sightings 15data.Cluster_Machine_Network_UnstableSamples 16data.Cluster_Machine_OperatingSystem_StableSamples 17data.Cluster_Machine_Ping_UnstableSamples 18data.Cluster_Machine_Process_Instances 19data.Cluster_Machine_Process_Keys 20data.Cluster_Machine_Process_Owner_Instances 21data.Cluster_Machine_Process_Sightings 22data.Cluster_Machine_Process_UnstableSamples 23… There are two things I want to draw your attention to: The table names describe a hierarchy of the different types of object that are monitored by SQL Monitor (e.g. clusters, machines and disks). For each object type in the hierarchy, there are multiple tables, ending in the suffixes _Keys, _Sightings, _StableSamples and _UnstableSamples. Not every object type has a table for every suffix, but the _Keys suffix is especially important and a _Keys table does indeed exist for every object type. In fact, if we limit the query to return only those tables ending in _Keys, we reveal the full object hierarchy: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' AND obj.name LIKE '%_Keys' ORDER BY sch.name, obj.name;  name 1data.Cluster_Keys 2data.Cluster_Machine_Keys 3data.Cluster_Machine_LogicalDisk_Keys 4data.Cluster_Machine_Network_Keys 5data.Cluster_Machine_Process_Keys 6data.Cluster_Machine_Services_Keys 7data.Cluster_ResourceGroup_Keys 8data.Cluster_ResourceGroup_Resource_Keys 9data.Cluster_SqlServer_Agent_Job_History_Keys 10data.Cluster_SqlServer_Agent_Job_Keys 11data.Cluster_SqlServer_Database_BackupType_Backup_Keys 12data.Cluster_SqlServer_Database_BackupType_Keys 13data.Cluster_SqlServer_Database_CustomMetric_Keys 14data.Cluster_SqlServer_Database_File_Keys 15data.Cluster_SqlServer_Database_Keys 16data.Cluster_SqlServer_Database_Table_Index_Keys 17data.Cluster_SqlServer_Database_Table_Keys 18data.Cluster_SqlServer_Error_Keys 19data.Cluster_SqlServer_Keys 20data.Cluster_SqlServer_Services_Keys 21data.Cluster_SqlServer_SqlProcess_Keys 22data.Cluster_SqlServer_TopQueries_Keys 23data.Cluster_SqlServer_Trace_Keys 24data.Group_Keys The full object type hierarchy looks like this: Cluster Machine LogicalDisk Network Process Services ResourceGroup Resource SqlServer Agent Job History Database BackupType Backup CustomMetric File Table Index Error Services SqlProcess TopQueries Trace Group Okay, but what about the individual objects themselves represented at each level in this hierarchy? Well that’s what the _Keys tables are for. This is probably best illustrated by way of a simple example – how can I query my own data repository to find the databases on my own PC for which monitoring data has been collected? Like this: SELECT clstr._Name AS cluster_name, srvr._Name AS instance_name, db._Name AS database_name FROM data.Cluster_SqlServer_Database_Keys db JOIN data.Cluster_SqlServer_Keys srvr ON db.ParentId = srvr.Id -- Note here how the parent of a Database is a Server JOIN data.Cluster_Keys clstr ON srvr.ParentId = clstr.Id -- Note here how the parent of a Server is a Cluster WHERE clstr._Name = 'dev-chrisl2' -- This is the hostname of my own PC ORDER BY clstr._Name, srvr._Name, db._Name;  cluster_nameinstance_namedatabase_name 1dev-chrisl2SqlMonitorData 2dev-chrisl2master 3dev-chrisl2model 4dev-chrisl2msdb 5dev-chrisl2mssqlsystemresource 6dev-chrisl2tempdb 7dev-chrisl2sql2005SqlMonitorData 8dev-chrisl2sql2005TestDatabase 9dev-chrisl2sql2005master 10dev-chrisl2sql2005model 11dev-chrisl2sql2005msdb 12dev-chrisl2sql2005mssqlsystemresource 13dev-chrisl2sql2005tempdb 14dev-chrisl2sql2008SqlMonitorData 15dev-chrisl2sql2008master 16dev-chrisl2sql2008model 17dev-chrisl2sql2008msdb 18dev-chrisl2sql2008mssqlsystemresource 19dev-chrisl2sql2008tempdb These results show that I have three SQL Server instances on my machine (a default instance, one named sql2005 and one named sql2008), and each instance has the usual set of system databases, along with a database named SqlMonitorData. Basically, this is where I test SQL Monitor on different versions of SQL Server, when I’m developing. There are a few important things we can learn from this query: Each _Keys table has a column named Id. This is the primary key. Each _Keys table has a column named ParentId. A foreign key relationship is defined between each _Keys table and its parent _Keys table in the hierarchy. There are two exceptions to this, Cluster_Keys and Group_Keys, because clusters and groups live at the root level of the object hierarchy. Each _Keys table has a column named _Name. This is used to uniquely identify objects in the table within the scope of the same shared parent object. Actually, that last item isn’t always true. In some cases, the _Name column is actually called something else. For example, the data.Cluster_Machine_Services_Keys table has a column named _ServiceName instead of _Name (sorry for the inconsistency). In other cases, a name isn’t sufficient to uniquely identify an object. For example, right now my PC has multiple processes running, all sharing the same name, Chrome (one for each tab open in my web-browser). In such cases, multiple columns are used to uniquely identify an object within the scope of the same shared parent object. Well, that’s it for now. I’ve given you enough information for you to explore the _Keys tables to see how objects are stored in your own data repositories. In a future post, I’ll try to explain how monitoring data is stored for each object, using the _StableSamples and _UnstableSamples tables. If you have any questions about this post, or suggestions for future posts, just submit them in the comments section below.

    Read the article

  • SQL – Migrate Database from SQL Server to NuoDB – A Quick Tutorial

    - by Pinal Dave
    Data is growing exponentially and every organization with growing data is thinking of next big innovation in the world of Big Data. Big data is a indeed a future for every organization at one point of the time. Just like every other next big thing, big data has its own challenges and issues. The biggest challenge associated with the big data is to find the ideal platform which supports the scalability and growth of the data. If you are a regular reader of this blog, you must be familiar with NuoDB. I have been working with NuoDB for a while and their recent release is the best thus far. NuoDB is an elastically scalable SQL database that can run on local host, datacenter and cloud-based resources. A key feature of the product is that it does not require sharding (read more here). Last week, I was able to install NuoDB in less than 90 seconds and have explored their Explorer and Admin sections. You can read about my experiences in these posts: SQL – Step by Step Guide to Download and Install NuoDB – Getting Started with NuoDB SQL – Quick Start with Admin Sections of NuoDB – Manage NuoDB Database SQL – Quick Start with Explorer Sections of NuoDB – Query NuoDB Database Many SQL Authority readers have been following me in my journey to evaluate NuoDB. One of the frequently asked questions I’ve received from you is if there is any way to migrate data from SQL Server to NuoDB. The fact is that there is indeed a way to do so and NuoDB provides a fantastic tool which can help users to do it. NuoDB Migrator is a command line utility that supports the migration of Microsoft SQL Server, MySQL, Oracle, and PostgreSQL schemas and data to NuoDB. The migration to NuoDB is a three-step process: NuoDB Migrator generates a schema for a target NuoDB database It loads data into the target NuoDB database It dumps data from the source database Let’s see how we can migrate our data from SQL Server to NuoDB using a simple three-step approach. But before we do that we will create a sample database in MSSQL and later we will migrate the same database to NuoDB: Setup Step 1: Build a sample data CREATE DATABASE [Test]; CREATE TABLE [Department]( [DepartmentID] [smallint] NOT NULL, [Name] VARCHAR(100) NOT NULL, [GroupName] VARCHAR(100) NOT NULL, [ModifiedDate] [datetime] NOT NULL, CONSTRAINT [PK_Department_DepartmentID] PRIMARY KEY CLUSTERED ( [DepartmentID] ASC ) ) ON [PRIMARY]; INSERT INTO Department SELECT * FROM AdventureWorks2012.HumanResources.Department; Note that I am using the SQL Server AdventureWorks database to build this sample table but you can build this sample table any way you prefer. Setup Step 2: Install Java 64 bit Before you can begin the migration process to NuoDB, make sure you have 64-bit Java installed on your computer. This is due to the fact that the NuoDB Migrator tool is built in Java. You can download 64-bit Java for Windows, Mac OSX, or Linux from the following link: http://java.com/en/download/manual.jsp. One more thing to remember is that you make sure that the path in your environment settings is set to your JAVA_HOME directory or else the tool will not work. Here is how you can do it: Go to My Computer >> Right Click >> Select Properties >> Click on Advanced System Settings >> Click on Environment Variables >> Click on New and enter the following values. Variable Name: JAVA_HOME Variable Value: C:\Program Files\Java\jre7 Make sure you enter your Java installation directory in the Variable Value field. Setup Step 3: Install JDBC driver for SQL Server. There are two JDBC drivers available for SQL Server.  Select the one you prefer to use by following one of the two links below: Microsoft JDBC Driver jTDS JDBC Driver In this example we will be using jTDS JDBC driver. Once you download the driver, move the driver to your NuoDB installation folder. In my case, I have moved the JAR file of the driver into the C:\Program Files\NuoDB\tools\migrator\jar folder as this is my NuoDB installation directory. Now we are all set to start the three-step migration process from SQL Server to NuoDB: Migration Step 1: NuoDB Schema Generation Here is the command I use to generate a schema of my SQL Server Database in NuoDB. First I go to the folder C:\Program Files\NuoDB\tools\migrator\bin and execute the nuodb-migrator.bat file. Note that my database name is ‘test’. Additionally my username and password is also ‘test’. You can see that my SQL Server database is running on my localhost on port 1433. Additionally, the schema of the table is ‘dbo’. nuodb-migrator schema –source.driver=net.sourceforge.jtds.jdbc.Driver –source.url=jdbc:jtds:sqlserver://localhost:1433/ –source.username=test –source.password=test –source.catalog=test –source.schema=dbo –output.path=/tmp/schema.sql The above script will generate a schema of all my SQL Server tables and will put it in the folder C:\tmp\schema.sql . You can open the schema.sql file and execute this file directly in your NuoDB instance. You can follow the link here to see how you can execute the SQL script in NuoDB. Please note that if you have not yet created the schema in the NuoDB database, you should create it before executing this step. Step 2: Generate the Dump File of the Data Once you have recreated your schema in NuoDB from SQL Server, the next step is very easy. Here we create a CSV format dump file, which will contain all the data from all the tables from the SQL Server database. The command to do so is very similar to the above command. Be aware that this step may take a bit of time based on your database size. nuodb-migrator dump –source.driver=net.sourceforge.jtds.jdbc.Driver –source.url=jdbc:jtds:sqlserver://localhost:1433/ –source.username=test –source.password=test –source.catalog=test –source.schema=dbo –output.type=csv –output.path=/tmp/dump.cat Once the above command is successfully executed you can find your CSV file in the C:\tmp\ folder. However, you do not have to do anything manually. The third and final step will take care of completing the migration process. Migration Step 3: Load the Data into NuoDB After building schema and taking a dump of the data, the very next step is essential and crucial. It will take the CSV file and load it into the NuoDB database. nuodb-migrator load –target.url=jdbc:com.nuodb://localhost:48004/mytest –target.schema=dbo –target.username=test –target.password=test –input.path=/tmp/dump.cat Please note that in the above script we are now targeting the NuoDB database, which we have already created with the name of “MyTest”. If the database does not exist, create it manually before executing the above script. I have kept the username and password as “test”, but please make sure that you create a more secure password for your database for security reasons. Voila!  You’re Done That’s it. You are done. It took 3 setup and 3 migration steps to migrate your SQL Server database to NuoDB.  You can now start exploring the database and build excellent, scale-out applications. In this blog post, I have done my best to come up with simple and easy process, which you can follow to migrate your app from SQL Server to NuoDB. Download NuoDB I strongly encourage you to download NuoDB and go through my 3-step migration tutorial from SQL Server to NuoDB. Additionally here are two very important blog post from NuoDB CTO Seth Proctor. He has written excellent blog posts on the concept of the Administrative Domains. NuoDB has this concept of an Administrative Domain, which is a collection of hosts that can run one or multiple databases.  Each database has its own TEs and SMs, but all are managed within the Admin Console for that particular domain. http://www.nuodb.com/techblog/2013/03/11/getting-started-provisioning-a-domain/ http://www.nuodb.com/techblog/2013/03/14/getting-started-running-a-database/ Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

    Read the article

  • Ameristar Wins with Oracle GoldenGate’s Heterogeneous Real-Time Data Integration

    - by Irem Radzik
    Today we announced a press release about another successful project with Oracle GoldenGate. This time at Ameristar. Ameristar is a casino gaming company and needed a single data integration solution to connect multiple heterogeneous systems to its Teradata data warehouse. The project involves integration of Ameristar’s promotional and gaming data from 14 data sources across its 7 casino hotel properties in real time into a central Teradata data warehouse. The source systems include the Aristocrat gaming and MGT promotional management platforms running on Microsoft SQL Server 2000 databases. As you can notice, there was no Oracle Database involved in this project, but Ameristar’s IT leadership knew that  GoldenGate’s strong heterogeneous and real-time data integration capabilities is the right technology for their data warehousing project. With GoldenGate Ameristar was able to reduce data latency to the enterprise data warehouse, and use this real-time customer information for marketing teams in improving overall customer experience. Ameristar customers receive more targeted and timely campaign offers, and the company has more up-to-date visibility into financial metrics of the company. One other key benefit the company experienced with GoldenGate is in operational costs. The previous data capture solution Ameristar used was trigger based and required a lot of effort to manage. They needed dedicated IT staff to maintain it. With GoldenGate, the solution runs seamlessly without needing a fully-dedicated staff, giving the IT team at Ameristar more resources for their other IT projects. If you want to learn more about GoldenGate and the latest features for Oracle Database and non-Oracle databases, please watch our on demand webcast about Oracle GoldenGate 11g Release 2.

    Read the article

  • My Take on Hadoop World 2011

    - by Jean-Pierre Dijcks
    I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms. The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point… On the use cases that were mentioned: ETL – how can I do complex data transformation at scale Doing Basel III liquidity analysis Private banking – transaction filtering to feed [relational] data marts Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies] Data Mining Bypass data engineering [I interpret this as running a lot of a large data set rather than on samples] Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forget his password, or is it someone in some other country trying to guess passwords Trade quality analysis – do a batch analysis or all trades done and run them through an analysis or comparison pipeline One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years. Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this one of the best sessions I have seen in a long time. It discussed real use case, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases: Determine the Strategy – Design a platform and evangelize this within the organization Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform. One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models. All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

    Read the article

< Previous Page | 60 61 62 63 64 65 66 67 68 69 70 71  | Next Page >