Search Results

Search found 60920 results on 2437 pages for 'data professional'.

Page 52/2437 | < Previous Page | 48 49 50 51 52 53 54 55 56 57 58 59 | Next Page >

Using Constraints on Hierarchical Data in a Self-Referential Table

- by pbarney

Suppose you have the following table, intended to represent hierarchical data: +--------+-------------+ | Field | Type | +--------+-------------+ | id | int(10) | | parent | int(10) | | name | varchar(45) | +--------+-------------+ The table is self-referential in that the parent_id refers to id. So you might have the following data: +----+--------+---------------+ | id | parent | name | +----+--------+---------------+ | 1 | 0 | fruit | | 2 | 0 | vegetable | | 3 | 1 | apple | | 4 | 1 | orange | | 5 | 3 | red delicious | | 6 | 3 | granny smith | | 7 | 3 | gala | +----+--------+---------------+ Using MySQL, I am trying to impose a (self-referential) foreign key constraint upon the data to update on cascades and prevent deletion of fruit if they have "children." So I used the following: CREATE TABLE `idtlp_main`.`fruit` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `parent` INT(10) UNSIGNED, `name` VARCHAR(45) NOT NULL, PRIMARY KEY (`id`), CONSTRAINT `fk_parent` FOREIGN KEY (`parent`) REFERENCES `fruit` (`id`) ON UPDATE CASCADE ON DELETE RESTRICT ) ENGINE = InnoDB; From what I understand, this should fit my requirements. (And parent must default to null to allow insertions, correct?) The problem is, if I change the id of a record, it will not cascade: Cannot delete or update a parent row: a foreign key constraint fails (`iddoc_main`.`fruit`, CONSTRAINT `fk_parent` FOREIGN KEY (`parent`) REFERENCES `fruit` (`id`) ON UPDATE CASCADE) What am I missing? Feel free to correct me if my terminology is screwed up... I'm new to constraints.

Read the article
iPhone application update (using Core Data on Sqlite)

- by owen

I have an app which is using Core Data on Sqlite, Now I have a update which has some DB structure changes say adding a new table I know when an app get updated, it updates the app binary only, nothing on document directory will be changed. When the app gets updated and launchs at the first time and run [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:[self managedObjectModel]]; It will find the difference between the data model and DB structure in Sqlite and will throw an exception and quit. Error: "The model used to open the store is incompatible with the one used to create the store" So, can anyone here give me some idea how to update an app when there is a DB structure change? I think we can run a DB script to create that new table when it launchs the update at the first time. But if there are other changes like changing the type of some fields or deleting some fields, and we need to migrate the old data, this is really a headache. In this case, Is the only way to do is creating a new app? Is there anyone tried something similar like this?

Read the article
Point data structure for a sketching application

- by bebraw

I am currently developing a little sketching application based on HTML5 Canvas element. There is one particular problem I haven't yet managed to find a proper solution for. The idea is that the user will be able to manipulate existing stroke data (points) quite freely. This includes pushing point data around (ie. magnet tool) and manipulating it at whim otherwise (ie. altering color). Note that the current brush engine is able to shade by taking existing stroke data in count. It's a quick and dirty solution as it just iterates the points in the current stroke and checks them against a distance rule. Now the problem is how to do this in a nice manner. It is extremely important to be able to perform efficient queries that return all points within given canvas coordinate and radius. Other features, such as space usage, should be secondary to this. I don't mind doing some extra processing between strokes while the user is not painting. Any pointers are welcome. :)

Read the article
Algorithm detect repeating/similiar strings in a corpus of data -- say email subjects, in Python

- by RizwanK

I'm downloading a long list of my email subject lines , with the intent of finding email lists that I was a member of years ago, and would want to purge them from my Gmail account (which is getting pretty slow.) I'm specifically thinking of newsletters that often come from the same address, and repeat the product/service/group's name in the subject. I'm aware that I could search/sort by the common occurrence of items from a particular email address (and I intend to), but I'd like to correlate that data with repeating subject lines.... Now, many subject lines would fail a string match, but "Google Friends : Our latest news" "Google Friends : What we're doing today" are more similar to each other than a random subject line, as is: "Virgin Airlines has a great sale today" "Take a flight with Virgin Airlines" So -- how can I start to automagically extract trends/examples of strings that may be more similar. Approaches I've considered and discarded ('because there must be some better way'): Extracting all the possible substrings and ordering them by how often they show up, and manually selecting relevant ones Stripping off the first word or two and then count the occurrence of each sub string Comparing Levenshtein distance between entries Some sort of string similarity index ... Most of these were rejected for massive inefficiency or likelyhood of a vast amount of manual intervention required. I guess I need some sort of fuzzy string matching..? In the end, I can think of kludgy ways of doing this, but I'm looking for something more generic so I've added to my set of tools rather than special casing for this data set. After this, I'd be matching the occurring of particular subject strings with 'From' addresses - I'm not sure if there's a good way of building a data structure that represents how likely/not two messages are part of the 'same email list' or by filtering all my email subjects/from addresses into pools of likely 'related' emails and not -- but that's a problem to solve after this one. Any guidance would be appreciated.

Read the article
What changes in Core Data after a save?

- by Splash6

I have a Core Data based mac application that is working perfectly well until I save a file. When I save a file it seems like something changes in core data because my original fetch request no longer fetches anything. This is the fetch request that works before saving but returns an empty array after saving. NSEntityDescription *outputCellEntityDescription = [NSEntityDescription entityForName:@"OutputCell" inManagedObjectContext:[[self document] managedObjectContext]]; NSFetchRequest *outputCellRequest = [[[NSFetchRequest alloc] init] autorelease]; [outputCellRequest setEntity:outputCellEntityDescription]; NSPredicate *outputCellPredicate = [NSPredicate predicateWithFormat:@"(cellTitle = %@)", outputCellTitle]; [outputCellRequest setPredicate:outputCellPredicate]; NSError *outputCellError = nil; NSArray *outputCellArray = [[[self document] managedObjectContext] executeFetchRequest:outputCellRequest error:&outputCellError]; I have checked with [[[self document] managedObjectContext] registeredObjects] to see that the object still exists after the save and nothing seems to have changed and the object still exists. It is probably something fairly basic but does anyone know what I might be doing wrong? If not can anyone give me any pointers to what might be different in the Core Data model after a save so I might have some clues why the fetch request stops working after saving?

Read the article
C++: best way to implement globally scoped data

- by bobobobo

I'd like to make program-wide data in a C++ program, without running into pesky LNK2005 errors when all the source files #includes this "global variable repository" file. I have 2 ways to do it in C++, and I'm asking which way is better. The easiest way to do it in C# is just public static members. C#: public static class DataContainer { public static Object data1 ; public static Object data2 ; } In C++ you can do the same thing C++ global data way#1: class DataContainer { public: static Object data1 ; static Object data2 ; } ; Object DataContainer::data1 ; Object DataContainer::data2 ; However there's also extern C++ global data way #2: class DataContainer { public: Object data1 ; Object data2 ; } ; extern DataContainer * dataContainer ; // instantiate in .cpp file In C++ which is better, or possibly another way which I haven't thought about? The solution has to not cause LNK2005 "object already defined" errors.

Read the article
strange memory error when deleting object from Core Data

- by llloydxmas

I have an application that downloads an xml file, parses the file, and creates core data objects while doing so. In the parse code I have a function called 'emptydatacontext' that removes all items from Core Data before creating replacements from the xml file. This method looks like this: -(void) emptyDataContext { NSMutableArray* mutableFetchResults = [CoreDataHelper getObjectsFromContext:@"Condition" :@"road" :NO :managedObjectContext]; NSFetchRequest * allCon = [[NSFetchRequest alloc] init]; [allCon setEntity:[NSEntityDescription entityForName:@"Condition" inManagedObjectContext:managedObjectContext]]; NSError * error = nil; NSArray * conditions = [managedObjectContext executeFetchRequest:allCon error:&error]; [allCon release]; for (NSManagedObject * condition in conditions) { [managedObjectContext deleteObject:condition]; } } The first time this runs it deletes all objects and functions as it should - creating new objects from the xml file. I created a 'update' button that starts the exact same process of retrieving the file the preceeding as it did the first time. All is well until its time to delete the core data objects again. This 'deleteObject' call creates a "EXC_BAD_ACCESS" error each time. This only happens on the second time through. See this image for the debugger window as it appears when walking through the deletion FOR loop on the second iteration. Conditions is the fetched array of 7 objects with the objects below. Condition should be an individual condition. link text As you can see 'condition' does not match any of the objects in the 'conditions' array. I'm sure this is why I'm getting the memory access errors. Just not sure why this fetch (or the FOR) is returning a wrong reference. All the code that successfully performes this function on the first iteration is used in the second but with very different results. Thanks in advance for the help!

Read the article
MVC and jQuery data retrieval.

- by user337542

Hello, I am using mvc and jQuery and I am trying to display someone's profile with some additional institutions that the person belongs to. I am new to this but I've done something like this in ProfileControler: public ActionResult Institutions(int id) { var inst = fr.getInstitutions(id); return Json(inst); } getInstitutions(id) returns Institution objects(with Name, City, Post Code etc.) Then in a certain View I am trying to get the data with jQuery and display them as follows: $(document).ready(function () { $.post("/Profile/Institutions", { id: <%= Model.Profile.userProfileID %> }, function (data) { $.each(data, function () { var new_div = $("<div>"); var new_label = $("<label>"); new_label.html(this.City); var new_input_b = $("<input>"); new_input_b.attr("type", "button"); new_div.append(new_label); new_div.append(new_input_b); $("#institutions").append(new_div); }); }); }); $("#institutions") is a div where i want to display all of the results. .post works correct for sure because certain institutions are retrieved from database, and passed to the view as Json result. But then I am affraid it wont itterate with .each. Any help, coments or pointing in some direction would be much appriciated

Read the article
JTable data only shown after scrolling

- by Christian 'fuzi' Orgler

I wrote a method, that creates my DefaultTableModel and there I'm going to add my records. When I set the model to my JTable, the data rows are blank. After scrolling the data gets displayed correct. How can I avoid this and display the data from the first moment? EDIT: I imported the javax.swing.table.DefaultTableModel -- is this correct? private DefaultTableModel _dtm; private void loadTable(Vector<Member> members) { loadTableModel(); try { lbl_state.setText("Please wait"); for (Member actMember : members) { String gender = ""; if (actMember.getGender() == MemberView.MEMBER_MALE) { gender = "männlich"; } else { gender = "weiblich"; } _dtm.addRow(new Object[]{ actMember.getNname(), actMember.getVname(), actMember.getCity(), actMember.getStreet(), actMember.getPlz(), actMember.getMail(), actMember.getPhonenumber(), actMember.getBirthdayString(), actMember.getStartDateString(), gender, actMember.getBankname(), actMember.getAccountnumber(), actMember.getBanknumber(), actMember.getGroup().toString(), (actMember.hasAccess() ? "JA" : "NEIN"), actMember.getWriteDateString(), (actMember.hasDrinkAbo() ? "JA" : "NEIN") }); } } catch (Exception ex) { System.err.println(ex.getMessage()); } tbl_results.setModel(_dtm); } private void loadTableModel() { _dtm = new DefaultTableModel(new Object[]{"Nachname", "Vorname", "Ort", "Straße", "PLZ", "E-Mail", "Telefon", "Geburtsdatum", "Beitrittsdatum", "Geschlecht", "Bankname", "Kontonummer", "Bankleitzahl", "Gruppe", "hat Zugriff", "Einschreibdatum", "Getränkeabo"}, 0); tbl_results.setAutoResizeMode(JTable.AUTO_RESIZE_ALL_COLUMNS); }

Read the article
NSFetchedResultsController on secondary UITableView - how to query data?

- by Jason

I am creating a core-data based Navigation iPhone app with multiple screens. Let's say it is a flash-card application. The data model is very simple, with only two entities: Language, and CardSet. There is a one-to-many relationship between the Language entity and the CardSet entities, so each Language may contain multiple CardSets. In other words, Language has a one-to-many relationship Language.cardSets which points to the list of CardSets, and CardSet has a relationship CardSet.language which points to the Language. There are two screens: (1) An initial TableView screen, which displays the list of languages; and (2) a secondary TableView screen, which displays the list of CardSets in the Language. In the initial screen, which lists the languages, I am using NSFetchedResultsController to keep the list of languages up-to-date. The screen passes the Language selected to the secondary screen. On the secondary screen, I am trying to figure out whether I should again use an NSFetchedResultsController to maintain the list of CardSets, or if I should work through Language.cardSets to simply pull the list out of the object model. The latter makes the most sense programatically because I already have the Language - but then it would not automatically be updated on changes. I have looked at the NSFetchedResultsController documentation, and it seems like I can easily create predicates based on attributes - but not relationships. I.e., I can create the following NSFetchedResultsController: NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name LIKE[c] 'Chuck Norris'"]; How can I access my data through the direct relationship - Language.cardSets - and also have the table auto-update using NSFetchedResultsController? Is this possible?

Read the article
Uniquing with Existing Core Data Entities

- by warrenm

I'm using Core Data to store a lot (1000s) of items. A pair of properties on each item are used to determine uniqueness, so when a new item comes in, I compare it against the existing items before inserting it. Since the incoming data is in the form of an RSS feed, there are often many duplicates, and the cost of the uniquing step is O(N^2), which has become significant. Right now, I create a set of existing items before iterating over the list of (possible) new items. My theory is that on the first iteration, all the items will be faulted in, and assuming we aren't pressed for memory, most of those items will remain resident over the course of the iteration. I see my options thusly: Use string comparison for uniquing, iterating over all "new" items and comparing to all existing items (Current approach) Use a predicate to filter the set of existing items against the properties of the "new" items. Use a predicate with Core Data to determine uniqueness of each "new" item (without retrieving the set of existing items). Is option 3 likely to be faster than my current approach? Do you know of a better way?

Read the article
Can't find momd file: Core Data problems

- by thekevinscott

Aw geez! I screwed something up! I'm a Core Data noob, working on my first iOS app. After much Stack Overflowing I'm using this code: NSString *path = [[NSBundle mainBundle] pathForResource:@"CoreData" ofType:@"momd"]; if (!path) { path = [[NSBundle mainBundle] pathForResource:@"CoreData" ofType:@"mom"]; } NSAssert(path != nil, @"Unable to find Resource in main bundle"); CoreData is the name of my app. I've tried to put in initial data into the app by finding the path to the sqlite file in my iPhone simulator, and then going and inserting into that sqlite file. But at some point, I moved the sqlite (thinking it would create a fresh copy), deleted the app from the simulator, and the sqlite file is gone. I'm not sure if I'm leaving out some part of the process (this was a few hours ago) but the end result is that everything is screwed up. How do I resubstantiate this sqlite / momd file? "Clean" and "Clean all targets" are grayed out. I'm happy to post the relevant code from my app that would help shed some light on this problem but there's tons of code relating to Core Data which I don't understand, so I'm not sure what part to post! Any help is greatly appreciated.

Read the article
Import & modify date data in MATLAB

- by niko

I have a .csv file with records written in the following form: 2010-04-20 15:15:00,"8.9915176259e+00","8.8562623697e+00" 2010-04-20 15:30:00,"8.5718021723e+00","8.6633827160e+00" 2010-04-20 15:45:00,"8.4484844117e+00","8.4336586330e+00" 2010-04-20 16:00:00,"1.1106980342e+01","8.4333062208e+00" 2010-04-20 16:15:00,"9.0643470589e+00","8.6885660103e+00" 2010-04-20 16:30:00,"8.2133517943e+00","8.2677822671e+00" 2010-04-20 16:45:00,"8.2499419380e+00","8.1523501983e+00" 2010-04-20 17:00:00,"8.2948492278e+00","8.2884797924e+00" From these data I would like to make clusters - I would like to add a column with number indicating the hour - so in case of the first row a value 15 has to be added in a new row. The first problem is that calling a function [numData, textData, rawData] = xlsread('testData.csv') creates an empty matrix numData and one-column textData and rawData structures. Is it possible to create any template which recognizes a yyyy, MM, dd, hh, mm, ss values from the data above? What I would basically like to do with these data is to categorize the values by hours so from the example row of input: 2010-04-20 15:15:00,"8.9915176259e+00","8.8562623697e+00" update 1: in Matlab the line above is recognized as a string: '2010-04-26 13:00:00,"1.0428104753e+00","2.3456394130e+00"' I would want this to be the output: 15, 8.9915176259e+00, 8.8562623697e+00 update 1: a string has to be parsed Does anyone know how to parse a string and retrieve a timestamp, value1 (1.0428104753e+00) and value2 (2.3456394130e+00) from it as separate values?

Read the article
Multidimensional data structure?

- by Austin Truong

I need a multidimensional data structure with a row and a column. Must be able to insert elements any location in the data structure. Example: {A , B} I want to insert C in between A and B. {A, C, B}. Dynamic: I do not know the size of the data structure. Another example: I know the [row][col] of where I want to insert the element. EX. insert("A", 1, 5), where A is the element to be inserted, 1 is the row, 5 is the column. EDIT I want to be able to insert like this. static void Main(string[] args) { Program p = new Program(); List<string> list = new List<string>(); list.Insert(1, "HELLO"); list.Insert(5, "RAWR"); for (int i = 0; i < list.Count; i++) { Console.WriteLine(list[i]); } Console.ReadKey(); } And of course this crashes with an out of bounds error. So in a sense I will have a user who will choose which ROW and COL to insert the element to.

Read the article
WCF data services - Limiting related objects returned based on critera

- by Mike Morley

I have an object graph consisting of a base employee object, and a set of related message objects. I am able to return the employee objects based on search criteria on the employee properties (eg team) etc. However, if I expand on the messages, I get the full collection of messages back. I would like to be able to either take the top n messages (i.e. restrict to 10 most recent) or ideally use a date range on the message objects to limit how many are brought back. So far I have not been able to figure out a way of doing this: I get an error if I attempt to filter on properties on the message (&$filter=employee/message/StartDate gives an error "No property 'StartDate' exists in type 'System.Data.Objects.DataClasses.EntityCollection`1). Attempting to use Top on the message related object doesn't work either. I have also tried using a WebGet extension that takes a string list of employee IDs. That works until the list gets too long, and then fails due to the URL getting too long (it might be possible to setup a paging mechanism on this approach)... Unfortunately the UI control I am using requires the data to be in a fairly specific hierarchical shape, so I can't easily come at this from starting on the message side and working backwards. Outside of making multiple calls does anyone know of a method to accomplish this with wcf data services? Thanks! M.

Read the article
Compact data structure for storing a large set of integral values

- by Odrade

I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple: Add a new value Iterate over all of the values. There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives. I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions. EDIT: To answer your questions: No, the items don't need to be sorted By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this. There could be a decent number of these sets in memory at once (~100).

Read the article
Display data FROM Sheet1 on Sheet2 based on conditional value

- by Shawn

Imagine a worksheet with 30 pieces of information, such as: A1= Start Date A2 = End Date A3 = Resource Name A4 = Cost .... A30 = Whatever B1 = 1/1/2010 B2 = 2/15/2010 B3 = Joe Smith B4 = $10,000.00 ... B30 = Blah Blah Now imagine a third column, C. The purpose of the third column is to determine WHICH report that row of data needs to appear in. C1 = Report 1 C2 = Report 1 and Report 2 C3 = Report 4 and Report 7 C4 = Report 1 and Report 5 ... C30 = Report 2 Each report is on Sheets 2, 3, 4, 5 and so on (depending on how many I decide to create). As you can see form my example above, some data may need to appear in multiple reports. For example, the data in Row 3 (Resource Name: Joe Smith) needs to appear in Report 4 and Report 7. That is to say, it needs to DYNAMICALLY appear on two additional worksheets. If I change the values in column C, then the reports need to update automatically. How can I create the worksheets which will serve as the reports such that they only diaplay the rows which have been "flagged" to be displayed in that report? Thanks!

Read the article
Python, web log data mining for frequent patterns

- by descent

Hello! I need to develop a tool for web log data mining. Having many sequences of urls, requested in a particular user session (retrieved from web-application logs), I need to figure out the patterns of usage and groups (clusters) of users of the website. I am new to Data Mining, and now examining Google a lot. Found some useful info, i.e. querying Frequent Pattern Mining in Web Log Data seems to point to almost exactly similar studies. So my questions are: Are there any python-based tools that do what I need or at least smth similar? Can Orange toolkit be of any help? Can reading the book Programming Collective Intelligence be of any help? What to Google for, what to read, which relatively simple algorithms to use best? I am very limited in time (to around a week), so any help would be extremely precious. What I need is to point me into the right direction and the advice of how to accomplish the task in the shortest time. Thanks in advance!

Read the article
Data Integration Solution?

- by Shlomo

At my company we have a number of data feeds and processing that run on any given day. The number of feeds and processing steps is starting to out-number the ability to manage it ad-hoc as it is managed currently. Is there a good solution that helps with logging and managing/scheduling dependencies? For example: A: When file x is FTP dropped into directory D1, kick off processing step B B: Load flat file into DB1 C: When file y is FTP dropped into directory D2, kick off processing Step D D: Load flat file into DB11 E: When B and D are done, churn through the data, and load new data into DB111. F: When Step E is done, launch application process P G: etc... I want those steps to run at the appropriate times, not to mention if B fails, there's no reason to run steps E & F, but I could still run C & D. When I re-run B successfully, it should trigger just E & F to re-run, not C & D. We're a .NET/C#/Sql Server shop, and I'm already familiar with SSIS. Is that really the best there is? That manages steps well, but not external dependencies, or logging. Open source (.NET) preferred, but not required.

Read the article
Save many-to-one relationship from JSON into Core Data

- by Snow Crash

I'm wanting to save a Many-to-one relationship parsed from JSON into Core Data. The code that parses the JSON and does the insert into Core Data looks like this: for (NSDictionary *thisRecipe in recipes) { Recipe *recipe = [NSEntityDescription insertNewObjectForEntityForName:@"Recipe" inManagedObjectContext:insertionContext]; recipe.title = [thisRecipe objectForKey:@"Title"]; NSDictionary *ingredientsForRecipe = [thisRecipe objectForKey:@"Ingredients"]; NSArray *ingredientsArray = [ingredientsForRecipe objectForKey:@"Results"]; for (NSDictionary *thisIngredient in ingredientsArray) { Ingredient *ingredient = [NSEntityDescription insertNewObjectForEntityForName:@"Ingredient" inManagedObjectContext:insertionContext]; ingredient.name = [thisIngredient objectForKey:@"Name"]; } } NSSet *ingredientsSet = [NSSet ingredientsArray]; [recipe setIngredients:ingredientsSet]; Notes: "setIngredients" is a Core Data generated accessor method. There is a many-to-one relationship between Ingredients and Recipe However, when I run this I get the following error: NSCFDictionary managedObjectContext]: unrecognized selector sent to instance If I remove the last line (i.e. [recipe setIngredients:ingredientsSet];) then, taking a peek at the SQLite database, I see the Recipe and Ingredients have been stored but no relationship has been created between Recipe and Ingredients Any suggestions as to how to ensure the relationship is stored correctly?

Read the article
Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

- by Reed

In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine. If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine. The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off data. Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection. One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism. In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library. This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple. Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter. Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel. Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition. Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns. We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change. Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine. Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method. In this case, our for loop iteration block becomes a delegate creating via a lambda expression. This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea. There is a balance to be struck when writing parallel code. We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce. In this case, we have an image of data – most likely hundreds of pixels in both dimensions. By just parallelizing our first loop, each row of pixels can be run as a single task. With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks. This introduces extra overhead with no extra gain, and will actually reduce our overall performance. This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop. I could have just as easily partitioned the inner loop. However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels). My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop. If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance. If we’re iterating through an IList<T> or an array, this is a typical approach. However, there are other iteration statements common in C#. In many situations, we’ll use foreach instead of a for loop. This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation. Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change. Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist. If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression. This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel. The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library. These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism. Iterating through collections and performing “work” is a very common pattern in nearly every application. When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls. Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

Read the article
ERROR: Attempted to read or write protected memory. This is often an indication that other memory is corrupt

- by SPSamL

I get this error after having edited a few pages in SharePoint 2010. I have to do an IISReset on both front ends to get this to resolve. I don't know how to fix it or even what else to supply here, but please let me know as the resets now happen several times per day. Log Name: Application Source: ASP.NET 2.0.50727.0 Date: 1/26/2011 11:12:48 AM Event ID: 1309 Task Category: Web Event Level: Warning Keywords: Classic User: N/A Computer: PINTSPSFE02.samcstl.org Description: Event code: 3005 Event message: An unhandled exception has occurred. Event time: 1/26/2011 11:12:48 AM Event time (UTC): 1/26/2011 5:12:48 PM Event ID: c52fb336b7f147a3913fff3617a99d57 Event sequence: 4965 Event occurrence: 2178 Event detail code: 0 Application information: Application domain: /LM/W3SVC/1449762715/ROOT-2-129405348166941887 Trust level: WSS_Minimal Application Virtual Path: / Application Path: C:\inetpub\wwwroot\wss\VirtualDirectories\80\ Machine name: PINTSPSFE02 Process information: Process ID: 5928 Process name: w3wp.exe Account name: SAMC\MossAppPool Exception information: Exception type: AccessViolationException Exception message: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. Request information: Request URL: http://mosscluster/Pages/Home.aspx Request path: /Pages/Home.aspx User host address: 10.3.60.26 User: SAMC\BARNMD Is authenticated: True Authentication Type: NTLM Thread account name: SAMC\MossAppPool Thread information: Thread ID: 110 Thread account name: SAMC\MossAppPool Is impersonating: False Stack trace: at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) Custom event details: Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="ASP.NET 2.0.50727.0" /> <EventID Qualifiers="32768">1309</EventID> <Level>3</Level> <Task>3</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2011-01-26T17:12:48.000000000Z" /> <EventRecordID>35834</EventRecordID> <Channel>Application</Channel> <Computer>PINTSPSFE02.samcstl.org</Computer> <Security /> </System> <EventData> <Data>3005</Data> <Data>An unhandled exception has occurred.</Data> <Data>1/26/2011 11:12:48 AM</Data> <Data>1/26/2011 5:12:48 PM</Data> <Data>c52fb336b7f147a3913fff3617a99d57</Data> <Data>4965</Data> <Data>2178</Data> <Data>0</Data> <Data>/LM/W3SVC/1449762715/ROOT-2-129405348166941887</Data> <Data>WSS_Minimal</Data> <Data>/</Data> <Data>C:\inetpub\wwwroot\wss\VirtualDirectories\80\</Data> <Data>PINTSPSFE02</Data> <Data> </Data> <Data>5928</Data> <Data>w3wp.exe</Data> <Data>SAMC\MossAppPool</Data> <Data>AccessViolationException</Data> <Data></Data> <Data>http://mosscluster/Pages/Home.aspx</Data> <Data>/Pages/Home.aspx</Data> <Data>10.3.60.26</Data> <Data>SAMC\BARNMD</Data> <Data>True</Data> <Data>NTLM</Data> <Data>SAMC\MossAppPool</Data> <Data>110</Data> <Data>SAMC\MossAppPool</Data> <Data>False</Data> <Data> at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) </Data> </EventData> </Event>

Read the article
Data Profiling without SSIS

Strangely enough for a predominantly SSIS blog, this post is all about how to perform data profiling without using SSIS. Whilst the Data Profiling Task is a worthy addition, there are a couple of limitations I’ve encountered of late. The first is that it requires SQL Server 2008, and not everyone is there yet. The second is that it can only target SQL Server 2005 and above. What about older systems, which are the ones that we probably need to investigate the most, or other vendor databases such as Oracle? With these limitations in mind I did some searching to find a quick and easy alternative to help me perform some data profiling for a project I was working on recently. I only had SQL Server 2005 available, and anyway most of my target source systems were Oracle, and of course I had short timescales. I looked at several options. Some never got beyond the download stage, they failed to install or just did not run, and others provided less than I could have produced myself by spending 2 minutes writing some basic SQL queries. In the end I settled on an open source product called DataCleaner. To quote from their website: DataCleaner is an Open Source application for profiling, validating and comparing data. These activities help you administer and monitor your data quality in order to ensure that your data is useful and applicable to your business situation. DataCleaner is the free alternative to software for master data management (MDM) methodologies, data warehousing (DW) projects, statistical research, preparation for extract-transform-load (ETL) activities and more. DataCleaner is developed in Java and licensed under LGPL. As quoted above it claims to support profiling, validating and comparing data, but I didn’t really get past the profiling functions, so won’t comment on the other two. The profiling whilst not prefect certainly saved some time compared to the limited alternatives. The ability to profile heterogeneous data sources is a big advantage over the SSIS option, and I found it overall quite easy to use and performance was good. I could see it struggling at times, but actually for what it does I was impressed. It had some data type niggles with Oracle, and some metrics seem a little strange, although thankfully they were easy to augment with some SQL queries to ensure a consistent picture. The report export options didn’t do it for me, but copy and paste with a bit of Excel magic was sufficient. One initial point for me personally is that I have had limited exposure to things of the Java persuasion and whilst I normally get by fine, sometimes the simplest things can throw me. For example installing a JDBC driver, why do I have to copy files to make it all work, has nobody ever heard of an MSI? In case there are other people out there like me who have become totally indoctrinated with the Microsoft software paradigm, I’ve written a quick start guide that details every step required. Steps 1- 5 are the key ones, the rest is really an excuse for some screenshots to show you the tool. Quick Start Guide Step 1 - Download Data Cleaner. The Microsoft Windows zipped exe option, and I chose the latest stable build, currently DataCleaner 1.5.3 (final). Extract the files to a suitable location. Step 2 - Download Java. If you try and run datacleaner.exe without Java it will warn you, and then open your default browser and take you to the Java download site. Follow the installation instructions from there, normally just click Download Java a couple of times and you’re done. Step 3 - Download Microsoft SQL Server JDBC Driver. You may have SQL Server installed, but you won’t have a JDBC driver. Version 3.0 is the latest as of April 2010. There is no real installer, we are in the Java world here, but run the exe you downloaded to extract the files. The default Unzip to folder is not much help, so try a fully qualified path such as C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\ to ensure you can find the files afterwards. Step 4 - If you wish to use Windows Authentication to connect to your SQL Server then first we need to copy a file so that Data Cleaner can find it. Browse to the JDBC extract location from Step 3 and drill down to the file sqljdbc_auth.dll. You will have to choose the correct directory for your processor architecture. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\auth\x86\sqljdbc_auth.dll. Now copy this file to the Data Cleaner extract folder you chose in Step 1. An alternative method is to edit datacleaner.cmd in the data cleaner extract folder as detailed in this data cleaner wiki topic, but I find copying the file simpler. Step 5 – Now lets run Data Cleaner, just run datacleaner.exe from the extract folder you chose in Step 1. Step 6 – Complete or skip the registration screen, and ignore the task window for now. In the main window click settings. Step 7 – In the Settings dialog, select the Database drivers tab, then click Register database driver and select the Local JAR file option. Step 8 – Browse to the JDBC driver extract location from Step 3 and drill down to select sqljdbc4.jar. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\sqljdbc4.jar Step 9 – Select the Database driver class as com.microsoft.sqlserver.jdbc.SQLServerDriver, and then click the Test and Save database driver button. Step 10 - You should be back at the Settings dialog with a the list of drivers that includes SQL Server. Just click Save Settings to persist all your hard work. Step 11 – Now we can start to profile some data. In the main Data Cleaner window click New Task, and then Profile from the task window. Step 12 – In the Profile window click Open Database Step 13 – Now choose the SQL Server connection string option. Selecting a connection string gives us a template like jdbc:sqlserver://<hostname>:1433;databaseName=<database>, but obviously it requires some details to be entered for example jdbc:sqlserver://localhost:1433;databaseName=SQLBits. This will connect to the database called SQLBits on my local machine. The port may also have to be changed if using such as when you have a multiple instances of SQL Server running. If using SQL Server Authentication enter a username and password as required and then click Connect to database. You can use Window Authentication, just add integratedSecurity=true to the end of your connection string. e.g jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true. If you didn’t complete Step 4 above you will need to do so now and restart Data Cleaner before it will work. Manually setting the connection string is fine, but creating a named connection makes more sense if you will be spending any length of time profiling a specific database. As highlighted in the left-hand screen-shot, at the bottom of the dialog it includes partial instructions on how to create named connections. In the folder shown C:\Users\<Username>\.datacleaner\1.5.3, open the datacleaner-config.xml file in your editor of choice add your own details. You’ll see a sample connection in the file already, just add yours following the same pattern. e.g.  <bean class="dk.eobjects.datacleaner.gui.model.NamedConnection"> <property name="name" value="SQLBits Local Connection" /> <property name="driverClass" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" /> <property name="connectionString" value="jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true" /> <property name="tableTypes"> <list> <value>TABLE</value> <value>VIEW</value> </list> </property> </bean> Step 14 – Once back at the Profile window, you should now see your schemas, tables and/or views listed down the left hand side. Browse this tree and double-click a table to select it for profiling. You can then click Add profile, and choose some profiling options, before finally clicking Run profiling. You can see below a sample output for three of the most common profiles, click the image for full size. I hope this has given you a taster for DataCleaner, and should help you get up and running pretty quickly.

Read the article
Oracle Data Integration Solutions and the Oracle EXADATA Database Machine

- by João Vilanova

Oracle's data integration solutions provide a complete, open and integrated solution for building, deploying, and managing real-time data-centric architectures in operational and analytical environments. Fully integrated with and optimized for the Oracle Exadata Database Machine, Oracle's data integration solutions take data integration to the next level and delivers extremeperformance and scalability for all the enterprise data movement and transformation needs. Easy-to-use, open and standards-based Oracle's data integration solutions dramatically improve productivity, provide unparalleled efficiency, and lower the cost of ownership.You can watch a video about this subject, after clicking on the link below.DIS for EXADATA Video

Read the article
MacGyver Moments

- by KKline

In case you haven't heard, your MacGyver Moments are those times when you improvised an excellent solution to a problem using non-traditional materials, techniques, or tools......(read more)

Read the article

Search Results

Search found 60920 results on 2437 pages for 'data professional'.

Page 52/2437 | < Previous Page | 48 49 50 51 52 53 54 55 56 57 58 59 | Next Page >

- by pbarney

- by owen

- by bebraw

- by RizwanK

- by Splash6

- by bobobobo

- by llloydxmas

- by user337542

- by Christian 'fuzi' Orgler

- by Jason

- by warrenm

- by thekevinscott

- by niko

- by Austin Truong

- by Mike Morley

- by Odrade

- by Shawn

- by descent

- by Shlomo

- by Snow Crash

- by Reed

- by SPSamL

- by João Vilanova

- by KKline

< Previous Page | 48 49 50 51 52 53 54 55 56 57 58 59 | Next Page >