Search Results

Search found 59142 results on 2366 pages for 'data annotation'.

Page 49/2366 | < Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >

  • Algorithm detect repeating/similiar strings in a corpus of data -- say email subjects, in Python

    - by RizwanK
    I'm downloading a long list of my email subject lines , with the intent of finding email lists that I was a member of years ago, and would want to purge them from my Gmail account (which is getting pretty slow.) I'm specifically thinking of newsletters that often come from the same address, and repeat the product/service/group's name in the subject. I'm aware that I could search/sort by the common occurrence of items from a particular email address (and I intend to), but I'd like to correlate that data with repeating subject lines.... Now, many subject lines would fail a string match, but "Google Friends : Our latest news" "Google Friends : What we're doing today" are more similar to each other than a random subject line, as is: "Virgin Airlines has a great sale today" "Take a flight with Virgin Airlines" So -- how can I start to automagically extract trends/examples of strings that may be more similar. Approaches I've considered and discarded ('because there must be some better way'): Extracting all the possible substrings and ordering them by how often they show up, and manually selecting relevant ones Stripping off the first word or two and then count the occurrence of each sub string Comparing Levenshtein distance between entries Some sort of string similarity index ... Most of these were rejected for massive inefficiency or likelyhood of a vast amount of manual intervention required. I guess I need some sort of fuzzy string matching..? In the end, I can think of kludgy ways of doing this, but I'm looking for something more generic so I've added to my set of tools rather than special casing for this data set. After this, I'd be matching the occurring of particular subject strings with 'From' addresses - I'm not sure if there's a good way of building a data structure that represents how likely/not two messages are part of the 'same email list' or by filtering all my email subjects/from addresses into pools of likely 'related' emails and not -- but that's a problem to solve after this one. Any guidance would be appreciated.

    Read the article

  • What changes in Core Data after a save?

    - by Splash6
    I have a Core Data based mac application that is working perfectly well until I save a file. When I save a file it seems like something changes in core data because my original fetch request no longer fetches anything. This is the fetch request that works before saving but returns an empty array after saving. NSEntityDescription *outputCellEntityDescription = [NSEntityDescription entityForName:@"OutputCell" inManagedObjectContext:[[self document] managedObjectContext]]; NSFetchRequest *outputCellRequest = [[[NSFetchRequest alloc] init] autorelease]; [outputCellRequest setEntity:outputCellEntityDescription]; NSPredicate *outputCellPredicate = [NSPredicate predicateWithFormat:@"(cellTitle = %@)", outputCellTitle]; [outputCellRequest setPredicate:outputCellPredicate]; NSError *outputCellError = nil; NSArray *outputCellArray = [[[self document] managedObjectContext] executeFetchRequest:outputCellRequest error:&outputCellError]; I have checked with [[[self document] managedObjectContext] registeredObjects] to see that the object still exists after the save and nothing seems to have changed and the object still exists. It is probably something fairly basic but does anyone know what I might be doing wrong? If not can anyone give me any pointers to what might be different in the Core Data model after a save so I might have some clues why the fetch request stops working after saving?

    Read the article

  • C++: best way to implement globally scoped data

    - by bobobobo
    I'd like to make program-wide data in a C++ program, without running into pesky LNK2005 errors when all the source files #includes this "global variable repository" file. I have 2 ways to do it in C++, and I'm asking which way is better. The easiest way to do it in C# is just public static members. C#: public static class DataContainer { public static Object data1 ; public static Object data2 ; } In C++ you can do the same thing C++ global data way#1: class DataContainer { public: static Object data1 ; static Object data2 ; } ; Object DataContainer::data1 ; Object DataContainer::data2 ; However there's also extern C++ global data way #2: class DataContainer { public: Object data1 ; Object data2 ; } ; extern DataContainer * dataContainer ; // instantiate in .cpp file In C++ which is better, or possibly another way which I haven't thought about? The solution has to not cause LNK2005 "object already defined" errors.

    Read the article

  • strange memory error when deleting object from Core Data

    - by llloydxmas
    I have an application that downloads an xml file, parses the file, and creates core data objects while doing so. In the parse code I have a function called 'emptydatacontext' that removes all items from Core Data before creating replacements from the xml file. This method looks like this: -(void) emptyDataContext { NSMutableArray* mutableFetchResults = [CoreDataHelper getObjectsFromContext:@"Condition" :@"road" :NO :managedObjectContext]; NSFetchRequest * allCon = [[NSFetchRequest alloc] init]; [allCon setEntity:[NSEntityDescription entityForName:@"Condition" inManagedObjectContext:managedObjectContext]]; NSError * error = nil; NSArray * conditions = [managedObjectContext executeFetchRequest:allCon error:&error]; [allCon release]; for (NSManagedObject * condition in conditions) { [managedObjectContext deleteObject:condition]; } } The first time this runs it deletes all objects and functions as it should - creating new objects from the xml file. I created a 'update' button that starts the exact same process of retrieving the file the preceeding as it did the first time. All is well until its time to delete the core data objects again. This 'deleteObject' call creates a "EXC_BAD_ACCESS" error each time. This only happens on the second time through. See this image for the debugger window as it appears when walking through the deletion FOR loop on the second iteration. Conditions is the fetched array of 7 objects with the objects below. Condition should be an individual condition. link text As you can see 'condition' does not match any of the objects in the 'conditions' array. I'm sure this is why I'm getting the memory access errors. Just not sure why this fetch (or the FOR) is returning a wrong reference. All the code that successfully performes this function on the first iteration is used in the second but with very different results. Thanks in advance for the help!

    Read the article

  • MVC and jQuery data retrieval.

    - by user337542
    Hello, I am using mvc and jQuery and I am trying to display someone's profile with some additional institutions that the person belongs to. I am new to this but I've done something like this in ProfileControler: public ActionResult Institutions(int id) { var inst = fr.getInstitutions(id); return Json(inst); } getInstitutions(id) returns Institution objects(with Name, City, Post Code etc.) Then in a certain View I am trying to get the data with jQuery and display them as follows: $(document).ready(function () { $.post("/Profile/Institutions", { id: <%= Model.Profile.userProfileID %> }, function (data) { $.each(data, function () { var new_div = $("<div>"); var new_label = $("<label>"); new_label.html(this.City); var new_input_b = $("<input>"); new_input_b.attr("type", "button"); new_div.append(new_label); new_div.append(new_input_b); $("#institutions").append(new_div); }); }); }); $("#institutions") is a div where i want to display all of the results. .post works correct for sure because certain institutions are retrieved from database, and passed to the view as Json result. But then I am affraid it wont itterate with .each. Any help, coments or pointing in some direction would be much appriciated

    Read the article

  • JTable data only shown after scrolling

    - by Christian 'fuzi' Orgler
    I wrote a method, that creates my DefaultTableModel and there I'm going to add my records. When I set the model to my JTable, the data rows are blank. After scrolling the data gets displayed correct. How can I avoid this and display the data from the first moment? EDIT: I imported the javax.swing.table.DefaultTableModel -- is this correct? private DefaultTableModel _dtm; private void loadTable(Vector<Member> members) { loadTableModel(); try { lbl_state.setText("Please wait"); for (Member actMember : members) { String gender = ""; if (actMember.getGender() == MemberView.MEMBER_MALE) { gender = "männlich"; } else { gender = "weiblich"; } _dtm.addRow(new Object[]{ actMember.getNname(), actMember.getVname(), actMember.getCity(), actMember.getStreet(), actMember.getPlz(), actMember.getMail(), actMember.getPhonenumber(), actMember.getBirthdayString(), actMember.getStartDateString(), gender, actMember.getBankname(), actMember.getAccountnumber(), actMember.getBanknumber(), actMember.getGroup().toString(), (actMember.hasAccess() ? "JA" : "NEIN"), actMember.getWriteDateString(), (actMember.hasDrinkAbo() ? "JA" : "NEIN") }); } } catch (Exception ex) { System.err.println(ex.getMessage()); } tbl_results.setModel(_dtm); } private void loadTableModel() { _dtm = new DefaultTableModel(new Object[]{"Nachname", "Vorname", "Ort", "Straße", "PLZ", "E-Mail", "Telefon", "Geburtsdatum", "Beitrittsdatum", "Geschlecht", "Bankname", "Kontonummer", "Bankleitzahl", "Gruppe", "hat Zugriff", "Einschreibdatum", "Getränkeabo"}, 0); tbl_results.setAutoResizeMode(JTable.AUTO_RESIZE_ALL_COLUMNS); }

    Read the article

  • NSFetchedResultsController on secondary UITableView - how to query data?

    - by Jason
    I am creating a core-data based Navigation iPhone app with multiple screens. Let's say it is a flash-card application. The data model is very simple, with only two entities: Language, and CardSet. There is a one-to-many relationship between the Language entity and the CardSet entities, so each Language may contain multiple CardSets. In other words, Language has a one-to-many relationship Language.cardSets which points to the list of CardSets, and CardSet has a relationship CardSet.language which points to the Language. There are two screens: (1) An initial TableView screen, which displays the list of languages; and (2) a secondary TableView screen, which displays the list of CardSets in the Language. In the initial screen, which lists the languages, I am using NSFetchedResultsController to keep the list of languages up-to-date. The screen passes the Language selected to the secondary screen. On the secondary screen, I am trying to figure out whether I should again use an NSFetchedResultsController to maintain the list of CardSets, or if I should work through Language.cardSets to simply pull the list out of the object model. The latter makes the most sense programatically because I already have the Language - but then it would not automatically be updated on changes. I have looked at the NSFetchedResultsController documentation, and it seems like I can easily create predicates based on attributes - but not relationships. I.e., I can create the following NSFetchedResultsController: NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name LIKE[c] 'Chuck Norris'"]; How can I access my data through the direct relationship - Language.cardSets - and also have the table auto-update using NSFetchedResultsController? Is this possible?

    Read the article

  • Can't find momd file: Core Data problems

    - by thekevinscott
    Aw geez! I screwed something up! I'm a Core Data noob, working on my first iOS app. After much Stack Overflowing I'm using this code: NSString *path = [[NSBundle mainBundle] pathForResource:@"CoreData" ofType:@"momd"]; if (!path) { path = [[NSBundle mainBundle] pathForResource:@"CoreData" ofType:@"mom"]; } NSAssert(path != nil, @"Unable to find Resource in main bundle"); CoreData is the name of my app. I've tried to put in initial data into the app by finding the path to the sqlite file in my iPhone simulator, and then going and inserting into that sqlite file. But at some point, I moved the sqlite (thinking it would create a fresh copy), deleted the app from the simulator, and the sqlite file is gone. I'm not sure if I'm leaving out some part of the process (this was a few hours ago) but the end result is that everything is screwed up. How do I resubstantiate this sqlite / momd file? "Clean" and "Clean all targets" are grayed out. I'm happy to post the relevant code from my app that would help shed some light on this problem but there's tons of code relating to Core Data which I don't understand, so I'm not sure what part to post! Any help is greatly appreciated.

    Read the article

  • Import & modify date data in MATLAB

    - by niko
    I have a .csv file with records written in the following form: 2010-04-20 15:15:00,"8.9915176259e+00","8.8562623697e+00" 2010-04-20 15:30:00,"8.5718021723e+00","8.6633827160e+00" 2010-04-20 15:45:00,"8.4484844117e+00","8.4336586330e+00" 2010-04-20 16:00:00,"1.1106980342e+01","8.4333062208e+00" 2010-04-20 16:15:00,"9.0643470589e+00","8.6885660103e+00" 2010-04-20 16:30:00,"8.2133517943e+00","8.2677822671e+00" 2010-04-20 16:45:00,"8.2499419380e+00","8.1523501983e+00" 2010-04-20 17:00:00,"8.2948492278e+00","8.2884797924e+00" From these data I would like to make clusters - I would like to add a column with number indicating the hour - so in case of the first row a value 15 has to be added in a new row. The first problem is that calling a function [numData, textData, rawData] = xlsread('testData.csv') creates an empty matrix numData and one-column textData and rawData structures. Is it possible to create any template which recognizes a yyyy, MM, dd, hh, mm, ss values from the data above? What I would basically like to do with these data is to categorize the values by hours so from the example row of input: 2010-04-20 15:15:00,"8.9915176259e+00","8.8562623697e+00" update 1: in Matlab the line above is recognized as a string: '2010-04-26 13:00:00,"1.0428104753e+00","2.3456394130e+00"' I would want this to be the output: 15, 8.9915176259e+00, 8.8562623697e+00 update 1: a string has to be parsed Does anyone know how to parse a string and retrieve a timestamp, value1 (1.0428104753e+00) and value2 (2.3456394130e+00) from it as separate values?

    Read the article

  • Uniquing with Existing Core Data Entities

    - by warrenm
    I'm using Core Data to store a lot (1000s) of items. A pair of properties on each item are used to determine uniqueness, so when a new item comes in, I compare it against the existing items before inserting it. Since the incoming data is in the form of an RSS feed, there are often many duplicates, and the cost of the uniquing step is O(N^2), which has become significant. Right now, I create a set of existing items before iterating over the list of (possible) new items. My theory is that on the first iteration, all the items will be faulted in, and assuming we aren't pressed for memory, most of those items will remain resident over the course of the iteration. I see my options thusly: Use string comparison for uniquing, iterating over all "new" items and comparing to all existing items (Current approach) Use a predicate to filter the set of existing items against the properties of the "new" items. Use a predicate with Core Data to determine uniqueness of each "new" item (without retrieving the set of existing items). Is option 3 likely to be faster than my current approach? Do you know of a better way?

    Read the article

  • WCF data services - Limiting related objects returned based on critera

    - by Mike Morley
    I have an object graph consisting of a base employee object, and a set of related message objects. I am able to return the employee objects based on search criteria on the employee properties (eg team) etc. However, if I expand on the messages, I get the full collection of messages back. I would like to be able to either take the top n messages (i.e. restrict to 10 most recent) or ideally use a date range on the message objects to limit how many are brought back. So far I have not been able to figure out a way of doing this: I get an error if I attempt to filter on properties on the message (&$filter=employee/message/StartDate gives an error "No property 'StartDate' exists in type 'System.Data.Objects.DataClasses.EntityCollection`1). Attempting to use Top on the message related object doesn't work either. I have also tried using a WebGet extension that takes a string list of employee IDs. That works until the list gets too long, and then fails due to the URL getting too long (it might be possible to setup a paging mechanism on this approach)... Unfortunately the UI control I am using requires the data to be in a fairly specific hierarchical shape, so I can't easily come at this from starting on the message side and working backwards. Outside of making multiple calls does anyone know of a method to accomplish this with wcf data services? Thanks! M.

    Read the article

  • Multidimensional data structure?

    - by Austin Truong
    I need a multidimensional data structure with a row and a column. Must be able to insert elements any location in the data structure. Example: {A , B} I want to insert C in between A and B. {A, C, B}. Dynamic: I do not know the size of the data structure. Another example: I know the [row][col] of where I want to insert the element. EX. insert("A", 1, 5), where A is the element to be inserted, 1 is the row, 5 is the column. EDIT I want to be able to insert like this. static void Main(string[] args) { Program p = new Program(); List<string> list = new List<string>(); list.Insert(1, "HELLO"); list.Insert(5, "RAWR"); for (int i = 0; i < list.Count; i++) { Console.WriteLine(list[i]); } Console.ReadKey(); } And of course this crashes with an out of bounds error. So in a sense I will have a user who will choose which ROW and COL to insert the element to.

    Read the article

  • Compact data structure for storing a large set of integral values

    - by Odrade
    I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple: Add a new value Iterate over all of the values. There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives. I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions. EDIT: To answer your questions: No, the items don't need to be sorted By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this. There could be a decent number of these sets in memory at once (~100).

    Read the article

  • Display data FROM Sheet1 on Sheet2 based on conditional value

    - by Shawn
    Imagine a worksheet with 30 pieces of information, such as: A1= Start Date A2 = End Date A3 = Resource Name A4 = Cost .... A30 = Whatever B1 = 1/1/2010 B2 = 2/15/2010 B3 = Joe Smith B4 = $10,000.00 ... B30 = Blah Blah Now imagine a third column, C. The purpose of the third column is to determine WHICH report that row of data needs to appear in. C1 = Report 1 C2 = Report 1 and Report 2 C3 = Report 4 and Report 7 C4 = Report 1 and Report 5 ... C30 = Report 2 Each report is on Sheets 2, 3, 4, 5 and so on (depending on how many I decide to create). As you can see form my example above, some data may need to appear in multiple reports. For example, the data in Row 3 (Resource Name: Joe Smith) needs to appear in Report 4 and Report 7. That is to say, it needs to DYNAMICALLY appear on two additional worksheets. If I change the values in column C, then the reports need to update automatically. How can I create the worksheets which will serve as the reports such that they only diaplay the rows which have been "flagged" to be displayed in that report? Thanks!

    Read the article

  • Spring annotation-based container configuration context:include & exclude filters

    - by lisak
    Hey, first off I point to the similar question. I spent more than an hour to set this up, but PathMatchingResourcePatternResolver still scans everything. I have one common.xml (that is imported from specific.xml) and a specific.xml bean definition file. The context is loaded from specific.xml. In common.xml there is this element: <context:component-scan base-package="cz.instance.transl"> <context:exclude-filter type="aspectj" expression="cz.instance.transl.model..* &amp;&amp; cz.instance.transl.service..* &amp;&amp; cz.instance.transl.hooks..*"/> </context:component-scan> Where classes in packages like cz.instance.transl.service.* should not be subject of scanning, but everything else in here cz.instance.transl.* should be scanned through. But PathMatchingResourcePatternResolver marks everything as matching resources. It is the same with regex. BTW: in xml style configuration, one can have many components that share a common.xml beans via "import resource" when loading context. How this is done when Annotation-based container configuration is used ?

    Read the article

  • hibernate annotation bi-directional mapping

    - by smithystar
    I'm building a web application using Spring framework and Hibernate with annotation and get stuck with a simple mapping between two entities. I'm trying to create a many-to-many relationship between User and Course. I followed one of the Hibernate tutorials and my implementation is as follows: User class: @Entity @Table(name="USER") public class User { private Long id; private String email; private String password; private Set<Course> courses = new HashSet<Course>(0); @Id @GeneratedValue @Column(name="USER_ID") public Long getId() { return id; } public void setId(Long id) { this.id = id; } @Column(name="USER_EMAIL") public String getEmail() { return email; } public void setEmail(String email) { this.email = email; } @Column(name="USER_PASSWORD") public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } @ManyToMany(cascade = CascadeType.ALL) @JoinTable(name = "USER_COURSE", joinColumns = { @JoinColumn(name = "USER_ID") }, inverseJoinColumns = { @JoinColumn(name = "COURSE_ID") }) public Set<Course> getCourses() { return courses; } public void setCourses(Set<Course> courses) { this.courses = courses; } } Course class: @Entity @Table(name="COURSE") public class Course { private Long id; private String name; @Id @GeneratedValue @Column(name="COURSE_ID") public Long getId() { return id; } public void setId(Long id) { this.id = id; } @Column(name="NAME") public String getName() { return name; } public void setName(String name) { this.name = name; } } The problem is that this implementation only allows me to go one way user.getCourses() What do I need to change, so I can go in both directions? user.getCourses() course.getUsers() Any help would be appreciated.

    Read the article

  • Python, web log data mining for frequent patterns

    - by descent
    Hello! I need to develop a tool for web log data mining. Having many sequences of urls, requested in a particular user session (retrieved from web-application logs), I need to figure out the patterns of usage and groups (clusters) of users of the website. I am new to Data Mining, and now examining Google a lot. Found some useful info, i.e. querying Frequent Pattern Mining in Web Log Data seems to point to almost exactly similar studies. So my questions are: Are there any python-based tools that do what I need or at least smth similar? Can Orange toolkit be of any help? Can reading the book Programming Collective Intelligence be of any help? What to Google for, what to read, which relatively simple algorithms to use best? I am very limited in time (to around a week), so any help would be extremely precious. What I need is to point me into the right direction and the advice of how to accomplish the task in the shortest time. Thanks in advance!

    Read the article

  • Data Integration Solution?

    - by Shlomo
    At my company we have a number of data feeds and processing that run on any given day. The number of feeds and processing steps is starting to out-number the ability to manage it ad-hoc as it is managed currently. Is there a good solution that helps with logging and managing/scheduling dependencies? For example: A: When file x is FTP dropped into directory D1, kick off processing step B B: Load flat file into DB1 C: When file y is FTP dropped into directory D2, kick off processing Step D D: Load flat file into DB11 E: When B and D are done, churn through the data, and load new data into DB111. F: When Step E is done, launch application process P G: etc... I want those steps to run at the appropriate times, not to mention if B fails, there's no reason to run steps E & F, but I could still run C & D. When I re-run B successfully, it should trigger just E & F to re-run, not C & D. We're a .NET/C#/Sql Server shop, and I'm already familiar with SSIS. Is that really the best there is? That manages steps well, but not external dependencies, or logging. Open source (.NET) preferred, but not required.

    Read the article

  • Save many-to-one relationship from JSON into Core Data

    - by Snow Crash
    I'm wanting to save a Many-to-one relationship parsed from JSON into Core Data. The code that parses the JSON and does the insert into Core Data looks like this: for (NSDictionary *thisRecipe in recipes) { Recipe *recipe = [NSEntityDescription insertNewObjectForEntityForName:@"Recipe" inManagedObjectContext:insertionContext]; recipe.title = [thisRecipe objectForKey:@"Title"]; NSDictionary *ingredientsForRecipe = [thisRecipe objectForKey:@"Ingredients"]; NSArray *ingredientsArray = [ingredientsForRecipe objectForKey:@"Results"]; for (NSDictionary *thisIngredient in ingredientsArray) { Ingredient *ingredient = [NSEntityDescription insertNewObjectForEntityForName:@"Ingredient" inManagedObjectContext:insertionContext]; ingredient.name = [thisIngredient objectForKey:@"Name"]; } } NSSet *ingredientsSet = [NSSet ingredientsArray]; [recipe setIngredients:ingredientsSet]; Notes: "setIngredients" is a Core Data generated accessor method. There is a many-to-one relationship between Ingredients and Recipe However, when I run this I get the following error: NSCFDictionary managedObjectContext]: unrecognized selector sent to instance If I remove the last line (i.e. [recipe setIngredients:ingredientsSet];) then, taking a peek at the SQLite database, I see the Recipe and Ingredients have been stored but no relationship has been created between Recipe and Ingredients Any suggestions as to how to ensure the relationship is stored correctly?

    Read the article

  • Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

    - by Reed
    In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine.  If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine.  The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off dataData Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection.  One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism.  In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library.  This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple.  Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter.  Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel.  Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition.  Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns.  We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change.  Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine.  Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method.  In this case, our for loop iteration block becomes a delegate creating via a lambda expression.  This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea.  There is a balance to be struck when writing parallel code.  We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce.  In this case, we have an image of data – most likely hundreds of pixels in both dimensions.  By just parallelizing our first loop, each row of pixels can be run as a single task.  With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks.  This introduces extra overhead with no extra gain, and will actually reduce our overall performance.  This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop.  I could have just as easily partitioned the inner loop.  However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels).  My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop.  If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance.  If we’re iterating through an IList<T> or an array, this is a typical approach.  However, there are other iteration statements common in C#.  In many situations, we’ll use foreach instead of a for loop.  This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation.  Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change.  Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist.  If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression.  This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel.  The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library.  These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism.  Iterating through collections and performing “work” is a very common pattern in nearly every application.  When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls.  Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

    Read the article

  • ERROR: Attempted to read or write protected memory. This is often an indication that other memory is corrupt

    - by SPSamL
    I get this error after having edited a few pages in SharePoint 2010. I have to do an IISReset on both front ends to get this to resolve. I don't know how to fix it or even what else to supply here, but please let me know as the resets now happen several times per day. Log Name: Application Source: ASP.NET 2.0.50727.0 Date: 1/26/2011 11:12:48 AM Event ID: 1309 Task Category: Web Event Level: Warning Keywords: Classic User: N/A Computer: PINTSPSFE02.samcstl.org Description: Event code: 3005 Event message: An unhandled exception has occurred. Event time: 1/26/2011 11:12:48 AM Event time (UTC): 1/26/2011 5:12:48 PM Event ID: c52fb336b7f147a3913fff3617a99d57 Event sequence: 4965 Event occurrence: 2178 Event detail code: 0 Application information: Application domain: /LM/W3SVC/1449762715/ROOT-2-129405348166941887 Trust level: WSS_Minimal Application Virtual Path: / Application Path: C:\inetpub\wwwroot\wss\VirtualDirectories\80\ Machine name: PINTSPSFE02 Process information: Process ID: 5928 Process name: w3wp.exe Account name: SAMC\MossAppPool Exception information: Exception type: AccessViolationException Exception message: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. Request information: Request URL: http://mosscluster/Pages/Home.aspx Request path: /Pages/Home.aspx User host address: 10.3.60.26 User: SAMC\BARNMD Is authenticated: True Authentication Type: NTLM Thread account name: SAMC\MossAppPool Thread information: Thread ID: 110 Thread account name: SAMC\MossAppPool Is impersonating: False Stack trace: at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) Custom event details: Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="ASP.NET 2.0.50727.0" /> <EventID Qualifiers="32768">1309</EventID> <Level>3</Level> <Task>3</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2011-01-26T17:12:48.000000000Z" /> <EventRecordID>35834</EventRecordID> <Channel>Application</Channel> <Computer>PINTSPSFE02.samcstl.org</Computer> <Security /> </System> <EventData> <Data>3005</Data> <Data>An unhandled exception has occurred.</Data> <Data>1/26/2011 11:12:48 AM</Data> <Data>1/26/2011 5:12:48 PM</Data> <Data>c52fb336b7f147a3913fff3617a99d57</Data> <Data>4965</Data> <Data>2178</Data> <Data>0</Data> <Data>/LM/W3SVC/1449762715/ROOT-2-129405348166941887</Data> <Data>WSS_Minimal</Data> <Data>/</Data> <Data>C:\inetpub\wwwroot\wss\VirtualDirectories\80\</Data> <Data>PINTSPSFE02</Data> <Data> </Data> <Data>5928</Data> <Data>w3wp.exe</Data> <Data>SAMC\MossAppPool</Data> <Data>AccessViolationException</Data> <Data></Data> <Data>http://mosscluster/Pages/Home.aspx</Data> <Data>/Pages/Home.aspx</Data> <Data>10.3.60.26</Data> <Data>SAMC\BARNMD</Data> <Data>True</Data> <Data>NTLM</Data> <Data>SAMC\MossAppPool</Data> <Data>110</Data> <Data>SAMC\MossAppPool</Data> <Data>False</Data> <Data> at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Delete(String key, Boolean recursive, DeletionReason reason) at Microsoft.Office.Server.ObjectCache.SPCache.MossObjectCache_Tracked.Get(String key) at Microsoft.Office.Server.ObjectCache.SPCache.Get(String objectTypeName, String id) at Microsoft.Office.Server.Administration.UserProfileServiceProxy.GetPartitionPropertiesCache(Guid applicationID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.get_PartitionPropertiesCache() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.DataCache.get_PartitionProperties() at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, Guid partitionID) at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.GetMySitePortalUrl(SPUrlZone zone, SPServiceContext serviceContext) at Microsoft.Office.Server.WebControls.MyLinksRibbon.EnsureMySiteUrls() at Microsoft.Office.Server.WebControls.MyLinksRibbon.get_PortalMySiteUrlAvailable() at Microsoft.Office.Server.WebControls.MyLinksRibbon.OnLoad(EventArgs e) at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Control.LoadRecursive() at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) </Data> </EventData> </Event>

    Read the article

  • Data Profiling without SSIS

    Strangely enough for a predominantly SSIS blog, this post is all about how to perform data profiling without using SSIS. Whilst the Data Profiling Task is a worthy addition, there are a couple of limitations I’ve encountered of late. The first is that it requires SQL Server 2008, and not everyone is there yet. The second is that it can only target SQL Server 2005 and above. What about older systems, which are the ones that we probably need to investigate the most, or other vendor databases such as Oracle? With these limitations in mind I did some searching to find a quick and easy alternative to help me perform some data profiling for a project I was working on recently. I only had SQL Server 2005 available, and anyway most of my target source systems were Oracle, and of course I had short timescales. I looked at several options. Some never got beyond the download stage, they failed to install or just did not run, and others provided less than I could have produced myself by spending 2 minutes writing some basic SQL queries. In the end I settled on an open source product called DataCleaner. To quote from their website: DataCleaner is an Open Source application for profiling, validating and comparing data. These activities help you administer and monitor your data quality in order to ensure that your data is useful and applicable to your business situation. DataCleaner is the free alternative to software for master data management (MDM) methodologies, data warehousing (DW) projects, statistical research, preparation for extract-transform-load (ETL) activities and more. DataCleaner is developed in Java and licensed under LGPL. As quoted above it claims to support profiling, validating and comparing data, but I didn’t really get past the profiling functions, so won’t comment on the other two. The profiling whilst not prefect certainly saved some time compared to the limited alternatives. The ability to profile heterogeneous data sources is a big advantage over the SSIS option, and I found it overall quite easy to use and performance was good. I could see it struggling at times, but actually for what it does I was impressed. It had some data type niggles with Oracle, and some metrics seem a little strange, although thankfully they were easy to augment with some SQL queries to ensure a consistent picture. The report export options didn’t do it for me, but copy and paste with a bit of Excel magic was sufficient. One initial point for me personally is that I have had limited exposure to things of the Java persuasion and whilst I normally get by fine, sometimes the simplest things can throw me. For example installing a JDBC driver, why do I have to copy files to make it all work, has nobody ever heard of an MSI? In case there are other people out there like me who have become totally indoctrinated with the Microsoft software paradigm, I’ve written a quick start guide that details every step required. Steps 1- 5 are the key ones, the rest is really an excuse for some screenshots to show you the tool. Quick Start Guide Step 1  - Download Data Cleaner. The Microsoft Windows zipped exe option, and I chose the latest stable build, currently DataCleaner 1.5.3 (final). Extract the files to a suitable location. Step 2 - Download Java. If you try and run datacleaner.exe without Java it will warn you, and then open your default browser and take you to the Java download site. Follow the installation instructions from there, normally just click Download Java a couple of times and you’re done. Step 3 - Download Microsoft SQL Server JDBC Driver. You may have SQL Server installed, but you won’t have a JDBC driver. Version 3.0 is the latest as of April 2010. There is no real installer, we are in the Java world here, but run the exe you downloaded to extract the files. The default Unzip to folder is not much help, so try a fully qualified path such as C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\ to ensure you can find the files afterwards. Step 4 - If you wish to use Windows Authentication to connect to your SQL Server then first we need to copy a file so that Data Cleaner can find it. Browse to the JDBC extract location from Step 3 and drill down to the file sqljdbc_auth.dll. You will have to choose the correct directory for your processor architecture. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\auth\x86\sqljdbc_auth.dll. Now copy this file to the Data Cleaner extract folder you chose in Step 1. An alternative method is to edit datacleaner.cmd in the data cleaner extract folder as detailed in this data cleaner wiki topic, but I find copying the file simpler. Step 5 – Now lets run Data Cleaner, just run datacleaner.exe from the extract folder you chose in Step 1. Step 6 – Complete or skip the registration screen, and ignore the task window for now. In the main window click settings. Step 7 – In the Settings dialog, select the Database drivers tab, then click Register database driver and select the Local JAR file option. Step 8 – Browse to the JDBC driver extract location from Step 3 and drill down to select sqljdbc4.jar. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\sqljdbc4.jar Step 9 – Select the Database driver class as com.microsoft.sqlserver.jdbc.SQLServerDriver, and then click the Test and Save database driver button. Step 10 - You should be back at the Settings dialog with a the list of drivers that includes SQL Server. Just click Save Settings to persist all your hard work. Step 11 – Now we can start to profile some data. In the main Data Cleaner window click New Task, and then Profile from the task window. Step 12 – In the Profile window click Open Database Step 13 – Now choose the SQL Server connection string option. Selecting a connection string gives us a template like jdbc:sqlserver://<hostname>:1433;databaseName=<database>, but obviously it requires some details to be entered for example  jdbc:sqlserver://localhost:1433;databaseName=SQLBits. This will connect to the database called SQLBits on my local machine. The port may also have to be changed if using such as when you have a multiple instances of SQL Server running. If using SQL Server Authentication enter a username and password as required and then click Connect to database. You can use Window Authentication, just add integratedSecurity=true to the end of your connection string. e.g jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true.  If you didn’t complete Step 4 above you will need to do so now and restart Data Cleaner before it will work. Manually setting the connection string is fine, but creating a named connection makes more sense if you will be spending any length of time profiling a specific database. As highlighted in the left-hand screen-shot, at the bottom of the dialog it includes partial instructions on how to create named connections. In the folder shown C:\Users\<Username>\.datacleaner\1.5.3, open the datacleaner-config.xml file in your editor of choice add your own details. You’ll see a sample connection in the file already, just add yours following the same pattern. e.g. <!-- Darren's Named Connections --> <bean class="dk.eobjects.datacleaner.gui.model.NamedConnection"> <property name="name" value="SQLBits Local Connection" /> <property name="driverClass" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" /> <property name="connectionString" value="jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true" /> <property name="tableTypes"> <list> <value>TABLE</value> <value>VIEW</value> </list> </property> </bean> Step 14 – Once back at the Profile window, you should now see your schemas, tables and/or views listed down the left hand side. Browse this tree and double-click a table to select it for profiling. You can then click Add profile, and choose some profiling options, before finally clicking Run profiling. You can see below a sample output for three of the most common profiles, click the image for full size.   I hope this has given you a taster for DataCleaner, and should help you get up and running pretty quickly.

    Read the article

  • Oracle Data Integration Solutions and the Oracle EXADATA Database Machine

    - by João Vilanova
    Oracle's data integration solutions provide a complete, open and integrated solution for building, deploying, and managing real-time data-centric architectures in operational and analytical environments. Fully integrated with and optimized for the Oracle Exadata Database Machine, Oracle's data integration solutions take data integration to the next level and delivers extremeperformance and scalability for all the enterprise data movement and transformation needs. Easy-to-use, open and standards-based Oracle's data integration solutions dramatically improve productivity, provide unparalleled efficiency, and lower the cost of ownership.You can watch a video about this subject, after clicking on the link below.DIS for EXADATA Video

    Read the article

  • BIWA Wednesday TechCast Series - Opposition to Data Warehouse Initiatives

    - by jenny.gelhausen
    BIWA Wednesday TechCast Series - 19th Event! Opposition to Data Warehouse Initiatives Please join us for this webcast on Wednesday, March 24, 12 noon Eastern or check your local area's time Webcast is open to clients, prospects and partners. No matter how good your technology and technical skills, organizational issues can derail a data warehousing or BI project. Therefore BIWA presents a vital topic that crosses product boundaries: organizational resistance to data warehouse initiatives - how to recognize it and what to do about it. Many a DW/BI professional has been surprised by organizational resistance to DW/BI initiatives. Yet real organizational imperatives may be behind this apparently irrational behavior. Based on in-depth interviews with IT professionals, industry consultants, and power users, our speaker Bruce Jenks will present his research findings about what drives organizational resistance to data warehouse initiatives. The talk will cover specific behaviors that can signal organizational resistance to a data warehouse program and what organizations have done to address such resistance. Presenter: Bruce Jenks of Dun and Bradstreet Bruce Jenks has over 20 years experience in data warehousing and business intelligence, much of it as a consultant to large organizations spanning the US. Bruce's data warehousing clients have included firms such as Sprint, Gallo Wines, Southern California Edison, The Gap, and Safeway. He started his data warehousing career at Metaphor Computers, a pioneering DW/BI firm from which a number of industry luminaries sprang including Ralph Kimball (author of The Data Warehouse Toolkit ). Bruce continued his data warehousing career at HP, Stanford University and other firms. Bruce is currently completing his doctorate in business administration at Golden Gate University, and today's material arises from his doctoral research. He is also a principal consultant for Dun and Bradstreet. Audio Dial-In: 866 682 4770 Audio Meeting ID: 1683901 Audio Meeting Passcode: 334451 Web Conference: Please register at https://www1.gotomeeting.com/register/807185273 After you register you will be provided with a link to the TechCast. Invitation to Speakers: All BIWA members and Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) may submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community. Submit your BIWA TechCast abstract today! BIWA is a worldwide forum with over 2000 members who are business intelligence, warehousing and analytics professionals. BIWA presents information, experiences and best practices in successfully deploying Oracle Database-centric BI, Data Warehousing, and Analytics products, features and Options--the Oracle Database "BIWA" platform. Attendance Information & Replays at the BIWA website: oraclebiwa.org var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); try { var pageTracker = _gat._getTracker("UA-13185312-1"); pageTracker._trackPageview(); } catch(err) {}

    Read the article

  • SQL Monitor’s data repository

    - by Chris Lambrou
    As one of the developers of SQL Monitor, I often get requests passed on by our support people from customers who are looking to dip into SQL Monitor’s own data repository, in order to pull out bits of information that they’re interested in. Since there’s clearly interest out there in playing around directly with the data repository, I thought I’d write some blog posts to start to describe how it all works. The hardest part for me is knowing where to begin, since the schema of the data repository is pretty big. Hmmm… I guess it’s tricky for anyone to write anything but the most trivial of queries against the data repository without understanding the hierarchy of monitored objects, so perhaps my first post should start there. I always imagine that whenever a customer fires up SSMS and starts to explore their SQL Monitor data repository database, they become immediately bewildered by the schema – that was certainly my experience when I did so for the first time. The following query shows the number of different object types in the data repository schema: SELECT type_desc, COUNT(*) AS [count] FROM sys.objects GROUP BY type_desc ORDER BY type_desc;  type_desccount 1DEFAULT_CONSTRAINT63 2FOREIGN_KEY_CONSTRAINT181 3INTERNAL_TABLE3 4PRIMARY_KEY_CONSTRAINT190 5SERVICE_QUEUE3 6SQL_INLINE_TABLE_VALUED_FUNCTION381 7SQL_SCALAR_FUNCTION2 8SQL_STORED_PROCEDURE100 9SYSTEM_TABLE41 10UNIQUE_CONSTRAINT54 11USER_TABLE193 12VIEW124 With 193 tables, 124 views, 100 stored procedures and 381 table valued functions, that’s quite a hefty schema, and when you browse through it using SSMS, it can be a bit daunting at first. So, where to begin? Well, let’s narrow things down a bit and only look at the tables belonging to the data schema. That’s where all of the collected monitoring data is stored by SQL Monitor. The following query gives us the names of those tables: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' ORDER BY sch.name, obj.name; This query still returns 110 tables. I won’t show them all here, but let’s have a look at the first few of them:  name 1data.Cluster_Keys 2data.Cluster_Machine_ClockSkew_UnstableSamples 3data.Cluster_Machine_Cluster_StableSamples 4data.Cluster_Machine_Keys 5data.Cluster_Machine_LogicalDisk_Capacity_StableSamples 6data.Cluster_Machine_LogicalDisk_Keys 7data.Cluster_Machine_LogicalDisk_Sightings 8data.Cluster_Machine_LogicalDisk_UnstableSamples 9data.Cluster_Machine_LogicalDisk_Volume_StableSamples 10data.Cluster_Machine_Memory_Capacity_StableSamples 11data.Cluster_Machine_Memory_UnstableSamples 12data.Cluster_Machine_Network_Capacity_StableSamples 13data.Cluster_Machine_Network_Keys 14data.Cluster_Machine_Network_Sightings 15data.Cluster_Machine_Network_UnstableSamples 16data.Cluster_Machine_OperatingSystem_StableSamples 17data.Cluster_Machine_Ping_UnstableSamples 18data.Cluster_Machine_Process_Instances 19data.Cluster_Machine_Process_Keys 20data.Cluster_Machine_Process_Owner_Instances 21data.Cluster_Machine_Process_Sightings 22data.Cluster_Machine_Process_UnstableSamples 23… There are two things I want to draw your attention to: The table names describe a hierarchy of the different types of object that are monitored by SQL Monitor (e.g. clusters, machines and disks). For each object type in the hierarchy, there are multiple tables, ending in the suffixes _Keys, _Sightings, _StableSamples and _UnstableSamples. Not every object type has a table for every suffix, but the _Keys suffix is especially important and a _Keys table does indeed exist for every object type. In fact, if we limit the query to return only those tables ending in _Keys, we reveal the full object hierarchy: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' AND obj.name LIKE '%_Keys' ORDER BY sch.name, obj.name;  name 1data.Cluster_Keys 2data.Cluster_Machine_Keys 3data.Cluster_Machine_LogicalDisk_Keys 4data.Cluster_Machine_Network_Keys 5data.Cluster_Machine_Process_Keys 6data.Cluster_Machine_Services_Keys 7data.Cluster_ResourceGroup_Keys 8data.Cluster_ResourceGroup_Resource_Keys 9data.Cluster_SqlServer_Agent_Job_History_Keys 10data.Cluster_SqlServer_Agent_Job_Keys 11data.Cluster_SqlServer_Database_BackupType_Backup_Keys 12data.Cluster_SqlServer_Database_BackupType_Keys 13data.Cluster_SqlServer_Database_CustomMetric_Keys 14data.Cluster_SqlServer_Database_File_Keys 15data.Cluster_SqlServer_Database_Keys 16data.Cluster_SqlServer_Database_Table_Index_Keys 17data.Cluster_SqlServer_Database_Table_Keys 18data.Cluster_SqlServer_Error_Keys 19data.Cluster_SqlServer_Keys 20data.Cluster_SqlServer_Services_Keys 21data.Cluster_SqlServer_SqlProcess_Keys 22data.Cluster_SqlServer_TopQueries_Keys 23data.Cluster_SqlServer_Trace_Keys 24data.Group_Keys The full object type hierarchy looks like this: Cluster Machine LogicalDisk Network Process Services ResourceGroup Resource SqlServer Agent Job History Database BackupType Backup CustomMetric File Table Index Error Services SqlProcess TopQueries Trace Group Okay, but what about the individual objects themselves represented at each level in this hierarchy? Well that’s what the _Keys tables are for. This is probably best illustrated by way of a simple example – how can I query my own data repository to find the databases on my own PC for which monitoring data has been collected? Like this: SELECT clstr._Name AS cluster_name, srvr._Name AS instance_name, db._Name AS database_name FROM data.Cluster_SqlServer_Database_Keys db JOIN data.Cluster_SqlServer_Keys srvr ON db.ParentId = srvr.Id -- Note here how the parent of a Database is a Server JOIN data.Cluster_Keys clstr ON srvr.ParentId = clstr.Id -- Note here how the parent of a Server is a Cluster WHERE clstr._Name = 'dev-chrisl2' -- This is the hostname of my own PC ORDER BY clstr._Name, srvr._Name, db._Name;  cluster_nameinstance_namedatabase_name 1dev-chrisl2SqlMonitorData 2dev-chrisl2master 3dev-chrisl2model 4dev-chrisl2msdb 5dev-chrisl2mssqlsystemresource 6dev-chrisl2tempdb 7dev-chrisl2sql2005SqlMonitorData 8dev-chrisl2sql2005TestDatabase 9dev-chrisl2sql2005master 10dev-chrisl2sql2005model 11dev-chrisl2sql2005msdb 12dev-chrisl2sql2005mssqlsystemresource 13dev-chrisl2sql2005tempdb 14dev-chrisl2sql2008SqlMonitorData 15dev-chrisl2sql2008master 16dev-chrisl2sql2008model 17dev-chrisl2sql2008msdb 18dev-chrisl2sql2008mssqlsystemresource 19dev-chrisl2sql2008tempdb These results show that I have three SQL Server instances on my machine (a default instance, one named sql2005 and one named sql2008), and each instance has the usual set of system databases, along with a database named SqlMonitorData. Basically, this is where I test SQL Monitor on different versions of SQL Server, when I’m developing. There are a few important things we can learn from this query: Each _Keys table has a column named Id. This is the primary key. Each _Keys table has a column named ParentId. A foreign key relationship is defined between each _Keys table and its parent _Keys table in the hierarchy. There are two exceptions to this, Cluster_Keys and Group_Keys, because clusters and groups live at the root level of the object hierarchy. Each _Keys table has a column named _Name. This is used to uniquely identify objects in the table within the scope of the same shared parent object. Actually, that last item isn’t always true. In some cases, the _Name column is actually called something else. For example, the data.Cluster_Machine_Services_Keys table has a column named _ServiceName instead of _Name (sorry for the inconsistency). In other cases, a name isn’t sufficient to uniquely identify an object. For example, right now my PC has multiple processes running, all sharing the same name, Chrome (one for each tab open in my web-browser). In such cases, multiple columns are used to uniquely identify an object within the scope of the same shared parent object. Well, that’s it for now. I’ve given you enough information for you to explore the _Keys tables to see how objects are stored in your own data repositories. In a future post, I’ll try to explain how monitoring data is stored for each object, using the _StableSamples and _UnstableSamples tables. If you have any questions about this post, or suggestions for future posts, just submit them in the comments section below.

    Read the article

< Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >