Search Results

Search found 4593 results on 184 pages for 'operator equal'.

Page 104/184 | < Previous Page | 100 101 102 103 104 105 106 107 108 109 110 111  | Next Page >

  • How to cut the line between quality and time?

    - by m3th0dman
    On one hand, I have been taught by various software engineering books ([1] as example) that my job as a programmer is to make the best possible software: great design, flexibility, to be easily maintained etc. One the other hand although I realize that I actually write software for money and not for entertainment, although is very nice to write good code and plan ahead and refactor after writing and ... I wonder if it is always best for the business (after all we should be responsible). Is the business always benefiting from a best code? Maybe I'm over-engineering something, and it's not always useful? So how should I know when to stop in the process to achieving the best possible code? I am sure that experience is something that makes a difference here, but I believe this cannot be the only answer. [1] Uncle Bob's in Clean Code says at page 6 about the fact that: They [managers] may defend the schedule and requirements with passion; but that’s their job. It’s your job to defend the code with equal passion.

    Read the article

  • Rotate a vector

    - by marc wellman
    I want my first-person camera to smoothly change its viewing direction from direction d1 to direction d2. The latter direction is indicated by a target position t2. So far I have implemented a rotation that works fine but the speed of the rotation slows down the closer the current direction gets to the desired one. This is what I want to avoid. Here are the two very simple methods I have written so far: // this method initiates the direction change and sets the parameter public void LookAt(Vector3 target) { _desiredDirection = target - _cameraPosition; _desiredDirection.Normalize(); _rotation = new Matrix(); _rotationAxis = Vector3.Cross(Direction, _desiredDirection); _isLooking = true; } // this method gets execute by the Update()-method if _isLooking flag is up. private void _lookingAt() { dist = Vector3.Distance(Direction, _desiredDirection); // check whether the current direction has reached the desired one. if (dist >= 0.00001f) { _rotationAxis = Vector3.Cross(Direction, _desiredDirection); _rotation = Matrix.CreateFromAxisAngle(_rotationAxis, MathHelper.ToRadians(1)); Direction = Vector3.TransformNormal(Direction, _rotation); } else { _onDirectionReached(); _isLooking = false; } } Again, rotation works fine; camera reaches its desired direction. But the speed is not equal over the course of movement - it slows down. How to achieve a rotation with constant speed ?

    Read the article

  • How do I produce "enjoyably" random, as opposed to pseudo-random?

    - by Hilton Campbell
    I'm making a game which presents a number of different kinds of puzzles in sequence. I choose each puzzle with a pseudorandom number. For each puzzle, there are a number of variations. I choose the variation with another pseudorandom number. And so on. The thing is, while this produces near-true randomness, this isn't what the player really wants. The player typically wants what they perceive to be and identify as random, but only if it doesn't tend to repeat puzzles. So, not really random. Just unpredictable. Giving it some thought, I can imagine hacky ways of doing it. For example, temporarily eliminating the most recent N choices from the set of possibilities when selecting a new choice. Or assigning every choice an equal probability, reducing a choice's probability to zero on selection, and then increasing all probabilities slowly with each selection. I assume there's an established way of doing this, but I just don't know the terminology so I can't find it. Anyone know? Or has anyone solved this in a pleasing way?

    Read the article

  • Using C++ but not using the language's specific features, should switch to C?

    - by Petruza
    I'm developing a NES emulator as a hobby, in my free time. I use C++ because is the language I use mostly, know mostly and like mostly. But now that I made some advance into the project I realize I'm not using almost any specific features of C++, and could have done it in plain C and getting the same result. I don't use templates, operator overloading, polymorphism, inheritance. So what would you say? should I stay in C++ or rewrite it in C? I won't do this to gain in performance, it could come as a side effect, but the idea is why should I use C++ if I don't need it? The only features of C++ I'm using is classes to encapsulate data and methods, but that can be done as well with structs and functions, I'm using new and delete, but could as well use malloc and free, and I'm using inheritance just for callbacks, which could be achieved with pointers to functions. Remember, it's a hobby project, I have no deadlines, so the overhead time and work that would require a re-write are not a problem, might be fun as well. So, the question is C or C++?

    Read the article

  • Cheap ways to do scaling ops in shader?

    - by Nick Wiggill
    I've got an extensive world terrain that uses vec3 for the vertex position attribute. That's good, because the terrain has endless gradations due to the use of floating point. But I'm thinking about how to reduce the amount of data uploaded to the GPU. For my terrain, which uses discrete / grid-based vertex positions in x and z, it's pretty clear that I can replace my vec3s (floats, really) with shorts, halving the per-vertex position attribute cost from 12 bytes each to 6 bytes. Considering I've got little enough other vertex data, and an enormous amount of terrain data to push into the world, it's a major gain. Currently in my code, one unit in GLSL shaders is equal to 1m in the world. I like that scale. If I move over to using shorts, though, I won't be able to use the same scale, as I would then have a very blocky world where every step in height is an entire metre. So I see these potential solutions to scale the positional data correctly once it arrives at the vertex shader stage: Use 10:1 scaling, i.e. 1 short unit = 1 decimetre in CPU-side code. Do a division by 10 in the vertex shader to scale incoming decimetre values back to metres. Arbirary (non-PoT) divisions tend to be slow, however. Use (some-power-of-two):1 scaling (eg. 8:1), which enables the use of a bitshift (eg. val >> 3) to do the division... not sure how performant this is in shaders, though. Not as intuitive to read values, but possibly quite a bit faster than div by a non-PoT value. Use a texture as lookup table. I've heard that this is really fast. Or whatever solutions others can offer to achieve the same results -- minimal vertex data with sensible scaling.

    Read the article

  • What is causing this behavior with the movement of Pong Ball in 2D? [closed]

    - by thegermanpole
    //edit after running it through the debugger it turned out i had the display function set to x,x...TIL how to use a debugger I've been trying to teach myself C++ SDL with the lazyfoo tutorial and seem to have run into a roadblock. The code below is the movement function of my Dot class, which controls the ball. The ball seems to ignore yvel and moves with xvel to the bottom right. The code should be pretty readable, the rest of the relevant facts are: All variables are names Constants are in caps dotrad is the radius of my dot yvel and xvel are set to 5 in the constructor The dot is created at x and y equal to 100 When I comment out the x movement block it doesn't move, but if i comment out the y movement block, it keeps on going down to the right. void Dot::move() { if(((y+yvel+dotrad) <= SCREEN_HEIGHT) && (0 <= (y-dotrad+yvel))) { y+=yvel; } else { yvel = -1*yvel; } if(((x+xvel+dotrad) <= SCREEN_WIDTH) && (0 <= (x-dotrad+xvel))) { x +=xvel; } else { xvel = -1*xvel; } }

    Read the article

  • What language available on commodity web hosts would suit a C# developer? [closed]

    - by billpg
    Recognising its ubiquity on commodity web hosting services, I tried developing in PHP a few years ago. I really didn't like it, later deciding that life was too short for PHP. (In brief, having to put $ on variable names; mis-spelt variable names become new variables; converting non-numeric strings to integers without complaint; the need for an "and this time I mean it" comparison operator.) In my ideal world, commodity web hosts would all support C#/ASP.NET, my preferred web-development language and framework, but this is not my ideal world. Even Mono has barely made a dent on Linux based hosts. However, last time I moaned about PHP's ubiquity, someone followed up that this was no longer the case, and that many other languages are now commonly usable on web hosts too. What programming language; a. Would suit a developer who prefers C#. b. Is available to run on many web hosts.

    Read the article

  • Clustering Strings on the basis of Common Substrings

    - by pk188
    I have around 10000+ strings and have to identify and group all the strings which looks similar(I base the similarity on the number of common words between any two give strings). The more number of common words, more similar the strings would be. For instance: How to make another layer from an existing layer Unable to edit data on the network drive Existing layers in the desktop Assistance with network drive In this case, the strings 1 and 3 are similar with common words Existing, Layer and 2 and 4 are similar with common words Network Drive(eliminating stop word) The steps I'm following are: Iterate through the data set Do a row by row comparison Find the common words between the strings Form a cluster where number of common words is greater than or equal to 2(eliminating stop words) If number of common words<2, put the string in a new cluster. Assign the rows either to the existing clusters or form a new one depending upon the common words Continue until all the strings are processed I am implementing the project in C#, and have got till step 3. However, I'm not sure how to proceed with the clustering. I have researched a lot about string clustering but could not find any solution that fits my problem. Your inputs would be highly appreciated.

    Read the article

  • F# Objects &ndash; Part 3 &ndash; it&rsquo;s time to overload&hellip;

    - by MarkPearl
    Okay, some basic examples of overloading in F# Overloading Constructors Assume you have a F# object called person… type Person (firstname : string, lastname : string) = member v.Fullname = firstname + " " + lastname   This only has one constructor. To add additional constructors to the object by explicitly declaring them using the method member new. type Person (firstname : string, lastname : string) = new () = Person("Unknown", "Unknown") member v.Fullname = firstname + " " + lastname   In the code above I added another constructor to the Person object that takes no parameters and then refers to the primary constructor. Using the same technique in the code below I have created another constructor that accepts only the firstname as a parameter to create an object. type Person (firstname : string, lastname : string) = new () = Person("Unknown", "Unknown") new (firstname : string) = Person(firstname, "Unknown") member v.Fullname = firstname + " " + lastname   Overloading Operators So, you can overload operators of objects in F# as well… let’s look at example code… type Person(name : string) = member v.name = name static member (+) (person1 : Person , person2 : Person) = Person(person1.name + " " + person2.name)   In the code above we have overloaded the “+” operator. Whenever we add to Person objects together, it will now create a new object with the combined names…

    Read the article

  • Finding maximum number of congruent numbers

    - by Stefan Czarnecki
    Let's say we have a multiset (set with possible duplicates) of integers. We would like to find the size of the largest subset of the multiset such that all numbers in the subset are congruent to each other modulo some m 1. For example: 1 4 7 7 8 10 for m = 2 the subsets are: (1, 7, 7) and (4, 8, 10), both having size 3. for m = 3 the subsets are: (1, 4, 7, 7, 10) and (8), the larger set of size 5. for m = 4 the subsets are: (1), (4, 8), (7, 7), (10), the largest set of size 2. At this moment it is evident that the best answer is 5 for m = 3. Given m we can find the size of the largest subset in linear time. Because the answer is always equal or larger than half of the size of the set, it is enough to check for values of m upto median of the set. Also I noticed it is necessary to check for only prime values of m. However if values in the set are large the algorithm is still rather slow. Does anyone have any ideas how to improve it?

    Read the article

  • How can I malloc an array of struct to do the following using file operation?

    - by RHN
    How can malloc an array of struct to do the following using file operation? the file is .txt input in the file is like: 10 22 3.3 33 4.4 I want to read the first line from the file and then I want to malloc an array of input structures equal to the number of lines to be read in from the file. Then I want to read in the data from the file and into the array of structures malloc. Later on I want to store the size of the array into the input variable size. return an array.After this I want to create another function that print out the data in the input variable in the same form like input file and suppose a function call clean_data will free the malloc memory at the end. I have tried somthing like: struct input { int a; float b,c; } struct input* readData(char *filename,int *size); int main() { return 0; } struct input* readData(char *filename,int *size) { char filename[] = "input.txt"; FILE *fp = fopen(filename, "r"); int num; while(!feof(fp)) { fscanf(fp,"%f", &num); struct input *arr = (struct input*)malloc(sizeof(struct input)); } }

    Read the article

  • Can't start system due to mdadm failing

    - by user101212
    I used to have a 5-disk RAID5 partition, all working very well. I have then decided to add 3 more disks on it, totaling 8 equal disks. I've opened Webmin and just asked to add the disks. Then I've realized the three disks had NTFS partitions, wich mdadm didn't complain, so I tried to stop the growing to remove the Windows partitions. I've tried to remove a disk using the same Webmin, but (as you might guess and call me fool...), the system became unstable. By restarting the system, I've started receiving these messages: "udev[126]: timeout: killing '/sbin/mdadm --incremental /dev/sdh1' [311]" "udev[124]: timeout: killing '/sbin/mdadm --detail --export /dev/md0' [316]" I've formated the system disk, hoping to get a system up and running. I did that with all RAID disks disconected, so everything was fine. I then reconnected the disks, wich was also ok. And finally installed mdadm using apt-get. By reboot, the system has found the mdadm intention of growing the system, so the same messages appear again. I other words: I can't even reach a command prompt to do something. Any ideas of what to do? I believe I could turn off the system, disconnect the disks and look for the mdadm.conf file. Would that be a good idead? I'm no Linux expert, so I'm really lost here.

    Read the article

  • Testing my model for hybrid scheduling in Embedded Systems

    - by markusian
    I am working on a project for school, where I have to analyze the performances of a few fixed-priority servers algorithms (polling server, deferrable server, priority exchange) using a simulator in the case of hybrid scheduling, where we have both hard periodic tasks and soft aperiodic tasks. In my model I consider that: the hard tasks have a period equal to their deadline, with a known worst case execution time (wcet). The actual execution time could be smaller than the wcet. the soft tasks have a known wcet and random interarrival times. The actual execution time could be smaller than the wcet. In order to test those algorithms I need realistic case studies. For this reason I'm digging in the scientific literature but I am facing different problems: Sometimes I find a list of hard tasks with wcet, but it is not specified how the soft tasks parameters are found. Given the wcet of a task, how can I model its actual execution time? This means, what random distribution should I use considering the wcet? How can I model the random interarrival times of soft aperiodic tasks?

    Read the article

  • How to persuade C fanatics to work on my C++ open source project?

    - by paperjam
    I am launching an open-source project into a space where a lot of the development is still done Linux-kernel-style, i.e. C-language with a low-level mindset. There are multiple benefits to C++ in our space but I fear those used to working in C will be scared off. How can I make the case for the benefits of C++? Specifically, the following C++ attributes are very valuable: concept of objects and reference-counting pointers - really don't want to have to malloc(sizeof(X)) or memcpy() structs templates for specialising whole bodies of code with specific performance optimizations and for avoiding duplication of code. template metaprogramming related to the above syntactic sweetness available (e.g. operator overloading, to be used in very small doses) STL Boost libraries Many of the knee-jerk negative reactions to C++ are illfounded. Performance does not suffer: modern compilers can flatten dozens of call stack levels and avoid bloat through wide use of template specializations. Granted, when using metaprogramming and building multiple specializations of a large call tree, compile time is slower but there are ways to mitigate this. How can I sell C++?

    Read the article

  • OpenGL: Move camera regardless of rotation

    - by Markus
    For a 2D board game I'd like to move and rotate an orthogonal camera in coordinates given in a reference system (window space), but simply can't get it to work. The idea is that the user can drag the camera over a surface, rotate and scale it. Rotation and scaling should always be around the center of the current viewport. The camera is set up as: gl.glMatrixMode(GL2.GL_PROJECTION); gl.glLoadIdentity(); gl.glOrtho(-width/2, width/2, -height/2, height/2, nearPlane, farPlane); where width and height are equal to the viewport's width and height, so that 1 unit is one pixel when no zoom is applied. Since these transformations usually mean (scaling and) translating the world, then rotating it, the implementation is: gl.glMatrixMode(GL2.GL_MODELVIEW); gl.glLoadIdentity(); gl.glRotatef(rotation, 0, 0, 1); // e.g. 45° gl.glTranslatef(x, y, 0); // e.g. +10 for 10px right, -2 for 2px down gl.glScalef(zoomFactor, zoomFactor, zoomFactor); // e.g. scale by 1.5 That however has the nasty side effect that translations are transformed as well, that is applied in world coordinates. If I rotate around 90° and translate again, X and Y axis are swapped. If I reorder the transformations so they read gl.glTranslatef(x, y, 0); gl.glScalef(zoomFactor, zoomFactor, zoomFactor); gl.glRotatef(rotation, 0, 0, 1); the translation will be applied correctly (in reference space, so translation along x always visually moves the camera sideways) but rotation and scaling are now performed around origin. It shouldn't be too hard, so what is it I'm missing?

    Read the article

  • Can I dynamicaly size a div based on discreet units and still center? [closed]

    - by Dave
    Here's the problem: I have a website I'm working on that, depending on what the user has selected, will pop up a different number of boxes 80px high and 200px wide and currently set to float:left. These boxes are contained within a div that is basically the whole width of the screen minus some 1% margins. So at the moment they all fill in the box and, depending on screen size, occupy a grid of variable height and width. The problem is, if the screen size makes the containing box, say, 700px wide then you end up with 3 boxes per row and a bloody big margin on the right. What I would like to do is center the grid of boxes inside the containing box so that the margins are equal left and right. I suspect this can't be done since it means the containing box needs to set its size by looking both at the size of the user's window as well as the size of its children. It would be easy to do with javascript but I'd prefer not to if that is an option. If it is truly impossible then I will simply script it and let non-js users see a left-justified set of boxes.

    Read the article

  • [LINQ] Master&ndash;Detail Same Record (I)

    - by JTorrecilla
    PROBLEM Firstly, I am working on a project based on LINQ, EF, and C# with VS2010. The following Table shows what I have and what I want to show. Header C1 C2 C3 1 P1 01/01/2011 2 P2 01/02/2011 Details 1 1 D1 2 1 D2 3 1 D3 4 2 D1 5 2 D4 Expected Results 1 P1 01/01/2011 D1, D2, D3 2 P2 01/02/2011 D1,D4   IDEAS At the begin I got 3 possible ways: - Doing inside the DB:  It could be achieved from DB with a CURSOR in a Stored Procedure. - Doing from .NET with LOOPS. - Doing with LINQ (I love it!!) FIRST APROX Example with a simple CLASS with a LIST: With and Employee Class that acts as Header Table: 1: public class Employee 2: { 3: public Employee () { } 4: public Int32 ID { get; set; } 5: public String FirstName{ get; set; } 6: public String LastName{ get; set; } 7: public List<string> Numbers{ get; set; } // Acts as Details Table 8: } We can show all numbers contained by Employee: 1: List<Employee > listado = new List<Employee >(); 2: //Fill Listado 3: var query= from Employee em in listado 4: let Nums= string.Join(";", em.Numbers) 5: select new { 6: em.Name, 7: Nums 8: }; The “LET” operator allows us to host the results of “Join” of the Numbers List of the Employee Class. A little example. ASAP I will post the second part to achieve the same with Entity Framework. Best Regards

    Read the article

  • Customer Highlight: NTT DOCOMO

    - by jeckels
    NTT DOCOMO is the largest mobile operator in Japan, and serves over 13 million smartphone customers. Due to their growing data processing and scalability needs, they turned to Oracle's Cloud Application Foundation products for an integral soultion. At Oracle OpenWorld 2012, we first showcased NTT DOCOMO as a customer who was utilizing Oracle Coherence to process mobile data at a rate of 700,000 events per second (and then using Hadoop for distributed processing of big data). Overall, this Led to a 50% cost reduction due to the ultra-high velocity traffic processing of their customers' events. Recently, on October 7th, 2013, Oracle and NTT DOCOMO were proud to again announce a partnership around another key component of Oracle CAF: WebLogic Server. WebLogic was recently deployed as the application platform of choice to run DOCOMO's mission-critical data system ALADIN, which connects nationwide shops and information centers. ALADIN, which also utilizes Oracle Database and Oracle Tuxedo, is based on Java Platform, Enterprise Edition (Java EE), which has allowed the company to operate smoothly while minimizing additional development and modification associated with the migration of application server products. We look forward to continuing to partner with NTT DOCOMO, and are proud that Oracle Cloud Application Foundation products are providing the mission-critical solutions - at scale - that DOCOMO requires. Want to learn more about how CAF products are working in the real world? Join us for a FREE Virtual Developer Day on November 5th from 9am-1pm Pacific Time!REGISTER NOW

    Read the article

  • Should I amortize scripting cost via bytecode analysis or multithreading?

    - by user18983
    I'm working on a game sort of thing where users can write arbitrary code for individual agents, and I'm trying to decide the best way to divide up computation time. The simplest option would be to give each agent a set amount of time and skip their turn if it elapses without an action being decided upon, but I would like people to be able to write their agents decision functions without having to think too much about how long its taking unless they really want to. The two approaches I'm considering are giving each agent a set number of bytecode instructions (taking cost into account) each timestep, and making players deal with the consequences of the game state changing between blocks of computation (as with Battlecode) or giving each agent it's own thread and giving each thread equal time on the processor. I'm about equally knowledgeable on both concurrency and bytecode stuff, which is to say not very, so I'm wondering which approach would be best. I have a clearer idea of how I'd structure things if I used bytecode, but less certainty about how to actually implement the analysis. I'm pretty sure I can work up a concurrency based system without much trouble, but I worry it will be messier with more overhead and will add unnecessary complexity to the project.

    Read the article

  • Inheritance Mapping Strategies with Entity Framework Code First CTP5 Part 1: Table per Hierarchy (TPH)

    - by mortezam
    A simple strategy for mapping classes to database tables might be “one table for every entity persistent class.” This approach sounds simple enough and, indeed, works well until we encounter inheritance. Inheritance is such a visible structural mismatch between the object-oriented and relational worlds because object-oriented systems model both “is a” and “has a” relationships. SQL-based models provide only "has a" relationships between entities; SQL database management systems don’t support type inheritance—and even when it’s available, it’s usually proprietary or incomplete. There are three different approaches to representing an inheritance hierarchy: Table per Hierarchy (TPH): Enable polymorphism by denormalizing the SQL schema, and utilize a type discriminator column that holds type information. Table per Type (TPT): Represent "is a" (inheritance) relationships as "has a" (foreign key) relationships. Table per Concrete class (TPC): Discard polymorphism and inheritance relationships completely from the SQL schema.I will explain each of these strategies in a series of posts and this one is dedicated to TPH. In this series we'll deeply dig into each of these strategies and will learn about "why" to choose them as well as "how" to implement them. Hopefully it will give you a better idea about which strategy to choose in a particular scenario. Inheritance Mapping with Entity Framework Code FirstAll of the inheritance mapping strategies that we discuss in this series will be implemented by EF Code First CTP5. The CTP5 build of the new EF Code First library has been released by ADO.NET team earlier this month. EF Code-First enables a pretty powerful code-centric development workflow for working with data. I’m a big fan of the EF Code First approach, and I’m pretty excited about a lot of productivity and power that it brings. When it comes to inheritance mapping, not only Code First fully supports all the strategies but also gives you ultimate flexibility to work with domain models that involves inheritance. The fluent API for inheritance mapping in CTP5 has been improved a lot and now it's more intuitive and concise in compare to CTP4. A Note For Those Who Follow Other Entity Framework ApproachesIf you are following EF's "Database First" or "Model First" approaches, I still recommend to read this series since although the implementation is Code First specific but the explanations around each of the strategies is perfectly applied to all approaches be it Code First or others. A Note For Those Who are New to Entity Framework and Code-FirstIf you choose to learn EF you've chosen well. If you choose to learn EF with Code First you've done even better. To get started, you can find a great walkthrough by Scott Guthrie here and another one by ADO.NET team here. In this post, I assume you already setup your machine to do Code First development and also that you are familiar with Code First fundamentals and basic concepts. You might also want to check out my other posts on EF Code First like Complex Types and Shared Primary Key Associations. A Top Down Development ScenarioThese posts take a top-down approach; it assumes that you’re starting with a domain model and trying to derive a new SQL schema. Therefore, we start with an existing domain model, implement it in C# and then let Code First create the database schema for us. However, the mapping strategies described are just as relevant if you’re working bottom up, starting with existing database tables. I’ll show some tricks along the way that help you dealing with nonperfect table layouts. Let’s start with the mapping of entity inheritance. -- The Domain ModelIn our domain model, we have a BillingDetail base class which is abstract (note the italic font on the UML class diagram below). We do allow various billing types and represent them as subclasses of BillingDetail class. As for now, we support CreditCard and BankAccount: Implement the Object Model with Code First As always, we start with the POCO classes. Note that in our DbContext, I only define one DbSet for the base class which is BillingDetail. Code First will find the other classes in the hierarchy based on Reachability Convention. public abstract class BillingDetail  {     public int BillingDetailId { get; set; }     public string Owner { get; set; }             public string Number { get; set; } } public class BankAccount : BillingDetail {     public string BankName { get; set; }     public string Swift { get; set; } } public class CreditCard : BillingDetail {     public int CardType { get; set; }                     public string ExpiryMonth { get; set; }     public string ExpiryYear { get; set; } } public class InheritanceMappingContext : DbContext {     public DbSet<BillingDetail> BillingDetails { get; set; } } This object model is all that is needed to enable inheritance with Code First. If you put this in your application you would be able to immediately start working with the database and do CRUD operations. Before going into details about how EF Code First maps this object model to the database, we need to learn about one of the core concepts of inheritance mapping: polymorphic and non-polymorphic queries. Polymorphic Queries LINQ to Entities and EntitySQL, as object-oriented query languages, both support polymorphic queries—that is, queries for instances of a class and all instances of its subclasses, respectively. For example, consider the following query: IQueryable<BillingDetail> linqQuery = from b in context.BillingDetails select b; List<BillingDetail> billingDetails = linqQuery.ToList(); Or the same query in EntitySQL: string eSqlQuery = @"SELECT VAlUE b FROM BillingDetails AS b"; ObjectQuery<BillingDetail> objectQuery = ((IObjectContextAdapter)context).ObjectContext                                                                          .CreateQuery<BillingDetail>(eSqlQuery); List<BillingDetail> billingDetails = objectQuery.ToList(); linqQuery and eSqlQuery are both polymorphic and return a list of objects of the type BillingDetail, which is an abstract class but the actual concrete objects in the list are of the subtypes of BillingDetail: CreditCard and BankAccount. Non-polymorphic QueriesAll LINQ to Entities and EntitySQL queries are polymorphic which return not only instances of the specific entity class to which it refers, but all subclasses of that class as well. On the other hand, Non-polymorphic queries are queries whose polymorphism is restricted and only returns instances of a particular subclass. In LINQ to Entities, this can be specified by using OfType<T>() Method. For example, the following query returns only instances of BankAccount: IQueryable<BankAccount> query = from b in context.BillingDetails.OfType<BankAccount>() select b; EntitySQL has OFTYPE operator that does the same thing: string eSqlQuery = @"SELECT VAlUE b FROM OFTYPE(BillingDetails, Model.BankAccount) AS b"; In fact, the above query with OFTYPE operator is a short form of the following query expression that uses TREAT and IS OF operators: string eSqlQuery = @"SELECT VAlUE TREAT(b as Model.BankAccount)                       FROM BillingDetails AS b                       WHERE b IS OF(Model.BankAccount)"; (Note that in the above query, Model.BankAccount is the fully qualified name for BankAccount class. You need to change "Model" with your own namespace name.) Table per Class Hierarchy (TPH)An entire class hierarchy can be mapped to a single table. This table includes columns for all properties of all classes in the hierarchy. The concrete subclass represented by a particular row is identified by the value of a type discriminator column. You don’t have to do anything special in Code First to enable TPH. It's the default inheritance mapping strategy: This mapping strategy is a winner in terms of both performance and simplicity. It’s the best-performing way to represent polymorphism—both polymorphic and nonpolymorphic queries perform well—and it’s even easy to implement by hand. Ad-hoc reporting is possible without complex joins or unions. Schema evolution is straightforward. Discriminator Column As you can see in the DB schema above, Code First has to add a special column to distinguish between persistent classes: the discriminator. This isn’t a property of the persistent class in our object model; it’s used internally by EF Code First. By default, the column name is "Discriminator", and its type is string. The values defaults to the persistent class names —in this case, “BankAccount” or “CreditCard”. EF Code First automatically sets and retrieves the discriminator values. TPH Requires Properties in SubClasses to be Nullable in the Database TPH has one major problem: Columns for properties declared by subclasses will be nullable in the database. For example, Code First created an (INT, NULL) column to map CardType property in CreditCard class. However, in a typical mapping scenario, Code First always creates an (INT, NOT NULL) column in the database for an int property in persistent class. But in this case, since BankAccount instance won’t have a CardType property, the CardType field must be NULL for that row so Code First creates an (INT, NULL) instead. If your subclasses each define several non-nullable properties, the loss of NOT NULL constraints may be a serious problem from the point of view of data integrity. TPH Violates the Third Normal FormAnother important issue is normalization. We’ve created functional dependencies between nonkey columns, violating the third normal form. Basically, the value of Discriminator column determines the corresponding values of the columns that belong to the subclasses (e.g. BankName) but Discriminator is not part of the primary key for the table. As always, denormalization for performance can be misleading, because it sacrifices long-term stability, maintainability, and the integrity of data for immediate gains that may be also achieved by proper optimization of the SQL execution plans (in other words, ask your DBA). Generated SQL QueryLet's take a look at the SQL statements that EF Code First sends to the database when we write queries in LINQ to Entities or EntitySQL. For example, the polymorphic query for BillingDetails that you saw, generates the following SQL statement: SELECT  [Extent1].[Discriminator] AS [Discriminator],  [Extent1].[BillingDetailId] AS [BillingDetailId],  [Extent1].[Owner] AS [Owner],  [Extent1].[Number] AS [Number],  [Extent1].[BankName] AS [BankName],  [Extent1].[Swift] AS [Swift],  [Extent1].[CardType] AS [CardType],  [Extent1].[ExpiryMonth] AS [ExpiryMonth],  [Extent1].[ExpiryYear] AS [ExpiryYear] FROM [dbo].[BillingDetails] AS [Extent1] WHERE [Extent1].[Discriminator] IN ('BankAccount','CreditCard') Or the non-polymorphic query for the BankAccount subclass generates this SQL statement: SELECT  [Extent1].[BillingDetailId] AS [BillingDetailId],  [Extent1].[Owner] AS [Owner],  [Extent1].[Number] AS [Number],  [Extent1].[BankName] AS [BankName],  [Extent1].[Swift] AS [Swift] FROM [dbo].[BillingDetails] AS [Extent1] WHERE [Extent1].[Discriminator] = 'BankAccount' Note how Code First adds a restriction on the discriminator column and also how it only selects those columns that belong to BankAccount entity. Change Discriminator Column Data Type and Values With Fluent API Sometimes, especially in legacy schemas, you need to override the conventions for the discriminator column so that Code First can work with the schema. The following fluent API code will change the discriminator column name to "BillingDetailType" and the values to "BA" and "CC" for BankAccount and CreditCard respectively: protected override void OnModelCreating(System.Data.Entity.ModelConfiguration.ModelBuilder modelBuilder) {     modelBuilder.Entity<BillingDetail>()                 .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue("BA"))                 .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue("CC")); } Also, changing the data type of discriminator column is interesting. In the above code, we passed strings to HasValue method but this method has been defined to accepts a type of object: public void HasValue(object value); Therefore, if for example we pass a value of type int to it then Code First not only use our desired values (i.e. 1 & 2) in the discriminator column but also changes the column type to be (INT, NOT NULL): modelBuilder.Entity<BillingDetail>()             .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue(1))             .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue(2)); SummaryIn this post we learned about Table per Hierarchy as the default mapping strategy in Code First. The disadvantages of the TPH strategy may be too serious for your design—after all, denormalized schemas can become a major burden in the long run. Your DBA may not like it at all. In the next post, we will learn about Table per Type (TPT) strategy that doesn’t expose you to this problem. References ADO.NET team blog Java Persistence with Hibernate book a { text-decoration: none; } a:visited { color: Blue; } .title { padding-bottom: 5px; font-family: Segoe UI; font-size: 11pt; font-weight: bold; padding-top: 15px; } .code, .typeName { font-family: consolas; } .typeName { color: #2b91af; } .padTop5 { padding-top: 5px; } .padTop10 { padding-top: 10px; } p.MsoNormal { margin-top: 0in; margin-right: 0in; margin-bottom: 10.0pt; margin-left: 0in; line-height: 115%; font-size: 11.0pt; font-family: "Calibri" , "sans-serif"; }

    Read the article

  • How to add new filters to CAML queries in SharePoint 2007

    - by uruit
    Normal 0 21 false false false ES-UY X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} One flexibility SharePoint has is CAML (Collaborative Application Markup Language). CAML it’s a markup language like html that allows developers to do queries against SharePoint lists, it’s syntax is very easy to understand and it allows to add logical conditions like Where, Contains, And, Or, etc, just like a SQL Query. For one of our projects we have the need to do a filter on SharePoint views, the problem here is that the view it’s a list containing a CAML Query with the filters the view may have, so in order to filter the view that’s already been filtered before, we need to append our filters to the existing CAML Query. That’s not a trivial task because the where statement in a CAML Query it’s like this: <Where>   <And>     <Filter1 />     <Filter2 />   </And> </Where> If we want to add a new logical operator, like an OR it’s not just as simple as to append the OR expression like the following example: <Where>   <And>     <Filter1 />     <Filter2 />   </And>   <Or>     <Filter3 />   </Or> </Where> But instead the correct query would be: <Where>   <Or>     <And>       <Filter1 />       <Filter2 />     </And>     <Filter3 />   </Or> </Where> Notice that the <Filter# /> tags are for explanation purpose only. In order to solve this problem we created a simple component, it has a method that receives the current query (could be an empty query also) and appends the expression you want to that query. Example: string currentQuery = @“ <Where>    <And>     <Contains><FieldRef Name='Title' /><Value Type='Text'>A</Value></Contains>     <Contains><FieldRef Name='Title' /><Value Type='Text'>B</Value></Contains>   </And> </Where>”; currentQuery = CAMLQueryBuilder.AppendQuery(     currentQuery,     “<Contains><FieldRef Name='Title' /><Value Type='Text'>C</Value></Contains>”,     CAMLQueryBuilder.Operators.Or); The fist parameter this function receives it’s the actual query, the second it’s the filter you want to add, and the third it’s the logical operator, so basically in this query we want all the items that the title contains: the character A and B or the ones that contains the character C. The result query is: <Where>   <Or>      <And>       <Contains><FieldRef Name='Title' /><Value Type='Text'>A</Value></Contains>       <Contains><FieldRef Name='Title' /><Value Type='Text'>B</Value></Contains>     </And>     <Contains><FieldRef Name='Title' /><Value Type='Text'>C</Value></Contains>   </Or> </Where>     The code:   First of all we have an enumerator inside the CAMLQueryBuilder class that has the two possible Options And, Or. public enum Operators { And, Or }   Then we have the main method that’s the one that performs the append of the filters. public static string AppendQuery(string containerQuery, string logicalExpression, Operators logicalOperator){   In this method the first we do is create a new XmlDocument and wrap the current query (that may be empty) with a “<Query></Query>” tag, because the query that comes with the view doesn’t have a root element and the XmlDocument must be a well formatted xml.   XmlDocument queryDoc = new XmlDocument(); queryDoc.LoadXml("<Query>" + containerQuery + "</Query>");   The next step is to create a new XmlDocument containing the logical expression that has the filter needed.   XmlDocument logicalExpressionDoc = new XmlDocument(); logicalExpressionDoc.LoadXml("<root>" + logicalExpression + "</root>"); In these next four lines we extract the expression from the recently created XmlDocument and create an XmlElement.                  XmlElement expressionElTemp = (XmlElement)logicalExpressionDoc.SelectSingleNode("/root/*"); XmlElement expressionEl = queryDoc.CreateElement(expressionElTemp.Name); expressionEl.InnerXml = expressionElTemp.InnerXml;   Below are the main steps in the component logic. The first “if” checks if the actual query doesn’t contains a “Where” clause. In case there’s no “Where” we add it and append the expression.   In case that there’s already a “Where” clause, we get the entire statement that’s inside the “Where” and reorder the query removing and appending elements to form the correct query, that will finally filter the list.   XmlElement whereEl; if (!containerQuery.Contains("Where")) { queryDoc.FirstChild.AppendChild(queryDoc.CreateElement("Where")); queryDoc.SelectSingleNode("/Query/Where").AppendChild(expressionEl); } else { whereEl = (XmlElement)queryDoc.SelectSingleNode("/Query/Where"); if (!containerQuery.Contains("<And>") &&                 !containerQuery.Contains("<Or>"))        {              XmlElement operatorEl = queryDoc.CreateElement(GetName(logicalOperator)); XmlElement existingExpression = (XmlElement)whereEl.SelectSingleNode("/Query/Where/*"); whereEl.RemoveChild(existingExpression);                 operatorEl.AppendChild(existingExpression);               operatorEl.AppendChild(expressionEl);                 whereEl.AppendChild(operatorEl);        }        else        {              XmlElement operatorEl = queryDoc.CreateElement(GetName(logicalOperator)); XmlElement existingOperator = (XmlElement)whereEl.SelectSingleNode("/Query/Where/*");                 whereEl.RemoveChild(existingOperator);               operatorEl.AppendChild(existingOperator);               operatorEl.AppendChild(expressionEl);                 whereEl.AppendChild(operatorEl);         }  }  return queryDoc.FirstChild.InnerXml }     Finally the GetName method converts the Enum option to his string equivalent.   private static string GetName(Operators logicalOperator) {       return Enum.GetName(typeof(Operators), logicalOperator); }        Normal 0 21 false false false ES-UY X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Normal 0 21 false false false ES-UY X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} This component helped our team a lot using SharePoint 2007 and modifying the queries, but now in SharePoint 2010; that wouldn’t be needed because of the incorporation of LINQ to SharePoint. This new feature enables the developers to do typed queries against SharePoint lists without the need of writing any CAML code.  But there is still much development to the 2007 version, so I hope this information is useful for other members.  Post Normal 0 21 false false false ES-UY X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} written by Sebastian Rodriguez - Portals and Collaboration Solutions @ UruIT

    Read the article

  • How to add new filters to CAML queries in SharePoint 2007

    - by uruit
      Normal 0 21 false false false ES-UY X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} One flexibility SharePoint has is CAML (Collaborative Application Markup Language). CAML it’s a markup language like html that allows developers to do queries against SharePoint lists, it’s syntax is very easy to understand and it allows to add logical conditions like Where, Contains, And, Or, etc, just like a SQL Query. For one of our projects we have the need to do a filter on SharePoint views, the problem here is that the view it’s a list containing a CAML Query with the filters the view may have, so in order to filter the view that’s already been filtered before, we need to append our filters to the existing CAML Query. That’s not a trivial task because the where statement in a CAML Query it’s like this: <Where>   <And>     <Filter1 />     <Filter2 />   </And> </Where> If we want to add a new logical operator, like an OR it’s not just as simple as to append the OR expression like the following example: <Where>   <And>     <Filter1 />     <Filter2 />   </And>   <Or>     <Filter3 />   </Or> </Where> But instead the correct query would be: <Where>   <Or>     <And>       <Filter1 />       <Filter2 />     </And>     <Filter3 />   </Or> </Where> Notice that the <Filter# /> tags are for explanation purpose only. In order to solve this problem we created a simple component, it has a method that receives the current query (could be an empty query also) and appends the expression you want to that query. Example: string currentQuery = @“ <Where>    <And>     <Contains><FieldRef Name='Title' /><Value Type='Text'>A</Value></Contains>     <Contains><FieldRef Name='Title' /><Value Type='Text'>B</Value></Contains>   </And> </Where>”; currentQuery = CAMLQueryBuilder.AppendQuery(     currentQuery,     “<Contains><FieldRef Name='Title' /><Value Type='Text'>C</Value></Contains>”,     CAMLQueryBuilder.Operators.Or); The fist parameter this function receives it’s the actual query, the second it’s the filter you want to add, and the third it’s the logical operator, so basically in this query we want all the items that the title contains: the character A and B or the ones that contains the character C. The result query is: <Where>   <Or>      <And>       <Contains><FieldRef Name='Title' /><Value Type='Text'>A</Value></Contains>       <Contains><FieldRef Name='Title' /><Value Type='Text'>B</Value></Contains>     </And>     <Contains><FieldRef Name='Title' /><Value Type='Text'>C</Value></Contains>   </Or> </Where>             The code:   First of all we have an enumerator inside the CAMLQueryBuilder class that has the two possible Options And, Or. public enum Operators { And, Or }   Then we have the main method that’s the one that performs the append of the filters. public static string AppendQuery(string containerQuery, string logicalExpression, Operators logicalOperator){   In this method the first we do is create a new XmlDocument and wrap the current query (that may be empty) with a “<Query></Query>” tag, because the query that comes with the view doesn’t have a root element and the XmlDocument must be a well formatted xml.   XmlDocument queryDoc = new XmlDocument(); queryDoc.LoadXml("<Query>" + containerQuery + "</Query>");   The next step is to create a new XmlDocument containing the logical expression that has the filter needed.   XmlDocument logicalExpressionDoc = new XmlDocument(); logicalExpressionDoc.LoadXml("<root>" + logicalExpression + "</root>"); In these next four lines we extract the expression from the recently created XmlDocument and create an XmlElement.                  XmlElement expressionElTemp = (XmlElement)logicalExpressionDoc.SelectSingleNode("/root/*"); XmlElement expressionEl = queryDoc.CreateElement(expressionElTemp.Name); expressionEl.InnerXml = expressionElTemp.InnerXml;   Below are the main steps in the component logic. The first “if” checks if the actual query doesn’t contains a “Where” clause. In case there’s no “Where” we add it and append the expression.   In case that there’s already a “Where” clause, we get the entire statement that’s inside the “Where” and reorder the query removing and appending elements to form the correct query, that will finally filter the list.   XmlElement whereEl; if (!containerQuery.Contains("Where")) { queryDoc.FirstChild.AppendChild(queryDoc.CreateElement("Where")); queryDoc.SelectSingleNode("/Query/Where").AppendChild(expressionEl); } else { whereEl = (XmlElement)queryDoc.SelectSingleNode("/Query/Where"); if (!containerQuery.Contains("<And>") &&                 !containerQuery.Contains("<Or>"))        {              XmlElement operatorEl = queryDoc.CreateElement(GetName(logicalOperator)); XmlElement existingExpression = (XmlElement)whereEl.SelectSingleNode("/Query/Where/*"); whereEl.RemoveChild(existingExpression);                 operatorEl.AppendChild(existingExpression);               operatorEl.AppendChild(expressionEl);                 whereEl.AppendChild(operatorEl);        }        else        {              XmlElement operatorEl = queryDoc.CreateElement(GetName(logicalOperator)); XmlElement existingOperator = (XmlElement)whereEl.SelectSingleNode("/Query/Where/*");                 whereEl.RemoveChild(existingOperator);               operatorEl.AppendChild(existingOperator);               operatorEl.AppendChild(expressionEl);                 whereEl.AppendChild(operatorEl);         }  }  return queryDoc.FirstChild.InnerXml }     Finally the GetName method converts the Enum option to his string equivalent.   private static string GetName(Operators logicalOperator) {       return Enum.GetName(typeof(Operators), logicalOperator); }        This component helped our team a lot using SharePoint 2007 and modifying the queries, but now in SharePoint 2010; that wouldn’t be needed because of the incorporation of LINQ to SharePoint. This new feature enables the developers to do typed queries against SharePoint lists without the need of writing any CAML code.   Normal 0 21 false false false ES-UY X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} Post written by Sebastian Rodriguez - Portals and Collaboration Solutions @ UruIT  

    Read the article

  • C#/.NET Little Wonders: Interlocked CompareExchange()

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Two posts ago, I discussed the Interlocked Add(), Increment(), and Decrement() methods (here) for adding and subtracting values in a thread-safe, lightweight manner.  Then, last post I talked about the Interlocked Read() and Exchange() methods (here) for safely and efficiently reading and setting 32 or 64 bit values (or references).  This week, we’ll round out the discussion by talking about the Interlocked CompareExchange() method and how it can be put to use to exchange a value if the current value is what you expected it to be. Dirty reads can lead to bad results Many of the uses of Interlocked that we’ve explored so far have centered around either reading, setting, or adding values.  But what happens if you want to do something more complex such as setting a value based on the previous value in some manner? Perhaps you were creating an application that reads a current balance, applies a deposit, and then saves the new modified balance, where of course you’d want that to happen atomically.  If you read the balance, then go to save the new balance and between that time the previous balance has already changed, you’ll have an issue!  Think about it, if we read the current balance as $400, and we are applying a new deposit of $50.75, but meanwhile someone else deposits $200 and sets the total to $600, but then we write a total of $450.75 we’ve lost $200! Now, certainly for int and long values we can use Interlocked.Add() to handles these cases, and it works well for that.  But what if we want to work with doubles, for example?  Let’s say we wanted to add the numbers from 0 to 99,999 in parallel.  We could do this by spawning several parallel tasks to continuously add to a total: 1: double total = 0; 2:  3: Parallel.For(0, 10000, next => 4: { 5: total += next; 6: }); Were this run on one thread using a standard for loop, we’d expect an answer of 4,999,950,000 (the sum of all numbers from 0 to 99,999).  But when we run this in parallel as written above, we’ll likely get something far off.  The result of one of my runs, for example, was 1,281,880,740.  That is way off!  If this were banking software we’d be in big trouble with our clients.  So what happened?  The += operator is not atomic, it will read in the current value, add the result, then store it back into the total.  At any point in all of this another thread could read a “dirty” current total and accidentally “skip” our add.   So, to clean this up, we could use a lock to guarantee concurrency: 1: double total = 0.0; 2: object locker = new object(); 3:  4: Parallel.For(0, count, next => 5: { 6: lock (locker) 7: { 8: total += next; 9: } 10: }); Which will give us the correct result of 4,999,950,000.  One thing to note is that locking can be heavy, especially if the operation being locked over is trivial, or the life of the lock is a high percentage of the work being performed concurrently.  In the case above, the lock consumes pretty much all of the time of each parallel task – and the task being locked on is relatively trivial. Now, let me put in a disclaimer here before we go further: For most uses, lock is more than sufficient for your needs, and is often the simplest solution!    So, if lock is sufficient for most needs, why would we ever consider another solution?  The problem with locking is that it can suspend execution of your thread while it waits for the signal that the lock is free.  Moreover, if the operation being locked over is trivial, the lock can add a very high level of overhead.  This is why things like Interlocked.Increment() perform so well, instead of locking just to perform an increment, we perform the increment with an atomic, lockless method. As with all things performance related, it’s important to profile before jumping to the conclusion that you should optimize everything in your path.  If your profiling shows that locking is causing a high level of waiting in your application, then it’s time to consider lighter alternatives such as Interlocked. CompareExchange() – Exchange existing value if equal some value So let’s look at how we could use CompareExchange() to solve our problem above.  The general syntax of CompareExchange() is: T CompareExchange<T>(ref T location, T newValue, T expectedValue) If the value in location == expectedValue, then newValue is exchanged.  Either way, the value in location (before exchange) is returned. Actually, CompareExchange() is not one method, but a family of overloaded methods that can take int, long, float, double, pointers, or references.  It cannot take other value types (that is, can’t CompareExchange() two DateTime instances directly).  Also keep in mind that the version that takes any reference type (the generic overload) only checks for reference equality, it does not call any overridden Equals(). So how does this help us?  Well, we can grab the current total, and exchange the new value if total hasn’t changed.  This would look like this: 1: // grab the snapshot 2: double current = total; 3:  4: // if the total hasn’t changed since I grabbed the snapshot, then 5: // set it to the new total 6: Interlocked.CompareExchange(ref total, current + next, current); So what the code above says is: if the amount in total (1st arg) is the same as the amount in current (3rd arg), then set total to current + next (2nd arg).  This check and exchange pair is atomic (and thus thread-safe). This works if total is the same as our snapshot in current, but the problem, is what happens if they aren’t the same?  Well, we know that in either case we will get the previous value of total (before the exchange), back as a result.  Thus, we can test this against our snapshot to see if it was the value we expected: 1: // if the value returned is != current, then our snapshot must be out of date 2: // which means we didn't (and shouldn't) apply current + next 3: if (Interlocked.CompareExchange(ref total, current + next, current) != current) 4: { 5: // ooops, total was not equal to our snapshot in current, what should we do??? 6: } So what do we do if we fail?  That’s up to you and the problem you are trying to solve.  It’s possible you would decide to abort the whole transaction, or perhaps do a lightweight spin and try again.  Let’s try that: 1: double current = total; 2:  3: // make first attempt... 4: if (Interlocked.CompareExchange(ref total, current + i, current) != current) 5: { 6: // if we fail, go into a spin wait, spin, and try again until succeed 7: var spinner = new SpinWait(); 8:  9: do 10: { 11: spinner.SpinOnce(); 12: current = total; 13: } 14: while (Interlocked.CompareExchange(ref total, current + i, current) != current); 15: } 16:  This is not trivial code, but it illustrates a possible use of CompareExchange().  What we are doing is first checking to see if we succeed on the first try, and if so great!  If not, we create a SpinWait and then repeat the process of SpinOnce(), grab a fresh snapshot, and repeat until CompareExchnage() succeeds.  You may wonder why not a simple do-while here, and the reason it’s more efficient to only create the SpinWait until we absolutely know we need one, for optimal efficiency. Though not as simple (or maintainable) as a simple lock, this will perform better in many situations.  Comparing an unlocked (and wrong) version, a version using lock, and the Interlocked of the code, we get the following average times for multiple iterations of adding the sum of 100,000 numbers: 1: Unlocked money average time: 2.1 ms 2: Locked money average time: 5.1 ms 3: Interlocked money average time: 3 ms So the Interlocked.CompareExchange(), while heavier to code, came in lighter than the lock, offering a good compromise of safety and performance when we need to reduce contention. CompareExchange() - it’s not just for adding stuff… So that was one simple use of CompareExchange() in the context of adding double values -- which meant we couldn’t have used the simpler Interlocked.Add() -- but it has other uses as well. If you think about it, this really works anytime you want to create something new based on a current value without using a full lock.  For example, you could use it to create a simple lazy instantiation implementation.  In this case, we want to set the lazy instance only if the previous value was null: 1: public static class Lazy<T> where T : class, new() 2: { 3: private static T _instance; 4:  5: public static T Instance 6: { 7: get 8: { 9: // if current is null, we need to create new instance 10: if (_instance == null) 11: { 12: // attempt create, it will only set if previous was null 13: Interlocked.CompareExchange(ref _instance, new T(), (T)null); 14: } 15:  16: return _instance; 17: } 18: } 19: } So, if _instance == null, this will create a new T() and attempt to exchange it with _instance.  If _instance is not null, then it does nothing and we discard the new T() we created. This is a way to create lazy instances of a type where we are more concerned about locking overhead than creating an accidental duplicate which is not used.  In fact, the BCL implementation of Lazy<T> offers a similar thread-safety choice for Publication thread safety, where it will not guarantee only one instance was created, but it will guarantee that all readers get the same instance.  Another possible use would be in concurrent collections.  Let’s say, for example, that you are creating your own brand new super stack that uses a linked list paradigm and is “lock free”.  We could use Interlocked.CompareExchange() to be able to do a lockless Push() which could be more efficient in multi-threaded applications where several threads are pushing and popping on the stack concurrently. Yes, there are already concurrent collections in the BCL (in .NET 4.0 as part of the TPL), but it’s a fun exercise!  So let’s assume we have a node like this: 1: public sealed class Node<T> 2: { 3: // the data for this node 4: public T Data { get; set; } 5:  6: // the link to the next instance 7: internal Node<T> Next { get; set; } 8: } Then, perhaps, our stack’s Push() operation might look something like: 1: public sealed class SuperStack<T> 2: { 3: private volatile T _head; 4:  5: public void Push(T value) 6: { 7: var newNode = new Node<int> { Data = value, Next = _head }; 8:  9: if (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next) 10: { 11: var spinner = new SpinWait(); 12:  13: do 14: { 15: spinner.SpinOnce(); 16: newNode.Next = _head; 17: } 18: while (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next); 19: } 20: } 21:  22: // ... 23: } Notice a similar paradigm here as with adding our doubles before.  What we are doing is creating the new Node with the data to push, and with a Next value being the original node referenced by _head.  This will create our stack behavior (LIFO – Last In, First Out).  Now, we have to set _head to now refer to the newNode, but we must first make sure it hasn’t changed! So we check to see if _head has the same value we saved in our snapshot as newNode.Next, and if so, we set _head to newNode.  This is all done atomically, and the result is _head’s original value, as long as the original value was what we assumed it was with newNode.Next, then we are good and we set it without a lock!  If not, we SpinWait and try again. Once again, this is much lighter than locking in highly parallelized code with lots of contention.  If I compare the method above with a similar class using lock, I get the following results for pushing 100,000 items: 1: Locked SuperStack average time: 6 ms 2: Interlocked SuperStack average time: 4.5 ms So, once again, we can get more efficient than a lock, though there is the cost of added code complexity.  Fortunately for you, most of the concurrent collection you’d ever need are already created for you in the System.Collections.Concurrent (here) namespace – for more information, see my Little Wonders – The Concurent Collections Part 1 (here), Part 2 (here), and Part 3 (here). Summary We’ve seen before how the Interlocked class can be used to safely and efficiently add, increment, decrement, read, and exchange values in a multi-threaded environment.  In addition to these, Interlocked CompareExchange() can be used to perform more complex logic without the need of a lock when lock contention is a concern. The added efficiency, though, comes at the cost of more complex code.  As such, the standard lock is often sufficient for most thread-safety needs.  But if profiling indicates you spend a lot of time waiting for locks, or if you just need a lock for something simple such as an increment, decrement, read, exchange, etc., then consider using the Interlocked class’s methods to reduce wait. Technorati Tags: C#,CSharp,.NET,Little Wonders,Interlocked,CompareExchange,threading,concurrency

    Read the article

  • How to maintain encapsulation with composition in C++?

    - by iFreilicht
    I am designing a class Master that is composed from multiple other classes, A, Base, C and D. These four classes have absolutely no use outside of Master and are meant to split up its functionality into manageable and logically divided packages. They also provide extensible functionality as in the case of Base, which can be inherited from by clients. But, how do I maintain encapsulation of Master with this design? So far, I've got two approaches, which are both far from perfect: 1. Replicate all accessors: Just write accessor-methods for all accessor-methods of all classes that Master is composed of. This leads to perfect encapsulation, because no implementation detail of Master is visible, but is extremely tedious and makes the class definition monstrous, which is exactly what the composition should prevent. Also, adding functionality to one of the composees (is that even a word?) would require to re-write all those methods in Master. An additional problem is that inheritors of Base could only alter, but not add functionality. 2. Use non-assignable, non-copyable member-accessors: Having a class accessor<T> that can not be copied, moved or assigned to, but overrides the operator-> to access an underlying shared_ptr, so that calls like Master->A()->niceFunction(); are made possible. My problem with this is that it kind of breaks encapsulation as I would now be unable to change my implementation of Master to use a different class for the functionality of niceFunction(). Still, it is the closest I've gotten without using the ugly first approach. It also fixes the inheritance issue quite nicely. A small side question would be if such a class already existed in std or boost. EDIT: Wall of code I will now post the code of the header files of the classes discussed. It may be a bit hard to understand, but I'll give my best in explaining all of it. 1. GameTree.h The foundation of it all. This basically is a doubly-linked tree, holding GameObject-instances, which we'll later get to. It also has it's own custom iterator GTIterator, but I left that out for brevity. WResult is an enum with the values SUCCESS and FAILED, but it's not really important. class GameTree { public: //Static methods for the root. Only one root is allowed to exist at a time! static void ConstructRoot(seed_type seed, unsigned int depth); inline static bool rootExists(){ return static_cast<bool>(rootObject_); } inline static weak_ptr<GameTree> root(){ return rootObject_; } //delta is in ms, this is used for velocity, collision and such void tick(unsigned int delta); //Interaction with the tree inline weak_ptr<GameTree> parent() const { return parent_; } inline unsigned int numChildren() const{ return static_cast<unsigned int>(children_.size()); } weak_ptr<GameTree> getChild(unsigned int index) const; template<typename GOType> weak_ptr<GameTree> addChild(seed_type seed, unsigned int depth = 9001){ GOType object{ new GOType(seed) }; return addChildObject(unique_ptr<GameTree>(new GameTree(std::move(object), depth))); } WResult moveTo(weak_ptr<GameTree> newParent); WResult erase(); //Iterators for for( : ) loop GTIterator& begin(){ return *(beginIter_ = std::move(make_unique<GTIterator>(children_.begin()))); } GTIterator& end(){ return *(endIter_ = std::move(make_unique<GTIterator>(children_.end()))); } //unloading should be used when objects are far away WResult unloadChildren(unsigned int newDepth = 0); WResult loadChildren(unsigned int newDepth = 1); inline const RenderObject& renderObject() const{ return gameObject_->renderObject(); } //Getter for the underlying GameObject (I have not tested the template version) weak_ptr<GameObject> gameObject(){ return gameObject_; } template<typename GOType> weak_ptr<GOType> gameObject(){ return dynamic_cast<weak_ptr<GOType>>(gameObject_); } weak_ptr<PhysicsObject> physicsObject() { return gameObject_->physicsObject(); } private: GameTree(const GameTree&); //copying is only allowed internally GameTree(shared_ptr<GameObject> object, unsigned int depth = 9001); //pointer to root static shared_ptr<GameTree> rootObject_; //internal management of a child weak_ptr<GameTree> addChildObject(shared_ptr<GameTree>); WResult removeChild(unsigned int index); //private members shared_ptr<GameObject> gameObject_; shared_ptr<GTIterator> beginIter_; shared_ptr<GTIterator> endIter_; //tree stuff vector<shared_ptr<GameTree>> children_; weak_ptr<GameTree> parent_; unsigned int selfIndex_; //used for deletion, this isn't necessary void initChildren(unsigned int depth); //constructs children }; 2. GameObject.h This is a bit hard to grasp, but GameObject basically works like this: When constructing a GameObject, you construct its basic attributes and a CResult-instance, which contains a vector<unique_ptr<Construction>>. The Construction-struct contains all information that is needed to construct a GameObject, which is a seed and a function-object that is applied at construction by a factory. This enables dynamic loading and unloading of GameObjects as done by GameTree. It also means that you have to define that factory if you inherit GameObject. This inheritance is also the reason why GameTree has a template-function gameObject<GOType>. GameObject can contain a RenderObject and a PhysicsObject, which we'll later get to. Anyway, here's the code. class GameObject; typedef unsigned long seed_type; //this declaration magic means that all GameObjectFactorys inherit from GameObjectFactory<GameObject> template<typename GOType> struct GameObjectFactory; template<> struct GameObjectFactory<GameObject>{ virtual unique_ptr<GameObject> construct(seed_type seed) const = 0; }; template<typename GOType> struct GameObjectFactory : GameObjectFactory<GameObject>{ GameObjectFactory() : GameObjectFactory<GameObject>(){} unique_ptr<GameObject> construct(seed_type seed) const{ return unique_ptr<GOType>(new GOType(seed)); } }; //same as with the factories. this is important for storing them in vectors template<typename GOType> struct Construction; template<> struct Construction<GameObject>{ virtual unique_ptr<GameObject> construct() const = 0; }; template<typename GOType> struct Construction : Construction<GameObject>{ Construction(seed_type seed, function<void(GOType*)> func = [](GOType* null){}) : Construction<GameObject>(), seed_(seed), func_(func) {} unique_ptr<GameObject> construct() const{ unique_ptr<GameObject> gameObject{ GOType::factory.construct(seed_) }; func_(dynamic_cast<GOType*>(gameObject.get())); return std::move(gameObject); } seed_type seed_; function<void(GOType*)> func_; }; typedef struct CResult { CResult() : constructions{} {} CResult(CResult && o) : constructions(std::move(o.constructions)) {} CResult& operator= (CResult& other){ if (this != &other){ for (unique_ptr<Construction<GameObject>>& child : other.constructions){ constructions.push_back(std::move(child)); } } return *this; } template<typename GOType> void push_back(seed_type seed, function<void(GOType*)> func = [](GOType* null){}){ constructions.push_back(make_unique<Construction<GOType>>(seed, func)); } vector<unique_ptr<Construction<GameObject>>> constructions; } CResult; //finally, the GameObject class GameObject { public: GameObject(seed_type seed); GameObject(const GameObject&); virtual void tick(unsigned int delta); inline Matrix4f trafoMatrix(){ return physicsObject_->transformationMatrix(); } //getter inline seed_type seed() const{ return seed_; } inline CResult& properties(){ return properties_; } inline const RenderObject& renderObject() const{ return *renderObject_; } inline weak_ptr<PhysicsObject> physicsObject() { return physicsObject_; } protected: virtual CResult construct_(seed_type seed) = 0; CResult properties_; shared_ptr<RenderObject> renderObject_; shared_ptr<PhysicsObject> physicsObject_; seed_type seed_; }; 3. PhysicsObject That's a bit easier. It is responsible for position, velocity and acceleration. It will also handle collisions in the future. It contains three Transformation objects, two of which are optional. I'm not going to include the accessors on the PhysicsObject class because I tried my first approach on it and it's just pure madness (way over 30 functions). Also missing: the named constructors that construct PhysicsObjects with different behaviour. class Transformation{ Vector3f translation_; Vector3f rotation_; Vector3f scaling_; public: Transformation() : translation_{ 0, 0, 0 }, rotation_{ 0, 0, 0 }, scaling_{ 1, 1, 1 } {}; Transformation(Vector3f translation, Vector3f rotation, Vector3f scaling); inline Vector3f translation(){ return translation_; } inline void translation(float x, float y, float z){ translation(Vector3f(x, y, z)); } inline void translation(Vector3f newTranslation){ translation_ = newTranslation; } inline void translate(float x, float y, float z){ translate(Vector3f(x, y, z)); } inline void translate(Vector3f summand){ translation_ += summand; } inline Vector3f rotation(){ return rotation_; } inline void rotation(float pitch, float yaw, float roll){ rotation(Vector3f(pitch, yaw, roll)); } inline void rotation(Vector3f newRotation){ rotation_ = newRotation; } inline void rotate(float pitch, float yaw, float roll){ rotate(Vector3f(pitch, yaw, roll)); } inline void rotate(Vector3f summand){ rotation_ += summand; } inline Vector3f scaling(){ return scaling_; } inline void scaling(float x, float y, float z){ scaling(Vector3f(x, y, z)); } inline void scaling(Vector3f newScaling){ scaling_ = newScaling; } inline void scale(float x, float y, float z){ scale(Vector3f(x, y, z)); } void scale(Vector3f factor){ scaling_(0) *= factor(0); scaling_(1) *= factor(1); scaling_(2) *= factor(2); } Matrix4f matrix(){ return WMatrix::Translation(translation_) * WMatrix::Rotation(rotation_) * WMatrix::Scale(scaling_); } }; class PhysicsObject; typedef void tickFunction(PhysicsObject& self, unsigned int delta); class PhysicsObject{ PhysicsObject(const Transformation& trafo) : transformation_(trafo), transformationVelocity_(nullptr), transformationAcceleration_(nullptr), tick_(nullptr) {} PhysicsObject(PhysicsObject&& other) : transformation_(other.transformation_), transformationVelocity_(std::move(other.transformationVelocity_)), transformationAcceleration_(std::move(other.transformationAcceleration_)), tick_(other.tick_) {} Transformation transformation_; unique_ptr<Transformation> transformationVelocity_; unique_ptr<Transformation> transformationAcceleration_; tickFunction* tick_; public: void tick(unsigned int delta){ tick_ ? tick_(*this, delta) : 0; } inline Matrix4f transformationMatrix(){ return transformation_.matrix(); } } 4. RenderObject RenderObject is a base class for different types of things that could be rendered, i.e. Meshes, Light Sources or Sprites. DISCLAIMER: I did not write this code, I'm working on this project with someone else. class RenderObject { public: RenderObject(float renderDistance); virtual ~RenderObject(); float renderDistance() const { return renderDistance_; } void setRenderDistance(float rD) { renderDistance_ = rD; } protected: float renderDistance_; }; struct NullRenderObject : public RenderObject{ NullRenderObject() : RenderObject(0.f){}; }; class Light : public RenderObject{ public: Light() : RenderObject(30.f){}; }; class Mesh : public RenderObject{ public: Mesh(unsigned int seed) : RenderObject(20.f) { meshID_ = 0; textureID_ = 0; if (seed == 1) meshID_ = Model::getMeshID("EM-208_heavy"); else meshID_ = Model::getMeshID("cube"); }; unsigned int getMeshID() const { return meshID_; } unsigned int getTextureID() const { return textureID_; } private: unsigned int meshID_; unsigned int textureID_; }; I guess this shows my issue quite nicely: You see a few accessors in GameObject which return weak_ptrs to access members of members, but that is not really what I want. Also please keep in mind that this is NOT, by any means, finished or production code! It is merely a prototype and there may be inconsistencies, unnecessary public parts of classes and such.

    Read the article

  • How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

    - by Bob Hanckel
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end to end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables.  They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL  8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of either hinting at what level of DOP you want to use.  The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section).  Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. /*+ parallel(my_oracle_table,8) *//*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load.  In other doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks.  It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts.  The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes.   The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead.  The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want to Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when your are first seeding tables in a new Oracle database, or there is a time where normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load.  It will generate N number of location files (where N is the value you gave to the tool). It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts in some other location file, and so on). The tools ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same.  Applying Rule 4 above to our DOP of 8, we could divide the workload into160 files that were approximately 10 GB in size.  For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB size. The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods.  This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

    Read the article

< Previous Page | 100 101 102 103 104 105 106 107 108 109 110 111  | Next Page >