Search Results

Search found 2418 results on 97 pages for 'uniform distribution'.

Page 22/97 | < Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >

Sorting objects before rendering

- by dreta

I'm trying to implement a scene graph and in all the articles i've come across there is talk about object sorting. So you'd sort your objects by "material" for example. Now untill i sat down and started implementing it, i kind of took this for granted, because it made sense. But now i'm wondering what does sorting actually change? In my engine, i have a manager for UBOs, i use those to store data that'll be shared between programs, at the moment that only involves time, camera and projection matrices and lights (i'm not worrying about managing which lights affect which objects ATM). Now for each model i have to change the model to world matrix uniform, no sorting is going to change that. So is the jump from changing this matrix to also setting a material for each object that bad? I vaguely remember reading somewhere that each time you change something in the pipeline, it has to get flushed and that can cause performance issues. But for each drawing call i'm setting up a model to world matrix anyway, so what sense does it make to ever be concerned about this? BTW is there any information about whether changing a uniform and calling glBufferSubData is more (or less) expensive.

Read the article
Better solution for boolean mixing?

- by Ruben Nunez

Sorry if this question has been asked in the past, but searching Google and here didn't yield relevant results, so here goes. I'm working on a fragment shader that implements both conditional/boolean diffuse and bump mapping (that is to say, you don't need a diffuse texture or a normals texture, and if they're not present, they're simply changed to default values). My current solution is to use a uniform float to say "mix amount". For example, computing the diffuse texel works as: // Compute diffuse amount scaled by vCol // If no texture is present (mDif = 0.0), then DiffuseTexel = vCol // kT[0] is the diffuse texture // vTex is the texture co-ordinates // mDif is the uniform float containing the mix amount (either 0.0 or 1.0) vec4 DiffuseTexel = vCol*mix(vec4(1.0), texture2D(kT[0], vTex), mDif); While that works great and all, I was wondering if there's a better way of doing this, as I will never have any use for in-between values for funky effects. I know that perhaps the best solution is to simply write separate shaders for mDif=0.0 and mDif=1.0, but I'd like a more elegant solution than splicing shaders before compiling or writing multiple shader files and keeping each one updated. Any ideas are greatly appreciated. =)

Read the article
problems texture mapping in modern OpenGL 3.3 using GLSL #version 150

- by RubyKing

Hi all I'm trying to do texture mapping using Modern OpenGL and GLSL 150. The problem is the texture shows but has this weird flicker I can show a video here http://www.youtube.com/watch?v=xbzw_LMxlHw and I have everything setup best I can have my texcords in my vertex array sent up to opengl I have my fragment color set to the texture values and texel values I have my vertex sending the textures cords to texture cordinates to be used in the fragment shader I have my ins and outs setup and I still don't know what I'm missing that could be causing that flicker. here is my code FRAGMENT SHADER #version 150 uniform sampler2D texture; in vec2 texture_coord; varying vec3 texture_coordinate; void main(void){ gl_FragColor = texture(texture, texture_coord); } VERTEX SHADER #version 150 in vec4 position; out vec2 texture_coordinate; out vec2 texture_coord; uniform vec3 translations; void main() { texture_coord = (texture_coordinate); gl_Position = vec4(position.xyz + translations.xyz, 1.0); } Last bit here is my vertex array with texture cordinates GLfloat vVerts[] = { 0.5f, 0.5f, 0.0f, 0.0f, 1.0f , 0.0f, 0.5f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.5f, 0.0f, 0.0f, 1.0f, 0.0f}; //tex x and y HERE IS THE ACTUAL FULL SOURCE CODE if you need to see all the code in its fullest glory here is a link to every file http://ideone.com/7kQN3 thank you for your help

Read the article
Proper way to do texture mapping in modern OpenGL?

- by RubyKing

I'm trying to do texture mapping using OpenGL 3.3 and GLSL 150. The problem is the texture shows but has this weird flicker I can show a video here. My texcords are in a vertex array. I have my fragment color set to the texture values and texel values. I have my vertex shader sending the texture cords to texture cordinates to be used in the fragment shader. I have my ins and outs setup and I still don't know what I'm missing that could be causing that flicker. Here is my code: Fragment shader #version 150 uniform sampler2D texture; in vec2 texture_coord; varying vec3 texture_coordinate; void main(void) { gl_FragColor = texture(texture, texture_coord); } Vertex shader #version 150 in vec4 position; out vec2 texture_coordinate; out vec2 texture_coord; uniform vec3 translations; void main() { texture_coord = (texture_coordinate); gl_Position = vec4(position.xyz + translations.xyz, 1.0); } Last bit Here is my vertex array with texture coordinates: GLfloat vVerts[] = { 0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 0.0f, 0.5f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.5f, 0.0f, 0.0f, 1.0f, 0.0f}; //tex x and y If you need to see all the code, here is a link to every file. Thank you for your help.

Read the article
Linux LiveCD (or LiveUSB) with custom xorg.conf

- by Jakub Narebski

Is there some Live Linux distribution (on CD, DVD or USB), which allow one to use specific xorg.conf file, i.e. specific X11 configuration? The problem I am trying to solve is to find Linux Live distribution for web browsing which would work well with NEC LCD 22WV monitor. It is supposedly DDCCCI capable, but X.Org X Window System autoconfiguration fails to detect proper modeline, and uses fallback 800x600 screen resolution, instead of preferred screen resolution of 1680x1050.

Read the article
outlook rule is not working

- by oo

i have a rule that says if i am cced on the mail then move to a folder called "CC". If someone sends an email to a distribution list where the distribution list is cced, shouldn't that email also go into the "CC" folder? it doesn't seem to be working

Read the article
Download specific kernels for distros

- by ant2009

Hello, I am running CentOS 5.3. I am wondering where I can download the latest kernel for this distribution. I went to www.centos.org but could see any kernel download only the complete distribution is available to download. I just want the kernel. Kernels on the www.kernels.org are the vanilla kernels. I am wondering where to download for the specific distro? Many thanks for any advice,

Read the article
SQL 2008 Replication

- by sevenlamp

I have replicated two database in SQL2008. The last two days the distribution database from publisher server changed to Suspect Mode and the replication process is down. I read a lot from Google and tried to repair, but without success. During my repairing process, the Distribution Database is has changed to Emergency mode. Can anyone please kindly suggest any solutions?

Read the article
Recovery packages (Ubuntu 10.04 LTS)

- by user363249

Good day, everybody. I have a Problem of the following: I deleted libtiff4 with all the programs in a destributive where it was used (about 1.5 gigabytes). Q How do I restore the distribution package? P.S. I do not have distribution.

Read the article
Histogram matching - image processing - c/c++

- by Raj

Hello I have two histograms. int Hist1[10] = {1,4,3,5,2,5,4,6,3,2}; int Hist1[10] = {1,4,3,15,12,15,4,6,3,2}; Hist1's distribution is of type multi-modal; Hist2's distribution is of type uni-modal with single prominent peak. My questions are Is there any way that i could determine the type of distribution programmatically? How to quantify whether these two histograms are similar/dissimilar? Thanks

Read the article
How can I get my setup.py to use a relative path to my files?

- by Chris B.

I'm trying to build a Python distribution with distutils. Unfortunately, my directory structure looks like this: /code /mypackage __init__.py file1.py file2.py /subpackage __init__.py /build setup.py Here's my setup.py file: from distutils.core import setup setup( name = 'MyPackage', description = 'This is my package', packages = ['mypackage', 'mypackage.subpackage'], package_dir = { 'mypackage' : '../mypackage' }, version = '1', url = 'http://www.mypackage.org/', author = 'Me', author_email = '[email protected]', ) When I run python setup.py sdist it correctly generates the manifest file, but doesn't include my source files in the distribution. Apparently, it creates a directory to contain the source files (i.e. mypackage1) then copies each of the source files to mypackage1/../mypackage which puts them outside of the distribution. How can I correct this, without forcing my directory structure to conform to what distutils expects?

Read the article
read file and print in specific format c++

- by 3yoon af

Dear all, I have a program that i should write a code using c++ lauguage and i don't used this laugauge before.. I now how to write it in java or c#, but i should write it in c++ !! the code should read a text file (i do this step) and then print the output in specific format using the array (i don't now how to do this step) For example: The file has the following: Task distribution duration dependence A Normal 2,10 - B UNIF 2,7 A The code will print the following: The task A is a normal distribution and it is duration between 2 and 10. It doesn't depend on any task. Task B is unif distribution and ...... etc .. Can someone help me, please?

Read the article
Histrogram matching - image processing - c/c++

- by Raj

Hello I have two histograms. int Hist1[10] = {1,4,3,5,2,5,4,6,3,2}; int Hist1[10] = {1,4,3,15,12,15,4,6,3,2}; Hist1's distribution is of type multi-modal; Hist2's distribution is of type uni-modal with single prominent peak. My questions are Is there any way that i could determine the type of distribution programmatically? How to quantify whether these two histograms are similar/dissimilar? Thanks

Read the article
Choose between multiple options with defined probability

- by Sijin

I have a scenario where I need to show a different page to a user for the same url based on a probability distribution, so for e.g. for 3 pages the distribution might be page 1 - 30% of all users page 2 - 50% of all users page 3 - 20% of all users When deciding what page to load for a given user, what technique can I use to ensure that the overall distribution matches the above? I am thinking I need a way to choose an object at "random" from a set X { x1, x2....xn } except that instead of all objects being equally likely the probability of an object being selected is defined beforehand.

Read the article
Problems when trying to submit iphone app

- by ryug

I'm a fairly new developer. When I try to submit my iphone app with xcode, I've got error as follows; Code Sign error: The identity 'iPhone Distribution' doesn't match any valid, non-expired certificate/private key pair in the default keychain After searching, I found out that I have to create a Distribution Provisioning Profile. However, my distribution provisioning profile doesn't work, even though my Development Provisioning Profile works perfectly. Could someone please help me with this problem? I'm stuck all day... and please forgive me that my English is not great. Thank you in advance.

Read the article
Configure Oracle SOA JMSAdatper to Work with WLS JMS Topics

- by fip

The WebLogic JMS Topic are typically running in a WLS cluster. So as your SOA composites that receive these Topic messages. In some situation, the two clusters are the same while in others they are sepearate. The composites in SOA cluster are subscribers to the JMS Topic in WebLogic cluster. As nature of JMS Topic is meant to distribute the same copy of messages to all its subscribers, two questions arise immediately when it comes to load balancing the JMS Topic messages against the SOA composites: How to assure all of the SOA cluster members receive different messages instead of the same (duplicate) messages, even though the SOA cluster members are all subscribers to the Topic? How to make sure the messages are evenly distributed (load balanced) to SOA cluster members? Here we will walk through how to configure the JMS Topic, the JmsAdapter connection factory, as well as the composite so that the JMS Topic messages will be evenly distributed to same composite running off different SOA cluster nodes without causing duplication. 2. The typical configuration In this typical configuration, we achieve the load balancing of JMS Topic messages to JmsAdapters by configuring a partitioned distributed topic along with sharable subscriptions. You can reference the documentation for explanation of PDT. And this blog posting does a very good job to visually explain how this combination of configurations would message load balancing among clients of JMS Topics. Our job is to apply this configuration in the context of SOA JMS Adapters. To do so would involve the following steps: Step A. Configure JMS Topic to be UDD and PDT, at the WebLogic cluster that house the JMS Topic Step B. Configure JCA Connection Factory with proper ServerProperties at the SOA cluster Step C. Reference the JCA Connection Factory and define a durable subscriber name, at composite's JmsAdapter (or the *.jca file) Here are more details of each step: Step A. Configure JMS Topic to be UDD and PDT, You do this at the WebLogic cluster that house the JMS Topic. You can follow the instructions at Administration Console Online Help to create a Uniform Distributed Topic. If you use WebLogic Console, then at the same administration screen you can specify "Distribution Type" to be "Uniform", and the Forwarding policy to "Partitioned", which would make the JMS Topic Uniform Distributed Destination and a Partitioned Distributed Topic, respectively Step B: Configure ServerProperties of JCA Connection Factory You do this step at the SOA cluster. This step is to make the JmsAdapter that connect to the JMS Topic through this JCA Connection Factory as a certain type of "client". When you configure the JCA Connection Factory for the JmsAdapter, you define the list of properties in FactoryProperties field, in a semi colon separated list: ClientID=myClient;ClientIDPolicy=UNRESTRICTED;SubscriptionSharingPolicy=SHARABLE;TopicMessageDistributionAll=false You can refer to Chapter 8.4.10 Accessing Distributed Destinations (Queues and Topics) on the WebLogic Server JMS of the Adapter User Guide for the meaning of these properties. Please note: Except for ClientID, other properties such as the ClientIDPolicy=UNRESTRICTED, SubscriptionSharingPolicy=SHARABLE and TopicMessageDistributionAll=false are all default settings for the JmsAdapter's connection factory. Therefore you do NOT have to explicitly specify them explicitly. All you need to do is the specify the ClientID. The ClientID is different from the subscriber ID that we are to discuss in the later steps. To make it simple, you just need to remember you need to specify the client ID and make it unique per connection factory. Here is the example setting: Step C. Reference the JCA Connection Factory and define a durable subscriber name, at composite's JmsAdapter (or the *.jca file) In the following example, the value 'MySubscriberID-1' was given as the value of property 'DurableSubscriber': <adapter-config name="subscribe" adapter="JMS Adapter" wsdlLocation="subscribe.wsdl" xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata"> <connection-factory location="eis/wls/MyTestUDDTopic" UIJmsProvider="WLSJMS" UIConnectionName="ateam-hq24b"/> <endpoint-activation portType="Consume_Message_ptt" operation="Consume_Message"> <activation-spec className="oracle.tip.adapter.jms.inbound.JmsConsumeActivationSpec"> <property name="DurableSubscriber" value="MySubscriberID-1"/> <property name="PayloadType" value="TextMessage"/> <property name="UseMessageListener" value="false"/> <property name="DestinationName" value="jms/MyTestUDDTopic"/> </activation-spec> </endpoint-activation> </adapter-config> You can set the durable subscriber name either at composite's JmsAdapter wizard,or by directly editing the JmsAdapter's *.jca file within the Composite project. 2.The "atypical" configurations: For some systems, there may be restrictions that do not allow the afore mentioned "typical" configurations be applied. For examples, some deployments may be required to configure the JMS Topic to be Replicated Distributed Topic rather than Partition Distributed Topic. We would like to discuss those scenarios here: Configuration A: The JMS Topic is NOT PDT In this case, you need to define the message selector 'NOT JMS_WL_DDForwarded' in the adapter's *.jca file, to filter out those "replicated" messages. Configuration B. The ClientIDPolicy=RESTRICTED In this case, you need separate factories for different composites. More accurately, you need separate factories for different *.jca file of JmsAdapter. References: Managing Durable Subscription WebLogic JMS Partitioned Distributed Topics and Shared Subscriptions JMS Troubleshooting: Configuring JMS Message Logging: Advanced Programming with Distributed Destinations Using the JMS Destination Availability Helper API

Read the article
NET Math Libraries

- by JoshReuben

NET Mathematical Libraries .NET Builder for Matlab The MathWorks Inc. - http://www.mathworks.com/products/netbuilder/ MATLAB Builder NE generates MATLAB based .NET and COM components royalty-free deployment creates the components by encrypting MATLAB functions and generating either a .NET or COM wrapper around them. .NET/Link for Mathematica www.wolfram.com a product that 2-way integrates Mathematica and Microsoft's .NET platform call .NET from Mathematica - use arbitrary .NET types directly from the Mathematica language. use and control the Mathematica kernel from a .NET program. turns Mathematica into a scripting shell to leverage the computational services of Mathematica. write custom front ends for Mathematica or use Mathematica as a computational engine for another program comes with full source code. Leverages MathLink - a Wolfram Research's protocol for sending data and commands back and forth between Mathematica and other programs. .NET/Link abstracts the low-level details of the MathLink C API. Extreme Optimization http://www.extremeoptimization.com/ a collection of general-purpose mathematical and statistical classes built for the.NET framework. It combines a math library, a vector and matrix library, and a statistics library in one package. download the trial of version 4.0 to try it out. Multi-core ready - Full support for Task Parallel Library features including cancellation. Broad base of algorithms covering a wide range of numerical techniques, including: linear algebra (BLAS and LAPACK routines), numerical analysis (integration and differentiation), equation solvers. Mathematics leverages parallelism using .NET 4.0's Task Parallel Library. Basic math: Complex numbers, 'special functions' like Gamma and Bessel functions, numerical differentiation. Solving equations: Solve equations in one variable, or solve systems of linear or nonlinear equations. Curve fitting: Linear and nonlinear curve fitting, cubic splines, polynomials, orthogonal polynomials. Optimization: find the minimum or maximum of a function in one or more variables, linear programming and mixed integer programming. Numerical integration: Compute integrals over finite or infinite intervals, over 2D and higher dimensional regions. Integrate systems of ordinary differential equations (ODE's). Fast Fourier Transforms: 1D and 2D FFT's using managed or fast native code (32 and 64 bit) BigInteger, BigRational, and BigFloat: Perform operations with arbitrary precision. Vector and Matrix Library Real and complex vectors and matrices. Single and double precision for elements. Structured matrix types: including triangular, symmetrical and band matrices. Sparse matrices. Matrix factorizations: LU decomposition, QR decomposition, singular value decomposition, Cholesky decomposition, eigenvalue decomposition. Portability and performance: Calculations can be done in 100% managed code, or in hand-optimized processor-specific native code (32 and 64 bit). Statistics Data manipulation: Sort and filter data, process missing values, remove outliers, etc. Supports .NET data binding. Statistical Models: Simple, multiple, nonlinear, logistic, Poisson regression. Generalized Linear Models. One and two-way ANOVA. Hypothesis Tests: 12 14 hypothesis tests, including the z-test, t-test, F-test, runs test, and more advanced tests, such as the Anderson-Darling test for normality, one and two-sample Kolmogorov-Smirnov test, and Levene's test for homogeneity of variances. Multivariate Statistics: K-means cluster analysis, hierarchical cluster analysis, principal component analysis (PCA), multivariate probability distributions. Statistical Distributions: 25 29 continuous and discrete statistical distributions, including uniform, Poisson, normal, lognormal, Weibull and Gumbel (extreme value) distributions. Random numbers: Random variates from any distribution, 4 high-quality random number generators, low discrepancy sequences, shufflers. New in version 4.0 (November, 2010) Support for .NET Framework Version 4.0 and Visual Studio 2010 TPL Parallellized – multicore ready sparse linear program solver - can solve problems with more than 1 million variables. Mixed integer linear programming using a branch and bound algorithm. special functions: hypergeometric, Riemann zeta, elliptic integrals, Frensel functions, Dawson's integral. Full set of window functions for FFT's. Product Price Update subscription Single Developer License $999 $399 Team License (3 developers) $1999 $799 Department License (8 developers) $3999 $1599 Site License (Unlimited developers in one physical location) $7999 $3199 NMath http://www.centerspace.net .NET math and statistics libraries matrix and vector classes random number generators Fast Fourier Transforms (FFTs) numerical integration linear programming linear regression curve and surface fitting optimization hypothesis tests analysis of variance (ANOVA) probability distributions principal component analysis cluster analysis built on the Intel Math Kernel Library (MKL), which contains highly-optimized, extensively-threaded versions of BLAS (Basic Linear Algebra Subroutines) and LAPACK (Linear Algebra PACKage). Product Price Update subscription Single Developer License $1295 $388 Team License (5 developers) $5180 $1554 DotNumerics http://www.dotnumerics.com/NumericalLibraries/Default.aspx free DotNumerics is a website dedicated to numerical computing for .NET that includes a C# Numerical Library for .NET containing algorithms for Linear Algebra, Differential Equations and Optimization problems. The Linear Algebra library includes CSLapack, CSBlas and CSEispack, ports from Fortran to C# of LAPACK, BLAS and EISPACK, respectively. Linear Algebra (CSLapack, CSBlas and CSEispack). Systems of linear equations, eigenvalue problems, least-squares solutions of linear systems and singular value problems. Differential Equations. Initial-value problem for nonstiff and stiff ordinary differential equations ODEs (explicit Runge-Kutta, implicit Runge-Kutta, Gear's BDF and Adams-Moulton). Optimization. Unconstrained and bounded constrained optimization of multivariate functions (L-BFGS-B, Truncated Newton and Simplex methods). Math.NET Numerics http://numerics.mathdotnet.com/ free an open source numerical library - includes special functions, linear algebra, probability models, random numbers, interpolation, integral transforms. A merger of dnAnalytics with Math.NET Iridium in addition to a purely managed implementation will also support native hardware optimization. constants & special functions complex type support real and complex, dense and sparse linear algebra (with LU, QR, eigenvalues, ... decompositions) non-uniform probability distributions, multivariate distributions, sample generation alternative uniform random number generators descriptive statistics, including order statistics various interpolation methods, including barycentric approaches and splines numerical function integration (quadrature) routines integral transforms, like fourier transform (FFT) with arbitrary lengths support, and hartley spectral-space aware sequence manipulation (signal processing) combinatorics, polynomials, quaternions, basic number theory. parallelized where appropriate, to leverage multi-core and multi-processor systems fully managed or (if available) using native libraries (Intel MKL, ACMS, CUDA, FFTW) provides a native facade for F# developers

Read the article
OpenGL problem with FBO integer texture and color attachment

- by Grieverheart

In my simple renderer, I have 2 FBOs one that contains diffuse, normals, instance ID and depth in that order and one that I use store the ssao result. The textures I use for the first FBO are RGB8, RGBA16F, R32I and GL_DEPTH_COMPONENT32F for the depth. For the second FBO I use an R16F texture. My rendering process is to first render to everything I mentioned in the first FBO, then bind depth and normals textures for reading for the ssao pass and write to the second FBO. After that I bind the second FBO's texture for reading in my blur shader and bind the first FBO for writing. What I intend to do is to write the blurred ssao value to the alpha component of the Normals texture. Here are where the problems start. First of all, I use shading language 3.3, which my graphics card does support. I manage ouputs in my shaders using layout(location = #). Now, the normals texture should be bound to color attachment 1, but when I use 1, it seems to write to my diffuse texture which should be in color attachment 0. When I instead use layout(location = 0), it gets correctly written to my normals texture. Besides this, my instance ID texture also gets resets after running the blur shader which is weird because if I use a float texture and write to it instanceID / nInstances, the texture doesn't get reset after the blur shader has ran. Here is how I prepare my first FBO: bool CGBuffer::Init(unsigned int WindowWidth, unsigned int WindowHeight){ //Create FBO glGenFramebuffers(1, &m_fbo); glBindFramebuffer(GL_DRAW_FRAMEBUFFER, m_fbo); //Create gbuffer and Depth Buffer Textures glGenTextures(GBUFF_NUM_TEXTURES, &m_textures[0]); glGenTextures(1, &m_depthTexture); //prepare gbuffer for(unsigned int i = 0; i < GBUFF_NUM_TEXTURES; i++){ glBindTexture(GL_TEXTURE_2D, m_textures[i]); if(i == GBUFF_TEXTURE_TYPE_NORMAL) glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, WindowWidth, WindowHeight, 0, GL_RGBA, GL_FLOAT, NULL); else if(i == GBUFF_TEXTURE_TYPE_DIFFUSE) glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, WindowWidth, WindowHeight, 0, GL_RGB, GL_FLOAT, NULL); else if(i == GBUFF_TEXTURE_TYPE_ID) glTexImage2D(GL_TEXTURE_2D, 0, GL_R32I, WindowWidth, WindowHeight, 0, GL_RED_INTEGER, GL_INT, NULL); else{ std::cout << "Error in FBO initialization" << std::endl; return false; } glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, m_textures[i], 0); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP); } //prepare depth buffer glBindTexture(GL_TEXTURE_2D, m_depthTexture); glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32F, WindowWidth, WindowHeight, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL); glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, m_depthTexture, 0); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE, GL_NONE); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP); GLenum DrawBuffers[] = {GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2}; glDrawBuffers(GBUFF_NUM_TEXTURES, DrawBuffers); GLenum Status = glCheckFramebufferStatus(GL_FRAMEBUFFER); if(Status != GL_FRAMEBUFFER_COMPLETE){ std::cout << "FB error, status 0x" << std::hex << Status << std::endl; return false; } //Restore default framebuffer glBindFramebuffer(GL_FRAMEBUFFER, 0); return true; } where I use an enum defined as, enum GBUFF_TEXTURE_TYPE{ GBUFF_TEXTURE_TYPE_DIFFUSE, GBUFF_TEXTURE_TYPE_NORMAL, GBUFF_TEXTURE_TYPE_ID, GBUFF_NUM_TEXTURES }; Am I missing some kind of restriction? Does the color attachment of the FBO's textures somehow gets reset i.e. I'm using a re-size function which re-sizes the textures of the FBO but should I perhaps call glFramebufferTexture2D again too? EDIT: Here is the shader in question: #version 330 core uniform sampler2D aoSampler; uniform vec2 TEXEL_SIZE; // x = 1/res x, y = 1/res y uniform bool use_blur; noperspective in vec2 TexCoord; layout(location = 0) out vec4 out_AO; void main(void){ if(use_blur){ float result = 0.0; for(int i = -1; i < 2; i++){ for(int j = -1; j < 2; j++){ vec2 offset = vec2(TEXEL_SIZE.x * i, TEXEL_SIZE.y * j); result += texture(aoSampler, TexCoord + offset).r; // -0.004 because the texture seems to be a bit displaced } } out_AO = vec4(vec3(0.0), result / 9); } else out_AO = vec4(vec3(0.0), texture(aoSampler, TexCoord).r); }

Read the article
How do I pass vertex and color positions to OpenGL shaders?

- by smoth190

I've been trying to get this to work for the past two days, telling myself I wouldn't ask for help. I think you can see where that got me... I thought I'd try my hand at a little OpenGL, because DirectX is complex and depressing. I picked OpenGL 3.x, because even with my OpenGL 4 graphics card, all my friends don't have that, and I like to let them use my programs. There aren't really any great tutorials for OpenGL 3, most are just "type this and this will happen--the end". I'm trying to just draw a simple triangle, and so far, all I have is a blank screen with my clear color (when I set the draw type to GL_POINTS I just get a black dot). I have no idea what the problem is, so I'll just slap down the code: Here is the function that creates the triangle: void CEntityRenderable::CreateBuffers() { m_vertices = new Vertex3D[3]; m_vertexCount = 3; m_vertices[0].x = -1.0f; m_vertices[0].y = -1.0f; m_vertices[0].z = -5.0f; m_vertices[0].r = 1.0f; m_vertices[0].g = 0.0f; m_vertices[0].b = 0.0f; m_vertices[0].a = 1.0f; m_vertices[1].x = 1.0f; m_vertices[1].y = -1.0f; m_vertices[1].z = -5.0f; m_vertices[1].r = 1.0f; m_vertices[1].g = 0.0f; m_vertices[1].b = 0.0f; m_vertices[1].a = 1.0f; m_vertices[2].x = 0.0f; m_vertices[2].y = 1.0f; m_vertices[2].z = -5.0f; m_vertices[2].r = 1.0f; m_vertices[2].g = 0.0f; m_vertices[2].b = 0.0f; m_vertices[2].a = 1.0f; //Create the VAO glGenVertexArrays(1, &m_vaoID); //Bind the VAO glBindVertexArray(m_vaoID); //Create a vertex buffer glGenBuffers(1, &m_vboID); //Bind the buffer glBindBuffer(GL_ARRAY_BUFFER, m_vboID); //Set the buffers data glBufferData(GL_ARRAY_BUFFER, sizeof(m_vertices), m_vertices, GL_STATIC_DRAW); //Set its usage glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex3D), 0); glVertexAttribPointer(1, 4, GL_FLOAT, GL_TRUE, sizeof(Vertex3D), (void*)(3*sizeof(float))); //Enable glEnableVertexAttribArray(0); glEnableVertexAttribArray(1); //Check for errors if(glGetError() != GL_NO_ERROR) { Error("Failed to create VBO: %s", gluErrorString(glGetError())); } //Unbind... glBindVertexArray(0); } The Vertex3D struct is as such... struct Vertex3D { Vertex3D() : x(0), y(0), z(0), r(0), g(0), b(0), a(1) {} float x, y, z; float r, g, b, a; }; And finally the render function: void CEntityRenderable::RenderEntity() { //Render... glBindVertexArray(m_vaoID); //Use our attribs glDrawArrays(GL_POINTS, 0, m_vertexCount); glBindVertexArray(0); //unbind OnRender(); } (And yes, I am binding and unbinding the shader. That is just in a different place) I think my problem is that I haven't fully wrapped my mind around this whole VertexAttribArray thing (the only thing I like better in DirectX was input layouts D:). This is my vertex shader: #version 330 //Matrices uniform mat4 projectionMatrix; uniform mat4 viewMatrix; uniform mat4 modelMatrix; //In values layout(location = 0) in vec3 position; layout(location = 1) in vec3 color; //Out values out vec3 frag_color; //Main shader void main(void) { //Position in world gl_Position = vec4(position, 1.0); //gl_Position = projectionMatrix * viewMatrix * modelMatrix * vec4(in_Position, 1.0); //No color changes frag_color = color; } As you can see, I've disable the matrices, because that just makes debugging this thing so much harder. I tried to debug using glslDevil, but my program just crashes right before the shaders are created... so I gave up with that. This is my first shot at OpenGL since the good old days of LWJGL, but that was when I didn't even know what a shader was. Thanks for your help :)

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
Mnesia: A Distributed DBMS Rooted in Concurrency

Find out what makes Mnesia, the Erlang-based database management system, perfect for distribution across a network of computers.

Read the article
Difference Procedural Generation and Random Generation

- by U-No-Poo

Today, I got into an argument about the term "procedural generation". My point was that its different from "classic" random generation in the way that procedural is based on a more mathematical, fractal based, algorithm leading to a more "realistic" distribution and the usual randomness of most languages are based on a pseudo-random-number generator, leading to an "unrealistic", in a way, ugly, distribution. This discussion was made with a heightmap in mind. The discussion left me somehow unconvinced about my own arguments though, so, is there more to it? Or am I the one who is, in fact, simply wrong?

Read the article
Novell repousse l'offre de rachat d'un fonds d'investissement, l'éditeur de SUSE veut plus : Linux d

Mise à jour du 22/03/10 Novell repousse l'offre de rachat d'un fonds d'investissement Les dirigeants de l'éditeur de la distribution Linux SUSE veulent plus : Linux devient-il un produit spéculatif ? Novell, la société qui soutient la célèbre distribution Linux SUSE, vient de rejeter l'offre de rachat du fonds d'investissement Elliott Associates L.P. Il serait cependant faux de croire que l'affaire est close. Le fonds pourrait en effet lancer une offre public d'achat hostile sur l'entreprise. Quant aux dirigeants de Novell, ils ne ferment pas la porte à une éventuelle vente, mais à de meilleures conditions (ou à un a...

Read the article
Oracle va proposer ses serveurs Sparc avec Oracle Enterprise Linux et plus simplement avec Solaris pour concurrencer encore plus IBM

Oracle va proposer ses serveurs Sparc avec Oracle Enterprise Linux Et plus simplement avec Solaris, pour concurrencer encore plus IBM Oracle va porter sa distribution dans les prochaines versions de son processeur Sparc. Jusqu'ici, Solaris était l'OS de prédilection pour les serveurs SPARC. Ceci pourrait changer. Oracle a en effet décidé de mettre en avant sa distribution Linux : Oracle Enterprise Linux « Nous pensons que le Sparc va devenir clairement la meilleure technologie pour faire tourner des solutions Oracle », a déclaré Larry Ellison, le PDG d'Oracle lors du lancement des nouveaux systèmes SPARC. « Nous serions idiots de ne pas y porter Oracle Enterprise...

Read the article
Cannot install eclipse due to broken packages

- by Achim

Trying to install eclipse, I get the following error: XXX:~$ sudo apt-get install eclipse Reading package lists... Done Building dependency tree Reading state information... Done Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: eclipse : Depends: eclipse-jdt (>= 3.8.0~rc4-1ubuntu1) but it is not going to be installed Depends: eclipse-pde (>= 3.8.0~rc4-1ubuntu1) but it is not going to be installed E: Unable to correct problems, you have held broken packages. I have no idea how to solve it. I'm quite new to Ubuntu, but I don't think that I'm using a unstable distribution. But I have added the repository which is required to install Tomcat7. Could that cause the problem?

Read the article

< Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >