Search Results

Search found 24879 results on 996 pages for 'prime number'.

Page 218/996 | < Previous Page | 214 215 216 217 218 219 220 221 222 223 224 225  | Next Page >

  • Function Triggered before fadeOut(); is finished

    - by willmcneilly
    Hi I'm new to javascript/jQuery this really has me stumped. What I'm trying to achieve here is On toggling a#sameDayTab jquery will look for .changeAlert and fadeOut it's container div, when toggled again the div will fade in (this works well.) Each toggle will also call a function that tells me how many .changeAlert's are present on the page and updates the number appropriately in a span. The problem is when I first click the toggled anchor the number of visible should be 0 as the .changeAlert has been hidden by fadeOut instead it returns the number of classes present on page load this value never changes no matter how many times the toggle is activated. Any help greatly appreciated. function totalNumFares () { var n = $('.changeAlert:visible').size(); $('.numFares').replaceWith('<span class=\"numFares\">'+ n +'</span>'); } totalNumFares(); //Toggle On/off Same Day Connections $('a#sameDayTab').toggle(function() { $('.changeAlert').parent().parent().parent().parent().parent().fadeOut(); totalNumFares(); },function(){ $('.changeAlert').parent().parent().parent().parent().parent().fadeIn(); totalNumFares(); });

    Read the article

  • Know the row with max characters (C)

    - by l_core
    I have wrote a program in C, to find the row with the max number of characters. Here is the code #include <stdio.h> #include <stdlib.h> #include <ctype.h> #include <string.h> int main (int argc, char *argv[]) { char c; /* used to store the character with getc */ int c_tot = 0, c_rig = 0, c_max = 0; /* counters of characters*/ int r_tot = 0; /* counters of rows */ FILE *fptr; fptr = fopen(argv[1], "r"); if (fptr == NULL || argc != 2) { printf ("Error opening the file %s\n'", argv[1]); exit(EXIT_FAILURE); } while ( (c = getc(fptr)) != EOF) { if (c != ' ' && c != '\n') { c_tot++; c_rig++; } if (c == '\n') { r_tot++; if (c_rig > c_max) c_max = c_rig; c_rig = 0; } } printf ("Total rows: %d\n", r_tot); printf ("Total characters: %d\n", c_tot); printf ("Total characters in a row: %d\n", c_max); printf ("Average number of characters on a row: %d\n", (c_tot/r_tot)); printf ("The row with max characters is: %s\n", ??????) return 0; } I can easily find the row with the highest number of characters but how can i print that out? Thank You Folks

    Read the article

  • What is the point of having a key_t if what will be the key to access shared memory is the return value of shmget()?

    - by devoured elysium
    When using shared memory, why should we care about creating a key key_t ftok(const char *path, int id); in the following bit of code? key_t key; int shmid; key = ftok("/home/beej/somefile3", 'R'); shmid = shmget(key, 1024, 0644 | IPC_CREAT); From what I've come to understand, what is needed to access a given shared memory is the shmid, not the key. Or am I wrong? If what we need is the shmid, what is the point in not just creating a random key every time? Edit @link text one can read: What about this key nonsense? How do we create one? Well, since the type key_t is actually just a long, you can use any number you want. But what if you hard-code the number and some other unrelated program hardcodes the same number but wants another queue? The solution is to use the ftok() function which generates a key from two arguments. Reading this, it gives me the impression that what one needs to attach to a shared-memory block is the key. But this isn't true, is it? Thanks

    Read the article

  • Data Integration Solution?

    - by Shlomo
    At my company we have a number of data feeds and processing that run on any given day. The number of feeds and processing steps is starting to out-number the ability to manage it ad-hoc as it is managed currently. Is there a good solution that helps with logging and managing/scheduling dependencies? For example: A: When file x is FTP dropped into directory D1, kick off processing step B B: Load flat file into DB1 C: When file y is FTP dropped into directory D2, kick off processing Step D D: Load flat file into DB11 E: When B and D are done, churn through the data, and load new data into DB111. F: When Step E is done, launch application process P G: etc... I want those steps to run at the appropriate times, not to mention if B fails, there's no reason to run steps E & F, but I could still run C & D. When I re-run B successfully, it should trigger just E & F to re-run, not C & D. We're a .NET/C#/Sql Server shop, and I'm already familiar with SSIS. Is that really the best there is? That manages steps well, but not external dependencies, or logging. Open source (.NET) preferred, but not required.

    Read the article

  • SLRequest return nil before block complete

    - by jaytr0n
    I'm having a little trouble thinking through this and deciding if it's a design flaw on my behalf or if I'm missing a piece that could make this work. Basically I'm using the new SLRequest to make a Twitter API call. After the data is returned, I'd like to put it into an object and return that object: -(AUDSlide *) getFollowerSlide { SLRequest *getRequest = [SLRequest requestForServiceType:SLServiceTypeTwitter requestMethod:SLRequestMethodGET URL:[NSURL URLWithString:[NSString stringWithFormat:@"https://api.twitter.com/1.1/followers/ids.json?user_id=%@", [[self.twitterAccount valueForKey:@"properties"] valueForKey:@"user_id"]]] parameters:nil]; getRequest.account = twitterAccount; AUDSlide *slide = [[AUDSlide alloc] init]; [getRequest performRequestWithHandler:^(NSData *responseData, NSHTTPURLResponse *urlResponse, NSError *error) { if ([urlResponse statusCode] == 200) { NSError *jsonError = nil; NSDictionary *list = [NSJSONSerialization JSONObjectWithData:responseData options:0 error:&jsonError]; NSLog(@"Number of Followers: %u", [[list objectForKey:@"ids"] count]); slide.title = @"Followers"; slide.number = [NSString stringWithFormat:@"%u",[[list objectForKey:@"ids"] count]]; } else{ slide.title = @"Error"; slide.number = [NSString stringWithFormat: @"E%u", [urlResponse statusCode]]; } }]; return slide; } Of course the slide is returned before the call is complete and returns a nil object. So I'm not sure if I should try to force this into a synchronous request (that seems like it could be a bad idea) or rethink the design. Does anyone have any advice?

    Read the article

  • How to create custom MouseEvent.CLICK event in AS3 (pass parameters to function)?

    - by fromvega
    Hello, This question doesn't relate only to MouseEvent.CLICK event type but to all event types that already exist in AS3. I read a lot about custom events but until now I couldn't figure it out how to do what I want to do. I'm going to try to explain, I hope you understand: Here is a illustration of my situation: for(var i:Number; i < 10; i++){ var someVar = i; myClips[i].addEventListener(MouseEvent.CLICK, doSomething); } function doSomething(e:MouseEvent){ /* */ } But I want to be able to pass someVar as a parameter to doSomething. So I tried this: for(var i:Number; i < 10; i++){ var someVar = i; myClips[i].addEventListener(MouseEvent.CLICK, function(){ doSomething(someVar); }); } function doSomething(index){ trace(index); } This kind of works but not as I expect. Due to the function closures, when the MouseEvent.CLICK events are actually fired the for loop is already over and someVar is holding the last value, the number 9 in the example. So every click in each movie clip will call doSomething passing 9 as the parameter. And it's not what I want. I thought that creating a custom event should work, but then I couldn't find a way to fire a custom event when the MouseEvent.CLICK event is fired and pass the parameter to it. Now I don't know if it is the right answer. What should I do and how?

    Read the article

  • Given a 2d array sorted in increasing order from left to right and top to bottom, what is the best w

    - by Phukab
    I was recently given this interview question and I'm curious what a good solution to it would be. Say I'm given a 2d array where all the numbers in the array are in increasing order from left to right and top to bottom. What is the best way to search and determine if a target number is in the array? Now, my first inclination is to utilize a binary search since my data is sorted. I can determine if a number is in a single row in O(log N) time. However, it is the 2 directions that throw me off. Another solution I could use, if I could be sure the matrix is n x n, is to start at the middle. If the middle value is less than my target, then I can be sure it is in the left square portion of the matrix from the middle. I then move diagnally and check again, reducing the size of the square that the target could potentially be in until I have honed in on the target number. Does anyone have any good ideas on solving this problem? Example array: Sorted left to right, top to bottom. 1 2 4 5 6 2 3 5 7 8 4 6 8 9 10 5 8 9 10 11

    Read the article

  • Mysterious combination

    - by pstone
    I decided to learn concurrency and wanted to find out in how many ways instructions from two different processes could overlap. The code for both processes is just a 10 iteration loop with 3 instructions performed in each iteration. I figured out the problem consisted of leaving X instructions fixed at a point and then fit the other X instructions from the other process between the spaces taking into account that they must be ordered (instruction 4 of process B must always come before instruction 20). I wrote a program to count this number, looking at the results I found out that the solution is n Combination k, where k is the number of instructions executed throughout the whole loop of one process, so for 10 iterations it would be 30, and n is k*2 (2 processes). In other words, n number of objects with n/2 fixed and having to fit n/2 among the spaces without the latter n/2 losing their order. Ok problem solved. No, not really. I have no idea why this is, I understand that the definition of a combination is, in how many ways can you take k elements from a group of n such that all the groups are different but the order in which you take the elements doesn't matter. In this case we have n elements and we are actually taking them all, because all the instructions are executed ( n C n). If one explains it by saying that there are 2k blue (A) and red (B) objects in a bag and you take k objects from the bag, you are still only taking k instructions when 2k instructions are actually executed. Can you please shed some light into this? Thanks in advance.

    Read the article

  • Counting unique XML element attributes using XSLT

    - by patrick_corrigan
    I've just started using XSLT and am trying to figure this out. Your help would be greatly appreciated. Example XML file <purchases> <item id = "1" customer_id = "5"> </item> <item id = "2" customer_id = "5"> </item> <item id = "7" customer_id = "4"> </item> </purchases> Desired HTML Output: <table> <tr><th>Customer ID</th><th>Number of Items Purchased</th></tr> <tr><td>5</td><td>2</td></tr> <tr><td>4</td><td>1</td></tr> </table> Customer with id number 5 has bought 2 items. Customer with id number 4 has bought 1 item.

    Read the article

  • Filling in uninitialized array in java? (or workaround!)

    - by AlexRamallo
    Hello all, I'm currently in the process of creating an OBJ importer for an opengles android game. I'm relatively new to the language java, so I'm not exactly clear on a few things. I have an array which will hold the number of vertices in the model(along with a few other arrays as well): float vertices[]; The problem is that I don't know how many vertices there are in the model before I read the file using the inputstream given to me. Would I be able to fill it in as I need to like this?: vertices[95] = 5.004f; //vertices was defined like the example above or do I have to initialize it beforehand? if the latter is the case then what would be a good way to find out the number of vertices in the file? Once I read it using inputstreamreader.read() it goes to the next line until it reads the whole file. The only thing I can think of would be to read the whole file, count the number of vertices, then read it AGAIN the fill in the newly initialized array. Is there a way to dynamically allocate the data as is needed?

    Read the article

  • Python server open all ports

    - by user1670178
    I am trying to open all ports using this code, why can I not create a loop to perform this function? http://www.kellbot.com/2010/02/tutorial-writing-a-tcp-server-in-python/ #!/usr/bin/python # This is server.py file ##server.py from socket import * #import the socket library n=1025 while n<1050: ##let's set up some constants HOST = '' #we are the host PORT = n #arbitrary port not currently in use ADDR = (HOST,PORT) #we need a tuple for the address BUFSIZE = 4096 #reasonably sized buffer for data ## now we create a new socket object (serv) ## see the python docs for more information on the socket types/flags serv = socket( AF_INET,SOCK_STREAM) ##bind our socket to the address serv.bind((ADDR)) #the double parens are to create a tuple with one element serv.listen(5) #5 is the maximum number of queued connections we'll allow serv = socket( AF_INET,SOCK_STREAM) ##bind our socket to the address serv.bind((ADDR)) #the double parens are to create a tuple with one element serv.listen(5) #5 is the maximum number of queued connections we'll allow print 'listening...' n=n+1 conn,addr = serv.accept() #accept the connection print '...connected!' conn.send('TEST') conn.close() How do I make this work so that I can specify input range and have the server open all ports up to 65535? #!/usr/bin/python # This is server.py file from socket import * #import the socket library startingPort=input("\nPlease enter starting port: ") startingPort=int(startingPort) #print startingPort def connection(): ## let's set up some constants HOST = '' #we are the host PORT = startingPort #arbitrary port not currently in use ADDR = (HOST,PORT) #we need a tuple for the address BUFSIZE = 4096 #reasonably sized buffer for data def socketObject(): ## now we create a new socket object (serv) serv = socket( AF_INET,SOCK_STREAM) def bind(): ## bind our socket to the address serv = socket( AF_INET,SOCK_STREAM) serv.bind((ADDR)) #the double parens are to create a tuple with one element serv.listen(5) #5 is the maximum number of queued connections we'll allow serv = socket( AF_INET,SOCK_STREAM) print 'listening...' def accept(): conn,addr = serv.accept() #accept the connection print '...connected!' conn.send('TEST') def close(): conn.close() ## Main while startingPort<65535: connection() socketObject() bind() accept() startingPort=startingPort+1

    Read the article

  • C code Error: free(): invalid next size (fast):

    - by user1436057
    I got an error from my code, but I'm not sure where to fix it. Here's the explanation of what my code does: I'm writing some code that will read an input file and store each line as an object (char type) in an array. The first line of the input file is a number. This number tells me how many lines that I should read and store in the array. Here's my code: int main(int argc, char *argv[]){ FILE *fp; char **path; int num, i; ... /*after reading the first line and store the number value in num*/ path = malloc(num *sizeof(char)); i=0; while (!feof(fp)) { char buffer[500]; int length = 0; for (ch = fgetc(fp); ch != EOF && ch != '\n'; ch = fgetc(fp)) { buffer[length++] = ch; } if(ch == '\n' && ch!= EOF){ buffer[length] = '\0'; path[i] = malloc(strlen(buffer)+1); strcpy(path[i], buffer); i++; } } ... free(path); } After running the code, I get this *** glibc detected *** free(): invalid next size (fast): I have searched around and know this is malloc/free error, but I don't exactly know to fix it. Any help would be great. Thanks!

    Read the article

  • getline with ints C++

    - by Mdjon26
    I have a file 0 3 2 1 2 3 4 5 6 6 8 1 Where the first number for each line is the row, the second number is the column, and the third number is the data contained in that row, column. This will be a given [8][8] array so I have already initialized everything to 0, but how can I store each of these data values? For example, I want [0][3] =2 and [1][2] = 3. I would like to keep track of the line on which I found that row, col, and data value. So, how can I correctly insert these values into my 2-D array? int rowcol[8][8]; for (int i=0; i < 9; i++) for (int j=0; j < 9; j++) { rowcol[i][j] =0; } ifstream myfile; int nums; myfile.open(text.c_str()); while (!myfile.eof()) { myfile >> nums; numbers.push_back(nums); } for (int i=0; i < numbers.size(); i++) { //Not sure what the best approach here would be and I'm not even sure if I should have done a vector... }

    Read the article

  • MySQL Table Loop using PHP

    - by JM4
    I have an online form which collects consumer data and stores in a dedicated MySQL database. In some instances, data is passed in the URL under the "RefID" variable which is also stored in the database and attached to each registration. I use the 'mysql_num_rows ($result)' to fetch all agent details on another page but this only returns ALL available details. My goal is as follows: GOAL I want to create an HTML table in which rows are automatically generated based on the list of all registrations on my site. A new row is created IF and ONLY IF a unique RefID is present on that particular record. In the event the field is NULL, it is reported on a single line. In short, the HTML table could look something like this: RefID - Number of Enrollments abc123 - 10 baseball - 11 twonk - 7 NULL - 33 Where abc123 is a particular RefID and 10 is the number of times that RefID appears in the DB. If a new registration comes in with RefID = "horses", a new row is created, showing "horses - 1". The HTML table will be viewable by account administrators needing to see the number of enrollments for a particular RefID (which they won't know ahead of time). Anybody have any suggestions?

    Read the article

  • Reading/Writing/Modifying a struct in C

    - by user1016401
    I am taking some information from a user (name, address, contact number) and store it in a struct. I then store this in a file which is opened in "r+" mode. I try reading it line by line and see if the entry I am trying to enter already exists, in which case I exit. Otherwise I append this entry at the end of the file. The problem is that when I open the file in "r+" mode, it gives me Segmentation fault! Here is the code: struct cust{ char *frstnam; char *lastnam; char *cntact; char *add; }; Now consider this function. I am passing a struct of information in this function. Its job is to check if this struct already exists else append it to end of file. void check(struct cust c) { struct cust cpy; FILE *f; f=fopen("Customer.txt","r+"); int num=0; if (f!= NULL){ while (!feof(f)) { num++; fread(&cpy,sizeof(struct cust),1,f); if ((cpy.frstnam==c.frstnam)&(cpy.lastnam==c.lastnam)&(cpy.cntact==c.cntact)&(cpy.add==c.add)) { printf("Hi %s %s. Nice to meet you again. You live at %s and your contact number is %s\n", cpy.frstnam,cpy.lastnam,cpy.add,cpy.cntact); return; } } fwrite(&c,sizeof(struct cust),1,f); fclose (f); } printf("number of lines read is %d\n",num); }

    Read the article

  • SQL - Query to display average as either "longer than" or "shorter than"

    - by user1840801
    Here are the tables I've created: CREATE TABLE Plane_new (Pnum char(3), Feature varchar2(20), Ptype varchar2(15), primary key (Pnum)); CREATE TABLE Employee_new (eid char(3), ename varchar(10), salary number(7,2), mid char(3), PRIMARY KEY (eid), FOREIGN KEY (mid) REFERENCES Employee_new); CREATE TABLE Pilot_new (eid char(3), Licence char(9), primary key (eid), foreign key (eid) references Employee_new on delete cascade); CREATE TABLE FlightI_new (Fnum char(4), Fdate date, Duration number(2), Pid char(3), Pnum char(3), primary key (Fnum), foreign key (Pid) references Pilot_new (eid), foreign key (Pnum) references Plane_new); And here is the query I must complete: For each flight, display its number, the name of the pilot who implemented the flight and the words ‘Longer than average’ if the flight duration was longer than average or the words ‘Shorter than average’ if the flight duration was shorter than or equal to the average. For the column holding the words ‘Longer than average’ or ‘Shorter than average’ make a header Length. Here is what I've come up with - with no luck! SELECT F.Fnum, E.ename, CASE Length WHEN F.Duration>(SELECT AVG(F.Duration) FROM FlightI_new F) THEN "Longer than average" WHEN F.Duration<=(SELECT AVG(F.Duration) FROM FlightI_new F) THEN 'Shorter than average' END FROM FlightI_new F LEFT OUTER JOIN Employee_new E ON F.Pid=E.eid GROUP BY F.Fnum, E.ename; Where am I going wrong?

    Read the article

  • Changing Value of Array Pointer When Passed to a Function

    - by ZAX
    I have a function which receives both the array, and a specific instance of the array. I try to change the specific instance of the array by accessing one of its members "color", but it does not actually change it, as can be seen by debugging (checking the value of color after function runs in the main program). I am hoping someone can help me to access this member and change it. Essentially I need the instance of the array I'm specifying to be passed by reference if nothing else, but I'm hoping there is an easier way to accomplish what I'm trying to do. Here's the structures: typedef struct adjEdge{ int vertex; struct adjEdge *next; } adjEdge; typedef struct vertex{ int sink; int source; int color; //0 will be white, 1 will be grey, 5 will be black int number; adjEdge *nextVertex; } vertex; And here is the function: void walk(vertex *vertexArray, vertex v, int source, maxPairing *head) { int i; adjEdge *traverse; int moveVertex; int sink; traverse = vertexArray[v.number-1].nextVertex; if(v.color != 5 && v.sink == 5) { sink = v.number; v.color = 5; addMaxPair(head, source, sink); } else { walk(vertexArray, vertexArray[traverse->vertex-1], source, head); } } In particular, v.color needs to be changed to a 5, that way later after recursion the if condition blocks it.

    Read the article

  • Change form submission (enter to tab)

    - by user1298883
    I have a real basic form (code below) with a bunch of back-panel PhP. There is a scanner being used to input the data, but instead of tab after each item, it sends an "enter" command. Is it viable to add javascript to cause enter to instead tab to the next form field, and upon the last form field, submit it instead? I have found a few scripts online, but none that I have tried have worked in Firefox/Chrome. CODE: <html><head><title>Barcode Generation</title></head><body> <fieldset style="width: 300px;"> <form action="generator.php" method="post"> Invoice Number:<input type="text" name="invoice" /><br /> Model Number:<input type="text" name="model" /><br /> Serial Number:<input type="text" name="serial" /><br /> <input type="hidden" name="reload" value="true" /> <input type="submit" /> </form><br /><a href=null>en espanol</a></fieldset> </body></html>

    Read the article

  • For Loop not advancing

    - by shizishan
    I'm trying to read in a large number of jpg files using a for loop. But for some reason, the k index is not advancing. I just get A as a 460x520x3 uint8. Am I missing something? My goal with this code is to convert all the jpg images to the same size. Since I haven't been able to advance through the images, I can't quite tell if I'm doing it right. nFrames = length(date); % Number of frames. for k = 1:nFrames-1 % Number of days % Set file under consideration A = imread(['map_EUS_' datestr(cell2mat(date_O3(k)),'yyyy_mm_dd') '_O3_MDA8.jpg']); % Size of existing image A. [rowsA, colsA, numberOfColorChannelsA] = size(A); % Read in and get size of existing image B (the next image). B = imread(['map_EUS_' datestr(cell2mat(date_O3(k+1)),'yyyy_mm_dd') '_O3_MDA8.jpg']); [rowsB, colsB, numberOfColorChannelsB] = size(B); % Size of B does not match A, so resize B to match A's size. B = imresize(B, [rowsA colsA]); eval(['print -djpeg map_EUS_' datestr(cell2mat(date_O3(k)),'yyyy_mm_dd') '_O3_MDA8_test.jpg']); end end

    Read the article

  • Content Management for WebCenter Installation Guide

    - by Gary Niu
    Overvew As we known, there are two way to install Content Management for WebCenter. One way is install it by WebCenter installer wizard, another way is to install it use their own installer. This guide is for the later one. For SSO purpose, I also mentioned how to config OID identity store for Content Management for WebCenter. Content Management for WebCenter( 10.1.3.5.1) Oracle Enterprise Linux R5U4 Basic Installation -bash-3.2$ ./setup.sh Please select your locale from the list.           1. Chinese-Simplified           2. Chinese-Traditional           3. Deutsch          *4. English-US           5. English-UK           6. Español           7. Français           8. Italiano           9. Japanese          10. Korean          11. Nederlands          12. Português-Brazil Choice? Throughout the install, when entering a text value, you can press Enter to accept the default that appears between square brackets ([]). When selecting from a list, you can select the choice followed by an asterisk by pressing Enter. Select installation type from the list.         *1. Install new server          2. Update a server Choice? Content Server Installation Directory Please enter the full pathname to the installation directory. Content Server Core Folder [/oracle/ucm/server]:/opt/oracle/ucm/server Create Directory         *1. yes          2. no Choice? Java virtual machine         *1. Sun Java 1.5.0_11 JDK          2. Specify a custom Java virtual machine Choice? Installing with Java version 1.5.0_11. Enter the location of the native file repository. This directory contains the native files checked in by contributors. Content Server Native Vault Folder [/opt/oracle/ucm/server/vault/]: Create Directory         *1. yes          2. no Choice? Enter the location of the web-viewable file repository. This directory contains files that can be accessed through the web server. Content Server Weblayout Folder [/opt/oracle/ucm/server/weblayout/]: Create Directory         *1. yes          2. no Choice? This server can be configured to manage its own authentication or to allow another master to act as an authentication proxy. Configure this server as a master or proxied server.         *1. Configure as a master server.          2. Configure as server proxied by a local master server. Choice? During installation, an admin server can be installed and configured to manage this server. If there is already an admin server on this system, you can have the installer configure it to administrate this server instead. Select admin server configuration.         *1. Install an admin server to manage this server.          2. Configure an existing admin server to manage this server.          3. Don't configure an admin server. Choice? Enter the location of an executable to start your web browser. This browser will be used to display the online help. Web Browser Path [/usr/bin/firefox]: Content Server System locale           1. Chinese-Simplified           2. Chinese-Traditional           3. Deutsch          *4. English-US           5. English-UK           6. Español           7. Français           8. Italiano           9. Japanese          10. Korean          11. Nederlands          12. Português-Brazil Choice? Please select the region for your timezone from the list.         *1. Use the timezone setting for your operating system          2. Pacific          3. America          4. Atlantic          5. Europe          6. Africa          7. Asia          8. Indian          9. Australia Choice? Please enter the port number that will be used to connect to the Content Server. This port must be otherwise unused. Content Server Port [4444]: Please enter the port number that will be used to connect to the Admin Server. This port must be otherwise unused. Admin Server Port [4440]: Enter a security filter for the server port. Hosts which are allowed to communicate directly with the server port may access any resources managed by the server. Insure that hosts which need access are included in the filter. See the installation guide for more details. Incoming connection address filter [127.0.0.1]:*.*.*.* *** Content Server URL Prefix The URL prefix specified here is used when generating HTML pages that refer to the contents of the weblayout directory within the installation. This prefix must be mapped in the web server Additional Document Directories section of the Content Management administration menu to the physical location of the weblayout directory. For example, "/idc/" would be used in your installation to refer to the URL http://ucm.company.com/idc which would be mapped in the web server to the physical location /oracle/ucm/server/weblayout. Web Server Relative Root [/idc/]: Enter the name of the local mail server. The server will contact this system to deliver email. Company Mail Server [mail]: Enter the e-mail address for the system administrator. Administrator E-Mail Address [sysadmin@mail]: *** Web Server Address Many generated HTML pages refer to the web server you are using. The address specified here will be used when generating those pages. The address should include the host and domain name in most cases. If your webserver is running on a port other than 80, append a colon and the port number. Examples: www.company.com, ucm.company.com:90 Web Server HTTP Address [yekki]:yekki.cn.oracle.com:7777 Enter the name for this instance. This name should be unique across your entire enterprise. It may not contain characters other than letters, numbers, and underscores. Server Instance Name [idc]: Enter a short label for this instance. This label is used on web pages to identify this instance. It should be less than 12 characters long. Server Instance Label [idc]: Enter a long description for this instance. Server Description [Content Server idc]: Web Server         *1. Apache          2. Sun ONE          3. Configure manually Choice? Please select a database from the list below to use with the Content Server. Content Server Database         *1. Oracle          2. Microsoft SQL Server 2005          3. Microsoft SQL Server 2000          4. Sybase          5. DB2          6. Custom JDBC settings          7. Skip database configuration Choice? Manually configure JDBC settings for this database          1. yes         *2. no Choice? Oracle Server Hostname [localhost]: Oracle Listener Port Number [1521]: *** Database User ID The user name is used to log into the database used by the content server. Oracle User [user]:YEKKI_OCSERVER *** Database Password The password is used to log into the database used by the content server. Oracle Password []:oracle Oracle Instance Name [ORACLE]:orcl Configure the JVM to find the JDBC driver in a specific jar file          1. yes         *2. no Choice? The installer can attempt to create the database tables or you can manually create them. If you choose to manually create the tables, you should create them now. Attempt to create database tables          1. yes         *2. no Choice? Select components to install.          1. ContentFolios: Collect related items in folios          2. Folders_g: Organize content into hierarchical folders          3. LinkManager8: Hypertext link management support          4. OracleTextSearch: External Oracle 11g database as search indexer support          5. ThreadedDiscussions: Threaded discussion management Enter numbers separated by commas to toggle, 0 to unselect all, F to finish: 1,2,3,4,5         *1. ContentFolios: Collect related items in folios         *2. Folders_g: Organize content into hierarchical folders         *3. LinkManager8: Hypertext link management support         *4. OracleTextSearch: External Oracle 11g database as search indexer support         *5. ThreadedDiscussions: Threaded discussion management Enter numbers separated by commas to toggle, 0 to unselect all, F to finish: F Checking configuration. . . Configuration OK. Review install settings. . . Content Server Core Folder: /opt/oracle/ucm/server Java virtual machine: Sun Java 1.5.0_11 JDK Content Server Native Vault Folder: /opt/oracle/ucm/server/vault/ Content Server Weblayout Folder: /opt/oracle/ucm/server/weblayout/ Proxy authentication through another server: no Install admin server: yes Web Browser Path: /usr/bin/firefox Content Server System locale: English-US Content Server Port: 4444 Admin Server Port: 4440 Incoming connection address filter: *.*.*.* Web Server Relative Root: /idc/ Company Mail Server: mail Administrator E-Mail Address: sysadmin@mail Web Server HTTP Address: yekki.cn.oracle.com:7777 Server Instance Name: idc Server Instance Label: idc Server Description: Content Server idc Web Server: Apache Content Server Database: Oracle Manually configure JDBC settings for this database: false Oracle Server Hostname: localhost Oracle Listener Port Number: 1521 Oracle User: YEKKI_OCSERVER Oracle Password: 6GP1gBgzSyKa4JW10U8UqqPznr/lzkNn/Ojf6M8GJ8I= Oracle Instance Name: orcl Configure the JVM to find the JDBC driver in a specific jar file: false Attempt to create database tables: no Components: ContentFolios,Folders_g,LinkManager8,OracleTextSearch,ThreadedDiscussions Proceed with install         *1. Proceed          2. Change configuration          3. Recheck the configuration          4. Abort installation Choice? Finished install type Install with warnings at 4/2/10 12:32 AM. Run Scripts -bash-3.2$ ./wc_contentserverconfig.sh /opt/oracle/ucm/server /mnt/hgfs/SOFTWARE/ofm_ucm_generic_10.1.3.5.1_disk1_1of1/ContentServer/webcenter-conf Installing '/mnt/hgfs/SOFTWARE/ofm_ucm_generic_10.1.3.5.1_disk1_1of1/ContentServer/webcenter-conf/CS10gR35UpdateBundle.zip' Service 'DELETE_DOC' Extended Service 'DELETE_BYREV_REVISION' Extended Installing '/mnt/hgfs/SOFTWARE/ofm_ucm_generic_10.1.3.5.1_disk1_1of1/ContentServer/webcenter-conf/ContentAccess/ContentAccess-linux.zip' (internal)      04.02 00:40:38.019      main    updateDocMetaDefinitionV11: adding decimal column Installing '/opt/oracle/ucm/server/custom/CS10gR35UpdateBundle/extras/Folders_g.zip' Installing '/opt/oracle/ucm/server/custom/CS10gR35UpdateBundle/extras/FusionLibraries.zip' Installing '/opt/oracle/ucm/server/custom/CS10gR35UpdateBundle/extras/JpsUserProvider.zip' Installing '/mnt/hgfs/SOFTWARE/ofm_ucm_generic_10.1.3.5.1_disk1_1of1/ContentServer/webcenter-conf/WcConfigure.zip' Apr 2, 2010 12:41:24 AM oracle.security.jps.internal.core.util.JpsConfigUtil getPasswordCredential WARNING: A password credential is expected; instead found . Apr 2, 2010 12:41:24 AM oracle.security.jps.internal.idstore.util.IdentityStoreUtil getUnamePwdFromCredStore WARNING: The credential with map JPS and key ldap.credential does not exist. Apr 2, 2010 12:41:27 AM oracle.security.jps.internal.core.util.JpsConfigUtil getPasswordCredential WARNING: A password credential is expected; instead found . Apr 2, 2010 12:41:27 AM oracle.security.jps.internal.idstore.util.IdentityStoreUtil getUnamePwdFromCredStore WARNING: The credential with map JPS and key ldap.credential does not exist. Apr 2, 2010 12:41:28 AM oracle.security.jps.internal.core.util.JpsConfigUtil getPasswordCredential WARNING: A password credential is expected; instead found . Apr 2, 2010 12:41:28 AM oracle.security.jps.internal.idstore.util.IdentityStoreUtil getUnamePwdFromCredStore WARNING: The credential with map JPS and key ldap.credential does not exist. Restart Content Server to apply updates. Configuring Apache Web Server append the following lines at httpd.conf: include "/opt/oracle/ucm/server/data/users/apache22/apache.conf" Configuring the Identity Store( Optional ) 1.  Stop Oracle Content Server and the Admin Server 2.  Update the Oracle Content Server's JPS configuration file, jps-config.xml: a. add a service instance <serviceInstance provider="idstore.ldap.provider" name="idstore.oid"> <property name="subscriber.name" value="dc=cn,dc=oracle,dc=com"></property> <property name="idstore.type" value="OID"></property> <property name="security.principal.key" value="ldap.credential"></property> <property name="security.principal.alias" value="JPS"></property> <property name="ldap.url" value="ldap://yekki.cn.oracle.com:3060"></property> <extendedProperty> <name>user.search.bases</name> <values> <value>cn=users,dc=cn,dc=oracle,dc=com</value> </values> </extendedProperty> <extendedProperty> <name>group.search.bases</name> <values> <value>cn=groups,dc=cn,dc=oracle,dc=com</value> </values> </extendedProperty> <property name="username.attr" value="uid"></property> <property name="user.login.attr" value="uid"></property> <property name="groupname.attr" value="cn"></property> </serviceInstance> b. Ensure that the <jpsContext> entry in the jps-config.xml file refers to the new serviceInstance, that is, idstore.oid and not idstore.ldap: <jpsContext name="default"> <serviceInstanceRef ref="idstore.oid"/> 3. Run the new script to setup the credentials for idstore.oid in the credential store: cd CONTENT_SERVER_HOME/custom/FusionLibraries/tools -bash-3.2$ ./run_credtool.sh Buildfile: ./../tools/credtool.xml     [input] skipping input as property action has already been set.     [input] Alias: [JPS]     [input] Key: [ldap.credential]     [input] User Name: cn=orcladmin     [input] Password: welcome1     [input] JPS Config: [/opt/oracle/ucm/server/custom/FusionLibraries/tools/../../../config/jps-config.xml] manage-creds:      [echo] @@@ Help: run 'ant manage-creds' command to see the detailed usage      [java] Using default context in /opt/oracle/ucm/server/custom/FusionLibraries/tools/../../../config/jps-config.xml file for credential store.      [java] Credential store location : /opt/oracle/ucm/server/config      [java] Credential with map JPS key ldap.credential stored successfully!      [java]      [java]      [java]     Credential for map JPS and key ldap.credential is:      [java]             PasswordCredential name : cn=orcladmin      [java]             PasswordCredential password : welcome1 BUILD SUCCESSFUL Total time: 1 minute 27 seconds Testing 1. acces http://yekki.cn.oracle.com:7777/idc 2. login in with OID user, for example: orcladmin/welcome1 3. make sure your JpsUserProvider status is "good"

    Read the article

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • Scrum in 5 Minutes

    - by Stephen.Walther
    The goal of this blog entry is to explain the basic concepts of Scrum in less than five minutes. You learn how Scrum can help a team of developers to successfully complete a complex software project. Product Backlog and the Product Owner Imagine that you are part of a team which needs to create a new website – for example, an e-commerce website. You have an overwhelming amount of work to do. You need to build (or possibly buy) a shopping cart, install an SSL certificate, create a product catalog, create a Facebook page, and at least a hundred other things that you have not thought of yet. According to Scrum, the first thing you should do is create a list. Place the highest priority items at the top of the list and the lower priority items lower in the list. For example, creating the shopping cart and buying the domain name might be high priority items and creating a Facebook page might be a lower priority item. In Scrum, this list is called the Product Backlog. How do you prioritize the items in the Product Backlog? Different stakeholders in the project might have different priorities. Gary, your division VP, thinks that it is crucial that the e-commerce site has a mobile app. Sally, your direct manager, thinks taking advantage of new HTML5 features is much more important. Multiple people are pulling you in different directions. According to Scrum, it is important that you always designate one person, and only one person, as the Product Owner. The Product Owner is the person who decides what items should be added to the Product Backlog and the priority of the items in the Product Backlog. The Product Owner could be the customer who is paying the bills, the project manager who is responsible for delivering the project, or a customer representative. The critical point is that the Product Owner must always be a single person and that single person has absolute authority over the Product Backlog. Sprints and the Sprint Backlog So now the developer team has a prioritized list of items and they can start work. The team starts implementing the first item in the Backlog — the shopping cart — and the team is making good progress. Unfortunately, however, half-way through the work of implementing the shopping cart, the Product Owner changes his mind. The Product Owner decides that it is much more important to create the product catalog before the shopping cart. With some frustration, the team switches their developmental efforts to focus on implementing the product catalog. However, part way through completing this work, once again the Product Owner changes his mind about the highest priority item. Getting work done when priorities are constantly shifting is frustrating for the developer team and it results in lower productivity. At the same time, however, the Product Owner needs to have absolute authority over the priority of the items which need to get done. Scrum solves this conflict with the concept of Sprints. In Scrum, a developer team works in Sprints. At the beginning of a Sprint the developers and the Product Owner agree on the items from the backlog which they will complete during the Sprint. This subset of items from the Product Backlog becomes the Sprint Backlog. During the Sprint, the Product Owner is not allowed to change the items in the Sprint Backlog. In other words, the Product Owner cannot shift priorities on the developer team during the Sprint. Different teams use Sprints of different lengths such as one month Sprints, two-week Sprints, and one week Sprints. For high-stress, time critical projects, teams typically choose shorter sprints such as one week sprints. For more mature projects, longer one month sprints might be more appropriate. A team can pick whatever Sprint length makes sense for them just as long as the team is consistent. You should pick a Sprint length and stick with it. Daily Scrum During a Sprint, the developer team needs to have meetings to coordinate their work on completing the items in the Sprint Backlog. For example, the team needs to discuss who is working on what and whether any blocking issues have been discovered. Developers hate meetings (well, sane developers hate meetings). Meetings take developers away from their work of actually implementing stuff as opposed to talking about implementing stuff. However, a developer team which never has meetings and never coordinates their work also has problems. For example, Fred might get stuck on a programming problem for days and never reach out for help even though Tom (who sits in the cubicle next to him) has already solved the very same problem. Or, both Ted and Fred might have started working on the same item from the Sprint Backlog at the same time. In Scrum, these conflicting needs – limiting meetings but enabling team coordination – are resolved with the idea of the Daily Scrum. The Daily Scrum is a meeting for coordinating the work of the developer team which happens once a day. To keep the meeting short, each developer answers only the following three questions: 1. What have you done since yesterday? 2. What do you plan to do today? 3. Any impediments in your way? During the Daily Scrum, developers are not allowed to talk about issues with their cat, do demos of their latest work, or tell heroic stories of programming problems overcome. The meeting must be kept short — typically about 15 minutes. Issues which come up during the Daily Scrum should be discussed in separate meetings which do not involve the whole developer team. Stories and Tasks Items in the Product or Sprint Backlog – such as building a shopping cart or creating a Facebook page – are often referred to as User Stories or Stories. The Stories are created by the Product Owner and should represent some business need. Unlike the Product Owner, the developer team needs to think about how a Story should be implemented. At the beginning of a Sprint, the developer team takes the Stories from the Sprint Backlog and breaks the stories into tasks. For example, the developer team might take the Create a Shopping Cart story and break it into the following tasks: · Enable users to add and remote items from shopping cart · Persist the shopping cart to database between visits · Redirect user to checkout page when Checkout button is clicked During the Daily Scrum, members of the developer team volunteer to complete the tasks required to implement the next Story in the Sprint Backlog. When a developer talks about what he did yesterday or plans to do tomorrow then the developer should be referring to a task. Stories are owned by the Product Owner and a story is all about business value. In contrast, the tasks are owned by the developer team and a task is all about implementation details. A story might take several days or weeks to complete. A task is something which a developer can complete in less than a day. Some teams get lazy about breaking stories into tasks. Neglecting to break stories into tasks can lead to “Never Ending Stories” If you don’t break a story into tasks, then you can’t know how much of a story has actually been completed because you don’t have a clear idea about the implementation steps required to complete the story. Scrumboard During the Daily Scrum, the developer team uses a Scrumboard to coordinate their work. A Scrumboard contains a list of the stories for the current Sprint, the tasks associated with each Story, and the state of each task. The developer team uses the Scrumboard so everyone on the team can see, at a glance, what everyone is working on. As a developer works on a task, the task moves from state to state and the state of the task is updated on the Scrumboard. Common task states are ToDo, In Progress, and Done. Some teams include additional task states such as Needs Review or Needs Testing. Some teams use a physical Scrumboard. In that case, you use index cards to represent the stories and the tasks and you tack the index cards onto a physical board. Using a physical Scrumboard has several disadvantages. A physical Scrumboard does not work well with a distributed team – for example, it is hard to share the same physical Scrumboard between Boston and Seattle. Also, generating reports from a physical Scrumboard is more difficult than generating reports from an online Scrumboard. Estimating Stories and Tasks Stakeholders in a project, the people investing in a project, need to have an idea of how a project is progressing and when the project will be completed. For example, if you are investing in creating an e-commerce site, you need to know when the site can be launched. It is not enough to just say that “the project will be done when it is done” because the stakeholders almost certainly have a limited budget to devote to the project. The people investing in the project cannot determine the business value of the project unless they can have an estimate of how long it will take to complete the project. Developers hate to give estimates. The reason that developers hate to give estimates is that the estimates are almost always completely made up. For example, you really don’t know how long it takes to build a shopping cart until you finish building a shopping cart, and at that point, the estimate is no longer useful. The problem is that writing code is much more like Finding a Cure for Cancer than Building a Brick Wall. Building a brick wall is very straightforward. After you learn how to add one brick to a wall, you understand everything that is involved in adding a brick to a wall. There is no additional research required and no surprises. If, on the other hand, I assembled a team of scientists and asked them to find a cure for cancer, and estimate exactly how long it will take, they would have no idea. The problem is that there are too many unknowns. I don’t know how to cure cancer, I need to do a lot of research here, so I cannot even begin to estimate how long it will take. So developers hate to provide estimates, but the Product Owner and other product stakeholders, have a legitimate need for estimates. Scrum resolves this conflict by using the idea of Story Points. Different teams use different units to represent Story Points. For example, some teams use shirt sizes such as Small, Medium, Large, and X-Large. Some teams prefer to use Coffee Cup sizes such as Tall, Short, and Grande. Finally, some teams like to use numbers from the Fibonacci series. These alternative units are converted into a Story Point value. Regardless of the type of unit which you use to represent Story Points, the goal is the same. Instead of attempting to estimate a Story in hours (which is doomed to failure), you use a much less fine-grained measure of work. A developer team is much more likely to be able to estimate that a Story is Small or X-Large than the exact number of hours required to complete the story. So you can think of Story Points as a compromise between the needs of the Product Owner and the developer team. When a Sprint starts, the developer team devotes more time to thinking about the Stories in a Sprint and the developer team breaks the Stories into Tasks. In Scrum, you estimate the work required to complete a Story by using Story Points and you estimate the work required to complete a task by using hours. The difference between Stories and Tasks is that you don’t create a task until you are just about ready to start working on a task. A task is something that you should be able to create within a day, so you have a much better chance of providing an accurate estimate of the work required to complete a task than a story. Burndown Charts In Scrum, you use Burndown charts to represent the remaining work on a project. You use Release Burndown charts to represent the overall remaining work for a project and you use Sprint Burndown charts to represent the overall remaining work for a particular Sprint. You create a Release Burndown chart by calculating the remaining number of uncompleted Story Points for the entire Product Backlog every day. The vertical axis represents Story Points and the horizontal axis represents time. A Sprint Burndown chart is similar to a Release Burndown chart, but it focuses on the remaining work for a particular Sprint. There are two different types of Sprint Burndown charts. You can either represent the remaining work in a Sprint with Story Points or with task hours (the following image, taken from Wikipedia, uses hours). When each Product Backlog Story is completed, the Release Burndown chart slopes down. When each Story or task is completed, the Sprint Burndown chart slopes down. Burndown charts typically do not always slope down over time. As new work is added to the Product Backlog, the Release Burndown chart slopes up. If new tasks are discovered during a Sprint, the Sprint Burndown chart will also slope up. The purpose of a Burndown chart is to give you a way to track team progress over time. If, halfway through a Sprint, the Sprint Burndown chart is still climbing a hill then you know that you are in trouble. Team Velocity Stakeholders in a project always want more work done faster. For example, the Product Owner for the e-commerce site wants the website to launch before tomorrow. Developers tend to be overly optimistic. Rarely do developers acknowledge the physical limitations of reality. So Project stakeholders and the developer team often collude to delude themselves about how much work can be done and how quickly. Too many software projects begin in a state of optimism and end in frustration as deadlines zoom by. In Scrum, this problem is overcome by calculating a number called the Team Velocity. The Team Velocity is a measure of the average number of Story Points which a team has completed in previous Sprints. Knowing the Team Velocity is important during the Sprint Planning meeting when the Product Owner and the developer team work together to determine the number of stories which can be completed in the next Sprint. If you know the Team Velocity then you can avoid committing to do more work than the team has been able to accomplish in the past, and your team is much more likely to complete all of the work required for the next Sprint. Scrum Master There are three roles in Scrum: the Product Owner, the developer team, and the Scrum Master. I’v e already discussed the Product Owner. The Product Owner is the one and only person who maintains the Product Backlog and prioritizes the stories. I’ve also described the role of the developer team. The members of the developer team do the work of implementing the stories by breaking the stories into tasks. The final role, which I have not discussed, is the role of the Scrum Master. The Scrum Master is responsible for ensuring that the team is following the Scrum process. For example, the Scrum Master is responsible for making sure that there is a Daily Scrum meeting and that everyone answers the standard three questions. The Scrum Master is also responsible for removing (non-technical) impediments which the team might encounter. For example, if the team cannot start work until everyone installs the latest version of Microsoft Visual Studio then the Scrum Master has the responsibility of working with management to get the latest version of Visual Studio as quickly as possible. The Scrum Master can be a member of the developer team. Furthermore, different people can take on the role of the Scrum Master over time. The Scrum Master, however, cannot be the same person as the Product Owner. Using SonicAgile SonicAgile (SonicAgile.com) is an online tool which you can use to manage your projects using Scrum. You can use the SonicAgile Product Backlog to create a prioritized list of stories. You can estimate the size of the Stories using different Story Point units such as Shirt Sizes and Coffee Cup sizes. You can use SonicAgile during the Sprint Planning meeting to select the Stories that you want to complete during a particular Sprint. You can configure Sprints to be any length of time. SonicAgile calculates Team Velocity automatically and displays a warning when you add too many stories to a Sprint. In other words, it warns you when it thinks you are overcommitting in a Sprint. SonicAgile also includes a Scrumboard which displays the list of Stories selected for a Sprint and the tasks associated with each story. You can drag tasks from one task state to another. Finally, SonicAgile enables you to generate Release Burndown and Sprint Burndown charts. You can use these charts to view the progress of your team. To learn more about SonicAgile, visit SonicAgile.com. Summary In this post, I described many of the basic concepts of Scrum. You learned how a Product Owner uses a Product Backlog to create a prioritized list of tasks. I explained why work is completed in Sprints so the developer team can be more productive. I also explained how a developer team uses the daily scrum to coordinate their work. You learned how the developer team uses a Scrumboard to see, at a glance, who is working on what and the state of each task. I also discussed Burndown charts. You learned how you can use both Release and Sprint Burndown charts to track team progress in completing a project. Finally, I described the crucial role of the Scrum Master – the person who is responsible for ensuring that the rules of Scrum are being followed. My goal was not to describe all of the concepts of Scrum. This post was intended to be an introductory overview. For a comprehensive explanation of Scrum, I recommend reading Ken Schwaber’s book Agile Project Management with Scrum: http://www.amazon.com/Agile-Project-Management-Microsoft-Professional/dp/073561993X/ref=la_B001H6ODMC_1_1?ie=UTF8&qid=1345224000&sr=1-1

    Read the article

  • PTLQueue : a scalable bounded-capacity MPMC queue

    - by Dave
    Title: Fast concurrent MPMC queue -- I've used the following concurrent queue algorithm enough that it warrants a blog entry. I'll sketch out the design of a fast and scalable multiple-producer multiple-consumer (MPSC) concurrent queue called PTLQueue. The queue has bounded capacity and is implemented via a circular array. Bounded capacity can be a useful property if there's a mismatch between producer rates and consumer rates where an unbounded queue might otherwise result in excessive memory consumption by virtue of the container nodes that -- in some queue implementations -- are used to hold values. A bounded-capacity queue can provide flow control between components. Beware, however, that bounded collections can also result in resource deadlock if abused. The put() and take() operators are partial and wait for the collection to become non-full or non-empty, respectively. Put() and take() do not allocate memory, and are not vulnerable to the ABA pathologies. The PTLQueue algorithm can be implemented equally well in C/C++ and Java. Partial operators are often more convenient than total methods. In many use cases if the preconditions aren't met, there's nothing else useful the thread can do, so it may as well wait via a partial method. An exception is in the case of work-stealing queues where a thief might scan a set of queues from which it could potentially steal. Total methods return ASAP with a success-failure indication. (It's tempting to describe a queue or API as blocking or non-blocking instead of partial or total, but non-blocking is already an overloaded concurrency term. Perhaps waiting/non-waiting or patient/impatient might be better terms). It's also trivial to construct partial operators by busy-waiting via total operators, but such constructs may be less efficient than an operator explicitly and intentionally designed to wait. A PTLQueue instance contains an array of slots, where each slot has volatile Turn and MailBox fields. The array has power-of-two length allowing mod/div operations to be replaced by masking. We assume sensible padding and alignment to reduce the impact of false sharing. (On x86 I recommend 128-byte alignment and padding because of the adjacent-sector prefetch facility). Each queue also has PutCursor and TakeCursor cursor variables, each of which should be sequestered as the sole occupant of a cache line or sector. You can opt to use 64-bit integers if concerned about wrap-around aliasing in the cursor variables. Put(null) is considered illegal, but the caller or implementation can easily check for and convert null to a distinguished non-null proxy value if null happens to be a value you'd like to pass. Take() will accordingly convert the proxy value back to null. An advantage of PTLQueue is that you can use atomic fetch-and-increment for the partial methods. We initialize each slot at index I with (Turn=I, MailBox=null). Both cursors are initially 0. All shared variables are considered "volatile" and atomics such as CAS and AtomicFetchAndIncrement are presumed to have bidirectional fence semantics. Finally T is the templated type. I've sketched out a total tryTake() method below that allows the caller to poll the queue. tryPut() has an analogous construction. Zebra stripping : alternating row colors for nice-looking code listings. See also google code "prettify" : https://code.google.com/p/google-code-prettify/ Prettify is a javascript module that yields the HTML/CSS/JS equivalent of pretty-print. -- pre:nth-child(odd) { background-color:#ff0000; } pre:nth-child(even) { background-color:#0000ff; } border-left: 11px solid #ccc; margin: 1.7em 0 1.7em 0.3em; background-color:#BFB; font-size:12px; line-height:65%; " // PTLQueue : Put(v) : // producer : partial method - waits as necessary assert v != null assert Mask = 1 && (Mask & (Mask+1)) == 0 // Document invariants // doorway step // Obtain a sequence number -- ticket // As a practical concern the ticket value is temporally unique // The ticket also identifies and selects a slot auto tkt = AtomicFetchIncrement (&PutCursor, 1) slot * s = &Slots[tkt & Mask] // waiting phase : // wait for slot's generation to match the tkt value assigned to this put() invocation. // The "generation" is implicitly encoded as the upper bits in the cursor // above those used to specify the index : tkt div (Mask+1) // The generation serves as an epoch number to identify a cohort of threads // accessing disjoint slots while s-Turn != tkt : Pause assert s-MailBox == null s-MailBox = v // deposit and pass message Take() : // consumer : partial method - waits as necessary auto tkt = AtomicFetchIncrement (&TakeCursor,1) slot * s = &Slots[tkt & Mask] // 2-stage waiting : // First wait for turn for our generation // Acquire exclusive "take" access to slot's MailBox field // Then wait for the slot to become occupied while s-Turn != tkt : Pause // Concurrency in this section of code is now reduced to just 1 producer thread // vs 1 consumer thread. // For a given queue and slot, there will be most one Take() operation running // in this section. // Consumer waits for producer to arrive and make slot non-empty // Extract message; clear mailbox; advance Turn indicator // We have an obvious happens-before relation : // Put(m) happens-before corresponding Take() that returns that same "m" for T v = s-MailBox if v != null : s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 // unlock slot to admit next producer and consumer return v Pause tryTake() : // total method - returns ASAP with failure indication for auto tkt = TakeCursor slot * s = &Slots[tkt & Mask] if s-Turn != tkt : return null T v = s-MailBox // presumptive return value if v == null : return null // ratify tkt and v values and commit by advancing cursor if CAS (&TakeCursor, tkt, tkt+1) != tkt : continue s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 return v The basic idea derives from the Partitioned Ticket Lock "PTL" (US20120240126-A1) and the MultiLane Concurrent Bag (US8689237). The latter is essentially a circular ring-buffer where the elements themselves are queues or concurrent collections. You can think of the PTLQueue as a partitioned ticket lock "PTL" augmented to pass values from lock to unlock via the slots. Alternatively, you could conceptualize of PTLQueue as a degenerate MultiLane bag where each slot or "lane" consists of a simple single-word MailBox instead of a general queue. Each lane in PTLQueue also has a private Turn field which acts like the Turn (Grant) variables found in PTL. Turn enforces strict FIFO ordering and restricts concurrency on the slot mailbox field to at most one simultaneous put() and take() operation. PTL uses a single "ticket" variable and per-slot Turn (grant) fields while MultiLane has distinct PutCursor and TakeCursor cursors and abstract per-slot sub-queues. Both PTL and MultiLane advance their cursor and ticket variables with atomic fetch-and-increment. PTLQueue borrows from both PTL and MultiLane and has distinct put and take cursors and per-slot Turn fields. Instead of a per-slot queues, PTLQueue uses a simple single-word MailBox field. PutCursor and TakeCursor act like a pair of ticket locks, conferring "put" and "take" access to a given slot. PutCursor, for instance, assigns an incoming put() request to a slot and serves as a PTL "Ticket" to acquire "put" permission to that slot's MailBox field. To better explain the operation of PTLQueue we deconstruct the operation of put() and take() as follows. Put() first increments PutCursor obtaining a new unique ticket. That ticket value also identifies a slot. Put() next waits for that slot's Turn field to match that ticket value. This is tantamount to using a PTL to acquire "put" permission on the slot's MailBox field. Finally, having obtained exclusive "put" permission on the slot, put() stores the message value into the slot's MailBox. Take() similarly advances TakeCursor, identifying a slot, and then acquires and secures "take" permission on a slot by waiting for Turn. Take() then waits for the slot's MailBox to become non-empty, extracts the message, and clears MailBox. Finally, take() advances the slot's Turn field, which releases both "put" and "take" access to the slot's MailBox. Note the asymmetry : put() acquires "put" access to the slot, but take() releases that lock. At any given time, for a given slot in a PTLQueue, at most one thread has "put" access and at most one thread has "take" access. This restricts concurrency from general MPMC to 1-vs-1. We have 2 ticket locks -- one for put() and one for take() -- each with its own "ticket" variable in the form of the corresponding cursor, but they share a single "Grant" egress variable in the form of the slot's Turn variable. Advancing the PutCursor, for instance, serves two purposes. First, we obtain a unique ticket which identifies a slot. Second, incrementing the cursor is the doorway protocol step to acquire the per-slot mutual exclusion "put" lock. The cursors and operations to increment those cursors serve double-duty : slot-selection and ticket assignment for locking the slot's MailBox field. At any given time a slot MailBox field can be in one of the following states: empty with no pending operations -- neutral state; empty with one or more waiting take() operations pending -- deficit; occupied with no pending operations; occupied with one or more waiting put() operations -- surplus; empty with a pending put() or pending put() and take() operations -- transitional; or occupied with a pending take() or pending put() and take() operations -- transitional. The partial put() and take() operators can be implemented with an atomic fetch-and-increment operation, which may confer a performance advantage over a CAS-based loop. In addition we have independent PutCursor and TakeCursor cursors. Critically, a put() operation modifies PutCursor but does not access the TakeCursor and a take() operation modifies the TakeCursor cursor but does not access the PutCursor. This acts to reduce coherence traffic relative to some other queue designs. It's worth noting that slow threads or obstruction in one slot (or "lane") does not impede or obstruct operations in other slots -- this gives us some degree of obstruction isolation. PTLQueue is not lock-free, however. The implementation above is expressed with polite busy-waiting (Pause) but it's trivial to implement per-slot parking and unparking to deschedule waiting threads. It's also easy to convert the queue to a more general deque by replacing the PutCursor and TakeCursor cursors with Left/Front and Right/Back cursors that can move either direction. Specifically, to push and pop from the "left" side of the deque we would decrement and increment the Left cursor, respectively, and to push and pop from the "right" side of the deque we would increment and decrement the Right cursor, respectively. We used a variation of PTLQueue for message passing in our recent OPODIS 2013 paper. ul { list-style:none; padding-left:0; padding:0; margin:0; margin-left:0; } ul#myTagID { padding: 0px; margin: 0px; list-style:none; margin-left:0;} -- -- There's quite a bit of related literature in this area. I'll call out a few relevant references: Wilson's NYU Courant Institute UltraComputer dissertation from 1988 is classic and the canonical starting point : Operating System Data Structures for Shared-Memory MIMD Machines with Fetch-and-Add. Regarding provenance and priority, I think PTLQueue or queues effectively equivalent to PTLQueue have been independently rediscovered a number of times. See CB-Queue and BNPBV, below, for instance. But Wilson's dissertation anticipates the basic idea and seems to predate all the others. Gottlieb et al : Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors Orozco et al : CB-Queue in Toward high-throughput algorithms on many-core architectures which appeared in TACO 2012. Meneghin et al : BNPVB family in Performance evaluation of inter-thread communication mechanisms on multicore/multithreaded architecture Dmitry Vyukov : bounded MPMC queue (highly recommended) Alex Otenko : US8607249 (highly related). John Mellor-Crummey : Concurrent queues: Practical fetch-and-phi algorithms. Technical Report 229, Department of Computer Science, University of Rochester Thomasson : FIFO Distributed Bakery Algorithm (very similar to PTLQueue). Scott and Scherer : Dual Data Structures I'll propose an optimization left as an exercise for the reader. Say we wanted to reduce memory usage by eliminating inter-slot padding. Such padding is usually "dark" memory and otherwise unused and wasted. But eliminating the padding leaves us at risk of increased false sharing. Furthermore lets say it was usually the case that the PutCursor and TakeCursor were numerically close to each other. (That's true in some use cases). We might still reduce false sharing by incrementing the cursors by some value other than 1 that is not trivially small and is coprime with the number of slots. Alternatively, we might increment the cursor by one and mask as usual, resulting in a logical index. We then use that logical index value to index into a permutation table, yielding an effective index for use in the slot array. The permutation table would be constructed so that nearby logical indices would map to more distant effective indices. (Open question: what should that permutation look like? Possibly some perversion of a Gray code or De Bruijn sequence might be suitable). As an aside, say we need to busy-wait for some condition as follows : "while C == 0 : Pause". Lets say that C is usually non-zero, so we typically don't wait. But when C happens to be 0 we'll have to spin for some period, possibly brief. We can arrange for the code to be more machine-friendly with respect to the branch predictors by transforming the loop into : "if C == 0 : for { Pause; if C != 0 : break; }". Critically, we want to restructure the loop so there's one branch that controls entry and another that controls loop exit. A concern is that your compiler or JIT might be clever enough to transform this back to "while C == 0 : Pause". You can sometimes avoid this by inserting a call to a some type of very cheap "opaque" method that the compiler can't elide or reorder. On Solaris, for instance, you could use :"if C == 0 : { gethrtime(); for { Pause; if C != 0 : break; }}". It's worth noting the obvious duality between locks and queues. If you have strict FIFO lock implementation with local spinning and succession by direct handoff such as MCS or CLH,then you can usually transform that lock into a queue. Hidden commentary and annotations - invisible : * And of course there's a well-known duality between queues and locks, but I'll leave that topic for another blog post. * Compare and contrast : PTLQ vs PTL and MultiLane * Equivalent : Turn; seq; sequence; pos; position; ticket * Put = Lock; Deposit Take = identify and reserve slot; wait; extract & clear; unlock * conceptualize : Distinct PutLock and TakeLock implemented as ticket lock or PTL Distinct arrival cursors but share per-slot "Turn" variable provides exclusive role-based access to slot's mailbox field put() acquires exclusive access to a slot for purposes of "deposit" assigns slot round-robin and then acquires deposit access rights/perms to that slot take() acquires exclusive access to slot for purposes of "withdrawal" assigns slot round-robin and then acquires withdrawal access rights/perms to that slot At any given time, only one thread can have withdrawal access to a slot at any given time, only one thread can have deposit access to a slot Permissible for T1 to have deposit access and T2 to simultaneously have withdrawal access * round-robin for the purposes of; role-based; access mode; access role mailslot; mailbox; allocate/assign/identify slot rights; permission; license; access permission; * PTL/Ticket hybrid Asymmetric usage ; owner oblivious lock-unlock pairing K-exclusion add Grant cursor pass message m from lock to unlock via Slots[] array Cursor performs 2 functions : + PTL ticket + Assigns request to slot in round-robin fashion Deconstruct protocol : explication put() : allocate slot in round-robin fashion acquire PTL for "put" access store message into slot associated with PTL index take() : Acquire PTL for "take" access // doorway step seq = fetchAdd (&Grant, 1) s = &Slots[seq & Mask] // waiting phase while s-Turn != seq : pause Extract : wait for s-mailbox to be full v = s-mailbox s-mailbox = null Release PTL for both "put" and "take" access s-Turn = seq + Mask + 1 * Slot round-robin assignment and lock "doorway" protocol leverage the same cursor and FetchAdd operation on that cursor FetchAdd (&Cursor,1) + round-robin slot assignment and dispersal + PTL/ticket lock "doorway" step waiting phase is via "Turn" field in slot * PTLQueue uses 2 cursors -- put and take. Acquire "put" access to slot via PTL-like lock Acquire "take" access to slot via PTL-like lock 2 locks : put and take -- at most one thread can access slot's mailbox Both locks use same "turn" field Like multilane : 2 cursors : put and take slot is simple 1-capacity mailbox instead of queue Borrow per-slot turn/grant from PTL Provides strict FIFO Lock slot : put-vs-put take-vs-take at most one put accesses slot at any one time at most one put accesses take at any one time reduction to 1-vs-1 instead of N-vs-M concurrency Per slot locks for put/take Release put/take by advancing turn * is instrumental in ... * P-V Semaphore vs lock vs K-exclusion * See also : FastQueues-excerpt.java dice-etc/queue-mpmc-bounded-blocking-circular-xadd/ * PTLQueue is the same as PTLQB - identical * Expedient return; ASAP; prompt; immediately * Lamport's Bakery algorithm : doorway step then waiting phase Threads arriving at doorway obtain a unique ticket number Threads enter in ticket order * In the terminology of Reed and Kanodia a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer It can also be thought of as an optimization of Lamport's bakery lock was designed for fault-tolerance rather than performance Instead of spinning on the release counter, processors using a bakery lock repeatedly examine the tickets of their peers --

    Read the article

  • C++ HW - defining classes - objects that have objects of other class problem in header file (out of

    - by kitfuntastik
    This is my first time with much of this code. With this instancepool.h file below I get errors saying I can't use vector (line 14) or have instance& as a return type (line 20). It seems it can't use the instance objects despite the fact that I have included them. #ifndef _INSTANCEPOOL_H #define _INSTANCEPOOL_H #include "instance.h" #include <iostream> #include <string> #include <vector> #include <stdlib.h> using namespace std; class InstancePool { private: unsigned instances;//total number of instance objects vector<instance> ipp;//the collection of instance objects, held in a vector public: InstancePool();//Default constructor. Creates an InstancePool object that contains no Instance objects InstancePool(const InstancePool& original);//Copy constructor. After copying, changes to original should not affect the copy that was created. ~InstancePool();//Destructor unsigned getNumberOfInstances() const;//Returns the number of Instance objects the the InstancePool contains. const instance& operator[](unsigned index) const; InstancePool& operator=(const InstancePool& right);//Overloading the assignment operator for InstancePool. friend istream& operator>>(istream& in, InstancePool& ip);//Overloading of the >> operator. friend ostream& operator<<(ostream& out, const InstancePool& ip);//Overloading of the << operator. }; #endif Here is the instance.h : #ifndef _INSTANCE_H #define _INSTANCE_H ///////////////////////////////#include "instancepool.h" #include <iostream> #include <string> #include <stdlib.h> using namespace std; class Instance { private: string filenamee; bool categoryy; unsigned featuress; unsigned* featureIDD; unsigned* frequencyy; string* featuree; public: Instance (unsigned features = 0);//default constructor unsigned getNumberOfFeatures() const; //Returns the number of the keywords that the calling Instance object can store. Instance(const Instance& original);//Copy constructor. After copying, changes to the original should not affect the copy that was created. ~Instance() { delete []featureIDD; delete []frequencyy; delete []featuree;}//Destructor. void setCategory(bool category){categoryy = category;}//Sets the category of the message. Spam messages are represented with true and and legit messages with false.//easy bool getCategory() const;//Returns the category of the message. void setFileName(const string& filename){filenamee = filename;}//Stores the name of the file (i.e. “spam/spamsga1.txt”, like in 1st assignment) in which the message was initially stored.//const string& trick? string getFileName() const;//Returns the name of the file in which the message was initially stored. void setFeature(unsigned i, const string& feature, unsigned featureID,unsigned frequency) {//i for array positions featuree[i] = feature; featureIDD[i] = featureID; frequencyy[i] = frequency; } string getFeature(unsigned i) const;//Returns the keyword which is located in the ith position.//const string unsigned getFeatureID(unsigned i) const;//Returns the code of the keyword which is located in the ith position. unsigned getFrequency(unsigned i) const;//Returns the frequency Instance& operator=(const Instance& right);//Overloading of the assignment operator for Instance. friend ostream& operator<<(ostream& out, const Instance& inst);//Overloading of the << operator for Instance. friend istream& operator>>(istream& in, Instance& inst);//Overloading of the >> operator for Instance. }; #endif Also, if it is helpful here is instance.cpp: // Here we implement the functions of the class apart from the inline ones #include "instance.h" #include <iostream> #include <string> #include <stdlib.h> using namespace std; Instance::Instance(unsigned features) { //Constructor that can be used as the default constructor. featuress = features; if (features == 0) return; featuree = new string[featuress]; // Dynamic memory allocation. featureIDD = new unsigned[featuress]; frequencyy = new unsigned[featuress]; return; } unsigned Instance::getNumberOfFeatures() const {//Returns the number of the keywords that the calling Instance object can store. return featuress;} Instance::Instance(const Instance& original) {//Copy constructor. filenamee = original.filenamee; categoryy = original.categoryy; featuress = original.featuress; featuree = new string[featuress]; for(unsigned i = 0; i < featuress; i++) { featuree[i] = original.featuree[i]; } featureIDD = new unsigned[featuress]; for(unsigned i = 0; i < featuress; i++) { featureIDD[i] = original.featureIDD[i]; } frequencyy = new unsigned[featuress]; for(unsigned i = 0; i < featuress; i++) { frequencyy[i] = original.frequencyy[i];} } bool Instance::getCategory() const { //Returns the category of the message. return categoryy;} string Instance::getFileName() const { //Returns the name of the file in which the message was initially stored. return filenamee;} string Instance::getFeature(unsigned i) const { //Returns the keyword which is located in the ith position.//const string return featuree[i];} unsigned Instance::getFeatureID(unsigned i) const { //Returns the code of the keyword which is located in the ith position. return featureIDD[i];} unsigned Instance::getFrequency(unsigned i) const { //Returns the frequency return frequencyy[i];} Instance& Instance::operator=(const Instance& right) { //Overloading of the assignment operator for Instance. if(this == &right) return *this; delete []featureIDD; delete []frequencyy; delete []featuree; filenamee = right.filenamee; categoryy = right.categoryy; featuress = right.featuress; featureIDD = new unsigned[featuress]; frequencyy = new unsigned[featuress]; featuree = new string[featuress]; for(unsigned i = 0; i < featuress; i++) { featureIDD[i] = right.featureIDD[i]; } for(unsigned i = 0; i < featuress; i++) { frequencyy[i] = right.frequencyy[i]; } for(unsigned i = 0; i < featuress; i++) { featuree[i] = right.featuree[i]; } return *this; } ostream& operator<<(ostream& out, const Instance& inst) {//Overloading of the << operator for Instance. out << endl << "<message file=" << '"' << inst.filenamee << '"' << " category="; if (inst.categoryy == 0) out << '"' << "legit" << '"'; else out << '"' << "spam" << '"'; out << " features=" << '"' << inst.featuress << '"' << ">" <<endl; for (int i = 0; i < inst.featuress; i++) { out << "<feature id=" << '"' << inst.featureIDD[i] << '"' << " freq=" << '"' << inst.frequencyy[i] << '"' << "> " << inst.featuree[i] << " </feature>"<< endl; } out << "</message>" << endl; return out; } istream& operator>>(istream& in, Instance& inst) { //Overloading of the >> operator for Instance. string word; string numbers = ""; string filenamee2 = ""; bool categoryy2 = 0; unsigned featuress2; string featuree2; unsigned featureIDD2; unsigned frequencyy2; unsigned i; unsigned y; while(in >> word) { if (word == "<message") {//if at beginning of message in >> word;//grab filename word for (y=6; word[y]!='"'; y++) {//pull out filename from between quotes filenamee2 += word[y];} in >> word;//grab category word if (word[10] == 's') categoryy2 = 1; in >> word;//grab features word for (y=10; word[y]!='"'; y++) { numbers += word[y];} featuress2 = atoi(numbers.c_str());//convert string of numbers to integer Instance tempp2(featuress2);//make a temporary Instance object to hold values read in tempp2.setFileName(filenamee2);//set temp object to filename read in tempp2.setCategory(categoryy2); for (i=0; i<featuress2; i++) {//loop reading in feature reports for message in >> word >> word >> word;//skip two words numbers = "";//reset numbers string for (int y=4; word[y]!='"'; y++) {//grab feature ID numbers += word[y];} featureIDD2 = atoi(numbers.c_str()); in >> word;// numbers = ""; for (int y=6; word[y]!='"'; y++) {//grab frequency numbers += word[y];} frequencyy2 = atoi(numbers.c_str()); in >> word;//grab actual feature string featuree2 = word; tempp2.setFeature(i, featuree2, featureIDD2, frequencyy2); }//all done reading in and setting features in >> word;//read in last part of message : </message> inst = tempp2;//set inst (reference) to tempp2 (tempp2 will be destroyed at end of function call) return in; } } } and instancepool.cpp: // Here we implement the functions of the class apart from the inline ones #include "instancepool.h" #include "instance.h" #include <iostream> #include <string> #include <vector> #include <stdlib.h> using namespace std; InstancePool::InstancePool()//Default constructor. Creates an InstancePool object that contains no Instance objects { instances = 0; ipp.clear(); } InstancePool::~InstancePool() { ipp.clear();} InstancePool::InstancePool(const InstancePool& original) {//Copy constructor. instances = original.instances; for (int i = 0; i<instances; i++) { ipp.push_back(original.ipp[i]); } } unsigned InstancePool::getNumberOfInstances() const {//Returns the number of Instance objects the the InstancePool contains. return instances;} const Instance& InstancePool::operator[](unsigned index) const {//Overloading of the [] operator for InstancePool. return ipp[index];} InstancePool& InstancePool::operator=(const InstancePool& right) {//Overloading the assignment operator for InstancePool. if(this == &right) return *this; ipp.clear(); instances = right.instances; for(unsigned i = 0; i < instances; i++) { ipp.push_back(right.ipp[i]); } return *this; } istream& operator>>(istream& in, InstancePool& ip) {//Overloading of the >> operator. ip.ipp.clear(); string word; string numbers; int total;//int to hold total number of messages in collection while(in >> word) { if (word == "<messagecollection"){ in >> word;//reads in total number of all messages for (int y=10; word[y]!='"'; y++){ numbers = ""; numbers += word[y]; } total = atoi(numbers.c_str()); for (int x = 0; x<total; x++) {//do loop for each message in collection in >> ip.ipp[x];//use instance friend function and [] operator to fill in values and create Instance objects and read them intot he vector } } } } ostream& operator<<(ostream& out, const InstancePool& ip) {//Overloading of the << operator. out << "<messagecollection messages=" << '"' << '>' << ip.instances << '"'<< endl << endl; for (int z=0; z<ip.instances; z++) { out << ip[z];} out << endl<<"</messagecollection>\n"; } This code is currently not writing to files correctly either at least, I'm sure it has many problems. I hope my posting of so much is not too much, and any help would be very much appreciated. Thanks!

    Read the article

  • Upgrading Team Foundation Server 2008 to 2010

    - by Martin Hinshelwood
    I am sure you will have seen my posts on upgrading our internal Team Foundation Server from TFS2008 to TFS2010 Beta 2, RC and RTM, but what about a fresh upgrade of TFS2008 to TFS2010 using the RTM version of TFS. One of our clients is taking the plunge with TFS2010, so I have the job of doing the upgrade. It is sometimes very useful to have a team member that starts work when most of the Sydney workers are heading home as I can do the upgrade without impacting them. The down side is that if you have any blockers then you can be pretty sure that everyone that can deal with your problem is asleep I am starting with an existing blank installation of TFS 2010, but Adam Cogan let slip that he was the one that did the install so I thought it prudent to make sure that it was OK. Verifying Team Foundation Server 2010 We need to check that TFS 2010 has been installed correctly. First, check the Admin console and have a root about for any errors. Figure: Even the SQL Setup looks good. I don’t know how Adam did it! Backing up the Team Foundation Server 2008 Databases As we are moving from one server to another (recommended method) we will be taking a backup of our TFS2008 databases and resorting them to the SQL Server for the new TFS2010 Server. Do not just detach and reattach. This will cause problems with the version of the database. If you are running a test migration you just need to create a backup of the TFS 2008 databases, but if you are doing the live migration then you should stop IIS on the TFS 2008 server before you backup the databases. This will stop any inadvertent check-ins or changes to TFS 2008. Figure: Stop IIS before you take a backup to prevent any TFS 2008 changes being written to the database. It is good to leave a little time between taking the TFS 2008 server offline and commencing the upgrade as there is always one developer who has not finished and starts screaming. This time it was John Liu that needed 10 more minutes to make his changes and check-in, so I always give it 30 minutes and see if anyone screams. John Liu [SSW] said:   are you doing something to TFS :-O MrHinsh [SSW UK][VS ALM MVP] said:   I have stopped TFS 2008 as per my emails John Liu [SSW] said:   haven't finish check in @_@   can we have it for 10mins? :) MrHinsh [SSW UK][VS ALM MVP] said:   TFS 2008 has been started John Liu [SSW] said:   I love you! -IM conversation at TFS Upgrade +25 minutes After John confirmed that he had everything done I turned IIS off again and made a cup of tea. There were no more screams so the upgrade can continue. Figure: Backup all of the databases for TFS and include the Reporting Services, just in case.   Figure: Check that all the backups have been taken Once you have your backups, you need to copy them to your new TFS2010 server and restore them. This is a good way to proceed as if we have any problems, or just plain run out of time, then you just turn the TFS 2008 server back on and all you have lost is one upgrade day, and not 10 developer days. As per the rules, you should record the number of files and the total number of areas and iterations before the upgrade so you have something to compare to: TFS2008 File count: Type Count 1 1845 2 15770 Areas & Iterations: 139 You can use this to verify that the upgrade was successful. it should however be noted that the numbers in TFS 2010 will be bigger. This is due to some of the sorting out that TFS does during the upgrade process. Restore Team Foundation Server 2008 Databases Restoring the databases is much more time consuming than just attaching them as you need to do them one at a time. But you may be taking a backup of an operational database and need to restore all your databases to a particular point in time instead of to the latest. I am doing latest unless I encounter any problems. Figure: Restore each of the databases to either a latest or specific point in time.     Figure: Restore all of the required databases Now that all of your databases are restored you now need to upgrade them to Team Foundation Server 2010. Upgrade Team Foundation Server 2008 Databases This is probably the easiest part of the process. You need to call a fire and forget command that will go off to the database specified, find the TFS 2008 databases and upgrade them to 2010. During this process all of the 6 main TFS 2008 databases are merged into the TfsVersionControl database, upgraded and then the database is renamed to TFS_[CollectionName]. The rename is only the database and not the physical files, so it is worth going back and renaming the physical file as well. This keeps everything neat and tidy. If you plan to keep the old TFS 2008 server around, for example if you are doing a test migration first, then you will need to change the TFS GUID. This GUID is unique to each TFS instance and is preserved when you upgrade. This GUID is used by the clients and they can get a little confused if there are two servers with the same one. To kick of the upgrade you need to open a command prompt and change the path to “C:\Program Files\Microsoft Team Foundation Server 2010\Tools” and run the “import” command in  “tfsconfig”. TfsConfig import /sqlinstance:<Previous TFS Data Tier>                  /collectionName:<Collection Name>                  /confirmed Imports a TFS 2005 or 2008 data tier as a new project collection. Important: This command should only be executed after adequate backups have been performed. After you import, you will need to configure portal and reporting settings via the administration console. EXAMPLES -------- TfsConfig import /sqlinstance:tfs2008sql /collectionName:imported /confirmed TfsConfig import /sqlinstance:tfs2008sql\Instance /collectionName:imported /confirmed OPTIONS: -------- sqlinstance         The sql instance of the TFS 2005 or 2008 data tier. The TFS databases at that location will be modified directly and will no longer be usable as previous version databases.  Ensure you have back-ups. collectionName      The name of the new Team Project Collection. confirmed           Confirm that you have backed-up databases before importing. This command will automatically look for the TfsIntegration database and verify that all the other required databases exist. In this case it took around 5 minutes to complete the upgrade as the total database size was under 700MB. This was unlike the upgrade of SSW’s production database with over 17GB of data which took a few hours. At the end of the process you should get no errors and no warnings. The Upgrade operation on the ApplicationTier feature has completed. There were 0 errors and 0 warnings. As this is a new server and not a pure upgrade there should not be a problem with the GUID. If you think at any point you will be doing this more than once, for example doing a test migration, or merging many TFS 2008 instances into a single one, then you should go back and rename the physical TfsVersionControl.mdf file to the same as the new collection. This will avoid confusion later down the line. To do this, detach the new collection from the server and rename the physical files. Then reattach and change the physical file locations to match the new name. You can follow http://www.mssqltips.com/tip.asp?tip=1122 for a more detailed explanation of how to do this. Figure: Stop the collection so TFS does not take a wobbly when we detach the database. When you try to start the new collection again you will get a conflict with project names and will require to remove the Test Upgrade collection. This is fine and it just needs detached. Figure: Detaching the test upgrade from the new Team Foundation Server 2010 so we can start the new Collection again. You will now be able to start the new upgraded collection and you are ready for testing. Do you remember the stats we took off the TFS 2008 server? TFS2008 File count: Type Count 1 1845 2 15770 Areas & Iterations: 139 Well, now we need to compare them to the TFS 2010 stats, remembering that there will probably be more files under source control. TFS2010 File count: Type Count 1 19288 Areas & Iterations: 139 Lovely, the number of iterations are the same, and the number of files is bigger. Just what we were looking for. Testing the upgraded Team Foundation Server 2010 Project Collection Can we connect to the new collection and project? Figure: We can connect to the new collection and project.   Figure: make sure you can connect to The upgraded projects and that you can see all of the files. Figure: Team Web Access is there and working. Note that for Team Web Access you now use the same port and URL as for TFS 2010. So in this case as I am running on the local box you need to use http://localhost:8080/tfs which will redirect you to http://localhost:8080/tfs/web for the web access. If you need to connect with a Visual Studio 2008 client you will need to use the full path of the new collection, http://[servername]/tfs/[collectionname] and this will work with all of your collections. With Visual Studio 2005 you will only be able to connect to the Default collection and in both VS2008 and VS2005 you will need to install the forward compatibility updates. Visual Studio Team System 2005 Service Pack 1 Forward Compatibility Update for Team Foundation Server 2010 Visual Studio Team System 2008 Service Pack 1 Forward Compatibility Update for Team Foundation Server 2010 To make sure that you have everything up to date, make sure that you run SSW Diagnostics and get all green ticks. Upgrade Done! At this point you can send out a notice to everyone that the upgrade is complete and and give them the connection details. You need to remember that at this stage we have 2008 project upgraded to run under TFS 2010 but it is still running under that same process template that it was running before. You can only “enable” 2010 features in a process template you can’t upgrade. So what to do? Well, you need to create a new project and migrate things you want to keep across. Souse code is easy, you can move or Branch, but Work Items are more difficult as you can’t move them between projects. This instance is complicated more as the old project uses the Conchango/EMC Scrum for Team System template and I will need to write a script/application to get the work items across with their attachments in tact. That is my next task! Technorati Tags: TFS 2010,TFS 2008,VS ALM

    Read the article

< Previous Page | 214 215 216 217 218 219 220 221 222 223 224 225  | Next Page >