Search Results

Search found 60932 results on 2438 pages for 'data operations'.

Page 424/2438 | < Previous Page | 420 421 422 423 424 425 426 427 428 429 430 431  | Next Page >

  • .NET WebService IPC - Should it be done to minimise some expensive operations?

    - by Kyle
    I'm looking at a few different approaches to a problem: Client requests work, some stuff gets done, and a result (ok/error) is returned. A .NET web service definitely seems like the way to go, my only issue is that the "stuff" will involve building up and tearing down a session for each request. Does abstracting the "stuff" out to an app (which would keep a single session active, and process the request from the web service) seem like the right way to go? (and if so, what communication method) The work time is negligible, my concern is the hammering the transaction servers in question will probably get if I create/drop a session for each job. Is some form of IPC or socket based communication a feasible solution here? Thoughts/comments/experiences much appreciated. Edit: After a bit more research, it seems like hosting a WCF service in a Windows Service is probably a better way to go...

    Read the article

  • Which term to use when referring to functional data structures: persistent or immutable?

    - by Bob
    In the context of functional programming which is the correct term to use: persistent or immutable? When I Google "immutable data structures" I get a Wikipedia link to an article on "Persistent data structure" which even goes on to say: such data structures are effectively immutable Which further confuses things for me. Do functional programs rely on persistent data structures or immutable data structures? Or are they always the same thing?

    Read the article

  • VB.NET Two different approaches to generic cross-threaded operations; which is better?

    - by BASnappl
    VB.NET 2010, .NET 4 Hello, I recently read about using SynchronizationContext objects to control the execution thread for some code. I have been using a generic subroutine to handle (possibly) cross-thread calls for things like updating UI controls that utilizes Invoke. I'm an amateur and have a hard time understanding the pros and cons of any particular approach. I am looking for some insight on which approach might be preferable and why. Update: This question is motivated, in part, by statements such as the following from the MSDN page on Control.InvokeRequired. An even better solution is to use the SynchronizationContext returned by SynchronizationContext rather than a control for cross-thread marshaling. Method 1: Public Sub InvokeControl(Of T As Control)(ByVal Control As T, ByVal Action As Action(Of T)) If Control.InvokeRequired Then Control.Invoke(New Action(Of T, Action(Of T))(AddressOf InvokeControl), New Object() {Control, Action}) Else Action(Control) End If End Sub Method 2: Public Sub UIAction(Of T As Control)(ByVal Control As T, ByVal Action As Action(Of Control)) SyncContext.Send(New Threading.SendOrPostCallback(Sub() Action(Control)), Nothing) End Sub Where SyncContext is a Threading.SynchronizationContext object defined in the constructor of my UI form: Public Sub New() InitializeComponent() SyncContext = WindowsFormsSynchronizationContext.Current End Sub Then, if I wanted to update a control (e.g., Label1) on the UI form, I would do: InvokeControl(Label1, Sub(x) x.Text = "hello") or UIAction(Label1, Sub(x) x.Text = "hello") So, what do y'all think? Is one way preferred or does it depend on the context? If you have the time, verbosity would be appreciated! Thanks in advance, Brian

    Read the article

  • What are these values representative of in C bitwise operations?

    - by ajax81
    Hi All, I'm trying to reverse the order of bits in C (homework question, subject: bitwise operators). I found this solution, but I'm a little confused by the hex values (?) used -- 0x01 and 0x80. unsigned char reverse(unsigned char c) { int shift; unsigned char result = 0; for (shift = 0; shift < CHAR_BITS; shift++) { if (c & (0x01 << shift)) result |= (0x80 >> shift); } return result; } The book I'm working out of hasn't discussed these kinds of values, so I'm not really sure what to make of them. Can somebody shed some light on this solution? Thank you!

    Read the article

  • sqlite3 date operations when joining two tables in a view?

    - by duncan
    In short, how to add minutes to a datetime from an integer located in another table, in one select statement, by joining them? I have a table P(int id, ..., int minutes) and a table S(int id, int p_id, datetime start) I want to generate a view that gives me PS(S.id, P.id, S.start + P.minutes) by joining S.p_id=P.id The problem is, if I was generating the query from the application, I can do stuff like: select datetime('2010-04-21 14:00', '+20 minutes'); 2010-04-21 14:20:00 By creating the string '+20 minutes' in the application and then passing it to sqlite. However I can't find a way to create this string in the select itself: select p.*,datetime(s.start_at, formatstring('+%s minutes', p.minutes)) from p,s where s.p_id=p.id; Because sqlite as far the documentation tells, does not provide any string format function, nor can I see any alternative way of expressing the date modifiers.

    Read the article

  • Is there a design pattern that expresses objects (an their operations) in various states?

    - by darren
    Hi I have a design question about the evolution of an object (and its state) after some sequence of methods complete. I'm having trouble articulating what I mean so I may need to clean up the question based on feedback. Consider an object called Classifier. It has the following methods: void initialise() void populateTrainingSet(TrainingSet t) void pupulateTestingSet(TestingSet t) void train() void test() Result predict(Instance i) My problem is that these methods need to be called in a certain order. Futher, some methods are invalid until a previous method is called, and some methods are invalid after a method has been called. For example, it would be invalid to call predict() before test() was called, and it would be invalid to call train() after test() was called. My approach so far has been to maintain a private enum that represents the current stateof the object: private static enum STATE{ NEW, TRAINED, TESTED, READY}; But this seems a bit cloogy. Is there a design pattern for such a problem type? Maybe something related to the template method.

    Read the article

  • Problem with receiving data form serial port in c#?

    - by moon
    hello i have problem with receiving data from serial port in c# in am inserting a new line operator at the end of data buffer. then i send this data buffer on serial port, after this my c# GUI receiver will take this data via Readline() function but it always give me raw data not the actual one how to resolve this problem.

    Read the article

  • Geometry library for python (or C++) for CAD-like operations?

    - by gct
    I'm trying to put together a simple program that will let me visualize a series of consecutive cuts on a wood panel using a router with a particular cutting head. I'm trying to find a decent geometry library that will give me a shortcut through the CAD-like stuff. Specifically, I'd like to be able to define a rectangular solid (the wood panel) and then define a bit profile shape, and take cuts through the rectangular solid (sometimes on a straight line, sometimes on a circular arc). Does anyone know of anything that will do this?

    Read the article

  • Is there a way of providing a final transform method when chaining operations (like map reduce) in underscore.js?

    - by latentflip
    (Really strugging to title this question, so if anyone has suggestions feel free.) Say I wanted to do an operation like: take an array [1,2,3] multiply each element by 2 (map): [2,4,6] add the elements together (reduce): 12 multiply the result by 10: 120 I can do this pretty cleanly in underscore using chaining, like so: arr = [1,2,3] map = (el) -> 2*el reduce = (s,n) -> s+n out = (r) -> 10*r reduced = _.chain(arr).map(map).reduce(reduce).value() result = out(reduced) However, it would be even nicer if I could chain the 'out' method too, like this: result = _.chain(arr).map(map).reduce(reduce).out(out).value() Now this would be a fairly simple addition to a library like underscore. But my questions are: Does this 'out' method have a name in functional programming? Does this already exist in underscore (tap comes close, but not quite).

    Read the article

  • Indexing with pointer C/C++

    - by Leavenotrace
    Hey I'm trying to write a program to carry out newtons method and find the roots of the equation exp(-x)-(x^2)+3. It works in so far as finding the root, but I also want it to print out the root after each iteration but I can't get it to work, Could anyone point out my mistake I think its something to do with my indexing? Thanks a million :) #include <stdio.h> #include <math.h> #include <malloc.h> //Define Functions: double evalf(double x) { double answer=exp(-x)-(x*x)+3; return(answer); } double evalfprime(double x) { double answer=-exp(-x)-2*x; return(answer); } double *newton(double initialrt,double accuracy,double *data) { double root[102]; data=root; int maxit = 0; root[0] = initialrt; for (int i=1;i<102;i++) { *(data+i)=*(data+i-1)-evalf(*(data+i-1))/evalfprime(*(data+i-1)); if(fabs(*(data+i)-*(data+i-1))<accuracy) { maxit=i; break; } maxit=i; } if((maxit+1==102)&&(fabs(*(data+maxit)-*(data+maxit-1))>accuracy)) { printf("\nMax iteration reached, method terminated"); } else { printf("\nMethod successful"); printf("\nNumber of iterations: %d\nRoot Estimate: %lf\n",maxit+1,*(data+maxit)); } return(data); } int main() { double root,accuracy; double *data=(double*)malloc(sizeof(double)*102); printf("NEWTONS METHOD PROGRAMME:\nEquation: f(x)=exp(-x)-x^2+3=0\nMax No iterations=100\n\nEnter initial root estimate\n>> "); scanf("%lf",&root); _flushall(); printf("\nEnter accuracy required:\n>>"); scanf("%lf",&accuracy); *data= *newton(root,accuracy,data); printf("Iteration Root Error\n "); printf("%d %lf \n", 0,*(data)); for(int i=1;i<102;i++) { printf("%d %5.5lf %5.5lf\n", i,*(data+i),*(data+i)-*(data+i-1)); if(*(data+i*sizeof(double))-*(data+i*sizeof(double)-1)==0) { break; } } getchar(); getchar(); free(data); return(0); }

    Read the article

  • What PHP function(s) can I use to perform operations on non-integer timestamps?

    - by stephenhay
    Disclaimer, I'm not a PHP programmer, so you might find this question trivial. That's why I'm asking you! I've got this kind of timestamp: 2010-05-10T22:00:00 (That's Y-m-d) I would like to subtract, say, 10 days (or months, whatever) from this, and have my result be in the same format, i.e. 2010-04-30T22:00:00. What function(s) do I need to do this in PHP? Note: I'm using this to do a computed field in Drupal. The result will be the date that an e-mail is sent. Bonus question: If 2010-05-10T22:00:00 means "May 10, 2010 at 10pm", is there a timestamp equivalent of "May 10, 2010 (all day)"? Thanks everyone.

    Read the article

  • How to use an excel data-set for a multi-line ggplot in R?

    - by user1299887
    I have a data set in excel that I am trying to create a multiple line plot with on R. The data set contains 7 food groups and the calories consumed daily associated to the groups. As well, there is that set of data over 38 years (from 1970-2008) and I am attempting to use this data set to create a multiple line plot on R. I have tried for hours on end but can not seem to get R to recognize the variables within the data set.

    Read the article

  • Are there any javascript string formatting operations similar to the way %s is used in Python?

    - by Phil
    I've been writing a lot of javascript, and when I want to stick a variable in a string, I've been doing it like so: $("#more_info span#author").html("Created by: <a href='/user/" + author + "'>" + author + "</a>"); I feel like it's pretty ugly and a pain to write over and over. In python the %s operator makes this problem easy. Even in C, I can do sprintf (IIRC). Is there anything like that in javascript? (Lots of google'ing yielded nothing.)

    Read the article

  • Configuring Multiple Instances of MySQL in Solaris 11

    - by rajeshr
    Recently someone asked me for steps to configure multiple instances of MySQL database in an Operating Platform. Coz of my familiarity with Solaris OE, I prepared some notes on configuring multiple instances of MySQL database on Solaris 11. Maybe it's useful for some: If you want to run Solaris Operating System (or any other OS of your choice) as a virtualized instance in desktop, consider using Virtual Box. To download Solaris Operating System, click here. Once you have your Solaris Operating System (Version 11) up and running and have Internet connectivity to gain access to the Image Packaging System (IPS), please follow the steps as mentioned below to install MySQL and configure multiple instances: 1. Install MySQL Database in Solaris 11 $ sudo pkg install mysql-51 2. Verify if the mysql is installed: $ svcs -a | grep mysql Note: Service FMRI will look similar to the one here: svc:/application/database/mysql:version_51 3. Prepare data file system for MySQL Instance 1 zfs create rpool/mysql zfs create rpool/mysql/data zfs set mountpoint=/mysql/data rpool/mysql/data 4. Prepare data file system for MySQL Instance 2 zfs create rpool/mysql/data2 zfs set mountpoint=/mysql/data rpool/mysql/data2 5. Change the mysql/datadir of the MySQL Service (SMF) to point to /mysql/data $ svcprop mysql:version_51 | grep mysql/data $ svccfg -s mysql:version_51 setprop mysql/data=/mysql/data 6. Create a new instance of MySQL 5.1 (a) Copy the manifest of the default instance to temporary directory: $ sudo cp /lib/svc/manifest/application/database/mysql_51.xml /var/tmp/mysql_51_2.xml (b) Make appropriate modifications on the XML file $ sudo vi /var/tmp/mysql_51_2.xml - Change the "instance name" section to a new value "version_51_2" - Change the value of property name "data" to point to the ZFS file system "/mysql/data2" 7. Import the manifest to the SMF repository: $ sudo svccfg import /var/tmp/mysql_51_2.xml 8. Before starting the service, copy the file /etc/mysql/my.cnf to the data directories /mysql/data & /mysql/data2. $ sudo cp /etc/mysql/my.cnf /mysql/data/ $ sudo cp /etc/mysql/my.cnf /mysql/data2/ 9. Make modifications to the my.cnf in each of the data directories as required: $ sudo vi /mysql/data/my.cnf Under the [client] section port=3306 socket=/tmp/mysql.sock ---- ---- Under the [mysqld] section port=3306 socket=/tmp/mysql.sock datadir=/mysql/data ----- ----- server-id=1 $ sudo vi /mysql/data2/my.cnf Under the [client] section port=3307 socket=/tmp/mysql2.sock ----- ----- Under the [mysqld] section port=3307 socket=/tmp/mysql2.sock datadir=/mysql/data2 ----- ----- server-id=2 10. Make appropriate modification to the startup script of MySQL (managed by SMF) to point to the appropriate my.cnf for each instance: $ sudo vi /lib/svc/method/mysql_51 Note: Search for all occurences of mysqld_safe command and modify it to include the --defaults-file option. An example entry would look as follows: ${MySQLBIN}/mysqld_safe --defaults-file=${MYSQLDATA}/my.cnf --user=mysql --datadir=${MYSQLDATA} --pid=file=${PIDFILE} 11. Start the service: $ sudo svcadm enable mysql:version_51_2 $ sudo svcadm enable mysql:version_51 12. Verify that the two services are running by using: $ svcs mysql 13. Verify the processes: $ ps -ef | grep mysqld 14. Connect to each mysqld instance and verify: $ mysql --defaults-file=/mysql/data/my.cnf -u root -p $ mysql --defaults-file=/mysql/data2/my.cnf -u root -p Some references for Solaris 11 newbies Taking your first steps with Solaris 11 Introducing the basics of Image Packaging System Service Management Facility How To Guide For a detailed list of official educational modules available on Solaris 11, please visit here For MySQL courses from Oracle University access this page.

    Read the article

  • How would you gather client's data on Google App Engine without using Datastore/Backend Instances too much?

    - by ruslan
    I'm relatively new to StackExchange and not sure if it's appropriate place to ask design question. Site gives me a hint "The question you're asking appears subjective and is likely to be closed". Please let me know. Anyway.. One of the projects I'm working on is online survey engine. It's my first big commercial project on Google App Engine. I need your advice on how to collect stats and efficiently record them in DataStore without bankrupting me. Initial requirements are: After user finishes survey client sends list of pairs [ID (int) + PercentHit (double)]. This list shows how close answers of this user match predefined answers of reference answerers (which identified by IDs). I call them "target IDs". Creator of the survey wants to see aggregated % for given IDs for last hour, particular timeframe or from the beginning of the survey. Some surveys may have thousands of target/reference answerers. So I created entity public class HitsStatsDO implements Serializable { @Id transient private Long id; transient private Long version = (long) 0; transient private Long startDate; @Parent transient private Key parent; // fake parent which contains target id @Transient int targetId; private double avgPercent; private long hitCount; } But writing HitsStatsDO for each target from each user would give a lot of data. For instance I had a survey with 3000 targets which was answered by ~4 million people within one week with 300K people taking survey in first day. Even if we assume they were answering it evenly for 24 hours it would give us ~1040 writes/second. Obviously it hits concurrent writes limit of Datastore. I decided I'll collect data for one hour and save that, that's why there are avgPercent and hitCount in HitsStatsDO. GAE instances are stateless so I had to use dynamic backend instance. There I have something like this: // Contains stats for one hour private class Shard { ReadWriteLock lock = new ReentrantReadWriteLock(); Map<Integer, HitsStatsDO> map = new HashMap<Integer, HitsStatsDO>(); // Key is target ID public void saveToDatastore(); public void updateStats(Long startDate, Map<Integer, Double> hits); } and map with shard for current hour and previous hour (which doesn't stay here for long) private HashMap<Long, Shard> shards = new HashMap<Long, Shard>(); // Key is HitsStatsDO.startDate So once per hour I dump Shard for previous hour to Datastore. Plus I have class LifetimeStats which keeps Map<Integer, HitsStatsDO> in memcached where map-key is target ID. Also in my backend shutdown hook method I dump stats for unfinished hour to Datastore. There is only one major issue here - I have only ONE backend instance :) It raises following questions on which I'd like to hear your opinion: Can I do this without using backend instance ? What if one instance is not enough ? How can I split data between multiple dynamic backend instances? It hard because I don't know how many I have because Google creates new one as load increases. I know I can launch exact number of resident backend instances. But how many ? 2, 5, 10 ? What if I have no load at all for a week. Constantly running 10 backend instances is too expensive. What do I do with data from clients while backend instance is dead/restarting? Thank you very much in advance for your thoughts.

    Read the article

  • How can I gather client's data on Google App Engine without using Datastore/Backend Instances too much?

    - by ruslan
    One of the projects I'm working on is online survey engine. It's my first big commercial project on Google App Engine. I need your advice on how to collect stats and efficiently record them in DataStore without bankrupting me. Initial requirements are: After user finishes survey client sends list of pairs [ID (int) + PercentHit (double)]. This list shows how close answers of this user match predefined answers of reference answerers (which identified by IDs). I call them "target IDs". Creator of the survey wants to see aggregated % for given IDs for last hour, particular timeframe or from the beginning of the survey. Some surveys may have thousands of target/reference answerers. So I created entity public class HitsStatsDO implements Serializable { @Id transient private Long id; transient private Long version = (long) 0; transient private Long startDate; @Parent transient private Key parent; // fake parent which contains target id @Transient int targetId; private double avgPercent; private long hitCount; } But writing HitsStatsDO for each target from each user would give a lot of data. For instance I had a survey with 3000 targets which was answered by ~4 million people within one week with 300K people taking survey in first day. Even if we assume they were answering it evenly for 24 hours it would give us ~1040 writes/second. Obviously it hits concurrent writes limit of Datastore. I decided I'll collect data for one hour and save that, that's why there are avgPercent and hitCount in HitsStatsDO. GAE instances are stateless so I had to use dynamic backend instance. There I have something like this: // Contains stats for one hour private class Shard { ReadWriteLock lock = new ReentrantReadWriteLock(); Map<Integer, HitsStatsDO> map = new HashMap<Integer, HitsStatsDO>(); // Key is target ID public void saveToDatastore(); public void updateStats(Long startDate, Map<Integer, Double> hits); } and map with shard for current hour and previous hour (which doesn't stay here for long) private HashMap<Long, Shard> shards = new HashMap<Long, Shard>(); // Key is HitsStatsDO.startDate So once per hour I dump Shard for previous hour to Datastore. Plus I have class LifetimeStats which keeps Map<Integer, HitsStatsDO> in memcached where map-key is target ID. Also in my backend shutdown hook method I dump stats for unfinished hour to Datastore. There is only one major issue here - I have only ONE backend instance :) It raises following questions on which I'd like to hear your opinion: Can I do this without using backend instance ? What if one instance is not enough ? How can I split data between multiple dynamic backend instances? It hard because I don't know how many I have because Google creates new one as load increases. I know I can launch exact number of resident backend instances. But how many ? 2, 5, 10 ? What if I have no load at all for a week. Constantly running 10 backend instances is too expensive. What do I do with data from clients while backend instance is dead/restarting?

    Read the article

  • Are there any Microsoft Exchange Clients for iOS and Android that store their local data in an encrypted manner?

    - by Zac B
    I don't feel like this is a product recommendation question, more of a "does this tech even exist and is it feasible" question, but if I'm wrong, feel free to give this question the boot. Context: Our company has a bunch of traveling employees who access the company's Exchange server via thier iDevices or android phones, but because of the data protection laws in the state where our company is based (and the nature of the data our company works with), a recent security audit found that all mobile devices (laptops, phones, etc) operated by our company need to have all company correspondence and related data encrypted all the time. For laptops, that was easy: BitLocker or TrueCrypt, problem solved. For phones and tablets, however, I'm stumped. Sure, you can put lock screens/passwords on the phones, but the data is still accessible via external extraction, as law enforcement authorities already know. Question: Are there any clients for Microsoft Exchange that run on iOS or Android which store local data encrypted? The people using our mobile devices do a lot of their work while offline, so just giving them OWA access with SSL connection security isn't enough. Are there apps/technologies that present an additional login credential prompt to decrypt locally stored data in the app's storage area on the phone? My gut reaction when I started looking into this was "that doesn't sound like something Apple would allow into the App Store", but I've been wrong before...

    Read the article

  • Security implications of adding www-data to /etc/sudoers to run php-cgi as a different user

    - by BMiner
    What I really want to do is allow the 'www-data' user to have the ability to launch php-cgi as another user. I just want to make sure that I fully understand the security implications. The server should support a shared hosting environment where various (possibly untrusted) users have chroot'ed FTP access to the server to store their HTML and PHP files. Then, since PHP scripts can be malicious and read/write others' files, I'd like to ensure that each users' PHP scripts run with the same user permissions for that user (instead of running as www-data). Long story short, I have added the following line to my /etc/sudoers file, and I wanted to run it past the community as a sanity check: www-data ALL = (%www-data) NOPASSWD: /usr/bin/php-cgi This line should only allow www-data to run a command like this (without a password prompt): sudo -u some_user /usr/bin/php-cgi ...where some_user is a user in the group www-data. What are the security implications of this? This should then allow me to modify my Lighttpd configuration like this: fastcgi.server += ( ".php" => (( "bin-path" => "sudo -u some_user /usr/bin/php-cgi", "socket" => "/tmp/php.socket", "max-procs" => 1, "bin-environment" => ( "PHP_FCGI_CHILDREN" => "4", "PHP_FCGI_MAX_REQUESTS" => "10000" ), "bin-copy-environment" => ( "PATH", "SHELL", "USER" ), "broken-scriptfilename" => "enable" )) ) ...allowing me to spawn new FastCGI server instances for each user.

    Read the article

  • Beware Sneaky Reads with Unique Indexes

    - by Paul White NZ
    A few days ago, Sandra Mueller (twitter | blog) asked a question using twitter’s #sqlhelp hash tag: “Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?” Leaving aside trivial cases (like selecting a computed column that does reference the LOB data), one might be tempted to say that no, SQL Server does not read data you haven’t asked for.  In general, that’s quite correct; however there are cases where SQL Server might sneakily retrieve a LOB column… Example Table Here’s a T-SQL script to create that table and populate it with 1,000 rows: CREATE TABLE dbo.LOBtest ( pk INTEGER IDENTITY NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);   WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( some_value, lob_data ) SELECT TOP (1000) N.n, @Data FROM Numbers N WHERE N.n <= 1000; Test 1: A Simple Update Let’s run a query to subtract one from every value in the some_value column: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; As you might expect, modifying this integer column in 1,000 rows doesn’t take very long, or use many resources.  The STATITICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time.  The query plan is also very simple: Looking at the Clustered Index Scan, we can see that SQL Server only retrieves the pk and some_value columns during the scan: The pk column is needed by the Clustered Index Update operator to uniquely identify the row that is being changed.  The some_value column is used by the Compute Scalar to calculate the new value.  (In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT). Test 2: Simple Update with an Index Now let’s create a nonclustered index keyed on the some_value column, with lob_data as an included column: CREATE NONCLUSTERED INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); This is not a useful index for our simple update query; imagine that someone else created it for a different purpose.  Let’s run our update query again: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; We find that it now requires 4,014 logical reads and the elapsed query time has increased to around 100ms.  The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered index. The query plan is very similar to before (click to enlarge): The Clustered Index Update operator picks up the extra work of maintaining the nonclustered index. The new Compute Scalar operators detect whether the value in the some_value column has actually been changed by the update.  SQL Server may be able to skip maintaining the nonclustered index if the value hasn’t changed (see my previous post on non-updating updates for details).  Our simple query does change the value of some_data in every row, so this optimization doesn’t add any value in this specific case. The output list of columns from the Clustered Index Scan hasn’t changed from the one shown previously: SQL Server still just reads the pk and some_data columns.  Cool. Overall then, adding the nonclustered index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table.  Let’s see what happens if we make the nonclustered index unique. Test 3: Simple Update with a Unique Index Here’s the script to create a new unique index, and drop the old one: CREATE UNIQUE NONCLUSTERED INDEX [UQ dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); GO DROP INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest; Remember that SQL Server only enforces uniqueness on index keys (the some_data column).  The lob_data column is simply stored at the leaf-level of the non-clustered index.  With that in mind, we might expect this change to make very little difference.  Let’s see: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; Whoa!  Now look at the elapsed time and logical reads: Scan count 1, logical reads 2016, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.   CPU time = 172 ms, elapsed time = 16172 ms. Even with all the data and index pages in memory, the query took over 16 seconds to update just 1,000 rows, performing over 52,000 LOB logical reads (nearly 16,000 of those using read-ahead). Why on earth is SQL Server reading LOB data in a query that only updates a single integer column? The Query Plan The query plan for test 3 looks a bit more complex than before: In fact, the bottom level is exactly the same as we saw with the non-unique index.  The top level has heaps of new stuff though, which I’ll come to in a moment. You might be expecting to find that the Clustered Index Scan is now reading the lob_data column (for some reason).  After all, we need to explain where all the LOB logical reads are coming from.  Sadly, when we look at the properties of the Clustered Index Scan, we see exactly the same as before: SQL Server is still only reading the pk and some_value columns – so what’s doing the LOB reads? Updates that Sneakily Read Data We have to go as far as the Clustered Index Update operator before we see LOB data in the output list: [Expr1020] is a bit flag added by an earlier Compute Scalar.  It is set true if the some_value column has not been changed (part of the non-updating updates optimization I mentioned earlier). The Clustered Index Update operator adds two new columns: the lob_data column, and some_value_OLD.  The some_value_OLD column, as the name suggests, is the pre-update value of the some_value column.  At this point, the clustered index has already been updated with the new value, but we haven’t touched the nonclustered index yet. An interesting observation here is that the Clustered Index Update operator can read a column into the data flow as part of its update operation.  SQL Server could have read the LOB data as part of the initial Clustered Index Scan, but that would mean carrying the data through all the operations that occur prior to the Clustered Index Update.  The server knows it will have to go back to the clustered index row to update it, so it delays reading the LOB data until then.  Sneaky! Why the LOB Data Is Needed This is all very interesting (I hope), but why is SQL Server reading the LOB data?  For that matter, why does it need to pass the pre-update value of the some_value column out of the Clustered Index Update? The answer relates to the top row of the query plan for test 3.  I’ll reproduce it here for convenience: Notice that this is a wide (per-index) update plan.  SQL Server used a narrow (per-row) update plan in test 2, where the Clustered Index Update took care of maintaining the nonclustered index too.  I’ll talk more about this difference shortly. The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient.  It does this by breaking each update into a delete/insert pair, reordering the operations, removing any redundant operations, and finally applying the net effect of all the changes to the nonclustered index. Imagine we had a unique index which currently holds three rows with the values 1, 2, and 3.  If we run a query that adds 1 to each row value, we would end up with values 2, 3, and 4.  The net effect of all the changes is the same as if we simply deleted the value 1, and added a new value 4. By applying net changes, SQL Server can also avoid false unique-key violations.  If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3 of course) and the query would fail.  You might argue that SQL Server could avoid the uniqueness violation by starting with the highest value (3) and working down.  That’s fine, but it’s not possible to generalize this logic to work with every possible update query. SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations.  It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits.  As a result, you will often receive a wide update plan, even when you can see that no violations are possible. Another benefit of this optimization is that it includes a sort on the index key as part of its work.  Processing the index changes in index key order promotes sequential I/O against the nonclustered index. A side-effect of all this is that the net changes might include one or more inserts.  In order to insert a new row in the index, SQL Server obviously needs all the columns – the key column and the included LOB column.  This is the reason SQL Server reads the LOB data as part of the Clustered Index Update. In addition, the some_value_OLD column is required by the Split operator (it turns updates into delete/insert pairs).  In order to generate the correct index key delete operation, it needs the old key value. The irony is that in this case the Split/Sort/Collapse optimization is anything but.  Reading all that LOB data is extremely expensive, so it is sad that the current version of SQL Server has no way to avoid it. Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates. Beating the Set-Based Update with a Cursor One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated.  Armed with this knowledge, we can write a cursor (or the WHILE-loop equivalent) that updates one row at a time, and so avoids reading the LOB data: SET NOCOUNT ON; SET STATISTICS XML, IO, TIME OFF;   DECLARE @PK INTEGER, @StartTime DATETIME; SET @StartTime = GETUTCDATE();   DECLARE curUpdate CURSOR LOCAL FORWARD_ONLY KEYSET SCROLL_LOCKS FOR SELECT L.pk FROM LOBtest L ORDER BY L.pk ASC;   OPEN curUpdate;   WHILE (1 = 1) BEGIN FETCH NEXT FROM curUpdate INTO @PK;   IF @@FETCH_STATUS = -1 BREAK; IF @@FETCH_STATUS = -2 CONTINUE;   UPDATE dbo.LOBtest SET some_value = some_value - 1 WHERE CURRENT OF curUpdate; END;   CLOSE curUpdate; DEALLOCATE curUpdate;   SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE()); That completes the update in 1280 milliseconds (remember test 3 took over 16 seconds!) I used the WHERE CURRENT OF syntax there and a KEYSET cursor, just for the fun of it.  One could just as well use a WHERE clause that specified the primary key value instead. Clustered Indexes A clustered index is the ultimate index with included columns: all non-key columns are included columns in a clustered index.  Let’s re-create the test table and data with an updatable primary key, and without any non-clustered indexes: IF OBJECT_ID(N'dbo.LOBtest', N'U') IS NOT NULL DROP TABLE dbo.LOBtest; GO CREATE TABLE dbo.LOBtest ( pk INTEGER NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);   WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( pk, some_value, lob_data ) SELECT TOP (1000) N.n, N.n, @Data FROM Numbers N WHERE N.n <= 1000; Now here’s a query to modify the cluster keys: UPDATE dbo.LOBtest SET pk = pk + 1; The query plan is: As you can see, the Split/Sort/Collapse optimization is present, and we also gain an Eager Table Spool, for Halloween protection.  In addition, SQL Server now has no choice but to read the LOB data in the Clustered Index Scan: The performance is not great, as you might expect (even though there is no non-clustered index to maintain): Table 'LOBtest'. Scan count 1, logical reads 2011, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.   Table 'Worktable'. Scan count 1, logical reads 2040, physical reads 0, read-ahead reads 0, lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000.   SQL Server Execution Times: CPU time = 483 ms, elapsed time = 17884 ms. Notice how the LOB data is read twice: once from the Clustered Index Scan, and again from the work table in tempdb used by the Eager Spool. If you try the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes the cluster key (including uniqueifier) around (no LOB data or other non-key columns): A unique non-clustered index (on a heap) works well too: Both those queries complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads.  (In fact the heap is rather more efficient). There are lots more fun combinations to try that I don’t have space for here. Final Thoughts The behaviour shown in this post is not limited to LOB data by any means.  If the conditions are met, any unique index that has included columns can produce similar behaviour – something to bear in mind when adding large INCLUDE columns to achieve covering queries, perhaps. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

    Read the article

< Previous Page | 420 421 422 423 424 425 426 427 428 429 430 431  | Next Page >