Search Results

Search found 8976 results on 360 pages for 'optimal solutions'.

Page 13/360 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20  | Next Page >

  • What is the optimal way to run a set of regressions in R.

    - by stevejb
    Assume that I have sources of data X and Y that are indexable, say matrices, and I want to run a set of independent regressions and store the results. My initial approach would be results = matrix(nrow=nrow(X), ncol=2) for(i in 1:nrow(X)) { results[i,] = coefficients(lm(Y[i,] ~ X[i,])) } But loops are bad, so I could do it with lapply as out <- lapply(1:nrow(X), function(i) { coefficients(lm(Y[i,] ~ X[i,])) } ) Is there a better way to do this?
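The row-by-row regressions described above can be sketched outside R as well. The following is a minimal Python illustration, not the asker's code: it assumes X and Y are lists of equal-length numeric rows, and the names `fit_line` and `fit_rows` are hypothetical. Each row pair gets its own closed-form simple regression, returned as (intercept, slope) in the same order as R's `coefficients(lm(...))`.

```python
from statistics import fmean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_rows(X, Y):
    """Fit one independent regression per row pair (X[i], Y[i])."""
    return [fit_line(x_row, y_row) for x_row, y_row in zip(X, Y)]
```

Because the rows are independent, the list comprehension could be swapped for a parallel map without changing the results.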

  • Optimal size for Database partitions

    - by Adrian Mouat
    Hi all, I am creating a very simple, very large PostgreSQL database. The database will have around 10 billion rows, which means I am looking at partitioning it into several tables. However, I can't find any information on how many partitions we should break it into. I don't know what type of queries to expect as of yet, so it won't be possible to come up with a perfect partitioning scheme, but are there any rules of thumb for partition size? Cheers, Adrian.

  • Enterprise Tape Backup solutions

    - by Tom O'Connor
    I'm currently attempting to re-architect a backup solution where I'm working. We've got 2 NAS devices, one in the office, one in the datacentre. The servers in the DC back up to the DC NAS, which is then replicated to the Office NAS. The office NAS exports shares as CIFS and NFS; this bit is fine. At some point, I'll have to expand our storage capacity; currently we've got about 1.4TB of storage space, which is about 96% full. Previously, the tape backup was a script that ran tar a few times and squirted data onto a tape. It worked, but was by no means a perfect solution. Restores are a bit of a pest, and adding new data to the backup requires editing the script as root. It's just all a bit non-ideal. I've been evaluating a number of "enterprise" ready backup solutions, such as Yosemite Backup from Barracuda, Acronis Backup/Restore, and something from Arkeia. In the process of evaluating these, I've found 2 big problems. (1) Not all of them allow backup of mounted devices (such as an NFS-mounted NAS). (2) Many of these applications don't like our tape device. For the most part, (1) is essential. Our NAS has a feeble processor and can't run applications like backup agents. I suspect that the biggest problem is the tape device, which is an HP C7438A DAT72 connected via USB. Questions: Has anyone else got a USB DAT72 device working with similar software? Is there a better way to back up data from an "appliance" NAS device on which you can't run an agent? Would I be totally out of my mind to specify a cheap HP or Dell server with a couple of 1TB hard disks, and a SAS card to then talk to an HP Ultrium (or similar) device? The biggest drawback to this would be cost (400ish for the server, 200 for the SAS connectivity and 1700 for an LTO4 device). Notes: I'd love to be able to say that I'd get rid of tapes entirely, and use some form of hard disk backup. In a previous job, we had LaCie USB drives, which were decidedly unreliable.

  • Optimal preferences for prefix queries with Oracle catalog (CTXCAT) index

    - by nw
    The documentation for Oracle Text gives this example of a prefix/substring preference setting for context and catalog indexes: begin ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE'); ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '3'); ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '4'); ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES'); end; What I need to know is whether the substring_index attribute is necessary if I only ever issue prefix searches, such as: SELECT title FROM auction WHERE CATSEARCH(title, 'cam*', '') > 0; TITLE --------------- CANON CAMERA FUJI CAMERA NIKON CAMERA OLYMPUS CAMERA PENTAX CAMERA SONY CAMERA 6 rows selected

  • ffmpeg libxvid settings for optimal quality and preferably fast encoding

    - by dropson
    What ffmpeg settings should I use to convert a video into xvid with a good balance of speed and quality, using 2 passes, and alternatively 1 pass? Currently I use the following for just 1 pass, but I need a better suggestion. -acodec libmp3lame -ab 128 -ar 44100 -ac 2 -vcodec libxvid -qmin 3 -qmax 5 -mbd 2 -bf 2 -flags +4mv -trellis -aic -cmp 2 -subcmp 2 -g 2 -maxrate 1300 -b 1200 -threads 0

  • Optimal two variable linear regression SQL statement (censoring outliers)

    - by Dave Jarvis
    Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL Code SELECT ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT FROM (SELECT D.AMOUNT, Y.YEAR FROM CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- C.ID = 8590 AND -- Find all the stations within a 15 unit radius ... -- SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) <15 AND -- Gather all known years for that station ... -- S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND -- The data before 1900 is shaky; insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = '001' AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID <> 'M' GROUP BY Y.YEAR ORDER BY Y.YEAR ) t Data The data is visualized here (with five outliers highlighted): Questions How do I return the y value against all rows without repeating the same query to collect and collate the data? That is, how do I "reuse" the list of t values? How would you change the query to eliminate outliers (at an 85% confidence interval)? The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)? 
(1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228 (2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406 I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65). Thank you!
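The SLOPE and INTERCEPT expressions in the query are the standard closed-form least-squares formulas written in terms of running sums. As a cross-check, here is a minimal Python sketch of the same arithmetic; the function names are hypothetical and the input is assumed to be a list of (year, amount) pairs, one per year.

```python
def slope_intercept(points):
    """Least-squares slope and intercept from running sums, mirroring the SQL."""
    n = len(points)
    sx = sum(x for x, _ in points)        # sum(t.YEAR)
    sy = sum(y for _, y in points)        # sum(t.AMOUNT)
    sxy = sum(x * y for x, y in points)   # sum(t.YEAR * t.AMOUNT)
    sxx = sum(x * x for x, _ in points)   # sum(power(t.YEAR, 2))
    denom = sx * sx - n * sxx
    slope = (sx * sy - n * sxy) / denom
    intercept = (sx * sxy - sy * sxx) / denom
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate y = m*x + b."""
    return slope * x + intercept
```

Feeding the quoted SLOPE and INTERCEPT into `predict` for 1900 and 2009 reproduces the two values in the question, so the arithmetic there is consistent with the fitted coefficients; if the line looks wrong, the rows reaching the aggregate are the more likely place to look.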

  • Optimal two variable linear regression SQL statement

    - by Dave Jarvis
    Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL Code SELECT ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT FROM (SELECT D.AMOUNT, Y.YEAR FROM CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- C.ID = 8590 AND -- Find all the stations within a 15 unit radius ... -- SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) < 15 AND -- Gather all known years for that station ... -- S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND -- The data before 1900 is shaky; and insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = '001' AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID <> 'M' GROUP BY Y.YEAR ORDER BY Y.YEAR ) t Data The data is visualized here: Questions How do I return the y value against all rows without repeating the same query to collect and collate the data? That is, how do I "reuse" the list of t values? How would you change the query to eliminate outliers (at an 85% confidence interval)? The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)? 
(1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228 (2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406 I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65). Thank you!
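One possible reading of "eliminate outliers at an 85% confidence interval" is to fit once, drop the points whose residual falls outside the central 85% of a normal distribution fitted to the residuals, and then refit. The Python sketch below illustrates that reading only; the interpretation, the normality assumption, and the names `fit` and `trim_outliers` are all assumptions, not something stated in the question.

```python
from statistics import NormalDist, fmean, pstdev


def fit(points):
    """Return (slope, intercept) of the least-squares line through points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


def trim_outliers(points, coverage=0.85):
    """Keep points whose residual lies within the central `coverage` band."""
    slope, intercept = fit(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    # z-score that bounds the central `coverage` of a normal distribution
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    limit = z * pstdev(residuals)
    return [p for p, r in zip(points, residuals) if abs(r) <= limit]
```

A typical use would be `fit(trim_outliers(points))`. Note that a single trimming pass can still be influenced by the outliers it is meant to remove, since they inflate the residual spread.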

  • Optimal two variable linear regression calculation

    - by Dave Jarvis
    Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL Code SELECT ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT FROM (SELECT D.AMOUNT, Y.YEAR FROM CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- C.ID = 8590 AND -- Find all the stations within a 15 unit radius ... -- SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) < 15 AND -- Gather all known years for that station ... -- S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND -- The data before 1900 is shaky; insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = '001' AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID <> 'M' GROUP BY Y.YEAR ORDER BY Y.YEAR ) t Data The data is visualized here: Question The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)? (1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228 (2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406 I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65). Related Sites Least absolute deviations Robust regression Thank you!

  • Optimal password salt length

    - by Juliusz Gonera
    I tried to find the answer to this question on Stack Overflow without any success. Let's say I store passwords using a SHA-1 hash (so it's 160 bits) and let's assume that SHA-1 is enough for my application. How long should the salt used to generate the password's hash be? The only answer I found was that there's no point in making it longer than the hash itself (160 bits in this case), which sounds logical, but should I make it that long? E.g. Ubuntu uses an 8-byte salt with SHA-512 (I guess), so would 8 bytes be enough for SHA-1 too, or maybe it would be too much?
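To make the setup in the question concrete, here is a minimal Python sketch of salting with a configurable salt length. It uses SHA-1 and an 8-byte default only because the question assumes them; the function names are hypothetical, and a single unkeyed hash round is shown for illustration rather than as a recommended password scheme.

```python
import hashlib
import hmac
import os


def hash_password(password, salt_len=8):
    """Return (salt, digest) using a fresh random salt of salt_len bytes."""
    salt = os.urandom(salt_len)
    digest = hashlib.sha1(salt + password.encode("utf-8")).digest()
    return salt, digest


def verify_password(password, salt, digest):
    """Recompute the salted hash and compare in constant time."""
    candidate = hashlib.sha1(salt + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, digest)
```

The salt is stored next to the digest, so changing `salt_len` only changes how many extra bytes are kept per user.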

  • Most optimal order (of joins) for left join

    - by Ram
    I have 3 tables: Table1 (with 1020690 records), Table2 (with 289425 records), and Table3 (with 83692 records). I have something like this SELECT * FROM Table1 T1 /* OK fine select * is bad when not all columns are needed, this is just an example*/ LEFT JOIN Table2 T2 ON T1.id=T2.id LEFT JOIN Table3 T3 ON T1.id=T3.id and a query like this SELECT * FROM Table1 T1 LEFT JOIN Table3 T3 ON T1.id=T3.id LEFT JOIN Table2 T2 ON T1.id=T2.id The query plan shows me that it uses 2 Merge Joins for both queries. For the first query, the first merge is with T1 and T2 and then with T3. For the second query, the first merge is with T1 and T3 and then with T2. Both these queries take about the same time (approx. 40 seconds), or sometimes Query1 takes a couple of seconds longer. So my question is, does the join order matter?
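Since both LEFT JOINs hang off Table1 with conditions that reference only T1, the two orderings should be logically equivalent. The Python sketch below checks that on a tiny in-memory SQLite database; the column names `b` and `c` and the sample rows are made up for illustration, and the result says nothing about how any particular optimizer costs the two plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, b TEXT);
    CREATE TABLE Table3 (id INTEGER PRIMARY KEY, c TEXT);
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO Table3 VALUES (2, 'c2'), (3, 'c3');
""")

# Join Table2 first, then Table3.
rows_t2_first = conn.execute("""
    SELECT T1.id, T2.b, T3.c FROM Table1 T1
    LEFT JOIN Table2 T2 ON T1.id = T2.id
    LEFT JOIN Table3 T3 ON T1.id = T3.id
    ORDER BY T1.id
""").fetchall()

# Join Table3 first, then Table2.
rows_t3_first = conn.execute("""
    SELECT T1.id, T2.b, T3.c FROM Table1 T1
    LEFT JOIN Table3 T3 ON T1.id = T3.id
    LEFT JOIN Table2 T2 ON T1.id = T2.id
    ORDER BY T1.id
""").fetchall()
```

Both result lists contain the same rows, with NULLs (None) where Table2 or Table3 has no match.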

  • How to index a table with a Type 2 slowly changing dimension for optimal performance

    - by The Lazy DBA
    Suppose you have a table with a Type 2 slowly-changing dimension. Let's express this table as follows, with the following columns: * [Key] * [Value1] * ... * [ValueN] * [StartDate] * [ExpiryDate] In this example, let's suppose that [StartDate] is effectively the date in which the values for a given [Key] become known to the system. So our primary key would be composed of both [StartDate] and [Key]. When a new set of values arrives for a given [Key], we assign [ExpiryDate] to some pre-defined high surrogate value such as '12/31/9999'. We then set the existing "most recent" records for that [Key] to have an [ExpiryDate] that is equal to the [StartDate] of the new value. A simple update based on a join. So if we always wanted to get the most recent records for a given [Key], we know we could create a clustered index that is: * [ExpiryDate] ASC * [Key] ASC Although the keyspace may be very wide (say, a million keys), we can minimize the number of pages between reads by initially ordering them by [ExpiryDate]. And since we know the most recent record for a given key will always have an [ExpiryDate] of '12/31/9999', we can use that to our advantage. However... what if we want to get a point-in-time snapshot of all [Key]s at a given time? Theoretically, the entirety of the keyspace isn't all being updated at the same time. Therefore for a given point-in-time, the window between [StartDate] and [ExpiryDate] is variable, so ordering by either [StartDate] or [ExpiryDate] would never yield a result in which all the records you're looking for are contiguous. Granted, you can immediately throw out all records in which the [StartDate] is greater than your defined point-in-time. In essence, in a typical RDBMS, what indexing strategy affords the best way to minimize the number of reads to retrieve the values for all keys for a given point-in-time? I realize I can at least maximize IO by partitioning the table by [Key], however this certainly isn't ideal. 
Alternatively, is there a different type of slowly-changing-dimension that solves this problem in a more performant manner?
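For reference, the point-in-time lookup described above boils down to a half-open interval test, [StartDate] <= t < [ExpiryDate], applied per key. The Python sketch below shows only that predicate on in-memory rows; the tuple layout and the names `HIGH_DATE` and `snapshot` are assumptions for illustration, and it does not address which index makes the equivalent SQL cheap.

```python
from datetime import date

# High surrogate expiry marking the current version of a key.
HIGH_DATE = date(9999, 12, 31)


def snapshot(rows, as_of):
    """Return {key: value} for the version of each key valid at as_of.

    Each row is (key, value, start_date, expiry_date); a version is valid
    when start_date <= as_of < expiry_date.
    """
    return {
        key: value
        for key, value, start, expiry in rows
        if start <= as_of < expiry
    }
```

Because a superseded row's expiry equals the start of its successor, the half-open test selects at most one version per key.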

  • Sending files using Winsock - optimal send() data length?

    - by Meta
    I am using Winsock with non-blocking sockets to send a file to a client. The way I'm doing it right now is that I read a chunk of 8192 bytes from the file, and then loop until all of it successfully goes through send() (obviously handling WSAEWOULDBLOCK as it occurs). I then move on and read the next 8192 bytes, and so on... Although I can use any other number than 8192 when I test the transfer on my local machine, once I try it over a network, it seems like 8191 is the largest number I can use. When I try to use any number higher than 8191 (starting with 8192), the file transfer becomes extremely slow (about 5 times slower). Is there any reason why 8191 is so special? I've done some more testing and it turns out that using 8000 is slightly faster (by 0.5%). If you understand why 8191 is so special, can you tell me if there is a number better than the others (better than 8000)? I have a feeling that it has something to do with the fact that the default send buffer allocated to the socket by Winsock is 8KB, but I don't understand why. It might also have something to do with the Nagle algorithm, but again, I'm not sure how. Note that I have not modified the SO_SNDBUF option nor the TCP_NODELAY option. Or am I doing this all wrong? What's the best way of sending a file over a non-blocking socket?
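The read-a-chunk, loop-until-sent pattern described above can be sketched as follows. This is a Python illustration rather than Winsock C code: `BlockingIOError` plays the role of WSAEWOULDBLOCK, `select` waits until the socket is writable again, and `send_all` and its `chunk_size` parameter are hypothetical names. It does not explain the 8191-byte threshold observed in the question.

```python
import select
import socket


def send_all(sock, data, chunk_size=8192):
    """Send all of data over a non-blocking socket, chunk_size bytes at a time."""
    view = memoryview(data)
    sent_total = 0
    while sent_total < len(view):
        chunk = view[sent_total:sent_total + chunk_size]
        try:
            # send() may accept fewer bytes than offered; advance by what it took.
            sent_total += sock.send(chunk)
        except BlockingIOError:
            # Send buffer is full: wait until the socket is writable again.
            select.select([], [sock], [])
    return sent_total
```

Advancing by the return value of `send` rather than by the chunk size is what keeps partial sends from dropping or duplicating data.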

  • What applications is Python optimal for?

    - by Alan
    I'm already a professional J2EE developer by day, and Rails developer by night. I'm planning on adding Python to my list of skills. I'm already convinced a language is just a tool, so I'm not interested in a religious war. I agree with the Pragmatic Programmers that learning one language per year is a good thing for your professional development. So, in your considered opinion, for what kinds of applications does Python hit the sweet spot? And why? What advantages does it have, and why do these advantages outweigh the costs of adopting Python? ADD: I also plan on learning a pure functional language like Scheme.

  • Rewrite this function as DB query?

    - by aLk
    I'm cleaning up my code. Should I change the following function to a MySQL query? If so, what would be a nice MySQL function to achieve this functionality? public ArrayList getNewTitles(ArrayList candidateTitles, ArrayList existingTitles) { ArrayList newTitles = new ArrayList(); Movie movie = new Movie(); boolean isNew = true; for(int i=0; i<candidateTitles.size(); i++) { for(int j=0; j<existingTitles.size(); j++) { movie = (Movie)existingTitles.get(j); if(((String)candidateTitles.get(i)).equals(movie.getRawTitle())) { isNew = false; } } if(isNew == true) { System.out.println("newTitle for crawling: " + (String)candidateTitles.get(i)); newTitles.add((String)candidateTitles.get(i)); } else { System.out.println("candidate binned: " + (String)candidateTitles.get(i)); } isNew = true; } return newTitles; }
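Whether or not the work moves into the database, the nested loop above is a set difference: keep the candidates whose title is not among the existing raw titles. A minimal Python sketch of that idea follows; it is an in-memory alternative rather than the MySQL query asked about, and `get_new_titles` with plain string inputs is an assumption for illustration.

```python
def get_new_titles(candidate_titles, existing_raw_titles):
    """Return the candidates that do not appear among the existing raw titles.

    Building a set first makes each membership test O(1) on average,
    instead of rescanning the existing list for every candidate.
    """
    existing = set(existing_raw_titles)
    return [title for title in candidate_titles if title not in existing]
```

The order of the candidates is preserved, which matches the behaviour of the original loop.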

  • Optimal setup for Doxygen in a large multi-application COM project

    - by John
    A system has up to 100 VC++ projects, each spitting out a DLL or EXE. In addition there are many COM components with IDL and generated .h/.c files. What's 'the right way' or at least a good way to organise this with Doxygen? One overall doxy project or one per project/solution? And what's the right way to handle COM, which has generated code and a lot of 'fluff' that will bloat generated HTML files?

  • Outputcache - how to determine optimal value for duration?

    - by Steve
    I read somewhere that for a high traffic site (I guess that is a murky term as well), 30 - 60 seconds is a good value. Obviously I could do a load test and vary the values, but I couldn't find any kind of documentation on this. Most samples have a minute, a couple of minutes. There's no recommended range. Is there something on msdn or anywhere that talks about this?

  • Optimal way to initialize varying objects

    - by John Smith
    I have to initialize a lot of different types of objects based on an integer parameter. They all have the same overall initialization methods. At the moment I have the following code #define APPLE 1 #define PEAR 2 switch (t) { case APPLE: newobj = [[FApple alloc] init]; break; case PEAR: newobj = [[FPear alloc] init]; break; default: newobj = nil; } I believe there must be a better way to do this. When I add FOrange I have to go and add another line here. What would be a better way?
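One common alternative to a growing switch is a lookup table from the integer tag to the class to instantiate. The sketch below shows the idea in Python rather than Objective-C; the `FACTORIES` table and `make_fruit` are hypothetical names, and the empty classes stand in for the real FApple and FPear.

```python
class FApple:
    pass


class FPear:
    pass


APPLE, PEAR = 1, 2

# Adding FOrange later means adding one entry here, not another case branch.
FACTORIES = {APPLE: FApple, PEAR: FPear}


def make_fruit(kind):
    """Instantiate the class registered for kind, or return None if unknown."""
    factory = FACTORIES.get(kind)
    return factory() if factory is not None else None
```

The unknown-tag case returns None, mirroring the `default:` branch that yields nil.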

  • Optimal way to store and pass a date to Javascript

    - by user1493115
    I need to store a date-time value in MySQL and subsequently display it on a webpage. Due to its flexibility I usually chose to store a Unix timestamp in the database and convert it with PHP's date() to the desired format. This time, however, I would like to use MySQL's datetime field (mostly due to 2038) and apply the browser's timezone (hence I cannot simply format it on the server and pass the string to the client). I thought of storing the date as a UTC datetime in the database and sending it in a well-defined format to the client, where it will be further processed. Here I would like to avoid a Unix timestamp, but everything else might add additional overhead in processing. Is there any best practice as far as date processing is concerned in a MySQL, PHP, jQuery environment? Thanks.
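One well-defined wire format that fits the plan above is ISO 8601 in UTC with a trailing "Z", which JavaScript's `Date` constructor can parse and then render in the browser's local timezone. The sketch below shows the server-side formatting step in Python rather than PHP; `to_wire` is a hypothetical helper, and it assumes the value read from the datetime column is a naive datetime that is already in UTC.

```python
from datetime import datetime, timezone


def to_wire(dt_utc):
    """Format a naive UTC datetime as an ISO 8601 string ending in 'Z'."""
    aware = dt_utc.replace(tzinfo=timezone.utc)
    return aware.isoformat().replace("+00:00", "Z")
```

The explicit "Z" matters: without an offset, the client has no reliable way to tell that the value is UTC rather than local time.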

  • Optimal diff between object lists in Java

    - by Philipp
    I have a List of Java objects on my server which is sent to the client through some serialization mechanism. Once in a while the List of objects gets updated on the server, that is, some objects get added, some get deleted and others just change their place in the List. I want to update the List on the client side as well, but send the least possible data. In particular, I don't want to resend Objects which are already available on the client. Is there a library available which will produce some sort of diff from the two lists, so that I can send only the difference and the new Objects across the wire? I have found several Java implementations of the unix diff command, but this algorithm is impractical for order changes, i.e. [A,B,C] - [C,B,A] could be sent as only place changes [1-3] [3-1], while diff will want to resend the whole A and C objects (as far as I understand).
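If every object carries a stable unique identifier, the update can be described by IDs alone: which IDs are new, which are gone, and which changed position. The following Python sketch (not Java, and not an existing library) illustrates that idea; `diff_lists` and the assumption of unique IDs are hypothetical, and the move list is not minimal, since an insertion near the front shifts every later element and each shift is reported.

```python
def diff_lists(old_ids, new_ids):
    """Return (added, removed, moved) between two lists of unique IDs.

    moved maps an ID to (old_index, new_index) when its position changed.
    Only objects whose IDs appear in `added` need to be sent in full.
    """
    old_pos = {obj_id: i for i, obj_id in enumerate(old_ids)}
    new_pos = {obj_id: i for i, obj_id in enumerate(new_ids)}
    added = [obj_id for obj_id in new_ids if obj_id not in old_pos]
    removed = [obj_id for obj_id in old_ids if obj_id not in new_pos]
    moved = {
        obj_id: (old_pos[obj_id], new_pos[obj_id])
        for obj_id in new_ids
        if obj_id in old_pos and old_pos[obj_id] != new_pos[obj_id]
    }
    return added, removed, moved
```

For the [A,B,C] to [C,B,A] example, this reports no additions or removals and two position changes, so no object bodies would need to be resent.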
