data load performance - Page 405

Basic site analytics doesn't tally with Google data

- by Jenkz

After being stumped by an earlier quesiton: SO google-analytics-domain-data-without-filtering I've been experimenting with a very basic analytics system of my own. MySQL table: hit_id, subsite_id, timestamp, ip, url The subsite_id let's me drill down to a folder (as explained in the previous question). I can now get the following metrics: Page Views - Grouped by subsite_id and date Unique Page Views - Grouped by subsite_id, date, url, IP (not nesecarily how Google does it!) The usual "most visited page", "likely time to visit" etc etc. I've now compared my data to that in Google Analytics and found that Google has lower values each metric. Ie, my own setup is counting more hits than Google. So I've started discounting IP's from various web crawlers, Google, Yahoo & Dotbot so far. Short Questions: Is it worth me collating a list of all major crawlers to discount, is any list likely to change regularly? Are there any other obvious filters that Google will be applying to GA data? What other data would you collect that might be of use further down the line? What variables does Google use to work out entrance search keywords to a site? The data is only going to used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages etc) for their reference.

Read the article

structured vs. unstructured data in db

- by Igor

the question is one of design. i'm gathering a big chunk of performance data with lots of key-value pairs. pretty much everything in /proc/cpuinfo, /proc/meminfo/, /proc/loadavg, plus a bunch of other stuff, from several hundred hosts. right now, i just need to display the latest chunk of data in my UI. i will probably end up doing some analysis of the data gathered to figure out performance problems down the road, but this is a new application so i'm not sure what exactly i'm looking for performance-wise just yet. i could structure the data in the db -- have a column for each key i'm gathering. the table would end up being O(100) columns wide, it would be a pain to put into the db, i would have to add new columns if i start gathering a new stat. but it would be easy to sort/analyze the data just using SQL. or i could just dump my unstructured data blob into the table. maybe three columns -- host id, timestamp, and a serialized version of my array, probably using JSON in a TEXT field. which should I do? am i going to be sorry if i go with the unstructured approach? when doing analysis, should i just convert the fields i'm interested in and create a new, more structured table? what are the trade-offs i'm missing here?

Read the article

Linq to SQL Problem System.Data.Linq.IdentityManager.StandardIdentityManager.MultiKeyManager

- by luckyluke

I have a really tricky thing going up here. My project has around 100 tables and they are all mapped by LINQ. Everything works fine in a dev and test environment. These enviroments are MS Win 2008 r2 servers with SQL 2008 sp1 databases. IIS and SQL are on a different machines. Now on production enviroment which is MS Win 2003 x64 web farm + geoclustered SQL 2008 IT DOES not work. All I get is the exception System.IndexOutOfRangeException: Index was outside the bounds of the array. at System.Data.Linq.IdentityManager.StandardIdentityManager.MultiKeyManager3.TryCreateKeyFr>om Values(Object[] values, MultiKey& k) at System.Data.Linq.IdentityManager.StandardIdentityManager.IdentityCache2.Find(Object[] keyValues) at System.Data.Linq.ChangeProcessor.GetOtherItem(MetaAssociation assoc, Object instance) at System.Data.Linq.ChangeProcessor.BuildEdgeMaps() at System.Data.Linq.ChangeProcessor.SubmitChanges(ConflictMode failureMode) at System.Data.Linq.DataContext.SubmitChanges(ConflictMode failureMode) at ERS.IIMP.Services.ExposuresSrv.Update(Int32 ExpID, Int32 AssID) Services\ExposuresSrv.cs` My question is What the hell. They have precisely the same DBML, the DB has exactly THE SAME structure (when I get the DB from prod to TEST and mount it eveything works just great), the binaries on the WEB Server are the same. I seriously do not know what to do.... Did anyone found that Linq works on one env and does not on the second?? I mam really lost here. I really hope You can help me:)

Read the article

Saving data - does work on Simulator, not on Device

- by Peter Hajas

I use NSData to persist an image in my application. I save the image like so (in my App Delegate): - (void)applicationWillTerminate:(UIApplication *)application { NSLog(@"Saving data"); NSData *data = UIImagePNGRepresentation([[viewController.myViewController myImage]image]); //Write the file out! NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES); NSString *documentsDirectory = [paths objectAtIndex:0]; NSString *path_to_file = [documentsDirectory stringByAppendingString:@"lastimage.png"]; [data writeToFile:path_to_file atomically:YES]; NSLog(@"Data saved."); } and then I load it back like so (in my view controller, under viewDidLoad): NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES); NSString *documentsDirectory = [paths objectAtIndex:0]; NSString *path_to_file = [documentsDirectory stringByAppendingString:@"lastimage.png"]; if([[NSFileManager defaultManager] fileExistsAtPath:path_to_file]) { NSLog(@"Found saved file, loading in image"); NSData *data = [NSData dataWithContentsOfFile:path_to_file]; UIImage *temp = [[UIImage alloc] initWithData:data]; myViewController.myImage.image = temp; NSLog(@"Finished loading in image"); } This code works every time on the simulator, but on the device, it can never seem to load back in the image. I'm pretty sure that it saves out, but I have no view of the filesystem. Am I doing something weird? Does the simulator have a different way to access its directories than the device? Thanks!

Read the article

SeriesInterpolate - removing data at start of array

- by Allan Jardine

Hello all, I've been experimenting with Flex Charts (in Flash Builder 4) recently, but have run into one area which didn't quite work as I was expecting. Specifically, when adding and removing a data point from an array, with SeriesInterpolate set. I've put up three examples, as I expect these will make a lot more sense than me trying to explain it with words only!: http://sprymedia.co.uk/media/misc/flex/linechart/LineChart-Add.swf - Clicking the button in the top right adds a new data point to the end of the data array, and the chart is smoothly updated (almost smoothly there is a 1px shift when animating which is odd...) http://sprymedia.co.uk/media/misc/flex/linechart/LineChart-Delete.swf - Clicking the button (which is now labelled incorrectly) will remove the item at the start of the data array - but the chart draws this as if it were removing the end element. http://sprymedia.co.uk/media/misc/flex/linechart/LineChart-AddDelete.swf - This is sort of what I'm eventually aiming for - a smooth side scroll chart, where new data is added at the end and the old data is sifted off the front. However the delete behaviour makes this look a bit odd. Does anyone know if there is a way to get the smooth transition I'm looking with SeriesInterpolate? Or is it possible to implement a custom transition? Many thanks, Allan

Read the article

MySQL table data transformation -- how can I dis-aggregate MySQL time data?

- by lighthouse65

We are coding for a MySQL data warehousing application that stores descriptive data (User ID, Work ID, Machine ID, Start and End Time columns in the first table below) associated with time and production quantity data (Output and Time columns in the first table below) upon which aggregate (SUM, COUNT, AVG) functions are applied. We now wish to dis-aggregate time data for another type of analysis. Our current data table design: +---------+---------+------------+---------------------+---------------------+--------+------+ | User ID | Work ID | Machine ID | Event Start Time | Event End Time | Output | Time | +---------+---------+------------+---------------------+---------------------+--------+------+ | 080025 | ABC123 | M01 | 2008-01-24 16:19:15 | 2008-01-24 16:34:45 | 2120 | 930 | +---------+---------+------------+---------------------+---------------------+--------+------+ Reprocessing dis-aggregation that we would like to do would be to transform table content based on a granularity of minutes, rather than the current production event ("Event Start Time" and "Event End Time") granularity. The resulting reprocessing of existing table rows would look like: +---------+---------+------------+---------------------+--------+ | User ID | Work ID | Machine ID | Production Minute | Output | +---------+---------+------------+---------------------+--------+ | 080025 | ABC123 | M01 | 2010-01-24 16:19 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:20 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:21 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:23 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:24 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:25 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:26 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:27 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:28 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:29 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:30 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:31 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:33 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:34 | 133 | +---------+---------+------------+---------------------+--------+ So the reprocessing would take an existing row of data created at the granularity of production event and modify the granularity to minutes, eliminating redundant (Event End Time, Time) columns while doing so. It assumes a constant rate of production and divides output by the difference in minutes plus one to populate the new table's Output column. I know this can be done in code...but can it be done entirely in a MySQL insert statement (or otherwise entirely in MySQL)? I am thinking of a INSERT ... INTO construction but keep getting stuck. An additional complexity is that there are hundreds of machines to include in the operation so there will be multiple rows (one for each machine) for each minute of the day. Any ideas would be much appreciated. Thanks.

Read the article

SharePoint Designer Workflow: Unruly 'Collect User Data' Action

- by Jeremy

I'm a student in a SharePoint class online. This problem has riddled everyone I've discussed it with, including the teacher. There seems to be some sort of problem when I create a workflow with the collect data action. I can create workflows that send e-mails and use the other actions just fine. What could be causing this problem? My reproduction steps are simple: Create a new Site Collection with the Blank Site template. Create a new Custom List. In SharePoint Designer, start a new workflow on the Custom List. Add the collect data action to the workflow. Set the user to the one that created the task. Set the data to anything. A single check box, a string, Choice, doesn't matter. Leave the output variable as default. Mystery error appears! When the Check Workflow button is pressed, nothing happens. No message box appears at all. The warning icon in the Steps panel merely points out that there are some errors, it isn't specific as to what they are. Additionally, when I click on the data object again after it's been created, it doesn't populate the form with the old values. It goes back to the default name with no fields. So there's definitely something going wrong here. I've narrowed the problem down to the data object, but I don't know what to do about it. The workflow acts like normal for other activities. For example, delete the Collect Data action and add a Send Email one instead and it compiles and runs successfully.

Read the article

Optimizing MySql query to avoid using "Using filesort"

- by usef_ksa

I need your help to optimize the query to avoid using "Using filesort".The job of the query is to select all the articles that belongs to specific tag. The query is: "select title from tag,article where tag='Riyad' AND tag.article_id=article.id order by tag.article_id". the tables structure are the following: Tag table CREATE TABLE `tag` ( `tag` VARCHAR( 30 ) NOT NULL , `article_id` INT NOT NULL , INDEX ( `tag` ) ) ENGINE = MYISAM ; Article table CREATE TABLE `article` ( `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY , `title` VARCHAR( 60 ) NOT NULL ) ENGINE = MYISAM Sample data INSERT INTO `article` VALUES (1, 'About Riyad'); INSERT INTO `article` VALUES (2, 'About Newyork'); INSERT INTO `article` VALUES (3, 'About Paris'); INSERT INTO `article` VALUES (4, 'About London'); INSERT INTO `tag` VALUES ('Riyad', 1); INSERT INTO `tag` VALUES ('Saudia', 1); INSERT INTO `tag` VALUES ('Newyork', 2); INSERT INTO `tag` VALUES ('USA', 2); INSERT INTO `tag` VALUES ('Paris', 3); INSERT INTO `tag` VALUES ('France', 3);

Read the article

Syncing data between devel/live databases in Django

- by T. Stone

With Django's new multi-db functionality in the development version, I've been trying to work on creating a management command that let's me synchronize the data from the live site down to a developer machine for extended testing. (Having actual data, particularly user-entered data, allows me to test a broader range of inputs.) Right now I've got a "mostly" working command. It can sync "simple" model data but the problem I'm having is that it ignores ManyToMany fields which I don't see any reason for it do so. Anyone have any ideas of either how to fix that or a better want to handle this? Should I be exporting that first query to a fixture first and then re-importing it? from django.core.management.base import LabelCommand from django.db.utils import IntegrityError from django.db import models from django.conf import settings LIVE_DATABASE_KEY = 'live' class Command(LabelCommand): help = ("Synchronizes the data between the local machine and the live server") args = "APP_NAME" label = 'application name' requires_model_validation = False can_import_settings = True def handle_label(self, label, **options): # Make sure we're running the command on a developer machine and that we've got the right settings db_settings = getattr(settings, 'DATABASES', {}) if not LIVE_DATABASE_KEY in db_settings: print 'Could not find "%s" in database settings.' % LIVE_DATABASE_KEY return if db_settings.get('default') == db_settings.get(LIVE_DATABASE_KEY): print 'Data cannot synchronize with self. This command must be run on a non-production server.' return # Fetch all models for the given app try: app = models.get_app(label) app_models = models.get_models(app) except: print "The app '%s' could not be found or models could not be loaded for it." % label for model in app_models: print 'Syncing %s.%s ...' % (model._meta.app_label, model._meta.object_name) # Query each model from the live site qs = model.objects.all().using(LIVE_DATABASE_KEY) # ...and save it to the local database for record in qs: try: record.save(using='default') except IntegrityError: # Skip as the record probably already exists pass

Read the article

DAL Layer : EF 4.0 or Normal Data access layer with Stored Procedure

- by Harryboy

Hello Experts, Application : I am working on one mid-large size application which will be used as a product, we need to decide on our DAL layer. Application UI is in Silverlight and DAL layer is going to be behind service layer. We are also moving ahead with domain model, so our DB tables and domain classes are not having same structure. So patterns like Data Mapper and Repository will definitely come into picture. I need to design DAL Layer considering below mentioned factors in priority manner Speed of Development with above average performance Maintenance Future support and stability of the technology Performance Limitation : 1) As we need to strictly go ahead with microsoft, we can not use NHibernate or any other ORM except EF 4.0 2) We can use any code generation tool (Should be Open source or very cheap) but it should only generate code in .Net, so there would not be any licensing issue on per copy basis. Questions I read so many articles about EF 4.0, on outset it looks like that it is still lacking in features from NHibernate but it is considerably better then EF 1.0 So, Do you people feel that we should go ahead with EF 4.0 or we should stick to ADO .Net and use any code geneartion tool like code smith or any other you feel best Also i need to answer questions like what time it will take to port application from EF 4.0 to ADO .Net if in future we stuck up with EF 4.0 for some features or we are having serious performance issue. In reverse case if we go ahead and choose ADO .Net then what time it will take to swith to EF 4.0 Lastly..as i was going through the article i found the code only approach (with POCO classes) seems to be best suited for our requirement as switching is really easy from one technology to other. Please share your thoughts on the same and please guide on the above questions

Read the article

SQL Server lock/hang issue

- by mattwoberts

Hi, I'm using SQL Server 2008 on Windows Server 2008 R2, all sp'd up. I'm getting occasional issues with SQL Server hanging with the CPU usage on 100% on our live server. It seems all the wait time on SQL Sever when this happens is given to SOS_SCHEDULER_YIELD. Here is the Stored Proc that causes the hang. I've added the "WITH (NOLOCK)" in an attempt to fix what seems to be a locking issue. ALTER PROCEDURE [dbo].[MostPopularRead] AS BEGIN SET NOCOUNT ON; SELECT c.ForeignId , ct.ContentSource as ContentSource , sum(ch.HitCount * hw.Weight) as Popularity , (sum(ch.HitCount * hw.Weight) * 100) / @Total as Percent , @Total as TotalHits from ContentHit ch WITH (NOLOCK) join [Content] c WITH (NOLOCK) on ch.ContentId = c.ContentId join HitWeight hw WITH (NOLOCK) on ch.HitWeightId = hw.HitWeightId join ContentType ct WITH (NOLOCK) on c.ContentTypeId = ct.ContentTypeId where ch.CreatedDate between @Then and @Now group by c.ForeignId , ct.ContentSource order by sum(ch.HitCount * hw.HitWeightMultiplier) desc END The stored proc reads from the table "ContentHit", which is a table that tracks when content on the site is clicked (it gets hit quite frequently - anything from 4 to 20 hits a minute). So its pretty clear that this table is the source of the problem. There is a stored proc that is called to add hit tracks to the ContentHit table, its pretty trivial, it just builds up a string from the params passed in, which involves a few selects from some lookup tables, followed by the main insert: BEGIN TRAN insert into [ContentHit] (ContentId, HitCount, HitWeightId, ContentHitComment) values (@ContentId, isnull(@HitCount,1), isnull(@HitWeightId,1), @ContentHitComment) COMMIT TRAN The ContentHit table has a clustered index on its ID column, and I've added another index on CreatedDate since that is used in the select. When I profile the issue, I see the Stored proc executes for exactly 30 seconds, then the SQL timeout exception occurs. If it makes a difference the web application using it is ASP.NET, and I'm using Subsonic (3) to execute these stored procs. Can someone please advise how best I can solve this problem? I don't care about reading dirty data... Thanks

Read the article

export and import utf8 data in mysql: best practices

- by ChrisRamakers

We're often faced with the need to send a data file to one of our clients with data from the database he/she needs to translate. Most of the time this export is CSV or XLS. Most of the time we create a csv dump with phpmyadmin and get an xls file in return with the translated data. The problem is that most of the time the data is UTF8 and when the file is returned as xls each and every time we load the data into mysql again we end up with utf8 problems, characters not being displayed properly, etc ... We've already doublechecked everything in mysql from my.conf to column charactersets and everything is set correctly to UTF8. My question is not how to fix the encoding issue since that's been solved but how we would best proceed in the future handling this situation? What export format should we hand over? How should we import (just mysql load data infile or our own processing scripts). What is the general consensus on how to handle this situation? We would like to continue using excel if possible since that's the format almost everybody expects including our clients' translation agencies. Our clients' ease of use is the most important factor here, without overloading us with major issues each time. The best of both worlds :)

Read the article

Where is my python script spending time? Is there "missing time" in my cprofile / pstats trace?

- by fmark

I am attempting to profile a long running python script. The script does some spatial analysis on raster GIS data set using the gdal module. The script currently uses three files, the main script which loops over the raster pixels called find_pixel_pairs.py, a simple cache in lrucache.py and some misc classes in utils.py. I have profiled the code on a moderate sized dataset. pstats returns: p.sort_stats('cumulative').print_stats(20) Thu May 6 19:16:50 2010 phes.profile 355483738 function calls in 11644.421 CPU seconds Ordered by: cumulative time List reduced from 86 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.008 0.008 11644.421 11644.421 <string>:1(<module>) 1 11064.926 11064.926 11644.413 11644.413 find_pixel_pairs.py:49(phes) 340135349 544.143 0.000 572.481 0.000 utils.py:173(extent_iterator) 8831020 18.492 0.000 18.492 0.000 {range} 231922 3.414 0.000 8.128 0.000 utils.py:152(get_block_in_bands) 142739 1.303 0.000 4.173 0.000 utils.py:97(search_extent_rect) 745181 1.936 0.000 2.500 0.000 find_pixel_pairs.py:40(is_no_data) 285478 1.801 0.000 2.271 0.000 utils.py:98(intify) 231922 1.198 0.000 2.013 0.000 utils.py:116(block_to_pixel_extent) 695766 1.990 0.000 1.990 0.000 lrucache.py:42(get) 1213166 1.265 0.000 1.265 0.000 {min} 1031737 1.034 0.000 1.034 0.000 {isinstance} 142740 0.563 0.000 0.909 0.000 utils.py:122(find_block_extent) 463844 0.611 0.000 0.611 0.000 utils.py:112(block_to_pixel_coord) 745274 0.565 0.000 0.565 0.000 {method 'append' of 'list' objects} 285478 0.346 0.000 0.346 0.000 {max} 285480 0.346 0.000 0.346 0.000 utils.py:109(pixel_coord_to_block_coord) 324 0.002 0.000 0.188 0.001 utils.py:27(__init__) 324 0.016 0.000 0.186 0.001 gdal.py:848(ReadAsArray) 1 0.000 0.000 0.160 0.160 utils.py:50(__init__) The top two calls contain the main loop - the entire analyis. The remaining calls sum to less than 625 of the 11644 seconds. Where are the remaining 11,000 seconds spent? Is it all within the main loop of find_pixel_pairs.py? If so, can I find out which lines of code are taking most of the time?

Read the article

sorting filtered data in asp.net listview

- by user791345

I've created a listview that's filled up with a list of guitars from the database on page load like so: protected void Page_Load(object sender, EventArgs e) { SqlConnection con = new SqlConnection(WebConfigurationManager.ConnectionStrings["GuitarsLTDBConnectionString"].ToString()); SqlCommand cmd = new SqlCommand("SELECT * FROM Guitars", con); SqlDataAdapter da = new SqlDataAdapter(cmd.CommandText, con); DataTable dt = new DataTable(); da.Fill(dt); lvGuitars.DataSource = dt; lvGuitars.DataBind(); } The following code filters that list of guitars by a certain Make when the user checks the checkbox corresponding to that make protected void chkOrgs_SelectedIndexChanged(object sender, EventArgs e) { DataTable dt = (DataTable)lvGuitars.DataSource; DataView dv = new DataView(dt); if (chkOrgs.SelectedValue == "Gibson") { dv.RowFilter = "Make = 'Gibson' OR Make='Fender'"; } lvGuitars.DataSource = dv.ToTable(); lvGuitars.DataBind(); } Now, what I want to do is be able to sort the latest data that is present within the listview. Meaning, if sort is clicked before filtering, the it should sort all data. If sort is clicked after filtering, it should sort the filtered data. I'm using the following code, which is triggered upon a LinkButton click protected void lnkSortResults_Click(object sender, EventArgs e) { DataTable dt = (DataTable)lvGuitars.DataSource; DataView dv = new DataView(dt); dv.Sort = "Make ASC"; lvGuitars.DataSource = dv.ToTable(); lvGuitars.DataBind(); } The problem is that all the data that the listview was loaded with before any filtering is sorted, and not the latest filtered data. How can I change this code so that the latest data available in the listview is the one that's sorted? Thanks

Read the article

Python socket error on UDP data receive. (10054)

- by Charles

I currently have a problem using UDP and Python socket module. We have a server and clients. The problem occurs when we send data to a user. It's possible that user may have closed their connection to the server through a client crash, disconnect by ISP, or some other improper method. As such, it is possible to send data to a closed socket. Of course with UDP you can't tell if the data really reached or if it's closed, as it doesn't care (atleast, it doesn't bring up an exception). However, if you send data and it is closed off, you get data back somehow (???), which ends up giving you a socket error on sock.recvfrom. [Errno 10054] An existing connection was forcibly closed by the remote host. Almost seems like an automatic response from the connection. Although this is fine, and can be handled by a try: except: block (even if it lowers performance of the server a little bit). The problem is, I can't tell who this is coming from or what socket is closed. Is there anyway to find out 'who' (ip, socket #) sent this? It would be great as I could instantly just disconnect them and remove them from the data. Any suggestions? Thanks.

Read the article

Plotting andrews curves of subsets of a data frame on the same plot

- by user2976477

I have a data frame of 12 columns and I want to plot andrews curves in R of this data, basing the color of the curves on the 12th columns. Below are a few samples from the data (sorry the columns are not aligned with the numbers) Teacher_explaining Teacher_enthusiastic Teacher_material_interesting Material_stimulating Material_useful Clear_marking Marking_fair Feedback_prompt Feedback_clarifies Detailed_comments Notes Year 80 80 80 80 85 85 80 80 80 80 70 3 70 60 30 40 70 60 30 40 70 0 30 3 100 90 90 80 80 100 100 90 100 100 100 MSc 85 85 85 90 90 70 90 50 70 80 100 MSc 90 50 90 90 90 70 100 50 80 100 100 4 100 80 80 75 90 80 80 50 80 80 90 3 From this data I tried to plot andrews curves using the code below: install.packages("andrews") library(andrews) col <- as.numeric(factor(course[,12])) andrews(course[,1:12], clr = 12) However, the 12th column has three groups (3 types of responses) and I want to group two of them and then plot the andrews curve of the data, without editing my data frame in Excel. x <- subset(course, Year == "MSc" & "4") y <- subset(course, Year == "3") I tried the above code, but my argument for x don't work. "MSc", "3" and "4" are the groups in the 12th column, and I want to group MSc and 4 so that their Andrews curves have the same color. If you have any idea how to do this, please let me know.

Read the article

Optimizing MySQL for ALTER TABLE of InnoDB

- by schuilr

Sometime soon we will need to make schema changes to our production database. We need to minimize downtime for this effort, however, the ALTER TABLE statements are going to run for quite a while. Our largest tables have 150 million records, largest table file is 50G. All tables are InnoDB, and it was set up as one big data file (instead of a file-per-table). We're running MySQL 5.0.46 on an 8 core machine, 16G memory and a RAID10 config. I have some experience with MySQL tuning, but this usually focusses on reads or writes from multiple clients. There is lots of info to be found on the Internet on this subject, however, there seems to be very little information available on best practices for (temporarily) tuning your MySQL server to speed up ALTER TABLE on InnoDB tables, or for INSERT INTO .. SELECT FROM (we will probably use this instead of ALTER TABLE to have some more opportunities to speed things up a bit). The schema changes we are planning to do is adding a integer column to all tables and make it the primary key, instead of the current primary key. We need to keep the 'old' column as well so overwriting the existing values is not an option. What would be the ideal settings to get this task done as quick as possible?

Read the article

Association and model data saving problem

- by Zhlobopotam

Developing with cakephp 1.3 (latest from github). There are 2 models bind with hasAndBelongsToMany: documents and tags. Document can have many tags in other words. I've add a new document submitting form there user can enter a list of tags separated with commas (new tag will be added, if not exist already). I looked at cakephp bakery 2.0 source code on github and found the solution. But it seems that something is wrong. class Document extends AppModel { public $hasAndBelongsToMany = array('Tag'); public function beforeSave($options = array()) { if (isset($this->data[$this->alias]['tags']) && !empty($this- >data[$this->alias]['tags'])) { $tagIds = $this->Tag->saveDocTags($this->data[$this->alias] ['tags']); unset($this->data[$this->alias]['tags']); $this->data[$this->Tag->alias][$this->Tag->alias] = $tagIds; } return true; } } class Tag extends AppModel { public $hasAndBelongsToMany = array ('Document'); public function saveDocTags($commalist = '') { if ($commalist == '') return null; $tags = explode(',',$commalist); if (empty($tags)) return null; $existing = $this->find('all', array( 'conditions' => array('title' => $tags) )); $return = Set::extract($existing,'/Tag/id'); if (sizeof($existing) == sizeof($tags)) { return $return; } $existing = Set::extract($existing,'/Tag/title'); foreach ($tags as $tag) { if (!in_array($tag, $existing)) { $this->create(array('title' => $tag)); $this->save(); $return[] = $this->id; } } return $return; } } So, new tags creation works well but document model can't save association data and tells: SQL Error: 1054: Unknown column 'Array' in 'field list' Query: INSERT INTO documents (title, content, shortnfo, date, status) VALUES ('Document with tags', '', '', Array, 1) Any ideas how to solve this problem?

Read the article

MySQL table data transformation -- how can I dis-aggreate MySQL time data?

- by lighthouse65

We are coding for a MySQL data warehousing application that stores descriptive data (User ID, Work ID, Machine ID, Start and End Time columns in the first table below) associated with time and production quantity data (Output and Time columns in the first table below) upon which aggregate (SUM, COUNT, AVG) functions are applied. We now wish to dis-aggregate time data for another type of analysis. Our current data table design: +---------+---------+------------+---------------------+---------------------+--------+------+ | User ID | Work ID | Machine ID | Event Start Time | Event End Time | Output | Time | +---------+---------+------------+---------------------+---------------------+--------+------+ | 080025 | ABC123 | M01 | 2008-01-24 16:19:15 | 2008-01-24 16:34:45 | 2120 | 930 | +---------+---------+------------+---------------------+---------------------+--------+------+ Reprocessing dis-aggregation that we would like to do would be to transform table content based on a granularity of minutes, rather than the current production event ("Event Start Time" and "Event End Time") granularity. The resulting reprocessing of existing table rows would look like: +---------+---------+------------+---------------------+--------+ | User ID | Work ID | Machine ID | Production Minute | Output | +---------+---------+------------+---------------------+--------+ | 080025 | ABC123 | M01 | 2010-01-24 16:19 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:20 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:21 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:23 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:24 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:25 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:26 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:27 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:28 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:29 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:30 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:31 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:33 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:34 | 133 | +---------+---------+------------+---------------------+--------+ So the reprocessing would take an existing row of data created at the granularity of production event and modify the granularity to minutes, eliminating redundant (Event End Time, Time) columns while doing so. It assumes a constant rate of production and divides output by the difference in minutes plus one to populate the new table's Output column. I know this can be done in code...but can it be done entirely in a MySQL insert statement (or otherwise entirely in MySQL)? I am thinking of a INSERT ... INTO construction but keep getting stuck. An additional complexity is that there are hundreds of machines to include in the operation so there will be multiple rows (one for each machine) for each minute of the day. Any ideas would be much appreciated. Thanks.

Read the article

Problem using structured data with sproxy-generated proxy c++ class

- by Odrade

I am attempting to communicate structured data types between a Visual C++ client application and an ASP.NET web service. I'm am having issues whenever any parameter or return type is not a basic type (e.g. string, int, float, etc). To illustrate the issue, I created the following ASP.NET web service: namespace TestWebService { [WebService(Namespace = "http://localhost/TestWebService")] [WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)] [ToolboxItem(false)] public class Service1 : System.Web.Services.WebService { [WebMethod] public TestData StructuredOutput() { TestData td = new TestData(); td.data = 1729; return td; } } public class TestData { public int data; } } To consume the service, I created a dirt-simple Visual C++ client in VS2005. I added a web reference to the project, which caused sproxy to generate a proxy class for me. With the generated header properly included, I attempted to invoke the service like this: int _tmain(int argc, _TCHAR* argv[]) { CoInitialize(NULL); Service1::CService1 ws; Service1::TestData td; HRESULT hr = ws.StructuredOutput(&td); //data is returned as expected CoUninitialize(); return 0; } // crashes here with access violation The call to StructuredOutput returns the data as expected, but an access violation occurs on destruction of the CService1 object. The access violation is occurring here (from atlsoap.h): void UninitializeSOAP() { if (m_spReader.p != NULL) { m_spReader->putContentHandler(NULL); //access violation m_spReader.Release(); } } I see the same behavior when using a TestData object as an input parameter, or when using any other structured data types as input or output. When I use basic types for input/output from the web service I do not experience these errors. Any ideas about why this might be happening? Is sproxy screwing something up, or am I? NOTE: I'm aware of gSOAP and the wsdl2h tool, but those aren't freely available for commercial use (and nobody here is going to buy a license). I am open to alternatives for generating the c++ proxy, as long as they are free for commercial use.

Read the article

Data Flow Object Graph and An Execution Engine for it

- by M Dotnet

I would like to build a custom workflow engine from scratch. The input to this workflow is a data flow diagram which is composed of a series of activities connected together through lines where each each represent the data flow. Each activity can export multiple outputs. Activities are complex math functions but the logic is hidden from the user. My workflow engine job is to execute the given data flow diagram. Each activity within the data flow diagram is a custom activity and each activity can output different outputs. How do you suggest to model the data flow diagram object? I need to be able to construct the data flow diagram problematically (no need for drag and drop) but I need to display the final result graphically (for display and debugging purposes). Are there any libraries out there that I could use? Should I keep the workflow presentable as an xml? I know that there are many projects out there trying to essentially doing similar thing by building such workflow engines but I need something light weight and open source. I do not need any state machine execution engine and mine is primarily sequential workflow with fork and join capabilities. My activities are wrappable as basic C# classes and I do NOT want to use anything as heavy as .NET workflow foundation.

Read the article

Should I define a single "DataContext" and pass references to it around or define muliple "DataConte

- by Nate Bross

I have a Silverlight application that consists of a MainWindow and several classes which update and draw images on the MainWindow. I'm now expanding this to keep track of everything in a database. Without going into specifics, lets say I have a structure like this: MainWindow Drawing-Surface Class1 -- Supports Drawing DataContext + DataServiceCollection<T> w/events Class2 -- Manages "transactions" (add/delete objects from drawing) Class3 Each "Class" is passed a reference to the Drawing Surface so they can interact with it independently. I'm starting to use WCF Data Services in Class1 and its working well; however, the other classes are also going to need access to the WCF Data Services. (Should I define my "DataContext" in MainWindow and pass a reference to each child class?) Class1 will need READ access to the "transactions" data, and Class2 will need READ access to some of the drawing data. So my question is, where does it make the most sense to define my DataContext? Does it make sense to: Define a "global" WCF Data Service "Context" object and pass references to that in all of my subsequent classes? Define an instance of the "Context" for each Class1, Class2, etc Have each method that requires access to data define its own instance of the "Context" and use closures handle the async load/complete events? Would a structure like this make more sense? Is there any danger in keeping an active "DataContext" open for an extended period of time? Typical usecase of this application could range from 1 minute to 40+ minutes. MainWindow Drawing-Surface DataContext Class1 -- Supports Drawing DataServiceCollection<DrawingType> w/events Class2 -- Manages "transactions" (add/delete objects from drawing) DataServiceCollection<TransactionType> w/events Class3 DataServiceCollection<T> w/events

Read the article

Optimal two variable linear regression calculation

- by Dave Jarvis

Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL Code SELECT ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT, FROM (SELECT D.AMOUNT, Y.YEAR FROM CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- C.ID = 8590 AND -- Find all the stations within a 15 unit radius ... -- SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) < 15 AND -- Gather all known years for that station ... -- S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND -- The data before 1900 is shaky; insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = '001' AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID <> 'M' GROUP BY Y.YEAR ORDER BY Y.YEAR ) t Data The data is visualized here: Question The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)? (1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228 (2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406 I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65). Related Sites Least absolute deviations Robust regression Thank you!

Read the article

Suggested (simple) approach for drawing large numbers of visual elements in WPF?

- by Ender

I'm writing an interface that features a large (~50000px width) "canvas"-type area that is used to display a lot of data in a fairly novel way. This involves lots of lines, rectangles, and text. The user can scroll around to explore the entire canvas. At the moment I'm just using a standard Canvas panel with various Shapes placed on it. This is nice and easy to do: construct a shape, assign some coordinates, and attach it to the Canvas. Unfortunately, it's pretty slow (to construct the children, not to do the actual rendering). I've looked into some alternatives, it's a bit intimidating. I don't need anything fancy - just the ability to efficiently construct and place objects in a coordinate plane. If all I get are lines, colored rectangles, and text, I'll be happy. Do I need Geometry instances inside of Geometry Groups inside of GeometryDrawings inside of some Panel container? Note: I'd like to include text and graphics (i.e. colored rectangles) in the same space, if possible.

Read the article

Database and logic layer for ASP.NET MVC application

- by Ismail

I'm going to start a new project which is going to be small initially but may grow to big over the years. I'm strongly convinced that I'm going to use ASP.NET MVC with jQuery for UI. I want to go for MySQL as database for some reasons but worried on few things. I've a good years of experience working on SQL Server databases and on one project I've had a bad experience creating and managing stored procedures on MySQL database. I'm totally new to Linq but I see that it is easier to use once you are familiar with it. First thing is that accessing data should be easy. So I thought I should use MySQL to Linq but somewhere I read that it is not directly supported but MySQL .NET connector adds support for EntityFramework. I don't know what are the pros and cons of it. I would love if I can implement repository pattern as it allows to apply filter in logic layer rather than in data access layer. Will it be possible if I use Entity Framework? I'm not clear on how I should go about all this or I should just forget every thing and directly use SQL to Linq on SQL Server. I'm also concerned about the performance. Someone told me that if we use Entity framework it fetches lot of data and then filter it. Is that right? So questions basically are - Is MySQL to Linq possible? If yes where can I get more details on it? Pros and cons of using EntityFramework with MySQL? Will it be easy to access data using EntityFramework with MySQL? Will I be able to implement repository patter which allows applying filter in logic layer rather than data access layer (when I use EntityFramework with MySQL) Does it fetches hell lot of data from database and then apply filter on it? If it sounds too many questions from my side in that case, if you can just let me know what you will do (with a considerable reason) in this situation as an experienced person in this area, that should answer my question.

Search Results

Search found 80052 results on 3203 pages for 'data load performance'.

Page 405/3203 | < Previous Page | 401 402 403 404 405 406 407 408 409 410 411 412 | Next Page >

- by Jenkz

- by Igor

- by luckyluke

- by Peter Hajas

- by Allan Jardine

- by lighthouse65

- by Jeremy

- by usef_ksa

- by T. Stone

- by Harryboy

- by mattwoberts

- by ChrisRamakers

- by fmark

- by user791345

- by Charles

- by user2976477

- by schuilr

- by Zhlobopotam

- by lighthouse65

- by Odrade

- by M Dotnet

- by Nate Bross

- by Dave Jarvis

- by Ender

- by Ismail

< Previous Page | 401 402 403 404 405 406 407 408 409 410 411 412 | Next Page >