Search Results

Search found 65999 results on 2640 pages for 'large data volumes'.


  • Are there any frameworks for data subscription and update?

    - by Timothy Pratley
    There is one server with multiple clients. The clients are viewing subsets of the server's entire data. If the data that a client is viewing changes, the client should be informed of the changes so that it displays the current data. Example: two clients are viewing a list of users in an administration screen. One client adds a new user to the list and modifies the permissions of another user. The other client sees both changes propagated to its view.
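    The question doesn't name a language, but the shape being asked about is an observer/publish-subscribe registry on the server; below is a minimal hand-rolled sketch in Java. The class and method names are illustrative only, and a real framework would add the network transport (push to remote clients) on top of this pattern.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical listener a client registers to be told about changes
    // to the subset of data it is viewing.
    interface UserListListener {
        void userAdded(String userName);
        void userPermissionsChanged(String userName, String newPermissions);
    }

    // Server-side registry: every mutation notifies all subscribed clients,
    // so each client's view can refresh itself with the current data.
    class UserListPublisher {
        private final List<UserListListener> listeners = new CopyOnWriteArrayList<>();

        void subscribe(UserListListener listener)   { listeners.add(listener); }
        void unsubscribe(UserListListener listener) { listeners.remove(listener); }

        void addUser(String userName) {
            // ... persist the new user ...
            for (UserListListener l : listeners) {
                l.userAdded(userName);
            }
        }

        void changePermissions(String userName, String newPermissions) {
            // ... persist the change ...
            for (UserListListener l : listeners) {
                l.userPermissionsChanged(userName, newPermissions);
            }
        }
    }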

    Read the article

  • MySQL Config File for Large System

    - by Jonathon
    We are running MySQL on a Windows 2003 Server Enterpise Edition box. MySQL is about the only program running on the box. We have approx. 8 slaves replicated to it, but my understanding is that having multiple slaves connecting to the same master does not significantly slow down performance, if at all. The master server has 16G RAM, 10 Terabyte drives in RAID 10, and four dual-core processors. From what I have seen from other sites, we have a really robust machine as our master db server. We just upgraded from a machine with only 4G RAM, but with similar hard drives, RAID, etc. It also ran Apache on it, so it was our db server and our application server. It was getting a little slow, so we split the db server onto this new machine and kept the application server on the first machine. We also distributed the application load amongst a few of our other slave servers, which also run the application. The problem is the new db server has mysqld.exe consuming 95-100% of CPU almost all the time and is really causing the app to run slowly. I know we have several queries and table structures that could be better optimized, but since they worked okay on the older, smaller server, I assume that our my.ini (MySQL config) file is not properly configured. Most of what I see on the net is for setting config files on small machines, so can anyone help me get the my.ini file correct for a large dedicated machine like ours? I just don't see how mysqld could get so bogged down! FYI: We have about 100 queries per second. We only use MyISAM tables, so skip-innodb is set in the ini file. And yes, I know it is reading the ini file correctly because I can change some settings (like the server-id and it will kill the server at startup). Here is the my.ini file: #MySQL Server Instance Configuration File # ---------------------------------------------------------------------- # Generated by the MySQL Server Instance Configuration Wizard # # # Installation Instructions # ---------------------------------------------------------------------- # # On Linux you can copy this file to /etc/my.cnf to set global options, # mysql-data-dir/my.cnf to set server-specific options # (@localstatedir@ for this installation) or to # ~/.my.cnf to set user-specific options. # # On Windows you should keep this file in the installation directory # of your server (e.g. C:\Program Files\MySQL\MySQL Server X.Y). To # make sure the server reads the config file use the startup option # "--defaults-file". # # To run run the server from the command line, execute this in a # command line shell, e.g. # mysqld --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini" # # To install the server as a Windows service manually, execute this in a # command line shell, e.g. # mysqld --install MySQLXY --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini" # # And then execute this in a command line shell to start the server, e.g. # net start MySQLXY # # # Guildlines for editing this file # ---------------------------------------------------------------------- # # In this file, you can use all long options that the program supports. # If you want to know the options a program supports, start the program # with the "--help" option. # # More detailed information about the individual options can also be # found in the manual. # # # CLIENT SECTION # ---------------------------------------------------------------------- # # The following options will be read by MySQL client applications. 
# Note that only client applications shipped by MySQL are guaranteed # to read this section. If you want your own MySQL client program to # honor these values, you need to specify it as an option during the # MySQL client library initialization. # [client] port=3306 [mysql] default-character-set=latin1 # SERVER SECTION # ---------------------------------------------------------------------- # # The following options will be read by the MySQL Server. Make sure that # you have installed the server correctly (see above) so it reads this # file. # [mysqld] # The TCP/IP Port the MySQL Server will listen on port=3306 #Path to installation directory. All paths are usually resolved relative to this. basedir="D:/MySQL/" #Path to the database root datadir="D:/MySQL/data" # The default character set that will be used when a new schema or table is # created and no character set is defined default-character-set=latin1 # The default storage engine that will be used when create new tables when default-storage-engine=MYISAM # Set the SQL mode to strict #sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" # we changed this because there are a couple of queries that can get blocked otherwise sql-mode="" #performance configs skip-locking max_allowed_packet = 1M table_open_cache = 512 # The maximum amount of concurrent sessions the MySQL server will # allow. One of these connections will be reserved for a user with # SUPER privileges to allow the administrator to login even if the # connection limit has been reached. max_connections=1510 # Query cache is used to cache SELECT results and later return them # without actual executing the same query once again. Having the query # cache enabled may result in significant speed improvements, if your # have a lot of identical queries and rarely changing tables. See the # "Qcache_lowmem_prunes" status variable to check if the current value # is high enough for your load. # Note: In case your tables change very often or if your queries are # textually different every time, the query cache may result in a # slowdown instead of a performance improvement. query_cache_size=168M # The number of open tables for all threads. Increasing this value # increases the number of file descriptors that mysqld requires. # Therefore you have to make sure to set the amount of open files # allowed to at least 4096 in the variable "open-files-limit" in # section [mysqld_safe] table_cache=3020 # Maximum size for internal (in-memory) temporary tables. If a table # grows larger than this value, it is automatically converted to disk # based table This limitation is for a single table. There can be many # of them. tmp_table_size=30M # How many threads we should keep in a cache for reuse. When a client # disconnects, the client's threads are put in the cache if there aren't # more than thread_cache_size threads from before. This greatly reduces # the amount of thread creations needed if you have a lot of new # connections. (Normally this doesn't give a notable performance # improvement if you have a good thread implementation.) thread_cache_size=64 #*** MyISAM Specific options # The maximum size of the temporary file MySQL is allowed to use while # recreating the index (during REPAIR, ALTER TABLE or LOAD DATA INFILE. # If the file-size would be bigger than this, the index will be created # through the key cache (which is slower). 
myisam_max_sort_file_size=100G # If the temporary file used for fast index creation would be bigger # than using the key cache by the amount specified here, then prefer the # key cache method. This is mainly used to force long character keys in # large tables to use the slower key cache method to create the index. myisam_sort_buffer_size=64M # Size of the Key Buffer, used to cache index blocks for MyISAM tables. # Do not set it larger than 30% of your available memory, as some memory # is also required by the OS to cache rows. Even if you're not using # MyISAM tables, you should still set it to 8-64M as it will also be # used for internal temporary disk tables. key_buffer_size=3072M # Size of the buffer used for doing full table scans of MyISAM tables. # Allocated per thread, if a full scan is needed. read_buffer_size=2M read_rnd_buffer_size=8M # This buffer is allocated when MySQL needs to rebuild the index in # REPAIR, OPTIMZE, ALTER table statements as well as in LOAD DATA INFILE # into an empty table. It is allocated per thread so be careful with # large settings. sort_buffer_size=2M #*** INNODB Specific options *** innodb_data_home_dir="D:/MySQL InnoDB Datafiles/" # Use this option if you have a MySQL server with InnoDB support enabled # but you do not plan to use it. This will save memory and disk space # and speed up some things. skip-innodb # Additional memory pool that is used by InnoDB to store metadata # information. If InnoDB requires more memory for this purpose it will # start to allocate it from the OS. As this is fast enough on most # recent operating systems, you normally do not need to change this # value. SHOW INNODB STATUS will display the current amount used. innodb_additional_mem_pool_size=11M # If set to 1, InnoDB will flush (fsync) the transaction logs to the # disk at each commit, which offers full ACID behavior. If you are # willing to compromise this safety, and you are running small # transactions, you may set this to 0 or 2 to reduce disk I/O to the # logs. Value 0 means that the log is only written to the log file and # the log file flushed to disk approximately once per second. Value 2 # means the log is written to the log file at each commit, but the log # file is only flushed to disk approximately once per second. innodb_flush_log_at_trx_commit=1 # The size of the buffer InnoDB uses for buffering log data. As soon as # it is full, InnoDB will have to flush it to disk. As it is flushed # once per second anyway, it does not make sense to have it very large # (even with long transactions). innodb_log_buffer_size=6M # InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and # row data. The bigger you set this the less disk I/O is needed to # access data in tables. On a dedicated database server you may set this # parameter up to 80% of the machine physical memory size. Do not set it # too large, though, because competition of the physical memory may # cause paging in the operating system. Note that on 32bit systems you # might be limited to 2-3.5G of user level memory per process, so do not # set it too high. innodb_buffer_pool_size=500M # Size of each log file in a log group. You should set the combined size # of log files to about 25%-100% of your buffer pool size to avoid # unneeded buffer pool flush activity on log file overwrite. However, # note that a larger logfile size will increase the time needed for the # recovery process. innodb_log_file_size=100M # Number of threads allowed inside the InnoDB kernel. 
The optimal value # depends highly on the application, hardware as well as the OS # scheduler properties. A too high value may lead to thread thrashing. innodb_thread_concurrency=10 #replication settings (this is the master) log-bin=log server-id = 1 Thanks for all the help. It is greatly appreciated.

    Read the article

  • Best way to migrate servers without losing any data and with no downtime(?)

    - by ina
    This is a methodology question from a freelancer, with a corollary on MySQL. Is there a way to migrate from an old dedicated server to a new one without losing any data in between, and with no downtime? In the past, I've had to lose MySQL data between the time the new server goes up (i.e., all files transferred, system up and ready) and the time I take the old server down (data is still being written to the old server until the new one takes over). There is also a short period where both appear down while DNS, etc., refreshes. Is there a way for MySQL/root to easily transfer all data that was updated/inserted within a certain time frame?

    Read the article

  • Is there a difference between transient properties defined in the data model and ones defined only in the custom subclass?

    - by mystify
    I was reading that setting the value of a transient property always results in marking the managed object as "dirty". However, what I don't get is this: if I make a subclass of NSManagedObject and add some extra properties which I don't need persisted, how does Core Data know about them, and how can it mark the object as dirty when I access them? Again, they're not defined in the data model, so Core Data has no good hint that they are there. Or does Core Data use some kind of introspection to analyze my custom class and figure out which properties I have in there?

    Read the article

  • Browsers (IE and Firefox) freeze when copying large amount of text

    - by Matt
    I have a web application - a Java servlet - that delivers data to users in the form of a text printout in a browser (text marked up with HTML in order to display in the browser as we want it to). The text does display in different colors, though most of it is black. One typical mode of operation is this: 1. User submits a form to request data. 2. Servlet delivers HTML file to browser. 3. User does CTRL+A to select all the text. 4. User does CTRL+C to copy all the text. 5. User goes to a text editor and does CTRL+V to paste the text. In the testing where I'm having this problem, step #2 successfully loads all the data - we wait for that to complete. We can scroll down to the end of what the browser loaded and see the end of the data. However, the browser freezes on step #3 (Firefox) or on step #4 (IE). Because step #2 finishes, I think it is a browser/memory issue, and not an issue with the web application. If I run queries to deliver smaller amounts of data (but after several queries we get the same data we would have above in one query) and copy/paste this text, the file I save it into ends up being about 8 MB. If I save the browser's displayed HTML to a file on my computer via File-Save As from the browser menu, it works fine and the file is about 22 MB. We've tried this on 2 different computers at work (both running Windows XP, with at least 2 GB of RAM and many GB of free disk space), using Firefox and IE. We also tried it on a home computer from a home network outside of work (thinking it might be our IT security software causing the problem), running Windows 7 using IE, and still had the problem. When I've done this, I can see whatever browser I'm using utilizing the CPU at 50%. Firefox's memory usage grows to about 1 GB; IE's stays in the several hundred MBs. We once let this run for half an hour, and it did not complete. I'm most likely going to modify the web app to have an option of delivering a plain text file for download, and I imagine that will get the users what they need. But for the mean time, and because I'm curious - and I don't like my application freezing people's browsers, does anyone have any ideas about the browser freezing? I understand that sometimes you just reach your memory limit, but 22 MB sounds to me like an amount I should be able to copy to the clipboard.
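    The poster already plans a plain-text download option, and the application is a Java servlet, so one low-effort version of that option is to stream the same data with a Content-Disposition: attachment header so the browser saves it rather than rendering (and copying) it. A minimal sketch only; the class name and writeReport helper are made up for illustration.

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical companion servlet: serves the report as a downloadable
    // text file instead of an HTML page, sidestepping the clipboard entirely.
    public class ReportDownloadServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/plain");
            resp.setCharacterEncoding("UTF-8");
            resp.setHeader("Content-Disposition", "attachment; filename=\"report.txt\"");

            PrintWriter out = resp.getWriter();
            // writeReport(...) stands in for whatever currently produces the
            // marked-up output; here it would emit plain text line by line.
            writeReport(req, out);
            out.flush();
        }

        private void writeReport(HttpServletRequest req, PrintWriter out) {
            out.println("...report rows go here...");
        }
    }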

    Read the article

  • Javascript large number array compression

    - by gatapia
    Hi All, I've got a javascript application that sends a large amount of numerical data down the wire. This data is then stored in a database. I am having size issues (too much bandwidth, database getting too big). I am now ready to sacrifice some performance for compression. I was thinking of implementing a base 62 number.toString(62) and parseInt(compressed, 62). This would certainly reduce the size of the data but before I go ahead and do this I thought I would put it to the folks here as I know there must be some outside the box solution I have not considered. The basic specs are: - Compress large number arrays into strings for JSONP transfer (So I think UTF is out) - Be relatively fast, look I'm not expecting same performance as I have now but I also don't want gzip compression either. Any ideas would be greatly appreciated. Thanks Guido Tapia

    Read the article

  • Download large files using Java

    - by angelina
    Dear all, I'm building an application in which I want to download large files to a handset (mobile), but if the file is large I get a SocketException (broken pipe). Here is the code: inputStream = new FileInputStream(path); byte[] buffer = new byte[1024]; int bytesRead = 0; do { bytesRead = inputStream.read(buffer, offset, buffer.length); resp.getOutputStream().write(buffer, 0, bytesRead); } while (bytesRead == buffer.length); resp.getOutputStream().flush(); }
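    For reference, a hedged correction of the copy loop above, keeping the poster's path and resp variables: the posted loop exits as soon as read() returns less than a full buffer, can pass -1 to write() at end of stream, and never closes the input stream. A client that disconnects mid-download will still surface as a broken pipe, but that can then be caught and logged rather than treated as a server fault.

    // Needs java.io.FileInputStream and java.io.InputStream imports.
    // Loop until read() returns -1, write only what was read,
    // and close the stream in a finally block.
    InputStream inputStream = new FileInputStream(path);
    try {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = inputStream.read(buffer)) != -1) {
            resp.getOutputStream().write(buffer, 0, bytesRead);
        }
        resp.getOutputStream().flush();
    } finally {
        inputStream.close();
    }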

    Read the article

  • Streaming large result sets with MySQL

    - by configurator
    I'm developing a Spring application that uses large MySQL tables. When loading large tables, I get an OutOfMemoryError, since the driver tries to load the entire table into application memory. I tried using statement.setFetchSize(Integer.MIN_VALUE); but then every ResultSet I open hangs on close(); looking online I found that this happens because the driver tries to load any unread rows before closing the ResultSet, but that is not the case here since I do this: ResultSet existingRecords = getTableData(tablename); try { while (existingRecords.next()) { // ... } } finally { existingRecords.close(); // this line is hanging, and there was no exception in the try clause } The hang happens for small tables (3 rows) as well, and if I don't close the ResultSet (which happened in one method) then connection.close() hangs.
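    For comparison, a minimal sketch of the streaming pattern MySQL Connector/J documents: a forward-only, read-only statement with fetch size Integer.MIN_VALUE, with the streamed result set read to the end before the connection does anything else (the driver otherwise has to drain unread rows, which is consistent with the close() hang described). The JDBC URL, credentials and query below are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class StreamingExample {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "password")) {
                // Canonical Connector/J streaming setup.
                try (Statement stmt = conn.createStatement(
                        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                    stmt.setFetchSize(Integer.MIN_VALUE);
                    try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                        while (rs.next()) {
                            // process one row at a time without buffering the table
                        }
                    } // rows were fully consumed, so close() returns promptly
                }
            }
        }
    }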

    Read the article

  • Browser, upload large file

    - by Mike
    I'm looking for a way to allow a user to upload a large file (~1gb) to my unix server using a web page and browser. There are a lot of examples that illustrate how to do this with a traditional post request, however this doesn't seem like a good idea when the file is this large. I'm looking for recommendations on the best approach. Bonus points if the method includes a way of providing progress information to the user. For now security is not a major concern, as most users who will be using the service can be trusted. We can also assume that the connection between client and host will not be interrupted (or if it is they have to start over).
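    One minimal server-side approach, assuming a Java servlet container (the question doesn't name a stack): stream the request body to disk in small chunks so a ~1 GB file never has to fit in memory, and track bytes written against Content-Length for a rough progress figure. A sketch only; the target path is made up, and the client-side progress display and resumable uploads are left out.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical upload endpoint: copies the raw request body to disk.
    public class LargeUploadServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            long expected = req.getContentLength();   // -1 if unknown
            Path target = Paths.get("/var/uploads", "upload-" + System.nanoTime());

            long written = 0;
            try (InputStream in = req.getInputStream();
                 OutputStream out = Files.newOutputStream(target)) {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    written += n;
                    // written/expected could be exposed (e.g. via a status URL)
                    // to drive a progress bar on the client.
                }
            }
            resp.setStatus(expected < 0 || written == expected ? 200 : 500);
        }
    }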

    Read the article

  • R: How to write out a data.frame so that I can paste it into SO for others to read?

    - by John
    I have a large data.frame displaying some weird properties when plotted. I'd like to ask a question about it on Stack Overflow. To do that, I'd like to write the data.frame out in a form that I can paste into SO so that somebody else can easily run it and get it back into a data.frame object again. Is there an easy way to accomplish this? Also, if it is really long, should I use a pastebin instead of pasting it directly here?

    Read the article

  • Mass data store with SQL Server

    - by Leo
    We need to manage 10,000 GPS devices. Each device uploads a GPS record every 30 seconds, and these records need to be stored in the database (MS SQL Server 2005). Each device produces 24 * 60 * 2 = 2,880 records per day, so 10,000 devices produce 10000 * 2880 = 28,800,000 records per day. Each GPS record is approximately 160 bytes, so the amount of data per day is 28,800,000 * 160 = 4.29GB. We need to hold at least 3 months of GPS data in the database. My questions are: 1) Can SQL Server 2005 support storing such a large amount of data? 2) How should the data tables be planned? (All GPS data in one table? One table per day? One table per GPS device?) The GPS record: GPSID varchar(21), RecvTime datetime, GPSTime datetime, IsValid bit, IsNavi bit, Lng float, Lat float, Alt float, Spd smallint, Head smallint, PulseValue bigint, Oil float, TSW1 bigint, TSW1Mask bigint, TSW2 bigint, TSW2Mask, BSW bigint, StateText varchar(200), PosText varchar(200), UploadType tinyint

    Read the article

  • C# or Windows equivalent of OS X's Core Data?

    - by Nektarios
    I'm late to the boat and have only just now started using Core Data in OS X / Cocoa - it's incredible and is really changing the way I look at things. Is there an equivalent technology in C# or the modern Windows frameworks? i.e. having managed data types where you get saving, data management, deleting, searching all for free? Also wondering if there's anything like this on Linux.

    Read the article

  • Get count matches in query on large table very slow

    - by Roy Roes
    I have a MySQL table "items" with two integer fields: seid and tiid. The table has about 35,000,000 records, so it's very large. Sample rows (seid, tiid): (1,1), (2,2), (2,3), (2,4), (3,4), (4,1), (4,2). The table has a composite primary key on both fields, an index on seid and an index on tiid. Someone types in one or more tiid values and I would like to get the seid values with the most matches. For example, when someone types 1,2,3, I would like to get seid 2 and 4 as the result; they both match two of the tiid values. My query so far: SELECT COUNT(*) as c, seid FROM items WHERE tiid IN (1,2,3) GROUP BY seid HAVING c = (SELECT COUNT(*) as c, seid FROM items WHERE tiid IN (1,2,3) GROUP BY seid ORDER BY c DESC LIMIT 1) But this query is extremely slow because of the large table. Does anyone know how to construct a better query for this purpose?

    Read the article

  • What is the most efficient way to use Core Data?

    - by Eric
    I'm developing an iPad application using Core Data, and was hoping someone could clarify something about Core Data. Right now, I populate my table by making a fetch request for all of my data in viewDidLoad. I'd rather make individual fetch requests in my tableView:cellForRowAtIndexPath:. Can anyone tell me which is more efficient, and why? In other words, is it much less efficient to make lots of small requests as opposed to one big request?

    Read the article

  • Secrets of delivering large .NET products?

    - by Joan Venge
    In the software companies I have seen, it's really hard to work on very large products where everything depends on everything else. For instance, Microsoft works on C#, F#, .NET, WPF and Visual Studio, and these things are all interconnected. I don't know how many people are involved, but if it's in the hundreds, how do they keep in sync with everything so that they can design and implement features without conflicting with the dependencies and future plans of the other products? If MS is able to do this, they must have a very good system. Any guidelines or secrets for delivering very large software products, at MS or elsewhere?

    Read the article

  • What happens if a user jumps over 10 versions before updating, and every version had a new data model?

    - by dontWatchMyProfile
    Example: User installs app v.1.0, adds data. Then the dev submits 10 updates in 10 weeks. After 11 weeks, the user wants v.11.0 and grabs a copy from the app store. Assuming that the app has got 11 .xcdatamodel versions inside, where ***11.xcdatamodel is the current one, what would happen now since the persistent store of the user is ages old? would the migration happen 10 times, step-by-step through every migration iteration? Or does the actual migration of data (lets assume gigabytes of data) happen exactly once, after Core Data (or the persistent store coordinator) has figured out precisely what to do to go from v.1.0 to v.11.0?

    Read the article

  • Comparing large strings in JavaScript with a hash

    - by user4815162342
    I have a form with a textarea that can contain large amounts of content (say, articles for a blog) edited using one of a number of third party rich text editors. I'm trying to implement something like an autosave feature, which should submit the content through ajax if it's changed. However, I have to work around the fact that some of the editors I have as options don't support an "isdirty" flag, or an "onchange" event which I can use to see if the content has changed since the last save. So, as a workaround, what I'd like to do is keep a copy of the content in a variable (let's call it lastSaveContent), as of the last save, and compare it with the current text when the "autosave" function fires (on a timer) to see if it's different. However, I'm worried about how much memory that could take up with very large documents. Would it be more efficient to store some sort of hash in the lastSaveContent variable, instead of the entire string, and then compare the hash values? If so, can you recommend a good javascript library/jquery plugin that implements an appropriate hash for this requirement?

    Read the article

  • Passing a large amount of data in PHP

    - by Simple
    I would like to know what is the best way to pass a large amount of XML data from one PHP script to another. I have a script that reads in an XML feed of jobs. I would like to have the script display a list of the job titles as links. When the user clicks a link they would be taken to another page displaying the details for that job. The job details are too large to send in the query string, and it seems poor style to start a session for data that isn't specific to that user. Any ideas?

    Read the article

  • WCF – interchangeable data-contract types

    - by nmarun
    In a WSDL based environment, unlike a CLR-world, we pass around the ‘state’ of an object and not the reference of an object. Well firstly, what does ‘state’ mean and does this also mean that we can send a struct where a class is expected (or vice-versa) as long as their ‘state’ is one and the same? Let’s see. So I have an operation contract defined as below: 1: [ServiceContract] 2: public interface ILearnWcfServiceExtend : ILearnWcfService 3: { 4: [OperationContract] 5: Employee SaveEmployee(Employee employee); 6: } 7:  8: [ServiceBehavior] 9: public class LearnWcfService : ILearnWcfServiceExtend 10: { 11: public Employee SaveEmployee(Employee employee) 12: { 13: employee.EmployeeId = 123; 14: return employee; 15: } 16: } Quite simplistic operation there (which translates to ‘absolutely no business value’). Now, the data contract Employee mentioned above is a struct. 1: public struct Employee 2: { 3: public int EmployeeId { get; set; } 4:  5: public string FName { get; set; } 6: } After compilation and consumption of this service, my proxy (in the Reference.cs file) looks like below (I’ve ignored the rest of the details just to avoid unwanted confusion): 1: public partial struct Employee : System.Runtime.Serialization.IExtensibleDataObject, System.ComponentModel.INotifyPropertyChanged I call the service with the code below: 1: private static void CallWcfService() 2: { 3: Employee employee = new Employee { FName = "A" }; 4: Console.WriteLine("IsValueType: {0}", employee.GetType().IsValueType); 5: Console.WriteLine("IsClass: {0}", employee.GetType().IsClass); 6: Console.WriteLine("Before calling the service: {0} - {1}", employee.EmployeeId, employee.FName); 7: employee = LearnWcfServiceClient.SaveEmployee(employee); 8: Console.WriteLine("Return from the service: {0} - {1}", employee.EmployeeId, employee.FName); 9: } The output is: I now change my Employee type from a struct to a class in the proxy class and run the application: 1: public partial class Employee : System.Runtime.Serialization.IExtensibleDataObject, System.ComponentModel.INotifyPropertyChanged { The output this time is: The state of an object implies towards its composition, the properties and the values of these properties and not based on whether it is a reference type (class) or a value type (struct). And as shown above, we’re actually passing an object by its state and not by reference. Continuing on the same topic of ‘type-interchangeability’, WCF treats two data contracts as equivalent if they have the same ‘wire-representation’. We can do so using the DataContract and DataMember attributes’ Name property. 1: [DataContract] 2: public struct Person 3: { 4: [DataMember] 5: public int Id { get; set; } 6:  7: [DataMember] 8: public string FirstName { get; set; } 9: } 10:  11: [DataContract(Name="Person")] 12: public class Employee 13: { 14: [DataMember(Name = "Id")] 15: public int EmployeeId { get; set; } 16:  17: [DataMember(Name="FirstName")] 18: public string FName { get; set; } 19: } I’ve created two data contracts with the exact same wire-representation. Just remember that the names and the types of data members need to match to be considered equivalent. The question then arises as to what gets generated in the proxy class. Despite us declaring two data contracts (Person and Employee), only one gets emitted – Person. This is because we’re saying that the Employee type has the same wire-representation as the Person type. 
Also that the signature of the SaveEmployee operation gets changed on the proxy side: 1: [System.CodeDom.Compiler.GeneratedCodeAttribute("System.ServiceModel", "4.0.0.0")] 2: [System.ServiceModel.ServiceContractAttribute(ConfigurationName="ServiceProxy.ILearnWcfServiceExtend")] 3: public interface ILearnWcfServiceExtend 4: { 5: [System.ServiceModel.OperationContractAttribute(Action="http://tempuri.org/ILearnWcfServiceExtend/SaveEmployee", ReplyAction="http://tempuri.org/ILearnWcfServiceExtend/SaveEmployeeResponse")] 6: ClientApplication.ServiceProxy.Person SaveEmployee(ClientApplication.ServiceProxy.Person employee); 7: } But, on the service side, the SaveEmployee still accepts and returns an Employee data contract. 1: [ServiceBehavior] 2: public class LearnWcfService : ILearnWcfServiceExtend 3: { 4: public Employee SaveEmployee(Employee employee) 5: { 6: employee.EmployeeId = 123; 7: return employee; 8: } 9: } Despite all these changes, our output remains the same as the last one: This is type-interchangeability at work! Here’s one more thing to ponder about. Our Person type is a struct and Employee type is a class. Then how is it that the Person type got emitted as a ‘class’ in the proxy? It’s worth mentioning that WSDL describes a type called Employee and does not say whether it is a class or a struct (see the SOAP message below): 1: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" 2: xmlns:tem="http://tempuri.org/" 3: xmlns:ser="http://schemas.datacontract.org/2004/07/ServiceApplication"> 4: <soapenv:Header/> 5: <soapenv:Body> 6: <tem:SaveEmployee> 7: <!--Optional:--> 8: <tem:employee> 9: <!--Optional:--> 10: <ser:EmployeeId>?</ser:EmployeeId> 11: <!--Optional:--> 12: <ser:FName>?</ser:FName> 13: </tem:employee> 14: </tem:SaveEmployee> 15: </soapenv:Body> 16: </soapenv:Envelope> There are some differences between how ‘Add Service Reference’ and the svcutil.exe generate the proxy class, but turns out both do some kind of reflection and determine the type of the data contract and emit the code accordingly. So since the Employee type is a class, the proxy ‘Person’ type gets generated as a class. In fact, reflecting on svcutil.exe application, you’ll see that there are a couple of places wherein a flag actually determines a type as a class or a struct. One example is in the ExportISerializableDataContract method in the System.Runtime.Serialization.CodeExporter class. Seems like these flags have a say in deciding whether the type gets emitted as a struct or a class. This behavior is different if you use the WSDL tool though. WSDL tool does not do any kind of reflection of the data contract / serialized type, it emits the type as a class by default. You can check this using the two command lines below:   Note to self: Remember ‘state’ and type-interchangeability when traversing through the WSDL planet!

    Read the article

  • Data management in unexpected places

    - by Ashok_Ora
    When you think of network switches, routers, firewall appliances, etc., it may not be obvious that at the heart of these kinds of solutions is an engine that can manage huge amounts of data at very high throughput with low latencies and high availability. Consider a network router that is processing tens (or hundreds) of thousands of network packets per second. So what really happens inside a router? Packets are streaming in at the rate of tens of thousands per second. Each packet has multiple attributes, for example, a destination, associated SLAs etc. For each packet, the router has to determine the address of the next “hop” to the destination; it has to determine how to prioritize this packet. If it’s a high priority packet, then it has to be sent on its way before lower priority packets. As a consequence of prioritizing high priority packets, lower priority data packets may need to be temporarily stored (held back), but addressed fairly. If there are security or privacy requirements associated with the data packet, those have to be enforced. You probably need to keep track of statistics related to the packets processed (someone’s sure to ask). You have to do all this (and more) while preserving high availability i.e. if one of the processors in the router goes down, you have to have a way to continue processing without interruption (the customer won’t be happy with a “choppy” VoIP conversation, right?). And all this has to be achieved without ANY intervention from a human operator – the router is most likely to be in a remote location – it must JUST CONTINUE TO WORK CORRECTLY, even when bad things happen.
    How is this implemented? As soon as a packet arrives, it is interpreted by the receiving software. The software decodes the packet headers in order to determine the destination, kind of packet (e.g. voice vs. data), SLAs associated with the “owner” of the packet etc. It looks up the internal database of “rules” of how to process this packet and handles the packet accordingly. The software might choose to hold on to the packet safely for some period of time, if it’s a low priority packet. Ah – this sounds very much like a database problem. For each packet, you have to minimally:
    - Look up the most efficient next “hop” towards the destination. The “most efficient” next hop can change, depending on latency, availability etc.
    - Look up the SLA and determine the priority of this packet (e.g. voice calls get priority over data ftp)
    - Look up security information associated with this data packet. It may be necessary to retrieve the context for this network packet since a network packet is a small “slice” of a session. The context for the “header” packet needs to be stored in the router, in order to make this work.
    - If the priority of the packet is low, then “store” the packet temporarily in the router until it is time to forward the packet to the next hop.
    - Update various statistics about the packet.
    In most cases, you have to do all this in the context of a single transaction. For example, you want to look up the forwarding address and perform the “send” in a single transaction so that the forwarding address doesn’t change while you’re sending the packet. So, how do you do all this? Berkeley DB is a proven, reliable, high performance, highly available embeddable database, designed for exactly these kinds of usage scenarios.
    Berkeley DB is a robust, reliable, proven solution that is currently being used in these scenarios. First and foremost, Berkeley DB (or BDB for short) is very very fast. It can process tens or hundreds of thousands of transactions per second. It can be used as a pure in-memory database, or as a disk-persistent database. BDB provides high availability – if one board in the router fails, the system can automatically failover to another board – no manual intervention required. BDB is self-administering – there’s no need for manual intervention in order to maintain a BDB application. No need to send a technician to a remote site in the middle of nowhere on a freezing winter day to perform maintenance operations. BDB is used in over 200 million deployments worldwide for the past two decades for mission-critical applications such as the one described here. You have a choice of spending valuable resources to implement similar functionality, or, you could simply embed BDB in your application and off you go! I know what I’d do – choose BDB, so I can focus on my business problem. What will you do?
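    To make the "embed BDB and off you go" point concrete, here is a minimal key/value sketch using the Berkeley DB Java Edition API (com.sleepycat.je); the C edition used in embedded network gear follows an analogous open/put/get pattern. The environment directory, key and value below are made up for illustration, and the directory is assumed to already exist.

    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.LockMode;
    import com.sleepycat.je.OperationStatus;
    import java.io.File;
    import java.nio.charset.StandardCharsets;

    // Minimal embedded key/value store: store a routing "rule" per destination
    // and look it up per packet. No server process, no administrator.
    public class RoutingTableSketch {
        public static void main(String[] args) throws Exception {
            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            envConfig.setTransactional(true);
            Environment env = new Environment(new File("/tmp/bdb-env"), envConfig);

            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            dbConfig.setTransactional(true);
            Database routes = env.openDatabase(null, "routes", dbConfig);

            DatabaseEntry key = new DatabaseEntry("10.1.2.0/24".getBytes(StandardCharsets.UTF_8));
            DatabaseEntry value = new DatabaseEntry("next-hop=192.168.0.7;prio=high".getBytes(StandardCharsets.UTF_8));
            routes.put(null, key, value);                  // auto-commit write

            DatabaseEntry found = new DatabaseEntry();
            if (routes.get(null, key, found, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                System.out.println(new String(found.getData(), StandardCharsets.UTF_8));
            }

            routes.close();
            env.close();
        }
    }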

    Read the article

  • Metrics - A little knowledge can be a dangerous thing (or 'Why you're not clever enough to interpret metrics data')

    - by Jason Crease
    At RedGate Software, I work on a .NET obfuscator  called SmartAssembly.  Various features of it use a database to store various things (exception reports, name-mappings, etc.) The user is given the option of using either a SQL-Server database (which requires them to have Microsoft SQL Server), or a Microsoft Access MDB file (which requires nothing). MDB is the default option, but power-users soon switch to using a SQL Server database because it offers better performance and data-sharing. In the fashionable spirit of optimization and metrics, an obvious product-management question is 'Which is the most popular? SQL Server or MDB?' We've collected data about this fact, using our 'Feature-Usage-Reporting' technology (available as part of SmartAssembly) and more recently our 'Application Metrics' technology: Parameter Number of users % of total users Number of sessions Number of usages SQL Server 28 19.0 8115 8115 MDB 114 77.6 1449 1449 (As a disclaimer, please note than SmartAssembly has far more than 132 users . This data is just a selection of one build) So, it would appear that SQL-Server is used by fewer users, but more often. Great. But here's why these numbers are useless to me: Only the original developers understand the data What does a single 'usage' of 'MDB' mean? Does this happen once per run? Once per option change? On clicking the 'Obfuscate Now' button? When running the command-line version or just from the UI version? Each question could skew the data 10-fold either way, and the answers only known by the developer that instrumented the application in the first place. In other words, only the original developer can interpret the data - product-managers cannot interpret the data unaided. Most of the data is from uninterested users About half of people who download and run a free-trial from the internet quit it almost immediately. Only a small fraction use it sufficiently to make informed choices. Since the MDB option is the default one, we don't know how many of those 114 were people CHOOSING to use the MDB, or how many were JUST HAPPENING to use this MDB default for their 20-second trial. This is a problem we see across all our metrics: Are people are using X because it's the default or are they using X because they want to use X? We need to segment the data further - asking what percentage of each percentage meet our criteria for an 'established user' or 'informed user'. You end up spending hours writing sophisticated and dubious SQL queries to segment the data further. Not fun. You can't find out why they used this feature Metrics can answer the when and what, but not the why. Why did people use feature X? If you're anything like me, you often click on random buttons in unfamiliar applications just to explore the feature-set. If we listened uncritically to metrics at RedGate, we would eliminate the most-important and more-complex features which people actually buy the software for, leaving just big buttons on the main page and the About-Box. "Ah, that's interesting!" rather than "Ah, that's actionable!" People do love data. Did you know you eat 1201 chickens in a lifetime? But just 4 cows? Interesting, but useless. Often metrics give you a nice number: '5.8% of users have 3 or more monitors' . But unless the statistic is both SUPRISING and ACTIONABLE, it's useless. Most metrics are collected, reviewed with lots of cooing. and then forgotten. Unless a piece-of-data could change things, it's useless collecting it. 
People get obsessed with significance levels The first things that lots of people do with this data is do a t-test to get a significance level ("Hey! We know with 99.64% confidence that people prefer SQL Server to MDBs!") Believe me: other causes of error/misinterpretation in your data are FAR more significant than your t-test could ever comprehend. Confirmation bias prevents objectivity If the data appears to match our instinct, we feel satisfied and move on. If it doesn't, we suspect the data and dig deeper, plummeting down a rabbit-hole of segmentation and filtering until we give-up and move-on. Data is only useful if it can change our preconceptions. Do you trust this dodgy data more than your own understanding, knowledge and intelligence?  I don't. There's always multiple plausible ways to interpret/action any data Let's say we segment the above data, and get this data: Post-trial users (i.e. those using a paid version after the 14-day free-trial is over): Parameter Number of users % of total users Number of sessions Number of usages SQL Server 13 9.0 1115 1115 MDB 5 4.2 449 449 Trial users: Parameter Number of users % of total users Number of sessions Number of usages SQL Server 15 10.0 7000 7000 MDB 114 77.6 1000 1000 How do you interpret this data? It's one of: Mostly SQL Server users buy our software. People who can't afford SQL Server tend to be unable to afford or unwilling to buy our software. Therefore, ditch MDB-support. Our MDB support is so poor and buggy that our massive MDB user-base doesn't buy it.  Therefore, spend loads of money improving it, and think about ditching SQL-Server support. People 'graduate' naturally from MDB to SQL Server as they use the software more. Things are fine the way they are. We're marketing the tool wrong. The large number of MDB users represent uninformed downloaders. Tell marketing to aggressively target SQL Server users. To choose an interpretation you need to segment again. And again. And again, and again. Opting-out is correlated with feature-usage Metrics tends to be opt-in. This skews the data even further. Between 5% and 30% of people choose to opt-in to metrics (often called 'customer improvement program' or something like that). Casual trial-users who are uninterested in your product or company are less likely to opt-in. This group is probably also likely to be MDB users. How much does this skew your data by? Who knows? It's not all doom and gloom. There are some things metrics can answer well. Environment facts. How many people have 3 monitors? Have Windows 7? Have .NET 4 installed? Have Japanese Windows? Minor optimizations.  Is the text-box big enough for average user-input? Performance data. How long does our app take to start? How many databases does the average user have on their server? As you can see, questions about who-the-user-is rather than what-the-user-does are easier to answer and action. Conclusion Use SmartAssembly. If not for the metrics (called 'Feature-Usage-Reporting'), then at least for the obfuscation/error-reporting. Data raises more questions than it answers. Questions about environment are the easiest to answer.

    Read the article

  • Data Source Connection Pool Sizing

    - by Steve Felts
    One of the most time-consuming procedures of a database application is establishing a connection. The connection pooling of the data source can be used to minimize this overhead. That argues for using the data source instead of accessing the database driver directly. Configuring the size of the pool in the data source is somewhere between an art and science – this article will try to move it closer to science. From the beginning, WLS data source has had initial capacity and maximum capacity configuration values. When the system starts up and when it shrinks, initial capacity is used. The pool can grow to maximum capacity. Customers found that they might want to set the initial capacity to 0 (more on that later) but didn’t want the pool to shrink to 0. In WLS 10.3.6, we added minimum capacity to specify the lower limit to which a pool will shrink. If minimum capacity is not set, it defaults to the initial capacity for upward compatibility. We also did some work on the shrinking in release 10.3.4 to reduce thrashing; the algorithm that used to shrink to the maximum of the currently used connections or the initial capacity (basically the unused connections were all released) was changed to shrink by half of the unused connections.
    The simple approach to sizing the pool is to set the initial/minimum capacity to the maximum capacity. Doing this creates all connections at startup, avoiding creating connections on demand, and the pool is stable. However, there are a number of reasons not to take this simple approach. When WLS is booted, the deployment of the data source includes synchronously creating the connections. The more connections that are configured in initial capacity, the longer the boot time for WLS (there have been several projects for parallel boot in WLS but none that are available). Related to creating a lot of connections at boot time is the problem of logon storms (the database gets too much work at one time). WLS has a solution for that by setting the login delay seconds on the pool but that also increases the boot time.
    There are a number of cases where it is desirable to set the initial capacity to 0. By doing that, the overhead of creating connections is deferred out of the boot and the database doesn’t need to be available. An application may not want WLS to automatically connect to the database until it is actually needed, such as for some cold/warm failover configurations.
    There are a number of cases where minimum capacity should be less than maximum capacity. Connections are generally expensive to keep around. They cause state to be kept on both the client and the server, and the state on the backend may be heavy (for example, a process). Depending on the vendor, connection usage may cost money. If the workload is not constant, then database connections can be freed up by shrinking the pool when connections are not in use.
When using Active GridLink, connections can be created as needed according to runtime load balancing (RLB) percentages instead of by connection load balancing (CLB) during data source deployment. Shrinking is an effective technique for clearing the pool when connections are not in use.  In addition to the obvious reason that there times where the workload is lighter,  there are some configurations where the database and/or firewall conspire to make long-unused or too-old connections no longer viable.  There are also some data source features where the connection has state and cannot be used again unless the state matches the request.  Examples of this are identity based pooling where the connection has a particular owner and XA affinity where the connection is associated with a particular RAC node.  At this point, WLS does not re-purpose (discard/replace) connections and shrinking is a way to get rid of the unused existing connection and get a new one with the correct state when needed. So far, the discussion has focused on the relationship of initial, minimum, and maximum capacity.  Computing the maximum size requires some knowledge about the application and the current number of simultaneously active users, web sessions, batch programs, or whatever access patterns are common.  The applications should be written to only reserve and close connections as needed but multiple statements, if needed, should be done in one reservation (don’t get/close more often than necessary).  This means that the size of the pool is likely to be significantly smaller then the number of users.   If possible, you can pick a size and see how it performs under simulated or real load.  There is a high-water mark statistic (ActiveConnectionsHighCount) that tracks the maximum connections concurrently used.  In general, you want the size to be big enough so that you never run out of connections but no bigger.   It will need to deal with spikes in usage, which is where shrinking after the spike is important.  Of course, the database capacity also has a big influence on the decision since it’s important not to overload the database machine.  Planning also needs to happen if you are running in a Multi-Data Source or Active GridLink configuration and expect that the remaining nodes will take over the connections when one of the nodes in the cluster goes down.  For XA affinity, additional headroom is also recommended.  In summary, setting initial and maximum capacity to be the same may be simple but there are many other factors that may be important in making the decision about sizing.
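    The sizing advice above assumes applications reserve connections late and release them early, reserving and closing in one shot rather than holding connections across requests. A hedged sketch of that discipline against a WLS data source obtained through JNDI (the JNDI name and query are placeholders, not taken from the article):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    // Reserve a pooled connection only for the duration of the work and close it
    // promptly; close() returns the connection to the pool rather than destroying
    // it, which is what keeps a modest maximum capacity sufficient.
    public class PoolUsageSketch {
        public int countOrders(String customerId) throws Exception {
            DataSource ds = (DataSource) new InitialContext().lookup("jdbc/myDataSource");
            try (Connection conn = ds.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT COUNT(*) FROM orders WHERE customer_id = ?")) {
                ps.setString(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return rs.getInt(1);
                }
            } // connection goes back to the pool here
        }
    }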

    Read the article

  • Data Source Security Part 4

    - by Steve Felts
    So far, I have covered Client Identity and Oracle Proxy Session features, with WLS or database credentials.  This article will cover one more feature, Identify-based pooling.  Then, there is one more topic to cover - how these options play with transactions.Identity-based Connection Pooling An identity based pool creates a heterogeneous pool of connections.  This allows applications to use a JDBC connection with a specific DBMS credential by pooling physical connections with different DBMS credentials.  The DBMS credential is based on either the WebLogic user mapped to a database user or the database user directly, based on the “use database credentials” setting as described earlier. Using this feature enabled with “use database credentials” enabled seems to be what is proposed in the JDBC standard, basically a heterogeneous pool with users specified by getConnection(user, password). The allocation of connections is more complex if Enable Identity Based Connection Pooling attribute is enabled on the data source.  When an application requests a database connection, the WebLogic Server instance selects an existing physical connection or creates a new physical connection with requested DBMS identity. The following section provides information on how heterogeneous connections are created:1. At connection pool initialization, the physical JDBC connections based on the configured or default “initial capacity” are created with the configured default DBMS credential of the data source.2. An application tries to get a connection from a data source.3a. If “use database credentials” is not enabled, the user specified in getConnection is mapped to a DBMS credential, as described earlier.  If the credential map doesn’t have a matching user, the default DBMS credential is used from the datasource descriptor.3b. If “use database credentials” is enabled, the user and password specified in getConnection are used directly.4. The connection pool is searched for a connection with a matching DBMS credential.5. If a match is found, the connection is reserved and returned to the application.6. If no match is found, a connection is created or reused based on the maximum capacity of the pool: - If the maximum capacity has not been reached, a new connection is created with the DBMS credential, reserved, and returned to the application.- If the pool has reached maximum capacity, based on the least recently used (LRU) algorithm, a physical connection is selected from the pool and destroyed. A new connection is created with the DBMS credential, reserved, and returned to the application. It should be clear that finding a matching connection is more expensive than a homogeneous pool.  Destroying a connection and getting a new one is very expensive.  If you can use a normal homogeneous pool or one of the light-weight options (client identity or an Oracle proxy connection), those should be used instead of identity based pooling. Regardless of how physical connections are created, each physical connection in the pool has its own DBMS credential information maintained by the pool. Once a physical connection is reserved by the pool, it does not change its DBMS credential even if the current thread changes its WebLogic user credential and continues to use the same connection. To configure this feature, select Enable Identity Based Connection Pooling.  
See http://docs.oracle.com/cd/E24329_01/apirefs.1211/e24401/taskhelp/jdbc/jdbc_datasources/EnableIdentityBasedConnectionPooling.html  "Enable identity-based connection pooling for a JDBC data source" in Oracle WebLogic Server Administration Console Help. You must make the following changes to use Logging Last Resource (LLR) transaction optimization with Identity-based Pooling to get around the problem that multiple users will be accessing the associated transaction table.- You must configure a custom schema for LLR using a fully qualified LLR table name. All LLR connections will then use the named schema rather than the default schema when accessing the LLR transaction table.  - Use database specific administration tools to grant permission to access the named LLR table to all users that could access this table via a global transaction. By default, the LLR table is created during boot by the user configured for the connection in the data source. In most cases, the database will only allow access to this user and not allow access to mapped users. Connections within Transactions Now that we have covered the behavior of all of these various options, it’s time to discuss the exception to all of the rules.  When you get a connection within a transaction, it is associated with the transaction context on a particular WLS instance. When getting a connection with a data source configured with non-XA LLR or 1PC (using the JTS driver) with global transactions, the first connection obtained within the transaction is returned on subsequent connection requests regardless of the values of username/password specified and independent of the associated proxy user session, if any. The connection must be shared among all users of the connection when using LLR or 1PC. For XA data sources, the first connection obtained within the global transaction is returned on subsequent connection requests within the application server, regardless of the values of username/password specified and independent of the associated proxy user session, if any.  The connection must be shared among all users of the connection within a global transaction within the application server/JVM.
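    As a concrete illustration of the allocation steps described above, this is roughly the call pattern an application would use with "use database credentials" and identity-based pooling enabled: the user and password passed to getConnection select (or create) a physical connection with that DBMS identity. The JNDI name and credentials below are placeholders.

    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    // With identity-based connection pooling enabled, each getConnection(user,
    // password) reserves a pooled physical connection whose DBMS credential
    // matches the requested user, creating one (or recycling an LRU victim)
    // if no match exists yet.
    public class IdentityPoolSketch {
        public static void main(String[] args) throws Exception {
            DataSource ds = (DataSource) new InitialContext().lookup("jdbc/identityPoolDS");
            try (Connection asScott = ds.getConnection("scott", "tiger")) {
                // work performed here runs with scott's DBMS identity
            }
            try (Connection asHr = ds.getConnection("hr", "hrPassword")) {
                // a different physical connection, pooled under hr's identity
            }
        }
    }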

    Read the article

  • 24TB RAID 6 configuration

    - by Phil
    I am in charge of a new website in a niche industry that stores lots of data (10+ TB per client, growing to 2 or 3 clients soon). We are considering ordering about $5000 worth of 3TB drives (10 in a RAID 6 configuration and 10 for backup), which will give us approximately 24 TB of production storage. The data will be written once and remain unmodified for the lifetime of the website, so we only need to do a backup one time. I understand basic RAID theory, however I am not experienced with it. My question is, does this sound like a good configuration? What potential problems could this setup cause? Also, what is the best way to do a one-time backup? Have two RAID 6 arrays, one for offsite backup and one for production? Or should I backup the RAID 6 production array to a JBOD? EDIT: The data server is running Windows 2008 Server x64. EDIT 2: To reduce rebuild time, what would you think about using two RAID 5's instead of one RAID 6?

    Read the article
