Search Results

Search found 80052 results on 3203 pages for 'data load performance'.

Page 22/3203 | < Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >

Performance problems loading XML with SSIS, an alternative way!

- by AtulThakor

I recently needed to load several thousand XML files into a SQL database, I created an SSIS package which was created as followed: Using a foreach container to loop through a directory and load each file path into a variable, the “Import XML” dataflow would then load each XML file into a SQL table. Running this, it took approximately 1 second to load each file which seemed a massive amount of time to parse the XML and load the data, speaking to my colleague Martin Croft, he suggested the use of T-SQL Bulk Insert and OpenRowset, so we adjusted the package as followed: The same foreach container was used but instead the following SQL command was executed (this is an expression): "INSERT INTO MyTable(FileDate) SELECT CAST(bulkcolumn AS XML) FROM OPENROWSET( BULK '" + @[User::CurrentFile] + "', SINGLE_BLOB ) AS x" Using this method we managed to load approximately 20 records per second, much faster…for data loading! For what we wanted to achieve this was perfect but I’ll leave you with the following points when making your own decision on which solution you decide to choose! Openrowset Method Much faster to get the data into SQL You’ll need to parse or create a view over the XML data to allow the data to be more usable(another post on this!) Not able to apply validation/transformation against the data when loading it The SQL Server service account will need permission to the file No schema validation when loading files SSIS Slower (in our case) Schema validation Allows you to apply transformations/joins to the data Permissions should be less of a problem Data can be loaded into the final form through the package When using a schema validation errors can fail the package (I’ll do another post on this)

Read the article
After writing SQL statements in MySQL, how to measure the speed / performance of them?

- by Jian Lin

I saw something from an "execution plan" article: 10 rows fetched in 0.0003s (0.7344s) How come there are 2 durations shown? What if I don't have large data set yet. For example, if I have only 20, 50, or even just 100 records, I can't really measure how faster 2 different SQL statements compare in term of speed in real life situation? In other words, there needs to be at least hundreds of thousands of records, or even a million records to accurately compares the performance of 2 different SQL statements?

Read the article
Linux RAID-0 performance doesn't scale up over 1 GB/s

- by wazoox

I have trouble getting the max throughput out of my setup. The hardware is as follow : dual Quad-Core AMD Opteron(tm) Processor 2376 16 GB DDR2 ECC RAM dual Adaptec 52245 RAID controllers 48 1 TB SATA drives set up as 2 RAID-6 arrays (256KB stripe) + spares. Software : Plain vanilla 2.6.32.25 kernel, compiled for AMD-64, optimized for NUMA; Debian Lenny userland. benchmarks run : disktest, bonnie++, dd, etc. All give the same results. No discrepancy here. io scheduler used : noop. Yeah, no trick here. Up until now I basically assumed that striping (RAID 0) several physical devices should augment performance roughly linearly. However this is not the case here : each RAID array achieves about 780 MB/s write, sustained, and 1 GB/s read, sustained. writing to both RAID arrays simultaneously with two different processes gives 750 + 750 MB/s, and reading from both gives 1 + 1 GB/s. however when I stripe both arrays together, using either mdadm or lvm, the performance is about 850 MB/s writing and 1.4 GB/s reading. at least 30% less than expected! running two parallel writer or reader processes against the striped arrays doesn't enhance the figures, in fact it degrades performance even further. So what's happening here? Basically I ruled out bus or memory contention, because when I run dd on both drives simultaneously, aggregate write speed actually reach 1.5 GB/s and reading speed tops 2 GB/s. So it's not the PCIe bus. I suppose it's not the RAM. It's not the filesystem, because I get exactly the same numbers benchmarking against the raw device or using XFS. And I also get exactly the same performance using either LVM striping and md striping. What's wrong? What's preventing a process from going up to the max possible throughput? Is Linux striping defective? What other tests could I run?

Read the article
VMWare - Writing files to virtual hard drive performance

- by Ardman

We have just moved our infrastructure from physical servers to virtual machines. Everything is running great and we are happy with the result of the move. We have identified one problem, and that is reading/writing performance. We have an application that compiles files and writes to disk. This is considerably slower on the new virtual machines compared to the physical machines. Is there a performance bottleneck when writing to a virtual hard drive compared to a physical hard drive?

Read the article
Optimize windows 2008 performance

- by Giorgi

Hello, I have windows server 2008 sp2 installed as virtual machine on my personal laptop. I use it only for source control (visual svn) and continuous integration (teamcity). As the virtual machine resources are limited I'd like to optimize it's performance by disabling services and features that are not necessary for my purposes. Can anyone recommend where to start or provide with tips for getting better performance. Thanks.

Read the article
Randomly poor 2D performance in Linux Mint 11 when using nvidia driver

- by SDD

I am using: - Linux Mint 11 - Geforce 560ti - nVidia driver (installed via helper programm, not from nvidia page) The third party nvidia drivers radomly cause very poor 2D performance. Radomly because the performance can be very great, but after the next reboot or login become very poor. After another reboot or login, this might change again to better or worse. I have no idea why and how and I need your help. Thank you.

Read the article
How to measure disk-performance under Windows?

- by Alphager

I'm trying to find out why my application is very slow on a certain machine (runs fine everywhere else). I think i have traced the performance-problems to hard-disk reads and writes and i think it's simply the very slow disk. What tool could i use to measure hd read and write performance under Windows 2003 in a non-destructive way (the partitions on the drives have to remain intact)?

Read the article
Oracle Data Warehouse and Big Data Magazine MAY Edition for Customers + Partners

- by KLaker

Follow us on The latest edition of our monthly data warehouse and big data magazine for Oracle customers and partners is now available. The content for this magazine is taken from the various data warehouse and big data Oracle product management blogs, Oracle press releases, videos posted on Oracle Media Network and Oracle Facebook pages. Click here to view the May Edition Please share this link http://flip.it/fKOUS to our magazine with your customers and partners This magazine is optimized for display on tablets and smartphones using the Flipboard App which is available from the Apple App store and Google Play store

Read the article
Version control and data provenance in charts, slides, and marketing materials that derive from code ouput

- by EMS

I develop as part of a small team that mostly does research and statistics stuff. But from the output of our code, other teams often create promotional materials, slides, presentations, etc. We run into a big problem because the marketing team (non-programmers) tend to use Excel, Adobe products, or other tools to carry out their work, and just want easy-to-use data formats from us. This leads to data provenance problems. We see email chains with attachments from 6 months ago and someone is saying "Hey, who generated this data. Can you generate more of it with the recent 6 months of results added in?" I want to help the other teams effectively use version control (my team uses it reasonably well for the code, but every other team classically comes up with many excuses to avoid it). For version controlling a software project where the participants are coders, I have some reasonable understanding of best practices and what to do. But for getting a team of marketing professionals to version control marketing materials and associate metadata about the software used to generate the data for the charts, I'm a bit at a loss. Some of the goals I'd like to achieve: Data that supported a material should never be associated with a person. As in, it should never be the case that someone says "Hey Person XYZ, I see you sent me this data as an attachment 6 months ago, can you update it for me?" Rather, data should be associated with the code and code-version of any code that was used to get it, and perhaps a team of many people who may maintain that code. Then references for data updates are about executing a specific piece of code, with a known version number. I'd like this to be a process that works easily with the tech that the marketing team already uses (e.g. Excel files, Adobe file, whatever). I don't want to burden them with needing to learn a bunch of new stuff just to use version control. They are capable folks, so learning something is fine. Ideally they could use our existing version control framework, but there are some issues around that. I think knowing some general best practices will be enough though, and I can handle patching that into the way our stuff works now. Are there any goals I am failing to think about? What are the time-tested ways to do something like this?

Read the article
Big Data – Buzz Words: What is NewSQL – Day 10 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the relational database. In this article we will take a quick look at the what is NewSQL. What is NewSQL? NewSQL stands for new scalable and high performance SQL Database vendors. The products sold by NewSQL vendors are horizontally scalable. NewSQL is not kind of databases but it is about vendors who supports emerging data products with relational database properties (like ACID, Transaction etc.) along with high performance. Products from NewSQL vendors usually follow in memory data for speedy access as well are available immediate scalability. NewSQL term was coined by 451 groups analyst Matthew Aslett in this particular blog post. On the definition of NewSQL, Aslett writes: “NewSQL” is our shorthand for the various new scalable/high performance SQL database vendors. We have previously referred to these products as ‘ScalableSQL‘ to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all the products, we adopted the term ‘NewSQL’ in the new report. And to clarify, like NoSQL, NewSQL is not to be taken too literally: the new thing about the NewSQL vendors is the vendor, not the SQL. In other words - NewSQL incorporates the concepts and principles of Structured Query Language (SQL) and NoSQL languages. It combines reliability of SQL with the speed and performance of NoSQL. Categories of NewSQL There are three major categories of the NewSQL New Architecture – In this framework each node owns a subset of the data and queries are split into smaller query to sent to nodes to process the data. E.g. NuoDB, Clustrix, VoltDB MySQL Engines – Highly Optimized storage engine for SQL with the interface of MySQ Lare the example of such category. E.g. InnoDB, Akiban Transparent Sharding – This system automatically split database across multiple nodes. E.g. Scalearc Summary In simple words – NewSQL is kind of database following relational database principals and provides scalability like NoSQL. Tomorrow In tomorrow’s blog post we will discuss about the Role of Cloud Computing in Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
What does it mean to treat data as an asset?

What does it mean to treat data as an asset? When considering this concept, we must define what data is and how it can be considered an asset. Data can easily be defined as a collection of stored truths that are open to interpretation and manipulation. Expanding on this definition, data can be viewed as a set of captured facts, measurements, and ideas used to make decisions. Furthermore, InvestorsWords.com defines asset as any item of economic value owned by an individual or corporation. Now let’s apply this definition of asset to our definition of data, and ask the following question. Can facts, measurements and ideas be items that are of economic value owned by an individual or corporation? The obvious answer is yes; data can be bought and sold like commodities or analyzed to make smarter business decisions. We can look at the economic value of data in one of two ways. First, data can be sold as a commodity that can take the form of goods like eBooks, Training, Music, Movies, and so on. Customers are willing to pay to gain access to this data for their consumption. This directly implies that there is an economic value for data in the form of a commodity because customers see a value in obtaining it. Secondly data can be used in making smarter business decisions that allow for companies to become more profitable and/or reduce their potential for risk in regards to how they operate. In the past I have worked at companies where we had to analyze previous sales activities in conjunction with current activities to determine how the company was preforming for the quarter. In addition trends can be formulated based on existing data that allow companies to forecast data so that they can make strategic business decisions based sound forecasted data. Companies that truly value their data are constantly trying to grow and upgrade their data and supporting applications because it is the life blood of a company. If we look at an eBook retailer for example, imagine if they lost all of their data. They would be in essence forced out of business because they would have nothing to sell. In turn, if we look at a company that was using data to facilitate better decision making processes and they lost all of their data then they could be losing potential revenue and/ or increasing the company’s losses by making important business decisions virtually in the dark compared to when they were made on solid data.

Read the article
How to stop ADO Data Service from sending 'MetaData' with the Data

- by ActionFactory

I'm using MS Data Service to send JSON. (Using a View in the DB) Problem is that it sends "__metadata": { "uri":"ALL THE DATA REPEATED"} with the Data - each time. This doubles our Data over the wire each call. THERE MUST BE A WAY TO TURN THIS OFF!!! WHO KNOWS HOW? THX

Read the article
Working with Spatial Data Part II - Plotting Shapes and Storing Geocoded Data

Continuing from Part I of the Spatial Data series we model an in-memory/persistent data repository for storing geocoded data and plot the data.

Read the article
Data Virtualization: Federated and Hybrid

- by Krishnamoorthy

Data becomes useful when it can be leveraged at the right time. Not only enterprises application stores operate on large volume, velocity and variety of data. Mobile and social computing are in the need of operating in foresaid data. Replicating and transferring large swaths of data is one challenge faced in the field of data integration. However, smaller chunks of data aggregated from a variety of sources presents and even more interesting challenge in the industry. Over the past few decades, technology trends focused on best user experience, operating systems, high performance computing, high performance web sites, analysis of warehouse data, service oriented architecture, social computing, cloud computing, and big data. Operating on the ‘dark data’ becomes mandatory in the future technology trend, although, no solution can make dark data useful data in a single day. Useful data can be quantified by the facts of contextual, personalized and on time delivery. In most cases, data from a single source may not be complete the picture. Data has to be combined and computed from various sources, where data may be captured as hybrid data, meaning the combination of structured and unstructured data. Since related data is often found across disparate sources, effectively integrating these sources determines how useful this data ultimately becomes. Technology trends in 2013 are expected to focus on big data and private cloud. Consumers are not merely interested in where data is located or how data is retrieved and computed. Consumers are interested in how quick and how the data can be leveraged. In many cases, data virtualization is the right solution, and is expected to play a foundational role for SOA, Cloud integration, and Big Data. The Oracle Data Integration portfolio includes a data virtualization product called ODSI (Oracle Data Service Integrator). Unlike other data virtualization solutions, ODSI can perform both read and write operations on federated/hybrid data (RDBMS, Webservices, delimited file and XML). The ODSI Engine is built on XQuery, hence ODSI user can perform computations on data either using XQuery or SQL. Built in data and query caching features, which reduces latency in repetitive calls. Rightly positioning ODSI, can results in a highly scalable model, reducing spend on additional hardware infrastructure.

Read the article
what is best way to store long term data in iphone Core Data or SQLLite?

- by AmitSri

Hi all, I am working on i-Phone app targeting 3.1.3 and later SDK. I want to know the best way to store user's long term data on i-phone without losing performance, consistency and security. I know, that i can use Core Data, PList and SQL-Lite for storing user specific data in custom formats.But, want to know which one is good to use without compromising app performance and scalability in near future. Thanks

Read the article
Python what's the data structure for triple data

- by Paul

I've got a set of data that has three attributes, say A, B, and C, where A is kind of the index (i.e., A is used to look up the other two attributes.) What would be the best data structure for such data? I used two dictionaries, with A as the index of each. However, there's key errors when the query to the data doesn't match any instance of A.

Read the article
Load balance to proxies

- by LoveRight

I have installed several proxy programs whose IP addresses are, for example, 127.0.0.1:8580(use http), 127.0.0.1:9050(use socks5). You may regrard them as Tor and its alternatives. You know, certain proxy programs are faster than others at times, while at other times, they would be slower. The Firefox add-in, AutoProxy and FoxyProxy Standard, can define a list of rules such as any urls matching the pattern *.google.com should be proxied to 127.0.0.1:8580 using socks5 protocol. But the rule is "static". I want *.google.com to be proxied to the fastest proxy, no matter which one. I think that is kind of load balancing. I thought I could set a rule that direct request of *.google.com to the address the load balancer listens, and the load balancer forwards the request to the fastest real proxy. I notice that tor uses socks5 protocol and some other applications use http. I feel confused that which protocol should the load balancer use. I also start to wonder about the feasibility of this solution. Any suggestions? My operating system is Windows 7 x64.

Read the article
Implementing a generic repository for WCF data services

- by cibrax

The repository implementation I am going to discuss here is not exactly what someone would call repository in terms of DDD, but it is an abstraction layer that becomes handy at the moment of unit testing the code around this repository. In other words, you can easily create a mock to replace the real repository implementation. The WCF Data Services update for .NET 3.5 introduced a nice feature to support two way data bindings, which is very helpful for developing WPF or Silverlight based application but also for implementing the repository I am going to talk about. As part of this feature, the WCF Data Services Client library introduced a new collection DataServiceCollection<T> that implements INotifyPropertyChanged to notify the data context (DataServiceContext) about any change in the association links. This means that it is not longer necessary to manually set or remove the links in the data context when an item is added or removed from a collection. Before having this new collection, you basically used the following code to add a new item to a collection. Order order = new Order { Name = "Foo" }; OrderItem item = new OrderItem { Name = "bar", UnitPrice = 10, Qty = 1 }; var context = new OrderContext(); context.AddToOrders(order); context.AddToOrderItems(item); context.SetLink(item, "Order", order); context.SaveChanges(); Now, thanks to this new collection, everything is much simpler and similar to what you have in other ORMs like Entity Framework or L2S. Order order = new Order { Name = "Foo" }; OrderItem item = new OrderItem { Name = "bar", UnitPrice = 10, Qty = 1 }; order.Items.Add(item); var context = new OrderContext(); context.AddToOrders(order); context.SaveChanges(); In order to use this new feature, you first need to enable V2 in the data service, and then use some specific arguments in the datasvcutil tool (You can find more information about this new feature and how to use it in this post). DataSvcUtil /uri:"http://localhost:3655/MyDataService.svc/" /out:Reference.cs /dataservicecollection /version:2.0 Once you use those two arguments, the generated proxy classes will use DataServiceCollection<T> rather than a simple ObjectCollection<T>, which was the default collection in V1. There are some aspects that you need to know to use this feature correctly. 1. All the entities retrieved directly from the data context with a query track the changes and report those to the data context automatically. 2. A entity created with “new” does not track any change in the properties or associations. In order to enable change tracking in this entity, you need to do the following trick. public Order CreateOrder() { var collection = new DataServiceCollection<Order>(this.context); var order = new Order(); collection.Add(order); return order; } You basically need to create a collection, and add the entity to that collection with the “Add” method to enable change tracking on that entity. 3. If you need to attach an existing entity (For example, if you created the entity with the “new” operator rather than retrieving it from the data context with a query) to a data context for tracking changes, you can use the “Load” method in the DataServiceCollection. var order = new Order { Id = 1 }; var collection = new DataServiceCollection<Order>(this.context); collection.Load(order); In this case, the order with Id = 1 must exist on the data source exposed by the Data service. Otherwise, you will get an error because the entity did not exist. These cool extensions methods discussed by Stuart Leeks in this post to replace all the magic strings in the “Expand” operation with Expression Trees represent another feature I am going to use to implement this generic repository. Thanks to these extension methods, you could replace the following query with magic strings by a piece of code that only uses expressions. Magic strings, var customers = dataContext.Customers .Expand("Orders") .Expand("Orders/Items") Expressions, var customers = dataContext.Customers .Expand(c => c.Orders.SubExpand(o => o.Items)) That query basically returns all the customers with their orders and order items. Ok, now that we have the automatic change tracking support and the expression support for explicitly loading entity associations, we are ready to create the repository. The interface for this repository looks like this,public interface IRepository { T Create<T>() where T : new(); void Update<T>(T entity); void Delete<T>(T entity); IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties); IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties); void Attach<T>(T entity); void SaveChanges(); } The Retrieve and RetrieveAll methods are used to execute queries against the data service context. While both methods receive an array of expressions to load associations explicitly, only the Retrieve method receives a predicate representing the “where” clause. The following code represents the final implementation of this repository.public class DataServiceRepository: IRepository { ResourceRepositoryContext context; public DataServiceRepository() : this (new DataServiceContext()) { } public DataServiceRepository(DataServiceContext context) { this.context = context; } private static string ResolveEntitySet(Type type) { var entitySetAttribute = (EntitySetAttribute)type.GetCustomAttributes(typeof(EntitySetAttribute), true).FirstOrDefault(); if (entitySetAttribute != null) return entitySetAttribute.EntitySet; return null; } public T Create<T>() where T : new() { var collection = new DataServiceCollection<T>(this.context); var entity = new T(); collection.Add(entity); return entity; } public void Update<T>(T entity) { this.context.UpdateObject(entity); } public void Delete<T>(T entity) { this.context.DeleteObject(entity); } public void Attach<T>(T entity) { var collection = new DataServiceCollection<T>(this.context); collection.Load(entity); } public IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query.Where(predicate); } public IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query; } public void SaveChanges() { this.context.SaveChanges(SaveChangesOptions.Batch); } } For instance, you can use the following code to retrieve customers with First name equal to “John”, and all their orders in a single call. repository.Retrieve<Customer>( c => c.FirstName == “John”, //Where c => c.Orders.SubExpand(o => o.Items)); In case, you want to have some pre-defined queries that you are going to use across several places, you can put them in an specific class. public static class CustomerQueries { public static Expression<Func<Customer, bool>> LastNameEqualsTo(string lastName) { return c => c.LastName == lastName; } } And then, use it with the repository. repository.Retrieve<Customer>( CustomerQueries.LastNameEqualsTo("foo"), c => c.Orders.SubExpand(o => o.Items));

Read the article
More CPU cores may not always lead to better performance – MAXDOP and query memory distribution in spotlight

- by sqlworkshops

More hardware normally delivers better performance, but there are exceptions where it can hinder performance. Understanding these exceptions and working around it is a major part of SQL Server performance tuning. When a memory allocating query executes in parallel, SQL Server distributes memory to each task that is executing part of the query in parallel. In our example the sort operator that executes in parallel divides the memory across all tasks assuming even distribution of rows. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union. In reality, how often are column values evenly distributed, think about an example; are employees working for your company distributed evenly across all the Zip codes or mainly concentrated in the headquarters? What happens when you sort result set based on Zip codes? Do all products in the catalog sell equally or are few products hot selling items? One of my customers tested the below example on a 24 core server with various MAXDOP settings and here are the results:MAXDOP 1: CPU time = 1185 ms, elapsed time = 1188 msMAXDOP 4: CPU time = 1981 ms, elapsed time = 1568 msMAXDOP 8: CPU time = 1918 ms, elapsed time = 1619 msMAXDOP 12: CPU time = 2367 ms, elapsed time = 2258 msMAXDOP 16: CPU time = 2540 ms, elapsed time = 2579 msMAXDOP 20: CPU time = 2470 ms, elapsed time = 2534 msMAXDOP 0: CPU time = 2809 ms, elapsed time = 2721 ms - all 24 cores.In the above test, when the data was evenly distributed, the elapsed time of parallel query was always lower than serial query. Why does the query get slower and slower with more CPU cores / higher MAXDOP? Maybe you can answer this question after reading the article; let me know: [email protected]. Well you get the point, let’s see an example. The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Let’s update the Employees table with 49 out of 50 employees located in Zip code 2001. update Employees set Zip = EmployeeID / 400 + 1 where EmployeeID % 50 = 1 update Employees set Zip = 2001 where EmployeeID % 50 != 1 go update statistics Employees with fullscan go Let’s create the temporary table #FireDrill with all possible Zip codes. drop table #FireDrill go create table #FireDrill (Zip int primary key) insert into #FireDrill select distinct Zip from Employees update statistics #FireDrill with fullscan go Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --First serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 1) goThe query took 1011 ms to complete. The execution plan shows the 77816 KB of memory was granted while the estimated rows were 799624. No Sort Warnings in SQL Server Profiler. Now let’s execute the query in parallel with MAXDOP 0. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 0) go The query took 1912 ms to complete. The execution plan shows the 79360 KB of memory was granted while the estimated rows were 799624. The estimated number of rows between serial and parallel plan are the same. The parallel plan has slightly more memory granted due to additional overhead. Sort properties shows the rows are unevenly distributed over the 4 threads. Sort Warnings in SQL Server Profiler. Intermediate Summary: The reason for the higher duration with parallel plan was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001. Now let’s update the Employees table and distribute employees evenly across all Zip codes. update Employees set Zip = EmployeeID / 400 + 1 go update statistics Employees with fullscan go Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --Serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 1) go The query took 751 ms to complete. The execution plan shows the 77816 KB of memory was granted while the estimated rows were 784707. No Sort Warnings in SQL Server Profiler. Now let’s execute the query in parallel with MAXDOP 0. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 0) go The query took 661 ms to complete. The execution plan shows the 79360 KB of memory was granted while the estimated rows were 784707. Sort properties shows the rows are evenly distributed over the 4 threads. No Sort Warnings in SQL Server Profiler. Intermediate Summary: When employees were distributed unevenly, concentrated on 1 Zip code, parallel sort spilled while serial sort performed well without spilling to tempdb. When the employees were distributed evenly across all Zip codes, parallel sort and serial sort did not spill to tempdb. This shows uneven data distribution may affect the performance of some parallel queries negatively. For detailed discussion of memory allocation, refer to webcasts available at www.sqlworkshops.com/webcasts. Some of you might conclude from the above execution times that parallel query is not faster even when there is no spill. Below you can see when we are joining limited amount of Zip codes, parallel query will be fasted since it can use Bitmap Filtering. Let’s update the Employees table with 49 out of 50 employees located in Zip code 2001. update Employees set Zip = EmployeeID / 400 + 1 where EmployeeID % 50 = 1 update Employees set Zip = 2001 where EmployeeID % 50 != 1 go update statistics Employees with fullscan go Let’s create the temporary table #FireDrill with limited Zip codes. drop table #FireDrill go create table #FireDrill (Zip int primary key) insert into #FireDrill select distinct Zip from Employees where Zip between 1800 and 2001 update statistics #FireDrill with fullscan go Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --Serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 1) go The query took 989 ms to complete. The execution plan shows the 77816 KB of memory was granted while the estimated rows were 785594. No Sort Warnings in SQL Server Profiler. Now let’s execute the query in parallel with MAXDOP 0. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 0) go The query took 1799 ms to complete. The execution plan shows the 79360 KB of memory was granted while the estimated rows were 785594. Sort Warnings in SQL Server Profiler. The estimated number of rows between serial and parallel plan are the same. The parallel plan has slightly more memory granted due to additional overhead. Intermediate Summary: The reason for the higher duration with parallel plan even with limited amount of Zip codes was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001. Now let’s update the Employees table and distribute employees evenly across all Zip codes. update Employees set Zip = EmployeeID / 400 + 1 go update statistics Employees with fullscan go Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --Serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 1) go The query took 250 ms to complete. The execution plan shows the 9016 KB of memory was granted while the estimated rows were 79973.8. No Sort Warnings in SQL Server Profiler. Now let’s execute the query in parallel with MAXDOP 0. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e inner join #FireDrill fd on (e.Zip = fd.Zip) order by e.Zip option (maxdop 0) go The query took 85 ms to complete. The execution plan shows the 13152 KB of memory was granted while the estimated rows were 784707. No Sort Warnings in SQL Server Profiler. Here you see, parallel query is much faster than serial query since SQL Server is using Bitmap Filtering to eliminate rows before the hash join. Parallel queries are very good for performance, but in some cases it can hinder performance. If one identifies the reason for these hindrances, then it is possible to get the best out of parallelism. I covered many aspects of monitoring and tuning parallel queries in webcasts (www.sqlworkshops.com/webcasts) and articles (www.sqlworkshops.com/articles). I suggest you to watch the webcasts and read the articles to better understand how to identify and tune parallel query performance issues. Summary: One has to avoid sort spill over tempdb and the chances of spills are higher when a query executes in parallel with uneven data distribution. Parallel query brings its own advantage, reduced elapsed time and reduced work with Bitmap Filtering. So it is important to understand how to avoid spills over tempdb and when to execute a query in parallel. I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts. Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here. Disclaimer and copyright information:This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here. R Meyyappan [email protected] LinkedIn: http://at.linkedin.com/in/rmeyyappan

Read the article
jquery change load get url

- by john morris

Ok so I have a php page with data dynamically outputs depending on a $_GET variable ['file']. I have another page (index.php) which has a jQuery script that uses the load() function to load the php page. I have a list of links and when you click on one, it needs to change the $_GET variable to load, then refresh the load() jQuery variable. Heres a snippet: $("#remote-files").load("data.php?file=wat.txt"); $(".link1").mousedown(function() { $("#remote-files").load("data.php?file=link1.txt"); }); As you can see it loads the data into a div with the ID of remote-files. Is there a better way to do this, like update the page with the new get variable instead of redefining a new load function?

Read the article
data structure for counting frequencies in a database table-like format

- by user373312

i was wondering if there is a data structure optimized to count frequencies against data that is stored in a database table-like format. for example, the data comes in a (comma) delimited format below. col1, col2, col3 x, a, green x, b, blue ... y, c, green now i simply want to count the frequency of col1=x or col1=x and col2=green. i have been storing the data in a database table, but in my profiling and from empirical observation, database connection is the bottle-neck. i have tried using in-memory database solutions too, and that works quite well; the only problem is memory requirements and quirky init/destroy calls. also, i work mainly with java, but have experience with .net, and was wondering if there was any api to work with "tabular" data in a linq way using java. any help is appreciated.

Read the article
Oracle Big Data Software Downloads

- by Mike.Hallett(at)Oracle-BI&EPM

Companies have been making business decisions for decades based on transactional data stored in relational databases. Beyond that critical data, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and photographs that can be mined for useful information. Oracle offers a broad integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data Connectors Downloads here, includes: Oracle SQL Connector for Hadoop Distributed File System Release 2.1.0 Oracle Loader for Hadoop Release 2.1.0 Oracle Data Integrator Companion 11g Oracle R Connector for Hadoop v 2.1 Oracle Big Data Documentation The Oracle Big Data solution offers an integrated portfolio of products to help you organize and analyze your diverse data sources alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data, Release 2.2.0 - E41604_01 zip (27.4 MB) Integrated Software and Big Data Connectors User's Guide HTML PDF Oracle Data Integrator (ODI) Application Adapter for Hadoop Apache Hadoop is designed to handle and process data that is typically from data sources that are non-relational and data volumes that are beyond what is handled by relational databases. Typical processing in Hadoop includes data validation and transformations that are programmed as MapReduce jobs. Designing and implementing a MapReduce job usually requires expert programming knowledge. However, when you use Oracle Data Integrator with the Application Adapter for Hadoop, you do not need to write MapReduce jobs. Oracle Data Integrator uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs. Employing familiar and easy-to-use tools and pre-configured knowledge modules (KMs), the application adapter provides the following capabilities: Loading data into Hadoop from the local file system and HDFS Performing validation and transformation of data within Hadoop Loading processed data from Hadoop to an Oracle database for further processing and generating reports Oracle Database Loader for Hadoop Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. It pre-partitions the data if necessary and transforms it into a database-ready format. Oracle Loader for Hadoop is a Java MapReduce application that balances the data across reducers to help maximize performance. Oracle R Connector for Hadoop Oracle R Connector for Hadoop is a collection of R packages that provide: Interfaces to work with Hive tables, the Apache Hadoop compute infrastructure, the local R environment, and Oracle database tables Predictive analytic techniques, written in R or Java as Hadoop MapReduce jobs, that can be applied to data in HDFS files You install and load this package as you would any other R package. Using simple R functions, you can perform tasks such as: Access and transform HDFS data using a Hive-enabled transparency layer Use the R language for writing mappers and reducers Copy data between R memory, the local file system, HDFS, Hive, and Oracle databases Schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations Oracle SQL Connector for Hadoop Distributed File System Using Oracle SQL Connector for HDFS, you can use an Oracle Database to access and analyze data residing in Hadoop in these formats: Data Pump files in HDFS Delimited text files in HDFS Hive tables For other file formats, such as JSON files, you can stage the input in Hive tables before using Oracle SQL Connector for HDFS. Oracle SQL Connector for HDFS uses external tables to provide Oracle Database with read access to Hive tables, and to delimited text files and Data Pump files in HDFS. Related Documentation Cloudera's Distribution Including Apache Hadoop Library HTML Oracle R Enterprise HTML Oracle NoSQL Database HTML Recent Blog Posts Big Data Appliance vs. DIY Price Comparison Big Data: Architecture Overview Big Data: Achieve the Impossible in Real-Time Big Data: Vertical Behavioral Analytics Big Data: In-Memory MapReduce Flume and Hive for Log Analytics Building Workflows in Oozie

Read the article
Apache load balancer limits with Tomcat over AJP

- by PAS

Hi All, I have Apache acting as a load balancer in front of 3 Tomcat servers. Occasionally, Apache returns 503 responses, which I would like to remove completely. All 4 servers are not under significant load in terms of CPU, memory, or disk, so I am a little unsure what is reaching it's limits or why. 503s are returned when all workers are in error state - whatever that means. Here are the details: Apache config: <IfModule mpm_prefork_module> StartServers 30 MinSpareServers 30 MaxSpareServers 60 MaxClients 200 MaxRequestsPerChild 1000 </IfModule> ... <Proxy *> AddDefaultCharset Off Order deny,allow Allow from all </Proxy> # Tomcat HA cluster <Proxy balancer://mycluster> BalancerMember ajp://10.176.201.9:8009 keepalive=On retry=1 timeout=1 ping=1 BalancerMember ajp://10.176.201.10:8009 keepalive=On retry=1 timeout=1 ping=1 BalancerMember ajp://10.176.219.168:8009 keepalive=On retry=1 timeout=1 ping=1 </Proxy> # Passes thru track. or api. ProxyPreserveHost On ProxyStatus On # Original tracker ProxyPass /m balancer://mycluster/m ProxyPassReverse /m balancer://mycluster/m Tomcat config: <Server port="8005" shutdown="SHUTDOWN"> <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" /> <Listener className="org.apache.catalina.core.JasperListener" /> <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" /> <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" /> <Service name="Catalina"> <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" /> <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" /> <Engine name="Catalina" defaultHost="localhost"> <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true" xmlValidation="false" xmlNamespaceAware="false"> </Engine> </Service> </Server> Apache error log: [Mon Mar 22 18:39:47 2010] [error] (70007)The timeout specified has expired: proxy: AJP: attempt to connect to 10.176.201.10:8009 (10.176.201.10) failed [Mon Mar 22 18:39:47 2010] [error] ap_proxy_connect_backend disabling worker for (10.176.201.10) [Mon Mar 22 18:39:47 2010] [error] proxy: AJP: failed to make connection to backend: 10.176.201.10 [Mon Mar 22 18:39:47 2010] [error] (70007)The timeout specified has expired: proxy: AJP: attempt to connect to 10.176.201.9:8009 (10.176.201.9) failed [Mon Mar 22 18:39:47 2010] [error] ap_proxy_connect_backend disabling worker for (10.176.201.9) [Mon Mar 22 18:39:47 2010] [error] proxy: AJP: failed to make connection to backend: 10.176.201.9 [Mon Mar 22 18:39:47 2010] [error] (70007)The timeout specified has expired: proxy: AJP: attempt to connect to 10.176.219.168:8009 (10.176.219.168) failed [Mon Mar 22 18:39:47 2010] [error] ap_proxy_connect_backend disabling worker for (10.176.219.168) [Mon Mar 22 18:39:47 2010] [error] proxy: AJP: failed to make connection to backend: 10.176.219.168 [Mon Mar 22 18:39:47 2010] [error] proxy: BALANCER: (balancer://mycluster). All workers are in error state [Mon Mar 22 18:39:47 2010] [error] proxy: BALANCER: (balancer://mycluster). All workers are in error state [Mon Mar 22 18:39:47 2010] [error] proxy: BALANCER: (balancer://mycluster). All workers are in error state [Mon Mar 22 18:39:47 2010] [error] proxy: BALANCER: (balancer://mycluster). All workers are in error state [Mon Mar 22 18:39:47 2010] [error] proxy: BALANCER: (balancer://mycluster). All workers are in error state [Mon Mar 22 18:39:47 2010] [error] proxy: BALANCER: (balancer://mycluster). All workers are in error state Load balancer top info: top - 23:44:11 up 210 days, 4:32, 1 user, load average: 0.10, 0.11, 0.09 Tasks: 135 total, 2 running, 133 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.2%id, 0.1%wa, 0.0%hi, 0.1%si, 0.3%st Mem: 524508k total, 517132k used, 7376k free, 9124k buffers Swap: 1048568k total, 352k used, 1048216k free, 334720k cached Tomcat top info: top - 23:47:12 up 210 days, 3:07, 1 user, load average: 0.02, 0.04, 0.00 Tasks: 63 total, 1 running, 62 sleeping, 0 stopped, 0 zombie Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 99.8%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2097372k total, 2080888k used, 16484k free, 21464k buffers Swap: 4194296k total, 380k used, 4193916k free, 1520912k cached Catalina.out does not have any error messages in it. According to Apache's server status, it seems to be maxing out at 143 requests/sec. I believe the servers can handle substantially more load than they are, so any hints about low default limits or other reasons why this setup would be maxing out would be greatly appreciated.

Read the article
Benchmarking a file server

- by Joel Coel

I'm working on building a new file server... a simple Windows Server box with a few terabytes of disk space to share on the LAN. Pain for current hard drive prices aside :( -- I would like to get some benchmarks for this device under load compared to our old server. The old server was installed in 2005 and had 5 136GB 10K disks in RAID 5. The new server has 8 1TB disks in two RAID 10 volumes (plus a hot spare for each volume), but they're only 7.2K rpm, and of course with a much larger cache size. I'd like to get an idea of the performance expectations of the new server relative to the old. Where do I get started? I'd like to know both raw potential under different kinds of load for each server, as well an idea of what our real-world load looks like and how it will translate. Will disk load even matter, or will performance be more driven by the network connection? I could probably fumble through some disk i/o and wait counters in performance monitor, but I don't really know what to look for, which counters to watch, or for how long and when. FWIW, I'm expecting a nice improvement because of the benefits of having two different volumes and the better RAID 10 performance vs RAID 5, in spite of using slower disks... but I'd like to get an idea of how much.

Read the article
Load-balancing between a Procurve switch and a server

- by vlad

Hello I've been searching around the web for this problem i've been having. It's similar in a way to this question: How exactly & specifically does layer 3 LACP destination address hashing work? My setup is as follows: I have a central switch, a Procurve 2510G-24, image version Y.11.16. It's the center of a star topology, there are four switches connected to it via a single gigabit link. Those switches service the users. On the central switch, I have a server with two gigabit interfaces that I want to bond together in order to achieve higher throughput, and two other servers that have single gigabit connections to the switch. The topology looks as follows: sw1 sw2 sw3 sw4 | | | | --------------------- | sw0 | --------------------- || | | srv1 srv2 srv3 The servers were running FreeBSD 8.1. On srv1 I set up a lagg interface using the lacp protocol, and on the switch I set up a trunk for the two ports using lacp as well. The switch showed that the server was a lacp partner, I could ping the server from another computer, and the server could ping other computers. If I unplugged one of the cables, the connection would keep working, so everything looked fine. Until I tested throughput. There was only one link used between srv1 and sw0. All testing was conducted with iperf, and load distribution was checked with systat -ifstat. I was looking to test the load balancing for both receive and send operations, as I want this server to be a file server. There were therefore two scenarios: iperf -s on srv1 and iperf -c on the other servers iperf -s on the other servers and iperf -c on srv1 connected to all the other servers. Every time only one link was used. If one cable was unplugged, the connections would keep going. However, once the cable was plugged back in, the load was not distributed. Each and every server is able to fill the gigabit link. In one-to-one test scenarios, iperf was reporting around 940Mbps. The CPU usage was around 20%, which means that the servers could withstand a doubling of the throughput. srv1 is a dell poweredge sc1425 with onboard intel 82541GI nics (em driver on freebsd). After troubleshooting a previous problem with vlan tagging on top of a lagg interface, it turned out that the em could not support this. So I figured that maybe something else is wrong with the em drivers and / or lagg stack, so I started up backtrack 4r2 on this same server. So srv1 now uses linux kernel 2.6.35.8. I set up a bonding interface bond0. The kernel module was loaded with option mode=4 in order to get lacp. The switch was happy with the link, I could ping to and from the server. I could even put vlans on top of the bonding interface. However, only half the problem was solved: if I used srv1 as a client to the other servers, iperf was reporting around 940Mbps for each connection, and bwm-ng showed, of course, a nice distribution of the load between the two nics; if I run the iperf server on srv1 and tried to connect with the other servers, there was no load balancing. I thought that maybe I was out of luck and the hashes for the two mac addresses of the clients were the same, so I brought in two new servers and tested with the four of them at the same time, and still nothing changed. I tried disabling and reenabling one of the links, and all that happened was the traffic switched from one link to the other and back to the first again. I also tried setting the trunk to "plain trunk mode" on the switch, and experimented with other bonding modes (roundrobin, xor, alb, tlb) but I never saw any traffic distribution. One interesting thing, though: one of the four switches is a Cisco 2950, image version 12.1(22)EA7. It has 48 10/100 ports and 2 gigabit uplinks. I have a server (call it srv4) with a 4 channel trunk connected to it (4x100), FreeBSD 8.0 release. The switch is connected to sw0 via gigabit. If I set up an iperf server on one of the servers connected to sw0 and a client on srv4, ALL 4 links are used, and iperf reports around 330Mbps. systat -ifstat shows all four interfaces are used. The cisco port-channel uses src-mac to balance the load. The HP should use both the source and destination according to the manual, so it should work as well. Could this mean there is some bug in the HP firmware? Am I doing something wrong?

Read the article

Search Results

Search found 80052 results on 3203 pages for 'data load performance'.

Page 22/3203 | < Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >

- by AtulThakor

- by Jian Lin

- by wazoox

- by Ardman

- by Giorgi

- by SDD

- by Alphager

- by KLaker

- by EMS

- by Pinal Dave

- by ActionFactory

- by Krishnamoorthy

- by AmitSri

- by Paul

- by LoveRight

- by cibrax

- by sqlworkshops

- by john morris

- by user373312

- by Mike.Hallett(at)Oracle-BI&EPM

- by PAS

- by Joel Coel

- by vlad

< Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >