scalability - Page 7 - Developer IT

Scaling a postgres server to multiple servers

- by Oliver Kiel

Our Postgres server is about to hit its capacity and we're looking into adding a second database server. Are there any scaling solutions that are particularly good for a Postgres setup?

Implications of using many USB web cameras

- by Martin

I'm looking into connecting multiple low resolution USB webcams to a single computer. What implications might this have on performance? How does, for example, four 320x240 cameras fare against a single 640x480 camera? I'm not well versed in the architecture of the USB interface, what are the performance caveats? By performance I mean how would it affect the time to read the image data from multiple cameras compared to a single one.
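
As a rough point of reference for the comparison in this question, the raw pixel payload can be worked out directly. This is only a sketch: the frame rate and bytes per pixel are assumed values, and it does not model per-device USB overhead, compression, or driver behaviour, which is where the real differences would likely come from.

```python
def raw_bytes_per_second(width, height, fps, bytes_per_pixel):
    """Uncompressed video payload for one camera, in bytes per second."""
    return width * height * bytes_per_pixel * fps


# Assumed values for illustration only (not from the question).
FPS = 30
BYTES_PER_PIXEL = 2

four_small = 4 * raw_bytes_per_second(320, 240, FPS, BYTES_PER_PIXEL)
one_large = raw_bytes_per_second(640, 480, FPS, BYTES_PER_PIXEL)

print("four 320x240 cameras:", four_small, "bytes/s")
print("one 640x480 camera:  ", one_large, "bytes/s")
```

Four 320x240 frames contain exactly as many pixels as one 640x480 frame, so under these assumptions the raw payload is identical; any difference in read time would have to come from overhead this calculation leaves out.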

Mysql overlap vs distance--which is faster?

- by alex

I'm trying to find rows which are within __meters of a given point. This can be done via distance or overlap. Which is faster?
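
To make the two options concrete, here is a minimal sketch in plain Python rather than MySQL. It assumes "overlap" means a cheap bounding-box test and "distance" means a great-circle distance check; the function names and the sample radius are made up for illustration. The box is an approximation that is reasonable for small radii away from the poles and the antimeridian.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_m):
    """Approximate lat/lon box that contains the circle of radius_m."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within(points, lat, lon, radius_m):
    """Cheap box test first, exact distance only on the survivors."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounding_box(lat, lon, radius_m)
    candidates = [p for p in points
                  if lat_lo <= p[0] <= lat_hi and lon_lo <= p[1] <= lon_hi]
    return [p for p in candidates
            if haversine_m(lat, lon, p[0], p[1]) <= radius_m]
```

The box test uses only range comparisons, which is the kind of condition an index can serve; the distance formula has to be evaluated row by row.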

How do you deal with denormalization / secondary indexes in database sharding?

- by Continuation

Say I have a "message" table with 2 secondary indexes: "recipient_id" and "sender_id". I want to shard the "message" table by "recipient_id". That way, to retrieve all messages sent to a certain recipient I only need to query one shard. But at the same time, I want to be able to make a query that asks for all messages sent by a certain sender, and I don't want to send that query to every single shard of the "message" table.

One way to do this is to duplicate the data and have a "message_by_sender" table sharded by "sender_id". The problem with that approach is that every time a message is sent, I need to insert the message into both the "message" and "message_by_sender" tables. But what if, after inserting into "message", the insertion into "message_by_sender" fails? In that case the message exists in "message" but not in "message_by_sender".

How do I make sure that if a message exists in "message" then it also exists in "message_by_sender", without resorting to 2-phase commit? This must be a very common issue for anyone who shards their databases. How do you deal with it?
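
The failure window described above can be reproduced with a small in-memory model. This is a sketch only: the shard count, the modulo routing, and the `fail_second_write` switch are assumptions made for illustration, and lists of dicts stand in for the real tables.

```python
NUM_SHARDS = 4  # assumed shard count for illustration


def shard_for(user_id):
    """Route a user id to a shard (simple modulo routing, an assumption)."""
    return user_id % NUM_SHARDS


# One dict per shard, standing in for the two sharded tables.
message = [dict() for _ in range(NUM_SHARDS)]            # sharded by recipient_id
message_by_sender = [dict() for _ in range(NUM_SHARDS)]  # sharded by sender_id


def send_message(msg_id, sender_id, recipient_id, body, fail_second_write=False):
    """Write to both tables; optionally simulate a crash between the writes."""
    row = {"id": msg_id, "sender_id": sender_id,
           "recipient_id": recipient_id, "body": body}
    message[shard_for(recipient_id)].setdefault(recipient_id, []).append(row)
    if fail_second_write:
        raise RuntimeError("simulated failure before writing message_by_sender")
    message_by_sender[shard_for(sender_id)].setdefault(sender_id, []).append(row)


def inbox(recipient_id):
    """All messages for a recipient: touches exactly one shard."""
    return message[shard_for(recipient_id)].get(recipient_id, [])


def sent(sender_id):
    """All messages from a sender: touches exactly one shard."""
    return message_by_sender[shard_for(sender_id)].get(sender_id, [])
```

Calling `send_message(2, 10, 21, "again", fail_second_write=True)` leaves the row visible through `inbox(21)` but missing from `sent(10)`, which is exactly the inconsistency the question asks how to prevent.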

NServiceBus pipeline with Distributors

- by David

I'm building a processing pipeline with NServiceBus but I'm having trouble with the configuration of the distributors in order to make each step in the process scalable. Here's some info:

- The pipeline will have a master process that says "OK, time to start" for a WorkItem, which will then start a process like a flowchart.
- Each step in the flowchart may be computationally expensive, so I want the ability to scale out each step. This tells me that each step needs a Distributor.
- I want to be able to hook additional activities onto events later. This tells me I need to Publish() messages when it is done, not Send() them.
- A process may need to branch based on a condition. This tells me that a process must be able to publish more than one type of message.
- A process may need to join forks. I imagine I should use Sagas for this.

Hopefully these assumptions are good, otherwise I'm in more trouble than I thought.

For the sake of simplicity, let's forget about forking or joining and consider a simple pipeline, with Step A followed by Step B, and ending with Step C. Each step gets its own distributor and can have many nodes processing messages.

- NodeA workers contain a IHandleMessages processor, and publish EventA
- NodeB workers contain a IHandleMessages processor, and publish EventB
- NodeC workers contain a IHandleMessages processor, and then the pipeline is complete.

Here are the relevant parts of the config files, where # denotes the number of the worker (i.e. there are input queues NodeA.1 and NodeA.2):

NodeA:

    <MsmqTransportConfig InputQueue="NodeA.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
    <UnicastBusConfig DistributorControlAddress="NodeA.Distrib.Control" DistributorDataAddress="NodeA.Distrib.Data" >
      <MessageEndpointMappings>
      </MessageEndpointMappings>
    </UnicastBusConfig>

NodeB:

    <MsmqTransportConfig InputQueue="NodeB.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
    <UnicastBusConfig DistributorControlAddress="NodeB.Distrib.Control" DistributorDataAddress="NodeB.Distrib.Data" >
      <MessageEndpointMappings>
        <add Messages="Messages.EventA, Messages" Endpoint="NodeA.Distrib.Data" />
      </MessageEndpointMappings>
    </UnicastBusConfig>

NodeC:

    <MsmqTransportConfig InputQueue="NodeC.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
    <UnicastBusConfig DistributorControlAddress="NodeC.Distrib.Control" DistributorDataAddress="NodeC.Distrib.Data" >
      <MessageEndpointMappings>
        <add Messages="Messages.EventB, Messages" Endpoint="NodeB.Distrib.Data" />
      </MessageEndpointMappings>
    </UnicastBusConfig>

And here are the relevant parts of the distributor configs:

Distributor A:

    <add key="DataInputQueue" value="NodeA.Distrib.Data"/>
    <add key="ControlInputQueue" value="NodeA.Distrib.Control"/>
    <add key="StorageQueue" value="NodeA.Distrib.Storage"/>

Distributor B:

    <add key="DataInputQueue" value="NodeB.Distrib.Data"/>
    <add key="ControlInputQueue" value="NodeB.Distrib.Control"/>
    <add key="StorageQueue" value="NodeB.Distrib.Storage"/>

Distributor C:

    <add key="DataInputQueue" value="NodeC.Distrib.Data"/>
    <add key="ControlInputQueue" value="NodeC.Distrib.Control"/>
    <add key="StorageQueue" value="NodeC.Distrib.Storage"/>

I'm testing using 2 instances of each node, and the problem seems to come up in the middle, at Node B. There are basically 2 things that might happen:

- Both instances of Node B report that they are subscribing to EventA, and also that NodeC.Distrib.Data@MYCOMPUTER is subscribing to the EventB that Node B publishes. In this case, everything works great.
- Both instances of Node B report that they are subscribing to EventA; however, one worker says NodeC.Distrib.Data@MYCOMPUTER is subscribing TWICE, while the other worker does not mention it.

In the second case, which seems to be controlled only by the way the distributor routes the subscription messages, if the "overachiever" node processes an EventA, all is well. If the "underachiever" processes EventA, then the publish of EventB has no subscribers and the workflow dies.

So, my questions:

- Is this kind of setup possible?
- Is the configuration correct? It's hard to find any examples of configuration with distributors beyond a simple one-level publisher/2-worker setup.
- Would it make more sense to have one central broker process that does all the non-computationally-intensive traffic cop operations, and only sends messages to processes behind distributors when the task is long-running and must be load balanced? Then the load-balanced nodes could simply reply back to the central broker, which seems easier. On the other hand, that seems at odds with the decentralization that is NServiceBus's strength.
- And if this is the answer, and the long-running process's done event is a reply, how do you keep the Publish that enables later extensibility on published events?
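
The "overachiever"/"underachiever" behaviour can be imitated with a toy model. This is not NServiceBus code: it assumes each Node B worker keeps its subscriber list only in its own memory and that the distributor hands each incoming subscription message to exactly one worker. The class and function names are invented for the sketch.

```python
class Worker:
    """One Node B worker; subscriptions live only in this worker (assumption)."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def handle_subscription(self, subscriber):
        self.subscribers.append(subscriber)

    def publish(self, event):
        # Returns the (subscriber, event) deliveries this worker would make.
        return [(subscriber, event) for subscriber in self.subscribers]


def deliver_subscriptions(workers, subscriber, routing):
    """Hand one subscription message to each worker index listed in routing."""
    for index in routing:
        workers[index].handle_subscription(subscriber)


def run_case(routing):
    """Return how many deliveries each worker makes when it publishes EventB."""
    workers = [Worker("NodeB.1"), Worker("NodeB.2")]
    # Two Node C instances each send a subscription for EventB.
    deliver_subscriptions(workers, "NodeC.Distrib.Data", routing)
    return [len(worker.publish("EventB")) for worker in workers]


balanced = run_case([0, 1])  # each worker receives one subscription message
skewed = run_case([0, 0])    # one worker receives both, the other none
print(balanced, skewed)
```

In the skewed case the second worker publishes EventB to nobody, matching the observation that the workflow dies when the "underachiever" handles EventA. Whether the real cause is per-worker subscription storage is an assumption of this model, not something the question confirms.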

Delphi : Sorted List

- by Sethu

I need to sort close to 100,000 floating point entries in Delphi. I am new to Delphi and would like to know if there are any ready-made solutions available. I tried a few language-provided constructs and they take an inordinate amount of time to run to completion. (A 5-10 sec execution time is fine for the application.)

fast retrieval from MYSQL DB

- by trojanwarrior3000

I have a table of users. It contains millions of rows (user-id is the primary key). I just want to retrieve user-id and their joining date. Using "select user-id,joining date from table user" takes a lot of time. Is there a fast way to query/retrieve the same data from this table?

Practical approach to concurrency control

- by Industrial

Hi everyone, I read this article recently and am very interested in how to take a practical approach to concurrency control on a web server. The server will run CentOS + PHP + MySQL with Memcached. How would you set it up to work? http://saasinterrupted.com/2010/02/05/high-availability-principle-concurrency-control/ Thanks!

open source alternatives to oracle coherence?

- by Blankman

Are there any open source alternatives to Oracle Coherence? (BTW, how much does Coherence cost anyway?)

Design Decision - Scaling out web based application's architecture

- by Vadi

This question is about a design decision. I am currently working on a web project that will have 40K users to start with and is expected to grow to 50M users in a couple of months (not concurrent users, though). I would like to have an architecture that can be scaled out easily without much effort.

To explain, I would like to use a trivial scenario. Let's say User entities and services such as CreateUser, AuthenticateUser etc. are simple method calls for the Page Controllers. Once the traffic increases, authenticating users (or other services related to user entities) has to be moved out to a different internal server to spread the load. But at the same time, using RPC calls over the network when the user count is 40K would be overkill.

My proposal was to use IPC initially and, when we need to scale out, internally switch to TCP-based RPC calls. For example, I am referring to System.IO.Pipes.NamedPipeServerStream to start with, moving on to a TcpListener later on. If we have a proper design that can encapsulate the above approach, it would be easy for us to scale out services onto multiple network servers while avoiding network calls when the user count is small.

Is this the best approach? Any suggestions would be great.

Note: database scaling is definitely the second-phase optimization, so we already have an architectural design in place to easily partition data when traffic increases. The primary bottleneck would be the application servers over time.
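
The encapsulation the question describes can be sketched as a client that depends only on a transport object, so the transport can be swapped later. The sketch is in Python rather than .NET, uses a direct in-process call in place of named pipes, and every name in it (`UserClient`, `InProcessTransport`, `TcpTransport`, the JSON line protocol, the hard-coded credentials) is hypothetical.

```python
import json
import socket
import socketserver
import threading


def authenticate_user(name, password):
    """Hypothetical service logic; a real system would check a user store."""
    return {"ok": name == "alice" and password == "secret"}


HANDLERS = {"AuthenticateUser": authenticate_user}


class InProcessTransport:
    """Calls the handler directly: no serialization, no network."""

    def call(self, method, **params):
        return HANDLERS[method](**params)


class TcpTransport:
    """Sends one JSON request per connection to a remote service host."""

    def __init__(self, host, port):
        self.address = (host, port)

    def call(self, method, **params):
        request = json.dumps({"method": method, "params": params}).encode() + b"\n"
        with socket.create_connection(self.address) as sock, sock.makefile("r") as reader:
            sock.sendall(request)
            return json.loads(reader.readline())


class _RequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        request = json.loads(self.rfile.readline())
        result = HANDLERS[request["method"]](**request["params"])
        self.wfile.write(json.dumps(result).encode() + b"\n")


def start_server():
    """Start the service on a free local port; returns (server, port)."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), _RequestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


class UserClient:
    """Page controllers depend on this class only, never on the transport."""

    def __init__(self, transport):
        self.transport = transport

    def authenticate(self, name, password):
        result = self.transport.call("AuthenticateUser", name=name, password=password)
        return result["ok"]
```

With this shape, `UserClient(InProcessTransport())` and `UserClient(TcpTransport(host, port))` are interchangeable from the caller's point of view, which is the property the proposal relies on. It says nothing about whether the switch is worth deferring.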

How do you efficiently bulk index lookups?

- by Liron Shapira

I have these entity kinds:

- Molecule
- Atom
- MoleculeAtom

Given a list(molecule_ids) whose length is in the hundreds, I need to get a dict of the form {molecule_id: list(atom_ids)}. Likewise, given a list(atom_ids) whose length is in the hundreds, I need to get a dict of the form {atom_id: list(molecule_ids)}. Both of these bulk lookups need to happen really fast. Right now I'm doing something like:

    atom_ids_by_molecule_id = {}
    for molecule_id in molecule_ids:
        moleculeatoms = MoleculeAtom.all().filter('molecule =', db.Key.from_path('molecule', molecule_id)).fetch(1000)
        atom_ids_by_molecule_id[molecule_id] = [
            MoleculeAtom.atom.get_value_for_datastore(ma).id() for ma in moleculeatoms
        ]

Like I said, len(molecule_ids) is in the hundreds. I need to do this kind of bulk index lookup on almost every single request, and I need it to be FAST, and right now it's too slow.

Ideas:

- Will using a Molecule.atoms ListProperty do what I need? Consider that I am storing additional data on the MoleculeAtom node, and remember it's equally important for me to do the lookup in the molecule-atom and atom-molecule directions.
- Caching? I tried memcaching lists of atom IDs keyed by molecule ID, but I have tons of atoms and molecules, and the cache can't fit it.
- How about denormalizing the data by creating a new entity kind whose key name is a molecule ID and whose value is a list of atom IDs? The idea is, calling db.get on 500 keys is probably faster than looping through 500 fetches with filters, right?
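
The denormalization idea can be illustrated without the datastore. The sketch below uses plain Python dicts as a stand-in for the proposed "one entity per molecule ID" kind; `build_indexes` and `bulk_lookup` are hypothetical helper names, not App Engine APIs, and the real cost would depend on how the batch get behaves in the datastore.

```python
from collections import defaultdict


def build_indexes(molecule_atom_pairs):
    """Build both lookup directions from (molecule_id, atom_id) pairs."""
    atoms_by_molecule = defaultdict(list)
    molecules_by_atom = defaultdict(list)
    for molecule_id, atom_id in molecule_atom_pairs:
        atoms_by_molecule[molecule_id].append(atom_id)
        molecules_by_atom[atom_id].append(molecule_id)
    return dict(atoms_by_molecule), dict(molecules_by_atom)


def bulk_lookup(index, ids):
    """One keyed read per id, instead of one filtered query per id."""
    return {item_id: index.get(item_id, []) for item_id in ids}


pairs = [(1, 100), (1, 101), (2, 100), (3, 102)]
atoms_by_molecule, molecules_by_atom = build_indexes(pairs)

print(bulk_lookup(atoms_by_molecule, [1, 2]))
print(bulk_lookup(molecules_by_atom, [100, 102]))
```

Keeping both directions means every MoleculeAtom change has to update two denormalized records, which is the price of making each bulk lookup a set of direct key reads.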

How to calculate real-time stats?

- by Diego Jancic

I have a site with millions of users (well, actually it doesn't have any yet, but let's imagine), and I want to calculate some stats like "log-ins in the past hour". The problem is similar to the one described here: http://highscalability.com/blog/2008/4/19/how-to-build-a-real-time-analytics-system.html

The simplest approach would be to do a select like this:

    select count(distinct user_id) from logs where date >= '20120601 1200' and date <= '20120601 1300'

(Of course other conditions could apply for the stats, like log-ins per country.)

This would be really slow, especially if the table has millions (or even thousands) of rows, and I want to query this every time a page is displayed. How would you summarize the data? What should go to the (mem)cache?

EDIT: I'm looking for a way to de-normalize the data, or to keep the cache up-to-date. For example I could increment an in-memory variable every time someone logs in, but that would only tell me the total number of logins, not the "logins in the last hour". Hope it's more clear now.
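
One way to turn the "increment a variable" idea into a last-hour figure is to keep one counter per minute and drop counters as they age out. The sketch below is a minimal in-process version under stated assumptions: timestamps are plain seconds and arrive in non-decreasing order, the class name is invented, and it counts log-in events rather than distinct users, so it is not equivalent to the `count(distinct user_id)` query.

```python
from collections import deque


class SlidingWindowCounter:
    """Approximate count of events in the last window_seconds, using buckets."""

    def __init__(self, window_seconds=3600, bucket_seconds=60):
        self.window = window_seconds
        self.bucket = bucket_seconds
        self.buckets = deque()  # [bucket_start, count] pairs, oldest first
        self.total = 0

    def _evict(self, now):
        # Drop buckets that ended at or before the start of the window.
        cutoff = now - self.window
        while self.buckets and self.buckets[0][0] + self.bucket <= cutoff:
            _, count = self.buckets.popleft()
            self.total -= count

    def record(self, now):
        """Register one event at time `now` (seconds, non-decreasing)."""
        self._evict(now)
        start = now - now % self.bucket
        if self.buckets and self.buckets[-1][0] == start:
            self.buckets[-1][1] += 1
        else:
            self.buckets.append([start, 1])
        self.total += 1

    def count(self, now):
        """Events in the window ending at `now`, accurate to one bucket."""
        self._evict(now)
        return self.total
```

Reading the figure is then a single integer lookup, and memory stays bounded at one counter per bucket (60 for an hour of one-minute buckets). Per-country stats would need one such counter per country.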

What is optimal hardware configuration for heavy load LAMP application

- by Piotr Kochanski

I need to run a Linux-Apache-PHP-MySQL application (the Moodle e-learning platform) for a large number of concurrent users - I am aiming for 5000 users. By concurrent I mean that 5000 people should be able to work with the application at the same time. "Work" means not only database reads but writes as well.

The application is not very typical, since it is doing a lot of inserts/updates on the database, so caching techniques are not helping too much. We are using the InnoDB storage engine. In addition, the application is not written with performance in mind. For instance, one Apache thread usually occupies about 30-50 MB of RAM.

I would be grateful for information on what hardware is needed to build a scalable configuration that is able to handle this kind of load. Right now we are using two HP DLG 380 with two 4-core processors, which are able to handle a much lower load (typically 300-500 concurrent users). Is it reasonable to invest in this kind of box and build a cluster using them, or is it better to go with some more high-end hardware?

I am particularly curious about:

- how many and how powerful servers are needed (number of processors/cores, size of RAM)
- what network equipment should be used (what kind of switches, network cards)
- any other hardware, like particular disk storage solutions, etc., that is needed

Another thing is how to put everything together, that is, what is the optimal architecture. Clustering with MySQL is rather hard (people are complaining about MySQL Cluster, even here on Stackoverflow).
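
The memory figure in the question gives a quick upper bound on Apache RAM. The sketch below assumes the worst case, in which every one of the 5000 concurrent users holds an Apache process at the same instant; in practice users spend most of their time between requests, so the real number of simultaneous processes is likely much lower.

```python
def apache_ram_gb(concurrent_processes, mb_per_process):
    """Total RAM in GB if every process is resident at the same time."""
    return concurrent_processes * mb_per_process / 1024


# 5000 users and 30-50 MB per process are the figures from the question;
# "one process per user at the same instant" is a worst-case assumption.
low = apache_ram_gb(5000, 30)
high = apache_ram_gb(5000, 50)

print(f"worst-case Apache RAM: {low:.1f} to {high:.1f} GB")
```

Under that assumption the web tier alone would need roughly 146 to 244 GB across all servers, which suggests that measuring how many processes are actually busy at once on the current boxes matters more than the headline user count.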

Fast data retrieval in MySQL

- by trojanwarrior3000

I have a table of users. It contains millions of rows (user-id is the primary key). I just want to retrieve user-id and their joining date. Using SELECT user-id, joining-date FROM users takes a lot of time. Is there a fast way to query/retrieve the same data from this table?

A SelfHosted WCF Service over Basic HTTP Binding doesn't support more than 1000 concurrent requests

- by Krishnan

I have self-hosted a WCF Service over BasicHttpBinding consumed by an ASMX Client. I'm simulating a concurrent user load of 1200 users. The service method takes a string parameter and returns a string. The data exchanged is less than 10KB. The processing time for a request is fixed at 2 seconds by having a Thread.Sleep(2000) statement. Nothing additional. I have removed all the DB hits / business logic.

The same piece of code runs fine for 1000 concurrent users. I get the following error when I bump up the number to 1200 users.

    System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
       at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
       at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       --- End of inner exception stack trace ---
       at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       at System.Net.PooledStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       at System.Net.Connection.SyncRead(HttpWebRequest request, Boolean userRetrievedStream, Boolean probeRead)
       --- End of inner exception stack trace ---
       at System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(WebRequest request)
       at System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(WebRequest request)
       at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
       at WCF.Throttling.Client.Service.Function2(String param)

This exception is often reported on DataContract mismatch and large data exchange, but never when doing a load test. I have browsed enough and have tried most of the options, which include:

- Enabled Trace & Message log on server side. But no errors logged.
- To overcome port exhaustion, MaxUserPort is set to 65535, and TcpTimedWaitDelay to 30 secs.
- MaxConcurrentCalls is set to 600, and MaxConcurrentInstances is set to 1200.
- The Open, Close, Send and Receive Timeouts are set to 10 minutes.
- The HTTPWebRequest KeepAlive is set to false.

I have not been able to nail down the issue for the past two days. Any help would be appreciated. Thank you.

How to estimate tomcat server requirements?

- by Daniil

We have a brand new webapp that runs on Tomcat. So far, only one client is using it through the day. They run about 180 unique logins a day. Not really a lot IMO. Now, we managed to sell it to a brand new client who likes it and wants to roll it out to 50,000 clients. How many of them will log in at the same time - no idea. But I need to do the whole thing - allocate, create, config and maintain. OK - last is simple (errrr). The application runs off of Tomcat 5.5 on Gentoo (I'm thinking of upgrading to Tomcat 6) with MSSQL & MySQL behind. I do realize that a more enterprise-ready application would be a better fit, but that's not an option at the moment. Since I've never done this before, I'm a bit lost. Can someone advise on how to go about estimating the equipment requirements for this client? Tomcat does have clustering, so that I can do. MS SQL - I'm sure they have something too. I'm thinking of sticking it behind LVS (which we do use at the moment for something else too). Any help from people who deal with these details is greatly appreciated!

What should be the considerations for choosing SQL/NoSQL?

- by Yuval A

The target application is a medium-sized website built to support several hundred thousand users an hour, with an option to scale above that. The data model is rather simple, and caching potential is pretty high (~10:1 ratio of read to edit actions). What should the considerations be when choosing between a relational, SQL-based datastore and a NoSQL option (such as HBase or Cassandra)?

Is having functionality in DB a road block to scalability?

- by Estefany Velez

I may not have given the question the right title, but here it is. We are developing a financial portal for wealth management. We are expecting over 10000 clients to use the application. The portal calculates various performance analytics based on technical analysis of the stock market. We developed a lot of the functionality in the database, through stored procedures, user-defined functions, triggers etc. We thought we could gain a huge performance boost by doing things directly in the database rather than through C# code. And we actually did get a huge performance boost.

When I tried to brag about the achievement to our CTO, he questioned my decision to implement functionality in the database rather than in code. According to him, such applications suffer scalability problems. In his words: "These days things are kept in memory/cache. Clustered data is hard to manage over time. Facebook, Google have nothing in database. It is the era of thin servers and thick clients. DB is used only to store plain data and functionality should be completely decoupled from the database."

Can you guys please give me some suggestions as to whether what he says is right? How should we go about architecting such an application?

What is the Reason large sites don't use MySQL with ASP.NET?

- by Luke101

I have read this article from highscalability about Stack Overflow and other large websites. Many large high-traffic .NET sites such as plentyoffish.com, MySpace and SO all use .NET technologies and use SQL Server for their database. In the article, SO is quoted as saying:

"As you add more and more database servers the SQL Server license costs can be outrageous. So by starting scale up and gradually going scale out with non-open source software you can be in a world of financial hurt."

I don't understand why high-traffic .NET sites don't convert their databases to MySQL, as it is way cheaper than SQL Server.

Application Server or Lightweight Container?

- by Jeff Storey

Let me preface this by saying this is not an actual situation of mine but I'm asking this question more for my own knowledge and to get other people's inputs here. I've used both Spring and EJB3/JBoss, and for the smaller types of applications I've built, Spring (+Tomcat when needed) has been much simpler to use. However, when scaling up to larger applications that require things like load balancing and clustering, is Spring still a viable solution? Or is it time to turn to a solution like EJB3/JBoss when you start to get big enough to need that? I'm not sure if I've scoped the problem well enough to get a good answer, so please let me know. Thanks, Jeff

Combining cache methods - memcache/disk based

- by Industrial

Hi! Here's the deal. We would have taken the completely static HTML road to solve performance issues, but since the site will be partially dynamic, this won't work out for us. What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data.

Here are the two approaches that we have thought of right now:

- Using memcache on all major queries and leaving it alone to do what it does best.
- Using memcache for the most commonly retrieved data, and combining it with a standard hard-drive-stored cache for further usage.

The major advantage of only using memcache is of course the performance, but as the number of users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, despite the theoretical compromise in performance. Memcached appears to have some replication features available as well, which may come in handy when it's time to increase the nodes.

What approach should we use? Is it stupid to compromise and combine the two methods? Should we instead focus on utilizing memcache and upgrading the memory as the load increases with the number of users? Thanks a lot!
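
The combined approach can be sketched as a two-tier cache: a small, bounded in-memory tier in front of a larger disk tier. This is an illustration in Python rather than PHP, with an in-process `OrderedDict` standing in for memcache; the class name, the item limit, and the file naming scheme are all assumptions, and expiry and invalidation are left out.

```python
import hashlib
import os
import pickle
from collections import OrderedDict


class TwoTierCache:
    """Bounded in-memory tier (LRU) backed by a file-per-key disk tier."""

    def __init__(self, directory, max_memory_items=1000):
        self.directory = directory
        self.max_memory_items = max_memory_items
        self.memory = OrderedDict()  # stand-in for memcache

    def _path(self, key):
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def _remember(self, key, value):
        # Keep the key in memory and evict the least recently used entries.
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.max_memory_items:
            self.memory.popitem(last=False)

    def set(self, key, value):
        with open(self._path(key), "wb") as handle:
            pickle.dump(value, handle)
        self._remember(key, value)

    def get(self, key, default=None):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        try:
            with open(self._path(key), "rb") as handle:
                value = pickle.load(handle)
        except FileNotFoundError:
            return default
        self._remember(key, value)  # promote disk hits into memory
        return value
```

Hot keys are served from memory, and keys that fall out of the memory tier cost one disk read instead of a full database query, which is the trade-off the second approach is after.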

JavaEE Application Server or Lightweight Container?

- by Jeff Storey

Let me preface this by saying this is not an actual situation of mine but I'm asking this question more for my own knowledge and to get other people's inputs here. I've used both Spring and EJB3/JBoss, and for the smaller types of applications I've built, Spring (+Tomcat when needed) has been much simpler to use. However, when scaling up to larger applications that require things like load balancing and clustering, is Spring still a viable solution? Or is it time to turn to a solution like EJB3/JBoss when you start to get big enough to need that? I'm not sure if I've scoped the problem well enough to get a good answer, so please let me know. Thanks, Jeff

Boost::Spirit::Qi autorules -- avoiding repeated copying of AST data structures

- by phooji

I've been using Qi and Karma to do some processing on several small languages. Most of the grammars are pretty small (20-40 rules). I've been able to use autorules almost exclusively, so my parse trees consist entirely of variants, structs, and std::vectors. This setup works great for the common case: 1) parse something (Qi), 2) make minor manipulations to the parse tree (visitor), and 3) output something (Karma). However, I'm concerned about what will happen if I want to make complex structural changes to a syntax tree, like moving big subtrees around.

Consider the following toy example: a grammar for s-expr-style logical expressions that uses autorules...

    // Inside grammar class; rule names match struct names...
    pexpr %= pand | por | var | bconst;
    pand  %= lit("(and ") >> (pexpr % lit(" ")) >> ")";
    por   %= lit("(or ") >> (pexpr % lit(" ")) >> ")";
    pnot  %= lit("(not ") >> pexpr >> ")";

... which leads to a parse tree representation that looks like this...

    struct var { std::string name; };
    struct bconst { bool val; };
    struct pand;
    struct por;
    struct pnot;
    typedef boost::variant<bconst,
                           var,
                           boost::recursive_wrapper<pand>,
                           boost::recursive_wrapper<por>,
                           boost::recursive_wrapper<pnot> > pexpr;
    struct pand { std::vector<pexpr> operands; };
    struct por { std::vector<pexpr> operands; };
    struct pnot { pexpr victim; };
    // Many Fusion Macros here

Suppose I have a parse tree that looks something like this:

              pand
            / ... \
         por       por
        /   \     /   \
      var   var var   var

(The ellipsis means 'many more children of similar shape for pand.')

Now, suppose that I want to negate each of the por nodes, so that the end result is:

              pand
            / ... \
         pnot      pnot
          |         |
         por       por
        /   \     /   \
      var   var var   var

The direct approach would be, for each por subtree:

- create pnot node (copies por in construction);
- re-assign the appropriate vector slot in the pand node (copies pnot node and its por subtree).

Alternatively, I could construct a separate vector, and then replace (swap) the pand vector wholesale, eliminating a second round of copying. All of this seems cumbersome compared to a pointer-based tree representation, which would allow for the pnot nodes to be inserted without any copying of existing nodes.

My question: Is there a way to avoid copy-heavy tree manipulations with autorule-compliant data structures? Should I bite the bullet and just use non-autorules to build a pointer-based AST (e.g., http://boost-spirit.com/home/2010/03/11/s-expressions-and-variants/)?

Database for Python Twisted

- by Will

There's an API for Twisted apps to talk to a database in a scalable way: twisted.enterprise.adbapi

The confusing thing is, which database to pick? The database will have a Twisted app that is mostly making inserts and updates and relatively few selects, and then other strictly-read-only clients that are accessing the database directly making selects. (The read-only users are not necessarily selecting the data that the Twisted app is inserting; it's not as though the database is being used as a message queue.)

My understanding - which I'd like corrected/advised - is that:

- Postgres is a great DB, but all the Python bindings - and there is a confusing maze of them - are abandonware
- There is psycopg2, but that makes a lot of noise about doing its own connection pooling and things; does this co-exist gracefully/usefully/transparently with the Twisted async database connection pooling and such?
- SQLite is a great database for little things, but if used in a multi-user way it does whole-database locking, so performance would suck in the usage pattern I envisage
- MySQL - after the Oracle takeover, who'd want to adopt it now or adopt a fork?

Is there anything else out there?

Building highly scalable web services

- by christopher-mccann

My team and I are in the middle of developing an application which needs to be able to handle pretty heavy traffic. Not Facebook level, but in the future I would like to be able to scale to that without massive code rewrites. My thought was to modularise everything out into separate services with their own interfaces. So for example messaging would have a messaging interface that might have send() and getMessages() as methods, and then the PHP web app would simply query this interface through SOAP or cURL or something like that. The messaging application could then be any kind of application, so a Java application or Python or whatever was suitable for that particular functionality, with its own separate database shard. Is this a good approach?

Search Results

Search found 874 results on 35 pages for 'scalability'.
