atg best practice in industries - Page 489

How to manipulate huge amounts of data

- by Alejandro

Hi there! I'm having the following problem. I need to store a huge amount of information (~32 GB) and be able to manipulate it as fast as possible. I'm wondering what the best way to do it is (any combination of programming language + OS + whatever you think is important). The structure of the information I'm using is a 4D array (NxNxNxN) of double-precision floats (8 bytes). Right now my solution is to slice the 4D array into 2D arrays and store them in separate files on the HDD of my computer. This is really slow and manipulating the data is unbearable, so this is no solution at all! I'm thinking of moving to a supercomputing facility in my country and storing all the information in RAM, but I'm not sure how to implement an application that takes advantage of it (I'm not a professional programmer, so any book/reference will help me a lot). An alternative solution I'm thinking of is to buy a dedicated server with lots of RAM, but I don't know for sure if that will solve the problem. So right now my ignorance doesn't let me choose the best way to proceed. What would you do if you were in this situation? I'm open to any idea. Thanks in advance!
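One option that sits between "many small files" and "everything in RAM" is a single memory-mapped file, where the operating system pages in only the parts of the array that are touched. The following is a minimal sketch using only the Python standard library, not a recommendation of a specific stack. The file name, the helper names and the tiny N = 8 are assumptions made for illustration; with N = 256 the same layout would occupy 256^4 * 8 bytes = 32 GiB.

```python
import mmap
import os
import tempfile

N = 8  # tiny stand-in for illustration; N = 256 would give 32 GiB


def flat_index(i, j, k, l, n=N):
    """Row-major position of element (i, j, k, l) in an n x n x n x n array."""
    return ((i * n + j) * n + k) * n + l


def create_array_file(path, n=N):
    """Create a zero-filled file large enough for n**4 doubles."""
    with open(path, "wb") as f:
        f.truncate(n ** 4 * 8)


def write_then_read(path, index, value):
    """Map the file, store one double at `index`, and read it back."""
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)   # map the whole file
        raw = memoryview(mm)
        doubles = raw.cast("d")         # view the bytes as 8-byte floats
        doubles[index] = value
        result = doubles[index]
        doubles.release()               # views must be released before close
        raw.release()
        mm.close()
    return result


path = os.path.join(tempfile.mkdtemp(), "array4d.bin")  # hypothetical file name
create_array_file(path)
stored = write_then_read(path, flat_index(1, 2, 3, 4), 3.5)
```

Whether this is fast enough depends on the access pattern: sweeping along the last index touches neighbouring bytes, while sweeping along the first index jumps N^3 * 8 bytes per step.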


Explaining the forecasts from an ARIMA model

- by Samik R.

I am trying to explain to myself the forecasting result from applying an ARIMA model to a time-series dataset. The data is from the M1-Competition, the series is MNB65. For quick reference, I have a google doc spreadsheet with the data. I am trying to fit the data to an ARIMA(1,0,0) model and get the forecasts. I am using R. Here are some output snippets:

    > arima(x, order = c(1,0,0))
    Series: x
    ARIMA(1,0,0) with non-zero mean

    Call: arima(x = x, order = c(1, 0, 0))

    Coefficients:
             ar1  intercept
          0.9421  12260.298
    s.e.  0.0474    202.717

    > predict(arima(x, order = c(1,0,0)), n.ahead=12)
    $pred
    Time Series:
    Start = 53
    End = 64
    Frequency = 1
     [1] 11757.39 11786.50 11813.92 11839.75 11864.09 11887.02 11908.62 11928.97 11948.15 11966.21 11983.23 11999.27

I have a few questions:

(1) How do I explain that although the dataset shows a clear downward trend, the forecast from this model trends upward? This also happens for ARIMA(2,0,0), which is the best ARIMA fit for the data using auto.arima (forecast package), and for an ARIMA(1,0,1) model.

(2) The intercept value for the ARIMA(1,0,0) model is 12260.298. Shouldn't the intercept satisfy the equation C = mean * (1 - sum(AR coeffs)), in which case the value should be 715.52? I must be missing something basic here.

(3) This is clearly a series with non-stationary mean. Why is an AR(2) model still selected as the best model by auto.arima? Could there be an intuitive explanation?

Thanks.
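A small numeric check may help with questions (1) and (2). R's arima() labels the estimated process mean as "intercept", so 12260.298 plays the role of the mean rather than the constant C. A stationary AR(1) forecast then decays geometrically from the last value toward that mean, which is why the forecasts climb when the series ends below it. The sketch below is Python rather than R and uses only the numbers printed above; the function name is made up for illustration, and the small differences from R's output come from the rounded coefficients.

```python
PHI = 0.9421            # ar1 coefficient as printed by R
MU = 12260.298          # value R prints as "intercept" (the process mean)
FIRST_FORECAST = 11757.39

# Forecasts printed by predict(), copied from the output above.
R_FORECASTS = [11757.39, 11786.50, 11813.92, 11839.75, 11864.09, 11887.02,
               11908.62, 11928.97, 11948.15, 11966.21, 11983.23, 11999.27]


def ar1_forecasts(first, phi, mu, steps):
    """Iterate x[h+1] = mu + phi * (x[h] - mu), starting from the first forecast."""
    values = [first]
    for _ in range(steps - 1):
        values.append(mu + phi * (values[-1] - mu))
    return values


forecasts = ar1_forecasts(FIRST_FORECAST, PHI, MU, 12)

# The constant in the form x[t] = C + phi * x[t-1] + e[t] follows from the mean.
constant = MU * (1 - PHI)   # roughly 710, close to the 715.52 expected in (2)
```

Each forecast closes a fraction (1 - 0.9421) of the remaining gap to 12260.298, so the path rises monotonically toward the mean without reaching it.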


Rules to choose hardware for OLTP systems (sql server)

- by Roman Pokrovskij

OK. We know the database size, the number of concurrent users and the number of transactions per minute; from these we should choose the number of processors, the RAID level, RAM, mirroring and clustering. There are no exact rules... but maybe there are no rules at all? In my practice I have always had a "legacy" system, and after some inspection and interviews I can form an opinion on how the hardware and design can be improved. But every time I meet an "absolutely" new system (I guess there are no truly new systems, but sometimes there are such tasks) I can't say anything trustworthy. So I'm interested in how people deal with such tasks. Do they map the task onto their experience, or do they have some base formulas?


Easily (as in WYSIWYG) customize the docbook output

- by Sukima

I've used DocBook in the past and I love the idea behind the separation of content from presentation. I am very comfortable editing XML directly. In my extensive search to find the best documenting solution for my needs I am always coming back to this one solution:

DocBook -> Build system (ant, make, etc.) -> Output

I have seen lots of information concerning the best WYSIWYG, XML and text editors for writing DocBook, including alternative markup languages like asciidoc. All these solutions focus on the creation of DocBook or the nightmare of the DocBook tool chain. No one ever addresses the output side other than to say "Just use XSL" or "Custom scripts".

When tasked to make a document or manual I don't want to spend countless hours attempting to reprogram, customize and modify the XSL, CSS and shell scripts (e.g. O'Reilly books). That is a very arduous task.

My query: is there a tool that makes the customizing easier? And is there anything similar to, say, Pages or Word, in that the user creates a template and the tool chain does the rest? Doing a visual task like adding pretty logos and fixing all the broken layouts that the default XSL comes up with (pagination is a mess) is very difficult from a text editor. Content is easy. Editing DocBook XSL was truly a nightmare when I did it in the past.

I've searched and I find lots of info on XML editors but nothing on XSL editors. Or am I lacking a key understanding of the process? Thanks.


Implement user authentication against remote DB with a Web Service

- by Juan González

I'm just starting research on the best way to implement user authentication within my soon-to-be app.

This is what I have so far: A desktop (Windows) application on a remote server. That application is accessed locally with a browser (it has a web console and MS SQL Server to store everything). The application is used with local credentials stored in the DB.

This is what I'd like to accomplish: Provide access to some information on that SQL Server DB from my app. That access of course must be granted only once a user has identified himself with valid credentials.

This is what I know so far: How to create my PHP web service and query info from a DB using JSON. How to work with the AFNetworking libraries to retrieve information. How to display that info in the app.

What I don't know is what the best method is to implement user authentication from iOS. Should I send username and password? Should I send some hash? Is there a way to secure the handshake? I'd certainly appreciate any advice, tip or recommendation you have from previous experience. I don't want to just implement it; I want to do it as well as possible.
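One common arrangement, sketched here under stated assumptions rather than as the answer for this particular setup: the app sends the username and password over HTTPS only, the web service compares the password against a salted, slow hash stored in the DB, and on success it returns a random session token that the app sends with later requests. The sketch is in Python as a stand-in for the PHP service; the function names and the iteration count are illustrative assumptions, and only standard-library calls are used.

```python
import hashlib
import hmac
import os
import secrets

ITERATIONS = 200_000  # assumed work factor; tune to the server's hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


def login(password, salt, expected_digest):
    """Return a new random session token on success, or None on failure."""
    if verify_password(password, salt, expected_digest):
        return secrets.token_urlsafe(32)
    return None


# Registration stores the salt and digest; login checks a later attempt.
stored_salt, stored_digest = hash_password("correct horse")
token = login("correct horse", stored_salt, stored_digest)
rejected = login("wrong guess", stored_salt, stored_digest)
```

Hashing on the device instead of the server does not by itself protect the handshake, because the hash then acts as the password; the transport encryption is what keeps the credentials from being read in transit.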


Changing Administrator password on a Windows 2008 web server

- by Nick

I've just taken over the administration of a Windows 2008 web server from a previous employee on a temporary basis. I need to change the Admin password as soon as I can but I've noticed that quite a few of the services also run under this account. So: Is there a quick way to find out which services will be affected by me changing the password or is it a question of going down the list? It doesn't seem right to me that the Admin account is used in this manner; should I create a different account for these services, or is using the Admin a/c standard practice? I realize everyone's servers / networks are set up differently, but are there any other items I should be aware of when changing the Admin password? Thanks


Simple continuously running XMPP client in python

- by tom

I'm using python-xmpp to send jabber messages. Everything works fine except that every time I want to send messages (every 15 minutes) I need to reconnect to the jabber server, and in the meantime the sending client is offline and cannot receive messages. So I want to write a really simple, indefinitely running xmpp client that is online the whole time and can send (and receive) messages when required.

My trivial (non-working) approach:

    import time
    import xmpp

    class Jabber(object):
        def __init__(self):
            server = 'example.com'
            username = 'bot'
            passwd = 'password'
            self.client = xmpp.Client(server)
            self.client.connect(server=(server, 5222))
            self.client.auth(username, passwd, 'bot')
            self.client.sendInitPresence()
            self.sleep()

        def sleep(self):
            self.awake = False
            delay = 1
            while not self.awake:
                time.sleep(delay)

        def wake(self):
            self.awake = True

        def auth(self, jid):
            self.client.getRoster().Authorize(jid)
            self.sleep()

        def send(self, jid, msg):
            message = xmpp.Message(jid, msg)
            message.setAttr('type', 'chat')
            self.client.send(message)
            self.sleep()

    if __name__ == '__main__':
        j = Jabber()
        time.sleep(3)
        j.wake()
        j.send('[email protected]', 'hello world')
        time.sleep(30)

The problem here seems to be that I cannot wake it up. My best guess is that I need some kind of concurrency. Is that true, and if so how would I best go about that?

EDIT: After looking into all the options concerning concurrency, I decided to go with twisted and wokkel. If I could, I would delete this post.


Rails form with a better URL

- by Sam

Wow, switching to REST is a different paradigm for sure and is mainly a headache right now.

view

    <% form_tag (businesses_path, :method => "get") do %>
      <%= select_tag :business_category_id,
            options_for_select(@business_categories.collect {|bc| [bc.name, bc.id ]}.insert(0, ["All Containers", 0]),
                               which_business_category(@business_category) ),
            { :onchange => "this.form.submit();"} %>
    <% end %>

controller

    def index
      @business_categories = BusinessCategory.find(:all)
      if params[:business_category_id].to_i != 0
        @business_category = BusinessCategory.find(params[:business_category_id])
        @businesses = @business_category.businesses
      else
        @businesses = Business.all
      end
      respond_to do |format|
        format.html # index.html.erb
        format.xml { render :xml => @businesses }
      end
    end

routes

    map.resources

What I want to do is get a better URL than the one this form produces, which is the following: http://localhost:3000/businesses?business_category_id=1

Without REST I would have done something like http://localhost:3000/business/view/bbq with bbq as the permalink, or I would have done http://localhost:300/business_categories/view/bbq and fetched the businesses that are associated with the category, but I don't really know the best way of doing this.

So the two questions are: first, what is the best logic for finding a business by its category using the latter form, and second, how do I get that in a pretty URL, all through RESTful routes in Rails?


Assemble an image browser side with JavaScript or Flash?

- by Kris Walker

Would it be possible to assemble an image on the browser by 'concatenating' other downloaded images together? The use case is this. The page will display 36 different tiles (small images). The user should be able to arrange those tiles into a 6 x 6 grid and save the resulting grid to disk as an image. The best solution would be to do it all in the browser without Flash. The next best solution would be to allow the user to create the grid in the browser with simple JavaScript drag and drop functionality and then send the coordinates to the server for image processing. The last solution would be to do it all in the browser with Flash. Is it even possible for Flash to create an image and then allow the user to save it from the browser? I am familiar with the Pixastic JavaScript library ( http://www.pixastic.com/ ), but it relies on getting image data to and from a canvas element which is not very well supported. What if I send the tile images to the browser as base64 encoded strings? Could I use JavaScript to create the 6 x 6 grid image? And if so, is there some way of allowing the user to get it onto disk without relying on the canvas element?


Recommendations to handle development and deployment of php web apps using shared project code

- by Exception e

I am wondering what the best way (for a lone developer) is to

1. develop a project that depends on code of other projects
2. deploy the resulting project to the server

I am planning to put my code in svn, and have shared code as a separate project. There are problems with svn:externals which I cannot fully estimate. I've read "subversion:externals considered to be an anti-pattern" and "How do you organize your version control repository", but there is one special thing about PHP projects (and other interpreted source code): there is no final executable resulting from your libraries. External dependencies are thus always on raw source code.

Ideally I really want to be able to develop simultaneously on one project and the projects it depends on.

Possible way: Check out a project's dependency in a subfolder as a working copy of the trunk.

Problems I foresee: When you want to deploy a project, you might want to freeze its dependencies, right? The dependency code should not end up as a duplicate in the project's repository, I think.

*(update1: I additionally assume svn:ignore will pose problems if I cannot fall back on symlinks, see my comment) I am still looking for suggestions that do not require the use of junction points. They are a sort of unsupported hack in WinXP, which may break some programs.*

This leads me to the last part of the question (as one has influence on the other): how do you deploy apps with such dependencies? I've looked into BuildOut for Python, but it seems to be tightly tied to the Python ecosystem (resolving and fetching Python modules from the web etc.).

I am very eager to learn about your best practices.


How to store some of the entity's values in another table using hibernate?

- by nimcap

Hi guys, is there a simple way to persist some of the fields in another class and table using Hibernate? For example, I have a Person class with name, surname, email, address1, address2, city, country fields. I want my classes to be:

    public class Person {
        private String name;
        private String surname;
        private String email;
        private Address address;
        // ..
    }

    public class Address {
        private Person person; // to whom this belongs
        private String address1;
        private String address2;
        private String city;
        private String country;
        // ..
    }

and I want to store Address in another table. What is the best way to achieve this?

Edit: I am using annotations. It does not have to be the way I described; I am looking for best practices.

PS. If there is a way to make Address immutable (to use as a value object), that is even better, or maybe not, because I may have been thinking about everything from the wrong perspective :)


Indentation control while developing a small python like language

- by sap

Hello, I'm developing a small Python-like language using flex and byacc (for lexing and parsing) and C++, but I have a few questions regarding scope control. Just like Python, it uses whitespace (or tabs) for indentation. Not only that, but I want to implement indexed breaking: for instance, if you type "break 2" inside a while loop that's inside another while loop, it would break not only from the inner loop but from the outer loop as well (hence the number 2 after break), and so on. Example:

    while 1
        while 1
            break 2
            'hello world'!! #will never reach this. "!!" outputs with a newline
        end
        'hello world again'!! #also will never reach this. again "!!" used for cout
    end
    #after break 2 it would jump right here

But since I don't have an "anti" tab character to check when a scope ends (in C, for example, I would just use the '}' char), I was wondering if this method would be the best: I would define a global variable, like "int tabIndex", in my yacc file that I would access in my lex file using extern. Then every time I find a tab character in my lex file I would increment that variable by 1. When parsing in my yacc file, if I find a "break" keyword I would decrement tabIndex by the amount typed after it, and if I reach EOF after compiling with tabIndex != 0 I would output a compilation error.

Now the problem is: what's the best way to see if the indentation got reduced? Should I read \b (backspace) chars from lex and then reduce the tabIndex variable (when the user doesn't use break)? Is there another method to achieve this?

Also, just another small question: I want every executable to have its starting point at the function called start(). Should I hardcode this into my yacc file?

Sorry for the long question; any help is greatly appreciated. Also, if someone can provide a yacc file for Python, that would be nice as a guideline (I tried looking on Google and had no luck). Thanks in advance.
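For comparison, Python's own tokenizer does not count tab characters globally. It keeps a stack of indentation widths and emits synthetic INDENT and DEDENT tokens whenever a line's leading whitespace grows or shrinks, so the grammar can treat them like '{' and '}'. A minimal sketch of that idea follows, in Python for brevity rather than flex/C++. The token names and the function are illustrative assumptions, and each tab or space is counted as one column.

```python
def indent_tokens(lines):
    """Turn source lines into a flat token list with INDENT/DEDENT markers."""
    stack = [0]      # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        body = line.lstrip(" \t")
        if not body.strip():
            continue                      # blank lines never change scope
        width = len(line) - len(body)     # each space or tab counts as 1
        if width > stack[-1]:
            stack.append(width)           # deeper than before: open a block
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()                   # shallower: close one block per level
            tokens.append("DEDENT")
        if width != stack[-1]:
            raise SyntaxError("dedent does not match any outer level")
        tokens.append(("LINE", body.rstrip()))
    while len(stack) > 1:                 # close blocks still open at EOF
        stack.pop()
        tokens.append("DEDENT")
    return tokens


source = [
    "while 1",
    "    while 1",
    "        break 2",
    "    'hello world'!!",
    "'done'!!",
]
result = indent_tokens(source)
```

With block boundaries expressed as tokens, "break 2" no longer needs to touch any indentation counter: the parser knows how many loops enclose the statement and can check the number against that depth.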


Building a ctypes-"based" C library with distutils

- by Robie Basak

Following this recommendation, I have written a native C extension library to optimise part of a Python module via ctypes. I chose ctypes over writing a CPython-native library because it was quicker and easier (just a few functions with all tight loops inside).

I've now hit a snag. If I want my work to be easily installable with distutils using python setup.py install, then distutils needs to be able to build my shared library and install it (presumably into /usr/lib/myproject). However, this is not a Python extension module, and so as far as I can tell, distutils cannot do this.

I've found a few references to other people with this problem:

- Someone on numpy-discussion with a hack back in 2006.
- Somebody asking on distutils-sig and not getting an answer.
- Somebody asking on the main python list and being pointed to the innards of an existing project.

I am aware that I can do something native and not use distutils for the shared library, or indeed use my distribution's packaging system. My concern is that this will limit usability, as not everyone will be able to install it easily.

So my question is: what is the current best way of distributing a shared library with distutils that will be used by ctypes but otherwise is OS-native and not a Python extension module? Feel free to answer with one of the hacks linked to above if you can expand on it and justify why that is the best way. If there is nothing better, at least all the information will be in one place.


Populating a foreign key table with variable user input

- by Vincent

I'm working on a website that will be based on user contributed data, submitted using a regular HTML form. To simplify my question, let's say that there will be two fields in the form: "User Name" and "Country" (this is just an example, not the actual site). There will be two tables in the database : "countries" and "users," with "users.country_id" being a foreign key to the "countries" table (one-to-many). The initial database will be empty. Users from all over the world will submit their names and the countries they live in and eventually the "countries" table will get filled out with all of the country names in the world. Since one country can have several alternative names, input like Chile, Chili, Chilli will generate 3 different records in the countries table, but in fact there is only one country. When I search for records from Chile, Chili and Chilli will not be included. So my question is - what would be the best way to deal with a situation like this, with conditions such that the initial database is empty, no other resources are available and everything is based on user input? How can I organize it in such way that Chile, Chili and Chilli would be treated as one country, with minimum manual interference. What are the best practices when it comes to normalizing user submitted data and is there a scientific term for this? I'm sure this is a common problem. Again, I used country names just to simplify my question, it can be anything that has possible different spellings.
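The usual names for this problem are record linkage, entity resolution or data deduplication. One low-effort starting point is to compare each new value against the values already stored and reuse a close match instead of inserting a new row. The sketch below uses Python's standard difflib; the function name and the 0.7 cutoff are assumptions chosen so that the Chile example works, not tuned values.

```python
import difflib


def canonical_name(raw, known, cutoff=0.7):
    """Return an existing close match from `known`, or register `raw` as new."""
    cleaned = " ".join(raw.split()).title()   # trim, collapse spaces, fix case
    matches = difflib.get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    if matches:
        return matches[0]                     # reuse the stored spelling
    known.append(cleaned)                     # first time this value is seen
    return cleaned


countries = []                                # the table starts out empty
resolved = [canonical_name(name, countries)
            for name in ["Chile", "chili", " Chilli ", "Peru"]]
```

A purely string-based cutoff also merges names that are genuinely different (for example "Niger" and "Nigeria" score above 0.7), and whichever spelling arrives first becomes the canonical one, so in practice fuzzy matches are often queued for a quick manual confirmation rather than applied silently.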


Calculating IOPS for a single HDD - what am I doing wrong?

- by red888

So I know there is no standardized way of calculating IOPS for a HDD, but from everything I have read it appears one of the most accurate formulas is the following:

    IOP/ms = {seek time} + {rotational latency} + ({block size} / {data transfer rate})

Which is IOs per millisecond, or what the book I've been reading calls "Disk Service Time". Also, rotational latency is calculated as half of one rotation in milliseconds. This was taken from the EMC book "Information Storage and Management", arguably a pretty reliable source, right or wrong?

Putting this formula into practice, consider this Seagate data sheet. I am going to calculate IOPS for the ST3000DM001 model for a block size of 4 KB:

    Seek Average (Write) = 9.5 (I'll be measuring IOPS for writes)
    Spindle speed = 7200rpm
    Average Data Rate = 156MB/s

So my variables are:

    Seek Time = 9.5ms
    Rotational latency = (.5 / (7200rpm / 60)) = 0.004s = 4ms
    Data Rate = 156MB/s = (0.156MB/ms / 0.004MB) = 39

    9.5ms + 4ms + 39 = IO/ms 52.5
    1 / (52.5 * 0.001) = 19 IOPS

19 IOPS for this drive clearly is not right, so what am I doing wrong?
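The arithmetic can be re-run with the units written out. In the numbers above the transfer term is inverted: the formula asks for block size divided by data rate (a time), while 0.156 / 0.004 = 39 is data rate divided by block size (blocks per millisecond). The sketch below is a plain recalculation from the same data-sheet figures, with a helper name invented for illustration and 4 KB taken as 0.004 MB as in the question.

```python
def disk_service_time_ms(seek_ms, rpm, block_mb, rate_mb_per_s):
    """Seek time + half a rotation + time to transfer one block, in ms."""
    rotational_latency_ms = 0.5 * 60_000 / rpm       # half of one rotation
    transfer_ms = block_mb / rate_mb_per_s * 1000    # block size / data rate
    return seek_ms + rotational_latency_ms + transfer_ms


service_ms = disk_service_time_ms(seek_ms=9.5, rpm=7200,
                                  block_mb=0.004, rate_mb_per_s=156)
iops = 1000 / service_ms   # service time is about 13.7 ms, so about 73 IOPS
```

The transfer term comes to roughly 0.026 ms, so for small blocks the result is dominated by seek time and rotational latency.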


How do UEFI and virtual machines relate to each other?

- by Iterator

I am trying to get my head around UEFI (Unified Extensible Firmware Interface) and it's not entirely clear to me how this affects virtual machines. Thus, there are three parts to this question: Is UEFI an advance in hardware support for virtualization? All other things being equal, would a machine with UEFI be more likely to run a virtual machine more efficiently than one without, or does UEFI cause any performance hits that negate any speed improvements from a virtual machine? Would the difference in execution be visible to code running in a virtual machine? (In theory, it shouldn't, but in practice?)


Customizing the TFS 2008 build sequence to avoid compilation and deploy SSRS

- by Andrew

I'm trying to create a CI process for SQL Server Reporting Services. I am fairly new to TFS but quite experienced with MSBuild. In the past I've used a combination of MSBuild with Team City, so the whole build process was more or less custom.

Here lies the start of my problems: as the solution I am deploying only contains Report Server projects (rds), no compilation is required. I thought that I would override the first default task that TFS runs (EndToEndIteration) to replace the default TFS build sequence and inject my own. The first snag that I have come across is that the build always fails. How can I set the status of the build to success? Currently the EndToEndIteration task is very light and only has a message.

Is this the best method to create a custom build process in TFS where compilation is not required? Or should I use the default sequence and override one of the hook tasks mentioned in http://msdn.microsoft.com/en-us/library/aa337604%28VS.80%29.aspx (i.e. AfterCompile)?

The core steps that I'd like to achieve are:

1. Bundle the RDL and datasource files
2. Connect to the host server to register/deploy the reports
3. Re-apply any subscriptions that previously existed
4. Run tests to verify the deployment succeeded and is returning results as expected

I have found another article on Reporting Services deployment: http://stackoverflow.com/questions/88710/reporting-services-deployment But it doesn't mention the best practice for customizing the standard build process. Any help would be appreciated.


Debugging a Google Web Toolkit application that has an error when deployed on Google App Engine

- by gerdemb

I have a Google Web Toolkit application that I am deploying to Google App Engine. In the deployed application, I am getting a JavaScript error: Uncaught TypeError: Cannot read property 'f' of null. This sounds like the JavaScript equivalent of a Java NullPointerException. The problem is that the GWT JavaScript is obfuscated, so it's impossible to debug in the browser, and I can't reproduce the same problem in hosted mode where I could use the Java debugger. I think the reason I'm only seeing the error on the deployed application is that the database I'm using on the GAE server is triggering something differently than the test database I'm using during testing and development.

So, any ideas about the best way to proceed? I've thought of the following things:

1. Deploy a non-obfuscated version of my application. Despite a lot of Googling, I can't figure out how to do this using the automatic deploy script provided with the Google Eclipse Plugin. Does anyone know?
2. Download and copy my GAE data to the local server.
3. Somehow point my development code to use the GAE server for data instead of the local test database. This seems like the best idea... Can anyone suggest how to proceed here?

Finally, is there a way to catch these JavaScript errors on the production server and log them somewhere? Without logging, I won't have any way to know if my users are having errors that don't occur on the server. The GWT.log() function is automatically stripped out of the production code...


Should server spare parts be stored in climate controlled storage?

- by Shane Wealti

What is a best practice for storing server spares (hard drives, RAM, power supplies, etc) with respect to how/where they are stored? Some options are storing them in climate controlled storage or just a standard warehouse-type stockroom? My understanding is that all other things being equal climate-controlled storage is preferred. What are the risks of storing that type of thing in a non-climate-controlled, somewhat dusty shelving area? Conversely are there risks to storing spares in a climate controlled area? If there are space limitations in climate controlled storage are there some parts, such as hard drives, which should be in climate controlled, while other parts such as power supplies will probably be ok in non climate controlled storage?


Good web hosting for ASP.NET MVC 1.0 app

- by magellings

I'm looking for hosting for an ASP.NET MVC 1.0 app. I've narrowed it down with research to either asphostportal, asphostcentral, godaddy, or 1&1. I've ruled out crystaltech and softsyshosting due to price with better plans. I will be running a small e-commerce site written with ASP.NET MVC 1.0 and want to be sure it will work, as well as looking for the cheapest price with the best value in regards to disk space/bandwidth. And bandwidth is basically how much data can be sent from your site per month, right? Any opinions appreciated as I'm finding this tough to narrow down.

I know you can bin deploy MVC but heard full trust mode is required as well as some routing rules in IIS. 1&1 says they can't enable full trust.

This is what I was looking at:

    name             disk space/bandwidth   price                    MVC enabled
    crystal tech     500MB/50GB             7.95 + 7.95 setup
                     2000MB/200GB           16.95
    softsyshosting   500MB/5GB              3.50 + 12/year domain
                     1000MB/10GB            5.84
                     3000MB/30GB            13.33
    asphostportal    5GB/50GB               5.75 + 8.99/year         yes
                     10GB/100GB             10.25
    asphostcentral   2GB/15GB               4.99                     yes
                     3GB/30GB               7.99/month domain free
                     5GB/40GB               11.99
    godaddy          10GB/300GB             10.69 + 4.74/month
                     150GB/1500GB           6.99/month
    1&1              10GB/unlimited         3.99 + free domain
                     150GB/unlimited        6.99

1&1 seems to be the best value if the MVC app will work. I'm a bit confused about bandwidth being unlimited. It may seem like a good thing, but what if one website on the server is a resource hog because of this?


Pattern for UI configuration

- by TERACytE

I have a Win32 C++ program that validates user input and updates the UI with status information and options. Currently it is written like this:

    void ShowError()
    {
        SetIcon(kError);
        SetMessageString("There was an error");
        HideButton(kButton1);
        HideButton(kButton2);
        ShowButton(kButton3);
    }

    void ShowSuccess()
    {
        SetIcon(kError);
        std::string statusText(GetStatusText());
        SetMessageString(statusText);
        HideButton(kButton1);
        HideButton(kButton2);
        ShowButton(kButton3);
    }

    // plus several more methods to update the UI using similar mechanisms

I do not like this because it duplicates code and forces me to update several methods if something changes in the UI. I am wondering if there is a design pattern or best practice to remove the duplication and make the functionality easier to understand and update. I could consolidate the code inside a config function and pass in flags to enable/disable UI items, but I am not convinced this is the best approach. Any suggestions and ideas?
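One way to remove this kind of duplication is to describe each UI state as data and keep a single routine that applies any state. The sketch below shows the shape of that table-driven approach in Python rather than Win32 C++. The state names, button identifiers, the success message and the tuple-based "calls" are all illustrative assumptions standing in for SetIcon, SetMessageString, HideButton and ShowButton.

```python
from dataclasses import dataclass

ALL_BUTTONS = ("button1", "button2", "button3")   # hypothetical identifiers


@dataclass(frozen=True)
class UiState:
    icon: str
    message: str
    visible_buttons: frozenset


# One row per state; adding a state means adding a row, not a new function.
STATES = {
    "error": UiState("error", "There was an error", frozenset({"button3"})),
    "success": UiState("success", "All input is valid", frozenset({"button3"})),
}


def render(state):
    """Translate one UiState into the ordered list of UI calls to make."""
    calls = [("set_icon", state.icon), ("set_message", state.message)]
    for button in ALL_BUTTONS:
        action = "show_button" if button in state.visible_buttons else "hide_button"
        calls.append((action, button))
    return calls


error_calls = render(STATES["error"])
```

Because every button is visited on every render, adding a fourth button means touching ALL_BUTTONS and the rows that should show it, while the apply logic stays unchanged.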


Can I use a static cache Helper method in a NET MVC controller?

- by Euston

I realise there have been a few posts regarding where to add a cache check/update and the separation of concerns between the controller, the model and the caching code. There are two great examples that I have tried to work with, but being new to MVC I wonder which one is the cleanest and suits the MVC methodology best. I know you need to take into account DI and unit testing.

Example 1 (helper method with delegate), in the controller:

    var myObject = CacheDataHelper.Get(thisID, () => WebServiceServiceWrapper.GetMyObjectBythisID(thisID));

Example 2 (check for cache in model class), in the controller:

    var myObject = WebServiceServiceWrapper.GetMyObjectBythisID(thisID);

then in the model class:

    if (!CacheDataHelper.Get(cachekey, out myObject))
    {
        // do some repository processing
        // Add object to cache
        CacheDataHelper.Add(myObject, cachekey);
    }

Both use a static cache helper class, but the first example uses a method signature with a delegate passed in that names the repository method being called. If the data is not in the cache, the method is called and the cache helper class handles adding it to or updating it in the current cache. In the second example the cache check is part of the repository method, with an extra line to call the cache helper's add method to update the current cache.

Due to my lack of experience and knowledge I am not sure which one is best suited to MVC. I like the idea of calling the cache helper with the delegate in order to remove any cache code from the repository, but I am not sure if using the static method in the controller is ideal. The second example deals with that, but now there is no separation between the caching check and the repository lookup. Perhaps that is not a problem, as you know it requires caching anyway?
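The trade-off in Example 1 is easier to see when the cache is an injected object instead of a static helper, since a test can then pass in a fresh cache and a fake repository. The sketch below illustrates that variation in Python rather than C#; the class and method names are assumptions and do not correspond to any ASP.NET API.

```python
class Cache:
    """Minimal get-or-add cache: the loader runs only on a miss."""

    def __init__(self):
        self._items = {}

    def get_or_add(self, key, loader):
        if key not in self._items:
            self._items[key] = loader()
        return self._items[key]


class ObjectController:
    """Controller that receives its cache and repository from the outside."""

    def __init__(self, cache, fetch_by_id):
        self._cache = cache
        self._fetch_by_id = fetch_by_id      # stand-in for the repository call

    def show(self, object_id):
        return self._cache.get_or_add(object_id,
                                      lambda: self._fetch_by_id(object_id))


lookups = []                                 # records every repository hit


def fake_repository(object_id):
    lookups.append(object_id)
    return {"id": object_id}


controller = ObjectController(Cache(), fake_repository)
first = controller.show(42)
second = controller.show(42)                 # served from the cache
```

The repository stays free of caching code, as in Example 1, and the controller no longer depends on a static method that tests cannot replace.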


DNS settings for SaaS in the cloud?

- by Jeremy

I am building a SaaS product. When a user signs up for an account they must select an alias for their site: --------.getlaunchpoint.com. Right now I have an A record for *.getlaunchpoint.com that points to the IP address of the server. However, with Azure I am not given an IP address. The suggested implementation is to make use of a CNAME. I need to create a CNAME for *.getlaunchpoint.com -> getlaunchpoint.cloudapp.net. GoDaddy does not support CNAME wildcards. Searching on Google I'm getting conflicting information... is a CNAME wildcard a bad practice? I run into the same problem with Amazon EC2 if I want to make use of load balancers, because you cannot tie a public IP address to an Amazon Load Balancer. Amazon also suggests the use of a CNAME. Any help would be appreciated.


BeautifulSoup HTMLParseError. What's wrong with this?

- by user1915496

This is my code:

    from bs4 import BeautifulSoup as BS
    import urllib2
    url = "http://services.runescape.com/m=news/recruit-a-friend-for-free-membership-and-xp"
    res = urllib2.urlopen(url)
    soup = BS(res.read())
    other_content = soup.find_all('div',{'class':'Content'})[0]
    print other_content

Yet an error comes up:

    /Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py:149: RuntimeWarning: Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.
      "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help."))
    Traceback (most recent call last):
      File "web.py", line 5, in <module>
        soup = BS(res.read())
      File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
        self._feed()
      File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
        self.builder.feed(self.markup)
      File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 150, in feed
        raise e

I've let two other people use this code, and it works perfectly fine for them. Why is it not working for me? I have bs4 installed...


Algorithm for finding similar users through a join table

- by Gdeglin

I have an application where users can select a variety of interests from around 300 possible interests. Each selected interest is stored in a join table containing the columns user_id and interest_id. Typical users select around 50 interests out of the 300. I would like to build a system where users can find the top 20 users that have the most interests in common with them. Right now I am able to accomplish this using the following query:

    SELECT i2.user_id, count(i2.interest_id) AS count
    FROM interests_users as i1, interests_users as i2
    WHERE i1.interest_id = i2.interest_id AND i1.user_id = 35
    GROUP BY i2.user_id
    ORDER BY count DESC
    LIMIT 20;

However, this query takes approximately 500 milliseconds to execute with 10,000 users and 500,000 rows in the join table. All indexes and database configuration settings have been tuned to the best of my ability. I have also tried avoiding the use of joins altogether using the following query:

    select user_id,count(interest_id) count
    from interests_users
    where interest_id in (13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,508)
    group by user_id
    order by count desc
    limit 20;

But this one is even slower (~800 milliseconds). How could I best lower the time it takes to gather this kind of data to below 100 milliseconds? I have considered putting this data into a graph database like Neo4j, but I am not sure if that is the easiest solution or if it would even be faster than what I am currently doing.
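Before moving to a graph database, it may be worth noting that 500,000 (user_id, interest_id) pairs fit comfortably in application memory, where the same count can be done with an inverted index from interest to users. The sketch below is one such in-memory version in Python. The function names are assumptions, no timing is claimed, and unlike the first SQL query it leaves the querying user out of the result.

```python
from collections import Counter, defaultdict


def build_indexes(pairs):
    """Build user -> interests and interest -> users maps from join-table rows."""
    by_user = defaultdict(set)
    by_interest = defaultdict(set)
    for user_id, interest_id in pairs:
        by_user[user_id].add(interest_id)
        by_interest[interest_id].add(user_id)
    return by_user, by_interest


def most_similar(user_id, by_user, by_interest, limit=20):
    """Return up to `limit` (other_user, shared_interest_count) pairs, best first."""
    shared = Counter()
    for interest_id in by_user.get(user_id, ()):
        for other in by_interest[interest_id]:
            if other != user_id:
                shared[other] += 1
    return shared.most_common(limit)


# Tiny stand-in for the interests_users table: (user_id, interest_id) rows.
rows = [(1, 10), (1, 11), (1, 12),
        (2, 10), (2, 11), (2, 12),
        (3, 10), (3, 11),
        (4, 12), (4, 99)]
by_user, by_interest = build_indexes(rows)
top = most_similar(1, by_user, by_interest, limit=3)
```

The same structure also suggests a precomputation route: the top-20 list for a user only changes when that user or a neighbour edits their interests, so results could be cached and refreshed on write.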
