Search Results

Search found 29949 results on 1198 pages for 'large scale website'.

Page 224/1198 | < Previous Page | 220 221 222 223 224 225 226 227 228 229 230 231  | Next Page >

  • How can I efficiently group a large list of URLs by their host name in Perl?

    - by jesper
    I have text file that contains over one million URLs. I have to process this file in order to assign URLs to groups, based on host address: { 'http://www.ex1.com' = ['http://www.ex1.com/...', 'http://www.ex1.com/...', ...], 'http://www.ex2.com' = ['http://www.ex2.com/...', 'http://www.ex2.com/...', ...] } My current basic solution takes about 600 MB of RAM to do this (size of file is about 300 MB). Could you provide some more efficient ways? My current solution simply reads line by line, extracts host address by regex and puts the url into a hash. EDIT Here is my implementation (I've cut off irrelevant things): while($line = <STDIN>) { chomp($line); $line =~ /(http:\/\/.+?)(\/|$)/i; $host = "$1"; push @{$urls{$host}}, $line; } store \%urls, 'out.hash';

    Read the article

  • How best to implement "favourites" feature? (like favourite products on a data driven website)

    - by ClarkeyBoy
    Hi, I have written a dynamic database driven, object oriented website with an administration frontend etc etc. I would like to add a feature where customers can save items as "favourites", without having to create an account and login, to come back to them later, but I dont know how exactly to go about doing this... I see three options: Log favourites based on IP address and then change these to be logged against an account if the customer then creates an account; Force customers to create an account to be able to use this functionality; Log favourites based on IP address but give users the option to save their favourites under a name they specify. The problem with option 1 is that I dont know much about IP addresses - my Dad thinks they are unique, but I know people have had problems with systems like this. The problem with 1 and 2 is that accounts have not been opened up to customers yet - only administrators can log in at the moment. It should be easy to alter this (no more than a morning or afternoons work) but I would also have to implement usergroups too. The problem with option 3 is that if user A saves a favourites list called "My Favourites", and then user B tries to save a list under this name and it is refused, user B will then be able to access the list saved by user A because they now know it already exists. A solution to this is to password protect lists, but to go to all this effort I may as well implement option 2. Of course I could always use option 4; use an alternative if anyone can suggest a better solution than any of the above options. So has anyone ever done something like this before? If so how did you go about it? What do you recommend (or not recommend)? Many thanks in advance, Regards, Richard

    Read the article

  • Optimal method to create a large string containing several variables?

    - by Runcible
    I want to create a string that contains many variables: std::string name1 = "Frank"; std::string name2 = "Joe"; std::string name3 = "Nancy"; std::string name4 = "Sherlock"; std::string sentence; sentence = name1 + " and " + name2 + " sat down with " + name3; sentence += " to play cards, while " + name4 + " played the violin."; This should produce a sentence that reads Frank and Joe sat down with Nancy to play cards, while Sherlock played the violin. My question is: What is the optimal way to accomplish this? I am concerned that constantly using the + operator is ineffecient. Is there a better way?

    Read the article

  • Payment Processors - What do I need to know if I want to accept credit cards on my website?

    - by Michael Pryor
    This question talks about different payment processors and what they cost, but I'm looking for the answer to what do I need to do if I want to accept credit card payments? Assume I need to store credit card numbers for customers, so that the obvious solution of relying on the credit card processor to do the heavy lifting is not available. PCI Data Security, which is apparently the standard for storing credit card info, has a bunch of general requirements, but how does one implement them? And what about the vendors, like Visa, who have their own best practices? Do I need to have keyfob access to the machine? What about physically protecting it from hackers in the building? Or even what if someone got their hands on the backup files with the sql server data files on it? What about backups? Are there other physical copies of that data around? Tip: If you get a merchant account, you should negotiate that they charge you "interchange-plus" instead of tiered pricing. With tiered pricing, they will charge you different rates based on what type of Visa/MC is used -- ie. they charge you more for cards with big rewards attached to them. Interchange plus billing means you only pay the processor what Visa/MC charges them, plus a flat fee. (Amex and Discover charge their own rates directly to merchants, so this doesn't apply to those cards. You'll find Amex rates to be in the 3% range and Discover could be as low as 1%. Visa/MC is in the 2% range). This service is supposed to do the negotiation for you (I haven't used it, this is not an ad, and I'm not affiliated with the website, but this service is greatly needed.) This blog post gives a complete rundown of handling credit cards (specifically for the UK).

    Read the article

  • Optimize date query for large child tables: GiST or GIN?

    - by Dave Jarvis
    Problem 72 child tables, each having a year index and a station index, are defined as follows: CREATE TABLE climate.measurement_12_013 ( -- Inherited from table climate.measurement_12_013: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass), -- Inherited from table climate.measurement_12_013: station_id integer NOT NULL, -- Inherited from table climate.measurement_12_013: taken date NOT NULL, -- Inherited from table climate.measurement_12_013: amount numeric(8,2) NOT NULL, -- Inherited from table climate.measurement_12_013: category_id smallint NOT NULL, -- Inherited from table climate.measurement_12_013: flag character varying(1) NOT NULL DEFAULT ' '::character varying, CONSTRAINT measurement_12_013_category_id_check CHECK (category_id = 7), CONSTRAINT measurement_12_013_taken_check CHECK (date_part('month'::text, taken)::integer = 12) ) INHERITS (climate.measurement) CREATE INDEX measurement_12_013_s_idx ON climate.measurement_12_013 USING btree (station_id); CREATE INDEX measurement_12_013_y_idx ON climate.measurement_12_013 USING btree (date_part('year'::text, taken)); (Foreign key constraints to be added later.) The following query runs abysmally slow due to a full table scan: SELECT count(1) AS measurements, avg(m.amount) AS amount FROM climate.measurement m WHERE m.station_id IN ( SELECT s.id FROM climate.station s, climate.city c WHERE -- For one city ... -- c.id = 5182 AND -- Where stations are within an elevation range ... -- s.elevation BETWEEN 0 AND 3000 AND 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 ) AND -- -- Begin extracting the data from the database. -- -- The data before 1900 is shaky; insufficient after 2009. -- extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND -- Whittled down by category ... -- m.category_id = 1 AND m.taken BETWEEN -- Start date. (extract( YEAR FROM m.taken )||'-01-01')::date AND -- End date. Calculated by checking to see if the end date wraps -- into the next year. If it does, then add 1 to the current year. -- (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date GROUP BY extract( YEAR FROM m.taken ) The sluggishness comes from this part of the query: m.taken BETWEEN /* Start date. */ (extract( YEAR FROM m.taken )||'-01-01')::date AND /* End date. Calculated by checking to see if the end date wraps into the next year. If it does, then add 1 to the current year. */ (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date The HashAggregate from the plan shows a cost of 10006220141.11, which is, I suspect, on the astronomically huge side. There is a full table scan on the measurement table (itself having neither data nor indexes) being performed. The table aggregates 237 million rows from its child tables. Question What is the proper way to index the dates to avoid full table scans? Options I have considered: GIN GiST Rewrite the WHERE clause Separate year_taken, month_taken, and day_taken columns to the tables What are your thoughts? Thank you!

    Read the article

  • Efficient way to create/unpack large bitfields in C?

    - by drhorrible
    I have one microcontroller sampling from a lot of ADC's, and sending the measurements over a radio at a very low bitrate, and bandwidth is becoming an issue. Right now, each ADC only give us 10 bits of data, and its being stored in a 16-bit integer. Is there an easy way to pack them in a deterministic way so that the first measurement is at bit 0, second at bit 10, third at bit 20, etc? To make matters worse, the microcontroller is little endian, and I have no control over the endianness of the computer on the other side.

    Read the article

  • Will MyISAM type tables work better than InnoDB for large numbers of columns?

    - by Ethan
    I have a MySQL InnoDB table with 238 columns. 56 of them are TEXT type, 27 are VARCHAR(255). I am getting MySQL error 139 when users insert data sometimes. After research I found that I'm probably running into InnoDB row size/column size/column count limitations. (I'm putting it that way because the specific limits among those three things are interdependent.) Docs on InnoDB give an idea of the limits. If I switch this table to MyISAM is it likely to solve the problem? I understand the maximum row size of 65,535 bytes. I think I'm hitting InnoDB's additional 8000 byte limit somehow. Switching to PostgreSQL is also a remote option, but would take much longer.

    Read the article

  • How to create column width in CSS that expands with large images yet stays a default size for normal

    - by ChrisJF
    I am creating an HTML5 web page with a one column layout. Basically, it is a forum thread with individual posts. I have specified in my CSS file the column to be 600px wide and centered it in the window using margin: 0 auto;. However, some images that are in the individual posts are larger than 600px and spill out of the column. I'd like to widen an individual post to fit the larger images. However, I want all the other posts to still be 600px wide. Right now, I'm just using overflow:auto which will create a scroll bar, but this is less than ideal. Is this possible to have the an individual post width grow for larger content yet stay fixed for normal content? Is this possible using just pure CSS? Thanks in advance!

    Read the article

  • How to structure an index for type ahead for extremely large dataset using Lucene or similar?

    - by Pete
    I have a dataset of 200million+ records and am looking to build a dedicated backend to power a type ahead solution. Lucene is of interest given its popularity and license type, but I'm open to other open source suggestions as well. I am looking for advice, tales from the trenches, or even better direct instruction on what I will need as far as amount of hardware and structure of software. Requirements: Must have: The ability to do starts with substring matching (I type in 'st' and it should match 'Stephen') The ability to return results very quickly, I'd say 500ms is an upper bound. Nice to have: The ability to feed relevance information into the indexing process, so that, for example, more popular terms would be returned ahead of others and not just alphabetical, aka Google style. In-word substring matching, so for example ('st' would match 'bestseller') Note: This index will purely be used for type ahead, and does not need to serve standard search queries. I am not worried about getting advice on how to set up the front end or AJAX, as long as the index can be queried as a service or directly via Java code. Up votes for any useful information that allows me to get closer to an enterprise level type ahead solution

    Read the article

  • How to generate one large dependency map for the whole project that builds with makefiles?

    - by Stan
    I have a gigantic project that is built using makefiles. Running make at the root of the project takes over 20 minutes when no files have changed (i.e. just traversing the project and checking for updated files). I'd like to create a dependency map that will tell me which directories I need to run 'make' in based on the file(s) changed. I already have a list of updated files that I can get from my version control system, and I'd like to skip the 20 minutes of traversing and get straight to the locations that do need to be recompiled. The project has a mix of several languages and custom tools, so this would ideally be language-independent (i.e. it would only process all makefiles to generate dependencies). I'll settle for a C/C++-specific solution, too, as the majority of the project is in C++. The project is built on Linux.

    Read the article

  • How to troubleshoot a Highcharts script that's not rendering data when date is added and hanging the JS engine with large datasets?

    - by ylluminate
    I have a Highchart JS graph that I'm building in Rails (although I don't think Ruby has real bearing on this problem unless it's the Date output format) to which I'm adding the timestamp of each datapoint. Presently the array of floats is rendering fine without timestamps, however when I add the timestamp to the series it fails to rend. What's worse is that when the series has hundreds of entries all sorts of problems arise, not the least of which is the browser entirely hanging and requiring a force quit / kill. I'm using the following to build the array of arrays data series: series1 = readings.map{|row| [(row.date.to_i * 1000), (row.data1.to_f if BigDecimal(row.data1) != BigDecimal("-1000.0"))] } This yields a result like this: series: [{"name":"Data 1","data":[[1326262980000,1.79e-09],[1326262920000,1.29e-09],[1326262860000,1.22e-09],[1326262800000,1.42e-09],[1326262740000,1.29e-09],[1326262680000,1.34e-09],[1326262620000,1.31e-09],[1326262560000,1.51e-09],[1326262500000,1.24e-09],[1326262440000,1.7e-09],[1326262380000,1.24e-09],[1326262320000,1.29e-09],[1326262260000,1.53e-09],[1326262200000,1.23e-09],[1326262140000,1.21e-09]],"color":"blue"}] Yet nothing appears on the graph as noted. Notwithstanding, when I compare the data series in one of their very similar examples here: http://www.highcharts.com/demo/spline-irregular-time It appears that really the data series are formatted identically (except in mine I use the timestamp vs date method). This leads me to think I've got a problem with the timestamp output, but I'm just not able to figure out where / how as I'm converting the date output to an integer multipled by 1000 to convert it to milliseconds as per explained in a similar Railscasts tutorial. I would very much appreciate it if someone could point me in the right direction here as to what I may be doing wrong. What could cause no data to appear on the graph in smaller sized sets (<100 points) and when into the hundreds causes an apparent hang in the javascript engine in this case? Perhaps ultimately the key lies here as this is the entire js that's being generated and not rendering: jQuery(function() { // 1. Define JSON options var options = { chart: {"defaultSeriesType":"spline","renderTo":"chart_name"}, title: {"text":"Title"}, legend: {"layout":"vertical","style":{}}, xAxis: {"title":{"text":"UTC Time"},"type":"datetime"}, yAxis: [{"title":{"text":"Left Title","margin":10}},{"title":{"text":"Right Groups Title"},"opposite":true}], tooltip: {"enabled":true}, credits: {"enabled":false}, plotOptions: {"areaspline":{}}, series: [{"name":"Data 1","data":[[1326262980000,1.79e-08],[1326262920000,1.69e-08],[1326262860000,1.62e-08],[1326262800000,1.42e-08],[1326262740000,1.29e-08],[1326262680000,1.34e-08],[1326262620000,1.31e-08],[1326262560000,1.51e-08],[1326262500000,1.64e-08],[1326262440000,1.7e-08],[1326262380000,1.64e-08],[1326262320000,1.69e-08],[1326262260000,1.53e-08],[1326262200000,1.23e-08],[1326262140000,1.21e-08]],"color":"blue"},{"name":"Data 2","data":[[1326262980000,9.79e-09],[1326262920000,9.78e-09],[1326262860000,9.8e-09],[1326262800000,9.82e-09],[1326262740000,9.88e-09],[1326262680000,9.89e-09],[1326262620000,1.3e-06],[1326262560000,1.32e-06],[1326262500000,1.33e-06],[1326262440000,1.33e-06],[1326262380000,1.34e-06],[1326262320000,1.33e-06],[1326262260000,1.32e-06],[1326262200000,1.32e-06],[1326262140000,1.32e-06]],"color":"red"}], subtitle: {} }; // 2. Add callbacks (non-JSON compliant) // 3. Build the chart var chart = new Highcharts.StockChart(options); });

    Read the article

  • How to handle large dataset with JPA (or at least with Hibernate)?

    - by Roman
    I need to make my web-app work with really huge datasets. At the moment I get either OutOfMemoryException or output which is being generated 1-2 minutes. Let's put it simple and suppose that we have 2 tables in DB: Worker and WorkLog with about 1000 rows in the first one and 10 000 000 rows in the second one. Latter table has several fields including 'workerId' and 'hoursWorked' fields among others. What we need is: count total hours worked by each user; list of work periods for each user. The most straightforward approach (IMO) for each task in plain SQL is: 1) select Worker.name, sum(hoursWorked) from Worker, WorkLog where Worker.id = WorkLog.workerId group by Worker.name; //results of this query should be transformed to Multimap<Worker, Long> 2) select Worker.name, WorkLog.start, WorkLog.hoursWorked from Worker, WorkLog where Worker.id = WorkLog.workerId; //results of this query should be transformed to Multimap<Worker, Period> //if it was JDBC then it would be vitally //to set resultSet.setFetchSize (someSmallNumber), ~100 So, I have two questions: how to implement each of my approaches with JPA (or at least with Hibernate); how would you handle this problem (with JPA or Hibernate of course)?

    Read the article

  • [Python] How can I speed up unpickling large objects if I have plenty of RAM?

    - by conradlee
    It's taking me up to an hour to read a 1-gigabyte NetworkX graph data structure using cPickle (its 1-GB when stored on disk as a binary pickle file). Note that the file quickly loads into memory. In other words, if I run: import cPickle as pickle f = open("bigNetworkXGraph.pickle","rb") binary_data = f.read() # This part doesn't take long graph = pickle.loads(binary_data) # This takes ages How can I speed this last operation up? Note that I have tried pickling the data both in using both binary protocols (1 and 2), and it doesn't seem to make much difference which protocol I use. Also note that although I am using the "loads" (meaning "load string") function above, it is loading binary data, not ascii-data. I have 128gb of RAM on the system I'm using, so I'm hoping that somebody will tell me how to increase some read buffer buried in the pickle implementation.

    Read the article

  • How would you mask data returned in a Dynamic Data for Entities website?

    - by David Stratton
    I'm doing this in Visual Studio 2008, not 2010, in case there is a relevant difference between the two versions of the Dynamic Data websites. How would I mask data in the automatically generated tables in a Dynamic Data for Entities website? The scenario is we have one table where we want to allow users to ENTER sensitive data, but not VIEW sensitive data, so... (In the list below, I'm using "template" to mean "The web page generated automatically based on the schema and action. I'm sure that's the wrong terminology, but the meaning should be clear.) The "Insert" template should have the field's textbox available for the user to type a value in. The "Edit" template should have the field's textbox blanked out (empty string) regardless of what was in the field in the database in the first place, but the user should be able to type in new data and have it save The "View" template should either have the data for this field masked, or non-visible. The auto-generated table showing the list of records should also have this field masked or non-visible. I can do this easily with standard Web Forms, but I'm having a hard time figuring this out in the Dynamic Data site I'm working on. Masking data is such a common task, I have to believe Microsoft thought of this and provided a way to do it...

    Read the article

  • How can I optimize retrieving lowest edit distance from a large table in SQL?

    - by Matt
    Hey, I'm having troubles optimizing this Levenshtein Distance calculation I'm doing. I need to do the following: Get the record with the minimum distance for the source string as well as a trimmed version of the source string Pick the record with the minimum distance If the min distances are equal (original vs trimmed), choose the trimmed one with the lowest distance If there are still multiple records that fall under the above two categories, pick the one with the highest frequency Here's my working version: DECLARE @Results TABLE ( ID int, [Name] nvarchar(200), Distance int, Frequency int, Trimmed bit ) INSERT INTO @Results SELECT ID, [Name], (dbo.Levenshtein(@Source, [Name])) As Distance, Frequency, 'False' As Trimmed FROM MyTable INSERT INTO @Results SELECT ID, [Name], (dbo.Levenshtein(@SourceTrimmed, [Name])) As Distance, Frequency, 'True' As Trimmed FROM MyTable SET @ResultID = (SELECT TOP 1 ID FROM @Results ORDER BY Distance, Trimmed, Frequency) SET @Result = (SELECT TOP 1 [Name] FROM @Results ORDER BY Distance, Trimmed, Frequency) SET @ResultDist = (SELECT TOP 1 Distance FROM @Results ORDER BY Distance, Trimmed, Frequency) SET @ResultTrimmed = (SELECT TOP 1 Trimmed FROM @Results ORDER BY Distance, Trimmed, Frequency) I believe what I need to do here is to.. Not dumb the results to a temporary table Do only 1 select from `MyTable` Setting the results right in the select from the initial select statement. (Since select will set variables and you can set multiple variables in one select statement) I know there has to be a good implementation to this but I can't figure it out... this is as far as I got: SELECT top 1 @ResultID = ID, @Result = [Name], (dbo.Levenshtein(@Source, [Name])) As distOrig, (dbo.Levenshtein(@SourceTrimmed, [Name])) As distTrimmed, Frequency FROM MyTable WHERE /* ... yeah I'm lost */ ORDER BY distOrig, distTrimmed, Frequency Any ideas?

    Read the article

  • Website Live Chat Script - Interface With Windows Live Messenger?

    - by Sootah
    Hello again fellow StackOverflow users! What I am looking for currently is your opinions on the what the best live chat script would be for my website. I'd be using this on both my local computer repair (which I've not really set up yet) company's site, as well as my other technical support site that is geared more towards showing people how to fix whatever problem they have themselves. It'll be a little bit before I launch it on either site, as I need to find a good design for my local computer repair company's site, as well as redesign TweaksForGeeks.com's layout so that it's more magazine like. (If you have any suggestions on good Wordpress templates for these, that'd also be swell. :) ) Features I'd like for the live chat script to have: (if possible) Real-time posting of messages (or as close to real time as possible) Integration into my Windows Live Messenger account - By this I mean it'd be handy if instead of me having to man a separate open browser window and constantly check it for new messages, it'd be much more convenient if it would instead just allow me to receive and respond to messages via my Windows Live account the same way I'd chat with any other person on my contact list. Sound notification of new messages when the chat isn't the focused window - In the event that using Windows Messenger isn't possible then I'd need it to notify me upon receipt of a new message. Recording of chat transcripts - This is a MUST HAVE Easy customization - I'll likely want to modify how it looks and whatnot, in addition to the fact that I'm going to set it up so that the chat transcripts can be posted to my site easily in order to be useful to my other visitors later on. With all that in mind, what is available out there? I've not really researched this at all yet; aside from doing a cursory search just before I decided to ask my StackOverflow brothers. As always, thank you! -Sootah

    Read the article

  • Best way to keep a large number of hobby projects alive; open sourcing?

    - by Daan van Yperen
    Because my time is limited I can usually only focus on one or two of my hobby projects, while the others sit there wasting away. I am looking for a solution that would allow me to divide my time better. is open sourcing where I take the role of guiding the project realistic, or are there better solutions? In my case, one project has a reasonably sized community of users going for it but is currently closed source. There have been requests to open source it.

    Read the article

< Previous Page | 220 221 222 223 224 225 226 227 228 229 230 231  | Next Page >