Search Results

Search found 38088 results on 1524 pages for 'large scale project'.

Page 228/1524 | < Previous Page | 224 225 226 227 228 229 230 231 232 233 234 235 | Next Page >

Beginner Question: For extract a large subset of a table from MySQL, how does Indexing, order of tab

- by chongman

Sorry if this is too simple, but thanks in advance for helping. This is for MySQL but might be relevant for other RDMBSs tblA has 4 columns: colA, colB, colC, mydata, A_id It has about 10^9 records, with 10^3 distinct values for colA, colB, colC. tblB has 3 columns: colA, colB, B_id It has about 10^4 records. I want all the records from tblA (except the A_id) that have a match in tblB. In other words, I want to use tblB to describe the subset that I want to extract and then extract those records from tblA. Namely: SELECT a.colA, a.colB, a.colC, a.mydata FROM tblA as a INNER JOIN tblB as b ON a.colA=b.colA a.colB=b.colB ; It's taking a really long time (more than an hour) on a newish computer (4GB, Core2Quad, ubuntu), and I just want to check my understanding of the following optimization steps. ** Suppose this is the only query I will ever run on these tables. So ignore the need to run other queries. Now my questions: 1) What indexes should I create to optimize this query? I think I just need a multiple index on (colA, colB) for both tables. I don't think I need separate indexes for colA and colB. Another stack overflow article (that I can't find) mentioned that when adding new indexes, it is slower when there are existing indexes, so that might be a reason to use the multiple index. 2) Is INNER JOIN correct? I just want results where a match is found. 3) Is it faster if I join (tblA to tblB) or the other way around, (tblB to tblA)? This previous answer says that the optimizer should take care of that. 4) Does the order of the part after ON matter? This previous answer say that the optimizer also takes care of the execution order.

Read the article
Why can't I roll a loop in Javascript?

- by Carl Manaster

I am working on a web page that uses dojo and has a number (6 in my test case, but variable in general) of project widgets on it. I'm invoking dojo.addOnLoad(init), and in my init() function I have these lines: dojo.connect(dijit.byId("project" + 0).InputNode, "onChange", function() {makeMatch(0);}); dojo.connect(dijit.byId("project" + 1).InputNode, "onChange", function() {makeMatch(1);}); dojo.connect(dijit.byId("project" + 2).InputNode, "onChange", function() {makeMatch(2);}); dojo.connect(dijit.byId("project" + 3).InputNode, "onChange", function() {makeMatch(3);}); dojo.connect(dijit.byId("project" + 4).InputNode, "onChange", function() {makeMatch(4);}); dojo.connect(dijit.byId("project" + 5).InputNode, "onChange", function() {makeMatch(5);}); and change events for my project widgets properly invoke the makeMatch function. But if I replace them with a loop: for (var i = 0; i < 6; i++) dojo.connect(dijit.byId("project" + i).InputNode, "onChange", function() {makeMatch(i);}); same makeMatch() function, same init() invocation, same everything else - just rolling my calls up into a loop - the makeMatch function is never called; the objects are not wired. What's going on, and how do I fix it? I've tried using dojo.query, but its behavior is the same as the for loop case.

Read the article
How do I run NUnit in debug mode from Visual Studio?

- by Jon Cage

I've recently been building a test framework for a bit of C# I've been working on. I have NUnit set up and a new project within my workspace to test the component. All works well if I load up my unit tests from Nunit (v2.4), but I've got to the point where it would be really useful to run in debug mode and set some break points. I've tried the suggestions from several guides which all suggest changing the 'Debug' properties of the test project: Start external program: C:\Program Files\NUnit 2.4.8\bin\nunit-console.exe Command line arguments: /assembly: <full-path-to-solution>\TestDSP\bin\Debug\TestDSP.dll I'm using the console version there, but have tried the calling the GUI as well. Both give me the same error when I try and start debugging: Cannot start test project 'TestDSP' because the project does not contain any tests. Is this because I normally load \DSP.nunit into the Nunit GUI and that's where the tests are held? I'm beginning to think the problem may be that VS wants to run it's own test framework and that's why it's failing to find the NUnit tests? [Edit] To those asking about test fixtures, one of my .cs files in the TestDSP project looks roughly like this: namespace Some.TestNamespace { // Testing framework includes using NUnit.Framework; [TestFixture] public class FirFilterTest { /// <summary> /// Tests that a FirFilter can be created /// </summary> [Test] public void Test01_ConstructorTest() { ...some tests... } } } ...I'm pretty new to C# and the Nunit test framework so it's entirely possible I've missed some crucial bit of information ;-) [FINAL SOLUTION] The big problem was the project I'd used. If you pick: Other Languages->Visual C#->Test->Test Project ...when you're choosing the project type, Visual Studio will try and use it's own testing framework as far as I can tell. You should pick a normal c# class library project instead and then the instructions in my selected answer will work.

Read the article
How to store large string data into s SQL SERVER 2005 data base?

- by prateeksaluja20

Hello Experts, I have 100000 Ids to store into our DataBase.Id is in string format.each id contain 10 char.so what is the best data type is for this data? i have been used vrchar(max),text but my problem is not solved. so please experts help me.

Read the article
Alternate User select interface in django admin to reduce page size on large site?

- by David Eyk

I have a Django-based site with roughly 300,000 User objects. Admin pages for objects with a ForeignKey field to User take a very long time to load as the resulting form is about 6MB in size. Of course, the resulting dropdown isn't particularly useful, either. Are there any off-the-shelf replacements for handling this case? I've been googling for a snippet or a blog entry, but haven't found anything yet. I'd like to have a smaller download size and a more usable interface.

Read the article
ADO program to list members of a large group.

- by AlexGomez

Hi everyone, I'm attempting to list all the members in a Active Directory group using ADO. The problem I have is that many of these groups have over 1500 members and ADSI cannot handle more than 1500 items in a multi-valued attribute. Fortunately I came across Richard Muller's wonderful VBScript that handles more than 1500 members at http://www.rlmueller.net/DocumentLargeGroup.htm I modified his code as shown below so that I can list ALL the groups and its memberships in a certain OU. However, I'm keeping getting the exception shown below: "ADODB.Recordset: Item cannot be found in the collection corresponding to the requested name or ordinal." My program appears to get stuck at: strPath = adoRecordset.Fields("ADsPath").Value Set objGroup = GetObject(strPath) All I am doing above is issuing the query to get back a recordset consisting of the ADsPath for each group in the OU. It then walks through the recordset and grabs the ADsPath for the first group and store its in a variable named strPath; we then use the value of that variable to bind to the group account for that group. It really should work! Any idea why the code below doesn't work for me? Any pointers will be great appreciated. Thanks. Option Explicit Dim objRootDSE, strDNSDomain, adoCommand Dim adoConnection, strBase, strAttributes Dim strFilter, strQuery, adoRecordset Dim strDN, intCount, blnLast, intLowRange Dim intHighRange, intRangeStep, objField Dim objGroup, objMember, strName ' Determine DNS domain name. Set objRootDSE = GetObject("LDAP://RootDSE") 'strDNSDomain = objRootDSE.Get("DefaultNamingContext") strDNSDomain = "XXXXXXXX" ' Use ADO to search Active Directory. Set adoCommand = CreateObject("ADODB.Command") Set adoConnection = CreateObject("ADODB.Connection") adoConnection.Provider = "ADsDSOObject" adoConnection.Open = "Active Directory Provider" adoCommand.ActiveConnection = adoConnection adoCommand.Properties("Page Size") = 100 adoCommand.Properties("Timeout") = 30 adoCommand.Properties("Cache Results") = False ' Specify base of search. strBase = "<LDAP://" & strDNSDomain & ">" ' Specify the attribute values to retrieve. strAttributes = "member" ' Filter on objects of class "group" strFilter = "(&(objectClass=group)(samAccountName=*))" ' Enumerate direct group members. ' Use range limits to handle more than 1000/1500 members. ' Setup to retrieve 1000 members at a time. blnLast = False intRangeStep = 999 intLowRange = 0 IntHighRange = intLowRange + intRangeStep Do While True If (blnLast = True) Then ' If last query, retrieve remaining members. strQuery = strBase & ";" & strFilter & ";" _ & strAttributes & ";range=" & intLowRange _ & "-*;subtree" Else ' If not last query, retrieve 1000 members. strQuery = strBase & ";" & strFilter & ";" _ & strAttributes & ";range=" & intLowRange & "-" _ & intHighRange & ";subtree" End If adoCommand.CommandText = strQuery Set adoRecordset = adoCommand.Execute adoRecordset.MoveFirst intCount = 0 Do Until adoRecordset.EOF strPath = adoRecordset.Fields("ADsPath").Value Set objGroup = GetObject(strPath) For Each objField In adoRecordset.Fields If (VarType(objField) = (vbArray + vbVariant)) _ Then For Each strDN In objField.Value ' Escape any forward slash characters, "/", with the backslash ' escape character. All other characters that should be escaped are. strDN = Replace(strDN, "/", "\/") ' Check dictionary object for duplicates. 'If (objGroupList.Exists(strDN) = False) Then ' Add to dictionary object. 'objGroupList.Add strDN, True ' Bind to each group member, to find member's samAccountName Set objMember = GetObject("LDAP://" & strDN) ' Output group cn, group samaAccountName and group member's samAccountName. Wscript.Echo objMember.samAccountName intCount = intCount + 1 'End if Next End If Next adoRecordset.MoveNext Loop adoRecordset.Close ' If this is the last query, exit the Do While loop. If (blnLast = True) Then Exit Do End If ' If the previous query returned no members, then the previous ' query for the next 1000 members failed. Perform one more ' query to retrieve remaining members (less than 1000). If (intCount = 0) Then blnLast = True Else ' Setup to retrieve next 1000 members. intLowRange = intHighRange + 1 intHighRange = intLowRange + intRangeStep End If Loop

Read the article
How can I store large amount of data from a database to XML (memory problem)?

- by Andrija

First, I had a problem with getting the data from the Database, it took too much memory and failed. I've set -Xmx1500M and I'm using scrolling ResultSet so that was taken care of. Now I need to make an XML from the data, but I can't put it in one file. At the moment, I'm doing it like this: while(rs.next()){ i++; xmlStringBuilder.append("\n\t<row>"); xmlStringBuilder.append("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>"); xmlStringBuilder.append("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>"); xmlStringBuilder.append("\n\t\t<IME_PJ>" + Util.transformToHTML(rs.getString("ime_pj")) + "</IME_PJ>"); //etc. xmlStringBuilder.append("\n\t</row>"); if (i%100000 == 0){ //stores the data to a file with the name i.xml storeKBR(xmlStringBuilder.toString(),i); xmlStringBuilder= null; xmlStringBuilder= new StringBuilder(); } and it works; I get 12 100 MB files. Now, what I'd like to do is to do is have all that data in one file (which I then compress) but if just remove the if part, I go out of memory. I thought about trying to write to a file, closing it, then opening, but that wouldn't get me much since I'd have to load the file to memory when I open it. P.S. If there's a better way to release the Builder, do let me know :)

Read the article
Setting CPU target to x86 on .NET 2.0 project adds .NET 3.5 dependencies.

- by AngryHacker

I have a project in VS2008 that targets .NET 2.0 framework. It was original set to build for AnyCPU. I changed it to x86 and for whatever reason, VS adds the following lines to .csproj: <ItemGroup> <BootstrapperPackage Include="Microsoft.Net.Client.3.5"> <Visible>False</Visible> <ProductName>.NET Framework Client Profile</ProductName> <Install>false</Install> </BootstrapperPackage> ... ... <BootstrapperPackage Include="Microsoft.Net.Framework.3.5.SP1"> <Visible>False</Visible> <ProductName>.NET Framework 3.5 SP1</ProductName> <Install>false</Install> </BootstrapperPackage> </ItemGroup> Can someone explain as to why this is being added and whether I can safely remove it, as I still have to target the .NET 2.0 framework. Thanks.

Read the article
How can I change the Python that my Django project is using?

- by Burak

I have 2 versions installed in my server. I used virtualenv to install Python 2.7. I am using WSGI to deploy my project. WSGIPythonPath /home/ENV/lib/python2.7/site-packages WSGIScriptAlias / /var/www/html/my_project/wsgi.py My http.conf is like that. python -V gives Python 2.7.3 But in my projects Debug window, it says Django is using 2.6.8. Where am I wrong? UPDATE: Here is my wsgi file import os import sys sys.path.append('/var/www/html') os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings") from django.core.wsgi import get_wsgi_application application = get_wsgi_application() Python Version: 2.6.8 Python Path: ['/home/ENV/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg', '/home/ENV/lib/python2.7/site-packages/pip-1.1-py2.7.egg', '/home/ENV/lib/python2.7/site-packages/Django-1.4-py2.7.egg', '/home/ENV/lib/python2.7/site-packages', '/usr/lib/python2.6/site-packages/pip-1.1-py2.6.egg', '/usr/lib/python2.6/site-packages/django_transmeta-0.6.7-py2.6.egg', '/usr/lib/python2.6/site-packages/ipython-0.13-py2.6.egg', '/usr/lib/python2.6/site-packages/virtualenv-1.7.2-py2.6.egg', '/usr/lib64/python26.zip', '/usr/lib64/python2.6', '/usr/lib64/python2.6/plat-linux2', '/usr/lib64/python2.6/lib-tk', '/usr/lib64/python2.6/lib-old', '/usr/lib64/python2.6/lib-dynload', '/usr/lib64/python2.6/site-packages', '/usr/lib64/python2.6/site-packages/gtk-2.0', '/usr/lib/python2.6/site-packages', '/usr/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg-info', '/var/www/html'] In my error_log of httpd: [Tue Jul 10 20:51:29 2012] [error] python_init: Python version mismatch, expected '2.6.7', found '2.6.8'. [Tue Jul 10 20:51:29 2012] [error] python_init: Python executable found '/usr/bin/python'. [Tue Jul 10 20:51:29 2012] [error] python_init: Python path being used '/usr/lib64/python26.zip:/usr/lib64/python2.6/:/usr/lib64/python2.6/plat-linux2:/usr/lib64/python2.6/lib-tk:/usr/lib64/python2.6/lib-old:/usr/lib64/python2.6/lib-dynload'.

Read the article
Why use hashing to create pathnames for large collections of files?

- by Stephen

Hi, I noticed a number of cases where an application or database stored collections of files/blobs using a has to determine the path and filename. I believe the intended outcome is a situation where the path never gets too deep, or the folders ever get too full - too many files (or folders) in a folder making for slower access. EDIT: Examples are often Digital libraries or repositories, though the simplest example I can think of (that can be installed in about 30s) is the Zotero document/citation database. Why do this? EDIT: thanks Mat for the answer - does this technique of using a hash to create a file path have a name? Is it a pattern? I'd like to read more, but have failed to find anything in the ACM Digital Library

Read the article
What is the impact/limitation of oracle select with large number of bind variables?

- by Igal Serban

We had our oracle server chocking during processing a select statement with close to 3500(!!) bind variables. This select is, obviously, build dynamically by code that we can't change. During the execution of this select the db server went to 100% cpu usage and our system almost halted. We know how to reproduce this problem. So we can prevent this specific condition. But I am wondering if there is a way to protect the db ( by configuration) from this type of problems.

Read the article
How to map (large) integer on (small in size( alphanumeric string with PHP? (Cantor?)

- by Glooh

Dear all, I can't figure out how to optimally do the following in PHP: In a database, I have messages with a unique ID, like 19041985. Now, I want to refer to these messages in a short-url service but not using generated hashes but by simply 'calculate' the original ID. In other words, for example: http://short.url/sYsn7 should let me calculate the message ID the visitor would like to request. To make it more obvious, I wrote the following in PHP to generate these 'alphanumeric ID versions' and of course, the other way around will let me calculate the original message ID. The question is: Is this the optimal way of doing this? I hardly think so, but can't think of anything else. $alphanumString = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_'; for($i=0;$i < strlen($alphanumString);$i++) { $alphanumArray[$i] = substr($alphanumString,$i,1); } $id = 19041985; $out = ''; for($i=0;$i < strlen($id);$i++) { if(isset($alphanumString["".substr($id,$i,2).""]) && strlen($alphanumString["".substr($id,$i,2).""]) 0) { $out.=$alphanumString["".substr($id,$i,2).""]; } else { $out.=$alphanumString["".substr($id,$i,1).""]; $out.=$alphanumString["".substr($id,($i+1),1).""]; } $i++; } print $out;

Read the article
How can I shared controller logic in ASP.NET MVC for 2 controllers, where they are overriden

- by AbeP

Hello, I am trying to implement user-friendly URLS, while keeping the existing routes, and was able to do so using the ActionName tag on top of my controller (http://stackoverflow.com/questions/436866/can-you-overload-controller-methods-in-asp-net-mvc) I have 2 controllers: ActionName("UserFriendlyProjectIndex")] public ActionResult Index(string projectName) { ... } public ActionResult Index(long id) { ... } Basically, what I am trying to do is I store the user-friendly URL in the database for each project. If the user enters the URL /Project/TopSecretProject/, the action UserFriendlyProjectIndex gets called. I do a database lookup and if everything checks out, I want to apply the exact same logic that is used in the Index action. I am basically trying to avoid writing duplicate code. I know I can separate the common logic into another method, but I wanted to see if there is a built-in way of doing this in ASP.NET MVC. Any suggestions? I tried the following and I go the View could not be found error message: [ActionName("UserFriendlyProjectIndex")] public ActionResult Index(string projectName) { var filteredProjectName = projectName.EscapeString().Trim(); if (string.IsNullOrEmpty(filteredProjectName)) return RedirectToAction("PageNotFound", "Error"); using (var db = new PIMPEntities()) { var project = db.Project.Where(p => p.UserFriendlyUrl == filteredProjectName).FirstOrDefault(); if (project == null) return RedirectToAction("PageNotFound", "Error"); return View(Index(project.ProjectId)); } } Here's the error message: The view 'UserFriendlyProjectIndex' or its master could not be found. The following locations were searched: ~/Views/Project/UserFriendlyProjectIndex.aspx ~/Views/Project/UserFriendlyProjectIndex.ascx ~/Views/Shared/UserFriendlyProjectIndex.aspx ~/Views/Shared/UserFriendlyProjectIndex.ascx Project\UserFriendlyProjectIndex.spark Shared\UserFriendlyProjectIndex.spark I am using the SparkViewEngine as the view engine and LINQ-to-Entities, if that helps. thank you!

Read the article
where can i get large number of proxy ip's ?

- by wefwgeweg

i need a a long list of working proxy ip's to get around ip banning. where can i find it ?

Read the article
Parse large XML file w/ script or use BioPython API ?

- by jeremy04

Hey guys this is my first question on here. I'm trying to make a local copy of the UniprotKB in SQL. The UniprotKB is 2.1GB, and it comes in XML and a special text format used by SwissProt Here are my options: 1) Use a SAX parser (XML) - I chose Ruby, and Nokogiri. I started writing the parser, but my initial reaction: how would I map the XML schema to the SAX parser? 2) BioPython - I already have BioSQL/Biopython installed, which literally created my SQL schema for me, and I was able to successfully insert one SwissProt/Uniprot txt file into the database. I'm running it right now (crosses fingers) on the entire 2.1gb. Here is the code I'm running: from Bio import SeqIO from BioSQL import BioSeqDatabase from Bio import SwissProt server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "root", passwd = "", host="localhost", db = "bioseqdb") db = server["uniprot"] iterator = SeqIO.parse(open("/path/to/uniprot_sprot.dat", "r"), "swiss") db.load(iterator) server.commit() Edit: it's now crashing because the transactions are getting locked (since the tables are Innodb) Error Number: 1205 Lock wait timeout exceeded; try restarting transaction. I'm using MySQL version: 5.1.43 Should I switch my database to Postgrelsql ?

Read the article
What if a large number of objects are passed to my SwingWorker.process() method?

- by Trejkaz

I just found an interesting situation. Suppose you have some SwingWorker (I've made this one vaguely reminiscent of my own): public class AddressTreeBuildingWorker extends SwingWorker<Void, NodePair> { private DefaultTreeModel model; public AddressTreeBuildingWorker(DefaultTreeModel model) { } @Override protected Void doInBackground() { // Omitted; performs variable processing to build a tree of address nodes. } @Override protected void process(List<NodePair> chunks) { for (NodePair pair : chunks) { // Actually the real thing inserts in order. model.insertNodeInto(parent, child, parent.getChildCount()); } } private static class NodePair { private final DefaultMutableTreeNode parent; private final DefaultMutableTreeNode child; private NodePair(DefaultMutableTreeNode parent, DefaultMutableTreeNode child) { this.parent = parent; this.child = child; } } } If the work done in the background is significant then things work well - process() is called with relatively small lists of objects and everything is happy. Problem is, if the work done in the background is suddenly insignificant for whatever reason, process() receives a huge list of objects (I have seen 1,000,000, for instance) and by the time you process each object, you have spent 20 seconds on the Event Dispatch Thread, exactly what SwingWorker was designed to avoid. In case it isn't clear, both of these occur on the same SwingWorker class for me - it depends on the input data, and the type of processing the caller wanted. Is there a proper way to handle this? Obviously I can intentionally delay or yield the background processing thread so that a smaller number might arrive each time, but this doesn't feel like the right solution to me.

Read the article
How-to Get Best Performance With Large Amount Of Data In aspx page?

- by Vibin Jith

I want to display 1000 rows and 80 rows in a GridView. I am little bit worried about the Performance of the application. If you know how to make it better performance ,please tell hw to do that in detailed way.

Read the article
How can I get a iterable resultset from the database using pdo, instead of a large array?

- by Tchalvak

I'm using PDO inside a database abstraction library function query. I'm using fetchAll(), which if you have a lot of results, can get memory intensive, so I want to provide an argument to toggle between a fetchAll associative array and a pdo result set that can be iterated over with foreach and requires less memory (somehow). I remember hearing about this, and I searched the PDO docs, but I couldn't find any useful way to do that. Does anyone know how to get an iterable resultset back from PDO instead of just a flat array? And am I right that using an iterable resultset will be easier on memory? I'm using Postgresql, if it matters in this case. . . . The current query function is as follows, just for clarity. /** * Running bound queries on the database. * * Use: query('select all from players limit :count', array('count'=>10)); * Or: query('select all from players limit :count', array('count'=>array(10, PDO::PARAM_INT))); **/ function query($sql_query, $bindings=array()){ DatabaseConnection::getInstance(); $statement = DatabaseConnection::$pdo->prepare($sql_query); foreach($bindings as $binding => $value){ if(is_array($value)){ $statement->bindParam($binding, $value[0], $value[1]); } else { $statement->bindValue($binding, $value); } } $statement->execute(); // TODO: Return an iterable resultset here, and allow switching between array and iterable resultset. return $statement->fetchAll(PDO::FETCH_ASSOC); }

Read the article
Why does File::Find finished short of completely traversing a large directory?

- by Stan

A directory exists with a total of 2,153,425 items (according to Windows folder Properties). It contains .jpg and .gif image files located within a few subdirectories. The task was to move the images into a different location while querying each file's name to retrieve some relevant info and store it elsewhere. The script that used File::Find finished at 20462 files. Out of curiosity I wrote a tiny recursive function to count the items which returned a count of 1,734,802. I suppose the difference can be accounted for by the fact that it didn't count folders, only files that passed the -f test. The problem itself can be solved differently by querying for file names first instead of traversing the directory. I'm just wondering what could've caused File::Find to finish at a small fraction of all files. The data is stored on an NTFS file system.

Read the article
Best practice for handling memory leaks in large Java projects?

- by knorv

In almost all larger Java projects I've been involved with I've noticed that the quality of service of the application degrades with the uptime of the container. This is most probably due to memory leaks in the code. The correct way to solve this problem is obviously to trace back to the root cause of the problem and fix the leaks in the code. The quick and dirty way of solving the problem is simply restarting Tomcat (or whichever servlet container you're using). These are my three questions: Assume that you choose to solve the problem by tracing the root cause of the problem (the memory leaks), how would you collect data to zoom in on the problem? Assume that you choose the quick and dirty way of speeding things up by simply restarting the container, how would you collect data to choose the optimal restart cycle? Have you been able to deploy and run projects over an extended period of time without ever restarting the servlet container to regain snappiness? Or is an occasional servlet restart something that one has to simply accept?

Read the article
What to use to check html links in large project, on Linux?

- by depesz

I have directory with 1000 .html files, and would like to check all of them for bad links - preferably using console. Any tool you can recommend for such task?

Read the article
Is is possible to parse a web page from the client side for a large number of words and if so, how?

- by Technoh

I have a list of keywords, about 25,000 of them. I would like people who add a certain < script tag on their web page to have these keywords transformed into links. What would be the best way to go and achieve this? I have tried the simple javascript approach (an array with lots of elements and regexping/replacing each) and it obviously slows down the browser. I could always process the content server-side if there was a way, from the client, to send the page's content to a cross-domain server script (I'm partial to PHP but it could be anything) but I don't know of any way to do this. Any other working solution is also welcome.

Read the article
What's the most scalable way to handle somewhat large file uploads in a Python webapp?

- by Jason Baker

We have a web application that takes file uploads for some parts. The file uploads aren't terribly big (mostly word documents and such), but they're much larger than your typical web request and they tend to tie up our threaded servers (zope 2 servers running behind an Apache proxy). I'm mostly in the brainstorming phase right now and trying to figure out a general technique to use. Some ideas I have are: Using a python asynchronous server like tornado or diesel or gunicorn. Writing something in twisted to handle it. Just using nginx to handle the actual file uploads. It's surprisingly difficult to find information on which approach I should be taking. I'm sure there are plenty of details that would be needed to make an actual decision, but I'm more worried about figuring out how to make this decision than anything else. Can anyone give me some advice about how to proceed with this?

Read the article
Does anybody have any suggestions on which of these two approaches is better for large delete?

- by RPS

Approach #1: DECLARE @count int SET @count = 2000 DECLARE @rowcount int SET @rowcount = @count WHILE @rowcount = @count BEGIN DELETE TOP (@count) FROM ProductOrderInfo WHERE ProductId = @product_id AND bCopied = 1 AND FileNameCRC = @localNameCrc SELECT @rowcount = @@ROWCOUNT WAITFOR DELAY '000:00:00.400' Approach #2: DECLARE @count int SET @count = 2000 DECLARE @rowcount int SET @rowcount = @count WHILE @rowcount = @count BEGIN DELETE FROM ProductOrderInfo WHERE ProductId = @product_id AND FileNameCRC IN ( SELECT TOP(@count) FileNameCRC FROM ProductOrderInfo WITH (NOLOCK) WHERE bCopied = 1 AND FileNameCRC = @localNameCrc ) SELECT @rowcount = @@ROWCOUNT WAITFOR DELAY '000:00:00.400' END

Read the article
Syncing large personal school-material -git-repo with things such as casual notes? Rsync, wget and Git -- or some ready tool?

- by hhh

My friend wants to store electrically her school -notes and process them fast, with backups. She has over 2GB -size repo already and growing all the time (mostly appended material i.e. more school notes, different formats, pdf, pictures and scanned, some text -files, etc). The goal of my friend is to process fast the notes. I suggested command like this here i.e. "# crontab -e @weekly wget --random-wait -e robots=off -U mozilla -mirror http://VeryLong.com". But I think plugging in Rsync somewhere could make it much better with Git. How would you help my friend to process and store the school -material under Git-version-controlling and still keep the size reasonable? Perhaps related rsync .git directory rsync git big repository Different scope Git/rsync mix for projects with large binaries and text files What's a good way to organize a large collection of personal scripts using git?

Read the article

< Previous Page | 224 225 226 227 228 229 230 231 232 233 234 235 | Next Page >