crawl - Page 15 - Developer IT

Is it so bad to have heaps of elements in your DOM?

- by alex

I am making a non-interactive real estate display for their shop window. I have kicked jCarousel into doing what I want:

- Add panels via AJAX
- Towards the end of the current set, fetch some new panels via AJAX and insert them

This works fine, but it appears that calling jQuery's remove() on the prior elements causes an ugly bump. I'm not sure if calling hide() will free up any resources, as the element will still exist (and the element will be off screen anyway). I've seen this, and tried carousel.reset() from within a callback. It just clears out all the elements. This will be running on Google Chrome on Windows XP, and will solely be displaying on LCD televisions. I am wondering: if I can't find a reasonable solution to remove the extra DOM elements, will it bring my application to a crawl, or will Chrome do some clever garbage collecting? Or, how would you solve this problem? Thanks

How to Identify the website's content language

- by Ajay

I am developing a website in ASP.NET that crawls the content of other websites. I am able to get the content correctly, but how can I identify which language is used, based on that content? I used the following code:

    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(TextBox1.Text);
    request.UserAgent = "A .NET Web Crawler";
    WebResponse response = request.GetResponse();
    Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream);
    string htmlText = reader.ReadToEnd(); // full HTML of the crawled page
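
One cheap first step, before any statistical detection, is to read the language the page declares about itself. The sketch below is in Python rather than C# and only illustrates the idea; the names `LangSniffer` and `declared_language` are made up for this example. It assumes the page carries an `<html lang="...">` attribute or a `content-language` meta tag, which many pages do not, so a missing or wrong declaration would still need a content-based detector.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Collects language hints declared in the markup itself."""

    def __init__(self):
        super().__init__()
        self.html_lang = None  # value of <html lang="...">
        self.meta_lang = None  # value of <meta http-equiv="content-language" content="...">

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.html_lang = attrs["lang"]
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            self.meta_lang = attrs.get("content")


def declared_language(html_text, content_language_header=None):
    """Return the declared language, or None if the page declares nothing.

    content_language_header is the (optional) Content-Language HTTP
    response header, used only as a last resort.
    """
    sniffer = LangSniffer()
    sniffer.feed(html_text)
    return sniffer.html_lang or sniffer.meta_lang or content_language_header
```

In the question's code, `htmlText` would play the role of `html_text`, and the header value could be read from the response object.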

VB.Net HTTPWebRequest Speed is slow comparing Python URLOpen

- by regexhacks

Hi, I am coding a web crawler which will crawl websites and selectively parse different sections of each site. I am a .Net developer, so the obvious choice was to write it in .Net, but it was very slow, both at downloading and at parsing the HTML pages. I then tried just downloading the contents, first using .Net and then using Python on the same domains, and Python was very impressive at downloading data. I have the downloading working in Python, but the later part is not that easy to code in Python, so I obviously don't want to do it there. The same batch of domains which took 100 seconds in Python took 20 minutes in the .Net-based crawler. I tried downloading http://www.eqlit.com/ and it took 8 seconds in Python, while the same download took 100 seconds in the .Net crawler. Does anyone have any idea why this is slow in .Net but fast in Python?

Hiding part of a page from Search Server 2010 Express

- by Jonathan

I'm working on a soon-to-be-public-facing site, and we want to have our search live on day 1, and want it to be searchable but non-public during testing, so we're planning to use something whose crawling we can control -- Search Server 2010 Express. However, if I search for something in my top navigation bar, I get nearly every page as a hit. It kind of makes sense, as every page has that content, but it's completely irrelevant on most pages. I want it to crawl through my navigation, but ignore the text within the navigation for search results. I was hoping that it would just figure that out on its own (the HTML for the top nav is static), but apparently it doesn't. Is there some standard thing I can put in my HTML that will achieve the effect I'm going for? On a side note: when I go live, will I have the same problem with public search engines, or do they tend to be smarter?

SqlCeCommand ExecuteNonQuery performance issue

- by Michael

I've been asked to resolve an issue with a .Net/SqlServerCe application. Specifically, after repeated inserts against the db, performance becomes increasingly degraded. In one instance this happens at ~200 rows, in another at ~1000 rows. In the latter case the code being used looks like this:

    Dim cm1 As System.Data.SqlServerCe.SqlCeCommand = cn1.CreateCommand
    cm1.CommandText = "INSERT INTO Table1 Values(?,?,?,?,?,?,?,?,?,?,?,?,?)"
    For j = 0 To ds.Tables(0).Rows.Count - 1 'this is 3110
        For i = 0 To 12
            cm1.Parameters(tbl(i, 0)).Value = Vals(j,i) 'values taken from a different db
        Next
        cm1.ExecuteNonQuery()
    Next

The specifics aren't super important (like what 'tbl' is, etc.); the question is rather whether this code should be expected to handle this number of inserts, or if the crawl I'm witnessing is to be expected.
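
For comparison, here is the same loop shape (one parameterized 13-column INSERT, reused for 3110 rows) written so that the whole batch runs inside a single transaction. This is only a sketch: it uses Python's built-in SQLite driver as a stand-in, not SQL Server CE, the table layout and the `bulk_insert` name are invented for the example, and whether per-statement commits are actually what slows the application in the question is an assumption, not something the post confirms.

```python
import sqlite3


def bulk_insert(conn, rows):
    """Insert all rows with one prepared statement inside one transaction."""
    placeholders = ",".join("?" * 13)  # "?,?,...,?" with 13 parameters
    with conn:  # commits once at the end, rolls back if any insert fails
        conn.executemany(f"INSERT INTO Table1 VALUES ({placeholders})", rows)


# Stand-in database and data (hypothetical schema: 13 text columns).
conn = sqlite3.connect(":memory:")
columns = ", ".join(f"c{i} TEXT" for i in range(13))
conn.execute(f"CREATE TABLE Table1 ({columns})")

rows = [tuple(f"r{j}c{i}" for i in range(13)) for j in range(3110)]
bulk_insert(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0]
```

Timing this against a variant that commits after every row is a quick way to see how much of the cost is transaction overhead rather than the insert itself.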

Is it possible to build this type of program in PHP?

- by Steven

I want to build a QA program that will crawl all the pages of a site (all files under a specified domain name) and return all external links on the site that don't open in a new window (i.e. do not have the target="_blank" attribute on the link). I can write PHP or JavaScript to open external links in new windows, or to report all problem links on a single page (the same page the script is in), but what I want is for the QA tool to go and search all pages of a website and report back to me what it finds. This "spidering" is what I have no idea how to do, and I am not sure if it's even possible with a language like PHP. If it's possible, how can I go about it?
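
The per-page half of such a tool can be sketched as follows. The example is in Python rather than PHP, and the class and function names are made up for illustration. It assumes "external" means "different host than the page being audited" and it does no fetching: a real spider would download each URL in `internal_links`, keep a set of already-visited URLs, and repeat until the queue is empty.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ExternalLinkAuditor(HTMLParser):
    """Sorts the <a href> links of one page into problems and crawl candidates."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc
        self.problem_links = []   # external links missing target="_blank"
        self.internal_links = []  # same-host links a spider would visit next

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        url = urljoin(self.page_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            return  # skip mailto:, javascript:, etc.
        if parts.netloc == self.site_host:
            self.internal_links.append(url)
        elif (attrs.get("target") or "").lower() != "_blank":
            self.problem_links.append(url)


def audit_page(page_url, html_text):
    """Return (problem_links, internal_links) for one already-downloaded page."""
    auditor = ExternalLinkAuditor(page_url)
    auditor.feed(html_text)
    return auditor.problem_links, auditor.internal_links
```

The same structure carries over to PHP: parse each page, collect offending links, and push same-host links onto a to-visit queue.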

User activity vs. System activity on the Index Usage Statistics report

- by Zachary G Jensen

I recently decided to crawl over the indexes on one of our most heavily used databases to see which were suboptimal. I generated the built-in Index Usage Statistics report from SSMS, and it's showing me a great deal of information that I'm unsure how to understand. I found an article at Carpe Datum about the report, but it doesn't tell me much more than I could assume from the column titles. In particular, the report differentiates between User activity and system activity, and I'm unsure what qualifies as each type of activity. I assume that any query that uses a given index increases the '# of user X' columns. But what increases the system columns? building statistics? Is there anything that depends on the user or role(s) of a user that's running the query?

Manager property is not available for Full Text Search (SharePoint 2010)

- by Vijay

Hi, I created a web part on MOSS 2007 which displays an organizational chart by searching (Full Text) the user profiles. To identify the subordinates of a user, I used to search for users with that particular user in the Manager property. The query looked like this:

    SELECT AccountName, PreferredName, Manager, WorkEmail
    FROM scope()
    WHERE ("SCOPE" = 'People') AND Manager = 'domain\parent_user'

But the same query does not run in SharePoint 2010, as the Manager crawled property does not exist. So I created a new crawled property and mapped it to People:Manager(Text); now the Manager property is always empty. Even a full crawl after clearing the indexes does not help. Can anyone please help me get manager information in Full Text Search? Thanks in advance!

Is it possible to set path of database for delayed job in rails?

- by WitchOfCloud

I am developing a mailing system with the delayed_jobs gem. When I ran it in the development environment, it worked well. But after deploying the application to the server, it does not work. This is my database.yml:

    development:
      adapter: sqlite3
      database: db/development.sqlite3
      pool: 5
      timeout: 5000

    test:
      adapter: sqlite3
      database: db/test.sqlite3
      pool: 5
      timeout: 5000

    production:
      adapter: sqlite3
      database: /var/www/service/shared/db/production.sqlite3
      pool: 5
      timeout: 5000

I checked the queue (in /var/www/...) and it works well. I also started delayed_jobs (rake jobs:work). So I think the problem is that delayed_job is crawling db/development.sqlite3. How can I solve this problem?

Xubuntu 13.10 64bit - Slow and buggy "log out" process?

- by MrKatSwordfish

I'm a Windows convert who has done only a little bit of dabbling in Ubuntu in the past (back in Dapper Drake a few years back). A lot has changed since then, and I've been yearning to jump back into Linux again! So, having just bought a new SSD, I felt that this would be as good a time as any to set up a dual-boot system again. I've messed around with Ubuntu 13.10 a bit, and while Unity has its issues, I think that it still needs some time to develop. I looked into XFCE and liked it a lot, so I went with Xubuntu. I've installed Xubuntu, and for the most part it's running smoothly and is a pleasure to work with. The customization is great and the minimalistic look and feel is really nice!

But here's my problem: whenever I select the "Log Out" option from either the application menu or the user profiles menu, my PC comes to a crawl, and the dialog box with all the options (shut down, restart, log out, etc.) takes maybe a minute or more to appear. I click the log out button, my PC is brought to a snail's pace, and I have to wait for what seems like an eternity for the logout options to appear! If I try to open something else (even a terminal window) while it's loading the logout options, that other program won't finish loading until the logout screen finally appears. Keep in mind, this is a pretty much vanilla install of Xubuntu 13.10 64bit, on a PC with an Intel i7, an SSD, 6 GB DDR3 RAM, and a new AMD 7770 GPU (drivers haven't been installed yet, though). Everything else runs fast, and most applications open near-instantly! It must be an issue with how the logout options screen initializes or something, but I'm not sure exactly how I can fix it.

Edit - Extra Info: This problem is very consistent when using the "Log Out" buttons in Xubuntu. However, I've found that I'm able to reboot and shut down much more quickly by going through the "Switch User" screen and using the reboot or shutdown buttons on that screen. I'm nearly certain that it has something to do with the little Log Out options screen that appears when I select Log Out from the menu, and not the actual process of shutting down.

So what should I do? I really like XFCE so far, and I've never tried a non-Ubuntu-based distro before, but should I just switch to something else? Is there any known fix for this issue? Are there any workarounds for logging out/shutting down/rebooting via the terminal so that I can avoid this irritating bug? Is there any way that I can monitor the progress of the log out via the terminal, allowing me to see which parts are causing the slowdown? What is the best way to report this bug to someone?

An increase to 3 Gig of RAM slows down Ubuntu 10.04 LTS

- by williepabon

I have Ubuntu 10.04 running from an external hard drive (installed on an enclosure) connected via USB port. Like a month or so ago, I increased RAM on my pc from 2 Gigs to 3 Gigs. This resulted on extremely long boot times and slow application loads. While I was understanding the nature of my problem, I posted various threads on this forum ( Questions # 188417, 188801), where I was advised to gather speed tests, and other info on my machine. I was also suggested that I might have problems with the RAM installed. Initially, I did not consider that possibility because: 1) I did a memory test with a diagnostic program from DELL (My pc is from Dell) 2) My pc works fine with Windows XP (the default OS), no problems with memory 3) My pc works fine when booting with Ubuntu 10.10 memory stick, no speed problems 4) My pc works fine when booting with Ubuntu 11.10 memory stick, no speed problems Anyway, I performed the memory tests suggested. But before doing it, and to check out any possibility of hardware issues on the hard drive, I did the following: (1) purchased a new hard drive enclosure and moved my hard drive to it, (2) purchased a new USB cable and used it to connect my hard drive/enclosure setup to a different USB port on my pc. Then, I performed speed tests with 1 Gig, 2 Gigs and 3 Gigs of RAM with my Ubuntu 10.04 OS. Ubuntu 10.04 worked well when booted with 1 Gig or 2 Gigs of RAM. When I increased to 3 Gigs, it slowed down to a crawl. I can't understand the relationship between an increase of 1 Gig and the effect it has in Ubuntu 10.04. This doesn't happen with Ubuntu 10.10 and 11.10. Unfortunately for me, Ubuntu 10.04 is my principal work operating system. So, I need a solution for this. 
Hardware and system information:

- DELL Precision 670
- 2 internal SATA hard drives
- Audigy 2 ZS audio system
- Factory OS: Windows XP Professional SP3
- NVidia 8400 GTS video card

More info:

    williepabon@WP-WrkStation:~$ uname -a
    Linux WP-WrkStation 2.6.32-38-generic #83-Ubuntu SMP Wed Jan 4 11:13:04 UTC 2012 i686 GNU/Linux
    williepabon@WP-WrkStation:~$ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 10.04.4 LTS
    Release: 10.04
    Codename: lucid

Speed test with the 3 Gigs of RAM installed:

    williepabon@WP-WrkStation:~$ sudo hdparm -tT /dev/sdc
    [sudo] password for williepabon:
    /dev/sdc:
     Timing cached reads: 84 MB in 2.00 seconds = 41.96 MB/sec
     Timing buffered disk reads: 4 MB in 3.81 seconds = 1.05 MB/sec

This is a very slow transfer rate for a hard drive. I would really appreciate a solution or a workaround for this problem. I know that there are users who have Ubuntu 10.04 with 3 Gigs or more of RAM and they don't have this problem. Same question asked on Launchpad for reference.

Site Search Engine for 1,000 page website

- by Ian

I manage a website with about 1,000 articles that need to be searchable by my members. The site search engines I've tried all had their own problems:

- Fluid Dynamics Search Engine: Since it's written in Perl, it was a bit hacky to integrate with my PHP-based CMS. I basically had to file_get_contents the search results page. However, FDSE had the best search results.
- Google CSE: Ugh, the search results SUCK. It can't find documents even using unique strings. I'm so surprised that a Google search product is this bad. Nor can I get any answers on their 'help' forums, and I am a paying user. Boo, Google. Boo.
- Sphider: Again, bad search results. Unable to locate some phrases used in link text. Better results than Google CSE though. Shame on Google that a free PHP script has better search results than their paid application.
- IndexTank: This one looked really promising. I got all set up with their PHP API client. But it would only randomly add articles that I submitted. Out of 700+ articles I pushed to the index through their API, only 8 made it in. Unable to find any help on this subject. Update for IndexTank -- Got the above issue fixed, so this looks most promising so far.

The site itself runs on PHP/MySQL and FreeBSD, though this shouldn't matter for a web crawling indexer. I've looked at Lucene, but I don't know anything about Java or installing Java programs on my web server. I also do not have root access on my web server, if this would be required for installation. I really don't need a lot of fancy features. It just needs to be able to crawl my web site and return great (even decent!) search results. I don't need any crazy search operators. It doesn't need to index off my primary domain. It just needs to work! Thanks, Hive Mind!

Architecture strategies for a complex competition scoring system

- by mikewassmer

Competition description: There are about 10 teams competing against each other over a 6-week period. Each team's total score (out of a 1000 total available points) is based on the total of its scores in about 25,000 different scoring elements. Most scoring elements are worth a small fraction of a point and there will about 10 X 25,000 = 250,000 total raw input data points. The points for some scoring elements are awarded at frequent regular time intervals during the competition. The points for other scoring elements are awarded at either irregular time intervals or at just one moment in time. There are about 20 different types of scoring elements. Each of the 20 types of scoring elements has a different set of inputs, a different algorithm for calculating the earned score from the raw inputs, and a different number of total available points. The simplest algorithms require one input and one simple calculation. The most complex algorithms consist of hundreds or thousands of raw inputs and a more complicated calculation. Some types of raw inputs are automatically generated. Other types of raw inputs are manually entered. All raw inputs are subject to possible manual retroactive adjustments by competition officials. Primary requirements: The scoring system UI for competitors and other competition followers will show current and historical total team scores, team standings, team scores by scoring element, raw input data (at several levels of aggregation, e.g. daily, weekly, etc.), and other metrics. There will be charts, tables, and other widgets for displaying historical raw data inputs and scores. There will be a quasi-real-time dashboard that will show current scores and raw data inputs. Aggregate scores should be updated/refreshed whenever new raw data inputs arrive or existing raw data inputs are adjusted. There will be a "scorekeeper UI" for manually entering new inputs, manually adjusting existing inputs, and manually adjusting calculated scores. 
Decisions:

- Should the scoring calculations be performed on the database layer (T-SQL/SQL Server, in my case) or on the application layer (C#/ASP.NET MVC, in my case)?
- What are some recommended approaches for calculating updated total team scores whenever new raw inputs arrive? Calculating each of the teams' total scores from scratch every time a new input arrives will probably slow the system to a crawl. I've considered some kind of "diff" approach, but that approach may pose problems for ad-hoc queries and some aggregates. I'm trying to draw some sports analogies, but it's tough because most games consist of no more than 20 or 30 scoring elements per game (I'm thinking of a high-scoring baseball game; football and soccer have fewer scoring events per game). Perhaps a financial balance sheet analogy makes more sense, because financial "bottom line" calcs may be calculated from 250,000 or more transactions.
- Should I be making heavy use of caching for this application?
- Are there any obvious approaches or similar case studies that I may be overlooking?
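
The "diff" approach mentioned above can be sketched in a few lines. This is one possible reading of it, not the poster's design, and it is written in Python rather than C# or T-SQL; the `Scoreboard` class and its method names are invented for the example. Each element's current score is stored, and a new or adjusted score changes the team total by the difference only, so retroactive adjustments cost the same as new inputs. A full recomputation is kept as a slow path for auditing.

```python
from collections import defaultdict


class Scoreboard:
    """Keeps per-team running totals that are updated by deltas."""

    def __init__(self):
        self.element_scores = {}               # (team, element_id) -> current score
        self.team_totals = defaultdict(float)  # team -> running total

    def record(self, team, element_id, score):
        """Set or adjust one element's score; update the total by the difference."""
        previous = self.element_scores.get((team, element_id), 0.0)
        self.element_scores[(team, element_id)] = score
        self.team_totals[team] += score - previous

    def recompute_total(self, team):
        """Slow path: rebuild a team's total from scratch, e.g. for an audit."""
        return sum(s for (t, _), s in self.element_scores.items() if t == team)


board = Scoreboard()
board.record("A", 1, 0.5)
board.record("A", 2, 0.25)
board.record("A", 1, 0.75)  # retroactive adjustment of element 1
```

With real fractional scores, accumulated floating-point drift is a reason to run the slow path periodically, or to store scores as integers or decimals.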

How to get tens of millions of pages indexed by Google bot?

- by Chris Adragna

We are developing a site that currently has 8 million unique pages, will grow to about 20 million right away, and eventually to about 50 million or more. Before you criticize... Yes, it provides unique, useful content. We continually process raw data from public records, and by doing some data scrubbing, entity rollups, and relationship mapping, we've been able to generate quality content, developing a site that's quite useful and also unique, in part due to the breadth of the data. Its PR is 0 (new domain, no links), and we're getting spidered at a rate of about 500 pages per day, putting us at about 30,000 pages indexed thus far. At this rate, it would take over 400 years to index all of our data. I have two questions:

- Is the rate of indexing directly correlated to PR? By that I mean: is it correlated enough that purchasing an old domain with good PR will get us to a workable indexing rate (in the neighborhood of 100,000 pages per day)?
- Are there any SEO consultants who specialize in aiding the indexing process itself?

We're otherwise doing very well with SEO, on-page especially. Besides, the competition for our "long-tail" keyword phrases is pretty low, so our success hinges mostly on the number of pages indexed. Our main competitor has achieved approx 20MM pages indexed in just over one year's time, along with an Alexa 2000-ish ranking. Noteworthy qualities we have in place:

- page download speed is pretty good (250-500 ms)
- no errors (no 404 or 500 errors when getting spidered)
- we use Google webmaster tools and log in daily
- friendly URLs in place

I'm afraid to submit sitemaps. Some SEO community postings suggest a new site with millions of pages and no PR is suspicious. There is a Google video of Matt Cutts speaking of a staged on-boarding of large sites, too, in order to avoid increased scrutiny (at approx 2:30 in the video). Clickable site links deliver all pages, no more than four pages deep and typically no more than 250(-ish) internal links on a page. Anchor text for internal links is logical and adds relevance hierarchically to the data on the detail pages. We had previously set the crawl rate to the highest on webmaster tools (only about a page every two seconds, max). I recently turned it back to "let Google decide", which is what is advised.
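
The time-to-index arithmetic is easy to check for the page counts and crawl rates quoted in the post. The sketch below assumes a constant crawl rate and 365-day years, and the function name is made up for the example. Note that at 500 pages per day these figures come out lower than the "over 400 years" mentioned in the post, whose basis is not stated.

```python
def years_to_index(total_pages, pages_per_day):
    """Years needed to crawl total_pages at a constant daily rate."""
    return total_pages / pages_per_day / 365


# Current rate from the post: about 500 pages per day.
estimates = {
    pages: round(years_to_index(pages, 500), 1)
    for pages in (8_000_000, 20_000_000, 50_000_000)
}

# Target rate from the post: about 100,000 pages per day.
days_at_target = 50_000_000 / 100_000

print(estimates)       # years at 500 pages/day for each site size
print(days_at_target)  # days at 100,000 pages/day for 50 million pages
```

Either way the conclusion is the same: the current rate is off by orders of magnitude, while the target rate would cover even the largest projected size in well under two years.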

How exactly is Google Webmaster Tools measuring "Site Performance"?

- by Rémi

I've been working for two months now on improving our response time (mainly server side) on a new forum (a brand new product from a technical point of view) that we launched in Germany a few months ago, and I'm quite surprised by the results I get. I monitor our response time using Apache logs and our own implementation of the Boomerang beacon. Using my stats, I can see that our new product responds in about 680 ms, where our old product was responding in about 1050 ms. On the other hand, Google Webmaster Tools tells us that our pages have an average response time of about 1500 ms today, where it was 700 ms three months ago with our old product. I figured that GWT was taking client-side metrics into account, so I added some measures to our Boomerang beacon, and everything looks just fine. I've also run some random pages through YSlow and Google's Page Speed, and everything looks better than it was before. We even have 82% on Google's Page Speed tool, which is quite cool for a site with some ads in it :)

Lately, we have signed a deal with Akamai to use two of their products: CDN for our static files (we were using another CDN before but it wasn't very effective) and RMA to improve network routes. We have also introduced a new aggressive cache mechanism to ensure that most of the pages served to crawlers are cached by our memcache grid. After checking my metrics, it seems that these changes have improved response time from 650 ms to about 500 ms, which is good (still not great, but it is definitely an improvement). But Webmaster Tools continues to report an increasing average response time while we see it decreasing over the same period. Have you ever had the same kind of weird behavior on your sites while doing performance improvements? Do you have any idea how to monitor the same thing Google does with Site Performance in Google Webmaster Tools, so that we could improve our site and constantly check whether it is what Google wants?

Edit 2011/07/26: Thanks for your answers, guys! Nevertheless, I was not precise enough. The main issue we have is not with the Site Performance page but with the Crawl Stats one for now. We probably found an issue on our side with some very slow pages (around 3000 ms!!) and we are trying to fix them. I'll keep you posted as soon as I have more information. Thanks again!

I didn't mean to become a database developer, but now I am. Should I stop or try to get better?

- by pretlow majette

20 years ago I couldn't afford even a cheap POS program when I opened my first surf shop in the Virgin Islands. I bought a copy of Paradox (remember that?) in 1990 and spent months in a back room scratching out a POS application. Over many iterations, including a switch to Access (2000)/SQL Server (2003), I built a POS and back-office solution that runs four stores with multiple cash registers, a warehouse and an office. Until recently, all my stores were connected to the same LAN (in a small shopping center) and performance wasn't an issue. Now that we've opened a location in the States, that's changed. Connecting to my local server via the internet has slowed that location's application to a crawl. This is partly due to the slow and crappy DSL service we have in the Virgin Islands, and partly due to my less-than-professional code and SQL. With other far-away stores in the works, I need a better solution. I like my application. My staff knows it well, and I'm not inclined to take on the expense of a proper commercial solution. So where does that leave me? I should probably host my SQL online to sidestep the slow DSL here. I think I can handle cleaning up my SQL queries to speed that up a bit. What about Access? My version seems so old, but I don't like the newer versions with the 'ribbon'. There are so many options... Should I be learning Visual Studio with an eye on moving completely to the web? Will my VBA skills help me at all there? I don't have the luxury of a year at the keyboard to figure it out anymore. What about DotNetNuke, SharePoint, or LightSwitch? They all seem like possibilities, but even understanding their capabilities is daunting. I'm pretty deep into it, but maybe I should bail and hire a consultant or programmer. That sounds expensive though, and there's no guarantee there either... Any advice would be greatly appreciated. Or, if anybody is interested in buying a small chain of surf shops...

SAN with iSCSI-Target Performance Horrendous

- by Justin

We have a poor man's SAN set up on a 1U Ubuntu server running iSCSI-Target with two 300GB drives in RAID-0. We are using it for block-level storage for virtual machines. The hypervisor is connected to the SAN via gigabit on a dedicated VLAN and interfaces. We only have a single virtual machine set up and are doing some benchmarks. If we run hdparm -t /dev/sda1 from the virtual machine, we get 'ok' performance of 75MB/s from the virtual machine to the SAN. Then we basically compile a package with ./configure and make. Things start OK, but then all of a sudden the load average on the SAN grows to 7+ and things slow down to a crawl. When we SSH into the SAN and run top, sure, the load is 7+, but the CPU usage is basically nothing, and the server has 1.5GB of memory available. When we kill the compile on the virtual machine, the load on the SAN slowly goes back to sub-1 figures. What in the world is causing this? How can we diagnose this further? Here are two screenshots from the SAN during high load. 1> Output of iotop on the SAN: 2> Output of top on the SAN:

Throttling Postfix memory

- by teddybeard

I have a VPS on 1and1 similar to this configuration (512MB, burst up to 2GB). I run a web service where I crawl the web and notify my users through email and SMS when a certain online data feed changes. When I send the emails out, I just have PHP loop through the recipients list and send the emails out using the mail() function. Whenever I try to send a large volume of these messages out, my server starts acting funny. I can't even run an 'ls' sometimes because the shell tells me it 'cannot allocate memory'. The shell is unusable and yet my website is being served up fine. Mail.err contains:

    Nov 14 17:30:09 s15351477 postfix/smtp[26000]: fatal: inet_addr_local[getifaddrs]: getifaddrs: Cannot allocate memory
    Nov 14 17:30:09 s15351477 postfix/sendmail[25999]: fatal: username(1000): unable to execute /usr/sbin/postdrop -r: Success
    Nov 14 18:29:14 s15351477 postfix/smtp[9911]: fatal: inet_addr_local[getifaddrs]: getifaddrs: Cannot allocate memory
    Nov 14 18:29:14 s15351477 postfix/sendmail[9910]: fatal: username(1000): unable to execute /usr/sbin/postdrop -r: Success

Also, if relevant, my bean counters are:

    Version: 2.5
    uid resource held maxheld barrier limit failcnt
    53907331: kmemsize 20779422 21041560 31457280 34603008 2989403
    lockedpages 0 0 512 512 0
    privvmpages 81488 82498 524288 576716 94640
    shmpages 2831 2831 32768 32768 0
    dummy 0 0 9223372036854775807 9223372036854775807 0
    numproc 90 91 128 128 6603
    physpages 32692 33531 2147483647 2147483647 0
    vmguarpages 0 0 131072 2147483647 0
    oomguarpages 32942 33781 9223372036854775807 2147483647 0
    numtcpsock 22 23 720 720 0
    numflock 27 28 376 413 0
    numpty 1 1 32 32 0
    numsiginfo 0 1 512 512 0
    tcpsndbuf 425888 441064 3440640 5406720 0
    tcprcvbuf 369200 376832 3440640 5406720 0
    othersockbuf 268000 268464 2252160 4194304 0
    dgramrcvbuf 0 8472 524288 576716 0
    numothersock 180 182 720 720 0
    dcachesize 952146 966231 5242880 5767168 0
    numfile 3609 3683 8192 8192 0
    dummy 0 0 0 0 0
    dummy 0 0 0 0 0
    dummy 0 0 0 0 0
    numiptent 25 25 200 205 0

Is there some way I can throttle postfix to keep it from swamping the system like this? Also wondering: why does email use so many resources, when these emails are just short text?
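
One way to limit the burst from the application side, rather than in Postfix itself, is to send in small batches with a pause between them. The sketch below is Python rather than PHP and only shows the loop shape; `send_in_batches` is a made-up name, `send` stands in for whatever actually hands one message to the mail system, and the batch size and pause are arbitrary placeholders that would need tuning against the container's process and memory limits.

```python
import time


def send_in_batches(recipients, send, batch_size=50, pause_seconds=5.0, sleep=time.sleep):
    """Call send(address) for every recipient, pausing between batches.

    The sleep parameter exists so the pause can be replaced in tests.
    Returns the number of messages handed to send().
    """
    sent = 0
    for start in range(0, len(recipients), batch_size):
        for address in recipients[start:start + batch_size]:
            send(address)
            sent += 1
        # Pause only if another batch follows.
        if start + batch_size < len(recipients):
            sleep(pause_seconds)
    return sent


# Usage example with stand-ins for the real mailer and the real pause.
delivered, pauses = [], []
total = send_in_batches(
    [f"user{i}@example.com" for i in range(120)],
    delivered.append,
    batch_size=50,
    pause_seconds=1,
    sleep=pauses.append,
)
```

The same pattern in PHP would wrap the existing mail() loop in an outer loop over array chunks with a sleep between chunks.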

Bad Performance when SQL Server hits 99% Memory Usage

- by user15863

I've got a server that reports its 8 GB of RAM as 99% used. When I restart SQL Server, it drops down to about 5% usage, but gradually builds back up to 99% over about 2 hours. When I look at the sqlserver process, it's reported as only using 100k of RAM, and generally never goes above or below that number by very much. In fact, if I add up all the processes in Task Manager, it's barely scratching the surface of my total available memory (yet Task Manager still shows 99% memory usage with "All processes shown"). It appears that SQL Server has a huge memory leak going on but isn't reporting it. The server has run fine for nearly two years, with this only starting to manifest itself in the last 3-4 weeks. Has anyone seen this or have any insight into the problem? EDIT: When the server hits 99%, performance goes downhill. All queries to the server, apps, etc. slow to a crawl. Restarting the service makes things zippy again, until 2 hours have passed and the server hits 99% once again.

Online Storage and security concerns

- by Megge

I plan to set up a small fileserver. I already own a small server at HostEurope (VirtualServer L, 250GB space), but they don't offer enough space (there is the HostEurope Cloud, but paying for bandwidth isn't an option here; video streaming should be possible). Requirements summarized: Storage: 2TB, Users: ~15, File sizes: < 100GB, should be easily reachable (mount as a network drive or at least have solid "communication" software).

My first question would be: Where can I get halfway affordable online storage? And how should I connect it to my server? Getting an additional server is a bit overkill, as I know no hoster which allows 2 TB on a small 2 GHz dual-core 2 GB RAM thingy (that would be enough by far, I just need a lot of space), and connecting it via NFS or FTP over the Internet seems a bit strange and cripples performance. Do you have any advice on where I could get that storage service from? (I sent HostEurope a custom request today, but they haven't answered yet. If they can provide me with that space, this question will be irrelevant, but the second one is the more important one anyway. Don't do much more than recommend some based on experience; you don't have to crawl through hosting services for hours.) Livedrive, for example, offers 5 TB for 17 € / month, and I'd be happy with 2 TB for 20 €. The caveat is: it doesn't allow multiple users, which leads me to my second question.

Where are the security problems? Which protocol is sufficient (I want private and "public" folders etc., the usual "every user has their own space plus a public space" thing), secure and fast? (I'd tend towards (S)FTP. The problem with FTP is: most of those hosting services don't even allow FTP with multiple users, and single users lead me into "hacking" a solution: you could map the basic folder structure on the main server and just mount every subfolder from the storage, though things get difficult with a public folder with 644 permissions.) Is using something like PKI or 802.1X overkill for private use?

nVidia performance with newer X and newer driver abysmal with Compiz

- by Nakedible

I recently upgraded Debian to Xorg 2.9.4 and installed nvidia-glx from experimental, version 260.19.21. This was somewhat of an uphill battle, as the dependencies for the experimental nvidia-glx package are still somewhat broken. I got it to work without forcing the installation of any packages and without modifying the packages. However, after the upgrade, compiz performance has been abysmal. I am using the desktop wall plugin, and switching viewports is really slow: it takes a few seconds for each switch. In addition to this, every effect that compiz does, such as zoom animations for icons when launching applications, takes seconds. The viewport switching speed changes relative to the number of windows on that virtual screen: empty screens switch at almost normal speed, single browser windows work almost decently, but just 4 rxvt terminals slow the switches down to a crawl. My compiz configuration should be pretty basic. Xorg is likewise configured without anything special; the only "custom" configuration is forcing the driver name to be "nvidia". I've fiddled around with nvidia-settings and compizconfig trying different VSync settings, but none of those helped. My graphics card is: NVIDIA GPU NVS 3100M (GT218) at PCI:1:0:0 (GPU-0). This is a laptop GPU from the Geforce GTX 200 series. Graphics card performance should naturally be no problem.

Read the article

weird POST request in IIS logs

- by MIrrorMirror

I noticed weird log entries (unless there's something I don't understand) in my IIS (7.5) logs. The site is an online dictionary with requests (user-friendly URL rewriting), and most of them are GET. However, I noticed weird POST requests made by someone who is trying to crawl our content (tens of thousands of such requests):

2013-11-09 20:39:27 GET /dict/mylang/word1 - y.y.y.y Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 200 296
2013-11-09 20:39:29 GET /dict/mylang/word2 - z.z.z.z Mozilla/5.0+(iPhone;+CPU+iPhone+OS+6_0+like+Mac+OS+X)+AppleWebKit/536.26+(KHTML,+like+Gecko)+Version/6.0+Mobile/10A5376e+Safari/8536.25+(compatible;+Googlebot-Mobile/2.1;++http://www.google.com/bot.html) - 200 468
2013-11-09 20:39:29 POST /dict/mylang/word3 - x.x.x.x - - 200 2593

The first two requests are legitimate. As for the third request, I don't think I have allowed cross-domain POST, if that is what the third log line means. All those POST requests take that much time for reasons unknown to me. I would like to know how those POST requests are possible and how I can stop them.

P.S. I have masked the IPs on purpose. Any help would be appreciated! Thank you in advance.
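One way to get a handle on requests like the third one is to count them per client address. The following is a minimal Python sketch, not a definitive tool: the field order in `FIELDS` is an assumption read off the sample lines above (a real IIS log states its order in the `#Fields:` header), and the names `parse_line` and `suspicious_posts` are made up for illustration.

```python
from collections import Counter

# Assumed field order, inferred from the sample lines in the question.
# A real IIS W3C log declares the actual order in its "#Fields:" header.
FIELDS = ["date", "time", "method", "uri_stem", "uri_query",
          "client_ip", "user_agent", "referer", "status", "time_taken"]


def parse_line(line):
    """Split one space-delimited log line into a dict, or return None."""
    parts = line.split()
    if line.startswith("#") or len(parts) != len(FIELDS):
        return None  # header/comment line or unexpected layout
    return dict(zip(FIELDS, parts))


def suspicious_posts(lines):
    """Count POST requests that carry no User-Agent, grouped by client IP."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["method"] == "POST" and entry["user_agent"] == "-":
            counts[entry["client_ip"]] += 1
    return counts


sample = [
    "2013-11-09 20:39:27 GET /dict/mylang/word1 - y.y.y.y "
    "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 200 296",
    "2013-11-09 20:39:29 POST /dict/mylang/word3 - x.x.x.x - - 200 2593",
]

for ip, hits in suspicious_posts(sample).most_common():
    print(ip, hits)
```

The resulting per-IP counts only show where the traffic comes from; whether and how to block it is a separate decision that this sketch does not make.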

Read the article

Locating memory leak in Apache httpd process, PHP/Doctrine-based application

- by Sam

I have a PHP application using these components:

Apache 2.2.3-31 on Centos 5.4
PHP 5.2.10
Xdebug 2.0.5 with Remote Debugging enabled
APC 3.0.19
Doctrine ORM for PHP 1.2.1 using Query Caching and Results Caching via APC
MySQL 5.0.77 using Query Caching

I've noticed that when I start up Apache, I eventually end up with 10 child processes. As time goes on, each process will grow in memory until each one approaches 10% of available memory, which begins to slow the server to a crawl, since together they grow to take up 100% of memory. Here is a snapshot of my top output:

 PID  USER    PR  NI  VIRT   RES   SHR  S  %CPU  %MEM  TIME+    COMMAND
 1471 apache  16  0   626m   201m  18m  S  0.0   10.2  1:11.02  httpd
 1470 apache  16  0   622m   198m  18m  S  0.0   10.1  1:14.49  httpd
 1469 apache  16  0   619m   197m  18m  S  0.0   10.0  1:11.98  httpd
 1462 apache  18  0   622m   197m  18m  S  0.0   10.0  1:11.27  httpd
 1460 apache  15  0   622m   195m  18m  S  0.0   10.0  1:12.73  httpd
 1459 apache  16  0   618m   191m  18m  S  0.0   9.7   1:13.00  httpd
 1461 apache  18  0   616m   190m  18m  S  0.0   9.7   1:14.09  httpd
 1468 apache  18  0   613m   190m  18m  S  0.0   9.7   1:12.67  httpd
 7919 apache  18  0   116m   75m   15m  S  0.0   3.8   0:19.86  httpd
 9486 apache  16  0   97.7m  56m   14m  S  0.0   2.9   0:13.51  httpd

I have no long-running scripts (they all terminate eventually, the longest being maybe 2 minutes long), and I am working under the assumption that once each script terminates, the memory it uses gets deallocated. (Maybe someone can correct me on that.) My hunch is that it could be APC, since it stores data between requests, but at the same time, it seems weird that it would store data inside the httpd process.

How can I track down which part of my app is causing the memory leak? What tools can I use to see how the memory usage is growing inside the httpd process and what is contributing to it?
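As a first step before reaching for a profiler, the growth can be quantified by totalling the httpd rows of successive top snapshots. Below is a minimal Python sketch under stated assumptions: it expects the 12-column layout shown in the snapshot above, and the helper names `parse_size_mb` and `summarize` are hypothetical. Note that RES includes shared pages (the SHR column), so the sum somewhat overstates the real footprint.

```python
def parse_size_mb(text):
    """Convert a top size field such as '201m' or '97.7m' to megabytes."""
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0}
    suffix = text[-1].lower()
    if suffix in units:
        return float(text[:-1]) * units[suffix]
    return float(text) / 1024  # plain numbers are taken to be KiB


def summarize(top_lines):
    """Return (total RES in MB, total %MEM) over all httpd rows."""
    total_res = 0.0
    total_mem = 0.0
    for line in top_lines:
        cols = line.split()
        # Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(cols) == 12 and cols[11] == "httpd":
            total_res += parse_size_mb(cols[5])
            total_mem += float(cols[9])
    return total_res, total_mem


snapshot = """\
1471 apache 16 0 626m 201m 18m S 0.0 10.2 1:11.02 httpd
1470 apache 16 0 622m 198m 18m S 0.0 10.1 1:14.49 httpd
1469 apache 16 0 619m 197m 18m S 0.0 10.0 1:11.98 httpd
1462 apache 18 0 622m 197m 18m S 0.0 10.0 1:11.27 httpd
1460 apache 15 0 622m 195m 18m S 0.0 10.0 1:12.73 httpd
1459 apache 16 0 618m 191m 18m S 0.0 9.7 1:13.00 httpd
1461 apache 18 0 616m 190m 18m S 0.0 9.7 1:14.09 httpd
1468 apache 18 0 613m 190m 18m S 0.0 9.7 1:12.67 httpd
7919 apache 18 0 116m 75m 15m S 0.0 3.8 0:19.86 httpd
9486 apache 16 0 97.7m 56m 14m S 0.0 2.9 0:13.51 httpd
""".splitlines()

res_mb, mem_pct = summarize(snapshot)
print(f"httpd total: {res_mb:.0f} MB resident, {mem_pct:.1f}% of memory")
```

For the snapshot above this comes to roughly 1690 MB resident and about 86% of memory. Running the same summary on snapshots taken at intervals would show how fast the total grows; it does not identify which component is responsible.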

Read the article

SSL in IIS 7 on a subdomain in a web farm

- by justjoshingyou

I have been having one of the most frustrating days in my entire IT career. I am trying to install an SSL certificate on a subdomain in a web farm. http://shop.mydomain.com needs to ALWAYS be forced to https://shop.mydomain.com.

I have a temporary cert issued by Verisign for shop.mydomain.com, and I have installed the cert on the server. The website for shop.mydomain.com is set up as a host header in IIS, with the DNS entry pointed to the same IP as mydomain.com, which is our load balancer. I actually have 2 load balancers (as needed by our ISP). One redirects all traffic on port 80 out to the different servers on port 80. The other pushes out port 443 to the servers on port 443. shop.mydomain.com is to be the only site protected by SSL at this time.

When I add the binding and navigate to https://shop.mydomain.com, it pops up a warning about the cert being invalid (presumably because this is a test cert), and then it sends the user to http. So I checked the box "Require SSL", and now it redirects to http://shop.mydomain.com/default.aspx and displays an ASP.NET 404 error message (not the IIS 404 error). I also tried removing the site's binding to port 80, with no luck.

I am nearly ready to crawl under my desk into the fetal position. How on earth do I make this work? I can't even get it to work on one machine, let alone in the load-balanced environment.

Read the article

Apache logging issues

- by Dan

I'm trying to parse Apache log files, but I'm finding some strange results and I'm not sure what they mean. Hopefully someone can provide some insight. (All of the IP addresses were altered; none actually start with 192. I didn't think the search engine hostnames mattered, though, so those are unchanged.)

In the first example, multiple IP addresses are showing up in the host field:

192.249.71.25 - - [04/Aug/2009:04:21:44 -0500] "GET /publications/example.pdf HTTP/1.1" 200 2738
192.0.100.93, 192.20.31.86 - - [04/Aug/2009:04:21:22 -0500] "GET /docs/another.pdf HTTP/1.0" 206 371469

What causes this? Does it have to do with proxy servers? Is there a way to have Apache log only one?

In the second example, a bunch of information is just completely missing! What would cause this?

msnbot-65-55-207-50.search.msn.com - - [29/Dec/2009:15:45:16 -0600] "GET /publications/example.pdf HTTP/1.1" 200 3470073 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 266 3476792
- - - - "-" - - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.1)" 285 594
- - - - "-" - - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.1)" 285 4195
- - - - "-" - - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.1)" 299 109218
crawl-17c.cuil.com - - [29/Dec/2009:15:45:46 -0600] "GET /publications/another.pdf HTTP/1.0" 200 101481 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)" 253 101704

My CustomLog configuration says:

LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %I %O" common
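Whatever the cause of the odd entries turns out to be, a parser for this format can be written to tolerate both cases. The following Python sketch is one possible approach, not the only one: the regular expression is hand-written to mirror the LogFormat line above, the trailing referer/agent/byte-count fields are treated as optional because the first example lacks them, and the names `LOG_PATTERN` and `parse` are illustrative. A host field holding several comma-separated addresses is split into a list, and lines that do not fit (such as the all-dash ones) come back as `None` so they can be set aside.

```python
import re

# Mirrors: %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %I %O
# The last four fields are optional so shorter lines still parse.
LOG_PATTERN = re.compile(
    r'^(?P<host>.+?) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    r' (?P<bytes_in>\d+) (?P<bytes_out>\d+))?$'
)


def parse(line):
    """Return a dict of fields, or None when the line does not fit the format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The host field may hold several comma-separated addresses.
    entry["hosts"] = [h.strip() for h in entry.pop("host").split(",")]
    return entry


multi_host = ('192.0.100.93, 192.20.31.86 - - [04/Aug/2009:04:21:22 -0500] '
              '"GET /docs/another.pdf HTTP/1.0" 206 371469')
all_dashes = ('- - - - "-" - - "-" "Mozilla/4.0 (compatible; MSIE 8.0; '
              'Windows NT 5.1; Trident/4.0; InfoPath.1)" 285 594')

print(parse(multi_host)["hosts"])
print(parse(all_dashes))
```

Returning `None` for the incomplete lines keeps them out of the statistics while still letting the caller count or log them separately.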

Search Results

Search found 446 results on 18 pages for 'crawl'.

