big theta - Page 13 - Developer IT

Data Aggregation of CSV files java

- by royB

I have k csv files (5 csv files for example), each file has m fields which produce a key and n values. I need to produce a single csv file with aggregated data. I'm looking for the most efficient solution for this problem, speed mainly. I don't think by the way that we will have memory issues. Also I would like to know if hashing is really a good solution because we will have to use 64 bit hashing solution to reduce the chance for a collision to less than 1% (we are having around 30000000 rows per aggregation). For example file 1: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,50,60,70,80 a3,b2,c4,60,60,80,90 file 2: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,30,50,90,40 a3,b2,c4,30,70,50,90 result: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,80,110,160,120 a3,b2,c4,90,130,130,180 algorithm that we thought until now: hashing (using concurentHashTable) merge sorting the files DB: using mysql or hadoop or redis. The solution needs to be able to handle Huge amount of data (each file more than two million rows) a better example: file 1 country,city,peopleNum england,london,1000000 england,coventry,500000 file 2: country,city,peopleNum england,london,500000 england,coventry,500000 england,manchester,500000 merged file: country,city,peopleNum england,london,1500000 england,coventry,1000000 england,manchester,500000 The key is: country,city. This is just an example, my real key is of size 6 and the data columns are of size 8 - total of 14 columns. We would like that the solution will be the fastest in regard of data processing.

Read the article

Vector with Constant-Time Remove - still a Vector?

- by Darrel Hoffman

One of the drawbacks of most common implementations of the Vector class (or ArrayList, etc. Basically any array-backed expandable list class) is that their remove() operation generally operates in linear time - if you remove an element, you must shift all elements after it one space back to keep the data contiguous. But what if you're using a Vector just to be a list-of-things, where the order of the things is irrelevant? In this case removal can be accomplished in a few simple steps: Swap element to be removed with the last element in the array Reduce size field by 1. (No need to re-allocate the array, as the deleted item is now beyond the size field and thus not "in" the list any more. The next add() will just overwrite it.) (optional) Delete last element to free up its memory. (Not needed in garbage-collected languages.) This is clearly now a constant-time operation, since only performs a single swap regardless of size. The downside is of course that it changes the order of the data, but if you don't care about the order, that's not a problem. Could this still be called a Vector? Or is it something different? It has some things in common with "Set" classes in that the order is irrelevant, but unlike a Set it can store duplicate values. (Also most Set classes I know of are backed by a tree or hash map rather than an array.) It also bears some similarity to Heap classes, although without the log(N) percolate steps since we don't care about the order.

Read the article

OSS App Hackathon @ National Information Society Agency

- by Edward J. Yoon

Yesterday, there was a OSS App Hackathon arranged by the NIA (National Information Society Agency) in Seoul. I attended as a panel of judges w/ Prof. Lee of the Next, NHN University. A lot of people were in there. You can read more details (Korean news) here: - http://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=105&oid=138&aid=0001997038

Read the article

questions on a particular algorithm

- by paul smith

Upon searching for a fast primr algorithm, I stumbled upon this: public static boolean isP(long n) { if (n==2 || n==3) return true; if ((n&0x1)==0 || n%3==0 || n<2) return false; long root=(long)Math.sqrt(n)+1L; // we check just numbers of the form 6*k+1 and 6*k-1 for (long k=6;k<=root;k+=6) { if (n%(k-1)==0) return false; if (n%(k+1)==0) return false; } return true; } My questions are: Why is long being used everywhere instead of int? Because with a long type the argument could be much larger than Integer.MAX thus making the method more flexible? In the second 'if', is n&0x1 the same as n%2? If so why didn't the author just use n%2? To me it's more readable. The line that sets the 'root' variable, why add the 1L? What is the run-time complexity? Is it O(sqrt(n/6)) or O(sqrt(n)/6)? Or would we just say O(n)?

Read the article

Read how a customer uses Oracle NoSQL Database

- by Jean-Pierre Dijcks

For those who have had the pleasure to be in SF for Oracle Openworld, you might have seen or heard about this story already. If you did not, here is a great story on how to use Oracle NoSQL Database. Apart from all the cool technology, I'm just excited that this is a company founded by a football international and dealing with sports data, games and other cool things. Like an all things cool combo in one place.

Read the article

Can you work for the big (Google, Microsoft, Facebook etc.) without getting too much involved?

- by Developer Art

Having seen people talking about interviewing and working for the big companies, I keep wondering how much are you expected to actually get involved in there. 1) That's because I keep seeing folks from Google and Microsoft and others writing in forums, blogging, tweeting, speaking at conferences and seemingly doing this on the 24/7/365 basis from their office, apartment, hotel and even plane. Are you really expected to commit that much if you come to work for them? Do they want you to think about your work while you're eating, sleeping, taking a shower, making love and so on? Can you in fact "switch off" at five and go home forgetting everything? Perhaps you have a hobby, family life, kids, friends, personal projects anyone? Is it so that if you work for the big then you're expected not to have any life outside of the company? You can't develop own projects, have own clients and just have another life? 2) One other thing is the work contracts the big use. I've heard for instance that when you join Microsoft you need to provide a list of projects you're currently working on and after that anything new you'll come up with during your employment automatically belongs to the company. Are all of the big doing this? Can you deny signing a contract until such clause is removed or with the big it is "take it or leave it" because the legal department won't accept any change? Can you make them write the contract in that manner that they step away from anything you've developed in your private time? Of all the big I have only been at SAP during my internship. Lately while browsing through the old papers I've found my old contact which stipulated they owned everything I developed or invented during my employment, which I would never have signed these days. On a side note I don't think I would return to SAP since I remember most people there were clueless and provided the impression they were simply sitting out their years waiting for the retirement. But anyway, what do the other big put in their contracts? How far do you get involved when you go working for the big? Or perhaps fully committed with your body and soul? P.S. I'm not planning to join any of them I'm just curious.

Read the article

Using replacement to get possible outcomes to then search through HUGE amount of data

- by Samuel Cambridge

I have a database table holding 40 million records (table A). Each record has a string a user can search for. I also have a table with a list of character replacements (table B) i.e. i = Y, I = 1 etc. I need to be able to take the string a user is searching for, iterate through each letter and create an array of every possible outcome (the users string, then each outcome with alternative letters used). I need to check for alternatives on both lower and uppercase letters in the word A search string can be no longer than 10 characters long. I'm using PHP and a MySQL database. Does anyone have any thoughts / articles / guidance on doing this in an efficient way?

Read the article

Oracle Subscribes To The Big Data Journal: So Can You!

- by Roxana Babiciu

Oracle Product Development has funded access to the Big Data Journal for all Oracle employees. Big Data is a highly innovative, open-access, peer-reviewed journal of world-class research, exploring the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data. This includes data science, big data infrastructure and analytics, and pervasive computing. Register here to receive Big Data articles online or sign up for the table of content alert or the RSS feed.

Read the article

Java: very slow tomcat and too big war file

- by NaN

I created some sort of RESTful API backend for a mobile app. It's written completely in Java using Jersey as Framework. At the moment no database is used, it's all in the memory, but this is no problem so far (it's only for prototyping purposes). I ordered the smallest package from digital ocean and installed tomcat7. All in all tomcat works, but I have three major problems: 1) It takes a long time until tomcat deploys the app: I deploy it per tomcat manager and it takes about 2 minutes unit the site works (excl. war upload time). 2) The war files are quite big (16MB): I don't know why they are so big. There are no database dependencies and most logic is written in plain java. Okay, we are using jersey, but 16MB are a lot for the logic of a small webservice. 3) I have to restart tomcat all 3 days or so. It looks like a memory leak or something similar. If the app runs for a few days the response time is quite high and the server seems to be frozen. It works again, if I restart tomcat per ssh. You can find my mvn pom file right here. Do you have some tips? Are there good tomcat alternatives?

Read the article

Port Lacie Big Disk (raid0) to server

- by Frankie

Does anyone know if it is possible (and linear) to port a LaCie Big Disk (hardware based array 0) to a linux server? Can I just install the HD's on the server and then mdadm it? Thank you!

Read the article

Cache coherence literature for big (>=16CPU) systems

- by osgx

Hello What books and articles can you recommend to learn basis of cache coherence problems in big SMP systems (which are NUMA and ccNUMA really) with =16 cpu sockets? Something like SGI Altix architecture analysis may be interesting. What protocols (MOESI, smth else) can scale up well?

Read the article

How big are most production databases?

- by TheLQ

Seeing some posts that say 10 million rows in a table is nothing made me wonder: Just how big are most production databases? I'm not talking about physical disk size of the database (saying 60 GB tells me nothing), I'm wondering how many rows. Personally the largest DB I've ever worked with was a test DB of a production system with 10 million rows. But I've seen people brag about DB's in the billions of rows.

Read the article

Availability of big files on multiple servers

- by Imises

I have to handle many (1'000 - 30'000) big files ranging from 200MB up to 2GB. The demand for these files is variable (0 - 300 downloads / file). This is why a single file must saved on 2 or more servers. My servers are placed in different datacenters (France), with different size HDDs (750GB to 4TB). Currently I share the files using PHP and ncftpget / ncftpput, but it's very slow. I need a solution to handle balancing these files across 7+ servers.

Read the article

One big executable or many small DLL's?

- by Patrick

Over the years my application has grown from 1MB to 25MB and I expect it to grow further to 40, 50 MB. I don't use DLL's, but put everything in this one big executable. Having one big executable has certain advantages: Installing my application at the customer is really: copy and run. Upgrades can be easily zipped and sent to the customer There is no risk of having conflicting DLL's (where the customer has version X of the EXE, but version Y of the DLL) The big disadvantage of the big EXE is that linking times seem to grow exponentially. Additional problem is that a part of the code (let's say about 40%) is shared with another application. Again, the advantages are that: There is no risk on having a mix of incorrect DLL versions Every developer can make changes on the common code which speeds up developments. But again, this has a serious impact on compilation times (everyone compiles the common code again on his PC) and on linking times. The question http://stackoverflow.com/questions/2387908/grouping-dlls-for-use-in-executable mentions the possibility of mixing DLL's in one executable, but it looks like this still requires you to link all functions manually in your application (using LoadLibrary, GetProcAddress, ...). What is your opinion on executable sizes, the use of DLL's and the best 'balance' between easy deployment and easy/fast development?

Read the article

Big square ads appear in lower right corner of both IE and Chrome

- by BrianK

In both IE and Chrome, large ads appear in the lower right corner of the browser window. Sometime they look reputable like for Microsoft, but sometimes they are big flashing boxes that say "You have won". Right now I am looking at "Need to lose 30 lbs?" I ran Microsofot Security Essentials and it didn't find anything. I then ran Windows Defender Offline (boot from CD). WDO found five things lincluding browser hijack that caused the wrong page to appear after clicking a link. It reported that it cleaned successfully, after which I ran a quick scan to confirm. After rebooting I still see the ads. Do I still have an infection? Any other tools to try? What about ComboFix? Thanks Update: Here's a screenshot - on superuser

Read the article

Why is Latex so big?

- by putmatrix

On windows, MikTex comes in a DVD. It's several times bigger than a typical Linux distribution. This makes it impossible to carry Latex on a memory stick like I do with many other useful software. Why is it so big? I thought it was just a language or system, but I've never seen any programming language with gigabytes of libraries. It's just that there's a bad feeling when your Latex distribution takes up four gigabytes of space when you expect it to be more of, say, 200mb.

Read the article

Handout export to word from PowerPoint are too big :(

- by nickjohn

EDITED i am using power point lectures. i want to mail merge speaker data into the respective lecture. now thats not possible with ppt as far i know, so i have to convert these lectures to handout by using power point option "publishMS word handouts" and use word mail merger. this is good since it will keep the comments/notes added in slides in handouts aswell. but these exported handouts in word remain actual slides and retain link to original ppt rather than simply get exported as images. so the file size gets verrry big 10mb ppt = 212mb doc=88mb docx Is there any option to convert handouts exported from power point to word as images? i simply cant save them as pngs from powerpoint since that will not include the comments data. Thanks

Read the article

Big Excel File Freezing/Running Slowly

- by ktm5124

Hi, My co-worker has a very large Excel file (over 7 MB) that suffers from the problems of (A) running slowly (B) taking forever to open/save/close and (C) freezing the computer, requiring a restart. I set the calculations to Manual, and I repaired the file, but the file didn't change in file size and it is still having these problems. My questions are: (1) Is there any way around this problem or is Excel just bad at handling ~7MB files? (2) Would upgrading RAM make a big difference? (3) It's possible that we can't afford to spend the money on a RAM upgrade. Are there are any other ways of mitigating the problem? Thanks.

Read the article

Passing big multi-dimensional array to function in C

- by kirbuchi

Hi, I'm having trouble passing a big array to a function in C. I declare: int image[height][width][3]={}; where height and width can be as big as 1500. And when I call: foo((void *)image,height,width); which is declared as follows: int *foo(const int *inputImage, int h, int w); I get segmentation fault error. What's strange is that if my values are: height=1200; width=290; theres no problem, but when they're: height=1200; width=291; i get the mentioned error. At 4 bytes per integer with both height and width of 1500 (absolute worst case) the array size would be of 27MB which imo isn't that big and shouldn't really matter because I'm only passing a pointer to the first element of the array. Any advice?

Read the article

Fastest way to find sum of digits on big numbers

- by dada

I have some big numbers (again) and i need to find if the sum of the digits is an even number. I tried this: finding the sum of the digits with a while loop and then checking if that sum % 2 equals 0 and it's working but it's too slow for big numbers, because i am given intervals of numbers and if the input is 1999999 19999999999 then my program fails, i cannot complete within the time limit which is 0,1 sec. What to do ? Is there any other faster way to do this ? EDIT: The input 1999999 19999999999 means it will start with 1999999 and check all the numbers like i wrote above until 19999999999, and because we are talking about big numbers (< 2^30) my program is not worthy.

Read the article

big size of database-log SQL-Server 2008

- by t.kehl

I have a database which is running under Microsoft SQL Server 2008. Now, I have seen, that the log of the database (ldf-file) is growing to big size. The database-file (mdf) has a size of 630MB and the log-file has a size of 12GB. I ask me now, what the reason for this can be. Is there a tool which let me seeing into the log where I can see, what is the reason for this big size? What can I do to prevent that the log is growing to this big size?

Read the article

big speed difference on a network link with and without VPN tunnel

- by xirtyllo

Scenario: We have a network link between two offices. The link is provided by a third party company through a VLAN on their network, but to us it is totally transparent -as if we had a simple ethernet cable going from one location to the other-. We have one router at each side of the link, with 3 VPN tunnels in between the two. The test: When I test the speed of the network link with the routers in place, with one laptop directly connected to the router on each side, I consistently get ~30/35Mbps. But if I take out the routers and I test the link connecting the laptops directly to the ethernet cable at each side, I consistently get ~85/88Mbps. It's quite a big performance hit, and I would tend to think that the VPN tunnels are responsible for the slow down. Is it normal that this configuration (two routers with three VPN tunnels between them) takes away so much bandwidth? More info: The encryption algorithm used for the VPN tunnels is AES128. The routers model is Zyxel USG200 and Zyxel USG1000, and their CPU, memory, and storage use is well within normal limits. The nominal bandwidth of the network link is 100Mbps. The network link in question is supplied by a third party company (the building in between our two offices). Basically it passes through their network as a VLAN, but the VLAN is completely transparent to us (e.g. no configuration required on our side, just like one single cable from end to end). Unfortunately (or maybe fortunately) I cannot directly test different routers configurations as I'm not the person in charge of it.

Read the article

Archive software for big files and fast index

- by AkiRoss

I'm currently using tar for archiving some files. Problem is: archives are pretty big, contains many data and tar is very slow when listing and extracting. I often need to extract single files or folders from the archive, but I don't currently have an external index of files. So, is there an alternative for Linux, allowing me to build uncompressed archive files, preserving the file attributes AND having fast access list table? I'm talking about archives of 10 to 100 GB, and it's pretty impractical to wait several minutes to access a single file. Anyway, any trick to solve this problem is welcome (but single archives are non-optional, so no rsync or similar). Thanks in advance! EDIT: I'm not compressing archives, and using tar I think they are too slow. To be precise about "slow", I'd like that: listing archive content should take time linear in files count inside the archive, but with very little constant (e.g. if a list of all the files is included at the head of the archive, it could be very fast). extraction of a target file/directory should (filesystem premitting) take time linear with the target size (e.g. if I'm extracting a 2MB PDF file in a 40GB directory, I'd really like it to take less than few minutes... If not seconds). Of course, this is just my idea and not a requirement. I guess such performances could be achievable if the archive contained an index of all the files with respective offset and such index is well organized (e.g. tree structure).

Read the article

Split big Apache log to folder structure

- by Dough

I just changed my Apache log behavior because it was making me having very BIG files... So I now use cronolog to split my logs to log/httpd/2012/11/access_2012.11.30.log for exemple, pattern : %Y/%m/access_%Y.%m.%d.log I now want to split my old 42GB file to the same structure but really don't know how to do that efficiently. I tried some simple commands with cat, egrep, awk... but really don't know how to handle all that in a more powerful script. Here is how the log looks like : x.x.237.134 - - [08/Apr/2011:14:43:09 +0200] "GET... x.x.50.15 - - [08/Apr/2011:14:43:09 +0200] "GET... [...] x.x.254.19 - - [28/Feb/2012:15:24:48 +0100] "GET... So I need for yeah line to get : year %Y (ex. 2012) month %m (ex. 11) day %d And to push out the entire line to : %Y/%m/access_%Y.%m.%d.log Can someone give me clues to get that working ? Thanks a lot for your interest.

Read the article

big cpu load on vmware server / linux

- by dezfafara

Hi, I currently using a server 2.x hosting 4 virtual machines on a linux system Today, on my physical server, I saw an enormous load average: this is the "top" of the server, illustrating my 4 virtual guests. top - 11:02:02 up 194 days, 23:09, 5 users, load average: 18.78, 12.05, 13.55 Tasks: 113 total, 4 running, 109 sleeping, 0 stopped, 0 zombie Cpu0 : 71.6%us, 19.0%sy, 0.0%ni, 8.8%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st Cpu1 : 74.3%us, 10.4%sy, 0.0%ni, 15.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 72.5%us, 17.6%sy, 0.0%ni, 9.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 79.5%us, 4.6%sy, 0.0%ni, 16.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8178884k total, 8129980k used, 48904k free, 134904k buffers Swap: 10490436k total, 148k used, 10490288k free, 6129728k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7312 root 6 -10 1149m 921m 559m R 97 11.5 107947:09 vmware-vmx 6995 root 6 -10 779m 687m 317m R 92 8.6 107374:31 vmware-vmx 6693 root 6 -10 880m 659m 409m S 85 8.3 76947:33 vmware-vmx 12937 root 6 -10 960m 719m 523m S 75 9.0 67219:49 vmware-vmx In bold are the cpu usage for my 4 virtuals guests These guests are running on a linux system, and the appropriate process are usually 5% - 15% of cpu I don't understang why , since a few days I have this big problem. This is the "top" on a virtual guest which is at 95% of cpu load top - 11:23:15 up 194 days, 23:13, 4 users, load average: 0.25, 0.47, 0.59 Tasks: 92 total, 2 running, 90 sleeping, 0 stopped, 0 zombie Cpu(s): 1.4%us, 7.7%sy, 0.0%ni, 90.5%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 382296k total, 369732k used, 12564k free, 145156k buffers Swap: 979924k total, 13956k used, 965968k free, 86988k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3691 root 20 0 23948 1148 960 S 13.0 0.3 15339:23 vmware-guestd 3840 root 20 0 19880 584 512 S 7.7 0.2 1729:17 hald-addon-stor This virtual guest state is ok ... If anyone has any ideas .. Thanks

Search Results

Search found 9128 results on 366 pages for 'big theta'.

Page 13/366 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

- by royB

- by Darrel Hoffman

- by Edward J. Yoon

- by paul smith

- by Jean-Pierre Dijcks

- by Developer Art

- by Samuel Cambridge

- by Roxana Babiciu

- by NaN

- by Frankie

- by osgx

- by TheLQ

- by Imises

- by Patrick

- by BrianK

- by putmatrix

- by nickjohn

- by ktm5124

- by kirbuchi

- by dada

- by t.kehl

- by xirtyllo

- by AkiRoss

- by Dough

- by dezfafara

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >