Search Results

Search found 287 results on 12 pages for 'sparse'.

Page 2/12 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • k-means clustering in R on very large, sparse matrix?

    - by movingabout
    Hello, I am trying to do some k-means clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols yet very sparse (only a couple of "1" values per row). The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R obviously can't read the sparse ARFF file format. I also have the data as a plain CSV file. Is there any package available in R for loading such sparse matrices efficiently? I'd then use the regular k-means algorithm from the cluster package to proceed. Many thanks

    Read the article

  • Card deck and sparse matrix interview questions

    - by MrDatabase
    I just had a technical phone screen w/ a start-up. Here's the technical questions I was asked ... and my answers. What do think of these answers? Feel free to post better answers :-) Question 1: how would you represent a standard 52 card deck in (basically any language)? How would you shuffle the deck? Answer: use an array containing a "Card" struct or class. Each instance of card has some unique identifier... either it's position in the array or a unique integer member variable in the range [0, 51]. Shuffle the cards by traversing the array once from index zero to index 51. Randomly swap ith card with "another card" (I didn't remember how this shuffle algorithm works exactly). Watch out for using the same probability for each card... that's a gotcha in this algorithm. I mentioned the algorithm is from Programming Pearls. Question 2: how to represent a large sparse matrix? the matrix can be very large... like 1000x1000... but only a relatively small number (~20) of the entries are non-zero. Answer: condense the array into a list of the non-zero entries. for a given entry (i,j) in the array... "map" (i,j) to a single integer k... then use k as a key into a dictionary or hashtable. For the 1000x1000 sparse array map (i,j) to k using something like f(i, j) = i + j * 1001. 1001 is just one plus the maximum of all i and j. I didn't recall exactly how this mapping worked... but the interviewer got the idea (I think). Are these good answers? I'm wondering because after I finished the second question the interviewer said the dreaded "well that's all the questions I have for now." Cheers!

    Read the article

  • Message "Sparse file not allowed" after succesfull install without swap-partition

    - by FUZxxl
    I've installed Ubuntu without creating a swap partition and with / on a btrfs.# Now I get the message "Sparse file is not allowed" on each boot. This message appears before the splash-screen. Is there a way to kill this warning? # I use some programs that tend to get havoc on memory usage. To prevent them from killing my system, I let the OOM killer do the rest when out of memory rather than making my system unreasonably slow by excessive swapping.

    Read the article

  • how to implement a sparse_vector class

    - by Neil G
    I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that are different from their default constructed value. So, sparse_vector would store the index-value pairs for all indices whose value is not T(). I am basing my implementation on existing sparse vectors in numeric libraries-- though mine will handle non-numeric types T as well. I looked at boost::numeric::ublas::coordinate_vector and eigen::SparseVector. Both store: size_t* indices_; // a dynamic array T* values_; // a dynamic array int size_; int capacity_; Why don't they simply use vector<pair<size_t, T>> data_; My main question is what are the pros and cons of both systems, and which is ultimately better? The vector of pairs manages size_ and capacity_ for you, and simplifies the accompanying iterator classes; it also has one memory block instead of two, so it incurs half the reallocations, and might have better locality of reference. The other solution might search more quickly since the cache lines fill up with only index data during a search. There might also be some alignment advantages if T is an 8-byte type? It seems to me that vector of pairs is the better solution, yet both containers chose the other solution. Why?

    Read the article

  • Sparse virtual machine disk image resizing weirdness?

    - by Matt H
    I have a partitioned virtual machine disk image created by vmware. What I want to do is resize that by 10GB. The file size is showing as 64424509440. Or 60GB. So I ran this: dd if=/dev/zero of=./win7.img seek=146800640 count=0 It ran without errors and I can verify the new size is in fact 75161927680 bytes or 70GB. This is where it gets a little odd. I started the guest domain in xen which is a Windows 7 enterprise machine. What I was expecting to see in diskmgmt.msc is 2 partitions. 1 system partition at the start of around 100MB and near 60GB partition (which is C drive) followed by around 10GB of free space. Actually what I saw was a 70GB partition!?! That confused me... so I decided to run the Check Disk which when you set it on the C drive it asks you to reboot so it'll run on boot. So I did that and during the boot it ran the checks. It got all the way through stage 3 and didn't show any errors at all. Looked at the partitions in disk manager and now C drive has shrunk back to 60GB and there is no free space. What gives? Ok, I thought I'd try mounting it under Dom0 and examining it with fdisk. This is what I get when mounted sudo xl block-attach 0 tap:aio:/home/xen/vms/otoy_v1202-xen.img xvda w sudo fdisk -l /dev/xvda Disk /dev/xvda: 64.4 GB, 64424509440 bytes 255 heads, 63 sectors/track, 7832 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x582dfc96 Device Boot Start End Blocks Id System /dev/xvda1 * 1 13 102400 7 HPFS/NTFS Partition 1 does not end on cylinder boundary. /dev/xvda2 13 7833 62810112 7 HPFS/NTFS Note the cylinder boundary comment. When I run sudo cfdisk /dev/xvda I get: FATAL ERROR: Bad primary partition 1: Partition ends in the final partial cylinder Press any key to exit cfdisk So I guess this is a bigger problem than first thought. How can I fix this? EDIT: Oops, the cylinder boundary thing is not a problem at all since disks have used LBA etc. So that threw me for a moment... still the problem exists... Now this output looks a little different. sudo sfdisk -uS -l /dev/xvda Disk /dev/xvda: 7832 cylinders, 255 heads, 63 sectors/track Units = sectors of 512 bytes, counting from 0 Device Boot Start End #sectors Id System /dev/xvda1 * 2048 206847 204800 7 HPFS/NTFS /dev/xvda2 206848 125827071 125620224 7 HPFS/NTFS /dev/xvda3 0 - 0 0 Empty /dev/xvda4 0 - 0 0 Empty BTW: I do have a backup of the image so if you help me mess it up that's ok. EDIT: sudo parted /dev/xvda print free Model: Xen Virtual Block Device (xvd) Disk /dev/xvda: 64.4GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 32.3kB 1049kB 1016kB Free Space 1 1049kB 106MB 105MB primary ntfs boot 2 106MB 64.4GB 64.3GB primary ntfs 64.4GB 64.4GB 1049kB Free Space Cool. Linux is showing free space is 10GB which is what I expect. The problem is windows isn't seeing this?

    Read the article

  • Anyone Know a Great Sparse One Dimensional Array Library in Python?

    - by TheJacobTaylor
    I am working on an algorithm in Python that uses arrays heavily. The arrays are typically sparse and are read from and written to constantly. I am currently using relatively large native arrays and the performance is good but the memory usage is high (as expected). I would like to be able to have the array implementation not waste space for values that are not used and allow an index offset other than zero. As an example, if my numbers start at 1,000,000 I would like to be able to index my array starting at 1,000,000 and not be required to waste memory with a million unused values. Array reads and writes needs to be fast. Expanding into new territory can be a small delay but reads and writes should be O(1) if possible. Does anybody know of a library that can do it? Thanks!

    Read the article

  • what are sparse voxel octrees?

    - by pdeva
    I have reading a lot about the potential use of sparse voxel octrees in future graphics engines. However I have been unable to find technical information on them. I understand what a voxel is, however I dont know what sparse voxel octrees are or how are they any more efficient than the polygonal techniques in use now. Could somebody explain or point me to an explanation for this?

    Read the article

  • What is most efficient way of setting row to zeros for a sparce scipy matrix?

    - by Alex Reinking
    I'm trying to convert the following MATLAB code to Python and am having trouble finding a solution that works in any reasonable amount of time. M = diag(sum(a)) - a; where = vertcat(in, out); M(where,:) = 0; M(where,where) = 1; Here, a is a sparse matrix and where is a vector (as are in/out). The solution I have using Python is: M = scipy.sparse.diags([degs], [0]) - A where = numpy.hstack((inVs, outVs)).astype(int) M = scipy.sparse.lil_matrix(M) M[where, :] = 0 # This is the slowest line M[where, where] = 1 M = scipy.sparse.csc_matrix(M) But since A is 334863x334863, this takes like three minutes. If anyone has any suggestions on how to make this faster, please contribute them! For comparison, MATLAB does this same step imperceptibly fast. Thanks!

    Read the article

  • Repository layout and sparse checkouts

    - by chuanose
    My team is considering to move from Clearcase to Subversion and we are thinking of organising the repository like this: \trunk\project1 \trunk\project2 \trunk\project3 \trunk\staticlib1 \trunk\staticlib2 \trunk\staticlib3 \branches\.. \tags\.. The issue here is that we have lots of projects (1000+) and each project is a dll that links in several common static libraries. Therefore checking out everything in trunk is a non-starter as it will take way too long (~2 GB), and is unwieldy for branching. Using svn:externals to pull out relevant folders for each project doesn't seem ideal because it results in several working copies for each static library folder. We also cannot do an atomic commit if the changes span the project and some static libraries. Sparse checkouts sounds very suitable for this as we can write a script to pull out only the required directories. However when we want to merge changes from a branch back to the trunk we will need to first check out a full trunk. Wonder if there is some advice on 1) a better repository organization or 2) a way to merge over branch changes to a trunk working copy that is sparse?

    Read the article

  • Optimizing sparse dot-product in C#

    - by Haggai
    Hello. I'm trying to calculate the dot-product of two very sparse associative arrays. The arrays contain an ID and a value, so the calculation should be done only on those IDs that are common to both arrays, e.g. <(1, 0.5), (3, 0.7), (12, 1.3) * <(2, 0.4), (3, 2.3), (12, 4.7) = 0.7*2.3 + 1.3*4.7 . My implementation (call it dict) currently uses Dictionaries, but it is too slow to my taste. double dot_product(IDictionary<int, double> arr1, IDictionary<int, double> arr2) { double res = 0; double val2; foreach (KeyValuePair<int, double> p in arr1) if (arr2.TryGetValue(p.Key, out val2)) res += p.Value * val2; return res; } The full arrays have about 500,000 entries each, while the sparse ones are only tens to hundreds entries each. I did some experiments with toy versions of dot products. First I tried to multiply just two double arrays to see the ultimate speed I can get (let's call this "flat"). Then I tried to change the implementation of the associative array multiplication using an int[] ID array and a double[] values array, walking together on both ID arrays and multiplying when they are equal (let's call this "double"). I then tried to run all three versions with debug or release, with F5 or Ctrl-F5. The results are as follows: debug F5: dict: 5.29s double: 4.18s (79% of dict) flat: 0.99s (19% of dict, 24% of double) debug ^F5: dict: 5.23s double: 4.19s (80% of dict) flat: 0.98s (19% of dict, 23% of double) release F5: dict: 5.29s double: 3.08s (58% of dict) flat: 0.81s (15% of dict, 26% of double) release ^F5: dict: 4.62s double: 1.22s (26% of dict) flat: 0.29s ( 6% of dict, 24% of double) I don't understand these results. Why isn't the dictionary version optimized in release F5 as do the double and flat versions? Why is it only slightly optimized in the release ^F5 version while the other two are heavily optimized? Also, since converting my code into the "double" scheme would mean lots of work - do you have any suggestions how to optimize the dictionary one? Thanks! Haggai

    Read the article

  • efficient video format/codec for sparse & binary blob tracking

    - by user391339
    I am working on a blob tracking project and have many high-definition videos that I would like to reduce in size for storage and downstream tracking/shape-analysis. I want to use a lossless method that takes advantage of the black and white nature of the video as well as the fact that not much is moving between individual frames. The videos are quite sparse, with 5 to 10 b&w blobs per frame occupying <30% of the space in total, with each blob moving <5-10% of the field of view between frames and not changing shape too much between 2-3 frames. I will work in Python, Matlab, or LabView for this project, and could use a batch utility if available. It may be worthwhile to export the files as compressed image stacks if a proper video format can't be found. What are the pros and cons of this? A video codec uses correlations between neighboring frames, so it should be more efficient, but not if the wrong one is chosen or if it is improperly configured.

    Read the article

  • Sparse parameter selection using Genetic Algorithm

    - by bgbg
    Hello, I'm facing a parameter selection problem, which I would like to solve using Genetic Algorithm (GA). I'm supposed to select not more than 4 parameters out of 3000 possible ones. Using the binary chromosome representation seems like a natural choice. The evaluation function punishes too many "selected" attributes and if the number of attributes is acceptable, it then evaluates the selection. The problem is that in these sparse conditions the GA can hardly improve the population. Neither the average fitness cost, nor the fitness of the "worst" individual improves over the generations. All I see is slight (even tiny) improvement in the score of the best individual, which, I suppose, is a result of random sampling. Encoding the problem using indices of the parameters doesn't work either. This is most probably, due to the fact that the chromosomes are directional, while the selection problem isn't (i.e. chromosomes [1, 2, 3, 4]; [4, 3, 2, 1]; [3, 2, 4, 1] etc. are identical) What problem representation would you suggest? P.S If this matters, I use PyEvolve.

    Read the article

  • finding ALL cycles in a huge sparse matrix

    - by Andy
    Hi there, First of all I'm quite a Java beginner, so I'm not sure if this is even possible! Basically I have a huge (3+million) data source of relational data (i.e. A is friends with B+C+D, B is friends with D+G+Z (but not A - i.e. unmutual) etc.) and I want to find every cycle within this (not necessarily connected) directed graph. I've found this thread (http://stackoverflow.com/questions/546655/finding-all-cycles-in-graph/549402#549402) which has pointed me to Donald Johnson's (elementary) cycle-finding algorithm which, superficially at least, looks like it'll do what I'm after (I'm going to try when I'm back at work on Tuesday - thought it wouldn't hurt to ask in the meanwhile!). I had a quick scan through the code of the Java implementation of Johnson's algorithm (in that thread) and it looks like a matrix of relations is the first step, so I guess my questions are: a) Is Java capable of handling a 3+million*3+million matrix? (was planning on representing A-friends-with-B by a binary sparse matrix) b) Do I need to find every connected subgraph as my first problem, or will cycle-finding algorithms handle disjoint data? c) Is this actually an appropriate solution for the problem? My understanding of "elementary" cycles is that in the graph below, rather than picking out A-B-C-D-E-F it'll pick out A-B-F, B-C-D etc. but that's not the end of the world given the task. E / \ D---F / \ / \ C---B---A d) If necessary, I can simplify the problem by enforcing mutuality in relations - i.e. A-friends-with-B <== B-friends-with-A, and if really necessary I can maybe cut down the data size, but realistically it is always going to be around the 1mil mark. z) Is this a P or NP task?! Am I biting off more than I can chew? Thanks all, any help appreciated! Andy

    Read the article

  • How do I convert a Linux disk image into a sparse file?

    - by endolith
    I have a bunch of disk images, made with ddrescue, on an EXT partition, and I want to reduce their size without losing data, while still being mountable. How can I fill the empty space in the image's filesystem with zeros, and then convert the file into a sparse file so this empty space is not actually stored on disk? For example: > du -s --si --apparent-size Jimage.image 120G Jimage.image > du -s --si Jimage.image 121G Jimage.image This actually only has 50G of real data on it, though, so the second measurement should be much smaller. This supposedly will fill empty space with zeros: cat /dev/zero > zero.file rm zero.file But if sparse files are handled transparently, it might actually create a sparse file without writing anything to the virtual disk, ironically preventing me from turning the virtual disk image into a sparse file itself. :) Does it? Note: For some reason, sudo dd if=/dev/zero of=./zero.file works when cat does not on a mounted disk image.

    Read the article

  • Excel Match Bug in a sparse range with duplicate keys

    - by DangerMouse
    Data in TheRange is {1,"",1,"",1,"",1,"",2} =Match(2, TheRange, 1) returns 9 as expected =Match(1.5, TheRange, 1) returns 7 as expected =Match(1, TheRange, 1) returns 5 which is not expected Anyone come across this ? Anyone got a fix? Additionally if I use Worksheet.Function.Match in VBA I get more unexpected results.

    Read the article

  • PL/SQL bulk collect into associative array with sparse key

    - by Dan
    I want to execute a SQL query inside PL/SQL and populate the results into an associative array, where one of the columns in the SQL becomes the key in the associative array. For example, say I have a table Person with columns PERSON_ID INTEGER PRIMARY KEY PERSON_NAME VARCHAR2(50) ...and values like: PERSON_ID | PERSON_NAME ------------------------ 6 | Alice 15 | Bob 1234 | Carol I want to bulk collect this table into a TABLE OF VARCHAR2(50) INDEX BY INTEGER such that the key 6 in this associative array has the value Alice and so on. Can this be done in PL/SQL? If so, how?

    Read the article

  • android - pre-allocate space for a file before downloading it

    - by android developer
    how can i create a pre-allocated file on android? something like a sparse file ? i need to make an applicaton that will download a large file , and i want to avoid having an out-of-space error while the download is commencing . my solution needs to support resuming and writing to both internal and external storage. i've tried what is written here: Create file with given size in Java but none of the solutions there didn't work for some reason.

    Read the article

  • Does the OSS Backup Solution amanda.org support sparse files?

    - by user97961
    I want to (or better have to) do Backups of my KVM Virtual Machine images. I have searched for days for a good Backup Soloution. I know amanda is a very good solution. It would be kinf if someone kenn tell me if the following is supported: Trigger the Creation of LVM Snapshot (by invoking a Shell Script that I will write for that purpose) Do a Differential/Delta Backup on my KVM LVM qcow2 sparse file. = I only want to copy the actually changed bits/bytes (=Delta Backup). And it has to support that the file to be backuped up is a sparse file. (Rsync seems to have some kind of problems in regard to this (if the file does not exist yet on the other side... Then it will create a full file, not a sparse file)) Release the LVM Snapshot (By invoking a Script that I will write for that purpose) It's strange, I have nowhere found any documentation about this fact when searching the internet. Zmanda (Commercial Edition) has support vom XEN VM Backup (but not for KVM as far as I can tell)...

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >