Search Results

Search found 58 results on 3 pages for 'user248237'.

Page 2/3 | < Previous Page | 1 2 3  | Next Page >

  • hierarchical clustering on correlations in Python scipy/numpy?

    - by user248237
    How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically clustering by correlations of each entry across the 9 conditions. I'd like to use 1-pearson correlation as the distances for clustering. Assuming I have a numpy array "X" that contains the 100 x 9 matrix, how can I do this? I tried using hcluster, based on this example: Y=pdist(X, 'seuclidean') Z=linkage(Y, 'single') dendrogram(Z, color_threshold=0) however, pdist is not what I want since that's euclidean distance. Any ideas? thanks.

    Read the article

  • reading csv files in scipy/numpy in Python

    - by user248237
    I am having trouble reading a csv file, delimited by tabs, in python. I use the following function: def csv2array(filename, skiprows=0, delimiter='\t', raw_header=False, missing=None, with_header=True): """ Parse a file name into an array. Return the array and additional header lines. By default, parse the header lines into dictionaries, assuming the parameters are numeric, using 'parse_header'. """ f = open(filename, 'r') skipped_rows = [] for n in range(skiprows): header_line = f.readline().strip() if raw_header: skipped_rows.append(header_line) else: skipped_rows.append(parse_header(header_line)) f.close() if missing: data = genfromtxt(filename, dtype=None, names=with_header, deletechars='', skiprows=skiprows, missing=missing) else: if delimiter != '\t': data = genfromtxt(filename, dtype=None, names=with_header, delimiter=delimiter, deletechars='', skiprows=skiprows) else: data = genfromtxt(filename, dtype=None, names=with_header, deletechars='', skiprows=skiprows) if data.ndim == 0: data = array([data.item()]) return (data, skipped_rows) the problem is that genfromtxt complains about my files, e.g. with the error: Line #27100 (got 12 columns instead of 16) I am not sure where these errors come from. Any ideas? Here's an example file that causes the problem: #Gene 120-1 120-3 120-4 30-1 30-3 30-4 C-1 C-2 C-5 genesymbol genedesc ENSMUSG00000000001 7.32 9.5 7.76 7.24 11.35 8.83 6.67 11.35 7.12 Gnai3 guanine nucleotide binding protein alpha ENSMUSG00000000003 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Pbsn probasin Is there a better way to write a generic csv2array function? thanks.

    Read the article

  • writing header in csv python with DictWriter

    - by user248237
    assume I have a csv.DictReader object and I want to write it out as a csv file. How can I do this? I thought of the following: dr = csv.DictReader(open(f), delimiter='\t') # process my dr object # ... # write out object output = csv.DictWriter(open(f2, 'w'), delimiter='\t') for item in dr: output.writerow(item) Is that the best way? More importantly, how can I make it so a header is written out too, in this case the object "dr"s .fieldnames property? thanks.

    Read the article

  • vectorized approach to binning with numpy/scipy in Python

    - by user248237
    I am binning a 2d array (x by y) in Python into the bins of its x value (given in "bins"), using np.digitize: elements_to_bins = digitize(vals, bins) where "vals" is a 2d array, i.e.: vals = array([[1, v1], [2, v2], ...]). elements_to_bins just says what bin each element falls into. What I then want to do is get a list whose length is the number of bins in "bins", and each element returns the y-dimension of "vals" that falls into that bin. I do it this way right now: points_by_bins = [] for curr_bin in range(min(elements_to_bins), max(elements_to_bins) + 1): curr_indx = where(elements_to_bins == curr_bin)[0] curr_bin_vals = vals[:, curr_indx] points_by_bins.append(curr_bin_vals) is there a more elegant/simpler way to do this? All I need is a list of of lists of the y-values that fall into each bin. thanks.

    Read the article

  • making binned boxplot in matplotlib with numpy and scipy in Python

    - by user248237
    I have a 2-d array containing pairs of values and I'd like to make a boxplot of the y-values by different bins of the x-values. I.e. if the array is: my_array = array([[1, 40.5], [4.5, 60], ...]]) then I'd like to bin my_array[:, 0] and then for each of the bins, produce a boxplot of the corresponding my_array[:, 1] values that fall into each box. So in the end I want the plot to contain number of bins-many box plots. I tried the following: min_x = min(my_array[:, 0]) max_x = max(my_array[:, 1]) num_bins = 3 bins = linspace(min_x, max_x, num_bins) elts_to_bins = digitize(my_array[:, 0], bins) However, this gives me values in elts_to_bins that range from 1 to 3. I thought I should get 0-based indices for the bins, and I only wanted 3 bins. I'm assuming this is due to some trickyness with how bins are represented in linspace vs. digitize. What is the easiest way to achieve this? I want num_bins-many equally spaced bins, with the first bin containing the lower half of the data and the upper bin containing the upper half... i.e., I want each data point to fall into some bin, so that I can make a boxplot. thanks.

    Read the article

  • packaging and distributing Python code as executables

    - by user248237
    How can I compile my python module into an executable, and distribute it in a way that doesn't require the user to download all the external Python packages that the module I wrote uses? Aside from that, is there a general guy on packaging and distributing python code? i.e. going from a script/module to someone that's user-friendly and can be run, ideally cross-platform (at least between unix and mac). thanks.

    Read the article

  • sampling integers uniformly efficiently in python using numpy/scipy

    - by user248237
    I have a problem where depending on the result of a random coin flip, I have to sample a random starting position from a string. If the sampling of this random position is uniform over the string, I thought of two approaches to do it: one using multinomial from numpy.random, the other using the simple randint function of Python standard lib. I tested this as follows: from numpy import * from numpy.random import multinomial from random import randint import time def use_multinomial(length, num_points): probs = ones(length)/float(length) for n in range(num_points): result = multinomial(1, probs) def use_rand(length, num_points): for n in range(num_points): rand(1, length) def main(): length = 1700 num_points = 50000 t1 = time.time() use_multinomial(length, num_points) t2 = time.time() print "Multinomial took: %s seconds" %(t2 - t1) t1 = time.time() use_rand(length, num_points) t2 = time.time() print "Rand took: %s seconds" %(t2 - t1) if __name__ == '__main__': main() The output is: Multinomial took: 6.58072400093 seconds Rand took: 2.35189199448 seconds it seems like randint is faster, but it still seems very slow to me. Is there a vectorized way to get this to be much faster, using numpy or scipy? thanks.

    Read the article

  • taking intersection of N-many lists in python

    - by user248237
    what's the easiest way to take the intersection of N-many lists in python? if I have two lists a and b, I know I can do: a = set(a) b = set(b) intersect = a.intersection(b) but I want to do something like a & b & c & d & ... for an arbitrary set of lists (ideally without converting to a set first, but if that's the easiest / most efficient way, I can deal with that.) I.e. I want to write a function intersect(*args) that will do it for arbitrarily many sets efficiently. What's the easiest way to do that? EDIT: My own solution is reduce(set.intersection, [a,b,c]) -- is that good? thanks.

    Read the article

  • problem with hierarchical clustering in Python

    - by user248237
    I am doing a hierarchical clustering a 2 dimensional matrix by correlation distance metric (i.e. 1 - Pearson correlation). My code is the following (the data is in a variable called "data"): from hcluster import * Y = pdist(data, 'correlation') cluster_type = 'average' Z = linkage(Y, cluster_type) dendrogram(Z) The error I get is: ValueError: Linkage 'Z' contains negative distances. What causes this error? The matrix "data" that I use is simply: [[ 156.651968 2345.168618] [ 158.089968 2032.840106] [ 207.996413 2786.779081] [ 151.885804 2286.70533 ] [ 154.33665 1967.74431 ] [ 150.060182 1931.991169] [ 133.800787 1978.539644] [ 112.743217 1478.903191] [ 125.388905 1422.3247 ]] I don't see how pdist could ever produce negative numbers when taking 1 - pearson correlation. Any ideas on this? thank you.

    Read the article

  • Doing arithmetic with up to two decimal places in Python?

    - by user248237
    I have two floats in Python that I'd like to subtract, i.e. v1 = float(value1) v2 = float(value2) diff = v1 - v2 I want "diff" to be computed up to two decimal places, that is compute it using %.2f of v1 and %.2f of v2. How can I do this? I know how to print v1 and v2 up to two decimals, but not how to do arithmetic like that. The particular issue I am trying to avoid is this. Suppose that: v1 = 0.982769777778 v2 = 0.985980444444 diff = v1 - v2 and then I print to file the following: myfile.write("%.2f\t%.2f\t%.2f\n" %(v1, v2, diff)) then I will get the output: 0.98 0.99 0.00, suggesting that there's no difference between v1 and v2, even though the printed result suggests there's a 0.01 difference. How can I get around this? thanks.

    Read the article

  • how to set a fixed color bar for pcolor in python matplotlib?

    - by user248237
    I am using pcolor with a custom color map to plot a matrix of values. I set my color map so that low values are white and high values are red, as shown below. All of my matrices have values between 0 and 20 (inclusive) and I'd like 20 to always be pure red and 0 to always be pure white, even if the matrix has values that don't span the entire range. For example, if my matrix only has values between 2 and 7, I don't want it to plot 2 as white and 7 as red, but rather color it as if the range is still 0 to 20. How can I do this? I tried using the "ticks=" option of colorbar but it did not work. Here is my current code (assume "my_matrix" contains the values to be plotted): cdict = {'red': ((0.0, 1.0, 1.0), (0.5, 1.0, 1.0), (1.0, 1.0, 1.0)), 'green': ((0.0, 1.0, 1.0), (0.5, 1.0, 1.0), (1.0, 0.0, 0.0)), 'blue': ((0.0, 1.0, 1.0), (0.5, 1.0, 1.0), (1.0, 0.0, 0.0))} my_cmap = matplotlib.colors.LinearSegmentedColormap('my_colormap', cdict, 256) colored_matrix = plt.pcolor(my_matrix, cmap=my_cmap) plt.colorbar(colored_matrix, ticks=[0, 5, 10, 15, 20]) any idea how I can fix this to get the right result? thanks very much.

    Read the article

  • filtering elements from list of lists in Python?

    - by user248237
    I want to filter elements from a list of lists, and iterate over the elements of each element using a lambda. For example, given the list: a = [[1,2,3],[4,5,6]] suppose that I want to keep only elements where the sum of the list is greater than N. I tried writing: filter(lambda x, y, z: x + y + z >= N, a) but I get the error: <lambda>() takes exactly 3 arguments (1 given) How can I iterate while assigning values of each element to x, y, and z? Something like zip, but for arbitrarily long lists. thanks, p.s. I know I can write this using: filter(lambda x: sum(x)..., a) but that's not the point, imagine that these were not numbers but arbitrary elements and I wanted to assign their values to variable names.

    Read the article

  • embedding a slideshow photo gallery with Picasa that autorepeats?

    - by user248237
    how can I modify the embedded photo gallery slideshow from Google's Picasa to auto repeat the pictures, i.e. play them over and over again? This is the code Picasa gives me for embedding in a website: <embed type="application/x-shockwave-flash" src="http://picasaweb.google.com/s/c/bin/slideshow.swf" width="400" height="267" flashvars="host=picasaweb.google.com&hl=en_US&feat=flashalbum&RGB=0x000000&feed=http%3A%2F%2Fpicasaweb.google.com%2Fdata%2Ffeed%2Fapi%2Fuser%2F102341124641778821128%2Falbumid%2F5469290618448219537%3Falt%3Drss%26kind%3Dphoto%26hl%3Den_US" pluginspage="http://www.macromedia.com/go/getflashplayer"> Related to this, are there free tools that take as input a Picasa username or a set of photos and automatically create a thumbnailed gallery, which is embeddable using Flash in any website? thanks.

    Read the article

  • building a pairwise matrix in scipy/numpy in Python from dictionaries

    - by user248237
    I have a dictionary whose keys are strings and values are numpy arrays, e.g.: data = {'a': array([1,2,3]), 'b': array([4,5,6]), 'c': array([7,8,9])} I want to compute a statistic between all pairs of values in 'data' and build an n by x matrix that stores the result. Assume that I know the order of the keys, i.e. I have a list of "labels": labels = ['a', 'b', 'c'] What's the most efficient way to compute this matrix? I can compute the statistic for all pairs like this: result = [] for elt1, elt2 in itertools.product(labels, labels): result.append(compute_statistic(data[elt1], data[elt2])) But I want result to be a n by n matrix, corresponding to "labels" by "labels". How can I record the results as this matrix? thanks.

    Read the article

  • efficiently finding the interval with non-zeros in scipy/numpy in Python?

    - by user248237
    suppose I have a python list or a python 1-d array (represented in numpy). assume that there is a contiguous stretch of elements how can I find the start and end coordinates (i.e. indices) of the stretch of non-zeros in this list or array? for example, a = [0, 0, 0, 0, 1, 2, 3, 4] nonzero_coords(a) should return [4, 7]. for: b = [1, 2, 3, 4, 0, 0] nonzero_coords(b) should return [0, 2]. thanks.

    Read the article

  • slicing arrays in numpy/scipy

    - by user248237
    I have an array like: a = array([[1,2,3],[3,4,5],[4,5,6]]) what's the most efficient way to slice out a 1x2 array out of this that has only the first two columns of "a"? I.e., array([[2,3],[4,5],[5,6]]) in this case. thanks.

    Read the article

  • using Python reduce over a list of pairs

    - by user248237
    I'm trying to pair together a bunch of elements in a list to create a final object, in a way that's analogous to making a sum of objects. I'm trying to use a simple variation on reduce where you consider a list of pairs rather than a flat list to do this. I want to do something along the lines of: nums = [1, 2, 3] reduce(lambda x, y: x + y, nums) except I'd like to add additional information to the sum that is specific to each element in the list of numbers nums. For example for each pair (a,b) in the list, run the sum as (a+b): nums = [(1, 0), (2, 5), (3, 10)] reduce(lambda x, y: (x[0]+x[1]) + (y[0]+y[1]), nums) This does not work: >>> reduce(lambda x, y: (x[0]+x[1]) + (y[0]+y[1]), nums) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <lambda> TypeError: 'int' object is unsubscriptable Why does it not work? I know I can encode nums as a flat list - that is not the point - I just want to be able to create a reduce operation that can iterate over a list of pairs, or over two lists of the same length simultaneously and pool information from both lists. thanks.

    Read the article

  • customizing Django look and feel in Python

    - by user248237
    I am learning Django and got it to work with wsgi. I'm following the tutorial here: http://docs.djangoproject.com/en/1.1/intro/tutorial01/ My question is: how can I customize the look and feel of Django? Is there a repository of templates that "look good", kind of like there are for Wordpress, that I can start from? I find the tutorial counterintuitive in that it goes immediately toward customizing the admin page of Django, rather than the main pages visible to users of the site. Is there an example of a "typical" Django site, with a decent template, that I can look at and built on/modify? The polls application is again not very representative since it's so specialized. any references on this would be greatly appreciated. thanks.

    Read the article

  • displaying a colored 2d array in matplotlib in Python

    - by user248237
    I'd like to plot a 2-d matrix from numpy as a colored matrix in Matplotlib. I have the following 9-by-9 array: my_array = diag(ones(9)) # plot the array pcolor(my_array) I'd like to set the first three elements of the diagonal to be a certain color, the next three to be a different color, and the last three a different color. I'd like to specify the color by a hex code string, like "#FF8C00". How can I do this? Also, how can I set the color of 0-valued elements for pcolor?

    Read the article

  • Accessing items from a dictionary using pickle efficiently in Python

    - by user248237
    I have a large dictionary mapping keys (which are strings) to objects. I pickled this large dictionary and at certain times I want to pull out only a handful of entries from it. The dictionary has usually thousands of entries total. When I load the dictionary using pickle, as follows: from cPickle import * # my dictionary from pickle, containing thousands of entries mydict = open(load('mypickle.pickle')) # accessing only handful of entries here for entry in relevant_entries: # find relevant entry value = mydict[entry] I notice that it can take up to 3-4 seconds to load the entire pickle, which I don't need, since I access only a tiny subset of the dictionary entries later on (shown above.) How can I make it so pickle only loads those entries that I have from the dictionary, to make this faster? Thanks.

    Read the article

  • using wild card when listing directories in python

    - by user248237
    how can I use wild cars like '*' when getting a list of files inside a directory in Python? for example, I want something like: os.listdir('foo/*bar*/*.txt') which would return a list of all the files ending in .txt in directories that have bar in their name inside of the foo parent directory. how can I do this? thanks.

    Read the article

  • doing arithmetic upto two significant figures in Python?

    - by user248237
    I have two floats in Python that I'd like to subtract, i.e. v1 = float(value1) v2 = float(value2) diff = v1 - v2 I want "diff" to be computed upto two significant figures, that is compute it using %.2f of v1 and %.2f of v2. How can I do this? I know how to print v1 and v2 up to two decimals, but not how to do arithmetic like that. thanks.

    Read the article

  • Putting newline in matplotlib label with TeX in Python?

    - by user248237
    How can I add a newline to a plot's label (e.g. xlabel or ylabel) in Matplotlib? For example, plt.bar([1, 2], [4, 5]) plt.xlabel("My x label") plt.ylabel(r"My long label with $\Sigma_{C}$ math \n continues here") Ideally i'd like the y-labeled to be centered too. Is there a way to do this? It's important that the label have both tex (enclosed in '$') and the newline. thanks.

    Read the article

< Previous Page | 1 2 3  | Next Page >