percona xtradb cluster - Page 67

Need help implementing this algorithm with map Hadoop MapReduce

- by Julia

Hi all! i have algorithm that will go through a large data set read some text files and search for specific terms in those lines. I have it implemented in Java, but I didnt want to post code so that it doesnt look i am searching for someone to implement it for me, but it is true i really need a lot of help!!! This was not planned for my project, but data set turned out to be huge, so teacher told me I have to do it like this. EDIT(i did not clarified i previos version)The data set I have is on a Hadoop cluster, and I should make its MapReduce implementation I was reading about MapReduce and thaught that i first do the standard implementation and then it will be more/less easier to do it with mapreduce. But didnt happen, since algorithm is quite stupid and nothing special, and map reduce...i cant wrap my mind around it. So here is shortly pseudo code of my algorithm LIST termList (there is method that creates this list from lucene index) FOLDER topFolder INPUT topFolder IF it is folder and not empty list files (there are 30 sub folders inside) FOR EACH sub folder GET file "CheckedFile.txt" analyze(CheckedFile) ENDFOR END IF Method ANALYZE(CheckedFile) read CheckedFile WHILE CheckedFile has next line GET line FOR(loops through termList) GET third word from line IF third word = term from list append whole line to string buffer ENDIF ENDFOR END WHILE OUTPUT string buffer to file Also, as you can see, each time when "analyze" is called, new file has to be created, i understood that map reduce is difficult to write to many outputs??? I understand mapreduce intuition, and my example seems perfectly suited for mapreduce, but when it comes to do this, obviously I do not know enough and i am STUCK! Please please help.

Read the article

Are document-oriented databases any more suitable than relational ones for persisting objects?

- by Owen Fraser-Green

In terms of database usage, the last decade was the age of the ORM with hundreds competing to persist our object graphs in plain old-fashioned RMDBS. Now we seem to be witnessing the coming of age of document-oriented databases. These databases are highly optimized for schema-free documents but are also very attractive for their ability to scale out and query a cluster in parallel. Document-oriented databases also hold a couple of advantages over RDBMS's for persisting data models in object-oriented designs. As the tables are schema-free, one can store objects belonging to different classes in an inheritance hierarchy side-by-side. Also, as the domain model changes, so long as the code can cope with getting back objects from an old version of the domain classes, one can avoid having to migrate the whole database at every change. On the other hand, the performance benefits of document-oriented databases mainly appear to come about when storing deeper documents. In object-oriented terms, classes which are composed of other classes, for example, a blog post and its comments. In most of the examples of this I can come up with though, such as the blog one, the gain in read access would appear to be offset by the penalty in having to write the whole blog post "document" every time a new comment is added. It looks to me as though document-oriented databases can bring significant benefits to object-oriented systems if one takes extreme care to organize the objects in deep graphs optimized for the way the data will be read and written but this means knowing the use cases up front. In the real world, we often don't know until we actually have a live implementation we can profile. So is the case of relational vs. document-oriented databases one of swings and roundabouts? I'm interested in people's opinions and advice, in particular if anyone has built any significant applications on a document-oriented database.

Read the article

Developing on both Windows & Linux machines simultaneously

- by Jamie

Sorry for the bad title (couldn't think of a better way to describe it) I have a windows machine which I do development on. However, I have a new project which needs to interact with a linux system (executing linux commands etc.). So, obviously I can't do development on my windows machine..and I don't wish to code on the dev machine, svn commit and then svn update it on the linux machine. Is there a way where any changes I make on my dev machine will be quickly mirrored to the linux machine? SVN is not a very quick alternative and of course some changes will be very minor. Any ideas? A network share I guess....but that's not very pretty (bit slow too). As fellow developers I would like to know if you've been in a similar situation and how you've resolved it. On a furthernote, I can't just install Ubuntu as my development machine and mirror the commands, applications etc. from the linux machine because it's a cluster 'master' machine and so therefore it has quite a special configuration. Thanks guys! EDIT: I've also thought about having web services on the linux machine and then just calling them from code thus seperating platform development dependency. What do you think about that too? thanks

Read the article

Simplest distributed persistent key/value store that supports primary key range queries

- by StaxMan

I am looking for a properly distributed (i.e. not just sharded) and persisted (not bounded by available memory on single node, or cluster of nodes) key/value ("nosql") store that does support range queries by primary key. So far closest such system is Cassandra, which does above. However, it adds support for other features that are not essential for me. So while I like it (and will consider using it of course), I am trying to figure out if there might be other mature projects that implement what I need. Specifically, for me the only aspect of value I need is to access it as a blob. For key, however, I need range queries (as in, access values ordered, limited by start and/or end values). While values can have structures, there is no need to use that structure for anything on server side (can do client-side data binding, flexible value/content types etc). For added bonus, Cassandra style storage (journaled, all sequential writes) seems quite optimal for my use case. To help filter out answers, I have investigated some alternatives within general domain like: Voldemort (key/value, but no ordering) and CouchDB (just sharded, more batch-oriented); and am aware of systems that are not quite distributed while otherwise qualifying (bdb variants, tokyo cabinet itself (not sure if Tyrant might qualify), redis (in-memory store only)).

Read the article

CORBA on MacOS X (Cocoa)

- by user8472

I am currently looking into different ways to support distributed model objects (i.e., a computational model that runs on several different computers) in a project that initially focuses on MacOS X (using Cocoa). As far as I know there is the possibility to use the class cluster around NSProxy. But there also seem to be implementations of CORBA around with Objective-C support. At a later time there may be the need to also support/include Windows machines. In that case I would need to use something like Gnustep on the Windows side (which may be an option, if it works well) or come up with a combination of both technologies. Or write something manually (which is, of course, the least desirable option). My questions are: If you have experience with both technologies (Cocoa native infrastructure vs. CORBA) can you point out some key features/issues of either approach? Is it possible to use Gnustep with Cocoa in the way explained above? Is it possible (and reasonably feasible, i.e. simpler than writing a network layer manually) to communicate among all MacOS clients using Cocoa's technology and with Windows clients through CORBA?

Read the article

How can you transform a set of numbers into mostly whole ones?

- by Alice

Small amount of background: I am working on a converter that bridges between a map maker (Tiled) that outputs in XML, and an engine (Angel2D) that inputs lua tables. Most of this is straight forward However, Tiled outputs in pixel offsets (integers of absolute values), while Angel2D inputs OpenGL units (floats of relative values); a conversion factor between these two is needed (for example, 32px = 1gu). Since OpenGL units are abstract, and the camera can zoom in or out if the objects are too small or big, the actual conversion factor isn't important; I could use a random number, and the user would merely have to zoom in or out. But it would be best if the conversion factor was selected such that most numbers outputted were small and whole (or fractions of small whole numbers), because that makes it easier to work with (and the whole point of the OpenGL units is that they are easy to work with). How would I find such a conversion factor reliably? My first attempt was to use the smallest number given; this resulted in no fractions below 1, but often lead to lots of decimal places where the factors didn't line up. Then I tried the mode of the sequence, which lead to the largest number of 1's possible, but often lead to very long floats for background images. My current approach gets the GCD of the whole sequence, which, when it works, works great, but can easily be thrown off course by a single bad apple. Note that while I could easily just pass the numbers I am given along, or pick some fixed factor, or use one of the conversions I specified above, I am looking for a method to reliably scale this list of integers to small, whole numbers or simple fractions, because this would most likely be unsurprising to the end user; this is not a one off conversion. The end users tend to use 1.0 as their "base" for manipulations (because it's simple and obvious), so it would make more sense for the sizes of entities to cluster around this.

Read the article

Advice needed: cold backup for SQL Server 2008 Express?

- by Mikey Cee

What are my options for achieving a cold backup server for SQL Server Express instance running a single database? I have an SQL Server 2008 Express instance in production that currently represents a single point of failure for my application. I have a second physical box sitting at the installation that is currently doing nothing. I want to somehow replicate my database in near real time (a little bit of data loss is acceptable) to the second box. The database is very small and resources are utilized very lightly. In the case that the production server dies, I would manually reconfigure my application to point to the backup server instead. Although Express doesn't support log shipping, I am thinking that I could manually script a poor man's version of it, where I use batch files to take the logs and copy them across the network and apply them to the second server at 5 minute intervals. Does anyone have any advice on whether this is technically achievable, or if there is a better way to do what I am trying to do? Note that I want to avoid having to pay for the full version of SQL Server and configure mirroring as I think it is an overkill for this application. I understand that other DB platforms may present suitable options (eg. a MySQL Cluster), but for the purposes of this discussion, let's assume we have to stick to SQL Server.

Read the article

How to make ActionController::Base.asset_host and Base.relative_url_root independent?

- by GSP

In our Intranet environment we have a decree that common assets (stylesheets, images, etc) should be fed from Apache root while Rails apps runs from from a "sub directory" (a proxy to a Mongrel cluster). In other words: <%= stylesheet_tag '/common' %> # <link href="http://1.1.1.1/stylesheets/common.css" /> <%= link_to 'Home', :controller=>'home', :action=>'index' %> # <a href="http://1.1.1.1/myapp/" /> What I would like to do is define the assets and relative url in my configuration like this: ActionController::Base.asset_host=http://1.1.1.1 ActionController::Base.relative_url_root=/myapp But when I do this, the relative_url_root value gets appended to the asset_host value. (e.g. <link href="http://1.1.1.1/myapp/stylesheets/common.css"> ) Is there a way to stop this? And, more importantly, is there a best practice for how to deploy a rails app to an environment like this? (BTW, I don't want to simply hard code my asset paths since the development environment does not match the test and production environments.) Environment: Rails 2.3.4 Ruby 1.8.7 RedHat Linux 5.4?

Read the article

Developing on a windows machine that interacts with a linux system

- by Jamie

Sorry for the bad title (couldn't think of a better way to describe it) I have a windows machine which I do development on. However, I have a new project which needs to interact with a linux system (executing linux commands etc.). So, obviously I can't do development on my windows machine..and I don't wish to code on the dev machine, svn commit and then svn update it on the linux machine. Is there a way where any changes I make on my dev machine will be quickly mirrored to the linux machine? SVN is not a very quick alternative and of course some changes will be very minor. Any ideas? A network share I guess....but that's not very pretty (bit slow too). As fellow developers I would like to know if you've been in a similar situation and how you've resolved it. On a furthernote, I can't just install Ubuntu as my development machine and mirror the commands, applications etc. from the linux machine because it's a cluster 'master' machine and so therefore it has quite a special configuration. Thanks guys! EDIT: I've also thought about having web services on the linux machine and then just calling them from code thus seperating platform development dependency. What do you think about that too? thanks

Read the article

Time with and without OpenMP

- by was

I have a question.. I tried to improve a well known program algorithm in C, FOX algorithm for matrix multiplication.. relative link without openMP: (http://web.mst.edu/~ercal/387/MPI/ppmpi_c/chap07/fox.c). The initial program had only MPI and I tried to insert openMP in the matrix multiplication method, in order to improve the time of computation: (This program runs in a cluster and computers have 2 cores, thus I created 2 threads.) The problem is that there is no difference of time, with and without openMP. I observed that using openMP sometimes, time is equivalent or greater than the time without openMP. I tried to multiply two 600x600 matrices. void Local_matrix_multiply( LOCAL_MATRIX_T* local_A /* in */, LOCAL_MATRIX_T* local_B /* in */, LOCAL_MATRIX_T* local_C /* out */) { int i, j, k; chunk = CHUNKSIZE; // 100 #pragma omp parallel shared(local_A, local_B, local_C, chunk, nthreads) private(i,j,k,tid) num_threads(2) { /* tid = omp_get_thread_num(); if(tid == 0){ nthreads = omp_get_num_threads(); printf("O Pollaplasiamos pinakwn ksekina me %d threads\n", nthreads); } printf("Thread %d use the matrix: \n", tid); */ #pragma omp for schedule(static, chunk) for (i = 0; i < Order(local_A); i++) for (j = 0; j < Order(local_A); j++) for (k = 0; k < Order(local_B); k++) Entry(local_C,i,j) = Entry(local_C,i,j) + Entry(local_A,i,k)*Entry(local_B,k,j); } //end pragma omp parallel } /* Local_matrix_multiply */

Read the article

What kind of library to use for display of graphical objects and right click context menus

- by Gopal

Hi all, Goal: To develop a web based NMS interface which displays a network topology (e.g., switches, routers, links, endhosts). Each node should be 'movable' (draggable to an appropriate place manually or their best location computed algorithmically). I should be able to zoom into the network graph (say if there are many clusters of nodes and I want to concentrate on a particular cluster of nodes). I should be able to right-click any node or link and get a context menu (e.g., 'show routing table', 'show interfaces', 'show bandwidth utilization graph' etc.). The data for this network topology will be fetched by making calls to an apache based webserver where the backend scripts in python will fetch the appropriate data and send it via JSON to the web client. Question: I am assuming that some sort of javascript library/framework will be most appropriate for this - jQuery, Dojo, Moo etc. [I've never used any of these before]. Which of these would be most recommended for this sort of thing. Which would be easiest to learn (say in a months time). Thanks for any tips.

Read the article

grailsApplication access in Grails unit Test

- by Reza

I am trying to write unit tests for a service which use grailsApplication.config to do some settings. It seems that in my unit tests that service instance could not access the config file (null pointer) for its setting while it could access that setting when I run "run-app". How could I configure the service to access grailsApplication service in my unit tests. class MapCloudMediaServerControllerTests { def grailsApplication @Before public void setUp(){ grailsApplication.config= ''' video{ location="C:\\tmp\\" // or shared filesystem drive for a cluster yamdi{ path="C:\\FFmpeg\\ffmpeg-20121125-git-26c531c-win64-static\\bin\\yamdi" } ffmpeg { fileExtension = "flv" // use flv or mp4 conversionArgs = "-b 600k -r 24 -ar 22050 -ab 96k" path="C:\\FFmpeg\\ffmpeg-20121125-git-26c531c-win64-static\\bin\\ffmpeg" makethumb = "-an -ss 00:00:03 -an -r 2 -vframes 1 -y -f mjpeg" } ffprobe { path="C:\\FFmpeg\\ffmpeg-20121125-git-26c531c-win64-static\\bin\\ffprobe" params="" } flowplayer { version = "3.1.2" } swfobject { version = "" qtfaststart { path= "C:\\FFmpeg\\ffmpeg-20121125-git-26c531c-win64-static\\bin\\qtfaststart" } } ''' } @Test void testMpegtoFlvConvertor() { log.info "In test Mpg to Flv Convertor function!" def controller=new MapCloudMediaServerController() assert controller!=null controller.videoService=new VideoService() assert controller.videoService!=null log.info "Is the video service null? ${controller.videoService==null}" controller.videoService.grailsApplication=grailsApplication log.info "Is grailsApplication null? ${controller.videoService.grailsApplication==null}" //Very important part for simulating the HTTP request controller.metaClass.request = new MockMultipartHttpServletRequest() controller.request.contentType="video/mpg" controller.request.content= new File("..\\MapCloudMediaServer\\web-app\\videoclips\\sample3.mpg").getBytes() controller.mpegtoFlvConvertor() byte[] videoOut=IOUtils.toByteArray(controller.response.getOutputStream()) def outputFile=new File("..\\MapCloudMediaServer\\web-app\\videoclips\\testsample3.flv") outputFile.append(videoOut) } }

Read the article

Send Special Keys to Gtk.VteTerminal

- by Ubersoldat

Hi I have this OSS Project called Monocaffe connections manager which uses the Gtk.VteTerminal widget from PyGTK. A nice feature is that it allows the users to send commands to different servers' consoles (cluster mode) using a Gtk.TextView for the input. The way I send key strokes to each Gtk.VteTerminal is by using the feed_child method. For common keys there's no problem: I simply feed what the TextView receives to all the terminals, but when doing so with special keys I get into a little trouble. For "Return" I catch the event and feed the terminal a '\n'. For back-space is the same, catch the event and feed a '\b'. def cluster_backspace(self, widget): return self.cluster_send_key('\b') The problem comes with other keys like Tab, Arrows, Esc which I don't know how to feed as str to the terminal to recognize them. In the case of Esc is a real pain, because the users can edit the same file on different servers using vi, but cannot escape insert mode. Anyway, I'm not looking for a complete solution, just ideas since I've ran out of them. Thanks.

Read the article

Generating different randoms valid for a day on different independent devices?

- by Pentium10

Let me describe the system. There are several mobile devices, each independent from each other, and they are generating content for the same record id. I want to avoid generating the same content for the same record on different devices, for this I though I would use a random and make it so too cluster the content pool based on these randoms. Suppose you have choices from 1 to 100. Day 1 Device#1 will choose for the record#33 between 1-10 Device#2 will choose for the record#33 between 40-50 Device#3 will choose for the record#33 between 50-60 Device#1 will choose for the record#55 between 40-50 Device#2 will choose for the record#55 between 1-10 Device#3 will choose for the record#55 between 10-20 Device#1 will choose for the record#11 between 1-10 Device#2 will choose for the record#22 between 1-10 Device#3 will choose for the record#99 between 1-10 Day 2 Device#1 will choose for the record#33 between 90-100 Device#2 will choose for the record#33 between 1-10 Device#3 will choose for the record#33 between 50-60 They don't have access to a central server. Data available for each of them: IMEI (unique per mobile) Date of today (same on all devices) Record id (same on all devices) What do you think, how is it possible? ps. tags can be edited

Read the article

Wicket application + Apache + mod_jk - AJP queues are filling up!

- by nojyarg

Dear community, We are having a Wicket-based Java application deployed in a production server cluster using Apache (2.2.3) with mod_jk (1.2.30) as load balancing component w/ sticky session and Jboss 5 as application container for the Java application. We are inconsistently seeing an issue in our production environment where our AJP queues between Apache and Jboss as shown in the JMX console fill up with requests to the point where the application server is no longer taking on any new requests. When looking at all involved system components (overall traffic, load db, process list db, load of all clustered application server nodes) nothing points towards a capacity issue which would explain why the calls are being stalled in the AJP queue. Instead all systems appear sufficiently idle. So far, our only remedy to this issue is to restart the appservers and the load balancer which only occasionally clears the AJP queues. We are trying to figure out why the queues are filling up to the point that no calls get returned to the end user although the system is not under a high load. Has anyone else experienced similar problems? Are there any other system metrics we should monitor that could explain the queuing behavior? Is this potentially a mod_jk issue? If so, is it advisable to swap mod_jk with mod_cluster to resolve the issue? Any advice is highly appreciated. If I can provide additional information for the sake of troubleshooting I would be more than willing to do so. /Ben

Read the article

Error handling approach on PHP

- by Industrial

Hi everybody, We have a web server that we're about to launch a number of applications onto. They will all share database and memcached servers, but each application has it's own mySQL database and all memcached keys per application, is prefixed. Possible scenario: If a memcached server in our cluster goes boom, we want someone (operative system admin) to be automatically contacted by email/iphone push notification or in any other appropriate way. If we we're about to install 150 identical applications for our customers on our servers, and a memcached server dies - all 150 applications will individually find this out and contact our system admin, which most certainly is going to think about getting a new job where he or she isn't about to be woken up by getting 150 messages sent 4:15 in the morning. Possible solution: One idea is to set up an external server for error handling that gets a $_POST or cURL request sent, and handles storage of the error message depending on the seriousness of the actual error message. It would of course check upon receiving the error call, that if the same memcached server have already been reported as offline, there would be no need to spam the system admin with additional reminders... The questions: What's a good approach on how to handle errors? How does the big guys in the industry handle this? Thanks!

Read the article

File IO with Streams - Best Memory Buffer Size

- by AJ

I am writing a small IO library to assist with a larger (hobby) project. A part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is vaired, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...). My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2k, which would cover a CD sector size and is a nice multiple of a 512 byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise with today's PC's, I could go for a more memory hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive app. As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance).

Read the article

Error monitoring/handling on webservers

- by Industrial

Hi everybody, We have a web server that we're about to launch a number of applications onto. They will all share database and memcached servers, but each application has it's own mySQL database and all memcached keys per application, is prefixed. Possible scenario: If a memcached server in our cluster goes boom, we want someone (operative system admin) to be automatically contacted by email/iphone push notification or in any other appropriate way. If we we're about to install 150 identical applications for our customers on our servers, and a memcached server dies - all 150 applications will individually find this out and contact our system admin, which most certainly is going to think about getting a new job where he or she isn't about to be woken up by getting 150 messages sent 4:15 in the morning. Possible solution: One idea is to set up an external server for error handling that gets a $_POST or cURL request sent, and handles storage of the error message depending on the seriousness of the actual error message. It would of course check upon receiving the error call, that if the same memcached server have already been reported as offline, there would be no need to spam the system admin with additional reminders... The questions: What's a good approach on how to handle errors? How does the big guys in the industry handle this? Thanks!

Read the article

How to deal with a Java serialized object whose package changed?

- by Alex

I have a Java class that is stored in an HttpSession object that's serialized and transfered between servers in a cluster environment. For the purpose of this explanation, lets call this class "Person". While in the process of improving the code, this class was moved from "com.acme.Person" to "com.acme.entity.Person". Internally, the class remains exactly the same (same fields, same methods, same everything). The problem is that we have two sets of servers running the old code and the new code at the same time. The servers with the old code have serialized HttpSession object and when the new code unserializes it, it throws a ClassNotFoundException because it can't find the old reference to com.acme.Person. At this point, it's easy to deal with this because we can just recreate the object using the new package. The problem then becomes that the HttpSession in the new servers, will serialize the object with the new reference to com.acme.entity.Person, and when this is unserialized in the servers running the old code, another exception will be thrown. At this point, we can't deal with this exception anymore. What's the best strategy to follow for this kind of cases? Is there a way to tell the new servers to serialize the object with the reference to the old package and unserialize references to the old package to the new one? How would we transition to using the new package and forgetting about the old one once all servers run the new code?

Read the article

shell script segment to avoid overwriting files

- by johndashen

I have a perl script (or any executable) E which will take a file foo.xml and write a file foo.txt. I use a Beowulf cluster to run E for a large number of XML files, but I'd like to write a simple job server script in shell (bash) which doesn't overwrite existing txt files. I'm currently doing something like #!/bin/sh PATTERN="[A-Z]*0[1-2][a-j]"; # this matches foo in all cases todo=`ls *.xml | grep $PATTERN`; isdone=`ls *.foo | grep $PATTERN`; whatsleft=todo - isdone; # what's the unix magic? #and then call the job server; jobserve E "$whatsleft"; and then I don't know how to get the difference between $todo and $isdone. I'd prefer using sort/uniq to something like a for loop with grep inside, but I'm not sure how to do it (pipes? temporary files?) As a bonus question, is there a way to do lookahead search in bash grep?

Read the article

Managing server instance identity on EC2

- by kikibobo

I recently brought up a cluster on EC2, and I felt like I had to invent a lot of things. I'm wondering what kinds of tools, patterns, ideas are out there for how to deal with this. Some context: I had 3 different kinds of servers, so first I created AMIs for each of them. The first AMI had zookeeper, so step one in deploying the system was to get the zookeeper server running. My script then made a note of the mapping between EC2's completely arbitrary and unpredictable hostnames, and the zookeeper server. Then as I brought up new instances of the other 2 kinds of servers, the first thing I would do is ssh to the new server, and add the zookeeper server to its /etc/hosts file. Then as the server software on each instance starts up, it can find zookeeper. Obviously this is a problem that lots of people have to solve, and it probably works a little bit differently in different clouds. Are there products that address this concept? I was pretty surprised that EC2 didn't provide some kind of way to tie your own name to its name. Thanks for any ideas.

Read the article

method works fine, until it is called in a function, then UnboundLocalError

- by user1776100

I define a method called dist, to calculate the distance between two points which I does it correctly when directly using the method. However, when I get a function to call it to calculate the distance between two points, I get UnboundLocalError: local variable 'minkowski_distance' referenced before assignment edit sorry, I just realised, this function does work. However I have another method calling it that doesn't. I put the last method at the bottom This is the method: class MinkowskiDistance(Distance): def __init__(self, dist_funct_name_str = 'Minkowski distance', p=2): self.p = p def dist(self, obj_a, obj_b): distance_to_power_p=0 p=self.p for i in range(len(obj_a)): distance_to_power_p += abs((obj_a[i]-obj_b[i]))**(p) minkowski_distance = (distance_to_power_p)**(1/p) return minkowski_distance and this is the function: (it basically splits the tuples x and y into their number and string components and calculates the distance between the numeric part of x and y and then the distance between the string parts, then adds them. def total_dist(x, y, p=2, q=2): jacard = QGramDistance(q=q) minkowski = MinkowskiDistance(p=p) x_num = [] x_str = [] y_num = [] y_str = [] #I am spliting each vector into its numerical parts and its string parts so that the distances #of each part can be found, then summed together. for i in range(len(x)): if type(x[i]) == float or type(x[i]) == int: x_num.append(x[i]) y_num.append(y[i]) else: x_str.append(x[i]) y_str.append(y[i]) num_dist = minkowski.dist(x_num,y_num) str_dist = I find using some more steps #I am simply adding the two types of distance to get the total distance: return num_dist + str_dist class NearestNeighbourClustering(Clustering): def __init__(self, data_file, clust_algo_name_str='', strip_header = "no", remove = -1): self.data_file= data_file self.header_strip = strip_header self.remove_column = remove def run_clustering(self, max_dist, p=2, q=2): K = {} #dictionary of clusters data_points = self.read_data_file() K[0]=[data_points[0]] k=0 #I added the first point in the data to the 0th cluster #k = number of clusters minus 1 n = len(data_points) for i in range(1,n): data_point_in_a_cluster = "no" for c in range(k+1): distances_from_i = [total_dist(data_points[i],K[c][j], p=p, q=q) for j in range(len(K[c]))] d = min(distances_from_i) if d <= max_dist: K[c].append(data_points[i]) data_point_in_a_cluster = "yes" if data_point_in_a_cluster == "no": k += 1 K[k]=[data_points[i]] return K

Read the article

System Account Logon Failures ever 30 seconds

- by floyd

We have two Windows 2008 R2 SP1 servers running in a SQL failover cluster. On one of them we are getting the following events in the security log every 30 seconds. The parts that are blank are actually blank. Has anyone seen similar issues, or assist in tracking down the cause of these events? No other event logs show anything relevant that I can tell. Log Name: Security Source: Microsoft-Windows-Security-Auditing Date: 10/17/2012 10:02:04 PM Event ID: 4625 Task Category: Logon Level: Information Keywords: Audit Failure User: N/A Computer: SERVERNAME.domainname.local Description: An account failed to log on. Subject: Security ID: SYSTEM Account Name: SERVERNAME$ Account Domain: DOMAINNAME Logon ID: 0x3e7 Logon Type: 3 Account For Which Logon Failed: Security ID: NULL SID Account Name: Account Domain: Failure Information: Failure Reason: Unknown user name or bad password. Status: 0xc000006d Sub Status: 0xc0000064 Process Information: Caller Process ID: 0x238 Caller Process Name: C:\Windows\System32\lsass.exe Network Information: Workstation Name: SERVERNAME Source Network Address: - Source Port: - Detailed Authentication Information: Logon Process: Schannel Authentication Package: Kerberos Transited Services: - Package Name (NTLM only): - Key Length: 0 Second event which follows every one of the above events Log Name: Security Source: Microsoft-Windows-Security-Auditing Date: 10/17/2012 10:02:04 PM Event ID: 4625 Task Category: Logon Level: Information Keywords: Audit Failure User: N/A Computer: SERVERNAME.domainname.local Description: An account failed to log on. Subject: Security ID: NULL SID Account Name: - Account Domain: - Logon ID: 0x0 Logon Type: 3 Account For Which Logon Failed: Security ID: NULL SID Account Name: Account Domain: Failure Information: Failure Reason: An Error occured during Logon. Status: 0xc000006d Sub Status: 0x80090325 Process Information: Caller Process ID: 0x0 Caller Process Name: - Network Information: Workstation Name: - Source Network Address: - Source Port: - Detailed Authentication Information: Logon Process: Schannel Authentication Package: Microsoft Unified Security Protocol Provider Transited Services: - Package Name (NTLM only): - Key Length: 0 EDIT UPDATE: I have a bit more information to add. I installed Network Monitor on this machine and did a filter for Kerberos traffic and found the following which corresponds to the timestamps in the security audit log. A Kerberos AS_Request Cname: CN=SQLInstanceName Realm:domain.local Sname krbtgt/domain.local Reply from DC: KRB_ERROR: KDC_ERR_C_PRINCIPAL_UNKOWN I then checked the security audit logs of the DC which responded and found the following: A Kerberos authentication ticket (TGT) was requested. Account Information: Account Name: X509N:<S>CN=SQLInstanceName Supplied Realm Name: domain.local User ID: NULL SID Service Information: Service Name: krbtgt/domain.local Service ID: NULL SID Network Information: Client Address: ::ffff:10.240.42.101 Client Port: 58207 Additional Information: Ticket Options: 0x40810010 Result Code: 0x6 Ticket Encryption Type: 0xffffffff Pre-Authentication Type: - Certificate Information: Certificate Issuer Name: Certificate Serial Number: Certificate Thumbprint: So appears to be related to a certificate installed on the SQL machine, still dont have any clue why or whats wrong with said certificate. It's not expired etc.

Read the article

Handling bounced email when using a postfix smarthost

- by Mark Rose

I'm running a high availability cluster, and so far, most things work great. I have two external machines that act as outgoing mail hosts (smarthosts). The internal hosts are configured to relay all email through these two external facing hosts. My smarthosts' main.cf looks like this: myhostname = lb1.example.com alias_maps = hash:/etc/aliases alias_database = hash:/etc/aliases mydestination = lb1.example.com, localhost relayhost = mynetworks = 127.0.0.0/8 10.1.248.0/24 My internal hosts' main.cf looks like this: mynetworks = 127.0.0.0/8 myhostname = web1.example.com mydestination = $myhostname, localhost.$mydomain, localhost relayhost = [10.1.248.3] smtp_fallback_relay = [10.1.248.2] lb1's internal IP is 10.1.248.2, and lb2's internal IP is 10.1.248.3. On the external hosts, email for root and www-data is forwarded to [email protected] with /etc/aliases. One advantage to using the smarthost setup is that spam filters and the like can connect back to the sending sending server. All email is sent fine, and headers look like this: Received: from lb2.example.com ([198.51.100.3]) by mx.google.com with ESMTP id y17si1571259icb.76.2011.01.13.18.20.32; Thu, 13 Jan 2011 18:20:32 -0800 (PST) Received-SPF: neutral (google.com: 198.51.100.3 is neither permitted nor denied by best guess record for domain of [email protected]) client-ip=198.51.100.3; Received: from db1.example.com (unknown [10.1.248.20]) by lb2.example.com (Postfix) with ESMTP id D364823C0BE for <[email protected]>; Thu, 13 Jan 2011 21:20:31 -0500 (EST) Received: by db1.example.com (Postfix) id C9FA7760D6A; Thu, 13 Jan 2011 21:20:31 -0500 (EST) Delivered-To: www-data@localhost Received: by db1.example.com (Postfix, from userid 0) id C1632760D6C; Thu, 13 Jan 2011 21:20:31 -0500 (EST) The problem is bounced/reject email. The external machine tries to forward the email back to the internal machine, e.g. www-data on web1 sending an email that bounces (such as a user signing up with a bad email address). An additional complication is using Google mail for the main example.com domain. In lieu of specifying every internal host in the external hosts' mydestination, is there a better way of setting things up, keeping in mind I can't adjust touch the mx for example.com?

Read the article

vSphere Client vCenter Template Customization Specification Using Windows Sysprep Unattended Answer XML File

- by Brian

I'm trying to setup a vSphere Client vCenter v5.0.0 Build 455964 Template Customization Specification using a Windows Sysprep unattended answer XML file for Win2008R2. However I didn't know how Sysprep worked before attempting this so it was a time-consuming nightmare (even after reviewing VMware vSphere ESXi 5's documentation)! I think I've figure out what I'm supposed to be doing, but it's still not working. The biggest problem at this point is that vSphere Client vCenter Customization Specification IP address information is not sticking when I load a Sysprep XML file with just 1 basic setting! This can only be a bug. Here is the process I'm using: PROCESS for Windows - vSphere Client Install Windows OS install VM Tools customize Windows (GPOs can be used to do this after deployment) install Applications (GPOs can be used to do this after deployment too) shutdown the VM convert the VM to a template create a custom Windows Sysprep XML answer file with desired customizations View Management Customization Specifications Manager create "New" Specification for "Target Virtual Machine OS" select Windows check "Use Custom Sysprep Answer File" (ADDS: Custom Sysprep File. KEEPS: Network (IP), Operating System Options (SID, Sysprep /generalize). REPLACES: Registration Information of Owner Name & Organization, Computer Name, Windows License (Key), Administrator Password, Time Zone, Run Once, Workgroup or Domain) name it as "VMwareCS-OS####R#x32/64w/Sysprep-TEST" (CS=Customization Specification) set Description as "Created YYYY/MM/DD by FLast" NEXT import a Sysprep answer file from secure location NEXT Custom settings NEXT click "..." box to right of "Use DHCP" set "Use the following IP settings:" for "IP Address" fill out the first 2 octets set appropriate values for other 2-3 fields set DNS server addresses OK NEXT check "Generate New Security ID (SID)" ALWAYS as template is likely a domain-member computer so it can be updated occasionally NEXT Finish View Inventory VMs and Templates right-click previously completed template Deploy Virtual Machine from this Template provide the new OS name (max15char) select inventory location NEXT select Host/Cluster (wait for validation to succeed) NEXT select Resource Pool (wait for validation to succeed) NEXT select Storage location NEXT check "Power on this virtual machine after creation" select "Customize using an existing customization specification" select desired specification select "Use the Customization Wizard to temporarily adjust the specification before deployment" NEXT NEXT Custom settings? NEXT check "Generate New Security ID (SID)" ALWAYS as template is likely a domain-member computer so it can be updated occasionally NEXT Finish Finish. I know a community member named "brian" (http://serverfault.com/users/25904/brian) has worked with this scenario before, but I couldn't figure out how to contact him directly, so Brian if you see this message could you provide some information to help? Thanks, Brian

Search Results

Search found 1815 results on 73 pages for 'percona xtradb cluster'.

Page 67/73 | < Previous Page | 63 64 65 66 67 68 69 70 71 72 73 | Next Page >

- by Julia

- by Owen Fraser-Green

- by Jamie

- by StaxMan

- by user8472

- by Alice

- by Mikey Cee

- by GSP

- by Jamie

- by was

- by Gopal

- by Reza

- by Ubersoldat

- by Pentium10

- by nojyarg

- by Industrial

- by AJ

- by Industrial

- by Alex

- by johndashen

- by kikibobo

- by user1776100

- by floyd

- by Mark Rose

- by Brian

< Previous Page | 63 64 65 66 67 68 69 70 71 72 73 | Next Page >