Search Results

Search found 1848 results on 74 pages for 'algorithms'.

Page 68/74 | < Previous Page | 64 65 66 67 68 69 70 71 72 73 74 | Next Page >

What are the benefits of `while(condition) { //work }` and `do { //work } while(condition)`?

- by Shaharyar

I found myself confronted with an interview question where the goal was to write a sorting algorithm that sorts an array of unsorted int values: int[] unsortedArray = { 9, 6, 3, 1, 5, 8, 4, 2, 7, 0 }; Now I googled and found out that there are so many sorting algorithms out there! Finally I could motivate myself to dig into Bubble Sort because it seemed pretty simple to start with. I read the sample code and came to a solution looking like this: static int[] BubbleSort(ref int[] array) { long lastItemLocation = array.Length - 1; int temp; bool swapped; do { swapped = false; for (int itemLocationCounter = 0; itemLocationCounter < lastItemLocation; itemLocationCounter++) { if (array[itemLocationCounter] > array[itemLocationCounter + 1]) { temp = array[itemLocationCounter]; array[itemLocationCounter] = array[itemLocationCounter + 1]; array[itemLocationCounter + 1] = temp; swapped = true; } } } while (swapped); return array; } I clearly see that this is a situation where the do { //work } while(cond) statement is a great help to be and prevents the use of another helper variable. But is this the only case that this is more useful or do you know any other application where this condition has been used?

Read the article
How can I implement a splay tree that performs the zig operation last, not first?

- by Jakob

For my Algorithms & Data Structures class, I've been tasked with implementing a splay tree in Haskell. My algorithm for the splay operation is as follows: If the node to be splayed is the root, the unaltered tree is returned. If the node to be splayed is one level from the root, a zig operation is performed and the resulting tree is returned. If the node to be splayed is two or more levels from the root, a zig-zig or zig-zag operation is performed on the result of splaying the subtree starting at that node, and the resulting tree is returned. This is valid according to my teacher. However, the Wikipedia description of a splay tree says the zig step "will be done only as the last step in a splay operation" whereas in my algorithm it is the first step in a splay operation. I want to implement a splay tree that performs the zig operation last instead of first, but I'm not sure how it would best be done. It seems to me that such an algorithm would become more complex, seeing as how one needs to find the node to be splayed before it can be determined whether a zig operation should be performed or not. How can I implement this in Haskell (or some other functional language)?

Read the article
Concept: Information Into Memory Location.

- by Richeve S. Bebedor

I am having troubles conceptualizing an algorithm to be used to transform any information or data into a specific appropriate and reasonable memory location in any data structure that I will be devising. To give you an idea, I have a JPanel object instance and I created another Container type object instance of any subtype (note this is in Java because I love this language), then I collected those instances into a data structure not specifically just for those instances but also applicable to any type of object. Now my procedure for fetching those data again is to extract the object specific features similar in category to all object in that data structure and transform it into a integer data memory location (specifically as much as possible) or any type of data that will pertain to this transformation. And I can already access that memory location without further sorting or applications of O(n) time complex algorithms (which I think preferable but I wanted to do my own way XD). The data structure is of any type either binary tree, linked list, arrays or sets (and the like XD). What is important is I don't need to have successive comparing and analysis of data just to locate information in big structures. To give you a technical idea, I have to an array DS that contains JLabel object instance with a specific name "HelloWorld". But array DS contains other types of object (in multitude). Now this JLabel object has a location in the array at index [124324] (which is if you do any type of searching algorithm just to arrive at that location is conceivably slow because added to it the data structure used was an array *note please disregard the efficiency of the data structure to be used I just want to explain to you my concept XD). Now I want to equate "HelloWorld" to 124324 by using a conceptually made function applicable to all data types. So that I can do a direct search by doing this DS[extractLocation("HelloWorld")] just to get that JLabel instance. I know this may sound crazy but I want to test my concept of non-sorting feature extracting search algorithm for any data structure wherein my main problem is how to transform information to be stored into memory location of where it was stored.

Read the article
Algorithm to determine if array contains n...n+m?

- by Kyle Cronin

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview questions: Write a method that takes an int array of size m, and returns (True/False) if the array consists of the numbers n...n+m-1, all numbers in that range and only numbers in that range. The array is not guaranteed to be sorted. (For instance, {2,3,4} would return true. {1,3,1} would return false, {1,2,4} would return false. The problem I had with this one is that my interviewer kept asking me to optimize (faster O(n), less memory, etc), to the point where he claimed you could do it in one pass of the array using a constant amount of memory. Never figured that one out. Along with your solutions please indicate if they assume that the array contains unique items. Also indicate if your solution assumes the sequence starts at 1. (I've modified the question slightly to allow cases where it goes 2, 3, 4...) edit: I am now of the opinion that there does not exist a linear in time and constant in space algorithm that handles duplicates. Can anyone verify this? The duplicate problem boils down to testing to see if the array contains duplicates in O(n) time, O(1) space. If this can be done you can simply test first and if there are no duplicates run the algorithms posted. So can you test for dupes in O(n) time O(1) space?

Read the article
Meaning of NEXT in Linked List creation in perl

- by seleniumnewbie

So I am trying to learn Linked Lists using Perl. I am reading "Mastering Algorithms with Perl" by Job Orwant. In the book he explains how to create a linked list I understand most of it, but I just simply fail to understand the command/index/key NEXT in the second last line of the code snippet. $list=undef; $tail=\$list; foreach (1..5){ my $node = [undef, $_ * $_]; $$tail = $node; $tail = \${$node->[NEXT]}; # The NEXT on this line? } What is he trying to do there? Isn $node a scalar, which stores the address of the unnamed array. Also even if we are de-referencing $node, should we not refer to the individual elements by an index number example (0,1). If we do use "NEXT" as a key, is $node a reference to a hash? I am very confused. Something in plain English will be highly appreciated.

Read the article
finding long repeated substrings in a massive string

- by Will

I naively imagined that I could build a suffix trie where I keep a visit-count for each node, and then the deepest nodes with counts greater than one are the result set I'm looking for. I have a really really long string (hundreds of megabytes). I have about 1 GB of RAM. This is why building a suffix trie with counting data is too inefficient space-wise to work for me. To quote Wikipedia's Suffix tree: storing a string's suffix tree typically requires significantly more space than storing the string itself. The large amount of information in each edge and node makes the suffix tree very expensive, consuming about ten to twenty times the memory size of the source text in good implementations. The suffix array reduces this requirement to a factor of four, and researchers have continued to find smaller indexing structures. And that was wikipedia's comments on the tree, not trie. How can I find long repeated sequences in such a large amount of data, and in a reasonable amount of time (e.g. less than an hour on a modern desktop machine)? (Some wikipedia links to avoid people posting them as the 'answer': Algorithms on strings and especially Longest repeated substring problem ;-) )

Read the article
Type-safe generic data structures in plain-old C?

- by Bradford Larsen

I have done far more C++ programming than "plain old C" programming. One thing I sorely miss when programming in plain C is type-safe generic data structures, which are provided in C++ via templates. For sake of concreteness, consider a generic singly linked list. In C++, it is a simple matter to define your own template class, and then instantiate it for the types you need. In C, I can think of a few ways of implementing a generic singly linked list: Write the linked list type(s) and supporting procedures once, using void pointers to go around the type system. Write preprocessor macros taking the necessary type names, etc, to generate a type-specific version of the data structure and supporting procedures. Use a more sophisticated, stand-alone tool to generate the code for the types you need. I don't like option 1, as it is subverts the type system, and would likely have worse performance than a specialized type-specific implementation. Using a uniform representation of the data structure for all types, and casting to/from void pointers, so far as I can see, necessitates an indirection that would be avoided by an implementation specialized for the element type. Option 2 doesn't require any extra tools, but it feels somewhat clunky, and could give bad compiler errors when used improperly. Option 3 could give better compiler error messages than option 2, as the specialized data structure code would reside in expanded form that could be opened in an editor and inspected by the programmer (as opposed to code generated by preprocessor macros). However, this option is the most heavyweight, a sort of "poor-man's templates". I have used this approach before, using a simple sed script to specialize a "templated" version of some C code. I would like to program my future "low-level" projects in C rather than C++, but have been frightened by the thought of rewriting common data structures for each specific type. What experience do people have with this issue? Are there good libraries of generic data structures and algorithms in C that do not go with Option 1 (i.e. casting to and from void pointers, which sacrifices type safety and adds a level of indirection)?

Read the article
Optimize Duplicate Detection

- by Dave Jarvis

Background This is an optimization problem. Oracle Forms XML files have elements such as: <Trigger TriggerName="name" TriggerText="SELECT * FROM DUAL" ... /> Where the TriggerText is arbitrary SQL code. Each SQL statement has been extracted into uniquely named files such as: sql/module=DIAL_ACCESS+trigger=KEY-LISTVAL+filename=d_access.fmb.sql sql/module=REP_PAT_SEEN+trigger=KEY-LISTVAL+filename=rep_pat_seen.fmb.sql I wrote a script to generate a list of exact duplicates using a brute force approach. Problem There are 37,497 files to compare against each other; it takes 8 minutes to compare one file against all the others. Logically, if A = B and A = C, then there is no need to check if B = C. So the problem is: how do you eliminate the redundant comparisons? The script will complete in approximately 208 days. Script Source Code The comparison script is as follows: #!/bin/bash echo Loading directory ... for i in $(find sql/ -type f -name \*.sql); do echo Comparing $i ... for j in $(find sql/ -type f -name \*.sql); do if [ "$i" = "$j" ]; then continue; fi # Case insensitive compare, ignore spaces diff -IEbwBaq $i $j > /dev/null # 0 = no difference (i.e., duplicate code) if [ $? = 0 ]; then echo $i :: $j >> clones.txt fi done done Question How would you optimize the script so that checking for cloned code is a few orders of magnitude faster? System Constraints Using a quad-core CPU with an SSD; trying to avoid using cloud services if possible. The system is a Windows-based machine with Cygwin installed -- algorithms or solutions in other languages are welcome. Thank you!

Read the article
Java: multi-threaded maps: how do the implementations compare?

- by user346629

I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice. The options I know are: HashMap. Disastrously un-thread safe. ConcurrentHashMap. My first choice, but this has a hefty memory footprint - about 2k per instance. Collections.sychronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives. Trove or Colt - I think neither of these are thread-safe, but perhaps the code could be adapted to be thread safe. Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of? Thanks in advance for your input!

Read the article
Graph search problem with route restrictions

- by Darcara

I want to calculate the most profitable route and I think this is a type of traveling salesman problem. I have a set of nodes that I can visit and a function to calculate cost for traveling between nodes and points for reaching the nodes. The goal is to reach a fixed known score while minimizing the cost. This cost and rewards are not fixed and depend on the nodes visited before. The starting node is fixed. There are some restrictions on how nodes can be visited. Some simplified examples include: Node B can only be visited after A After node C has been visited, D or E can be visited. Visiting at least one is required, visiting both is permissible. Z can only be visited after at least 5 other nodes have been visited Once 50 nodes have been visited, the nodes A-M will no longer reward points Certain nodes can (and probably must) be visited multiple times Currently I can think of only two ways to solve this: a) Genetic Algorithms, with the fitness function calculating the cost/benefit of the generated route b) Dijkstra search through the graph, since the starting node is fixed, although the large number of nodes will probably make that not feasible memory wise. Are there any other ways to determine the best route through the graph? It doesn't need to be perfect, an approximated path is perfectly fine, as long as it's error acceptable. Would TSP-solvers be an option here?

Read the article
Python, web log data mining for frequent patterns

- by descent

Hello! I need to develop a tool for web log data mining. Having many sequences of urls, requested in a particular user session (retrieved from web-application logs), I need to figure out the patterns of usage and groups (clusters) of users of the website. I am new to Data Mining, and now examining Google a lot. Found some useful info, i.e. querying Frequent Pattern Mining in Web Log Data seems to point to almost exactly similar studies. So my questions are: Are there any python-based tools that do what I need or at least smth similar? Can Orange toolkit be of any help? Can reading the book Programming Collective Intelligence be of any help? What to Google for, what to read, which relatively simple algorithms to use best? I am very limited in time (to around a week), so any help would be extremely precious. What I need is to point me into the right direction and the advice of how to accomplish the task in the shortest time. Thanks in advance!

Read the article
Financial Market Developer dilemma...

- by Sahat

...In the future I am planning to work in the financial sector as a programmer. I have a couple of options right now (1 or 2): Learn and master .NET since presumably that's widely used in that industry OR Learn the programming concepts, learn algorithms, learn a little bit of c,c++,c#,java,objective-c,sql,oracle,cobol - in other words learn the fundamental principles that tie all programming languages together without going too deep in any particular language. Someone has told me that most of the time as a programmer you won't be writing any code, but instead maintaing and existing code that people before you have built. Does that mean I don't really need to master any specific language and as long as I have general concepts it'll be good enough? If you or if you know someone who has worked in the financial industry as a software developer could you please share the experience and what is the daily routine consists of? Also what should I be learning right now while I am still young and in college? Do I have to thoroughly understand the market and the current economy? What about Oracle or SQL Databases - do I need to know them inside out as a programmer? Thanks if you have anything else to add that I have not mentioned then please do so! Thanks in advance!

Read the article
How can I create an array of random numbers in C++

- by Nick

Instead of The ELEMENTS being 25 is there a way to randomly generate a large array of elements....10000, 100000, or even 1000000 elements and then use my insertion sort algorithms. I am trying to have a large array of elements and use insertion sort to put them in order and then also in reverse order. Next I used clock() in the time.h file to figure out the run time of each algorithm. I am trying to test with a large amount of numbers. #define ELEMENTS 25 void insertion_sort(int x[],int length); void insertion_sort_reverse(int x[],int length); int main() { clock_t tStart = clock(); int B[ELEMENTS]={4,2,5,6,1,3,17,14,67,45,32,66,88, 78,69,92,93,21,25,23,71,61,59,60,30}; int x; cout<<"Not Sorted: "<<endl; for(x=0;x<ELEMENTS;x++) cout<<B[x]<<endl; insertion_sort(B,ELEMENTS); cout <<"Sorted Normal: "<<endl; for(x=0;x<ELEMENTS;x++) cout<< B[x] <<endl; insertion_sort_reverse(B,ELEMENTS); cout <<"Sorted Reverse: "<<endl; for(x=0;x<ELEMENTS;x++) cout<< B[x] <<endl; double seconds = clock() / double(CLK_TCK); cout << "This program has been running for " << seconds << " seconds." << endl; system("pause"); return 0; }

Read the article
Strange Puzzle - Invalid memory access of location

- by Rob Graeber

The error message I'm getting consistently is: Invalid memory access of location 0x8 rip=0x10cf4ab28 What I'm doing is making a basic stock backtesting system, that is iterating huge arrays of stocks/historical data across various algorithms, using java + eclipse on the latest Mac Os X. I tracked down the code that seems to be causing it. A method that is used to get the massive arrays of data and is called thousands of times. Nothing is retained so I don't think there is a memory leak. However there seems to be a set limit of around 7000 times I can iterate over it before I get the memory error. The weird thing is that it works perfectly in debug mode. Does anyone know what debug mode does differently in Eclipse? Giving the jvm more memory doesn't help, and it appears to work fine using -xint. And again it works perfectly in debug mode. public static List<Stock> getStockArray(ExchangeType e){ List<Stock> stockArray = new ArrayList<Stock>(); if(e == ExchangeType.ALL){ stockArray.addAll(getStockArray(ExchangeType.NYSE)); stockArray.addAll(getStockArray(ExchangeType.NASDAQ)); }else if(e == ExchangeType.ETF){ stockArray.addAll(etfStockArray); }else if(e == ExchangeType.NYSE){ stockArray.addAll(nyseStockArray); }else if(e == ExchangeType.NASDAQ){ stockArray.addAll(nasdaqStockArray); } return stockArray; } A simple loop like this, iterated over 1000s of times, will cause the memory error. But not in debug mode. for (Stock stock : StockDatabase.getStockArray(ExchangeType.ETF)) { System.out.println(stock.symbol); }

Read the article
Question about permute-by-sorting

- by davit-datuashvili

In the book "Introduction to Algorithms", second edition, there is the following problem: Suppose we have some array: int a[] = {1,2,3,4} and some random priorities array: P = {36,3,97,19} and the goal is to permute the array a randomly using this priorities array. This is the pseudo code: PERMUTE-BY-SORTING (A) 1 n ? length[A] 2 for i ? 1 to n 3 do P[i] = RANDOM (1, n 3) 4 sort A, using P as sort keys 5 return A The result should be the permuted array: B={2, 4, 1, 3}; I have written this code: import java.util.*; public class Permute { public static void main (String[] args) { Random r = new Random(); int a[] = new int[] {1,2,3,4}; int n = a.length; int b[] = new int[a.length]; int p[] = new int[a.length]; for (int i=0; i<p.length; i++) { p[i] = r.nextInt(n*n*n) + 1; } // for (int i=0;i<p.length;i++){ // System.out.println(p[i]); //} } } How do I continue?

Read the article
How can I be a Guru? Is it possible? [closed]

- by Kev

Before 1999, I heard about computer. But, I don't know what it look like. TV? Maybe! Before 2001, I only saw it in book. It looks like a TV. Before 2005, I touched it in reality. It still looks like a TV + Black Box. In 2005, I entered university. I had a cource about Mathematica.I loved programming since then. In 2006, I owned a computer. I was coding C every day. if...else..., for..., while..., switch... entered my life. Since 2007, I have learned Data Structures, Algorithms...Then, C#, Java, Python, HTML/CSS/JavaScript, F#... A lot of languages. I'm still learning new lang. Unfortunately, I only know syntax! I'm always a novice(??)! I know some guru start programming at age of 8 or 12. I admire these gurus who are compiler writers, language designers, architecture designers, Linux hackers... Is it possible to become a guru like me. If possible, how? what should I do now? Any advice? Thank you very much.

Read the article
What db fits me?

- by afvasd

Dear Everyone I am currently using mysql. I am finding that my schema is getting incredibly complicated. I seek to find a new db that will suit my needs: Let's assume I am building a news aggregrator (which collects news from multiple website). I then run algorithms to determine if two news from different sites are actually referring to the same topic. I run this algorithm to cluster news together. The relationship is depicted below: cluster \--news1 \--word1 \--word2 \--news2 \--word3 \--news3 \--word1 \--word3 And then I will apply some magic and determine the importance of each word. Summing all the importance of each word gives me the importance of a news article. Summing the importance of each news article gives me the importance of a cluster. Note that above cluster there are also subgroups( like split by region etc), and categories (like sports, etc) which I have to determine the importance of that in a particular day per se. I have used views in the past to do so, but I realized that views are very slow. So i will normally do an insert into an actual table and index them for better performance. As you can see this leads to multiple tables derived like (cluster, importance), (news, importance), (words, importance) etc which can get pretty messy. Also the "importance" metric will change. It has become increasingly difficult to alter tables, update data (which I am using TRUNCATE TABLE) and then inserting from null. I am currently looking into something schemaless like Mongodb. I do not need distributedness. I would very much want something that is reasonably fast (which can be indexed) and something that is a lot more flexible that traditional RDMBS. Also, I need something that has some kind of ORM because I personally like ORM a lot. I am currently using sqlalchemy Please help!

Read the article
JAVA - How to code Node neighbours in a Grid ?

- by ke3pup

Hi guys I'm new to programming and as a School task i need to implement BFS,DFS and A* search algorithms in java to search for a given Goal from a given start position in a Grid of given size, 4x4,8x8..etc to begin with i don't know how to code the neighbors of all the nodes. For example tile 1 in grid as 2 and 9 as neighbors and Tile 12 has ,141,13,20 as its neighbours but i'm struggling to code that. I need the neighbours part so that i can move from start position to other parts of gird legally by moving horizontally or vertically through the neighbours. my node class is: class node { int value; LinkedList neighbors; bool expanded; } let's say i'm given a 8x8 grid right, So if i start the program with a grid of size 8x8 right : 1 - my main will func will create an arrayList of nodes for example node ArrayList test = new ArrayList(); and then using a for loop assign value to all the nodes in arrayList from 1 to 64 (if the grid size was 8x8). BUT somehow i need t on coding that, if anyone can give me some details i would really appreciate it.

Read the article
question about permut-by-sorting

- by davit-datuashvili

hi i have following question from book introduction in algorithms second edition there is such problem suppose we have some array A int a[]={1,2,3,4} and we have some random priorities array P={36,3,97,19} we shoud permut array a randomly using this priorities array here is pseudo code P ERMUTE -B Y-S ORTING ( A) 1 n ? length[A] 2 for i ? 1 to n do P[i] = R ANDOM(1, n 3 ) 3 4 sort A, using P as sort keys 5 return A and result will be permuted array B={2, 4, 1, 3}; please help any ideas i have done this code and need aideas how continue import java.util.*; public class Permut { public static void main(String[]args){ Random r=new Random(); int a[]=new int[]{1,2,3,4}; int n=a.length; int b[]=new int[a.length]; int p[]=new int[a.length]; for (int i=0;i<p.length;i++){ p[i]=r.nextInt(n*n*n)+1; } // for (int i=0;i<p.length;i++){ // System.out.println(p[i]); //} } } please help

Read the article
[Wireless LAN]hostapd is giving error whwn running in target board

- by Renjith G

hi, I got the following error when i tried to run the hostapd command in my target board. Any idea about this? /etc # hostapd -dd hostapd.conf Configuration file: hostapd.conf madwifi_set_iface_flags: dev_up=0 madwifi_set_privacy: enabled=0 BSS count 1, BSSID mask ff:ff:ff:ff:ff:ff (0 bits) Flushing old station entries madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=3 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 3) Could not connect to kernel driver. Deauthenticate all stations madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=2 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 2) madwifi_set_privacy: enabled=0 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=0 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=1 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=2 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=3 Using interface ath0 with hwaddr 00:0b:6b:33:8c:30 and ssid '"RG_WLAN Testing Renjith G"' SSID - hexdump_ascii(len=27): 22 52 47 5f 57 4c 41 4e 20 54 65 73 74 69 6e 67 "RG_WLAN Testing 20 52 65 6e 6a 69 74 68 20 47 22 Renjith G" PSK (ASCII passphrase) - hexdump_ascii(len=12): 6d 79 70 61 73 73 70 68 72 61 73 65 mypassphrase PSK (from passphrase) - hexdump(len=32): 70 6f a6 92 da 9c a8 3b ff 36 85 76 f3 11 9c 5e 5d 4a 4b 79 f4 4e 18 f6 b1 b8 09 af 6c 9c 6c 21 madwifi_set_ieee8021x: enabled=1 madwifi_configure_wpa: group key cipher=1 madwifi_configure_wpa: pairwise key ciphers=0xa madwifi_configure_wpa: key management algorithms=0x2 madwifi_configure_wpa: rsn capabilities=0x0 madwifi_configure_wpa: enable WPA=0x1 WPA: group state machine entering state GTK_INIT (VLAN-ID 0) GMK - hexdump(len=32): [REMOVED] GTK - hexdump(len=32): [REMOVED] WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0) madwifi_set_key: alg=TKIP addr=00:00:00:00:00:00 key_idx=1 madwifi_set_privacy: enabled=1 madwifi_set_iface_flags: dev_up=1 ath0: Setup of interface done. l2_packet_receive - recvfrom: Network is down Wireless event: cmd=0x8b1a len=40 Register Fail Register Fail WPA: group state machine entering state SETKEYS (VLAN-ID 0) GMK - hexdump(len=32): [REMOVED] GTK - hexdump(len=32): [REMOVED] wpa_group_setkeys: GKeyDoneStations=0 WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0) madwifi_set_key: alg=TKIP addr=00:00:00:00:00:00 key_idx=2 Signal 2 received - terminating Flushing old station entries madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=3 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 3) Could not connect to kernel driver. Deauthenticate all stations madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=2 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 2) madwifi_set_privacy: enabled=0 madwifi_set_ieee8021x: enabled=0 madwifi_set_iface_flags: dev_up=0

Read the article
TPROXY Not working with HAProxy, Ubuntu 14.04

- by Nyxynyx

I'm trying to use HAProxy as a fully transparent proxy using TPROXY in Ubuntu 14.04. HAProxy will be setup on the first server with eth1 111.111.250.250 and eth0 10.111.128.134. The single balanced server has eth1 and eth0 as well. eth1 is the public facing network interface while eth0 is for the private network which both servers are in. Problem: I'm able to connect to the balanced server's port 1234 directly (via eth1) but am not able to reach the balanced server via Haproxy port 1234 (which redirects to 1234 via eth0). Am I missing out something in this configuration? On the HAProxy server The current kernel is: Linux extremehash-lb2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux The kernel appears to have TPROXY support: # grep TPROXY /boot/config-3.13.0-24-generic CONFIG_NETFILTER_XT_TARGET_TPROXY=m HAProxy was compiled with TPROXY support: haproxy -vv HA-Proxy version 1.5.3 2014/07/25 Copyright 2000-2014 Willy Tarreau <[email protected]> Build options : TARGET = linux26 CPU = x86_64 CC = gcc CFLAGS = -g -fno-strict-aliasing OPTIONS = USE_LINUX_TPROXY=1 USE_LIBCRYPT=1 USE_STATIC_PCRE=1 Default settings : maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200 Encrypted password support via crypt(3): yes Built without zlib support (USE_ZLIB not set) Compression algorithms supported : identity Built without OpenSSL support (USE_OPENSSL not set) Built with PCRE version : 8.31 2012-07-06 PCRE library supports JIT : no (USE_PCRE_JIT not set) Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND Available polling systems : epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 3 (3 usable), will use epoll. In /etc/haproxy/haproxy.cfg, I've configured a port to have the following options: listen test1235 :1234 mode tcp option tcplog balance leastconn source 0.0.0.0 usesrc clientip server balanced1 10.111.163.76:1234 check inter 5s rise 2 fall 4 weight 4 On the balanced server In /etc/networking/interfaces I've set the gateway for eth0 to be the HAProxy box 10.111.128.134 and restarted networking. auto eth0 eth1 iface eth0 inet static address 111.111.250.250 netmask 255.255.224.0 gateway 111.131.224.1 dns-nameservers 8.8.4.4 8.8.8.8 209.244.0.3 iface eth1 inet static address 10.111.163.76 netmask 255.255.0.0 gateway 10.111.128.134 ip route gives: default via 111.111.224.1 dev eth0 10.111.0.0/16 dev eth1 proto kernel scope link src 10.111.163.76 111.111.224.0/19 dev eth0 proto kernel scope link src 111.111.250.250

Read the article
Is there a way to replicate a very large file shares in real-time?

- by fsckin

I have an hourly cron job that copies about 40GB of data from a source folder into a new folder with the hour appended on the end. When it's done, the job prunes anything older than 24 hours. This data changes very often during work hours and is on a samba file share. Here's how the folder structure looks: \server\Version.1 \server\Version.2 \server\Version.3 ... \server\Version.24 The contents of each new folder compared to the last one usually doesn't change very much, since this is a hourly job. Now you might be thinking that I'm an idiot for setting dreaming this up. Truth is, I just found out. It's actually been used for years and is so incredibly simple, anyone could delete the ENTIRE 40GB share (imagine that dialog spooling up... deleting thousands and thousands of files) and it would actually be faster to restore by moving the latest copy back to the source than it took to delete. Brilliant! Now to top this off, I need to efficiently replicate this 960GB of "mostly similar" data to a remote server over WAN link, with the replication happening as close to real-time as possible -- think hot spare, disaster recovery, etc. My first thought was rsync. Total failure. Rsync sees it sees a deletion of the folder that is 24 hours old and the addition of a new folder with 30GB of data to sync! I also looked at rdiff-backup and unison, they both appear to use similar algorithms and do not keep enough meta-data to do this intelligently. Best thing that I can find "out of the box" to do this is Windows Server "Distributed Filesystem Replication" which uses "Remote Differential Compression" -- After reading the background information on how this works, it actually looks like exactly what I need. Problem: Both servers are running Linux. D'oh! One approach to this I'm looking at is this, say it's 5AM and the cron job finishes: New Version.5 folder arrives at on local server SSH to remote server and copy Version.4 to Version.5 Run rsync on the local server pushing changes to the remote server. Rsync finally knows to do a differential copy between Version.4 and Version.5 Is there a smarter way to replicate Samba shares as close to real-time as possible? Anything out there that does "Remote Differential Compression" on Linux?

Read the article
WebLogic Server Performance and Tuning: Part I - Tuning JVM

- by Gokhan Gungor

Each WebLogic Server instance runs in its own dedicated Java Virtual Machine (JVM) which is their runtime environment. Every Admin Server in any domain executes within a JVM. The same also applies for Managed Servers. WebLogic Server can be used for a wide variety of applications and services which uses the same runtime environment and resources. Oracle WebLogic ships with 2 different JVM, HotSpot and JRocket but you can choose which JVM you want to use. JVM is designed to optimize itself however it also provides some startup options to make small changes. There are default values for its memory and garbage collection. In real world, you will not want to stick with the default values provided by the JVM rather want to customize these values based on your applications which can produce large gains in performance by making small changes with the JVM parameters. We can tell the garbage collector how to delete garbage and we can also tell JVM how much space to allocate for each generation (of java Objects) or for heap. Remember during the garbage collection no other process is executed within the JVM or runtime, which is called STOP THE WORLD which can affect the overall throughput. Each JVM has its own memory segment called Heap Memory which is the storage for java Objects. These objects can be grouped based on their age like young generation (recently created objects) or old generation (surviving objects that have lived to some extent), etc. A java object is considered garbage when it can no longer be reached from anywhere in the running program. Each generation has its own memory segment within the heap. When this segment gets full, garbage collector deletes all the objects that are marked as garbage to create space. When the old generation space gets full, the JVM performs a major collection to remove the unused objects and reclaim their space. A major garbage collect takes a significant amount of time and can affect system performance. When we create a managed server either on the same machine or on remote machine it gets its initial startup parameters from $DOMAIN_HOME/bin/setDomainEnv.sh/cmd file. By default two parameters are set: Xms: The initial heapsize Xmx: The max heapsize Try to set equal initial and max heapsize. The startup time can be a little longer but for long running applications it will provide a better performance. When we set -Xms512m -Xmx1024m, the physical heap size will be 512m. This means that there are pages of memory (in the state of the 512m) that the JVM does not explicitly control. It will be controlled by OS which could be reserve for the other tasks. In this case, it is an advantage if the JVM claims the entire memory at once and try not to spend time to extend when more memory is needed. Also you can use -XX:MaxPermSize (Maximum size of the permanent generation) option for Sun JVM. You should adjust the size accordingly if your application dynamically load and unload a lot of classes in order to optimize the performance. You can set the JVM options/heap size from the following places: Through the Admin console, in the Server start tab In the startManagedWeblogic script for the managed servers $DOMAIN_HOME/bin/startManagedWebLogic.sh/cmd JAVA_OPTIONS="-Xms1024m -Xmx1024m" ${JAVA_OPTIONS} In the setDomainEnv script for the managed servers and admin server (domain wide) USER_MEM_ARGS="-Xms1024m -Xmx1024m" When there is free memory available in the heap but it is too fragmented and not contiguously located to store the object or when there is actually insufficient memory we can get java.lang.OutOfMemoryError. We should create Thread Dump and analyze if that is possible in case of such error. The second option we can use to produce higher throughput is to garbage collection. We can roughly divide GC algorithms into 2 categories: parallel and concurrent. Parallel GC stops the execution of all the application and performs the full GC, this generally provides better throughput but also high latency using all the CPU resources during GC. Concurrent GC on the other hand, produces low latency but also low throughput since it performs GC while application executes. The JRockit JVM provides some useful command-line parameters that to control of its GC scheme like -XgcPrio command-line parameter which takes the following options; XgcPrio:pausetime (To minimize latency, parallel GC) XgcPrio:throughput (To minimize throughput, concurrent GC ) XgcPrio:deterministic (To guarantee maximum pause time, for real time systems) Sun JVM has similar parameters (like -XX:UseParallelGC or -XX:+UseConcMarkSweepGC) to control its GC scheme. We can add -verbosegc -XX:+PrintGCDetails to monitor indications of a problem with garbage collection. Try configuring JVM’s of all managed servers to execute in -server mode to ensure that it is optimized for a server-side production environment.

Read the article
Form, function and complexity in rule processing

- by Charles Young

Tim Bass posted on ‘Orwellian Event Processing’. I was involved in a heated exchange in the comments, and he has more recently published a post entitled ‘Disadvantages of Rule-Based Systems (Part 1)’. Whatever the rights and wrongs of our exchange, it clearly failed to generate any agreement or understanding of our different positions. I don't particularly want to promote further argument of that kind, but I do want to take the opportunity of offering a different perspective on rule-processing and an explanation of my comments. For me, the ‘red rag’ lay in Tim’s claim that “...rules alone are highly inefficient for most classes of (not simple) problems” and a later paragraph that appears to equate the simplicity of form (‘IF-THEN-ELSE’) with simplicity of function. It is not the first time Tim has expressed these views and not the first time I have responded to his assertions. Indeed, Tim has a long history of commenting on the subject of complex event processing (CEP) and, less often, rule processing in ‘robust’ terms, often asserting that very many other people’s opinions on this subject are mistaken. In turn, I am of the opinion that, certainly in terms of rule processing, which is an area in which I have a specific interest and knowledge, he is often mistaken. There is no simple answer to the fundamental question ‘what is a rule?’ We use the word in a very fluid fashion in English. Likewise, the term ‘rule processing’, as used widely in IT, is equally difficult to define simplistically. The best way to envisage the term is as a ‘centre of gravity’ within a wider domain. That domain contains many other ‘centres of gravity’, including CEP, statistical analytics, neural networks, natural language processing and so much more. Whole communities tend to gravitate towards and build themselves around some of these centres. The term 'rule processing' is associated with many different technology types, various software products, different architectural patterns, the functional capability of many applications and services, etc. There is considerable variation amongst these different technologies, techniques and products. Very broadly, a common theme is their ability to manage certain types of processing and problem solving through declarative, or semi-declarative, statements of propositional logic bound to action-based consequences. It is generally important to be able to decouple these statements from other parts of an overall system or architecture so that they can be managed and deployed independently. As a centre of gravity, ‘rule processing’ is no island. It exists in the context of a domain of discourse that is, itself, highly interconnected and continuous. Rule processing does not, for example, exist in splendid isolation to natural language processing. On the contrary, an on-going theme of rule processing is to find better ways to express rules in natural language and map these to executable forms. Rule processing does not exist in splendid isolation to CEP. On the contrary, an event processing agent can reasonably be considered as a rule engine (a theme in ‘Power of Events’ by David Luckham). Rule processing does not live in splendid isolation to statistical approaches such as Bayesian analytics. On the contrary, rule processing and statistical analytics are highly synergistic. Rule processing does not even live in splendid isolation to neural networks. For example, significant research has centred on finding ways to translate trained nets into explicit rule sets in order to support forms of validation and facilitate insight into the knowledge stored in those nets. What about simplicity of form? Many rule processing technologies do indeed use a very simple form (‘If...Then’, ‘When...Do’, etc.) However, it is a fundamental mistake to equate simplicity of form with simplicity of function. It is absolutely mistaken to suggest that simplicity of form is a barrier to the efficient handling of complexity. There are countless real-world examples which serve to disprove that notion. Indeed, simplicity of form is often the key to handling complexity. Does rule processing offer a ‘one size fits all’. No, of course not. No serious commentator suggests it does. Does the design and management of large knowledge bases, expressed as rules, become difficult? Yes, it can do, but that is true of any large knowledge base, regardless of the form in which knowledge is expressed. The measure of complexity is not a function of rule set size or rule form. It tends to be correlated more strongly with the size of the ‘problem space’ (‘search space’) which is something quite different. Analysis of the problem space and the algorithms we use to search through that space are, of course, the very things we use to derive objective measures of the complexity of a given problem. This is basic computer science and common practice. Sailing a Dreadnaught through the sea of information technology and lobbing shells at some of the islands we encounter along the way does no one any good. Building bridges and causeways between islands so that the inhabitants can collaborate in open discourse offers hope of real progress.

Read the article
Why lock-free data structures just aren't lock-free enough

- by Alex.Davies

Today's post will explore why the current ways to communicate between threads don't scale, and show you a possible way to build scalable parallel programming on top of shared memory. The problem with shared memory Soon, we will have dozens, hundreds and then millions of cores in our computers. It's inevitable, because individual cores just can't get much faster. At some point, that's going to mean that we have to rethink our architecture entirely, as millions of cores can't all access a shared memory space efficiently. But millions of cores are still a long way off, and in the meantime we'll see machines with dozens of cores, struggling with shared memory. Alex's tip: The best way for an application to make use of that increasing parallel power is to use a concurrency model like actors, that deals with synchronisation issues for you. Then, the maintainer of the actors framework can find the most efficient way to coordinate access to shared memory to allow your actors to pass messages to each other efficiently. At the moment, NAct uses the .NET thread pool and a few locks to marshal messages. It works well on dual and quad core machines, but it won't scale to more cores. Every time we use a lock, our core performs an atomic memory operation (eg. CAS) on a cell of memory representing the lock, so it's sure that no other core can possibly have that lock. This is very fast when the lock isn't contended, but we need to notify all the other cores, in case they held the cell of memory in a cache. As the number of cores increases, the total cost of a lock increases linearly. A lot of work has been done on "lock-free" data structures, which avoid locks by using atomic memory operations directly. These give fairly dramatic performance improvements, particularly on systems with a few (2 to 4) cores. The .NET 4 concurrent collections in System.Collections.Concurrent are mostly lock-free. However, lock-free data structures still don't scale indefinitely, because any use of an atomic memory operation still involves every core in the system. A sync-free data structure Some concurrent data structures are possible to write in a completely synchronization-free way, without using any atomic memory operations. One useful example is a single producer, single consumer (SPSC) queue. It's easy to write a sync-free fixed size SPSC queue using a circular buffer*. Slightly trickier is a queue that grows as needed. You can use a linked list to represent the queue, but if you leave the nodes to be garbage collected once you're done with them, the GC will need to involve all the cores in collecting the finished nodes. Instead, I've implemented a proof of concept inspired by this intel article which reuses the nodes by putting them in a second queue to send back to the producer. * In all these cases, you need to use memory barriers correctly, but these are local to a core, so don't have the same scalability problems as atomic memory operations. Performance tests I tried benchmarking my SPSC queue against the .NET ConcurrentQueue, and against a standard Queue protected by locks. In some ways, this isn't a fair comparison, because both of these support multiple producers and multiple consumers, but I'll come to that later. I started on my dual-core laptop, running a simple test that had one thread producing 64 bit integers, and another consuming them, to measure the pure overhead of the queue. So, nothing very interesting here. Both concurrent collections perform better than the lock-based one as expected, but there's not a lot to choose between the ConcurrentQueue and my SPSC queue. I was a little disappointed, but then, the .NET Framework team spent a lot longer optimising it than I did. So I dug out a more powerful machine that Red Gate's DBA tools team had been using for testing. It is a 6 core Intel i7 machine with hyperthreading, adding up to 12 logical cores. Now the results get more interesting. As I increased the number of producer-consumer pairs to 6 (to saturate all 12 logical cores), the locking approach was slow, and got even slower, as you'd expect. What I didn't expect to be so clear was the drop-off in performance of the lock-free ConcurrentQueue. I could see the machine only using about 20% of available CPU cycles when it should have been saturated. My interpretation is that as all the cores used atomic memory operations to safely access the queue, they ended up spending most of the time notifying each other about cache lines that need invalidating. The sync-free approach scaled perfectly, despite still working via shared memory, which after all, should still be a bottleneck. I can't quite believe that the results are so clear, so if you can think of any other effects that might cause them, please comment! Obviously, this benchmark isn't realistic because we're only measuring the overhead of the queue. Any real workload, even on a machine with 12 cores, would dwarf the overhead, and there'd be no point worrying about this effect. But would that be true on a machine with 100 cores? Still to be solved. The trouble is, you can't build many concurrent algorithms using only an SPSC queue to communicate. In particular, I can't see a way to build something as general purpose as actors on top of just SPSC queues. Fundamentally, an actor needs to be able to receive messages from multiple other actors, which seems to need an MPSC queue. I've been thinking about ways to build a sync-free MPSC queue out of multiple SPSC queues and some kind of sign-up mechanism. Hopefully I'll have something to tell you about soon, but leave a comment if you have any ideas.

Read the article

Search Results

Search found 1848 results on 74 pages for 'algorithms'.

Page 68/74 | < Previous Page | 64 65 66 67 68 69 70 71 72 73 74 | Next Page >

- by Shaharyar

- by Jakob

- by Richeve S. Bebedor

- by Kyle Cronin

- by seleniumnewbie

- by Will

- by Bradford Larsen

- by Dave Jarvis

- by user346629

- by Darcara

- by descent

- by Sahat

- by Nick

- by Rob Graeber

- by davit-datuashvili

- by Kev

- by afvasd

- by ke3pup

- by davit-datuashvili

- by Renjith G

- by Nyxynyx

- by fsckin

- by Gokhan Gungor

- by Charles Young

- by Alex.Davies

< Previous Page | 64 65 66 67 68 69 70 71 72 73 74 | Next Page >