Search Results

Search found 4304 results on 173 pages for 'bytes'.

Page 113/173 | < Previous Page | 109 110 111 112 113 114 115 116 117 118 119 120  | Next Page >

  • "destination host unreachable" while ping attempt between 2 pc on 1 network

    - by Roberto Sadfasdf
    I have 2 computers in my network ones ip address is 192.168.2.31(PC-A) and the others is 192.168.2.33(PC-B). When I'm trying to ping from PC-B to PC-A it says: Pinging 192.168.2.33 with 32 bytes of data: Reply from 192.168.2.31: Destination host unreachable. Reply from 192.168.2.31: Destination host unreachable. Reply from 192.168.2.31: Destination host unreachable. Reply from 192.168.2.31: Destination host unreachable. Ping statistics for 192.168.2.33: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss) And when i'm trying the otherwise it tells me same message with ip address switched..They are both can connect to the Internet

    Read the article

  • Random TCP Resets

    - by allenwei
    We got randomly TCP "reset" error when we send request to remote server. Log from remote server Cisco TCP Connection Terminated,Nov 05 14:43:39 EST: %ASA-session-6-302014: Teardown TCP connection 640068283 for Outside:xxxx to xxxx duration 0:00:00 bytes 4160 TCP Reset-O One my local machine I saw when I use netstat 100703 connections reset due to unexpected data 324186 connections reset due to early user close I also use tcpdump to see what's wrong with it, I saw xxxx.https: Flags [R.], seq 290, ack 1369, win 136, options [nop,nop,TS val 2871790533 ecr 1897173283], length 0 The problem just happened today, we didn't change anything on our server. Anyone know what's wrong with it? Is it related to code we wrote send out request or related to linux configuration?

    Read the article

  • error in assigning a const character to a usigned char array in C++

    - by mekasperasky
    #include <iostream> #include <fstream> #include <cstring> using namespace std; typedef unsigned long int WORD; /* Should be 32-bit = 4 bytes */ #define w 32 /* word size in bits */ #define r 12 /* number of rounds */ #define b 16 /* number of bytes in key */ #define c 4 /* number words in key */ /* c = max(1,ceil(8*b/w)) */ #define t 26 /* size of table S = 2*(r+1) words */ WORD S [t],L[c]; /* expanded key table */ WORD P = 0xb7e15163, Q = 0x9e3779b9; /* magic constants */ /* Rotation operators. x must be unsigned, to get logical right shift*/ #define ROTL(x,y) (((x)<<(y&(w-1))) | ((x)>>(w-(y&(w-1))))) #define ROTR(x,y) (((x)>>(y&(w-1))) | ((x)<<(w-(y&(w-1))))) void RC5_DECRYPT(WORD *ct, WORD *pt) /* 2 WORD input ct/output pt */ { WORD i, B=ct[1], A=ct[0]; for (i=r; i>0; i--) { B = ROTR(B-S [2*i+1],A)^A; A = ROTR(A-S [2*i],B)^B; } pt [1] = B-S [1] ;pt [0] = A-S [0]; } void RC5_SETUP(unsigned char *K) /* secret input key K 0...b-1] */ { WORD i, j, k, u=w/8, A, B, L [c]; /* Initialize L, then S, then mix key into S */ for (i=b-1,L[c-1]=0; i!=-1; i--) L[i/u] = (L[i/u]<<8)+K[ i]; for (S [0]=P,i=1; i<t; i++) S [i] = S [i-1]+Q; for (A=B=i=j=k=0; k<3*t; k++,i=(i+1)%t,j=(j+1)%c) /* 3*t > 3*c */ { A = S[i] = ROTL(S [i]+(A+B),3); B = L[j] = ROTL(L[j]+(A+B),(A+B)); } } void printword(WORD A) { WORD k; for (k=0 ;k<w; k+=8) printf("%02.2lX",(A>>k)&0xFF); } int main() { WORD i, j, k, pt [2], pt2 [2], ct [2] = {0,0}; unsigned char key[b]; ofstream out("cpt.txt"); ifstream in("key.txt"); if(!in) { cout << "Cannot open file.\n"; return 1; } if(!out) { cout << "Cannot open file.\n"; return 1; } key="111111000001111"; RC5_SETUP(key); ct[0]=2185970173; ct[1]=3384368406; for (i=1;i<2;i++) { RC5_DECRYPT(ct,pt2); printf("\n plaintext "); printword(pt [0]); printword(pt[1]); } return 0; } when i run this code i get two warnings and also an error saying that i cant assign a char value to my character array . Why is that ?

    Read the article

  • Windows/Samba connection error

    - by Gomibushi
    I have a Linux fileserver serving up /home for linux and windows users. I was able to connect from my windows client, but not from a DC. Then suddenly I could connect from the DC too. The linux servers run Centrify clients, and as such are part of the domain. All on same subnet. This is what the the log.smbd says, repeatedly: [2010/02/11 11:25:57, 0] lib/util_sock.c:read_data(534) read_data: read failure for 4 bytes to client 192.168.200.3. Error = Connection reset by peer On Windows it appeared as an "unknown error". EDIT: the error code is "0x80004005". We are developing a system depended on the samba share, and are worried this will appear again. It would be nice to pin point the root of this. Any ideas what this might be? Places to look?

    Read the article

  • Override `drop` for a custom sequence

    - by Bruno Reis
    In short: in Clojure, is there a way to redefine a function from the standard sequence API (which is not defined on any interface like ISeq, IndexedSeq, etc) on a custom sequence type I wrote? 1. Huge data files I have big files in the following format: A long (8 bytes) containing the number n of entries n entries, each one being composed of 3 longs (ie, 24 bytes) 2. Custom sequence I want to have a sequence on these entries. Since I cannot usually hold all the data in memory at once, and I want fast sequential access on it, I wrote a class similar to the following: (deftype DataSeq [id ^long cnt ^long i cached-seq] clojure.lang.IndexedSeq (index [_] i) (count [_] (- cnt i)) (seq [this] this) (first [_] (first cached-seq)) (more [this] (if-let [s (next this)] s '())) (next [_] (if (not= (inc i) cnt) (if (next cached-seq) (DataSeq. id cnt (inc i) (next cached-seq)) (DataSeq. id cnt (inc i) (with-open [f (open-data-file id)] ; open a memory mapped byte array on the file ; seek to the exact position to begin reading ; decide on an optimal amount of data to read ; eagerly read and return that amount of data )))))) The main idea is to read ahead a bunch of entries in a list and then consume from that list. Whenever the cache is completely consumed, if there are remaining entries, they are read from the file in a new cache list. Simple as that. To create an instance of such a sequence, I use a very simple function like: (defn ^DataSeq load-data [id] (next (DataSeq. id (count-entries id) -1 []))) ; count-entries is a trivial "open file and read a long" memoized As you can see, the format of the data allowed me to implement count in very simply and efficiently. 3. drop could be O(1) In the same spirit, I'd like to reimplement drop. The format of these data files allows me to reimplement drop in O(1) (instead of the standard O(n)), as follows: if dropping less then the remaining cached items, just drop the same amount from the cache and done; if dropping more than cnt, then just return the empty list. otherwise, just figure out the position in the data file, jump right into that position, and read data from there. My difficulty is that drop is not implemented in the same way as count, first, seq, etc. The latter functions call a similarly named static method in RT which, in turn, calls my implementation above, while the former, drop, does not check if the instance of the sequence it is being called on provides a custom implementation. Obviously, I could provide a function named anything but drop that does exactly what I want, but that would force other people (including my future self) to remember to use it instead of drop every single time, which sucks. So, the question is: is it possible to override the default behaviour of drop? 4. A workaround (I dislike) While writing this question, I've just figured out a possible workaround: make the reading even lazier. The custom sequence would just keep an index and postpone the reading operation, that would happen only when first was called. The problem is that I'd need some mutable state: the first call to first would cause some data to be read into a cache, all the subsequent calls would return data from this cache. There would be a similar logic on next: if there's a cache, just next it; otherwise, don't bother populating it -- it will be done when first is called again. This would avoid unnecessary disk reads. However, this is still less than optimal -- it is still O(n), and it could easily be O(1). Anyways, I don't like this workaround, and my question is still open. Any thoughts? Thanks.

    Read the article

  • How can I monitor network usage by process on Mac OS X?

    - by psmith
    Is there any way to find out which process using how much internet bandwidth on Mac OS X Lion? I'm on mobile internet now, which is not very fast, so it would be nice if I can tell that for example, Chrome using 10kB/s, and Skype using 2kB/s. I can see the total amount of traffic in Activity Monitor, but it is not enough for me. I'd like to use an existing application, not interested to write an app like this. And I'm not interested in the actual traffic, only the number of bytes transferred and received by each processes.

    Read the article

  • Database implementation question? [closed]

    - by gundam
    consider a disk with a sector size of 512 bytes, 2000 tracks/surface, 50 sectors/track, 5 doubled sided platters, average seek time is 10 msec. Assume a block size of 1024-byte is selected. Assume a file that contains 100,000 records of 100-byte each is to be stored on the disk, and NONE of the reocd can be spanned 2 blocks. How many blocks are needed to store the entire file?? If the file is arranged sequentially on disk, how many surfaces are required?? Now, i have calculated that 10,000 blocks are needed to store 100,000 records. But i am not sure how to find out the answer of the surfaces required. I only calculated the capacity of track is 25KB and capacity of surface is 50,000 KB But I don't know how to calculate the number of surfaces... Could anyone help me how to get the answer? Thanks a lot!!

    Read the article

  • Can't delete file from windows 7

    - by r.s.mahanti
    I downloaded a torrent file from internet. The problem is it's size is showing 0 bytes. I tried to scan it with antivirus, upload it to virus total, delete it but it's showing the file is not found. I tried to delete it in safe mode also but no success. Can anybody explain me, what can be reason for this and what is the way to delete this file? Thanks in advance. My operating system is windows 7. EDIT : the name of the file is "[Torrentreactor.to] - Site Translator 4.06.torrent."

    Read the article

  • Split a Large File In C++

    - by wdow88
    Hey all, I'm trying to write a program that takes a large file (of any time) and splits it into many smaller "chunks". I think I have the basic idea down, but for some reason I cannot create a chunk size over 12,000 bites. I know there are a few solutions on google, etc. but I am more interested in learning what the origin of this limitation is then actually using the program to split files. //This file splits are larger into smaller files of a user inputted size. #include<iostream> #include<fstream> #include<string> #include<sstream> #include <direct.h> #include <stdlib.h> using namespace std; void GetCurrentPath(char* buffer) { _getcwd(buffer, _MAX_PATH); } int main() { // use the function to get the path char CurrentPath[_MAX_PATH]; GetCurrentPath(CurrentPath);//Get the current directory (used for displaying output) fstream bigFile; string filename; int partsize; cout << "Enter a file name: "; cin >> filename; //Recieve target file cout << "Enter the number of bites in each smaller file: "; cin >> partsize; //Recieve volume size bigFile.open(filename.c_str(),ios::in | ios::binary); bigFile.seekg(0, ios::end); // position get-ptr 0 bytes from end int size = bigFile.tellg(); // get-ptr position is now same as file size bigFile.seekg(0, ios::beg); // position get-ptr 0 bytes from beginning for (int i = 0; i <= (size / partsize); i++) { //Build File Name string partname = filename; //The original filename string charnum; //archive number stringstream out; //stringstream object out, used to build the archive name out << "." << i; charnum = out.str(); partname.append(charnum); //put the part name together //Write new file part fstream filePart; filePart.open(partname.c_str(),ios::out | ios::binary); //Open new file with the name built above //Check if near the end of file if (bigFile.tellg() < (size - (size%partsize))) { filePart.write(reinterpret_cast<char *>(&bigFile),partsize); //Write the selected amount to the file filePart.close(); //close file bigFile.seekg(partsize, ios::cur); //move pointer to next position to be written } //Changes the size of the last volume because it is the end of the file else { filePart.write(reinterpret_cast<char *>(&bigFile),(size%partsize)); //Write the selected amount to the file filePart.close(); //close file } cout << "File " << CurrentPath << partname << " produced" << endl; //display the progress of the split } bigFile.close(); cout << "Split Complete." << endl; return 0; } Any ideas? Thanks!

    Read the article

  • Is there any script to do accounting for the proftpd's xferlog?

    - by Aseques
    I would like to convert from the xferlog format that proftpd uses into per user in/out bytes, to have a summary on how much traffic does each user use per month. The exact format is this: Thu Oct 17 12:47:05 2013 1 123.123.123.123 74852 /home/vftp/doc1.txt b _ i r user ftp 0 * c Thu Oct 17 12:47:06 2013 2 123.123.123.123 86321 /home/vftp/doc2.txt b _ i r user ftp 0 * c So far I only found a script that makes a nice report but not exactly what I needed, that one can be found here I might create a fork of this one and place it somewhere but it probably has been done a lot of times already. Just found a well hidden page in proftpd site with some more examples here

    Read the article

  • kernel 2.6.36 not booting

    - by Saumitra
    Hi, I m a newbie to kernel programming. I am trying to boot the kernel 2.6.36 on my ECG machine.It was working perfectly on 2.6.33.2. It is getting stuck on this step: ## Booting kernel from Legacy Image at 81000000 ... Image Name: Created: 2010-12-27 5:55:56 UTC Image Type: MIPS Linux Kernel Image (gzip compressed) Data Size: 1974278 Bytes = 1.9 MB Load Address: 80100000 Entry Point: 80104730 Verifying Checksum ... OK Uncompressing Kernel Image ... OK Starting kernel ... After this the system either resets or it hangs. I have also checked the configuration & set it properly.Please let me know.

    Read the article

  • What command line tools for monitoring host network activity on linux do you use?

    - by user27388
    What command line tools are good for reliably monitoring network activity? I have used ifconfig, but an office colleague said that its statistics are not always reliable. Is that true? I have recently used ethtool, but is it reliable? What about just looking at /proc/net 'files'? Is that any better? EDIT I'm interested in packets Tx/Rx, bytes Tx/Rx, but most importantly drops or errors and why the drop/error might have occurred.

    Read the article

  • Pinging 192.168.1.4 is looking to 192.168.1.2?

    - by BVernon
    When I run the command "ping 192.168.1.4" I am get the following results: Pinging 192.168.1.4 with 32 bytes of data: Reply from 192.168.1.2: Destination host unreachable. Reply from 192.168.1.2: Destination host unreachable. Reply from 192.168.1.2: Destination host unreachable. Reply from 192.168.1.2: Destination host unreachable. Can anyone help me understand why in the world it's telling me that 192.168.1.2 is unreachable when that's not even the ip address I typed in? I'm very confused. Also in case it's relevant, I'm on a workgroup.

    Read the article

  • Total network data sent/received of a non-daemon Linux process?

    - by leden
    I'm looking for a simple and effective way of measuring total bytes received/sent from a single process upon its termination. Basically, I am looking for a tool which has the interface similar to "time" and "/usr/bin/time", e.g. measure-net-data <prog_to_run> <prog_args> Received (b): XYZ Sent (b): ABC I know that there are many tools for bandwidth/network monitoring, but as far I can tell all of them are performing the measurements it real-time, which is inappropriate not only because of overhead but also because of the inconvenience - I would need to stop the program, capture the output of the tool and then kill it. I have seen that newer versions of Linux 2.6.20+ provide /proc/<pid>/io/ which contain the information I'm looking for; however, everything under /proc/<pid> when the process terminates, so I'm again back to the same problem as with any network monitoring tool.

    Read the article

  • Would SSD drives benefit from a non-default allocation unit size?

    - by davebug
    The default allocation unit size recommended when formatting a drive in our current set-up is 4096 bytes. I understand the basics of the pros and cons of larger and smaller sizes (performance boost vs. space preservation) but it seems the benefits of a solid state drive (seek times massively lower than hard disks) may create a situation where a much smaller allocation size is not detrimental. Were this the case it would at least partially help to overcome the disadvantage of SSD (massively higher prices per GB). Is there a way to determine the 'cost' of smaller allocation sizes specifically related to seek times? Or are there any studies or articles recommending a change from the default based on this newer tech? (Assume the most average scattering of sizes program files, OS files, data, mp3s, text files, etc.)

    Read the article

  • Can't ping my computer - "Transmit failed. General failure."

    - by Vaccano
    I am having an issue with my computer. My IIS services are not working. I have narrowed it down to the fact that my computer cannot find itself via its name. I try pinging my computer by its name and I get this: C:\Users\18773ping MyComputerNameHere Pinging MyComputerNameHere [::1] with 32 bytes of data: PING: transmit failed. General failure. PING: transmit failed. General failure. PING: transmit failed. General failure. PING: transmit failed. General failure. Ping statistics for ::1: Packets: Sent = 4, Received = 0, Lost = 4 (100% loss), I tried having someone else ping my machine and it works fine for them. Any ideas?

    Read the article

  • Nested loop traversing arrays

    - by alecco
    There are 2 very big series of elements, the second 100 times bigger than the first. For each element of the first series, there are 0 or more elements on the second series. This can be traversed and processed with 2 nested loops. But the unpredictability of the amount of matching elements for each member of the first array makes things very, very slow. The actual processing of the 2nd series of elements involves logical and (&) and a population count. I couldn't find good optimizations using C but I am considering doing inline asm, doing rep* mov* or similar for each element of the first series and then doing the batch processing of the matching bytes of the second series, perhaps in buffers of 1MB or something. But the code would be get quite messy. Does anybody know of a better way? C preferred but x86 ASM OK too. Many thanks! Sample/demo code with simplified problem, first series are "people" and second series are "events", for clarity's sake. (the original problem is actually 100m and 10,000m entries!) #include <stdio.h> #include <stdint.h> #define PEOPLE 1000000 // 1m struct Person { uint8_t age; // Filtering condition uint8_t cnt; // Number of events for this person in E } P[PEOPLE]; // Each has 0 or more bytes with bit flags #define EVENTS 100000000 // 100m uint8_t P1[EVENTS]; // Property 1 flags uint8_t P2[EVENTS]; // Property 2 flags void init_arrays() { for (int i = 0; i < PEOPLE; i++) { // just some stuff P[i].age = i & 0x07; P[i].cnt = i % 220; // assert( sum < EVENTS ); } for (int i = 0; i < EVENTS; i++) { P1[i] = i % 7; // just some stuff P2[i] = i % 9; // just some other stuff } } int main(int argc, char *argv[]) { uint64_t sum = 0, fcur = 0; int age_filter = 7; // just some init_arrays(); // Init P, P1, P2 for (int64_t p = 0; p < PEOPLE ; p++) if (P[p].age < age_filter) for (int64_t e = 0; e < P[p].cnt ; e++, fcur++) sum += __builtin_popcount( P1[fcur] & P2[fcur] ); else fcur += P[p].cnt; // skip this person's events printf("(dummy %ld %ld)\n", sum, fcur ); return 0; } gcc -O5 -march=native -std=c99 test.c -o test

    Read the article

  • Program using read() entering into an infinite loop

    - by Soham
    1oid ReadBinary(char *infile,HXmap* AssetMap) { int fd; size_t bytes_read, bytes_expected = 100000000*sizeof(char); char *data; if ((fd = open(infile,O_RDONLY)) < 0) err(EX_NOINPUT, "%s", infile); if ((data = malloc(bytes_expected)) == NULL) err(EX_OSERR, "data malloc"); bytes_read = read(fd, data, bytes_expected); if (bytes_read != bytes_expected) printf("Read only %d of %d bytes %d\n", \ bytes_read, bytes_expected,EX_DATAERR); /* ... operate on data ... */ printf("\n"); int i=0; int counter=0; char ch=data[0]; char message[512]; Message* newMessage; while(i!=bytes_read) { while(ch!='\n') { message[counter]=ch; i++; counter++; ch =data[i]; } message[counter]='\n'; message[counter+1]='\0'; //--------------------------------------------------- newMessage = (Message*)parser(message); MessageProcess(newMessage,AssetMap); //-------------------------------------------------- //printf("idNUM %e\n",newMessage->idNum); free(newMessage); i++; counter=0; ch =data[i]; } free(data); } Here, I have allocated 100MB of data with malloc, and passed a file big enough(not 500MB) size of 926KB about. When I pass small files, it reads and exits like a charm, but when I pass a big enough file, the program executes till some point after which it just hangs. I suspect it either entered an infinite loop, or there is memory leak. EDIT For better understanding I stripped away all unnecessary function calls, and checked what happens, when given a large file as input. I have attached the modified code void ReadBinary(char *infile,HXmap* AssetMap) { int fd; size_t bytes_read, bytes_expected = 500000000*sizeof(char); char *data; if ((fd = open(infile,O_RDONLY)) < 0) err(EX_NOINPUT, "%s", infile); if ((data = malloc(bytes_expected)) == NULL) err(EX_OSERR, "data malloc"); bytes_read = read(fd, data, bytes_expected); if (bytes_read != bytes_expected) printf("Read only %d of %d bytes %d\n", \ bytes_read, bytes_expected,EX_DATAERR); /* ... operate on data ... */ printf("\n"); int i=0; int counter=0; char ch=data[0]; char message[512]; while(i<=bytes_read) { while(ch!='\n') { message[counter]=ch; i++; counter++; ch =data[i]; } message[counter]='\n'; message[counter+1]='\0'; i++; printf("idNUM \n"); counter=0; ch =data[i]; } free(data); } What looks like is, it prints a whole lot of idNUM's and then poof segmentation fault I think this is an interesting behaviour, and to me it looks like there is some problem with memory FURTHER EDIT I changed back the i!=bytes_read it gives no segmentation fault. When I check for i<=bytes_read it blows past the limits in the innerloop.(courtesy gdb)

    Read the article

  • Intel Corporation Ethernet Connection does not start properly

    - by Oscar Alejos
    I'm experiencing some problems when trying to connect my PC to the router through a switch. When the PC is directly connected to the router, everything works fine, Ubuntu (14.04) starts normally, and the Internet connection runs inmediately. The Ethernet controller is the Intel Corporation Ethernet Connection, as lspci returns: $ lspci | grep Eth 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-V (rev 04) However, when I try to connect through the switch what I get is the following. dmesg returns: $ dmesg | grep eth [ 1.035585] e1000e 0000:00:19.0 eth0: registered PHC clock [ 1.035587] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:22:4d:a7:be:5d [ 1.035589] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection [ 1.035625] e1000e 0000:00:19.0 eth0: MAC: 11, PHY: 12, PBA No: FFFFFF-0FF [ 1.357838] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 2.165413] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 2.165574] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 2.641287] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 16.715086] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx [ 16.715090] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO [ 16.715117] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready It looks like eth0 is properly working. Actually, nm-tool returns: $ nm-tool - Device: eth0 [Conexión cableada] ------------------------------------------- Type: Wired Driver: e1000e State: connected Default: yes HW Address: 00:22:4D:A7:BE:5D Capabilities: Carrier Detect: yes Speed: 100 Mb/s Wired Properties Carrier: on IPv4 Settings: Address: 192.168.1.30 Prefix: 24 (255.255.255.0) Gateway: 192.168.1.1 DNS: 80.58.61.250 DNS: 80.58.61.254 DNS: 192.168.1.1 However, ping returns: $ ping 192.168.1.1 PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. From 192.168.1.30 icmp_seq=1 Destination Host Unreachable From 192.168.1.30 icmp_seq=2 Destination Host Unreachable From 192.168.1.30 icmp_seq=3 Destination Host Unreachable The connection is restored by restarting it: # ifconfig eth0 down # ifconfig eth0 up From this point on, everything runs smoothly, as if the PC were directly connected to the router. It seems to be an issue related to the integrated LAN adaptor and the Ethernet controller, since my laptop connects without any problem. My desktop board is an Intel DB85FL. I'd be grateful if anyone could give some ideas on how to solve this issue. Thank you in advance.

    Read the article

  • Obama’s win and the geeky stuff behind his success

    - by Gopinath
    Mr. Obama is elected as President of United States for the second term with a great majority. Here are few geeky bytes associated with Mr.Obama’s success and the world records he set up Obama Creates New Records on Twitter and Facebook – Just after winning the race for president, Barak Obama setup world records on Twitter & Facebook. His tweet “Four more years” re-tweeted more than 770K times and his Facebook post received 3.8 million likes. Twitter Kills the Fail Whale, One Tweet at a Time – Twitter handled insane user loads with millions of tweets related to election and proved that it’s platform is now more robust than ever. In a post on the company’s engineering blog, Twitter said people sent 31 million election-related tweets on Tuesday alone. From 8:11 p.m. to 9:11 p.m. P.S.T., Twitter processed an average 9,965 tweets per second, with a one-second peak of 15,107 tweets per second at 8:20 p.m., the company said. Obama’s win a big vindication for Nate Silver, king of the quants – Nate Silver, a statistician and blogger was spot on in predicting Obama’s win many weeks. He did not depend on astrology or surveys to predict Obama’s success. He used big data and statistical analysis to project votes. CNET says “Despite some incredulous political pundits, the FiveThirtyEight statistician appears to have correctly predicted the winner in all 50 states in the presidential election” Inside the Secret World of the Data Crunchers Who Helped Obama Win – Team Obama used big data and analytical systems to win the elections!! Barack Obama Goes On Reddit For Last Campaign Stop Of Political Career – Reditt is getting a lot of Obama’s attention these days. He is quite often stopping by Reditt to say hi to the nerds & geeks hanging out there. His chat with nerds on Reddit has paid rich dividends.

    Read the article

  • Why are data structures so important in interviews?

    - by Vamsi Emani
    I am a newbie into the corporate world recently graduated in computers. I am a java/groovy developer. I am a quick learner and I can learn new frameworks, APIs or even programming languages within considerably short amount of time. Albeit that, I must confess that I was not so strong in data structures when I graduated out of college. Through out the campus placements during my graduation, I've witnessed that most of the biggie tech companies like Amazon, Microsoft etc focused mainly on data structures. It appears as if data structures is the only thing that they expect from a graduate. Adding to this, I see that there is this general perspective that a good programmer is necessarily a one with good knowledge about data structures. To be honest, I felt bad about that. I write good code. I follow standard design patterns of coding, I do use data structures but at the superficial level as in java exposed APIs like ArrayLists, LinkedLists etc. But the companies usually focused on the intricate aspects of Data Structures like pointer based memory manipulation and time complexities. Probably because of my java-ish background, Back then, I understood code efficiency and logic only when talked in terms of Object Oriented Programming like Objects, instances, etc but I never drilled down into the level of bits and bytes. I did not want people to look down upon me for this knowledge deficit of mine in Data Structures. So really why all this emphasis on Data Structures? Does, Not having knowledge in Data Structures really effect one's career in programming? Or is the knowledge in this subject really a sufficient basis to differentiate a good and a bad programmer?

    Read the article

  • Running a simple integration scenario using the Oracle Big Data Connectors on Hadoop/HDFS cluster

    - by hamsun
    Between the elephant ( the tradional image of the Hadoop framework) and the Oracle Iron Man (Big Data..) an english setter could be seen as the link to the right data Data, Data, Data, we are living in a world where data technology based on popular applications , search engines, Webservers, rich sms messages, email clients, weather forecasts and so on, have a predominant role in our life. More and more technologies are used to analyze/track our behavior, try to detect patterns, to propose us "the best/right user experience" from the Google Ad services, to Telco companies or large consumer sites (like Amazon:) ). The more we use all these technologies, the more we generate data, and thus there is a need of huge data marts and specific hardware/software servers (as the Exadata servers) in order to treat/analyze/understand the trends and offer new services to the users. Some of these "data feeds" are raw, unstructured data, and cannot be processed effectively by normal SQL queries. Large scale distributed processing was an emerging infrastructure need and the solution seemed to be the "collocation of compute nodes with the data", which in turn leaded to MapReduce parallel patterns and the development of the Hadoop framework, which is based on MapReduce and a distributed file system (HDFS) that runs on larger clusters of rather inexpensive servers. Several Oracle products are using the distributed / aggregation pattern for data calculation ( Coherence, NoSql, times ten ) so once that you are familiar with one of these technologies, lets says with coherence aggregators, you will find the whole Hadoop, MapReduce concept very similar. Oracle Big Data Appliance is based on the Cloudera Distribution (CDH), and the Oracle Big Data Connectors can be plugged on a Hadoop cluster running the CDH distribution or equivalent Hadoop clusters. In this paper, a "lab like" implementation of this concept is done on a single Linux X64 server, running an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0, and a single node Apache hadoop-1.2.1 HDFS cluster, using the SQL connector for HDFS. The whole setup is fairly simple: Install on a Linux x64 server ( or virtual box appliance) an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 server Get the Apache Hadoop distribution from: http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-1.2.1. Get the Oracle Big Data Connectors from: http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html?ssSourceSiteId=ocomen. Check the java version of your Linux server with the command: java -version java version "1.7.0_40" Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) Decompress the hadoop hadoop-1.2.1.tar.gz file to /u01/hadoop-1.2.1 Modify your .bash_profile export HADOOP_HOME=/u01/hadoop-1.2.1 export PATH=$PATH:$HADOOP_HOME/bin export HIVE_HOME=/u01/hive-0.11.0 export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin (also see my sample .bash_profile) Set up ssh trust for Hadoop process, this is a mandatory step, in our case we have to establish a "local trust" as will are using a single node configuration copy the new public keys to the list of authorized keys connect and test the ssh setup to your localhost: We will run a "pseudo-Hadoop cluster", in what is called "local standalone mode", all the Hadoop java components are running in one Java process, this is enough for our demo purposes. We need to "fine tune" some Hadoop configuration files, we have to go at our $HADOOP_HOME/conf, and modify the files: core-site.xml hdfs-site.xml mapred-site.xml check that the hadoop binaries are referenced correctly from the command line by executing: hadoop -version As Hadoop is managing our "clustered HDFS" file system we have to create "the mount point" and format it , the mount point will be declared to core-site.xml as: The layout under the /u01/hadoop-1.2.1/data will be created and used by other hadoop components (MapReduce = /mapred/...) HDFS is using the /dfs/... layout structure format the HDFS hadoop file system: Start the java components for the HDFS system As an additional check, you can use the GUI Hadoop browsers to check the content of your HDFS configurations: Once our HDFS Hadoop setup is done you can use the HDFS file system to store data ( big data : )), and plug them back and forth to Oracle Databases by the means of the Big Data Connectors ( which is the next configuration step). You can create / use a Hive db, but in our case we will make a simple integration of "raw data" , through the creation of an External Table to a local Oracle instance ( on the same Linux box, we run the Hadoop HDFS one node cluster and one Oracle DB). Download some public "big data", I use the site: http://france.meteofrance.com/france/observations, from where I can get *.csv files for my big data simulations :). Here is the data layout of my example file: Download the Big Data Connector from the OTN (oraosch-2.2.0.zip), unzip it to your local file system (see picture below) Modify your environment in order to access the connector libraries , and make the following test: [oracle@dg1 bin]$./hdfs_stream Usage: hdfs_stream locationFile [oracle@dg1 bin]$ Load the data to the Hadoop hdfs file system: hadoop fs -mkdir bgtest_data hadoop fs -put obsFrance.txt bgtest_data/obsFrance.txt hadoop fs -ls /user/oracle/bgtest_data/obsFrance.txt [oracle@dg1 bg-data-raw]$ hadoop fs -ls /user/oracle/bgtest_data/obsFrance.txt Found 1 items -rw-r--r-- 1 oracle supergroup 54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt [oracle@dg1 bg-data-raw]$hadoop fs -ls hdfs:///user/oracle/bgtest_data/obsFrance.txt Found 1 items -rw-r--r-- 1 oracle supergroup 54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt Check the content of the HDFS with the browser UI: Start the Oracle database, and run the following script in order to create the Oracle database user, the Oracle directories for the Oracle Big Data Connector (dg1 it’s my own db id replace accordingly yours): #!/bin/bash export ORAENV_ASK=NO export ORACLE_SID=dg1 . oraenv sqlplus /nolog <<EOF CONNECT / AS sysdba; CREATE OR REPLACE DIRECTORY osch_bin_path AS '/u01/orahdfs-2.2.0/bin'; CREATE USER BGUSER IDENTIFIED BY oracle; GRANT CREATE SESSION, CREATE TABLE TO BGUSER; GRANT EXECUTE ON sys.utl_file TO BGUSER; GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO BGUSER; CREATE OR REPLACE DIRECTORY BGT_LOG_DIR as '/u01/BG_TEST/logs'; GRANT READ, WRITE ON DIRECTORY BGT_LOG_DIR to BGUSER; CREATE OR REPLACE DIRECTORY BGT_DATA_DIR as '/u01/BG_TEST/data'; GRANT READ, WRITE ON DIRECTORY BGT_DATA_DIR to BGUSER; EOF Put the following in a file named t3.sh and make it executable, hadoop jar $OSCH_HOME/jlib/orahdfs.jar \ oracle.hadoop.exttab.ExternalTable \ -D oracle.hadoop.exttab.tableName=BGTEST_DP_XTAB \ -D oracle.hadoop.exttab.defaultDirectory=BGT_DATA_DIR \ -D oracle.hadoop.exttab.dataPaths="hdfs:///user/oracle/bgtest_data/obsFrance.txt" \ -D oracle.hadoop.exttab.columnCount=7 \ -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/dg1 \ -D oracle.hadoop.connection.user=BGUSER \ -D oracle.hadoop.exttab.printStackTrace=true \ -createTable --noexecute then test the creation fo the external table with it: [oracle@dg1 samples]$ ./t3.sh ./t3.sh: line 2: /u01/orahdfs-2.2.0: Is a directory Oracle SQL Connector for HDFS Release 2.2.0 - Production Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved. Enter Database Password:] The create table command was not executed. The following table would be created. CREATE TABLE "BGUSER"."BGTEST_DP_XTAB" ( "C1" VARCHAR2(4000), "C2" VARCHAR2(4000), "C3" VARCHAR2(4000), "C4" VARCHAR2(4000), "C5" VARCHAR2(4000), "C6" VARCHAR2(4000), "C7" VARCHAR2(4000) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY "BGT_DATA_DIR" ACCESS PARAMETERS ( RECORDS DELIMITED BY 0X'0A' CHARACTERSET AL32UTF8 STRING SIZES ARE IN CHARACTERS PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream' FIELDS TERMINATED BY 0X'2C' MISSING FIELD VALUES ARE NULL ( "C1" CHAR(4000), "C2" CHAR(4000), "C3" CHAR(4000), "C4" CHAR(4000), "C5" CHAR(4000), "C6" CHAR(4000), "C7" CHAR(4000) ) ) LOCATION ( 'osch-20131022081035-74-1' ) ) PARALLEL REJECT LIMIT UNLIMITED; The following location files would be created. osch-20131022081035-74-1 contains 1 URI, 54103 bytes 54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt Then remove the --noexecute flag and create the external Oracle table for the Hadoop data. Check the results: The create table command succeeded. CREATE TABLE "BGUSER"."BGTEST_DP_XTAB" ( "C1" VARCHAR2(4000), "C2" VARCHAR2(4000), "C3" VARCHAR2(4000), "C4" VARCHAR2(4000), "C5" VARCHAR2(4000), "C6" VARCHAR2(4000), "C7" VARCHAR2(4000) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY "BGT_DATA_DIR" ACCESS PARAMETERS ( RECORDS DELIMITED BY 0X'0A' CHARACTERSET AL32UTF8 STRING SIZES ARE IN CHARACTERS PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream' FIELDS TERMINATED BY 0X'2C' MISSING FIELD VALUES ARE NULL ( "C1" CHAR(4000), "C2" CHAR(4000), "C3" CHAR(4000), "C4" CHAR(4000), "C5" CHAR(4000), "C6" CHAR(4000), "C7" CHAR(4000) ) ) LOCATION ( 'osch-20131022081719-3239-1' ) ) PARALLEL REJECT LIMIT UNLIMITED; The following location files were created. osch-20131022081719-3239-1 contains 1 URI, 54103 bytes 54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt This is the view from the SQL Developer: and finally the number of lines in the oracle table, imported from our Hadoop HDFS cluster SQL select count(*) from "BGUSER"."BGTEST_DP_XTAB"; COUNT(*) ---------- 1151 In a next post we will integrate data from a Hive database, and try some ODI integrations with the ODI Big Data connector. Our simplistic approach is just a step to show you how these unstructured data world can be integrated to Oracle infrastructure. Hadoop, BigData, NoSql are great technologies, they are widely used and Oracle is offering a large integration infrastructure based on these services. Oracle University presents a complete curriculum on all the Oracle related technologies: NoSQL: Introduction to Oracle NoSQL Database Using Oracle NoSQL Database Big Data: Introduction to Big Data Oracle Big Data Essentials Oracle Big Data Overview Oracle Data Integrator: Oracle Data Integrator 12c: New Features Oracle Data Integrator 11g: Integration and Administration Oracle Data Integrator: Administration and Development Oracle Data Integrator 11g: Advanced Integration and Development Oracle Coherence 12c: Oracle Coherence 12c: New Features Oracle Coherence 12c: Share and Manage Data in Clusters Oracle Coherence 12c: Oracle GoldenGate 11g: Fundamentals for Oracle Oracle GoldenGate 11g: Fundamentals for SQL Server Oracle GoldenGate 11g Fundamentals for Oracle Oracle GoldenGate 11g Fundamentals for DB2 Oracle GoldenGate 11g Fundamentals for Teradata Oracle GoldenGate 11g Fundamentals for HP NonStop Oracle GoldenGate 11g Management Pack: Overview Oracle GoldenGate 11g Troubleshooting and Tuning Oracle GoldenGate 11g: Advanced Configuration for Oracle Other Resources: Apache Hadoop : http://hadoop.apache.org/ is the homepage for these technologies. "Hadoop Definitive Guide 3rdEdition" by Tom White is a classical lecture for people who want to know more about Hadoop , and some active "googling " will also give you some more references. About the author: Eugene Simos is based in France and joined Oracle through the BEA-Weblogic Acquisition, where he worked for the Professional Service, Support, end Education for major accounts across the EMEA Region. He worked in the banking sector, ATT, Telco companies giving him extensive experience on production environments. Eugen currently specializes in Oracle Fusion Middleware teaching an array of courses on Weblogic/Webcenter, Content,BPM /SOA/Identity-Security/GoldenGate/Virtualisation/Unified Comm Suite) throughout the EMEA region.

    Read the article

  • Microsoft Test Manager error in displaying test steps caused by malware

    - by terje
    Sometimes the tool is blamed for errors which are not the fault of the tool – this is one such story.  It was however, not so easy to get to the bottom of it, so I hope sharing this story can help some others. One of our test developers started to get this message inside the test steps part of a test case in the MTM. saying “Could not load file or assembly ‘0 bytes from System, Version=4.0.0.0,……..” The same error came up inside Visual Studio when we opened a test case there. Then we noted a similar error on another piece of software – this error: A System.BadImageFormatException, and same message as above, but just for framework 2.0. We found this  description which pointed to a malware problem (See bottom of that post), that is a fake anti-spyware program called “Additional Guard”.  We checked the computer in question using Malwarebytes Anti-Malware tool.  It found and cleaned out 753 registry keys!!  After this cleanup operation the error was gone.  This is a great tool !  The “Additional Guard” program had been inadvertently installed, and then uninstalled afterwards, but the corrupted keys were of course not removed.  We also noted that this computer had full corporate virus scanning and malware protection, but still this nasty little thing still slipped through. Technorati Tags: Malware,BadImageFormatException,Microsoft Test Manager,Malwarebytes

    Read the article

  • Restoring android after ubuntu touch fails

    - by deimus
    I'm trying to restore android after playing around with ubuntu touch I follow exactly the same steps described the ubuntu's wiki page i.e. Download the factory image corresponding to your device's model and version (initial table has links). Ensure the device is connected and powered on. Extract the downloaded file and cd into the extracted directory. run adb reboot-bootloader run ./flash-all.sh (use sudo if lack of permissions on the workstation don't allow you to talk to the device). The archive is downloaded successfully, checked the sha1 checksum everything is ok. But the ./flash-all.sh fails like this sending 'bootloader' (2308 KB)... OKAY [ 0.513s] writing 'bootloader'... OKAY [ 0.292s] finished. total time: 0.805s rebooting into bootloader... OKAY [ 0.007s] finished. total time: 0.008s sending 'radio' (12288 KB)... OKAY [ 2.668s] writing 'radio'... OKAY [ 1.372s] finished. total time: 4.040s rebooting into bootloader... OKAY [ 0.009s] finished. total time: 0.009s archive does not contain 'boot.sig' archive does not contain 'recovery.sig' failed to allocate 435793780 bytes error: update package missing system.img My device is Nexus 4. Tried both 4.2.2 and 4.3 androind versions for Nexus 4 still the same. Any ideas how problem can be solved ?

    Read the article

  • Building Simple Workflows in Oozie

    - by dan.mcclary
    Introduction More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site. Hive Actions: Prepping for Pig In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in a handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie. I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me flexibility of Hive's query language when I want it, but let's me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code. CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column 1, column2) PARTITIONED BY (yr string) STORED AS ... LOCATION '/user/oracle/weather/historic'; As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of their long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access. ALTER TABLE historic_weather ADD IF NOT EXISTS PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2011'; INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history' SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day, w.temp, w.dewp, w.weather FROM ( FROM historic_weather SELECT TRANSFORM(...) USING '/path/to/hive/filters/ncdc_parser.py' as stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather ) w; Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called weather_train.hql. Starting Our Workflow Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions of Workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the hive parameters Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-defaults.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml hive-defaults.xml (make sure this file contains your metastore connection data) ncdc_parse.hql Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. All we do is add an action to workflow.xml as follows: <action name="WeatherMan"> <pig> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <script>weather_train.pig</script> </pig> <ok to="end"/> <error to="end"/> </action> Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My pig script registers the Weka Jar and a chunk of jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the pig script, because pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding Jars to the distributed cache from Oozie's Pig Cookbook. Making the Workflow Work We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. In the same working directory, we'll make a file called job.properties as follows: nameNode=hdfs://localhost:8020 jobTracker=localhost:8021 queueName=default weatherRoot=weather_ooze mapreduce.jobtracker.kerberos.principal=foo dfs.namenode.kerberos.principal=foo oozie.libpath=${nameNode}/user/oozie/share/lib oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot} outputDir=weather-ooze While some of the pieces of the properties file are familiar (e.g., JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath pieces is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of Jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory. We're finally ready to submit our job! After all that work we only need to do a few more things: Validate our workflow.xml Copy our working directory to HDFS Submit our job to the Oozie server Run our workflow Let's do them in order. First validate the workflow: oozie validate workflow.xml Next, copy the working directory up to HDFS: hadoop fs -put working_dir /user/oracle/working_dir Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument. oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit We've submitted the job, but we don't see any activity on the JobTracker? All I got was this funny bit of output: 14-20120525161321-oozie-oracle This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job. oozie -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run." This will prep and run the workflow immediately. Takeaway So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.

    Read the article

< Previous Page | 109 110 111 112 113 114 115 116 117 118 119 120  | Next Page >