ecce homo - Developer IT

Uniq in awk; removing duplicate values in a column using awk

- by D W

I have a large datafile in the following format below: ENST00000371026 WDR78,WDR78,WDR78, WD repeat domain 78 isoform 1,WD repeat domain 78 isoform 1,WD repeat domain 78 isoform 2, ENST00000371023 WDR32 WD repeat domain 32 isoform 2 ENST00000400908 RERE,KIAA0458, atrophin-1 like protein isoform a,Homo sapiens mRNA for KIAA0458 protein, partial cds., The columns are tab separated. Multiple values within columns are comma separated. I would like to remove the duplicate values in the second column to result in something like this: ENST00000371026 WDR78 WD repeat domain 78 isoform 1,WD repeat domain 78 isoform 1,WD repeat domain 78 isoform 2, ENST00000371023 WDR32 WD repeat domain 32 isoform 2 ENST00000400908 RERE,KIAA0458 atrophin-1 like protein isoform a,Homo sapiens mRNA for KIAA0458 protein, partial cds., I tried the following code below but it doesn't seem to remove the duplicate values. awk ' BEGIN { FS="\t" } ; { split($2, valueArray,","); j=0; for (i in valueArray) { if (!( valueArray[i] in duplicateArray)) { duplicateArray[j] = valueArray[i]; j++; } }; printf $1 "\t"; for (j in duplicateArray) { if (duplicateArray[j]) { printf duplicateArray[j] ","; } } printf "\t"; print $3 }' knownGeneFromUCSC.txt How can I remove the duplicates in column 2 correctly?

Read the article

Why Software Sucks...and What You Can Do About It – book review

- by DigiMortal

How do our users see the products we are writing for them and how happy they are with our work? Are they able to get their work done without fighting with cool features and crashes or are they just switching off resistance part of their brain to survive our software? Yeah, the overall picture of software usability landscape is not very nice. Okay, it is not even nice. But, fortunately, Why Software Sucks...and What You Can Do About It by David S. Platt explains everything. Why Software Sucks… is book for software users but I consider it as a-must reading also for developers and specially for their managers whose politics often kills all usability topics as soon as they may appear. For managers usability is soft topic that can be manipulated the way it is best in current state of project. Although developers are not UI designers and usability experts they are still very often forced to deal with these topics and this is how usability problems start (of course, also designers are able to produce designs that are stupid and too hard to use for users, but this blog here is about development). I found this book to be very interesting and funny reading. It is not humor book but it explains you all so you remember later very well what you just read. It took me about three evenings to go through this book and I am still enjoying what I found and how author explains our weird young working field to end users. I suggest this book to all developers – while you are demanding your management to hire or outsource usability expert you are at least causing less pain to end users. So, go and buy this book, just like I did. And… they thanks to mr. Platt :) There is one book more I suggest you to read if you are interested in usability - Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition by Steve Krug. Editorial review from Amazon Today’s software sucks. There’s no other good way to say it. It’s unsafe, allowing criminal programs to creep through the Internet wires into our very bedrooms. It’s unreliable, crashing when we need it most, wiping out hours or days of work with no way to get it back. And it’s hard to use, requiring large amounts of head-banging to figure out the simplest operations. It’s no secret that software sucks. You know that from personal experience, whether you use computers for work or personal tasks. In this book, programming insider David Platt explains why that’s the case and, more importantly, why it doesn’t have to be that way. And he explains it in plain, jargon-free English that’s a joy to read, using real-world examples with which you’re already familiar. In the end, he suggests what you, as a typical user, without a technical background, can do about this sad state of our software—how you, as an informed consumer, don’t have to take the abuse that bad software dishes out. As you might expect from the book’s title, Dave’s expose is laced with humor—sometimes outrageous, but always dead on. You’ll laugh out loud as you recall incidents with your own software that made you cry. You’ll slap your thigh with the same hand that so often pounded your computer desk and wished it was a bad programmer’s face. But Dave hasn’t written this book just for laughs. He’s written it to give long-overdue voice to your own discovery—that software does, indeed, suck, but it shouldn’t. Table of contents Acknowledgments xiii Introduction Chapter 1: Who’re You Calling a Dummy? Where We Came From Why It Still Sucks Today Control versus Ease of Use I Don’t Care How Your Program Works A Bad Feature and a Good One Stopping the Proceedings with Idiocy Testing on Live Animals Where We Are and What You Can Do Chapter 2: Tangled in the Web Where We Came From How It Works Why It Still Sucks Today Client-Centered Design versus Server-Centered Design Where’s My Eye Opener? It’s Obvious—Not! Splash, Flash, and Animation Testing on Live Animals What You Can Do about It Chapter 3: Keep Me Safe The Way It Was Why It Sucks Today What Programmers Need to Know, but Don’t A Human Operation Budgeting for Hassles Users Are Lazy Social Engineering Last Word on Security What You Can Do Chapter 4: Who the Heck Are You? Where We Came From Why It Still Sucks Today Incompatible Requirements OK, So Now What? Chapter 5: Who’re You Looking At? Yes, They Know You Why It Sucks More Than Ever Today Users Don’t Know Where the Risks Are What They Know First Milk You with Cookies? Privacy Policy Nonsense Covering Your Tracks The Google Conundrum Solution Chapter 6: Ten Thousand Geeks, Crazed on Jolt Cola See Them in Their Native Habitat All These Geeks Who Speaks, and When, and about What Selling It The Next Generation of Geeks—Passing It On Chapter 7: Who Are These Crazy Bastards Anyway? Homo Logicus Testosterone Poisoning Control and Contentment Making Models Geeks and Jocks Jargon Brains and Constraints Seven Habits of Geeks Chapter 8: Microsoft: Can’t Live With ’Em and Can’t Live Without ’Em They Run the World Me and Them Where We Came From Why It Sucks Today Damned if You Do, Damned if You Don’t We Love to Hate Them Plus ça Change Growing-Up Pains What You Can Do about It The Last Word Chapter 9: Doing Something About It 1. Buy 2. Tell 3. Ridicule 4. Trust 5. Organize Epilogue About the Author

Read the article

How can I extract paragaphs and selected lines with Perl?

- by neversaint

I have a text where I need to: to extract the whole paragraph under the section "Aceview summary" until the line that starts with "Please quote" (not to be included). to extract the line that starts with "The closest human gene". to store them into array with two elements. The text looks like this (also on pastebin): AceView: gene:1700049G17Rik, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTsAceView. <META NAME="title" CONTENT=" AceView: gene:1700049G17Rik a comprehensive annotation of human, mouse and worm genes with mRNAs or EST"> <META NAME="keywords" CONTENT=" AceView, genes, Acembly, AceDB, Homo sapiens, Human, nematode, Worm, Caenorhabditis elegans , WormGenes, WormBase, mouse, mammal, Arabidopsis, gene, alternative splicing variant, structure, sequence, DNA, EST, mRNA, cDNA clone, transcript, transcription, genome, transcriptome, proteome, peptide, GenBank accession, dbest, RefSeq, LocusLink, non-coding, coding, exon, intron, boundary, exon-intron junction, donor, acceptor, 3'UTR, 5'UTR, uORF, poly A, poly-A site, molecular function, protein annotation, isoform, gene family, Pfam, motif ,Blast, Psort, GO, taxonomy, homolog, cellular compartment, disease, illness, phenotype, RNA interference, RNAi, knock out mutant expression, regulation, protein interaction, genetic, map, antisense, trans-splicing, operon, chromosome, domain, selenocysteine, Start, Met, Stop, U12, RNA editing, bibliography"> <META NAME="Description" CONTENT= " AceView offers a comprehensive annotation of human, mouse and nematode genes reconstructed by co-alignment and clustering of all publicly available mRNAs and ESTs on the genome sequence. Our goals are to offer a reliable up-to-date resource on the genes, their functions, alternative variants, expression, regulation and interactions, in the hope to stimulate further validating experiments at the bench "> <meta name="author" content="Danielle Thierry-Mieg and Jean Thierry-Mieg, NCBI/NLM/NIH, [email protected]">  However I am stuck with the following script logic. What's the right way to achieve that? #!/usr/bin/perl -w my $INFILE_file_name = $file; # input file name open ( INFILE, '<', $INFILE_file_name ) or croak "$0 : failed to open input file $INFILE_file_name : $!\n"; my @allsum; while ( <INFILE> ) { chomp; my $line = $_; my @temp1 = (); if ( $line =~ /^ AceView summary/ ) { print "$line\n"; push @temp1, $line; } elsif( $line =~ /Please quote/) { push @allsum, [@temp1]; @temp1 = (); } elsif ($line =~ /The closest human gene/) { push @allsum, $line; } } close ( INFILE ); # close input file # Do something with @allsum There are many files like that I need to process.

Read the article

Unable to ping local machines by name in Windows 7

- by aardvarkk

I'm having a strange (and persistent!) problem with pinging local machines on my network by name. I believe my machine (Windows 7 64-bit) is the only one having this issue. This is over a wireless connection. As an example, consider a device on my network by the name of WDTVLiveHub. It's a Western Digital Live Hub (surprise!). If I go to my router's DHCP Client Table in the browser (my router is a WRT400N), I see this entry: WDTVLiveHub 192.168.1.101 Great. So I try to ping that IP address: ping 192.168.1.101 Pinging 192.168.1.101 with 32 bytes of data: Reply from 192.168.1.101: bytes=32 time=9ms TTL=64 Reply from 192.168.1.101: bytes=32 time=16ms TTL=64 Reply from 192.168.1.101: bytes=32 time=16ms TTL=64 Reply from 192.168.1.101: bytes=32 time=16ms TTL=64 Ping statistics for 192.168.1.101: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss), Approximate round trip times in milli-seconds: Minimum = 9ms, Maximum = 16ms, Average = 14ms OK, still looking good. Now I try to ping it by name: ping WDTVLiveHub Ping request could not find host WDTVLiveHub. Please check the name and try again. From what I've read, this implies a problem with DNS servers and host name lookups. Interestingly, if I type the following: pathping 192.168.1.101 I get this output: Tracing route to WDTVLIVEHUB [192.168.1.101] over a maximum of 30 hops: 0 Scotty [192.168.1.103] 1 WDTVLIVEHUB [192.168.1.101] Computing statistics for 25 seconds... Source to Here This Node/Link Hop RTT Lost/Sent = Pct Lost/Sent = Pct Address 0 Scotty [192.168.1.103] 1/ 100 = 1% | 1 12ms 1/ 100 = 1% 0/ 100 = 0% WDTVLIVEHUB [192.168.1.101] Trace complete. Scotty is obviously the name of my local machine. So it's able to find the name somehow when I do that approach... ipconfig /all shows the following under DNS servers: DNS Servers . . . . . . . . . . . : 192.168.1.1 ***.***.***.*** ***.***.***.*** Where the * represents the same DNS servers that show up in my router under DNS 1 and DNS 2 through the Internet. For completeness, here's the whole output of ipconfig /all: Windows IP Configuration Host Name . . . . . . . . . . . . : Scotty Primary Dns Suffix . . . . . . . : Node Type . . . . . . . . . . . . : Peer-Peer IP Routing Enabled. . . . . . . . : No WINS Proxy Enabled. . . . . . . . : No Wireless LAN adapter Wireless Network Connection: Connection-specific DNS Suffix . : Description . . . . . . . . . . . : Dell Wireless 1397 WLAN Mini-Card Physical Address. . . . . . . . . : 0C-EE-E6-D1-07-E8 DHCP Enabled. . . . . . . . . . . : Yes Autoconfiguration Enabled . . . . : Yes IPv6 Address. . . . . . . . . . . : 2002:d83a:31e5:1234:5592:398e:8968:43d1(Preferred) Temporary IPv6 Address. . . . . . : 2002:d83a:31e5:1234:ecce:2f79:72a5:5273(Preferred) Link-local IPv6 Address . . . . . : fe80::5592:398e:8968:43d1%26(Preferred) IPv4 Address. . . . . . . . . . . : 192.168.1.103(Preferred) Subnet Mask . . . . . . . . . . . : 255.255.255.0 Lease Obtained. . . . . . . . . . : September-17-12 11:05:57 PM Lease Expires . . . . . . . . . . : September-18-12 11:05:57 PM Default Gateway . . . . . . . . . : fe80::200:ff:fe00:0%26 192.168.1.1 DHCP Server . . . . . . . . . . . : 192.168.1.1 DHCPv6 IAID . . . . . . . . . . . : 537718502 DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-12-80-3D-D7-00-26-B9-0D-08-70 DNS Servers . . . . . . . . . . . : 192.168.1.1 ***.***.***.*** ***.***.***.*** NetBIOS over Tcpip. . . . . . . . : Enabled Ethernet adapter VirtualBox Host-Only Network: Connection-specific DNS Suffix . : Description . . . . . . . . . . . : VirtualBox Host-Only Ethernet Adapter Physical Address. . . . . . . . . : 08-00-27-00-98-9A DHCP Enabled. . . . . . . . . . . : Yes Autoconfiguration Enabled . . . . : Yes Link-local IPv6 Address . . . . . : fe80::b48a:916b:c0f:fb29%23(Preferred) Autoconfiguration IPv4 Address. . : 169.254.251.41(Preferred) Subnet Mask . . . . . . . . . . . : 255.255.0.0 Default Gateway . . . . . . . . . : DHCPv6 IAID . . . . . . . . . . . : 570949671 DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-12-80-3D-D7-00-26-B9-0D-08-70 DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1 fec0:0:0:ffff::2%1 fec0:0:0:ffff::3%1 NetBIOS over Tcpip. . . . . . . . : Enabled Tunnel adapter Local Area Connection* 15: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Description . . . . . . . . . . . : Teredo Tunneling Pseudo-Interface Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0 DHCP Enabled. . . . . . . . . . . : No Autoconfiguration Enabled . . . . : Yes Tunnel adapter isatap.{55899375-C31D-4173-A529-4427D63FD28B}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Description . . . . . . . . . . . : Microsoft ISATAP Adapter #2 Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0 DHCP Enabled. . . . . . . . . . . : No Autoconfiguration Enabled . . . . : Yes Tunnel adapter isatap.{64B8F35F-A6AB-4D6B-B1D5-DD95F57B1458}: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Description . . . . . . . . . . . : Microsoft ISATAP Adapter #3 Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0 DHCP Enabled. . . . . . . . . . . : No Autoconfiguration Enabled . . . . : Yes Not sure exactly how to diagnose exactly what's going on... but the problem is really frustrating! The biggest problem is that my mapped network drives have to be done by IP, and then any time the router assigns new IP addresses to those devices, all of my network shares break again. Stinks! Would love some assistance on possible solutions. I've tried all of this netsh catalog resetting and that didn't seem to fix anything at all. Would love an explanation of what's going wrong, too, rather than blindly resetting things! Thanks!

Developer IT