hanging - Page 24 - Developer IT

Toorcon 15 (2013)

- by danx

The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

Read the article

Diving into OpenStack Network Architecture - Part 1

- by Ronen Kofman

v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} rkofman Normal rkofman 83 3045 2014-05-23T21:11:00Z 2014-05-27T06:58:00Z 3 1883 10739 Oracle Corporation 89 25 12597 12.00 140 Clean Clean false false false false EN-US X-NONE HE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:Arial; mso-bidi-theme-font:minor-bidi; mso-bidi-language:AR-SA;} Before we begin OpenStack networking has very powerful capabilities but at the same time it is quite complicated. In this blog series we will review an existing OpenStack setup using the Oracle OpenStack Tech Preview and explain the different network components through use cases and examples. The goal is to show how the different pieces come together and provide a bigger picture view of the network architecture in OpenStack. This can be very helpful to users making their first steps in OpenStack or anyone wishes to understand how networking works in this environment. We will go through the basics first and build the examples as we go. According to the recent Icehouse user survey and the one before it, Neutron with Open vSwitch plug-in is the most widely used network setup both in production and in POCs (in terms of number of customers) and so in this blog series we will analyze this specific OpenStack networking setup. As we know there are many options to setup OpenStack networking and while Neturon + Open vSwitch is the most popular setup there is no claim that it is either best or the most efficient option. Neutron + Open vSwitch is an example, one which provides a good starting point for anyone interested in understanding OpenStack networking. Even if you are using different kind of network setup such as different Neutron plug-in or even not using Neutron at all this will still be a good starting point to understand the network architecture in OpenStack. The setup we are using for the examples is the one used in the Oracle OpenStack Tech Preview. Installing it is simple and it would be helpful to have it as reference. In this setup we use eth2 on all servers for VM network, all VM traffic will be flowing through this interface.The Oracle OpenStack Tech Preview is using VLANs for L2 isolation to provide tenant and network isolation. The following diagram shows how we have configured our deployment: This first post is a bit long and will focus on some basic concepts in OpenStack networking. The components we will be discussing are Open vSwitch, network namespaces, Linux bridge and veth pairs. Note that this is not meant to be a comprehensive review of these components, it is meant to describe the component as much as needed to understand OpenStack network architecture. All the components described here can be further explored using other resources. Open vSwitch (OVS) In the Oracle OpenStack Tech Preview OVS is used to connect virtual machines to the physical port (in our case eth2) as shown in the deployment diagram. OVS contains bridges and ports, the OVS bridges are different from the Linux bridge (controlled by the brctl command) which are also used in this setup. To get started let’s view the OVS structure, use the following command: # ovs-vsctl show 7ec51567-ab42-49e8-906d-b854309c9edf Bridge br-int Port br-int Interface br-int type: internal Port "int-br-eth2" Interface "int-br-eth2" Bridge "br-eth2" Port "br-eth2" Interface "br-eth2" type: internal Port "eth2" Interface "eth2" Port "phy-br-eth2" Interface "phy-br-eth2" ovs_version: "1.11.0" We see a standard post deployment OVS on a compute node with two bridges and several ports hanging off of each of them. The example above is a compute node without any VMs, we can see that the physical port eth2 is connected to a bridge called “br-eth2”. We also see two ports "int-br-eth2" and "phy-br-eth2" which are actually a veth pair and form virtual wire between the two bridges, veth pairs are discussed later in this post. When a virtual machine is created a port is created on one the br-int bridge and this port is eventually connected to the virtual machine (we will discuss the exact connectivity later in the series). Here is how OVS looks after a VM was launched: # ovs-vsctl show efd98c87-dc62-422d-8f73-a68c2a14e73d Bridge br-int Port "int-br-eth2" Interface "int-br-eth2" Port br-int Interface br-int type: internal Port "qvocb64ea96-9f" tag: 1 Interface "qvocb64ea96-9f" Bridge "br-eth2" Port "phy-br-eth2" Interface "phy-br-eth2" Port "br-eth2" Interface "br-eth2" type: internal Port "eth2" Interface "eth2" ovs_version: "1.11.0" Bridge "br-int" now has a new port "qvocb64ea96-9f" which connects to the VM and tagged with VLAN 1. Every VM which will be launched will add a port on the “br-int” bridge for every network interface the VM has. Another useful command on OVS is dump-flows for example: # ovs-ofctl dump-flows br-int NXST_FLOW reply (xid=0x4): cookie=0x0, duration=735.544s, table=0, n_packets=70, n_bytes=9976, idle_age=17, priority=3,in_port=1,dl_vlan=1000 actions=mod_vlan_vid:1,NORMAL cookie=0x0, duration=76679.786s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop cookie=0x0, duration=76681.36s, table=0, n_packets=68, n_bytes=7950, idle_age=17, hard_age=65534, priority=1 actions=NORMAL As we see the port which is connected to the VM has the VLAN tag 1. However the port on the VM network (eth2) will be using tag 1000. OVS is modifying the vlan as the packet flow from the VM to the physical interface. In OpenStack the Open vSwitch agent takes care of programming the flows in Open vSwitch so the users do not have to deal with this at all. If you wish to learn more about how to program the Open vSwitch you can read more about it at http://openvswitch.org looking at the documentation describing the ovs-ofctl command. Network Namespaces (netns) Network namespaces is a very cool Linux feature can be used for many purposes and is heavily used in OpenStack networking. Network namespaces are isolated containers which can hold a network configuration and is not seen from outside of the namespace. A network namespace can be used to encapsulate specific network functionality or provide a network service in isolation as well as simply help to organize a complicated network setup. Using the Oracle OpenStack Tech Preview we are using the latest Unbreakable Enterprise Kernel R3 (UEK3), this kernel provides a complete support for netns. Let's see how namespaces work through couple of examples to control network namespaces we use the ip netns command: Defining a new namespace: # ip netns add my-ns # ip netns list my-ns As mentioned the namespace is an isolated container, we can perform all the normal actions in the namespace context using the exec command for example running the ifconfig command: # ip netns exec my-ns ifconfig -a lo Link encap:Local Loopback LOOPBACK MTU:16436 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) We can run every command in the namespace context, this is especially useful for debug using tcpdump command, we can ping or ssh or define iptables all within the namespace. Connecting the namespace to the outside world: There are various ways to connect into a namespaces and between namespaces we will focus on how this is done in OpenStack. OpenStack uses a combination of Open vSwitch and network namespaces. OVS defines the interfaces and then we can add those interfaces to namespace. So first let's add a bridge to OVS: # ovs-vsctl add-br my-bridge Now let's add a port on the OVS and make it internal: # ovs-vsctl add-port my-bridge my-port # ovs-vsctl set Interface my-port type=internal And let's connect it into the namespace: # ip link set my-port netns my-ns Looking inside the namespace: # ip netns exec my-ns ifconfig -a lo Link encap:Local Loopback LOOPBACK MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) my-port Link encap:Ethernet HWaddr 22:04:45:E2:85:21 BROADCAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Now we can add more ports to the OVS bridge and connect it to other namespaces or other device like physical interfaces. Neutron is using network namespaces to implement network services such as DCHP, routing, gateway, firewall, load balance and more. In the next post we will go into this in further details. Linux Bridge and veth pairs Linux bridge is used to connect the port from OVS to the VM. Every port goes from the OVS bridge to a Linux bridge and from there to the VM. The reason for using regular Linux bridges is for security groups’ enforcement. Security groups are implemented using iptables and iptables can only be applied to Linux bridges and not to OVS bridges. Veth pairs are used extensively throughout the network setup in OpenStack and are also a good tool to debug a network problem. Veth pairs are simply a virtual wire and so veths always come in pairs. Typically one side of the veth pair will connect to a bridge and the other side to another bridge or simply left as a usable interface. In this example we will create some veth pairs, connect them to bridges and test connectivity. This example is using regular Linux server and not an OpenStack node: Creating a veth pair, note that we define names for both ends: # ip link add veth0 type veth peer name veth1 # ifconfig -a . . veth0 Link encap:Ethernet HWaddr 5E:2C:E6:03:D0:17 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) veth1 Link encap:Ethernet HWaddr E6:B6:E2:6D:42:B8 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) . . To make the example more meaningful this we will create the following setup: veth0 => veth1 => br-eth3 => eth3 ======> eth2 on another Linux server br-eth3 – a regular Linux bridge which will be connected to veth1 and eth3 eth3 – a physical interface with no IP on it, connected to a private network eth2 – a physical interface on the remote Linux box connected to the private network and configured with the IP of 50.50.50.1 Once we create the setup we will ping 50.50.50.1 (the remote IP) through veth0 to test that the connection is up: # brctl addbr br-eth3 # brctl addif br-eth3 eth3 # brctl addif br-eth3 veth1 # brctl show bridge name bridge id STP enabled interfaces br-eth3 8000.00505682e7f6 no eth3 veth1 # ifconfig veth0 50.50.50.50 # ping -I veth0 50.50.50.51 PING 50.50.50.51 (50.50.50.51) from 50.50.50.50 veth0: 56(84) bytes of data. 64 bytes from 50.50.50.51: icmp_seq=1 ttl=64 time=0.454 ms 64 bytes from 50.50.50.51: icmp_seq=2 ttl=64 time=0.298 ms When the naming is not as obvious as the previous example and we don't know who are the paired veth interfaces we can use the ethtool command to figure this out. The ethtool command returns an index we can look up using ip link command, for example: # ethtool -S veth1 NIC statistics: peer_ifindex: 12 # ip link . . 12: veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 Summary That’s all for now, we quickly reviewed OVS, network namespaces, Linux bridges and veth pairs. These components are heavily used in the OpenStack network architecture we are exploring and understanding them well will be very useful when reviewing the different use cases. In the next post we will look at how the OpenStack network is laid out connecting the virtual machines to each other and to the external world. @RonenKofman

Read the article

What is hogging my connection?

- by SF.

At times it seems like dozens, if not hundreds of root-owned HTTP connections spring up. This is not much of a problem on LAN or WLAN as each of them seems to transfer very little, but if I use GPRS link, my ping times go into minutes (seriously, 80000ms is not infrequent!) and all connections grind to a halt waiting till these end. This usually lasts some 15 minutes and ends about when I start troubleshooting it for real. I've managed to capture a fragment of Nethogs output NetHogs version 0.8.0 PID USER PROGRAM DEV SENT RECEIVED ? root 37.209.147.180:59854-141.101.114.59:80 0.013 0.000 KB/sec ? root 37.209.147.180:59853-141.101.114.59:80 0.000 0.000 KB/sec ? root 37.209.147.180:52804-173.194.70.95:80 0.000 0.000 KB/sec 1954 bw /home/bw/.dropbox-dist/dropbox ppp0 0.000 0.000 KB/sec ? root 37.209.147.180:59851-141.101.114.59:80 0.000 0.000 KB/sec ? root 37.209.147.180:59850-141.101.114.59:80 0.000 0.000 KB/sec ? root 37.209.147.180:52801-173.194.70.95:80 0.000 0.000 KB/sec 13301 bw /usr/lib/firefox/firefox ppp0 0.000 0.000 KB/sec ? root unknown TCP 0.000 0.000 KB/sec Unfortunately, it doesn't display the owning process of these. Does anyone recognize these addresses or is able to suggest how to troubleshoot it further or disable it? Is it some automatic update or something like that? EDIT: per request; netstat -n, for obvious reason that normal netstat won't ever launch as all DNS requests are hogged just the same. netstat -n Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 1 93.154.166.62:51314 198.252.206.16:80 FIN_WAIT1 tcp 0 1 37.209.147.180:44098 198.252.206.16:80 FIN_WAIT1 tcp 0 1 37.209.147.180:59855 141.101.114.59:80 FIN_WAIT1 tcp 1 0 192.168.43.224:38237 213.189.45.39:443 CLOSE_WAIT tcp 1 0 93.154.146.186:35167 75.101.152.29:80 CLOSE_WAIT tcp 1 0 192.168.43.224:32939 199.15.160.100:80 CLOSE_WAIT tcp 1 0 192.168.43.224:55619 63.245.217.207:443 CLOSE_WAIT tcp 1 0 93.154.146.186:60210 75.101.152.29:443 CLOSE_WAIT tcp 1 0 192.168.43.224:32944 199.15.160.100:80 CLOSE_WAIT tcp 0 1 37.209.147.180:52804 173.194.70.95:80 FIN_WAIT1 tcp 1 0 93.154.146.186:46606 23.21.151.181:80 CLOSE_WAIT tcp 1 0 93.154.146.186:52619 107.22.246.76:80 CLOSE_WAIT tcp 415 0 93.154.146.186:36156 82.112.106.104:80 CLOSE_WAIT tcp 1 0 93.154.146.186:50352 107.22.246.76:443 CLOSE_WAIT tcp 1 0 192.168.43.224:55000 213.189.45.44:443 CLOSE_WAIT tcp 0 1 37.209.147.180:59853 141.101.114.59:80 FIN_WAIT1 tcp 1 0 192.168.43.224:32937 199.15.160.100:80 CLOSE_WAIT tcp 1 0 192.168.43.224:56055 93.184.221.40:80 CLOSE_WAIT tcp 415 0 93.154.146.186:36155 82.112.106.104:80 CLOSE_WAIT tcp 0 1 37.209.147.180:44097 198.252.206.16:80 FIN_WAIT1 tcp 1 0 93.154.146.186:35166 75.101.152.29:80 CLOSE_WAIT tcp 1 0 192.168.43.224:32943 199.15.160.100:80 CLOSE_WAIT tcp 1 0 93.154.146.186:46607 23.21.151.181:80 CLOSE_WAIT tcp 1 0 93.154.146.186:36422 23.21.151.181:443 CLOSE_WAIT tcp 1 0 192.168.43.224:36081 93.184.220.148:80 CLOSE_WAIT tcp 1 0 192.168.43.224:44462 213.189.45.29:443 CLOSE_WAIT tcp 1 0 192.168.43.224:32938 199.15.160.100:80 CLOSE_WAIT tcp 1 0 93.154.146.186:36419 23.21.151.181:443 CLOSE_WAIT tcp 0 497 93.154.166.62:51313 198.252.206.16:80 FIN_WAIT1 tcp 0 1 37.209.147.180:59851 141.101.114.59:80 FIN_WAIT1 tcp 0 1 37.209.147.180:44095 198.252.206.16:80 FIN_WAIT1 tcp 1 0 93.154.146.186:46611 23.21.151.181:80 CLOSE_WAIT tcp 1 0 192.168.43.224:38236 213.189.45.39:443 CLOSE_WAIT tcp 0 171 37.209.147.180:45341 173.194.113.146:443 ESTABLISHED tcp 0 1 37.209.147.180:52801 173.194.70.95:80 FIN_WAIT1 tcp 1 0 192.168.43.224:36080 93.184.220.148:80 CLOSE_WAIT tcp 0 1 37.209.147.180:59856 141.101.114.59:80 FIN_WAIT1 tcp 0 1 37.209.147.180:44096 198.252.206.16:80 FIN_WAIT1 tcp 0 1 93.154.166.62:57471 108.160.162.49:80 FIN_WAIT1 tcp 0 1 37.209.147.180:59854 141.101.114.59:80 FIN_WAIT1 tcp 0 171 37.209.147.180:45340 173.194.113.146:443 ESTABLISHED tcp 0 168 37.209.147.180:45334 173.194.113.146:443 FIN_WAIT1 tcp 1 0 93.154.146.186:46609 23.21.151.181:80 CLOSE_WAIT tcp 0 1248 93.154.166.62:58270 64.251.23.59:443 FIN_WAIT1 tcp 0 1 37.209.147.180:59850 141.101.114.59:80 FIN_WAIT1 tcp 1 0 93.154.146.186:35181 75.101.152.29:80 CLOSE_WAIT tcp 232 0 93.154.172.168:46384 198.252.206.25:80 ESTABLISHED tcp 1 0 93.154.146.186:52618 107.22.246.76:80 CLOSE_WAIT tcp 1 0 93.154.172.168:36298 173.194.69.95:443 CLOSE_WAIT tcp 1 0 93.154.146.186:60209 75.101.152.29:443 CLOSE_WAIT tcp 0 168 37.209.147.180:45335 173.194.113.146:443 FIN_WAIT1 tcp 415 0 93.154.146.186:36157 82.112.106.104:80 CLOSE_WAIT tcp 1 0 192.168.43.224:36082 93.184.220.148:80 CLOSE_WAIT tcp 1 0 192.168.43.224:32942 199.15.160.100:80 CLOSE_WAIT tcp 1 0 93.154.146.186:50350 107.22.246.76:443 CLOSE_WAIT tcp 1 0 192.168.43.224:32941 199.15.160.100:80 CLOSE_WAIT tcp 0 534 37.209.147.180:44089 198.252.206.16:80 FIN_WAIT1 tcp 1 0 93.154.146.186:46608 23.21.151.181:80 CLOSE_WAIT tcp 1 0 93.154.146.186:46612 23.21.151.181:80 CLOSE_WAIT udp 0 0 37.209.147.180:49057 193.41.112.14:53 ESTABLISHED udp 0 0 37.209.147.180:51631 193.41.112.18:53 ESTABLISHED udp 0 0 37.209.147.180:34827 193.41.112.18:53 ESTABLISHED udp 0 0 37.209.147.180:35908 193.41.112.14:53 ESTABLISHED udp 0 0 37.209.147.180:44106 193.41.112.14:53 ESTABLISHED udp 0 0 37.209.147.180:42184 193.41.112.14:53 ESTABLISHED udp 0 0 37.209.147.180:54485 193.41.112.14:53 ESTABLISHED udp 0 0 37.209.147.180:42216 193.41.112.18:53 ESTABLISHED udp 0 0 37.209.147.180:51961 193.41.112.14:53 ESTABLISHED udp 0 0 37.209.147.180:48412 193.41.112.14:53 ESTABLISHED The interesting lines from ping got lost, but the summary over past few hours is: --- 8.8.8.8 ping statistics --- 107459 packets transmitted, 104376 received, +22 duplicates, 2% packet loss, time 195427362ms rtt min/avg/max/mdev = 24.822/528.132/90538.257/2519.263 ms, pipe 90 EDIT: Per request: Happened again, reboot didn't help but cleaned up all "hanging" processes. Currently netstat shows: bw@pony:/var/log$ netstat -n -t Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 93.154.188.68:42767 74.125.239.143:443 TIME_WAIT tcp 0 0 93.154.188.68:50270 173.194.69.189:443 ESTABLISHED tcp 0 0 93.154.188.68:45250 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:53488 173.194.32.198:80 ESTABLISHED tcp 0 0 93.154.188.68:53490 173.194.32.198:80 ESTABLISHED tcp 0 159 93.154.188.68:42741 74.125.239.143:443 LAST_ACK tcp 0 0 93.154.188.68:45808 198.252.206.25:80 ESTABLISHED tcp 0 0 93.154.188.68:52449 173.194.32.199:443 ESTABLISHED tcp 0 0 93.154.188.68:52600 173.194.32.199:443 TIME_WAIT tcp 0 0 93.154.188.68:50300 173.194.69.189:443 TIME_WAIT tcp 0 0 93.154.188.68:45253 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:46252 173.194.32.204:443 ESTABLISHED tcp 0 0 93.154.188.68:45246 190.93.244.58:80 ESTABLISHED tcp 0 0 93.154.188.68:47064 173.194.113.143:443 ESTABLISHED tcp 0 0 93.154.188.68:34484 173.194.69.95:443 ESTABLISHED tcp 0 0 93.154.188.68:45252 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:54290 173.194.32.202:443 ESTABLISHED tcp 0 0 93.154.188.68:47063 173.194.113.143:443 ESTABLISHED tcp 0 0 93.154.188.68:53469 173.194.32.198:80 TIME_WAIT tcp 0 0 93.154.188.68:45242 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:53468 173.194.32.198:80 ESTABLISHED tcp 0 0 93.154.188.68:50299 173.194.69.189:443 TIME_WAIT tcp 0 0 93.154.188.68:42764 74.125.239.143:443 TIME_WAIT tcp 0 0 93.154.188.68:45256 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:58047 108.160.162.105:80 ESTABLISHED tcp 0 0 93.154.188.68:45249 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:50297 173.194.69.189:443 TIME_WAIT tcp 0 0 93.154.188.68:53470 173.194.32.198:80 ESTABLISHED tcp 0 0 93.154.188.68:34100 68.232.35.121:443 ESTABLISHED tcp 0 0 93.154.188.68:42758 74.125.239.143:443 ESTABLISHED tcp 0 0 93.154.188.68:42765 74.125.239.143:443 TIME_WAIT tcp 0 0 93.154.188.68:39000 173.194.69.95:80 TIME_WAIT tcp 0 0 93.154.188.68:50296 173.194.69.189:443 TIME_WAIT tcp 0 0 93.154.188.68:53467 173.194.32.198:80 ESTABLISHED tcp 0 0 93.154.188.68:42766 74.125.239.143:443 TIME_WAIT tcp 0 0 93.154.188.68:45251 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:45248 190.93.244.58:80 TIME_WAIT tcp 0 0 93.154.188.68:45247 190.93.244.58:80 ESTABLISHED tcp 0 159 93.154.188.68:50254 173.194.69.189:443 LAST_ACK tcp 0 0 93.154.188.68:34483 173.194.69.95:443 ESTABLISHED Output of ps: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 1 0.8 0.0 3628 2092 ? Ss 16:52 0:03 /sbin/init root 2 0.0 0.0 0 0 ? S 16:52 0:00 [kthreadd] root 3 0.1 0.0 0 0 ? S 16:52 0:00 [ksoftirqd/0] root 4 0.1 0.0 0 0 ? S 16:52 0:00 [kworker/0:0] root 6 0.0 0.0 0 0 ? S 16:52 0:00 [migration/0] root 7 0.0 0.0 0 0 ? S 16:52 0:00 [watchdog/0] root 8 0.0 0.0 0 0 ? S 16:52 0:00 [migration/1] root 10 0.1 0.0 0 0 ? S 16:52 0:00 [ksoftirqd/1] root 11 0.0 0.0 0 0 ? S 16:52 0:00 [watchdog/1] root 12 0.0 0.0 0 0 ? S 16:52 0:00 [migration/2] root 14 0.1 0.0 0 0 ? S 16:52 0:00 [ksoftirqd/2] root 15 0.0 0.0 0 0 ? S 16:52 0:00 [watchdog/2] root 16 0.0 0.0 0 0 ? S 16:52 0:00 [migration/3] root 17 0.0 0.0 0 0 ? S 16:52 0:00 [kworker/3:0] root 18 0.1 0.0 0 0 ? S 16:52 0:00 [ksoftirqd/3] root 19 0.0 0.0 0 0 ? S 16:52 0:00 [watchdog/3] root 20 0.0 0.0 0 0 ? S< 16:52 0:00 [cpuset] root 21 0.0 0.0 0 0 ? S< 16:52 0:00 [khelper] root 22 0.0 0.0 0 0 ? S 16:52 0:00 [kdevtmpfs] root 23 0.0 0.0 0 0 ? S< 16:52 0:00 [netns] root 24 0.0 0.0 0 0 ? S 16:52 0:00 [sync_supers] root 25 0.0 0.0 0 0 ? S 16:52 0:00 [bdi-default] root 26 0.0 0.0 0 0 ? S< 16:52 0:00 [kintegrityd] root 27 0.0 0.0 0 0 ? S< 16:52 0:00 [kblockd] root 28 0.0 0.0 0 0 ? S< 16:52 0:00 [ata_sff] root 29 0.0 0.0 0 0 ? S 16:52 0:00 [khubd] root 30 0.0 0.0 0 0 ? S< 16:52 0:00 [md] root 42 0.0 0.0 0 0 ? S 16:52 0:00 [khungtaskd] root 43 0.0 0.0 0 0 ? S 16:52 0:00 [kswapd0] root 44 0.0 0.0 0 0 ? SN 16:52 0:00 [ksmd] root 45 0.0 0.0 0 0 ? SN 16:52 0:00 [khugepaged] root 46 0.0 0.0 0 0 ? S 16:52 0:00 [fsnotify_mark] root 47 0.0 0.0 0 0 ? S 16:52 0:00 [ecryptfs-kthrea] root 48 0.0 0.0 0 0 ? S< 16:52 0:00 [crypto] root 59 0.0 0.0 0 0 ? S< 16:52 0:00 [kthrotld] root 70 0.1 0.0 0 0 ? S 16:52 0:00 [kworker/2:1] root 71 0.0 0.0 0 0 ? S 16:52 0:00 [scsi_eh_0] root 72 0.0 0.0 0 0 ? S 16:52 0:00 [scsi_eh_1] root 73 0.0 0.0 0 0 ? S 16:52 0:00 [scsi_eh_2] root 74 0.0 0.0 0 0 ? S 16:52 0:00 [scsi_eh_3] root 75 0.0 0.0 0 0 ? S 16:52 0:00 [kworker/u:2] root 76 0.0 0.0 0 0 ? S 16:52 0:00 [kworker/u:3] root 79 0.0 0.0 0 0 ? S 16:52 0:00 [kworker/1:1] root 99 0.0 0.0 0 0 ? S< 16:52 0:00 [deferwq] root 100 0.0 0.0 0 0 ? S< 16:52 0:00 [charger_manager] root 101 0.0 0.0 0 0 ? S< 16:52 0:00 [devfreq_wq] root 102 0.1 0.0 0 0 ? S 16:52 0:00 [kworker/2:2] root 106 0.0 0.0 0 0 ? S 16:52 0:00 [scsi_eh_4] root 107 0.0 0.0 0 0 ? S 16:52 0:00 [usb-storage] root 108 0.0 0.0 0 0 ? S 16:52 0:00 [scsi_eh_5] root 109 0.0 0.0 0 0 ? S 16:52 0:00 [usb-storage] root 271 0.1 0.0 0 0 ? S 16:52 0:00 [kworker/1:2] root 316 0.0 0.0 0 0 ? S 16:52 0:00 [jbd2/sda1-8] root 317 0.0 0.0 0 0 ? S< 16:52 0:00 [ext4-dio-unwrit] root 440 0.1 0.0 2820 608 ? S 16:52 0:00 upstart-udev-bridge --daemon root 478 0.0 0.0 3460 1648 ? Ss 16:52 0:00 /sbin/udevd --daemon root 632 0.0 0.0 3348 1336 ? S 16:52 0:00 /sbin/udevd --daemon root 633 0.0 0.0 3348 1204 ? S 16:52 0:00 /sbin/udevd --daemon root 782 0.0 0.0 2816 596 ? S 16:52 0:00 upstart-socket-bridge --daemon root 822 0.0 0.0 6684 2400 ? Ss 16:52 0:00 /usr/sbin/sshd -D 102 834 0.2 0.0 4064 1864 ? Ss 16:52 0:01 dbus-daemon --system --fork root 857 0.0 0.1 7420 3380 ? Ss 16:52 0:00 /usr/sbin/modem-manager root 858 0.0 0.0 4784 1636 ? Ss 16:52 0:00 /usr/sbin/bluetoothd syslog 860 0.0 0.0 31068 1496 ? Sl 16:52 0:00 rsyslogd -c5 root 869 0.1 0.1 24280 5564 ? Ssl 16:52 0:00 NetworkManager avahi 883 0.0 0.0 3448 1488 ? S 16:52 0:00 avahi-daemon: running [pony.local] avahi 884 0.0 0.0 3448 436 ? S 16:52 0:00 avahi-daemon: chroot helper root 885 0.0 0.0 0 0 ? S< 16:52 0:00 [kpsmoused] root 892 0.0 0.1 25696 4140 ? Sl 16:52 0:00 /usr/lib/policykit-1/polkitd --no-debug root 923 0.0 0.0 0 0 ? S 16:52 0:00 [scsi_eh_6] root 959 0.0 0.0 0 0 ? S< 16:52 0:00 [krfcommd] root 970 0.0 0.1 7536 3120 ? Ss 16:52 0:00 /usr/sbin/cupsd -F colord 976 0.1 0.3 55080 10396 ? Sl 16:52 0:00 /usr/lib/i386-linux-gnu/colord/colord root 979 0.0 0.0 4632 872 tty4 Ss+ 16:52 0:00 /sbin/getty -8 38400 tty4 root 987 0.0 0.0 4632 884 tty5 Ss+ 16:52 0:00 /sbin/getty -8 38400 tty5 root 994 0.0 0.0 4632 884 tty2 Ss+ 16:52 0:00 /sbin/getty -8 38400 tty2 root 995 0.0 0.0 4632 868 tty3 Ss+ 16:52 0:00 /sbin/getty -8 38400 tty3 root 998 0.0 0.0 4632 876 tty6 Ss+ 16:52 0:00 /sbin/getty -8 38400 tty6 root 1022 0.0 0.0 2176 680 ? Ss 16:52 0:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket root 1029 0.0 0.0 3632 664 ? Ss 16:52 0:00 /usr/sbin/irqbalance daemon 1030 0.0 0.0 2476 120 ? Ss 16:52 0:00 atd root 1031 0.0 0.0 2620 880 ? Ss 16:52 0:00 cron root 1061 0.1 0.0 0 0 ? S 16:52 0:00 [kworker/3:2] root 1064 0.0 1.0 34116 31072 ? SLsl 16:52 0:00 lightdm root 1076 13.4 1.2 118688 37920 tty7 Ssl+ 16:52 0:55 /usr/bin/X :0 -core -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswit root 1085 0.0 0.0 0 0 ? S 16:52 0:00 [rts_pstor] root 1087 0.0 0.0 0 0 ? S 16:52 0:00 [rtsx-polling] root 1095 0.0 0.0 0 0 ? S< 16:52 0:00 [cfg80211] root 1127 0.0 0.0 0 0 ? S 16:52 0:00 [flush-8:0] root 1130 0.0 0.0 6136 1824 ? Ss 16:52 0:00 /sbin/wpa_supplicant -B -P /run/sendsigs.omit.d/wpasupplicant.pid -u -s -O /va root 1137 0.0 0.1 24604 3164 ? Sl 16:52 0:00 /usr/lib/accountsservice/accounts-daemon root 1140 0.0 0.0 0 0 ? S< 16:52 0:00 [hd-audio0] root 1188 0.0 0.1 34308 3420 ? Sl 16:52 0:00 /usr/sbin/console-kit-daemon --no-daemon root 1425 0.0 0.0 4632 872 tty1 Ss+ 16:52 0:00 /sbin/getty -8 38400 tty1 root 1443 0.1 0.1 29460 4664 ? Sl 16:52 0:00 /usr/lib/upower/upowerd root 1579 0.0 0.1 16540 3272 ? Sl 16:53 0:00 lightdm --session-child 12 19 bw 1623 0.0 0.0 2232 644 ? Ss 16:53 0:00 /bin/sh /usr/bin/startkde bw 1672 0.0 0.0 4092 204 ? Ss 16:53 0:00 /usr/bin/ssh-agent /usr/bin/gpg-agent --daemon --sh --write-env-file=/home/bw/ bw 1673 0.0 0.0 5492 384 ? Ss 16:53 0:00 /usr/bin/gpg-agent --daemon --sh --write-env-file=/home/bw/.gnupg/gpg-agent-in bw 1676 0.0 0.0 3848 792 ? S 16:53 0:00 /usr/bin/dbus-launch --exit-with-session /usr/bin/startkde bw 1677 0.5 0.0 5384 2180 ? Ss 16:53 0:02 //bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session root 1704 0.3 0.1 25348 3600 ? Sl 16:53 0:01 /usr/lib/udisks/udisks-daemon root 1705 0.0 0.0 6620 728 ? S 16:53 0:00 udisks-daemon: not polling any devices bw 1736 0.0 0.0 2008 64 ? S 16:53 0:00 /usr/lib/kde4/libexec/start_kdeinit +kcminit_startup bw 1737 0.0 0.5 115200 15588 ? Ss 16:53 0:00 kdeinit4: kdeinit4 Running... bw 1738 0.1 0.2 116756 8728 ? S 16:53 0:00 kdeinit4: klauncher [kdeinit] --fd=9 bw 1740 0.6 1.0 340524 31264 ? Sl 16:53 0:02 kdeinit4: kded4 [kdeinit] bw 1742 0.0 0.0 8944 2144 ? S 16:53 0:00 /usr/lib/i386-linux-gnu/gconf/gconfd-2 bw 1746 0.2 0.4 92028 14688 ? S 16:53 0:00 /usr/bin/kglobalaccel bw 1748 0.0 0.4 90804 13500 ? S 16:53 0:00 /usr/bin/kwalletd bw 1752 0.1 0.5 103764 15152 ? S 16:53 0:00 /usr/bin/kactivitymanagerd bw 1758 0.0 0.0 2144 280 ? S 16:53 0:00 kwrapper4 ksmserver bw 1759 0.1 0.5 150016 16088 ? Sl 16:53 0:00 kdeinit4: ksmserver [kdeinit] bw 1763 2.2 1.0 178492 32100 ? Sl 16:53 0:08 kwin bw 1772 0.2 0.5 106292 16340 ? Sl 16:53 0:00 /usr/bin/knotify4 bw 1777 0.9 1.1 246120 32912 ? Sl 16:53 0:03 /usr/bin/krunner bw 1778 6.3 2.7 389884 80216 ? Sl 16:53 0:23 /usr/bin/plasma-desktop bw 1785 0.0 0.0 2844 1208 ? S 16:53 0:00 ksysguardd bw 1789 0.1 0.4 82036 14176 ? S 16:53 0:00 /usr/bin/kuiserver bw 1805 0.3 0.1 61560 5612 ? Sl 16:53 0:01 /usr/bin/akonadi_control root 1806 0.0 0.0 0 0 ? S 16:53 0:00 [kworker/0:2] bw 1808 0.1 0.2 211852 8460 ? Sl 16:53 0:00 akonadiserver bw 1810 0.4 0.8 244116 25360 ? Sl 16:53 0:01 /usr/sbin/mysqld --defaults-file=/home/bw/.local/share/akonadi/mysql.conf --da bw 1874 0.0 0.0 35284 2956 ? Sl 16:53 0:00 /usr/bin/xsettings-kde bw 1876 0.0 0.3 68776 9488 ? Sl 16:53 0:00 /usr/bin/nepomukserver bw 1884 0.4 0.9 173876 29240 ? SNl 16:53 0:01 /usr/bin/nepomukservicestub nepomukstorage bw 1902 6.1 2.1 451512 63924 ? Sl 16:53 0:21 /home/bw/.dropbox-dist/dropbox bw 1906 3.8 1.0 142368 32376 ? Rl 16:53 0:13 /usr/bin/yakuake bw 1933 0.0 0.1 54636 4680 ? Sl 16:53 0:00 /usr/bin/zeitgeist-datahub bw 1943 0.5 1.5 164836 46836 ? Sl 16:53 0:01 python /usr/bin/printer-applet bw 1945 0.1 0.1 99636 5048 ? S<l 16:53 0:00 /usr/bin/pulseaudio --start --log-target=syslog rtkit 1947 0.0 0.0 21336 1248 ? SNl 16:53 0:00 /usr/lib/rtkit/rtkit-daemon bw 1958 0.0 0.1 44204 3792 ? Sl 16:53 0:00 /usr/bin/zeitgeist-daemon bw 1972 0.0 0.0 27008 2684 ? Sl 16:53 0:00 /usr/lib/gvfs/gvfsd bw 1974 0.1 0.5 90480 16660 ? Sl 16:53 0:00 /usr/bin/akonadi_agent_launcher akonadi_akonotes_resource akonadi_akonotes_res bw 1984 0.1 0.5 90472 16636 ? Sl 16:53 0:00 /usr/bin/akonadi_agent_launcher akonadi_akonotes_resource akonadi_akonotes_res bw 1985 0.3 0.9 148800 28304 ? S 16:53 0:01 /usr/bin/akonadi_archivemail_agent --identifier akonadi_archivemail_agent bw 1992 0.1 0.5 90020 16148 ? Sl 16:53 0:00 /usr/bin/akonadi_agent_launcher akonadi_contacts_resource akonadi_contacts_res bw 1993 0.1 0.5 90132 16452 ? Sl 16:53 0:00 /usr/bin/akonadi_agent_launcher akonadi_contacts_resource akonadi_contacts_res bw 1994 0.1 0.5 90564 16332 ? Sl 16:53 0:00 /usr/bin/akonadi_agent_launcher akonadi_ical_resource akonadi_ical_resource_0 bw 1995 0.1 0.5 90676 16732 ? Sl 16:53 0:00 /usr/bin/akonadi_agent_launcher akonadi_ical_resource akonadi_ical_resource_1 bw 1996 0.1 0.5 90468 16800 ? Sl 16:53 0:00 /usr/bin/akonadi_agent_launcher akonadi_maildir_resource akonadi_maildir_resou bw 1999 0.2 0.6 99324 19276 ? S 16:53 0:00 /usr/bin/akonadi_maildispatcher_agent --identifier akonadi_maildispatcher_agen bw 2006 0.3 0.9 148808 28332 ? S 16:53 0:01 /usr/bin/akonadi_mailfilter_agent --identifier akonadi_mailfilter_agent bw 2017 0.0 0.1 50256 4716 ? Sl 16:53 0:00 /usr/lib/zeitgeist/zeitgeist-fts bw 2024 0.2 0.6 103632 18376 ? Sl 16:53 0:00 /usr/bin/akonadi_nepomuk_feeder --identifier akonadi_nepomuk_feeder bw 2043 0.0 0.0 4484 280 ? S 16:53 0:00 /bin/cat bw 2101 0.2 0.7 113600 22396 ? Sl 16:53 0:00 /usr/lib/kde4/libexec/polkit-kde-authentication-agent-1 bw 2105 0.2 0.7 114196 22072 ? Sl 16:53 0:00 /usr/bin/nepomukcontroller bw 2156 0.3 1.0 333188 31244 ? Sl 16:54 0:01 /usr/bin/kmix bw 2167 0.0 0.0 6548 2724 pts/2 Ss 16:54 0:00 /bin/bash bw 2177 0.2 0.7 113496 22960 ? Sl 16:54 0:00 /usr/bin/klipper bw 2394 3.5 1.2 52932 35596 ? SNl 16:54 0:11 /usr/bin/virtuoso-t +foreground +configfile /tmp/virtuoso_hX1884.ini +wait root 2460 0.0 0.0 6184 1876 pts/2 S 16:54 0:00 sudo -s root 2500 0.0 0.0 6528 2700 pts/2 S 16:54 0:00 /bin/bash root 2599 0.0 0.0 5444 1280 pts/2 S+ 16:54 0:00 /bin/bash bin/aero root 2606 0.1 0.0 9836 2500 pts/2 S+ 16:54 0:00 wvdial aero2 root 2619 0.0 0.0 3504 1280 pts/2 S 16:54 0:00 /usr/sbin/pppd 57600 modem crtscts defaultroute usehostname -detach user aero bw 2653 0.0 0.0 6600 2880 pts/3 Ss 16:54 0:00 /bin/bash bw 2676 0.4 0.8 130296 24016 ? SNl 16:54 0:01 /usr/bin/nepomukservicestub nepomukfilewatch bw 2679 0.1 0.7 101636 22252 ? SNl 16:54 0:00 /usr/bin/nepomukservicestub nepomukqueryservice bw 2681 0.2 0.8 109836 24280 ? SNl 16:54 0:00 /usr/bin/nepomukservicestub nepomukbackupsync bw 3833 46.0 9.7 829272 288012 ? Rl 16:55 1:46 /usr/lib/firefox/firefox bw 3903 0.0 0.0 35128 2804 ? Sl 16:55 0:00 /usr/lib/at-spi2-core/at-spi-bus-launcher bw 4708 0.1 0.0 6564 2736 pts/4 Ss 16:56 0:00 /bin/bash root 5210 0.0 0.0 0 0 ? S 16:57 0:00 [kworker/u:0] root 6140 0.2 0.0 0 0 ? S 16:58 0:00 [kworker/0:1] root 6371 0.5 0.0 6184 1868 pts/4 S+ 16:59 0:00 sudo nethogs ppp0 root 6411 17.7 0.2 8616 6144 pts/4 S+ 16:59 0:05 nethogs ppp0 bw 6787 0.0 0.0 5464 1220 pts/3 R+ 16:59 0:00 ps auxw

Read the article

Informed TDD – Kata “To Roman Numerals”

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/05/28/informed-tdd-ndash-kata-ldquoto-roman-numeralsrdquo.aspxIn a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”. I like to respond with this article to his questions. There´s more to say than fits into a commentary. Mocks and TDD I don´t see in how far TDD is avoiding or opposed to mocks. TDD and mocks are orthogonal. TDD is about pocess, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource. True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that as I tried to explain. I don´t use mocks as proxies for “expensive” resource. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion. But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata. ITDD for “To Roman Numerals” gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines. Now here is, how I would do this kata differently. 1. Analyse A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled. “Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals” like written in this “requirement document” is not enough to start software development. Coding should only begin, if the interface between the “system under development” and its context is clear. If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s you fellow developer. Designing the interface is a task of it´s own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP) which not only should hold for software functional units but also for tasks or activities. In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions. Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning. Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explain in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically. 2. Solve The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody. Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too. Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution. Here´s my solution approach: The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members. The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list. The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV which looks like a contradiction to the above rule? It is not – if you accept roman “digits” not to be limited to be single characters only. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution. All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values: {1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M} Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors. If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a two phase process: Find all the factors Translate the factors found Compile the roman representation Translation is just a look-up. Finding, though, needs some calculation: Find the highest remaining factor fitting in the value Remember and subtract it from the value Repeat with remaining value and remaining factors Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction. With this solution in hand I finally can do what TDD advocates: find and prioritize test cases. As I can see from the small process description above, there are two aspects to test: Test the translation Test the compilation Test finding the factors Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious. Testing the compilation is trivial. Testing factor finding, though, is a tad more complicated. I can think of several steps: First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M). Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]). Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly. I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process. 3. Implement First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say: Next I implement the API: The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to most quickly satisfy these tests, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution: My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one. I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition: As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right. Here´s the implementation to satisfy the test: It´s as simple as possible. Right how TDD wants me to do it: KISS. Now for the second equivalence partition: translating multiple factors. (It´a pattern: if you need to do something repeatedly separate the tests for doing it once and doing it multiple times.) In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science: Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it. Having two tests I find more important. Now for the next low hanging fruit: compilation. It´s even simpler than translation. A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose… Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions: Again, I´m faking the implementation first: I focus on just the first test case. No looping yet. Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat. That´s left for a drill down with a test of the fake function: There are two main equivalence partitions, I guess: either the first factor is appropriate or some next. The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.) And the first of the equivalence partitions on the higher level also is satisfied: Great, I can move on. Now for more than a single factor: Interestingly not just one test becomes green now, but all of them. Great! You might say, then I must have done not the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than a recursion of which I had thought briefly during the problem solving phase. And by the way: Also the acceptance tests went green: Mission accomplished. At least functionality wise. Now I´ve to tidy up things a bit. TDD calls for refactoring. Not uch refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness. At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”. First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant. Which leads to a small conversion in Find_factors(): And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain. However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”. This then is my final test class: And this is the final production code: Test coverage as reported by NCrunch is 100%: Reflexion Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet. But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as it is often the case. That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design. I also could have implemented it bottom-up, since I knew some bottom of the solution. That´s the leaves of the functional decomposition tree. Where things became fuzzy, since the design did not cover any more details as with Find_factors(), I repeated the process in the small, so to speak: fake some top level, endure red high level tests, while first solving a simpler problem. Using scaffolding tests (to be thrown away at the end) brought two advantages: Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them. I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API. The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find what, what there is. Then devise a solution. Then code the solution, manifest the solution in code. Writing tests first is a good practice. But it should not be taken dogmatic. And above all it should not be overloaded with purposes. And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons. PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

Read the article

Inbound SIP calls through Cisco 881 NAT hang up after a few seconds

- by MasterRoot24

I've recently moved to a Cisco 881 router for my WAN link. I was previously using a Cisco Linksys WAG320N as my modem/router/WiFi AP/NAT firewall. The WAG320N is now running in bridged mode, so it's simply acting as a modem with one of it's LAN ports connected to FE4 WAN on my Cisco 881. The Cisco 881 get's a DHCP provided IP from my ISP. My LAN is part of default Vlan 1 (192.168.1.0/24). General internet connectivity is working great, I've managed to setup static NAT rules for my HTTP/HTTPS/SMTP/etc. services which are running on my LAN. I don't know whether it's worth mentioning that I've opted to use NVI NAT (ip nat enable as opposed to the traditional ip nat outside/ip nat inside) setup. My reason for this is that NVI allows NAT loopback from my LAN to the WAN IP and back in to the necessary server on the LAN. I run an Asterisk 1.8 PBX on my LAN, which connects to a SIP provider on the internet. Both inbound and outbound calls through the old setup (WAG320N providing routing/NAT) worked fine. However, since moving to the Cisco 881, inbound calls drop after around 10 seconds, whereas outbound calls work fine. The following message is logged on my Asterisk PBX: [Dec 9 15:27:45] WARNING[27734]: chan_sip.c:3641 retrans_pkt: Retransmission timeout reached on transmission [email protected] for seqno 1 (Critical Response) -- See https://wiki.asterisk.org/wiki/display/AST/SIP+Retransmissions Packet timed out after 6528ms with no response [Dec 9 15:27:45] WARNING[27734]: chan_sip.c:3670 retrans_pkt: Hanging up call [email protected] - no reply to our critical packet (see https://wiki.asterisk.org/wiki/display/AST/SIP+Retransmissions). (I know that this is quite a common issue - I've spend the best part of 2 days solid on this, trawling Google.) I've done as I am told and checked https://wiki.asterisk.org/wiki/display/AST/SIP+Retransmissions. Referring to the section "Other SIP requests" in the page linked above, I believe that the hangup to be caused by the ACK from my SIP provider not being passed back through NAT to Asterisk on my PBX. I tried to ascertain this by dumping the packets on my WAN interface on the 881. I managed to obtain a PCAP dump of packets in/out of my WAN interface. Here's an example of an ACK being reveived by the router from my provider: 689 21.219999 193.x.x.x 188.x.x.x SIP 502 Request: ACK sip:[email protected] | However a SIP trace on the Asterisk server show's that there are no ACK's received in response to the 200 OK from my PBX: http://pastebin.com/wwHpLPPz In the past, I have been strongly advised to disable any sort of SIP ALGs on routers and/or firewalls and the many posts regarding this issue on the internet seem to support this. However, I believe on Cisco IOS, the config command to disable SIP ALG is no ip nat service sip udp port 5060 however, this doesn't appear to help the situation. To confirm that config setting is set: Router1#show running-config | include sip no ip nat service sip udp port 5060 Another interesting twist: for a short period of time, I tried another provider. Luckily, my trial account with them is still available, so I reverted my Asterisk config back to the revision before I integrated with my current provider. I then dialled in to the DDI associated with the trial trunk and the call didn't get hung up and I didn't get the error above! To me, this points at the provider, however I know, like all providers do, will say "There's no issues with our SIP proxies - it's your firewall." I'm tempted to agree with this, as this issue was not apparent with the old WAG320N router when it was doing the NAT'ing. I'm sure you'll want to see my running-config too: ! ! Last configuration change at 15:55:07 UTC Sun Dec 9 2012 by xxx version 15.2 no service pad service tcp-keepalives-in service tcp-keepalives-out service timestamps debug datetime msec localtime show-timezone service timestamps log datetime msec localtime show-timezone no service password-encryption service sequence-numbers ! hostname Router1 ! boot-start-marker boot-end-marker ! ! security authentication failure rate 10 log security passwords min-length 6 logging buffered 4096 logging console critical enable secret 4 xxx ! aaa new-model ! ! aaa authentication login local_auth local ! ! ! ! ! aaa session-id common ! memory-size iomem 10 ! crypto pki trustpoint TP-self-signed-xxx enrollment selfsigned subject-name cn=IOS-Self-Signed-Certificate-xxx revocation-check none rsakeypair TP-self-signed-xxx ! ! crypto pki certificate chain TP-self-signed-xxx certificate self-signed 01 quit no ip source-route no ip gratuitous-arps ip auth-proxy max-login-attempts 5 ip admission max-login-attempts 5 ! ! ! ! ! no ip bootp server ip domain name dmz.merlin.local ip domain list dmz.merlin.local ip domain list merlin.local ip name-server x.x.x.x ip inspect audit-trail ip inspect udp idle-time 1800 ip inspect dns-timeout 7 ip inspect tcp idle-time 14400 ip inspect name autosec_inspect ftp timeout 3600 ip inspect name autosec_inspect http timeout 3600 ip inspect name autosec_inspect rcmd timeout 3600 ip inspect name autosec_inspect realaudio timeout 3600 ip inspect name autosec_inspect smtp timeout 3600 ip inspect name autosec_inspect tftp timeout 30 ip inspect name autosec_inspect udp timeout 15 ip inspect name autosec_inspect tcp timeout 3600 ip cef login block-for 3 attempts 3 within 3 no ipv6 cef ! ! multilink bundle-name authenticated license udi pid CISCO881-SEC-K9 sn ! ! username xxx privilege 15 secret 4 xxx username xxx secret 4 xxx ! ! ! ! ! ip ssh time-out 60 ! ! ! ! ! ! ! ! ! interface FastEthernet0 no ip address ! interface FastEthernet1 no ip address ! interface FastEthernet2 no ip address ! interface FastEthernet3 switchport access vlan 2 no ip address ! interface FastEthernet4 ip address dhcp no ip redirects no ip unreachables no ip proxy-arp ip nat enable duplex auto speed auto ! interface Vlan1 ip address 192.168.1.1 255.255.255.0 no ip redirects no ip unreachables no ip proxy-arp ip nat enable ! interface Vlan2 ip address 192.168.0.2 255.255.255.0 ! ip forward-protocol nd ip http server ip http access-class 1 ip http authentication local ip http secure-server ip http timeout-policy idle 60 life 86400 requests 10000 ! ! no ip nat service sip udp port 5060 ip nat source list 1 interface FastEthernet4 overload ip nat source static tcp x.x.x.x 80 interface FastEthernet4 80 ip nat source static tcp x.x.x.x 443 interface FastEthernet4 443 ip nat source static tcp x.x.x.x 25 interface FastEthernet4 25 ip nat source static tcp x.x.x.x 587 interface FastEthernet4 587 ip nat source static tcp x.x.x.x 143 interface FastEthernet4 143 ip nat source static tcp x.x.x.x 993 interface FastEthernet4 993 ip nat source static tcp x.x.x.x 1723 interface FastEthernet4 1723 ! ! logging trap debugging logging facility local2 access-list 1 permit 192.168.1.0 0.0.0.255 access-list 1 permit 192.168.0.0 0.0.0.255 no cdp run ! ! ! ! control-plane ! ! banner motd Authorized Access only ! line con 0 login authentication local_auth length 0 transport output all line aux 0 exec-timeout 15 0 login authentication local_auth transport output all line vty 0 1 access-class 1 in logging synchronous login authentication local_auth length 0 transport preferred none transport input telnet transport output all line vty 2 4 access-class 1 in login authentication local_auth length 0 transport input ssh transport output all ! ! end ...and, if it's of any use, here's my Asterisk SIP config: [general] context=default ; Default context for calls allowoverlap=no ; Disable overlap dialing support. (Default is yes) udpbindaddr=0.0.0.0 ; IP address to bind UDP listen socket to (0.0.0.0 binds to all) ; Optionally add a port number, 192.168.1.1:5062 (default is port 5060) tcpenable=no ; Enable server for incoming TCP connections (default is no) tcpbindaddr=0.0.0.0 ; IP address for TCP server to bind to (0.0.0.0 binds to all interfaces) ; Optionally add a port number, 192.168.1.1:5062 (default is port 5060) srvlookup=yes ; Enable DNS SRV lookups on outbound calls ; Note: Asterisk only uses the first host ; in SRV records ; Disabling DNS SRV lookups disables the ; ability to place SIP calls based on domain ; names to some other SIP users on the Internet ; Specifying a port in a SIP peer definition or ; when dialing outbound calls will supress SRV ; lookups for that peer or call. directmedia=no ; Don't allow direct RTP media between extensions (doesn't work through NAT) externhost=<MY DYNDNS HOSTNAME> ; Our external hostname to resolve to IP and be used in NAT'ed packets localnet=192.168.1.0/24 ; Define our local network so we know which packets need NAT'ing qualify=yes ; Qualify peers by default dtmfmode=rfc2833 ; Set the default DTMF mode disallow=all ; Disallow all codecs by default allow=ulaw ; Allow G.711 u-law allow=alaw ; Allow G.711 a-law ; ---------------------- ; SIP Trunk Registration ; ---------------------- ; Orbtalk register => <MY SIP PROVIDER USER NAME>:[email protected]/<MY DDI> ; Main Orbtalk number ; ---------- ; Trunks ; ---------- [orbtalk] ; Main Orbtalk trunk type=peer insecure=invite host=sipgw3.orbtalk.co.uk nat=yes username=<MY SIP PROVIDER USER NAME> defaultuser=<MY SIP PROVIDER USER NAME> fromuser=<MY SIP PROVIDER USER NAME> secret=xxx context=inbound I really don't know where to go with this. If anyone can help me find out why these calls are being dropped off, I'd be grateful if you could chime in! Please let me know if any further info is required.

Read the article

Apache + Codeigniter + New Server + Unexpected Errors

- by ngl5000

Alright here is the situation: I use to have my codeigniter site at bluehost were I did not have root access, I have since moved that site to rackspace. I have not changed any of the PHP code yet there has been some unexpected behavior. Unexpected Behavior: http://mysite.com/robots.txt Both old and new resolve to the robots file http://mysite.com/robots.txt/ The old bluehost setup resolves to my codeigniter 404 error page. The rackspace config resolves to: Not Found The requested URL /robots.txt/ was not found on this server. **This instance leads me to believe that there could be a problem with my mod rewrites or lack there of. The first one produces the error correctly through php while it seems the second senario lets the server handle this error. The next instance of this problem is even more troubling: 'http://mysite.com/search/term/9 x 1-1%2F2 white/' New site results in: Bad Request Your browser sent a request that this server could not understand. Old site results in: The actual page being loaded and the search term being unencoded. I have to assume that this has something to do with the fact that when I went to the new server I went from root level htaccess file to httpd.conf file and virtual server default and default-ssl. Here they are: Default file: <VirtualHost *:80> ServerAdmin webmaster@localhost ServerName mysite.com DocumentRoot /var/www <Directory /> Options +FollowSymLinks AllowOverride None </Directory> <Directory /var/www> Options -Indexes +FollowSymLinks -MultiViews AllowOverride None Order allow,deny allow from all RewriteEngine On RewriteBase / # force no www. (also does the IP thing) RewriteCond %{HTTPS} !=on RewriteCond %{HTTP_HOST} !^mysite\.com [NC] RewriteRule ^(.*)$ http://mysite.com/$1 [R=301,L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule ^(.+)\.(\d+)\.(js|css|png|jpg|gif)$ $1.$3 [L] # index.php remove any index.php parts RewriteCond %{THE_REQUEST} /index\.(php|html) RewriteRule (.*)index\.(php|html)(.*)$ /$1$3 [r=301,L] # codeigniter direct RewriteCond $0 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^.*$ index.php [L] </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/access.log combined Alias /doc/ "/usr/share/doc/" <Directory "/usr/share/doc/"> Options Indexes MultiViews FollowSymLinks AllowOverride None Order deny,allow Deny from all Allow from 127.0.0.0/255.0.0.0 ::1/128 </Directory> </VirtualHost> Default-ssl File <IfModule mod_ssl.c> <VirtualHost _default_:443> ServerAdmin webmaster@localhost ServerName mysite.com DocumentRoot /var/www <Directory /> Options +FollowSymLinks AllowOverride None </Directory> <Directory /var/www> Options -Indexes +FollowSymLinks -MultiViews AllowOverride None Order allow,deny allow from all RewriteEngine On RewriteBase / RewriteCond %{SERVER_PORT} !^443 RewriteRule ^ https://mysite.com%{REQUEST_URI} [R=301,L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule ^(.+)\.(\d+)\.(js|css|png|jpg|gif)$ $1.$3 [L] # index.php remove any index.php parts RewriteCond %{THE_REQUEST} /index\.(php|html) RewriteRule (.*)index\.(php|html)(.*)$ /$1$3 [r=301,L] # codeigniter direct RewriteCond $0 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^.*$ index.php [L] </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/ssl_access.log combined Alias /doc/ "/usr/share/doc/" <Directory "/usr/share/doc/"> Options Indexes MultiViews FollowSymLinks AllowOverride None Order deny,allow Deny from all Allow from 127.0.0.0/255.0.0.0 ::1/128 </Directory> # SSL Engine Switch: # Enable/Disable SSL for this virtual host. SSLEngine on # Use our self-signed certificate by default SSLCertificateFile /etc/apache2/ssl/certs/www.mysite.com.crt SSLCertificateKeyFile /etc/apache2/ssl/private/www.mysite.com.key # A self-signed (snakeoil) certificate can be created by installing # the ssl-cert package. See # /usr/share/doc/apache2.2-common/README.Debian.gz for more info. # If both key and certificate are stored in the same file, only the # SSLCertificateFile directive is needed. # SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem # SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key # Server Certificate Chain: # Point SSLCertificateChainFile at a file containing the # concatenation of PEM encoded CA certificates which form the # certificate chain for the server certificate. Alternatively # the referenced file can be the same as SSLCertificateFile # when the CA certificates are directly appended to the server # certificate for convinience. #SSLCertificateChainFile /etc/apache2/ssl.crt/server-ca.crt # Certificate Authority (CA): # Set the CA certificate verification path where to find CA # certificates for client authentication or alternatively one # huge file containing all of them (file must be PEM encoded) # Note: Inside SSLCACertificatePath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCACertificatePath /etc/ssl/certs/ #SSLCACertificateFile /etc/apache2/ssl.crt/ca-bundle.crt # Certificate Revocation Lists (CRL): # Set the CA revocation path where to find CA CRLs for client # authentication or alternatively one huge file containing all # of them (file must be PEM encoded) # Note: Inside SSLCARevocationPath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCARevocationPath /etc/apache2/ssl.crl/ #SSLCARevocationFile /etc/apache2/ssl.crl/ca-bundle.crl # Client Authentication (Type): # Client certificate verification type and depth. Types are # none, optional, require and optional_no_ca. Depth is a # number which specifies how deeply to verify the certificate # issuer chain before deciding the certificate is not valid. #SSLVerifyClient require #SSLVerifyDepth 10 # Access Control: # With SSLRequire you can do per-directory access control based # on arbitrary complex boolean expressions containing server # variable checks and other lookup directives. The syntax is a # mixture between C and Perl. See the mod_ssl documentation # for more details. #<Location /> #SSLRequire ( %{SSL_CIPHER} !~ m/^(EXP|NULL)/ \ # and %{SSL_CLIENT_S_DN_O} eq "Snake Oil, Ltd." \ # and %{SSL_CLIENT_S_DN_OU} in {"Staff", "CA", "Dev"} \ # and %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 \ # and %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 20 ) \ # or %{REMOTE_ADDR} =~ m/^192\.76\.162\.[0-9]+$/ #</Location> # SSL Engine Options: # Set various options for the SSL engine. # o FakeBasicAuth: # Translate the client X.509 into a Basic Authorisation. This means that # the standard Auth/DBMAuth methods can be used for access control. The # user name is the `one line' version of the client's X.509 certificate. # Note that no password is obtained from the user. Every entry in the user # file needs this password: `xxj31ZMTZzkVA'. # o ExportCertData: # This exports two additional environment variables: SSL_CLIENT_CERT and # SSL_SERVER_CERT. These contain the PEM-encoded certificates of the # server (always existing) and the client (only existing when client # authentication is used). This can be used to import the certificates # into CGI scripts. # o StdEnvVars: # This exports the standard SSL/TLS related `SSL_*' environment variables. # Per default this exportation is switched off for performance reasons, # because the extraction step is an expensive operation and is usually # useless for serving static content. So one usually enables the # exportation for CGI and SSI requests only. # o StrictRequire: # This denies access when "SSLRequireSSL" or "SSLRequire" applied even # under a "Satisfy any" situation, i.e. when it applies access is denied # and no other module can change it. # o OptRenegotiate: # This enables optimized SSL connection renegotiation handling when SSL # directives are used in per-directory context. #SSLOptions +FakeBasicAuth +ExportCertData +StrictRequire <FilesMatch "\.(cgi|shtml|phtml|php)$"> SSLOptions +StdEnvVars </FilesMatch> <Directory /usr/lib/cgi-bin> SSLOptions +StdEnvVars </Directory> # SSL Protocol Adjustments: # The safe and default but still SSL/TLS standard compliant shutdown # approach is that mod_ssl sends the close notify alert but doesn't wait for # the close notify alert from client. When you need a different shutdown # approach you can use one of the following variables: # o ssl-unclean-shutdown: # This forces an unclean shutdown when the connection is closed, i.e. no # SSL close notify alert is send or allowed to received. This violates # the SSL/TLS standard but is needed for some brain-dead browsers. Use # this when you receive I/O errors because of the standard approach where # mod_ssl sends the close notify alert. # o ssl-accurate-shutdown: # This forces an accurate shutdown when the connection is closed, i.e. a # SSL close notify alert is send and mod_ssl waits for the close notify # alert of the client. This is 100% SSL/TLS standard compliant, but in # practice often causes hanging connections with brain-dead browsers. Use # this only for browsers where you know that their SSL implementation # works correctly. # Notice: Most problems of broken clients are also related to the HTTP # keep-alive facility, so you usually additionally want to disable # keep-alive for those clients, too. Use variable "nokeepalive" for this. # Similarly, one has to force some clients to use HTTP/1.0 to workaround # their broken HTTP/1.1 implementation. Use variables "downgrade-1.0" and # "force-response-1.0" for this. BrowserMatch "MSIE [2-6]" \ nokeepalive ssl-unclean-shutdown \ downgrade-1.0 force-response-1.0 # MSIE 7 and newer should be able to use keepalive BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown httpd.conf File Just a lot of stuff from html5 boiler plate, I will post it if need be Old htaccess file <IfModule mod_rewrite.c> # index.php remove any index.php parts RewriteCond %{THE_REQUEST} /index\.(php|html) RewriteRule (.*)index\.(php|html)(.*)$ /$1$3 [r=301,L] RewriteCond $1 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^(.*)/$ /$1 [r=301,L] # codeigniter direct RewriteCond $1 !^(index\.php|assets|robots\.txt|sitemap\.xml|favicon\.ico) RewriteRule ^(.*)$ /index.php/$1 [L] </IfModule> Any Help would be hugely appreciated!!

Read the article

Using HTML 5 SessionState to save rendered Page Content

- by Rick Strahl

HTML 5 SessionState and LocalStorage are very useful and super easy to use to manage client side state. For building rich client side or SPA style applications it's a vital feature to be able to cache user data as well as HTML content in order to swap pages in and out of the browser's DOM. What might not be so obvious is that you can also use the sessionState and localStorage objects even in classic server rendered HTML applications to provide caching features between pages. These APIs have been around for a long time and are supported by most relatively modern browsers and even all the way back to IE8, so you can use them safely in your Web applications. SessionState and LocalStorage are easy The APIs that make up sessionState and localStorage are very simple. Both object feature the same API interface which is a simple, string based key value store that has getItem, setItem, removeitem, clear and key methods. The objects are also pseudo array objects and so can be iterated like an array with a length property and you have array indexers to set and get values with. Basic usage for storing and retrieval looks like this (using sessionStorage, but the syntax is the same for localStorage - just switch the objects):// set var lastAccess = new Date().getTime(); if (sessionStorage) sessionStorage.setItem("myapp_time", lastAccess.toString()); // retrieve in another page or on a refresh var time = null; if (sessionStorage) time = sessionStorage.getItem("myapp_time"); if (time) time = new Date(time * 1); else time = new Date(); sessionState stores data that is browser session specific and that has a liftetime of the active browser session or window. Shut down the browser or tab and the storage goes away. localStorage uses the same API interface, but the lifetime of the data is permanently stored in the browsers storage area until deleted via code or by clearing out browser cookies (not the cache). Both sessionStorage and localStorage space is limited. The spec is ambiguous about this - supposedly sessionStorage should allow for unlimited size, but it appears that most WebKit browsers support only 2.5mb for either object. This means you have to be careful what you store especially since other applications might be running on the same domain and also use the storage mechanisms. That said 2.5mb worth of character data is quite a bit and would go a long way. The easiest way to get a feel for how sessionState and localStorage work is to look at a simple example. You can go check out the following example online in Plunker: http://plnkr.co/edit/0ICotzkoPjHaWa70GlRZ?p=preview which looks like this: Plunker is an online HTML/JavaScript editor that lets you write and run Javascript code and similar to JsFiddle, but a bit cleaner to work in IMHO (thanks to John Papa for turning me on to it). The sample has two text boxes with counts that update session/local storage every time you click the related button. The counts are 'cached' in Session and Local storage. The point of these examples is that both counters survive full page reloads, and the LocalStorage counter survives a complete browser shutdown and restart. Go ahead and try it out by clicking the Reload button after updating both counters and then shutting down the browser completely and going back to the same URL (with the same browser). What you should see is that reloads leave both counters intact at the counted values, while a browser restart will leave only the local storage counter intact. The code to deal with the SessionStorage (and LocalStorage not shown here) in the example is isolated into a couple of wrapper methods to simplify the code: function getSessionCount() { var count = 0; if (sessionStorage) { var count = sessionStorage.getItem("ss_count"); count = !count ? 0 : count * 1; } $("#txtSession").val(count); return count; } function setSessionCount(count) { if (sessionStorage) sessionStorage.setItem("ss_count", count.toString()); } These two functions essentially load and store a session counter value. The two key methods used here are: sessionStorage.getItem(key); sessionStorage.setItem(key,stringVal); Note that the value given to setItem and return by getItem has to be a string. If you pass another type you get an error. Don't let that limit you though - you can easily enough store JSON data in a variable so it's quite possible to pass complex objects and store them into a single sessionStorage value:var user = { name: "Rick", id="ricks", level=8 } sessionStorage.setItem("app_user",JSON.stringify(user)); to retrieve it:var user = sessionStorage.getItem("app_user"); if (user) user = JSON.parse(user); Simple! If you're using the Chrome Developer Tools (F12) you can also check out the session and local storage state on the Resource tab: You can also use this tool to refresh or remove entries from storage. What we just looked at is a purely client side implementation where a couple of counters are stored. For rich client centric AJAX applications sessionStorage and localStorage provide a very nice and simple API to store application state while the application is running. But you can also use these storage mechanisms to manage server centric HTML applications when you combine server rendering with some JavaScript to perform client side data caching. You can both store some state information and data on the client (ie. store a JSON object and carry it forth between server rendered HTML requests) or you can use it for good old HTTP based caching where some rendered HTML is saved and then restored later. Let's look at the latter with a real life example. Why do I need Client-side Page Caching for Server Rendered HTML? I don't know about you, but in a lot of my existing server driven applications I have lists that display a fair amount of data. Typically these lists contain links to then drill down into more specific data either for viewing or editing. You can then click on a link and go off to a detail page that provides more concise content. So far so good. But now you're done with the detail page and need to get back to the list, so you click on a 'bread crumbs trail' or an application level 'back to list' button and… …you end up back at the top of the list - the scroll position, the current selection in some cases even filters conditions - all gone with the wind. You've left behind the state of the list and are starting from scratch in your browsing of the list from the top. Not cool! Sound familiar? This a pretty common scenario with server rendered HTML content where it's so common to display lists to drill into, only to lose state in the process of returning back to the original list. Look at just about any traditional forums application, or even StackOverFlow to see what I mean here. Scroll down a bit to look at a post or entry, drill in then use the bread crumbs or tab to go back… In some cases returning to the top of a list is not a big deal. On StackOverFlow that sort of works because content is turning around so quickly you probably want to actually look at the top posts. Not always though - if you're browsing through a list of search topics you're interested in and drill in there's no way back to that position. Essentially anytime you're actively browsing the items in the list, that's when state becomes important and if it's not handled the user experience can be really disrupting. Content Caching If you're building client centric SPA style applications this is a fairly easy to solve problem - you tend to render the list once and then update the page content to overlay the detail content, only hiding the list temporarily until it's used again later. It's relatively easy to accomplish this simply by hiding content on the page and later making it visible again. But if you use server rendered content, hanging on to all the detail like filters, selections and scroll position is not quite as easy. Or is it??? This is where sessionStorage comes in handy. What if we just save the rendered content of a previous page, and then restore it when we return to this page based on a special flag that tells us to use the cached version? Let's see how we can do this. A real World Use Case Recently my local ISP asked me to help out with updating an ancient classifieds application. They had a very busy, local classifieds app that was originally an ASP classic application. The old app was - wait for it: frames based - and even though I lobbied against it, the decision was made to keep the frames based layout to allow rapid browsing of the hundreds of posts that are made on a daily basis. The primary reason they wanted this was precisely for the ability to quickly browse content item by item. While I personally hate working with Frames, I have to admit that the UI actually works well with the frames layout as long as you're running on a large desktop screen. You can check out the frames based desktop site here: http://classifieds.gorge.net/ However when I rebuilt the app I also added a secondary view that doesn't use frames. The main reason for this of course was for mobile displays which work horribly with frames. So there's a somewhat mobile friendly interface to the interface, which ditches the frames and uses some responsive design tweaking for mobile capable operation: http://classifeds.gorge.net/mobile (or browse the base url with your browser width under 800px) Here's what the mobile, non-frames view looks like: As you can see this means that the list of classifieds posts now is a list and there's a separate page for drilling down into the item. And of course… originally we ran into that usability issue I mentioned earlier where the browse, view detail, go back to the list cycle resulted in lost list state. Originally in mobile mode you scrolled through the list, found an item to look at and drilled in to display the item detail. Then you clicked back to the list and BAM - you've lost your place. Because there are so many items added on a daily basis the full list is never fully loaded, but rather there's a "Load Additional Listings" entry at the button. Not only did we originally lose our place when coming back to the list, but any 'additionally loaded' items are no longer there because the list was now rendering as if it was the first page hit. The additional listings, and any filters, the selection of an item all were lost. Major Suckage! Using Client SessionStorage to cache Server Rendered Content To work around this problem I decided to cache the rendered page content from the list in SessionStorage. Anytime the list renders or is updated with Load Additional Listings, the page HTML is cached and stored in Session Storage. Any back links from the detail page or the login or write entry forms then point back to the list page with a back=true query string parameter. If the server side sees this parameter it doesn't render the part of the page that is cached. Instead the client side code retrieves the data from the sessionState cache and simply inserts it into the page. It sounds pretty simple, and the overall the process is really easy, but there are a few gotchas that I'll discuss in a minute. But first let's look at the implementation. Let's start with the server side here because that'll give a quick idea of the doc structure. As I mentioned the server renders data from an ASP.NET MVC view. On the list page when returning to the list page from the display page (or a host of other pages) looks like this: https://classifieds.gorge.net/list?back=True The query string value is a flag, that indicates whether the server should render the HTML. Here's what the top level MVC Razor view for the list page looks like:@model MessageListViewModel @{ ViewBag.Title = "Classified Listing"; bool isBack = !string.IsNullOrEmpty(Request.QueryString["back"]); } <form method="post" action="@Url.Action("list")"> <div id="SizingContainer"> @if (!isBack) { @Html.Partial("List_CommandBar_Partial", Model) <div id="PostItemContainer" class="scrollbox" xstyle="-webkit-overflow-scrolling: touch;"> @Html.Partial("List_Items_Partial", Model) @if (Model.RequireLoadEntry) { <div class="postitem loadpostitems" style="padding: 15px;"> <div id="LoadProgress" class="smallprogressright"></div> <div class="control-progress"> Load additional listings... </div> </div> } </div> } </div> </form> As you can see the query string triggers a conditional block that if set is simply not rendered. The content inside of #SizingContainer basically holds the entire page's HTML sans the headers and scripts, but including the filter options and menu at the top. In this case this makes good sense - in other situations the fact that the menu or filter options might be dynamically updated might make you only cache the list rather than essentially the entire page. In this particular instance all of the content works and produces the proper result as both the list along with any filter conditions in the form inputs are restored. Ok, let's move on to the client. On the client there are two page level functions that deal with saving and restoring state. Like the counter example I showed earlier, I like to wrap the logic to save and restore values from sessionState into a separate function because they are almost always used in several places.page.saveData = function(id) { if (!sessionStorage) return; var data = { id: id, scroll: $("#PostItemContainer").scrollTop(), html: $("#SizingContainer").html() }; sessionStorage.setItem("list_html",JSON.stringify(data)); }; page.restoreData = function() { if (!sessionStorage) return; var data = sessionStorage.getItem("list_html"); if (!data) return null; return JSON.parse(data); }; The data that is saved is an object which contains an ID which is the selected element when the user clicks and a scroll position. These two values are used to reset the scroll position when the data is used from the cache. Finally the html from the #SizingContainer element is stored, which makes for the bulk of the document's HTML. In this application the HTML captured could be a substantial bit of data. If you recall, I mentioned that the server side code renders a small chunk of data initially and then gets more data if the user reads through the first 50 or so items. The rest of the items retrieved can be rather sizable. Other than the JSON deserialization that's Ok. Since I'm using SessionStorage the storage space has no immediate limits. Next is the core logic to handle saving and restoring the page state. At first though this would seem pretty simple, and in some cases it might be, but as the following code demonstrates there are a few gotchas to watch out for. Here's the relevant code I use to save and restore:$( function() { … var isBack = getUrlEncodedKey("back", location.href); if (isBack) { // remove the back key from URL setUrlEncodedKey("back", "", location.href); var data = page.restoreData(); // restore from sessionState if (!data) { // no data - force redisplay of the server side default list window.location = "list"; return; } $("#SizingContainer").html(data.html); var el = $(".postitem[data-id=" + data.id + "]"); $(".postitem").removeClass("highlight"); el.addClass("highlight"); $("#PostItemContainer").scrollTop(data.scroll); setTimeout(function() { el.removeClass("highlight"); }, 2500); } else if (window.noFrames) page.saveData(null); // save when page loads $("#SizingContainer").on("click", ".postitem", function() { var id = $(this).attr("data-id"); if (!id) return true; if (window.noFrames) page.saveData(id); var contentFrame = window.parent.frames["Content"]; if (contentFrame) contentFrame.location.href = "show/" + id; else window.location.href = "show/" + id; return false; }); … The code starts out by checking for the back query string flag which triggers restoring from the client cache. If cached the cached data structure is read from sessionStorage. It's important here to check if data was returned. If the user had back=true on the querystring but there is no cached data, he likely bookmarked this page or otherwise shut down the browser and came back to this URL. In that case the server didn't render any detail and we have no cached data, so all we can do is redirect to the original default list view using window.location. If we continued the page would render no data - so make sure to always check the cache retrieval result. Always! If there is data the it's loaded and the data.html data is restored back into the document by simply injecting the HTML back into the document's #SizingContainer element:$("#SizingContainer").html(data.html); It's that simple and it's quite quick even with a fully loaded list of additional items and on a phone. The actual HTML data is stored to the cache on every page load initially and then again when the user clicks on an element to navigate to a particular listing. The former ensures that the client cache always has something in it, and the latter updates with additional information for the selected element. For the click handling I use a data-id attribute on the list item (.postitem) in the list and retrieve the id from that. That id is then used to navigate to the actual entry as well as storing that Id value in the saved cached data. The id is used to reset the selection by searching for the data-id value in the restored elements. The overall process of this save/restore process is pretty straight forward and it doesn't require a bunch of code, yet it yields a huge improvement in the usability of the site on mobile devices (or anybody who uses the non-frames view). Some things to watch out for As easy as it conceptually seems to simply store and retrieve cached content, you have to be quite aware what type of content you are caching. The code above is all that's specific to cache/restore cycle and it works, but it took a few tweaks to the rest of the script code and server code to make it all work. There were a few gotchas that weren't immediately obvious. Here are a few things to pay attention to: Event Handling Logic Timing of manipulating DOM events Inline Script Code Bookmarking to the Cache Url when no cache exists Do you have inline script code in your HTML? That script code isn't going to run if you restore from cache and simply assign or it may not run at the time you think it would normally in the DOM rendering cycle. JavaScript Event Hookups The biggest issue I ran into with this approach almost immediately is that originally I had various static event handlers hooked up to various UI elements that are now cached. If you have an event handler like:$("#btnSearch").click( function() {…}); that works fine when the page loads with server rendered HTML, but that code breaks when you now load the HTML from cache. Why? Because the elements you're trying to hook those events to may not actually be there - yet. Luckily there's an easy workaround for this by using deferred events. With jQuery you can use the .on() event handler instead:$("#SelectionContainer").on("click","#btnSearch", function() {…}); which monitors a parent element for the events and checks for the inner selector elements to handle events on. This effectively defers to runtime event binding, so as more items are added to the document bindings still work. For any cached content use deferred events. Timing of manipulating DOM Elements Along the same lines make sure that your DOM manipulation code follows the code that loads the cached content into the page so that you don't manipulate DOM elements that don't exist just yet. Ideally you'll want to check for the condition to restore cached content towards the top of your script code, but that can be tricky if you have components or other logic that might not all run in a straight line. Inline Script Code Here's another small problem I ran into: I use a DateTime Picker widget I built a while back that relies on the jQuery date time picker. I also created a helper function that allows keyboard date navigation into it that uses JavaScript logic. Because MVC's limited 'object model' the only way to embed widget content into the page is through inline script. This code broken when I inserted the cached HTML into the page because the script code was not available when the component actually got injected into the page. As the last bullet - it's a matter of timing. There's no good work around for this - in my case I pulled out the jQuery date picker and relied on native <input type="date" /> logic instead - a better choice these days anyway, especially since this view is meant to be primarily to serve mobile devices which actually support date input through the browser (unlike desktop browsers of which only WebKit seems to support it). Bookmarking Cached Urls When you cache HTML content you have to make a decision whether you cache on the client and also not render that same content on the server. In the Classifieds app I didn't render server side content so if the user comes to the page with back=True and there is no cached content I have to a have a Plan B. Typically this happens when somebody ends up bookmarking the back URL. The easiest and safest solution for this scenario is to ALWAYS check the cache result to make sure it exists and if not have a safe URL to go back to - in this case to the plain uncached list URL which amounts to effectively redirecting. This seems really obvious in hindsight, but it's easy to overlook and not see a problem until much later, when it's not obvious at all why the page is not rendering anything. Don't use <body> to replace Content Since we're practically replacing all the HTML in the page it may seem tempting to simply replace the HTML content of the <body> tag. Don't. The body tag usually contains key things that should stay in the page and be there when it loads. Specifically script tags and elements and possibly other embedded content. It's best to create a top level DOM element specifically as a placeholder container for your cached content and wrap just around the actual content you want to replace. In the app above the #SizingContainer is that container. Other Approaches The approach I've used for this application is kind of specific to the existing server rendered application we're running and so it's just one approach you can take with caching. However for server rendered content caching this is a pattern I've used in a few apps to retrofit some client caching into list displays. In this application I took the path of least resistance to the existing server rendering logic. Here are a few other ways that come to mind: Using Partial HTML Rendering via AJAXInstead of rendering the page initially on the server, the page would load empty and the client would render the UI by retrieving the respective HTML and embedding it into the page from a Partial View. This effectively makes the initial rendering and the cached rendering logic identical and removes the server having to decide whether this request needs to be rendered or not (ie. not checking for a back=true switch). All the logic related to caching is made on the client in this case. Using JSON Data and Client RenderingThe hardcore client option is to do the whole UI SPA style and pull data from the server and then use client rendering or databinding to pull the data down and render using templates or client side databinding with knockout/angular et al. As with the Partial Rendering approach the advantage is that there's no difference in the logic between pulling the data from cache or rendering from scratch other than the initial check for the cache request. Of course if the app is a full on SPA app, then caching may not be required even - the list could just stay in memory and be hidden and reactivated. I'm sure there are a number of other ways this can be handled as well especially using AJAX. AJAX rendering might simplify the logic, but it also complicates search engine optimization since there's no content loaded initially. So there are always tradeoffs and it's important to look at all angles before deciding on any sort of caching solution in general. State of the Session SessionState and LocalStorage are easy to use in client code and can be integrated even with server centric applications to provide nice caching features of content and data. In this post I've shown a very specific scenario of storing HTML content for the purpose of remembering list view data and state and making the browsing experience for lists a bit more friendly, especially if there's dynamically loaded content involved. If you haven't played with sessionStorage or localStorage I encourage you to give it a try. There's a lot of cool stuff that you can do with this beyond the specific scenario I've covered here… Resources Overview of localStorage (also applies to sessionStorage) Web Storage Compatibility Modernizr Test Suite© Rick Strahl, West Wind Technologies, 2005-2013Posted in JavaScript HTML5 ASP.NET MVC Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article

questions regarding the use of A* with the 15-square puzzle

- by Cheeso

I'm trying to build an A* solver for a 15-square puzzle. The goal is to re-arrange the tiles so that they appear in their natural positions. You can only slide one tile at a time. Each possible state of the puzzle is a node in the search graph. For the h(x) function, I am using an aggregate sum, across all tiles, of the tile's dislocation from the goal state. In the above image, the 5 is at location 0,0, and it belongs at location 1,0, therefore it contributes 1 to the h(x) function. The next tile is the 11, located at 0,1, and belongs at 2,2, therefore it contributes 3 to h(x). And so on. EDIT: I now understand this is what they call "Manhattan distance", or "taxicab distance". I have been using a step count for g(x). In my implementation, for any node in the state graph, g is just +1 from the prior node's g. To find successive nodes, I just examine where I can possibly move the "hole" in the puzzle. There are 3 neighbors for the puzzle state (aka node) that is displayed: the hole can move north, west, or east. My A* search sometimes converges to a solution in 20s, sometimes 180s, and sometimes doesn't converge at all (waited 10 mins or more). I think h is reasonable. I'm wondering if I've modeled g properly. In other words, is it possible that my A* function is reaching a node in the graph via a path that is not the shortest path? Maybe have I not waited long enough? Maybe 10 minutes is not long enough? For a fully random arrangement, (assuming no parity problems), What is the average number of permutations an A* solution will examine? (please show the math) I'm going to look for logic errors in my code, but in the meantime, Any tips? (ps: it's done in Javascript). Also, no, this isn't CompSci homework. It's just a personal exploration thing. I'm just trying to learn Javascript. EDIT: I've found that the run-time is highly depend upon the heuristic. I saw the 10x factor applied to the heuristic from the article someone mentioned, and it made me wonder - why 10x? Why linear? Because this is done in javascript, I could modify the code to dynamically update an html table with the node currently being considered. This allowd me to peek at the algorithm as it was progressing. With a regular taxicab distance heuristic, I watched as it failed to converge. There were 5's and 12's in the top row, and they kept hanging around. I'd see 1,2,3,4 creep into the top row, but then they'd drop out, and other numbers would move up there. What I was hoping to see was 1,2,3,4 sort of creeping up to the top, and then staying there. I thought to myself - this is not the way I solve this personally. Doing this manually, I solve the top row, then the 2ne row, then the 3rd and 4th rows sort of concurrently. So I tweaked the h(x) function to more heavily weight the higher rows and the "lefter" columns. The result was that the A* converged much more quickly. It now runs in 3 minutes instead of "indefinitely". With the "peek" I talked about, I can see the smaller numbers creep up to the higher rows and stay there. Not only does this seem like the right thing, it runs much faster. I'm in the process of trying a bunch of variations. It seems pretty clear that A* runtime is very sensitive to the heuristic. Currently the best heuristic I've found uses the summation of dislocation * ((4-i) + (4-j)) where i and j are the row and column, and dislocation is the taxicab distance. One interesting part of the result I got: with a particular heuristic I find a path very quickly, but it is obviously not the shortest path. I think this is because I am weighting the heuristic. In one case I got a path of 178 steps in 10s. My own manual effort produce a solution in 87 moves. (much more than 10s). More investigation warranted. So the result is I am seeing it converge must faster, and the path is definitely not the shortest. I have to think about this more. Code: var stop = false; function Astar(start, goal, callback) { // start and goal are nodes in the graph, represented by // an array of 16 ints. The goal is: [1,2,3,...14,15,0] // Zero represents the hole. // callback is a method to call when finished. This runs a long time, // therefore we need to use setTimeout() to break it up, to avoid // the browser warning like "Stop running this script?" // g is the actual distance traveled from initial node to current node. // h is the heuristic estimate of distance from current to goal. stop = false; start.g = start.dontgo = 0; // calcHeuristic inserts an .h member into the array calcHeuristicDistance(start); // start the stack with one element var closed = []; // set of nodes already evaluated. var open = [ start ]; // set of nodes to evaluate (start with initial node) var iteration = function() { if (open.length==0) { // no more nodes. Fail. callback(null); return; } var current = open.shift(); // get highest priority node // update the browser with a table representation of the // node being evaluated $("#solution").html(stateToString(current)); // check solution returns true if current == goal if (checkSolution(current,goal)) { // reconstructPath just records the position of the hole // through each node var path= reconstructPath(start,current); callback(path); return; } closed.push(current); // get the set of neighbors. This is 3 or fewer nodes. // (nextStates is optimized to NOT turn directly back on itself) var neighbors = nextStates(current, goal); for (var i=0; i<neighbors.length; i++) { var n = neighbors[i]; // skip this one if we've already visited it if (closed.containsNode(n)) continue; // .g, .h, and .previous get assigned implicitly when // calculating neighbors. n.g is nothing more than // current.g+1 ; // add to the open list if (!open.containsNode(n)) { // slot into the list, in priority order (minimum f first) open.priorityPush(n); n.previous = current; } } if (stop) { callback(null); return; } setTimeout(iteration, 1); }; // kick off the first iteration iteration(); return null; }

Read the article

Weblogic is slow to start (11mins) under VM (VirtualBox and VMware)

- by Vladimir Dyuzhev

(SOLVED! BY FAKING SYSTEM RANDOM GENERATOR, SEE BELOW) I'm setting up a VM image for my dev/build team. Inside that VM a Weblogic domain should be running. I use Ububtu server distro, WLS 9.2MP3 + ALSB. Everything works OK, quite fast, but at the start time the WLS stops twice for a measurable amounts of time. Two stops in total amount to about 10 minutes delay. For tasks where deployment requires server restart it's very annoying. :-( Sleeping time is not constant, sometimes the server starts very fast, sometimes so-so, sometimes 10 minutes or more. Interesting that if I press Enter while looking at the stopped server, it wakes up much faster, sometimes after a few seconds. WLST (Weblogic Jython shell) is also hanging for quite a time when executed in VM. It doesn't react to Enter though. Here must be some developers who run WLS with a VM. I wonder if others have the same problem? Was someone able to solve it? Here's the server output (just for a case): Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04) Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode) Starting WLS with line: /shared2/beahome/jdk150_12/bin/java -client -Xmx256m -XX:MaxPermSize=128m -Xverify:none -da -Dplatform.home=/shared2/beahome/weblogic92 -Dwls.home=/shared2/beahome/weblogic92/server -Dwli.home=/shared2/beahome/weblogic92/integration -Dweblogic.management.discover=true -Dwl w.iterativeDev= -Dwlw.testConsole= -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/shared2/beahome/patch_weblogic923/profiles/default/sysext_ manifest_classpath -Dweblogic.management.username=admin -Dweblogic.management.password=wlsadmin -Dweblogic.Name=LOGMGR-admin -Djava.security .policy=/shared2/beahome/weblogic92/server/lib/weblogic.policy weblogic.Server <1-Apr-2010 12:47:22 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000395> <Following extensions directory contents added to the end of the classpath: /shared2/beahome/weblogic92/platform/lib/p13n/p13n-schemas.jar:/shared2/beahome/weblogic92/platform/lib/p13n/p13n_common.jar:/shared2/beahom e/weblogic92/platform/lib/p13n/p13n_system.jar:/shared2/beahome/weblogic92/platform/lib/wlp/netuix_common.jar:/shared2/beahome/weblogic92/pl atform/lib/wlp/netuix_schemas.jar:/shared2/beahome/weblogic92/platform/lib/wlp/netuix_system.jar:/shared2/beahome/weblogic92/platform/lib/wl p/wsrp-common.jar> <1-Apr-2010 12:47:22 o'clock PM GMT-05:00> <Info> <WebLogicServer> <BEA-000377> <Starting WebLogic Server with Java HotSpot(TM) Client VM Ve rsion 1.5.0_12-b04 from Sun Microsystems Inc.> <1-Apr-2010 12:47:23 o'clock PM GMT-05:00> <Info> <Management> <BEA-141107> <Version: WebLogic Server 9.2 MP3 Mon Mar 10 08:28:41 EDT 2008 1096261 > <1-Apr-2010 12:47:25 o'clock PM GMT-05:00> <Info> <WebLogicServer> <BEA-000215> <Loaded License : /shared2/beahome/license.bea> <1-Apr-2010 12:47:25 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING> <1-Apr-2010 12:47:25 o'clock PM GMT-05:00> <Info> <WorkManager> <BEA-002900> <Initializing self-tuning thread pool> <1-Apr-2010 12:47:25 o'clock PM GMT-05:00> <Notice> <Log Management> <BEA-170019> <The server log file /shared2/wldomains/beaadmd/LOGMGR/ser vers/LOGMGR-admin/logs/LOGMGR-admin.log is opened. All server side log events will be written to this file.> Here we have the first delay, up to 5 mins... <1-Apr-2010 12:53:21 o'clock PM GMT-05:00> <Notice> <Security> <BEA-090082> <Security initializing using security realm myrealm.> <1-Apr-2010 12:53:24 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STANDBY> <1-Apr-2010 12:53:24 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING> <1-Apr-2010 12:53:25 o'clock PM GMT-05:00> <Notice> <Log Management> <BEA-170027> <The server initialized the domain log broadcaster success fully. Log messages will now be broadcasted to the domain log.> <1-Apr-2010 12:53:25 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to ADMIN> <1-Apr-2010 12:53:25 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RESUMING> <1-Apr-2010 12:53:28 o'clock PM GMT-05:00> <Notice> <Security> <BEA-090171> <Loading the identity certificate and private key stored under t he alias adminuialias from the jks keystore file /shared2/wldomains/beaadmd/LOGMGR/CustomIdentity.jks.> And here is the second, again up to 5 mins. <1-Apr-2010 12:58:56 o'clock PM GMT-05:00> <Notice> <Security> <BEA-090169> <Loading trusted certificates from the jks keystore file /shared 2/wldomains/beaadmd/LOGMGR/CustomTrust.jks.> <1-Apr-2010 12:58:57 o'clock PM GMT-05:00> <Notice> <Server> <BEA-002613> <Channel "DefaultSecure" is now listening on 192.168.56.102:7002 f or protocols iiops, t3s, ldaps, https.> <1-Apr-2010 12:58:57 o'clock PM GMT-05:00> <Notice> <Server> <BEA-002613> <Channel "Default" is now listening on 192.168.56.102:8012 for pro tocols iiop, t3, ldap, http.> <1-Apr-2010 12:58:57 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000331> <Started WebLogic Admin Server "LOGMGR-admin" for domain " LOGMGR" running in Development Mode> <1-Apr-2010 12:58:57 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RUNNING> <1-Apr-2010 12:58:57 o'clock PM GMT-05:00> <Notice> <WebLogicServer> <BEA-000360> <Server started in RUNNING mode> UPDATE I think I've got the track: it must be the randon seed initialization. That may explain why generating keyboard events release the server. I've made the thread dump, and one thread is in runnable state, but waiting: "[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=1 tid=0x0a7b06e8 nid=0xeda runnable [0x728a500 0..0x728a6d80] at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:194) at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185) at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:202) - locked <0x7d928c78> (a java.lang.Object) at sun.security.provider.NativePRNG$RandomIO.access$300(NativePRNG.java:108) at sun.security.provider.NativePRNG.engineGenerateSeed(NativePRNG.java:102) at java.security.SecureRandom.generateSeed(SecureRandom.java:475) at weblogic.security.AbstractRandomData.ensureInittedAndSeeded(AbstractRandomData.java:83) SOLVED Weblogic uses SecureRandom to init security subsystem. SecureRandom by default uses /dev/urandom file. For some reason, reading this file under VM comes to halt quite often. Generating console events helps to create more randomness, and release the WLS. For the test purposes I have changed jre/lib/security/java.security file, property to securerandom.source=file:/tmp/big.random.file. Weblogic now starts in 15 seconds.

Read the article

Disable .htaccess from apache allowoverride none, still reads .htaccess files

- by John Magnolia

I have moved all of our .htaccess config into <Directory> blocks and set AllowOverride None in the default and default-ssl. Although after restarting apache it is still reading the .htaccess files. How can I completely turn off reading these files? Update of all files with "AllowOverride" /etc/apache2/mods-available/userdir.conf <IfModule mod_userdir.c> UserDir public_html UserDir disabled root <Directory /home/*/public_html> AllowOverride FileInfo AuthConfig Limit Indexes Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec <Limit GET POST OPTIONS> Order allow,deny Allow from all </Limit> <LimitExcept GET POST OPTIONS> Order deny,allow Deny from all </LimitExcept> </Directory> </IfModule> /etc/apache2/mods-available/alias.conf <IfModule alias_module> # # Aliases: Add here as many aliases as you need (with no limit). The format is # Alias fakename realname # # Note that if you include a trailing / on fakename then the server will # require it to be present in the URL. So "/icons" isn't aliased in this # example, only "/icons/". If the fakename is slash-terminated, then the # realname must also be slash terminated, and if the fakename omits the # trailing slash, the realname must also omit it. # # We include the /icons/ alias for FancyIndexed directory listings. If # you do not use FancyIndexing, you may comment this out. # Alias /icons/ "/usr/share/apache2/icons/" <Directory "/usr/share/apache2/icons"> Options Indexes MultiViews AllowOverride None Order allow,deny Allow from all </Directory> </IfModule> /etc/apache2/httpd.conf # # Directives to allow use of AWStats as a CGI # Alias /awstatsclasses "/usr/share/doc/awstats/examples/wwwroot/classes/" Alias /awstatscss "/usr/share/doc/awstats/examples/wwwroot/css/" Alias /awstatsicons "/usr/share/doc/awstats/examples/wwwroot/icon/" ScriptAlias /awstats/ "/usr/share/doc/awstats/examples/wwwroot/cgi-bin/" # # This is to permit URL access to scripts/files in AWStats directory. # <Directory "/usr/share/doc/awstats/examples/wwwroot"> Options None AllowOverride None Order allow,deny Allow from all </Directory> Alias /awstats-icon/ /usr/share/awstats/icon/ <Directory /usr/share/awstats/icon> Options None AllowOverride None Order allow,deny Allow from all </Directory> /etc/apache2/sites-available/default-ssl <IfModule mod_ssl.c> <VirtualHost _default_:443> ServerAdmin webmaster@localhost DocumentRoot /var/www <Directory /> Options FollowSymLinks AllowOverride None </Directory> <Directory /var/www/> Options Indexes FollowSymLinks MultiViews AllowOverride None </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/ssl_access.log combined # SSL Engine Switch: # Enable/Disable SSL for this virtual host. SSLEngine on # A self-signed (snakeoil) certificate can be created by installing # the ssl-cert package. See # /usr/share/doc/apache2.2-common/README.Debian.gz for more info. # If both key and certificate are stored in the same file, only the # SSLCertificateFile directive is needed. SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key # Server Certificate Chain: # Point SSLCertificateChainFile at a file containing the # concatenation of PEM encoded CA certificates which form the # certificate chain for the server certificate. Alternatively # the referenced file can be the same as SSLCertificateFile # when the CA certificates are directly appended to the server # certificate for convinience. #SSLCertificateChainFile /etc/apache2/ssl.crt/server-ca.crt # Certificate Authority (CA): # Set the CA certificate verification path where to find CA # certificates for client authentication or alternatively one # huge file containing all of them (file must be PEM encoded) # Note: Inside SSLCACertificatePath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCACertificatePath /etc/ssl/certs/ #SSLCACertificateFile /etc/apache2/ssl.crt/ca-bundle.crt # Certificate Revocation Lists (CRL): # Set the CA revocation path where to find CA CRLs for client # authentication or alternatively one huge file containing all # of them (file must be PEM encoded) # Note: Inside SSLCARevocationPath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCARevocationPath /etc/apache2/ssl.crl/ #SSLCARevocationFile /etc/apache2/ssl.crl/ca-bundle.crl # Client Authentication (Type): # Client certificate verification type and depth. Types are # none, optional, require and optional_no_ca. Depth is a # number which specifies how deeply to verify the certificate # issuer chain before deciding the certificate is not valid. #SSLVerifyClient require #SSLVerifyDepth 10 # Access Control: # With SSLRequire you can do per-directory access control based # on arbitrary complex boolean expressions containing server # variable checks and other lookup directives. The syntax is a # mixture between C and Perl. See the mod_ssl documentation # for more details. #<Location /> #SSLRequire ( %{SSL_CIPHER} !~ m/^(EXP|NULL)/ \ # and %{SSL_CLIENT_S_DN_O} eq "Snake Oil, Ltd." \ # and %{SSL_CLIENT_S_DN_OU} in {"Staff", "CA", "Dev"} \ # and %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 \ # and %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 20 ) \ # or %{REMOTE_ADDR} =~ m/^192\.76\.162\.[0-9]+$/ #</Location> # SSL Engine Options: # Set various options for the SSL engine. # o FakeBasicAuth: # Translate the client X.509 into a Basic Authorisation. This means that # the standard Auth/DBMAuth methods can be used for access control. The # user name is the `one line' version of the client's X.509 certificate. # Note that no password is obtained from the user. Every entry in the user # file needs this password: `xxj31ZMTZzkVA'. # o ExportCertData: # This exports two additional environment variables: SSL_CLIENT_CERT and # SSL_SERVER_CERT. These contain the PEM-encoded certificates of the # server (always existing) and the client (only existing when client # authentication is used). This can be used to import the certificates # into CGI scripts. # o StdEnvVars: # This exports the standard SSL/TLS related `SSL_*' environment variables. # Per default this exportation is switched off for performance reasons, # because the extraction step is an expensive operation and is usually # useless for serving static content. So one usually enables the # exportation for CGI and SSI requests only. # o StrictRequire: # This denies access when "SSLRequireSSL" or "SSLRequire" applied even # under a "Satisfy any" situation, i.e. when it applies access is denied # and no other module can change it. # o OptRenegotiate: # This enables optimized SSL connection renegotiation handling when SSL # directives are used in per-directory context. #SSLOptions +FakeBasicAuth +ExportCertData +StrictRequire <FilesMatch "\.(cgi|shtml|phtml|php)$"> SSLOptions +StdEnvVars </FilesMatch> <Directory /usr/lib/cgi-bin> SSLOptions +StdEnvVars </Directory> # SSL Protocol Adjustments: # The safe and default but still SSL/TLS standard compliant shutdown # approach is that mod_ssl sends the close notify alert but doesn't wait for # the close notify alert from client. When you need a different shutdown # approach you can use one of the following variables: # o ssl-unclean-shutdown: # This forces an unclean shutdown when the connection is closed, i.e. no # SSL close notify alert is send or allowed to received. This violates # the SSL/TLS standard but is needed for some brain-dead browsers. Use # this when you receive I/O errors because of the standard approach where # mod_ssl sends the close notify alert. # o ssl-accurate-shutdown: # This forces an accurate shutdown when the connection is closed, i.e. a # SSL close notify alert is send and mod_ssl waits for the close notify # alert of the client. This is 100% SSL/TLS standard compliant, but in # practice often causes hanging connections with brain-dead browsers. Use # this only for browsers where you know that their SSL implementation # works correctly. # Notice: Most problems of broken clients are also related to the HTTP # keep-alive facility, so you usually additionally want to disable # keep-alive for those clients, too. Use variable "nokeepalive" for this. # Similarly, one has to force some clients to use HTTP/1.0 to workaround # their broken HTTP/1.1 implementation. Use variables "downgrade-1.0" and # "force-response-1.0" for this. BrowserMatch "MSIE [2-6]" \ nokeepalive ssl-unclean-shutdown \ downgrade-1.0 force-response-1.0 # MSIE 7 and newer should be able to use keepalive BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown </VirtualHost> </IfModule> /etc/apache2/sites-available/default <VirtualHost *:80> ServerAdmin webmaster@localhost DocumentRoot /var/www <Directory /> Options FollowSymLinks AllowOverride None </Directory> <Directory /var/www/> Options -Indexes FollowSymLinks MultiViews AllowOverride None Order allow,deny allow from all </Directory> ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/ <Directory "/usr/lib/cgi-bin"> AllowOverride None Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch Order allow,deny Allow from all </Directory> Alias /delboy /usr/share/phpmyadmin <Directory /usr/share/phpmyadmin> # Restrict phpmyadmin access Order Deny,Allow Allow from all </Directory> ErrorLog ${APACHE_LOG_DIR}/error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog ${APACHE_LOG_DIR}/access.log combined Alias /doc/ "/usr/share/doc/" <Directory "/usr/share/doc/"> Options Indexes MultiViews FollowSymLinks AllowOverride None Order deny,allow Deny from all Allow from 127.0.0.0/255.0.0.0 ::1/128 </Directory> </VirtualHost> /etc/apache2/conf.d/security # # Disable access to the entire file system except for the directories that # are explicitly allowed later. # # This currently breaks the configurations that come with some web application # Debian packages. # #<Directory /> # AllowOverride None # Order Deny,Allow # Deny from all #</Directory> # Changing the following options will not really affect the security of the # server, but might make attacks slightly more difficult in some cases. # # ServerTokens # This directive configures what you return as the Server HTTP response # Header. The default is 'Full' which sends information about the OS-Type # and compiled in modules. # Set to one of: Full | OS | Minimal | Minor | Major | Prod # where Full conveys the most information, and Prod the least. # #ServerTokens Minimal ServerTokens OS #ServerTokens Full # # Optionally add a line containing the server version and virtual host # name to server-generated pages (internal error documents, FTP directory # listings, mod_status and mod_info output etc., but not CGI generated # documents or custom error documents). # Set to "EMail" to also include a mailto: link to the ServerAdmin. # Set to one of: On | Off | EMail # #ServerSignature Off ServerSignature On # # Allow TRACE method # # Set to "extended" to also reflect the request body (only for testing and # diagnostic purposes). # # Set to one of: On | Off | extended # TraceEnable Off #TraceEnable On /etc/apache2/apache2.conf # # Based upon the NCSA server configuration files originally by Rob McCool. # # This is the main Apache server configuration file. It contains the # configuration directives that give the server its instructions. # See http://httpd.apache.org/docs/2.2/ for detailed information about # the directives. # # Do NOT simply read the instructions in here without understanding # what they do. They're here only as hints or reminders. If you are unsure # consult the online docs. You have been warned. # # The configuration directives are grouped into three basic sections: # 1. Directives that control the operation of the Apache server process as a # whole (the 'global environment'). # 2. Directives that define the parameters of the 'main' or 'default' server, # which responds to requests that aren't handled by a virtual host. # These directives also provide default values for the settings # of all virtual hosts. # 3. Settings for virtual hosts, which allow Web requests to be sent to # different IP addresses or hostnames and have them handled by the # same Apache server process. # # Configuration and logfile names: If the filenames you specify for many # of the server's control files begin with "/" (or "drive:/" for Win32), the # server will use that explicit path. If the filenames do *not* begin # with "/", the value of ServerRoot is prepended -- so "foo.log" # with ServerRoot set to "/etc/apache2" will be interpreted by the # server as "/etc/apache2/foo.log". # ### Section 1: Global Environment # # The directives in this section affect the overall operation of Apache, # such as the number of concurrent requests it can handle or where it # can find its configuration files. # # # ServerRoot: The top of the directory tree under which the server's # configuration, error, and log files are kept. # # NOTE! If you intend to place this on an NFS (or otherwise network) # mounted filesystem then please read the LockFile documentation (available # at <URL:http://httpd.apache.org/docs/2.2/mod/mpm_common.html#lockfile>); # you will save yourself a lot of trouble. # # Do NOT add a slash at the end of the directory path. # #ServerRoot "/etc/apache2" # # The accept serialization lock file MUST BE STORED ON A LOCAL DISK. # LockFile ${APACHE_LOCK_DIR}/accept.lock # # PidFile: The file in which the server should record its process # identification number when it starts. # This needs to be set in /etc/apache2/envvars # PidFile ${APACHE_PID_FILE} # # Timeout: The number of seconds before receives and sends time out. # Timeout 300 # # KeepAlive: Whether or not to allow persistent connections (more than # one request per connection). Set to "Off" to deactivate. # KeepAlive On # # MaxKeepAliveRequests: The maximum number of requests to allow # during a persistent connection. Set to 0 to allow an unlimited amount. # We recommend you leave this number high, for maximum performance. # MaxKeepAliveRequests 100 # # KeepAliveTimeout: Number of seconds to wait for the next request from the # same client on the same connection. # KeepAliveTimeout 4 ## ## Server-Pool Size Regulation (MPM specific) ## # prefork MPM # StartServers: number of server processes to start # MinSpareServers: minimum number of server processes which are kept spare # MaxSpareServers: maximum number of server processes which are kept spare # MaxClients: maximum number of server processes allowed to start # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_prefork_module> StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 150 MaxRequestsPerChild 500 </IfModule> # worker MPM # StartServers: initial number of server processes to start # MaxClients: maximum number of simultaneous client connections # MinSpareThreads: minimum number of worker threads which are kept spare # MaxSpareThreads: maximum number of worker threads which are kept spare # ThreadLimit: ThreadsPerChild can be changed to this maximum value during a # graceful restart. ThreadLimit can only be changed by stopping # and starting Apache. # ThreadsPerChild: constant number of worker threads in each server process # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_worker_module> StartServers 2 MinSpareThreads 25 MaxSpareThreads 75 ThreadLimit 64 ThreadsPerChild 25 MaxClients 150 MaxRequestsPerChild 0 </IfModule> # event MPM # StartServers: initial number of server processes to start # MaxClients: maximum number of simultaneous client connections # MinSpareThreads: minimum number of worker threads which are kept spare # MaxSpareThreads: maximum number of worker threads which are kept spare # ThreadsPerChild: constant number of worker threads in each server process # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_event_module> StartServers 2 MaxClients 150 MinSpareThreads 25 MaxSpareThreads 75 ThreadLimit 64 ThreadsPerChild 25 MaxRequestsPerChild 0 </IfModule> # These need to be set in /etc/apache2/envvars User ${APACHE_RUN_USER} Group ${APACHE_RUN_GROUP} # # AccessFileName: The name of the file to look for in each directory # for additional configuration directives. See also the AllowOverride # directive. # AccessFileName .htaccess # # The following lines prevent .htaccess and .htpasswd files from being # viewed by Web clients. # <Files ~ "^\.ht"> Order allow,deny Deny from all Satisfy all </Files> # # DefaultType is the default MIME type the server will use for a document # if it cannot otherwise determine one, such as from filename extensions. # If your server contains mostly text or HTML documents, "text/plain" is # a good value. If most of your content is binary, such as applications # or images, you may want to use "application/octet-stream" instead to # keep browsers from trying to display binary files as though they are # text. # DefaultType text/plain # # HostnameLookups: Log the names of clients or just their IP addresses # e.g., www.apache.org (on) or 204.62.129.132 (off). # The default is off because it'd be overall better for the net if people # had to knowingly turn this feature on, since enabling it means that # each client request will result in AT LEAST one lookup request to the # nameserver. # HostnameLookups Off # ErrorLog: The location of the error log file. # If you do not specify an ErrorLog directive within a <VirtualHost> # container, error messages relating to that virtual host will be # logged here. If you *do* define an error logfile for a <VirtualHost> # container, that host's errors will be logged there and not here. # ErrorLog ${APACHE_LOG_DIR}/error.log # # LogLevel: Control the number of messages logged to the error_log. # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. # LogLevel warn # Include module configuration: Include mods-enabled/*.load Include mods-enabled/*.conf # Include all the user configurations: Include httpd.conf # Include ports listing Include ports.conf # # The following directives define some format nicknames for use with # a CustomLog directive (see below). # If you are behind a reverse proxy, you might want to change %h into %{X-Forwarded-For}i # LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined LogFormat "%h %l %u %t \"%r\" %>s %O" common LogFormat "%{Referer}i -> %U" referer LogFormat "%{User-agent}i" agent # Include of directories ignores editors' and dpkg's backup files, # see README.Debian for details. # Include generic snippets of statements Include conf.d/ # Include the virtual host configurations: Include sites-enabled/

Read the article

c# Truncate HTML safely for article summary

- by WickedW

Hi All, Does anyone have a c# variation of this? This is so I can take some html and display it without breaking as a summary lead in to an article? http://stackoverflow.com/questions/1193500/php-truncate-html-ignoring-tags Save me from reinventing the wheel! Thank you very much ---------- edit ------------------ Sorry, new here, and your right, should have phrased the question better, heres a bit more info I wish to take a html string and truncate it to a set number of words (or even char length) so I can then show the start of it as a summary (which then leads to the main article). I wish to preserve the html so I can show the links etc in preview. The main issue I have to solve is the fact that we may well end up with unclosed html tags if we truncate in the middle of 1 or more tags! The idea I have for solution is to a) truncate the html to N words (words better but chars ok) first (be sure not to stop in the middle of a tag and truncate a require attribute) b) work through the opened html tags in this truncated string (maybe stick them on stack as I go?) c) then work through the closing tags and ensure they match the ones on stack as I pop them off? d) if any open tags left on stack after this, then write them to end of truncated string and html should be good to go!!!! -- edit 12112009 Here is what I have bumbled together so far as a unittest file in VS2008, this 'may' help someone in future My hack attempts based on Jan code are at top for char version + word version (DISCLAIMER: this is dirty rough code!! on my part) I assume working with 'well-formed' HTML in all cases (but not necessarily a full document with a root node as per XML version) Abels XML version is at bottom, but not yet got round to fully getting tests to run on this yet (plus need to understand the code) ... I will update when I get chance to refine having trouble with posting code? is there no upload facility on stack? Thanks for all comments :) using System; using System.Collections.Generic; using System.Text.RegularExpressions; using System.Xml; using System.Xml.XPath; using Microsoft.VisualStudio.TestTools.UnitTesting; namespace PINET40TestProject { [TestClass] public class UtilityUnitTest { public static string TruncateHTMLSafeishChar(string text, int charCount) { bool inTag = false; int cntr = 0; int cntrContent = 0; // loop through html, counting only viewable content foreach (Char c in text) { if (cntrContent == charCount) break; cntr++; if (c == '<') { inTag = true; continue; } if (c == '>') { inTag = false; continue; } if (!inTag) cntrContent++; } string substr = text.Substring(0, cntr); //search for nonclosed tags MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr); MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr); // create stack Stack<string> opentagsStack = new Stack<string>(); Stack<string> closedtagsStack = new Stack<string>(); // to be honest, this seemed like a good idea then I got lost along the way // so logic is probably hanging by a thread!! foreach (Match tag in openedTags) { string openedtag = tag.Value.Substring(1, tag.Value.Length - 2); // strip any attributes, sure we can use regex for this! if (openedtag.IndexOf(" ") >= 0) { openedtag = openedtag.Substring(0, openedtag.IndexOf(" ")); } // ignore brs as self-closed if (openedtag.Trim() != "br") { opentagsStack.Push(openedtag); } } foreach (Match tag in closedTags) { string closedtag = tag.Value.Substring(2, tag.Value.Length - 3); closedtagsStack.Push(closedtag); } if (closedtagsStack.Count < opentagsStack.Count) { while (opentagsStack.Count > 0) { string tagstr = opentagsStack.Pop(); if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek()) { substr += "</" + tagstr + ">"; } else { closedtagsStack.Pop(); } } } return substr; } public static string TruncateHTMLSafeishWord(string text, int wordCount) { bool inTag = false; int cntr = 0; int cntrWords = 0; Char lastc = ' '; // loop through html, counting only viewable content foreach (Char c in text) { if (cntrWords == wordCount) break; cntr++; if (c == '<') { inTag = true; continue; } if (c == '>') { inTag = false; continue; } if (!inTag) { // do not count double spaces, and a space not in a tag counts as a word if (c == 32 && lastc != 32) cntrWords++; } } string substr = text.Substring(0, cntr) + " ..."; //search for nonclosed tags MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr); MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr); // create stack Stack<string> opentagsStack = new Stack<string>(); Stack<string> closedtagsStack = new Stack<string>(); foreach (Match tag in openedTags) { string openedtag = tag.Value.Substring(1, tag.Value.Length - 2); // strip any attributes, sure we can use regex for this! if (openedtag.IndexOf(" ") >= 0) { openedtag = openedtag.Substring(0, openedtag.IndexOf(" ")); } // ignore brs as self-closed if (openedtag.Trim() != "br") { opentagsStack.Push(openedtag); } } foreach (Match tag in closedTags) { string closedtag = tag.Value.Substring(2, tag.Value.Length - 3); closedtagsStack.Push(closedtag); } if (closedtagsStack.Count < opentagsStack.Count) { while (opentagsStack.Count > 0) { string tagstr = opentagsStack.Pop(); if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek()) { substr += "</" + tagstr + ">"; } else { closedtagsStack.Pop(); } } } return substr; } public static string TruncateHTMLSafeishCharXML(string text, int charCount) { // your data, probably comes from somewhere, or as params to a methodint XmlDocument xml = new XmlDocument(); xml.LoadXml(text); // create a navigator, this is our primary tool XPathNavigator navigator = xml.CreateNavigator(); XPathNavigator breakPoint = null; // find the text node we need: while (navigator.MoveToFollowing(XPathNodeType.Text)) { string lastText = navigator.Value.Substring(0, Math.Min(charCount, navigator.Value.Length)); charCount -= navigator.Value.Length; if (charCount <= 0) { // truncate the last text. Here goes your "search word boundary" code: navigator.SetValue(lastText); breakPoint = navigator.Clone(); break; } } // first remove text nodes, because Microsoft unfortunately merges them without asking while (navigator.MoveToFollowing(XPathNodeType.Text)) { if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After) { navigator.DeleteSelf(); } } // moves to parent, then move the rest navigator.MoveTo(breakPoint); while (navigator.MoveToFollowing(XPathNodeType.Element)) { if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After) { navigator.DeleteSelf(); } } // moves to parent // then remove *all* empty nodes to clean up (not necessary): // TODO, add empty elements like <br />, <img /> as exclusion navigator.MoveToRoot(); while (navigator.MoveToFollowing(XPathNodeType.Element)) { while (!navigator.HasChildren && (navigator.Value ?? "").Trim() == "") { navigator.DeleteSelf(); } } // moves to parent navigator.MoveToRoot(); return navigator.InnerXml; } [TestMethod] public void TestTruncateHTMLSafeish() { // Case where we just make it to start of HREF (so effectively an empty link) // 'simple' nested none attributed tags Assert.AreEqual(@"<h1>1234</h1><b><i>56789</i>012</b>", TruncateHTMLSafeishChar( @"<h1>1234</h1><b><i>56789</i>012345</b>", 12)); // In middle of a! Assert.AreEqual(@"<h1>1234</h1><a href=""testurl""><b>567</b></a>", TruncateHTMLSafeishChar( @"<h1>1234</h1><a href=""testurl""><b>5678</b></a><i><strong>some italic nested in string</strong></i>", 7)); // more Assert.AreEqual(@"<div><b><i><strong>1</strong></i></b></div>", TruncateHTMLSafeishChar( @"<div><b><i><strong>12</strong></i></b></div>", 1)); // br Assert.AreEqual(@"<h1>1 3 5</h1><br />6", TruncateHTMLSafeishChar( @"<h1>1 3 5</h1><br />678<br />", 6)); } [TestMethod] public void TestTruncateHTMLSafeishWord() { // zero case Assert.AreEqual(@" ...", TruncateHTMLSafeishWord( @"", 5)); // 'simple' nested none attributed tags Assert.AreEqual(@"<h1>one two <br /></h1><b><i>three ...</i></b>", TruncateHTMLSafeishWord( @"<h1>one two <br /></h1><b><i>three </i>four</b>", 3), "we have added ' ...' to end of summary"); // In middle of a! Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>", TruncateHTMLSafeishWord( @"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i>", 4)); // start of h1 Assert.AreEqual(@"<h1>one two three ...</h1>", TruncateHTMLSafeishWord( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3)); // more than words available Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...", TruncateHTMLSafeishWord( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99)); } [TestMethod] public void TestTruncateHTMLSafeishWordXML() { // zero case Assert.AreEqual(@" ...", TruncateHTMLSafeishWord( @"", 5)); // 'simple' nested none attributed tags string output = TruncateHTMLSafeishCharXML( @"<body><h1>one two </h1><b><i>three </i>four</b></body>", 13); Assert.AreEqual(@"<body>\r\n <h1>one two </h1>\r\n <b>\r\n <i>three</i>\r\n </b>\r\n</body>", output, "XML version, no ... yet and addeds '\r\n + spaces?' to format document"); // In middle of a! Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>", TruncateHTMLSafeishCharXML( @"<body><h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i></body>", 4)); // start of h1 Assert.AreEqual(@"<h1>one two three ...</h1>", TruncateHTMLSafeishCharXML( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3)); // more than words available Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...", TruncateHTMLSafeishCharXML( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99)); } } }

Developer IT