Search Results

Search found 1421 results on 57 pages for 'distinct'.

Page 57/57 | < Previous Page | 53 54 55 56 57 

  • Oracle Coherence, Split-Brain and Recovery Protocols In Detail

    - by Ricardo Ferreira
    This article provides a high level conceptual overview of Split-Brain scenarios in distributed systems. It will focus on a specific example of cluster communication failure and recovery in Oracle Coherence. This includes a discussion on the witness protocol (used to remove failed cluster members) and the panic protocol (used to resolve Split-Brain scenarios). Note that the removal of cluster members does not necessarily indicate a Split-Brain condition. Oracle Coherence does not (and cannot) detect a Split-Brain as it occurs, the condition is only detected when cluster members that previously lost contact with each other regain contact. Cluster Topology and Configuration In order to create an good didactic for the article, let's assume a cluster topology and configuration. In this example we have a six member cluster, consisting of one JVM on each physical machine. The member IDs are as follows: Member ID  IP Address  1  10.149.155.76  2  10.149.155.77  3  10.149.155.236  4  10.149.155.75  5  10.149.155.79  6  10.149.155.78 Members 1, 2, and 3 are connected to a switch, and members 4, 5, and 6 are connected to a second switch. There is a link between the two switches, which provides network connectivity between all of the machines. Member 1 is the first member to join this cluster, thus making it the senior member. Member 6 is the last member to join this cluster. Here is a log snippet from Member 6 showing the complete member set: 2010-02-26 15:27:57.390/3.062 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=main, member=6): Started DefaultCacheServer... SafeCluster: Name=cluster:0xDDEB Group{Address=224.3.5.3, Port=35465, TTL=4} MasterMemberSet ( ThisMember=Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) OldestMember=Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) ActualMemberSet=MemberSet(Size=6, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer) Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ) RecycleMillis=120000 RecycleSet=MemberSet(Size=0, BitSetCount=0 ) ) At approximately 15:30, the connection between the two switches is severed: Thirty seconds later (the default packet timeout in development mode) the logs indicate communication failures across the cluster. In this example, the communication failure was caused by a network failure. In a production setting, this type of communication failure can have many root causes, including (but not limited to) network failures, excessive GC, high CPU utilization, swapping/virtual memory, and exceeding maximum network bandwidth. In addition, this type of failure is not necessarily indicative of a split brain. Any communication failure will be logged in this fashion. Member 2 logs a communication failure with Member 5: 2010-02-26 15:30:32.638/196.928 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) ) The Coherence clustering protocol (TCMP) is a reliable transport mechanism built on UDP. In order for the protocol to be reliable, it requires an acknowledgement (ACK) for each packet delivered. If a packet fails to be acknowledged within the configured timeout period, the Coherence cluster member will log a packet timeout (as seen in the log message above). When this occurs, the cluster member will consult with other members to determine who is at fault for the communication failure. If the witness members agree that the suspect member is at fault, the suspect is removed from the cluster. If the witnesses unanimously disagree, the accuser is removed. This process is known as the witness protocol. Since Member 2 cannot communicate with Member 5, it selects two witnesses (Members 1 and 4) to determine if the communication issue is with Member 5 or with itself (Member 2). However, Member 4 is on the switch that is no longer accessible by Members 1, 2 and 3; thus a packet timeout for member 4 is recorded as well: 2010-02-26 15:30:35.648/199.938 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ) Member 1 has the ability to confirm the departure of member 4, however Member 6 cannot as it is also inaccessible. At the same time, Member 3 sends a request to remove Member 6, which is followed by a report from Member 3 indicating that Member 6 has departed the cluster: 2010-02-26 15:30:35.706/199.996 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=2): MemberLeft request for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) 2010-02-26 15:30:35.709/199.999 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=2): MemberLeft notification for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) The log for Member 3 determines how Member 6 departed the cluster: 2010-02-26 15:30:35.161/191.694 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=3): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer) ) 2010-02-26 15:30:35.165/191.698 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=3): Member departure confirmed by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer) ); removing Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) In this case, Member 3 happened to select two witnesses that it still had connectivity with (Members 1 and 2) thus resulting in a simple decision to remove Member 6. Given the departure of Member 6, Member 2 is left with a single witness to confirm the departure of Member 4: 2010-02-26 15:30:35.713/200.003 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=2): Member departure confirmed by MemberSet(Size=1, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) ); removing Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) In the meantime, Member 4 logs a missing heartbeat from the senior member. This message is also logged on Members 5 and 6. 2010-02-26 15:30:07.906/150.453 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group. Next, Member 4 logs a TcpRing failure with Member 2, thus resulting in the termination of Member 2: 2010-02-26 15:30:21.421/163.968 Oracle Coherence GE 3.5.3/465p2 <D4> (thread=Cluster, member=4): TcpRing: Number of socket exceptions exceeded maximum; last was "java.net.SocketTimeoutException: connect timed out"; removing the member: 2 For quick process termination detection, Oracle Coherence utilizes a feature called TcpRing which is a sparse collection of TCP/IP-based connections between different members in the cluster. Each member in the cluster is connected to at least one other member, which (if at all possible) is running on a different physical box. This connection is not used for any data transfer, only heartbeat communications are sent once a second per each link. If a certain number of exceptions are thrown while trying to re-establish a connection, the member throwing the exceptions is removed from the cluster. Member 5 logs a packet timeout with Member 3 and cites witnesses Members 4 and 6: 2010-02-26 15:30:29.791/165.037 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=5): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ) 2010-02-26 15:30:29.798/165.044 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=5): Member departure confirmed by MemberSet(Size=2, BitSetCount=2 Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ); removing Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) Eventually we are left with two distinct clusters consisting of Members 1, 2, 3 and Members 4, 5, 6, respectively. In the latter cluster, Member 4 is promoted to senior member. The connection between the two switches is restored at 15:33. Upon the restoration of the connection, the cluster members immediately receive cluster heartbeats from the two senior members. In the case of Members 1, 2, and 3, the following is logged: 2010-02-26 15:33:14.970/369.066 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=Cluster, member=1): The member formerly known as Member(Id=4, Timestamp=2010-02-26 15:30:35.341, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored. Likewise for Members 4, 5, and 6: 2010-02-26 15:33:14.343/336.890 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=Cluster, member=4): The member formerly known as Member(Id=1, Timestamp=2010-02-26 15:30:31.64, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored. This message indicates that a senior heartbeat is being received from members that were previously removed from the cluster, in other words, something that should not be possible. For this reason, the recipients of these messages will initially ignore them. After several iterations of these messages, the existence of multiple clusters is acknowledged, thus triggering the panic protocol to reconcile this situation. When the presence of more than one cluster (i.e. Split-Brain) is detected by a Coherence member, the panic protocol is invoked in order to resolve the conflicting clusters and consolidate into a single cluster. The protocol consists of the removal of smaller clusters until there is one cluster remaining. In the case of equal size clusters, the one with the older Senior Member will survive. Member 1, being the oldest member, initiates the protocol: 2010-02-26 15:33:45.970/400.066 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=Cluster, member=1): An existence of a cluster island with senior Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) containing 3 nodes have been detected. Since this Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) is the senior of an older cluster island, the panic protocol is being activated to stop the other island's senior and all junior nodes that belong to it. Member 3 receives the panic: 2010-02-26 15:33:45.803/382.336 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=3): Received panic from senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) caused by Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member 4, the senior member of the younger cluster, receives the kill message from Member 3: 2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service. In turn, Member 4 requests the departure of its junior members 5 and 6: 2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service. 2010-02-26 15:33:43.343/349.015 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=6): Received a Kill message from a valid Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer); stopping cluster service. Once Members 4, 5, and 6 restart, they rejoin the original cluster with senior member 1. The log below is from Member 4. Note that it receives a different member id when it rejoins the cluster. 2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service. 2010-02-26 15:33:46.921/369.468 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Service Cluster left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Invocation:InvocationService, member=4): Service InvocationService left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=OptimisticCache, member=4): Service OptimisticCache left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=ReplicatedCache, member=4): Service ReplicatedCache left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=DistributedCache, member=4): Service DistributedCache left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Invocation:Management, member=4): Service Management left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service Management with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service DistributedCache with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service ReplicatedCache with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service OptimisticCache with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service InvocationService with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member(Id=6, Timestamp=2010-02-26 15:33:47.046, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) left Cluster with senior member 4 2010-02-26 15:33:49.218/371.765 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=main, member=n/a): Restarting cluster 2010-02-26 15:33:49.421/371.968 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a 2010-02-26 15:33:49.625/372.172 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=n/a): This Member(Id=5, Timestamp=2010-02-26 15:33:50.499, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) joined cluster "cluster:0xDDEB" with senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) Cool isn't it?

    Read the article

  • A Taxonomy of Numerical Methods v1

    - by JoshReuben
    Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms     Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

    Read the article

  • Toorcon 15 (2013)

    - by danx
    The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

    Read the article

  • Red Gate Coder interviews: Robin Hellen

    - by Michael Williamson
    Robin Hellen is a test engineer here at Red Gate, and is also the latest coder I’ve interviewed. We chatted about debugging code, the roles of software engineers and testers, and why Vala is currently his favourite programming language. How did you get started with programming?It started when I was about six. My dad’s a professional programmer, and he gave me and my sister one of his old computers and taught us a bit about programming. It was an old Amiga 500 with a variant of BASIC. I don’t think I ever successfully completed anything! It was just faffing around. I didn’t really get anywhere with it.But then presumably you did get somewhere with it at some point.At some point. The PC emerged as the dominant platform, and I learnt a bit of Visual Basic. I didn’t really do much, just a couple of quick hacky things. A bit of demo animation. Took me a long time to get anywhere with programming, really.When did you feel like you did start to get somewhere?I think it was when I started doing things for someone else, which was my sister’s final year of university project. She called up my dad two days before she was due to submit, saying “We need something to display a graph!”. Dad says, “I’m too busy, go talk to your brother”. So I hacked up this ugly piece of code, sent it off and they won a prize for that project. Apparently, the graph, the bit that I wrote, was the reason they won a prize! That was when I first felt that I’d actually done something that was worthwhile. That was my first real bit of code, and the ugliest code I’ve ever written. It’s basically an array of pre-drawn line elements that I shifted round the screen to draw a very spikey graph.When did you decide that programming might actually be something that you wanted to do as a career?It’s not really a decision I took, I always wanted to do something with computers. And I had to take a gap year for uni, so I was looking for twelve month internships. I applied to Red Gate, and they gave me a job as a tester. And that’s where I really started having to write code well. To a better standard that I had been up to that point.How did you find coming to Red Gate and working with other coders?I thought it was really nice. I learnt so much just from other people around. I think one of the things that’s really great is that people are just willing to help you learn. Instead of “Don’t you know that, you’re so stupid”, it’s “You can just do it this way”.If you could go back to the very start of that internship, is there something that you would tell yourself?Write shorter code. I have a tendency to write massive, many-thousand line files that I break out of right at the end. And then half-way through a project I’m doing something, I think “Where did I write that bit that does that thing?”, and it’s almost impossible to find. I wrote some horrendous code when I started. Just that principle, just keep things short. Even if looks a bit crazy to be jumping around all over the place all of the time, it’s actually a lot more understandable.And how do you hold yourself to that?Generally, if a function’s going off my screen, it’s probably too long. That’s what I tell myself, and within the team here we have code reviews, so the guys I’m with at the moment are pretty good at pulling me up on, “Doesn’t that look like it’s getting a bit long?”. It’s more just the subjective standard of readability than anything.So you’re an advocate of code review?Yes, definitely. Both to spot errors that you might have made, and to improve your knowledge. The person you’re reviewing will say “Oh, you could have done it that way”. That’s how we learn, by talking to others, and also just sharing knowledge of how your project works around the team, or even outside the team. Definitely a very firm advocate of code reviews.Do you think there’s more we could do with them?I don’t know. We’re struggling with how to add them as part of the process without it becoming too cumbersome. We’ve experimented with a few different ways, and we’ve not found anything that just works.To get more into the nitty gritty: how do you like to debug code?The first thing is to do it in my head. I’ll actually think what piece of code is likely to have caused that error, and take a quick look at it, just to see if there’s anything glaringly obvious there. The next thing I’ll probably do is throw in print statements, or throw some exceptions from various points, just to check: is it going through the code path I expect it to? A last resort is to actually debug code using a debugger.Why is the debugger the last resort?Probably because of the environments I learnt programming in. VB and early BASIC didn’t have much of a debugger, the only way to find out what your program was doing was to add print statements. Also, because a lot of the stuff I tend to work with is non-interactive, if it’s something that takes a long time to run, I can throw in the print statements, set a run off, go and do something else, and look at it again later, rather than trying to remember what happened at that point when I was debugging through it. So it also gives me the record of what happens. I hate just sitting there pressing F5, F5, continually. If you’re having to find out what your code is doing at each line, you’ve probably got a very wrong mental model of what your code’s doing, and you can find that out just as easily by inspecting a couple of values through the print statements.If I were on some codebase that you were also working on, what should I do to make it as easy as possible to understand?I’d say short and well-named methods. The one thing I like to do when I’m looking at code is to find out where a value comes from, and the more layers of indirection there are, particularly DI [dependency injection] frameworks, the harder it is to find out where something’s come from. I really hate that. I want to know if the value come from the user here or is a constant here, and if I can’t find that out, that makes code very hard to understand for me.As a tester, where do you think the split should lie between software engineers and testers?I think the split is less on areas of the code you write and more what you’re designing and creating. The developers put a structure on the code, while my major role is to say which tests we should have, whether we should test that, or it’s not worth testing that because it’s a tiny function in code that nobody’s ever actually going to see. So it’s not a split in the code, it’s a split in what you’re thinking about. Saying what code we should write, but alternatively what code we should take out.In your experience, do the software engineers tend to do much testing themselves?They tend to control the lowest layer of tests. And, depending on how the balance of people is in the team, they might write some of the higher levels of test. Or that might go to the testers. I’m the only tester on my team with three other developers, so they’ll be writing quite a lot of the actual test code, with input from me as to whether we should test that functionality, whereas on other teams, where it’s been more equal numbers, the testers have written pretty much all of the high level tests, just because that’s the best use of resource.If you could shuffle resources around however you liked, do you think that the developers should be writing those high-level tests?I think they should be writing them occasionally. It helps when they have an understanding of how testing code works and possibly what assumptions we’ve made in tests, and they can say “actually, it doesn’t work like that under the hood so you’ve missed this whole area”. It’s one of those agile things that everyone on the team should be at least comfortable doing the various jobs. So if the developers can write test code then I think that’s a very good thing.So you think testers should be able to write production code?Yes, although given most testers skills at coding, I wouldn’t advise it too much! I have written a few things, and I did make a few changes that have actually gone into our production code base. They’re not necessarily running every time but they are there. I think having that mix of skill sets is really useful. In some ways we’re using our own product to test itself, so being able to make those changes where it’s not working saves me a round-trip through the developers. It can be really annoying if the developers have no time to make a change, and I can’t touch the code.If the software engineers are consistently writing tests at all levels, what role do you think the role of a tester is?I think on a team like that, those distinctions aren’t quite so useful. There’ll be two cases. There’s either the case where the developers think they’ve written good tests, but you still need someone with a test engineer mind-set to go through the tests and validate that it’s a useful set, or the correct set for that code. Or they won’t actually be pure developers, they’ll have that mix of test ability in there.I think having slightly more distinct roles is useful. When it starts to blur, then you lose that view of the tests as a whole. The tester job is not to create tests, it’s to validate the quality of the product, and you don’t do that just by writing tests. There’s more things you’ve got to keep in your mind. And I think when you blur the roles, you start to lose that end of the tester.So because you’re working on those features, you lose that holistic view of the whole system?Yeah, and anyone who’s worked on the feature shouldn’t be testing it. You always need to have it tested it by someone who didn’t write it. Otherwise you’re a bit too close and you assume “yes, people will only use it that way”, but the tester will come along and go “how do people use this? How would our most idiotic user use this?”. I might not test that because it might be completely irrelevant. But it’s coming in and trying to have a different set of assumptions.Are you a believer that it should all be automated if possible?Not entirely. So an automated test is always better than a manual test for the long-term, but there’s still nothing that beats a human sitting in front of the application and thinking “What could I do at this point?”. The automated test is very good but they follow that strict path, and they never check anything off the path. The human tester will look at things that they weren’t expecting, whereas the automated test can only ever go “Is that value correct?” in many respects, and it won’t notice that on the other side of the screen you’re showing something completely wrong. And that value might have been checked independently, but you always find a few odd interactions when you’re going through something manually, and you always need to go through something manually to start with anyway, otherwise you won’t know where the important bits to write your automation are.When you’re doing that manual testing, do you think it’s important to do that across the entire product, or just the bits that you’ve touched recently?I think it’s important to do it mostly on the bits you’ve touched, but you can’t ignore the rest of the product. Unless you’re dealing with a very, very self-contained bit, you’re almost always encounter other bits of the product along the way. Most testers I know, even if they are looking at just one path, they’ll keep open and move around a bit anyway, just because they want to find something that’s broken. If we find that your path is right, we’ll go out and hunt something else.How do you think this fits into the idea of continuously deploying, so long as the tests pass?With deploying a website it’s a bit different because you can always pull it back. If you’re deploying an application to customers, when you’ve released it, it’s out there, you can’t pull it back. Someone’s going to keep it, no matter how hard you try there will be a few installations that stay around. So I’d always have at least a human element on that path. With websites, you could probably automate straight out, or at least straight out to an internal environment or a single server in a cloud of fifty that will serve some people. But I don’t think you should release to everyone just on automated tests passing.You’ve already mentioned using BASIC and C# — are there any other languages that you’ve used?I’ve used a few. That’s something that has changed more recently, I’ve become familiar with more languages. Before I started at Red Gate I learnt a bit of C. Then last year, I taught myself Python which I actually really enjoyed using. I’ve also come across another language called Vala, which is sort of a C#-like language. It’s basically a pre-processor for C, but it has very nice syntax. I think that’s currently my favourite language.Any particular reason for trying Vala?I have a completely Linux environment at home, and I’ve been looking for a nice language, and C# just doesn’t cut it because I won’t touch Mono. So, I was looking for something like C# but that was useable in an open source environment, and Vala’s what I found. C#’s got a few features that Vala doesn’t, and Vala’s got a few features where I think “It would be awesome if C# had that”.What are some of the features that it’s missing?Extension methods. And I think that’s the only one that really bugs me. I like to use them when I’m writing C# because it makes some things really easy, especially with libraries that you can’t touch the internals of. It doesn’t have method overloading, which is sometimes annoying.Where it does win over C#?Everything is non-nullable by default, you never have to check that something’s unexpectedly null.Also, Vala has code contracts. This is starting to come in C# 4, but the way it works in Vala is that you specify requirements in short phrases as part of your function signature and they stick to the signature, so that when you inherit it, it has exactly the same code contract as the base one, or when you inherit from an interface, you have to match the signature exactly. Just using those makes you think a bit more about how you’re writing your method, it’s not an afterthought when you’ve got contracts from base classes given to you, you can’t change it. Which I think is a lot nicer than the way C# handles it. When are those actually checked?They’re checked both at compile and run-time. The compile-time checking isn’t very strong yet, it’s quite a new feature in the compiler, and because it compiles down to C, you can write C code and interface with your methods, so you can bypass that compile-time check anyway. So there’s an extra runtime check, and if you violate one of the contracts at runtime, it’s game over for your program, there’s no exception to catch, it’s just goodbye!One thing I dislike about C# is the exceptions. You write a bit of code and fifty exceptions could come from any point in your ten lines, and you can’t mentally model how those exceptions are going to come out, and you can’t even predict them based on the functions you’re calling, because if you’ve accidentally got a derived class there instead of a base class, that can throw a completely different set of exceptions. So I’ve got no way of mentally modelling those, whereas in Vala they’re checked like Java, so you know only these exceptions can come out. You know in advance the error conditions.I think Raymond Chen on Old New Thing says “the only thing you know when you throw an exception is that you’re in an invalid state somewhere in your program, so just kill it and be done with it!”You said you’ve also learnt bits of Python. How did you find that compared to Vala and C#?Very different because of the dynamic typing. I’ve been writing a website for my own use. I’m quite into photography, so I take photos off my camera, post-process them, dump them in a file, and I get a webpage with all my thumbnails. So sort of like Picassa, but written by myself because I wanted something to learn Python with. There are some things that are really nice, I just found it really difficult to cope with the fact that I’m not quite sure what this object type that I’m passed is, I might not ever be sure, so it can randomly blow up on me. But once I train myself to ignore that and just say “well, I’m fairly sure it’s going to be something that looks like this, so I’ll use it like this”, then it’s quite nice.Any particular features that you’ve appreciated?I don’t like any particular feature, it’s just very straightforward to work with. It’s very quick to write something in, particularly as you don’t have to worry that you’ve changed something that affects a different part of the program. If you have, then that part blows up, but I can get this part working right now.If you were doing a big project, would you be willing to do it in Python rather than C# or Vala?I think I might be willing to try something bigger or long term with Python. We’re currently doing an ASP.NET MVC project on C#, and I don’t like the amount of reflection. There’s a lot of magic that pulls values out, and it’s all done under the scenes. It’s almost managed to put a dynamic type system on top of C#, which in many ways destroys the language to me, whereas if you’re already in a dynamic language, having things done dynamically is much more natural. In many ways, you get the worst of both worlds. I think for web projects, I would go with Python again, whereas for anything desktop, command-line or GUI-based, I’d probably go for C# or Vala, depending on what environment I’m in.It’s the fact that you can gain from the strong typing in ways that you can’t so much on the web app. Or, in a web app, you have to use dynamic typing at some point, or you have to write a hell of a lot of boilerplate, and I’d rather use the dynamic typing than write the boilerplate.What do you think separates great programmers from everyone else?Probably design choices. Choosing to write it a piece of code one way or another. For any given program you ask me to write, I could probably do it five thousand ways. A programmer who is capable will see four or five of them, and choose one of the better ones. The excellent programmer will see the largest proportion and manage to pick the best one very quickly without having to think too much about it. I think that’s probably what separates, is the speed at which they can see what’s the best path to write the program in. More Red Gater Coder interviews

    Read the article

  • Developing Spring Portlet for use inside Weblogic Portal / Webcenter Portal

    - by Murali Veligeti
    We need to understand the main difference between portlet workflow and servlet workflow.The main difference between portlet workflow and servlet workflow is that, the request to the portlet can have two distinct phases: 1) Action phase 2) Render phase. The Action phase is executed only once and is where any 'backend' changes or actions occur, such as making changes in a database. The Render phase then produces what is displayed to the user each time the display is refreshed. The critical point here is that for a single overall request, the action phase is executed only once, but the render phase may be executed multiple times. This provides a clean separation between the activities that modify the persistent state of your system and the activities that generate what is displayed to the user.The dual phases of portlet requests are one of the real strengths of the JSR-168 specification. For example, dynamic search results can be updated routinely on the display without the user explicitly re-running the search. Most other portlet MVC frameworks attempt to completely hide the two phases from the developer and make it look as much like traditional servlet development as possible - we think this approach removes one of the main benefits of using portlets. So, the separation of the two phases is preserved throughout the Spring Portlet MVC framework. The primary manifestation of this approach is that where the servlet version of the MVC classes will have one method that deals with the request, the portlet version of the MVC classes will have two methods that deal with the request: one for the action phase and one for the render phase. For example, where the servlet version of AbstractController has the handleRequestInternal(..) method, the portlet version of AbstractController has handleActionRequestInternal(..) and handleRenderRequestInternal(..) methods.The Spring Portlet Framework is designed around a DispatcherPortlet that dispatches requests to handlers, with configurable handler mappings and view resolution, just as the DispatcherServlet in the Spring Web Framework does.  Developing portlet.xml Let's start the sample development by creating the portlet.xml file in the /WebContent/WEB-INF/ folder as shown below: <?xml version="1.0" encoding="UTF-8"?> <portlet-app version="2.0" xmlns="http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <portlet> <portlet-name>SpringPortletName</portlet-name> <portlet-class>org.springframework.web.portlet.DispatcherPortlet</portlet-class> <supports> <mime-type>text/html</mime-type> <portlet-mode>view</portlet-mode> </supports> <portlet-info> <title>SpringPortlet</title> </portlet-info> </portlet> </portlet-app> DispatcherPortlet is responsible for handling every client request. When it receives a request, it finds out which Controller class should be used for handling this request, and then it calls its handleActionRequest() or handleRenderRequest() method based on the request processing phase. The Controller class executes business logic and returns a View name that should be used for rendering markup to the user. The DispatcherPortlet then forwards control to that View for actual markup generation. As you can see, DispatcherPortlet is the central dispatcher for use within Spring Portlet MVC Framework. Note that your portlet application can define more than one DispatcherPortlet. If it does so, then each of these portlets operates its own namespace, loading its application context and handler mapping. The DispatcherPortlet is also responsible for loading application context (Spring configuration file) for this portlet. First, it tries to check the value of the configLocation portlet initialization parameter. If that parameter is not specified, it takes the portlet name (that is, the value of the <portlet-name> element), appends "-portlet.xml" to it, and tries to load that file from the /WEB-INF folder. In the portlet.xml file, we did not specify the configLocation initialization parameter, so let's create SpringPortletName-portlet.xml file in the next section. Developing SpringPortletName-portlet.xml Create the SpringPortletName-portlet.xml file in the /WebContent/WEB-INF folder of your application as shown below: <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd"> <bean id="viewResolver" class="org.springframework.web.servlet.view.InternalResourceViewResolver"> <property name="viewClass" value="org.springframework.web.servlet.view.JstlView"/> <property name="prefix" value="/jsp/"/> <property name="suffix" value=".jsp"/> </bean> <bean id="pointManager" class="com.wlp.spring.bo.internal.PointManagerImpl"> <property name="users"> <list> <ref bean="point1"/> <ref bean="point2"/> <ref bean="point3"/> <ref bean="point4"/> </list> </property> </bean> <bean id="point1" class="com.wlp.spring.bean.User"> <property name="name" value="Murali"/> <property name="points" value="6"/> </bean> <bean id="point2" class="com.wlp.spring.bean.User"> <property name="name" value="Sai"/> <property name="points" value="13"/> </bean> <bean id="point3" class="com.wlp.spring.bean.User"> <property name="name" value="Rama"/> <property name="points" value="43"/> </bean> <bean id="point4" class="com.wlp.spring.bean.User"> <property name="name" value="Krishna"/> <property name="points" value="23"/> </bean> <bean id="messageSource" class="org.springframework.context.support.ResourceBundleMessageSource"> <property name="basename" value="messages"/> </bean> <bean name="/users.htm" id="userController" class="com.wlp.spring.controller.UserController"> <property name="pointManager" ref="pointManager"/> </bean> <bean name="/pointincrease.htm" id="pointIncreaseController" class="com.wlp.spring.controller.IncreasePointsFormController"> <property name="sessionForm" value="true"/> <property name="pointManager" ref="pointManager"/> <property name="commandName" value="pointIncrease"/> <property name="commandClass" value="com.wlp.spring.bean.PointIncrease"/> <property name="formView" value="pointincrease"/> <property name="successView" value="users"/> </bean> <bean id="parameterMappingInterceptor" class="org.springframework.web.portlet.handler.ParameterMappingInterceptor" /> <bean id="portletModeParameterHandlerMapping" class="org.springframework.web.portlet.handler.PortletModeParameterHandlerMapping"> <property name="order" value="1" /> <property name="interceptors"> <list> <ref bean="parameterMappingInterceptor" /> </list> </property> <property name="portletModeParameterMap"> <map> <entry key="view"> <map> <entry key="pointincrease"> <ref bean="pointIncreaseController" /> </entry> <entry key="users"> <ref bean="userController" /> </entry> </map> </entry> </map> </property> </bean> <bean id="portletModeHandlerMapping" class="org.springframework.web.portlet.handler.PortletModeHandlerMapping"> <property name="order" value="2" /> <property name="portletModeMap"> <map> <entry key="view"> <ref bean="userController" /> </entry> </map> </property> </bean> </beans> The SpringPortletName-portlet.xml file is an application context file for your MVC portlet. It has a couple of bean definitions: viewController. At this point, remember that the viewController bean definition points to the com.ibm.developerworks.springmvc.ViewController.java class. portletModeHandlerMapping. As we discussed in the last section, whenever DispatcherPortlet gets a client request, it tries to find a suitable Controller class for handling that request. That is where PortletModeHandlerMapping comes into the picture. The PortletModeHandlerMapping class is a simple implementation of the HandlerMapping interface and is used by DispatcherPortlet to find a suitable Controller for every request. The PortletModeHandlerMapping class uses Portlet mode for the current request to find a suitable Controller class to use for handling the request. The portletModeMap property of portletModeHandlerMapping bean is the place where we map the Portlet mode name against the Controller class. In the sample code, we show that viewController is responsible for handling View mode requests. Developing UserController.java In the preceding section, you learned that the viewController bean is responsible for handling all the View mode requests. Your next step is to create the UserController.java class as shown below: public class UserController extends AbstractController { private PointManager pointManager; public void handleActionRequest(ActionRequest request, ActionResponse response) throws Exception { } public ModelAndView handleRenderRequest(RenderRequest request, RenderResponse response) throws ServletException, IOException { String now = (new java.util.Date()).toString(); Map<String, Object> myModel = new HashMap<String, Object>(); myModel.put("now", now); myModel.put("users", this.pointManager.getUsers()); return new ModelAndView("users", "model", myModel); } public void setPointManager(PointManager pointManager) { this.pointManager = pointManager; } } Every controller class in Spring Portlet MVC Framework must implement the org.springframework.web. portlet.mvc.Controller interface directly or indirectly. To make things easier, Spring Framework provides AbstractController class, which is the default implementation of the Controller interface. As a developer, you should always extend your controller from either AbstractController or one of its more specific subclasses. Any implementation of the Controller class should be reusable, thread-safe, and capable of handling multiple requests throughout the lifecycle of the portlet. In the sample code, we create the ViewController class by extending it from AbstractController. Because we don't want to do any action processing in the HelloSpringPortletMVC portlet, we override only the handleRenderRequest() method of AbstractController. Now, the only thing that HelloWorldPortletMVC should do is render the markup of View.jsp to the user when it receives a user request to do so. To do that, return the object of ModelAndView with a value of view equal to View. Developing web.xml According to Portlet Specification 1.0, every portlet application is also a Servlet Specification 2.3-compliant Web application, and it needs a Web application deployment descriptor (that is, web.xml). Let’s create the web.xml file in the /WEB-INF/ folder as shown in listing 4. Follow these steps: Open the existing web.xml file located at /WebContent/WEB-INF/web.xml. Replace the contents of this file with the code as shown below: <servlet> <servlet-name>ViewRendererServlet</servlet-name> <servlet-class>org.springframework.web.servlet.ViewRendererServlet</servlet-class> </servlet> <servlet-mapping> <servlet-name>ViewRendererServlet</servlet-name> <url-pattern>/WEB-INF/servlet/view</url-pattern> </servlet-mapping> <context-param> <param-name>contextConfigLocation</param-name> <param-value>/WEB-INF/applicationContext.xml</param-value> </context-param> <listener> <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class> </listener> The web.xml file for the sample portlet declares two things: ViewRendererServlet. The ViewRendererServlet is the bridge servlet for portlet support. During the render phase, DispatcherPortlet wraps PortletRequest into ServletRequest and forwards control to ViewRendererServlet for actual rendering. This process allows Spring Portlet MVC Framework to use the same View infrastructure as that of its servlet version, that is, Spring Web MVC Framework. ContextLoaderListener. The ContextLoaderListener class takes care of loading Web application context at the time of the Web application startup. The Web application context is shared by all the portlets in the portlet application. In case of duplicate bean definition, the bean definition in the portlet application context takes precedence over the Web application context. The ContextLoader class tries to read the value of the contextConfigLocation Web context parameter to find out the location of the context file. If the contextConfigLocation parameter is not set, then it uses the default value, which is /WEB-INF/applicationContext.xml, to load the context file. The Portlet Controller interface requires two methods that handle the two phases of a portlet request: the action request and the render request. The action phase should be capable of handling an action request and the render phase should be capable of handling a render request and returning an appropriate model and view. While the Controller interface is quite abstract, Spring Portlet MVC offers a lot of controllers that already contain a lot of the functionality you might need – most of these are very similar to controllers from Spring Web MVC. The Controller interface just defines the most common functionality required of every controller - handling an action request, handling a render request, and returning a model and a view. How rendering works As you know, when the user tries to access a page with PointSystemPortletMVC portlet on it or when the user performs some action on any other portlet on that page or tries to refresh that page, a render request is sent to the PointSystemPortletMVC portlet. In the sample code, because DispatcherPortlet is the main portlet class, Weblogic Portal / Webcenter Portal calls its render() method and then the following sequence of events occurs: The render() method of DispatcherPortlet calls the doDispatch() method, which in turn calls the doRender() method. After the doRenderService() method gets control, first it tries to find out the locale of the request by calling the PortletRequest.getLocale() method. This locale is used while making all the locale-related decisions for choices such as which resource bundle should be loaded or which JSP should be displayed to the user based on the locale. After that, the doRenderService() method starts iterating through all the HandlerMapping classes configured for this portlet, calling their getHandler() method to identify the appropriate Controller for handling this request. In the sample code, we have configured only PortletModeHandlerMapping as a HandlerMapping class. The PortletModeHandlerMapping class reads the value of the current portlet mode, and based on that, it finds out, the Controller class that should be used to handle this request. In the sample code, ViewController is configured to handle the View mode request so that the PortletModeHandlerMapping class returns the object of ViewController. After the object of ViewController is returned, the doRenderService() method calls its handleRenderRequestInternal() method. Implementation of the handleRenderRequestInternal() method in ViewController.java is very simple. It logs a message saying that it got control, and then it creates an instance of ModelAndView with a value equal to View and returns it to DispatcherPortlet. After control returns to doRenderService(), the next task is to figure out how to render View. For that, DispatcherPortlet starts iterating through all the ViewResolvers configured in your portlet application, calling their resolveViewName() method. In the sample code we have configured only one ViewResolver, InternalResourceViewResolver. When its resolveViewName() method is called with viewName, it tries to add /WEB-INF/jsp as a prefix to the view name and to add JSP as a suffix. And it checks if /WEB-INF/jsp/View.jsp exists. If it does exist, it returns the object of JstlView wrapping View.jsp. After control is returned to the doRenderService() method, it creates the object PortletRequestDispatcher, which points to /WEB-INF/servlet/view – that is, ViewRendererServlet. Then it sets the object of JstlView in the request and dispatches the request to ViewRendererServlet. After ViewRendererServlet gets control, it reads the JstlView object from the request attribute and creates another RequestDispatcher pointing to the /WEB-INF/jsp/View.jsp URL and passes control to it for actual markup generation. The markup generated by View.jsp is returned to user. At this point, you may question the need for ViewRendererServlet. Why can't DispatcherPortlet directly forward control to View.jsp? Adding ViewRendererServlet in between allows Spring Portlet MVC Framework to reuse the existing View infrastructure. You may appreciate this more when we discuss how easy it is to integrate Apache Tiles Framework with your Spring Portlet MVC Framework. The attached project SpringPortlet.zip should be used to import the project in to your OEPE Workspace. SpringPortlet_Jars.zip contains jar files required for the application. Project is written on Spring 2.5.  The same JSR 168 portlet should work on Webcenter Portal as well.  Downloads: Download WeblogicPotal Project which consists of Spring Portlet. Download Spring Jars In-addition to above you need to download Spring.jar (Spring2.5)

    Read the article

  • Informed TDD &ndash; Kata &ldquo;To Roman Numerals&rdquo;

    - by Ralf Westphal
    Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/05/28/informed-tdd-ndash-kata-ldquoto-roman-numeralsrdquo.aspxIn a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”. I like to respond with this article to his questions. There´s more to say than fits into a commentary. Mocks and TDD I don´t see in how far TDD is avoiding or opposed to mocks. TDD and mocks are orthogonal. TDD is about pocess, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource. True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that as I tried to explain. I don´t use mocks as proxies for “expensive” resource. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion. But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata. ITDD for “To Roman Numerals” gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines. Now here is, how I would do this kata differently. 1. Analyse A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled. “Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals” like written in this “requirement document” is not enough to start software development. Coding should only begin, if the interface between the “system under development” and its context is clear. If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s you fellow developer. Designing the interface is a task of it´s own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP) which not only should hold for software functional units but also for tasks or activities. In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions. Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning. Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explain in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically. 2. Solve The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody. Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too. Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution. Here´s my solution approach: The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members. The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list. The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV which looks like a contradiction to the above rule? It is not – if you accept roman “digits” not to be limited to be single characters only. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution. All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values: {1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M} Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors. If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a two phase process: Find all the factors Translate the factors found Compile the roman representation Translation is just a look-up. Finding, though, needs some calculation: Find the highest remaining factor fitting in the value Remember and subtract it from the value Repeat with remaining value and remaining factors Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction. With this solution in hand I finally can do what TDD advocates: find and prioritize test cases. As I can see from the small process description above, there are two aspects to test: Test the translation Test the compilation Test finding the factors Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious. Testing the compilation is trivial. Testing factor finding, though, is a tad more complicated. I can think of several steps: First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M). Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]). Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly. I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process. 3. Implement First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say: Next I implement the API: The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to most quickly satisfy these tests, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution: My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one. I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition: As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right. Here´s the implementation to satisfy the test: It´s as simple as possible. Right how TDD wants me to do it: KISS. Now for the second equivalence partition: translating multiple factors. (It´a pattern: if you need to do something repeatedly separate the tests for doing it once and doing it multiple times.) In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science: Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it. Having two tests I find more important. Now for the next low hanging fruit: compilation. It´s even simpler than translation. A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose… Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions: Again, I´m faking the implementation first: I focus on just the first test case. No looping yet. Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat. That´s left for a drill down with a test of the fake function: There are two main equivalence partitions, I guess: either the first factor is appropriate or some next. The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.) And the first of the equivalence partitions on the higher level also is satisfied: Great, I can move on. Now for more than a single factor: Interestingly not just one test becomes green now, but all of them. Great! You might say, then I must have done not the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than a recursion of which I had thought briefly during the problem solving phase. And by the way: Also the acceptance tests went green: Mission accomplished. At least functionality wise. Now I´ve to tidy up things a bit. TDD calls for refactoring. Not uch refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness. At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”. First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant. Which leads to a small conversion in Find_factors(): And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain. However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”. This then is my final test class: And this is the final production code: Test coverage as reported by NCrunch is 100%: Reflexion Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet. But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as it is often the case. That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design. I also could have implemented it bottom-up, since I knew some bottom of the solution. That´s the leaves of the functional decomposition tree. Where things became fuzzy, since the design did not cover any more details as with Find_factors(), I repeated the process in the small, so to speak: fake some top level, endure red high level tests, while first solving a simpler problem. Using scaffolding tests (to be thrown away at the end) brought two advantages: Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them. I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API. The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find what, what there is. Then devise a solution. Then code the solution, manifest the solution in code. Writing tests first is a good practice. But it should not be taken dogmatic. And above all it should not be overloaded with purposes. And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons.   PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

    Read the article

  • OBIA on Teradata - Part 2 Teradata DB Utilization for ETL

    - by Mohan Ramanuja
    Techniques to Monitor Queries and ETL Load CPU and Disk I/OSelect username, processor, sum(cputime), sum(diskio) from dbc.ampusage where processor ='1-0' order by 2,3 descgroup by 1,2;UserName    Vproc    Sum(CpuTime)    Sum(DiskIO)AC00916        10    6.71            24975 List Hardware ErrorsThere is a possibility that the system might have adequate disk space but out of free cylinders. In order to monitor hardware errors, the following query was used:Select * from dbc.Software_Event_Log where Text like '%restart%' order by thedate, thetime;For active users, usage of CPU and analysis of bad CPU to I/O ratiosSelect * from DBC.AMPUSAGE where username='CRMSTGC_DEV_ID';  AND SUBSTR(ACCOUNTNAME,6,3)='006'; Usage By I/OSelect AccountName, UserName, sum(CpuTime), sum(DiskIO)  from DBC.AMPUSAGE group by AccountName, UserName Order by Sum(DiskIO) desc; AccountName                       UserName                          Sum(CpuTime)  Sum(DiskIO)$M1$10062209                      AB89487                           374628.612    7821847$M1$10062210                      AB89487                           186692.244    2799412$M1$10062213                      COC_ETL_ID                        119531.068    331100426$M1$10062200                      AB63472                           118973.316    109881984$M1$10062204                      AB63472                           110825.356    94666986$M1$10062201                      AB63472                           110797.976    75016994$M1$10062202                      AC06936                           100924.448    407839702$M1$10062204                      AB67963                           0         4$M1$10062207                      AB91990                           0         2$M1$10062208                      AB63461                           0         24$M1$10062211                      AB84332                           0         6$M1$10062214                      AB65484                           0         8$M1$10062205                      AB77529                           0         58$M1$10062210                      AC04768                           0         36$M1$10062206                      AB54940                           0         22 Usage By CPUSelect AccountName, UserName, sum(CpuTime), sum(DiskIO)  from DBC.AMPUSAGE group by AccountName, UserName Order by Sum(CpuTime) desc;AccountName                       UserName                          Sum(CpuTime)  Sum(DiskIO)$M1$10062209                      AB89487                           374628.612    7821847$M1$10062210                      AB89487                           186692.244    2799412$M1$10062213                      COC_ETL_ID                        119531.068    331100426$M1$10062200                      AB63472                           118973.316    109881984$M1$10062204                      AB63472                           110825.356    94666986$M1$10062201                      AB63472                           110797.976    75016994$M2$100622105813004760047LOAD     T23_ETLPROC_ENT                   0 6$M1$10062215                      AA37720                           0     180$M1$10062209                      AB81670                           0     6Select count(distinct vproc) from dbc.ampusage;432select * from dbc.dbcinfo;AccountName     UserName     CpuTime DiskIO  CpuTimeNorm         Vproc VprocType    Model$M1$10062205                      CRM_STGC_DEV_ID                   0.32    1764    12.7423999023438    0     AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.28    1730    11.1495999145508    3     AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.304    1736    12.1052799072266    4    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.248    1731    9.87535992431641    7    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.332    1731    13.2202398986816    8    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.284    1712    11.3088799133301    11   AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.24    1757    9.55679992675781    12    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.292    1737    11.6274399108887    15   AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.268    1753    10.6717599182129    16   AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.276    1732    10.9903199157715    19   AMP      2580select * from dbc.dbcinfo;InfoKey    InfoDataLANGUAGE   SUPPORT           MODE    StandardRELEASE    12.00.03.03VERSION    12.00.03.01a

    Read the article

  • High Load mysql on Debian server stops every day. Why?

    - by Oleg Abrazhaev
    I have Debian server with 32 gb memory. And there is apache2, memcached and nginx on this server. Memory load always on maximum. Only 500m free. Most memory leak do MySql. Apache only 70 clients configured, other services small memory usage. When mysql use all memory it stops. And nothing works, need mysql reboot. Mysql configured use maximum 24 gb memory. I have hight weight InnoDB bases. (400000 rows, 30 gb). And on server multithread daemon, that makes many inserts in this tables, thats why InnoDB. There is my mysql config. [mysqld] # # * Basic Settings # default-time-zone = "+04:00" user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp language = /usr/share/mysql/english skip-external-locking default-time-zone='Europe/Moscow' # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. # # * Fine Tuning # #low_priority_updates = 1 concurrent_insert = ALWAYS wait_timeout = 600 interactive_timeout = 600 #normal key_buffer_size = 2024M #key_buffer_size = 1512M #70% hot cache key_cache_division_limit= 70 #16-32 max_allowed_packet = 32M #1-16M thread_stack = 8M #40-50 thread_cache_size = 50 #orderby groupby sort sort_buffer_size = 64M #same myisam_sort_buffer_size = 400M #temp table creates when group_by tmp_table_size = 3000M #tables in memory max_heap_table_size = 3000M #on disk open_files_limit = 10000 table_cache = 10000 join_buffer_size = 5M # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #myisam_use_mmap = 1 max_connections = 200 thread_concurrency = 8 # # * Query Cache Configuration # #more ignored query_cache_limit = 50M query_cache_size = 210M #on query cache query_cache_type = 1 # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /var/log/mysql/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 1 log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log server-id = 1 log-bin = /var/lib/mysql/mysql-bin #replicate-do-db = gate log-bin-index = /var/lib/mysql/mysql-bin.index log-error = /var/lib/mysql/mysql-bin.err relay-log = /var/lib/mysql/relay-bin relay-log-info-file = /var/lib/mysql/relay-bin.info relay-log-index = /var/lib/mysql/relay-bin.index binlog_do_db = 24avia expire_logs_days = 10 max_binlog_size = 100M read_buffer_size = 4024288 innodb_buffer_pool_size = 5000M innodb_flush_log_at_trx_commit = 2 innodb_thread_concurrency = 8 table_definition_cache = 2000 group_concat_max_len = 16M #binlog_do_db = gate #binlog_ignore_db = include_database_name # # * BerkeleyDB # # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12. #skip-bdb # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # You might want to disable InnoDB to shrink the mysqld process by circa 100MB. #skip-innodb # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 500M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 32M key_buffer_size = 512M # # * NDB Cluster # # See /usr/share/doc/mysql-server-*/README.Debian for more information. # # The following configuration is read by the NDB Data Nodes (ndbd processes) # not from the NDB Management Nodes (ndb_mgmd processes). # # [MYSQL_CLUSTER] # ndb-connectstring=127.0.0.1 # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/ Please, help me make it stable. Memory used /etc/mysql # free total used free shared buffers cached Mem: 32930800 32766424 164376 0 139208 23829196 -/+ buffers/cache: 8798020 24132780 Swap: 33553328 44660 33508668 Maybe my problem not in memory, but MySQL stops every day. As you can see, cache memory free 24 gb. Thank to Michael Hampton? for correction. Load overage on server 3.5. Maybe hdd or another problem? Maybe my config not optimal for 30gb InnoDB ? I'm already try mysqltuner and tunung-primer.sh , but they marked all green. Mysqltuner output mysqltuner >> MySQLTuner 1.0.1 - Major Hayden <[email protected]> >> Bug reports, feature requests, and downloads at http://mysqltuner.com/ >> Run with '--help' for additional options and output filtering -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.5.24-9-log [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 112G (Tables: 1528) [--] Data in InnoDB tables: 39G (Tables: 340) [--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17) [!!] Total fragmented tables: 344 -------- Performance Metrics ------------------------------------------------- [--] Up for: 8h 18m 33s (14M q [478.333 qps], 259K conn, TX: 9B, RX: 5B) [--] Reads / Writes: 84% / 16% [--] Total buffers: 10.5G global + 81.1M per thread (200 max threads) [OK] Maximum possible memory usage: 26.3G (83% of installed RAM) [OK] Slow queries: 1% (259K/14M) [!!] Highest connection usage: 100% (201/200) [OK] Key buffer size / total MyISAM indexes: 1.5G/5.6G [OK] Key buffer hit rate: 100.0% (6B cached / 1M reads) [OK] Query cache efficiency: 74.3% (8M cached / 11M selects) [OK] Query cache prunes per day: 0 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 247K sorts) [!!] Joins performed without indexes: 106025 [!!] Temporary tables created on disk: 49% (351K on disk / 715K total) [OK] Thread cache hit rate: 99% (249 created / 259K connections) [!!] Table cache hit rate: 15% (2K open / 13K opened) [OK] Open file limit used: 15% (3K/20K) [OK] Table locks acquired immediately: 99% (4M immediate / 4M locks) [!!] InnoDB data size / buffer pool: 39.4G/5.9G -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Reduce or eliminate persistent connections to reduce connection usage Adjust your join queries to always utilize indexes Temporary table size is already large - reduce result set size Reduce your SELECT DISTINCT queries without LIMIT clauses Increase table_cache gradually to avoid file descriptor limits Variables to adjust: max_connections (> 200) wait_timeout (< 600) interactive_timeout (< 600) join_buffer_size (> 5.0M, or always use indexes with joins) table_cache (> 10000) innodb_buffer_pool_size (>= 39G) Mysql primer output -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.5.24-9-log x86_64 Uptime = 0 days 8 hrs 20 min 50 sec Avg. qps = 478 Total Questions = 14369568 Threads Connected = 16 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1.000000 sec. You have 260626 out of 14369701 that take longer than 1.000000 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is enabled Binlog sync is not enabled, you could loose binlog records during a server crash WORKER THREADS Current thread_cache_size = 50 Current threads_cached = 45 Current threads_per_sec = 0 Historic threads_per_sec = 0 Your thread_cache_size is fine MAX CONNECTIONS Current max_connections = 200 Current threads_connected = 11 Historic max_used_connections = 201 The number of used connections is 100% of the configured maximum. You should raise max_connections INNODB STATUS Current InnoDB index space = 214 M Current InnoDB data space = 39.40 G Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 5.85 G Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 23.46 G Configured Max Per-thread Buffers : 15.84 G Configured Max Global Buffers : 7.54 G Configured Max Memory Limit : 23.39 G Physical Memory : 31.40 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 5.61 G Current key_buffer_size = 1.47 G Key cache miss rate is 1 : 5578 Key buffer free ratio = 77 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is enabled Current query_cache_size = 200 M Current query_cache_used = 101 M Current query_cache_limit = 50 M Current Query cache Memory fill ratio = 50.59 % Current query_cache_min_res_unit = 4 K MySQL won't cache query results that are larger than query_cache_limit in size SORT OPERATIONS Current sort_buffer_size = 64 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 5.00 M You have had 106606 queries where a join could not use an index properly You have had 8 joins without keys that check for key usage after each row join_buffer_size >= 4 M This is not advised You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. OPEN FILES LIMIT Current open_files_limit = 20210 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_open_cache = 10000 tables Current table_definition_cache = 2000 tables You have a total of 1910 tables You have 2151 open tables. The table_cache value seems to be fine TEMP TABLES Current max_heap_table_size = 2.92 G Current tmp_table_size = 2.92 G Of 366426 temp tables, 49% were created on disk Perhaps you should increase your tmp_table_size and/or max_heap_table_size to reduce the number of disk-based temporary tables Note! BLOB and TEXT columns are not allow in memory tables. If you are using these columns raising these values might not impact your ratio of on disk temp tables. TABLE SCANS Current read_buffer_size = 3 M Current table scan ratio = 2846 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 185 You may benefit from selective use of InnoDB. If you have long running SELECT's against MyISAM tables and perform frequent updates consider setting 'low_priority_updates=1'

    Read the article

  • Using glDrawElements does not draw my .obj file

    - by Hallik
    I am trying to correctly import an .OBJ file from 3ds Max. I got this working using glBegin() & glEnd() from a previous question on here, but had really poor performance obviously, so I am trying to use glDrawElements now. I am importing a chessboard, its game pieces, etc. The board, each game piece, and each square on the board is stored in a struct GroupObject. The way I store the data is like this: struct Vertex { float position[3]; float texCoord[2]; float normal[3]; float tangent[4]; float bitangent[3]; }; struct Material { float ambient[4]; float diffuse[4]; float specular[4]; float shininess; // [0 = min shininess, 1 = max shininess] float alpha; // [0 = fully transparent, 1 = fully opaque] std::string name; std::string colorMapFilename; std::string bumpMapFilename; std::vector<int> indices; int id; }; //A chess piece or square struct GroupObject { std::vector<Material *> materials; std::string objectName; std::string groupName; int index; }; All vertices are triangles, so there are always 3 points. When I am looping through the faces f section in the obj file, I store the v0, v1, & v2 in the Material-indices. (I am doing v[0-2] - 1 to account for obj files being 1-based and my vectors being 0-based. So when I get to the render method, I am trying to loop through every object, which loops through every material attached to that object. I set the material information and try and use glDrawElements. However, the screen is black. I was able to draw the model just fine when I looped through each distinct material with all the indices associated with that material, and it drew the model fine. This time around, so I can use the stencil buffer for selecting GroupObjects, I changed up the loop, but the screen is black. Here is my render loop. The only thing I changed was the for loop(s) so they go through each object, and each material in the object in turn. void GLEngine::drawModel() { glEnable(GL_BLEND); glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); // Vertex arrays setup glEnableClientState( GL_VERTEX_ARRAY ); glVertexPointer(3, GL_FLOAT, model.getVertexSize(), model.getVertexBuffer()->position); glEnableClientState( GL_NORMAL_ARRAY ); glNormalPointer(GL_FLOAT, model.getVertexSize(), model.getVertexBuffer()->normal); glClientActiveTexture( GL_TEXTURE0 ); glEnableClientState( GL_TEXTURE_COORD_ARRAY ); glTexCoordPointer(2, GL_FLOAT, model.getVertexSize(), model.getVertexBuffer()->texCoord); glUseProgram(blinnPhongShader); objects = model.getObjects(); // Loop through objects... for( int i=0 ; i < objects.size(); i++ ) { ModelOBJ::GroupObject *object = objects[i]; // Loop through materials used by object... for( int j=0 ; j<object->materials.size() ; j++ ) { ModelOBJ::Material *pMaterial = object->materials[j]; glMaterialfv(GL_FRONT_AND_BACK, GL_AMBIENT, pMaterial->ambient); glMaterialfv(GL_FRONT_AND_BACK, GL_DIFFUSE, pMaterial->diffuse); glMaterialfv(GL_FRONT_AND_BACK, GL_SPECULAR, pMaterial->specular); glMaterialf(GL_FRONT_AND_BACK, GL_SHININESS, pMaterial->shininess * 128.0f); // Draw faces, letting OpenGL loop through them glDrawElements( GL_TRIANGLES, pMaterial->indices.size(), GL_UNSIGNED_INT, &pMaterial->indices ); } } if (model.hasNormals()) glDisableClientState(GL_NORMAL_ARRAY); if (model.hasTextureCoords()) { glClientActiveTexture(GL_TEXTURE0); glDisableClientState(GL_TEXTURE_COORD_ARRAY); } if (model.hasPositions()) glDisableClientState(GL_VERTEX_ARRAY); glBindTexture(GL_TEXTURE_2D, 0); glUseProgram(0); glDisable(GL_BLEND); } I don't know what I am missing that's important. If it's also helpful, here is where I read a 'f' face line and store the info in the obj importer in the pMaterial-indices. else if (sscanf(buffer, "%d/%d/%d", &v[0], &vt[0], &vn[0]) == 3) // v/vt/vn { fscanf(pFile, "%d/%d/%d", &v[1], &vt[1], &vn[1]); fscanf(pFile, "%d/%d/%d", &v[2], &vt[2], &vn[2]); v[0] = (v[0] < 0) ? v[0] + numVertices - 1 : v[0] - 1; v[1] = (v[1] < 0) ? v[1] + numVertices - 1 : v[1] - 1; v[2] = (v[2] < 0) ? v[2] + numVertices - 1 : v[2] - 1; currentMaterial->indices.push_back(v[0]); currentMaterial->indices.push_back(v[1]); currentMaterial->indices.push_back(v[2]); Again, this worked drawing it all together only separated by materials, so I haven't changed code anywhere else except added the indices to the materials within objects, and the loop in the draw method. Before everything was showing up black, now with the setup as above, I am getting an unhandled exception write violation on the glDrawElements line. I did a breakpoint there, and there are over 600 elements in the pMaterial-indices array, so it's not empty, it has indices to use. When I set the glDrawElements like this, it gives me the black screen but no errors glDrawElements( GL_TRIANGLES, pMaterial->indices.size(), GL_UNSIGNED_INT, &pMaterial->indices[0] ); I have also tried adding this when I loop through the faces on import if ( currentMaterial->startIndex == -1 ) currentMaterial->startIndex = v[0]; currentMaterial->triangleCount++; And when drawing... //in draw method glDrawElements( GL_TRIANGLES, pMaterial->triangleCount * 3, GL_UNSIGNED_INT, model.getIndexBuffer() + pMaterial->startIndex );

    Read the article

  • java.lang.NullPointerException exception in my controller file (using Spring Hibernate Maven)

    - by mrjayviper
    The problem doesn't seemed to have anything to do with Hibernate. As I've commented the Hibernate stuff but I'm still getting it. If I comment out this line message = staffDAO.searchForStaff(search); in my controller file, it goes through ok. But I don't see anything wrong with searchForStaff function. It's a very simple function that just returns the string "test" and run system.out.println("test"). Can you please help? thanks But this is the error that I'm getting: SEVERE: Servlet.service() for servlet [spring] in context with path [/directorymaven] threw exception [Request processing failed; nested exception is java.lang.NullPointerException] with root cause java.lang.NullPointerException at org.flinders.staffdirectory.controllers.SearchController.showSearchResults(SearchController.java:25) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:219) at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132) at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:100) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:604) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:565) at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778) at javax.servlet.http.HttpServlet.service(HttpServlet.java:621) at javax.servlet.http.HttpServlet.service(HttpServlet.java:728) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) My spring-servlet xml <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context" xmlns:mvc="http://www.springframework.org/schema/mvc" xmlns:p="http://www.springframework.org/schema/p" xmlns:tx="http://www.springframework.org/schema/tx" xsi:schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc.xsd http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx.xsd"> <context:component-scan base-package="org.flinders.staffdirectory.controllers" /> <mvc:annotation-driven /> <mvc:resources mapping="/resources/**" location="/resources/" /> <tx:annotation-driven /> <bean id="propertyConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer" p:location="/WEB-INF/spring.properties" /> <bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" p:driverClassName="${jdbc.driverClassName}" p:url="${jdbc.databaseurl}" p:username="${jdbc.username}" p:password="${jdbc.password}" /> <bean id="sessionFactory" class="org.springframework.orm.hibernate4.LocalSessionFactoryBean" p:dataSource-ref="dataSource" p:configLocation="${hibernate.config}" p:packagesToScan="org.flinders.staffdirectory"/> <bean id="transactionManager" class="org.springframework.orm.hibernate4.HibernateTransactionManager" p:sessionFactory-ref="sessionFactory" /> <bean id="viewResolver" class="org.springframework.web.servlet.view.UrlBasedViewResolver" p:viewClass="org.springframework.web.servlet.view.tiles2.TilesView" /> <bean id="tilesConfigurer" class="org.springframework.web.servlet.view.tiles2.TilesConfigurer" p:definitions="/WEB-INF/tiles.xml" /> <bean id="staffDAO" class="org.flinders.staffdirectory.dao.StaffDAO" p:sessionFactory-ref="sessionFactory" /> <!-- <bean id="staffService" class="org.flinders.staffdirectory.services.StaffServiceImpl" p:staffDAO-ref="staffDAO" />--> </beans> This is my controller file package org.flinders.staffdirectory.controllers; import java.util.List; //import org.flinders.staffdirectory.models.database.SearchResult; import org.flinders.staffdirectory.models.misc.Search; import org.flinders.staffdirectory.dao.StaffDAO; //import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Controller; import org.springframework.web.bind.annotation.ModelAttribute; import org.springframework.web.bind.annotation.RequestMapping; import org.springframework.web.servlet.ModelAndView; @Controller public class SearchController { //@Autowired private StaffDAO staffDAO; private String message; @RequestMapping("/SearchStaff") public ModelAndView showSearchResults(@ModelAttribute Search search) { //List<SearchResult> searchResults = message = staffDAO.searchForStaff(search); //System.out.println(search.getSurname()); return new ModelAndView("search/SearchForm", "Search", new Search()); //return new ModelAndView("search/SearchResults", "searchResults", searchResults); } @RequestMapping("/SearchForm") public ModelAndView showSearchForm() { return new ModelAndView("search/SearchForm", "search", new Search()); } } my dao class package org.flinders.staffdirectory.dao; import java.util.List; import org.hibernate.SessionFactory; //import org.springframework.beans.factory.annotation.Autowired; import org.flinders.staffdirectory.models.database.SearchResult; import org.flinders.staffdirectory.models.misc.Search; public class StaffDAO { //@Autowired private SessionFactory sessionFactory; public void setSessionFactory(SessionFactory sessionFactory) { this.sessionFactory = sessionFactory; } public String searchForStaff(Search search) { /*String SQL = "select distinct telsumm_id as id, telsumm_parent_id as parentId, telsumm_name_title as title, (case when substr(telsumm_surname, length(telsumm_surname) - 1, 1) = ',' then substr(telsumm_surname, 1, length(telsumm_surname) - 1) else telsumm_surname end) as surname, telsumm_preferred_name as firstname, nvl(telsumm_tele_number, '-') as telephoneNumber, nvl(telsumm_role, '-') as role, telsumm_display_department as department, lower(telsumm_entity_type) as entityType from teldirt.teld_summary where (telsumm_search_surname is not null) and not (nvl(telsumm_tele_directory,'xxx') IN ('N','S','D')) and not (telsumm_tele_number IS NULL AND telsumm_alias IS NULL) and (telsumm_alias_list = 'Y' OR (telsumm_tele_directory IN ('A','B'))) and ((nvl(telsumm_system_id_end,sysdate+1) > SYSDATE and telsumm_entity_type = 'P') or (telsumm_entity_type = 'N')) and (telsumm_search_department NOT like 'SPONSOR%')"; if (search.getSurname().length() > 0) { SQL += " and (telsumm_search_surname like '" + search.getSurname().toUpperCase() + "%')"; } if (search.getSurnameLike().length() > 0) { SQL += " and (telsumm_search_soundex like soundex(('%" + search.getSurnameLike().toUpperCase() + "%'))"; } if (search.getFirstname().length() > 0) { SQL += " and (telsumm_search_preferred_name like '" + search.getFirstname().toUpperCase() + "%' or telsumm_search_first_name like '" + search.getFirstname() + "%')"; } if (search.getTelephoneNumber().length() > 0) { SQL += " and (telsumm_tele_number like '" + search.getTelephoneNumber() + "%')"; } if (search.getDepartment().length() > 0) { SQL += " and (telsumm_search_department like '" + search.getDepartment().toUpperCase() + "%')"; } if (search.getRole().length() > 0) { SQL += " and (telsumm_search_role like '" + search.getRole().toUpperCase() + "%')"; } SQL += " order by surname, firstname"; List<Object[]> list = (List<Object[]>) sessionFactory.getCurrentSession().createQuery(SQL).list(); for(int j=0;j<list.size();j++){ Object [] obj= (Object[])list.get(j); for(int i=0;i<obj.length;i++) System.out.println(obj[i]); }*/ System.out.println("test"); return "test"; } }

    Read the article

  • New features of C# 4.0

    This article covers New features of C# 4.0. Article has been divided into below sections. Introduction. Dynamic Lookup. Named and Optional Arguments. Features for COM interop. Variance. Relationship with Visual Basic. Resources. Other interested readings… 22 New Features of Visual Studio 2008 for .NET Professionals 50 New Features of SQL Server 2008 IIS 7.0 New features Introduction It is now close to a year since Microsoft Visual C# 3.0 shipped as part of Visual Studio 2008. In the VS Managed Languages team we are hard at work on creating the next version of the language (with the unsurprising working title of C# 4.0), and this document is a first public description of the planned language features as we currently see them. Please be advised that all this is in early stages of production and is subject to change. Part of the reason for sharing our plans in public so early is precisely to get the kind of feedback that will cause us to improve the final product before it rolls out. Simultaneously with the publication of this whitepaper, a first public CTP (community technology preview) of Visual Studio 2010 is going out as a Virtual PC image for everyone to try. Please use it to play and experiment with the features, and let us know of any thoughts you have. We ask for your understanding and patience working with very early bits, where especially new or newly implemented features do not have the quality or stability of a final product. The aim of the CTP is not to give you a productive work environment but to give you the best possible impression of what we are working on for the next release. The CTP contains a number of walkthroughs, some of which highlight the new language features of C# 4.0. Those are excellent for getting a hands-on guided tour through the details of some common scenarios for the features. You may consider this whitepaper a companion document to these walkthroughs, complementing them with a focus on the overall language features and how they work, as opposed to the specifics of the concrete scenarios. C# 4.0 The major theme for C# 4.0 is dynamic programming. Increasingly, objects are “dynamic” in the sense that their structure and behavior is not captured by a static type, or at least not one that the compiler knows about when compiling your program. Some examples include a. objects from dynamic programming languages, such as Python or Ruby b. COM objects accessed through IDispatch c. ordinary .NET types accessed through reflection d. objects with changing structure, such as HTML DOM objects While C# remains a statically typed language, we aim to vastly improve the interaction with such objects. A secondary theme is co-evolution with Visual Basic. Going forward we will aim to maintain the individual character of each language, but at the same time important new features should be introduced in both languages at the same time. They should be differentiated more by style and feel than by feature set. The new features in C# 4.0 fall into four groups: Dynamic lookup Dynamic lookup allows you to write method, operator and indexer calls, property and field accesses, and even object invocations which bypass the C# static type checking and instead gets resolved at runtime. Named and optional parameters Parameters in C# can now be specified as optional by providing a default value for them in a member declaration. When the member is invoked, optional arguments can be omitted. Furthermore, any argument can be passed by parameter name instead of position. COM specific interop features Dynamic lookup as well as named and optional parameters both help making programming against COM less painful than today. On top of that, however, we are adding a number of other small features that further improve the interop experience. Variance It used to be that an IEnumerable<string> wasn’t an IEnumerable<object>. Now it is – C# embraces type safe “co-and contravariance” and common BCL types are updated to take advantage of that. Dynamic Lookup Dynamic lookup allows you a unified approach to invoking things dynamically. With dynamic lookup, when you have an object in your hand you do not need to worry about whether it comes from COM, IronPython, the HTML DOM or reflection; you just apply operations to it and leave it to the runtime to figure out what exactly those operations mean for that particular object. This affords you enormous flexibility, and can greatly simplify your code, but it does come with a significant drawback: Static typing is not maintained for these operations. A dynamic object is assumed at compile time to support any operation, and only at runtime will you get an error if it wasn’t so. Oftentimes this will be no loss, because the object wouldn’t have a static type anyway, in other cases it is a tradeoff between brevity and safety. In order to facilitate this tradeoff, it is a design goal of C# to allow you to opt in or opt out of dynamic behavior on every single call. The dynamic type C# 4.0 introduces a new static type called dynamic. When you have an object of type dynamic you can “do things to it” that are resolved only at runtime: dynamic d = GetDynamicObject(…); d.M(7); The C# compiler allows you to call a method with any name and any arguments on d because it is of type dynamic. At runtime the actual object that d refers to will be examined to determine what it means to “call M with an int” on it. The type dynamic can be thought of as a special version of the type object, which signals that the object can be used dynamically. It is easy to opt in or out of dynamic behavior: any object can be implicitly converted to dynamic, “suspending belief” until runtime. Conversely, there is an “assignment conversion” from dynamic to any other type, which allows implicit conversion in assignment-like constructs: dynamic d = 7; // implicit conversion int i = d; // assignment conversion Dynamic operations Not only method calls, but also field and property accesses, indexer and operator calls and even delegate invocations can be dispatched dynamically: dynamic d = GetDynamicObject(…); d.M(7); // calling methods d.f = d.P; // getting and settings fields and properties d[“one”] = d[“two”]; // getting and setting thorugh indexers int i = d + 3; // calling operators string s = d(5,7); // invoking as a delegate The role of the C# compiler here is simply to package up the necessary information about “what is being done to d”, so that the runtime can pick it up and determine what the exact meaning of it is given an actual object d. Think of it as deferring part of the compiler’s job to runtime. The result of any dynamic operation is itself of type dynamic. Runtime lookup At runtime a dynamic operation is dispatched according to the nature of its target object d: COM objects If d is a COM object, the operation is dispatched dynamically through COM IDispatch. This allows calling to COM types that don’t have a Primary Interop Assembly (PIA), and relying on COM features that don’t have a counterpart in C#, such as indexed properties and default properties. Dynamic objects If d implements the interface IDynamicObject d itself is asked to perform the operation. Thus by implementing IDynamicObject a type can completely redefine the meaning of dynamic operations. This is used intensively by dynamic languages such as IronPython and IronRuby to implement their own dynamic object models. It will also be used by APIs, e.g. by the HTML DOM to allow direct access to the object’s properties using property syntax. Plain objects Otherwise d is a standard .NET object, and the operation will be dispatched using reflection on its type and a C# “runtime binder” which implements C#’s lookup and overload resolution semantics at runtime. This is essentially a part of the C# compiler running as a runtime component to “finish the work” on dynamic operations that was deferred by the static compiler. Example Assume the following code: dynamic d1 = new Foo(); dynamic d2 = new Bar(); string s; d1.M(s, d2, 3, null); Because the receiver of the call to M is dynamic, the C# compiler does not try to resolve the meaning of the call. Instead it stashes away information for the runtime about the call. This information (often referred to as the “payload”) is essentially equivalent to: “Perform an instance method call of M with the following arguments: 1. a string 2. a dynamic 3. a literal int 3 4. a literal object null” At runtime, assume that the actual type Foo of d1 is not a COM type and does not implement IDynamicObject. In this case the C# runtime binder picks up to finish the overload resolution job based on runtime type information, proceeding as follows: 1. Reflection is used to obtain the actual runtime types of the two objects, d1 and d2, that did not have a static type (or rather had the static type dynamic). The result is Foo for d1 and Bar for d2. 2. Method lookup and overload resolution is performed on the type Foo with the call M(string,Bar,3,null) using ordinary C# semantics. 3. If the method is found it is invoked; otherwise a runtime exception is thrown. Overload resolution with dynamic arguments Even if the receiver of a method call is of a static type, overload resolution can still happen at runtime. This can happen if one or more of the arguments have the type dynamic: Foo foo = new Foo(); dynamic d = new Bar(); var result = foo.M(d); The C# runtime binder will choose between the statically known overloads of M on Foo, based on the runtime type of d, namely Bar. The result is again of type dynamic. The Dynamic Language Runtime An important component in the underlying implementation of dynamic lookup is the Dynamic Language Runtime (DLR), which is a new API in .NET 4.0. The DLR provides most of the infrastructure behind not only C# dynamic lookup but also the implementation of several dynamic programming languages on .NET, such as IronPython and IronRuby. Through this common infrastructure a high degree of interoperability is ensured, but just as importantly the DLR provides excellent caching mechanisms which serve to greatly enhance the efficiency of runtime dispatch. To the user of dynamic lookup in C#, the DLR is invisible except for the improved efficiency. However, if you want to implement your own dynamically dispatched objects, the IDynamicObject interface allows you to interoperate with the DLR and plug in your own behavior. This is a rather advanced task, which requires you to understand a good deal more about the inner workings of the DLR. For API writers, however, it can definitely be worth the trouble in order to vastly improve the usability of e.g. a library representing an inherently dynamic domain. Open issues There are a few limitations and things that might work differently than you would expect. · The DLR allows objects to be created from objects that represent classes. However, the current implementation of C# doesn’t have syntax to support this. · Dynamic lookup will not be able to find extension methods. Whether extension methods apply or not depends on the static context of the call (i.e. which using clauses occur), and this context information is not currently kept as part of the payload. · Anonymous functions (i.e. lambda expressions) cannot appear as arguments to a dynamic method call. The compiler cannot bind (i.e. “understand”) an anonymous function without knowing what type it is converted to. One consequence of these limitations is that you cannot easily use LINQ queries over dynamic objects: dynamic collection = …; var result = collection.Select(e => e + 5); If the Select method is an extension method, dynamic lookup will not find it. Even if it is an instance method, the above does not compile, because a lambda expression cannot be passed as an argument to a dynamic operation. There are no plans to address these limitations in C# 4.0. Named and Optional Arguments Named and optional parameters are really two distinct features, but are often useful together. Optional parameters allow you to omit arguments to member invocations, whereas named arguments is a way to provide an argument using the name of the corresponding parameter instead of relying on its position in the parameter list. Some APIs, most notably COM interfaces such as the Office automation APIs, are written specifically with named and optional parameters in mind. Up until now it has been very painful to call into these APIs from C#, with sometimes as many as thirty arguments having to be explicitly passed, most of which have reasonable default values and could be omitted. Even in APIs for .NET however you sometimes find yourself compelled to write many overloads of a method with different combinations of parameters, in order to provide maximum usability to the callers. Optional parameters are a useful alternative for these situations. Optional parameters A parameter is declared optional simply by providing a default value for it: public void M(int x, int y = 5, int z = 7); Here y and z are optional parameters and can be omitted in calls: M(1, 2, 3); // ordinary call of M M(1, 2); // omitting z – equivalent to M(1, 2, 7) M(1); // omitting both y and z – equivalent to M(1, 5, 7) Named and optional arguments C# 4.0 does not permit you to omit arguments between commas as in M(1,,3). This could lead to highly unreadable comma-counting code. Instead any argument can be passed by name. Thus if you want to omit only y from a call of M you can write: M(1, z: 3); // passing z by name or M(x: 1, z: 3); // passing both x and z by name or even M(z: 3, x: 1); // reversing the order of arguments All forms are equivalent, except that arguments are always evaluated in the order they appear, so in the last example the 3 is evaluated before the 1. Optional and named arguments can be used not only with methods but also with indexers and constructors. Overload resolution Named and optional arguments affect overload resolution, but the changes are relatively simple: A signature is applicable if all its parameters are either optional or have exactly one corresponding argument (by name or position) in the call which is convertible to the parameter type. Betterness rules on conversions are only applied for arguments that are explicitly given – omitted optional arguments are ignored for betterness purposes. If two signatures are equally good, one that does not omit optional parameters is preferred. M(string s, int i = 1); M(object o); M(int i, string s = “Hello”); M(int i); M(5); Given these overloads, we can see the working of the rules above. M(string,int) is not applicable because 5 doesn’t convert to string. M(int,string) is applicable because its second parameter is optional, and so, obviously are M(object) and M(int). M(int,string) and M(int) are both better than M(object) because the conversion from 5 to int is better than the conversion from 5 to object. Finally M(int) is better than M(int,string) because no optional arguments are omitted. Thus the method that gets called is M(int). Features for COM interop Dynamic lookup as well as named and optional parameters greatly improve the experience of interoperating with COM APIs such as the Office Automation APIs. In order to remove even more of the speed bumps, a couple of small COM-specific features are also added to C# 4.0. Dynamic import Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object from context, but explicitly has to perform a cast on the returned value to make use of that knowledge. These casts are so common that they constitute a major nuisance. In order to facilitate a smoother experience, you can now choose to import these COM APIs in such a way that variants are instead represented using the type dynamic. In other words, from your point of view, COM signatures now have occurrences of dynamic instead of object in them. This means that you can easily access members directly off a returned object, or you can assign it to a strongly typed local variable without having to cast. To illustrate, you can now say excel.Cells[1, 1].Value = "Hello"; instead of ((Excel.Range)excel.Cells[1, 1]).Value2 = "Hello"; and Excel.Range range = excel.Cells[1, 1]; instead of Excel.Range range = (Excel.Range)excel.Cells[1, 1]; Compiling without PIAs Primary Interop Assemblies are large .NET assemblies generated from COM interfaces to facilitate strongly typed interoperability. They provide great support at design time, where your experience of the interop is as good as if the types where really defined in .NET. However, at runtime these large assemblies can easily bloat your program, and also cause versioning issues because they are distributed independently of your application. The no-PIA feature allows you to continue to use PIAs at design time without having them around at runtime. Instead, the C# compiler will bake the small part of the PIA that a program actually uses directly into its assembly. At runtime the PIA does not have to be loaded. Omitting ref Because of a different programming model, many COM APIs contain a lot of reference parameters. Contrary to refs in C#, these are typically not meant to mutate a passed-in argument for the subsequent benefit of the caller, but are simply another way of passing value parameters. It therefore seems unreasonable that a C# programmer should have to create temporary variables for all such ref parameters and pass these by reference. Instead, specifically for COM methods, the C# compiler will allow you to pass arguments by value to such a method, and will automatically generate temporary variables to hold the passed-in values, subsequently discarding these when the call returns. In this way the caller sees value semantics, and will not experience any side effects, but the called method still gets a reference. Open issues A few COM interface features still are not surfaced in C#. Most notably these include indexed properties and default properties. As mentioned above these will be respected if you access COM dynamically, but statically typed C# code will still not recognize them. There are currently no plans to address these remaining speed bumps in C# 4.0. Variance An aspect of generics that often comes across as surprising is that the following is illegal: IList<string> strings = new List<string>(); IList<object> objects = strings; The second assignment is disallowed because strings does not have the same element type as objects. There is a perfectly good reason for this. If it were allowed you could write: objects[0] = 5; string s = strings[0]; Allowing an int to be inserted into a list of strings and subsequently extracted as a string. This would be a breach of type safety. However, there are certain interfaces where the above cannot occur, notably where there is no way to insert an object into the collection. Such an interface is IEnumerable<T>. If instead you say: IEnumerable<object> objects = strings; There is no way we can put the wrong kind of thing into strings through objects, because objects doesn’t have a method that takes an element in. Variance is about allowing assignments such as this in cases where it is safe. The result is that a lot of situations that were previously surprising now just work. Covariance In .NET 4.0 the IEnumerable<T> interface will be declared in the following way: public interface IEnumerable<out T> : IEnumerable { IEnumerator<T> GetEnumerator(); } public interface IEnumerator<out T> : IEnumerator { bool MoveNext(); T Current { get; } } The “out” in these declarations signifies that the T can only occur in output position in the interface – the compiler will complain otherwise. In return for this restriction, the interface becomes “covariant” in T, which means that an IEnumerable<A> is considered an IEnumerable<B> if A has a reference conversion to B. As a result, any sequence of strings is also e.g. a sequence of objects. This is useful e.g. in many LINQ methods. Using the declarations above: var result = strings.Union(objects); // succeeds with an IEnumerable<object> This would previously have been disallowed, and you would have had to to some cumbersome wrapping to get the two sequences to have the same element type. Contravariance Type parameters can also have an “in” modifier, restricting them to occur only in input positions. An example is IComparer<T>: public interface IComparer<in T> { public int Compare(T left, T right); } The somewhat baffling result is that an IComparer<object> can in fact be considered an IComparer<string>! It makes sense when you think about it: If a comparer can compare any two objects, it can certainly also compare two strings. This property is referred to as contravariance. A generic type can have both in and out modifiers on its type parameters, as is the case with the Func<…> delegate types: public delegate TResult Func<in TArg, out TResult>(TArg arg); Obviously the argument only ever comes in, and the result only ever comes out. Therefore a Func<object,string> can in fact be used as a Func<string,object>. Limitations Variant type parameters can only be declared on interfaces and delegate types, due to a restriction in the CLR. Variance only applies when there is a reference conversion between the type arguments. For instance, an IEnumerable<int> is not an IEnumerable<object> because the conversion from int to object is a boxing conversion, not a reference conversion. Also please note that the CTP does not contain the new versions of the .NET types mentioned above. In order to experiment with variance you have to declare your own variant interfaces and delegate types. COM Example Here is a larger Office automation example that shows many of the new C# features in action. using System; using System.Diagnostics; using System.Linq; using Excel = Microsoft.Office.Interop.Excel; using Word = Microsoft.Office.Interop.Word; class Program { static void Main(string[] args) { var excel = new Excel.Application(); excel.Visible = true; excel.Workbooks.Add(); // optional arguments omitted excel.Cells[1, 1].Value = "Process Name"; // no casts; Value dynamically excel.Cells[1, 2].Value = "Memory Usage"; // accessed var processes = Process.GetProcesses() .OrderByDescending(p =&gt; p.WorkingSet) .Take(10); int i = 2; foreach (var p in processes) { excel.Cells[i, 1].Value = p.ProcessName; // no casts excel.Cells[i, 2].Value = p.WorkingSet; // no casts i++; } Excel.Range range = excel.Cells[1, 1]; // no casts Excel.Chart chart = excel.ActiveWorkbook.Charts. Add(After: excel.ActiveSheet); // named and optional arguments chart.ChartWizard( Source: range.CurrentRegion, Title: "Memory Usage in " + Environment.MachineName); //named+optional chart.ChartStyle = 45; chart.CopyPicture(Excel.XlPictureAppearance.xlScreen, Excel.XlCopyPictureFormat.xlBitmap, Excel.XlPictureAppearance.xlScreen); var word = new Word.Application(); word.Visible = true; word.Documents.Add(); // optional arguments word.Selection.Paste(); } } The code is much more terse and readable than the C# 3.0 counterpart. Note especially how the Value property is accessed dynamically. This is actually an indexed property, i.e. a property that takes an argument; something which C# does not understand. However the argument is optional. Since the access is dynamic, it goes through the runtime COM binder which knows to substitute the default value and call the indexed property. Thus, dynamic COM allows you to avoid accesses to the puzzling Value2 property of Excel ranges. Relationship with Visual Basic A number of the features introduced to C# 4.0 already exist or will be introduced in some form or other in Visual Basic: · Late binding in VB is similar in many ways to dynamic lookup in C#, and can be expected to make more use of the DLR in the future, leading to further parity with C#. · Named and optional arguments have been part of Visual Basic for a long time, and the C# version of the feature is explicitly engineered with maximal VB interoperability in mind. · NoPIA and variance are both being introduced to VB and C# at the same time. VB in turn is adding a number of features that have hitherto been a mainstay of C#. As a result future versions of C# and VB will have much better feature parity, for the benefit of everyone. Resources All available resources concerning C# 4.0 can be accessed through the C# Dev Center. Specifically, this white paper and other resources can be found at the Code Gallery site. Enjoy! span.fullpost {display:none;}

    Read the article

  • Red Gate Coder interviews: Alex Davies

    - by Michael Williamson
    Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

    Read the article

  • The Incremental Architect&rsquo;s Napkin - #5 - Design functions for extensibility and readability

    - by Ralf Westphal
    Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/24/the-incremental-architectrsquos-napkin---5---design-functions-for.aspx The functionality of programs is entered via Entry Points. So what we´re talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points. Designing software thus consists of at least three phases: Analyzing the requirements to find the Entry Points and their signatures Designing the functionality to be executed when those Entry Points get triggered Implementing the functionality according to the design aka coding I presume, you´re familiar with phase 1 in some way. And I guess you´re proficient in implementing functionality in some programming language. But in my experience developers in general are not experienced in going through an explicit phase 2. “Designing functionality? What´s that supposed to mean?” you might already have thought. Here´s my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what´s supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution that is, because those functions only exist in your head (or on paper) during this phase. But you may have guess that, because it´s “design” not “coding”. And here is, what functional design is not: It´s not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It´s equally basic. It´s what code needs to do in order to deliver some functionality or quality. Logic is what´s doing that needs to be done by software. Transformations are either done through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it´s just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do). But calling your own function is not logic. It´s not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions. Functions are not relevant to functionality. Strange, isn´t it. What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher unterstandability, easier to keep code consistent). That´s no small feat, however. Evolvable code can hardly be overestimated. That´s why to me functional design is so important. It´s at the core of software development. To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it. Functional design an logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need back out through refactoring. Functional design on the other hand is bloodless without actual code. It´s just a theory with no experiments to prove it. But how to do functional design? An example of functional design Let´s assume a program to de-duplicate strings. The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e. There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line C:\>deduplicate "a, b, a, c, d, b, e, c, a" a, b, c, d, e …or by clicking on a GUI button. This leads to the Entry Point function to get called. It´s the program´s main function in case of the batch version or a button click event handler in the GUI version. That´s the physical Entry Point so to speak. It´s inevitable. What then happens is a three step process: Transform the input data from the user into a request. Call the request handler. Transform the output of the request handler into a tangible result for the user. Or to phrase it a bit more generally: Accept input. Transform input into output. Present output. This does not mean any of these steps requires a lot of effort. Maybe it´s just one line of code to accomplish it. Nevertheless it´s a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP). Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself. But it´s still on a meta-level. The application to the domain at hand is easy, though: Accept string list from command line De-duplicate Present de-duplicated strings on standard output And this concrete list of processing steps can easily be transformed into code:static void Main(string[] args) { var input = Accept_string_list(args); var output = Deduplicate(input); Present_deduplicated_string_list(output); } Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point. But maybe, just maybe, you´re not so sure how to go about with the de-duplication for example. Then just implement what´s easy right now, e.g.private static string Accept_string_list(string[] args) { return args[0]; } private static void Present_deduplicated_string_list( string[] output) { var line = string.Join(", ", output); Console.WriteLine(line); } Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call. And then repeat the functional design for the remaining processing step. What´s left is the domain logic: de-duplicating a list of strings. How should that be done? Without any logic at our disposal during functional design you´re left with just functions. So which functions could make up the de-duplication? Here´s a suggestion: De-duplicate Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Processing step 2 obviously was the core of the solution. That´s where real creativity was needed. That´s the core of the domain. But now after this refinement the implementation of each step is easy again:private static string[] Parse_string_list(string input) { return input.Split(',') .Select(s => s.Trim()) .ToArray(); } private static Dictionary<string,object> Compile_unique_strings(string[] strings) { return strings.Aggregate( new Dictionary<string, object>(), (agg, s) => { agg[s] = null; return agg; }); } private static string[] Serialize_unique_strings( Dictionary<string,object> dict) { return dict.Keys.ToArray(); } With these three additional functions Main() now looks like this:static void Main(string[] args) { var input = Accept_string_list(args); var strings = Parse_string_list(input); var dict = Compile_unique_strings(strings); var output = Serialize_unique_strings(dict); Present_deduplicated_string_list(output); } I think that´s very understandable code: just read it from top to bottom and you know how the solution to the problem works. It´s a mirror image of the initial design: Accept string list from command line Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Present de-duplicated strings on standard output You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later. And as a bonus: all the functions making up the process are small - which means easy to understand, too. So much for an initial concrete example. Now it´s time for some theory. Because there is method to this madness ;-) The above has only scratched the surface. Introducing Flow Design Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm. An algorithm consists of logic, a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1] In its simplest form a process can be written as a bullet point list of steps, e.g. Get data from user Output result to user Transform data Parse data Map result for output Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming. Next comes ordering the steps. What should happen first, what next etc.? Get data from user Parse data Transform data Map result for output Output result to user That´s great for a start into functional design. It´s better than starting to code right away on a given function using TDD. Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a functionn is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion? My recommendation: For any given function you´re supposed to implement first do a functional design. Then, once you´re confident you know the processing steps - which are pretty small - refine and code them using TDD. You´ll see that´s much, much easier - and leads to cleaner code right away. For more information on this approach I call “Informed TDD” read my book of the same title. Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I´d say. It´s more according to the KISS (Keep It Simple, Stupid) principle than returning constants or other trivial stuff TDD development often is started with. So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example such steps can easily be translated into functions. Moving from design to coding thus is simple. However, such a list does not scale. Processing is not always that simple to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their “dimensionality”: text just has one dimension, it´s sequential. Alternatives and parallelism are hard to encode in text. In addition the functional design using numbered lists lacks data. It´s not visible what´s the input, output, and state of the processing steps. That´s why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient. Visualizing processes The building block of the functional design notation is a functional unit. I mostly draw it like this: Something is done, it´s clear what goes in, it´s clear what comes out, and it´s clear what the processing step requires in terms of state or hardware. Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That´s why I call this approach to functional design Flow Design. It´s about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details. That´s what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give overview. But what about all the details? As Robert C. Martin rightly said: “Programming is abot detail”. Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want. Functional design does not eliminate all the nitty gritty. It just postpones tackling them. To me that´s also an example of the SRP. Function design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later coding has the responsibility to implement the solution down to the last detail (i.e. statement, API-call). TDD unfortunately mixes both responsibilities. It´s just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that´s one reason why TDD has failed to deliver on its promise for many developers. Using functional units as building blocks of functional design processes can be depicted very easily. Here´s the initial process for the example problem: For each processing step draw a functional unit and label it. Choose a verb or an “action phrase” as a label, not a noun. Functional design is about activities, not state or structure. Then make the output of an upstream step the input of a downstream step. Finally think about the data that should flow between the functional units. Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified. Empty brackets mean “no data is flowing”, but nevertheless a signal is sent. A name like “list” or “strings” in brackets describes the data content. Use lower case labels for that purpose. A name starting with an upper case letter like “String” or “Customer” on the other hand signifies a data type. If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]). But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent. Flows wired-up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output. A functional unit without an output is possible. It´s like a black hole sucking up input without producing any output. Instead it produces side effects. A functional unit without an input, though, does make much sense. When should it start to work? What´s the trigger? That´s why in the above process even the first processing step has an input. If you like, view such 1D-flows as pipelines. Data is flowing through them from left to right. But as you can see, it´s not always the same data. It get´s transformed along its passage: (args) becomes a (list) which is turned into (strings). The Principle of Mutual Oblivion A very characteristic trait of flows put together from function units is: no functional units knows another one. They are all completely independent of each other. Functional units don´t know where their input is coming from (or even when it´s gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving. Also they don´t know where their output is going. They just produce it in their own time independent of other functional units. That means at least conceptually all functional units work in parallel. Functional units don´t know their “deployment context”. They now nothing about the overall flow they are place in. They are just consuming input from some upstream, and producing output for some downstream. That makes functional units very easy to test. At least as long as they don´t depend on state or resources. I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as an overall context/purpose. They are just parts of a whole focused on a single responsibility. How the whole is built, how a larger goal is achieved, is of no concern to the single functional units. By building software in such a manner, functional design interestingly follows nature. Nature´s building blocks for organisms also follow the PoMO. The cells forming your body do not know each other. Take a nerve cell “controlling” a muscle cell for example:[2] The nerve cell does not know anything about muscle cells, let alone the specific muscel cell it is “attached to”. Likewise the muscle cell does not know anything about nerve cells, let a lone a specific nerve cell “attached to” it. Saying “the nerve cell is controlling the muscle cell” thus only makes sense when viewing both from the outside. “Control” is a concept of the whole, not of its parts. Control is created by wiring-up parts in a certain way. Both cells are mutually oblivious. Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it´s coming from neither cell cares about. Million years of evolution have led to this kind of division of labor. And million years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism. How and why this happened in nature is a mystery. For our software, though, it´s clear: functional and quality requirements needs to be fulfilled. So we as developers have to become “intelligent designers” of “software cells” which we put together to form a “software organism” which responds in satisfying ways to triggers from it´s environment. My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler “machines”? So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units. That´s what Flow Design is about. In that way it´s even universal, I´d say. Its notation can also be applied to biology: Never mind labeling the functional units with nouns. That´s ok in Flow Design. You´ll do that occassionally for functional units on a higher level of abstraction or when their purpose is close to hardware. Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 “software cells” (functional units). Both, though, follow the same basic principle. Translating functional units into code Moving from functional design to code is no rocket science. In fact it´s straightforward. There are two simple rules: Translate an input port to a function. Translate an output port either to a return statement in that function or to a function pointer visible to that function. The simplest translation of a functional unit is a function. That´s what you saw in the above example. Functions are mutually oblivious. That why Functional Programming likes them so much. It makes them composable. Which is the reason, nature works according to the PoMO. Let´s be clear about one thing: There is no dependency injection in nature. For all of an organism´s complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks. Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case, where a processing step should not always produce an output. Maybe the purpose is to filter input. Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed. Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called “stop words”. In the above picture the optionality of the output is signified by the astrisk outside the brackets. It means: Any number of (word) data items can flow from the functional unit for each input data item. It might be none or one or even more. This I call a stream of data. Such behavior cannot be translated into a function where output is generated with return. Because a function always needs to return a value. So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]void filter_stop_words( string word, Action<string> onNoStopWord) { if (...check if not a stop word...) onNoStopWord(word); } If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you´re right. Conceptually, though, it´s not an injection. Because the subroutine is not functionally dependent on the continuation. Firstly continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow. Secondly the name of the formal parameter is chosen in a way as to not assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only. Translating output ports into function pointers helps keeping functional units mutually oblivious in cases where output is optional or produced asynchronically. Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it´s called an event. In C# that´s even an explicit feature.class Filter { public void filter_stop_words( string word) { if (...check if not a stop word...) onNoStopWord(word); } public event Action<string> onNoStopWord; } When to use a continuation and when to use an event dependens on how a functional unit is used in flows and how it´s packed together with others into classes. You´ll see examples further down the Flow Design road. Another example of 1D functional design Let´s see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design. The function signature given is:string WordWrap(string text, int maxLineLength) {...} That´s not an Entry Point since we don´t see an application with an environment and users. Nevertheless it´s a function which is supposed to provide a certain functionality. The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified. If a word is longer than a the maximum line length it can be split in multiple parts each fitting in a line. Flow Design Let´s start by brainstorming the process to accomplish the feat of reformatting the text. What´s needed? Words need to be assembled into lines Words need to be extracted from the input text The resulting lines need to be assembled into the output text Words too long to fit in a line need to be split Does sound about right? I guess so. And it shows a kind of priority. Long words are a special case. So maybe there is a hint for an incremental design here. First let´s tackle “average words” (words not longer than a line). Here´s the Flow Design for this increment: The the first three bullet points turned into functional units with explicit data added. As the signature requires a text is transformed into another text. See the input of the first functional unit and the output of the last functional unit. In between no text flows, but words and lines. That´s good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are. But note the asterisk! It´s not outside the brackets but inside. That means it´s not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced. The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It´s an implementation detail. Does any processing step require further refinement? I don´t think so. They all look pretty “atomic” to me. And if not… I can always backtrack and refine a process step using functional design later once I´ve gained more insight into a sub-problem. Implementation The implementation is straightforward as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility. And the process flow becomes just a sequence of function calls: Easy to understand. It clearly states how word wrapping works - on a high level of abstraction. And it´s easy to evolve as you´ll see. Flow Design - Increment 2 So far only texts consisting of “average words” are wrapped correctly. Words not fitting in a line will result in lines too long. Wrapping long words is a feature of the requested functionality. Whether it´s there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it´s time to add it to deliver the full scope. Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It´s easy to extend it - instead of changing well tested code. How´s that possible? Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead where extions might need to be made in the future. I just “phase in” the remaining processing step: Since neither Extract words nor Reformat know of their environment neither needs to be touched due to the “detour”. The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step. Implementation - Increment 2 A trivial implementation checking the assumption if this works does not do anything to split long words. The input is just passed on: Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be build in, quickly sees how long words are dealt with. Compare this to Robert C. Martin´s solution:[4] How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach. Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin´s. At least it seems. Because his solution does not cover all the “word wrap situations” the Flow Design solution handles. Some lines would need to be added to be on par, I guess. But even then… Is a difference in LOC that important as long as it´s in the same ball park? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it´s also clarity in design. But don´t take my word for it. Try Flow Design on larger problems and compare for yourself. What´s the easier, more straightforward way to clean code? And keep in mind: You ain´t seen all yet ;-) There´s more to Flow Design than described in this chapter. In closing I hope I was able to give you a impression of functional design that makes you hungry for more. To me it´s an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all to quickly. Some thought should be invested first. Where there is a clear Entry Point visible, it´s functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that´s necessary read my blog article here. For now let me point out to you - if you haven´t already noticed - that Flow Design is a general purpose declarative language. It´s “programming by intention” (Shalloway et al.). Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem in smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members. Flow Design not only increases evolvability, but also helps becoming more productive. All team members can participate in functional design. This goes beyon collective code ownership. We´re talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities.   PS: If you like what you read, consider getting my ebook “The Incremental Architekt´s Napkin”. It´s where I compile all the articles in this series for easier reading. I like the strictness of Function Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also to me it seems, the real world is full of state and side effects. So why give them such a bad image? That´s why functional design takes a more pragmatic approach. State and side effects are ok for processing steps - but be sure to follow the SRP. Don´t put too much of it into a single processing step. ? Image taken from www.physioweb.org ? My code samples are written in C#. C# sports typed function pointers called delegates. Action is such a function pointer type matching functions with signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java now in version 8. I trust you find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you´re using a Functional Programming language it´s of course a no brainer. ? Taken from his blog post “The Craftsman 62, The Dark Path”. ?

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Extending Oracle CEP with Predictive Analytics

    - by vikram.shukla(at)oracle.com
    Introduction: OCEP is often used as a business rules engine to execute a set of business logic rules via CQL statements, and take decisions based on the outcome of those rules. There are times where configuring rules manually is sufficient because an application needs to deal with only a small and well-defined set of static rules. However, in many situations customers don't want to pre-define such rules for two reasons. First, they are dealing with events with lots of columns and manually crafting such rules for each column or a set of columns and combinations thereof is almost impossible. Second, they are content with probabilistic outcomes and do not care about 100% precision. The former is the case when a user is dealing with data with high dimensionality, the latter when an application can live with "false" positives as they can be discarded after further inspection, say by a Human Task component in a Business Process Management software. The primary goal of this blog post is to show how this can be achieved by combining OCEP with Oracle Data Mining® and leveraging the latter's rich set of algorithms and functionality to do predictive analytics in real time on streaming events. The secondary goal of this post is also to show how OCEP can be extended to invoke any arbitrary external computation in an RDBMS from within CEP. The extensible facility is known as the JDBC cartridge. The rest of the post describes the steps required to achieve this: We use the dataset available at http://blogs.oracle.com/datamining/2010/01/fraud_and_anomaly_detection_made_simple.html to showcase the capabilities. We use it to show how transaction anomalies or fraud can be detected. Building the model: Follow the self-explanatory steps described at the above URL to build the model.  It is very simple - it uses built-in Oracle Data Mining PL/SQL packages to cleanse, normalize and build the model out of the dataset.  You can also use graphical Oracle Data Miner®  to build the models. To summarize, it involves: Specifying which algorithms to use. In this case we use Support Vector Machines as we're trying to find anomalies in highly dimensional dataset.Build model on the data in the table for the algorithms specified. For this example, the table was populated in the scott/tiger schema with appropriate privileges. Configuring the Data Source: This is the first step in building CEP application using such an integration.  Our datasource looks as follows in the server config file.  It is advisable that you use the Visualizer to add it to the running server dynamically, rather than manually edit the file.    <data-source>         <name>DataMining</name>         <data-source-params>             <jndi-names>                 <element>DataMining</element>             </jndi-names>             <global-transactions-protocol>OnePhaseCommit</global-transactions-protocol>         </data-source-params>         <connection-pool-params>             <credential-mapping-enabled></credential-mapping-enabled>             <test-table-name>SQL SELECT 1 from DUAL</test-table-name>             <initial-capacity>1</initial-capacity>             <max-capacity>15</max-capacity>             <capacity-increment>1</capacity-increment>         </connection-pool-params>         <driver-params>             <use-xa-data-source-interface>true</use-xa-data-source-interface>             <driver-name>oracle.jdbc.OracleDriver</driver-name>             <url>jdbc:oracle:thin:@localhost:1522:orcl</url>             <properties>                 <element>                     <value>scott</value>                     <name>user</name>                 </element>                 <element>                     <value>{Salted-3DES}AzFE5dDbO2g=</value>                     <name>password</name>                 </element>                                 <element>                     <name>com.bea.core.datasource.serviceName</name>                     <value>oracle11.2g</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceVersion</name>                     <value>11.2.0</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceObjectClass</name>                     <value>java.sql.Driver</value>                 </element>             </properties>         </driver-params>     </data-source>   Designing the EPN: The EPN is very simple in this example. We briefly describe each of the components. The adapter ("DataMiningAdapter") reads data from a .csv file and sends it to the CQL processor downstream. The event payload here is same as that of the table in the database (refer to the attached project or do a "desc table-name" from a SQL*PLUS prompt). While this is for convenience in this example, it need not be the case. One can still omit fields in the streaming events, and need not match all columns in the table on which the model was built. Better yet, it does not even need to have the same name as columns in the table, as long as you alias them in the USING clause of the mining function. (Caveat: they still need to draw values from a similar universe or domain, otherwise it constitutes incorrect usage of the model). There are two things in the CQL processor ("DataMiningProc") that make scoring possible on streaming events. 1.      User defined cartridge function Please refer to the OCEP CQL reference manual to find more details about how to define such functions. We include the function below in its entirety for illustration. <?xml version="1.0" encoding="UTF-8"?> <jdbcctxconfig:config     xmlns:jdbcctxconfig="http://www.bea.com/ns/wlevs/config/application"     xmlns:jc="http://www.oracle.com/ns/ocep/config/jdbc">        <jc:jdbc-ctx>         <name>Oracle11gR2</name>         <data-source>DataMining</data-source>               <function name="prediction2">                                 <param name="CQLMONTH" type="char"/>                      <param name="WEEKOFMONTH" type="int"/>                      <param name="DAYOFWEEK" type="char" />                      <param name="MAKE" type="char" />                      <param name="ACCIDENTAREA"   type="char" />                      <param name="DAYOFWEEKCLAIMED"  type="char" />                      <param name="MONTHCLAIMED" type="char" />                      <param name="WEEKOFMONTHCLAIMED" type="int" />                      <param name="SEX" type="char" />                      <param name="MARITALSTATUS"   type="char" />                      <param name="AGE" type="int" />                      <param name="FAULT" type="char" />                      <param name="POLICYTYPE"   type="char" />                      <param name="VEHICLECATEGORY"  type="char" />                      <param name="VEHICLEPRICE" type="char" />                      <param name="FRAUDFOUND" type="int" />                      <param name="POLICYNUMBER" type="int" />                      <param name="REPNUMBER" type="int" />                      <param name="DEDUCTIBLE"   type="int" />                      <param name="DRIVERRATING"  type="int" />                      <param name="DAYSPOLICYACCIDENT"   type="char" />                      <param name="DAYSPOLICYCLAIM" type="char" />                      <param name="PASTNUMOFCLAIMS" type="char" />                      <param name="AGEOFVEHICLES" type="char" />                      <param name="AGEOFPOLICYHOLDER" type="char" />                      <param name="POLICEREPORTFILED" type="char" />                      <param name="WITNESSPRESNT" type="char" />                      <param name="AGENTTYPE" type="char" />                      <param name="NUMOFSUPP" type="char" />                      <param name="ADDRCHGCLAIM"   type="char" />                      <param name="NUMOFCARS" type="char" />                      <param name="CQLYEAR" type="int" />                      <param name="BASEPOLICY" type="char" />                                     <return-component-type>char</return-component-type>                                                      <sql><![CDATA[             SELECT to_char(PREDICTION_PROBABILITY(CLAIMSMODEL, '0' USING *))               AS probability             FROM (SELECT  :CQLMONTH AS MONTH,                                            :WEEKOFMONTH AS WEEKOFMONTH,                          :DAYOFWEEK AS DAYOFWEEK,                           :MAKE AS MAKE,                           :ACCIDENTAREA AS ACCIDENTAREA,                           :DAYOFWEEKCLAIMED AS DAYOFWEEKCLAIMED,                           :MONTHCLAIMED AS MONTHCLAIMED,                           :WEEKOFMONTHCLAIMED,                             :SEX AS SEX,                           :MARITALSTATUS AS MARITALSTATUS,                            :AGE AS AGE,                           :FAULT AS FAULT,                           :POLICYTYPE AS POLICYTYPE,                            :VEHICLECATEGORY AS VEHICLECATEGORY,                           :VEHICLEPRICE AS VEHICLEPRICE,                           :FRAUDFOUND AS FRAUDFOUND,                           :POLICYNUMBER AS POLICYNUMBER,                           :REPNUMBER AS REPNUMBER,                           :DEDUCTIBLE AS DEDUCTIBLE,                            :DRIVERRATING AS DRIVERRATING,                           :DAYSPOLICYACCIDENT AS DAYSPOLICYACCIDENT,                            :DAYSPOLICYCLAIM AS DAYSPOLICYCLAIM,                           :PASTNUMOFCLAIMS AS PASTNUMOFCLAIMS,                           :AGEOFVEHICLES AS AGEOFVEHICLES,                           :AGEOFPOLICYHOLDER AS AGEOFPOLICYHOLDER,                           :POLICEREPORTFILED AS POLICEREPORTFILED,                           :WITNESSPRESNT AS WITNESSPRESENT,                           :AGENTTYPE AS AGENTTYPE,                           :NUMOFSUPP AS NUMOFSUPP,                           :ADDRCHGCLAIM AS ADDRCHGCLAIM,                            :NUMOFCARS AS NUMOFCARS,                           :CQLYEAR AS YEAR,                           :BASEPOLICY AS BASEPOLICY                 FROM dual)                 ]]>         </sql>        </function>     </jc:jdbc-ctx> </jdbcctxconfig:config> 2.      Invoking the function for each event. Once this function is defined, you can invoke it from CQL as follows: <?xml version="1.0" encoding="UTF-8"?> <wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application">   <processor>     <name>DataMiningProc</name>     <rules>        <query id="q1"><![CDATA[                     ISTREAM(SELECT S.CQLMONTH,                                   S.WEEKOFMONTH,                                   S.DAYOFWEEK, S.MAKE,                                   :                                         S.BASEPOLICY,                                    C.F AS probability                                                 FROM                                 StreamDataChannel [NOW] AS S,                                 TABLE(prediction2@Oracle11gR2(S.CQLMONTH,                                      S.WEEKOFMONTH,                                      S.DAYOFWEEK,                                       S.MAKE, ...,                                      S.BASEPOLICY) AS F of char) AS C)                       ]]></query>                 </rules>               </processor>           </wlevs:config>   Finally, the last stage in the EPN prints out the probability of the event being an anomaly. One can also define a threshold in CQL to filter out events that are normal, i.e., below a certain mark as defined by the analyst or designer. Sample Runs: Now let's see how this behaves when events are streamed through CEP. We use only two events for brevity, one normal and other one not. This is one of the "normal" looking events and the probability of it being anomalous is less than 60%. Event is: eventType=DataMiningOutEvent object=q1  time=2904821976256 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=300, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.58931702982118561 isTotalOrderGuarantee=true\nAnamoly probability: .58931702982118561 However, the following event is scored as an anomaly with a very high probability of  89%. So there is likely to be something wrong with it. A close look reveals that the value of "deductible" field (10000) is not "normal". What exactly constitutes normal here?. If you run the query on the database to find ALL distinct values for the "deductible" field, it returns the following set: {300, 400, 500, 700} Event is: eventType=DataMiningOutEvent object=q1  time=2598483773496 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=10000, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.89171554529576691 isTotalOrderGuarantee=true\nAnamoly probability: .89171554529576691 Conclusion: By way of this example, we show: real-time scoring of events as they flow through CEP leveraging Oracle Data Mining.how CEP applications can invoke complex arbitrary external computations (function shipping) in an RDBMS.

    Read the article

  • How do I send automated e-mails from Drupal using Messaging and Notifications?

    - by Adrian
    I am working on a Notifications plugin, and after starting to write my notes down about how to do this, decided to just post them here. Please feel free to come make modifications and changes. Eventually I hope to post this on the Drupal handbook as well. Thanks. --Adrian Sending automated e-mails from Drupal using Messaging and Notifications To implement a notifications plugin, you must implement the following functions: Use hook_messaging, hook_token_list and hook_token_values to create the messages that will be sent. Use hook_notifications to create the subscription types Add code to fire events (eg in hook_nodeapi) Add all UI elements to allow users to subscribe/unsubscribe Understanding Messaging The Messaging module is used to compose messages that can be delivered using various formats, such as simple mail, HTML mail, Twitter updates, etc. These formats are called "send methods." The backend details do not concern us here; what is important are the following concepts: TOKENS: tokens are provided by the "tokens" module. They allow you to write keywords in square brackets, [like-this], that can be replaced by any arbitrary value. Note: the token groups you create must match the keys you add to the $events-objects[$key] array. MESSAGE KEYS: A key is a part of a message, such as the greetings line. Keys can be different for each send method. For example, a plaintext mail's greeting might be "Hi, [user]," while an HTML greeing might be "Hi, [user]," and Twitter's might just be "[user-firstname]: ". Keys can have any arbitrary name. Keys are very simple and only have a machine-readable name and a user-readable description, the latter of which is only seen by admins. MESSAGE GROUPS: A group is a bunch of keys that often, but not always, might be used together to make up a complete message. For example, a generic group might include keys for a greeting, body, closing and footer. Groups can also be "subclassed" by selecting a "fallback" group that will supply any keys that are missing. Groups are also associated with modules; I'm not sure what these are used for. Understanding Notifications The Notifications module revolves around the following concepts: SUBSCRIPTIONS: Notifications plugins may define one or more types of subscriptions. For example, notifications_content defines subscriptions for: Threads (users are notified whenever a node or its comments change) Content types (users are notified whenever a node of a certain type is created or is changed) Users (users are notified whenever another user is changed) Subscriptions refer to both the user who's subscribed, how often they wish to be notified, the send method (for Messaging) and what's being subscribed to. This last part is defined in two steps. Firstly, a plugin defines several "subscription fields" (through a hook_notifications op of the same name), and secondly, "subscription types" (also an op) defines which fields apply to each type of subscription. For example, notifications_content defines the fields "nid," "author" and "type," and the subscriptions "thread" (nid), "nodetype" (type), "author" (author) and "typeauthor" (type and author), the latter referring to something like "any STORY by JOE." Fields are used to link events to subscriptions; an event must match all fields of a subscription (for all normal subscriptions) to be delivered to the recipient. The $subscriptions object is defined in subsequent sections. Notifications prefers that you don't create these objects yourself, preferring you to call the notifications_get_link() function to create a link that users may click on, but you can also use notifications_save_subscription and notifications_delete_subscription to do it yourself. EVENTS: An event is something that users may be notified about. Plugins create the $event object then call notifications_event($event). This either sends out notifications immediately, queues them to send out later, or both. Events include the type of thing that's changed (eg 'node', 'user'), the ID of the thing that's changed (eg $node-nid, $user-uid) and what's happened to it (eg 'create'). These are, respectively, $event-type, $event-oid (object ID) and $event-action. Warning: notifications_content_nodeapi also adds a $event-node field, referring to the node itself and not just $event-oid = $node-nid. This is not used anywhere in the core notifications module; however, when the $event is passed back to the 'query' op (see below), we assume the node is still present. Events do not refer to the user they will be referred to; instead, Notifications makes the connection between subscriptions and events, using the subscriptions' fields. MATCHING EVENTS TO SUBSCRIPTIONS: An event matches a subscription if it has the same type as the event (eg "node") and if the event matches all the correct fields. This second step is determined by the "query" hook op, which is called with the $event object as a parameter. The query op is responsible for giving Notifications a value for all the fields defined by the plugin. For example, notifications_content defines the 'nid', 'type' and 'author' fields, so its query op looks like this (ignore the case where $event_or_user = 'user' for now): $event_or_user = $arg0; $event_type = $arg1; $event_or_object = $arg2; if ($event_or_user == 'event' && $event_type == 'node' && ($node = $event_or_object->node) || $event_or_user == 'user' && $event_type == 'node' && ($node = $event_or_object)) { $query[]['fields'] = array( 'nid' => $node->nid, 'type' => $node->type, 'author' => $node->uid, ); return $query; After extracting the $node from the $event, we set $query[]['fields'] to a dictionary defining, for this event, all the fields defined by the module. As you can tell from the presence of the $query object, there's way more you can do with this op, but they are not covered here. DIGESTING AND DEDUPING: Understanding the relationship between Messaging and Notifications Usually, the name of a message group doesn't matter, but when being used with Notifications, the names must follow very strict patterns. Firstly, they must start with the name "notifications," and then are followed by either "event" or "digest," depending on whether the message group is being used to represent either a single event or a group of events. For 'events,' the third part of the name is the "type," which we get from Notification's $event-type (eg: notifications_content uses 'node'). The last part of the name is the operation being performed, which comes from Notification's $event-action. For example: notifications-event-node-comment might refer to the message group used when someone comments on a node notifications-event-user-update to a user who's updated their profile Hyphens cannot appear anywhere other than to separate the parts of these words. For 'digest' messages, the third and fourth part of the name come from hook_notification's "event types" callback, specifically this line: $types[] = array( 'type' => 'node', 'action' => 'insert', ... 'digest' => array('node', 'type'), ); $types[] = array( 'type' => 'node', 'action' => 'update', ... 'digest' => array('node', 'nid'), ); In this case, the first event type (node insertion) will be digested with the notifications-digest-node-type message template providing the header and footer, likely saying something like "the following [type] was created." The second event type (node update) will be digested with the notifications-digest-node-nid message template. Data Structure and Callback Reference $event The $event object has the following members: $event-type: The type of event. Must match the type in hook_notification::"event types". {notifications_event} $event-action: The action the event describes. Most events are sorted by [$event-type][$event-action]. {notifications_event}. $event-object[$object_type]: All objects relevant to the event. For example, $event-object['node'] might be the node that the event describes. $object_type can come from the 'event types' hook (see below). The main purpose appears to be to be passed to token_replace_multiple as the second parameter. $event-object[$event-type] is assumed to exist in the short digest processing functions, but this doesn't appear to be used anywhere. Not saved in the database; loaded by hook_notifications::"event load" $event-oid: apparently unused. The id of the primary object relevant to this event (eg the node's nid). $event-module: apparently unused $event-params[$key]: Mainly a place for plugins to save random data. The main module will serialize the contents of this array but does not use it in any way. However, notifications_ui appears to do something weird with it, possibly by using subscriptions' fields as keys into this array. I'm not sure why though. hook_notifications op 'subscription types': returns an array of subscription types provided by the plugin, in the form $key = array(...) with the following members: event_type: this subscription can only match events whose $event-type has this value. Stored in the database as notifications.event_type for every individual subscription. Apparently, this can be overiden in code but I wouldn't try it (see notifications_save_subscription). fields: an unkeyed array of fields that must be matched by an event (in addition to the event_type) for it to match this subscription. Each element of this array must be a key of the array returned by op 'subscription fields' which in turn must be used by op 'query' to actually perform the matching. title: user-readable title for their subscriptions page (eg the 'type' column in user/%uid/notifications/subscriptions) description: a user-readable description. page callback: used to add a supplementary page at user/%uid/notifications/blah. This and the following are used by notifications_ui as a part of hook_menu_alter. Appears to be partially deprecated. user page: user/%uid/notifications/blah. op 'event types': returns an array of event types, with each event type being an array with the following members: type: this will match $event-type action: this will match $event-action digest: an array with two ordered (non-keyed) elements, "type" and "field." 'type' is used as an index into $event-objects. 'field' is also used to group events like so: $event-objects[$type]-$field. For example, 'field' might be 'nid' - if the object is a node, the digest lines will be grouped by node ID. Finally, both are used to find the correct Messaging template; see discussion above. description: used on the admin "Notifications-Events" page name: unused, use Messaging instead line: deprecated, use Messaging instead Other Stuff This is an example of the main query that inserts an event into the queue: INSERT INTO {notifications_queue} (uid, destination, sid, module, eid, send_interval, send_method, cron, created, conditions) SELECT DISTINCT s.uid, s.destination, s.sid, s.module, %d, // event ID s.send_interval, s.send_method, s.cron, %d, // time of the event s.conditions FROM {notifications} s INNER JOIN {notifications_fields} f ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = '%s') // subscription type AND (s.send_interval >= 0) AND (s.uid <> %d) AND ( (f.field = '%s' AND f.intval IN (%d)) // everything from 'query' op OR (f.field = '%s' AND f.intval = %d) OR (f.field = '%s' AND f.value = '%s') OR (f.field = '%s' AND f.intval = %d)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid)

    Read the article

  • Optimized OCR black/white pixel algorithm

    - by eagle
    I am writing a simple OCR solution for a finite set of characters. That is, I know the exact way all 26 letters in the alphabet will look like. I am using C# and am able to easily determine if a given pixel should be treated as black or white. I am generating a matrix of black/white pixels for every single character. So for example, the letter I (capital i), might look like the following: 01110 00100 00100 00100 01110 Note: all points, which I use later in this post, assume that the top left pixel is (0, 0), bottom right pixel is (4, 4). 1's represent black pixels, and 0's represent white pixels. I would create a corresponding matrix in C# like this: CreateLetter("I", new List<List<bool>>() { new List<bool>() { false, true, true, true, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, true, true, true, false } }); I know I could probably optimize this part by using a multi-dimensional array instead, but let's ignore that for now, this is for illustrative purposes. Every letter is exactly the same dimensions, 10px by 11px (10px by 11px is the actual dimensions of a character in my real program. I simplified this to 5px by 5px in this posting since it is much easier to "draw" the letters using 0's and 1's on a smaller image). Now when I give it a 10px by 11px part of an image to analyze with OCR, it would need to run on every single letter (26) on every single pixel (10 * 11 = 110) which would mean 2,860 (26 * 110) iterations (in the worst case) for every single character. I was thinking this could be optimized by defining the unique characteristics of every character. So, for example, let's assume that the set of characters only consists of 5 distinct letters: I, A, O, B, and L. These might look like the following: 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 After analyzing the unique characteristics of every character, I can significantly reduce the number of tests that need to be performed to test for a character. For example, for the "I" character, I could define it's unique characteristics as having a black pixel in the coordinate (3, 0) since no other characters have that pixel as black. So instead of testing 110 pixels for a match on the "I" character, I reduced it to a 1 pixel test. This is what it might look like for all these characters: var LetterI = new OcrLetter() { Name = "I", BlackPixels = new List<Point>() { new Point (3, 0) } } var LetterA = new OcrLetter() { Name = "A", WhitePixels = new List<Point>() { new Point(2, 4) } } var LetterO = new OcrLetter() { Name = "O", BlackPixels = new List<Point>() { new Point(3, 2) }, WhitePixels = new List<Point>() { new Point(2, 2) } } var LetterB = new OcrLetter() { Name = "B", BlackPixels = new List<Point>() { new Point(3, 1) }, WhitePixels = new List<Point>() { new Point(3, 2) } } var LetterL = new OcrLetter() { Name = "L", BlackPixels = new List<Point>() { new Point(1, 1), new Point(3, 4) }, WhitePixels = new List<Point>() { new Point(2, 2) } } This is challenging to do manually for 5 characters and gets much harder the greater the amount of letters that are added. You also want to guarantee that you have the minimum set of unique characteristics of a letter since you want it to be optimized as much as possible. I want to create an algorithm that will identify the unique characteristics of all the letters and would generate similar code to that above. I would then use this optimized black/white matrix to identify characters. How do I take the 26 letters that have all their black/white pixels filled in (e.g. the CreateLetter code block) and convert them to an optimized set of unique characteristics that define a letter (e.g. the new OcrLetter() code block)? And how would I guarantee that it is the most efficient definition set of unique characteristics (e.g. instead of defining 6 points as the unique characteristics, there might be a way to do it with 1 or 2 points, as the letter "I" in my example was able to). An alternative solution I've come up with is using a hash table, which will reduce it from 2,860 iterations to 110 iterations, a 26 time reduction. This is how it might work: I would populate it with data similar to the following: Letters["01110 00100 00100 00100 01110"] = "I"; Letters["00100 01010 01110 01010 01010"] = "A"; Letters["00100 01010 01010 01010 00100"] = "O"; Letters["01100 01010 01100 01010 01100"] = "B"; Now when I reach a location in the image to process, I convert it to a string such as: "01110 00100 00100 00100 01110" and simply find it in the hash table. This solution seems very simple, however, this still requires 110 iterations to generate this string for each letter. In big O notation, the algorithm is the same since O(110N) = O(2860N) = O(N) for N letters to process on the page. However, it is still improved by a constant factor of 26, a significant improvement (e.g. instead of it taking 26 minutes, it would take 1 minute). Update: Most of the solutions provided so far have not addressed the issue of identifying the unique characteristics of a character and rather provide alternative solutions. I am still looking for this solution which, as far as I can tell, is the only way to achieve the fastest OCR processing. I just came up with a partial solution: For each pixel, in the grid, store the letters that have it as a black pixel. Using these letters: I A O B L 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 You would have something like this: CreatePixel(new Point(0, 0), new List<Char>() { }); CreatePixel(new Point(1, 0), new List<Char>() { 'I', 'B', 'L' }); CreatePixel(new Point(2, 0), new List<Char>() { 'I', 'A', 'O', 'B' }); CreatePixel(new Point(3, 0), new List<Char>() { 'I' }); CreatePixel(new Point(4, 0), new List<Char>() { }); CreatePixel(new Point(0, 1), new List<Char>() { }); CreatePixel(new Point(1, 1), new List<Char>() { 'A', 'B', 'L' }); CreatePixel(new Point(2, 1), new List<Char>() { 'I' }); CreatePixel(new Point(3, 1), new List<Char>() { 'A', 'O', 'B' }); // ... CreatePixel(new Point(2, 2), new List<Char>() { 'I', 'A', 'B' }); CreatePixel(new Point(3, 2), new List<Char>() { 'A', 'O' }); // ... CreatePixel(new Point(2, 4), new List<Char>() { 'I', 'O', 'B', 'L' }); CreatePixel(new Point(3, 4), new List<Char>() { 'I', 'A', 'L' }); CreatePixel(new Point(4, 4), new List<Char>() { }); Now for every letter, in order to find the unique characteristics, you need to look at which buckets it belongs to, as well as the amount of other characters in the bucket. So let's take the example of "I". We go to all the buckets it belongs to (1,0; 2,0; 3,0; ...; 3,4) and see that the one with the least amount of other characters is (3,0). In fact, it only has 1 character, meaning it must be an "I" in this case, and we found our unique characteristic. You can also do the same for pixels that would be white. Notice that bucket (2,0) contains all the letters except for "L", this means that it could be used as a white pixel test. Similarly, (2,4) doesn't contain an 'A'. Buckets that either contain all the letters or none of the letters can be discarded immediately, since these pixels can't help define a unique characteristic (e.g. 1,1; 4,0; 0,1; 4,4). It gets trickier when you don't have a 1 pixel test for a letter, for example in the case of 'O' and 'B'. Let's walk through the test for 'O'... It's contained in the following buckets: // Bucket Count Letters // 2,0 4 I, A, O, B // 3,1 3 A, O, B // 3,2 2 A, O // 2,4 4 I, O, B, L Additionally, we also have a few white pixel tests that can help: (I only listed those that are missing at most 2). The Missing Count was calculated as (5 - Bucket.Count). // Bucket Missing Count Missing Letters // 1,0 2 A, O // 1,1 2 I, O // 2,2 2 O, L // 3,4 2 O, B So now we can take the shortest black pixel bucket (3,2) and see that when we test for (3,2) we know it is either an 'A' or an 'O'. So we need an easy way to tell the difference between an 'A' and an 'O'. We could either look for a black pixel bucket that contains 'O' but not 'A' (e.g. 2,4) or a white pixel bucket that contains an 'O' but not an 'A' (e.g. 1,1). Either of these could be used in combination with the (3,2) pixel to uniquely identify the letter 'O' with only 2 tests. This seems like a simple algorithm when there are 5 characters, but how would I do this when there are 26 letters and a lot more pixels overlapping? For example, let's say that after the (3,2) pixel test, it found 10 different characters that contain the pixel (and this was the least from all the buckets). Now I need to find differences from 9 other characters instead of only 1 other character. How would I achieve my goal of getting the least amount of checks as possible, and ensure that I am not running extraneous tests?

    Read the article

  • PHP Facebook Cronjob with offline access

    - by Mohamed Salem
    1:the code to greet the user, ask for his permission and store his session data so that we can use a cronjob with his session data afterwards. <?php $db_server = "localhost"; $db_username = "username"; $db_password = "password"; $db_name = "databasename"; #go to line 85, the script actually starts there mysql_connect($db_server,$db_username,$db_password); mysql_select_db($db_name); #you have to create a database to store session values. #if you do not know what columns there should be look at line 76 to see column names. #make them all varchars # Now lets load the FB GRAPH API require './facebook.php'; // Create our Application instance. global $facebook; $facebook = new Facebook(array( 'appId' => '121036530138', 'secret' => '9bbec378147064', 'cookie' => false,)); # Lets set up the permissions we need and set the login url in case we need it. $par['req_perms'] = "friends_about_me,friends_education_history,friends_likes, friends_interests,friends_location,friends_religion_politics, friends_work_history,publish_stream,friends_activities, friends_events, friends_hometown,friends_location ,user_interests,user_likes,user_events, user_about_me,user_status,user_work_history,read_requests, read_stream,offline_access,user_religion_politics,email,user_groups"; $loginUrl = $facebook->getLoginUrl($par); function save_session($session){ global $facebook; # OK lets go to the database and see if we have a session stored $sid=mysql_query("Select access_token from facebook_user WHERE uid =".$session['uid']); $session_id=mysql_fetch_row($sid); if (is_array($session_id)) { # We have a stored session, but is it valid? echo " We have a session, but is it valid?"; try { $attachment = array('access_token' => $session_id[0]); $ret_code=$facebook->api('/me', 'GET', $attachment); } catch (Exception $e) { # We don't have a good session so echo " our old session is not valid, let's delete saved invalid session data "; $res = mysql_query("delete from facebook_user WHERE uid =".$session['uid']); #save new good session #to see what is our session data: print_r($session); if (is_array($session)) { $sql="insert into facebook_user (session_key,uid,expires,secret,access_token,sig) VALUES ('".$session['session_key']."','".$session['uid']."','". $session['expires']."','". $session['secret'] ."','" . $session['access_token']."','". $session['sig']."');"; $res = mysql_query($sql); return $session['access_token']; } # this should never ever happen echo " Something is terribly wrong: Our old session was bad, and now we cannot get the new session"; return; } echo " Our old stored session is valid "; return $session_id[0]; } else { echo " no stored session, this means the user never subscribed to our application before. "; # let's store the session $session = $facebook->getSession(); if (is_array($session)) { # Yes we have a session! so lets store it! $sql="insert into facebook_user (session_key,uid,expires,secret,access_token,sig) VALUES ('".$session['session_key']."','".$session['uid']."','". $session['expires']."','". $session['secret'] ."','". $session['access_token']."','". $session['sig']."');"; $res = mysql_query($sql); return $session['access_token']; } } } #this is the first meaningful line of this script. $session = $facebook->getSession(); # Is the user already subscribed to our application? if ( is_null($session) ) { # no he is not #send him to permissions page header( "Location: $loginUrl" ); } else { #yes, he is already subscribed, or subscribed just now #in case he just subscribed now, save his session information $access_token=save_session($session); echo " everything is ok"; # write your code here to do something afterwards } ?> error Warning: session_start() [function.session-start]: Cannot send session cache limiter - headers already sent (output started at /home/content/28/9687528/html/ss/src/indexx.php:1) in /home/content/28/9687528/html/ss/src/facebook.php on line 49 Fatal error: Call to undefined method Facebook::getSession() in /home/content/28/9687528/html/ss/src/indexx.php on line 86 2:A cronjob template that reads the stored session of a user from database, uses his session data to work on his behalf, like reading status posts or publishing posts etc. <?php $db_server = "localhost"; $db_username = "username"; $db_password = "pass"; $db_name = "database"; # Lets connect to the Database and set up the table $link = mysql_connect($db_server,$db_username,$db_password); mysql_select_db($db_name); # Now lets load the FB GRAPH API require './facebook.php'; // Create our Application instance. global $facebook; $facebook = new Facebook(array( 'appId' => 'appid', 'secret' => 'secret', 'cookie' => false, )); function get_check_session($uidCheck){ global $facebook; # This function basically checks for a stored session and if we have one it returns it # OK lets go to the database and see if we have a session stored $sid=mysql_query("Select access_token from facebook_user WHERE uid =".$uidCheck); $session_id=mysql_fetch_row($sid); if (is_array($session_id)) { # We have a session # but, is it valid? try { $attachment = array('access_token' => $session_id[0],); $ret_code=$facebook->api('/me', 'GET', $attachment); } catch (Exception $e) { # We don't have a good session so echo " User ".$uidCheck." removed the application, or there is some other access problem. "; # let's delete stored data $res = mysql_query("delete from facebook_user where WHERE uid =".$uidCheck); return; } return $session_id[0]; } else { # "no stored session"; echo " error:newsFeedcrontab.php No stored sessions. This should not have happened "; } } # get all users that have given us offline access $users = getUsers(); foreach($users as $user){ # now for each user, check if they are still subscribed to our application echo " Checking user".$user; $access_token=get_check_session($user); # If we've not got an access_token we actually need to login. # but in the crontab, we just log the error, there is no way we can find the user to give us permission here. if ( is_null($access_token) ) { echo " error: newsFeedcrontab.php There is no access token for the user ".$user." "; } else { #we are going to read the newsfeed of user. There are user's friends' posts in this newsfeed try{ $attachment = array('access_token' => $access_token); $result=$facebook->api('/me/home', 'GET', $attachment); }catch(Exception $e){ echo " error: newsfeedcrontab.php, cannot get feed of ".$user.$e; } #do something with the result here #but what does the result look like? #go to http://developers.facebook.com/docs/reference/api/user/ and click on the "home" link under connections #we can also read the home of user. Home is the wall of the user who has given us offline access. try{ $attachment = array('access_token' => $access_token); $result=$facebook->api('/me/feed', 'GET', $attachment); }catch(Exception $e){ echo " error: newsfeedcrontab.php, cannot get wall of ".$user.$e; } #do something with the result here # #but what does the result look like? #go to http://developers.facebook.com/docs/reference/api/user/ and click on the "feed" link under connections } } function getUsers(){ $sql = "SELECT distinct(uid) from facebook_user Where 1"; $result = mysql_query($sql); while($row = mysql_fetch_array($result)){ $rows [] = $row['uid']; } print_r($rows); return $rows; } mysql_close($link); ?> error Warning: session_start() [function.session-start]: Cannot send session cache limiter - headers already sent (output started at /home/content/28/9687528/html/ss/src/cron.php:1) in /home/content/28/9687528/html/ss/src/facebook.php on line 49 Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/content/28/9687528/html/ss/src/cron.php on line 110 Warning: Invalid argument supplied for foreach() in /home/content/28/9687528/html/ss/src/cron.php on line 64

    Read the article

  • Can't add client machine to windows server 2008 domain controller

    - by Patrick J Collins
    A bit of background before I dive into the gritty details: I have a single server running Windows 2003 Server where I host my ASP.net website and SQL Server + Reports. I've been creating ordinary windows user accounts to authenticate my users, and I enabled integrated windows authentication with impersonation. I've set up a bunch of user groups which correspond to certain roles (admin, power user, normal user, etc) and I test membership to enable or disable certain features. Overall, I'm pretty happy with the solution, it was quick to setup and I don't have to worry about messing around storing passwords and whatnot. Well, what I'm trying to do now is set up a new environment with 3 servers (Web, SQL, Reports) and I'd like these three servers to share common user accounts. I understand that I could add these three machines to a domain, which means installing Active Directory on one of the machines. I am barking up the wrong tree here? Would you suggest an alternative configuration? Assuming that I stick with AD, I have a couple of questions regarding DNS. To be honest, I'd rather not fiddle around with the DNS settings because my ISP already has their own DNS server which works just fine. It would appear however that DNS and AD are intertwined. Firstly, if I am to create a new domain in called mycompany.net, do I actually need to be the registered owner of that domain name and ensure the DNS entry points to the IP address of the machine hosting AD? Secondly, for the two other machines that I am trying to add to the domain, do I need to fiddle with their DNS settings? I've tried setting the preferred DNS Server IP address to that of my newly installed AD, but no luck. At this point, I can't add the two other machines to the domain. Here are some diagnostics that I have run based on a few suggestions I read on forums (sorry they're in French, although I could translate if needed). I ran nltest, which seems to indicate that the client can discover the domain controller. When I run dcdiag, the call to DsGetDcName fails with error 1722, not really sure what that means. Any suggestions? Thanks! C:\Users\Administrator>nltest /dsgetdc:mycompany.net Contrôleur de domaine : \\REPORTS.mycompany.net Adresse : \\111.111.111.111 GUID dom : 3333a4ec-ca56-4f02-bb9e-76c29c6c3832 Nom dom : mycompany.net Nom de la forêt : mycompany.net Nom de site du contrôleur de domaine : Default-First-Site-Name Nom de notre site : Default-First-Site-Name Indicateurs : PDC GC DS LDAP KDC TIMESERV WRITABLE DNS_DC DNS_DOMAIN DNS _FOREST CLOSE_SITE FULL_SECRET La commande a été correctement exécutée C:\Users\Administrator>dcdiag /s:mycompany.net /u: mycompany.net \pcollins /p:somepass Diagnostic du serveur d'annuaire Exécution de l'installation initiale : * Forêt AD identifiée. Collecte des informations initiales terminée. Exécution des tests initiaux nécessaires Test du serveur : Default-First-Site-Name\REPORTS Démarrage du test : Connectivity ......................... Le test Connectivity de REPORTS a réussi Exécution des tests principaux Test du serveur : Default-First-Site-Name\REPORTS Démarrage du test : Advertising Erreur irrécupérable : l'appel DsGetDcName (REPORTS) a échoué ; erreur 1722 Le localisateur n'a pas pu trouver le serveur. ......................... Le test Advertising de REPORTS a échoué Démarrage du test : FrsEvent Impossible d'interroger le journal des événements File Replication Service sur le serveur REPORTS.mycompany.net. Erreur 0x6ba « Le serveur RPC n'est pas disponible. » ......................... Le test FrsEvent de REPORTS a échoué Démarrage du test : DFSREvent Impossible d'interroger le journal des événements DFS Replication sur le serveur REPORTS.mycompany.net. Erreur 0x6ba « Le serveur RPC n'est pas disponible. » ......................... Le test DFSREvent de REPORTS a échoué Démarrage du test : SysVolCheck [REPORTS] Une opération net use ou LsaPolicy a échoué avec l'erreur 53, Le chemin réseau n'a pas été trouvé.. ......................... Le test SysVolCheck de REPORTS a échoué Démarrage du test : KccEvent Impossible d'interroger le journal des événements Directory Service sur le serveur REPORTS.mycompany.net. Erreur 0x6ba « Le serveur RPC n'est pas disponible. » ......................... Le test KccEvent de REPORTS a échoué Démarrage du test : KnowsOfRoleHolders ......................... Le test KnowsOfRoleHolders de REPORTS a réussi Démarrage du test : MachineAccount Impossible d'ouvrir le canal avec [REPORTS] : échec avec l'erreur 53 : Le chemin réseau n'a pas été trouvé. Impossible d'obtenir le nom de domaine NetBIOS Échec : impossible de tester le nom principal de service (SPN) HOST Échec : impossible de tester le nom principal de service (SPN) HOST ......................... Le test MachineAccount de REPORTS a réussi Démarrage du test : NCSecDesc ......................... Le test NCSecDesc de REPORTS a réussi Démarrage du test : NetLogons [REPORTS] Une opération net use ou LsaPolicy a échoué avec l'erreur 53, Le chemin réseau n'a pas été trouvé.. ......................... Le test NetLogons de REPORTS a échoué Démarrage du test : ObjectsReplicated ......................... Le test ObjectsReplicated de REPORTS a réussi Démarrage du test : Replications ......................... Le test Replications de REPORTS a réussi Démarrage du test : RidManager ......................... Le test RidManager de REPORTS a réussi Démarrage du test : Services Impossible d'ouvrir IPC distant à [REPORTS.mycompany.net] : erreur 0x35 « Le chemin réseau n'a pas été trouvé. » ......................... Le test Services de REPORTS a échoué Démarrage du test : SystemLog Impossible d'interroger le journal des événements System sur le serveur REPORTS.mycompany.net. Erreur 0x6ba « Le serveur RPC n'est pas disponible. » ......................... Le test SystemLog de REPORTS a échoué Démarrage du test : VerifyReferences ......................... Le test VerifyReferences de REPORTS a réussi Exécution de tests de partitions sur ForestDnsZones Démarrage du test : CheckSDRefDom ......................... Le test CheckSDRefDom de ForestDnsZones a réussi Démarrage du test : CrossRefValidation ......................... Le test CrossRefValidation de ForestDnsZones a réussi Exécution de tests de partitions sur DomainDnsZones Démarrage du test : CheckSDRefDom ......................... Le test CheckSDRefDom de DomainDnsZones a réussi Démarrage du test : CrossRefValidation ......................... Le test CrossRefValidation de DomainDnsZones a réussi Exécution de tests de partitions sur Schema Démarrage du test : CheckSDRefDom ......................... Le test CheckSDRefDom de Schema a réussi Démarrage du test : CrossRefValidation ......................... Le test CrossRefValidation de Schema a réussi Exécution de tests de partitions sur Configuration Démarrage du test : CheckSDRefDom ......................... Le test CheckSDRefDom de Configuration a réussi Démarrage du test : CrossRefValidation ......................... Le test CrossRefValidation de Configuration a réussi Exécution de tests de partitions sur mycompany Démarrage du test : CheckSDRefDom ......................... Le test CheckSDRefDom de mycompany a réussi Démarrage du test : CrossRefValidation ......................... Le test CrossRefValidation de mycompany a réussi Exécution de tests d'entreprise sur mycompany.net Démarrage du test : LocatorCheck Avertissement : l'appel DcGetDcName(GC_SERVER_REQUIRED) a échoué ; erreur 1722 Serveur de catalogue global introuvable - Les catalogues globaux ne fonctionnent pas. Avertissement : l'appel DcGetDcName(PDC_REQUIRED) a échoué ; erreur 1722 Contrôleur principal de domaine introuvable. Le serveur contenant le rôle PDC ne fonctionne pas. Avertissement : l'appel DcGetDcName(TIME_SERVER) a échoué ; erreur 1722 Serveur de temps introuvable. Le serveur contenant le rôle PDC ne fonctionne pas. Avertissement : l'appel DcGetDcName(GOOD_TIME_SERVER_PREFERRED) a échoué ; erreur 1722 Serveur de temps introuvable. Avertissement : l'appel DcGetDcName(KDC_REQUIRED) a échoué ; erreur 1722 Centre de distribution de clés introuvable : les centres de distribution de clés ne fonctionnent pas. ......................... Le test LocatorCheck de mycompany.net a échoué Démarrage du test : Intersite ......................... Le test Intersite de mycompany.net a réussi Update 1 : I am under the distinct impression that the problem is caused by some security settings. I have read elsewhere that the client needs to be able to access the fileshare sysvol. I had to enable Client for Microsoft Windows and File and Printer Sharing which were previously disabled. When I now run dcdiag the Advertising test works, which I suppose is forward progress. It currently chokes on the Services step (unable to open remote IPC). Démarrage du test : Services Impossible d'ouvrir IPC distant à [REPORTS.locbus.net] : erreur 0x35 « Le chemin réseau n'a pas été trouvé. » ......................... Le test Services de REPORTS a échoué The original English version of that error message : Could not open Remote ipc to [server] Update 2 : I attach some more diagnostics : Netsetup.log (client): 09/24/2009 13:27:09:773 ----------------------------------------------------------------- 09/24/2009 13:27:09:773 NetpValidateName: checking to see if 'WEB' is valid as type 1 name 09/24/2009 13:27:12:773 NetpCheckNetBiosNameNotInUse for 'WEB' [MACHINE] returned 0x0 09/24/2009 13:27:12:773 NetpValidateName: name 'WEB' is valid for type 1 09/24/2009 13:27:12:805 ----------------------------------------------------------------- 09/24/2009 13:27:12:805 NetpValidateName: checking to see if 'WEB' is valid as type 5 name 09/24/2009 13:27:12:805 NetpValidateName: name 'WEB' is valid for type 5 09/24/2009 13:27:12:852 ----------------------------------------------------------------- 09/24/2009 13:27:12:852 NetpValidateName: checking to see if 'MYCOMPANY.NET' is valid as type 3 name 09/24/2009 13:27:12:992 NetpCheckDomainNameIsValid [ Exists ] for 'MYCOMPANY.NET' returned 0x0 09/24/2009 13:27:12:992 NetpValidateName: name 'MYCOMPANY.NET' is valid for type 3 09/24/2009 13:27:21:320 ----------------------------------------------------------------- 09/24/2009 13:27:21:320 NetpDoDomainJoin 09/24/2009 13:27:21:320 NetpMachineValidToJoin: 'WEB' 09/24/2009 13:27:21:320 OS Version: 6.0 09/24/2009 13:27:21:320 Build number: 6002 09/24/2009 13:27:21:320 ServicePack: Service Pack 2 09/24/2009 13:27:21:414 SKU: Windows Server® 2008 Standard 09/24/2009 13:27:21:414 NetpDomainJoinLicensingCheck: ulLicenseValue=1, Status: 0x0 09/24/2009 13:27:21:414 NetpGetLsaPrimaryDomain: status: 0x0 09/24/2009 13:27:21:414 NetpMachineValidToJoin: status: 0x0 09/24/2009 13:27:21:414 NetpJoinDomain 09/24/2009 13:27:21:414 Machine: WEB 09/24/2009 13:27:21:414 Domain: MYCOMPANY.NET 09/24/2009 13:27:21:414 MachineAccountOU: (NULL) 09/24/2009 13:27:21:414 Account: MYCOMPANY.NET\pcollins 09/24/2009 13:27:21:414 Options: 0x25 09/24/2009 13:27:21:414 NetpLoadParameters: loading registry parameters... 09/24/2009 13:27:21:414 NetpLoadParameters: DNSNameResolutionRequired not found, defaulting to '1' 0x2 09/24/2009 13:27:21:414 NetpLoadParameters: status: 0x2 09/24/2009 13:27:21:414 NetpValidateName: checking to see if 'MYCOMPANY.NET' is valid as type 3 name 09/24/2009 13:27:21:523 NetpCheckDomainNameIsValid [ Exists ] for 'MYCOMPANY.NET' returned 0x0 09/24/2009 13:27:21:523 NetpValidateName: name 'MYCOMPANY.NET' is valid for type 3 09/24/2009 13:27:21:523 NetpDsGetDcName: trying to find DC in domain 'MYCOMPANY.NET', flags: 0x40001010 09/24/2009 13:27:22:039 NetpDsGetDcName: failed to find a DC having account 'WEB$': 0x525, last error is 0x79 09/24/2009 13:27:22:039 NetpDsGetDcName: status of verifying DNS A record name resolution for 'KING.MYCOMPANY.NET': 0x0 09/24/2009 13:27:22:039 NetpDsGetDcName: found DC '\\KING.MYCOMPANY.NET' in the specified domain 09/24/2009 13:27:30:039 NetUseAdd to \\KING.MYCOMPANY.NET\IPC$ returned 53 09/24/2009 13:27:30:039 NetpJoinDomain: status of connecting to dc '\\KING.MYCOMPANY.NET': 0x35 09/24/2009 13:27:30:039 NetpDoDomainJoin: status: 0x35 09/24/2009 13:27:30:148 ----------------------------------------------------------------- ipconfig /all (on client): Configuration IP de Windows Nom de l'hôte . . . . . . . . . . : WEB Suffixe DNS principal . . . . . . : Type de noeud. . . . . . . . . . : Hybride Routage IP activé . . . . . . . . : Non Proxy WINS activé . . . . . . . . : Non Carte Ethernet Connexion au réseau local : Suffixe DNS propre à la connexion. . . : Description. . . . . . . . . . . . . . : Intel 21140-Based PCI Fast Ethernet Adapter (Emulated) Adresse physique . . . . . . . . . . . : **-15-5D-A1-17-** DHCP activé. . . . . . . . . . . . . . : Non Configuration automatique activée. . . : Oui Adresse IPv4. . . . . . . . . . . : **.***.163.122(préféré) Masque de sous-réseau. . . . . . . . . : 255.255.255.0 Passerelle par défaut. . . . . . . . . : **.***.163.2 Serveurs DNS. . . . . . . . . . . . . : **.***.163.123 NetBIOS sur Tcpip. . . . . . . . . . . : Activé ipconfig /all (on server): Configuration IP de Windows Nom de l'hôte . . . . . . . . . . : KING Suffixe DNS principal . . . . . . : mycompany.net Type de noeud. . . . . . . . . . : Hybride Routage IP activé . . . . . . . . : Non Proxy WINS activé . . . . . . . . : Non Liste de recherche du suffixe DNS.: locbus.net Carte Ethernet Connexion au réseau local : Suffixe DNS propre à la connexion. . . : Description. . . . . . . . . . . . . . : Intel 21140-Based PCI Fast Ethernet Adapter (Emulated) Adresse physique . . . . . . . . . . . : **-15-5D-A1-1E-** DHCP activé. . . . . . . . . . . . . . : Non Configuration automatique activée. . . : Oui Adresse IPv4. . . . . . . . . . . : **.***.163.123(préféré) Masque de sous-réseau. . . . . . . . . : 255.255.255.0 Passerelle par défaut. . . . . . . . . : **.***.163.2 Serveurs DNS. . . . . . . . . . . . . : 127.0.0.1 NetBIOS sur Tcpip. . . . . . . . . . . : Activé nslookup (on client): Serveur : *******.***.com Address: **.***.163.123 Nom : mycompany.net Addresses: ****:****:a37b::****:a37b **.****.163.123

    Read the article

  • Struts 1 ActionForm - retrieving a collection from pure HTML

    - by Yaneeve
    Hi all I have (just like the rest) inherited some struts 1 code. I have had need to add a few more pages to this project. What I cannot figure out is how to map several distinct but similarly natured input elements to the my ActionForm. Let me elaborate. I create a new <Input> element dynamically as the user inputs more and more items (I use the YUI autocomplete form element and for each entered input I add it as an input element to my form and draw a new YUI autocomplete - complex sounding, I know) So... My form looks a bit like (... after some prettifying and some such...): <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8"> <title>My Cool App - Test Case Builder</title> <link rel="stylesheet" type="text/css" href="../script/yui/fonts/fonts-min.css" /> <link rel="stylesheet" type="text/css" href="../skins/myCoolApp/button/button.css" /> <link rel="stylesheet" type="text/css" href="../script/yui/autocomplete/assets/skins/sam/autocomplete.css" /> <link rel="stylesheet" type="text/css" media="screen" href="../skins/myCoolApp/testcase.css" /> <!-- YUI JAVA SCRIPTS --> <script type="text/javascript" src="../script/yui/yahoo-dom-event/yahoo-dom-event.js"></script> <script type="text/javascript" src="../script/yui/element/element-min.js"></script> <script type="text/javascript" src="../script/yui/button/button-min.js"></script> <script type="text/javascript" src="../script/yui/datasource/datasource-min.js"></script> <script type="text/javascript" src="../script/yui/autocomplete/autocomplete-min.js"></script> <!-- APP JAVA SCRIPTS --> <script type="text/javascript" src="../script/myCoolApp/myCoolApp.js" ></script> <script type="text/javascript" src="../script/myCoolApp/stack.js" ></script> <script type="text/javascript" src="../script/myCoolApp/testcase/testcase.js"></script> <script type="text/javascript" src="../script/myCoolApp/testcase/default-data.js" ></script> <script type="text/javascript" src="../script/myCoolApp/testcase/data-structs.js" ></script> <script type="text/javascript" src="../script/myCoolApp/testcase/ui-elements.js" ></script> </head> <body class="cf010"> <div id="wrap"> <div id="header"> <div id="main-header"> COOL APP </div> </div> <div id="main-body"> <div id="content"> <div class="col main"> <div id="main"> <form method="post" id="testcaseForm" class="typea" action=""> <fieldset> <legend>Test Case Builder</legend> <div id="tk1" class="tabcontrol"> <ul class="tabs"> <li class="first active"> <a href="#"> <span>General</span> </a> </li> <li class="last"> <a href="#"> <span>Parameters</span> </a> </li> </ul> <div id="tab0" class="tc-panel"> <dl class="cls9"> <dt> <label for="scenario">Choose Scenario:</label> </dt> <dd> <input type="text" id="scenario" name="scenario" class="text" /> <span id="scenarioToggle"></span> <div class="auto-complete" id="scenarioContainer"></div> </dd> <dt> <label for="ruleID">Choose Rule ID:</label> </dt> <dd> <input type="text" id="ruleID" name="ruleID" class="text" /> <span id="ruleIDToggle"></span> <div class="auto-complete" id="ruleIDContainer"></div> </dd> <dt> <label for="Test Case Name" accesskey="t"><span class="accesskey">T</span>est Case Name:</label> </dt> <dd> <input type="text" id="testCaseName" name="testCaseName" class="text" /> </dd> </dl> </div> <div id="tab1" class="tc-panel hidden"> <div class="toolbar" id="action-bar"> <ul> <li class="first"> <a title="select all" href="#" id="btmSelectAll" class="button"> <span>select all</span> </a> </li> <li> <a title="remove row" href="#" id="btmRemove" class="button"> <span>remove row</span> </a> </li> <li> <a title="undo last" href="#" id="btmRollBack" class="button disabled"> <span>undo last</span> </a> </li> <li class="last"> <a title="accept row" href="#" id="btmAccept" class="button disabled"> <span>accept row</span> </a> </li> </ul> </div> <div id="param.list" class="gridclip"> <table id='param.list.tbl' class='grid modela' > <caption>Test Case Summary</caption> <col/><col/><col/> <thead> <tr> <th class='hl center first'> <input class='grid-select-all' type='checkbox' /> <th> <th scope='col'>Row</th> <th scope='col'>Parameter</th> <th scope='col' class='last'>Value</th> </tr> </thead> <tfoot> <tr> <th scope='row'>Total</th> <td colspan='3'>2 parameters as Test Case input</td> </tr> </tfoot> <tbody id='param.list.tbl.body'> <tr class='odd'> <td class='rowcheck center first'> <input value='param1###value1' id='cb1' name='SelectedRows' class='grid-select-row' type='checkbox'/> </td> <td class='id'>1</td> <td>param1</td> <td class='last'>value1</td> </tr> <tr class='even'> <td class='rowcheck center first'> <input value='param2###value2' id='cb1' name='SelectedRows' class='grid-select-row' type='checkbox'/> </td> <td class='id'>2</td> <td>param2</td> <td class='last'>value2</td> </tr> <tr class='odd'> <td class='rowcheck center first' /> <td class='id'><em>new</em></td> <td> <dl class='clsTable'> <dt> <input type='text' id='param' name='param' class='text paramInput' /> </dt> <dd> <span id='paramToggle' /> </dd> <div class='auto-complete' id='paramContainer' /> </dl> </td> <td class='last'> <dl class='clsTable'> <dt> <input type='text' id='value' name='value' class='text valueInput' /> </dt> </dl> </td> </tr> </tbody> </table> </div> </div> </div> <!-- tabcontrol --> </fieldset> <div class="submit-box"> <input type="submit" name="formRun" id="formRun" class="form-save" value="Execute" accesskey="x" title="Run: Press Alt + [Shift] + x" /> <input type="submit" name="formSave" id="formSave" value="Save" accesskey="s" title="Save: Press Alt + [Shift] + s" /> <input type="submit" name="formLoad" id="formLoad" value="Load" accesskey="l" title="Load: Press Alt + [Shift] + l" /> <input type="submit" name="formCancel" id="formCancel" class="form-cancel" value="Cancel" accesskey="c" title="Cancel: Press Alt + [Shift] + c" /> </div> </form> </div> </div> </div> </div> </div> </body> </html> As you can see the following is pretty much a duplicate: <tr class='odd'> <td class='rowcheck center first'> <input value='param1###value1' id='cb1' name='SelectedRows' class='grid-select-row' type='checkbox'/> </td> <td class='id'>1</td> <td>param1</td> <td class='last'>value1</td> </tr> <tr class='even'> <td class='rowcheck center first'> <input value='param2###value2' id='cb1' name='SelectedRows' class='grid-select-row' type='checkbox'/> </td> <td class='id'>2</td> <td>param2</td> <td class='last'>value2</td> </tr> The relevant part of my stuts-config.xml file is: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE struts-config PUBLIC "-//Apache Software Foundation//DTD Struts Configuration 1.2//EN" "http://struts.apache.org/dtds/struts-config_1_2.dtd"> <struts-config> <data-sources /> <form-beans> <form-bean name="TestCaseForm" type="com.blahblah.mycoolapp.forms.TestCaseForm" /> </form-beans> <action-mappings> <action path="/pages/SaveTestCase" name="TestCaseForm" type="org.springframework.web.struts.DelegatingActionProxy" scope="request"> </action> </action-mappings> <message-resources parameter="MessageResources" /> </struts-config> I also use spring 2.56 (The relevant part being): <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd"> <bean name="/pages/SaveTestCase" class="com.blahblah.mycoolapp.actions.TestCaseBuilderSaveAction" /> </beans> My Java ActionForm class (from what I had learned off the net) is: package com.blahblah.mycoolapp.forms; import java.util.ArrayList; import java.util.List; import org.apache.struts.action.ActionForm; public class TestCaseForm extends ActionForm { private static final long serialVersionUID = 2352146257739099766L; private String scenario; private String ruleID; private String testCaseName; private List<String> SelectedRows = new ArrayList<String>() ; public String getScenario() { return scenario; } public void setScenario(String scenario) { this.scenario = scenario; } public String getRuleID() { return ruleID; } public void setRuleID(String ruleID) { this.ruleID = ruleID; } public String getTestCaseName() { return testCaseName; } public void setTestCaseName(String testCaseName) { this.testCaseName = testCaseName; } public List<String> getSelectedRows() { return SelectedRows; } public void setSelectedRows(int index, String value) { this.SelectedRows.add(value); } } The question is why do I get an empty SelectedRows in my TestCaseBuilderSave Action? Thanks all who have the patience to read such a long question... and (hopefully) thanks to all you potential saviors :)

    Read the article

  • Is there a Math.atan2 substitute for j2ME? Blackberry development

    - by Kai
    I have a wide variety of locations stored in my persistent object that contain latitudes and longitudes in double(43.7389, 7.42577) format. I need to be able to grab the user's latitude and longitude and select all items within, say 1 mile. Walking distance. I have done this in PHP so I snagged my PHP code and transferred it to Java, where everything plugged in fine until I figured out J2ME doesn't support atan2(double, double). So, after some searching, I find a small snippet of code that is supposed to be a substitute for atan2. Here is the code: public double atan2(double y, double x) { double coeff_1 = Math.PI / 4d; double coeff_2 = 3d * coeff_1; double abs_y = Math.abs(y)+ 1e-10f; double r, angle; if (x >= 0d) { r = (x - abs_y) / (x + abs_y); angle = coeff_1; } else { r = (x + abs_y) / (abs_y - x); angle = coeff_2; } angle += (0.1963f * r * r - 0.9817f) * r; return y < 0.0f ? -angle : angle; } I am getting odd results from this. My min and max latitude and longitudes are coming back as incredibly low numbers that can't possibly be right. Like 0.003785746 when I am expecting something closer to the original lat and long values (43.7389, 7.42577). Since I am no master of advanced math, I don't really know what to look for here. Perhaps someone else may have an answer. Here is my complete code: package store_finder; import java.util.Vector; import javax.microedition.location.Criteria; import javax.microedition.location.Location; import javax.microedition.location.LocationException; import javax.microedition.location.LocationListener; import javax.microedition.location.LocationProvider; import javax.microedition.location.QualifiedCoordinates; import net.rim.blackberry.api.invoke.Invoke; import net.rim.blackberry.api.invoke.MapsArguments; import net.rim.device.api.system.Bitmap; import net.rim.device.api.system.Display; import net.rim.device.api.ui.Color; import net.rim.device.api.ui.Field; import net.rim.device.api.ui.Graphics; import net.rim.device.api.ui.Manager; import net.rim.device.api.ui.component.BitmapField; import net.rim.device.api.ui.component.RichTextField; import net.rim.device.api.ui.component.SeparatorField; import net.rim.device.api.ui.container.HorizontalFieldManager; import net.rim.device.api.ui.container.MainScreen; import net.rim.device.api.ui.container.VerticalFieldManager; public class nearBy extends MainScreen { private HorizontalFieldManager _top; private VerticalFieldManager _middle; private int horizontalOffset; private final static long animationTime = 300; private long animationStart = 0; private double latitude = 43.7389; private double longitude = 7.42577; private int _interval = -1; private double max_lat; private double min_lat; private double max_lon; private double min_lon; private double latitude_in_degrees; private double longitude_in_degrees; public nearBy() { super(); horizontalOffset = Display.getWidth(); _top = new HorizontalFieldManager(Manager.USE_ALL_WIDTH | Field.FIELD_HCENTER) { public void paint(Graphics gr) { Bitmap bg = Bitmap.getBitmapResource("bg.png"); gr.drawBitmap(0, 0, Display.getWidth(), Display.getHeight(), bg, 0, 0); subpaint(gr); } }; _middle = new VerticalFieldManager() { public void paint(Graphics graphics) { graphics.setBackgroundColor(0xFFFFFF); graphics.setColor(Color.BLACK); graphics.clear(); super.paint(graphics); } protected void sublayout(int maxWidth, int maxHeight) { int displayWidth = Display.getWidth(); int displayHeight = Display.getHeight(); super.sublayout( displayWidth, displayHeight); setExtent( displayWidth, displayHeight); } }; add(_top); add(_middle); Bitmap lol = Bitmap.getBitmapResource("logo.png"); BitmapField lolfield = new BitmapField(lol); _top.add(lolfield); Criteria cr= new Criteria(); cr.setCostAllowed(true); cr.setPreferredResponseTime(60); cr.setHorizontalAccuracy(5000); cr.setVerticalAccuracy(5000); cr.setAltitudeRequired(true); cr.isSpeedAndCourseRequired(); cr.isAddressInfoRequired(); try{ LocationProvider lp = LocationProvider.getInstance(cr); if( lp!=null ){ lp.setLocationListener(new LocationListenerImpl(), _interval, 1, 1); } } catch(LocationException le) { add(new RichTextField("Location exception "+le)); } //_middle.add(new RichTextField("this is a map " + Double.toString(latitude) + " " + Double.toString(longitude))); int lat = (int) (latitude * 100000); int lon = (int) (longitude * 100000); String document = "<location-document>" + "<location lon='" + lon + "' lat='" + lat + "' label='You are here' description='You' zoom='0' />" + "<location lon='742733' lat='4373930' label='Hotel de Paris' description='Hotel de Paris' address='Palace du Casino' postalCode='98000' phone='37798063000' zoom='0' />" + "</location-document>"; // Invoke.invokeApplication(Invoke.APP_TYPE_MAPS, new MapsArguments( MapsArguments.ARG_LOCATION_DOCUMENT, document)); _middle.add(new SeparatorField()); surroundingVenues(); _middle.add(new RichTextField("max lat: " + max_lat)); _middle.add(new RichTextField("min lat: " + min_lat)); _middle.add(new RichTextField("max lon: " + max_lon)); _middle.add(new RichTextField("min lon: " + min_lon)); } private void surroundingVenues() { double point_1_latitude_in_degrees = latitude; double point_1_longitude_in_degrees= longitude; // diagonal distance + error margin double distance_in_miles = (5 * 1.90359441) + 10; getCords (point_1_latitude_in_degrees, point_1_longitude_in_degrees, distance_in_miles, 45); double lat_limit_1 = latitude_in_degrees; double lon_limit_1 = longitude_in_degrees; getCords (point_1_latitude_in_degrees, point_1_longitude_in_degrees, distance_in_miles, 135); double lat_limit_2 = latitude_in_degrees; double lon_limit_2 = longitude_in_degrees; getCords (point_1_latitude_in_degrees, point_1_longitude_in_degrees, distance_in_miles, -135); double lat_limit_3 = latitude_in_degrees; double lon_limit_3 = longitude_in_degrees; getCords (point_1_latitude_in_degrees, point_1_longitude_in_degrees, distance_in_miles, -45); double lat_limit_4 = latitude_in_degrees; double lon_limit_4 = longitude_in_degrees; double mx1 = Math.max(lat_limit_1, lat_limit_2); double mx2 = Math.max(lat_limit_3, lat_limit_4); max_lat = Math.max(mx1, mx2); double mm1 = Math.min(lat_limit_1, lat_limit_2); double mm2 = Math.min(lat_limit_3, lat_limit_4); min_lat = Math.max(mm1, mm2); double mlon1 = Math.max(lon_limit_1, lon_limit_2); double mlon2 = Math.max(lon_limit_3, lon_limit_4); max_lon = Math.max(mlon1, mlon2); double minl1 = Math.min(lon_limit_1, lon_limit_2); double minl2 = Math.min(lon_limit_3, lon_limit_4); min_lon = Math.max(minl1, minl2); //$qry = "SELECT DISTINCT zip.zipcode, zip.latitude, zip.longitude, sg_stores.* FROM zip JOIN store_finder AS sg_stores ON sg_stores.zip=zip.zipcode WHERE zip.latitude<=$lat_limit_max AND zip.latitude>=$lat_limit_min AND zip.longitude<=$lon_limit_max AND zip.longitude>=$lon_limit_min"; } private void getCords(double point_1_latitude, double point_1_longitude, double distance, int degs) { double m_EquatorialRadiusInMeters = 6366564.86; double m_Flattening=0; double distance_in_meters = distance * 1609.344 ; double direction_in_radians = Math.toRadians( degs ); double eps = 0.000000000000005; double r = 1.0 - m_Flattening; double point_1_latitude_in_radians = Math.toRadians( point_1_latitude ); double point_1_longitude_in_radians = Math.toRadians( point_1_longitude ); double tangent_u = (r * Math.sin( point_1_latitude_in_radians ) ) / Math.cos( point_1_latitude_in_radians ); double sine_of_direction = Math.sin( direction_in_radians ); double cosine_of_direction = Math.cos( direction_in_radians ); double heading_from_point_2_to_point_1_in_radians = 0.0; if ( cosine_of_direction != 0.0 ) { heading_from_point_2_to_point_1_in_radians = atan2( tangent_u, cosine_of_direction ) * 2.0; } double cu = 1.0 / Math.sqrt( ( tangent_u * tangent_u ) + 1.0 ); double su = tangent_u * cu; double sa = cu * sine_of_direction; double c2a = ( (-sa) * sa ) + 1.0; double x= Math.sqrt( ( ( ( 1.0 /r /r ) - 1.0 ) * c2a ) + 1.0 ) + 1.0; x= (x- 2.0 ) / x; double c= 1.0 - x; c= ( ( (x * x) / 4.0 ) + 1.0 ) / c; double d= ( ( 0.375 * (x * x) ) -1.0 ) * x; tangent_u = distance_in_meters /r / m_EquatorialRadiusInMeters /c; double y= tangent_u; boolean exit_loop = false; double cosine_of_y = 0.0; double cz = 0.0; double e = 0.0; double term_1 = 0.0; double term_2 = 0.0; double term_3 = 0.0; double sine_of_y = 0.0; while( exit_loop != true ) { sine_of_y = Math.sin(y); cosine_of_y = Math.cos(y); cz = Math.cos( heading_from_point_2_to_point_1_in_radians + y); e = (cz * cz * 2.0 ) - 1.0; c = y; x = e * cosine_of_y; y = (e + e) - 1.0; term_1 = ( sine_of_y * sine_of_y * 4.0 ) - 3.0; term_2 = ( ( term_1 * y * cz * d) / 6.0 ) + x; term_3 = ( ( term_2 * d) / 4.0 ) -cz; y= ( term_3 * sine_of_y * d) + tangent_u; if ( Math.abs(y - c) > eps ) { exit_loop = false; } else { exit_loop = true; } } heading_from_point_2_to_point_1_in_radians = ( cu * cosine_of_y * cosine_of_direction ) - ( su * sine_of_y ); c = r * Math.sqrt( ( sa * sa ) + ( heading_from_point_2_to_point_1_in_radians * heading_from_point_2_to_point_1_in_radians ) ); d = ( su * cosine_of_y ) + ( cu * sine_of_y * cosine_of_direction ); double point_2_latitude_in_radians = atan2(d, c); c = ( cu * cosine_of_y ) - ( su * sine_of_y * cosine_of_direction ); x = atan2( sine_of_y * sine_of_direction, c); c = ( ( ( ( ( -3.0 * c2a ) + 4.0 ) * m_Flattening ) + 4.0 ) * c2a * m_Flattening ) / 16.0; d = ( ( ( (e * cosine_of_y * c) + cz ) * sine_of_y * c) + y) * sa; double point_2_longitude_in_radians = ( point_1_longitude_in_radians + x) - ( ( 1.0 - c) * d * m_Flattening ); heading_from_point_2_to_point_1_in_radians = atan2( sa, heading_from_point_2_to_point_1_in_radians ) + Math.PI; latitude_in_degrees = Math.toRadians( point_2_latitude_in_radians ); longitude_in_degrees = Math.toRadians( point_2_longitude_in_radians ); } public double atan2(double y, double x) { double coeff_1 = Math.PI / 4d; double coeff_2 = 3d * coeff_1; double abs_y = Math.abs(y)+ 1e-10f; double r, angle; if (x >= 0d) { r = (x - abs_y) / (x + abs_y); angle = coeff_1; } else { r = (x + abs_y) / (abs_y - x); angle = coeff_2; } angle += (0.1963f * r * r - 0.9817f) * r; return y < 0.0f ? -angle : angle; } private Vector fetchVenues(double max_lat, double min_lat, double max_lon, double min_lon) { return new Vector(); } private class LocationListenerImpl implements LocationListener { public void locationUpdated(LocationProvider provider, Location location) { if(location.isValid()) { nearBy.this.longitude = location.getQualifiedCoordinates().getLongitude(); nearBy.this.latitude = location.getQualifiedCoordinates().getLatitude(); //double altitude = location.getQualifiedCoordinates().getAltitude(); //float speed = location.getSpeed(); } } public void providerStateChanged(LocationProvider provider, int newState) { // MUST implement this. Should probably do something useful with it as well. } } } please excuse the mess. I have the user lat long hard coded since I do not have GPS functional yet. You can see the SQL query commented out to know how I plan on using the min and max lat and long values. Any help is appreciated. Thanks

    Read the article

< Previous Page | 53 54 55 56 57