Uncoded Series 5.1 Code Refactoring to Eliminate Duplicate Code

1 Foreword

This article is a “personal interpretation” of the “eliminate duplication” strategy in the “Four Orthogonal Principles” taught by Yuanyingjie, a senior consultant at ThoughtWorks. If there are any errors, please correct me. First of all, I would like to thank Yuanyingjie. This document borrows a large number of the consultant’s opinions. There are too many to mark one by one, and I apologize for that.

The “Practical part” chapter was written from the author’s recent code exercises and records a real code evolution process. The code is just complex enough to express the idea of “refactoring”, which helps with the “lack of real project references” problem in software engineering training.

The case code in this article is a by-product of an exercise. Its design idea originated from a module developed by the author ten years ago. It is also a practical test of the ideas in an earlier post of this series, “Uncoded Series-2 Code Architect’s Fantasy”. The design style therefore differs from transDSL (designed by Yuanyingjie), but this does not prevent us from learning and exploring ideas about “code refactoring”.

2 Thoughts on eliminating duplicate code

2.1 Why should we eliminate duplicate code?

Eliminating duplicate code is a “driver” that pushes code design toward minimalism. It turns architectural design ideas into a method that can actually be practiced.

It is neither the essence of architectural design nor the ultimate goal of writing code, but it makes people ask: Can my code be simpler? Could it be simpler still? These questions push us toward code with high cohesion and low coupling.

Note: High cohesion and low coupling help large, long-lived software adapt to change. They keep code changes (modifying, adding, or deleting features) centralized and orderly, so that one change does not ripple through the whole system. Readability comes second. Consider the reverse case: if a piece of software is developed today and will be obsolete the day after tomorrow, is it necessary to be so particular? In that case, fast process-oriented coding is the best choice.

2.2 What is duplicate code?

The purpose of code architecture design is that “the code can better express the business.” An excellent architecture lets business code “focus only on the essence of business logic” while paying as little additional cost as possible. These additional costs include (but are not limited to):

1) Preparatory work for describing business logic, such as collecting and organizing data;

2) Extra code made necessary by weak language expressiveness, such as multi-threading and thread pool management, or locks around critical data that is accessed asynchronously;

3) Adaptation to specific operating environments: differences between virtual and physical machines, between communication protocols, and between operating systems;

4) Commercial cost and product specification considerations: memory consumption, IO capability, power consumption…

A primer on high cohesion and low coupling:

High cohesion means abstracting and gathering repetitive, scattered code logic. On the surface this looks like collecting duplicate code, but behind the repeated code hides a common logic. Through abstraction, this logic can be gathered into a logical unit that the business does not perceive directly. The unit serves the business but is not the essence of the business logic. Precisely because it is “not what the business logic essentially cares about”, we can make such a unit independent, so that the business logic depends on it only loosely instead of being entangled with it. This is how we reach “low coupling”. One concept deserves special attention here: low coupling does not mean “no coupling at all”. The correlations in business logic exist objectively. The limit of decoupling is “low enough coupling”; “complete decoupling” is impossible.

Finding duplicate code:

If a business is defined by “the characteristics that distinguish it from other businesses”, then common logic is not business; it can be understood as a kind of infrastructure. One way to eliminate duplication is to turn common logic into infrastructure. The ultimate goal is code with high cohesion and low coupling. Note: this is only one method; more methods are waiting for you to design.

Usually we regard two pieces of code as duplicates if they are identical, if a bug in one implies the same bug in the other, or if modifying one inevitably requires a similar modification to the other.

To save time, I assume that readers have already done some micro-refactoring and accumulated the usual experience. We will not discuss “obvious” code duplication, nor solutions such as “macro definition” and “function encapsulation” that “textbooks” already define clearly. Beyond “obvious” duplication, how should we look for “duplicate code”?

Use reverse thinking to observe two things, A and B. When we introduce business B to a friend who is very familiar with business A, we say: B is very similar to A; it differs only a little, in aspect X. If both businesses are implemented in code, does the code also differ only slightly, in aspect X?

Suppose business code A already exists and business code B is being developed. When our lovely programmers write the code for business B, they find that things are far from “as easy as it sounds”. Sharing the code of A and modifying only the X differences is very laborious. It is easier to write separate code for B, reusing A’s functions where possible and writing new code where not. The code we see every day is roughly like this: a lot of code is reused, but reuse is often limited to one function or a small fragment. Code is far from being as expressive as everyday language.

How should the sentence “only slightly different in X” be expressed in code? “Aspect X” may be information elements (fields) scattered across the message flows, corresponding to different scenarios and different code blocks… In short, logical duplication between businesses A and B does not necessarily show up as “duplication” at the code level. Put another way: once you have found a way (a tool) to capture the “logical duplication”, it is duplicate code. If you have not found a way yet, just treat it as “no code duplication”. In this sense, whether code is duplicated depends on the language we use, its features, and the skills we have mastered. The more tools and methods we have for expressing business logic concisely, the more commonality we can find in the business. That is the duplicate code we are looking for.
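As one possible illustration of “B is like A, only different in aspect X”, the sketch below keeps the shared flow in a single function and passes the difference in as a parameter. The names Flow, BusinessA, and BusinessB are hypothetical and exist only for this example; real differences are rarely this easy to isolate.

```go
package main

import (
	"fmt"
	"strings"
)

// Flow is the logic that business A and business B have in common.
// The single point where they differ ("aspect X") is passed in as a function.
// All names here are hypothetical and only serve the illustration.
func Flow(input string, aspectX func(string) string) string {
	trimmed := strings.TrimSpace(input) // shared preparation
	result := aspectX(trimmed)          // the only business-specific step
	return "[" + result + "]"           // shared post-processing
}

// BusinessA and BusinessB each state only their own difference.
func BusinessA(input string) string { return Flow(input, strings.ToUpper) }
func BusinessB(input string) string { return Flow(input, strings.ToLower) }

func main() {
	fmt.Println(BusinessA("  Hello ")) // prints [HELLO]
	fmt.Println(BusinessB("  Hello ")) // prints [hello]
}
```

With this shape, the description “B is A except for X” maps onto one line of code per business. Whether such a shape can be found depends on the language and on the tools at hand, which is the point of the paragraph above.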

2.3 Methods to eliminate duplicate code

There are many ways to eliminate duplication: use macro definitions, encapsulate code into functions, use Java annotations, and so on. Some duplication, however, is “logical duplication with different code”, and conventional methods do not necessarily apply. If a good architecture can be designed, and a common way of expressing the logic can be found by abstracting the code, the duplication can be eliminated. Without architectural support, such logic cannot be integrated in a reasonable way. Because of “limits in knowledge and ability”, we may even believe that “this is not duplicate code”.

Sometimes the programming language itself lacks a feature. For example, C++ does not have Java’s reflection mechanism, which makes some code logic hard to express. The price of eliminating a particular duplication then becomes too high to be worthwhile, and the intuitive judgment is that “this code is not duplicated.” If we can find a solution in the language features, that is, reduce the “cost” to a low enough level, it becomes a means of “eliminating duplicate code”. The means are devised by people and are not limited to conventional methods such as macro definitions and function encapsulation.
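To make the cost argument concrete, here is a small sketch in Go, whose standard library includes a reflect package. Without reflection, every struct type would need its own hand-written formatter; with it, one function covers them all. The types User and File and the function Describe are hypothetical, and the function assumes it is given a struct value with exported fields.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// User and File are hypothetical types used only for this illustration.
type User struct {
	Name string
	Age  int
}

type File struct {
	Path string
	Size int
}

// Describe formats any struct value as "Type{Field=value, ...}".
// Assumption: v is a struct (not a pointer) with exported fields only;
// anything else makes the reflect calls panic.
func Describe(v interface{}) string {
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	parts := make([]string, 0, rv.NumField())
	for i := 0; i < rv.NumField(); i++ {
		parts = append(parts, fmt.Sprintf("%s=%v", rt.Field(i).Name, rv.Field(i).Interface()))
	}
	return rt.Name() + "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	fmt.Println(Describe(User{Name: "ann", Age: 30}))
	fmt.Println(Describe(File{Path: "a.txt", Size: 12}))
}
```

In a language without such a facility, the same effect needs code generation, macros, or manual registration, and that extra cost is what makes the duplication look like “not duplication”.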

Code architects should know many high-level languages and understand the reasons behind their “features”. Every feature a language provides has a background and an intention: it exists to solve a certain problem and to make solving it cheaper. Only with this knowledge can a solution quickly come to mind when we meet a problem: How is this problem solved in language XX? What analogous or alternative solutions exist in the language of the current product?

2.4 “Elimination of duplicate code” becomes the driving force for code architecture design

The criterion for judging duplication is not whether the code is identical or similar. Identical code is certainly duplication. But what if the code is different while the logic is the same, or if even the logic is different yet similar in spirit though not in form? Is that duplication?

For example, A sends a message and receives a response; B sends a message but receives no response. In both message encoding and message behavior, A and B are different. In transDSL (see “Transaction DSL – Yuan Ying Jie.pdf” by Yuanyingjie, senior consultant at ThoughtWorks), both are encapsulated as actions (one step each); that is, they are brought under one unified notion of action. An action can even be a unit that neither sends nor receives messages, or one that only receives. For such logically different code and behavior, if a model is found that expresses what they have in common, and their individual characteristics are continuously “refined”, the end result is the elimination of duplication.

Therefore, the fundamental meaning of eliminating duplication is to extract the commonality of the code and retain its characteristics. In mathematical terms, any problem can be separated into a general solution space and a particular solution space. If we feel that code is not duplicated, perhaps we simply have not yet found a more refined way to express its characteristics. The result of eliminating duplication is that the general solution to the problem becomes structure, and the expression of the particular solution is compressed into its simplest, most essential form.
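The action idea can be sketched in Go as follows. This is only an illustration loosely inspired by the description of transDSL above, not the transDSL API; Action, Request, Notify, Local, and RunAll are hypothetical names. The “general solution” is RunAll, and each type keeps only its own “particular solution”.

```go
package main

import "fmt"

// Action is a hypothetical "one step" abstraction. It is not the transDSL API.
type Action interface {
	Exec(log *[]string) error
}

// Request sends a message and waits for a response.
type Request struct{ Msg string }

func (r Request) Exec(log *[]string) error {
	*log = append(*log, "send "+r.Msg, "wait response to "+r.Msg)
	return nil
}

// Notify sends a message and does not wait for anything.
type Notify struct{ Msg string }

func (n Notify) Exec(log *[]string) error {
	*log = append(*log, "send "+n.Msg)
	return nil
}

// Local neither sends nor receives; it only does local work.
type Local struct{ Name string }

func (l Local) Exec(log *[]string) error {
	*log = append(*log, "run "+l.Name)
	return nil
}

// RunAll is the common part: it treats every step the same way and stops at
// the first error.
func RunAll(steps []Action) ([]string, error) {
	var log []string
	for _, s := range steps {
		if err := s.Exec(&log); err != nil {
			return log, err
		}
	}
	return log, nil
}

func main() {
	log, _ := RunAll([]Action{Request{"login"}, Notify{"heartbeat"}, Local{"cleanup"}})
	for _, line := range log {
		fmt.Println(line)
	}
}
```

The three step types behave differently, yet sequencing and error propagation are written once.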

A problem always has an essential complexity, and it is almost impossible to make the code less complex than the problem itself. Most of the time the code is far more complex than the essence of the problem. This extra complexity is a burden imposed by the limited expressiveness of code, and we need a tool to reduce it. Eliminating duplicate code is the driving force that reminds us that “it’s time to find such a tool.” That tool becomes our code architecture.

3 Eliminate duplicate code: Practical part

3.1 Extract process framework from business special cases

The appendix “Code Evolution First Edition” implements a toy FTP server in a process-oriented way (it only illustrates the problem, so don’t worry about the details). Process-oriented business code usually reads well: for any step of the process, you can find the corresponding piece of code.

Note: The author therefore does not look down on process orientation. In some scenarios it has the advantage. Process-oriented and object-oriented are like a screwdriver and a Swiss army knife; it is hard to say that one can replace the other. The author will not argue about whether PHP is the best language.

The first version of the code has a serious problem: all of the code is “private”, and no common logic is available for others to use.

For example, Listen is a long-running task; Login, UserName, and Password are steps executed in sequence; Get, Put, Ls, Cd, and the other commands are parallel cases. All of them are “execution units”. A lot of code is duplicated just to coordinate the ordering and error handling of these “execution units”.

3.1.1 Problems with code

§ Duplicate error handling:

Note: A good architecture must handle exceptions “elegantly”. Error handling needs to be “centralized and unified” so that abnormal scenarios are not handled in many different ways, which causes further problems. This is also a requirement for trustworthy software.

§ Duplicated switch-case forms:

Note: Large numbers of switch-case statements are a common form of procedural code. Because such code has no way to inject objects, it has to enumerate them as cases inside one function. This leads to shotgun changes (one cohesive feature has to be split across different code blocks).
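A common alternative to a central switch is a lookup table into which business units are injected. The sketch below is a minimal illustration with hypothetical names (Handler, Register, Dispatch); it is not the code of the appendix.

```go
package main

import "fmt"

// Handler is one business unit. All names in this sketch are hypothetical.
type Handler func(arg string) string

// handlers replaces the central switch statement. Adding a command means
// registering one more entry, not editing Dispatch.
var handlers = map[string]Handler{}

// Register injects a business unit under a command name.
func Register(cmd string, h Handler) { handlers[cmd] = h }

// Dispatch looks the command up instead of enumerating cases.
func Dispatch(cmd, arg string) (string, bool) {
	h, found := handlers[cmd]
	if !found {
		return "", false
	}
	return h(arg), true
}

func main() {
	Register("cd", func(arg string) string { return "change dir to " + arg })
	Register("ls", func(string) string { return "a.txt b.txt c.txt" })

	out, found := Dispatch("cd", "/tmp")
	fmt.Println(out, found)
	_, found = Dispatch("rm", "x")
	fmt.Println(found)
}
```

Each command now lives in its own unit, so a change to one feature touches one place.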

§ The business process code cannot be reused:

The code describes an FTP usage process in a process-oriented manner, but this process is hard for other businesses to reuse. For example, if we vary the FTP usage scenario to obtain a new business process, none of this code can be reused.

3.1.2 Refactored code

§ The business process description mechanism is extracted, so that other businesses can share it.

§ Error handling is unified in one mechanism: OnErrorGoto.

§ The large number of switch-cases embedded in one function is eliminated, reducing coupling between business units.

§ The business process is described intuitively: the process description code scattered throughout the program is abstracted and gathered together, giving a visual overview of the entire business.

§ Logical correspondence between the code before and after refactoring:

§ Explicitly extract the process-oriented “process logic”

This process logic is, logically, a state machine. In traditional software design, a “state transition diagram” is translated into code, and the resulting implementation is a state machine. In the implementation idea of transDSL, the state machine is hidden in the process described by trans. In this code, the state machine is hidden in the “execution plan”.
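How an “execution plan” can hide a state machine, together with a single error-handling jump, can be sketched as follows. Step, Plan, S, OnErrorGoto, and Execute are hypothetical names that imitate the shape of the plan in this article; the sketch only handles sequential steps and makes no claim about how gworker is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// Step and Plan are hypothetical types for this sketch, not the gworker code.
type Step struct {
	Name string
	Run  func() error
}

type Plan struct {
	steps   []Step
	onError string
}

// S appends one sequential step to the plan.
func (p *Plan) S(name string, run func() error) {
	p.steps = append(p.steps, Step{Name: name, Run: run})
}

// OnErrorGoto names the step to jump to when any step fails.
func (p *Plan) OnErrorGoto(name string) { p.onError = name }

// indexOf returns the position of the named step, or -1.
func (p *Plan) indexOf(name string) int {
	for i, s := range p.steps {
		if s.Name == name {
			return i
		}
	}
	return -1
}

// Execute walks the plan. The index i is the current state of the implicit
// state machine. On an error it jumps forward to the error step; if there is
// no later error step, it stops. It returns the names of the steps it ran.
func (p *Plan) Execute() []string {
	var ran []string
	for i := 0; i < len(p.steps); i++ {
		ran = append(ran, p.steps[i].Name)
		if err := p.steps[i].Run(); err != nil {
			target := p.indexOf(p.onError)
			if target <= i {
				break
			}
			i = target - 1 // the loop increment lands on the target
		}
	}
	return ran
}

func succeed() error { return nil }
func fail() error    { return errors.New("boom") }

// demoPlan fails in CmdGet, so CmdPut is skipped and cleanup still runs.
func demoPlan() *Plan {
	p := &Plan{}
	p.OnErrorGoto("StepEnd")
	p.S("CmdLogin", succeed)
	p.S("CmdGet", fail)
	p.S("CmdPut", succeed)
	p.S("StepEnd", succeed)
	p.S("SayBye", succeed)
	return p
}

func main() {
	fmt.Println(demoPlan().Execute()) // prints [CmdLogin CmdGet StepEnd SayBye]
}
```

No step contains error-routing code of its own; the routing is stated once in the plan.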

Note: The code style and refactoring above involve more refactoring knowledge points. Subsequent documents will explain them step by step.

Because this code example differs slightly from transDSL, and some readers are already familiar with transDSL, a logical comparison is provided here.

3.2 Consolidate scattered code to improve cohesion

Regularize the form of the code and look for duplication at the macro level. The code of one business logic unit should be kept together as much as possible. Using the features the language provides, we try to put one piece of business code into “one building block” and prevent that block from being taken apart and scattered across multiple locations. This makes it easier to compare two building blocks, find what they have in common, and simplify them further.

Note: The Java language provides annotations and injection, which do this well. The Go implementation is less elegant but can just about manage it. C++ can simulate injection by exchanging information through global variables. In C, it is unavoidable to actively “register” information in a global table.
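One way Go can approximate annotation-style registration is a package-level variable whose initializer performs the registration. The sketch below imitates only the shape of a NameMap call; the Task interface, the registry, and this NameMap signature are assumptions made for the example, and the real gworker.NameMap may differ.

```go
package main

import "fmt"

// Task is a hypothetical interface for a business unit.
type Task interface{ Name() string }

// registry maps text names to business units.
var registry = map[string]Task{}

// NameMap registers a task under a name and returns true so that it can be
// used as a package-level initializer. It is an assumption for this sketch,
// not the gworker signature.
func NameMap(name string, task Task) bool {
	registry[name] = task
	return true
}

// The registration sits next to the type it registers, so the whole business
// unit stays in one place. Package-level variables are initialized before
// main runs, and Go orders them so that registry exists before this call.
var _ = NameMap("CmdLs", &CmdLs{})

type CmdLs struct{}

func (*CmdLs) Name() string { return "ls handler" }

func main() {
	task, found := registry["CmdLs"]
	fmt.Println(found, task.Name())
}
```

Compared with one central init function that lists every unit, adding or deleting a unit here touches only the file of that unit.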

§ Group information belonging to a business unit:

Different business units are logically similar to some degree, yet their code looks very different. This is because we break a complete logical unit into many “smaller particles” and then store these particles by category. As a result, business units that should look alike at the “large-grained” level no longer look alike when compared particle by particle. Put simply, things that should be similar at the macro level are no longer similar at the micro level.

§ Look again to see if there are more similarities:

In the above code, there are several obvious points of duplication:

1. The mechanism for registering event listening is the same; only the parameters differ.

2. The for loop is the same.

3. The mechanism for exiting on the stop control signal is the same.

4. The NameMap mechanism is common to all units. It is later used to carry out the important task of “eliminating duplication”.

3.3 Use reverse derivation to extract the essence of the business

Through reverse thinking, we first look for the essence of the business: find the code that will never be repeated.

The box in the picture below marks the essential difference between the two business units. The rest of the code is an extra price paid to execute it correctly.

§ Discover essential differences in your business:

Remove “non-essential” code:

1) The event listening registration code is parameterized;

2) The for loop control is converted into a return instruction;

3) The handling of the stop control signal is extracted and moved into the framework.
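The second and third steps can be sketched as a framework function that owns the receive loop and the stop handling, while the business unit only returns an instruction. TaskCtrl, TaskWaitMore, TaskDone, EventStop, and RunTask are hypothetical stand-ins written for this example; they are not the gworker definitions.

```go
package main

import "fmt"

// TaskCtrl tells the framework what to do after a handler returns.
// The type and its constants are hypothetical stand-ins.
type TaskCtrl int

const (
	TaskWaitMore TaskCtrl = iota // keep listening for further events
	TaskDone                     // leave the loop
)

// EventStop is the stop signal; handlers never see it.
type EventStop struct{}

// RunTask holds everything that used to be repeated in each business unit:
// the receive loop, the closed-channel check, and the stop handling.
func RunTask(events <-chan interface{}, onRequest func(data string) TaskCtrl) {
	for ev := range events {
		switch msg := ev.(type) {
		case EventStop:
			return
		case string:
			if onRequest(msg) == TaskDone {
				return
			}
		}
	}
}

func main() {
	events := make(chan interface{}, 5)
	events <- "ls"
	events <- "ls -l"
	events <- EventStop{}
	events <- "never handled"

	RunTask(events, func(data string) TaskCtrl {
		fmt.Println("rcv cmd:", data)
		return TaskWaitMore
	})
}
```

The handler passed to RunTask contains only business logic; the event name it listens to would be supplied at registration time, which corresponds to the first step.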

The effect after removing duplicate code:

The code marked in yellow below is the “most essential” expression of business logic. After refactoring, other auxiliary code logic has been reduced to a minimum.

Note: The key here is to “reduce the code logic to the minimum”, not to “reduce the number of lines of code to the minimum”. The purpose of refactoring is to “reduce the logical complexity of the code”, not to reduce the line count, although most refactorings do also reduce the line count significantly.

var _ = gworker.NameMap("CmdLs", &CmdLs{}, "ls")

func (t *CmdLs) OnRequest(wkSpace interface{}, data string) gworker.TaskCtrl {
	space := wkSpace.(*SpaceA)
	fmt.Println("rcv cmd:", data)
	sendMsg := "list:\r\na.txt\r\nb.txt\r\nc.txt\r\n"
	TcpSendMessage(space.TcpCon, sendMsg)
	return gworker.TaskWaitMore
}

4 Appendix code snippet

Note: The code in this appendix is a by-product of the author’s exercises. It is offered so that everyone can learn and verify software engineering methodology. If you need to run this code framework in products or tools, please harden and improve it yourself. This is the Go version of the code. A C++ version with a similar structure also exists; if necessary, you can request it from the author.

4.1 Code Evolution First Edition

func main() {
	addr := "127.0.0.1:8080"
	fmt.Println("listen :" + addr)
	fmt.Println("input command: cd ls get close")
	listenOnPort(addr)
}

Business unit implementation case:

Process and business logic are mixed together. Flow control logic cannot be reused. Exception handling mechanisms cannot be aggregated.

func FtpServer(conn net.Conn) {
	defer conn.Close()

	var ftpInfo FtpInfo

	// Login phase: user name first, then password.
	user, _, isOk1 := GetUserName(conn)
	if !isOk1 {
		fmt.Println("get user name error")
		return
	}
	ftpInfo.UserName = user

	pwd, _, isOk2 := GetPassword(conn)
	if !isOk2 {
		fmt.Println("get password error")
		return
	}
	ftpInfo.Pwd = pwd

	leftBuf := []byte{}
	cmd := ""
	isOk := false
	isClosed := false

	// Command phase: every command is one more case in the same function.
	for !isClosed {
		cmd, leftBuf, isOk = AskAndAnser(conn, "", leftBuf)
		if !isOk {
			continue
		}

		switch cmd {
		case "cd":
			conn.Write([]byte("change dir\r\n"))
		case "ls":
			conn.Write([]byte("a.txt\r\nb.txt\r\nc.txt\r\n"))
		case "get":
			conn.Write([]byte("a.txt\r\n"))
		case "close":
			conn.Write([]byte("close....\r\n"))
			time.Sleep(time.Second)
			isClosed = true
		}
	}
}

4.2 Code Evolution Second Edition

The second version implements a “language” based on logical description to describe business processes. The code architecture dynamically assembles business logic through text names and interprets them for execution.

func ServiceA() gworker.SpaceIf {
	var plan SpaceA
	plan.Init()

	plan.OnErrorGoto("StepEnd") // Jump to StepEnd after an error and perform final processing

	plan.S("Listen 127.0.0.1:8080") // Listen on the port in an infinite loop; each accepted connection forks a new task
	plan.B("ReadLine", "DispatchCmd") // Background tasks that run for the whole life cycle
	plan.S("CmdLogin", "CmdUser", "CmdPwd") // Run sequentially and verify the password
	plan.P("CmdPut", "CmdLs", "CmdCd", "CmdGet", "CmdClose") // Run concurrently
	plan.S("StepEnd") // Enter task termination and release resources
	plan.S("SayBye")

	return &plan
}
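The plan above refers to each unit by a text name, and a step such as "Listen 127.0.0.1:8080" carries an argument after the name. One way such a line could be interpreted is sketched below. StepFunc, the steps table, and Interpret are hypothetical; the sketch assumes the argument is everything after the first space, which the article does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// StepFunc is a hypothetical executable unit that receives the text after
// its name as an argument.
type StepFunc func(arg string) string

// steps is a hypothetical name table for this sketch.
var steps = map[string]StepFunc{
	"Echo":   func(arg string) string { return "echo: " + arg },
	"Listen": func(arg string) string { return "listening on " + arg },
}

// Interpret splits "Name rest of line" into a name and an argument, looks the
// name up, and runs the unit. Unknown names are reported as errors.
func Interpret(line string) (string, error) {
	parts := strings.SplitN(line, " ", 2)
	name, arg := parts[0], ""
	if len(parts) == 2 {
		arg = parts[1]
	}
	step, found := steps[name]
	if !found {
		return "", fmt.Errorf("unknown step %q", name)
	}
	return step(arg), nil
}

func main() {
	for _, line := range []string{"Listen 127.0.0.1:8080", "Echo login first:", "Nope"} {
		out, err := Interpret(line)
		fmt.Println(out, err)
	}
}
```

Because units are found by name at run time, the plan itself contains no reference to concrete types.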

A global registry is required:

func init() {
	fmt.Println("gworker init...")

	AddNameMap("Listen", &Listen{})
	AddNameMap("ReadLine", &ReadLine{})
	AddNameMap("DispatchCmd", &DispatchCmd{})
	AddNameMap("CmdLogin", &CmdLogin{})
	AddNameMap("CmdUser", &CmdUser{})
	AddNameMap("CmdPwd", &CmdPwd{})
	AddNameMap("CmdPut", &CmdPut{})
	AddNameMap("CmdLs", &CmdLs{})
	AddNameMap("CmdCd", &CmdCd{})
	AddNameMap("CmdGet", &CmdGet{})
	AddNameMap("CmdClose", &CmdClose{})
	AddNameMap("StepEnd", &StepEnd{})
	AddNameMap("SayBye", &SayBye{})
}

Implementation case of a business unit:

type CmdLs struct {
	gworker.TaskBase
}

func (t *CmdLs) OnRequest(wkSpace interface{}) int {
	fmt.Println("CmdLs finish OnRequest")

	space := wkSpace.(*SpaceA)

	// Subscribe to "ls" events; unsubscribe when the task exits.
	myChn := make(chan interface{}, 5)
	space.EventListen("ls", myChn)
	defer space.EventListen("ls", nil)

	inRun := true
	for inRun {
		cmd, ok := <-myChn
		if !ok {
			break
		}
		switch msg := cmd.(type) {
		case gworker.EventStop:
			inRun = false
		case string:
			fmt.Println("rcv cmd:", msg)
			sendMsg := "list:\r\na.txt\r\nb.txt\r\nc.txt\r\n"
			TcpSendMessage(space.TcpCon, sendMsg)
		}
	}

	return 0
}

4.3 Code Evolution Third Edition

func ServiceA() gworker.SpaceIf {
	var plan SpaceA
	plan.Init()

	plan.OnErrorGoto("StepEnd") // Jump to StepEnd after an error and perform final processing

	plan.S("Listen 127.0.0.1:8080") // Listen on the port in an infinite loop; each accepted connection forks a new task
	plan.S("Echo receive a tcp connect")
	plan.B("ReadLine", "DispatchCmd") // Background tasks that run for the whole life cycle
	plan.S("Echo login first:", "CmdLogin", "CmdUser", "CmdPwd") // Run sequentially and verify the password
	plan.S("Echo and then input command: put cd ls get close")
	plan.P("CmdPut", "CmdLs", "CmdCd", "CmdGet", "CmdClose") // Run concurrently
	plan.S("StepEnd") // Enter task termination and release resources
	plan.S("SayBye")

	return &plan
}

Implementation case of a business unit:

The third edition tidies up the format of business units and uses Go language features to implement a mechanism similar to Java annotations, which makes object injection more elegant. At the same time, the code of a business logic unit, previously separated, is physically gathered together and becomes a real building block.

var _ = gworker.NameMap("CmdLs", &CmdLs{})

type CmdLs struct {
	gworker.TaskBase
}

func (t *CmdLs) OnRequest(wkSpace interface{}) int {
	fmt.Println("CmdLs enter OnRequest")

	space := wkSpace.(*SpaceA)

	// Subscribe to "ls" events; unsubscribe when the task exits.
	myChn := make(chan interface{}, 5)
	space.EventListen("ls", myChn)
	defer space.EventListen("ls", nil)

	inRun := true
	for inRun {
		cmd, ok := <-myChn
		if !ok {
			break
		}
		switch msg := cmd.(type) {
		case gworker.EventStop:
			inRun = false
		case string:
			fmt.Println("rcv cmd:", msg)
			sendMsg := "list:\r\na.txt\r\nb.txt\r\nc.txt\r\n"
			TcpSendMessage(space.TcpCon, sendMsg)
		}
	}

	return 0
}

4.4 Code Evolution Fourth Edition

func ServiceA() gworker.SpaceIf {
	var plan SpaceA
	plan.Init()

	plan.OnErrorGoto("StepEnd") // Jump to StepEnd after an error and perform final processing

	plan.S("Listen 127.0.0.1:8080") // Listen on the port in an infinite loop; each accepted connection forks a new task
	plan.S("Echo receive a tcp connect")
	plan.B("ReadLine", "DispatchCmd") // Background tasks that run for the whole life cycle
	plan.S("Echo login first:", "CmdLogin", "CmdUser", "CmdPwd") // Run sequentially and verify the password
	plan.S("Echo and then input command: put cd ls get close")
	plan.P("CmdPut", "CmdLs", "CmdCd", "CmdGet", "CmdClose") // Run concurrently
	plan.S("StepEnd") // Enter task termination and release resources
	plan.S("SayBye")

	return &plan
}

Implementation case of a business unit:

The fourth edition extracts the code that is similar across business logic units. The event listening mechanism is parameterized, and flow control is implemented in the architecture.

var _ = gworker.NameMap("CmdLs", &CmdLs{}, "ls")

type CmdLs struct {
	gworker.TaskBase
}

func (t *CmdLs) OnRequest(wkSpace interface{}, data string) gworker.TaskCtrl {
	fmt.Println("CmdLs enter OnRequest")
	space := wkSpace.(*SpaceA)

	fmt.Println("rcv cmd:", data)
	sendMsg := "list:\r\na.txt\r\nb.txt\r\nc.txt\r\n"
	TcpSendMessage(space.TcpCon, sendMsg)

	// Tell the framework to keep this task waiting for further "ls" events.
	return gworker.TaskWaitMore
}