Found a really interesting new project!

Click Follow Public Account: Internet Architect, background reply 2TGet2TB< strong>Learning resources!

Previous: Alibaba Open Source Intranet High Concurrency Programming Manual.pdf

Source: https://juejin.cn/post/7203734711779196986

Recently I saw a very interesting project relative calculator, I feel very good, and I will share it with you today.

Note: I refer to the original project author below

Original link: https://juejin.cn/post/7203734711779196986

Relatives Calculator is probably the most complex algorithm I’ve written so far, it may seem like it’s logic is simple, only 1 method call, but it took me a lot of time!

It has been more than 7 years since the idea of realizing it came to mind at the beginning, and it has been open source for more than 7 years. During this period, I have been constantly updating it to make it more and more perfect. Its work is not only to organize the data, but also to sort out the program logic and deliberate on the design ideas.

If you are also a little bit interested in traditional culture, you might as well read on patiently… maybe you will find that a name we take for granted in daily life needs to consider so many details.

Extensive appellation system

The complexity of China’s system of calling relatives lies in the fact that it has specific names for each type of relationship, and may have different names for the same kind of relationship in different places and for people of different genders.

  1. For foreigners, the siblings of parents are nothing more than: uncle, aunt; for us, the siblings of parents are: uncle, uncle, aunt, uncle, aunt;

  2. Different places have different names for the same relative. Taking father as an example, other names include: father, father, father, father, father, father, father, old man, old man, etc.;

  3. Different relationship chains may have the same appellation; for example, the word “uncle” can be the uncle of the parents or the uncle of the husband, but the two relationships have different generations. The reason, I guess, is that traditionally, the kinship relationship created by in-laws, in order to express humility, will descend to the next generation, and follow the children to call the spouse’s elders.

  4. A title may be a combination of multiple relationships. For example: “parents”, “children”, “parents-in-law”, they do not refer to a relationship between characters, but a collective term for several relationships.

When designing this set of algorithms, I hope that it can include various titles and various relationship chains as much as possible, because the reason why I did this project is to make it really gather multiple requirements, otherwise if it is not comprehensive enough, it will be a code demonstration after all That’s all.

Relationship Network Expression

The relationship network of relatives is connected by blood and marriage. Each node is a person. basic relationship. The number of nodes in the relationship network increases exponentially with the deepening of the hierarchy! If it is a 5-layer relationship, there are probably 9x9x9x9x9 = 59049 relationships (of course, a small part of which is repeated). It is obviously impossible to collect tens of thousands of relationships and hundreds of thousands of titles. No one has the energy to maintain them.

6b98d511e1f71539cea7b855b3ba702d.png

How to express the relationship between each node in the kinship network with a data structure is a difficult point. It needs to ensure that the amount of data is as complete as possible, occupy a small volume, be easy to retrieve, and be scalable, so as to ensure the integrity and efficiency of the algorithm when retrieving relationships.

Network addressing issues

Since it is a calculation, it must not simply find the corresponding title through the basic relationships such as father, mother, son, and daughter. Otherwise, this is just a simple dictionary query, not an algorithm.

What if the question is: “Aunt’s son’s grandma’s grandson”? First, you need to find a single address on the Internet, such as “aunt”, and the next step is to find her “son”, not your own “son”. This requires a function similar to a pointer. Every time the relationship chain moves forward, the pointer guides the nodes of the relationship, and finally needs to find the answer.

As mentioned above, some titles may correspond to multiple relationships, and some relationships are not unique. For example, is your father’s son you? Could it be a younger brother or older brother? And do these also depend on your gender?

Because if you are a woman, then your father’s son must not be you!

This puts a requirement on the algorithm, which must accurately include multiple possibilities.

Estimates of age and gender

With the complexity of the relationship chain, there are many kinds of answers in the end. Is there a possibility that there are some words in the description of the relationship chain, which can be judged logically to know the gender or age of the other party, and then rule out some unacceptable ones?

For example, “the son of the lover’s mother-in-law”, we can’t infer our gender from the word “lover”, and the “mother-in-law” after that is indeed a relative that only women have, and “the lover’s mother-in-law” is enough to infer that we are male , then “the son of the lover’s mother-in-law” must include himself.

a673a64737e27e9bc1d8dca3dad49aeb.png

On the contrary, “the daughter of the lover’s mother-in-law” must not be me, but my sister.

a92b6010ec85964cf612fce0155fb710.png

Another example: my brother’s cousin is also your cousin, is your brother’s cousin still your cousin? Because you can’t tell who is older between your brother and his cousin, naturally you can’t tell whether the other is your cousin or your cousin. Since it is possible to exist, it is necessary to retain the possibility for further calculation. This involves not only hidden gender cues, but also age cues need to be considered in the calculation of the relationship chain.

cad6b8545adf9059c79898a69af8af67.png

Switching of identity perspective

Counting relatives’ titles only from the relationship chain between relatives and oneself is only a one-way calculation, and it only needs to be calculated one by one. If you want to know what the other party calls me, you need to stand on the other side’s point of view and reverse the relationship between me and him. For example, what should my “grandson” call me?

9bccbfa7a1504bbe0f30f3a8cdc6f05c.png

On the other hand, if I put myself in the third party and want to know how my two relatives call each other, I have to look at the relationship between them from the perspective of the two relatives at the same time. For example: what should my “aunt” call my “grandmother”?

c119cd5eb264eb8dddd7b5d3ce8aeb49.png

Age sorting problem

What I mentioned above are all scrutiny of the possibilities in different relationship chains, so how to judge the age if the relationship is the same? What if you have 3 uncles? Although no matter which uncle, they have the same relationship with you, you have to call their wives “aunt”. But after all, there is an age difference between them, so naturally there is an order of seniority. With sorting, it triggers thinking about the relationship between them.

Or give an example: What is the relationship between “uncle” and “aunt”? I believe most of the first reaction is the relationship between husband and wife! Not really, after all, some people don’t only have one uncle, do they? Then “big aunt” and “second uncle” are not husband and wife, they are sister-in-law. The “second uncle” has to call the “big aunt” “sister-in-law”, and the “big aunt” has to call the “second uncle” “little uncle”.

138d51b8ae6f7a8167e81e39cbe14b08.png
c42c014531d6aee49618a939aed493d5.png

Furthermore, the “son of the second uncle” has to be called “aunt” as “aunt”, and the “son of the uncle” has to be called “second uncle” as “second uncle”. These differences in titles are affected by the sorting issues of the parents, but this is what my algorithm needs to consider.

abf01fe39f6417061cd22fa84a76aca4.png
f2867bc7bf035e81e3f0e9f5db13eddd.png

How about it? Is it not as simple as imagined?

Project online access address: https://passer-by.com/relationship/

Project source code: https://github.com/mumuy/relationship

Introduction to the Principle of Algorithm Implementation

FAQ

Algorithms of the same type on the market are basically implemented in the form of “title-direct relationship-title”, such as:

"Dad": {
    "Dad": "Grandpa",
    "Mom": "Grandma",
    "Brother": "Uncle",
    "Brother": "Uncle",
    "Sister": "Aunt",
    "Sister": "Aunt",
    "Husband": "Unknown",
    "wife": "mother",
    "son": {"older": "brother", 'middle': "I", "younger": "brother"},
    "daughter": {"older": "sister", 'middle': "me", "younger": "sister"}
}

Such a structure mainly has the following problems:

  1. It is impossible to directly query across generations, such as: how to know who “aunt’s mother-in-law” is?

  2. It is impossible to reversely query titles, such as: Is the mother of “cousin’s mother” “aunt”, “aunt” or “aunt”?

  3. The data structure is too bloated, such as: there may be multiple “unknown” under a certain node

  4. Cannot be compatible with multiple titles, such as: the titles are different in different places, “Dad” can also be called “Father” or “Daddy”

  5. Unable to carry out relationship topology, such as: what is the relationship between “aunt” and me?

The principle of this algorithm

Establish a database by means of “relationship chain-title set” hash pairs, mapping each node in the relative network and its own relationship

data structure

'h':['husband','husband','Mr.','official','man','man','husband ','husband','Sister-in-law','husband-in-law','lover','wife'],
'h,f':['Elder-in-law','Weng Qin','Old Father-in-law'],

Each title can find the corresponding relationship chain, and each relationship chain has a corresponding set of titles. Here you need to introduce your own “invented” special grammatical mark

Grammar Description

【Relationship】f: father, m: mother, h: husband, w: wife, s: son, d: female, xb: brother, ob: brother, lb: brother, xs: sister, os: sister, ls: sister
【Modifiers】 1: male, 0: female, & amp; o: older, & amp; l: younger, #: partition, [a|b]: parallel
For example:

"f" corresponds to father, then: "f,m" corresponds to grandma,"f,f" corresponds to grandpa;

In this way, when querying the relationship, you only need to calculate the relationship chain, instead of doing a dictionary lookup for the title

Algorithm ideas

When the user enters “aunt’s mother-in-law”, two objects “aunt” and “mother-in-law” (the mother-in-law of the former) can be decomposed

From the “relationship chain-title set” mapping relationship, we can see that the relationship chains of these two objects are: “m,xb,w” and “h,m”, the merged relationship is: “m, xb,w,h,m”

At this time, the relationship chain will be redundant and needs further processing:

“w,h” means “wife’s husband”, i.e. yourself, you can directly simplify the relationship chain into: “m,xb,m”

Similarly, “xb,m” means “brother’s mother”, that is, one’s own mother, and the relationship chain can be simplified to: “m,m”

When it cannot be further simplified, the “simplest relationship chain” is obtained, and it is brought into the relationship database query, and it can be known that “m,m” is my “grandmother”

In this way, the complex relationship chain is converted into a direct relationship. In addition, the relationship can be found by calling in reverse according to the “relationship chain-name collection”;

Implementation details

How to realize the simplification of the relationship chain?

The relationship chain is a character string, which can be matched according to the situation by using regular expressions, and the operation of “replacement” can be achieved at the same time. Because of all indirect relationships, there is redundancy in the expression of relationship chains. Although redundancy may be multi-layered and complex, it is only necessary to consider the de-redundancy in the two-layer relationship and deal with it repeatedly.

Some multi-layer relationships may not necessarily correspond to one relationship, how to solve the non-unique relationship?

“Daddy’s son” does not have to be himself, it may also be his brother. The syntax of “isolation” and “parallel” is introduced in the grammar. Regular expressions can be used to split such non-unique relationships into multiple groups, and each time they are brought into recursion to find the simplest solution.

Each node is one level away from itself, and the node data doubles. How to solve the problem of excessive data volume?

There are certain rules in the relationship between relatives in China. Collateral branches generally consist of branch nodes and their offspring relationships. We only need to record branch nodes and offspring relationships. For example: “Uncle Cousin” and “Cousin” have certain similarities in the relationship chain with themselves, so it is not necessary to record all the relationships between the two. You only need to know that “cousin” is a descendant of “uncle”, and “cousin” is a descendant of “uncle”, then all the descendants and in-laws of “cousin” and “cousin” can only save 3 parts. Right now:

Uncle and cousin relationship data = uncle (branch node) + elder brother relationship data (child relationship);

Cousin relationship data = uncle (branch node) + brother relationship data (child relationship);

There are many such relationships, such as: “Uncle Biao”, “Aunt Biao”, “Congtang”, “Aunt Biao and Uncle Biao”, etc. By splitting and multiplexing the relational data, the amount of compressed data can be achieved. At the same time, the database can be assembled by splicing the branch node and the child relationship during the running of the script.

Do you know any other good ways to calculate relatives, please leave a message in the comment area~

Finally, pay attention to the official account Internet Architect, and reply in the background: 2T, you can get the Java series interview questions and answers I compiled, which are very complete.

end of text

Recommended reading ↓↓↓

1. How does JetBrains view its software being frequently cracked in China?

2. Alibaba Open Source Intranet High Concurrency Programming Manual.pdf

3. From what platform can programmers generally accept private work?

4. 40 years old, just got laid off, want to say something.

5. Why is the domestic 996 not as good as the foreign 955?

6. What is the level of China’s railway booking system in the world?

7.15 pictures to understand the difference between being busy and being efficient!

7ce15d066ac07498d3a40ab7b83a2068.gif