How Compiler Theory Got Me Out of a Pit

2023-02-28


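I. Background

We had a requirement to hand our source code to a third party for a security audit. To limit the impact of a code leak, the requirements were:

Our own copy of the source code keeps its comments.
In the copy provided externally, all code comments are removed, which makes the code harder for others to read.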

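II. The painful climb through the pit

1. Cataloguing the comment forms in Java

Multi-line comment:

/**
multi-line comment
*/

Single-line comment:

// single-line comment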

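2. First steps into the pit

Analysing the cases above led to a simple conclusion:

A multi-line comment starts with /* and ends with */
A single-line comment starts with // and ends at a line break or at the end of the file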

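3. Core code

Code to extract the content of multi-line comments:

re.findall(r"/\*.*?\*/", content, re.S)

Walk through the file line by line and check whether each line contains "//". If it does, replace the "//" and everything after it with a line break:

re.findall(r"//.*", line, re.S)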
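As a rough sketch of how these two patterns could be combined into a working pass, the snippet below uses re.sub instead of re.findall so the matches are removed directly. The function name strip_comments_naive and the sample Java text are assumptions made for illustration; they are not from the original project.

```python
import re

# Hypothetical sample input, used only for illustration.
JAVA_SOURCE = """/**
 * Adds two numbers.
 */
int add(int a, int b) {
    return a + b; // sum
}
"""


def strip_comments_naive(source):
    # Remove /* ... */ blocks; re.S lets "." match across line breaks.
    without_blocks = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    # Remove "//" and everything after it on the same line.
    # Without re.S, "." stops at the line break, so the line break is kept.
    return re.sub(r"//.*", "", without_blocks)


cleaned = strip_comments_naive(JAVA_SOURCE)
print(cleaned)
```

This version knows nothing about string literals, which is exactly the weakness the next section runs into.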

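4. Into the deep pit

I assumed this simple, brute-force method would just work, but serious problems showed up once it actually ran.

If a string contains multi-line or single-line comment markers, the content of the string gets modified.

Part of a line is removed by mistake:

"(?m)^//.*(?=\\n)"

Single-line comment removal goes wrong:

// A lone " follows the slashes. Quotes were checked with higher priority, so this quote was kept and the whole removal went wrong.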
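The string problem is easy to reproduce. The sketch below repeats a naive regex pass (the helper name strip_comments_naive is an assumption) and feeds it the string literal from the example above, plus one made-up line containing /* ... */ inside a string.

```python
import re


def strip_comments_naive(source):
    # Same naive idea as before: regexes that know nothing about strings.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    return re.sub(r"//.*", "", source)


# Java line whose string literal contains "//".
line_with_slashes = r'String p = "(?m)^//.*(?=\\n)";'
# Hypothetical Java line whose string literal contains "/* ... */".
line_with_block = 'String q = "/* not a comment */";'

damaged_slashes = strip_comments_naive(line_with_slashes)
damaged_block = strip_comments_naive(line_with_block)

print(damaged_slashes)  # the string literal is cut off at "//"
print(damaged_block)    # the content of the string literal disappears
```

In both cases the output is no longer valid Java, because the regex cannot tell a comment marker in code from the same characters inside a string.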

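String recognition runs into problems:

'"'

"select * from user where name = 'root'"

"\""

Quotes in the middle of a string must not count as the start or end of a string.
The final double quote must count as the end of the string.
A double quote inside a character literal such as '"' must not be treated as a string delimiter, so it cannot start or end a string.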
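One way to express these three rules is to skip escaped characters explicitly instead of looking only at the previous character. The following is a minimal sketch under that assumption; find_string_end and is_char_literal_quote are hypothetical helper names, and the second helper is meant to be called only on code outside a string.

```python
def find_string_end(source, start):
    """Return the index of the quote that closes the string opened at start, or -1."""
    assert source[start] == '"'
    i = start + 1
    while i < len(source):
        if source[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if source[i] == '"':
            return i
        i += 1
    return -1


def is_char_literal_quote(source, i):
    """True when the double quote at index i sits between two single quotes."""
    return 0 < i < len(source) - 1 and source[i - 1] == "'" and source[i + 1] == "'"


escaped = r'"\""'  # the Java literal "\""
sql = '"select * from user where name = \'root\'"'
char_literal = "'\"'"  # the Java literal '"'

print(find_string_end(escaped, 0))
print(find_string_end(sql, 0) == len(sql) - 1)
print(is_char_literal_quote(char_literal, 1))
```

Skipping two characters on a backslash also covers "\\", where the second backslash must not escape the closing quote.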

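III. Turning back

Sometimes the shortcut is not the fastest path.

After struggling for several days, I found that discovering every abnormal case by enumeration is simply too hard. Our energy is limited, and we cannot think of every case right away. So is there a better way?

At this point something deep in my memory started bubbling up: compiler theory, one of the "three romances" of programmers.

The text I had once read, only half understanding it, began to come back to me. I reopened the classic to see how it handles lexical and syntax analysis.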

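1. Reviewing compiler theory

Lexical analysis: the words (tokens) in a program fall roughly into five categories.

Syntax analysis: for example, the assignment statement position = initial + 2 * 60 is turned into a syntax tree.

Semantic analysis: for example, position = initial + 2 * 60 is then run through semantic analysis.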
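To make the lexical analysis step concrete, here is a small sketch that splits the example statement into tokens. The category names (IDENTIFIER, NUMBER, OPERATOR) and the function tokenize are assumptions chosen for this illustration; they only cover what this one statement needs and are not the five categories from the textbook.

```python
import re

# Token categories assumed for this example only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[=+\-*/]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (category, lexeme) pairs, dropping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens


tokens = tokenize("position = initial + 2 * 60")
for category, lexeme in tokens:
    print(category, lexeme)
```

Note that finditer silently skips characters that match no category; a real lexer would report them as errors.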

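2. Getting to work

Following the process described in compiler theory, first split the text into words one at a time, then string the words into statements, then process the statements one by one.

The overall approach:

Determine whether we are already inside a region we can ignore, for example between double quotes, inside a multi-line comment, or after a single-line comment marker.
If such a region has started, only watch for its end: the closing double quote, the end of the multi-line comment, or the end of the single-line comment.
When the end is reached, handle each case accordingly:

Multi-line comment: delete
Single-line comment: delete
Text between double quotes: keep

When a double quote, a multi-line comment, or a single-line comment starts, save the text accumulated before it to the new file.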
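The approach above is essentially a small state machine. The sketch below is one possible way to write it as a pure function that works on a string instead of a file. The State enum, the function name strip_comments, and the decision to treat character literals like strings are assumptions of this sketch, not the author's implementation. Java text blocks (""") are not handled.

```python
from enum import Enum, auto


class State(Enum):
    CODE = auto()           # ordinary code
    LITERAL = auto()        # inside "..." or '...'
    BLOCK_COMMENT = auto()  # inside /* ... */
    LINE_COMMENT = auto()   # after //


def strip_comments(source):
    out = []
    state = State.CODE
    quote = ""
    i = 0
    while i < len(source):
        ch = source[i]
        pair = source[i:i + 2]
        if state is State.CODE:
            if pair == "/*":
                state = State.BLOCK_COMMENT
                i += 2
                continue
            if pair == "//":
                state = State.LINE_COMMENT
                i += 2
                continue
            if ch in "\"'":
                state = State.LITERAL
                quote = ch  # remember which quote closes this literal
            out.append(ch)
        elif state is State.LITERAL:
            if ch == "\\":
                out.append(pair)  # keep the escape sequence untouched
                i += 2
                continue
            if ch == quote:
                state = State.CODE
            out.append(ch)
        elif state is State.BLOCK_COMMENT:
            if pair == "*/":
                state = State.CODE
                i += 2
                continue
        elif state is State.LINE_COMMENT:
            if ch == "\n":
                state = State.CODE
                out.append(ch)  # keep the line break itself
        i += 1
    return "".join(out)


sample = 'String a = "x // y"; // tail\nchar q = \'"\'; /* gone */ int z;'
print(strip_comments(sample))
```

Consuming the two characters of /*, */ and // together avoids misreading /*/ as an opener followed by a closer.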

3. Code

# coding=utf-8
import os

folder_path = "./java/test/"


def rewriteContent(dirpath, filename, content):
    # Overwrite the file with the comment-free content.
    with open(os.path.join(dirpath, filename), "w", encoding="utf-8") as writefile:
        writefile.write(content)


def clean_all_note():
    # Strip the comments from every file under folder_path.
    for dirpath, dirnames, filenames in os.walk(folder_path):
        for filename in filenames:
            print(dirpath + "/" + filename)
            clean_note(dirpath, filename)


# Is this a real double quote? Excludes the '"' and \" cases.
def is_available_quotes(ch, pre_ch, next_ch):
    return ch == "\"" and pre_ch != "\\" and not (pre_ch == "'" and next_ch == "'")


# Is this the start of a multi-line comment, i.e. /*
def is_prefix_multiline_comment(ch, pre_ch):
    return ch == "*" and pre_ch == "/"


# Is this the end of a multi-line comment, i.e. */
def is_suffix_multiline_comment(ch, pre_ch):
    return ch == "/" and pre_ch == "*"


# Is this the start of a single-line comment, i.e. //
def is_single_line_comment(ch, pre_ch):
    return ch == "/" and pre_ch == "/"


# Is this a line feed?
def is_line_feed(ch, pre_ch):
    return ch == "\n"


def clean_note(dirpath, filename):
    with open(os.path.join(dirpath, filename), "r", encoding="utf-8") as file:
        content = file.read()
    multiline_ing = False    # inside /* ... */
    single_line_ing = False  # after //
    quotes_ing = False       # inside "..."
    pre_ch = ""
    index = 0
    lastPoi = 0              # start of the text not yet copied to newContent
    newContent = ""
    for ch in content:
        # Look one character ahead without running past the end of the file.
        next_ch = content[index+1] if index+1 < len(content) else ""
        if multiline_ing:
            if is_suffix_multiline_comment(ch, pre_ch):
                lastPoi = index+1
                multiline_ing = False
                ch = ''  # the "/" of "*/" must not start a new comment
        elif single_line_ing:
            if is_line_feed(ch, pre_ch):
                lastPoi = index  # keep the line feed itself
                single_line_ing = False
        elif quotes_ing:
            # Handle "\\": an escaped backslash must not escape the next quote.
            if ch == "\\" and pre_ch == "\\":
                ch = ''
            if is_available_quotes(ch, pre_ch, next_ch):
                newContent = newContent + content[lastPoi:index]
                lastPoi = index
                quotes_ing = False
        else:
            if is_available_quotes(ch, pre_ch, next_ch):
                quotes_ing = True
            elif is_prefix_multiline_comment(ch, pre_ch):
                newContent = newContent + content[lastPoi:index-1]
                multiline_ing = True
                ch = ''  # "/*/" must not be read as an opener plus a closer
            elif is_single_line_comment(ch, pre_ch):
                newContent = newContent + content[lastPoi:index-1]
                single_line_ing = True

        index = index+1
        pre_ch = ch
    # Copy whatever follows the last comment (nothing if the file ends inside one).
    if not multiline_ing and not single_line_ing:
        newContent = newContent + content[lastPoi:]
    rewriteContent(dirpath, filename, newContent)


if __name__ == '__main__':
    clean_all_note()

深圳幻海软件技术有限公司
