Show Notes For Final Podcast:
We will be talking about the results of static code analysis on the source code of three different projects:
1. Audacity:
2. RethinkDB is an open-source distributed database. It has an intuitive query language, automatically parallelized queries, and simple administration: https://github.com/rethinkdb/rethinkdb
3. Fast Image Cache:
In our final blog post, we focused on possible coding-standard violations in the source code and explored architectural features such as dependency graphs and UML diagrams of the file subsystems. In this podcast, we focus on static bug detection using Clang. The Clang Static Analyzer is a standalone tool that is invoked from the command line and is intended to be run in tandem with a build of a codebase. Because the number of defects is an important indicator of software quality, we decided to run Clang on each of the code bases to assess code quality.
The first code base we assessed was Audacity, an open-source audio recording and editing application that is often used for recording podcasts.
The following violations were found:
- An external object or function should be declared in one and only one file. We found violations where the same object or function was declared in multiple files.
- Externals shall have the same type in the declaration and the definition. If one or more of the declarations incompletely specify the object's type, and there is one declaration of the object with a completed type, all the declarations are taken to agree with the completed type.
- Global: An identifier with external linkage shall have exactly one external definition.
- If an object with external linkage is declared or used in an expression, there must be exactly one external definition for the identifier somewhere in the program. If the same object is declared externally more than once, the declarations must agree in type and linkage.
- Global: Prefer internal linkage over external linkage whenever possible.
- Global: Use the static keyword for internal linkage.
- 281 files had control flow violations (dangling else, single exit point at end, unreachable code). In addition, 2344 files contained memory allocation violations in the form of dynamic heap allocation.
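The linkage rules in the list above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the names g_request_count, next_request_id, bump and kFirstId are hypothetical and do not come from Audacity, and the header and source parts are shown together in one block although they would normally live in separate files.

```cpp
// --- Would normally live in one header (a hypothetical counter.h) ---
// Declarations only: they announce the object and function to other files.
extern int g_request_count;
int next_request_id();

// --- Would normally live in exactly one source file (a hypothetical counter.cpp) ---
// The single external definition; its type matches the declaration above.
int g_request_count = 0;

// Internal linkage via `static`: this helper is invisible to other files.
static int bump(int value) { return value + 1; }

// An unnamed namespace is the C++ alternative to `static` for internal linkage.
namespace {
const int kFirstId = 100;
}

// The single external definition of the declared function.
int next_request_id() {
    g_request_count = bump(g_request_count);
    return kFirstId + g_request_count - 1;
}
```

Only the declarations need to be visible to other files. Everything that can stay private to one file (here bump and kFirstId) gets internal linkage, so it cannot clash with a definition elsewhere in the program.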
The code base is fairly large, and apart from the standards violations listed above, our static analysis found no bugs. It also has a very good comment-to-line ratio, which suggests that the code is well documented by industry standards.
The second code base we assessed was RethinkDB, which is built to store JSON documents and to scale to multiple machines with very little effort. It has a pleasant query language that supports useful queries such as table joins and group by, and it is easy to set up and learn.
RethinkDB is written in C++, so object-oriented design considerations apply to it.
The four steps involved in the Object Oriented Design process are:
a. Identification of classes and objects
b. Semantics of classes and objects
c. Relationships between classes and objects
d. Implementation of classes and objects
The metrics developed can be listed as follows:
Findings:
· Certain coding standards were violated, for example by the presence of commented-out code and goto statements, which ideally should not be present. Some functions were also too long, which makes them candidates for refactoring. There were also instances of unreachable code and unused functions.
· The metrics summary gives us an idea of the lines of code, classes and files involved. The architectural browser gives us an idea of the languages and directory structure. This makes it easier for maintainers to find the expertise they need to maintain and support the code.
· There are few lines of code in comparison to the other two code bases explored, so this is a relatively small application. The comment-to-line ratio is 0.57, which may indicate that the code is well documented.
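To make the comment-to-line ratio concrete, here is a minimal sketch of how such a figure could be computed. It assumes the ratio means comment lines divided by code lines, which may differ from the exact definition our tool uses. The classifier is deliberately crude: a line that mixes code and a trailing comment counts as code only. The names LineCounts, count_lines and comment_to_code_ratio are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Tallies for the three kinds of line this sketch distinguishes.
struct LineCounts {
    std::size_t blank = 0;
    std::size_t comment = 0;
    std::size_t code = 0;
};

// Remove leading spaces, tabs and carriage returns.
static std::string ltrim(const std::string& s) {
    const std::size_t pos = s.find_first_not_of(" \t\r");
    return pos == std::string::npos ? std::string() : s.substr(pos);
}

// Classify each line of C or C++ source text as blank, comment or code.
LineCounts count_lines(const std::string& source) {
    LineCounts counts;
    std::istringstream in(source);
    std::string raw;
    bool in_block = false;  // true while inside a /* ... */ comment
    while (std::getline(in, raw)) {
        const std::string line = ltrim(raw);
        if (in_block) {
            ++counts.comment;
            if (line.find("*/") != std::string::npos) in_block = false;
        } else if (line.empty()) {
            ++counts.blank;
        } else if (line.compare(0, 2, "//") == 0) {
            ++counts.comment;
        } else if (line.compare(0, 2, "/*") == 0) {
            ++counts.comment;
            if (line.find("*/", 2) == std::string::npos) in_block = true;
        } else {
            ++counts.code;
        }
    }
    return counts;
}

// Assumed definition: comment lines divided by code lines.
double comment_to_code_ratio(const LineCounts& c) {
    if (c.code == 0) return 0.0;
    return static_cast<double>(c.comment) / static_cast<double>(c.code);
}
```

Under this assumed definition, a ratio of 0.57 would mean roughly 57 comment lines for every 100 lines of code.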
We also generated the following metric charts:
· A code volume distribution chart, which shows the distribution of blank lines, comment lines, executable lines and declaration lines.
· Complexity charts, which show the cyclomatic complexity, that is, the number of linearly independent paths through a program's source code.
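As a rough illustration of the cyclomatic complexity metric, the sketch below counts decision points in a function body and adds one. This is a common approximation, not the algorithm our tool uses. It assumes the text contains no comments or string literals that mention the counted keywords, and the function name approximate_cyclomatic_complexity is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Approximate cyclomatic complexity as (number of decision points) + 1.
// Decision points counted: if, for, while, case, catch, &&, || and ?.
int approximate_cyclomatic_complexity(const std::string& body) {
    static const std::set<std::string> kDecisionKeywords = {
        "if", "for", "while", "case", "catch"};
    int decisions = 0;
    std::size_t i = 0;
    while (i < body.size()) {
        const unsigned char ch = static_cast<unsigned char>(body[i]);
        if (std::isalpha(ch) || ch == '_') {
            // Read a whole identifier or keyword.
            const std::size_t start = i;
            while (i < body.size() &&
                   (std::isalnum(static_cast<unsigned char>(body[i])) ||
                    body[i] == '_')) {
                ++i;
            }
            if (kDecisionKeywords.count(body.substr(start, i - start)) > 0) {
                ++decisions;
            }
        } else if ((ch == '&' || ch == '|') && i + 1 < body.size() &&
                   body[i + 1] == body[i]) {
            // Short-circuit operators add a branch each.
            ++decisions;
            i += 2;
        } else {
            if (ch == '?') ++decisions;  // conditional operator
            ++i;
        }
    }
    return decisions + 1;
}
```

A function with no branches scores 1, and each additional if, loop or short-circuit operator adds one more independent path.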
The third code base we assessed was FastImageCache.
The violations included potential memory leaks from new and delete operations, and virtual function calls during construction and destruction. There were also a few instances of unreachable code, identifiers with more than one external definition, and cases where the static keyword was not used for internal linkage.
Finally, we constructed a treemap that shows the number of violations and the number of distinct violation types in each file. The result locator feature helped us locate each violation at a particular line and column of a file.
In addition to the code check, we also generated architectural internal dependency graphs, parent declaration graphs, and architectural internal dependencies by language.
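The sketch below illustrates, in C++, why a virtual function call during construction is flagged, and one common way to avoid raw new and delete. The class names Resource and ImageResource and the function make_image_resource are hypothetical and are not taken from FastImageCache.

```cpp
#include <memory>
#include <string>

class Resource {
public:
    // Calling a virtual function here always runs Resource::name(),
    // because the derived part of the object does not exist yet.
    Resource() { name_at_construction_ = name(); }

    // A virtual destructor lets deletion through a base pointer
    // run the derived destructor as well.
    virtual ~Resource() = default;

    virtual std::string name() const { return "Resource"; }

    const std::string& name_at_construction() const {
        return name_at_construction_;
    }

private:
    std::string name_at_construction_;
};

class ImageResource : public Resource {
public:
    std::string name() const override { return "ImageResource"; }
};

// Ownership is held by std::unique_ptr, so no manual delete is needed
// and the object cannot leak on an early return.
std::unique_ptr<Resource> make_image_resource() {
    return std::unique_ptr<Resource>(new ImageResource());
}
```

An object created by make_image_resource() reports "ImageResource" from name(), but the value captured inside the base constructor is "Resource". That surprising result is what this kind of check warns about.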
Conclusion:
We analysed the three code bases mentioned above and discovered violations of coding standards. We noticed that one of the obstacles to the acceptance of the proposed visual analysis techniques was the difficulty of finding suitable source code. The other difficulty we faced was analysing the complexity of the code using a hybrid tool like Understand. Reverse-engineering and code assessment scenarios are by nature iterative and exploratory, so they map well to repeated selection and query steps, and these steps need to be iterated at least a few times to get an acceptable result.
Any software maintainer faces certain limitations. The difficulties discussed above show that the software maintainer needs to interact, especially in the implementation phase, with an environment that presents a view of software systems characterized by: a) a large set of interrelated facts that must be stored in some sort of database; b) a query language that allows the extraction of a subset and the correlation of different facts; c) an evolving set of general and summary rules that define answers to questions, starting from the facts actually stored.
Every static analysis tool suffers from the problem of false positives. Since they cannot be eliminated fully automatically, there is a need for an alternative in which reported violations are identified as genuine or false positive in a semi-automatic manner.
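One possible shape for such semi-automatic triage is sketched below. It is an assumption on our part and not a feature of any tool we used. A human reviews a finding once and records the (file, rule) pair as a false positive; later runs then filter matching findings automatically. All names here (Finding, Suppression, TriageResult, triage) are hypothetical.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// One reported violation from a static analysis run.
struct Finding {
    std::string file;
    int line;
    std::string rule;
};

// A (file, rule) pair that a human reviewer has marked as a false positive.
using Suppression = std::pair<std::string, std::string>;

struct TriageResult {
    std::vector<Finding> needs_review;           // still requires a human decision
    std::vector<Finding> known_false_positives;  // filtered by earlier reviews
};

// Split findings into those already judged false positive and the rest.
TriageResult triage(const std::vector<Finding>& findings,
                    const std::set<Suppression>& reviewed_false_positives) {
    TriageResult result;
    for (const Finding& f : findings) {
        const Suppression key(f.file, f.rule);
        if (reviewed_false_positives.count(key) > 0) {
            result.known_false_positives.push_back(f);
        } else {
            result.needs_review.push_back(f);
        }
    }
    return result;
}
```

Keying on file and rule rather than on line number keeps a suppression valid when code above the finding moves. The cost is that a new, genuine violation of the same rule in the same file would also be hidden.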