2.3. Code Smells

Source [1]

  1. Using globals - but that is a code smell in any language.

  2. Using eval and exec unless the input is tightly controlled.

  3. Using range over the length of a container to iterate over the container, or get the index.

  4. Creating a list of values, when you don't need to keep the list - a generator is often better.

  5. Having getter and setter methods rather than using properties.

  6. Having named methods with similar semantics to existing operators, instead of overriding the magic methods those operators use (eg. having an add method rather than overriding __add__() and __iadd__() ) .

  7. Using nested loops to combine two iterables rather than using something out of itertools.

  8. Having lots of code and classes in a single module rather than having a sensible split of code into well named modules.

  9. Unit tests that don't use mock (or similar).

  10. Modules, classes, methods and functions with no docstring

  11. Methods and functions which return a value to signify an error rather raising an exception.

  12. Using open and similar without using with.

  13. Using mutable values as a default arguments value (without a clear comment stating it is deliberate)

  14. Using explicit loop to build a list/set/dict rather than a comprehension.

  15. Using system to call OS commands rather than using shutil methods.

  16. Using str manipulation (slicing etc) to build file paths rather than use os.path or pathlib.

  17. You code has a function which is a reimplementation of code in the standard library.

  18. You have started a new Project in 2019 (or late 2018) which uses Python2, without a good technical reason. It being what you learnt is not a good technical reason - if you learnt Python2 then you can learn Python3.

2.3.1. Bad practice

2.3.2. range(len())

  • Very common bad practice

  • poor variable naming and readability

  • range(len(...)) will evaluate generator to calculate length

  • DATA[i] lookups has O(n) complexity!!

  • Does not use generator at all!

Bad practice:

DATA = ['a', 'b', 'c']

for i in range(len(DATA)):
    print(DATA[i])

# a
# b
# c

Better solution:

DATA = ['a', 'b', 'c']

for letter in DATA:
    print(letter)

# a
# b
# c

2.3.3. Example 1

  • Fixing to only valid solution

a = 'UL. JANA III SOBIESKIEGO 1/2'
b = 'ulica Jana III Sobieskiego 1 apt 2'
c = 'os. Jana III Sobieskiego'
d = 'plac Jana III SobieSKiego 1/2'
e = 'aleja JANA III Sobieskiego'
f = 'alei Jana III Sobieskiego 1/2'
g = 'ul. jana III SOBIESKIEGO 1 m. 2'
h = 'os. Jana Iii Sobieskiego 1 apt 2'

a_prim=a[4:-4].title().replace(a[9:12],a[9:12].upper())
a_prim=a_prim.replace(a_prim[6:9],a_prim[6:9].upper())

dl=len(a_prim)


b_prim=b[6:6+dl]
c_prim=c[4:4+dl]
d_prim=d[5:5+dl].replace('SK','sk')
e_prim=e[6:6+dl]
e_prim=e_prim.replace(e_prim[1:4],e_prim[1:4].lower())
f_prim=f[5:5+dl]
g_prim=g[4:4+dl]
g_prim=g_prim[0].upper() + g_prim[1:9]+ g_prim[9:].title()
h_prim=h[4:4+dl]
h_prim=h_prim.replace(h_prim[5:8],h_prim[5:8].upper())
#%%
print(a_prim)
print(b_prim)
print(c_prim)
print(d_prim)
print(e_prim)
print(f_prim)
print(g_prim)
print(h_prim)

2.3.4. Example 2

  • Adding strings ?!

DATA = [
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

features=[]
species=[]

for i in DATA:
    fet=(str(i)[1:19],)
    spe=str(i)[22:-2]
    features.append(fet)
    species.append(spe)

print(features)
print(species)

2.3.5. Example 3

  • Try to change: not power of two, but power of three

  • How to do that?

import numpy as np
import pandas as pd


np.random.seed(0)

A=np.random.randint(0, 1025, 50*50).reshape((50,50))
B=[]

for i in A:
    for j in i:
        if (j & (j-1))==0 and j!=0 and j not in B:
            B.append(j)

B=np.asarray(B)
B=np.sort(B)

df_A=pd.DataFrame(A)
mask=df_A>=10

df_A[mask]=np.nan
df_A_clean=df_A.dropna(how='all')
df_A_clean.fillna(0, inplace=True)
df_A_clean
df_A_clean.describe()

2.3.6. Example 4

  • Using numpy everywhere

  • Methods?!

  • Passing all variables to __init__() instead of *args, **kwargs

"""
Napisz metody total() oraz average(), które będą wyliczały
odpowiednio sumę lub średnią z wszystkich pomiarów sepal_length,
sepal_width, petal_length, petal_width dla danego obiektu.
"""

import numpy as np


class Iris():
    def __init__(self, sepal_length, sepal_width, petal_length, petal_width):
        self.sepal_length = sepal_length
        self.sepal_width = sepal_width
        self.petal_length = petal_length
        self.petal_width = petal_width

    def __repr__(self):
        return f'{self.sepal_length}, {self.sepal_width}, {self.petal_length}, {self.petal_width}, {self.species}'


class Virginica(Iris):
    def __init__(self, sepal_length,sepal_width, petal_length, petal_width):
        Iris.__init__(self, sepal_length,sepal_width, petal_length, petal_width)
        self.species='virginica'

class Versicolor(Iris):
    def __init__(self, sepal_length,sepal_width, petal_length, petal_width):
        Iris.__init__(self, sepal_length,sepal_width, petal_length, petal_width)
        self.species='versicolor'

class Setosa(Iris):
    def __init__(self, sepal_length,sepal_width, petal_length, petal_width):
        Iris.__init__(self, sepal_length,sepal_width, petal_length, petal_width)
        self.species='setosa'


def ttl(iris):
    tmp = [iris.sepal_length, iris.sepal_width, iris.petal_length, iris.petal_width]
    return np.sum(tmp)

def avg(iris):
    tmp = [iris.sepal_length, iris.sepal_width, iris.petal_length, iris.petal_width]
    return np.mean(tmp)

a = Setosa(1,2,3,4)
print(a)

DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

DATA_2=DATA[1:]
for item in DATA_2:
    if item[4]=='virginica':
        v1=Virginica(item[0], item[1], item[2], item[3])
    elif item[4]=='versicolor':
        v1=Versicolor(item[0], item[1], item[2], item[3])
    elif item[4]=='setosa':
        v1=Setosa(item[0], item[1], item[2], item[3])


    print(item[4])
    print(v1.total())
    print(v1.average())

2.3.7. Example 6

  • Very common bad practice

  • poor variable naming and readability

  • range(len(...)) will evaluate generator to calculate length

  • DATA[i] lookups has O(n) complexity!!

  • Does not use generator at all!

  • Use .startswith() to check not str[0]

Bad practice:

DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, {'name': 'virginica'}),
    (5.1, 3.5, 1.4, 0.2, {'name': 'setosa'}),
    (5.7, 2.8, 4.1, 1.3, {'name': 'versicolor'}),
    (6.3, 2.9, 5.6, 1.8, {'name': 'virginica'}),
    (6.4, 3.2, 4.5, 1.5, {'name': 'versicolor'}),
    (4.7, 3.2, 1.3, 0.2, {'name': 'setosa'}),
    (7.0, 3.2, 4.7, 1.4, {'name': 'versicolor'}),
    (7.6, 3.0, 6.6, 2.1, {'name': 'virginica'}),
    (4.6, 3.1, 1.5, 0.2, {'name': 'setosa'}),
]

for i in range(1, len(DATA)):
    if DATA[i][-1]['name'][0] == 'v':
        print(DATA[i][-1]['name'])

Pythonic way:

DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, {'name': 'virginica'}),
    (5.1, 3.5, 1.4, 0.2, {'name': 'setosa'}),
    (5.7, 2.8, 4.1, 1.3, {'name': 'versicolor'}),
    (6.3, 2.9, 5.6, 1.8, {'name': 'virginica'}),
    (6.4, 3.2, 4.5, 1.5, {'name': 'versicolor'}),
    (4.7, 3.2, 1.3, 0.2, {'name': 'setosa'}),
    (7.0, 3.2, 4.7, 1.4, {'name': 'versicolor'}),
    (7.6, 3.0, 6.6, 2.1, {'name': 'virginica'}),
    (4.6, 3.1, 1.5, 0.2, {'name': 'setosa'}),
]

header, *rows = DATA

for *measurements, species in rows:
    species = species['name']
    if species.startswith('v'):
        print(species)

2.3.8. References