Python Interview Questions

Interviewing for a Python role can be intimidating. If you are preparing for a technical interview round, here is a list of the top 101 interview questions with answers. The questions are grouped into basic, intermediate, and advanced sets: the earlier ones are aimed at freshers and the later ones at more experienced users. Together they cover the common applications of Python and help you demonstrate your command of the language.

Basic Interview Questions

1. What are the key features of Python? 

Python is one of the most popular programming languages among data scientists and AI/ML professionals. This popularity is due to the following key features:

  • Python is easy to learn thanks to its clear, readable syntax
  • Python is an interpreted language: code runs without a separate compile step, which makes debugging easier
  • Python is free and open source
  • It is portable and runs on different platforms such as Windows, Linux and macOS
  • It is an object-oriented language that supports classes and objects
  • It can be easily integrated with other languages like C++, Java and more

2. What are Keywords in Python? 

Keywords in Python are reserved words that cannot be used as identifiers such as function names or variable names. They define the structure and syntax of the language.

Python 3.6 has 33 keywords. Python 3.7 made async and await reserved words as well, bringing the total to 35, and the set can change again in later versions. The 33 keywords common to both versions are listed below:

Keywords in Python

False     class     finally   is        return
None      continue  for       lambda    try
True      def       from      nonlocal  while
and       del       global    not       with
as        elif      if        or        yield
assert    else      import    pass
break     except    in        raise
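Because the keyword set differs between versions, it can help to ask the running interpreter directly. The sketch below uses the standard library keyword module; the variable name keyword_count is only illustrative.

```python
import keyword

# keyword.kwlist holds the reserved words of the interpreter running this code.
keyword_count = len(keyword.kwlist)
print(keyword_count)

# iskeyword() reports whether a given string is a reserved word.
print(keyword.iskeyword("lambda"))  # True
print(keyword.iskeyword("print"))   # False: print is a built-in function, not a keyword
```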


3. What are Literals in Python and explain about different Literals?

Literals in Python are fixed values written directly in the source code, typically assigned to a variable or constant. Python has various kinds of literals, including:

  1. String Literals: A sequence of characters enclosed in quotes. Strings can be written with single, double or triple quotes. Python has no separate character type, so a character literal is simply a string of length one.
  2. Numeric Literals: These are immutable and belong to three types: integer, float and complex.
  3. Boolean Literals: They take one of two values, True or False, which behave like 1 and 0 respectively in arithmetic.
  4. Special Literals: Python has one special literal, None, which is used to indicate the absence of a value.

4. What is linear regression?

Linear regression is one of the most commonly used types of predictive analysis. It models the relationship between one or more input variables and a continuous output variable, and assumes that this relationship is linear. There are two types of linear regression:

1. Simple Linear Regression
2. Multiple Linear Regression

5. What is Supervised Learning?

Supervised learning refers to training a machine with examples for which the correct answer is already known. It uses labeled data so that the model learns the relation between the inputs and the values assigned to them. After training, when the model is given a new set of data, it uses what it has learned to predict the result.

6. What is Unsupervised Learning?

Unsupervised learning refers to the process of training machines without explicit instructions. Unlike supervised learning, unsupervised learning doesn’t use labelled or classified data to train the machine. It allows the learning model to discover information on its own through unlabelled data. Unsupervised learning algorithms are used for more complex processing tasks where not much information is available.

7. In Python how do you convert a string into lowercase?

All uppercase characters in a string can be converted to lowercase with the method: string.lower()

ex: string = 'GREATLEARNING'
print(string.lower())
o/p: greatlearning

8. How do you get a list of all the keys in a dictionary?

One of the ways we can get the keys is by using: dict.keys()
This method returns a view of all the keys in the dictionary; wrap it in list() when an actual list is needed.
ex: d = {1: 'a', 2: 'b', 3: 'c'}
print(list(d.keys()))
o/p: [1, 2, 3]

9. How can you capitalize the first letter of a string?

We can use the capitalize() method to capitalize the first character of a string. It returns a new string whose first character is uppercase and whose remaining characters are lowercase.

Syntax: string_name.capitalize()
ex: n = "greatlearning"
print(n.capitalize())
o/p: Greatlearning

10. How can you insert an element at a given index in Python?

Python lists have a built-in method called insert().
It can be used to insert an element at a given index.
Syntax: list_name.insert(index, element)
ex: numbers = [0, 1, 2, 3, 4, 5, 6, 7]
# insert 10 at index 6
numbers.insert(6, 10)
print(numbers)
o/p: [0, 1, 2, 3, 4, 5, 10, 6, 7]

11. How will you remove duplicate elements from a list?

There are various ways to remove duplicate elements from a list. The most common one is to convert the list into a set with the set() function and, if required, use the list() function to convert it back to a list. Note that a set does not preserve the original order of the elements.
ex: list0 = [2, 6, 4, 7, 4, 6, 7, 2]
list1 = list(set(list0))
print("The list without duplicates : " + str(list1))
o/p: The list without duplicates : [2, 4, 6, 7] (the order may vary)
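When the original order matters, one alternative to set() is dict.fromkeys(). This is a minimal sketch that relies on dictionaries keeping insertion order (Python 3.7 and later); the name unique_in_order is only illustrative.

```python
list0 = [2, 6, 4, 7, 4, 6, 7, 2]

# Dictionary keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each element survives in its original position.
unique_in_order = list(dict.fromkeys(list0))
print(unique_in_order)  # [2, 6, 4, 7]
```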

12. What is recursion?

Recursion is when a function calls itself one or more times in its body. A recursive function must have a terminating condition (a base case); otherwise it would keep calling itself until Python raises a RecursionError.
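As a small illustration of a recursive function with a terminating condition, the sketch below sums the integers from n down to 1. The function name countdown_sum is hypothetical and chosen only for this example.

```python
def countdown_sum(n):
    """Return n + (n-1) + ... + 1 using recursion."""
    if n <= 0:
        # base case: stops the recursion
        return 0
    # recursive case: the function calls itself with a smaller input
    return n + countdown_sum(n - 1)

print(countdown_sum(4))  # 10
```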

13. Explain Python List Comprehension

List comprehensions build a new list from an existing iterable. Elements can be conditionally included in the new list and each element can be transformed as needed. A comprehension consists of an expression followed by a for clause (and optional if clauses), enclosed in square brackets.
ex: numbers = [i for i in range(1000)]
print(numbers)

14. What is the bytes() function?

The bytes() function returns a bytes object. It is used to convert objects into bytes objects, or create empty bytes object of the specified size.
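A short sketch of the three common ways of calling bytes() may make this concrete; the variable names are illustrative only.

```python
# An integer argument creates that many zero bytes.
empty = bytes(3)
print(empty)      # b'\x00\x00\x00'

# A string needs an encoding to be converted.
from_text = bytes("hi", "utf-8")
print(from_text)  # b'hi'

# An iterable of integers (0-255) is converted byte by byte.
from_ints = bytes([72, 105])
print(from_ints)  # b'Hi'
```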

15. What are the different types of operators in Python?

Python has the following basic operators:
Arithmetic ( Addition (+), Subtraction (-), Multiplication (*), Division (/), Modulus (%) ),
Relational ( <, >, <=, >=, ==, != ),
Assignment ( =, +=, -=, /=, *=, %= ),
Logical ( and, or, not ), Membership ( in, not in ), Identity ( is, is not ), and Bitwise operators

16. What is the ‘with statement’?

The “with” statement in Python is used for resource management and simplifies the exception handling around it. It works with context managers: a file opened in a “with” statement is closed automatically when the block finishes, even if an exception is raised, without calling the close() function. This makes the code shorter and easier to read.
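The sketch below shows a file being written and read back inside “with” blocks. The file name demo.txt and the temporary directory are assumptions made only so that the example is self-contained.

```python
import os
import tempfile

# A temporary directory keeps this illustration from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("hello")
    # Leaving the block closed the file; no explicit f.close() was needed.
    print(f.closed)  # True

    with open(path) as f:
        content = f.read()
    print(content)  # hello
```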

17. How do we interpret Python?

Python is an interpreted language. When a Python program is run, the interpreter first compiles the source code written by the developer into an intermediate form called bytecode, which is then executed by the Python virtual machine.

18. What are the tools present to perform static analysis?

Two static analysis tools used to find bugs in Python are PyChecker and Pylint. PyChecker detects bugs in the source code and warns about its style and complexity, while Pylint checks whether a module conforms to a coding standard.

19. What is the difference between tuple and dictionary?

One major difference between a tuple and a dictionary is that a dictionary is mutable while a tuple is not. The contents of a dictionary can be changed without changing its identity, but that is not possible with a tuple. In addition, a tuple is an ordered sequence accessed by position, whereas a dictionary maps keys to values.

20. What are module and package in Python?

Modules are the way to structure a program. Each Python file is a module that can define functions, classes and variables and can import other modules. A package is a folder of modules, and it can contain modules as well as sub-packages.

21. What is object() function in Python?

In Python the object() function returns an empty object. New properties or methods cannot be added to this object.

22. What is the difference between NumPy and SciPy?

NumPy stands for Numerical Python while SciPy stands for Scientific Python. NumPy is the basic library for defining arrays and solving simple mathematical problems, while SciPy builds on NumPy and is used for more complex problems such as numerical integration, optimization and so on.

23. What does len() do?

len() is used to determine the length of a string, a list, an array, and so on.
ex: s = "greatlearning"
print(len(s))
o/p: 13

24. Define encapsulation in Python?

Encapsulation means binding the code and the data it operates on together in a single unit. A Python class, which bundles attributes with the methods that work on them, is an example.

25. What is type() in Python?

type() is a built-in function which either returns the type of an object (when called with one argument) or returns a new type object (when called with three arguments: name, bases and dict).

ex: a = 100
print(type(a))

o/p: <class 'int'>

Intermediate Questions

26. How to remove spaces from a string in Python?

Spaces can be removed from a string in Python by using the strip() or replace() methods. strip() removes only the leading and trailing whitespace, while replace() can remove all the spaces in the string: string.replace(" ", "")

ex1: str1 = "  great learning  "
print(str1.strip())

o/p: great learning

ex2: str2 = "great learning"
print(str2.replace(" ", ""))

o/p: greatlearning

27. What is a map() function in Python?

The map() function in Python applies a function to every element of a specified iterable. It takes two parameters, a function and an iterable. The function is passed as the first argument and is applied to each element of the iterable (passed as the second argument). A map object (an iterator) is returned, which can be converted to a list with list().

def add(n):
    return n + n

numbers = (15, 25, 35, 45)
res = map(add, numbers)
print(list(res))

o/p: [30, 50, 70, 90]

28. Explain the file processing modes that Python supports.

There are four basic file processing modes in Python: read-only (r), write-only (w), read-write (r+) and append (a). Files are opened in text mode by default, so adding “t” is optional: “rt” for read-only, “wt” for write and so on. Similarly, a binary file can be opened by adding “b” to the file access flags (“rb”, “wb”, “r+b” and “ab”).

29. What is __init__ in Python?

__init__ is a reserved method in Python, known as the constructor in OOP terminology. When an object is created from a class, the __init__ method is called automatically to initialise the attributes of the new object.

30. What is pickling and unpickling?

Pickling is the process of converting a Python object hierarchy into a byte stream so that it can be stored in a file or database. It is also known as serialization. Unpickling is the reverse of pickling: the byte stream is converted back into an object hierarchy.
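A minimal round trip with the standard library pickle module might look like the following; the record dictionary is made-up sample data.

```python
import pickle

record = {"name": "sample", "scores": [1, 2, 3]}  # made-up sample data

blob = pickle.dumps(record)    # pickling: object -> byte stream
restored = pickle.loads(blob)  # unpickling: byte stream -> object

print(type(blob).__name__)  # bytes
print(restored == record)   # True: same contents
print(restored is record)   # False: a new object was built
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.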

31. How is memory managed in Python?

Memory management in Python is built around a private heap containing all objects and data structures. The heap is managed by the interpreter and the programmer does not have direct access to it. The Python memory manager handles all the memory allocation. In addition, there is a built-in garbage collector that recycles unused memory and frees it for the heap space.

32. What is pass in Python?

pass is a statement which does nothing when executed. In other words, it is a null statement. It is not ignored by the interpreter, but it results in no operation. It is used where a statement is syntactically required but you do not want any code to execute.

33. What is unittest in Python?

unittest is a unit testing framework in Python's standard library. It supports sharing of setup and shutdown code for tests, aggregation of tests into collections, test automation, and independence of the tests from the reporting framework.

34. How can an object be copied in Python?

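Not all objects can be copied in Python, but most can. The “=” operator does not copy an object; it only binds another name to the same object. To create an actual copy, use the copy module: copy.copy() makes a shallow copy and copy.deepcopy() makes a deep copy.

ex: import copy
var = copy.copy(obj)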
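The difference between plain assignment, a shallow copy and a deep copy shows up with nested objects. The sketch below uses a made-up nested list; all variable names are illustrative.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                # no copy: both names refer to the same list
shallow = copy.copy(original)   # new outer list, inner lists are shared
deep = copy.deepcopy(original)  # new outer list and new inner lists

original[0].append(99)

print(alias is original)  # True
print(shallow[0])         # [1, 2, 99] because the inner list is shared
print(deep[0])            # [1, 2] because the deep copy is independent
```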

35. How can a number be converted to a string?

The built-in function str() can be used to convert a number to a string.

36. How do you delete a file in Python?

Files can be deleted in Python by importing the os module and calling os.remove(filename) or os.unlink(filename).

37. What is split() function used for?

The split() function is used to split a string into a list of shorter strings using a defined separator.
ex: letters = "A,B,C"
n = letters.split(",")
print(n)

o/p: ['A', 'B', 'C']

38. How do you create an empty class in Python?

To create an empty class we can use the pass command after the definition of the class object. A pass is a statement in Python that does nothing.

39. How can you concatenate two tuples?

  Solution ->

Let’s say we have two tuples like this ->

tup1 = (1,”a”,True)

tup2 = (4,5,6)

Concatenation of tuples means that we are adding the elements of one tuple at the end of another tuple.

Now, let’s go ahead and concatenate tuple2 with tuple1:

All you have to do is, use the ‘+’ operator between the two tuples and you’ll get the concatenated result.

Similarly, let’s concatenate tuple1 with tuple2:
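The two concatenations described above can be sketched as follows, using the tuples from the question. The result names are illustrative only.

```python
tup1 = (1, "a", True)
tup2 = (4, 5, 6)

# Elements of tup1 are added at the end of tup2.
tup2_then_tup1 = tup2 + tup1
print(tup2_then_tup1)  # (4, 5, 6, 1, 'a', True)

# Elements of tup2 are added at the end of tup1.
tup1_then_tup2 = tup1 + tup2
print(tup1_then_tup2)  # (1, 'a', True, 4, 5, 6)
```

The '+' operator builds a new tuple each time; tup1 and tup2 themselves are left unchanged.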

40. How can you find the minimum and maximum values present in a tuple?

 Solution ->

We can use the min() function on top of the tuple to find out the minimum value present in the tuple:

We see that the minimum value present in the tuple is 1.

Analogous to the min() function is the max() function, which will help us to find out the maximum value present in the tuple:

We see that the maximum value present in the tuple is 5
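A minimal sketch of both calls is shown below. The tuple itself is an assumption: it was chosen so that its minimum is 1 and its maximum is 5, matching the values mentioned above.

```python
tup = (1, 2, 3, 4, 5)  # assumed sample tuple

smallest = min(tup)
largest = max(tup)

print(smallest)  # 1
print(largest)   # 5
```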

41. If you have a list like this -> [1,”a”,2,”b”,3,”c”]. How can you access the 2nd, 4th and 5th elements from this list?

Solution ->

We will start off by creating a tuple which will comprise of the indices of elements which we want to access:

Then, we will use a for loop to go through the index values and print them out:

Below is the entire code for the process:
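The steps above can be sketched as follows. The 2nd, 4th and 5th elements sit at indices 1, 3 and 4 because indexing starts at 0; the names lst, indices and selected are illustrative.

```python
lst = [1, "a", 2, "b", 3, "c"]

# Indices of the 2nd, 4th and 5th elements (indexing starts at 0).
indices = (1, 3, 4)

# Loop through the index values and print the matching elements.
for i in indices:
    print(lst[i])

# The same selection collected into a list.
selected = [lst[i] for i in indices]
print(selected)  # ['a', 'b', 3]
```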

42. If you have a list like this -> [“sparta”,True,3+4j,False]. How would you reverse the elements of this list?

Solution ->

We can use  the reverse() function on the list:
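A short sketch using the list from the question; the variable name items is illustrative.

```python
items = ["sparta", True, 3+4j, False]

# reverse() reverses the list in place and returns None.
items.reverse()
print(items)  # [False, (3+4j), True, 'sparta']
```

If the original list must stay untouched, slicing with items[::-1] or the built-in reversed() gives a reversed copy instead.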

43. If you have dictionary like this – > fruit={“Apple”:10,”Orange”:20,”Banana”:30,”Guava”:40}. How would you update the value of ‘Apple’ from 10 to 100?

Solution ->

 This is how you can do it:

Give the name of the key inside the square brackets and assign it a new value.
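Using the dictionary from the question, the update can be sketched like this:

```python
fruit = {"Apple": 10, "Orange": 20, "Banana": 30, "Guava": 40}

# Assigning to an existing key overwrites its value.
fruit["Apple"] = 100
print(fruit)  # {'Apple': 100, 'Orange': 20, 'Banana': 30, 'Guava': 40}
```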

44. If you have two sets like this -> s1 = {1,2,3,4,5,6}, s2 = {5,6,7,8,9}. How would you find the common elements in these sets.

Solution ->

You can use the intersection() function to find the common elements between the two sets:

We see that the common elements between the two sets are 5 & 6.
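A minimal sketch with the two sets from the question; the name common is illustrative.

```python
s1 = {1, 2, 3, 4, 5, 6}
s2 = {5, 6, 7, 8, 9}

# intersection() returns a new set with the elements present in both sets.
common = s1.intersection(s2)
print(common)  # {5, 6}

# The & operator is an equivalent shorthand.
print(s1 & s2 == common)  # True
```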

45. Write a program to print out the 2-table using while loop.

Solution ->

Below is the code to print out the 2-table:

We start off by initializing two variables ‘i’ and ‘n’. ‘i’ is initialized to 1 and ‘n’ is initialized to ‘2’.

Inside the while loop, since the ‘i’ value goes from 1 to 10, the loop iterates 10 times.

Initially n*i is equal to 2*1, and we print out the value.

Then, ‘i’ value is incremented and n*i becomes 2*2. We go ahead and print it out.

This process goes on until i value becomes 10.
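The walkthrough above corresponds to a loop along these lines; this is a sketch, and the exact print format is an assumption.

```python
i = 1  # multiplier, goes from 1 to 10
n = 2  # the table being printed

while i <= 10:
    # prints 2, 4, 6, ... 20, one value per line
    print(n * i)
    i = i + 1
```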

46. What are functions in Python? 

Ans: Functions in Python are organised, reusable blocks of code that perform a single, related action. They improve the modularity of an application and allow a high degree of code reuse. Python has a number of built-in functions like print(). It also allows you to create user-defined functions.

47. Write a function, which will take in a value and print out if it is even or odd.

Solution ->

The below code will do the job:

Here, we start off by creating a method, with the name ‘even_odd()’. This function takes a single parameter and prints out if the number taken is even or odd.

Now, let’s invoke the function:

We see that, when 5 is passed as a parameter into the function, we get the output -> ‘5 is odd’.
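A function matching this description can be sketched as follows; the parameter name x is an assumption.

```python
def even_odd(x):
    # A number is even when dividing it by 2 leaves no remainder.
    if x % 2 == 0:
        print(x, "is even")
    else:
        print(x, "is odd")

even_odd(5)  # prints: 5 is odd
```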

48. Write a python program to print the factorial of a number.

Solution ->

Below is the code to print the factorial of a number:

We start off by taking an input which is stored in ‘num’. Then, we check if ‘num’ is less than zero and if it is actually less than 0, we print out ‘Sorry, factorial does not exist for negative numbers’.

After that, we check if ‘num’ is equal to zero, and if that’s the case, we print out ‘The factorial of 0 is 1’.

On the other hand, if ‘num’ is 1 or greater, we enter the for loop and calculate the factorial of the number.
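The logic described above can be sketched as below. To keep the example self-contained, the number is passed as a function argument instead of being read with input(); the function name print_factorial and the wording of the final message are assumptions.

```python
def print_factorial(num):
    if num < 0:
        print("Sorry, factorial does not exist for negative numbers")
    elif num == 0:
        print("The factorial of 0 is 1")
    else:
        factorial = 1
        # multiply together every integer from 1 up to num
        for i in range(1, num + 1):
            factorial = factorial * i
        print("The factorial of", num, "is", factorial)

print_factorial(5)  # prints: The factorial of 5 is 120
```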

49. Write a python program to check if the number given is a palindrome or not

Solution ->

Below is the code to Check whether the given number is palindrome or not:

We will start off by taking an input and store it in ‘n’ and make a duplicate of it in ‘temp’. We will also initialize another variable ‘rev’ to 0. 

Then, we will enter a while loop which will go on until ‘n’ becomes 0. 

Inside the loop, we will start off by dividing ‘n’ with 10 and then store the remainder in ‘dig’.

Then, we will multiply ‘rev’ with 10 and then add ‘dig’ to it. This result will be stored back in ‘rev’.

Going ahead, we will divide ‘n’ by 10 and store the result back in ‘n’

Once the while loop ends, we will compare the values of ‘rev’ and ‘temp’. If they are equal, we will print ‘The number is a palindrome’, else we will print ‘The number isn’t a palindrome’.
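The steps above can be sketched as follows for non-negative integers. Wrapping the loop in a function named is_palindrome and using a fixed sample number instead of input() are assumptions made so that the example is self-contained.

```python
def is_palindrome(n):
    temp = n  # keep a copy of the original number
    rev = 0   # will hold the digits of n in reverse order

    while n > 0:
        dig = n % 10           # last digit of n
        rev = rev * 10 + dig   # append the digit to the reversed number
        n = n // 10            # drop the last digit

    return temp == rev

number = 12321  # sample value
if is_palindrome(number):
    print("The number is a palindrome")
else:
    print("The number isn't a palindrome")
```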

50. Write a python program to print the following pattern ->

1

2 2

3 3 3

4 4 4 4

5 5 5 5 5

Solution ->

Below is the code to print this pattern:

We are solving the problem with the help of nested for loop. We will have an outer for loop, which goes from 1 to 5. Then, we have an inner for loop, which would print the respective numbers.
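A sketch of the nested loops described above; wrapping them in a function named print_pattern with a rows parameter is an assumption.

```python
def print_pattern(rows):
    # outer loop: one iteration per row, from 1 to rows
    for i in range(1, rows + 1):
        # inner loop: print the row number i exactly i times
        for j in range(i):
            print(i, end=" ")
        # move to the next line after each row
        print()

print_pattern(5)
```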

51. What do you understand by object oriented programming in Python?

Object oriented programming refers to the process of solving a problem by creating objects. This approach takes into account two key factors of an object- attributes and behaviour.

52. How can you initialize a 5*5 numpy array with only zeroes?  

Solution ->

We will be using the np.zeros() function.

Call np.zeros() and pass in the dimensions as a tuple. Since we want a 5*5 matrix, we pass (5,5) to it: np.zeros((5,5)).

The output is a 5*5 array in which every element is 0 (stored as a float by default).

53.  Pattern questions. Print the following pattern

#
# #
# # #
# # # #
# # # # #

Solution –>

def pattern_1(num):

    # outer loop handles the number of rows
    # inner loop handles the number of columns
    # num is the number of rows
    for i in range(0, num):
        # value of j depends on i
        for j in range(0, i+1):

            # printing hashes, separated by spaces
            print("#", end=" ")

        # ending line after each row
        print()

num = int(input("Enter the number of rows in pattern: "))
pattern_1(num)

54. Print the following pattern

        #
      # # 
    # # # 
  # # # #
# # # # #

Solution –>

       
Code:
def pattern_2(num): 
      
    # define the number of spaces 
    k = 2*num - 2
  
    # outer loop always handles the number of rows 
    # let us use the inner loop to control the number of spaces
    # we need the number of spaces as maximum initially and then decrement it after every iteration
    for i in range(0, num): 
        for j in range(0, k): 
            print(end=" ") 
      
        # decrementing k after each loop 
        k = k - 2
      
        # reinitializing the inner loop to keep a track of the number of columns
        # similar to pattern_1 function
        for j in range(0, i+1):  
            print("# ", end="") 
      
        # ending line after each row
        print()
  

num = int(input("Enter the number of rows in pattern: "))
pattern_2(num)

55. Print the following pattern:

0
0 1
0 1 2
0 1 2 3
0 1 2 3 4

Solution –>

Code: 
def pattern_3(num): 
      
    # initialising starting number  
    number = 1
    # outer loop always handles the number of rows 
    # let us use the inner loop to control the number 
   
    for i in range(0, num): 
      
        # re assigning number after every iteration
        # ensure the column starts from 0
        number = 0
      
        # inner loop to handle number of columns 
        for j in range(0, i+1): 
          
            # printing number
            print(number, end=" ")

            # increment number column wise
            number = number + 1
        # ending line after each row
        print()
 
num = int(input("Enter the number of rows in pattern: "))
pattern_3(num)

56. Print the following pattern:

1
2 3
4 5 6
7 8 9 10
11 12 13 14 15

Solution –>

Code: 

def pattern_4(num): 
      
    # initialising starting number  
    number = 1
    # outer loop always handles the number of rows 
    # let us use the inner loop to control the number 
   
    for i in range(0, num): 
      
        # number is not reinitialized here, so that
        # the numbers are printed continuously across rows

        # inner loop to handle number of columns
        for j in range(0, i+1):

            # printing number
            print(number, end=" ")

            # increment number column wise
            number = number + 1
        # ending line after each row
        print()
  

num = int(input("Enter the number of rows in pattern: "))
pattern_4(num)

57. Print the following pattern:

A
B B
C C C
D D D D

Solution –>

def pattern_5(num): 
    # initializing value of A as 65
    # ASCII value  equivalent
    number = 65
  
    # outer loop always handles the number of rows 
    for i in range(0, num): 
      
        # inner loop handles the number of columns 
        for j in range(0, i+1): 
          
            # finding the ascii equivalent of the number 
            char = chr(number) 
          
            # printing char value  
            print(char, end=" ") 
      
        # incrementing number 
        number = number + 1
      
        # ending line after each row
        print()
  
num = int(input("Enter the number of rows in pattern: "))
pattern_5(num)

58. Print the following pattern:

A
B C
D E F
G H I J
K L M N O
P Q R S T U

Solution –>

def pattern_6(num):
    # initializing number to 65, the ASCII value of 'A'
    number = 65

    # outer loop always handles the number of rows
    for i in range(0, num):
        # inner loop to handle number of columns
        # values changing acc. to outer loop
        for j in range(0, i+1):
            # explicit conversion of int to char
            # returns character equivalent to ASCII.
            char = chr(number)

            # printing char value
            print(char, end=" ")
            # printing the next character by incrementing
            number = number + 1
        # ending line after each row
        print()

num = int(input("enter the number of rows in the pattern: "))
pattern_6(num)

59. Print the following pattern

     #
    # # 
   # # # 
  # # # # 
 # # # # #

Solution –>

Code: 
def pattern_7(num): 
      
    # number of spaces is a function of the input num 
    k = 2*num - 2
  
    # outer loop always handle the number of rows 
    for i in range(0, num): 
      
        # inner loop used to handle the number of spaces 
        for j in range(0, k): 
            print(end=" ") 
      
        # the variable holding information about number of spaces
        # is decremented after every iteration 
        k = k - 1
      
        # inner loop reinitialized to handle the number of columns  
        for j in range(0, i+1): 
          
            # printing hash
            print("# ", end="") 
      
        # ending line after each row
        print()

num = int(input("Enter the number of rows: "))
pattern_7(num)

60. What is pandas?

Pandas is an open source Python library which has a very rich set of data structures for data-based operations. With its wide range of features, pandas fits almost every kind of data operation, whether in academics or in solving complex business problems. Pandas can read a large variety of file formats and is one of the most important tools to have a grip on.

61. What are dataframes?

A pandas dataframe is a mutable, two-dimensional data structure in pandas. It supports heterogeneous data arranged across two axes (rows and columns).

Reading files into pandas:

import pandas as pd

df = pd.read_csv("mydata.csv")

Here df is a pandas dataframe. read_csv() is used to read a comma-delimited file as a dataframe in pandas.

62. What is a pandas Series?

A Series is a one-dimensional pandas data structure which can hold data of almost any type. It resembles an Excel column. It supports multiple operations and is used for single-dimensional data operations.

A series can be created from data such as a list by passing it to pd.Series(), for example pd.Series([1, 2, 3]).

63. What is pandas groupby?

A pandas groupby is a feature supported by pandas which is used to split and group an object. Like the GROUP BY clause in SQL (MySQL, Oracle and so on), it is used to group data by classes or entities, which can then be used for aggregation. A dataframe can be grouped by one or more columns.

64. How to create a dataframe from lists?

To create a dataframe from lists:

1) create an empty dataframe

2) add the lists as individual columns to the dataframe

65. How to create dataframe from a dictionary?

A dictionary can be directly passed as an argument to the DataFrame() function to create the data frame.

66. How to create a new column in pandas by using values from other columns?

We can perform column based mathematical operations on a pandas dataframe. Pandas columns containing numeric values can be operated upon by operators.

67. What are the different functions that can be used by groupby in pandas?

groupby() in pandas can be used with multiple aggregate functions, some of which are sum(), mean(), count() and std().

Data is divided into groups based on categories and then the data in these individual groups can be aggregated by the aforementioned functions.

68. How to combine dataframes in pandas?

Two different dataframes can be stacked either horizontally or vertically by the concat(), append() and join() functions in pandas.

concat() works best when the dataframes have the same columns. It is used for concatenating data with similar fields and, by default, stacks the dataframes vertically into a single dataframe.

append() is also used for vertical stacking: it adds the rows of one dataframe to the end of another. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so concat() should be used in new code.

join() is used when we need to combine data from different dataframes that share an index or have one or more common columns. The stacking is horizontal in this case.

69. What kind of joins does pandas offer?

Pandas has a left join, inner join, right join and an outer join.

70. How to merge dataframes in pandas?

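Merging depends on the type and fields of the dataframes being merged. Dataframes that share one or more key columns can be combined on those keys with pd.merge(), which works like a SQL join. Dataframes with the same fields can instead be stacked along axis 0 (rows) with pd.concat(), while dataframes with different fields can be placed side by side along axis 1 (columns).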

71. Given the below dataframes form a single dataframe by vertical stacking.

We use pd.concat() with axis set to 0 to stack them vertically.

72. Given the below dataframes stack them horizontally to form a single data frame.

We use pd.concat() with axis set to 1 to stack them horizontally.

73. How to select columns in pandas and add them to a new dataframe? What if there are two columns with the same name?

If df is a dataframe in pandas, df.columns gives the list of all columns. We can then form a new dataframe by selecting the required columns from df by name.

If there are two columns with the same name then both columns get copied to the new dataframe.

74. How to delete a column or group of columns in pandas? Given the below dataframe drop column “col1”.

The drop() function can be used to delete columns from a dataframe, for example df.drop(columns=["col1"]).

75. Given the below dataframe drop all rows having NaN.

The dropna() function can be used to do that.

76. Given the following data frame drop rows having column values as A.

77. Given the below dataset find the highest paid player in each college in each team.

78. Given the above dataset find the min max and average salary of a player collegewise and teamwise.

79. What is reindexing in pandas?

Reindexing is the process of re-assigning the index of a pandas dataframe.

80. How to access the first five entries of a dataframe?

By using the head(5) function we can get the top five entries of a dataframe. By default df.head() returns the top 5 rows. To get the top n rows df.head(n) will be used.

81. How to access the last five entries of a dataframe?

By using the tail(5) function we can get the last five entries of a dataframe. By default df.tail() returns the last 5 rows. To get the last n rows df.tail(n) will be used.

82. How to fetch a data entry from a pandas dataframe using a given value in index?

To fetch a row from a dataframe given its index value x, we can use loc.

df.loc[10], where 10 is the value of the index.

83. What are comments and how can you add comments in Python?

Comments in Python are pieces of text intended for the reader and ignored by the interpreter. They are especially relevant when more than one person works on the same code. They can be used to explain code, leave feedback, and help with debugging. There are two types of comments:

  1. Single-line comment
  2. Multiple-line comment

Python has no dedicated multi-line comment syntax; either consecutive # lines or an unassigned triple-quoted string is used instead.

Syntax for adding comments:

# Note (single-line comment)
"""Note
Note
Note""" (multi-line comment)

84. What is the difference between list and tuples in Python?

Lists are mutable, but tuples are immutable.

85. What is a dictionary in Python? Give an example

A Python dictionary is a collection of key-value pairs. Since Python 3.7 dictionaries preserve insertion order; in earlier versions the order was not guaranteed. Dictionaries are written in curly brackets with keys and values, and are optimised to retrieve the value for a known key.

Example

d = {"a": 1, "b": 2}

86. If you have a dictionary like this -> d1={"k1":10,"k2":20,"k3":30}. How would you increment values of all the keys?

d1={"k1":10,"k2":20,"k3":30}
 
for i in d1.keys():
  d1[i]=d1[i]+1

87. What do you understand by lambda function? Create a lambda function which will print the sum of all the elements in this list -> [5, 8, 10, 20, 50, 100]

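A lambda function is a small anonymous function defined with the lambda keyword. It can take any number of arguments but contains only a single expression.

from functools import reduce
sequences = [5, 8, 10, 20, 50, 100]
# named total to avoid shadowing the built-in sum()
total = reduce(lambda x, y: x + y, sequences)
print(total)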

88. How are classes created in Python? Give an example

class Node(object):
  def __init__(self):
    self.x=0
    self.y=0

Here Node is a class

89. What is inheritance in Object oriented programming? Give an example of multiple inheritance.

Inheritance is one of the core concepts of object-oriented programming. It is the process of deriving a class from another class, forming a hierarchy of classes that share attributes and methods. It is generally used for deriving different kinds of exceptions, creating custom logic for existing frameworks and even mapping domain models to a database.

Example

class Node(object):
  def __init__(self):
    self.x=0
    self.y=0

Here class Node inherits from the object class.
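The example above shows single inheritance. For multiple inheritance, a class lists more than one parent class. The sketch below is only an illustration; the class names Flyer, Swimmer and Duck are hypothetical.

```python
class Flyer:
    def fly(self):
        return "flying"

class Swimmer:
    def swim(self):
        return "swimming"

# Duck inherits from both Flyer and Swimmer (multiple inheritance).
class Duck(Flyer, Swimmer):
    pass

d = Duck()
print(d.fly(), d.swim())  # flying swimming

# The method resolution order shows where Python looks up attributes.
mro_names = [c.__name__ for c in Duck.__mro__]
print(mro_names)  # ['Duck', 'Flyer', 'Swimmer', 'object']
```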

90. What is multi-level inheritance? Give an example for multi-level inheritance?

If class A inherits from B and C inherits from A it’s called multilevel inheritance.
class B(object):
  def __init__(self):
    self.b=0
 
class A(B):
  def __init__(self):
    self.a=0
 
class C(A):
  def __init__(self):
    self.c=0

91. Find out the mean, median and standard deviation of this numpy array -> np.array([1,5,3,100,4,48])

import numpy as np
n1=np.array([1,5,3,100,4,48])
print(np.mean(n1))
print(np.median(n1))
print(np.std(n1))

92. What is vstack() in numpy? Give an example

vstack() is a function that stacks arrays vertically (row-wise). All rows must have the same number of elements.

93. What is Machine Learning?

Machine Learning refers to the technologies (AIML) that enable machines to automatically learn and improve from experience without explicit instructions or algorithms. ML methodologies use data to feed machines with prototypes, examples from which it can learn patterns and make decisions based on this learning. The end goal of machine learning is to eliminate or reduce human intervention while working with computers.

94. What is a classifier?

A classifier is used to predict the class of any data point. Classifiers are special hypotheses that are used to assign class labels to any particular data points. A classifier often uses training data to understand the relation between input variables and the class. Classification is a method used in supervised learning in Machine Learning.

95. What are features?

A feature is a measurable component of the data set that machine learning sets out to analyse. Features appear as columns in datasets. The quality of the features determines the quality of the insights that you can gain from a data set. Features vary from one business case to another. Hence, it is important to understand the business goals of your machine learning project to optimise the features (through feature selection, feature engineering and more).

96. What is discrete data?

Discrete data refers to data that has a distinct set of values. Discrete data is countable and cannot be subdivided any further. Discrete data can take on only particular values and is generally numeric.

97. What is continuous data?

Continuous data is not restricted to separate values but can take any value within a continuous range. There may be an infinite number of possible values between any two data points. Continuous data is not countable but is measurable. It is always represented in numeric values.

98. What are the assumptions of linear regression?

The basic assumptions of linear regression are as follows:

  • There is a linear relationship between the features and target
  • There is little or no Multicollinearity between the features
  • Homoscedasticity: the variance of the error term is the same across all values of the features
  • Errors follow a normal distribution
  • There is little or no autocorrelation in the residuals

99. What is the visualization technique best suited for representing counts?

Visualization techniques best suited for representing counts are as follows:

  • Countplot – It is similar to a bar graph or a histogram and shows the number of occurrences of an item for each category.
  • Barplot – It is one of the most common types of graphics. It represents the relationship between a numeric and a categorical value, where each category is represented by a bar and the numeric value by the height of the bar.

100. What is scatter plot and what is the use of it?

A scatter plot represents values for different numeric variables through dots where the position of each dot on the horizontal and vertical axis indicates values for individual data points.

Scatter plots are used to determine the relationship between different variables and understand how one variable is affected by another.

101. What is normalization?

Normalisation is a database design technique that reduces data redundancy and undesirable dependencies. It involves organising tables by dividing bigger ones into smaller parts and linking them through relationships. This process ensures data is stored logically.

102. You have this covid-19 dataset below:

From this dataset, how will you make a bar-plot for the top 5 states having maximum confirmed cases as of 17-07-2020?

Sol:

import seaborn as sns
import matplotlib.pyplot as plt

# keeping only required columns
df = df[['Date', 'State/UnionTerritory', 'Cured', 'Deaths', 'Confirmed']]

# renaming column names
df.columns = ['date', 'state', 'cured', 'deaths', 'confirmed']

# current date
today = df[df.date == '2020-07-17']

# sorting data w.r.t. number of confirmed cases
max_confirmed_cases = today.sort_values(by="confirmed", ascending=False)
max_confirmed_cases

# getting states with maximum number of confirmed cases
top_states_confirmed = max_confirmed_cases[0:5]

# making bar-plot for states with top confirmed cases
sns.set(rc={'figure.figsize': (15, 10)})
sns.barplot(x="state", y="confirmed", data=top_states_confirmed, hue="state")
plt.show()

Code explanation:

We start off by taking only the required columns with this command:
df = df[['Date', 'State/UnionTerritory', 'Cured', 'Deaths', 'Confirmed']]
Then, we go ahead and rename the columns:
df.columns = ['date', 'state', 'cured', 'deaths', 'confirmed']

After that, we extract only those records where the date is equal to 17th July:
today = df[df.date == '2020-07-17']
Then, we go ahead and select the top 5 states with the maximum number of covid cases:
max_confirmed_cases = today.sort_values(by="confirmed", ascending=False)
max_confirmed_cases
top_states_confirmed = max_confirmed_cases[0:5]

Finally, we go ahead and make a bar-plot with this:
sns.set(rc={'figure.figsize': (15, 10)})
sns.barplot(x="state", y="confirmed", data=top_states_confirmed, hue="state")
plt.show()

Here, we are using the seaborn library to make the bar-plot. The "state" column is mapped onto the x-axis and the "confirmed" column is mapped onto the y-axis. The color of the bars is determined by the "state" column.

103. From this covid-19 dataset:

How can you make a bar-plot for the top 5 states with the highest number of deaths?

Sol:

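# sorting data w.r.t. number of deaths
max_death_cases = today.sort_values(by="deaths", ascending=False)
max_death_cases

# getting the 5 states with the highest number of deaths
top_states_death = max_death_cases[0:5]

sns.set(rc={'figure.figsize': (15, 10)})
sns.barplot(x="state", y="deaths", data=top_states_death, hue="state")
plt.show()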

Code Explanation:

We start off by sorting our dataframe in descending order w.r.t. the "deaths" column and keeping the first 5 rows:

max_death_cases = today.sort_values(by="deaths", ascending=False)
max_death_cases
top_states_death = max_death_cases[0:5]

Then, we go ahead and make the bar-plot with the help of the seaborn library:

sns.set(rc={'figure.figsize': (15, 10)})
sns.barplot(x="state", y="deaths", data=top_states_death, hue="state")
plt.show()

Here, we are mapping the "state" column onto the x-axis and the "deaths" column onto the y-axis.


104. From this covid-19 dataset:

How can you make a line plot indicating the confirmed cases with respect to date?

Sol:

maha = df[df.state == 'Maharashtra']

sns.set(rc={'figure.figsize': (15, 10)})
sns.lineplot(x="date", y="confirmed", data=maha, color="g")
plt.show()

Code Explanation:

We start off by extracting all the records where the state is equal to "Maharashtra":

maha = df[df.state == 'Maharashtra']

Then, we go ahead and make a line-plot using the seaborn library:

sns.set(rc={'figure.figsize': (15, 10)})
sns.lineplot(x="date", y="confirmed", data=maha, color="g")
plt.show()

Here, we map the "date" column onto the x-axis and the "confirmed" column onto the y-axis.


105. On this “Maharashtra” dataset:

How will you implement a linear regression algorithm with "date" as the independent variable and "confirmed" as the dependent variable? That is, you have to predict the number of confirmed cases w.r.t. date.

Sol:

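import datetime as dt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# convert each date to its ordinal (an integer day count) so the model gets a numeric feature
maha = maha.copy()
maha['date'] = pd.to_datetime(maha['date']).map(dt.datetime.toordinal)
maha.head()

x = maha['date']
y = maha['confirmed']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

lr = LinearRegression()
lr.fit(np.array(x_train).reshape(-1, 1), np.array(y_train).reshape(-1, 1))

# predict the confirmed cases for the date whose ordinal is 737630
lr.predict(np.array([[737630]]))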

Code explanation:

We will start off by converting the date to ordinal type:

maha['date'] = pd.to_datetime(maha['date']).map(dt.datetime.toordinal)

This is done because linear regression needs numeric input, so it cannot be fitted directly on a date column. The ordinal is simply an integer day count.

Then, we go ahead and divide the dataset into train and test sets:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

Finally, we go ahead and build the model and predict the confirmed cases for a given ordinal date:

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(np.array(x_train).reshape(-1, 1), np.array(y_train).reshape(-1, 1))
lr.predict(np.array([[737630]]))
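To make the ordinal conversion more concrete, here is a small sketch that uses only the standard library. It shows that an ordinal is just a running day count, so consecutive dates differ by exactly 1, and that an ordinal such as the 737630 passed to predict() can be converted back into a calendar date with fromordinal(). The variable names are illustrative.

```python
from datetime import date

first_day = date(2020, 7, 17)
ordinal = first_day.toordinal()                    # day count, with 0001-01-01 counted as day 1
next_day_ordinal = date(2020, 7, 18).toordinal()

print(ordinal)                                     # integer representing 17 July 2020
print(next_day_ordinal - ordinal)                  # consecutive days differ by exactly 1
predicted_for = date.fromordinal(737630)           # calendar date behind the ordinal used above
print(predicted_for)
```

Because the ordinal grows by 1 per day, the slope of the fitted line can be read as the average number of new confirmed cases per day.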

106. On  this customer_churn dataset:

Build a keras sequential model to find out how many customers will churn out on the basis of tenure of customer?

Sol:

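from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix

# x_train, x_test, y_train, y_test are assumed to come from a train/test split
# of the tenure column (feature) and the churn column (0/1 target)
model = Sequential()
model.add(Dense(12, input_dim=1, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=150, validation_data=(x_test, y_test))

# predict_classes() is not available in recent Keras releases; threshold the sigmoid output instead
y_pred = (model.predict(x_test) > 0.5).astype('int32')

confusion_matrix(y_test, y_pred)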

Code explanation:

We will start off by importing the required libraries:

from keras.models import Sequential
from keras.layers import Dense

Then, we go ahead and build the structure of the sequential model:

model = Sequential()
model.add(Dense(12, input_dim=1, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Finally, we compile and train the model, predict the values, and build the confusion matrix:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=150, validation_data=(x_test, y_test))

y_pred = (model.predict(x_test) > 0.5).astype('int32')

from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)


107. On this iris dataset:

Build a decision tree classification model, where dependent variable is “Species” and independent variable is “Sepal.Length”.

Sol:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

y = iris[['Species']]
x = iris[['Sepal.Length']]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4)

dtc = DecisionTreeClassifier()
dtc.fit(x_train, y_train)
y_pred = dtc.predict(x_test)

confusion_matrix(y_test, y_pred)

# accuracy = correctly classified samples (the diagonal) / all test samples,
# using the counts from the confusion matrix of one particular run
(22+7+9)/(22+2+0+7+7+11+1+1+9)

Code explanation:

We start off by extracting the independent variable and dependent variable:

y = iris[['Species']]
x = iris[['Sepal.Length']]

Then, we go ahead and divide the data into train and test sets:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4)

After that, we go ahead and build the model:

from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(x_train, y_train)
y_pred = dtc.predict(x_test)

Finally, we build the confusion matrix and compute the accuracy as the sum of its diagonal divided by the sum of all its entries (the numbers below come from one particular run and will differ between runs, because the split is random):

from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

(22+7+9)/(22+2+0+7+7+11+1+1+9)
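The hand-written accuracy expression can also be computed from the matrix itself. The sketch below assumes the nine numbers in the expression are the rows of a 3x3 confusion matrix read left to right (an assumption, since the matrix output itself is not shown), with rows as actual classes and columns as predicted classes, which is the layout scikit-learn uses.

```python
# counts taken from the accuracy expression; the 3x3 row layout is an assumption
confusion = [
    [22, 2, 0],
    [7, 7, 11],
    [1, 1, 9],
]

correct = sum(confusion[i][i] for i in range(len(confusion)))   # diagonal = correct predictions
total = sum(sum(row) for row in confusion)                      # every test sample
accuracy = correct / total

print(correct, total, round(accuracy, 4))
```

In practice, sklearn.metrics.accuracy_score(y_test, y_pred) gives the same figure without typing the counts by hand.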

108. On this iris dataset:

Build a decision tree regression model where the independent variable is “petal length” and dependent variable is “Sepal length”.

Sol:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

x = iris[['Petal.Length']]
y = iris[['Sepal.Length']]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

dtr = DecisionTreeRegressor()
dtr.fit(x_train, y_train)

y_pred = dtr.predict(x_test)
y_pred[0:5]

mean_squared_error(y_test, y_pred)


109. How will you scrape data from the website “cricbuzz”?

Sol:

import sys
import time
from bs4 import BeautifulSoup
import requests

url = 'https://www.cricbuzz.com/'   # requests needs the scheme (https://) in the URL
page = None

try:
    # fetch the page; this call may raise an exception (network error, timeout, ...)
    page = requests.get(url, timeout=10)
except Exception:
    # report which link failed and on which line, then give up on this page
    error_type, error_obj, error_info = sys.exc_info()
    print('ERROR FOR LINK:', url)
    print(error_type, 'Line:', error_info.tb_lineno)

time.sleep(2)

if page is not None:
    soup = BeautifulSoup(page.text, 'html.parser')
    # the class name 'w_tle' depends on the site's markup and may change over time
    links = soup.find_all('span', attrs={'class': 'w_tle'})
    for i in links:
        print(i.text)
        print("\n")


110. Write a user-defined function to implement central-limit theorem. You have to implement central limit theorem on this “insurance” dataset:

You also have to build two plots on “Sampling Distribution of bmi” and “Population distribution of  bmi”.

Sol:

%matplotlib inline
import pandas as pd

df = pd.read_csv('insurance.csv')
series1 = df.bmi
series1.dtype

def central_limit_theorem(data, n_samples=1000, sample_size=500, min_value=0, max_value=1338):
    """ Use this function to demonstrate Central Limit Theorem.
        data = 1D array, or a pd.Series
        n_samples = number of samples to be created
        sample_size = size of the individual sample
        min_value = minimum index of the data
        max_value = maximum index value of the data """
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    b = {}
    for i in range(n_samples):
        # random row indices; np.unique drops repeats, so a sample may hold fewer than sample_size rows
        x = np.unique(np.random.randint(min_value, max_value, size=sample_size))
        b[i] = data[x].mean()   # mean of each sample

    c = pd.DataFrame()
    c['sample'] = list(b.keys())    # sample number
    c['Mean'] = list(b.values())    # mean of that particular sample

    plt.figure(figsize=(15, 5))

    # distplot is deprecated in newer seaborn releases; sns.histplot(..., kde=True) is the suggested replacement
    plt.subplot(1, 2, 1)
    sns.distplot(c.Mean)
    plt.title(f"Sampling Distribution of bmi. \n \u03bc = {round(c.Mean.mean(), 3)} & SE = {round(c.Mean.std(), 3)}")
    plt.xlabel('data')
    plt.ylabel('freq')

    plt.subplot(1, 2, 2)
    sns.distplot(data)
    plt.title(f"population Distribution of bmi. \n \u03bc = {round(data.mean(), 3)} & \u03C3 = {round(data.std(), 3)}")
    plt.xlabel('data')
    plt.ylabel('freq')

    plt.show()

central_limit_theorem(series1, n_samples=5000, sample_size=500)

Code Explanation:

We start off by importing the insurance.csv file with this command:

df = pd.read_csv('insurance.csv')

Then we go ahead and define the central limit theorem method:

def central_limit_theorem(data, n_samples=1000, sample_size=500, min_value=0, max_value=1338):

This method takes these parameters:

  • data
  • n_samples
  • sample_size
  • min_value
  • max_value

Inside this method, we import all the required libraries:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

Next, the loop draws n_samples random samples from the data and stores the mean of each one in the DataFrame c.

Then, we go ahead and create the first sub-plot for "Sampling distribution of bmi":

    plt.subplot(1, 2, 1)
    sns.distplot(c.Mean)
    plt.title(f"Sampling Distribution of bmi. \n \u03bc = {round(c.Mean.mean(), 3)} & SE = {round(c.Mean.std(), 3)}")
    plt.xlabel('data')
    plt.ylabel('freq')

Finally, we create the sub-plot for "Population distribution of bmi":

    plt.subplot(1, 2, 2)
    sns.distplot(data)
    plt.title(f"population Distribution of bmi. \n \u03bc = {round(data.mean(), 3)} & \u03C3 = {round(data.std(), 3)}")
    plt.xlabel('data')
    plt.ylabel('freq')
    plt.show()
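The two plots show the theorem visually. It can also be checked numerically: the central limit theorem predicts that the standard deviation of the sample means (the standard error) is close to sigma / sqrt(n). The sketch below uses only the standard library and a made-up, skewed population rather than the insurance data. The names population, sample_means, and SAMPLE_SIZE are illustrative.

```python
import random
import statistics

random.seed(0)
SAMPLE_SIZE = 50

# hypothetical skewed population (exponential), standing in for a real column
population = [random.expovariate(1.0) for _ in range(10_000)]


def sample_means(data, n_samples=2000, sample_size=SAMPLE_SIZE):
    """Return the mean of each of n_samples random samples drawn with replacement."""
    return [statistics.mean(random.choices(data, k=sample_size)) for _ in range(n_samples)]


means = sample_means(population)

population_sd = statistics.pstdev(population)
expected_se = population_sd / SAMPLE_SIZE ** 0.5   # CLT prediction: sigma / sqrt(n)
observed_se = statistics.stdev(means)              # spread of the simulated sample means

print(round(statistics.mean(population), 3), round(statistics.mean(means), 3))
print(round(expected_se, 3), round(observed_se, 3))
```

The two means should agree closely, and so should the two standard errors, even though the population itself is far from normal.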


111.  Write code to perform sentiment analysis on amazon reviews:

Sol:

import bz2
import re
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def get_labels_and_texts(file):
    labels = []
    texts = []
    for line in bz2.BZ2File(file):
        x = line.decode("utf-8")
        labels.append(int(x[9]) - 1)    # the 10th character is the label digit; shift it to 0/1
        texts.append(x[10:].strip())    # the rest of the line is the review text
    return np.array(labels), texts

train_labels, train_texts = get_labels_and_texts('train.ft.txt.bz2')
test_labels, test_texts = get_labels_and_texts('test.ft.txt.bz2')

train_labels[0]
train_texts[0]

# keep only the first 500 reviews so that training is quick
train_labels = train_labels[0:500]
train_texts = train_texts[0:500]

NON_ALPHANUM = re.compile(r'[\W]')
NON_ASCII = re.compile(r'[^a-z0-9\s]')

def normalize_texts(texts):
    normalized_texts = []
    for text in texts:
        lower = text.lower()
        no_punctuation = NON_ALPHANUM.sub(r' ', lower)
        no_non_ascii = NON_ASCII.sub(r'', no_punctuation)
        normalized_texts.append(no_non_ascii)
    return normalized_texts

train_texts = normalize_texts(train_texts)
test_texts = normalize_texts(test_texts)

# binary bag-of-words features: 1 if a word occurs in the review, else 0
cv = CountVectorizer(binary=True)
cv.fit(train_texts)
X = cv.transform(train_texts)
X_test = cv.transform(test_texts)

X_train, X_val, y_train, y_val = train_test_split(
    X, train_labels, train_size=0.75)

# try several regularisation strengths and compare validation accuracy
for c in [0.01, 0.05, 0.25, 0.5, 1]:
    lr = LogisticRegression(C=c)
    lr.fit(X_train, y_train)
    print("Accuracy for C=%s: %s"
          % (c, accuracy_score(y_val, lr.predict(X_val))))

lr.predict(X_test[29])
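To see what the text normalisation step does to a single review, here is a small standalone sketch. It mirrors the two regular expressions from the solution, with the digit range written as 0-9 (an assumption about the intent), and adds one extra step that is not part of the solution: collapsing repeated spaces so the result is easier to read. The sample strings are made up.

```python
import re

NON_ALPHANUM = re.compile(r"[\W]")        # anything that is not a letter, digit, or underscore
NON_ASCII = re.compile(r"[^a-z0-9\s]")    # anything left that is not a-z, 0-9, or whitespace


def normalize(text):
    lower = text.lower()
    no_punctuation = NON_ALPHANUM.sub(" ", lower)
    cleaned = NON_ASCII.sub("", no_punctuation)
    return " ".join(cleaned.split())      # extra step: collapse runs of whitespace


review = normalize("Great product!!! 10/10, would buy again.")
accented = normalize("Café")
print(review)
print(accented)   # the accented letter counts as a word character, so only the second pattern removes it
```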


112. Implement a probability plot using numpy and matplotlib:

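Sol:

import numpy as np
import scipy.stats as stats
from matplotlib import pyplot as plt

# 1000 samples from a standard normal distribution
n1 = np.random.normal(loc=0, scale=1, size=1000)
np.percentile(n1, 100)   # the 100th percentile is the largest value in the sample

# 100 samples from a normal distribution with mean 20 and standard deviation 3
n1 = np.random.normal(loc=20, scale=3, size=100)

# plot the ordered sample values against the theoretical normal quantiles
stats.probplot(n1, dist="norm", plot=plt)
plt.show()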


113. Implement multiple linear regression on this iris dataset:

The independent variables should be “Sepal.Width”, “Petal.Length”, “Petal.Width”, while the dependent variable should be “Sepal.Length”.

Sol:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

iris = pd.read_csv("iris.csv")
iris.head()

x = iris[['Sepal.Width', 'Petal.Length', 'Petal.Width']]
y = iris[['Sepal.Length']]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.35)

lr = LinearRegression()
lr.fit(x_train, y_train)

y_pred = lr.predict(x_test)

mean_squared_error(y_test, y_pred)

Code explanation:

We start off by importing the required libraries and loading the data:

import pandas as pd
iris = pd.read_csv("iris.csv")
iris.head()

Then, we will go ahead and extract the independent variables and dependent variable:

x = iris[['Sepal.Width', 'Petal.Length', 'Petal.Width']]
y = iris[['Sepal.Length']]

Following which, we divide the data into train and test sets:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.35)

Then, we go ahead and build the model:

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)

Finally, we will find out the mean squared error:

from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred)


114. From this credit fraud dataset:

Find the percentage of transactions which are fraudulent and not fraudulent. Also build a logistic regression model, to find out if the transaction is fraudulent or not.

Sol:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# percentage of non-fraudulent transactions (Class == 0)
nfcount = (data_df['Class'] == 0).sum()
per_nf = (nfcount / len(data_df)) * 100
print('percentage of total not fraud transaction in the dataset: ', per_nf)

# percentage of fraudulent transactions (Class == 1)
fcount = (data_df['Class'] == 1).sum()
per_f = (fcount / len(data_df)) * 100
print('percentage of total fraud transaction in the dataset: ', per_f)

x = data_df.drop(['Class'], axis=1)   # drop the target variable
y = data_df['Class']

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

logisticreg = LogisticRegression()
logisticreg.fit(xtrain, ytrain)

y_pred = logisticreg.predict(xtest)
accuracy = logisticreg.score(xtest, ytest)

cm = metrics.confusion_matrix(ytest, y_pred)
print(cm)
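Fraud data sets are usually heavily imbalanced, so accuracy alone can look excellent even when most fraud cases are missed. A common follow-up is therefore to read precision and recall off the confusion matrix. The sketch below does this by hand for a 2x2 matrix in the [[tn, fp], [fn, tp]] layout that scikit-learn produces for labels 0 and 1. The counts are hypothetical and are not results from the dataset above.

```python
def precision_recall(confusion):
    """confusion = [[tn, fp], [fn, tp]] for the labels 0 (not fraud) and 1 (fraud)."""
    (tn, fp), (fn, tp) = confusion
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of flagged transactions that are fraud
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of fraud cases that were caught
    return accuracy, precision, recall


# hypothetical counts for an imbalanced test set with 8 fraud cases out of 1000
example = [[990, 2], [6, 2]]
accuracy, precision, recall = precision_recall(example)
print(accuracy, precision, recall)
```

In this made-up case the accuracy is above 99%, yet only a quarter of the fraud cases are detected, which is why recall is usually reported alongside accuracy for this kind of problem.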


115.  Implement a simple CNN on the MNIST dataset using Keras. Following which, also add in drop out layers.

Sol:

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Reshape
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import utils
from matplotlib import pyplot as plt

plt.rcParams['figure.figsize'] = (15, 8)
%matplotlib inline

# Load/Prep the Data
(x_train, y_train_num), (x_test, y_test_num) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1).astype('float32')
x_train /= 255
x_test /= 255
y_train = utils.to_categorical(y_train_num, 10)
y_test = utils.to_categorical(y_test_num, 10)

print('--- THE DATA ---')
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

BATCH_SIZE = 32
EPOCHS = 1

# Baseline model: a fully connected network without convolutions
model1 = tf.keras.Sequential()

# Flatten Images to Vector
model1.add(Reshape((784,), input_shape=(28, 28, 1)))

# Layer 1
model1.add(Dense(128, kernel_initializer='he_normal', use_bias=True))
model1.add(Activation("relu"))

# Layer 2
model1.add(Dense(10, kernel_initializer='he_normal', use_bias=True))
model1.add(Activation("softmax"))

# Loss and Optimizer
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Stop early if the validation accuracy stops improving
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1, mode='auto')
callback_list = [early_stopping]

# Train the model
model1.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_data=(x_test, y_test), callbacks=callback_list, verbose=True)

# CNN with drop-out layers
model3 = tf.keras.Sequential()

# 1st Conv Layer
model3.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
model3.add(Activation('relu'))

# 2nd Conv Layer
model3.add(Conv2D(32, (3, 3)))
model3.add(Activation('relu'))

# Max Pooling
model3.add(MaxPooling2D(pool_size=(2, 2)))

# Dropout
model3.add(Dropout(0.25))

# Fully Connected Layer
model3.add(Flatten())
model3.add(Dense(128))
model3.add(Activation('relu'))

# More Dropout
model3.add(Dropout(0.5))

# Prediction Layer
model3.add(Dense(10))
model3.add(Activation('softmax'))

# Loss and Optimizer
model3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Stop early if the validation accuracy stops improving
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=7, verbose=1, mode='auto')
callback_list = [early_stopping]

# Train the model
model3.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS,
           validation_data=(x_test, y_test), callbacks=callback_list)


116. Implement a popularity based recommendation system on this movie lens dataset:

Sol:

import pandas as pd

ratings_data = pd.read_csv("ratings.csv")
ratings_data.head()

movie_names = pd.read_csv("movies.csv")
movie_names.head()

# attach the movie titles to the ratings
movie_data = pd.merge(ratings_data, movie_names, on='movieId')

# average rating per movie
movie_data.groupby('title')['rating'].mean().head()
movie_data.groupby('title')['rating'].mean().sort_values(ascending=False).head()

# number of ratings per movie, most rated first
movie_data.groupby('title')['rating'].count().sort_values(ascending=False).head()

# combine the mean rating and the rating count in one table
ratings_mean_count = pd.DataFrame(movie_data.groupby('title')['rating'].mean())
ratings_mean_count.head()

ratings_mean_count['rating_counts'] = movie_data.groupby('title')['rating'].count()
ratings_mean_count.head()
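The table of mean ratings and rating counts is the raw material for a popularity-based recommender. One common way to turn it into recommendations, sketched below in plain Python, is to ignore titles with very few ratings and then rank the rest by rating count and mean rating. The data, the function name popular_titles, and the min_count threshold are all hypothetical choices for illustration, not part of the MovieLens data or the solution above.

```python
from collections import defaultdict

# hypothetical (title, rating) pairs standing in for the merged ratings data
ratings = [
    ("Movie A", 5.0), ("Movie A", 4.0), ("Movie A", 4.5),
    ("Movie B", 5.0),
    ("Movie C", 3.0), ("Movie C", 4.0),
]


def popular_titles(ratings, min_count=2, top_n=2):
    """Rank titles by number of ratings, then by mean rating, skipping rarely rated titles."""
    by_title = defaultdict(list)
    for title, rating in ratings:
        by_title[title].append(rating)
    stats = [
        (title, len(values), sum(values) / len(values))
        for title, values in by_title.items()
        if len(values) >= min_count
    ]
    stats.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return stats[:top_n]


recommendations = popular_titles(ratings)
print(recommendations)
```

The min_count filter matters because a title with a single 5-star rating ("Movie B" here) would otherwise top a ranking by mean rating alone.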


117. Implement the naive bayes algorithm on top of the diabetes dataset:

Sol:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt       # matplotlib.pyplot plots data
%matplotlib inline
import seaborn as sns
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB   # Gaussian variant of Naive Bayes

pdata = pd.read_csv("pima-indians-diabetes.csv")

columns = list(pdata)[0:-1]   # all columns except the last one (the outcome column)
pdata[columns].hist(stacked=False, bins=100, figsize=(12,30), layout=(14,2));
# Histogram of first 8 columns

# To see the correlations graphically, we use the function below
def plot_corr(df, size=11):
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(size, size))
    ax.matshow(corr)
    plt.xticks(range(len(corr.columns)), corr.columns)
    plt.yticks(range(len(corr.columns)), corr.columns)

plot_corr(pdata)

X = pdata.drop('class', axis=1)     # Predictor feature columns (8 X m)
Y = pdata['class']   # Predicted class (1=True, 0=False) (1 X m)

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
# 1 is just any random seed number

x_train.head()

# create the model
diab_model = GaussianNB()
diab_model.fit(x_train, y_train)

# accuracy on the training data
diab_train_predict = diab_model.predict(x_train)
print("Training Accuracy: {0:.4f}".format(metrics.accuracy_score(y_train, diab_train_predict)))
print()

# accuracy on the test data
diab_test_predict = diab_model.predict(x_test)
print("Test Accuracy: {0:.4f}".format(metrics.accuracy_score(y_test, diab_test_predict)))
print()

print("Confusion Matrix")
cm = metrics.confusion_matrix(y_test, diab_test_predict, labels=[1, 0])
df_cm = pd.DataFrame(cm, index=["1", "0"], columns=["Predict 1", "Predict 0"])

plt.figure(figsize=(7, 5))
sns.heatmap(df_cm, annot=True)

Python Interview Related FAQs

Ques 1. How do you stand out in a Python coding interview?

Now that you are technically ready for a Python interview, you may be wondering how to stand out from the crowd. You must be able to show that you can write clean, production-quality code and that you know the required libraries and tools. If you have worked on any prior projects, showcasing them in your interview will also help you stand out.

Ques 2. How do I prepare for a Python interview?

To prepare for a Python interview, you must know the syntax, keywords, functions and classes, data types, basic coding, and exception handling. Basic knowledge of the commonly used libraries and IDEs, along with reading Python tutorials and blogs, will help you going forward. Showcase your example projects and brush up on the basics of algorithms and data structures. This will help you stay prepared.

Ques 3. Are Python coding interviews very difficult?

The difficulty level of a Python Interview will vary depending on the role you are applying for, the company, their requirements, and your skill and knowledge/work experience. If you’re a beginner in the field and are not yet confident about your coding ability, you may feel that the interview is difficult. Being prepared and knowing what type of questions to expect will help you prepare well and ace the interview.

Ques 4. How do I pass the Python coding interview?

Adequate knowledge of Object Relational Mapper (ORM) libraries, Django or Flask, unit testing and debugging, the fundamental design principles behind a scalable application, and Python packages such as NumPy and scikit-learn is extremely important for clearing a coding interview. You can showcase your previous work experience or coding ability through projects, which acts as an added advantage.

Ques 5. Which courses or certifications can help boost knowledge in Python?

If you wish to upskill, taking up a certificate course will help you gain the required knowledge. You can take up Great Learning Academy’s free course today!
