Data scientists are often asked a variety of challenging and insightful questions. As a fresher, you will often be asked technical questions that test your fundamentals, or questions about the projects you did during your academic career. Some interviewers may also ask about core data science concepts. If you prepare these questions well, you will be much better placed to land a data scientist’s job!
Importance of p-value?
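A hypothesis test produces a p-value, which lies between 0 and 1. It is the probability of observing results at least as extreme as those in the sample, assuming the null hypothesis is true. If the p-value is less than the chosen significance level (commonly 0.05), the null hypothesis is rejected; if it is greater, we fail to reject the null hypothesis. Thus, the p-value is very important in judging how strong the evidence against the null hypothesis is. It is not the probability that the hypothesis itself is true.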
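To make the decision rule concrete, here is a minimal sketch of an exact two-sided test for the null hypothesis "this coin is fair". The coin example, the function name `two_sided_binomial_p_value`, and the 0.05 threshold in `ALPHA` are assumptions chosen for illustration; real analyses would normally use a statistics library.

```python
from math import comb


def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for the null hypothesis 'the coin is fair'."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome that is at least as far from the expected number
    # of heads as the observed one, assuming the null hypothesis is true.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    # Under a fair coin, each of the 2**flips sequences is equally likely.
    return extreme_outcomes / 2 ** flips


ALPHA = 0.05  # conventional significance level (an assumption, not a law)

for heads in (60, 65):
    p_value = two_sided_binomial_p_value(heads, flips=100)
    decision = "reject" if p_value < ALPHA else "fail to reject"
    print(f"{heads} heads in 100 flips: p = {p_value:.4f}, {decision} the null hypothesis")
```

Note that a small p-value leads to rejecting the null hypothesis, while a large one only means the data do not provide enough evidence against it.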
What is A/B testing?
A/B testing is a form of hypothesis testing for a randomized experiment with two variants, A and B. It is commonly used to compare two versions of a web page or an app to determine which performs better. Usually, a small change is made to one version, and each version is shown to a different, randomly selected group of users. The change is rolled out only if the measured response (for example, the conversion rate) shows that it performs better.
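One common way to compare the two versions is a two-proportion z-test on their conversion rates. The sketch below assumes that approach; the function name `two_proportion_z_test` and the visitor and conversion counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both versions share one conversion rate,
    # so the standard error is computed from the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 5,000 visitors saw each version of the page.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference between A and B is statistically significant.")
else:
    print("Not enough evidence that A and B perform differently.")
```

The normal approximation used here is reasonable for large samples; with small samples an exact test would be a safer choice.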
How is Hadoop useful?
Hadoop is an open-source framework for the distributed processing of large datasets across clusters of computers, using simple programming models such as MapReduce. It can handle large-scale unstructured data and lets you run different algorithms on it. It is a boon to data scientists because its distributed file system (HDFS) spreads data across many machines while still letting it be stored and retrieved as though it were in a single place.
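The "simple programming model" most associated with Hadoop is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) in a single Python process to count words. It does not use any Hadoop API, and the function names are hypothetical; on a real cluster the framework would run these phases in parallel across machines.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


documents = ["big data needs big clusters", "data scientists love data"]
counts = word_count(documents)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, the same logic can be split across many machines, which is what makes the model suitable for very large datasets.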
Name some NoSQL databases
Some popular NoSQL databases are MongoDB, Cassandra, HBase, Hypertable, and Redis.
Name some clauses used in SQL
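Commonly used SQL clauses include the WHERE clause, GROUP BY clause, HAVING clause, ORDER BY clause, and TOP clause. TOP is specific to certain dialects such as SQL Server; other dialects such as MySQL, PostgreSQL, and SQLite use LIMIT instead.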
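Here is a small sketch showing how these clauses combine in one query, run against an in-memory SQLite database through Python's built-in `sqlite3` module. The `orders` table and its rows are invented for illustration, and because SQLite has no TOP clause, the example uses LIMIT in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 120.0, "paid"),
        ("alice", 80.0, "paid"),
        ("bob", 40.0, "paid"),
        ("bob", 500.0, "cancelled"),
        ("carol", 300.0, "paid"),
        ("dave", 10.0, "paid"),
    ],
)

query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'      -- filter individual rows first
    GROUP BY customer          -- then form one group per customer
    HAVING SUM(amount) >= 40   -- filter whole groups
    ORDER BY total DESC        -- sort the remaining groups
    LIMIT 2                    -- keep the first two (TOP 2 in SQL Server)
"""
top_customers = conn.execute(query).fetchall()
print(top_customers)
conn.close()
```

The key distinction interviewers often look for is that WHERE filters rows before grouping, while HAVING filters groups after aggregation.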
What is a foreign key?
A foreign key is used to link two tables together. It is a column, or a set of columns, in one table that refers to the primary key of another table. Foreign keys act as a cross-reference between tables.
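The following sketch shows a foreign key in action using Python's built-in `sqlite3` module. The `customers` and `orders` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued; most other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- the foreign key
        amount REAL
    )
    """
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 99.5)")

# An order pointing at a customer that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (42, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key is also the column used to join the two tables.
joined = conn.execute(
    "SELECT c.name, o.amount FROM orders AS o "
    "JOIN customers AS c ON c.id = o.customer_id"
).fetchall()
print(orphan_rejected, joined)
conn.close()
```

The constraint protects referential integrity: every row in `orders` is guaranteed to point at a real row in `customers`.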
Difference between deep learning and machine learning?
Deep learning is a subset of machine learning, but it has different capabilities. It uses multi-layer artificial neural networks, which are loosely inspired by the way the human nervous system works, and it can learn features directly from raw data. Machine learning more broadly covers algorithms that learn patterns from data and use them to make predictions, often over large datasets.
How to handle missing values in data?
There are a few ways to handle this situation.
- Replace the missing values with the mean/median value of the variable (column)
- Predict the missing values with a regression model
- Drop the missing values
- Delete the observations (rows) that contain missing values
- Use clustering to estimate a likely value from similar observations
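Two of the simplest options above, imputation with the mean or median and dropping the missing entries, can be sketched with the standard library alone. The `ages` column and the helper names `impute` and `drop_missing` are made up for illustration; in practice a library such as pandas would usually handle this.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def drop_missing(values):
    """Keep only the entries that are actually present."""
    return [v for v in values if v is not None]


ages = [25, None, 31, 40, None, 28]  # hypothetical column with two gaps

mean_imputed = impute(ages, strategy="mean")
median_imputed = impute(ages, strategy="median")
dropped = drop_missing(ages)

print(mean_imputed)
print(median_imputed)
print(dropped)
```

Imputation keeps the sample size intact but can understate the variable's spread, while dropping entries is unbiased only if the values are missing at random, so the right choice depends on how much data is missing and why.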
What is univariate, bivariate and multivariate analysis?
Univariate, bivariate, and multivariate analyses are statistical techniques that are differentiated by the number of variables involved. Univariate analysis describes a single variable at a time. Bivariate analysis examines the relationship between two variables at a time. Multivariate analysis helps in analyzing more than two variables at once, and also tells the effect of the variables on the responses.
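As a rough illustration of the three levels, the sketch below summarizes one variable, correlates two, and then builds a pairwise correlation table across three. The dataset and the `pearson` helper are assumptions made for this example, and a correlation table is only a simple starting point for multivariate work, which more often involves techniques such as multiple regression.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data for five students.
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "hours_slept": [8, 7, 7, 6, 5],
    "exam_score": [52, 58, 65, 70, 78],
}

# Univariate: describe one variable on its own.
score_mean = mean(data["exam_score"])
score_stdev = stdev(data["exam_score"])

# Bivariate: measure the relationship between two variables.
study_score_corr = pearson(data["hours_studied"], data["exam_score"])

# Multivariate (simplified): look at every pair among three variables.
names = list(data)
corr_matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

print(f"exam_score: mean={score_mean:.1f}, stdev={score_stdev:.1f}")
print(f"hours_studied vs exam_score: r={study_score_corr:.3f}")
for name in names:
    print(name, {other: round(r, 2) for other, r in corr_matrix[name].items()})
```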