No-SQL Interview Questions and Answers
by Sathish, on Jan 7, 2021 9:29:57 PM
1.What is NoSQL?
Ans: NoSQL encompasses a wide variety of database technologies that were developed in response to the rise in the volume of data stored about users, objects and products, the frequency with which this data is accessed, and growing performance and processing needs. Relational databases, on the other hand, were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the cheap storage and processing power available today.
2.What are NoSQL databases? What are the different types of NoSQL databases?
Ans: A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases (such as MySQL, Oracle, etc.).
Types of NoSQL databases:
- Document Oriented
- Key Value
- Column Oriented
3.Compare NoSQL & RDBMS?
|Feature|NoSQL|RDBMS|
|Data format|Does not follow any order|Organized and structured|
|Querying|Limited, typically no JOIN clause|Using SQL|
|Storage mechanism|Key-value pair, document, column storage, etc.|Data and relationships stored in different tables|
4.What are the features of NoSQL?
Ans: When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
- Large volumes of structured, semi-structured, and unstructured data
- Agile sprints, quick iteration, and frequent code pushes
- Object-oriented programming that is easy to use and flexible
- Efficient, scale-out architecture instead of expensive, monolithic architecture
5.Explain “Polyglot Persistence” in NoSQL?
Ans: In 2006, Neal Ford coined the term polyglot programming, to express the idea that applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems.
Complex applications combine different types of problems, so picking the right language for each job may be more productive than trying to fit all aspects into a single language. Similarly, when working on an e-commerce business problem, using a data store for the shopping cart which is highly available and can scale is important, but the same data store cannot help you find products bought by the customers’ friends, which is a totally different question. We use the term polyglot persistence to define this hybrid approach to persistence.
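As a rough illustration of the shopping-cart example, the sketch below uses plain Python structures as stand-ins for real stores: a dict plays the role of the key-value store for carts, and an adjacency map plays the role of the graph store for friendships. All names and sample data (`cart_store`, `friend_graph`, `purchases`, `products_bought_by_friends`) are hypothetical and only meant to show why the two questions call for different access patterns.

```python
# Illustrative stand-ins only: a real system would use, for example, a
# key-value store for carts and a graph database for the social data.

# Key-value "store": cart lookups by user id, nothing else.
cart_store = {
    "alice": ["book", "lamp"],
    "bob": ["kettle"],
}

# Graph "store": who is friends with whom.
friend_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
}

# Purchase history keyed by user (made-up sample data).
purchases = {
    "alice": {"book"},
    "bob": {"kettle", "mug"},
    "carol": {"mug", "desk"},
}


def get_cart(user):
    """Key-value access pattern: one key in, one value out."""
    return cart_store.get(user, [])


def products_bought_by_friends(user):
    """Relationship query: follow friendship edges, then collect purchases."""
    bought = set()
    for friend in friend_graph.get(user, set()):
        bought |= purchases.get(friend, set())
    return sorted(bought)


print(get_cart("alice"))
print(products_bought_by_friends("alice"))
```

The cart lookup never needs to know about friendships, and the friends query never touches the cart, which is the reason each can live in the store that suits it.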
6.When would you use NoSQL?
Ans: It depends on several general points:
- NoSQL is typically a good fit for unstructured or "schemaless" data. You usually don't need to define your schema up front and can add new fields without any ceremony.
- NoSQL typically favours a denormalised schema because most NoSQL stores lack the JOIN support found in an RDBMS. So you would usually have a flattened, denormalised representation of your data.
- Using NoSQL doesn't necessarily mean you will lose data. Different databases have different strategies. With MongoDB, for example, you can choose how to trade off performance against the potential for data loss: the best performance comes with greater scope for data loss.
- It's often very easy to scale out NoSQL solutions. Adding more nodes to replicate data to is one way to a) offer more scalability and b) offer more protection against data loss if one node goes down. Again, this depends on the NoSQL database and its configuration.
- Complex or dynamic queries and reporting are often best served from an RDBMS. The query functionality of a NoSQL database is often limited.
- It doesn't have to be one or the other. In practice, an RDBMS is often used in conjunction with NoSQL for certain use cases.
- NoSQL DBs often lack the ability to perform atomic operations across multiple "tables".
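To make the schemaless and denormalised points concrete, here is a small sketch that compares a relational-style layout (two "tables" joined in code) with a document-style layout. It uses plain Python dicts, not a real database, and the field names and data are invented for illustration.

```python
# Relational-style layout: two "tables" linked by a foreign key.
customers = {1: {"name": "Ada"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
]


def orders_with_customer_name():
    """Emulates a JOIN between orders and customers."""
    return [
        {
            "order_id": o["order_id"],
            "customer": customers[o["customer_id"]]["name"],
            "total": o["total"],
        }
        for o in orders
    ]


# Document-style layout: the customer and its orders live in one record,
# so a single lookup returns everything without a join.
customer_doc = {
    "_id": 1,
    "name": "Ada",
    "orders": [
        {"order_id": 100, "total": 25.0},
        {"order_id": 101, "total": 40.0},
    ],
}

# "Schemaless": a new field is simply added to the document that needs it.
customer_doc["loyalty_tier"] = "gold"

joined_total = sum(row["total"] for row in orders_with_customer_name())
doc_total = sum(o["total"] for o in customer_doc["orders"])
print(joined_total, doc_total)
```

Both layouts hold the same information. The document form trades the join for duplication and a larger record, which is the usual cost of denormalising.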
7.Explain the difference between NoSQL v/s Relational database?
Ans: Google needed a storage layer for its inverted search index and figured a traditional RDBMS was not going to cut it. So it implemented a NoSQL data store, BigTable, on top of its GFS file system. The key point is that thousands of cheap commodity machines provide the speed and the redundancy. Everyone else then realized what Google had done. Brewer's CAP theorem was proven. All RDBMS systems in common use are CA systems. People began experimenting with CP and AP systems as well. K/V stores are vastly simpler, so they were the primary vehicle for the research.
Software-as-a-service systems in general do not provide an SQL-like store, so people became more interested in NoSQL-type stores. Much of the take-off can be related to this history. Scaling Google took some new ideas, and everyone else followed suit because this was the only solution they knew to the scaling problem at the time. Hence the willingness to rework everything around Google's distributed database idea: it was seen as the only way to scale beyond a certain size.
8. Explain About Cassandra NoSQL?
Ans: Cassandra is an open source, scalable and highly available “NoSQL” distributed database management system from Apache. Cassandra claims to offer fault-tolerant linear scalability with no single point of failure. Cassandra sits in the ColumnFamily NoSQL camp. The Cassandra data model is designed for large-scale distributed data and trades ACID-compliant data practices for performance and availability. Cassandra is optimized for very fast and highly available writes. Cassandra is written in Java and can run on a vast array of operating systems and platforms.
9. What Are The Various Categories Of NoSQL?
Ans: The various categories of NoSQL:
- KeyValue Store Database
- Column Family Database
- Document Store Database
- Graph Database
- Multivalue Database
- Object Database
- Triple Store Database
- Tuple Store Database
- Tabular Database
10.What Is A Graph Database?
Ans: This kind of NoSQL database fits best when the data forms a connected set of nodes and edges, and queries start from a given node and look for the nodes and edges that satisfy a given predicate. A classic example is a social networking site.
Examples : Neo4j etc.
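The kind of query a graph database is built for can be sketched in a few lines of Python. This is not how Neo4j or any other product works internally; it is a plain breadth-first traversal over a made-up edge list, where `reachable` (a hypothetical helper) follows only the edges that satisfy a caller-supplied predicate.

```python
from collections import deque

# Edges as (source, relationship, target); the sample data is made up.
edges = [
    ("ann", "FRIEND", "ben"),
    ("ben", "FRIEND", "cat"),
    ("ann", "BLOCKED", "dan"),
    ("cat", "FRIEND", "eve"),
]


def reachable(start, predicate):
    """Breadth-first traversal from `start`, following only edges that satisfy `predicate`."""
    adjacency = {}
    for src, rel, dst in edges:
        if predicate(src, rel, dst):
            adjacency.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


friends_of_friends = reachable("ann", lambda s, rel, d: rel == "FRIEND")
print(friends_of_friends)
```

In a relational schema the same question would need one self-join per hop, which is why relationship-heavy workloads are usually cited as the motivating case for graph stores.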
Intermediate Interview Questions
11.When should I use a NoSQL database instead of a relational database?
Ans: A relational database enforces ACID. So, you will have schema-based, transaction-oriented data stores. It’s proven and suitable for the vast majority of real-world applications. You can practically do anything with relational databases. But there are limitations on speed and scaling when it comes to massive, high-availability data stores.
For example, Google and Amazon have terabytes of data stored in big data centers. Querying and inserting are not performant in these scenarios because of the blocking, schema-bound, transactional nature of an RDBMS. That’s the reason they have implemented their own databases (actually, key-value stores) for massive performance gains and scalability. NoSQL databases have been around for a long time; only the term is new. Some examples are graph, object, column, XML and document databases.
12.What is the difference between NoSQL & MySQL databases?
Ans: NoSQL databases are becoming a major part of the database landscape today, and with their handful of advantages, they can be a real game changer in the enterprise arena. However, NoSQL isn’t ripe yet, and professionals in the industry need to approach it with caution.
This is because it lacks the maturity that SQL databases like MySQL offer. If your application doesn’t fall into the category of the likes of Google, Yahoo, Facebook or Wikipedia, you should reconsider your options for using NoSQL and stick with MySQL instead. Not only is there a major skills gap with finding NoSQL professionals, but issues like analytics, performance reporting and migration also need to be considered.
13.What Is Eventual Consistency In NoSQL Stores?
Ans: Eventual consistency means eventually, when all service logic is executed, the system is left in a consistent state. This concept is widely used in distributed systems to achieve high availability. It informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
In NoSQL systems, eventually consistent services are often classified as providing BASE (Basically Available, Soft state, Eventual consistency) semantics, whereas an RDBMS is classified as providing ACID (Atomicity, Consistency, Isolation and Durability) guarantees. Leading NoSQL databases like Riak, Couchbase, and DynamoDB provide client applications with a guarantee of “eventual consistency”. Others, like MongoDB and Cassandra, are eventually consistent in some configurations.
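The informal guarantee can be seen in a toy simulation. The sketch below is an assumption-laden model, not the replication protocol of any particular database: a write is acknowledged after reaching one replica, the remaining replicas receive it later through a `sync` step, and conflicts are resolved by a simple highest-version-wins rule. The class and method names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last-write-wins: keep the update with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # undelivered updates: (replica_index, key, version, value)
        self.version = 0

    def write(self, key, value):
        """Acknowledge after updating one replica; queue the rest for later."""
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.version, value))

    def sync(self):
        """Deliver all queued updates, as background replication would over time."""
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()

    def read_all(self, key):
        return [r.read(key) for r in self.replicas]


cluster = Cluster(3)
cluster.write("cart", "book")
stale = cluster.read_all("cart")      # replicas disagree right after the write
cluster.sync()
converged = cluster.read_all("cart")  # no new updates, so all replicas agree
print(stale, converged)
```

Between `write` and `sync` a reader may see the old value on some replicas. Once updates stop and replication catches up, every replica returns the last written value.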
14. How does NoSQL DB budget memory?
Ans: In Oracle NoSQL Database (referred to here as NoSQL DB), the Replication Node manages the data in a store and is the main consumer of memory. The Java heap and cache size used by the Replication Node can be important performance factors. By default, the Replication Node heap and cache are calculated by NoSQL DB based on the amount of memory available to the Storage Node.
We recommend that you specify the available memory for a Storage Node using the -memory_mb flag for makebootconfig, or the memory_mb Storage Node parameter. If you do not define memory_mb, it will default to the memory available on the node. NoSQL DB will then use 85% of memory_mb as the heap for the Replication Node processes hosted by that Storage Node. If the Storage Node hosts more than one Replication Node, the memory will be divided evenly between all RNs.
If the number of Replication Nodes on a Storage Node changes, the per-RN memory will be recalculated dynamically. The percentage used for heap is controlled by the rnHeapPercent Storage Node parameter. You can choose to override the default value of 85%. Each Replication Node uses a cache, and the size of that cache defaults to 70% of the Replication Node heap. You can override the 70% default by setting the rnCachePercent Replication Node parameter.
The Replication Node heap can also be specified directly by setting -Xmx in the Replication Node javaMiscParams parameter. Likewise, the Replication Node cache can be set directly with the cacheSize Replication Node parameter. While that’s possible, it’s advisable to use the Storage Node memory_mb setting.
As an example, suppose you specify that a Storage Node may use 3000 MB of memory by setting memory_mb to 3000. If that Storage Node hosts two Replication Nodes, the heap for each RN will be (3000 * .85)/2 = 1275 MB. Each RN cache will be (1275 * .70) = 892.5 MB, or roughly 892 MB.
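The arithmetic in the example can be captured in a short helper for trying other values. This is only a calculator for the defaults described above (85% of memory_mb for heap, split evenly across Replication Nodes, then 70% of each heap for cache). The function name `rn_memory_budget` is made up, and how the product itself rounds the result is not covered here.

```python
def rn_memory_budget(memory_mb, num_rns, rn_heap_percent=85, rn_cache_percent=70):
    """Return (heap_mb, cache_mb) per Replication Node using the percentages described above."""
    if num_rns < 1:
        raise ValueError("a Storage Node must host at least one Replication Node")
    # Heap: a percentage of the Storage Node memory, divided evenly between RNs.
    heap_mb = memory_mb * rn_heap_percent / 100 / num_rns
    # Cache: a percentage of each Replication Node's heap.
    cache_mb = heap_mb * rn_cache_percent / 100
    return heap_mb, cache_mb


heap, cache = rn_memory_budget(3000, 2)
print(heap, cache)
```

Calling `rn_memory_budget(3000, 1)` shows the effect of hosting a single Replication Node: the whole 85% share goes to one heap.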
15.How to script NoSQL DB configuration?
Ans: You may find that you want to build the same NoSQL DB configuration repeatedly for testing purposes. The Admin CLI commands can be scripted in several ways. Many uses of the Admin CLI are simple commands, such as java -jar kvstore.jar makebootconfig, which initially configures a Storage Node.
These are as amenable to scripting as any other UNIX commands and will not be discussed further here. The interactive commands available in java -jar kvstore.jar runadmin, among which are those used to create and execute plans, can also be scripted. One way is to create a file containing the sequence of commands that you want to run, and run them in a batch using java -jar kvstore.jar runadmin load -file <script>. For example, a script file named deploy.kvs could contain commands such as the following:
configure -name mystore
plan deploy-datacenter -name boston -rf 3 -wait
plan deploy-sn -dcname boston -host localhost -port 5000 -wait
plan deploy-admin -sn sn1 -port 5001 -wait
You could execute this script by issuing the command
java -jar kvstore.jar runadmin -host localhost -port 5000 load -file
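If you rebuild the same configuration often, generating the script file from code can help keep it consistent. The sketch below writes the commands from the example into a deploy.kvs file and assembles the runadmin command line as an argument list, without running it. The helper names (`write_admin_script`, `build_runadmin_command`) and the temporary directory are assumptions for illustration; the host, port and flags simply mirror the text above.

```python
import os
import tempfile

# Commands copied from the example script above.
COMMANDS = [
    "configure -name mystore",
    "plan deploy-datacenter -name boston -rf 3 -wait",
    "plan deploy-sn -dcname boston -host localhost -port 5000 -wait",
    "plan deploy-admin -sn sn1 -port 5001 -wait",
]


def write_admin_script(path, commands):
    """Write one Admin CLI command per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(commands) + "\n")


def build_runadmin_command(host, port, script_path, jar="kvstore.jar"):
    """Assemble the argument list; it could be passed to subprocess.run to execute it."""
    return [
        "java", "-jar", jar, "runadmin",
        "-host", host, "-port", str(port),
        "load", "-file", script_path,
    ]


script_path = os.path.join(tempfile.mkdtemp(), "deploy.kvs")
write_admin_script(script_path, COMMANDS)
command = build_runadmin_command("localhost", 5000, script_path)
print(" ".join(command))
```

Keeping the command as a list rather than a single string avoids shell quoting problems if the script path contains spaces.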
16.What Is The CAP Theorem? How Is It Applicable To NoSQL Systems?
Ans: Eric Brewer proposed the CAP theorem in 2000.
In it he discusses three system attributes within the context of distributed databases as follows:
- Consistency: The notion that all nodes see the same data at the same time.
- Availability: A guarantee that every request to the system receives a response about whether it was successful or not.
- Partition Tolerance: A quality stating that the system continues to operate even when network failures cut off communication between some of its nodes.
The common understanding around the CAP theorem is that a distributed database system may only provide at most 2 of the above 3 capabilities. As such, most NoSQL databases cite it as a basis for employing an eventual consistency model with respect to how database updates are handled.
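The trade-off is easiest to see when a partition actually happens. The toy model below is a sketch under simplifying assumptions (two replicas, synchronous replication when the network is healthy, and "CP" and "AP" used as labels for this example only). It shows the two usual choices: refuse the write to stay consistent, or accept it and let the replicas diverge until the partition heals.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy two-replica store; `mode` is 'CP' or 'AP' (labels chosen for this sketch)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Node("a"), Node("b")
        self.partitioned = False

    def write(self, node, value):
        other = self.b if node is self.a else self.a
        if not self.partitioned:
            node.value = other.value = value  # replicate synchronously
            return True
        if self.mode == "CP":
            return False  # refuse: the other replica cannot be reached
        node.value = value  # AP: accept locally, replicas now diverge
        return True


cp = TwoNodeStore("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
cp_accepted = cp.write(cp.a, "v2")

ap = TwoNodeStore("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap_accepted = ap.write(ap.a, "v2")

print(cp_accepted, cp.a.value, cp.b.value)
print(ap_accepted, ap.a.value, ap.b.value)
```

The CP store stays consistent but rejects the request. The AP store stays available but its replicas hold different values, which is the situation an eventual consistency model then has to repair.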
17.How Does NoSQL relate to big data?
Ans: NoSQL databases are designed with “Big Data” needs in mind. Since they are not bound by a fixed schema model, this makes them suitable for today’s business needs where there is a large volume of non-uniform data (Big Data).
18.Can you explain the transaction support by using a BASE in NoSQL?
Ans: The CAP theorem states that a distributed system cannot achieve all three of the following properties at the same time: consistency, availability and partition tolerance. A BASE system relaxes strict consistency while maintaining the other two. It keeps working despite physical network partitions and always gives clients read and write availability.
BASE stands for:
- Basically Available
- Soft state
- Eventual consistency
19.List the different kinds of NoSQL data stores?
Ans: The widely used NoSQL data stores fall into four categories:
- Key-value store: a simple storage system that uses keys to access values.
- Column family store: a sparse matrix system that uses rows and columns as keys.
- Graph store: used for relationship-intensive problems.
- Document store: used for storing hierarchical data structures directly in the database.
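To compare the four categories side by side, the sketch below models similar user data in each shape using plain Python structures. It is only a picture of the data models, not of any product's storage format, and all keys and values are invented.

```python
import json

# 1. Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# 2. Column family: row key -> column name -> value; rows may have
#    different columns, which is what makes the matrix sparse.
column_family = {
    "42": {"name": "Ada", "city": "London"},
    "43": {"name": "Bob"},  # no "city" column for this row
}

# 3. Graph: nodes plus labelled edges between them.
graph = {
    "nodes": {"42": {"name": "Ada"}, "43": {"name": "Bob"}},
    "edges": [("42", "FOLLOWS", "43")],
}

# 4. Document: a nested, hierarchical record stored as one unit.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "orders": [{"sku": "book", "qty": 1}],
}

# The store only sees a string here; the application must parse it.
name_from_kv = json.loads(key_value["user:42"])["name"]
# A missing column is simply absent, not NULL-padded.
city_of_43 = column_family["43"].get("city")
# Relationship lookup by walking edges.
followed_by_42 = [dst for src, rel, dst in graph["edges"] if src == "42" and rel == "FOLLOWS"]
# Nested fields are addressed directly inside the document.
first_sku = document["orders"][0]["sku"]

print(name_from_kv, city_of_43, followed_by_42, first_sku)
```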
20.What do you mean by eventual consistency in NoSQL stores?
Ans: Eventual consistency in NoSQL means that once all the service logic has been executed, the system is left in a consistent state. This concept is used in distributed systems to achieve high availability. It guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. NoSQL systems describe this in terms of BASE, whereas an RDBMS is described in terms of the ACID properties. Many NoSQL databases provide client applications with a guarantee of eventual consistency. Some, like MongoDB and Cassandra, are eventually consistent in some configurations.
21.Why is MongoDB known as the best NoSQL database?
Ans: MongoDB is often described as the best NoSQL database because of its:
1. Document-oriented model
2. Rich query language
3. High performance
4. High availability
5. Easy scalability
22.What challenges did you face while working on NoSQL?
Ans: This question is specific to your technology and depends entirely on your past work experience. Explain the challenges you faced with NoSQL in your own project.