2 items tagged "data storage"

  • Database possibilities in an era of big data

    Database possibilities in an era of big data

    We live in an era of big data. The sheer volume of data currently existing is huge enough without also grappling with the amount of new information that’s generated every day. Think about it: financial transactions, social media posts, web traffic, IoTsensor data, and much more, being ceaselessly pulled into databases the world over. Outdated technology simply can’t keep up.

    The modern types of databases that have arisen to tackle the challenges of big data take a variety of forms, each suited for different kinds of data and tasks. Whatever your company does, choosing the right database to build your product or service on top of is a vital decision. In this article, we’ll dig into the different types of database options you could be considering for your unique challenges, as well as the underlying database technologies you should be familiar with. We’ll be focusing on relational database management systems (RDBMS), NoSQL DBMS, columnar stores, and cloud solutions.


    First up, the reliable relational database management system. This widespread variety is renowned for its focus on the core database attributes of atomicity (keeping tasks indivisible and irreducible), consistency (actions taken by the database obey certain constraints), isolation (a transaction’s immediate state is invisible to other transactions), and durability (data changes reliably persist). Data in an RDBMS is stored in tables and an RDBMS is able to tackle tons of data and complex queries as opposed to flat files, which tend to take up more memory and are less efficient. An RDBMS is usually made up of a collection of tables, each with columns (fields) and records (rows). Popular examples of RDBM systems include Microsoft SQL, Oracle, MySQL, and Postgres.

    Some of the strengths of an RDBMS include flexibility and scalability. Given the huge amounts of information that modern businesses need to handle, these are important factors to consider when surveying different types of databases. Ease of management is another strength since each of the constituent tables can be changed without impacting the others. Additionally, administrators can choose to share different tables with certain users and not others (ideal if working with confidential information you might not want shared with all users). It’s easy to update data and expand your database, and since each piece of data is stored at a single point, it’s easy to keep your system free from errors as well.

    No system is perfect, however. Each RDBMS is built on a single server, so once you hit the limits of the machine you’ve got, you need to buy a new one. Rapidly changing data can also challenge these systems, as increased volume, variety, velocity, and complexity create complicated relationships that the RDBMS can have trouble keeping up with. Lastly, despite having 'relation' in the name, relational database management systems don’t store the relationships between elements, meaning that the system doesn’t actually understand the connections between data as pertains to various joins you may be using. 


    NoSQL (originally, 'non relational' or 'not SQL') DBMS emerged as web applications were becoming more complex. These types of databases are designed to handle heterogeneous data that’s difficult to stick in a normalization schema. While they can take a wide array of forms, the most important difference between NoSQL and RDBMS is that while relational databases rigidly define how all the data contained within must be arranged, NoSQL databases can be schema agnostic. This means that if you’ve got unstructured and semi-structured data, you can store and manipulate it easily, whereas an RDBMS might not be able to handle it at all. 

    Considering this, it’s no wonder that NoSQL databases are seeing a lot of use in big data and real-time web apps. Examples of these database technologies include MongoDB, Riak, Amazon S3, Cassandra, and Hbase. However, one drawback of NoSQL databases is that they have 'eventual consistency', meaning that all nodes will eventually have the same data. However, since there’s a lag while all the nodes update, it’s possible to get out-of-sync data depending on which node you end up querying during the update window. Data consistency is a challenge with NoSQL since they do not perform ACID transactions.

    Columnar storage database

    A columnar storage database’s defining characteristic is that it stores data tables by column rather than by row. The main benefit of this configuration is that it accelerates analyses because the system only has to read the locations your query is interested in, all within a single column. Also, these systems compress repeating volumes in storage, allowing better compression, since the data in one specific column is homogeneous across all the columns (or, columns are all the same type: integers, strings, etc. so that they can be better compressed). 

    However, due to this feature, Columnar storage databases are not typically used to build transactional databases. One of the drawbacks of these types of database is that inserts and updates on an entire row (necessary for apps like ERPs and CRMs, for example) can be expensive. It’s also slower for these types of applications. For example, when opening an account’s page in a CRM, the app needs to read the entire row (name, address, email, account id, etc) to populate the page and write back all that as well. In this example, a relational database would be more efficient. 

    Cloud solutions

    While not technically a type of database themselves, no discussion of modern types of database solutions would be complete without discussing the cloud. In this age of big data and fast-moving data sources, data engineers are increasingly turning to cloud solutions (AWS, Snowflake, etc.) to store, access, and analyze their data. One of the biggest advantages of cloud options is that you don’t have to pay for the physical space or the physical machine associated with your database (or its upkeep, emergency backups, etc.). Additionally, you only pay for what you use: as your memory and processing power needs scale up, you pay for the level of service you need, but you don’t have to pre-purchase these capabilities.

    There are some drawbacks to using a cloud solution, however. First off, since you’re connecting to a remote resource, bandwidth limitations can be a factor. Additionally, even though the cloud does offer cost savings, especially when starting a company from scratch, the lifetime costs of paying your server fees could exceed what you would have paid buying your own equipment. Lastly, depending on the type of data you’re dealing with, compliance and security can be issues because the responsibility of managing the data and its security is no longer handled by you, the data owner, and instead by the third party provider. For example, unsecured APIs and interfaces that can be more readily exploited, data breaches, data loss or leakage risks can be elevated, and unauthorized access through improperly configured firewalls are some ways in which cloud databases can be compromised.

    Decision time

    The era of Big Data is changing the way companies deal with their data. This means choosing new database models and finding the right analytics and BI tools to help your team get the most out of your data and build the apps, products, and services that will shape the world. Whatever you’re creating, picking the right database type for you, and build boldly.

    Author: Jack Cieslak

    Source: Sisense

  • Why data lakes are the future of data storage

    Why data lakes are the future of data storage

    The term big data has been around since 2005, but what does it actually mean? Exactly how big is big? We are creating data every second. It’s generated across all industries and by a myriad of devices, from computers to industrial sensors to weather balloons and countless other sources. According to a recent study conducted by Data Never Sleeps, there are a quintillion bytes of data generated each minute, and the forecast is that our data will only keep growing at an unprecedented rate.

    We have also come to realize just how important data really is. Some liken its value to something as precious to our existence as water or oil, although those aren’t really valid comparisons. Water supplies can fall and petroleum stores can be depleted, but data isn’t going anywhere. It only continues to grow. Not just in volume, but also in variety and velocity. Thankfully, over the past decade, data storage has become cheaper, faster and more easily available, and as a result, where to store all this information isn’t the biggest concern anymore. Industries that work in the IoT and faster payments space are now starting to push data through at a very high speed and that data is constantly changing shape.

    In essence, all this gives rise to a 'data demon'. Our data has become so complex that normal techniques for harnessing it often fail, keeping us from realizing data’s full potential.

    Most organizations currently treat data as a cost center. Each time a data project is spun off, there is an 'expense' attached to it. It’s contradictive. On the one side, we’re proclaiming that data is our most valuable asset, but on the other side we perceive it as a liability. It’s time to change that perception, especially when it comes to banks. The volumes of data financial institutions have can be used to create tremendous value. Note that I’m not talking about 'selling the data', but leveraging it more effectively to provide crisp analytics that deliver knowledge and drive better business decisions.

    What’s stopping people from converting data from an expense to an asset, then? The technology and talent exist, but the thought process is lacking.

    Data warehouses have been around for a long time and traditionally were the only way to store large amounts of data that’s used for analytical and reporting purposes. However, a warehouse, as the name suggests, immediately makes one think of a rigid structure that’s limited. In a physical warehouse, you can store products in three dimensions: length, breadth and height. These dimensions, though, are limited by your warehouse’s architecture. If you want to add more products, you must go through a massive upgrade process. Technically, it’s doable, but not ideal. Similarly, data warehouses present a bit of rigidity when handling constantly changing data elements.

    Data lakes are a modern take on big data. When you think of a lake, you cannot define its shape and size, nor can you define what lives in it and how. Lakes just form, even if they are man-made. There is still an element of randomness to them and it’s this randomness that helps us in situations where the future is, well, sort of unpredictable. Lakes expand and contract, they change over periods of time, and they have an ecosystem that’s home to various types of animals and organisms. This lake can be a source of food (such as fish) or fresh water and can even be the locale for water-based adventures. Similarly, a data lake contains a vast body of data and is able to handle that data’s volume, velocity and variety.

    When the mammoth data organizations like Yahoo, Google, Facebook and LinkedIn started to realize that their data and data usage were drastically different and that it was almost impossible to use traditional methods to analyze it, they had to innovate. This in turn gave rise to technologies like document-based databases and big data engines like Hadoop, Spark, HPCC Systems and others. These technologies were designed to allow the flexibility one needs when handling unpredictable data inputs.

    Jeff Lewis is SVP of Payments at Sutton Bank, a small community bank that’s challenging the status quo for other banks in the payments space. 'Banks have to learn to move on from data warehouses to data lakes. The speed, accuracy and flexibility of information coming out of a data lake is crucial to the increased operational efficiency of employees and to provide a better regulatory oversight', said Lewis. 'Bankers are no longer old school and are ready to innovate with the FinTechs of the world. A data centric thought process and approach is crucial for success'.

    Data lakes are a natural choice to handle the complexity of such data, and the application of machine learning and AI are also becoming more common, as well. From using AI to clean and augment incoming data, to running complex algorithms to correlate different sources of information to detect complex fraud, there is an algorithm for just about everything. And now, with the help of distributed processing, these algorithms can be run on multiple clusters and the workload can be spread across nodes.

    One thing to remember is that you should be building a data lake and not a data swamp. It’s hard to control a swamp. You cannot drink from it, nor can you navigate it easily. So, when you look at creating a data lake, think about what the ecosystem looks like and who your consumers are. Then, embark on a journey to build a lake on your own.

    Source: Insidebigdata

EasyTagCloud v2.8