4 items tagged "SQL"

  • 6 Basic Security Concerns for SQL Databases

    Consider these scenarios: A low-level IT systems engineer spills soda, which takes down a bank of servers; a warehouse fire burns all of the patient records of a well-regarded medical firm; a government division’s entire website vanishes without a trace. Data breaches and failures are not isolated incidents. According to the 2014 Verizon Data Breach Investigations Report, databases are one of the most critical vulnerability points in corporate data assets. Databases are targeted because their information is so valuable, and many organizations are not taking the proper steps to ensure data protection.

    • Only 5 percent of the billions of dollars allocated to security products is spent on securing data centers, according to a report from International Data Corporation (IDC).
    • In a July 2011 survey of employees at organizations with multiple computers connected to the Internet, almost half said they had lost or deleted data by accident.
    • According to Fortune magazine, corporate CEOs are not making data security a priority, seemingly deciding that they will handle a data problem if it actually happens.

    You might think CEOs would be more concerned, even if only for their own survival. A 2013 data breach at Target was widely considered an important contributing factor in the ouster of Gregg Steinhafel, then the company's president, CEO and chairman of the board. The Target breach affected more than 40 million debit and credit card accounts at the retailing giant. Stolen data included customers' names, card numbers, security codes and expiration dates.
    Although the threats to corporate database security have never been more sophisticated and organized, taking the necessary steps and implementing accepted best practices will decrease the chances of a data breach or other database security crisis at your organization.

    6 Basic Security Concerns

    If you are new to database administration, you may not be familiar with the basic steps you can take to improve database security. Here are the first moves you should make:

    1. The physical environment. One of the most-often overlooked steps in increasing database security is locking down the physical environment. While most security threats are, in fact, at the network level, the physical environment presents opportunities for bad actors to compromise physical devices. Unhappy employees can abscond with company records, health information or credit data. To protect the physical environment, start by implementing and maintaining strict security measures that are detailed and updated on a regular basis. Severely limit access to physical devices to only a short list of employees who must have access as part of their job. Strive to educate employees and systems technicians about maintaining good security habits while operating company laptops, hard drives, and desktop computers. Lackadaisical security habits by employees can make them an easy target.


    2. Network security. Database administrators should assess the weak points in their network and how company databases connect to it. Up-to-date antivirus software running on the network is essential. Also, ensure that secure firewalls are implemented on every server. Consider changing TCP/IP ports from the defaults, as the standard ports are well-known access points for hackers and Trojan horses.


    3. Server environment. Information in a database can appear in other areas, such as log files, depending on the nature of the operating system and database application. Because the data can appear in different areas of the server environment, you should check that every folder and file on the system is protected. Limit access as much as possible, allowing only the people who absolutely need permission to get at that information. This applies to the physical machine as well. Do not provide users with elevated access when they only need lower-level permissions.
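
    A minimal sketch of what least-privilege access can look like at the database level (the role and table names here are hypothetical, and exact syntax varies by vendor):

        -- Create a role that carries only the access it genuinely needs (illustrative names).
        CREATE ROLE reporting_reader;

        -- Allow read-only access to the single table this role requires.
        GRANT SELECT ON patient_records TO reporting_reader;

        -- Explicitly withhold write access that may have been granted too broadly elsewhere.
        REVOKE INSERT, UPDATE, DELETE ON patient_records FROM reporting_reader;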


    4. Avoid over-deployment of features. Modern databases and related software have some services designed to make the database faster, more efficient and secure. At the same time, software application companies are in a very competitive field, essentially a mini arms race to provide better functionality every year. The result is that you may have deployed more services and features than you will realistically use. Review each feature that you have in place, and turn off any service that is not really needed. Doing so cuts down the number of areas or “fronts” where hackers can attack your database.
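
    As one hedged example, on Microsoft SQL Server an unused feature such as xp_cmdshell can be switched off through sp_configure; other database products expose similar configuration switches, so treat this as a sketch rather than a universal recipe:

        -- SQL Server example: disable the xp_cmdshell feature if nothing depends on it.
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'xp_cmdshell', 0;   -- 0 = disabled
        RECONFIGURE;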


    5. Patch the system. Just like a personal computer operating system, databases must be updated on a continuing basis. Vendors regularly release patches, service packs and security updates, but these only help if you apply them right away. Here is a cautionary tale: in 2003, a computer worm called SQL Slammer penetrated tens of thousands of servers within minutes of its release. The worm exploited a buffer-overflow vulnerability in Microsoft's SQL Server and Desktop Engine products. A patch that fixed the weakness had been released the previous summer, but many of the companies that became infected had never patched their servers.


    6. Encrypt sensitive data. Although back-end databases might seem to be more secure than components that interface with end users, the data must still be accessed through the network, which increases its risk. Encryption cannot stop malicious hackers from attempting to access data. However, it does provide another layer of security for sensitive information such as credit card numbers.
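
    A minimal sketch of column-level encryption, assuming a PostgreSQL database with the pgcrypto extension available (the table, key and card number below are placeholders, and real deployments need proper key management):

        -- Assumes PostgreSQL with the pgcrypto extension.
        CREATE EXTENSION IF NOT EXISTS pgcrypto;

        CREATE TABLE payment_cards (
            customer_id  integer PRIMARY KEY,
            card_number  bytea            -- stored as ciphertext, not plain text
        );

        -- Encrypt on insert; the key string is a placeholder, not a key-management strategy.
        INSERT INTO payment_cards (customer_id, card_number)
        VALUES (42, pgp_sym_encrypt('4111111111111111', 'replace-with-managed-key'));

        -- Decrypt only when needed, by code that holds the key.
        SELECT pgp_sym_decrypt(card_number, 'replace-with-managed-key') AS card_number
        FROM payment_cards
        WHERE customer_id = 42;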

    Famous Data Breaches

    Is all this overblown? Maybe stories of catastrophic database breaches are ghost stories, conjured up by senior IT managers to force implementation of inconvenient security procedures. Sadly, data breaches happen on a regular basis to small and large organizations alike. Here are some examples:

    • TJX Companies. In December 2006, TJX Companies, Inc., failed to protect its IT systems with a proper firewall. A group led by high-profile hacker Albert Gonzalez gained access to more than 90 million credit card numbers. Gonzalez was convicted of the crime and sentenced to two concurrent 20-year prison terms. Eleven other people were arrested in relation to the breach.
    • Department of Veterans Affairs. A database containing names, dates of birth, types of disability and Social Security numbers of more than 26 million veterans was stolen from an unencrypted database at the Department of Veterans Affairs. Leaders in the organization estimated that it would cost between $100 million and $500 million to cover damages resulting from the theft. This is an excellent example of human error being the softest point in the security profile. An external hard drive and laptop were stolen from the home of an analyst who worked at the department. Although the theft was reported to local police promptly, the head of the department was not notified until two weeks later. He informed federal authorities right away, but the department did not make any public statement until several days had gone by. Incredibly, an unidentified person returned the stolen data in late June 2006.
    • Sony PlayStation Network. In April 2011, more than 75 million PlayStation Network accounts were compromised. The popular service was down for weeks, and industry experts estimate the company lost millions of dollars. Many still consider it the worst breach of a multiplayer gaming network in history. To this day, the company says it has not determined who the attackers were. The hackers were able to obtain gamers' names, email addresses, passwords, purchase histories, addresses and credit card numbers. The fact that Sony is a technology company made the breach all the more surprising and concerning. Consumers began to wonder: if it could happen to Sony, was their data safe at other big companies?
    • Gawker Media. Hackers breached Gawker Media, parent company of the popular gossip site Gawker.com, in December 2010. The passwords and email addresses of more than one million users of Gawker Media properties like Gawker, Gizmodo, and Lifehacker, were compromised. The company made basic security mistakes, including storing passwords in a format hackers could easily crack.

    Take These Steps

    In summary, basic database security is not especially difficult but requires constant vigilance and consistent effort. Here is a snapshot review:

    • Secure the physical environment.
    • Strengthen network security.
    • Limit access to the server.
    • Cut back or eliminate unneeded features.
    • Apply patches and updates immediately.
    • Encrypt sensitive data such as credit cards, bank statements, and passwords.
    • Document baseline configurations, and ensure all database administrators follow the policies.
    • Encrypt all communications between the database and applications, especially Web-based programs.
    • Match internal patch cycles to vendor release patterns.
    • Make consistent backups of critical data, and protect the backup files with database encryption.
    • Create an action plan to implement if data is lost or stolen. In the current computing environment, it is better to think in terms of when this could happen, not if it will happen.

    Basic database security seems logical and obvious. However, the repeated occurrences of major and minor data breaches in organizations of all sizes indicate that company leadership, IT personnel, and database administrators are not doing all they can to implement consistent database security principles.
    The cost of failing to do so is too great. Increasingly, corporate America is turning to cloud-based enterprise software. Many of today's popular services, such as Facebook, Google and Amazon, rely on advanced databases and high-level computer languages to handle millions of customers accessing their information at the same time. In our next article, we take a closer look at advanced database security methods that these companies and other forward-thinking organizations use to protect their data and prevent hackers, crackers, and thieves from making off with millions of dollars' worth of information.

    Source: Sys-con Media

  • Becoming a better data scientist by improving your SQL skills

    Learning advanced SQL skills can help data scientists effectively query their databases and unlock new insights into data relationships, resulting in more useful information.

    The skills people most often associate with data scientists are the "hard" technical and math skills, including statistics, probability, linear algebra, algorithm knowledge and data visualization. Data scientists need to understand how to work with structured and unstructured data stores and use machine learning and analytics programs to extract valuable information from them.

    Data scientists also need to possess "soft" skills such as business and domain process knowledge, problem solving, communication and collaboration.

    These skills, combined with advanced SQL abilities, enable data scientists to extract value, information and insight from data.

    In order to unlock the full value from data, data scientists need to have a collection of tools for dealing with structured information. Many organizations still operate and rely heavily on structured enterprise data stores, data warehouses and databases. Having advanced skills to extract, manipulate and transform this data can really set data scientists apart from the pack.

    Advanced vs. beginner SQL skills for data scientists

    The common tool and language for interacting with structured data stores is the Structured Query Language (SQL), a standard, widely adopted syntax for data stores that contain schemas that define the structure of their information. SQL allows the user to query, manipulate, edit, update and retrieve data from data sources, including the relational database, an omnipresent feature of modern enterprises.

    Relational databases that utilize SQL are popular within organizations, so data scientists should have SQL knowledge at both the basic and advanced levels.

    Basic SQL skills include knowing how to extract information from data tables as well as how to insert and update those records.

    Because relational databases are often large, with many columns and millions of rows, data scientists won't want to pull the entire database for most queries but rather extract only the information needed from a table. As a result, data scientists need to know, at a fundamental level, how to apply conditional filters so they extract only the data they need.
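
    As a minimal sketch (the orders table and its columns are hypothetical), a conditional filter pulls back only the rows of interest instead of the entire table:

        -- Hypothetical table: orders(order_id, customer_id, order_date, region, total_amount)
        SELECT order_id,
               customer_id,
               total_amount
        FROM   orders
        WHERE  region = 'EMEA'
          AND  order_date >= DATE '2016-01-01'   -- only matching rows are returned
          AND  total_amount > 100;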

    For most cases, the data that analysts need to work with will not live on just one database, and certainly not in a single table in that database.

    It's not uncommon for organizations to have hundreds or thousands of tables spread across hundreds or thousands of databases that were created by different groups and at different periods. Data scientists need to know how to join these multiple tables and databases together, making it easier to analyze different data sets.

    So, data scientists need to have deep knowledge of JOIN and SELECT operations in SQL as well as their impact on overall query performance.
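
    An illustrative sketch of a JOIN (table and column names are assumed for the example) that combines two tables on a shared key:

        -- Hypothetical tables: customers(customer_id, name, country), orders(order_id, customer_id, total_amount)
        SELECT c.customer_id,
               c.name,
               SUM(o.total_amount) AS lifetime_value
        FROM   customers AS c
        JOIN   orders    AS o
               ON o.customer_id = c.customer_id   -- rows are matched on the shared key
        WHERE  c.country = 'NL'
        GROUP  BY c.customer_id, c.name;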

    However, to address more complex data analytics needs, data scientists need to move beyond these basic skills and gain advanced SQL skills to enable a wider range of analytic abilities. These advanced skills enable data scientists to work more quickly and efficiently with structured databases without having to rely on data engineering team members or groups.

    Understanding advanced SQL skills can help data scientists stand out to potential employers or shine internally.

    Types of advanced SQL skills data scientists need to know

    Advanced SQL skills often involve working with information distributed across multiple stores and efficiently querying and combining that data for specific analytic purposes.

    Some of these skills include the following:

    Advanced and nested subqueries. Subqueries and nested queries are important for combining and linking data between different sources. Combined with advanced JOIN operations, subqueries can be faster and more efficient than basic JOINs or chained queries because they eliminate extra steps in data extraction.
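
    A hedged sketch of a nested subquery, reusing the hypothetical orders table from above: it returns the customers whose total spend exceeds the average total spend across all customers.

        SELECT customer_id, total_spend
        FROM (
            SELECT customer_id,
                   SUM(total_amount) AS total_spend
            FROM   orders
            GROUP  BY customer_id
        ) AS spend
        WHERE total_spend > (
            SELECT AVG(customer_total)            -- nested subquery supplying the benchmark value
            FROM (
                SELECT SUM(total_amount) AS customer_total
                FROM   orders
                GROUP  BY customer_id
            ) AS totals
        );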

    Common table expressions. Common table expressions (CTEs) let you define a named, temporary result set that exists only for the duration of a larger query. Multiple subqueries can complicate things, so common table expressions help you break your code into smaller chunks, making it easier to make sense of everything.
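
    The same logic expressed with common table expressions, again on the hypothetical orders table; the named steps make the query easier to read than the nested version above:

        WITH customer_totals AS (
            SELECT customer_id,
                   SUM(total_amount) AS total_spend
            FROM   orders
            GROUP  BY customer_id
        ),
        benchmark AS (
            SELECT AVG(total_spend) AS avg_spend
            FROM   customer_totals
        )
        SELECT ct.customer_id,
               ct.total_spend
        FROM   customer_totals AS ct, benchmark AS b
        WHERE  ct.total_spend > b.avg_spend;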

    Efficient use of indexes. Indexes keep relational databases functioning effectively by preparing the system for the queries it is expected to serve. Efficient use of indexes can greatly speed up performance, making data easier and faster to find. Conversely, poor indexing leads to long query times and systems whose performance degrades badly when queried at scale.
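
    A short sketch of index use on the same hypothetical orders table, assuming queries frequently filter by customer and date range:

        -- Create a composite index matching a common access pattern.
        CREATE INDEX idx_orders_customer_date
            ON orders (customer_id, order_date);

        -- With the index in place, a query like this can avoid a full table scan.
        SELECT order_id, total_amount
        FROM   orders
        WHERE  customer_id = 42
          AND  order_date >= DATE '2016-01-01';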

    Advanced use of date and time operations. Knowing how to manipulate date and time can come in handy, especially when working with time-series data. Advanced date operations might require knowledge of date parsing, time formats, date and time ranges, time grouping, time sorting and other activities that involve the use of timestamps and date formatting.
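
    A hedged example of grouping by calendar month with standard EXTRACT, assuming a hypothetical events(event_id, event_time) table; date and time functions vary by database, so the exact syntax may differ:

        SELECT EXTRACT(YEAR  FROM event_time) AS yr,
               EXTRACT(MONTH FROM event_time) AS mo,
               COUNT(*)                       AS events
        FROM   events
        WHERE  event_time >= DATE '2016-01-01'
          AND  event_time <  DATE '2017-01-01'
        GROUP  BY EXTRACT(YEAR FROM event_time), EXTRACT(MONTH FROM event_time)
        ORDER  BY yr, mo;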

    Delta values. For many reasons, you may want to compare values from different periods. For example, you might want to evaluate sales from this month versus last month or sales from December this year versus December last year. You can find the difference between these numbers by running delta queries to uncover insights or trends you may not have seen otherwise.
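
    A minimal sketch of a month-over-month delta using the LAG window function, assuming a hypothetical monthly_sales(month_start, revenue) table; window function support varies by database version:

        SELECT month_start,
               revenue,
               revenue - LAG(revenue) OVER (ORDER BY month_start) AS delta_vs_prior_month
        FROM   monthly_sales
        ORDER  BY month_start;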

    Ranking and sorting methods. Being able to rank and sort rows or values is necessary to help uncover key insights from data. Data analytics requirements might include ranking data by number of products or units sold, top items viewed, or top sources of purchases. Knowing advanced methods for ranking and sorting can optimize overall query time and provide accurate results.
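
    A short ranking sketch with the RANK window function, assuming a hypothetical product_sales(product_id, region, units_sold) table:

        SELECT region,
               product_id,
               units_sold,
               RANK() OVER (PARTITION BY region ORDER BY units_sold DESC) AS sales_rank
        FROM   product_sales
        ORDER  BY region, sales_rank;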

    Query optimization. Effective data analysts spend time not only formulating queries but optimizing them for performance. This skill is incredibly important once databases grow past a certain size or are distributed across multiple sources. Knowing how to deal with complex queries and generate valuable results promptly with optimal performance is a key skill for effective data scientists.
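
    One common starting point for optimization is reading the execution plan; the command differs by product (EXPLAIN in PostgreSQL and MySQL, EXPLAIN PLAN in Oracle, SET SHOWPLAN_ALL in SQL Server), so this is a sketch rather than a universal command:

        -- PostgreSQL/MySQL-style example on the hypothetical tables used above.
        EXPLAIN
        SELECT c.customer_id, SUM(o.total_amount) AS lifetime_value
        FROM   customers AS c
        JOIN   orders    AS o ON o.customer_id = c.customer_id
        GROUP  BY c.customer_id;
        -- The plan shows whether indexes are used and where the expensive steps are.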

    The value of advanced SQL skills

    The main purpose of data science is to help organizations derive value by finding information needles in data haystacks. Data scientists need to be masters at filtering, sorting and summarizing data to provide this value. Advanced SQL skills are core to providing this ability.

    Organizations are always looking to find data science unicorns who have all the skills they want and more. Knowing different ways to shape data for targeted analysis is incredibly desirable.

    For many decades, companies have stored valuable information in relational databases, including transactional data and customer data. Feeling comfortable finding, manipulating, extracting, joining or adding data to these databases will give data scientists a leg up on creating value from this data.

    As with any skill, learning advanced SQL skills will take time and practice to master. However, enterprises provide many opportunities for data scientists and data analysts to master those skills and provide more value to the organization with real-life data and business problems to solve.

    Author: Kathleen Walch

    Source: TechTarget

  • Hadoop engine benchmark: How Spark, Impala, Hive, and Presto compare

    AtScale recently performed benchmark tests on the Hadoop engines Spark, Impala, Hive, and Presto. Find out the results, and discover which option might be best for your enterprise.

    The global Hadoop market is expected to expand at an average compound annual growth rate (CAGR) of 26.3% between now and 2023, a testimony to how aggressively companies have been adopting this big data software framework for storing and processing the gargantuan files that characterize big data. But to turbo-charge this processing so that it performs faster, additional engine software is used in concert with Hadoop.

    AtScale, a business intelligence (BI) Hadoop solutions provider, periodically performs BI-on-Hadoop benchmarks that compare the performances of various Hadoop engines to determine which engine is best for which Hadoop processing scenario. The benchmark results assist systems professionals charged with managing big data operations as they make their engine choices for different types of Hadoop processing deployments.

    Recently, AtScale published a new round of benchmark results, which I discussed with Josh Klahr, AtScale's vice president of product management.

    "In this benchmark, we tested four different Hadoop engines," said Klahr. "The engines were Spark, Impala, Hive, and a newer entrant, Presto. We used the same cluster size for the benchmark that we had used in previous benchmarking."

    What AtScale found is that there was no clear engine winner in every case, but that some engines outperformed others depending on what the big data processing task involved. In one case, the benchmark looked at which Hadoop engine performed best when it came to processing large SQL data queries that involved big data joins.

    "There are companies out there that have six billion row tables that they have to join for a single SQL query," said Klahr. "The data architecture that these companies use include runtime filtering and pre-filtering of data based upon certain data specifications or parameters that end users input, and which also contribute to the processing load. In these cases, Spark and Impala performed very well. However, if it was a case of many concurrent users requiring access to the data, Presto processed more data."

    The AtScale benchmark also looked at which Hadoop engine had attained the greatest improvement in processing speed over the past six months.

    "The most noticeable gain that we saw was with Hive, especially in the process of performing SQL queries," said Klahr. "In the past six months, Hive has moved from release 1.4 to 2.1—and on an average, is now processing data 3.4 times faster."
     
    Other Hadoop engines also experienced processing performance gains over the past six months. Spark was processing data 2.4 times faster than it was six months ago, and Impala had improved its processing speed by a factor of 2.8. In all cases, better processing speeds were being delivered to users.

    "What we found is that all four of these engines are well suited to the Hadoop environment and deliver excellent performance to end users, but that some engines perform in certain processing contexts better than others," said Klahr. "For instance, if your organization must support many concurrent users of your data, Presto and Impala perform best. However, if you are looking for the greatest amount of stability in your Hadoop processing engine, Hive is the best choice. And if you are faced with billions of rows of data that you must combine in complicated data joins for SQL queries in your big data environment, Spark is the best performer."

    Klahr said that many sites seem to be relatively savvy about Hadoop performance and engine options, but that a majority really hadn't done much benchmarking when it came to using SQL.

    "The best news for users is that all of these engines perform capably with Hadoop," sad Klahr. "Now that we also have benchmark information on SQL performance, this further enables sites to make the engine choices that best suit their Hadoop processing scenarios."

    Source: techrepublic.com, October 29, 2016

  • Modern Information Management: Understanding Big Data at Rest and in Motion

    Big data is the buzzword of the century, it seems. But, why is everyone so obsessed with it? Here’s what it’s all about, how companies are gathering it, and how it’s stored and used.

    What is it?

    Big data is simply large data sets that need to be analyzed computationally in order to reveal patterns, associations, or trends. This data is usually collected by governments and businesses on citizens and customers, respectively.

    The IT industry has had to shift its focus to big data over the last few years because of the sheer amount of interest being generated by big business. By collecting massive amounts of data, companies like Amazon.com, Google, Walmart, and Target are able to track the buying behaviors of specific customers.

    Once enough data is collected, these companies then use the data to help shape advertising initiatives. For example, Target has used its big data collection initiative to help target (no pun intended) its customers with products it thought would be most beneficial given their past purchases.

    How Companies Store and Use It

    There are two ways that companies can use big data. The first way is to use the data at rest. The second way is to use it in motion.

    At Rest Data – Data at rest refers to information that’s collected and analyzed after the fact. It tells businesses what’s already happened. The analysis is done separately and distinctly from any actions that are taken upon conclusion of said analysis.

    For example, if a retailer wanted to analyze the previous month's sales data, it would use data at rest to look over the previous month's sales totals. Then, it would take those sales totals and make strategic decisions about how to move forward given what's already happened.

    In essence, the company is using past data to guide future business activities. The data might drive the retailer to create new marketing initiatives, customize coupons, increase or decrease inventory, or to otherwise adjust merchandise pricing.

    Some companies might use this data to determine just how much of a discount is needed on promotions to spur sales growth.

    Some companies may use it to figure out how much they are able to discount in the spring and summer without creating a revenue problem later on in the year. Or, a company may use it to predict large sales events, like Black Friday or Cyber Monday.

    This type of data is batch processed since there’s no need to have the data instantly accessible or “streaming live.” There is a need, however, for storage of large amounts of data and for processing unstructured data. Companies often use a public cloud infrastructure due to the costs involved in storage and retrieval.

    Data In Motion – Data in motion refers to data that’s analyzed in real-time. Like data at rest, data may be captured at the point of sale, or at a contact point with a customer along the sales cycle. The difference between data in motion and data at rest is how the data is analyzed.

    Instead of batch processing and analysis after the fact, data in motion typically relies on a bare metal cloud environment, because this type of infrastructure uses dedicated servers that offer cloud-like features without virtualization.

    This allows for real-time processing of large amounts of data. Latency is also a concern for large companies because they need to be able to manage and use the data quickly. This is why many companies send their IT professionals to Simplilearn Hadoop admin training and then load them up on cloud-based training and other database training, such as NoSQL.

    Big Data For The Future

    Some awesome, and potentially frightening, uses for big data are on the horizon. For example, in February 2014, the Chicago Police Department sent uniformed officers to make notification visits to targeted individuals they had identified as potential criminals. They used a computer-generated list which gathered data about those individuals’ backgrounds.

    Another possible use for big data is the development of hiring algorithms. More and more companies are trying to figure out ways to hire candidates without trusting slick resume-writing skills. However, new algorithms may eliminate job prospects based on statistics rather than skill sets. For example, some algorithms find that people with shorter commutes are more likely to stay in a job longer.

    So, people who have long commutes are filtered out of the hiring process quickly.

    Finally, some insurance companies might use big data to analyze your driving habits and adjust your insurance premium accordingly. That might sound nice if you’re a good driver, but insurers know that driving late at night increases the risk for getting into an accident. Problem is, poorer people tend to work late shifts and overnights or second jobs just to make ends meet. The people who are least able to afford insurance hikes may be the ones that have to pay them.

    Source: Mobilemag
