Export Data from Redshift to S3 Bucket, Load Cloud Storage and Query on BQ

Hello Everyone,

My post header is going to the moon sorry for that 🙂

Last week I spent my time on GC but, if your data is in RS you have to unload your data from redshift to s3 cause of GC is not loaded data directly in redshift. So you have to use a bridge for that issue if your programming skills not enough in some scripting or oop languages you can do like below;

1 – Create a scheduled task for execution of below code;

unload (‘select * from test.bietltools where somefields is not null limit 100;’)
to ‘s3://bietltools-external/2018-08-29/bietltools_reports_dm_dt.csv’
iam_role ‘arn:aws:iam::123456789101:role/RedshiftS3Access’;

That query must be executed in Redshift, so if everything is fine on S3;

2 – Create a transfer task in Google Cloud Storage

Get your credentials and go to the cloud storage interface and create a transfer task in GCS from s3, fill text with your own credentials and bucket name etc.

You can find details on cloud storage page when you get your data from s3 to cloud storage.

3 – Create an external table in BigQuery

Create your Big Query table like an external table on BigQuery interface and create your first table with the schema or without the schema.

If you care about header in your file, no worry you can add the fields in big query table options side.

And now you are ready to query your data, enjoy querying 🙂

Installation of Tableau Server Ubuntu On Promise 2018.2

Hi Everyone,

On my last post, I mentioned to Easy DWH and Easy Dashboarding that post is related with my last post.

Lets Start;

Create a folder and store downloaded file in that path.

mkdir tableauinst

wget https://www.tableau.com/downloads/server/deb

Step 1: Install Tableau Server package and start Tableau Services Manager

Install Tableau Server with your distribution’s package manager, then run a script to initialize Tableau Services Manager (TSM). Tableau Services Manager is the management toolset used to install, configure, and manage Tableau services.

The initialize script is included with the installed package.

1 – Log on as a user with sudo access to the computer where you want to install Tableau Server.

2 – Navigate to the directory where you copied the Tableau Server installation package.

3 – Use the package manager to install the Tableau Server package.

sudo apt-get update
sudo apt-get -y install gdebi-core
sudo gdebi -n tableau-server-<version>_amd64.deb

4 – Run the following script to start TSM:

sudo ./initialize-tsm –accepteula

5 – After initialization is complete, close the terminal session:
logout

6 – Navigate to the scripts directory:

cd /opt/tableau/tableau_server/packages/scripts.<version>

7 – If your organization uses a forward proxy solution to access the internet, then configure Tableau Server to use the proxy server. Tableau Server must access the internet for map data and for default licensing functionality.

Step 2: Activate and register Tableau Server

Before you can configure Tableau Server you must activate a license and register.

Beginning by logging on to the TSM web UI. See Sign in to Tableau Services Manager Web UI.

Step 3: Configure general server settings

The most important configuration on this Setup page is the identity store option.

I did Local Auth, so you can specify others, check on Tableau website.

Step 4: Create the Tableau Server administrator account

Create the Tableau Server administrator account.

  • If you are using LDAP for authentication, then the account that you specify here must be a user in the directory.

    Run the following command:

    tabcmd initialuser --server 'localhost:80' –username '<AD-user-name>'

  • On the other hand, if you are running Tableau Server with local authentication, the username and password that you specify here will be used to create the administrative account. Enter a strong password for this account.

    Run the following command:

    tabcmd initialuser --server 'localhost:80' --username 'admin'

Use this account to access the Tableau Server admin web pages.

Step 5: Configure local firewall (optional)

We recommend that you run a local firewall on the computer that is running Tableau Server. This is a security best practice. By default, Linux distributions do not enable firewall during standard installations.

If you have installed or enabled a local firewall then you must open two ports for Tableau Server. These ports are the gateway port (TCP 80) and the tabadmincontroller port (TCP 8850). The following procedure shows an example of how to open these ports using Firewalld, which is the default firewall on CentOS. If you are using a different firewall then you’ll need to determine the right commands to run to open these ports.

  1. Start firewalld:

    sudo systemctl start firewalld

  2. Set default zone to public. Run the following command:

    sudo firewall-cmd --set-default-zone=public

  3. Add ports for the gateway port and the tabadmincontroller port. Run the following commands:

    sudo firewall-cmd --permanent --add-port=80/tcp

    sudo firewall-cmd --permanent --add-port=8850/tcp

  4. Reload the firewall and verify the settings. Run the following commands:

    sudo firewall-cmd --reload

    sudo firewall-cmd --list-all

    Bingo, Tableau is working now, just a sample from tableau dashboard 🙂

Analytics on GC – BQ

Hello Everyone,

Last a few days I spent the time to find some solutions for easy DWH and easy Dashboarding.

Lets Start;

 

1Create a GC account

 

Which is promoted 300 $ per 1st year.

https://cloud.google.com/gcp/

2 – Create A project and enable billing for that project.

 

Just write your credit card information for 1$ sample payment, Google will send again your bank account.

If you have some data from your current DWH or some files on somewhere;

3Create a Cloud Storage for BQ
set getting data from some sources

4Create a Sync Job for s3 or wherever you want to get data sources

I’m getting my data from s3, you have to set a name for source and one name for a destination, so I mention that like s3toGCstorage source and for destination destinations3toGCstorage.

And now you have your data on GC Storage, it scheduled and it works fine.

5Create a dataset on BigQuery

bq mk BigTableau

6Create a Table and Load Data from GCStorage
On webui or activate cloud shell

bq –location=[LOCATION] load –source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

In the end, you scheduled your task on GC Storage, that task getting your data from your sources, you created a BQ table to the struct that your data, and now we have to create a dashboard or whatever.

I will install tableau server for my issue, but you can use data studio in GC or whatever you want, in last a few years lots of Dashboard tools support to BQ for sourcing.

I will mention that in another post.

Tableau-Server Version Upgrade on Ubuntu

Tableau Server Pre-Upgrade

Create an image of Instance of the server, backup_tableau-1_24082018

Connect to tableau servers with ssh and login TSM on the command line.

tsm login -u XXXXX (which is your admin tsm user)

Perform a Full Backup and Restore of Tableau Server
There are two types of backup data that Tableau Server can generate. We recommend performing regular backups of each type in case you must restore a server in a recovery scenario:

1 – Repository data: Tableau Server data consists of the Tableau PostgreSQL database or repository, which contains workbook and user metadata, data extract files, and site configuration data. When you use TSM to create a backup, all of this data is saved in a single file with a .tsbak extension.

tsm maintenance backup -f tableau_server<version>.tsbak -d

2 – Configuration and Topology data: includes most of the server configuration information required to fully recover a server. SMTP, alerting, authentication, are all examples of configuration data that are exportable for backup. Topology data defines how your Tableau Server processes are configured in both single-server and multiple node deployments.

tsm settings export -f configuration.json

Start to Upgrade

1 – Log on as a user with sudo access to the computer you are upgrading.

Download which version you want to get on Tableau Web site:

1.a) wget https://downloads.tableau.com/esdalt/2018.2.0/tableau-server-2018-2-0_amd64.deb

2 – Use the package manager to install the Tableau Server package

2.a) sudo gdebi -n tableau-server-2018-2-0_amd64.deb

3 – After you have installed the new package on every node in your cluster

3.a) tsm stop

4 – With Tableau Server stopped, run the following command on any one node.

4.a) sudo /opt/tableau/tableau_server/packages/scripts.20182.18.0627.2230/upgrade-tsm –accepteula -u XXXXX (which is your admin tsm user)

5 – The upgraded Tableau binary directory will be added to PATH for new shells. To get the updated path, either start a new session, or for bash users run:

5.a) source /etc/profile.d/tableau_server.sh

6 – Start Tableau Server:

6.a) tsm start -u XXXXX (which is your admin tsm user)

Confirm that everything works as expected, these are some areas of testing to consider:

Server processes:
Sign in to Tableau Server as a server administrator, and then open the Server Status page to confirm that all services and processes are running as expected (including on all additional nodes if this is a distributed installation).

User access:
Confirm that Tableau Server users can sign in. Test your normal user sign in process. Have some of your users participate in the testing to make sure they are able to sign in as expected, and that they can get to the same content that they have access to in your production environment.

Publishing workbooks and data sources:
Have users publish workbooks and data sources from Tableau Desktop to make sure this goes as you expect.

Viewing published workbooks:
Have users who are familiar with the content try to view published workbooks to make sure they appear as expected. Test views embedded in web pages (for example, in SharePoint pages).

Subscriptions and extract refreshes :
Manually run some extract refreshes to confirm that they complete successfully. Run some key scheduled extract refreshes to confirm that they complete as expected.

Permissions:
Confirm that permissions are still set as expected for users and content.

Command-line utilities and APIs. If applicable, test the command line utilities (tsm and tabcmd) and programmatic access via APIs.

tsm status -v if something going wrong try to figure out tsm restart command.

If tsm is not started try to check logs with below command;
The upgrade log, app-upgrade.log, is located in /var/opt/tableau/tableau_server/logs

Why mysql is not used on big data?

Hi everyone,

I try to check database management system which is most used or which is most valueable in the world. I found a website like compare the databases for general structure.

I compared them on a link, you can check below, but i didnt understood to why mysql is not used in big data. If anyone can answer this question, you can contact me on about-contact page.

http://db-engines.com/en/system/DB2%3BMicrosoft+SQL+Server%3BMySQL%3BOracle%3BPostgreSQL

By the way, if you want to follow rank of databases you can catch on below link;

http://db-engines.com/en/ranking/relational+dbms

Cheers,

Talih Bayram

Career Roadmapping About Data

In following up on Dave Wells’ recent piece titled The Evolution (and Opportunity) of IT Careers, Jennifer takes a different look at the challenges of trying to understand why some people are happy and successful in their careers, while others just continue to struggle.


The concept of program, project, and operations as a significant career influence is one that I’ve worked with for years, and over that time it has inspired quite a bit of discussion. While Dave’s ideas about information, data, and systems are interesting, I think that they need to be slightly adjusted from a career perspective. Instead of information, data, and systems, let’s look at information, data, and technology. Today’s reality is that information and technology are diverging. So what makes sense from a career perspective is data, information, and technology roles.

jhay_jul01

Now let’s intersect that with program, project, and operations to adjust the grid from the previous article. It now looks like the image in Figure 1.

In my career guidance role, I see a lot of potential value to use this view as a framework to build career roadmaps.

A career roadmap is a navigational concept that shows not only where you’ve been in your career but more interestingly where you aspire to go as your career unfolds into the future. At a high level, the roadmap looks at progression through the nine tiles illustrated in the diagram.

jhay_jul02

Let’s look at the career roadmap for a persona that we’ll call Raoul. He started his career as a maintenance programmer, which clearly places him in the bottom right corner. Within a few years he moved from maintenance programming to software development, placing him at the intersection of project work and technology. His project work has sparked an interest in data and he now aspires to become a data architect at the intersection of data program work. Raoul’s career roadmap options are illustrated in Figure 2.

Raoul has a number of options to consider. He could stay at the project level and move horizontally into data as a data modeler (that’s path 1 in red), then move from data modeler to data architect.

jhay_jul03

He could stay in the technology space, moving from project to program level by becoming a systems architect followed by a shift to data architect (that’s path 2 in purple). He could move diagonally from technology/project to data/program, but that’s a pretty aggressive move and likely more difficult to achieve. The edge-to-edge moves illustrated by paths 1 and 2 tend to be easier because the gaps between tiles are not as great as when attempting a corner-to-corner move such as illustrated by path 3.

So which path makes best sense for Raoul? Following path 1, he gets advantage from his project achievements and relationships, and begins to develop data experience. Following path 2 he gets an advantage from his technology and programming achievements and relationships and shifts from developer to architect. Each works as a step along the path to his ultimate goal of becoming a data architect. The best path depends on a combination of his interests and the job opportunities that are available to him.

Now let’s look at Lucy, another persona. At various times throughout her career, Lucy has worked as a business data analyst (project/information), a DBA (operations/data), and a data architect. Lucy’s data architect role was especially interesting because some of the most important work that she performed was as a liaison between the architecture group and project team. She was working not in a single tile but in two adjacent tiles – program/data and project/data. Lucy’s career roadmap is illustrated in figure 3.

jhay_jul04.png

Today, combining her business, technical, and data experience she believes that she is a natural fit for a lead data steward role. But where does data steward fit in the framework? It doesn’t really fit into any of the tiles, nor is it represented by two or more adjacent tiles. The data steward role is an example of working in the “white space” that separates all of the tiles. White space jobs are often the most interesting of all, and they’re certainly important as essential roles that connect all of the pieces.

Summary

I’m in absolute agreement with Dave Wells in that the use of roadmaps in career planning will continue to grow as the field expands to include big data, analytics, and other advances. There will be interesting times ahead of us as technology demands increase and the IT field diversifies with business units assuming many roles that have traditionally existed in IT departments. Rapid evolution of both technology and skills will continue to be the norm as abundant opportunities emerge for every data, information, and systems professional.

 

Thank you so much for this amazing article to Jennifer Hay.

I’m glad to read Jennifer’s suggestions on this article and so sharing.

You can see original article on tdan.com

http://tdan.com/career-roadmapping/20012

It’s all in the Data: Everybody is a Data Steward

A Data Steward is someone that has formal accountability for data in the organization. I say that everybody in the organization is a Data Steward. You may disagree with me or think that this idea is preposterous; however, I hope to change your mind by the end of this short column. Please give me five minutes.

Data-Column

My premise is based on the fact that everybody that comes in contact with data should have formal accountability for that contact. In other words, people that define, produce, and use data must be held accountable for how they define, produce, and use the data. This may be common sense, but the truth is that this is not taking place. Formalizing accountability to execute and enforce authority over data is the essence of using stewardship to govern data.

Most people agree that everybody that uses sensitive data must protect that data. The sensitive data may contain PII data (personally identifiable information) or PHI data (personal health information) or even IP data (intellectual property) that has a clear set of rules associated with how that data can be shared and who can have access to that data. The rules may be external as in the case of PII and PHI data, or the rules can be internal as in the case of IP data. But one thing is for certain: there are rules associated with at least some of your data.

The truth is that the rules for protecting sensitive data must 1) apply equally to everybody that comes in contact with sensitive data, 2) everybody must know and live the rules, 3) the rules must be formally enforced, and 4) the ability to demonstrate that people are following the rules must be auditable. This, my friends, is what I am proving in this column. Everybody that uses sensitive data must be held formally accountable for how they use the data. Therefore, they are, by my definition, a Data Steward. A Non-Invasive Data Governance™ program focuses on formalizing that level of data usage accountability.

Data Usage is only one facet of the Everybody is a Data Steward notion. What about people that define or produce data? Shouldn’t they also have formal accountability for their actions? The answer to that question is ‘Yes.’

People that define data – either by entering the data or finding new data sources, creating new systems, creating new databases, or propagating new spread-marts that will be used for decision making – should be held formally accountable for checking to see what already exists before producing, as an example, another version of the customer. People that define the ‘golden record’ or system-of-record or master data resources for your organization should be held formally accountable for the quality and value of the definition of that data.

Non-Invasive Data Governance™ recognizes the data producers as stewards of the data as well. If you produce data one of the ways mentioned previously, it is important that you understand the impact you have on the value of that data to the organization. Accepting default values may or may not be a good thing. Entering dummy data where real data is required is never a good thing. Allowing data that is not up to standards to enter your data resources may wreak havoc on decision-making. Calculating profitability may be inconsistent from product to product. People that produce data – through their functions and processes – should be held accountable for how they produce that data including the quality, accuracy, and value of the data they produce.

It all boils down to whether or not you believe that everybody with a relationship to the data should be held formally accountable for that relationship. Basically, every person in your organization has a relationship to the data. Therefore,Everybody is a Data Steward.

The idea that Everybody is a Data Steward may scare you a smidge. Most data governance programs do not follow the thinking that everybody in the organization is a data steward. In fact, most programs assign or hire people to be data stewards. The Non-Invasive Data Governance™ approach allows for certain people to be stewards at a more tactical level (subject matter experts), but the approach calls for identifying or recognizing these people based on their existing levels of authority associated with their data domains.

Are you convinced yet that Everybody is a Data Steward? Does this concept mean that your data governance program will become, in some way, more complex?  From my experience the answer is ‘not necessarily’. It depends on how you communicate and address this main tenet of data stewardship. The Everybody is a Data Steward notion guarantees that accountability for data is consistent across the organization for everybody.

Thanks for tdan community, for this article.

You can see original article in this link;

http://tdan.com/its-all-in-the-data-everybody-is-a-data-steward-get-over-it/20003

Different Types of Databases Top 12

1.0 Relational Databases

This is the most common of all the different types of databases. In this, the data in a relational database is stored in various data tables. Each table has a key field which is used to connect it to other tables. Hence all the tables are related to each other through several key fields. These databases are extensively used in various industries and will be the one you are most likely to come across when working in IT.

Examples of relational databases are Oracle, Sybase and Microsoft SQL Server and they are often key parts of the process of software development. Hence you should ensure you include any work required on the database as part of your project when creating a project plan and estimating project costs.

2.0 Operational Databases

In its day to day operation, an organisation generates a huge amount of data. Think of things such as inventory management, purchases, transactions and financials. All this data is collected in a database which is often known by several names such as operational/ production database, subject-area database (SADB) or transaction databases.

An operational database is usually hugely important to Organisations as they include the customer database, personal database and inventory database ie the details of how much of a product the company has as well as information on the customers who buy them. The data stored in operational databases can be changed and manipulated depending on what the company requires.

3.0 Database Warehouses

Organisations are required to keep all relevant data for several years. In the UK it can be as long as 6 years. This data is also an important source of information for analysing and comparing the current year data with that of the past years which also makes it easier to determine key trends taking place. All this data from previous years are stored in a database warehouse. Since the data stored has gone through all kinds of screening, editing and integration it does not need any further editing or alteration.

With this database ensure that the software requirements specification (SRS) is formally approved as part of the project quality plan.

4.0 Distributed Databases

Many organisations have several office locations, manufacturing plants, regional offices, branch offices and a head office at different geographic locations. Each of these work groups may have their own database which together will form the main database of the company. This is known as a distributed database.

5.0 End-User Databases

There is a variety of data available at the workstation of all the end users of any organisation. Each workstation is like a small database in itself which includes data in spreadsheets, presentations, word files, note pads and downloaded files. All such small databases form a different type of database called the end-user database.

6.0 External Database

There is a sea of information available outside world which is required by an organisation. They are privately-owned data for which one can have conditional and limited access for a fortune. This data is meant for commercial usage. All such databases outside the organisation which are of use and limited access are together called external database.

7.0 Hypermedia Database

Most websites have various interconnected multimedia pages which might include text, video clips, audio clips, photographs and graphics. These all need to be stored and “called” from somewhere when the webpage if created. All of them together form the hypermedia database.

Please note that if you are creating such a database from scratch to be generous when creating a project plan, detailed when defining the business requirements documentation (BRD) and meticulous in your project cost controls. I have seen too many projects where the creation of one of these databases has caused scope creep and an out of control budget for a project.

8.0 Navigational Database

Navigational database has all the items which are references from other objects. In this, one has to navigate from one reference to other or one object to other. It might be using modern systems like XPath. One of its applications is the air flight management systems.

9.0 In-Memory Database

An in-memory databases stores data in a computer’s main memory instead of using a disk-based storage system. It is faster and more reliable than that in a disk. They find their application in telecommunications network equipments.

10.0 Document-Oriented Database

A document oriented database is a different type of database which is used in applications which are document oriented. The data is stored in the form of text records instead of being stored in a data table as usually happens.

11.0 Real-Time Database

A real-time database handles data which constantly keep on changing. An example of this is a stock market database where the value of shares change every minute and need to be updated in the real-time database. This type of database is also used in medical and scientific analysis, banking, accounting, process control, reservation systems etc. Essentially anything which requires access to fast moving and constantly changing information.

Assume that this will require much more time than a normal relational database when it comes to the software testing life cycle, as these are much more complicated to efficiently test within normal time frames.

12.0 Analytical Database

An analytical database is used to store information from different types of databases such as selected operational databases and external databases. Other names given to analytical databases are information databases, management databases or multi-dimensional databases. The data stored in an analytical database is used by the management for analysis purposes, hence the name. The data in an analytical database cannot be changed or manipulated.

Different Types of Databases Top 12 – Tip

Of the different types of databases, relational is the most common and includes such well known names as Oracle, No-SQL, Couchbase, Hadoop, Sybase and SQL Server. However as a project manager you need to be prepared for anything, hence why having a high level view of the different databases is useful particularly when managing a software development life cycle. Regarding the remainder, you will hear a great deal about database warehouses. This is a highly specialized area which involves mining the data produced to generate meaningful trends and reports for senior management to act upon.

I used referenced web site for this post : my-project-management-expert.com

We have a referrer link, for original article of this post, if you want you can follow my-project-management-expert.com

Special thanks for my-project-management-expert team, for this article, and you can see the post at below link;

http://www.my-project-management-expert.com/different-types-of-databases.html

What are the different types of databases?

QUICK ANSWER

The different types of databases include operational databases, end-user databases, distributed databases, analytical databases, relational databases, hierarchical databases and database models. Databases are classified according to their type of content, application area and technical aspect. For instance, a deductive database combines logic programming with a relational database, while a graph database uses graph structures to represent and store information.

FULL ANSWER

Other types of databases include hypertext databases, mobile databases, parallel databases, active databases, cloud databases, in-memory databases, spatial databases, temporal databases, real-time databases, probabilistic databases and embedded databases.

A database is an organized collection of data. Its primary function is to interact with a database management system to capture and analyze data. A database management system is a software system designed to allow the creation, querying and administration of databases. Some popular database management systems include PostgreSQL, MySQL, Microsoft SQL Server, Oracle, IBM DB2 and SAP.

Databases are designed to operate large amounts of information by inputting, storing, retrieving and managing it. They are set up in a way that allows users to easily and intuitively gain access to all the information. A database management maintains the integrity and security of stored data. It is also used for data recovery, in case of system failure.

 

What is Database (DB) ?

(1) Often abbreviated DB, a database is basically a collection of information organized in such a way that a computer program can quickly select desired pieces of data. You can think of a database as an electronic filing system.

Traditional databases are organized by fields, records, and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. For example, a telephone book is analogous to a file. It contains a list of records, each of which consists of three fields: name, address, and telephone number.

DATABASE

An alternative concept in database design is known as Hypertext. In a Hypertext database, any object, whether it be a piece of text, a picture, or a film, can be linked to any other object. Hypertext databases are particularly useful for organizing large amounts of disparate information, but they are not designed for numerical analysis.

To access information from a database, you need a database management system (DBMS). This is a collection of programs that enables you to enter, organize, and select data in a database.

(2) Increasingly, the term database is used as shorthand for database management system. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes.

I used referenced web site for this post : webopedia.com

We have a referrer link, for original article of this post, if you want you can follow webopedia.com

Special thanks for webopedia team, for this article, and you can see the post at below link;

http://www.webopedia.com/TERM/D/database.html