Analytics on GC – BQ

Hello Everyone,

Over the last few days I spent some time looking for simple solutions for an easy DWH and easy dashboarding.

Let's start:

 

1 – Create a GC account

 

Google promotes the free trial with $300 of credit for your first year.

https://cloud.google.com/gcp/

2 – Create a project and enable billing for that project.

 

Just enter your credit card information; Google makes a small (about $1) verification charge and then refunds it to your account.
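If you prefer the command line, a rough sketch of the same step with the gcloud CLI looks like this; the project ID and billing account ID are placeholders, and depending on your gcloud version the billing command may live under gcloud beta billing:

# Create a new project (the ID is a placeholder you choose)
gcloud projects create my-bq-analytics --name="BQ Analytics"

# Point the CLI at the new project
gcloud config set project my-bq-analytics

# Link a billing account to the project (the account ID is a placeholder)
gcloud billing projects link my-bq-analytics --billing-account=000000-AAAAAA-000000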

If you have data in your current DWH or files sitting somewhere else:

3 – Create a Cloud Storage bucket for BQ, to stage the data you pull from your sources.
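For example, a one-line sketch for creating the staging bucket with gsutil (the bucket name and location are placeholders):

# Create a bucket in the US multi-region to stage files for BigQuery
gsutil mb -l US gs://my-bq-staging-bucket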

4 – Create a sync job for S3, or wherever your source data lives.

I’m getting my data from S3. You have to set a name for the source and a name for the destination; I named mine s3toGCstorage for the source and destinations3toGCstorage for the destination.

And now you have your data on GC Storage; the job is scheduled and it works fine.
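I set the transfer job up in the GC console (Storage Transfer Service), so there is no single command to show for it. As a rough scripted alternative, a gsutil rsync call like the sketch below does the same one-way copy, assuming your AWS access keys are configured in ~/.boto and using placeholder bucket names:

# One-way sync from the S3 source bucket into the GC Storage staging bucket
gsutil rsync -r s3://my-source-bucket gs://my-bq-staging-bucket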

5 – Create a dataset on BigQuery

bq mk BigTableau

6 – Create a table and load data from GC Storage
Use the web UI, or activate Cloud Shell and run:

bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]
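As a concrete (made-up) example, loading a CSV file from the staging bucket into the BigTableau dataset could look like the sketch below; the table name, file path, and inline schema are placeholders for illustration:

# Load a CSV with a header row into a new table, declaring the schema inline
bq --location=US load --source_format=CSV --skip_leading_rows=1 \
    BigTableau.sales \
    gs://my-bq-staging-bucket/sales.csv \
    order_id:STRING,order_date:DATE,amount:FLOAT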

In the end, you have a scheduled task on GC Storage that pulls data from your sources, and a BQ table that matches the structure of that data. Now we have to create a dashboard.
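Before pointing a dashboard at it, a quick sanity check from Cloud Shell confirms the load worked; the table and column here are the placeholder names from the example above:

# Standard SQL row count on the freshly loaded table
bq query --use_legacy_sql=false 'SELECT COUNT(*) AS row_count FROM BigTableau.sales'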

I will install Tableau Server for my case, but you can use Data Studio in GC or whatever you want; over the last few years, many dashboard tools have added BQ as a data source.

I will mention that in another post.

Career Roadmapping About Data

In following up on Dave Wells’ recent piece titled The Evolution (and Opportunity) of IT Careers, Jennifer takes a different look at the challenges of trying to understand why some people are happy and successful in their careers, while others just continue to struggle.


The concept of program, project, and operations as a significant career influence is one that I’ve worked with for years, and over that time it has inspired quite a bit of discussion. While Dave’s ideas about information, data, and systems are interesting, I think that they need to be slightly adjusted from a career perspective. Instead of information, data, and systems, let’s look at information, data, and technology. Today’s reality is that information and technology are diverging. So what makes sense from a career perspective is data, information, and technology roles.

[Image: jhay_jul01]

Now let’s intersect that with program, project, and operations to adjust the grid from the previous article. It now looks like the image in Figure 1.

In my career guidance role, I see a lot of potential value in using this view as a framework to build career roadmaps.

A career roadmap is a navigational concept that shows not only where you’ve been in your career but more interestingly where you aspire to go as your career unfolds into the future. At a high level, the roadmap looks at progression through the nine tiles illustrated in the diagram.

[Image: jhay_jul02]

Let’s look at the career roadmap for a persona that we’ll call Raoul. He started his career as a maintenance programmer, which clearly places him in the bottom right corner. Within a few years he moved from maintenance programming to software development, placing him at the intersection of project work and technology. His project work has sparked an interest in data and he now aspires to become a data architect at the intersection of data and program work. Raoul’s career roadmap options are illustrated in Figure 2.

Raoul has a number of options to consider. He could stay at the project level and move horizontally into data as a data modeler (that’s path 1 in red), then move from data modeler to data architect.

[Image: jhay_jul03]

He could stay in the technology space, moving from project to program level by becoming a systems architect followed by a shift to data architect (that’s path 2 in purple). He could move diagonally from technology/project to data/program, but that’s a pretty aggressive move and likely more difficult to achieve. The edge-to-edge moves illustrated by paths 1 and 2 tend to be easier because the gaps between tiles are not as great as when attempting a corner-to-corner move such as illustrated by path 3.

So which path makes best sense for Raoul? Following path 1, he gets advantage from his project achievements and relationships, and begins to develop data experience. Following path 2 he gets an advantage from his technology and programming achievements and relationships and shifts from developer to architect. Each works as a step along the path to his ultimate goal of becoming a data architect. The best path depends on a combination of his interests and the job opportunities that are available to him.

Now let’s look at Lucy, another persona. At various times throughout her career, Lucy has worked as a business data analyst (project/information), a DBA (operations/data), and a data architect. Lucy’s data architect role was especially interesting because some of the most important work that she performed was as a liaison between the architecture group and project team. She was working not in a single tile but in two adjacent tiles – program/data and project/data. Lucy’s career roadmap is illustrated in figure 3.

[Image: jhay_jul04]

Today, combining her business, technical, and data experience she believes that she is a natural fit for a lead data steward role. But where does data steward fit in the framework? It doesn’t really fit into any of the tiles, nor is it represented by two or more adjacent tiles. The data steward role is an example of working in the “white space” that separates all of the tiles. White space jobs are often the most interesting of all, and they’re certainly important as essential roles that connect all of the pieces.

Summary

I’m in absolute agreement with Dave Wells in that the use of roadmaps in career planning will continue to grow as the field expands to include big data, analytics, and other advances. There will be interesting times ahead of us as technology demands increase and the IT field diversifies with business units assuming many roles that have traditionally existed in IT departments. Rapid evolution of both technology and skills will continue to be the norm as abundant opportunities emerge for every data, information, and systems professional.

 

Thank you so much to Jennifer Hay for this amazing article.

I'm glad to have read Jennifer's suggestions and to share them here.

You can see the original article on tdan.com:

http://tdan.com/career-roadmapping/20012

It’s all in the Data: Everybody is a Data Steward

A Data Steward is someone that has formal accountability for data in the organization. I say that everybody in the organization is a Data Steward. You may disagree with me or think that this idea is preposterous; however, I hope to change your mind by the end of this short column. Please give me five minutes.


My premise is based on the fact that everybody that comes in contact with data should have formal accountability for that contact. In other words, people that define, produce, and use data must be held accountable for how they define, produce, and use the data. This may be common sense, but the truth is that this is not taking place. Formalizing accountability to execute and enforce authority over data is the essence of using stewardship to govern data.

Most people agree that everybody that uses sensitive data must protect that data. The sensitive data may contain PII data (personally identifiable information) or PHI data (personal health information) or even IP data (intellectual property) that has a clear set of rules associated with how that data can be shared and who can have access to that data. The rules may be external as in the case of PII and PHI data, or the rules can be internal as in the case of IP data. But one thing is for certain: there are rules associated with at least some of your data.

The truth is that the rules for protecting sensitive data must 1) apply equally to everybody that comes in contact with sensitive data, 2) everybody must know and live the rules, 3) the rules must be formally enforced, and 4) the ability to demonstrate that people are following the rules must be auditable. This, my friends, is what I am proving in this column. Everybody that uses sensitive data must be held formally accountable for how they use the data. Therefore, they are, by my definition, a Data Steward. A Non-Invasive Data Governance™ program focuses on formalizing that level of data usage accountability.

Data Usage is only one facet of the Everybody is a Data Steward notion. What about people that define or produce data? Shouldn’t they also have formal accountability for their actions? The answer to that question is ‘Yes.’

People that define data – either by entering the data or finding new data sources, creating new systems, creating new databases, or propagating new spread-marts that will be used for decision making – should be held formally accountable for checking to see what already exists before producing, as an example, another version of the customer. People that define the ‘golden record’ or system-of-record or master data resources for your organization should be held formally accountable for the quality and value of the definition of that data.

Non-Invasive Data Governance™ recognizes the data producers as stewards of the data as well. If you produce data one of the ways mentioned previously, it is important that you understand the impact you have on the value of that data to the organization. Accepting default values may or may not be a good thing. Entering dummy data where real data is required is never a good thing. Allowing data that is not up to standards to enter your data resources may wreak havoc on decision-making. Calculating profitability may be inconsistent from product to product. People that produce data – through their functions and processes – should be held accountable for how they produce that data including the quality, accuracy, and value of the data they produce.

It all boils down to whether or not you believe that everybody with a relationship to the data should be held formally accountable for that relationship. Basically, every person in your organization has a relationship to the data. Therefore, Everybody is a Data Steward.

The idea that Everybody is a Data Steward may scare you a smidge. Most data governance programs do not follow the thinking that everybody in the organization is a data steward. In fact, most programs assign or hire people to be data stewards. The Non-Invasive Data Governance™ approach allows for certain people to be stewards at a more tactical level (subject matter experts), but the approach calls for identifying or recognizing these people based on their existing levels of authority associated with their data domains.

Are you convinced yet that Everybody is a Data Steward? Does this concept mean that your data governance program will become, in some way, more complex?  From my experience the answer is ‘not necessarily’. It depends on how you communicate and address this main tenet of data stewardship. The Everybody is a Data Steward notion guarantees that accountability for data is consistent across the organization for everybody.

Thanks to the TDAN community for this article.

You can see the original article at this link:

http://tdan.com/its-all-in-the-data-everybody-is-a-data-steward-get-over-it/20003

What is a Database (DB)?

(1) Often abbreviated DB, a database is basically a collection of information organized in such a way that a computer program can quickly select desired pieces of data. You can think of a database as an electronic filing system.

Traditional databases are organized by fields, records, and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. For example, a telephone book is analogous to a file. It contains a list of records, each of which consists of three fields: name, address, and telephone number.


An alternative concept in database design is known as Hypertext. In a Hypertext database, any object, whether it be a piece of text, a picture, or a film, can be linked to any other object. Hypertext databases are particularly useful for organizing large amounts of disparate information, but they are not designed for numerical analysis.

To access information from a database, you need a database management system (DBMS). This is a collection of programs that enables you to enter, organize, and select data in a database.
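To make the fields/records/files idea concrete, here is a tiny sketch using the sqlite3 command-line DBMS and the telephone-book example from above; the file, table, and sample values are just for illustration:

# A field is a column, a record is a row, and the .db file plays the role of the file
sqlite3 phonebook.db "CREATE TABLE entries (name TEXT, address TEXT, phone TEXT);"

# Enter one complete record (one set of fields)
sqlite3 phonebook.db "INSERT INTO entries VALUES ('Ada Lovelace', '12 Example St', '555-0100');"

# Select just the desired pieces of data
sqlite3 phonebook.db "SELECT name, phone FROM entries;"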

(2) Increasingly, the term database is used as shorthand for database management system. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes.

I used webopedia.com as the reference for this post.

Special thanks to the webopedia team for this article; you can see the original post at the link below:

http://www.webopedia.com/TERM/D/database.html

Why Data Integration? – The Importance of Data Integration

Almost every Chief Information Officer (CIO) has the goal of integrating their organization’s data. In fact the issue of data integration has risen all the way to the Chief Financial Officer
(CFO) and Chief Executive Officer (CEO) level of a corporation. A key question is why is data integration becoming so important to so many C-level executives? There are several key reasons driving
this desire:

  • Provide IT Portfolio Management
  • Reduce IT Redundancy
  • Prevent IT Applications Failure

 
Provide IT Portfolio Management

Over the years I have had the opportunity to perform dozens of data warehousing assessments. During these assessments I always ask the client how much they spend annually on data warehousing. The
majority of companies and government organizations cannot give a relatively good estimate on what they actually spend. In order to manage these and any other costly information technology (IT)
initiatives it is critical to measure each one of them. However, it is impossible to measure them when most companies do not understand them (see Figure 1: “How To Manage IT”). This is
where IT Portfolio Management enters the picture.

 

[Image: i029fe0401]

 

Figure 1: How To Manage IT

 

IT portfolio management refers to the formal process for managing IT assets. An IT asset is software, hardware, middleware, IT projects, internal staff, applications and external consulting. As with
every newer discipline, many companies that have started their IT portfolio management efforts have not done so correctly. I would like to list out some of the keys to building successful IT
portfolio management applications.

By properly managing its IT portfolio, a corporation can see which projects are proceeding well and which are lagging behind. In my experience, almost every large company has a great
deal of duplicate IT effort occurring (see later section on “Reduce IT Redundancy”). This happens because the meta data is not accessible. At my company we have a couple of large
clients whose primary goal is to remove these tremendous redundancies, which translates into tremendous initial and ongoing IT costs.

Reduce IT Redundancy

CIO is commonly defined as Chief Information Officer; however, there is another possible meaning to this acronym: Career Is Over. One of the chief reasons for this is that most IT departments are
“handcuffed” in needless IT redundancy that too few CIOs are willing and capable of fixing.

There are several CIO surveys that are conducted annually. These surveys ask “what are your top concerns for the upcoming year”. Regardless of the survey you look at “data
integration” will be high on the list. Now data integration has two facets to it. One is the integration of data across disparate systems for enterprise applications. The second is the
integration/removal of IT redundancies. Please understand that some IT redundancy is a good thing. For example, when there is a power outage and one of your data centers is non-operational you need
to have a backup of these systems/data. However, when I talk about IT redundancies I am addressing “needless” IT redundancy. Meaning, IT redundancy that only exists because of
insufficient management of our IT systems. I was working with a Midwestern insurance company that, over a four-year span, had initiated various decision support efforts. After this four-year period
they took the time to map out the flow of data from their operational systems, to their data staging areas and finally to their data mart structures. What they discovered was Figure 2:
“Typical IT Architecture”.

[Image: i029fe0402]

Figure 2: Typical IT Architecture

What is enlightening about Figure 2 is that when I show this illustration during a client meeting or at a conference keynote address the typical response that I receive from the people is
“Where did you get a copy of our IT architecture?” If you work at a Global 2000 company or any large government entity, Figure 2 represents an overly simplified version of your IT
architecture. These poor architecture habits create a litany of problems including:

  • Redundant Applications/Processes/Data
  • Needless IT Rework
  • Redundant Hardware/Software

Redundant Applications/Processes/Data

It has been my experience working with large government agencies and Global 2000 companies that needlessly duplicated data is running rampant throughout our industry. In my experience the typical
large organization has between 3 – 4 fold needless data redundancy. Moreover, I can name multiple organizations that have literally hundreds of “independent” data mart
applications spread all over the company. Each one of these data marts is duplicating the extraction, transformation and load (ETL) that is typically done centrally in a data warehouse. This
greatly increases the number of support staff required to maintain the data warehousing system as these tasks are the largest and most costly data warehousing activities. Besides duplicating this
process, each data mart will also copy the data as well requiring further IT resources. It is easy to see why IT budgets are straining under the weight of all of this needless redundancy.

Needless IT Rework

During the requirements gathering portion of one of our meta data management initiatives I had an IT project manager discuss the challenges that he is facing in analyzing one of the
mission-critical legacy applications that will feed the data warehousing application that his team has been tasked to build. During our interview he stated, “This has to be the twentieth time
that our organization is analyzing this system to understand the business rules around the data.” This person’s story is an all too common one as almost all organizations reinvent the
IT wheel on every project. This situation occurs because usually separate teams will typically build each of the IT systems and since they don’t have a Managed Meta Data Environment (MME),
these teams do not leverage the other’s standards, processes, knowledge, and lessons learned. This results in a great deal of rework and reanalysis.

Redundant Hardware/Software

I have discussed a great deal about the redundant application and IT work that occurs in the industry. All of this redundancy also generates a great deal of needless hardware and software
redundancy. This situation forces the enterprise to retain skilled employees to support each of these technologies. In addition, a great deal of financial savings is lost, as standardization on
these tools doesn’t occur. Often a software, hardware, or tool contract can be negotiated to provide considerable discounts for enterprise licenses, which can be phased into. These economies
of scale can provide tremendous cost savings to the organization.

In addition, the hardware and software that is purchased is not used in an optimal fashion. For example, I have a client that has each one of their individual IT projects buy their own hardware. As
a result, they are infamous for having a bunch of servers running at 25% capacity.

From the software perspective the problem only gets worse. While analyzing a client of mine, I asked their IT project leaders which software vendors they had standardized on. They answered
“all of them!” This leads to the old joke “What is the most popular form of software on the market? Answer…Shelfware!” Shelfware is software that a company purchases
and winds up never using and it just sits on the shelf collecting dust.

Prevent IT Applications Failure

When a corporation looks to undertake a major IT initiative, like a customer relationship management (CRM), enterprise resource planning (ERP), data warehouse, or e-commerce solution, their
likelihood of project failure is between 65% – 80%, depending on the study referenced. This is especially alarming when we consider that these same initiatives traditionally have executive
management support and cost many millions of dollars. For example, I have one large client that is looking to roll out a CRM system (e.g. Siebel, Oracle) and an ERP system (e.g. SAP, PeopleSoft)
globally in the next four years. Their initial project budget is over $125 million! In my opinion they have a 0% probability of delivering all of these systems on-time and on-budget. Consider this,
when was the last time that you’ve seen an ERP or CRM initiative delivered on time or on budget?

When we examine the causes of these project failures, several themes become apparent. First, these projects did not address a definable and measurable business need. This is the number one reason
for project failure, data warehouse, CRM, MME, or otherwise. As IT professionals we must always be looking to solve business problems or capture business opportunities. Second, the projects that
fail have a very difficult time understanding their company’s existing IT environment and business rules. This includes custom applications, vendor applications, data elements, entities, data
flows, data heritage and data lineage.

MME’s Focus On Data Integration

Many of these Global 2000 companies and large government organizations are targeting MME technology to assist them in identifying and removing existing application and data redundancy. Moreover,
many companies are actively using their MME to identify redundant applications through analysis of the data. These same companies are starting IT application integration projects to merge these
overlapping systems and to ensure that future IT applications do not proliferate needless redundancy.

If your organization can reduce its applications, processes, data, software and hardware, lower the likelihood of IT project failure, and speed up the IT development life-cycle, then clearly it
will greatly reduce a company’s IT expenditures. For example, I have a large banking client that asked my company to analyze their IT environment. During this analysis we discovered that they
have a tremendous amount of application and data redundancy. Moreover, I had figured out that they have over 700 unique applications. I then compared this client to a bank that is more than twice
their size; however, this larger bank has a world class MME and uses it to properly manage their systems. As a result, they have less than 250 unique applications. Clearly the bank with more than
700 applications has a great deal of needless redundancy as compared to a bank that is more than twice their size and has less than 250 applications. Interestingly enough the bank that has less
than 250 applications and has a world-class MME is also 14 times more profitable than the bank maintaining over 700 applications. It doesn’t seem like a very far stretch to see that the less
profitable bank would become much more profitable if they removed this redundancy.

I used tdan.com as the reference for this post.

Special thanks to the TDAN (The Data Administration Newsletter) community for this article; you can see the original post at the link below:

http://tdan.com/the-importance-of-data-integration/5198