How Open Source Databases Improve Data Management Options for Marketers

When researching the latest marketing trends, we often see and hear the term “martech.” The “mar” portion of that amalgam stands for marketing, but the “tech” part has multiple meanings because so much new technology is seemingly being introduced every day.

One of those “meanings” is a development in data visualization called a “graph database.” It is a tool with significant analytic advantages for managing digital marketing strategy. Open source graph databases allow users to view data relationships in real time. The benefit to marketers is that they allow analysis that goes beyond descriptive analytics (metrics that explain what happened) to data mining techniques (discovering the causes of what happened). Open source databases also provide fertile ground for investigating data management concepts, such as identifying machine learning activity among IoT-enabled devices.

Leading the charge among open source graph databases is Neo4j, an offering from San Mateo, Calif.-based Neo Technology. Neo4j presents query results as nodes, with relationship lines drawn between the nodes according to the query parameters. This lets users see relationships among the data elements that would otherwise be invisible in a tabular format.

The rising interest in open databases like Neo4j is a by-product of the rising volume of social data being harnessed for analysis. Social media, along with data access via APIs, has created a large ecosystem of data sources, leaving marketers in need of a better understanding of the relationships within the data. Social data is big data, as it fulfills the classic “3Vs” definition assigned by experts: it has volume, velocity, and variety.

Many firms are starting to understand the strategic need for open source databases. At the O’Reilly OSCON conference in Austin this past May, I had an opportunity to speak with Michael Hunger, Developer Relations at Neo Technology. Hunger has managed the developer community that has grown around Neo4j and explained how database needs have evolved to benefit enterprises.

“Neo4j has a long history, and was originally used for analytics,” Hunger explained. “Many initial case studies included a lot of relationship analysis, such as synonyms across different languages that appear in a natural language processing. But if you do not have a database optimized for relationships you get into join hell, with tens of hundreds of joins to maintain and a lengthy query time as a result….You can use graph databases as a cost-efficient way to do dry runs of deployments. It was a niche purpose, but now we see more mainstream brick and mortar business using graph databases.”

So where should marketers start with their database initiatives? Like any data scientist facing a data-influenced project, marketers must start with the data. Before modeling with an imagined relationship in mind, analysts can validate data and refine its quality with solutions that can import a CSV data file. In fact, there are several CSV validation kits that support the various programming languages used for automated tasks. CSVKit, for example, works with Python for those who are using that language alongside Neo4j, while PapaParse works within JavaScript code. In either case, marketers can plan according to the developer resources at hand. And if no such resources are at hand, another tool, CSVLint, can validate files uploaded online, avoiding programming concerns.
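As a rough illustration of what such a validation pass checks, here is a minimal sketch that uses only Python's standard `csv` module rather than any of the tools named above. The function name `validate_csv` and the column names `customer_id` and `channel` are hypothetical, chosen only for the example.

```python
import csv
import io


def validate_csv(text, required_columns):
    """Return a list of human-readable problems found in CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    # Every expected column must be present in the header row.
    missing = [col for col in required_columns if col not in header]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
        return problems

    # Data rows start on line 2 because line 1 is the header.
    # (Line numbers are approximate if the file has blank lines
    # or quoted fields that span several lines.)
    for line_no, row in enumerate(reader, start=2):
        # DictReader stores surplus fields under the key None.
        if None in row:
            problems.append(f"line {line_no}: too many fields")
        for col in required_columns:
            value = row.get(col)
            if value is None or not value.strip():
                problems.append(f"line {line_no}: empty value for '{col}'")
    return problems


# Hypothetical sample: the second data row has no channel.
sample = "customer_id,channel\n101,email\n102,\n"
print(validate_csv(sample, ["customer_id", "channel"]))
```

A check like this catches missing columns and empty cells before the file is loaded into a graph; dedicated tools typically cover far more cases, such as encoding and type problems.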

Marketers can then start planning data queries using the graph database of choice. Most graph databases can be downloaded easily, like any other open source software. And while there are variations, most query languages that support the databases are made to blend into existing environments. Neo4j uses Cypher, a query language with some SQL-like conventions. But Cypher also has advantages, such as requiring less code to merge data sources, which makes automation maintenance more straightforward.
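To give a feel for how compact a Cypher merge can be, the sketch below keeps a Cypher statement as a plain Python string and prepares the parameters it expects. The labels `Customer` and `Product`, the `PURCHASED` relationship, and the field names are assumptions made up for illustration, not part of any real schema. The code does not connect to a database; the statement and parameters would be handed to whichever Neo4j client is in use.

```python
# Cypher statement held as a string. MERGE creates a node or relationship
# only if it does not already exist, so re-running it adds no duplicates.
MERGE_PURCHASES = """
UNWIND $rows AS row
MERGE (c:Customer {id: row.customer_id})
MERGE (p:Product {sku: row.sku})
MERGE (c)-[:PURCHASED]->(p)
"""


def build_params(records):
    """Keep only records that have both keys and wrap them for $rows."""
    rows = [
        {"customer_id": r["customer_id"], "sku": r["sku"]}
        for r in records
        if r.get("customer_id") and r.get("sku")
    ]
    return {"rows": rows}


# Hypothetical records, e.g. rows read from two different CSV exports.
records = [
    {"customer_id": "101", "sku": "A-1"},
    {"customer_id": "102", "sku": ""},  # dropped: no product
    {"customer_id": "101", "sku": "B-7"},
]
params = build_params(records)
print(len(params["rows"]), "rows ready to merge")
```

The four-line statement does the work of matching customers and products from separate sources and linking them, which in a relational setup would usually involve several tables and joins.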

The purpose of a data graph can vary. In one instance, graph databases can manage machine learning prototyping, a development phase in which analysts test various models using historical and live data. In another, an analyst can create a network device dependency tree to answer questions such as which marketing content can support consumer interaction with a networked device, or which devices and data are affected when one device is removed for servicing.
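The servicing question can be sketched in a few lines of plain Python. This is not how a graph database stores or queries data, only an illustration of the underlying idea, and the device names and dependencies below are hypothetical.

```python
from collections import deque

# Hypothetical dependency data: each key is a device or data feed,
# and each value lists what depends on it directly.
DEPENDENTS = {
    "router": ["thermostat", "camera"],
    "thermostat": ["energy_dashboard"],
    "camera": ["motion_alerts"],
    "energy_dashboard": [],
    "motion_alerts": [],
}


def impacted_by(device, dependents):
    """Breadth-first walk listing everything downstream of a device."""
    seen = {device}  # never report the removed device itself
    order = []
    queue = deque(dependents.get(device, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited, which also guards against cycles
        seen.add(current)
        order.append(current)
        queue.extend(dependents.get(current, []))
    return order


print(impacted_by("router", DEPENDENTS))
# → ['thermostat', 'camera', 'energy_dashboard', 'motion_alerts']
```

In a graph database the same question becomes a single relationship traversal, which is where the approach pays off as the number of devices grows.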

Databases have come a long way from being storage units for data. With new dynamics in how data is generated, open source databases–and the analysis they can support–are becoming essential for companies looking to gain the most from martech and stay ahead of competitors.
