DataEngine BI
Product introduction
Product features and advantages
H3C DataEngine provides a visual interface for cluster installation and deployment, which makes resource management, host allocation, and similar operations quick and convenient. It supports one-click installation and upgrade as well as graphical operation and maintenance of component services, and it monitors the health status and operating metrics of each service in real time. When a metric exceeds its configured threshold, the platform raises an alarm and emails the administrator, which greatly improves operation and maintenance efficiency.
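The threshold-and-email behavior described above can be illustrated with a minimal Python sketch. It is not DataEngine's monitoring code: the metric names, threshold values, host name, and administrator address are all hypothetical, and the message is only built, not sent.

```python
from email.message import EmailMessage

# Hypothetical thresholds; a real deployment would load these from configuration.
THRESHOLDS = {"cpu_percent": 90.0, "disk_percent": 85.0}


def find_breaches(metrics, thresholds):
    """Return the metrics whose current value exceeds the configured threshold."""
    return {
        name: value
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    }


def build_alert(host, breaches, admin_address):
    """Build (but do not send) an alert email that lists every breached metric."""
    message = EmailMessage()
    message["To"] = admin_address
    message["Subject"] = f"[ALERT] {host}: {len(breaches)} metric(s) over threshold"
    message.set_content(
        "\n".join(f"{name}={value}" for name, value in sorted(breaches.items()))
    )
    return message


# Hypothetical sample reading from one host.
sample = {"cpu_percent": 95.0, "disk_percent": 40.0, "mem_percent": 99.0}
breaches = find_breaches(sample, THRESHOLDS)
alert = build_alert("node-1", breaches, "admin@example.com")
```

Metrics without a configured threshold (here `mem_percent`) are ignored rather than treated as breaches, which keeps the alarm tied to explicit configuration.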
All nodes in the MPP cluster are fully peer-to-peer, and no master node is required. Data loading, data export, and queries run in parallel on all nodes at the same time. Because the architecture is shared-nothing, adding nodes expands the data capacity and computing power of MPP linearly. A cluster can scale out or in between a few nodes and thousands of nodes, or between a few terabytes and tens of petabytes, to keep pace with business growth.
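As an illustration of the shared-nothing idea, the Python sketch below places each row on exactly one node by hashing a distribution key, lets every node aggregate only its local rows, and then combines the partial results. It is a simplified model, not the MPP engine's actual placement algorithm; the function names and the `order-N` keys are assumptions made for the example.

```python
import zlib
from collections import defaultdict


def node_for_key(key, node_count):
    """Map a distribution key to a node index using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Place each (key, value) row on exactly one node; nodes share no storage."""
    nodes = defaultdict(list)
    for key, value in rows:
        nodes[node_for_key(key, node_count)].append((key, value))
    return nodes


def parallel_sum(nodes):
    """Each node sums its local rows; the partial sums are then combined."""
    partial_sums = [sum(value for _, value in local) for local in nodes.values()]
    return sum(partial_sums)


# Hypothetical data set: 1000 rows keyed by an order identifier.
rows = [(f"order-{i}", i) for i in range(1000)]
four_nodes = distribute(rows, 4)
eight_nodes = distribute(rows, 8)
```

The query result is the same for four or eight nodes; only the amount of work per node changes. Note that plain modulo placement moves many keys when the node count changes, so a real system would be expected to use a scheme that limits data movement during expansion.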
In the era of big data, data is generated faster and faster, while compliance and deep mining require more of it to be retained, so databases hold ever more data. Analysis performance, the high cost of high-speed disks, and large capacity requirements often conflict. The tiered storage feature of MPP resolves this conflict. MPP can assign different storage policies to different objects such as schemas, tables, and table partitions, and can assign them different storage locations (storage media that differ in performance, cost, and capacity), which optimizes storage cost.
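One way to picture a per-partition storage policy is the Python sketch below, which assigns date partitions to a tier by age. The tier names, mount points, and age limits are hypothetical and are not taken from the product; the actual policy syntax of MPP is not shown here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StoragePolicy:
    name: str          # tier label (hypothetical)
    location: str      # storage location backing the tier (hypothetical path)
    max_age_days: int  # partitions up to this age are kept on this tier


# Tiers ordered from fastest and most expensive to slowest and cheapest.
POLICIES = [
    StoragePolicy("hot", "/data/ssd", 30),
    StoragePolicy("warm", "/data/sas", 365),
]
COLD = StoragePolicy("cold", "/data/sata", 0)  # fallback; its age limit is unused


def choose_policy(partition_day, today):
    """Pick the first tier whose age limit covers the partition, else cold."""
    age_days = (today - partition_day).days
    for policy in POLICIES:
        if age_days <= policy.max_age_days:
            return policy
    return COLD


today = date(2024, 2, 1)
recent = choose_policy(date(2024, 1, 20), today)    # 12 days old
last_year = choose_policy(date(2023, 6, 1), today)  # 245 days old
archive = choose_policy(date(2020, 1, 1), today)    # several years old
```

Recent partitions that are queried often stay on fast media, while older partitions kept for compliance move to cheaper, larger media.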
MPP has a built-in database optimization designer that encodes expert knowledge. Users only need to specify the logical schema, load sample data, and provide typical SQL queries. The designer then automatically chooses the horizontal distribution of the data and the sort order and compression algorithm of each column, balancing query performance against storage space to optimize the database as a whole.
Through the service-oriented data access platform, the integration of heterogeneous data sources is encapsulated into data service units that are exposed as external services, providing broad data transmission services so that data no longer sits in isolated silos. The platform supports extracting data from DBMSs, the Internet, the Internet of Things, enterprise production systems, and other sources, and quickly stores the processing results in the H3C DataEngine platform. Users no longer need to attend to the underlying data transmission process and can focus on developing upper-layer platform applications.
The platform provides a unified SQL service and programmable APIs that extract processing results from the data storage and computing platform, hide low-level details, and deliver data services to upper-layer applications. The data service interfaces mainly include the programmable APIs of the various computing frameworks (such as the SQL interface and MapReduce, Spark, Storm, and Flink), a full-text search interface, business-oriented interfaces, and an association query interface. Together they serve business applications such as data query, visual BI display, data analysis, and comprehensive query. Interface documentation, a secondary development guide, and secondary development sample programs are provided for developers.
H3C DataEngine implements security authentication based on the Kerberos protocol and uses LDAP as the account management system. It also uses Ranger to provide a unified user and role management system that follows the RBAC (role-based access control) model, in which permissions are managed by binding users to roles. In addition, DataEngine supports audit logging and log retrieval for each component. The entire component management interface supports single sign-on, which contributes to the security and reliability of the platform.
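The RBAC model mentioned above can be sketched in a few lines of Python: permissions are attached to roles, and users obtain permissions only through the roles bound to them. This is a conceptual illustration and not the platform's policy API; the user names, role names, and resources are hypothetical.

```python
# Permissions are granted to roles, never directly to users (hypothetical data).
ROLE_PERMISSIONS = {
    "analyst": {("sales_db", "select")},
    "etl": {("sales_db", "select"), ("sales_db", "insert")},
}

# Users are bound to one or more roles.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"etl"},
}


def is_allowed(user, resource, action):
    """Allow the action only if one of the user's roles grants it."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Changing what a group of users may do then means editing one role rather than every user, and an unknown user or role simply yields no permissions.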
The platform supports the R language and integrates Spark MLlib, a machine learning library that includes common algorithms for cluster analysis, classification, frequent-pattern association analysis, and recommendation systems. This covers batch statistical analysis, online data retrieval, R-based data mining, real-time stream processing, full-text search, and other needs. It helps enterprises build fast, scalable data warehouses and data marts and, together with a variety of reporting tools, provides interactive data analysis, instant reporting, and visual BI display.
The data platform's multi-mode deployment supports two resource division modes, independent and shared, to meet business requirements in different scenarios. In shared mode, one large cluster is created; different users apply for shares of its storage and computing resources and are isolated from one another through permissions. This mode suits enterprises with strict resource control and frequent data exchange between second-level departments. In independent mode, each user can apply to create a separate cluster and has all of that cluster's resources to itself. Different clusters are isolated from one another at the network level, which suits enterprises with ample resources and relatively independent business across second-level departments.
In addition, to meet enterprise stability requirements, DataEngine offers common services as independent products, including the NoSQL database HBase, the in-memory database Redis, the message middleware Kafka, and the search services Solr and Elasticsearch, so that resource contention between components does not affect cluster stability.
As the cloud service provider for H3C CloudOS, the H3C DataEngine big data platform takes full advantage of the integration of cloud computing and big data, and uses the cloud's IaaS capabilities to provide virtual resource pools and bare-metal resource pools. Users can choose the deployment mode that fits their business scenario. Virtual machine deployment suits scenarios with small data volumes and low performance requirements, where it maximizes server resource utilization; bare-metal deployment suits scenarios with large data volumes and high performance requirements, where it better supports the user's business.
Application example