云计算大数据外文翻译文献

(文档含英文原文和中文翻译)

原文:

Meet Hadoop

In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.

—Grace Hopper

Data!

We live in the data age. It’s not easy to measure the total volume of data stored electronically, but an IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006, and is forecasting a tenfold growth by 2011 to 1.8 zettabytes. A zettabyte is 10²¹ bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes. That’s roughly the same order of magnitude as one disk drive for every person in the world.
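
As a quick sanity check on these unit conversions, here is a tiny self-contained sketch (mine, not from the original text) that derives the equivalences the paragraph lists:

```java
public class DataUnits {
    public static void main(String[] args) {
        double zettabyte = 1e21;                              // 1 ZB in bytes
        System.out.println(zettabyte / 1e18 + " exabytes");   // 1000.0
        System.out.println(zettabyte / 1e15 + " petabytes");  // 1.0E6
        System.out.println(zettabyte / 1e12 + " terabytes");  // 1.0E9
    }
}
```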

This flood of data is coming from many sources. Consider the following:

• The New York Stock Exchange generates about one terabyte of new trade data per day.

• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.

• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.

• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.

• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.

So there’s a lot of data out there. But you are probably wondering how it affects you. Most of the data is locked up in the largest web properties (like search engines), or scientific or financial institutions, isn’t it? Does the advent of “Big Data,” as it is being called, affect smaller organizations or individuals?

I argue that it does. Take photos, for example. My wife’s grandfather was an avid photographer, and took photographs throughout his adult life. His entire corpus of medium format, slide, and 35mm film, when scanned in at high-resolution, occupies around 10 gigabytes. Compare this to the digital photos that my family took last year, which take up about 5 gigabytes of space. My family is producing photographic data at 35 times the rate my wife’s grandfather did, and the rate is increasing every year as it becomes easier to take more and more photos.
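
The 35× figure is a rate comparison, not a volume comparison. A minimal sketch of the arithmetic, assuming (my assumption, the text does not state it) roughly a 70-year adult life behind the grandfather's 10 GB corpus:

```java
public class PhotoRate {
    public static void main(String[] args) {
        double grandfatherGb = 10.0;   // lifetime scanned corpus, from the text
        double adultYears = 70.0;      // assumed span; not stated in the text
        double familyGbPerYear = 5.0;  // digital photos taken last year
        double ratio = familyGbPerYear / (grandfatherGb / adultYears);
        System.out.printf("rate ratio = %.0f%n", ratio); // 35
    }
}
```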

More generally, the digital streams that individuals are producing are growing apace. Microsoft Research’s MyLifeBits project gives a glimpse of the archiving of personal information that may become commonplace in the near future. MyLifeBits was an experiment where an individual’s interactions—phone calls, emails, documents—were captured electronically and stored for later access. The data gathered included a photo taken every minute, which resulted in an overall data volume of one gigabyte a month. When storage costs come down enough to make it feasible to store continuous audio and video, the data volume for a future MyLifeBits service will be many times that.

The trend is for every individual’s data footprint to grow, but perhaps more importantly, the amount of data generated by machines will be even greater than that generated by people. Machine logs, RFID readers, sensor networks, vehicle GPS traces, retail transactions—all of these contribute to the growing mountain of data.

The volume of data being made publicly available increases every year too. Organizations no longer have to merely manage their own data: success in the future will be dictated to a large extent by their ability to extract value from other organizations’ data. Initiatives such as Public Data Sets on Amazon Web Services, Infochimps.org, and theinfo.org exist to foster the “information commons,” where data can be freely (or in the case of AWS, for a modest price) shared for anyone to download and analyze. Mashups between different information sources make for unexpected and hitherto unimaginable applications.

Take, for example, the Astrometry.net project, which watches the Astrometry group on Flickr for new photos of the night sky. It analyzes each image and identifies which part of the sky it is from, as well as any interesting celestial bodies, such as stars or galaxies. Although it’s still a new and experimental service, it shows the kind of things that are possible when data (in this case, tagged photographic images) is made available and used for something (image analysis) that was not anticipated by the creator.

It has been said that “More data usually beats better algorithms,” which is to say that for some problems (such as recommending movies or music based on past preferences), however fiendish your algorithms are, they can often be beaten simply by having more data (and a less sophisticated algorithm).

The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it.

Data Storage and Analysis

The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds—the rate at which data can be read from drives—have not kept up. One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s, so you could read all the data from a full drive in around five minutes. Almost 20 years later, one-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.

This is a long time to read all the data on a single drive, and writing is even slower. The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.
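
To check the figures in the last two paragraphs, here is a short illustrative sketch (not from the book) that computes the serial and 100-way parallel read times from the stated capacities and transfer speeds:

```java
public class DriveReadTime {
    public static void main(String[] args) {
        // 1990 drive: 1,370 MB at 4.4 MB/s
        System.out.printf("1990 drive: %.1f minutes%n", 1370 / 4.4 / 60);         // ~5.2
        // modern drive: 1 TB (1,000,000 MB) at 100 MB/s
        double serialSeconds = 1_000_000 / 100.0;
        System.out.printf("1 TB, one drive: %.1f hours%n", serialSeconds / 3600); // ~2.8
        // the same terabyte striped over 100 drives and read in parallel
        System.out.printf("1 TB, 100 drives: %.1f minutes%n", serialSeconds / 100 / 60); // ~1.7
    }
}
```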

Only using one hundredth of a disk may seem wasteful. But we can store one hundred datasets, each of which is one terabyte, and provide shared access to them. We can imagine that the users of such a system would be happy to share access in return for shorter analysis times, and, statistically, that their analysis jobs would be likely to be spread over time, so they wouldn’t interfere with each other too much.

There’s more to being able to read and write data in parallel to or from multiple disks, though. The first problem to solve is hardware failure: as soon as you start using many pieces of hardware, the chance that one will fail is fairly high. A common way of avoiding data loss is through replication: redundant copies of the data are kept by the system so that in the event of failure, there is another copy available. This is how RAID works, for instance, although Hadoop’s filesystem, the Hadoop Distributed Filesystem (HDFS), takes a slightly different approach, as you shall see later. The second problem is that most analysis tasks need to be able to combine the data in some way; data read from one disk may need to be combined with the data from any of the other 99 disks. Various distributed systems allow data to be combined from multiple sources, but doing this correctly is notoriously challenging. MapReduce provides a programming model that abstracts the problem from disk reads and writes, transforming it into a computation over sets of keys and values. We will look at the details of this model in later chapters, but the important point for the present discussion is that there are two parts to the computation, the map and the reduce, and it’s the interface between the two where the “mixing” occurs. Like HDFS, MapReduce has reliability built-in.
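
To make the key-value model concrete, here is a minimal word-count sketch of the map, shuffle, and reduce flow. It is plain Java rather than the real Hadoop API, so all names are mine; it only illustrates the shape of the computation, with the sorted grouping between map and reduce playing the role of the "mixing" described above:

```java
import java.util.*;

public class MiniMapReduce {
    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");

        // map phase: emit a (word, 1) pair for every word in every record;
        // the TreeMap stands in for the shuffle, grouping and sorting values by key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input)
            for (String word : line.split("\\s+"))
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);

        // reduce phase: sum the list of values collected under each key
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) sum += one;
            System.out.println(e.getKey() + "\t" + sum);
        }
    }
}
```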

This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system. The storage is provided by HDFS, and analysis by MapReduce. There are other parts to Hadoop, but these capabilities are its kernel.

Comparison with Other Systems

The approach taken by MapReduce may seem like a brute-force approach. The premise is that the entire dataset—or at least a good portion of it—is processed for each query. But this is its power. MapReduce is a batch query processor, and the ability to run an ad hoc query against your whole dataset and get the results in a reasonable time is transformative. It changes the way you think about data, and unlocks data that was previously archived on tape or disk. It gives people the opportunity to innovate with data. Questions that took too long to get answered before can now be answered, which in turn leads to new questions and new insights.

For example, Mailtrust, Rackspace’s mail division, used Hadoop for processing email logs. One ad hoc query they wrote was to find the geographic distribution of their users.

In their words: “This data was so useful that we’ve scheduled the MapReduce job to run monthly and we will be using this data to help us decide which Rackspace data centers to place new mail servers in as we grow.” By bringing several hundred gigabytes of data together and having the tools to analyze it, the Rackspace engineers were able to gain an understanding of the data that they otherwise would never have had, and, furthermore, they were able to use what they had learned to improve the service for their customers. You can read more about how Rackspace uses Hadoop in Chapter 14.

RDBMS

Why can’t we use databases with lots of disks to do large-scale batch analysis? Why is MapReduce needed? The answer to these questions comes from another trend in disk drives: seek time is improving more slowly than transfer rate. Seeking is the process of moving the disk’s head to a particular place on the disk to read or write data. It characterizes the latency of a disk operation, whereas the transfer rate corresponds to a disk’s bandwidth.

If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than streaming through it, which operates at the transfer rate. On the other hand, for updating a small proportion of records in a database, a traditional B-Tree (the data structure used in relational databases, which is limited by the rate it can perform seeks) works well. For updating the majority of a database, a B-Tree is less efficient than MapReduce, which uses Sort/Merge to rebuild the database.
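
A back-of-the-envelope model shows why seek-dominated access loses for bulk work. The sketch below is mine; the 10 ms seek time and the record count are assumptions for illustration, and only the 100 MB/s transfer rate comes from the text:

```java
public class SeekVsStream {
    public static void main(String[] args) {
        double seekMs = 10;            // assumed average seek time
        double transferMBs = 100;      // transfer rate from the text
        long records = 100_000_000;    // assumed 100-byte records, ~10 GB total
        double datasetMB = records * 100 / 1e6;

        // updating 1% of the records, one seek per record
        double seekHours = records * 0.01 * seekMs / 1000 / 3600;
        System.out.printf("seek to 1%% of records: %.1f hours%n", seekHours);  // ~2.8
        // streaming through the whole dataset once at the transfer rate
        System.out.printf("stream everything: %.1f minutes%n",
                datasetMB / transferMBs / 60);                                 // ~1.7
    }
}
```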

In many ways, MapReduce can be seen as a complement to an RDBMS. (The differences between the two systems are shown in Table 1-1.) MapReduce is a good fit for problems that need to analyze the whole dataset, in a batch fashion, particularly for ad hoc analysis. An RDBMS is good for point queries or updates, where the dataset has been indexed to deliver low-latency retrieval and update times of a relatively small amount of data. MapReduce suits applications where the data is written once, and read many times, whereas a relational database is good for datasets that are continually updated.

Table 1-1. RDBMS compared to MapReduce

            Traditional RDBMS            MapReduce
Data size   Gigabytes                    Petabytes
Access      Interactive and batch        Batch
Updates     Read and write many times    Write once, read many times
Structure   Static schema                Dynamic schema
Integrity   High                         Low
Scaling     Nonlinear                    Linear

Another difference between MapReduce and an RDBMS is the amount of structure in the datasets that they operate on. Structured data is data that is organized into entities that have a defined format, such as XML documents or database tables that conform to a particular predefined schema. This is the realm of the RDBMS. Semi-structured data, on the other hand, is looser, and though there may be a schema, it is often ignored, so it may be used only as a guide to the structure of the data: for example, a spreadsheet, in which the structure is the grid of cells, although the cells themselves may hold any form of data. Unstructured data does not have any particular internal structure: for example, plain text or image data. MapReduce works well on unstructured or semi-structured data, since it is designed to interpret the data at processing time. In other words, the input keys and values for MapReduce are not an intrinsic property of the data, but they are chosen by the person analyzing the data.
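
In other words, structure is imposed by the map function at read time. A hypothetical sketch: pulling (hostname, bytes) pairs out of raw web-server log lines, where the log format and field positions are assumptions of this example, and the choice of key and value is entirely the analyst's:

```java
import java.util.*;

public class SchemaOnRead {
    // common-log-style lines; the exact format is assumed for illustration
    static final List<String> LOG = List.of(
        "host1.example.com - - [10/Oct/2009:13:55:36] \"GET /a\" 200 2326",
        "host2.example.com - - [10/Oct/2009:13:56:01] \"GET /b\" 200 512");

    public static void main(String[] args) {
        for (String line : LOG) {
            String[] fields = line.split(" ");
            String host = fields[0];                                // the analyst's key
            long bytes = Long.parseLong(fields[fields.length - 1]); // the analyst's value
            System.out.println(host + "\t" + bytes);
        }
    }
}
```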

Relational data is often normalized to retain its integrity and remove redundancy. Normalization poses problems for MapReduce, since it makes reading a record a nonlocal operation, and one of the central assumptions that MapReduce makes is that it is possible to perform (high-speed) streaming reads and writes.

A web server log is a good example of a set of records that is not normalized (for example, the client hostnames are specified in full each time, even though the same client may appear many times), and this is one reason that logfiles of all kinds are particularly well-suited to analysis with MapReduce.

MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will take twice as long to run. But if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries.
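
The scaling claim can be written as a simple proportionality. This is an idealized model of my own (it ignores shuffle, stragglers, and scheduling overhead), but it captures the contrast with queries whose cost grows nonlinearly:

```java
public class LinearScaling {
    // idealized MapReduce job time: proportional to data size, inverse in cluster size
    static double jobTime(double dataTb, int machines) {
        return dataTb / machines;
    }

    public static void main(String[] args) {
        double base = jobTime(10, 10);
        System.out.println(jobTime(20, 10) / base); // 2.0: double the data, same cluster
        System.out.println(jobTime(20, 20) / base); // 1.0: double both, same wall-clock time
    }
}
```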

Over time, however, the differences between relational databases and MapReduce systems are likely to blur, both as relational databases start incorporating some of the ideas from MapReduce (such as Aster Data’s and Greenplum’s databases) and, from the other direction, as higher-level query languages built on MapReduce (such as Pig and Hive) make MapReduce systems more approachable to traditional database programmers.

Grid Computing

The High Performance Computing (HPC) and Grid Computing communities have been doing large-scale data processing for years, using such APIs as Message Passing Interface (MPI). Broadly, the approach in HPC is to distribute the work across a cluster of machines, which access a shared filesystem, hosted by a SAN. This works well for predominantly compute-intensive jobs, but becomes a problem when nodes need to access larger data volumes (hundreds of gigabytes, the point at which MapReduce really starts to shine), since the network bandwidth is the bottleneck, and compute nodes become idle.

MapReduce tries to colocate the data with the compute node, so data access is fast because it is local. This feature, known as data locality, is at the heart of MapReduce and is the reason for its good performance. Recognizing that network bandwidth is the most precious resource in a data center environment (it is easy to saturate network links by copying data around), MapReduce implementations go to great lengths to preserve it by explicitly modeling network topology. Notice that this arrangement does not preclude high-CPU analyses in MapReduce.

MPI gives great control to the programmer, but requires that he or she explicitly handle the mechanics of the data flow, exposed via low-level C routines and constructs, such as sockets, as well as the higher-level algorithm for the analysis. MapReduce operates only at the higher level: the programmer thinks in terms of functions of key and value pairs, and the data flow is implicit.

Coordinating the processes in a large-scale distributed computation is a challenge. The hardest aspect is gracefully handling partial failure—when you don’t know if a remote process has failed or not—and still making progress with the overall computation. MapReduce spares the programmer from having to think about failure, since the implementation detects failed map or reduce tasks and reschedules replacements on machines that are healthy. MapReduce is able to do this since it is a shared-nothing architecture, meaning that tasks have no dependence on one another. (This is a slight oversimplification, since the output from mappers is fed to the reducers, but this is under the control of the MapReduce system; in this case, it needs to take more care rerunning a failed reducer than rerunning a failed map, since it has to make sure it can retrieve the necessary map outputs, and if not, regenerate them by running the relevant maps again.) So from the programmer’s point of view, the order in which the tasks run doesn’t matter. By contrast, MPI programs have to explicitly manage their own checkpointing and recovery, which gives more control to the programmer but makes them more difficult to write.
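
A toy illustration of why shared-nothing makes this tractable: if tasks are idempotent and share no state, a failed attempt can simply be resubmitted, possibly on a different worker. This sketch is mine, not Hadoop's scheduler, and the failure here is simulated:

```java
import java.util.concurrent.*;

public class RetryOnFailure {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        // an idempotent "task" that fails half the time, standing in for a flaky machine
        Callable<String> task = () -> {
            if (ThreadLocalRandom.current().nextBoolean())
                throw new IllegalStateException("simulated machine failure");
            return "task output";
        };
        System.out.println(runWithRetry(workers, task, 5));
        workers.shutdown();
    }

    // shared-nothing means rerunning is safe: there is no partial state to clean up first
    static <T> T runWithRetry(ExecutorService pool, Callable<T> task, int attempts)
            throws Exception {
        for (int i = 1; ; i++) {
            try {
                return pool.submit(task).get();
            } catch (ExecutionException e) {
                if (i >= attempts) throw e; // give up after the last attempt
            }
        }
    }
}
```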

MapReduce might sound like quite a restrictive programming model, and in a sense it is: you are limited to key and value types that are related in specified ways, and mappers and reducers run with very limited coordination between one another (the mappers pass keys and values to reducers). A natural question to ask is: can you do anything useful or nontrivial with it?

The answer is yes. MapReduce was invented by engineers at Google as a system for building production search indexes, because they found themselves solving the same problem over and over again (and MapReduce was inspired by older ideas from the functional programming, distributed computing, and database communities), but it has since been used for many other applications in many other industries. It is pleasantly surprising to see the range of algorithms that can be expressed in MapReduce, from image analysis, to graph-based problems, to machine learning algorithms. It can’t solve every problem, of course, but it is a general data-processing tool.

You can see a sample of some of the applications that Hadoop has been used for in Chapter 14.

Volunteer Computing

When people first hear about Hadoop and MapReduce, they often ask, “How is it different from SETI@home?” SETI, the Search for Extra-Terrestrial Intelligence, runs a project called SETI@home in which volunteers donate CPU time from their otherwise idle computers to analyze radio telescope data for signs of intelligent life outside earth. SETI@home is the most well-known of many volunteer computing projects; others include the Great Internet Mersenne Prime Search (to search for large prime numbers) and Folding@home (to understand protein folding, and how it relates to disease).

Volunteer computing projects work by breaking the problem they are trying to solve into chunks called work units, which are sent to computers around the world to be analyzed. For example, a SETI@home work unit is about 0.35 MB of radio telescope data, and takes hours or days to analyze on a typical home computer. When the analysis is completed, the results are sent back to the server, and the client gets another work unit. As a precaution to combat cheating, each work unit is sent to three different machines, and needs at least two results to agree to be accepted.
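
The anti-cheating rule is a small quorum check: three replicas of each work unit, and a result is accepted only when at least two replicas agree. A minimal sketch, with made-up result signatures:

```java
import java.util.*;

public class WorkUnitQuorum {
    // accept a result if at least two of the three returned copies agree
    static Optional<String> accept(List<String> results) {
        Map<String, Integer> tally = new HashMap<>();
        for (String r : results) tally.merge(r, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : tally.entrySet())
            if (e.getValue() >= 2) return Optional.of(e.getKey());
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(accept(List.of("sig-A", "sig-A", "sig-B"))); // Optional[sig-A]
        System.out.println(accept(List.of("sig-A", "sig-B", "sig-C"))); // Optional.empty
    }
}
```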

Although SETI@home may be superficially similar to MapReduce (breaking a problem into independent pieces to be worked on in parallel), there are some significant differences. The SETI@home problem is very CPU-intensive, which makes it suitable for running on hundreds of thousands of computers across the world, since the time to transfer the work unit is dwarfed by the time to run the computation on it. Volunteers are donating CPU cycles, not bandwidth.

MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects. By contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet with highly variable connection speeds and no data locality.

译文:

初识Hadoop

古时候,人们用牛来拉重物,当一头牛拉不动一根圆木的时候,他们不曾想过培育个头更大的牛。同样,我们也不需要尝试更大的计算机,而是应该开发更多的计算系统。

——格蕾斯·霍珀

数据!

我们生活在数据时代!要衡量全球存储的电子数据总量并不容易,但据IDC估计,2006年"数字宇宙"(digital universe)的数据总量为0.18 ZB,并预测到2011年这一数字将增长10倍,达到1.8 ZB。1 ZB相当于10的21次方字节,即1000 EB、100万PB,或者大家更熟悉的10亿TB!这大约相当于世界上每人拥有一个磁盘驱动器的数量级。

这一数据洪流有许多来源。请看下面的例子:

• 纽约证券交易所每天产生约1 TB的交易数据。

• 著名社交网站Facebook存储着约100亿张照片,占用约1 PB的存储空间。

• 家谱网站Ancestry.com存储着约2.5 PB的数据。

• 互联网档案馆(The Internet Archive)存储着约2 PB的数据,并以每月20 TB的速度增长。

• 瑞士日内瓦附近的大型强子对撞机每年将产生约15 PB的数据。

可见,世界上已经存在大量数据。但是你可能会想,它对自己有何影响。大部分数据被锁在大型网络公司(如搜索引擎)或科研、金融机构的内部,不是吗?所谓"大数据"的出现,会影响到较小的组织或个人吗?
