Overview
Beginners in data science who are not familiar with programming often have a hard time figuring out where to start.
With hundreds of questions on various forums about how to get started with Python for data science, this post (and video series) is my attempt to settle all of them.
I'm a Python evangelist who started off as a full-stack Python developer before moving on to data engineering and then data science. My prior experience with Python and a decent grasp of math made the switch to data science more comfortable for me.
So, here are the fundamentals to help you with programming in Python.
Before we take a deep dive into the essentials, make sure that you have set up your Python environment and, optionally, know how to use a Jupyter Notebook.
A basic Python curriculum can be broken down into 4 essential topics:
- Data types (int, float, strings)
- Compound data structures (lists, tuples, and dictionaries)
- Conditionals, loops, and functions
- Object-oriented programming and using external libraries
Let's go over each one and see which fundamentals you should learn.
1. Data Types and Structures
The very first step is to understand how Python interprets data.
Starting with widely used data types, you should be familiar with integers (int), floats (float), strings (str), and booleans (bool). Here's what you should practice.
Type, typecasting, and I/O functions:
- Learning the type of a value using the type() function.
type('Harshit')
# output: <class 'str'>
- Storing values in variables (for example, a = 5.67) and using input-output functions.
- Typecasting: converting a value of one type into another type where possible. For example, converting a string of digits into an integer:
astring = "55"
print(type(astring))
# output: <class 'str'>
astring = int(astring)
print(type(astring))
# output: <class 'int'>
But if you try to convert an alphanumeric or alphabetic string into an integer, Python raises a ValueError.
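As a minimal sketch of that failure, the snippet below tries to convert an alphanumeric string and catches the resulting error. The variable names and the sample value are illustrative and do not come from the original post.

```python
# Illustrative value: a string that mixes digits and letters.
mixed = "55abc"

try:
    number = int(mixed)
except ValueError as error:
    # int() rejects any string that is not a valid integer literal.
    number = None
    print("Conversion failed:", error)

print(number)
```

Wrapping the conversion in try/except is one common way to handle user input that may not be numeric.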
Once you are familiar with the basic data types and their usage, you should learn about arithmetic operators and the order in which expressions are evaluated (DMAS: division, multiplication, addition, subtraction), and how you can store the result in a variable for further use.
answer = 43 + 56 / 14 - 9 * 2
print(answer)
# output: 29.0
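To make the evaluation order concrete, here is a small sketch that recomputes the same expression with explicit parentheses. The regrouped expression at the end is an illustrative variation, not part of the original example.

```python
# Division and multiplication are evaluated before addition and subtraction.
implicit = 43 + 56 / 14 - 9 * 2
explicit = (43 + (56 / 14)) - (9 * 2)
print(implicit == explicit)  # True: both forms follow the same order

# Parentheses change the order, and therefore the result.
regrouped = (43 + 56) / (14 - 9) * 2
print(regrouped)
```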
Strings:
Knowing how to work with textual data and its operators comes in handy when dealing with the string data type. Practice these concepts:
- Concatenating strings using +
- Splitting and joining strings using the split() and join() methods
- Changing the case of a string using the lower() and upper() methods
- Working with substrings of a string
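The string operations listed above can be sketched as follows. The sample strings are made up for illustration.

```python
first = "data"
second = "science"

# Concatenation with +
phrase = first + " " + second
print(phrase)            # data science

# split() breaks a string into a list; join() glues a list back together
words = phrase.split(" ")
print(words)             # ['data', 'science']
print("-".join(words))   # data-science

# Changing case
print(phrase.upper())    # DATA SCIENCE
print(phrase.lower())    # data science

# Substrings via slicing: positions 0 up to (not including) 4
print(phrase[0:4])       # data
# Membership test for a substring
print("sci" in phrase)   # True
```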
Here's the Notebook that covers all the points discussed.
2. Compound data structures (lists, tuples, and dictionaries)
Lists and tuples (compound data types):
Lists are among the most commonly used and important data structures in Python. A list is a collection of elements, and those elements can be of the same or different data types.
Understanding lists will eventually pave the way for computing algebraic equations and statistical models on your array of data.
Here are the concepts you should be familiar with:
- How multiple data types can be stored in a Python list.
- Indexing and slicing to access a specific element or sub-list of the list.
- Helper methods for sorting, reversing, deleting elements, copying, and appending.
- Nested lists: lists containing lists. For example, [1,2,3, [10,11]].
- Adding (concatenating) lists:
alist = ['harshit', 2, 5.5, 10, [1, 2, 3]]
alist + alist
# output: ['harshit', 2, 5.5, 10, [1, 2, 3], 'harshit', 2, 5.5, 10, [1, 2, 3]]
- Multiplying a list by an integer, which repeats its elements:
alist * 2
# output: ['harshit', 2, 5.5, 10, [1, 2, 3], 'harshit', 2, 5.5, 10, [1, 2, 3]]
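The remaining list concepts (indexing, slicing, nested lists, and helper methods) can be sketched as below. The lists and variable names are illustrative.

```python
# Illustrative list mixing several data types, including a nested list.
items = ['harshit', 2, 5.5, 10, [1, 2, 3]]

print(items[0])       # harshit  (first element)
print(items[-1])      # [1, 2, 3]  (last element)
print(items[1:3])     # [2, 5.5]  (index 1 up to, not including, 3)
print(items[4][0])    # 1  (first element of the nested list)

numbers = [4, 1, 3]
numbers.append(2)     # add an element at the end
numbers.sort()        # sort in place
print(numbers)        # [1, 2, 3, 4]
numbers.reverse()     # reverse in place
print(numbers)        # [4, 3, 2, 1]
numbers.remove(3)     # delete the first occurrence of the value 3
backup = numbers.copy()   # shallow copy, independent of the original
numbers.append(0)
print(numbers)        # [4, 2, 1, 0]
print(backup)         # [4, 2, 1]
```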
Tuples are immutable, ordered sequences of items. They are similar to lists, but the key difference is that a tuple cannot be changed after it is created, whereas a list can.
Concepts to focus on:
- Indexing and slicing (similar to lists).
- Nested tuples.
- Adding tuples and using helper methods like count() and index().
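A short sketch of those tuple concepts, using a made-up tuple, might look like this:

```python
point = (3, 4, 3, (10, 11))   # illustrative tuple with a nested tuple

print(point[0])        # 3
print(point[1:3])      # (4, 3)
print(point[3][1])     # 11

print(point.count(3))  # 2  (how many times 3 appears)
print(point.index(4))  # 1  (position of the first 4)

combined = point + (7, 8)   # adding tuples creates a new tuple
print(combined)        # (3, 4, 3, (10, 11), 7, 8)

try:
    point[0] = 99      # tuples are immutable, so item assignment fails
except TypeError as error:
    print("Cannot modify a tuple:", error)
```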
Dictionaries
These are another type of collection in Python. While lists are integer-indexed, dictionaries work more like addresses. Dictionaries hold key-value pairs, and keys are analogous to indexes in lists.
To access an element, you pass its key in square brackets.
Concepts to focus on:
- Iterating through a dictionary (also covered in loops).
- Using helper methods like get(), pop(), items(), keys(), update(), and so on.
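The dictionary operations mentioned above can be sketched as follows. The dictionary contents are invented for this example.

```python
# Illustrative dictionary; keys and values are made up for this sketch.
student = {"name": "Harshit", "score": 76}

print(student["name"])            # Harshit
print(student.get("age"))         # None  (get() does not raise for a missing key)
print(student.get("age", 0))      # 0  (optional default value)

student.update({"percentile": 83})   # add or overwrite several keys at once
print(list(student.keys()))       # ['name', 'score', 'percentile']

removed = student.pop("score")    # remove a key and return its value
print(removed)                    # 76

# items() yields (key, value) pairs, which is handy when looping
for key, value in student.items():
    print(key, "->", value)
```

Using get() instead of square brackets is a reasonable choice when a key may be missing, since square brackets raise a KeyError in that case.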
The notebook for the above topics can be found here.
3. Conditionals, Loops, and Functions
Conditions and Branching
Python uses boolean values to assess conditions. Whenever there is a comparison or evaluation, the result is a boolean value.
x = True
print(type(x))
# output: <class 'bool'>
print(1 == 2)
# output: False
Look carefully at the comparison in the example above, because people often confuse the assignment operator (=) with the comparison operator (==).
Boolean operators (or, and, not)
These are used to evaluate complex assertions together.
- or: At least one of the comparisons must be true for the entire condition to be true.
- and: All of the comparisons must be true for the entire condition to be true.
- not: Negates the comparison specified.
score = 76
percentile = 83
if score > 75 or percentile > 90:
    print("Admission successful!")
else:
    print("Try again next year")
# output: Admission successful!
Concepts to learn:
- if, else, and elif statements to construct your condition.
- Making complex comparisons in one condition.
- Keeping indentation in mind while writing nested if/else statements.
- Using the boolean, in, is, and not operators.
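As a rough illustration of the concepts listed above, the sketch below combines if/elif/else, a nested condition, and the in, is, and not operators. All values and names are hypothetical.

```python
score = 68
subjects = ["math", "physics"]
mentor = None   # illustrative placeholder for a value that is not set yet

# if / elif / else: only the first branch whose condition is true runs
if score >= 75:
    grade = "distinction"
elif score >= 50:
    grade = "pass"
else:
    grade = "fail"
print(grade)    # pass

# Nested conditions rely on indentation to show which block they belong to
if "math" in subjects:
    if not score < 50:
        print("Passed with math")   # this line runs

# 'is' checks identity and is the usual way to compare against None
if mentor is None and "biology" not in subjects:
    print("No mentor, no biology")  # this line runs
```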
Loops
Often you'll need to do a repetitive task, and loops are the best way to avoid redundant code. You'll often need to iterate through each element of a list or dictionary, and loops come in handy for that. while and for are the two types of loops.
Focus on:
- The range() function and iterating through a sequence using for loops.
- while loops
age = [12,43,45,10]
i = 0
while i < len(age):
    if age[i] >= 18:
        print("Adult")
    else:
        print("Juvenile")
    i += 1
# output:
# Juvenile
# Adult
# Adult
# Juvenile
- Iterating through lists and appending elements (or performing any other task with list items) in a particular order
cubes = []
for i in range(1,10):
    cubes.append(i ** 3)
print(cubes)
# output: [1, 8, 27, 64, 125, 216, 343, 512, 729]
- Using the break, pass, and continue keywords.
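The three keywords in the last item can be sketched in one small loop. The numbers and the list name are chosen purely for illustration.

```python
evens = []
for n in range(1, 20):      # range(1, 20) yields 1 through 19
    if n % 2 == 1:
        continue            # skip odd numbers and move to the next iteration
    if n > 10:
        break               # leave the loop entirely once n exceeds 10
    evens.append(n)
print(evens)                # [2, 4, 6, 8, 10]

for n in range(3):
    pass                    # placeholder: does nothing, keeps the block valid
```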
List Comprehension
A sophisticated and succinct way of creating a list using an expression followed by a for clause over an iterable.
For example, you can create the same list of 9 cubes shown in the example above using a list comprehension.
# list comprehension
cubes = [n ** 3 for n in range(1,10)]
print(cubes)
# output: [1, 8, 27, 64, 125, 216, 343, 512, 729]
Functions
While working on a big project, maintaining code becomes a real chore. If your code performs similar tasks many times, a convenient way to manage it is by using functions.
A function is a block of code that performs some operations on input data and gives you the desired output.
Using functions makes the code more readable, reduces redundancy, makes the code reusable, and saves time.
Python uses indentation to create blocks of code. This is an example of a function:
def add_two_numbers(a, b):
    # 'total' avoids shadowing the built-in sum()
    total = a + b
    return total
We define a function using the def keyword followed by the name of the function and its arguments (inputs) within parentheses, followed by a colon.
The body of the function is the indented code block, and the output is returned with the return keyword.
You call a function by specifying its name and passing the arguments within parentheses as per the definition.
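For instance, calling the function defined above could look like the sketch below. The function is repeated so the snippet runs on its own, and the argument values are arbitrary.

```python
def add_two_numbers(a, b):
    # Same function as above, repeated so this snippet runs on its own.
    return a + b

result = add_two_numbers(3, 4)         # positional arguments
print(result)                          # 7
print(add_two_numbers(a=2.5, b=1.5))   # 4.0  (keyword arguments)
```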
More examples and details here.
4. Object-Oriented programming and using external libraries
We have been using the helper methods for lists, dictionaries, and other data types, but where are these coming from?
When we say list or dict, we are actually interacting with a list class object or a dict class object. Printing the type of a dictionary object will show you that it is an object of the class dict.
These are all pre-defined classes in the Python language, and they make our tasks very easy and convenient.
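You can check this yourself with a quick sketch like the one below. The variable names and values are illustrative.

```python
prices = {"apple": 3}
numbers = [1, 2, 3]

print(type(prices))    # <class 'dict'>
print(type(numbers))   # <class 'list'>

# Helper methods are functions defined on those classes.
print(isinstance(prices, dict))   # True
numbers.append(4)                 # equivalent to list.append(numbers, 4)
print(numbers)                    # [1, 2, 3, 4]
```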
Objects are instances of a class and are defined as an encapsulation of variables (data) and functions into a single entity. They have access to the variables (attributes) and methods (functions) of their class.
Now the question is, can we create our own custom classes and objects? The answer is YES.
Here is how you define a class and an object of it:
class Rectangle:
    def __init__(self, height, width):
        self.height = height
        self.width = width

    def area(self):
        area = self.height * self.width
        return area

rect1 = Rectangle(12, 10)
print(type(rect1))
# output: <class '__main__.Rectangle'>
You can then access the attributes and methods using the dot (.) operator.
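A minimal sketch of that dot access, reusing the Rectangle class from above (repeated here so the snippet is self-contained), might be:

```python
class Rectangle:
    # Repeated from the definition above so the snippet is self-contained.
    def __init__(self, height, width):
        self.height = height
        self.width = width

    def area(self):
        return self.height * self.width

rect1 = Rectangle(12, 10)
print(rect1.height)   # 12  (attribute access)
print(rect1.area())   # 120  (method call)

rect1.width = 5       # attributes can be reassigned
print(rect1.area())   # 60
```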
Using External Libraries/Modules
One of the main reasons to use Python for data science is the amazing community that develops high-quality packages for different domains and problems. Using external libraries and modules is an integral part of working on projects in Python.
These libraries and modules define classes, attributes, and methods that we can use to accomplish our tasks. For example, the math library contains many mathematical functions that we can use to carry out our calculations. Many libraries are ordinary .py files.
You should learn to:
- Import libraries in your workspace
- Use the help function to learn about a library or function
- Import the required function directly
- Read the documentation of well-known packages like pandas, numpy, and sklearn and use them in your projects
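The first three items can be sketched with the standard math module, which ships with Python and needs no installation. The specific functions chosen here are just examples.

```python
import math                 # import the whole module
from math import sqrt       # import a single function directly

print(math.floor(2.7))      # 2
print(sqrt(16))             # 4.0
print(math.pi > 3.14)       # True

# help() prints the built-in documentation; __doc__ holds the same text
print(math.sqrt.__doc__)
# help(math.sqrt)           # uncomment in an interactive session
```

Third-party packages such as pandas or numpy are imported the same way once they are installed.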
Wrap up
That should cover the fundamentals of Python and get you started with data science.
There are a few other features, functionalities, and data types that you'll become familiar with over time as you work on more and more projects.
You can go through these concepts in the GitHub repo, where you'll find the exercise notebooks as well.
Here is a 3-part video series based on this post for you to follow along with:
Data Science with Harshit
You can connect with me on LinkedIn, Twitter, Instagram, and check out my YouTube channel for more in-depth tutorials and interviews.
Translated from: https://www.freecodecamp.org/news/python-fundamentals-for-data-science/