data = pd.Series([1.,NA,3.5,NA,7])
data.fillna(data.mean())  # fill missing values with the mean
0    1.000000
1    3.833333
2    3.500000
3    3.833333
4    7.000000
dtype: float64

fillna arguments:
Argument    Description
value       Scalar value or dict-like object to use to fill missing values
method      Interpolation method; 'ffill' by default if the function is called with no other arguments
axis        Axis to fill on; default axis=0
inplace     Modify the calling object without producing a copy
limit       For forward and backward filling, maximum number of consecutive periods to fill

# Data transformation
movies = pd.read_table('/Users/meininghang/Downloads/pydata-book-2nd-edition/datasets/movielens/movies.dat',
                       sep='::', header=None, names=mnames)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
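The warning goes away once the parser engine is named explicitly. A minimal sketch, assuming a small in-memory sample in the same '::' layout; the two rows and the column names in mnames are illustrative stand-ins, not the contents of the original movies.dat:

```python
import io

import pandas as pd

# Hypothetical two-row sample in the '::'-delimited layout
sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)
mnames = ['movie_id', 'title', 'genres']  # assumed column names

# A separator longer than one character is treated as a regex, which only
# the Python engine supports; naming the engine avoids the fallback warning.
movies = pd.read_csv(sample, sep='::', header=None, names=mnames,
                     engine='python')
print(movies)
```

The same keyword can be passed to pd.read_table, which takes the same arguments.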
val = 'a,b,  guido'  # definition missing from the original; inferred from the outputs below
pieces = [x.strip() for x in val.split(',')]
pieces
['a', 'b', 'guido']
Joining
f,s,t = pieces
f + '::' + s + '::' + t
'a::b::guido'
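Chaining + works for three pieces but does not scale to a list of unknown length. A short sketch of the more common approach with str.join, assuming the same pieces list as above:

```python
# Assumed input, matching the pieces produced by the split above
pieces = ['a', 'b', 'guido']

# str.join places the delimiter between every element of the sequence
joined = '::'.join(pieces)
print(joined)  # a::b::guido
```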
Locating substrings
'guido' in val
True
val.index(',')
1
val.find(':')
-1
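The practical difference between index and find is what happens when the substring is absent: find returns -1, while index raises ValueError. A small sketch of that contrast, assuming the sample string used in this section; safe_index is a hypothetical helper written only for this illustration:

```python
val = 'a,b,  guido'  # assumed sample string


def safe_index(s, sub):
    """Return the position of sub in s, or None if it is absent."""
    try:
        return s.index(sub)
    except ValueError:
        # str.index raises instead of returning a sentinel value
        return None


print(val.find(':'))         # -1
print(safe_index(val, ':'))  # None
print(safe_index(val, ','))  # 1
```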
Counting
val.count(',')
2
Replacing
val.replace(',','::')
'a::b::  guido'
val.replace(',',' ')
'a b   guido'
Built-in string methods:
Method                   Description
count                    Return the number of non-overlapping occurrences of a substring in the string.
endswith                 Return True if the string ends with the suffix.
startswith               Return True if the string starts with the prefix.
join                     Use the string as a delimiter for concatenating a sequence of other strings.
index                    Return the position of the first character of the substring if found in the string; raises ValueError if not found.
find                     Return the position of the first character of the first occurrence of the substring; like index, but returns -1 if not found.
rfind                    Return the position of the first character of the last occurrence of the substring; returns -1 if not found.
replace                  Replace occurrences of one string with another string.
strip, rstrip, lstrip    Trim whitespace, including newlines, from both sides, the right side, or the left side, respectively.
split                    Break the string into a list of substrings using the passed delimiter.
lower                    Convert alphabet characters to lowercase.
upper                    Convert alphabet characters to uppercase.
casefold                 Convert characters to lowercase, and convert any region-specific variable character combinations to a common comparable form.
ljust, rjust             Left justify or right justify, respectively; pad the opposite side of the string with spaces (or some other fill character) to return a string with a minimum width.
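A few of the methods listed above that are not exercised elsewhere in this section, shown as a quick sketch. The sample strings are assumptions chosen for illustration:

```python
val = 'a,b,  guido'  # assumed sample string

starts = val.startswith('a')       # prefix test
ends = val.endswith('guido')       # suffix test
last_comma = val.rfind(',')        # position of the last comma
shouted = val.upper()              # uppercase copy
padded = 'guido'.ljust(8, '.')     # pad on the right to width 8
folded = 'Straße'.casefold()       # aggressive lowercasing for comparisons

print(starts, ends, last_comma)  # True True 3
print(shouted)                   # A,B,  GUIDO
print(padded)                    # guido...
print(folded)                    # strasse
```

casefold differs from lower only for characters such as the German sharp s, which is why it is the safer choice for case-insensitive comparisons.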
Regular expressions
import re
text = "foo    bar\t baz  \tqux"
re.split(r'\s+', text)
['foo', 'bar', 'baz', 'qux']
Compiling
regex = re.compile(r'\s+')
regex.split(text)
['foo', 'bar', 'baz', 'qux']
Search methods
regex.findall(text)
['    ', '\t ', '  \t']
regex.search(text)  # returns the first match
<_sre.SRE_Match object; span=(3, 7), match='    '>
Email addresses
text = """Dave dave@google.com
Steve steve@gmail.com
Rob rob@gmail.com
Ryan ryan@yahoo.com
"""
pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
# re.IGNORECASE makes the regex case-insensitive
regex = re.compile(pattern, flags=re.IGNORECASE)
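The compiled pattern can then be applied with findall, search, and match. The sketch below repeats the sample text so it runs on its own, and writes the dot before the top-level domain as `\.` so that it matches a literal dot:

```python
import re

text = """Dave dave@google.com
Steve steve@gmail.com
Rob rob@gmail.com
Ryan ryan@yahoo.com
"""
pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex = re.compile(pattern, flags=re.IGNORECASE)

# findall returns every non-overlapping match as a list of strings
emails = regex.findall(text)
print(emails)

# search returns a match object for the first address only
first = regex.search(text)
print(text[first.start():first.end()])  # dave@google.com

# match only succeeds at the very start of the string, where 'Dave ' sits
print(regex.match(text))  # None
```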
Regular expression methods:
Method       Description
findall      Return all non-overlapping matching patterns in a string as a list
finditer     Like findall, but returns an iterator
match        Match the pattern at the start of the string and optionally segment pattern components into groups; if the pattern matches, returns a match object, and otherwise None
search       Scan the string for a match to the pattern, returning a match object if found; unlike match, the match can be anywhere in the string rather than only at the beginning
split        Break the string into pieces at each occurrence of the pattern
sub, subn    Replace occurrences of the pattern in the string with a replacement expression (all of them, or at most count if given); subn also returns the number of substitutions made; use \1, \2, ... in the replacement string to refer to match group elements
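Groups and backreferences are easier to see in a short example. This sketch assumes a grouped variant of the email pattern (parentheses around the user name, domain, and suffix); the address wesm@bright.net and the two-line text are sample inputs chosen for illustration:

```python
import re

# Grouped variant of the email pattern: user name, domain, suffix
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex = re.compile(pattern, flags=re.IGNORECASE)

m = regex.match('wesm@bright.net')
print(m.groups())  # ('wesm', 'bright', 'net')

text = 'Dave dave@google.com\nRob rob@gmail.com\n'

# With groups in the pattern, findall returns a list of tuples
print(regex.findall(text))

# sub replaces every match; \1, \2, \3 refer to the captured groups
redacted = regex.sub('REDACTED', text)
labelled, n = regex.subn(r'Username: \1, Domain: \2, Suffix: \3', text)
print(labelled)
print(n)  # 2

# finditer yields match objects lazily, here used to collect positions
spans = [match.span() for match in regex.finditer(text)]
print(spans)  # [(5, 20), (25, 38)]
```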
Vectorized string functions
data = {'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
        'Rob': 'rob@gmail.com', 'Wes': np.nan}
data = pd.Series(data)
data
Dave     dave@google.com
Rob        rob@gmail.com
Steve    steve@gmail.com
Wes                  NaN
dtype: object
data.isnull()
Dave     False
Rob      False
Steve    False
Wes       True
dtype: bool
Contains
data.str.contains('gmail')
Dave     False
Rob       True
Steve     True
Wes        NaN
dtype: object
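Because the missing entry propagates as NaN, the result above is not a clean boolean mask and cannot be used directly for filtering. One way around this, sketched here with the same data, is the na argument of str.contains, which sets the value to use for missing entries:

```python
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# na=False treats missing values as "does not contain", giving a bool mask
mask = data.str.contains('gmail', na=False)
gmail_users = data[mask]
print(gmail_users)
```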
Slicing
data.str[:5]
Dave     dave@
Rob      rob@g
Steve    steve
Wes        NaN
dtype: object
Vectorized string methods:
Method          Description
cat             Concatenate strings element-wise with optional delimiter
contains        Return boolean array indicating whether each string contains pattern/regex
count           Count occurrences of pattern
extract         Use a regular expression with groups to extract one or more strings from a Series of strings; the result will be a DataFrame with one column per group
endswith        Equivalent to x.endswith(pattern) for each element
startswith      Equivalent to x.startswith(pattern) for each element
findall         Compute list of all occurrences of pattern/regex for each string
get             Index into each element (retrieve i-th element)
isalnum         Equivalent to built-in str.isalnum
isalpha         Equivalent to built-in str.isalpha
isdecimal       Equivalent to built-in str.isdecimal
isdigit         Equivalent to built-in str.isdigit
islower         Equivalent to built-in str.islower
isnumeric       Equivalent to built-in str.isnumeric
isupper         Equivalent to built-in str.isupper
join            Join strings in each element of the Series with passed separator
len             Compute length of each string
lower, upper    Convert cases; equivalent to x.lower() or x.upper() for each element
match           Use re.match with the passed regular expression on each element; recent pandas versions return a boolean per element (older versions returned matched groups as a list)
pad             Add whitespace to left, right, or both sides of strings
center          Equivalent to pad(side='both')
repeat          Duplicate values (e.g., s.str.repeat(3) is equivalent to x * 3 for each string)
replace         Replace occurrences of pattern/regex with some other string
slice           Slice each string in the Series
split           Split strings on delimiter or regular expression
strip           Trim whitespace from both sides, including newlines
rstrip          Trim whitespace on right side
lstrip          Trim whitespace on left side
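Several of these methods can be combined on the email Series from this section. The following is a sketch under a few assumptions: the grouped pattern is a variant of the email regex with parentheses added, and the column names user, domain, and suffix are made up for readability:

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# Grouped variant of the email pattern (an assumption for this example)
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

# extract returns a DataFrame with one column per group; missing input
# stays missing in every column
parts = data.str.extract(pattern, flags=re.IGNORECASE)
parts.columns = ['user', 'domain', 'suffix']  # hypothetical column names
print(parts)

# split followed by get(0) keeps the part before the '@'
users = data.str.split('@').str.get(0)
print(users)

# len and upper work element-wise and skip the missing value
lengths = data.str.len()
upper = data.str.upper()
```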