Oct 17, 2018, 07:04 AM
Help on DataFrame in module pandas.core.frame object:
class DataFrame(pandas.core.generic.NDFrame)
 |  Two-dimensional size-mutable, potentially heterogeneous tabular data
 |  structure with labeled axes (rows and columns). Arithmetic operations
 |  align on both row and column labels. Can be thought of as a dict-like
 |  container for Series objects. The primary pandas data structure.
 |
 |  Parameters
 |  ----------
 |  data : numpy ndarray (structured or homogeneous), dict, or DataFrame
 |      Dict can contain Series, arrays, constants, or list-like objects
 |
 |      .. versionchanged :: 0.23.0
 |         If data is a dict, argument order is maintained for Python 3.6
 |         and later.
 |
 |  index : Index or array-like
 |      Index to use for resulting frame. Will default to RangeIndex if
 |      no indexing information is part of the input data and no index is provided
 |  columns : Index or array-like
 |      Column labels to use for resulting frame. Will default to
 |      RangeIndex (0, 1, 2, ..., n) if no column labels are provided
 |  dtype : dtype, default None
 |      Data type to force. Only a single dtype is allowed. If None, the dtype is inferred
 |  copy : boolean, default False
 |      Copy data from inputs. Only affects DataFrame / 2d ndarray input
 |
 |  Examples
 |  --------
 |  Constructing DataFrame from a dictionary.
 |
 |  >>> d = {'col1': [1, 2], 'col2': [3, 4]}
 |  >>> df = pd.DataFrame(data=d)
 |  >>> df
 |     col1  col2
 |  0     1     3
 |  1     2     4
 |
 |  Notice that the inferred dtype is int64.
 |
 |  >>> df.dtypes
 |  col1    int64
 |  col2    int64
 |  dtype: object
 |
 |  To enforce a single dtype:
 |
 |  >>> df = pd.DataFrame(data=d, dtype=np.int8)
 |  >>> df.dtypes
 |  col1    int8
 |  col2    int8
 |  dtype: object
 |
 |  Constructing DataFrame from numpy ndarray:
 |
 |  >>> df2 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
 |  ...                    columns=['a', 'b', 'c', 'd', 'e'])
 |  >>> df2
 |     a  b  c  d  e
 |  0  2  8  8  3  4
 |  1  4  2  9  0  9
 |  2  1  0  7  8  0
 |  3  5  1  7  1  3
 |  4  6  0  2  4  2
 |
 |  See also
 |  --------
 |  DataFrame.from_records : constructor from tuples, also record arrays
 |  DataFrame.from_dict : from dicts of Series, arrays, or dicts
 |  DataFrame.from_items : from sequence of (key, value) pairs
 |  pandas.read_csv, pandas.read_table, pandas.read_clipboard
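A minimal sketch of how the `index`, `columns`, and `dtype` constructor parameters interact when the input is a 2-D ndarray. The array values and the labels `r0`, `r1`, `x`, `y` are illustrative assumptions and do not come from the help text.

```python
import numpy as np
import pandas as pd

# Illustrative 2x2 input; values and labels are made up for this sketch.
values = np.array([[1, 2], [3, 4]])

# No labels supplied: both axes fall back to a RangeIndex.
unlabeled = pd.DataFrame(values)

# Explicit row and column labels, plus a single forced dtype.
labeled = pd.DataFrame(values, index=['r0', 'r1'], columns=['x', 'y'],
                       dtype=np.float64)

print(unlabeled.index)   # default row labels
print(labeled.dtypes)    # every column shares the forced dtype
```

Passing labels up front avoids a separate relabeling step after construction.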
 |
 |  Method resolution order:
 |      DataFrame
 |      pandas.core.generic.NDFrame
 |      pandas.core.base.PandasObject
 |      pandas.core.base.StringMixin
 |      pandas.core.accessor.DirNamesMixin
 |      pandas.core.base.SelectionMixin
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __add__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __add__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
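The note that mismatched indices are unioned can be illustrated with a small sketch. The frames `left` and `right` and their labels are hypothetical. The plain `+` operator is used, so no `fill_value` is applied.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames that share only the row label 'b'.
left = pd.DataFrame({'x': [1, 2]}, index=['a', 'b'])
right = pd.DataFrame({'x': [10, 20]}, index=['b', 'c'])

# `+` dispatches to __add__; labels present on only one side yield NaN.
total = left + right
print(total)
```

Rows `a` and `c` exist on one side only, so they appear in the result with missing values. Only row `b` holds a computed sum.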
 |
 |  __and__(self, other, axis='columns', level=None, fill_value=None)
 |      Binary operator __and__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __div__ = __truediv__(self, other, axis=None, level=None, fill_value=None)
 |
 |  __eq__(self, other)
 |      Wrapper for comparison method __eq__
 |
 |  __floordiv__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __floordiv__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __ge__(self, other)
 |      Wrapper for comparison method __ge__
 |
 |  __getitem__(self, key)
 |
 |  __gt__(self, other)
 |      Wrapper for comparison method __gt__
 |
 |  __iadd__ = f(self, other)
 |
 |  __iand__ = f(self, other)
 |
 |  __ifloordiv__ = f(self, other)
 |
 |  __imod__ = f(self, other)
 |
 |  __imul__ = f(self, other)
 |
 |  __init__(self, data=None, index=None, columns=None, dtype=None, copy=False)
 |      Initialize self. See help(type(self)) for accurate signature.
 |
 |  __ior__ = f(self, other)
 |
 |  __ipow__ = f(self, other)
 |
 |  __isub__ = f(self, other)
 |
 |  __itruediv__ = f(self, other)
 |
 |  __ixor__ = f(self, other)
 |
 |  __le__(self, other)
 |      Wrapper for comparison method __le__
 |
 |  __len__(self)
 |      Returns length of info axis, but here we use the index
 |
 |  __lt__(self, other)
 |      Wrapper for comparison method __lt__
 |
 |  __matmul__(self, other)
 |      Matrix multiplication using binary `@` operator in Python>=3.5
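A short sketch of the `@` operator on two frames. The data and the names `left` and `right` are assumptions for illustration. The column labels of the left operand are chosen to match the row labels of the right operand, which is what lets the product line up.

```python
import pandas as pd

# Hypothetical operands: left has columns a/b, right has rows a/b.
left = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
right = pd.DataFrame([[5, 6], [7, 8]], index=['a', 'b'], columns=['p', 'q'])

# `left @ right` calls left.__matmul__(right).
product = left @ right
print(product)
```

The result keeps the row labels of `left` and the column labels of `right`.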
 |
 |  __mod__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __mod__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __mul__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __mul__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __ne__(self, other)
 |      Wrapper for comparison method __ne__
 |
 |  __or__(self, other, axis='columns', level=None, fill_value=None)
 |      Binary operator __or__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __pow__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __pow__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __radd__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __radd__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rand__(self, other, axis='columns', level=None, fill_value=None)
 |      Binary operator __rand__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rdiv__ = __rtruediv__(self, other, axis=None, level=None, fill_value=None)
 |
 |  __rfloordiv__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __rfloordiv__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rmatmul__(self, other)
 |      Matrix multiplication using binary `@` operator in Python>=3.5
 |
 |  __rmod__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __rmod__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rmul__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __rmul__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __ror__(self, other, axis='columns', level=None, fill_value=None)
 |      Binary operator __ror__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rpow__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __rpow__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rsub__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __rsub__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rtruediv__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __rtruediv__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __rxor__(self, other, axis='columns', level=None, fill_value=None)
 |      Binary operator __rxor__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __setitem__(self, key, value)
 |
 |  __sub__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __sub__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __truediv__(self, other, axis=None, level=None, fill_value=None)
 |      Binary operator __truediv__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  __unicode__(self)
 |      Return a string representation for a particular DataFrame
 |
 |      Invoked by unicode(df) in py2 only. Yields a Unicode String in both
 |      py2/py3.
 |
 |  __xor__(self, other, axis='columns', level=None, fill_value=None)
 |      Binary operator __xor__ with support to substitute a fill_value for missing data in
 |      one of the inputs
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |  add(self, other, axis='columns', level=None, fill_value=None)
 |      Addition of dataframe and other, element-wise (binary operator `add`).
 |
 |      Equivalent to ``dataframe + other``, but with support to substitute a fill_value for
 |      missing data in one of the inputs.
 |
 |      Parameters
 |      ----------
 |      other : Series, DataFrame, or constant
 |      axis : {0, 1, 'index', 'columns'}
 |          For Series input, axis to match Series index on
 |      level : int or name
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |      fill_value : None or float value, default None
 |          Fill existing missing (NaN) values, and any new element needed for
 |          successful DataFrame alignment, with this value before computation.
 |          If data in both corresponding DataFrame locations is missing
 |          the result will be missing
 |
 |      Notes
 |      -----
 |      Mismatched indices will be unioned together
 |
 |      Returns
 |      -------
 |      result : DataFrame
 |
 |      Examples
 |      --------
 |
 |      >>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'],
 |      ...                  columns=['one'])
 |      >>> a
 |         one
 |      a  1.0
 |      b  1.0
 |      c  1.0
 |      d  NaN
 |      >>> b = pd.DataFrame(dict(one=[1, np.nan, 1, np.nan],
 |      ...                       two=[np.nan, 2, np.nan, 2]),
 |      ...                  index=['a', 'b', 'd', 'e'])
 |      >>> b
 |         one  two
 |      a  1.0  NaN
 |      b  NaN  2.0
 |      d  1.0  NaN
 |      e  NaN  2.0
 |      >>> a.add(b, fill_value=0)
 |         one  two
 |      a  2.0  NaN
 |      b  1.0  2.0
 |      c  1.0  NaN
 |      d  1.0  NaN
 |      e  NaN  2.0
 |
 |
 |      See also
 |      --------
 |      DataFrame.radd
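The `axis` parameter only matters when `other` is a Series. The sketch below, using made-up data and the hypothetical names `row_offsets` and `col_offsets`, shows the same method matching a Series against row labels and then against column labels.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4]}, index=['a', 'b'])

# Series indexed like the rows: match on the index, add across each row.
row_offsets = pd.Series([10, 20], index=['a', 'b'])
by_row = df.add(row_offsets, axis='index')

# Series indexed like the columns: match on column labels (the default).
col_offsets = pd.Series([100, 200], index=['one', 'two'])
by_col = df.add(col_offsets, axis='columns')

print(by_row)
print(by_col)
```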
 |
 |  agg = aggregate(self, func, axis=0, *args, **kwargs)
 |
 |  aggregate(self, func, axis=0, *args, **kwargs)
 |      Aggregate using one or more operations over the specified axis.
 |
 |      .. versionadded:: 0.20.0
 |
 |      Parameters
 |      ----------
 |      func : function, string, dictionary, or list of string/functions
 |          Function to use for aggregating the data. If a function, must either
 |          work when passed a DataFrame or when passed to DataFrame.apply. For
 |          a DataFrame, can pass a dict, if the keys are DataFrame column names.
 |
 |          Accepted combinations are:
 |
 |          - string function name.
 |          - function.
 |          - list of functions.
 |          - dict of column names -> functions (or list of functions).
 |
 |
 |      axis : {0 or 'index', 1 or 'columns'}, default 0
 |          - 0 or 'index': apply function to each column.
 |          - 1 or 'columns': apply function to each row.
 |      *args
 |          Positional arguments to pass to `func`.
 |      **kwargs
 |          Keyword arguments to pass to `func`.
 |
 |      Returns
 |      -------
 |      aggregated : DataFrame
 |
 |      Notes
 |      -----
 |      `agg` is an alias for `aggregate`. Use the alias.
 |
 |      A passed user-defined-function will be passed a Series for evaluation.
 |
 |      The aggregation operations are always performed over an axis, either the
 |      index (default) or the column axis. This behavior is different from
 |      `numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,
 |      `var`), where the default is to compute the aggregation of the flattened
 |      array, e.g., ``numpy.mean(arr_2d)`` as opposed to ``numpy.mean(arr_2d,
 |      axis=0)``.
 |
 |      Examples
 |      --------
 |      >>> df = pd.DataFrame([[1, 2, 3],
 |      ...                    [4, 5, 6],
 |      ...                    [7, 8, 9],
 |      ...                    [np.nan, np.nan, np.nan]],
 |      ...                   columns=['A', 'B', 'C'])
 |
 |      Aggregate these functions over the rows.
 |
 |      >>> df.agg(['sum', 'min'])
 |              A     B     C
 |      sum  12.0  15.0  18.0
 |      min   1.0   2.0   3.0
 |
 |      Different aggregations per column.
 |
 |      >>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
 |              A    B
 |      max   NaN  8.0
 |      min   1.0  2.0
 |      sum  12.0  NaN
 |
 |      Aggregate over the columns.
 |
 |      >>> df.agg("mean", axis="columns")
 |      0    2.0
 |      1    5.0
 |      2    8.0
 |      3    NaN
 |      dtype: float64
 |
 |      See also
 |      --------
 |      DataFrame.apply : Perform any type of operations.
 |      DataFrame.transform : Perform transformation type operations.
 |      pandas.core.groupby.GroupBy : Perform operations over groups.
 |      pandas.core.resample.Resampler : Perform operations over resampled bins.
 |      pandas.core.window.Rolling : Perform operations over rolling window.
 |      pandas.core.window.Expanding : Perform operations over expanding window.
 |      pandas.core.window.EWM : Perform operation over exponential weighted
 |          window.
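The contrast with NumPy drawn in the Notes can be sketched as follows. The array contents and the helper `value_range` are hypothetical, chosen only to show that `agg` reduces along one axis while `numpy.mean` without an `axis` argument reduces the flattened array.

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1.0, 2.0], [3.0, 4.0]])
df = pd.DataFrame(arr_2d, columns=['A', 'B'])

flat_mean = np.mean(arr_2d)                  # one scalar over all four values
col_means = df.agg('mean')                   # one value per column (axis=0)
row_means = df.agg('mean', axis='columns')   # one value per row


def value_range(column):
    """Hypothetical user-defined aggregation: spread of the values."""
    return column.max() - column.min()


col_ranges = df.agg(value_range)
print(flat_mean, col_means.tolist(), row_means.tolist(), col_ranges.tolist())
```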
 |
 |  align(self, other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)
 |      Align two objects on their axes with the
 |      specified join method for each axis Index
 |
 |      Parameters
 |      ----------
 |      other : DataFrame or Series
 |      join : {'outer', 'inner', 'left', 'right'}, default 'outer'
 |      axis : allowed axis of the other object, default None
 |          Align on index (0), columns (1), or both (None)
 |      level : int or level name, default None
 |          Broadcast across a level, matching Index values on the
 |          passed MultiIndex level
 |      copy : boolean, default True
 |          Always returns new objects. If copy=False and no reindexing is
 |          required then original objects are returned.
 |      fill_value : scalar, default np.NaN
 |          Value to use for missing values. Defaults to NaN, but can be any
 |          "compatible" value
 |      method : str, default None
 |      limit : int, default None
 |      fill_axis : {0 or 'index', 1 or 'columns'}, default 0
 |          Filling axis, method and limit
 |      broadcast_axis : {0 or 'index', 1 or 'columns'}, default None
 |          Broadcast values along this axis, if aligning two objects of
 |          different dimensions
 |
 |      Returns
 |      -------
 |      (left, right) : (DataFrame, type of other)
 |          Aligned objects
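A small sketch of how the `join` argument changes the aligned result. The frames and their labels are assumptions for illustration. Only `join` and `axis` are exercised here.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [1, 2]}, index=['a', 'b'])
right = pd.DataFrame({'x': [10, 20]}, index=['b', 'c'])

# Outer join: both results are reindexed to the union of row labels.
outer_left, outer_right = left.align(right, join='outer', axis=0)

# Inner join: both results keep only the shared row labels.
inner_left, inner_right = left.align(right, join='inner', axis=0)

print(outer_left)
print(inner_right)
```

After alignment the two returned objects share the same row index, so element-wise operations between them need no further reindexing.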
 |
 |  all(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs)
 |      Return whether all elements are True, potentially over an axis.
 |
 |      Returns True if all elements within a Series or along a DataFrame
 |      axis are non-zero, not-empty or not-False.
 |
 |      Parameters
 |      ----------
 |      axis : {0 or 'index', 1 or 'columns', None}, default 0
 |          Indicate which axis or axes should be reduced.
 |
 |          * 0 / 'index' : reduce the index, return a Series whose index is the
 |            original column labels.
 |          * 1 / 'columns' : reduce the columns, return a Series whose index is the
 |            original index.
 |          * None : reduce all axes, return a scalar.
 |
 |      skipna : boolean, default True
 |          Exclude NA/null values. If an entire row/column is NA, the result
 |          will be NA.
 |      level : int or level name, default None
 |          If the axis is a MultiIndex (hierarchical), count along a
 |          particular level, collapsing into a Series.
 |      bool_only : boolean, default None
 |          Include only boolean columns. If None, will attempt to use everything,
 |          then use only boolean data. Not implemented for Series.
 |      **kwargs : any, default None
 |          Additional keywords have no effect but might be accepted for
 |          compatibility with NumPy.
 |
 |      Returns
 |      -------
 |      all : Series or DataFrame (if level specified)
 |
 |      See also
 |      --------
 |      pandas.Series.all : Return True if all elements are True
 |      pandas.DataFrame.any : Return True if one (or more) elements are True
 |
 |      Examples
 |      --------
 |      Series
 |
 |      >>> pd.Series([True, True]).all()
 |      True
 |      >>> pd.Series([True, False]).all()
 |      False
 |
 |      DataFrames
 |
 |      Create a dataframe from a dictionary.
 |
 |      >>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
 |      >>> df
 |         col1   col2
 |      0  True   True
 |      1  True  False
 |
 |      Default behaviour checks if column-wise values all return True.
 |
 |      >>> df.all()
 |      col1     True
 |      col2    False
 |      dtype: bool
 |
 |      Specify ``axis='columns'`` to check if row-wise values all return True.
 |
 |      >>> df.all(axis='columns')
 |      0     True
 |      1    False
 |      dtype: bool
 |
 |      Or ``axis=None`` for whether every value is True.
 |
 |      >>> df.all(axis=None)
 |      False
 |
 |  any(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs)
 |      Return whether any element is True over requested axis.
 |
 |      Unlike :meth:`DataFrame.all`, this performs an *or* operation. If any of the
 |      values along the specified axis is True, this will return True.
 |
 |      Parameters
 |      ----------
 |      axis : {0 or 'index', 1 or 'columns', None}, default 0
 |          Indicate which axis or axes should be reduced.
 |
 |          * 0 / 'index' : reduce the index, return a Series whose index is the
 |            original column labels.
 |          * 1 / 'columns' : reduce the columns, return a Series whose index is the
 |            original index.
 |          * None : reduce all axes, return a scalar.
 |
 |      skipna : boolean, default True
 |          Exclude NA/null values. If an entire row/column is NA, the result
 |          will be NA.
 |      level : int or level name, default None
 |          If the axis is a MultiIndex (hierarchical), count along a
 |          particular level, collapsing into a Series.
 |      bool_only : boolean, default None
 |          Include only boolean columns. If None, will attempt to use everything,
 |          then use only boolean data. Not implemented for Series.
 |      **kwargs : any, default None
 |          Additional keywords have no effect but might be accepted for
 |          compatibility with NumPy.
 |
 |      Returns
 |      -------
 |      any : Series or DataFrame (if level specified)
 |
 |      See Also
 |      --------
 |      pandas.DataFrame.all : Return whether all elements are True.
 |
968 | Examples
969 | --------
970 | **Series**
971 |
972 | For Series input, the output is a scalar indicating whether any element
973 | is True.
974 |
975 | >>> pd.Series([True, False]).any()
976 | True
977 |
978 | **DataFrame**
979 |
980 | Whether each column contains at least one True element (the default).
981 |
982 | >>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
983 | >>> df
984 | A B C
985 | 0 1 0 0
986 | 1 2 2 0
987 |
988 | >>> df.any()
989 | A True
990 | B True
991 | C False
992 | dtype: bool
993 |
994 | Aggregating over the columns.
995 |
996 | >>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
997 | >>> df
998 | A B
999 | 0 True 1
1000 | 1 False 2
1001 |
1002 | >>> df.any(axis='columns')
1003 | 0 True
1004 | 1 True
1005 | dtype: bool
1006 |
1007 | >>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
1008 | >>> df
1009 | A B
1010 | 0 True 1
1011 | 1 False 0
1012 |
1013 | >>> df.any(axis='columns')
1014 | 0 True
1015 | 1 False
1016 | dtype: bool
1017 |
1018 | Aggregating over the entire DataFrame with ``axis=None``.
1019 |
1020 | >>> df.any(axis=None)
1021 | True
1022 |
1023 | `any` for an empty DataFrame is an empty Series.
1024 |
1025 | >>> pd.DataFrame([]).any()
1026 | Series([], dtype: bool)
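
The contrast with `DataFrame.all` drawn in the description can be sketched on a single frame. This is a minimal illustration; the column names and values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False],
                   "B": [False, False],
                   "C": [True, True]})

# any() is a column-wise logical OR; all() is a column-wise logical AND.
col_any = df.any()
col_all = df.all()
print(col_any.to_dict())
print(col_all.to_dict())

# axis=None reduces both axes to a single scalar.
print(df.any(axis=None), df.all(axis=None))
```

Column `A` is the interesting case: it contains one True value, so `any` accepts it while `all` rejects it.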
1027 |
1028 | append(self, other, ignore_index=False, verify_integrity=False, sort=None)
1029 | Append rows of `other` to the end of this frame, returning a new
1030 | object. Columns not in this frame are added as new columns.
1031 |
1032 | Parameters
1033 | ----------
1034 | other : DataFrame or Series/dict-like object, or list of these
1035 | The data to append.
1036 | ignore_index : boolean, default False
1037 | If True, do not use the index labels.
1038 | verify_integrity : boolean, default False
1039 | If True, raise ValueError on creating index with duplicates.
1040 | sort : boolean, default None
1041 | Sort columns if the columns of `self` and `other` are not aligned.
1042 | The default sorting is deprecated and will change to not-sorting
1043 | in a future version of pandas. Explicitly pass ``sort=True`` to
1044 | silence the warning and sort. Explicitly pass ``sort=False`` to
1045 | silence the warning and not sort.
1046 |
1047 | .. versionadded:: 0.23.0
1048 |
1049 | Returns
1050 | -------
1051 | appended : DataFrame
1052 |
1053 | Notes
1054 | -----
1055 | If a list of dict/series is passed and the keys are all contained in
1056 | the DataFrame's index, the order of the columns in the resulting
1057 | DataFrame will be unchanged.
1058 |
1059 | Iteratively appending rows to a DataFrame can be more computationally
1060 | intensive than a single concatenate. A better solution is to append
1061 | those rows to a list and then concatenate the list with the original
1062 | DataFrame all at once.
1063 |
1064 | See also
1065 | --------
1066 | pandas.concat : General function to concatenate DataFrame, Series
1067 | or Panel objects
1068 |
1069 | Examples
1070 | --------
1071 |
1072 | >>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
1073 | >>> df
1074 | A B
1075 | 0 1 2
1076 | 1 3 4
1077 | >>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
1078 | >>> df.append(df2)
1079 | A B
1080 | 0 1 2
1081 | 1 3 4
1082 | 0 5 6
1083 | 1 7 8
1084 |
1085 | With `ignore_index` set to True:
1086 |
1087 | >>> df.append(df2, ignore_index=True)
1088 | A B
1089 | 0 1 2
1090 | 1 3 4
1091 | 2 5 6
1092 | 3 7 8
1093 |
1094 | The following, while not recommended methods for generating DataFrames,
1095 | show two ways to generate a DataFrame from multiple data sources.
1096 |
1097 | Less efficient:
1098 |
1099 | >>> df = pd.DataFrame(columns=['A'])
1100 | >>> for i in range(5):
1101 | ... df = df.append({'A': i}, ignore_index=True)
1102 | >>> df
1103 | A
1104 | 0 0
1105 | 1 1
1106 | 2 2
1107 | 3 3
1108 | 4 4
1109 |
1110 | More efficient:
1111 |
1112 | >>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
1113 | ... ignore_index=True)
1114 | A
1115 | 0 0
1116 | 1 1
1117 | 2 2
1118 | 3 3
1119 | 4 4
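
The advice in the Notes (collect rows in a list, then concatenate once) might look like the sketch below. The row contents are invented for illustration. It relies only on `pd.concat`, which also keeps it usable on later pandas releases (2.0 onward), where `DataFrame.append` was removed.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

# Collect the new rows as plain dicts first ...
new_rows = [{'A': i, 'B': i * 10} for i in range(3)]

# ... then build one frame from them and concatenate a single time.
result = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(result)
```

Only one new DataFrame is allocated for the combined data, instead of one per appended row.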
1120 |
1121 | apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
1122 | Apply a function along an axis of the DataFrame.
1123 |
1124 | Objects passed to the function are Series objects whose index is
1125 | either the DataFrame's index (``axis=0``) or the DataFrame's columns
1126 | (``axis=1``). By default (``result_type=None``), the final return type
1127 | is inferred from the return type of the applied function. Otherwise,
1128 | it depends on the `result_type` argument.
1129 |
1130 | Parameters
1131 | ----------
1132 | func : function
1133 | Function to apply to each column or row.
1134 | axis : {0 or 'index', 1 or 'columns'}, default 0
1135 | Axis along which the function is applied:
1136 |
1137 | * 0 or 'index': apply function to each column.
1138 | * 1 or 'columns': apply function to each row.
1139 | broadcast : bool, optional
1140 | Only relevant for aggregation functions:
1141 |
1142 | * ``False`` or ``None`` : returns a Series whose length is the
1143 | length of the index or the number of columns (based on the
1144 | `axis` parameter)
1145 | * ``True`` : results will be broadcast to the original shape
1146 | of the frame, the original index and columns will be retained.
1147 |
1148 | .. deprecated:: 0.23.0
1149 | This argument will be removed in a future version, replaced
1150 | by result_type='broadcast'.
1151 |
1152 | raw : bool, default False
1153 | * ``False`` : passes each row or column as a Series to the
1154 | function.
1155 | * ``True`` : the passed function will receive ndarray objects
1156 | instead.
1157 | If you are just applying a NumPy reduction function this will
1158 | achieve much better performance.
1159 | reduce : bool or None, default None
1160 | Try to apply reduction procedures. If the DataFrame is empty,
1161 | `apply` will use `reduce` to determine whether the result
1162 | should be a Series or a DataFrame. If ``reduce=None`` (the
1163 | default), `apply`'s return value will be guessed by calling
1164 | `func` on an empty Series
1165 | (note: while guessing, exceptions raised by `func` will be
1166 | ignored).
1167 | If ``reduce=True`` a Series will always be returned, and if
1168 | ``reduce=False`` a DataFrame will always be returned.
1169 |
1170 | .. deprecated:: 0.23.0
1171 | This argument will be removed in a future version, replaced
1172 | by ``result_type='reduce'``.
1173 |
1174 | result_type : {'expand', 'reduce', 'broadcast', None}, default None
1175 | These only act when ``axis=1`` (columns):
1176 |
1177 | * 'expand' : list-like results will be turned into columns.
1178 | * 'reduce' : returns a Series if possible rather than expanding
1179 | list-like results. This is the opposite of 'expand'.
1180 | * 'broadcast' : results will be broadcast to the original shape
1181 | of the DataFrame, the original index and columns will be
1182 | retained.
1183 |
1184 | The default behaviour (None) depends on the return value of the
1185 | applied function: list-like results will be returned as a Series
1186 | of those. However if the apply function returns a Series these
1187 | are expanded to columns.
1188 |
1189 | .. versionadded:: 0.23.0
1190 |
1191 | args : tuple
1192 | Positional arguments to pass to `func` in addition to the
1193 | array/series.
1194 | **kwds
1195 | Additional arguments to pass as keyword arguments to
1196 | `func`.
1197 |
1198 | Notes
1199 | -----
1200 | In the current implementation apply calls `func` twice on the
1201 | first column/row to decide whether it can take a fast or slow
1202 | code path. This can lead to unexpected behavior if `func` has
1203 | side-effects, as they will take effect twice for the first
1204 | column/row.
1205 |
1206 | See also
1207 | --------
1208 | DataFrame.applymap: For elementwise operations
1209 | DataFrame.aggregate: only perform aggregating type operations
1210 | DataFrame.transform: only perform transforming type operations
1211 |
1212 | Examples
1213 | --------
1214 |
1215 | >>> df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
1216 | >>> df
1217 | A B
1218 | 0 4 9
1219 | 1 4 9
1220 | 2 4 9
1221 |
1222 | Using a numpy universal function (in this case the same as
1223 | ``np.sqrt(df)``):
1224 |
1225 | >>> df.apply(np.sqrt)
1226 | A B
1227 | 0 2.0 3.0
1228 | 1 2.0 3.0
1229 | 2 2.0 3.0
1230 |
1231 | Using a reducing function on either axis
1232 |
1233 | >>> df.apply(np.sum, axis=0)
1234 | A 12
1235 | B 27
1236 | dtype: int64
1237 |
1238 | >>> df.apply(np.sum, axis=1)
1239 | 0 13
1240 | 1 13
1241 | 2 13
1242 | dtype: int64
1243 |
1244 | Returning a list-like will result in a Series
1245 |
1246 | >>> df.apply(lambda x: [1, 2], axis=1)
1247 | 0 [1, 2]
1248 | 1 [1, 2]
1249 | 2 [1, 2]
1250 | dtype: object
1251 |
1252 | Passing result_type='expand' will expand list-like results
1253 | to columns of a DataFrame
1254 |
1255 | >>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
1256 | 0 1
1257 | 0 1 2
1258 | 1 1 2
1259 | 2 1 2
1260 |
1261 | Returning a Series inside the function is similar to passing
1262 | ``result_type='expand'``. The resulting column names
1263 | will be the Series index.
1264 |
1265 | >>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
1266 | foo bar
1267 | 0 1 2
1268 | 1 1 2
1269 | 2 1 2
1270 |
1271 | Passing ``result_type='broadcast'`` will ensure the same shape
1272 | result, whether list-like or scalar is returned by the function,
1273 | and broadcast it along the axis. The resulting column names will
1274 | be the originals.
1275 |
1276 | >>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
1277 | A B
1278 | 0 1 2
1279 | 1 1 2
1280 | 2 1 2
1281 |
1282 | Returns
1283 | -------
1284 | applied : Series or DataFrame
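
The `args` and `**kwds` parameters are not covered by the examples above. A minimal sketch of how they reach `func` follows; `scale_and_shift` is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

def scale_and_shift(col, factor, offset=0):
    """Hypothetical helper: multiply a column by `factor`, then add `offset`."""
    return col * factor + offset

# `col` receives each column as a Series; `factor` is taken from `args`
# and `offset` from the extra keyword argument.
result = df.apply(scale_and_shift, args=(2,), offset=1)
print(result)
```

Every value in column `A` becomes 4 * 2 + 1 = 9 and every value in column `B` becomes 9 * 2 + 1 = 19.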
1285 |
1286 | applymap(self, func)
1287 | Apply a function to a DataFrame elementwise.
1288 |
1289 | This method applies a function that accepts and returns a scalar
1290 | to every element of a DataFrame.
1291 |
1292 | Parameters
1293 | ----------
1294 | func : callable
1295 | Python function, returns a single value from a single value.
1296 |
1297 | Returns
1298 | -------
1299 | DataFrame
1300 | Transformed DataFrame.
1301 |
1302 | See also
1303 | --------
1304 | DataFrame.apply : Apply a function along input axis of DataFrame
1305 |
1306 | Examples
1307 | --------
1308 | >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
1309 | >>> df
1310 | 0 1
1311 | 0 1.000 2.120
1312 | 1 3.356 4.567
1313 |
1314 | >>> df.applymap(lambda x: len(str(x)))
1315 | 0 1
1316 | 0 3 4
1317 | 1 5 5
1318 |
1319 | Note that a vectorized version of `func` often exists, which will
1320 | be much faster. You could square each number elementwise.
1321 |
1322 | >>> df.applymap(lambda x: x**2)
1323 | 0 1
1324 | 0 1.000000 4.494400
1325 | 1 11.262736 20.857489
1326 |
1327 | But it's better to avoid applymap in that case.
1328 |
1329 | >>> df ** 2
1330 | 0 1
1331 | 0 1.000000 4.494400
1332 | 1 11.262736 20.857489
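
An elementwise function is still useful where no vectorized form exists, such as string formatting of each cell. The sketch below assumes nothing beyond pandas itself. Later pandas releases expose the same operation as `DataFrame.map` and deprecate `applymap`, so the example picks whichever method is present.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])

# Use DataFrame.map where it exists (newer pandas), else applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Format every cell to one decimal place; the result holds strings.
formatted = elementwise(lambda x: "{:.1f}".format(x))
print(formatted)
```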
1333 |
1334 | assign(self, **kwargs)
1335 | Assign new columns to a DataFrame, returning a new object
1336 | (a copy) with the new columns added to the original ones.
1337 | Existing columns that are re-assigned will be overwritten.
1338 |
1339 | Parameters
1340 | ----------
1341 | kwargs : keyword, value pairs
1342 | keywords are the column names. If the values are
1343 | callable, they are computed on the DataFrame and
1344 | assigned to the new columns. The callable must not
1345 | change input DataFrame (though pandas doesn't check it).
1346 | If the values are not callable, (e.g. a Series, scalar, or array),
1347 | they are simply assigned.
1348 |
1349 | Returns
1350 | -------
1351 | df : DataFrame
1352 | A new DataFrame with the new columns in addition to
1353 | all the existing columns.
1354 |
1355 | Notes
1356 | -----
1357 | Assigning multiple columns within the same ``assign`` is possible.
1358 | For Python 3.6 and above, later items in '\*\*kwargs' may refer to
1359 | newly created or modified columns in 'df'; items are computed and
1360 | assigned into 'df' in order. For Python 3.5 and below, the order of
1361 | keyword arguments is not specified, so you cannot refer to newly created
1362 | or modified columns. All items are computed first, and then assigned
1363 | in alphabetical order.
1364 |
1365 | .. versionchanged :: 0.23.0
1366 |
1367 | Keyword argument order is maintained for Python 3.6 and later.
1368 |
1369 | Examples
1370 | --------
1371 | >>> df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
1372 |
1373 | Where the value is a callable, evaluated on `df`:
1374 |
1375 | >>> df.assign(ln_A = lambda x: np.log(x.A))
1376 | A B ln_A
1377 | 0 1 0.426905 0.000000
1378 | 1 2 -0.780949 0.693147
1379 | 2 3 -0.418711 1.098612
1380 | 3 4 -0.269708 1.386294
1381 | 4 5 -0.274002 1.609438
1382 | 5 6 -0.500792 1.791759
1383 | 6 7 1.649697 1.945910
1384 | 7 8 -1.495604 2.079442
1385 | 8 9 0.549296 2.197225
1386 | 9 10 -0.758542 2.302585
1387 |
1388 | Where the value already exists and is inserted:
1389 |
1390 | >>> newcol = np.log(df['A'])
1391 | >>> df.assign(ln_A=newcol)
1392 | A B ln_A
1393 | 0 1 0.426905 0.000000
1394 | 1 2 -0.780949 0.693147
1395 | 2 3 -0.418711 1.098612
1396 | 3 4 -0.269708 1.386294
1397 | 4 5 -0.274002 1.609438
1398 | 5 6 -0.500792 1.791759
1399 | 6 7 1.649697 1.945910
1400 | 7 8 -1.495604 2.079442
1401 | 8 9 0.549296 2.197225
1402 | 9 10 -0.758542 2.302585
1403 |
1404 | Where the keyword arguments depend on each other
1405 |
1406 | >>> df = pd.DataFrame({'A': [1, 2, 3]})
1407 |
1408 | >>> df.assign(B=df.A, C=lambda x: x['A'] + x['B'])
1409 | A B C
1410 | 0 1 1 2
1411 | 1 2 2 4
1412 | 2 3 3 6
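
The in-order evaluation described in the Notes can also be sketched with two callables, where the second reads a column created by the first. This assumes Python 3.6 or later and pandas 0.23.0 or later; the column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]})

# Each callable receives the frame as it stands after the previous keyword
# has been assigned, so `temp_k` can read the new `temp_f` column.
out = df.assign(
    temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
    temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9,
)
print(out)

# assign returns a new object; the original frame is left untouched.
print(list(df.columns))
```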
1413 |
1414 | boxplot = boxplot_frame(self, column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, **kwds)
1415 | Make a box plot from DataFrame columns.
1416 |
1417 | Make a box-and-whisker plot from DataFrame columns, optionally grouped
1418 | by some other columns. A box plot is a method for graphically depicting
1419 | groups of numerical data through their quartiles.
1420 | The box extends from the Q1 to Q3 quartile values of the data,
1421 | with a line at the median (Q2). The whiskers extend from the edges
1422 | of the box to show the range of the data. The position of the whiskers
1423 | is set by default to `1.5 * IQR (IQR = Q3 - Q1)` from the edges of the box.
1424 | Outlier points are those past the end of the whiskers.
1425 |
1426 | For further details see
1427 | Wikipedia's entry for `boxplot <https://en.wikipedia.org/wiki/Box_plot>`_.
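
The whisker rule described above can be worked through numerically. The sketch below uses only NumPy and invented sample data. It shows the arithmetic of the `1.5 * IQR` rule rather than the plotting code itself, and the quartile values a plotting library computes may differ slightly depending on its interpolation method.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

# Box edges (Q1, Q3) and the median line (Q2).
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Whisker limits sit 1.5 * IQR beyond the box edges.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Points past the limits are the ones drawn as outliers.
outliers = data[(data < lower_limit) | (data > upper_limit)]
print(q1, median, q3, iqr)
print(lower_limit, upper_limit, outliers)
```

Here the box spans 3 to 7, the limits are -3 and 13, and only the value 100 falls outside them.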
1428 |
1429 | Parameters
1430 | ----------
1431 | column : str or list of str, optional
1432 | Column name or list of names, or vector.
1433 | Can be any valid input to :meth:`pandas.DataFrame.groupby`.
1434 | by : str or array-like, optional
1435 | Column in the DataFrame to :meth:`pandas.DataFrame.groupby`.
1436 | One box-plot will be done per value of columns in `by`.
1437 | ax : object of class matplotlib.axes.Axes, optional
1438 | The matplotlib axes to be used by boxplot.
1439 | fontsize : float or str
1440 | Tick label font size in points or as a string (e.g., `large`).
1441 | rot : int or float, default 0
1442 | The rotation angle of labels (in degrees)
1443 | with respect to the screen coordinate system.
1444 | grid : boolean, default True
1445 | Setting this to True will show the grid.
1446 | figsize : A tuple (width, height) in inches
1447 | The size of the figure to create in matplotlib.
1448 | layout : tuple (rows, columns), optional
1449 | For example, (3, 5) will display the subplots
1450 | using 3 rows and 5 columns, starting from the top-left.
1451 | return_type : {'axes', 'dict', 'both'} or None, default 'axes'
1452 | The kind of object to return. The default is ``axes``.
1453 |
1454 | * 'axes' returns the matplotlib axes the boxplot is drawn on.
1455 | * 'dict' returns a dictionary whose values are the matplotlib
1456 | Lines of the boxplot.
1457 | * 'both' returns a namedtuple with the axes and dict.
1458 | * when grouping with ``by``, a Series mapping columns to
1459 | ``return_type`` is returned.
1460 |
1461 | If ``return_type`` is `None`, a NumPy array
1462 | of axes with the same shape as ``layout`` is returned.
1463 | **kwds
1464 | All other plotting keyword arguments to be passed to
1465 | :func:`matplotlib.pyplot.boxplot`.
1466 |
1467 | Returns
1468 | -------
1469 | result :
1470 |
1471 | The return type depends on the `return_type` parameter:
1472 |
1473 | * 'axes' : object of class matplotlib.axes.Axes
1474 | * 'dict' : dict of matplotlib.lines.Line2D objects
1475 | * 'both' : a namedtuple with structure (ax, lines)
1476 |
1477 | For data grouped with ``by``:
1478 |
1479 | * :class:`~pandas.Series`
1480 | * :class:`~numpy.array` (for ``return_type = None``)
1481 |
1482 | See Also
1483 | --------
1484 | Series.plot.hist: Make a histogram.
1485 | matplotlib.pyplot.boxplot : Matplotlib equivalent plot.
1486 |
1487 | Notes
1488 | -----
1489 | Use ``return_type='dict'`` when you want to tweak the appearance
1490 | of the lines after plotting. In this case a dict containing the Lines
1491 | making up the boxes, caps, fliers, medians, and whiskers is returned.
1492 |
1493 | Examples
1494 | --------
1495 |
1496 | Boxplots can be created for every column in the dataframe
1497 | by ``df.boxplot()`` or indicating the columns to be used:
1498 |
1499 | .. plot::
1500 | :context: close-figs
1501 |
1502 | >>> np.random.seed(1234)
1503 | >>> df = pd.DataFrame(np.random.randn(10,4),
1504 | ... columns=['Col1', 'Col2', 'Col3', 'Col4'])
1505 | >>> boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3'])
1506 |
1507 | Boxplots of variable distributions grouped by the values of a third
1508 | variable can be created using the option ``by``. For instance:
1509 |
1510 | .. plot::
1511 | :context: close-figs
1512 |
1513 | >>> df = pd.DataFrame(np.random.randn(10, 2),
1514 | ... columns=['Col1', 'Col2'])
1515 | >>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
1516 | ... 'B', 'B', 'B', 'B', 'B'])
1517 | >>> boxplot = df.boxplot(by='X')
1518 |
1519 | A list of strings (e.g. ``['X', 'Y']``) can be passed to boxplot
1520 | in order to group the data by combinations of the variables on the x-axis:
1521 |
1522 | .. plot::
1523 | :context: close-figs
1524 |
1525 | >>> df = pd.DataFrame(np.random.randn(10,3),
1526 | ... columns=['Col1', 'Col2', 'Col3'])
1527 | >>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
1528 | ... 'B', 'B', 'B', 'B', 'B'])
1529 | >>> df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A',
1530 | ... 'B', 'A', 'B', 'A', 'B'])
1531 | >>> boxplot = df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])
1532 |
1533 | The layout of the boxplot can be adjusted by giving a tuple to ``layout``:
1534 |
1535 | .. plot::
1536 | :context: close-figs
1537 |
1538 | >>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
1539 | ... layout=(2, 1))
1540 |
1541 | Additional formatting can be done to the boxplot, like suppressing the grid
1542 | (``grid=False``), rotating the labels on the x-axis (e.g. ``rot=45``)
1543 | or changing the fontsize (e.g. ``fontsize=15``):
1544 |
1545 | .. plot::
1546 | :context: close-figs
1547 |
1548 | >>> boxplot = df.boxplot(grid=False, rot=45, fontsize=15)
1549 |
1550 | The parameter ``return_type`` can be used to select the type of element
1551 | returned by `boxplot`. When ``return_type='axes'`` is selected,
1552 | the matplotlib axes on which the boxplot is drawn are returned:
1553 |
1554 | >>> boxplot = df.boxplot(column=['Col1','Col2'], return_type='axes')
1555 | >>> type(boxplot)
1556 | <class 'matplotlib.axes._subplots.AxesSubplot'>
1557 |
1558 | When grouping with ``by``, a Series mapping columns to ``return_type``
1559 | is returned:
1560 |
1561 | >>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
1562 | ... return_type='axes')
1563 | >>> type(boxplot)
1564 | <class 'pandas.core.series.Series'>
1565 |
1566 | If ``return_type`` is `None`, a NumPy array of axes with the same shape
1567 | as ``layout`` is returned:
1568 |
1569 | >>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
1570 | ... return_type=None)
1571 | >>> type(boxplot)
1572 | <class 'numpy.ndarray'>
1573 |
1574 | combine(self, other, func, fill_value=None, overwrite=True)
1575 | Combine two DataFrame objects column by column using `func`. The frames
1576 | are aligned first, and `func` receives each pair of aligned columns. If
1577 | `fill_value` is given, it replaces NaN values before `func` is called.
1578 |
1579 | Parameters
1580 | ----------
1581 | other : DataFrame
1582 | func : function
1583 | Function that takes two Series as inputs and returns a Series or a
1584 | scalar
1585 | fill_value : scalar value
1586 | overwrite : boolean, default True
1587 | If True then overwrite values for common keys in the calling frame
1588 |
1589 | Returns
1590 | -------
1591 | result : DataFrame
1592 |
1593 | Examples
1594 | --------
1595 | >>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
1596 | >>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
1597 | >>> df1.combine(df2, lambda s1, s2: s1 if s1.sum() < s2.sum() else s2)
1598 | A B
1599 | 0 0 3
1600 | 1 0 3
1601 |
1602 | See Also
1603 | --------
1604 | DataFrame.combine_first : Combine two DataFrame objects and default to
1605 | non-null values in frame calling the method
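
Besides choosing one whole column over the other, `func` may also compute a result element by element. A minimal sketch, using `np.minimum` as the combining function and made-up values:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

# `func` is called once per column with the two aligned Series;
# np.minimum returns their element-wise minimum.
result = df1.combine(df2, np.minimum)
print(result)
```

Each cell of the result holds the smaller of the two corresponding input cells.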
1606 |
1607 | combine_first(self, other)
1608 | Combine two DataFrame objects and default to non-null values in frame
1609 | calling the method. The result's index and columns will be the union of
1610 | the respective indexes and columns.
1611 |
1612 | Parameters
1613 | ----------
1614 | other : DataFrame
1615 |
1616 | Returns
1617 | -------
1618 | combined : DataFrame
1619 |
1620 | Examples
1621 | --------
1622 | df1's values prioritized, use values from df2 to fill holes:
1623 |
1624 | >>> df1 = pd.DataFrame([[1, np.nan]])
1625 | >>> df2 = pd.DataFrame([[3, 4]])
1626 | >>> df1.combine_first(df2)
1627 | 0 1
1628 | 0 1 4.0
1629 |
1630 | See Also
1631 | --------
1632 | DataFrame.combine : Perform series-wise operation on two DataFrames
1633 | using a given function
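
The union of indexes and columns mentioned in the description shows up when the two frames only partly overlap. A small sketch with invented data:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

# Rows 0..2 and columns A..C all appear in the result; holes in df1
# are filled from df2 where df2 has a value for that cell.
result = df1.combine_first(df2)
print(result)
```

Cells that are missing in both frames, such as column `A` in row 2, remain NaN.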
1634 |
1635 | compound(self, axis=None, skipna=None, level=None)
1636 | Return the compound percentage of the values for the requested axis
1637 |
1638 | Parameters
1639 | ----------
1640 | axis : {index (0), columns (1)}
1641 | skipna : boolean, default True
1642 | Exclude NA/null values when computing the result.
1643 | level : int or level name, default None
1644 | If the axis is a MultiIndex (hierarchical), count along a
1645 | particular level, collapsing into a Series
1646 | numeric_only : boolean, default None
1647 | Include only float, int, boolean columns. If None, will attempt to use
1648 | everything, then use only numeric data. Not implemented for Series.
1649 |
1650 | Returns
1651 | -------
1652 | compounded : Series or DataFrame (if level specified)
1653 |
1654 | corr(self, method='pearson', min_periods=1)
1655 | Compute pairwise correlation of columns, excluding NA/null values
1656 |
1657 | Parameters
1658 | ----------
1659 | method : {'pearson', 'kendall', 'spearman'}
1660 | * pearson : standard correlation coefficient
1661 | * kendall : Kendall Tau correlation coefficient
1662 | * spearman : Spearman rank correlation
1663 | min_periods : int, optional
1664 | Minimum number of observations required per pair of columns
1665 | to have a valid result. Currently only available for pearson
1666 | and spearman correlation
1667 |
1668 | Returns
1669 | -------
1670 | y : DataFrame
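
The difference between the `pearson` and `spearman` methods can be seen on data that is monotonic but not linear. This is a minimal sketch with invented columns; `kendall` is left out to keep the example short.

```python
import pandas as pd

x = [1, 2, 3, 4, 5]
df = pd.DataFrame({'x': x,
                   'neg': [-v for v in x],      # exactly linear, decreasing
                   'sq': [v ** 2 for v in x]})  # monotonic but not linear

pearson = df.corr(method='pearson')
spearman = df.corr(method='spearman')
print(pearson.round(3))
print(spearman.round(3))
```

Pearson gives -1 for the linear pair and a value below 1 for `x` against `sq`, while Spearman, which works on ranks, gives 1 for any strictly increasing relationship.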
1671 |
1672 | corrwith(self, other, axis=0, drop=False)
1673 | Compute pairwise correlation between rows or columns of two DataFrame
1674 | objects.
1675 |
1676 | Parameters
1677 | ----------
1678 | other : DataFrame, Series
1679 | axis : {0 or 'index', 1 or 'columns'}, default 0
1680 | 0 or 'index' to compute column-wise, 1 or 'columns' for row-wise
1681 | drop : boolean, default False
1682 | Drop missing indices from the result. By default, the union of all indices is returned.
1683 |
1684 | Returns
1685 | -------
1686 | correls : Series
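
A minimal sketch of the `drop` parameter, using invented frames in which column `c` exists in only one of them:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 3, 2, 1]})
df2 = pd.DataFrame({'a': [2, 4, 6, 8], 'b': [1, 2, 3, 4], 'c': [0, 1, 0, 1]})

# Columns are matched by label; 'c' has no partner in df1.
with_nan = df1.corrwith(df2)
dropped = df1.corrwith(df2, drop=True)
print(with_nan)
print(dropped)
```

With the default `drop=False` the unmatched label is kept with a NaN correlation; with `drop=True` it is left out of the result.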
1687 |
1688 | count(self, axis=0, level=None, numeric_only=False)
1689 | Count non-NA cells for each column or row.
1690 |
1691 | The values `None`, `NaN`, `NaT`, and optionally `numpy.inf` (depending
1692 | on `pandas.options.mode.use_inf_as_na`) are considered NA.
1693 |
1694 | Parameters
1695 | ----------
1696 | axis : {0 or 'index', 1 or 'columns'}, default 0
1697 | If 0 or 'index' counts are generated for each column.
1698 | If 1 or 'columns' counts are generated for each **row**.
1699 | level : int or str, optional
1700 | If the axis is a `MultiIndex` (hierarchical), count along a
1701 | particular `level`, collapsing into a `DataFrame`.
1702 | A `str` specifies the level name.
1703 | numeric_only : boolean, default False
1704 | Include only `float`, `int` or `boolean` data.
1705 |
1706 | Returns
1707 | -------
1708 | Series or DataFrame
1709 | For each column/row the number of non-NA/null entries.
1710 | If `level` is specified returns a `DataFrame`.
1711 |
1712 | See Also
1713 | --------
1714 | Series.count: number of non-NA elements in a Series
1715 | DataFrame.shape: number of DataFrame rows and columns (including NA
1716 | elements)
1717 | DataFrame.isna: boolean same-sized DataFrame showing places of NA
1718 | elements
1719 |
1720 | Examples
1721 | --------
1722 | Constructing DataFrame from a dictionary:
1723 |
1724 | >>> df = pd.DataFrame({"Person":
1725 | ... ["John", "Myla", None, "John", "Myla"],
1726 | ... "Age": [24., np.nan, 21., 33, 26],
1727 | ... "Single": [False, True, True, True, False]})
1728 | >>> df
1729 | Person Age Single
1730 | 0 John 24.0 False
1731 | 1 Myla NaN True
1732 | 2 None 21.0 True
1733 | 3 John 33.0 True
1734 | 4 Myla 26.0 False
1735 |
1736 | Notice the uncounted NA values:
1737 |
1738 | >>> df.count()
1739 | Person 4
1740 | Age 4
1741 | Single 5
1742 | dtype: int64
1743 |
1744 | Counts for each **row**:
1745 |
1746 | >>> df.count(axis='columns')
1747 | 0 3
1748 | 1 2
1749 | 2 2
1750 | 3 3
1751 | 4 3
1752 | dtype: int64
1753 |
1754 | Counts for one level of a `MultiIndex`:
1755 |
1756 | >>> df.set_index(["Person", "Single"]).count(level="Person")
1757 | Age
1758 | Person
1759 | John 2
1760 | Myla 1
1761 |
1762 | cov(self, min_periods=None)
1763 | Compute pairwise covariance of columns, excluding NA/null values.
1764 |
1765 | Compute the pairwise covariance among the series of a DataFrame.
1766 | The returned data frame is the `covariance matrix
1767 | <https://en.wikipedia.org/wiki/Covariance_matrix>`__ of the columns
1768 | of the DataFrame.
1769 |
1770 | Both NA and null values are automatically excluded from the
1771 | calculation. (See the note below about bias from missing values.)
1772 | A threshold can be set for the minimum number of
1773 | observations for each value created. Comparisons with observations
1774 | below this threshold will be returned as ``NaN``.
1775 |
1776 | This method is generally used for the analysis of time series data to
1777 | understand the relationship between different measures
1778 | across time.
1779 |
1780 | Parameters
1781 | ----------
1782 | min_periods : int, optional
1783 | Minimum number of observations required per pair of columns
1784 | to have a valid result.
1785 |
1786 | Returns
1787 | -------
1788 | DataFrame
1789 | The covariance matrix of the series of the DataFrame.
1790 |
1791 | See Also
1792 | --------
1793 | pandas.Series.cov : compute covariance with another Series
1794 | pandas.core.window.EWM.cov: exponential weighted sample covariance
1795 | pandas.core.window.Expanding.cov : expanding sample covariance
1796 | pandas.core.window.Rolling.cov : rolling sample covariance
1797 |
1798 | Notes
1799 | -----
1800 | Returns the covariance matrix of the DataFrame's time series.
1801 | The covariance is normalized by N-1.
1802 |
1803 | For DataFrames that have Series that are missing data (assuming that
1804 | data is `missing at random
1805 | <https://en.wikipedia.org/wiki/Missing_data#Missing_at_random>`__)
1806 | the returned covariance matrix will be an unbiased estimate
1807 | of the variance and covariance between the member Series.
1808 |
1809 | However, for many applications this estimate may not be acceptable
1810 | because the estimated covariance matrix is not guaranteed to be positive
1811 | semi-definite. This could lead to estimated correlations having
1812 | absolute values which are greater than one, and/or a non-invertible
1813 | covariance matrix. See `Estimation of covariance matrices
1814 | <http://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_
1815 | matrices>`__ for more details.
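
The N-1 normalization stated in the Notes can be checked by hand. The sketch below reuses the dogs/cats data from the Examples section and compares against `np.cov`, which also divides by N-1 by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
                  columns=['dogs', 'cats'])

pandas_cov = df.cov()
# rowvar=False tells NumPy that each column is a variable.
numpy_cov = np.cov(df.values, rowvar=False)

# Manual N-1 normalization for the (dogs, cats) entry.
d = df['dogs'] - df['dogs'].mean()
c = df['cats'] - df['cats'].mean()
manual = (d * c).sum() / (len(df) - 1)
print(pandas_cov)
print(manual)
```

The manual value is -1.0, matching the off-diagonal entry of the matrix.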
1816 |
1817 | Examples
1818 | --------
1819 | >>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
1820 | ... columns=['dogs', 'cats'])
1821 | >>> df.cov()
1822 | dogs cats
1823 | dogs 0.666667 -1.000000
1824 | cats -1.000000 1.666667
1825 |
1826 | >>> np.random.seed(42)
1827 | >>> df = pd.DataFrame(np.random.randn(1000, 5),
1828 | ... columns=['a', 'b', 'c', 'd', 'e'])
1829 | >>> df.cov()
1830 | a b c d e
1831 | a 0.998438 -0.020161 0.059277 -0.008943 0.014144
1832 | b -0.020161 1.059352 -0.008543 -0.024738 0.009826
1833 | c 0.059277 -0.008543 1.010670 -0.001486 -0.000271
1834 | d -0.008943 -0.024738 -0.001486 0.921297 -0.013692
1835 | e 0.014144 0.009826 -0.000271 -0.013692 0.977795
1836 |
1837 | **Minimum number of periods**
1838 |
1839 | This method also supports an optional ``min_periods`` keyword
1840 | that specifies the required minimum number of non-NA observations for
1841 | each column pair in order to have a valid result:
1842 |
1843 | >>> np.random.seed(42)
1844 | >>> df = pd.DataFrame(np.random.randn(20, 3),
1845 | ... columns=['a', 'b', 'c'])
1846 | >>> df.loc[df.index[:5], 'a'] = np.nan
1847 | >>> df.loc[df.index[5:10], 'b'] = np.nan
1848 | >>> df.cov(min_periods=12)
1849 | a b c
1850 | a 0.316741 NaN -0.150812
1851 | b NaN 1.248003 0.191417
1852 | c -0.150812 0.191417 0.895202
1853 |
1854 | cummax(self, axis=None, skipna=True, *args, **kwargs)
1855 | Return cumulative maximum over a DataFrame or Series axis.
1856 |
1857 | Returns a DataFrame or Series of the same size containing the cumulative
1858 | maximum.
1859 |
1860 | Parameters
1861 | ----------
1862 | axis : {0 or 'index', 1 or 'columns'}, default 0
1863 | The index or the name of the axis. 0 is equivalent to None or 'index'.
1864 | skipna : boolean, default True
1865 | Exclude NA/null values. If an entire row/column is NA, the result
1866 | will be NA.
1867 | *args, **kwargs :
1868 | Additional keywords have no effect but might be accepted for
1869 | compatibility with NumPy.
1870 |
1871 | Returns
1872 | -------
1873 | cummax : Series or DataFrame
1874 |
1875 | Examples
1876 | --------
1877 | **Series**
1878 |
1879 | >>> s = pd.Series([2, np.nan, 5, -1, 0])
1880 | >>> s
1881 | 0 2.0
1882 | 1 NaN
1883 | 2 5.0
1884 | 3 -1.0
1885 | 4 0.0
1886 | dtype: float64
1887 |
1888 | By default, NA values are ignored.
1889 |
1890 | >>> s.cummax()
1891 | 0 2.0
1892 | 1 NaN
1893 | 2 5.0
1894 | 3 5.0
1895 | 4 5.0
1896 | dtype: float64
1897 |
1898 | To include NA values in the operation, use ``skipna=False``
1899 |
1900 | >>> s.cummax(skipna=False)
1901 | 0 2.0
1902 | 1 NaN
1903 | 2 NaN
1904 | 3 NaN
1905 | 4 NaN
1906 | dtype: float64
1907 |
1908 | **DataFrame**
1909 |
1910 | >>> df = pd.DataFrame([[2.0, 1.0],
1911 | ... [3.0, np.nan],
1912 | ... [1.0, 0.0]],
1913 | ... columns=list('AB'))
1914 | >>> df
1915 | A B
1916 | 0 2.0 1.0
1917 | 1 3.0 NaN
1918 | 2 1.0 0.0
1919 |
1920 | By default, iterates over rows and finds the maximum
1921 | in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
1922 |
1923 | >>> df.cummax()
1924 | A B
1925 | 0 2.0 1.0
1926 | 1 3.0 NaN
1927 | 2 3.0 1.0
1928 |
1929 | To iterate over columns and find the maximum in each row,
1930 | use ``axis=1``
1931 |
1932 | >>> df.cummax(axis=1)
1933 | A B
1934 | 0 2.0 2.0
1935 | 1 3.0 NaN
1936 | 2 1.0 1.0
1937 |
1938 | See Also
1939 | --------
1940 | pandas.core.window.Expanding.max : Similar functionality
1941 | but ignores ``NaN`` values.
1942 | DataFrame.max : Return the maximum over
1943 | DataFrame axis.
1944 | DataFrame.cummax : Return cumulative maximum over DataFrame axis.
1945 | DataFrame.cummin : Return cumulative minimum over DataFrame axis.
1946 | DataFrame.cumsum : Return cumulative sum over DataFrame axis.
1947 | DataFrame.cumprod : Return cumulative product over DataFrame axis.
1948 |
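The difference between `cummax` and `Expanding.max` noted in the cross-references can be seen on the Series from the examples. This is a small sketch, not part of the original help text; the variable names `running` and `expanded` are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, np.nan, 5, -1, 0])

# cummax leaves NaN at the position where the input itself is NaN.
running = s.cummax()

# expanding().max() reports the maximum of the non-NA values seen so far,
# so the NaN position carries the previous maximum forward instead.
expanded = s.expanding().max()

print(pd.DataFrame({'cummax': running, 'expanding_max': expanded}))
```

The two results agree everywhere except at the NaN position.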
1949 | cummin(self, axis=None, skipna=True, *args, **kwargs)
1950 | Return cumulative minimum over a DataFrame or Series axis.
1951 |
1952 | Returns a DataFrame or Series of the same size containing the cumulative
1953 | minimum.
1954 |
1955 | Parameters
1956 | ----------
1957 | axis : {0 or 'index', 1 or 'columns'}, default 0
1958 | The index or the name of the axis. 0 is equivalent to None or 'index'.
1959 | skipna : boolean, default True
1960 | Exclude NA/null values. If an entire row/column is NA, the result
1961 | will be NA.
1962 | *args, **kwargs :
1963 | Additional keywords have no effect but might be accepted for
1964 | compatibility with NumPy.
1965 |
1966 | Returns
1967 | -------
1968 | cummin : Series or DataFrame
1969 |
1970 | Examples
1971 | --------
1972 | **Series**
1973 |
1974 | >>> s = pd.Series([2, np.nan, 5, -1, 0])
1975 | >>> s
1976 | 0 2.0
1977 | 1 NaN
1978 | 2 5.0
1979 | 3 -1.0
1980 | 4 0.0
1981 | dtype: float64
1982 |
1983 | By default, NA values are ignored.
1984 |
1985 | >>> s.cummin()
1986 | 0 2.0
1987 | 1 NaN
1988 | 2 2.0
1989 | 3 -1.0
1990 | 4 -1.0
1991 | dtype: float64
1992 |
1993 | To include NA values in the operation, use ``skipna=False``
1994 |
1995 | >>> s.cummin(skipna=False)
1996 | 0 2.0
1997 | 1 NaN
1998 | 2 NaN
1999 | 3 NaN
2000 | 4 NaN
2001 | dtype: float64
2002 |
2003 | **DataFrame**
2004 |
2005 | >>> df = pd.DataFrame([[2.0, 1.0],
2006 | ... [3.0, np.nan],
2007 | ... [1.0, 0.0]],
2008 | ... columns=list('AB'))
2009 | >>> df
2010 | A B
2011 | 0 2.0 1.0
2012 | 1 3.0 NaN
2013 | 2 1.0 0.0
2014 |
2015 | By default, iterates over rows and finds the minimum
2016 | in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
2017 |
2018 | >>> df.cummin()
2019 | A B
2020 | 0 2.0 1.0
2021 | 1 2.0 NaN
2022 | 2 1.0 0.0
2023 |
2024 | To iterate over columns and find the minimum in each row,
2025 | use ``axis=1``
2026 |
2027 | >>> df.cummin(axis=1)
2028 | A B
2029 | 0 2.0 1.0
2030 | 1 3.0 NaN
2031 | 2 1.0 0.0
2032 |
2033 | See Also
2034 | --------
2035 | pandas.core.window.Expanding.min : Similar functionality
2036 | but ignores ``NaN`` values.
2037 | DataFrame.min : Return the minimum over
2038 | DataFrame axis.
2039 | DataFrame.cummax : Return cumulative maximum over DataFrame axis.
2040 | DataFrame.cummin : Return cumulative minimum over DataFrame axis.
2041 | DataFrame.cumsum : Return cumulative sum over DataFrame axis.
2042 | DataFrame.cumprod : Return cumulative product over DataFrame axis.
2043 |
2044 | cumprod(self, axis=None, skipna=True, *args, **kwargs)
2045 | Return cumulative product over a DataFrame or Series axis.
2046 |
2047 | Returns a DataFrame or Series of the same size containing the cumulative
2048 | product.
2049 |
2050 | Parameters
2051 | ----------
2052 | axis : {0 or 'index', 1 or 'columns'}, default 0
2053 | The index or the name of the axis. 0 is equivalent to None or 'index'.
2054 | skipna : boolean, default True
2055 | Exclude NA/null values. If an entire row/column is NA, the result
2056 | will be NA.
2057 | *args, **kwargs :
2058 | Additional keywords have no effect but might be accepted for
2059 | compatibility with NumPy.
2060 |
2061 | Returns
2062 | -------
2063 | cumprod : Series or DataFrame
2064 |
2065 | Examples
2066 | --------
2067 | **Series**
2068 |
2069 | >>> s = pd.Series([2, np.nan, 5, -1, 0])
2070 | >>> s
2071 | 0 2.0
2072 | 1 NaN
2073 | 2 5.0
2074 | 3 -1.0
2075 | 4 0.0
2076 | dtype: float64
2077 |
2078 | By default, NA values are ignored.
2079 |
2080 | >>> s.cumprod()
2081 | 0 2.0
2082 | 1 NaN
2083 | 2 10.0
2084 | 3 -10.0
2085 | 4 -0.0
2086 | dtype: float64
2087 |
2088 | To include NA values in the operation, use ``skipna=False``
2089 |
2090 | >>> s.cumprod(skipna=False)
2091 | 0 2.0
2092 | 1 NaN
2093 | 2 NaN
2094 | 3 NaN
2095 | 4 NaN
2096 | dtype: float64
2097 |
2098 | **DataFrame**
2099 |
2100 | >>> df = pd.DataFrame([[2.0, 1.0],
2101 | ... [3.0, np.nan],
2102 | ... [1.0, 0.0]],
2103 | ... columns=list('AB'))
2104 | >>> df
2105 | A B
2106 | 0 2.0 1.0
2107 | 1 3.0 NaN
2108 | 2 1.0 0.0
2109 |
2110 | By default, iterates over rows and finds the product
2111 | in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
2112 |
2113 | >>> df.cumprod()
2114 | A B
2115 | 0 2.0 1.0
2116 | 1 6.0 NaN
2117 | 2 6.0 0.0
2118 |
2119 | To iterate over columns and find the product in each row,
2120 | use ``axis=1``
2121 |
2122 | >>> df.cumprod(axis=1)
2123 | A B
2124 | 0 2.0 2.0
2125 | 1 3.0 NaN
2126 | 2 1.0 0.0
2127 |
2128 | See Also
2129 | --------
2130 | pandas.core.window.Expanding.prod : Similar functionality
2131 | but ignores ``NaN`` values.
2132 | DataFrame.prod : Return the product over
2133 | DataFrame axis.
2134 | DataFrame.cummax : Return cumulative maximum over DataFrame axis.
2135 | DataFrame.cummin : Return cumulative minimum over DataFrame axis.
2136 | DataFrame.cumsum : Return cumulative sum over DataFrame axis.
2137 | DataFrame.cumprod : Return cumulative product over DataFrame axis.
2138 |
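One common use of `cumprod` is compounding a sequence of period returns into a growth factor. The sketch below assumes a hypothetical Series `returns` of simple per-period returns; it is an illustration, not part of the original help text.

```python
import numpy as np
import pandas as pd

# Hypothetical per-period simple returns: +10%, -50%, +20%.
returns = pd.Series([0.10, -0.50, 0.20])

# Multiplying the (1 + r) factors cumulatively gives the value of one
# unit invested at the start, after each period.
growth = (1 + returns).cumprod()
print(growth)
```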
2139 | cumsum(self, axis=None, skipna=True, *args, **kwargs)
2140 | Return cumulative sum over a DataFrame or Series axis.
2141 |
2142 | Returns a DataFrame or Series of the same size containing the cumulative
2143 | sum.
2144 |
2145 | Parameters
2146 | ----------
2147 | axis : {0 or 'index', 1 or 'columns'}, default 0
2148 | The index or the name of the axis. 0 is equivalent to None or 'index'.
2149 | skipna : boolean, default True
2150 | Exclude NA/null values. If an entire row/column is NA, the result
2151 | will be NA.
2152 | *args, **kwargs :
2153 | Additional keywords have no effect but might be accepted for
2154 | compatibility with NumPy.
2155 |
2156 | Returns
2157 | -------
2158 | cumsum : Series or DataFrame
2159 |
2160 | Examples
2161 | --------
2162 | **Series**
2163 |
2164 | >>> s = pd.Series([2, np.nan, 5, -1, 0])
2165 | >>> s
2166 | 0 2.0
2167 | 1 NaN
2168 | 2 5.0
2169 | 3 -1.0
2170 | 4 0.0
2171 | dtype: float64
2172 |
2173 | By default, NA values are ignored.
2174 |
2175 | >>> s.cumsum()
2176 | 0 2.0
2177 | 1 NaN
2178 | 2 7.0
2179 | 3 6.0
2180 | 4 6.0
2181 | dtype: float64
2182 |
2183 | To include NA values in the operation, use ``skipna=False``
2184 |
2185 | >>> s.cumsum(skipna=False)
2186 | 0 2.0
2187 | 1 NaN
2188 | 2 NaN
2189 | 3 NaN
2190 | 4 NaN
2191 | dtype: float64
2192 |
2193 | **DataFrame**
2194 |
2195 | >>> df = pd.DataFrame([[2.0, 1.0],
2196 | ... [3.0, np.nan],
2197 | ... [1.0, 0.0]],
2198 | ... columns=list('AB'))
2199 | >>> df
2200 | A B
2201 | 0 2.0 1.0
2202 | 1 3.0 NaN
2203 | 2 1.0 0.0
2204 |
2205 | By default, iterates over rows and finds the sum
2206 | in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
2207 |
2208 | >>> df.cumsum()
2209 | A B
2210 | 0 2.0 1.0
2211 | 1 5.0 NaN
2212 | 2 6.0 1.0
2213 |
2214 | To iterate over columns and find the sum in each row,
2215 | use ``axis=1``
2216 |
2217 | >>> df.cumsum(axis=1)
2218 | A B
2219 | 0 2.0 3.0
2220 | 1 3.0 NaN
2221 | 2 1.0 1.0
2222 |
2223 | See Also
2224 | --------
2225 | pandas.core.window.Expanding.sum : Similar functionality
2226 | but ignores ``NaN`` values.
2227 | DataFrame.sum : Return the sum over
2228 | DataFrame axis.
2229 | DataFrame.cummax : Return cumulative maximum over DataFrame axis.
2230 | DataFrame.cummin : Return cumulative minimum over DataFrame axis.
2231 | DataFrame.cumsum : Return cumulative sum over DataFrame axis.
2232 | DataFrame.cumprod : Return cumulative product over DataFrame axis.
2233 |
2234 | diff(self, periods=1, axis=0)
2235 | First discrete difference of element.
2236 |
2237 | Calculates the difference of a DataFrame element compared with another
2238 | element in the DataFrame (default is the element in the same column
2239 | of the previous row).
2240 |
2241 | Parameters
2242 | ----------
2243 | periods : int, default 1
2244 | Periods to shift for calculating difference, accepts negative
2245 | values.
2246 | axis : {0 or 'index', 1 or 'columns'}, default 0
2247 | Take difference over rows (0) or columns (1).
2248 |
2249 | .. versionadded:: 0.16.1
2250 |
2251 | Returns
2252 | -------
2253 | diffed : DataFrame
2254 |
2255 | See Also
2256 | --------
2257 | Series.diff: First discrete difference for a Series.
2258 | DataFrame.pct_change: Percent change over given number of periods.
2259 | DataFrame.shift: Shift index by desired number of periods with an
2260 | optional time freq.
2261 |
2262 | Examples
2263 | --------
2264 | Difference with previous row
2265 |
2266 | >>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
2267 | ... 'b': [1, 1, 2, 3, 5, 8],
2268 | ... 'c': [1, 4, 9, 16, 25, 36]})
2269 | >>> df
2270 | a b c
2271 | 0 1 1 1
2272 | 1 2 1 4
2273 | 2 3 2 9
2274 | 3 4 3 16
2275 | 4 5 5 25
2276 | 5 6 8 36
2277 |
2278 | >>> df.diff()
2279 | a b c
2280 | 0 NaN NaN NaN
2281 | 1 1.0 0.0 3.0
2282 | 2 1.0 1.0 5.0
2283 | 3 1.0 1.0 7.0
2284 | 4 1.0 2.0 9.0
2285 | 5 1.0 3.0 11.0
2286 |
2287 | Difference with previous column
2288 |
2289 | >>> df.diff(axis=1)
2290 | a b c
2291 | 0 NaN 0.0 0.0
2292 | 1 NaN -1.0 3.0
2293 | 2 NaN -1.0 7.0
2294 | 3 NaN -1.0 13.0
2295 | 4 NaN 0.0 20.0
2296 | 5 NaN 2.0 28.0
2297 |
2298 | Difference with 3rd previous row
2299 |
2300 | >>> df.diff(periods=3)
2301 | a b c
2302 | 0 NaN NaN NaN
2303 | 1 NaN NaN NaN
2304 | 2 NaN NaN NaN
2305 | 3 3.0 2.0 15.0
2306 | 4 3.0 4.0 21.0
2307 | 5 3.0 6.0 27.0
2308 |
2309 | Difference with following row
2310 |
2311 | >>> df.diff(periods=-1)
2312 | a b c
2313 | 0 -1.0 0.0 -3.0
2314 | 1 -1.0 -1.0 -5.0
2315 | 2 -1.0 -1.0 -7.0
2316 | 3 -1.0 -2.0 -9.0
2317 | 4 -1.0 -3.0 -11.0
2318 | 5 NaN NaN NaN
2319 |
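The description above amounts to subtracting a shifted copy of the frame from itself. The sketch below checks that reading on part of the example data; the names `by_diff` and `by_shift` are illustrative, and the equivalence is stated for numeric columns only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8]})

# Difference with the row two positions earlier.
by_diff = df.diff(periods=2)

# The same result, spelled out with shift: each element minus the
# element `periods` rows above it (NaN where no such row exists).
by_shift = df - df.shift(2)

pd.testing.assert_frame_equal(by_diff, by_shift)
print(by_diff)
```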
2320 | div = truediv(self, other, axis='columns', level=None, fill_value=None)
2321 |
2322 | divide = truediv(self, other, axis='columns', level=None, fill_value=None)
2323 |
2324 | dot(self, other)
2325 | Matrix multiplication with DataFrame or Series objects. Can also be
2326 | called using `self @ other` in Python >= 3.5.
2327 |
2328 | Parameters
2329 | ----------
2330 | other : DataFrame or Series
2331 |
2332 | Returns
2333 | -------
2334 | dot_product : DataFrame or Series
2335 |
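The help text for `dot` gives no example, so here is a minimal sketch. It relies on the column labels of the frame matching the index labels of `other`, which is how `dot` pairs the operands; the names `w`, `w_reordered` and `m` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# DataFrame . Series: one weighted sum per row.
w = pd.Series([10, 1], index=['a', 'b'])
weighted = df.dot(w)

# Operands are matched by label, so the order of the Series index
# does not change the result.
w_reordered = pd.Series([1, 10], index=['b', 'a'])
weighted_again = df.dot(w_reordered)

# DataFrame . DataFrame: the result takes its columns from `other`.
m = pd.DataFrame([[1, 0], [0, 1]], index=['a', 'b'], columns=['x', 'y'])
projected = df.dot(m)

print(weighted)
print(projected)
```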
2336 | drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
2337 | Drop specified labels from rows or columns.
2338 |
2339 | Remove rows or columns by specifying label names and corresponding
2340 | axis, or by specifying directly index or column names. When using a
2341 | multi-index, labels on different levels can be removed by specifying
2342 | the level.
2343 |
2344 | Parameters
2345 | ----------
2346 | labels : single label or list-like
2347 | Index or column labels to drop.
2348 | axis : {0 or 'index', 1 or 'columns'}, default 0
2349 | Whether to drop labels from the index (0 or 'index') or
2350 | columns (1 or 'columns').
2351 | index, columns : single label or list-like
2352 | Alternative to specifying axis (``labels, axis=1``
2353 | is equivalent to ``columns=labels``).
2354 |
2355 | .. versionadded:: 0.21.0
2356 | level : int or level name, optional
2357 | For MultiIndex, level from which the labels will be removed.
2358 | inplace : bool, default False
2359 | If True, do operation inplace and return None.
2360 | errors : {'ignore', 'raise'}, default 'raise'
2361 | If 'ignore', suppress error and only existing labels are
2362 | dropped.
2363 |
2364 | Returns
2365 | -------
2366 | dropped : pandas.DataFrame
2367 |
2368 | See Also
2369 | --------
2370 | DataFrame.loc : Label-location based indexer for selection by label.
2371 | DataFrame.dropna : Return DataFrame with labels on given axis omitted
2372 | where (all or any) data are missing
2373 | DataFrame.drop_duplicates : Return DataFrame with duplicate rows
2374 | removed, optionally only considering certain columns
2375 | Series.drop : Return Series with specified index labels removed.
2376 |
2377 | Raises
2378 | ------
2379 | KeyError
2380 | If any of the labels is not found in the selected axis.
2381 |
2382 | Examples
2383 | --------
2384 | >>> df = pd.DataFrame(np.arange(12).reshape(3,4),
2385 | ... columns=['A', 'B', 'C', 'D'])
2386 | >>> df
2387 | A B C D
2388 | 0 0 1 2 3
2389 | 1 4 5 6 7
2390 | 2 8 9 10 11
2391 |
2392 | Drop columns
2393 |
2394 | >>> df.drop(['B', 'C'], axis=1)
2395 | A D
2396 | 0 0 3
2397 | 1 4 7
2398 | 2 8 11
2399 |
2400 | >>> df.drop(columns=['B', 'C'])
2401 | A D
2402 | 0 0 3
2403 | 1 4 7
2404 | 2 8 11
2405 |
2406 | Drop rows by index
2407 |
2408 | >>> df.drop([0, 1])
2409 | A B C D
2410 | 2 8 9 10 11
2411 |
2412 | Drop columns and/or rows of MultiIndex DataFrame
2413 |
2414 | >>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
2415 | ... ['speed', 'weight', 'length']],
2416 | ... labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
2417 | ... [0, 1, 2, 0, 1, 2, 0, 1, 2]])
2418 | >>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
2419 | ... data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
2420 | ... [250, 150], [1.5, 0.8], [320, 250],
2421 | ... [1, 0.8], [0.3,0.2]])
2422 | >>> df
2423 | big small
2424 | lama speed 45.0 30.0
2425 | weight 200.0 100.0
2426 | length 1.5 1.0
2427 | cow speed 30.0 20.0
2428 | weight 250.0 150.0
2429 | length 1.5 0.8
2430 | falcon speed 320.0 250.0
2431 | weight 1.0 0.8
2432 | length 0.3 0.2
2433 |
2434 | >>> df.drop(index='cow', columns='small')
2435 | big
2436 | lama speed 45.0
2437 | weight 200.0
2438 | length 1.5
2439 | falcon speed 320.0
2440 | weight 1.0
2441 | length 0.3
2442 |
2443 | >>> df.drop(index='length', level=1)
2444 | big small
2445 | lama speed 45.0 30.0
2446 | weight 200.0 100.0
2447 | cow speed 30.0 20.0
2448 | weight 250.0 150.0
2449 | falcon speed 320.0 250.0
2450 | weight 1.0 0.8
2451 |
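The `errors` parameter is not covered by the examples above. The sketch below shows both settings on the same frame; the label `'Z'` is deliberately absent from the columns, and the names `raised` and `lenient` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=['A', 'B', 'C', 'D'])

# With the default errors='raise', a missing label raises KeyError.
try:
    df.drop(columns=['B', 'Z'])
    raised = False
except KeyError:
    raised = True

# With errors='ignore', only the labels that exist are dropped.
lenient = df.drop(columns=['B', 'Z'], errors='ignore')

print(raised)
print(lenient)
```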
2452 | drop_duplicates(self, subset=None, keep='first', inplace=False)
2453 | Return DataFrame with duplicate rows removed, optionally only
2454 | considering certain columns
2455 |
2456 | Parameters
2457 | ----------
2458 | subset : column label or sequence of labels, optional
2459 | Only consider certain columns for identifying duplicates, by
2460 | default use all of the columns
2461 | keep : {'first', 'last', False}, default 'first'
2462 | - ``first`` : Drop duplicates except for the first occurrence.
2463 | - ``last`` : Drop duplicates except for the last occurrence.
2464 | - False : Drop all duplicates.
2465 | inplace : boolean, default False
2466 | Whether to drop duplicates in place or to return a copy
2467 |
2468 | Returns
2469 | -------
2470 | deduplicated : DataFrame
2471 |
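No example accompanies `drop_duplicates` here, so the three `keep` settings are sketched below on a small hypothetical frame with one repeated `key`.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'b'], 'value': [1, 2, 3]})

# Rows 0 and 1 share the same 'key'; 'value' is ignored via subset.
first = df.drop_duplicates(subset='key')               # keeps row 0
last = df.drop_duplicates(subset='key', keep='last')   # keeps row 1
none = df.drop_duplicates(subset='key', keep=False)    # drops both

print(first, last, none, sep='\n\n')
```

The original index labels are preserved, which makes it easy to see which rows survived.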
2472 | dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
2473 | Remove missing values.
2474 |
2475 | See the :ref:`User Guide <missing_data>` for more on which values are
2476 | considered missing, and how to work with missing data.
2477 |
2478 | Parameters
2479 | ----------
2480 | axis : {0 or 'index', 1 or 'columns'}, default 0
2481 | Determine if rows or columns which contain missing values are
2482 | removed.
2483 |
2484 | * 0, or 'index' : Drop rows which contain missing values.
2485 | * 1, or 'columns' : Drop columns which contain missing values.
2486 |
2487 | .. deprecated:: 0.23.0
2488 | Passing a tuple or list to drop on multiple axes is deprecated.
2489 | how : {'any', 'all'}, default 'any'
2490 | Determine if row or column is removed from DataFrame, when we have
2491 | at least one NA or all NA.
2492 |
2493 | * 'any' : If any NA values are present, drop that row or column.
2494 | * 'all' : If all values are NA, drop that row or column.
2495 | thresh : int, optional
2496 | Require that many non-NA values.
2497 | subset : array-like, optional
2498 | Labels along other axis to consider, e.g. if you are dropping rows
2499 | these would be a list of columns to include.
2500 | inplace : bool, default False
2501 | If True, do operation inplace and return None.
2502 |
2503 | Returns
2504 | -------
2505 | DataFrame
2506 | DataFrame with NA entries dropped from it.
2507 |
2508 | See Also
2509 | --------
2510 | DataFrame.isna: Indicate missing values.
2511 | DataFrame.notna : Indicate existing (non-missing) values.
2512 | DataFrame.fillna : Replace missing values.
2513 | Series.dropna : Drop missing values.
2514 | Index.dropna : Drop missing indices.
2515 |
2516 | Examples
2517 | --------
2518 | >>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
2519 | ... "toy": [np.nan, 'Batmobile', 'Bullwhip'],
2520 | ... "born": [pd.NaT, pd.Timestamp("1940-04-25"),
2521 | ... pd.NaT]})
2522 | >>> df
2523 | name toy born
2524 | 0 Alfred NaN NaT
2525 | 1 Batman Batmobile 1940-04-25
2526 | 2 Catwoman Bullwhip NaT
2527 |
2528 | Drop the rows where at least one element is missing.
2529 |
2530 | >>> df.dropna()
2531 | name toy born
2532 | 1 Batman Batmobile 1940-04-25
2533 |
2534 | Drop the columns where at least one element is missing.
2535 |
2536 | >>> df.dropna(axis='columns')
2537 | name
2538 | 0 Alfred
2539 | 1 Batman
2540 | 2 Catwoman
2541 |
2542 | Drop the rows where all elements are missing.
2543 |
2544 | >>> df.dropna(how='all')
2545 | name toy born
2546 | 0 Alfred NaN NaT
2547 | 1 Batman Batmobile 1940-04-25
2548 | 2 Catwoman Bullwhip NaT
2549 |
2550 | Keep only the rows with at least 2 non-NA values.
2551 |
2552 | >>> df.dropna(thresh=2)
2553 | name toy born
2554 | 1 Batman Batmobile 1940-04-25
2555 | 2 Catwoman Bullwhip NaT
2556 |
2557 | Define in which columns to look for missing values.
2558 |
2559 | >>> df.dropna(subset=['name', 'born'])
2560 | name toy born
2561 | 1 Batman Batmobile 1940-04-25
2562 |
2563 | Keep the DataFrame with valid entries in the same variable.
2564 |
2565 | >>> df.dropna(inplace=True)
2566 | >>> df
2567 | name toy born
2568 | 1 Batman Batmobile 1940-04-25
2569 |
2570 | duplicated(self, subset=None, keep='first')
2571 | Return boolean Series denoting duplicate rows, optionally only
2572 | considering certain columns
2573 |
2574 | Parameters
2575 | ----------
2576 | subset : column label or sequence of labels, optional
2577 | Only consider certain columns for identifying duplicates, by
2578 | default use all of the columns
2579 | keep : {'first', 'last', False}, default 'first'
2580 | - ``first`` : Mark duplicates as ``True`` except for the
2581 | first occurrence.
2582 | - ``last`` : Mark duplicates as ``True`` except for the
2583 | last occurrence.
2584 | - False : Mark all duplicates as ``True``.
2585 |
2586 | Returns
2587 | -------
2588 | duplicated : Series
2589 |
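A short sketch of the mask that `duplicated` returns, using a hypothetical frame with one repeated `key`. It also shows that filtering with the negated mask gives the same rows as `drop_duplicates` with the same arguments.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'b'], 'value': [1, 2, 3]})

# keep='first' (default): only later occurrences are flagged.
mask = df.duplicated(subset='key')

# keep=False: every member of a duplicated group is flagged.
all_dupes = df.duplicated(subset='key', keep=False)

# Selecting the unflagged rows matches drop_duplicates.
filtered = df[~mask]

print(mask.tolist(), all_dupes.tolist())
```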
2590 | eq(self, other, axis='columns', level=None)
2591 | Wrapper for flexible comparison methods eq
2592 |
2593 | eval(self, expr, inplace=False, **kwargs)
2594 | Evaluate a string describing operations on DataFrame columns.
2595 |
2596 | Operates on columns only, not specific rows or elements. This allows
2597 | `eval` to run arbitrary code, which can make you vulnerable to code
2598 | injection if you pass user input to this function.
2599 |
2600 | Parameters
2601 | ----------
2602 | expr : str
2603 | The expression string to evaluate.
2604 | inplace : bool, default False
2605 | If the expression contains an assignment, whether to perform the
2606 | operation inplace and mutate the existing DataFrame. Otherwise,
2607 | a new DataFrame is returned.
2608 |
2609 | .. versionadded:: 0.18.0
2610 | kwargs : dict
2611 | See the documentation for :func:`~pandas.eval` for complete details
2612 | on the keyword arguments accepted by
2613 | :meth:`~pandas.DataFrame.query`.
2614 |
2615 | Returns
2616 | -------
2617 | ndarray, scalar, or pandas object
2618 | The result of the evaluation.
2619 |
2620 | See Also
2621 | --------
2622 | DataFrame.query : Evaluates a boolean expression to query the columns
2623 | of a frame.
2624 | DataFrame.assign : Can evaluate an expression or function to create new
2625 | values for a column.
2626 | pandas.eval : Evaluate a Python expression as a string using various
2627 | backends.
2628 |
2629 | Notes
2630 | -----
2631 | For more details see the API documentation for :func:`~pandas.eval`.
2632 | For detailed examples see :ref:`enhancing performance with eval
2633 | <enhancingperf.eval>`.
2634 |
2635 | Examples
2636 | --------
2637 | >>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
2638 | >>> df
2639 | A B
2640 | 0 1 10
2641 | 1 2 8
2642 | 2 3 6
2643 | 3 4 4
2644 | 4 5 2
2645 | >>> df.eval('A + B')
2646 | 0 11
2647 | 1 10
2648 | 2 9
2649 | 3 8
2650 | 4 7
2651 | dtype: int64
2652 |
2653 | Assignment is allowed though by default the original DataFrame is not
2654 | modified.
2655 |
2656 | >>> df.eval('C = A + B')
2657 | A B C
2658 | 0 1 10 11
2659 | 1 2 8 10
2660 | 2 3 6 9
2661 | 3 4 4 8
2662 | 4 5 2 7
2663 | >>> df
2664 | A B
2665 | 0 1 10
2666 | 1 2 8
2667 | 2 3 6
2668 | 3 4 4
2669 | 4 5 2
2670 |
2671 | Use ``inplace=True`` to modify the original DataFrame.
2672 |
2673 | >>> df.eval('C = A + B', inplace=True)
2674 | >>> df
2675 | A B C
2676 | 0 1 10 11
2677 | 1 2 8 10
2678 | 2 3 6 9
2679 | 3 4 4 8
2680 | 4 5 2 7
2681 |
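Expressions passed to `eval` can also refer to Python variables in the calling scope by prefixing the name with `@`. The sketch below reuses the example frame; `offset` is a made-up local variable.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})

# A plain Python variable, referenced inside the expression as @offset.
offset = 100
shifted = df.eval('A + @offset')

print(shifted)
```

Because the expression string is evaluated, it should be built from trusted input only, as the warning at the top of this entry explains.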
2682 | ewm(self, com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)
2683 | Provides exponential weighted functions
2684 |
2685 | .. versionadded:: 0.18.0
2686 |
2687 | Parameters
2688 | ----------
2689 | com : float, optional
2690 | Specify decay in terms of center of mass,
2691 | :math:`\alpha = 1 / (1 + com),\text{ for } com \geq 0`
2692 | span : float, optional
2693 | Specify decay in terms of span,
2694 | :math:`\alpha = 2 / (span + 1),\text{ for } span \geq 1`
2695 | halflife : float, optional
2696 | Specify decay in terms of half-life,
2697 | :math:`\alpha = 1 - exp(log(0.5) / halflife),\text{ for } halflife > 0`
2698 | alpha : float, optional
2699 | Specify smoothing factor :math:`\alpha` directly,
2700 | :math:`0 < \alpha \leq 1`
2701 |
2702 | .. versionadded:: 0.18.0
2703 |
2704 | min_periods : int, default 0
2705 | Minimum number of observations in window required to have a value
2706 | (otherwise result is NA).
2707 | adjust : boolean, default True
2708 | Divide by decaying adjustment factor in beginning periods to account
2709 | for imbalance in relative weightings (viewing EWMA as a moving average)
2710 | ignore_na : boolean, default False
2711 | Ignore missing values when calculating weights;
2712 | specify True to reproduce pre-0.15.0 behavior
2713 |
2714 | Returns
2715 | -------
2716 | a Window sub-classed for the particular operation
2717 |
2718 | Examples
2719 | --------
2720 | >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
2721 | >>> df
2722 | B
2723 | 0 0.0
2724 | 1 1.0
2725 | 2 2.0
2726 | 3 NaN
2727 | 4 4.0
2728 |
2729 | >>> df.ewm(com=0.5).mean()
2730 | B
2731 | 0 0.000000
2732 | 1 0.750000
2733 | 2 1.615385
2734 | 3 1.615385
2735 | 4 3.670213
2736 |
2737 | Notes
2738 | -----
2739 | Exactly one of center of mass, span, half-life, and alpha must be provided.
2740 | Allowed values and relationship between the parameters are specified in the
2741 | parameter descriptions above; see the link at the end of this section for
2742 | a detailed explanation.
2743 |
2744 | When adjust is True (default), weighted averages are calculated using
2745 | weights (1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.
2746 |
2747 | When adjust is False, weighted averages are calculated recursively as:
2748 | weighted_average[0] = arg[0];
2749 | weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i].
2750 |
2751 | When ignore_na is False (default), weights are based on absolute positions.
2752 | For example, the weights of x and y used in calculating the final weighted
2753 | average of [x, None, y] are (1-alpha)**2 and 1 (if adjust is True), and
2754 | (1-alpha)**2 and alpha (if adjust is False).
2755 |
2756 | When ignore_na is True (reproducing pre-0.15.0 behavior), weights are based
2757 | on relative positions. For example, the weights of x and y used in
2758 | calculating the final weighted average of [x, None, y] are 1-alpha and 1
2759 | (if adjust is True), and 1-alpha and alpha (if adjust is False).
2760 |
2761 | More details can be found at
2762 | http://pandas.pydata.org/pandas-docs/stable/computation.html#exponentially-weighted-windows
2763 |
2764 | See Also
2765 | --------
2766 | rolling : Provides rolling window calculations
2767 | expanding : Provides expanding transformations.
2768 |
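The formulas in the Notes can be checked by hand on a short Series. The sketch below is an illustration under simple assumptions: no missing values, and a hypothetical smoothing factor `alpha = 0.5` (equivalently `com = 1`).

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
alpha = 0.5

# adjust=False: the recursive form from the Notes.
recursive = s.ewm(alpha=alpha, adjust=False).mean()
manual = [s.iloc[0]]
for x in s.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

# adjust=True: explicit weights (1-alpha)**2, (1-alpha), 1 for the
# last element, normalised by their sum.
adjusted = s.ewm(alpha=alpha, adjust=True).mean()
weights = np.array([(1 - alpha) ** 2, 1 - alpha, 1.0])
manual_last = (weights * s.values).sum() / weights.sum()

# com and alpha describe the same decay: alpha = 1 / (1 + com).
by_com = s.ewm(com=1).mean()

print(recursive.tolist(), adjusted.iloc[-1], manual_last)
```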
2769 | expanding(self, min_periods=1, center=False, axis=0)
2770 | Provides expanding transformations.
2771 |
2772 | .. versionadded:: 0.18.0
2773 |
2774 | Parameters
2775 | ----------
2776 | min_periods : int, default 1
2777 | Minimum number of observations in window required to have a value
2778 | (otherwise result is NA).
2779 | center : boolean, default False
2780 | Set the labels at the center of the window.
2781 | axis : int or string, default 0
2782 |
2783 | Returns
2784 | -------
2785 | a Window sub-classed for the particular operation
2786 |
2787 | Examples
2788 | --------
2789 | >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
2790 | >>> df
2791 | B
2792 | 0 0.0
2793 | 1 1.0
2794 | 2 2.0
2795 | 3 NaN
2796 | 4 4.0
2797 |
2798 | >>> df.expanding(2).sum()
2799 | B
2800 | 0 NaN
2801 | 1 1.0
2802 | 2 3.0
2803 | 3 3.0
2804 | 4 7.0
2805 |
2806 | Notes
2807 | -----
2808 | By default, the result is set to the right edge of the window. This can be
2809 | changed to the center of the window by setting ``center=True``.
2810 |
2811 | See Also
2812 | --------
2813 | rolling : Provides rolling window calculations
2814 | ewm : Provides exponential weighted functions
2815 |
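An expanding window always starts at the first row and grows by one row at a time, so `expanding().mean()` is a running mean. A minimal sketch with illustrative names, showing the effect of `min_periods`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Running mean over all rows seen so far.
running_mean = s.expanding().mean()

# With min_periods=2 the first row has too few observations,
# so its result is NaN.
strict = s.expanding(min_periods=2).mean()

print(pd.DataFrame({'running_mean': running_mean, 'strict': strict}))
```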
2816 | fillna(self, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
2817 | Fill NA/NaN values using the specified method
2818 |
2819 | Parameters
2820 | ----------
2821 | value : scalar, dict, Series, or DataFrame
2822 | Value to use to fill holes (e.g. 0), alternately a
2823 | dict/Series/DataFrame of values specifying which value to use for
2824 | each index (for a Series) or column (for a DataFrame). (values not
2825 | in the dict/Series/DataFrame will not be filled). This value cannot
2826 | be a list.
2827 | method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
2828 | Method to use for filling holes in reindexed Series.
2829 | pad / ffill: propagate last valid observation forward to next valid.
2830 | backfill / bfill: use next valid observation to fill gap.
2831 | axis : {0 or 'index', 1 or 'columns'}
2832 | inplace : boolean, default False
2833 | If True, fill in place. Note: this will modify any
2834 | other views on this object, (e.g. a no-copy slice for a column in a
2835 | DataFrame).
2836 | limit : int, default None
2837 | If method is specified, this is the maximum number of consecutive
2838 | NaN values to forward/backward fill. In other words, if there is
2839 | a gap with more than this number of consecutive NaNs, it will only
2840 | be partially filled. If method is not specified, this is the
2841 | maximum number of entries along the entire axis where NaNs will be
2842 | filled. Must be greater than 0 if not None.
2843 | downcast : dict, default is None
2844 | a dict of item->dtype of what to downcast if possible,
2845 | or the string 'infer' which will try to downcast to an appropriate
2846 | equal type (e.g. float64 to int64 if possible)
2847 |
2848 | See Also
2849 | --------
2850 | interpolate : Fill NaN values using interpolation.
2851 | reindex, asfreq
2852 |
2853 | Returns
2854 | -------
2855 | filled : DataFrame
2856 |
2857 | Examples
2858 | --------
2859 | >>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
2860 | ... [3, 4, np.nan, 1],
2861 | ... [np.nan, np.nan, np.nan, 5],
2862 | ... [np.nan, 3, np.nan, 4]],
2863 | ... columns=list('ABCD'))
2864 | >>> df
2865 | A B C D
2866 | 0 NaN 2.0 NaN 0
2867 | 1 3.0 4.0 NaN 1
2868 | 2 NaN NaN NaN 5
2869 | 3 NaN 3.0 NaN 4
2870 |
2871 | Replace all NaN elements with 0s.
2872 |
2873 | >>> df.fillna(0)
2874 | A B C D
2875 | 0 0.0 2.0 0.0 0
2876 | 1 3.0 4.0 0.0 1
2877 | 2 0.0 0.0 0.0 5
2878 | 3 0.0 3.0 0.0 4
2879 |
2880 | We can also propagate non-null values forward or backward.
2881 |
2882 | >>> df.fillna(method='ffill')
2883 | A B C D
2884 | 0 NaN 2.0 NaN 0
2885 | 1 3.0 4.0 NaN 1
2886 | 2 3.0 4.0 NaN 5
2887 | 3 3.0 3.0 NaN 4
2888 |
2889 | Replace all NaN elements in column 'A', 'B', 'C', and 'D', with 0, 1,
2890 | 2, and 3 respectively.
2891 |
2892 | >>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
2893 | >>> df.fillna(value=values)
2894 | A B C D
2895 | 0 0.0 2.0 2.0 0
2896 | 1 3.0 4.0 2.0 1
2897 | 2 0.0 1.0 2.0 5
2898 | 3 0.0 3.0 2.0 4
2899 |
2900 | Only replace the first NaN element.
2901 |
2902 | >>> df.fillna(value=values, limit=1)
2903 | A B C D
2904 | 0 0.0 2.0 2.0 0
2905 | 1 3.0 4.0 NaN 1
2906 | 2 NaN 1.0 NaN 5
2907 | 3 NaN 3.0 NaN 4
2908 |
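Two further patterns are sketched below: a forward fill capped by `limit`, and a per-column fill using a Series as `value`. The forward fill uses the `ffill` method, which is assumed here to behave like `fillna(method='ffill')` in the version documented above. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Forward fill, but fill at most one consecutive NaN per gap.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
partial = s.ffill(limit=1)

# Per-column fill values supplied as a Series (here, the column means).
df = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                   'B': [np.nan, 4.0, 6.0]})
filled = df.fillna(df.mean())

print(partial)
print(filled)
```

The gap of three NaNs is only partially filled, matching the description of `limit` above.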
2909 | floordiv(self, other, axis='columns', level=None, fill_value=None)
2910 | Integer division of dataframe and other, element-wise (binary operator `floordiv`).
2911 |
2912 | Equivalent to ``dataframe // other``, but with support to substitute a fill_value for
2913 | missing data in one of the inputs.
2914 |
2915 | Parameters
2916 | ----------
2917 | other : Series, DataFrame, or constant
2918 | axis : {0, 1, 'index', 'columns'}
2919 | For Series input, axis to match Series index on
2920 | level : int or name
2921 | Broadcast across a level, matching Index values on the
2922 | passed MultiIndex level
2923 | fill_value : None or float value, default None
2924 | Fill existing missing (NaN) values, and any new element needed for
2925 | successful DataFrame alignment, with this value before computation.
2926 | If data in both corresponding DataFrame locations is missing
2927 | the result will be missing
2928 |
2929 | Notes
2930 | -----
2931 | Mismatched indices will be unioned together
2932 |
2933 | Returns
2934 | -------
2935 | result : DataFrame
2936 |
2941 | See Also
2942 | --------
2943 | DataFrame.rfloordiv
2944 |
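A minimal sketch of how `fill_value` changes the result compared with the plain `//` operator. The frames `a` and `b` are made up so that each missing-data case appears once.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'x': [7.0, np.nan, 9.0, np.nan]})
b = pd.DataFrame({'x': [2.0, 2.0, np.nan, np.nan]})

# Plain operator: any NaN operand gives NaN.
plain = a // b

# fill_value=1 substitutes 1 where exactly one side is missing;
# where both sides are missing the result stays NaN.
filled = a.floordiv(b, fill_value=1)

print(pd.DataFrame({'plain': plain['x'], 'filled': filled['x']}))
```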
2945 | ge(self, other, axis='columns', level=None)
2946 | Wrapper for flexible comparison methods ge
2947 |
2948 | get_value(self, index, col, takeable=False)
2949 | Quickly retrieve single value at passed column and index
2950 |
2951 | .. deprecated:: 0.21.0
2952 | Use .at[] or .iat[] accessors instead.
2953 |
2954 | Parameters
2955 | ----------
2956 | index : row label
2957 | col : column label
2958 | takeable : interpret the index/col as indexers, default False
2959 |
2960 | Returns
2961 | -------
2962 | value : scalar value
2963 |
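Since `get_value` is deprecated in favour of the `.at[]` and `.iat[]` accessors, a short sketch of the replacements follows; the frame and its labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r1', 'r2'])

# Label-based scalar access (row label, column label).
by_label = df.at['r2', 'B']

# Position-based scalar access (row position, column position),
# the counterpart of takeable=True.
by_position = df.iat[1, 1]

print(by_label, by_position)
```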
2964 | gt(self, other, axis='columns', level=None)
2965 | Wrapper for flexible comparison methods gt
2966 |
2967 | hist = hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, **kwds)
2968 | Make a histogram of the DataFrame's columns.
2969 |
2970 | A `histogram`_ is a representation of the distribution of data.
2971 | This function calls :meth:`matplotlib.pyplot.hist`, on each series in
2972 | the DataFrame, resulting in one histogram per column.
2973 |
2974 | .. _histogram: https://en.wikipedia.org/wiki/Histogram
2975 |
2976 | Parameters
2977 | ----------
2978 | data : DataFrame
2979 | The pandas object holding the data.
2980 | column : string or sequence
2981 | If passed, will be used to limit data to a subset of columns.
2982 | by : object, optional
2983 | If passed, then used to form histograms for separate groups.
2984 | grid : boolean, default True
2985 | Whether to show axis grid lines.
2986 | xlabelsize : int, default None
2987 | If specified changes the x-axis label size.
2988 | xrot : float, default None
2989 | Rotation of x axis labels. For example, a value of 90 displays the
2990 | x labels rotated 90 degrees clockwise.
2991 | ylabelsize : int, default None
2992 | If specified changes the y-axis label size.
2993 | yrot : float, default None
2994 | Rotation of y axis labels. For example, a value of 90 displays the
2995 | y labels rotated 90 degrees clockwise.
2996 | ax : Matplotlib axes object, default None
2997 | The axes to plot the histogram on.
2998 | sharex : boolean, default True if ax is None else False
2999 | In case subplots=True, share x axis and set some x axis labels to
3000 | invisible; defaults to True if ax is None otherwise False if an ax
3001 | is passed in.
3002 | Note that passing in both an ax and sharex=True will alter all x axis
3003 | labels for all subplots in a figure.
3004 | sharey : boolean, default False
3005 | In case subplots=True, share y axis and set some y axis labels to
3006 | invisible.
3007 | figsize : tuple
3008 | The size in inches of the figure to create. Uses the value in
3009 | `matplotlib.rcParams` by default.
3010 | layout : tuple, optional
3011 | Tuple of (rows, columns) for the layout of the histograms.
3012 | bins : integer or sequence, default 10
3013 | Number of histogram bins to be used. If an integer is given, bins + 1
3014 | bin edges are calculated and returned. If bins is a sequence, gives
3015 | bin edges, including left edge of first bin and right edge of last
3016 | bin. In this case, bins is returned unmodified.
3017 | **kwds
3018 | All other plotting keyword arguments to be passed to
3019 | :meth:`matplotlib.pyplot.hist`.
3020 |
3021 | Returns
3022 | -------
3023 | axes : matplotlib.AxesSubplot or numpy.ndarray of them
3024 |
3025 | See Also
3026 | --------
3027 | matplotlib.pyplot.hist : Plot a histogram using matplotlib.
3028 |
3029 | Examples
3030 | --------
3031 |
3032 | .. plot::
3033 | :context: close-figs
3034 |
3035 | This example draws a histogram based on the length and width of
3036 | some animals, displayed in three bins:
3037 |
3038 | >>> df = pd.DataFrame({
3039 | ... 'length': [1.5, 0.5, 1.2, 0.9, 3],
3040 | ... 'width': [0.7, 0.2, 0.15, 0.2, 1.1]
3041 | ... }, index=['pig', 'rabbit', 'duck', 'chicken', 'horse'])
3042 | >>> hist = df.hist(bins=3)
3043 |
3044 | idxmax(self, axis=0, skipna=True)
3045 | Return index of first occurrence of maximum over requested axis.
3046 | NA/null values are excluded.
3047 |
3048 | Parameters
3049 | ----------
3050 | axis : {0 or 'index', 1 or 'columns'}, default 0
3051 | 0 or 'index' for row-wise, 1 or 'columns' for column-wise
3052 | skipna : boolean, default True
3053 | Exclude NA/null values. If an entire row/column is NA, the result
3054 | will be NA.
3055 |
3056 | Raises
3057 | ------
3058 | ValueError
3059 | * If the row/column is empty
3060 |
3061 | Returns
3062 | -------
3063 | idxmax : Series
3064 |
3065 | Notes
3066 | -----
3067 | This method is the DataFrame version of ``ndarray.argmax``.
3068 |
3069 | See Also
3070 | --------
3071 | Series.idxmax
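The axis wording above is easy to misread, so here is a small sketch of what each axis returns. The data is hypothetical; `idxmin` behaves the same way with minima.

```python
import pandas as pd

# Hypothetical measurements.
df = pd.DataFrame({'length': [1.5, 0.5, 3.0], 'width': [0.7, 2.0, 1.1]},
                  index=['pig', 'rabbit', 'horse'])

# axis=0 (default): one result per column, holding a row label.
per_column = df.idxmax()

# axis=1: one result per row, holding a column label.
per_row = df.idxmax(axis=1)

# On ties, the label of the first occurrence is returned.
ties = pd.DataFrame({'score': [1, 5, 5]}).idxmax()

print(per_column, per_row, ties, sep='\n')
```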
3072 |
3073 | idxmin(self, axis=0, skipna=True)
3074 | Return index of first occurrence of minimum over requested axis.
3075 | NA/null values are excluded.
3076 |
3077 | Parameters
3078 | ----------
3079 | axis : {0 or 'index', 1 or 'columns'}, default 0
3080 | 0 or 'index' for row-wise, 1 or 'columns' for column-wise
3081 | skipna : boolean, default True
3082 | Exclude NA/null values. If an entire row/column is NA, the result
3083 | will be NA.
3084 |
3085 | Raises
3086 | ------
3087 | ValueError
3088 | * If the row/column is empty
3089 |
3090 | Returns
3091 | -------
3092 | idxmin : Series
3093 |
3094 | Notes
3095 | -----
3096 | This method is the DataFrame version of ``ndarray.argmin``.
3097 |
3098 | See Also
3099 | --------
3100 | Series.idxmin
3101 |
3102 | info(self, verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)
3103 | Print a concise summary of a DataFrame.
3104 |
3105 | This method prints information about a DataFrame including
3106 | the index dtype and column dtypes, non-null values and memory usage.
3107 |
3108 | Parameters
3109 | ----------
3110 | verbose : bool, optional
3111 | Whether to print the full summary. By default, the setting in
3112 | ``pandas.options.display.max_info_columns`` is followed.
3113 | buf : writable buffer, defaults to sys.stdout
3114 | Where to send the output. By default, the output is printed to
3115 | sys.stdout. Pass a writable buffer if you need to further process
3116 | the output.
3117 | max_cols : int, optional
3118 | When to switch from the verbose to the truncated output. If the
3119 | DataFrame has more than `max_cols` columns, the truncated output
3120 | is used. By default, the setting in
3121 | ``pandas.options.display.max_info_columns`` is used.
3122 | memory_usage : bool, str, optional
3123 | Specifies whether total memory usage of the DataFrame
3124 | elements (including the index) should be displayed. By default,
3125 | this follows the ``pandas.options.display.memory_usage`` setting.
3126 |
3127 | True always shows memory usage. False never shows memory usage.
3128 | A value of 'deep' is equivalent to "True with deep introspection".
3129 | Memory usage is shown in human-readable units (base-2
3130 | representation). Without deep introspection a memory estimation is
3131 | made based on column dtype and number of rows assuming values
3132 | consume the same memory amount for corresponding dtypes. With deep
3133 | memory introspection, a real memory usage calculation is performed
3134 | at the cost of computational resources.
3135 | null_counts : bool, optional
3136 | Whether to show the non-null counts. By default, this is shown
3137 | only if the frame is smaller than
3138 | ``pandas.options.display.max_info_rows`` and
3139 | ``pandas.options.display.max_info_columns``. A value of True always
3140 | shows the counts, and False never shows the counts.
3141 |
3142 | Returns
3143 | -------
3144 | None
3145 | This method prints a summary of a DataFrame and returns None.
3146 |
3147 | See Also
3148 | --------
3149 | DataFrame.describe: Generate descriptive statistics of DataFrame
3150 | columns.
3151 | DataFrame.memory_usage: Memory usage of DataFrame columns.
3152 |
3153 | Examples
3154 | --------
3155 | >>> int_values = [1, 2, 3, 4, 5]
3156 | >>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
3157 | >>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
3158 | >>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values,
3159 | ... "float_col": float_values})
3160 | >>> df
3161 | int_col text_col float_col
3162 | 0 1 alpha 0.00
3163 | 1 2 beta 0.25
3164 | 2 3 gamma 0.50
3165 | 3 4 delta 0.75
3166 | 4 5 epsilon 1.00
3167 |
3168 | Prints information of all columns:
3169 |
3170 | >>> df.info(verbose=True)
3171 | <class 'pandas.core.frame.DataFrame'>
3172 | RangeIndex: 5 entries, 0 to 4
3173 | Data columns (total 3 columns):
3174 | int_col 5 non-null int64
3175 | text_col 5 non-null object
3176 | float_col 5 non-null float64
3177 | dtypes: float64(1), int64(1), object(1)
3178 | memory usage: 200.0+ bytes
3179 |
3180 | Prints a summary of the column count and dtypes, but not per-column
3181 | information:
3182 |
3183 | >>> df.info(verbose=False)
3184 | <class 'pandas.core.frame.DataFrame'>
3185 | RangeIndex: 5 entries, 0 to 4
3186 | Columns: 3 entries, int_col to float_col
3187 | dtypes: float64(1), int64(1), object(1)
3188 | memory usage: 200.0+ bytes
3189 |
3190 | Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get
3191 | the buffer content, and write it to a text file:
3192 |
3193 | >>> import io
3194 | >>> buffer = io.StringIO()
3195 | >>> df.info(buf=buffer)
3196 | >>> s = buffer.getvalue()
3197 | >>> with open("df_info.txt", "w", encoding="utf-8") as f:
3198 | ... f.write(s)
3199 | 260
3200 |
3201 | The `memory_usage` parameter allows deep introspection mode, which is
3202 | especially useful for big DataFrames and for fine-tuning memory optimization:
3203 |
3204 | >>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
3205 | >>> df = pd.DataFrame({
3206 | ... 'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
3207 | ... 'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
3208 | ... 'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
3209 | ... })
3210 | >>> df.info()
3211 | <class 'pandas.core.frame.DataFrame'>
3212 | RangeIndex: 1000000 entries, 0 to 999999
3213 | Data columns (total 3 columns):
3214 | column_1 1000000 non-null object
3215 | column_2 1000000 non-null object
3216 | column_3 1000000 non-null object
3217 | dtypes: object(3)
3218 | memory usage: 22.9+ MB
3219 |
3220 | >>> df.info(memory_usage='deep')
3221 | <class 'pandas.core.frame.DataFrame'>
3222 | RangeIndex: 1000000 entries, 0 to 999999
3223 | Data columns (total 3 columns):
3224 | column_1 1000000 non-null object
3225 | column_2 1000000 non-null object
3226 | column_3 1000000 non-null object
3227 | dtypes: object(3)
3228 | memory usage: 188.8 MB
3229 |
3230 | insert(self, loc, column, value, allow_duplicates=False)
3231 | Insert column into DataFrame at specified location.
3232 |
3233 | Raises a ValueError if `column` is already contained in the DataFrame,
3234 | unless `allow_duplicates` is set to True.
3235 |
3236 | Parameters
3237 | ----------
3238 | loc : int
3239 | Insertion index. Must satisfy 0 <= loc <= len(columns).
3240 | column : string, number, or hashable object
3241 | Label of the inserted column.
3242 | value : int, Series, or array-like
3243 | allow_duplicates : bool, optional
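The interaction between `loc` and `allow_duplicates` can be sketched like this. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'C': [5, 6]})

# Place 'B' at position 1, between 'A' and 'C'. insert() works in place.
df.insert(1, 'B', [3, 4])

# A second column labelled 'A' is rejected by default.
try:
    df.insert(0, 'A', [0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# With allow_duplicates=True the duplicate label is accepted.
df.insert(0, 'A', [0, 0], allow_duplicates=True)

print(list(df.columns))
```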
3244 |
3245 | isin(self, values)
3246 | Return boolean DataFrame showing whether each element in the
3247 | DataFrame is contained in values.
3248 |
3249 | Parameters
3250 | ----------
3251 | values : iterable, Series, DataFrame or dictionary
3252 | The result will only be true at a location if all the
3253 | labels match. If `values` is a Series, that's the index. If
3254 | `values` is a dictionary, the keys must be the column names,
3255 | which must match. If `values` is a DataFrame,
3256 | then both the index and column labels must match.
3257 |
3258 | Returns
3259 | -------
3260 |
3261 | DataFrame of booleans
3262 |
3263 | Examples
3264 | --------
3265 | When ``values`` is a list:
3266 |
3267 | >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
3268 | >>> df.isin([1, 3, 12, 'a'])
3269 | A B
3270 | 0 True True
3271 | 1 False False
3272 | 2 True False
3273 |
3274 | When ``values`` is a dict:
3275 |
3276 | >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
3277 | >>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
3278 | A B
3279 | 0 True False # Note that B didn't match the 1 here.
3280 | 1 False True
3281 | 2 True True
3282 |
3283 | When ``values`` is a Series or DataFrame:
3284 |
3285 | >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
3286 | >>> other = pd.DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
3287 | >>> df.isin(other)
3288 | A B
3289 | 0 True False
3290 | 1 False False # Column A in `other` has a 3, but not at index 1.
3291 | 2 True True
3292 |
3293 | isna(self)
3294 | Detect missing values.
3295 |
3296 | Return a boolean same-sized object indicating if the values are NA.
3297 | NA values, such as None or :attr:`numpy.NaN`, get mapped to True
3298 | values.
3299 | Everything else gets mapped to False values. Characters such as empty
3300 | strings ``''`` or :attr:`numpy.inf` are not considered NA values
3301 | (unless you set ``pandas.options.mode.use_inf_as_na = True``).
3302 |
3303 | Returns
3304 | -------
3305 | DataFrame
3306 | Mask of bool values for each element in DataFrame that
3307 | indicates whether an element is an NA value.
3308 |
3309 | See Also
3310 | --------
3311 | DataFrame.isnull : alias of isna
3312 | DataFrame.notna : boolean inverse of isna
3313 | DataFrame.dropna : omit axes labels with missing values
3314 | isna : top-level isna
3315 |
3316 | Examples
3317 | --------
3318 | Show which entries in a DataFrame are NA.
3319 |
3320 | >>> df = pd.DataFrame({'age': [5, 6, np.NaN],
3321 | ... 'born': [pd.NaT, pd.Timestamp('1939-05-27'),
3322 | ... pd.Timestamp('1940-04-25')],
3323 | ... 'name': ['Alfred', 'Batman', ''],
3324 | ... 'toy': [None, 'Batmobile', 'Joker']})
3325 | >>> df
3326 | age born name toy
3327 | 0 5.0 NaT Alfred None
3328 | 1 6.0 1939-05-27 Batman Batmobile
3329 | 2 NaN 1940-04-25 Joker
3330 |
3331 | >>> df.isna()
3332 | age born name toy
3333 | 0 False True False True
3334 | 1 False False False False
3335 | 2 True False False False
3336 |
3337 | Show which entries in a Series are NA.
3338 |
3339 | >>> ser = pd.Series([5, 6, np.NaN])
3340 | >>> ser
3341 | 0 5.0
3342 | 1 6.0
3343 | 2 NaN
3344 | dtype: float64
3345 |
3346 | >>> ser.isna()
3347 | 0 False
3348 | 1 False
3349 | 2 True
3350 | dtype: bool
3351 |
3352 | isnull(self)
3353 | Detect missing values.
3354 |
3355 | Return a boolean same-sized object indicating if the values are NA.
3356 | NA values, such as None or :attr:`numpy.NaN`, get mapped to True
3357 | values.
3358 | Everything else gets mapped to False values. Characters such as empty
3359 | strings ``''`` or :attr:`numpy.inf` are not considered NA values
3360 | (unless you set ``pandas.options.mode.use_inf_as_na = True``).
3361 |
3362 | Returns
3363 | -------
3364 | DataFrame
3365 | Mask of bool values for each element in DataFrame that
3366 | indicates whether an element is an NA value.
3367 |
3368 | See Also
3369 | --------
3370 | DataFrame.isnull : alias of isna
3371 | DataFrame.notna : boolean inverse of isna
3372 | DataFrame.dropna : omit axes labels with missing values
3373 | isna : top-level isna
3374 |
3375 | Examples
3376 | --------
3377 | Show which entries in a DataFrame are NA.
3378 |
3379 | >>> df = pd.DataFrame({'age': [5, 6, np.NaN],
3380 | ... 'born': [pd.NaT, pd.Timestamp('1939-05-27'),
3381 | ... pd.Timestamp('1940-04-25')],
3382 | ... 'name': ['Alfred', 'Batman', ''],
3383 | ... 'toy': [None, 'Batmobile', 'Joker']})
3384 | >>> df
3385 | age born name toy
3386 | 0 5.0 NaT Alfred None
3387 | 1 6.0 1939-05-27 Batman Batmobile
3388 | 2 NaN 1940-04-25 Joker
3389 |
3390 | >>> df.isna()
3391 | age born name toy
3392 | 0 False True False True
3393 | 1 False False False False
3394 | 2 True False False False
3395 |
3396 | Show which entries in a Series are NA.
3397 |
3398 | >>> ser = pd.Series([5, 6, np.NaN])
3399 | >>> ser
3400 | 0 5.0
3401 | 1 6.0
3402 | 2 NaN
3403 | dtype: float64
3404 |
3405 | >>> ser.isna()
3406 | 0 False
3407 | 1 False
3408 | 2 True
3409 | dtype: bool
3410 |
3411 | items = iteritems(self)
3412 |
3413 | iteritems(self)
3414 | Iterator over (column name, Series) pairs.
3415 |
3416 | See also
3417 | --------
3418 | iterrows : Iterate over DataFrame rows as (index, Series) pairs.
3419 | itertuples : Iterate over DataFrame rows as namedtuples of the values.
3420 |
3421 | iterrows(self)
3422 | Iterate over DataFrame rows as (index, Series) pairs.
3423 |
3424 | Notes
3425 | -----
3426 |
3427 | 1. Because ``iterrows`` returns a Series for each row,
3428 | it does **not** preserve dtypes across the rows (dtypes are
3429 | preserved across columns for DataFrames). For example,
3430 |
3431 | >>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
3432 | >>> row = next(df.iterrows())[1]
3433 | >>> row
3434 | int 1.0
3435 | float 1.5
3436 | Name: 0, dtype: float64
3437 | >>> print(row['int'].dtype)
3438 | float64
3439 | >>> print(df['int'].dtype)
3440 | int64
3441 |
3442 | To preserve dtypes while iterating over the rows, it is better
3443 | to use :meth:`itertuples` which returns namedtuples of the values
3444 | and which is generally faster than ``iterrows``.
3445 |
3446 | 2. You should **never modify** something you are iterating over.
3447 | This is not guaranteed to work in all cases. Depending on the
3448 | data types, the iterator returns a copy and not a view, and writing
3449 | to it will have no effect.
3450 |
3451 | Returns
3452 | -------
3453 | it : generator
3454 | A generator that iterates over the rows of the frame.
3455 |
3456 | See also
3457 | --------
3458 | itertuples : Iterate over DataFrame rows as namedtuples of the values.
3459 | iteritems : Iterate over (column name, Series) pairs.
3460 |
3461 | itertuples(self, index=True, name='Pandas')
3462 | Iterate over DataFrame rows as namedtuples, with index value as first
3463 | element of the tuple.
3464 |
3465 | Parameters
3466 | ----------
3467 | index : boolean, default True
3468 | If True, return the index as the first element of the tuple.
3469 | name : string, default "Pandas"
3470 | The name of the returned namedtuples or None to return regular
3471 | tuples.
3472 |
3473 | Notes
3474 | -----
3475 | The column names will be renamed to positional names if they are
3476 | invalid Python identifiers, repeated, or start with an underscore.
3477 | With a large number of columns (>255), regular tuples are returned.
3478 |
3479 | See also
3480 | --------
3481 | iterrows : Iterate over DataFrame rows as (index, Series) pairs.
3482 | iteritems : Iterate over (column name, Series) pairs.
3483 |
3484 | Examples
3485 | --------
3486 |
3487 | >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
3488 | ... index=['a', 'b'])
3489 | >>> df
3490 | col1 col2
3491 | a 1 0.1
3492 | b 2 0.2
3493 | >>> for row in df.itertuples():
3494 | ... print(row)
3495 | ...
3496 | Pandas(Index='a', col1=1, col2=0.1)
3497 | Pandas(Index='b', col1=2, col2=0.2)
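The `index` and `name` parameters and the renaming rule from the Notes can be illustrated with a short sketch. The frame `odd` and its column label are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

# name=None yields regular tuples; index=False drops the index element.
plain = list(df.itertuples(index=False, name=None))

# A column label that is not a valid identifier gets a positional name.
odd = pd.DataFrame({'my col': [1]})
fields = next(odd.itertuples())._fields

print(plain)
print(fields)
```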
3498 |
3499 | join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
3500 | Join columns with other DataFrame either on index or on a key
3501 | column. Efficiently join multiple DataFrame objects by index at once by
3502 | passing a list.
3503 |
3504 | Parameters
3505 | ----------
3506 | other : DataFrame, Series with name field set, or list of DataFrame
3507 | Index should be similar to one of the columns in this one. If a
3508 | Series is passed, its name attribute must be set, and that will be
3509 | used as the column name in the resulting joined DataFrame
3510 | on : name, tuple/list of names, or array-like
3511 | Column or index level name(s) in the caller to join on the index
3512 | in `other`, otherwise joins index-on-index. If multiple
3513 | values given, the `other` DataFrame must have a MultiIndex. Can
3514 | pass an array as the join key if it is not already contained in
3515 | the calling DataFrame. Like an Excel VLOOKUP operation
3516 | how : {'left', 'right', 'outer', 'inner'}, default 'left'
3517 | How to handle the operation of the two objects.
3518 |
3519 | * left: use calling frame's index (or column if on is specified)
3520 | * right: use other frame's index
3521 | * outer: form union of calling frame's index (or column if on is
3522 | specified) with other frame's index, and sort it
3523 | lexicographically
3524 | * inner: form intersection of calling frame's index (or column if
3525 | on is specified) with other frame's index, preserving the order
3526 | of the calling frame's index
3527 | lsuffix : string
3528 | Suffix to use from left frame's overlapping columns
3529 | rsuffix : string
3530 | Suffix to use from right frame's overlapping columns
3531 | sort : boolean, default False
3532 | Order result DataFrame lexicographically by the join key. If False,
3533 | the order of the join key depends on the join type (how keyword)
3534 |
3535 | Notes
3536 | -----
3537 | on, lsuffix, and rsuffix options are not supported when passing a list
3538 | of DataFrame objects
3539 |
3540 | Support for specifying index levels as the `on` parameter was added
3541 | in version 0.23.0
3542 |
3543 | Examples
3544 | --------
3545 | >>> caller = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
3546 | ... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
3547 |
3548 | >>> caller
3549 | A key
3550 | 0 A0 K0
3551 | 1 A1 K1
3552 | 2 A2 K2
3553 | 3 A3 K3
3554 | 4 A4 K4
3555 | 5 A5 K5
3556 |
3557 | >>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
3558 | ... 'B': ['B0', 'B1', 'B2']})
3559 |
3560 | >>> other
3561 | B key
3562 | 0 B0 K0
3563 | 1 B1 K1
3564 | 2 B2 K2
3565 |
3566 | Join DataFrames using their indexes.
3567 |
3568 | >>> caller.join(other, lsuffix='_caller', rsuffix='_other')
3570 | A key_caller B key_other
3571 | 0 A0 K0 B0 K0
3572 | 1 A1 K1 B1 K1
3573 | 2 A2 K2 B2 K2
3574 | 3 A3 K3 NaN NaN
3575 | 4 A4 K4 NaN NaN
3576 | 5 A5 K5 NaN NaN
3577 |
3578 |
3579 | If we want to join using the key columns, we need to set key to be
3580 | the index in both caller and other. The joined DataFrame will have
3581 | key as its index.
3582 |
3583 | >>> caller.set_index('key').join(other.set_index('key'))
3585 | A B
3586 | key
3587 | K0 A0 B0
3588 | K1 A1 B1
3589 | K2 A2 B2
3590 | K3 A3 NaN
3591 | K4 A4 NaN
3592 | K5 A5 NaN
3593 |
3594 | Another option to join using the key columns is to use the on
3595 | parameter. DataFrame.join always uses other's index but we can use any
3596 | column in the caller. This method preserves the original caller's
3597 | index in the result.
3598 |
3599 | >>> caller.join(other.set_index('key'), on='key')
3601 | A key B
3602 | 0 A0 K0 B0
3603 | 1 A1 K1 B1
3604 | 2 A2 K2 B2
3605 | 3 A3 K3 NaN
3606 | 4 A4 K4 NaN
3607 | 5 A5 K5 NaN
3608 |
3609 |
3610 | See also
3611 | --------
3612 | DataFrame.merge : For column(s)-on-column(s) operations
3613 |
3614 | Returns
3615 | -------
3616 | joined : DataFrame
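The examples above all use the default `how='left'`. As a complementary sketch, the snippet below shows `how='inner'` together with `on`, and joining a named Series. The Series `extra` and its name `'C'` are invented for illustration.

```python
import pandas as pd

caller = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                       'A': ['A0', 'A1', 'A2', 'A3']})
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                      'B': ['B0', 'B1', 'B2']})

# Inner join: keep only caller rows whose 'key' appears in other's index.
inner = caller.join(other.set_index('key'), on='key', how='inner')

# A Series can be joined if its name is set; the name becomes the column.
extra = pd.Series([10, 20], name='C')
with_series = caller.join(extra)

print(inner)
print(with_series)
```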
3617 |
3618 | kurt(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
3619 | Return unbiased kurtosis over requested axis using Fisher's definition of
3620 | kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
3621 |
3622 |
3623 | Parameters
3624 | ----------
3625 | axis : {index (0), columns (1)}
3626 | skipna : boolean, default True
3627 | Exclude NA/null values when computing the result.
3628 | level : int or level name, default None
3629 | If the axis is a MultiIndex (hierarchical), count along a
3630 | particular level, collapsing into a Series
3631 | numeric_only : boolean, default None
3632 | Include only float, int, boolean columns. If None, will attempt to use
3633 | everything, then use only numeric data. Not implemented for Series.
3634 |
3635 | Returns
3636 | -------
3637 | kurt : Series or DataFrame (if level specified)
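To make the Fisher (excess) convention concrete, here is a small sketch with two made-up columns: an evenly spread sample, which has negative excess kurtosis, and a sample dominated by one outlier, which has positive excess kurtosis.

```python
import pandas as pd

# Hypothetical samples; at least four values are needed per column.
df = pd.DataFrame({'flat': [1, 2, 3, 4, 5],
                   'peaked': [0, 0, 0, 0, 10]})

# One value per column; 0.0 would correspond to a normal distribution.
excess = df.kurt()

print(excess)
```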
3638 |
3639 | kurtosis = kurt(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
3640 |
3641 | le(self, other, axis='columns', level=None)
3642 | Wrapper for flexible comparison methods le
3643 |
3644 | lookup(self, row_labels, col_labels)
3645 | Label-based "fancy indexing" function for DataFrame.
3646 | Given equal-length arrays of row and column labels, return an
3647 | array of the values corresponding to each (row, col) pair.
3648 |
3649 | Parameters
3650 | ----------
3651 | row_labels : sequence
3652 | The row labels to use for lookup
3653 | col_labels : sequence
3654 | The column labels to use for lookup
3655 |
3656 | Notes
3657 | -----
3658 | Akin to::
3659 |
3660 | result = []
3661 | for row, col in zip(row_labels, col_labels):
3662 | result.append(df.at[row, col])
3663 |
3664 | Returns
3665 | --------
3666 | values : ndarray
3667 | The found values
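The pairwise lookup described above can be reproduced without calling `lookup` itself, which may be useful on pandas versions where that method is no longer available. This is a sketch under that assumption; the frame and label lists are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]},
                  index=['x', 'y', 'z'])
row_labels = ['x', 'z', 'y']
col_labels = ['B', 'A', 'B']

# Loop form, one scalar access per (row, col) pair.
looped = [df.at[r, c] for r, c in zip(row_labels, col_labels)]

# Vectorized form: translate labels to positions, then index the array.
rows = df.index.get_indexer(row_labels)
cols = df.columns.get_indexer(col_labels)
vectorized = df.values[rows, cols]

print(looped, vectorized)
```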
3668 |
3669 | lt(self, other, axis='columns', level=None)
3670 | Wrapper for flexible comparison methods lt
3671 |
3672 | mad(self, axis=None, skipna=None, level=None)
3673 | Return the mean absolute deviation of the values for the requested axis
3674 |
3675 | Parameters
3676 | ----------
3677 | axis : {index (0), columns (1)}
3678 | skipna : boolean, default True
3679 | Exclude NA/null values when computing the result.
3680 | level : int or level name, default None
3681 | If the axis is a MultiIndex (hierarchical), count along a
3682 | particular level, collapsing into a Series
3683 | numeric_only : boolean, default None
3684 | Include only float, int, boolean columns. If None, will attempt to use
3685 | everything, then use only numeric data. Not implemented for Series.
3686 |
3687 | Returns
3688 | -------
3689 | mad : Series or DataFrame (if level specified)
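The mean absolute deviation is the mean of the absolute differences from the column mean. The sketch below computes it by hand, which also works on pandas versions that do not provide `mad`; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                   'b': [2.0, 2.0, 2.0, 6.0]})

# Subtract each column's mean, take absolute values, then average.
manual_mad = (df - df.mean()).abs().mean()

print(manual_mad)
```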
3690 |
3691 | max(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
3692 | This method returns the maximum of the values in the object.
3693 | If you want the *index* of the maximum, use ``idxmax``. This is
3694 | the equivalent of the ``numpy.ndarray`` method ``argmax``.
3695 |
3696 | Parameters
3697 | ----------
3698 | axis : {index (0), columns (1)}
3699 | skipna : boolean, default True
3700 | Exclude NA/null values when computing the result.
3701 | level : int or level name, default None
3702 | If the axis is a MultiIndex (hierarchical), count along a
3703 | particular level, collapsing into a Series
3704 | numeric_only : boolean, default None
3705 | Include only float, int, boolean columns. If None, will attempt to use
3706 | everything, then use only numeric data. Not implemented for Series.
3707 |
3708 | Returns
3709 | -------
3710 | max : Series or DataFrame (if level specified)
3711 |
3712 | mean(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
3713 | Return the mean of the values for the requested axis
3714 |
3715 | Parameters
3716 | ----------
3717 | axis : {index (0), columns (1)}
3718 | skipna : boolean, default True
3719 | Exclude NA/null values when computing the result.
3720 | level : int or level name, default None
3721 | If the axis is a MultiIndex (hierarchical), count along a
3722 | particular level, collapsing into a Series
3723 | numeric_only : boolean, default None
3724 | Include only float, int, boolean columns. If None, will attempt to use
3725 | everything, then use only numeric data. Not implemented for Series.
3726 |
3727 | Returns
3728 | -------
3729 | mean : Series or DataFrame (if level specified)
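The `axis` and `skipna` parameters shared by `max`, `mean`, and `median` can be sketched with `mean` on a small hypothetical frame containing one missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

# axis=0 (index): reduce down the rows, giving one value per column.
col_means = df.mean()

# axis=1 (columns): reduce across the columns, giving one value per row.
row_means = df.mean(axis=1)

# skipna=False propagates missing values instead of skipping them.
strict = df.mean(skipna=False)

print(col_means, row_means, strict, sep='\n')
```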
3730 |
3731 | median(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
3732 | Return the median of the values for the requested axis
3733 |
3734 | Parameters
3735 | ----------
3736 | axis : {index (0), columns (1)}
3737 | skipna : boolean, default True
3738 | Exclude NA/null values when computing the result.
3739 | level : int or level name, default None
3740 | If the axis is a MultiIndex (hierarchical), count along a
3741 | particular level, collapsing into a Series
3742 | numeric_only : boolean, default None
3743 | Include only float, int, boolean columns. If None, will attempt to use
3744 | everything, then use only numeric data. Not implemented for Series.
3745 |
3746 | Returns
3747 | -------
3748 | median : Series or DataFrame (if level specified)
3749 |
3750 | melt(self, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
3751 | "Unpivots" a DataFrame from wide format to long format, optionally
3752 | leaving identifier variables set.
3753 |
3754 | This function is useful to massage a DataFrame into a format where one
3755 | or more columns are identifier variables (`id_vars`), while all other
3756 | columns, considered measured variables (`value_vars`), are "unpivoted" to
3757 | the row axis, leaving just two non-identifier columns, 'variable' and
3758 | 'value'.
3759 |
3760 | .. versionadded:: 0.20.0
3761 |
3762 | Parameters
3763 | ----------
3764 | frame : DataFrame
3765 | id_vars : tuple, list, or ndarray, optional
3766 | Column(s) to use as identifier variables.
3767 | value_vars : tuple, list, or ndarray, optional
3768 | Column(s) to unpivot. If not specified, uses all columns that
3769 | are not set as `id_vars`.
3770 | var_name : scalar
3771 | Name to use for the 'variable' column. If None it uses
3772 | ``frame.columns.name`` or 'variable'.
3773 | value_name : scalar, default 'value'
3774 | Name to use for the 'value' column.
3775 | col_level : int or string, optional
3776 | If columns are a MultiIndex then use this level to melt.
3777 |
3778 | See also
3779 | --------
3780 | melt
3781 | pivot_table
3782 | DataFrame.pivot
3783 |
3784 | Examples
3785 | --------
3786 | >>> import pandas as pd
3787 | >>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
3788 | ... 'B': {0: 1, 1: 3, 2: 5},
3789 | ... 'C': {0: 2, 1: 4, 2: 6}})
3790 | >>> df
3791 | A B C
3792 | 0 a 1 2
3793 | 1 b 3 4
3794 | 2 c 5 6
3795 |
3796 | >>> df.melt(id_vars=['A'], value_vars=['B'])
3797 | A variable value
3798 | 0 a B 1
3799 | 1 b B 3
3800 | 2 c B 5
3801 |
3802 | >>> df.melt(id_vars=['A'], value_vars=['B', 'C'])
3803 | A variable value
3804 | 0 a B 1
3805 | 1 b B 3
3806 | 2 c B 5
3807 | 3 a C 2
3808 | 4 b C 4
3809 | 5 c C 6
3810 |
3811 | The names of 'variable' and 'value' columns can be customized:
3812 |
3813 | >>> df.melt(id_vars=['A'], value_vars=['B'],
3814 | ... var_name='myVarname', value_name='myValname')
3815 | A myVarname myValname
3816 | 0 a B 1
3817 | 1 b B 3
3818 | 2 c B 5
3819 |
3820 | If you have multi-index columns:
3821 |
3822 | >>> df.columns = [list('ABC'), list('DEF')]
3823 | >>> df
3824 | A B C
3825 | D E F
3826 | 0 a 1 2
3827 | 1 b 3 4
3828 | 2 c 5 6
3829 |
3830 | >>> df.melt(col_level=0, id_vars=['A'], value_vars=['B'])
3831 | A variable value
3832 | 0 a B 1
3833 | 1 b B 3
3834 | 2 c B 5
3835 |
3836 | >>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])
3837 | (A, D) variable_0 variable_1 value
3838 | 0 a B E 1
3839 | 1 b B E 3
3840 | 2 c B E 5
3841 |
3842 | memory_usage(self, index=True, deep=False)
3843 | Return the memory usage of each column in bytes.
3844 |
3845 | The memory usage can optionally include the contribution of
3846 | the index and elements of `object` dtype.
3847 |
3848 | This value is displayed in `DataFrame.info` by default. This can be
3849 | suppressed by setting ``pandas.options.display.memory_usage`` to False.
3850 |
3851 | Parameters
3852 | ----------
3853 | index : bool, default True
3854 | Specifies whether to include the memory usage of the DataFrame's
3855 | index in returned Series. If ``index=True``, the memory usage of the
3856 | index is the first item in the output.
3857 | deep : bool, default False
3858 | If True, introspect the data deeply by interrogating
3859 | `object` dtypes for system-level memory consumption, and include
3860 | it in the returned values.
3861 |
3862 | Returns
3863 | -------
3864 | sizes : Series
3865 | A Series whose index is the original column names and whose values
3866 | are the memory usage of each column in bytes.
3867 |
3868 | See Also
3869 | --------
3870 | numpy.ndarray.nbytes : Total bytes consumed by the elements of an
3871 | ndarray.
3872 | Series.memory_usage : Bytes consumed by a Series.
3873 | pandas.Categorical : Memory-efficient array for string values with
3874 | many repeated values.
3875 | DataFrame.info : Concise summary of a DataFrame.
3876 |
3877 | Examples
3878 | --------
3879 | >>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
3880 | >>> data = dict([(t, np.ones(shape=5000).astype(t))
3881 | ... for t in dtypes])
3882 | >>> df = pd.DataFrame(data)
3883 | >>> df.head()
3884 | int64 float64 complex128 object bool
3885 | 0 1 1.0 (1+0j) 1 True
3886 | 1 1 1.0 (1+0j) 1 True
3887 | 2 1 1.0 (1+0j) 1 True
3888 | 3 1 1.0 (1+0j) 1 True
3889 | 4 1 1.0 (1+0j) 1 True
3890 |
3891 | >>> df.memory_usage()
3892 | Index 80
3893 | int64 40000
3894 | float64 40000
3895 | complex128 80000
3896 | object 40000
3897 | bool 5000
3898 | dtype: int64
3899 |
3900 | >>> df.memory_usage(index=False)
3901 | int64 40000
3902 | float64 40000
3903 | complex128 80000
3904 | object 40000
3905 | bool 5000
3906 | dtype: int64
3907 |
3908 | The memory footprint of `object` dtype columns is ignored by default:
3909 |
3910 | >>> df.memory_usage(deep=True)
3911 | Index 80
3912 | int64 40000
3913 | float64 40000
3914 | complex128 80000
3915 | object 160000
3916 | bool 5000
3917 | dtype: int64
3918 |
3919 | Use a Categorical for efficient storage of an object-dtype column with
3920 | many repeated values.
3921 |
3922 | >>> df['object'].astype('category').memory_usage(deep=True)
3923 | 5168
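
A minimal sketch of the shallow versus deep measurement, assuming a hypothetical frame with one string column (forced to ``object`` dtype) and one integer column. The column names and data are illustrative only and do not come from the docstring above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a string column with many repeated values.
df = pd.DataFrame({
    "city": pd.Series(["Oslo", "Lima"] * 2000, dtype=object),
    "n": np.arange(4000),
})

# Shallow: object columns are counted as one pointer per element.
shallow = df.memory_usage(index=False)
# Deep: the Python string objects behind the pointers are counted too.
deep = df.memory_usage(index=False, deep=True)
# Categorical storage keeps each distinct string once plus small integer codes.
as_category = df["city"].astype("category").memory_usage(index=False, deep=True)

print(shallow["city"], deep["city"], as_category)
```

Numeric columns report the same size either way; only ``object`` columns change when ``deep=True`` is set.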
3924 |
3925 | merge(self, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
3926 | Merge DataFrame objects by performing a database-style join operation by
3927 | columns or indexes.
3928 |
3929 | If joining columns on columns, the DataFrame indexes *will be
3930 | ignored*. Otherwise if joining indexes on indexes or indexes on a column or
3931 | columns, the index will be passed on.
3932 |
3933 | Parameters
3934 | ----------
3935 | right : DataFrame
3936 | how : {'left', 'right', 'outer', 'inner'}, default 'inner'
3937 | * left: use only keys from left frame, similar to a SQL left outer join;
3938 | preserve key order
3939 | * right: use only keys from right frame, similar to a SQL right outer join;
3940 | preserve key order
3941 | * outer: use union of keys from both frames, similar to a SQL full outer
3942 | join; sort keys lexicographically
3943 | * inner: use intersection of keys from both frames, similar to a SQL inner
3944 | join; preserve the order of the left keys
3945 | on : label or list
3946 | Column or index level names to join on. These must be found in both
3947 | DataFrames. If `on` is None and not merging on indexes then this defaults
3948 | to the intersection of the columns in both DataFrames.
3949 | left_on : label or list, or array-like
3950 | Column or index level names to join on in the left DataFrame. Can also
3951 | be an array or list of arrays of the length of the left DataFrame.
3952 | These arrays are treated as if they are columns.
3953 | right_on : label or list, or array-like
3954 | Column or index level names to join on in the right DataFrame. Can also
3955 | be an array or list of arrays of the length of the right DataFrame.
3956 | These arrays are treated as if they are columns.
3957 | left_index : boolean, default False
3958 | Use the index from the left DataFrame as the join key(s). If it is a
3959 | MultiIndex, the number of keys in the other DataFrame (either the index
3960 | or a number of columns) must match the number of levels
3961 | right_index : boolean, default False
3962 | Use the index from the right DataFrame as the join key. Same caveats as
3963 | left_index
3964 | sort : boolean, default False
3965 | Sort the join keys lexicographically in the result DataFrame. If False,
3966 | the order of the join keys depends on the join type (how keyword)
3967 | suffixes : 2-length sequence (tuple, list, ...)
3968 | Suffix to apply to overlapping column names in the left and right
3969 | side, respectively
3970 | copy : boolean, default True
3971 | If False, do not copy data unnecessarily
3972 | indicator : boolean or string, default False
3973 | If True, adds a column to the output DataFrame called "_merge" with
3974 | information on the source of each row.
3975 | If a string, the same information column is added to the output
3976 | DataFrame, named with the value of that string.
3977 | The information column is Categorical-type and takes on a value of "left_only"
3978 | for observations whose merge key only appears in the 'left' DataFrame,
3979 | "right_only" for observations whose merge key only appears in the 'right'
3980 | DataFrame, and "both" if the observation's merge key is found in both.
3981 |
3982 | validate : string, default None
3983 | If specified, checks if merge is of specified type.
3984 |
3985 | * "one_to_one" or "1:1": check if merge keys are unique in both
3986 | left and right datasets.
3987 | * "one_to_many" or "1:m": check if merge keys are unique in left
3988 | dataset.
3989 | * "many_to_one" or "m:1": check if merge keys are unique in right
3990 | dataset.
3991 | * "many_to_many" or "m:m": allowed, but does not result in checks.
3992 |
3993 | .. versionadded:: 0.21.0
3994 |
3995 | Notes
3996 | -----
3997 | Support for specifying index levels as the `on`, `left_on`, and
3998 | `right_on` parameters was added in version 0.23.0
3999 |
4000 | Examples
4001 | --------
4002 |
4003 | >>> A                >>> B
4004 |   lkey  value          rkey  value
4005 | 0  foo      1        0  foo      5
4006 | 1  bar      2        1  bar      6
4007 | 2  baz      3        2  qux      7
4008 | 3  foo      4        3  bar      8
4009 |
4010 | >>> A.merge(B, left_on='lkey', right_on='rkey', how='outer')
4011 |   lkey  value_x  rkey  value_y
4012 | 0  foo        1   foo        5
4013 | 1  foo        4   foo        5
4014 | 2  bar        2   bar        6
4015 | 3  bar        2   bar        8
4016 | 4  baz        3   NaN      NaN
4017 | 5  NaN      NaN   qux        7
4018 |
4019 | Returns
4020 | -------
4021 | merged : DataFrame
4022 | The output type will be the same as 'left', if it is a subclass
4023 | of DataFrame.
4024 |
4025 | See also
4026 | --------
4027 | merge_ordered
4028 | merge_asof
4029 | DataFrame.join
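
A short sketch of the ``indicator`` and ``validate`` parameters described above, using two hypothetical frames (``left``, ``right``) with unique keys, plus a third frame (``dup_right``) with a repeated key to trigger the validation error. All names and values are assumptions for illustration.

```python
import pandas as pd

left = pd.DataFrame({"lkey": ["foo", "bar", "baz"], "value": [1, 2, 3]})
right = pd.DataFrame({"rkey": ["foo", "bar", "qux"], "value": [5, 6, 7]})

# Outer join; "_merge" records where each row's key was found.
merged = left.merge(right, left_on="lkey", right_on="rkey",
                    how="outer", indicator=True, validate="one_to_one")
counts = merged["_merge"].value_counts()

# validate="one_to_one" rejects a right frame whose keys repeat.
dup_right = pd.DataFrame({"rkey": ["foo", "foo"], "value": [5, 8]})
try:
    left.merge(dup_right, left_on="lkey", right_on="rkey",
               validate="one_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

print(merged)
print(validation_failed)
```

Overlapping non-key columns (here ``value``) receive the default ``_x`` / ``_y`` suffixes.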
4030 |
4031 | min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
4032 | This method returns the minimum of the values in the object.
4033 | If you want the *index* of the minimum, use ``idxmin``. This is
4034 | the equivalent of the ``numpy.ndarray`` method ``argmin``.
4035 |
4036 | Parameters
4037 | ----------
4038 | axis : {index (0), columns (1)}
4039 | skipna : boolean, default True
4040 | Exclude NA/null values when computing the result.
4041 | level : int or level name, default None
4042 | If the axis is a MultiIndex (hierarchical), count along a
4043 | particular level, collapsing into a Series
4044 | numeric_only : boolean, default None
4045 | Include only float, int, boolean columns. If None, will attempt to use
4046 | everything, then use only numeric data. Not implemented for Series.
4047 |
4048 | Returns
4049 | -------
4050 | min : Series or DataFrame (if level specified)
4051 |
4052 | mod(self, other, axis='columns', level=None, fill_value=None)
4053 | Modulo of dataframe and other, element-wise (binary operator `mod`).
4054 |
4055 | Equivalent to ``dataframe % other``, but with support to substitute a fill_value for
4056 | missing data in one of the inputs.
4057 |
4058 | Parameters
4059 | ----------
4060 | other : Series, DataFrame, or constant
4061 | axis : {0, 1, 'index', 'columns'}
4062 | For Series input, axis to match Series index on
4063 | level : int or name
4064 | Broadcast across a level, matching Index values on the
4065 | passed MultiIndex level
4066 | fill_value : None or float value, default None
4067 | Fill existing missing (NaN) values, and any new element needed for
4068 | successful DataFrame alignment, with this value before computation.
4069 | If data in both corresponding DataFrame locations is missing
4070 | the result will be missing
4071 |
4072 | Notes
4073 | -----
4074 | Mismatched indices will be unioned together
4075 |
4076 | Returns
4077 | -------
4078 | result : DataFrame
4079 |
4080 | Examples
4081 | --------
4082 | None
4083 |
4084 | See also
4085 | --------
4086 | DataFrame.rmod
4087 |
4088 | mode(self, axis=0, numeric_only=False)
4089 | Gets the mode(s) of each element along the axis selected. Adds a row
4090 | for each mode per label, fills in gaps with nan.
4091 |
4092 | Note that there could be multiple values returned for the selected
4093 | axis (when more than one item share the maximum frequency), which is
4094 | the reason why a dataframe is returned. If you want to impute missing
4095 | values with the mode in a dataframe ``df``, you can just do this:
4096 | ``df.fillna(df.mode().iloc[0])``
4097 |
4098 | Parameters
4099 | ----------
4100 | axis : {0 or 'index', 1 or 'columns'}, default 0
4101 | * 0 or 'index' : get mode of each column
4102 | * 1 or 'columns' : get mode of each row
4103 | numeric_only : boolean, default False
4104 | if True, only apply to numeric columns
4105 |
4106 | Returns
4107 | -------
4108 | modes : DataFrame (sorted)
4109 |
4110 | Examples
4111 | --------
4112 | >>> df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3]})
4113 | >>> df.mode()
4114 | A
4115 | 0 1
4116 | 1 2
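
The imputation idiom mentioned above, ``df.fillna(df.mode().iloc[0])``, can be sketched as follows. The frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 1.0, np.nan, 2.0],
                   "B": ["x", None, "y", "y"]})

# One row per mode; with a single mode per column there is only row 0.
modes = df.mode()
# Row 0 is a Series keyed by column name, so fillna applies it column by column.
imputed = df.fillna(modes.iloc[0])

print(imputed)
```

When a column has several modes, ``iloc[0]`` picks the first one in sorted order, which is an arbitrary but deterministic choice.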
4117 |
4118 | mul(self, other, axis='columns', level=None, fill_value=None)
4119 | Multiplication of dataframe and other, element-wise (binary operator `mul`).
4120 |
4121 | Equivalent to ``dataframe * other``, but with support to substitute a fill_value for
4122 | missing data in one of the inputs.
4123 |
4124 | Parameters
4125 | ----------
4126 | other : Series, DataFrame, or constant
4127 | axis : {0, 1, 'index', 'columns'}
4128 | For Series input, axis to match Series index on
4129 | level : int or name
4130 | Broadcast across a level, matching Index values on the
4131 | passed MultiIndex level
4132 | fill_value : None or float value, default None
4133 | Fill existing missing (NaN) values, and any new element needed for
4134 | successful DataFrame alignment, with this value before computation.
4135 | If data in both corresponding DataFrame locations is missing
4136 | the result will be missing
4137 |
4138 | Notes
4139 | -----
4140 | Mismatched indices will be unioned together
4141 |
4142 | Returns
4143 | -------
4144 | result : DataFrame
4145 |
4146 | Examples
4147 | --------
4148 | None
4149 |
4150 | See also
4151 | --------
4152 | DataFrame.rmul
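
Since the Examples section above is empty, here is a minimal sketch of ``axis`` and ``fill_value`` for ``mul``. The frames ``prices`` and ``other`` and the Series ``weights`` are hypothetical.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]})
weights = pd.Series([10.0, 100.0, 1000.0])

# axis="index" matches the Series index against the row labels.
by_row = prices.mul(weights, axis="index")

# `other` has no column "b"; fill_value=1 stands in for values missing on one side.
other = pd.DataFrame({"a": [2.0, 2.0, 2.0]})
filled = prices.mul(other, fill_value=1)

print(by_row)
print(filled)
```

Without ``fill_value``, column ``b`` of the second result would be all NaN, because ``other`` has nothing to align with it.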
4153 |
4154 | multiply = mul(self, other, axis='columns', level=None, fill_value=None)
4155 |
4156 | ne(self, other, axis='columns', level=None)
4157 | Wrapper for flexible comparison methods ne
4158 |
4159 | nlargest(self, n, columns, keep='first')
4160 | Return the first `n` rows ordered by `columns` in descending order.
4161 |
4162 | Return the first `n` rows with the largest values in `columns`, in
4163 | descending order. The columns that are not specified are returned as
4164 | well, but not used for ordering.
4165 |
4166 | This method is equivalent to
4167 | ``df.sort_values(columns, ascending=False).head(n)``, but more
4168 | performant.
4169 |
4170 | Parameters
4171 | ----------
4172 | n : int
4173 | Number of rows to return.
4174 | columns : label or list of labels
4175 | Column label(s) to order by.
4176 | keep : {'first', 'last'}, default 'first'
4177 | Where there are duplicate values:
4178 |
4179 | - `first` : prioritize the first occurrence(s)
4180 | - `last` : prioritize the last occurrence(s)
4181 |
4182 | Returns
4183 | -------
4184 | DataFrame
4185 | The first `n` rows ordered by the given columns in descending
4186 | order.
4187 |
4188 | See Also
4189 | --------
4190 | DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in
4191 | ascending order.
4192 | DataFrame.sort_values : Sort DataFrame by the values
4193 | DataFrame.head : Return the first `n` rows without re-ordering.
4194 |
4195 | Notes
4196 | -----
4197 | This function cannot be used with all column types. For example, when
4198 | specifying columns with `object` or `category` dtypes, ``TypeError`` is
4199 | raised.
4200 |
4201 | Examples
4202 | --------
4203 | >>> df = pd.DataFrame({'a': [1, 10, 8, 10, -1],
4204 | ... 'b': list('abdce'),
4205 | ... 'c': [1.0, 2.0, np.nan, 3.0, 4.0]})
4206 | >>> df
4207 | a b c
4208 | 0 1 a 1.0
4209 | 1 10 b 2.0
4210 | 2 8 d NaN
4211 | 3 10 c 3.0
4212 | 4 -1 e 4.0
4213 |
4214 | In the following example, we will use ``nlargest`` to select the three
4215 | rows having the largest values in column "a".
4216 |
4217 | >>> df.nlargest(3, 'a')
4218 | a b c
4219 | 1 10 b 2.0
4220 | 3 10 c 3.0
4221 | 2 8 d NaN
4222 |
4223 | When using ``keep='last'``, ties are resolved in reverse order:
4224 |
4225 | >>> df.nlargest(3, 'a', keep='last')
4226 | a b c
4227 | 3 10 c 3.0
4228 | 1 10 b 2.0
4229 | 2 8 d NaN
4230 |
4231 | To order by the largest values in column "a" and then "c", we can
4232 | specify multiple columns like in the next example.
4233 |
4234 | >>> df.nlargest(3, ['a', 'c'])
4235 | a b c
4236 | 3 10 c 3.0
4237 | 1 10 b 2.0
4238 | 2 8 d NaN
4239 |
4240 | Attempting to use ``nlargest`` on non-numeric dtypes will raise a
4241 | ``TypeError``:
4242 |
4243 | >>> df.nlargest(3, 'b')
4244 | Traceback (most recent call last):
4245 | TypeError: Column 'b' has dtype object, cannot use method 'nlargest'
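
A small check of the stated equivalence with ``sort_values(...).head(n)``, assuming a hypothetical frame whose ordering column has no ties (with ties, the two approaches may order the tied rows differently).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 10, 8, 11, -1], "b": list("abdce")})

top = df.nlargest(2, "a")
via_sort = df.sort_values("a", ascending=False).head(2)

print(top)
print(top.equals(via_sort))
```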
4246 |
4247 | notna(self)
4248 | Detect existing (non-missing) values.
4249 |
4250 | Return a boolean same-sized object indicating if the values are not NA.
4251 | Non-missing values get mapped to True. Characters such as empty
4252 | strings ``''`` or :attr:`numpy.inf` are not considered NA values
4253 | (unless you set ``pandas.options.mode.use_inf_as_na = True``).
4254 | NA values, such as None or :attr:`numpy.NaN`, get mapped to False
4255 | values.
4256 |
4257 | Returns
4258 | -------
4259 | DataFrame
4260 | Mask of bool values for each element in DataFrame that
4261 | indicates whether an element is not an NA value.
4262 |
4263 | See Also
4264 | --------
4265 | DataFrame.notnull : alias of notna
4266 | DataFrame.isna : boolean inverse of notna
4267 | DataFrame.dropna : omit axes labels with missing values
4268 | notna : top-level notna
4269 |
4270 | Examples
4271 | --------
4272 | Show which entries in a DataFrame are not NA.
4273 |
4274 | >>> df = pd.DataFrame({'age': [5, 6, np.NaN],
4275 | ... 'born': [pd.NaT, pd.Timestamp('1939-05-27'),
4276 | ... pd.Timestamp('1940-04-25')],
4277 | ... 'name': ['Alfred', 'Batman', ''],
4278 | ... 'toy': [None, 'Batmobile', 'Joker']})
4279 | >>> df
4280 | age born name toy
4281 | 0 5.0 NaT Alfred None
4282 | 1 6.0 1939-05-27 Batman Batmobile
4283 | 2 NaN 1940-04-25 Joker
4284 |
4285 | >>> df.notna()
4286 | age born name toy
4287 | 0 True False True False
4288 | 1 True True True True
4289 | 2 False True True True
4290 |
4291 | Show which entries in a Series are not NA.
4292 |
4293 | >>> ser = pd.Series([5, 6, np.NaN])
4294 | >>> ser
4295 | 0 5.0
4296 | 1 6.0
4297 | 2 NaN
4298 | dtype: float64
4299 |
4300 | >>> ser.notna()
4301 | 0 True
4302 | 1 True
4303 | 2 False
4304 | dtype: bool
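
The boolean mask is mostly useful as an intermediate result. A sketch, on a hypothetical frame, of counting non-missing values per column and keeping only fully populated rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [5, 6, np.nan],
                   "name": ["Alfred", "Batman", ""],
                   "toy": [None, "Batmobile", "Joker"]})

mask = df.notna()
per_column = mask.sum()                 # True counts as 1
complete_rows = df[mask.all(axis=1)]    # rows with no missing value

print(per_column)
print(complete_rows)
```

Note that the empty string in ``name`` counts as a present value, as the description above states.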
4305 |
4306 | notnull(self)
4307 | Detect existing (non-missing) values.
4308 |
4309 | Return a boolean same-sized object indicating if the values are not NA.
4310 | Non-missing values get mapped to True. Characters such as empty
4311 | strings ``''`` or :attr:`numpy.inf` are not considered NA values
4312 | (unless you set ``pandas.options.mode.use_inf_as_na = True``).
4313 | NA values, such as None or :attr:`numpy.NaN`, get mapped to False
4314 | values.
4315 |
4316 | Returns
4317 | -------
4318 | DataFrame
4319 | Mask of bool values for each element in DataFrame that
4320 | indicates whether an element is not an NA value.
4321 |
4322 | See Also
4323 | --------
4324 | DataFrame.notnull : alias of notna
4325 | DataFrame.isna : boolean inverse of notna
4326 | DataFrame.dropna : omit axes labels with missing values
4327 | notna : top-level notna
4328 |
4329 | Examples
4330 | --------
4331 | Show which entries in a DataFrame are not NA.
4332 |
4333 | >>> df = pd.DataFrame({'age': [5, 6, np.NaN],
4334 | ... 'born': [pd.NaT, pd.Timestamp('1939-05-27'),
4335 | ... pd.Timestamp('1940-04-25')],
4336 | ... 'name': ['Alfred', 'Batman', ''],
4337 | ... 'toy': [None, 'Batmobile', 'Joker']})
4338 | >>> df
4339 | age born name toy
4340 | 0 5.0 NaT Alfred None
4341 | 1 6.0 1939-05-27 Batman Batmobile
4342 | 2 NaN 1940-04-25 Joker
4343 |
4344 | >>> df.notna()
4345 | age born name toy
4346 | 0 True False True False
4347 | 1 True True True True
4348 | 2 False True True True
4349 |
4350 | Show which entries in a Series are not NA.
4351 |
4352 | >>> ser = pd.Series([5, 6, np.NaN])
4353 | >>> ser
4354 | 0 5.0
4355 | 1 6.0
4356 | 2 NaN
4357 | dtype: float64
4358 |
4359 | >>> ser.notna()
4360 | 0 True
4361 | 1 True
4362 | 2 False
4363 | dtype: bool
4364 |
4365 | nsmallest(self, n, columns, keep='first')
4366 | Get the rows of a DataFrame sorted by the `n` smallest
4367 | values of `columns`.
4368 |
4369 | Parameters
4370 | ----------
4371 | n : int
4372 | Number of items to retrieve
4373 | columns : list or str
4374 | Column name or names to order by
4375 | keep : {'first', 'last'}, default 'first'
4376 | Where there are duplicate values:
4377 | - ``first`` : take the first occurrence.
4378 | - ``last`` : take the last occurrence.
4379 |
4380 | Returns
4381 | -------
4382 | DataFrame
4383 |
4384 | Examples
4385 | --------
4386 | >>> df = pd.DataFrame({'a': [1, 10, 8, 11, -1],
4387 | ... 'b': list('abdce'),
4388 | ... 'c': [1.0, 2.0, np.nan, 3.0, 4.0]})
4389 | >>> df.nsmallest(3, 'a')
4390 | a b c
4391 | 4 -1 e 4.0
4392 | 0 1 a 1.0
4393 | 2 8 d NaN
4394 |
4395 | nunique(self, axis=0, dropna=True)
4396 | Return Series with number of distinct observations over requested
4397 | axis.
4398 |
4399 | .. versionadded:: 0.20.0
4400 |
4401 | Parameters
4402 | ----------
4403 | axis : {0 or 'index', 1 or 'columns'}, default 0
4404 | dropna : boolean, default True
4405 | Don't include NaN in the counts.
4406 |
4407 | Returns
4408 | -------
4409 | nunique : Series
4410 |
4411 | Examples
4412 | --------
4413 | >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1]})
4414 | >>> df.nunique()
4415 | A 3
4416 | B 1
4417 |
4418 | >>> df.nunique(axis=1)
4419 | 0 1
4420 | 1 2
4421 | 2 2
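
The effect of ``dropna`` is not shown above. A minimal sketch on a hypothetical frame containing one NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [1, 1, 1]})

without_nan = df.nunique()              # NaN is not counted
with_nan = df.nunique(dropna=False)     # NaN counts as one distinct value

print(without_nan)
print(with_nan)
```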
4422 |
4423 | pivot(self, index=None, columns=None, values=None)
4424 | Return reshaped DataFrame organized by given index / column values.
4425 |
4426 | Reshape data (produce a "pivot" table) based on column values. Uses
4427 | unique values from specified `index` / `columns` to form axes of the
4428 | resulting DataFrame. This function does not support data
4429 | aggregation; multiple values will result in a MultiIndex in the
4430 | columns. See the :ref:`User Guide <reshaping>` for more on reshaping.
4431 |
4432 | Parameters
4433 | ----------
4434 | index : string or object, optional
4435 | Column to use to make new frame's index. If None, uses
4436 | existing index.
4437 | columns : string or object
4438 | Column to use to make new frame's columns.
4439 | values : string, object or a list of the previous, optional
4440 | Column(s) to use for populating new frame's values. If not
4441 | specified, all remaining columns will be used and the result will
4442 | have hierarchically indexed columns.
4443 |
4444 | .. versionchanged :: 0.23.0
4445 | Also accept list of column names.
4446 |
4447 | Returns
4448 | -------
4449 | DataFrame
4450 | Returns reshaped DataFrame.
4451 |
4452 | Raises
4453 | ------
4454 | ValueError:
4455 | When there are any `index`, `columns` combinations with multiple
4456 | values. Use `DataFrame.pivot_table` when you need to aggregate.
4457 |
4458 | See Also
4459 | --------
4460 | DataFrame.pivot_table : generalization of pivot that can handle
4461 | duplicate values for one index/column pair.
4462 | DataFrame.unstack : pivot based on the index values instead of a
4463 | column.
4464 |
4465 | Notes
4466 | -----
4467 | For finer-tuned control, see hierarchical indexing documentation along
4468 | with the related stack/unstack methods.
4469 |
4470 | Examples
4471 | --------
4472 | >>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
4473 | ... 'two'],
4474 | ... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
4475 | ... 'baz': [1, 2, 3, 4, 5, 6],
4476 | ... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
4477 | >>> df
4478 | foo bar baz zoo
4479 | 0 one A 1 x
4480 | 1 one B 2 y
4481 | 2 one C 3 z
4482 | 3 two A 4 q
4483 | 4 two B 5 w
4484 | 5 two C 6 t
4485 |
4486 | >>> df.pivot(index='foo', columns='bar', values='baz')
4487 | bar A B C
4488 | foo
4489 | one 1 2 3
4490 | two 4 5 6
4491 |
4492 | >>> df.pivot(index='foo', columns='bar')['baz']
4493 | bar A B C
4494 | foo
4495 | one 1 2 3
4496 | two 4 5 6
4497 |
4498 | >>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
4499 | baz zoo
4500 | bar A B C A B C
4501 | foo
4502 | one 1 2 3 x y z
4503 | two 4 5 6 q w t
4504 |
4505 | A ValueError is raised if there are any duplicates.
4506 |
4507 | >>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
4508 | ... "bar": ['A', 'A', 'B', 'C'],
4509 | ... "baz": [1, 2, 3, 4]})
4510 | >>> df
4511 | foo bar baz
4512 | 0 one A 1
4513 | 1 one A 2
4514 | 2 two B 3
4515 | 3 two C 4
4516 |
4517 | Notice that the first two rows are the same for our `index`
4518 | and `columns` arguments.
4519 |
4520 | >>> df.pivot(index='foo', columns='bar', values='baz')
4521 | Traceback (most recent call last):
4522 | ...
4523 | ValueError: Index contains duplicate entries, cannot reshape
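
A sketch of the fallback suggested in the Raises section: on the same duplicate-containing frame, ``pivot`` fails while ``pivot_table`` aggregates the duplicates. The choice of ``aggfunc="sum"`` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["one", "one", "two", "two"],
                   "bar": ["A", "A", "B", "C"],
                   "baz": [1, 2, 3, 4]})

# ("one", "A") occurs twice, so a plain reshape is ambiguous.
try:
    df.pivot(index="foo", columns="bar", values="baz")
    raised = False
except ValueError:
    raised = True

# pivot_table resolves the duplicates by aggregating them.
table = df.pivot_table(index="foo", columns="bar", values="baz",
                       aggfunc="sum")

print(raised)
print(table)
```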
4524 |
4525 | pivot_table(self, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
4526 | Create a spreadsheet-style pivot table as a DataFrame. The levels in
4527 | the pivot table will be stored in MultiIndex objects (hierarchical
4528 | indexes) on the index and columns of the result DataFrame
4529 |
4530 | Parameters
4531 | ----------
4532 | values : column to aggregate, optional
4533 | index : column, Grouper, array, or list of the previous
4534 | If an array is passed, it must be the same length as the data. The
4535 | list can contain any of the other types (except list).
4536 | Keys to group by on the pivot table index. If an array is passed,
4537 | it is used in the same manner as column values.
4538 | columns : column, Grouper, array, or list of the previous
4539 | If an array is passed, it must be the same length as the data. The
4540 | list can contain any of the other types (except list).
4541 | Keys to group by on the pivot table column. If an array is passed,
4542 | it is used in the same manner as column values.
4543 | aggfunc : function, list of functions, dict, default numpy.mean
4544 | If list of functions passed, the resulting pivot table will have
4545 | hierarchical columns whose top level are the function names
4546 | (inferred from the function objects themselves)
4547 | If dict is passed, the key is column to aggregate and value
4548 | is function or list of functions
4549 | fill_value : scalar, default None
4550 | Value to replace missing values with
4551 | margins : boolean, default False
4552 | Add a row and a column of totals (subtotals / grand totals)
4553 | dropna : boolean, default True
4554 | Do not include columns whose entries are all NaN
4555 | margins_name : string, default 'All'
4556 | Name of the row / column that will contain the totals
4557 | when margins is True.
4558 |
4559 | Examples
4560 | --------
4561 | >>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
4562 | ... "bar", "bar", "bar", "bar"],
4563 | ... "B": ["one", "one", "one", "two", "two",
4564 | ... "one", "one", "two", "two"],
4565 | ... "C": ["small", "large", "large", "small",
4566 | ... "small", "large", "small", "small",
4567 | ... "large"],
4568 | ... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})
4569 | >>> df
4570 | A B C D
4571 | 0 foo one small 1
4572 | 1 foo one large 2
4573 | 2 foo one large 2
4574 | 3 foo two small 3
4575 | 4 foo two small 3
4576 | 5 bar one large 4
4577 | 6 bar one small 5
4578 | 7 bar two small 6
4579 | 8 bar two large 7
4580 |
4581 | >>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
4582 | ... columns=['C'], aggfunc=np.sum)
4583 | >>> table
4584 | C large small
4585 | A B
4586 | bar one 4.0 5.0
4587 | two 7.0 6.0
4588 | foo one 4.0 1.0
4589 | two NaN 6.0
4600 |
4601 | >>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
4602 | ... aggfunc={'D': np.mean,
4603 | ... 'E': [min, max, np.mean]})
4604 | >>> table
4605 | D E
4606 | mean max median min
4607 | A C
4608 | bar large 5.500000 16 14.5 13
4609 | small 5.500000 15 14.5 14
4610 | foo large 2.000000 10 9.5 9
4611 | small 2.333333 12 11.0 8
4612 |
4613 | Returns
4614 | -------
4615 | table : DataFrame
4616 |
4617 | See also
4618 | --------
4619 | DataFrame.pivot : pivot without aggregation that can handle
4620 | non-numeric data
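
The dictionary form of ``aggfunc`` in the examples above refers to a column ``E`` that the example frame does not define. A minimal sketch with a hypothetical small frame that does include ``E`` (all values here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "bar", "bar"],
                   "D": [1, 2, 3, 4],
                   "E": [10, 20, 30, 40]})

# One function for D, a list of functions for E.
table = pd.pivot_table(df, values=["D", "E"], index="A",
                       aggfunc={"D": "mean", "E": ["min", "max"]})

print(table)
```

Because one of the dictionary values is a list, the result has two-level columns: the aggregated column name on top and the function name below.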
4621 |
4622 | pow(self, other, axis='columns', level=None, fill_value=None)
4623 | Exponential power of dataframe and other, element-wise (binary operator `pow`).
4624 |
4625 | Equivalent to ``dataframe ** other``, but with support to substitute a fill_value for
4626 | missing data in one of the inputs.
4627 |
4628 | Parameters
4629 | ----------
4630 | other : Series, DataFrame, or constant
4631 | axis : {0, 1, 'index', 'columns'}
4632 | For Series input, axis to match Series index on
4633 | level : int or name
4634 | Broadcast across a level, matching Index values on the
4635 | passed MultiIndex level
4636 | fill_value : None or float value, default None
4637 | Fill existing missing (NaN) values, and any new element needed for
4638 | successful DataFrame alignment, with this value before computation.
4639 | If data in both corresponding DataFrame locations is missing
4640 | the result will be missing
4641 |
4642 | Notes
4643 | -----
4644 | Mismatched indices will be unioned together
4645 |
4646 | Returns
4647 | -------
4648 | result : DataFrame
4649 |
4650 | Examples
4651 | --------
4652 | None
4653 |
4654 | See also
4655 | --------
4656 | DataFrame.rpow
4657 |
4658 | prod(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
4659 | Return the product of the values for the requested axis
4660 |
4661 | Parameters
4662 | ----------
4663 | axis : {index (0), columns (1)}
4664 | skipna : boolean, default True
4665 | Exclude NA/null values when computing the result.
4666 | level : int or level name, default None
4667 | If the axis is a MultiIndex (hierarchical), count along a
4668 | particular level, collapsing into a Series
4669 | numeric_only : boolean, default None
4670 | Include only float, int, boolean columns. If None, will attempt to use
4671 | everything, then use only numeric data. Not implemented for Series.
4672 | min_count : int, default 0
4673 | The required number of valid values to perform the operation. If fewer than
4674 | ``min_count`` non-NA values are present the result will be NA.
4675 |
4676 | .. versionadded :: 0.22.0
4677 |
4678 | Added with the default being 0. This means the sum of an all-NA
4679 | or empty Series is 0, and the product of an all-NA or empty
4680 | Series is 1.
4681 |
4682 | Returns
4683 | -------
4684 | prod : Series or DataFrame (if level specified)
4685 |
4686 | Examples
4687 | --------
4688 | By default, the product of an empty or all-NA Series is ``1``
4689 |
4690 | >>> pd.Series([]).prod()
4691 | 1.0
4692 |
4693 | This can be controlled with the ``min_count`` parameter
4694 |
4695 | >>> pd.Series([]).prod(min_count=1)
4696 | nan
4697 |
4698 | Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
4699 | empty series identically.
4700 |
4701 | >>> pd.Series([np.nan]).prod()
4702 | 1.0
4703 |
4704 | >>> pd.Series([np.nan]).prod(min_count=1)
4705 | nan
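
The examples above use Series. The same ``min_count`` behaviour on a DataFrame can be sketched with a hypothetical frame in which one column is entirely NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [2.0, 3.0], "y": [np.nan, np.nan]})

default = df.prod()               # all-NA column yields the empty product, 1.0
strict = df.prod(min_count=1)     # requires at least one valid value per column

print(default)
print(strict)
```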
4706 |
4707 | product = prod(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
4708 |
4709 | quantile(self, q=0.5, axis=0, numeric_only=True, interpolation='linear')
4710 | Return values at the given quantile over requested axis, a la
4711 | numpy.percentile.
4712 |
4713 | Parameters
4714 | ----------
4715 | q : float or array-like, default 0.5 (50% quantile)
4716 | 0 <= q <= 1, the quantile(s) to compute
4717 | axis : {0, 1, 'index', 'columns'} (default 0)
4718 | 0 or 'index' for row-wise, 1 or 'columns' for column-wise
4719 | numeric_only : boolean, default True
4720 | If False, the quantile of datetime and timedelta data will be
4721 | computed as well
4722 | interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
4723 | .. versionadded:: 0.18.0
4724 |
4725 | This optional parameter specifies the interpolation method to use,
4726 | when the desired quantile lies between two data points `i` and `j`:
4727 |
4728 | * linear: `i + (j - i) * fraction`, where `fraction` is the
4729 | fractional part of the index surrounded by `i` and `j`.
4730 | * lower: `i`.
4731 | * higher: `j`.
4732 | * nearest: `i` or `j` whichever is nearest.
4733 | * midpoint: (`i` + `j`) / 2.
4734 |
4735 | Returns
4736 | -------
4737 | quantiles : Series or DataFrame
4738 |
4739 | - If ``q`` is an array, a DataFrame will be returned where the
4740 | index is ``q``, the columns are the columns of self, and the
4741 | values are the quantiles.
4742 | - If ``q`` is a float, a Series will be returned where the
4743 | index is the columns of self and the values are the quantiles.
4744 |
4745 | Examples
4746 | --------
4747 |
4748 | >>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
4749 | ... columns=['a', 'b'])
4750 | >>> df.quantile(.1)
4751 | a 1.3
4752 | b 3.7
4753 | Name: 0.1, dtype: float64
4754 | >>> df.quantile([.1, .5])
4755 | a b
4756 | 0.1 1.3 3.7
4757 | 0.5 2.5 55.0
4758 |
4759 | Specifying `numeric_only=False` will also compute the quantile of
4760 | datetime and timedelta data.
4761 |
4762 | >>> df = pd.DataFrame({'A': [1, 2],
4763 | ... 'B': [pd.Timestamp('2010'),
4764 | ... pd.Timestamp('2011')],
4765 | ... 'C': [pd.Timedelta('1 days'),
4766 | ... pd.Timedelta('2 days')]})
4767 | >>> df.quantile(0.5, numeric_only=False)
4768 | A 1.5
4769 | B 2010-07-02 12:00:00
4770 | C 1 days 12:00:00
4771 | Name: 0.5, dtype: object
4772 |
4773 | See Also
4774 | --------
4775 | pandas.core.window.Rolling.quantile
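
A sketch comparing the five ``interpolation`` options listed above, assuming a hypothetical one-column frame and ``q=0.4``. With four sorted values, the quantile position is ``0.4 * 3 = 1.2``, which lies between the values 2 and 3 with a fractional part of 0.2.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})
methods = ["linear", "lower", "higher", "nearest", "midpoint"]

# Evaluate the same quantile under each interpolation rule.
results = {m: float(df.quantile(0.4, interpolation=m)["a"]) for m in methods}

for name, value in results.items():
    print(name, value)
```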
4776 |
4777 | query(self, expr, inplace=False, **kwargs)
4778 | Query the columns of a frame with a boolean expression.
4779 |
4780 | Parameters
4781 | ----------
4782 | expr : string
4783 | The query string to evaluate. You can refer to variables
4784 | in the environment by prefixing them with an '@' character like
4785 | ``@a + b``.
4786 | inplace : bool
4787 | Whether the query should modify the data in place or return
4788 | a modified copy
4789 |
4790 | .. versionadded:: 0.18.0
4791 |
4792 | kwargs : dict
4793 | See the documentation for :func:`pandas.eval` for complete details
4794 | on the keyword arguments accepted by :meth:`DataFrame.query`.
4795 |
4796 | Returns
4797 | -------
4798 | q : DataFrame
4799 |
4800 | Notes
4801 | -----
4802 | The result of the evaluation of this expression is first passed to
4803 | :attr:`DataFrame.loc` and if that fails because of a
4804 | multidimensional key (e.g., a DataFrame) then the result will be passed
4805 | to :meth:`DataFrame.__getitem__`.
4806 |
4807 | This method uses the top-level :func:`pandas.eval` function to
4808 | evaluate the passed query.
4809 |
4810 | The :meth:`~pandas.DataFrame.query` method uses a slightly
4811 | modified Python syntax by default. For example, the ``&`` and ``|``
4812 | (bitwise) operators have the precedence of their boolean cousins,
4813 | :keyword:`and` and :keyword:`or`. This *is* syntactically valid Python,
4814 | however the semantics are different.
4815 |
4816 | You can change the semantics of the expression by passing the keyword
4817 | argument ``parser='python'``. This enforces the same semantics as
4818 | evaluation in Python space. Likewise, you can pass ``engine='python'``
4819 | to evaluate an expression using Python itself as a backend. This is not
4820 | recommended as it is inefficient compared to using ``numexpr`` as the
4821 | engine.
4822 |
4823 | The :attr:`DataFrame.index` and
4824 | :attr:`DataFrame.columns` attributes of the
4825 | :class:`~pandas.DataFrame` instance are placed in the query namespace
4826 | by default, which allows you to treat both the index and columns of the
4827 | frame as a column in the frame.
4828 | The identifier ``index`` is used for the frame index; you can also
4829 | use the name of the index to identify it in a query. Please note that
4830 | Python keywords may not be used as identifiers.
4831 |
4832 | For further details and examples see the ``query`` documentation in
4833 | :ref:`indexing <indexing.query>`.
4834 |
4835 | See Also
4836 | --------
4837 | pandas.eval
4838 | DataFrame.eval
4839 |
4840 | Examples
4841 | --------
4842 | >>> from numpy.random import randn
4843 | >>> from pandas import DataFrame
4844 | >>> df = pd.DataFrame(randn(10, 2), columns=list('ab'))
4845 | >>> df.query('a > b')
4846 | >>> df[df.a > df.b] # same result as the previous expression
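
A sketch of the ``@`` prefix for environment variables described under ``expr``, using a hypothetical frame and a hypothetical local variable ``threshold``, next to the equivalent boolean indexing:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": [2, 4, 6]})
threshold = 2

# "@threshold" refers to the Python variable, while "a" and "b" are columns.
result = df.query("a > @threshold and b < 6")

# Equivalent boolean indexing without query.
same = df[(df["a"] > threshold) & (df["b"] < 6)]

print(result)
```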
4847 |
4848 | radd(self, other, axis='columns', level=None, fill_value=None)
4849 | Addition of dataframe and other, element-wise (binary operator `radd`).
4850 |
4851 | Equivalent to ``other + dataframe``, but with support to substitute a fill_value for
4852 | missing data in one of the inputs.
4853 |
4854 | Parameters
4855 | ----------
4856 | other : Series, DataFrame, or constant
4857 | axis : {0, 1, 'index', 'columns'}
4858 | For Series input, axis to match Series index on
4859 | level : int or name
4860 | Broadcast across a level, matching Index values on the
4861 | passed MultiIndex level
4862 | fill_value : None or float value, default None
4863 | Fill existing missing (NaN) values, and any new element needed for
4864 | successful DataFrame alignment, with this value before computation.
4865 | If data in both corresponding DataFrame locations is missing
4866 | the result will be missing
4867 |
4868 | Notes
4869 | -----
4870 | Mismatched indices will be unioned together
4871 |
4872 | Returns
4873 | -------
4874 | result : DataFrame
4875 |
4876 | Examples
4877 | --------
4878 |
4879 | >>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'],
4880 | ... columns=['one'])
4881 | >>> a
4882 | one
4883 | a 1.0
4884 | b 1.0
4885 | c 1.0
4886 | d NaN
4887 | >>> b = pd.DataFrame(dict(one=[1, np.nan, 1, np.nan],
4888 | ... two=[np.nan, 2, np.nan, 2]),
4889 | ... index=['a', 'b', 'd', 'e'])
4890 | >>> b
4891 | one two
4892 | a 1.0 NaN
4893 | b NaN 2.0
4894 | d 1.0 NaN
4895 | e NaN 2.0
4896 | >>> a.add(b, fill_value=0)
4897 | one two
4898 | a 2.0 NaN
4899 | b 1.0 2.0
4900 | c 1.0 NaN
4901 | d 1.0 NaN
4902 | e NaN 2.0
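The example above calls ``add``; the following sketch calls ``radd`` itself. It assumes two small illustrative frames and only shows that ``radd`` evaluates ``other + dataframe`` with the same ``fill_value`` handling.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"one": [1.0, 1.0, np.nan]}, index=["a", "b", "c"])
b = pd.DataFrame({"one": [1.0, np.nan, 5.0]}, index=["a", "b", "d"])

# Reflected addition with a scalar: same as ``10 + a``.
scalar_sum = a.radd(10)

# Reflected addition with a frame; cells missing on one side only are
# treated as 0, cells missing on both sides stay missing.
filled = a.radd(b, fill_value=0)

print(scalar_sum)
print(filled)
```

Because addition is commutative, ``a.radd(b, fill_value=0)`` and ``a.add(b, fill_value=0)`` give the same frame; the reflected form matters more for non-commutative operators.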
4903 |
4904 |
4905 | See also
4906 | --------
4907 | DataFrame.add
4908 |
4909 | rdiv = rtruediv(self, other, axis='columns', level=None, fill_value=None)
4910 |
4911 | reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
4912 | Conform DataFrame to new index with optional filling logic, placing
4913 | NA/NaN in locations having no value in the previous index. A new object
4914 | is produced unless the new index is equivalent to the current one and
4915 | copy=False
4916 |
4917 | Parameters
4918 | ----------
4919 | labels : array-like, optional
4920 | New labels / index to conform the axis specified by 'axis' to.
4921 | index, columns : array-like, optional (should be specified using keywords)
4922 | New labels / index to conform to. Preferably an Index object to
4923 | avoid duplicating data
4924 | axis : int or str, optional
4925 | Axis to target. Can be either the axis name ('index', 'columns')
4926 | or number (0, 1).
4927 | method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
4928 | method to use for filling holes in reindexed DataFrame.
4929 | Please note: this is only applicable to DataFrames/Series with a
4930 | monotonically increasing/decreasing index.
4931 |
4932 | * default: don't fill gaps
4933 | * pad / ffill: propagate last valid observation forward to next
4934 | valid
4935 | * backfill / bfill: use next valid observation to fill gap
4936 | * nearest: use nearest valid observations to fill gap
4937 |
4938 | copy : boolean, default True
4939 | Return a new object, even if the passed indexes are the same
4940 | level : int or name
4941 | Broadcast across a level, matching Index values on the
4942 | passed MultiIndex level
4943 | fill_value : scalar, default np.NaN
4944 | Value to use for missing values. Defaults to NaN, but can be any
4945 | "compatible" value
4946 | limit : int, default None
4947 | Maximum number of consecutive elements to forward or backward fill
4948 | tolerance : optional
4949 | Maximum distance between original and new labels for inexact
4950 | matches. The values of the index at the matching locations must
4951 | satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
4952 |
4953 | Tolerance may be a scalar value, which applies the same tolerance
4954 | to all values, or list-like, which applies variable tolerance per
4955 | element. List-like includes list, tuple, array, Series, and must be
4956 | the same size as the index and its dtype must exactly match the
4957 | index's type.
4958 |
4959 | .. versionadded:: 0.21.0 (list-like tolerance)
4960 |
4961 | Examples
4962 | --------
4963 |
4964 | ``DataFrame.reindex`` supports two calling conventions
4965 |
4966 | * ``(index=index_labels, columns=column_labels, ...)``
4967 | * ``(labels, axis={'index', 'columns'}, ...)``
4968 |
4969 | We *highly* recommend using keyword arguments to clarify your
4970 | intent.
4971 |
4972 | Create a dataframe with some fictional data.
4973 |
4974 | >>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
4975 | >>> df = pd.DataFrame({
4976 | ... 'http_status': [200,200,404,404,301],
4977 | ... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
4978 | ... index=index)
4979 | >>> df
4980 | http_status response_time
4981 | Firefox 200 0.04
4982 | Chrome 200 0.02
4983 | Safari 404 0.07
4984 | IE10 404 0.08
4985 | Konqueror 301 1.00
4986 |
4987 | Create a new index and reindex the dataframe. By default
4988 | values in the new index that do not have corresponding
4989 | records in the dataframe are assigned ``NaN``.
4990 |
4991 | >>> new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
4992 | ... 'Chrome']
4993 | >>> df.reindex(new_index)
4994 | http_status response_time
4995 | Safari 404.0 0.07
4996 | Iceweasel NaN NaN
4997 | Comodo Dragon NaN NaN
4998 | IE10 404.0 0.08
4999 | Chrome 200.0 0.02
5000 |
5001 | We can fill in the missing values by passing a value to
5002 | the keyword ``fill_value``. Because the index is not monotonically
5003 | increasing or decreasing, we cannot use arguments to the keyword
5004 | ``method`` to fill the ``NaN`` values.
5005 |
5006 | >>> df.reindex(new_index, fill_value=0)
5007 | http_status response_time
5008 | Safari 404 0.07
5009 | Iceweasel 0 0.00
5010 | Comodo Dragon 0 0.00
5011 | IE10 404 0.08
5012 | Chrome 200 0.02
5013 |
5014 | >>> df.reindex(new_index, fill_value='missing')
5015 | http_status response_time
5016 | Safari 404 0.07
5017 | Iceweasel missing missing
5018 | Comodo Dragon missing missing
5019 | IE10 404 0.08
5020 | Chrome 200 0.02
5021 |
5022 | We can also reindex the columns.
5023 |
5024 | >>> df.reindex(columns=['http_status', 'user_agent'])
5025 | http_status user_agent
5026 | Firefox 200 NaN
5027 | Chrome 200 NaN
5028 | Safari 404 NaN
5029 | IE10 404 NaN
5030 | Konqueror 301 NaN
5031 |
5032 | Or we can use "axis-style" keyword arguments
5033 |
5034 | >>> df.reindex(['http_status', 'user_agent'], axis="columns")
5035 | http_status user_agent
5036 | Firefox 200 NaN
5037 | Chrome 200 NaN
5038 | Safari 404 NaN
5039 | IE10 404 NaN
5040 | Konqueror 301 NaN
5041 |
5042 | To further illustrate the filling functionality in
5043 | ``reindex``, we will create a dataframe with a
5044 | monotonically increasing index (for example, a sequence
5045 | of dates).
5046 |
5047 | >>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
5048 | >>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
5049 | ... index=date_index)
5050 | >>> df2
5051 | prices
5052 | 2010-01-01 100
5053 | 2010-01-02 101
5054 | 2010-01-03 NaN
5055 | 2010-01-04 100
5056 | 2010-01-05 89
5057 | 2010-01-06 88
5058 |
5059 | Suppose we decide to expand the dataframe to cover a wider
5060 | date range.
5061 |
5062 | >>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
5063 | >>> df2.reindex(date_index2)
5064 | prices
5065 | 2009-12-29 NaN
5066 | 2009-12-30 NaN
5067 | 2009-12-31 NaN
5068 | 2010-01-01 100
5069 | 2010-01-02 101
5070 | 2010-01-03 NaN
5071 | 2010-01-04 100
5072 | 2010-01-05 89
5073 | 2010-01-06 88
5074 | 2010-01-07 NaN
5075 |
5076 | The index entries that did not have a value in the original data frame
5077 | (for example, '2009-12-29') are by default filled with ``NaN``.
5078 | If desired, we can fill in the missing values using one of several
5079 | options.
5080 |
5081 | For example, to back-propagate the next valid value to fill the ``NaN``
5082 | values, pass ``bfill`` as an argument to the ``method`` keyword.
5083 |
5084 | >>> df2.reindex(date_index2, method='bfill')
5085 | prices
5086 | 2009-12-29 100
5087 | 2009-12-30 100
5088 | 2009-12-31 100
5089 | 2010-01-01 100
5090 | 2010-01-02 101
5091 | 2010-01-03 NaN
5092 | 2010-01-04 100
5093 | 2010-01-05 89
5094 | 2010-01-06 88
5095 | 2010-01-07 NaN
5096 |
5097 | Please note that the ``NaN`` value present in the original dataframe
5098 | (at index value 2010-01-03) will not be filled by any of the
5099 | value propagation schemes. This is because filling while reindexing
5100 | does not look at dataframe values, but only compares the original and
5101 | desired indexes. If you do want to fill in the ``NaN`` values present
5102 | in the original dataframe, use the ``fillna()`` method.
5103 |
5104 | See the :ref:`user guide <basics.reindexing>` for more.
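The ``tolerance`` and ``limit`` parameters are described above but not exercised in the examples. The sketch below is one way they might be used, assuming a small frame with a monotonically increasing integer index (all names and values are illustrative).

```python
import pandas as pd

df = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=[0, 10, 20])

# 'nearest' matching, but only when the nearest original label is within
# 2 units of the requested label; otherwise the row is left as NaN.
nearest = df.reindex([1, 9, 14, 26], method="nearest", tolerance=2)

# Forward fill, but fill at most 2 consecutive new labels.
sparse = pd.DataFrame({"v": [1.0, 2.0]}, index=[0, 5])
padded = sparse.reindex(range(6), method="ffill", limit=2)

print(nearest)
print(padded)
```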
5105 |
5106 | Returns
5107 | -------
5108 | reindexed : DataFrame
5109 |
5110 | reindex_axis(self, labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)
5111 | Conform input object to new index with optional
5112 | filling logic, placing NA/NaN in locations having no value in the
5113 | previous index. A new object is produced unless the new index is
5114 | equivalent to the current one and copy=False
5115 |
5116 | Parameters
5117 | ----------
5118 | labels : array-like
5119 | New labels / index to conform to. Preferably an Index object to
5120 | avoid duplicating data
5121 | axis : {0 or 'index', 1 or 'columns'}
5122 | method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
5123 | Method to use for filling holes in reindexed DataFrame:
5124 |
5125 | * default: don't fill gaps
5126 | * pad / ffill: propagate last valid observation forward to next
5127 | valid
5128 | * backfill / bfill: use next valid observation to fill gap
5129 | * nearest: use nearest valid observations to fill gap
5130 |
5131 | copy : boolean, default True
5132 | Return a new object, even if the passed indexes are the same
5133 | level : int or name
5134 | Broadcast across a level, matching Index values on the
5135 | passed MultiIndex level
5136 | limit : int, default None
5137 | Maximum number of consecutive elements to forward or backward fill
5138 | tolerance : optional
5139 | Maximum distance between original and new labels for inexact
5140 | matches. The values of the index at the matching locations must
5141 | satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
5142 |
5143 | Tolerance may be a scalar value, which applies the same tolerance
5144 | to all values, or list-like, which applies variable tolerance per
5145 | element. List-like includes list, tuple, array, Series, and must be
5146 | the same size as the index and its dtype must exactly match the
5147 | index's type.
5148 |
5149 | .. versionadded:: 0.21.0 (list-like tolerance)
5150 |
5151 | Examples
5152 | --------
5153 | >>> df.reindex_axis(['A', 'B', 'C'], axis=1)
5154 |
5155 | See Also
5156 | --------
5157 | reindex, reindex_like
5158 |
5159 | Returns
5160 | -------
5161 | reindexed : DataFrame
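``reindex_axis`` was deprecated in pandas 0.21.0 in favour of ``reindex``. As a sketch, the example call above can be written with the axis-style form of ``reindex``; the frame below is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Axis-style ``reindex`` covering the same case as
# ``df.reindex_axis(['A', 'B', 'C'], axis=1)``.
out = df.reindex(["A", "B", "C"], axis=1)

print(out)
```

Column ``C`` does not exist in the original frame, so it is created and filled with ``NaN``.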
5162 |
5163 | rename(self, mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None)
5164 | Alter axes labels.
5165 |
5166 | Function / dict values must be unique (1-to-1). Labels not contained in
5167 | a dict / Series will be left as-is. Extra labels listed don't throw an
5168 | error.
5169 |
5170 | See the :ref:`user guide <basics.rename>` for more.
5171 |
5172 | Parameters
5173 | ----------
5174 | mapper, index, columns : dict-like or function, optional
5175 | dict-like or functions transformations to apply to
5176 | that axis' values. Use either ``mapper`` and ``axis`` to
5177 | specify the axis to target with ``mapper``, or ``index`` and
5178 | ``columns``.
5179 | axis : int or str, optional
5180 | Axis to target with ``mapper``. Can be either the axis name
5181 | ('index', 'columns') or number (0, 1). The default is 'index'.
5182 | copy : boolean, default True
5183 | Also copy underlying data
5184 | inplace : boolean, default False
5185 | Whether to modify the DataFrame in place instead of returning a new
5186 | one. If True then the value of ``copy`` is ignored.
5187 | level : int or level name, default None
5188 | In case of a MultiIndex, only rename labels in the specified
5189 | level.
5190 |
5191 | Returns
5192 | -------
5193 | renamed : DataFrame
5194 |
5195 | See Also
5196 | --------
5197 | pandas.DataFrame.rename_axis
5198 |
5199 | Examples
5200 | --------
5201 |
5202 | ``DataFrame.rename`` supports two calling conventions
5203 |
5204 | * ``(index=index_mapper, columns=columns_mapper, ...)``
5205 | * ``(mapper, axis={'index', 'columns'}, ...)``
5206 |
5207 | We *highly* recommend using keyword arguments to clarify your
5208 | intent.
5209 |
5210 | >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
5211 | >>> df.rename(index=str, columns={"A": "a", "B": "c"})
5212 | a c
5213 | 0 1 4
5214 | 1 2 5
5215 | 2 3 6
5216 |
5217 | >>> df.rename(index=str, columns={"A": "a", "C": "c"})
5218 | a B
5219 | 0 1 4
5220 | 1 2 5
5221 | 2 3 6
5222 |
5223 | Using axis-style parameters
5224 |
5225 | >>> df.rename(str.lower, axis='columns')
5226 | a b
5227 | 0 1 4
5228 | 1 2 5
5229 | 2 3 6
5230 |
5231 | >>> df.rename({1: 2, 2: 4}, axis='index')
5232 | A B
5233 | 0 1 4
5234 | 2 2 5
5235 | 4 3 6
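The ``level`` parameter is not covered by the examples above. A minimal sketch, assuming a two-level index with the illustrative names ``outer`` and ``inner``:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")], names=["outer", "inner"]
)
df = pd.DataFrame({"v": [1, 2, 3]}, index=idx)

# Only labels in the 'inner' level are renamed.
renamed = df.rename(index={"x": "z"}, level="inner")

# 'a' only occurs in the 'outer' level, so nothing changes here.
unchanged = df.rename(index={"a": "A"}, level="inner")

print(renamed)
```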
5236 |
5237 | reorder_levels(self, order, axis=0)
5238 | Rearrange index levels using input order.
5239 | May not drop or duplicate levels
5240 |
5241 | Parameters
5242 | ----------
5243 | order : list of int or list of str
5244 | List representing new level order. Reference level by number
5245 | (position) or by key (label).
5246 | axis : int
5247 | Where to reorder levels.
5248 |
5249 | Returns
5250 | -------
5251 | type of caller (new object)
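As a sketch of ``reorder_levels``, assuming a two-level index with the illustrative level names ``class`` and ``name``:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("bird", "falcon"), ("bird", "parrot"), ("mammal", "lion")],
    names=["class", "name"],
)
df = pd.DataFrame({"max_speed": [389.0, 24.0, 80.5]}, index=idx)

# Levels can be referenced by label ...
by_label = df.reorder_levels(["name", "class"])

# ... or by position; both calls swap the two levels.
by_position = df.reorder_levels([1, 0])

print(by_label)
```

The rows keep their order and their data; only the order of the index levels changes.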
5252 |
5253 | replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
5254 | Replace values given in `to_replace` with `value`.
5255 |
5256 | Values of the DataFrame are replaced with other values dynamically.
5257 | This differs from updating with ``.loc`` or ``.iloc``, which require
5258 | you to specify a location to update with some value.
5259 |
5260 | Parameters
5261 | ----------
5262 | to_replace : str, regex, list, dict, Series, int, float, or None
5263 | How to find the values that will be replaced.
5264 |
5265 | * numeric, str or regex:
5266 |
5267 | - numeric: numeric values equal to `to_replace` will be
5268 | replaced with `value`
5269 | - str: string exactly matching `to_replace` will be replaced
5270 | with `value`
5271 | - regex: regexes matching `to_replace` will be replaced with
5272 | `value`
5273 |
5274 | * list of str, regex, or numeric:
5275 |
5276 | - First, if `to_replace` and `value` are both lists, they
5277 | **must** be the same length.
5278 | - Second, if ``regex=True`` then all of the strings in **both**
5279 | lists will be interpreted as regexes; otherwise they will match
5280 | directly. This doesn't matter much for `value` since there
5281 | are only a few possible substitution regexes you can use.
5282 | - str, regex and numeric rules apply as above.
5283 |
5284 | * dict:
5285 |
5286 | - Dicts can be used to specify different replacement values
5287 | for different existing values. For example,
5288 | ``{'a': 'b', 'y': 'z'}`` replaces the value 'a' with 'b' and
5289 | 'y' with 'z'. To use a dict in this way the `value`
5290 | parameter should be `None`.
5291 | - For a DataFrame a dict can specify that different values
5292 | should be replaced in different columns. For example,
5293 | ``{'a': 1, 'b': 'z'}`` looks for the value 1 in column 'a'
5294 | and the value 'z' in column 'b' and replaces these values
5295 | with whatever is specified in `value`. The `value` parameter
5296 | should not be ``None`` in this case. You can treat this as a
5297 | special case of passing two lists except that you are
5298 | specifying the column to search in.
5299 | - For a DataFrame nested dictionaries, e.g.,
5300 | ``{'a': {'b': np.nan}}``, are read as follows: look in column
5301 | 'a' for the value 'b' and replace it with NaN. The `value`
5302 | parameter should be ``None`` to use a nested dict in this
5303 | way. You can nest regular expressions as well. Note that
5304 | column names (the top-level dictionary keys in a nested
5305 | dictionary) **cannot** be regular expressions.
5306 |
5307 | * None:
5308 |
5309 | - This means that the `regex` argument must be a string,
5310 | compiled regular expression, or list, dict, ndarray or
5311 | Series of such elements. If `value` is also ``None`` then
5312 | this **must** be a nested dictionary or Series.
5313 |
5314 | See the examples section for examples of each of these.
5315 | value : scalar, dict, list, str, regex, default None
5316 | Value to replace any values matching `to_replace` with.
5317 | For a DataFrame a dict of values can be used to specify which
5318 | value to use for each column (columns not in the dict will not be
5319 | filled). Regular expressions, strings and lists or dicts of such
5320 | objects are also allowed.
5321 | inplace : boolean, default False
5322 | If True, perform the replacement in place. Note: this will modify any
5323 | other views on this object (e.g. a column from a DataFrame).
5324 | Returns the caller if this is True.
5325 | limit : int, default None
5326 | Maximum size gap to forward or backward fill.
5327 | regex : bool or same types as `to_replace`, default False
5328 | Whether to interpret `to_replace` and/or `value` as regular
5329 | expressions. If this is ``True`` then `to_replace` *must* be a
5330 | string. Alternatively, this could be a regular expression or a
5331 | list, dict, or array of regular expressions in which case
5332 | `to_replace` must be ``None``.
5333 | method : {'pad', 'ffill', 'bfill', `None`}
5334 | The method to use for replacement, when `to_replace` is a
5335 | scalar, list or tuple and `value` is ``None``.
5336 |
5337 | .. versionchanged:: 0.23.0
5338 | Added to DataFrame.
5339 |
5340 | See Also
5341 | --------
5342 | DataFrame.fillna : Fill NA values
5343 | DataFrame.where : Replace values based on boolean condition
5344 | Series.str.replace : Simple string replacement.
5345 |
5346 | Returns
5347 | -------
5348 | DataFrame
5349 | Object after replacement.
5350 |
5351 | Raises
5352 | ------
5353 | AssertionError
5354 | * If `regex` is not a ``bool`` and `to_replace` is not
5355 | ``None``.
5356 | TypeError
5357 | * If `to_replace` is a ``dict`` and `value` is not a ``list``,
5358 | ``dict``, ``ndarray``, or ``Series``
5359 | * If `to_replace` is ``None`` and `regex` is not compilable
5360 | into a regular expression or is a list, dict, ndarray, or
5361 | Series.
5362 | * When replacing multiple ``bool`` or ``datetime64`` objects and
5363 | the arguments to `to_replace` do not match the type of the
5364 | value being replaced
5365 | ValueError
5366 | * If a ``list`` or an ``ndarray`` is passed to `to_replace` and
5367 | `value` but they are not the same length.
5368 |
5369 | Notes
5370 | -----
5371 | * Regex substitution is performed under the hood with ``re.sub``. The
5372 | rules for substitution for ``re.sub`` are the same.
5373 | * Regular expressions will only substitute on strings, meaning you
5374 | cannot provide, for example, a regular expression matching floating
5375 | point numbers and expect the columns in your frame that have a
5376 | numeric dtype to be matched. However, if those floating point
5377 | numbers *are* strings, then you can do this.
5378 | * This method has *a lot* of options. You are encouraged to experiment
5379 | and play with this method to gain intuition about how it works.
5380 | * When a dict is used as the `to_replace` value, the key(s) in the
5381 | dict act as the `to_replace` part and the value(s) in the dict act
5382 | as the `value` parameter.
5383 |
5384 | Examples
5385 | --------
5386 |
5387 | **Scalar `to_replace` and `value`**
5388 |
5389 | >>> s = pd.Series([0, 1, 2, 3, 4])
5390 | >>> s.replace(0, 5)
5391 | 0 5
5392 | 1 1
5393 | 2 2
5394 | 3 3
5395 | 4 4
5396 | dtype: int64
5397 |
5398 | >>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
5399 | ... 'B': [5, 6, 7, 8, 9],
5400 | ... 'C': ['a', 'b', 'c', 'd', 'e']})
5401 | >>> df.replace(0, 5)
5402 | A B C
5403 | 0 5 5 a
5404 | 1 1 6 b
5405 | 2 2 7 c
5406 | 3 3 8 d
5407 | 4 4 9 e
5408 |
5409 | **List-like `to_replace`**
5410 |
5411 | >>> df.replace([0, 1, 2, 3], 4)
5412 | A B C
5413 | 0 4 5 a
5414 | 1 4 6 b
5415 | 2 4 7 c
5416 | 3 4 8 d
5417 | 4 4 9 e
5418 |
5419 | >>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
5420 | A B C
5421 | 0 4 5 a
5422 | 1 3 6 b
5423 | 2 2 7 c
5424 | 3 1 8 d
5425 | 4 4 9 e
5426 |
5427 | >>> s.replace([1, 2], method='bfill')
5428 | 0 0
5429 | 1 3
5430 | 2 3
5431 | 3 3
5432 | 4 4
5433 | dtype: int64
5434 |
5435 | **dict-like `to_replace`**
5436 |
5437 | >>> df.replace({0: 10, 1: 100})
5438 | A B C
5439 | 0 10 5 a
5440 | 1 100 6 b
5441 | 2 2 7 c
5442 | 3 3 8 d
5443 | 4 4 9 e
5444 |
5445 | >>> df.replace({'A': 0, 'B': 5}, 100)
5446 | A B C
5447 | 0 100 100 a
5448 | 1 1 6 b
5449 | 2 2 7 c
5450 | 3 3 8 d
5451 | 4 4 9 e
5452 |
5453 | >>> df.replace({'A': {0: 100, 4: 400}})
5454 | A B C
5455 | 0 100 5 a
5456 | 1 1 6 b
5457 | 2 2 7 c
5458 | 3 3 8 d
5459 | 4 400 9 e
5460 |
5461 | **Regular expression `to_replace`**
5462 |
5463 | >>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
5464 | ... 'B': ['abc', 'bar', 'xyz']})
5465 | >>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
5466 | A B
5467 | 0 new abc
5468 | 1 foo new
5469 | 2 bait xyz
5470 |
5471 | >>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
5472 | A B
5473 | 0 new abc
5474 | 1 foo bar
5475 | 2 bait xyz
5476 |
5477 | >>> df.replace(regex=r'^ba.$', value='new')
5478 | A B
5479 | 0 new abc
5480 | 1 foo new
5481 | 2 bait xyz
5482 |
5483 | >>> df.replace(regex={r'^ba.$':'new', 'foo':'xyz'})
5484 | A B
5485 | 0 new abc
5486 | 1 xyz new
5487 | 2 bait xyz
5488 |
5489 | >>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
5490 | A B
5491 | 0 new abc
5492 | 1 new new
5493 | 2 bait xyz
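The note that regular expressions only substitute on strings can be checked with a small sketch. The frame below is illustrative: one column holds floats, the other holds the same numbers as strings.

```python
import pandas as pd

df = pd.DataFrame({"num": [1.5, 2.5], "txt": ["1.5", "2.5"]})

# The pattern matches the text '1.5'; the float 1.5 in 'num' is not a
# string, so it is left alone.
out = df.replace(r"^1\.5$", "X", regex=True)

print(out)
```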
5494 |
5495 | Note that when replacing multiple ``bool`` or ``datetime64`` objects,
5496 | the data types in the `to_replace` parameter must match the data
5497 | type of the value being replaced:
5498 |
5499 | >>> df = pd.DataFrame({'A': [True, False, True],
5500 | ... 'B': [False, True, False]})
5501 | >>> df.replace({'a string': 'new value', True: False}) # raises
5502 | Traceback (most recent call last):
5503 | ...
5504 | TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'
5505 |
5506 | This raises a ``TypeError`` because one of the ``dict`` keys is not of
5507 | the correct type for replacement.
5508 |
5509 | Compare the behavior of ``s.replace({'a': None})`` and
5510 | ``s.replace('a', None)`` to understand the peculiarities
5511 | of the `to_replace` parameter:
5512 |
5513 | >>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
5514 |
5515 | When a dict is used as the `to_replace` value, the value(s) in the
5516 | dict play the role of the `value` parameter.
5517 | ``s.replace({'a': None})`` is equivalent to
5518 | ``s.replace(to_replace={'a': None}, value=None, method=None)``:
5519 |
5520 | >>> s.replace({'a': None})
5521 | 0 10
5522 | 1 None
5523 | 2 None
5524 | 3 b
5525 | 4 None
5526 | dtype: object
5527 |
5528 | When ``value=None`` and `to_replace` is a scalar, list or
5529 | tuple, `replace` uses the method parameter (default 'pad') to do the
5530 | replacement. So this is why the 'a' values are being replaced by 10
5531 | in rows 1 and 2 and 'b' in row 4 in this case.
5532 | The command ``s.replace('a', None)`` is actually equivalent to
5533 | ``s.replace(to_replace='a', value=None, method='pad')``:
5534 |
5535 | >>> s.replace('a', None)
5536 | 0 10
5537 | 1 10
5538 | 2 10
5539 | 3 b
5540 | 4 b
5541 | dtype: object
5542 |
5543 | reset_index(self, level=None, drop=False, inplace=False, col_level=0, col_fill='')
5544 | For DataFrame with multi-level index, return new DataFrame with
5545 | labeling information in the columns under the index names, defaulting
5546 | to 'level_0', 'level_1', etc. if any are None. For a standard index,
5547 | the index name will be used (if set), otherwise a default 'index' or
5548 | 'level_0' (if 'index' is already taken) will be used.
5549 |
5550 | Parameters
5551 | ----------
5552 | level : int, str, tuple, or list, default None
5553 | Only remove the given levels from the index. Removes all levels by
5554 | default
5555 | drop : boolean, default False
5556 | Do not try to insert index into dataframe columns. This resets
5557 | the index to the default integer index.
5558 | inplace : boolean, default False
5559 | Modify the DataFrame in place (do not create a new object)
5560 | col_level : int or str, default 0
5561 | If the columns have multiple levels, determines which level the
5562 | labels are inserted into. By default it is inserted into the first
5563 | level.
5564 | col_fill : object, default ''
5565 | If the columns have multiple levels, determines how the other
5566 | levels are named. If None then the index name is repeated.
5567 |
5568 | Returns
5569 | -------
5570 | resetted : DataFrame
5571 |
5572 | Examples
5573 | --------
5574 | >>> df = pd.DataFrame([('bird', 389.0),
5575 | ... ('bird', 24.0),
5576 | ... ('mammal', 80.5),
5577 | ... ('mammal', np.nan)],
5578 | ... index=['falcon', 'parrot', 'lion', 'monkey'],
5579 | ... columns=('class', 'max_speed'))
5580 | >>> df
5581 | class max_speed
5582 | falcon bird 389.0
5583 | parrot bird 24.0
5584 | lion mammal 80.5
5585 | monkey mammal NaN
5586 |
5587 | When we reset the index, the old index is added as a column, and a
5588 | new sequential index is used:
5589 |
5590 | >>> df.reset_index()
5591 | index class max_speed
5592 | 0 falcon bird 389.0
5593 | 1 parrot bird 24.0
5594 | 2 lion mammal 80.5
5595 | 3 monkey mammal NaN
5596 |
5597 | We can use the `drop` parameter to avoid the old index being added as
5598 | a column:
5599 |
5600 | >>> df.reset_index(drop=True)
5601 | class max_speed
5602 | 0 bird 389.0
5603 | 1 bird 24.0
5604 | 2 mammal 80.5
5605 | 3 mammal NaN
5606 |
5607 | You can also use `reset_index` with `MultiIndex`.
5608 |
5609 | >>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
5610 | ... ('bird', 'parrot'),
5611 | ... ('mammal', 'lion'),
5612 | ... ('mammal', 'monkey')],
5613 | ... names=['class', 'name'])
5614 | >>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
5615 | ... ('species', 'type')])
5616 | >>> df = pd.DataFrame([(389.0, 'fly'),
5617 | ... ( 24.0, 'fly'),
5618 | ... ( 80.5, 'run'),
5619 | ... (np.nan, 'jump')],
5620 | ... index=index,
5621 | ... columns=columns)
5622 | >>> df
5623 | speed species
5624 | max type
5625 | class name
5626 | bird falcon 389.0 fly
5627 | parrot 24.0 fly
5628 | mammal lion 80.5 run
5629 | monkey NaN jump
5630 |
5631 | If the index has multiple levels, we can reset a subset of them:
5632 |
5633 | >>> df.reset_index(level='class')
5634 | class speed species
5635 | max type
5636 | name
5637 | falcon bird 389.0 fly
5638 | parrot bird 24.0 fly
5639 | lion mammal 80.5 run
5640 | monkey mammal NaN jump
5641 |
5642 | If we are not dropping the index, by default, it is placed in the top
5643 | level. We can place it in another level:
5644 |
5645 | >>> df.reset_index(level='class', col_level=1)
5646 | speed species
5647 | class max type
5648 | name
5649 | falcon bird 389.0 fly
5650 | parrot bird 24.0 fly
5651 | lion mammal 80.5 run
5652 | monkey mammal NaN jump
5653 |
5654 | When the index is inserted under another level, we can specify under
5655 | which one with the parameter `col_fill`:
5656 |
5657 | >>> df.reset_index(level='class', col_level=1, col_fill='species')
5658 | species speed species
5659 | class max type
5660 | name
5661 | falcon bird 389.0 fly
5662 | parrot bird 24.0 fly
5663 | lion mammal 80.5 run
5664 | monkey mammal NaN jump
5665 |
5666 | If we specify a nonexistent level for `col_fill`, it is created:
5667 |
5668 | >>> df.reset_index(level='class', col_level=1, col_fill='genus')
5669 | genus speed species
5670 | class max type
5671 | name
5672 | falcon bird 389.0 fly
5673 | parrot bird 24.0 fly
5674 | lion mammal 80.5 run
5675 | monkey mammal NaN jump
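The default column names mentioned in the summary ('index', 'level_0', 'level_1', ...) are not shown in the examples above. A short sketch with illustrative frames:

```python
import pandas as pd

# A column called 'index' already exists, so the old index is inserted
# as 'level_0' instead.
taken = pd.DataFrame({"index": [1, 2]}, index=["x", "y"])
flat_taken = taken.reset_index()

# An unnamed MultiIndex falls back to 'level_0', 'level_1', ...
idx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])
multi = pd.DataFrame({"v": [10, 20]}, index=idx)
flat_multi = multi.reset_index()

print(flat_taken.columns.tolist())
print(flat_multi.columns.tolist())
```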
5676 |
5677 | rfloordiv(self, other, axis='columns', level=None, fill_value=None)
5678 | Integer division of dataframe and other, element-wise (binary operator `rfloordiv`).
5679 |
5680 | Equivalent to ``other // dataframe``, but with support to substitute a fill_value for
5681 | missing data in one of the inputs.
5682 |
5683 | Parameters
5684 | ----------
5685 | other : Series, DataFrame, or constant
5686 | axis : {0, 1, 'index', 'columns'}
5687 | For Series input, axis to match Series index on
5688 | level : int or name
5689 | Broadcast across a level, matching Index values on the
5690 | passed MultiIndex level
5691 | fill_value : None or float value, default None
5692 | Fill existing missing (NaN) values, and any new element needed for
5693 | successful DataFrame alignment, with this value before computation.
5694 | If data in both corresponding DataFrame locations is missing
5695 | the result will be missing
5696 |
5697 | Notes
5698 | -----
5699 | Mismatched indices will be unioned together
5700 |
5701 | Returns
5702 | -------
5703 | result : DataFrame
5704 |
5709 | See also
5710 | --------
5711 | DataFrame.floordiv
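A minimal sketch of ``rfloordiv`` with a scalar, using an illustrative single-column frame without zeros:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4]})

# Reflected integer division: same as ``10 // df``.
result = df.rfloordiv(10)

print(result)
```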
5712 |
5713 | rmod(self, other, axis='columns', level=None, fill_value=None)
5714 | Modulo of dataframe and other, element-wise (binary operator `rmod`).
5715 |
5716 | Equivalent to ``other % dataframe``, but with support to substitute a fill_value for
5717 | missing data in one of the inputs.
5718 |
5719 | Parameters
5720 | ----------
5721 | other : Series, DataFrame, or constant
5722 | axis : {0, 1, 'index', 'columns'}
5723 | For Series input, axis to match Series index on
5724 | level : int or name
5725 | Broadcast across a level, matching Index values on the
5726 | passed MultiIndex level
5727 | fill_value : None or float value, default None
5728 | Fill existing missing (NaN) values, and any new element needed for
5729 | successful DataFrame alignment, with this value before computation.
5730 | If data in both corresponding DataFrame locations is missing
5731 | the result will be missing
5732 |
5733 | Notes
5734 | -----
5735 | Mismatched indices will be unioned together
5736 |
5737 | Returns
5738 | -------
5739 | result : DataFrame
5740 |
5745 | See also
5746 | --------
5747 | DataFrame.mod
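A minimal sketch of ``rmod`` with a scalar, using an illustrative single-column frame:

```python
import pandas as pd

df = pd.DataFrame({"x": [3, 4, 6]})

# Reflected modulo: same as ``10 % df``.
result = df.rmod(10)

print(result)
```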
5748 |
5749 | rmul(self, other, axis='columns', level=None, fill_value=None)
5750 | Multiplication of dataframe and other, element-wise (binary operator `rmul`).
5751 |
5752 | Equivalent to ``other * dataframe``, but with support to substitute a fill_value for
5753 | missing data in one of the inputs.
5754 |
5755 | Parameters
5756 | ----------
5757 | other : Series, DataFrame, or constant
5758 | axis : {0, 1, 'index', 'columns'}
5759 | For Series input, axis to match Series index on
5760 | level : int or name
5761 | Broadcast across a level, matching Index values on the
5762 | passed MultiIndex level
5763 | fill_value : None or float value, default None
5764 | Fill existing missing (NaN) values, and any new element needed for
5765 | successful DataFrame alignment, with this value before computation.
5766 | If data in both corresponding DataFrame locations is missing
5767 | the result will be missing
5768 |
5769 | Notes
5770 | -----
5771 | Mismatched indices will be unioned together
5772 |
5773 | Returns
5774 | -------
5775 | result : DataFrame
5776 |
5781 | See also
5782 | --------
5783 | DataFrame.mul
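The sketch below shows ``rmul`` with a Series and the ``axis`` parameter. The data is illustrative; with ``axis='index'`` the Series index is matched against the row labels rather than the column labels.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
s = pd.Series([10, 100])

# Each row of ``df`` is multiplied by the Series entry with the same
# row label.
result = df.rmul(s, axis="index")

print(result)
```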
5784 |
5785 | rolling(self, window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
5786 | Provides rolling window calculations.
5787 |
5788 | .. versionadded:: 0.18.0
5789 |
5790 | Parameters
5791 | ----------
5792 | window : int, or offset
5793 | Size of the moving window. This is the number of observations used for
5794 | calculating the statistic. Each window will be a fixed size.
5795 |
5796 | If it's an offset, then this will be the time period of each window. Each
5797 | window will be variable-sized, based on the observations included in
5798 | the time period. This is only valid for datetimelike indexes. This is
5799 | new in 0.19.0.
5800 | min_periods : int, default None
5801 | Minimum number of observations in window required to have a value
5802 | (otherwise result is NA). For a window that is specified by an offset,
5803 | this will default to 1.
5804 | center : boolean, default False
5805 | Set the labels at the center of the window.
5806 | win_type : string, default None
5807 | Provide a window type. If ``None``, all points are evenly weighted.
5808 | See the notes below for further information.
5809 | on : string, optional
5810 | For a DataFrame, column on which to calculate
5811 | the rolling window, rather than the index
5812 | closed : string, default None
5813 | Make the interval closed on the 'right', 'left', 'both' or
5814 | 'neither' endpoints.
5815 | For offset-based windows, it defaults to 'right'.
5816 | For fixed windows, defaults to 'both'. Remaining cases not implemented
5817 | for fixed windows.
5818 |
5819 | .. versionadded:: 0.20.0
5820 |
5821 | axis : int or string, default 0
5822 |
5823 | Returns
5824 | -------
5825 | A Window or Rolling object, sub-classed for the particular operation.
5826 |
5827 | Examples
5828 | --------
5829 |
5830 | >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
5831 | >>> df
5832 | B
5833 | 0 0.0
5834 | 1 1.0
5835 | 2 2.0
5836 | 3 NaN
5837 | 4 4.0
5838 |
5839 | Rolling sum with a window length of 2, using the 'triang'
5840 | window type.
5841 |
5842 | >>> df.rolling(2, win_type='triang').sum()
5843 | B
5844 | 0 NaN
5845 | 1 1.0
5846 | 2 2.5
5847 | 3 NaN
5848 | 4 NaN
5849 |
5850 | Rolling sum with a window length of 2, min_periods defaults
5851 | to the window length.
5852 |
5853 | >>> df.rolling(2).sum()
5854 | B
5855 | 0 NaN
5856 | 1 1.0
5857 | 2 3.0
5858 | 3 NaN
5859 | 4 NaN
5860 |
5861 | Same as above, but explicitly set the min_periods
5862 |
5863 | >>> df.rolling(2, min_periods=1).sum()
5864 | B
5865 | 0 0.0
5866 | 1 1.0
5867 | 2 3.0
5868 | 3 2.0
5869 | 4 4.0
5870 |
5871 | A ragged (meaning not at a regular frequency), time-indexed DataFrame:
5872 |
5873 | >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
5874 | ... index = [pd.Timestamp('20130101 09:00:00'),
5875 | ... pd.Timestamp('20130101 09:00:02'),
5876 | ... pd.Timestamp('20130101 09:00:03'),
5877 | ... pd.Timestamp('20130101 09:00:05'),
5878 | ... pd.Timestamp('20130101 09:00:06')])
5879 |
5880 | >>> df
5881 | B
5882 | 2013-01-01 09:00:00 0.0
5883 | 2013-01-01 09:00:02 1.0
5884 | 2013-01-01 09:00:03 2.0
5885 | 2013-01-01 09:00:05 NaN
5886 | 2013-01-01 09:00:06 4.0
5887 |
5888 |
5889 | In contrast to an integer rolling window, this will roll a
5890 | variable-length window corresponding to the time period.
5891 | The default for min_periods is 1.
5892 |
5893 | >>> df.rolling('2s').sum()
5894 | B
5895 | 2013-01-01 09:00:00 0.0
5896 | 2013-01-01 09:00:02 1.0
5897 | 2013-01-01 09:00:03 3.0
5898 | 2013-01-01 09:00:05 NaN
5899 | 2013-01-01 09:00:06 4.0
5900 |
5901 | Notes
5902 | -----
5903 | By default, the result is set to the right edge of the window. This can be
5904 | changed to the center of the window by setting ``center=True``.
5905 |
5906 | To learn more about the offsets & frequency strings, please see `this link
5907 | <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__.
5908 |
5909 | The recognized win_types are:
5910 |
5911 | * ``boxcar``
5912 | * ``triang``
5913 | * ``blackman``
5914 | * ``hamming``
5915 | * ``bartlett``
5916 | * ``parzen``
5917 | * ``bohman``
5918 | * ``blackmanharris``
5919 | * ``nuttall``
5920 | * ``barthann``
5921 | * ``kaiser`` (needs beta)
5922 | * ``gaussian`` (needs std)
5923 | * ``general_gaussian`` (needs power, width)
5924 | * ``slepian`` (needs width).
5925 |
5926 | If ``win_type=None`` all points are evenly weighted. To learn more about
5927 | different window types see `scipy.signal window functions
5928 | <https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions>`__.
5929 |
5930 | See Also
5931 | --------
5932 | expanding : Provides expanding transformations.
5933 | ewm : Provides exponentially weighted functions.
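The Notes say that results are labelled at the right edge of the window unless ``center=True`` is set. A small sketch of the difference, using a made-up frame with no missing values:

```python
import pandas as pd

df = pd.DataFrame({'B': [0.0, 1.0, 2.0, 3.0, 4.0]})

# Default: each result is labelled at the right edge of its window.
right_edge = df.rolling(3).mean()

# center=True: the same window means, labelled at the window centre.
centered = df.rolling(3, center=True).mean()
```

Both results contain the same three window means; only the rows they are attached to (and so the position of the NaN padding) differ.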
5934 |
5935 | round(self, decimals=0, *args, **kwargs)
5936 | Round a DataFrame to a variable number of decimal places.
5937 |
5938 | Parameters
5939 | ----------
5940 | decimals : int, dict, Series
5941 | Number of decimal places to round each column to. If an int is
5942 | given, round each column to the same number of places.
5943 | Otherwise dict and Series round to variable numbers of places.
5944 | Column names should be in the keys if `decimals` is a
5945 | dict-like, or in the index if `decimals` is a Series. Any
5946 | columns not included in `decimals` will be left as is. Elements
5947 | of `decimals` which are not columns of the input will be
5948 | ignored.
5949 |
5950 | Examples
5951 | --------
5952 | >>> df = pd.DataFrame(np.random.random([3, 3]),
5953 | ... columns=['A', 'B', 'C'], index=['first', 'second', 'third'])
5954 | >>> df
5955 | A B C
5956 | first 0.028208 0.992815 0.173891
5957 | second 0.038683 0.645646 0.577595
5958 | third 0.877076 0.149370 0.491027
5959 | >>> df.round(2)
5960 | A B C
5961 | first 0.03 0.99 0.17
5962 | second 0.04 0.65 0.58
5963 | third 0.88 0.15 0.49
5964 | >>> df.round({'A': 1, 'C': 2})
5965 | A B C
5966 | first 0.0 0.992815 0.17
5967 | second 0.0 0.645646 0.58
5968 | third 0.9 0.149370 0.49
5969 | >>> decimals = pd.Series([1, 0, 2], index=['A', 'B', 'C'])
5970 | >>> df.round(decimals)
5971 | A B C
5972 | first 0.0 1 0.17
5973 | second 0.0 1 0.58
5974 | third 0.9 0 0.49
5975 |
5976 | Returns
5977 | -------
5978 | DataFrame object
5979 |
5980 | See Also
5981 | --------
5982 | numpy.around
5983 | Series.round
5984 |
5985 | rpow(self, other, axis='columns', level=None, fill_value=None)
5986 | Exponential power of dataframe and other, element-wise (binary operator `rpow`).
5987 |
5988 | Equivalent to ``other ** dataframe``, but with support to substitute a fill_value for
5989 | missing data in one of the inputs.
5990 |
5991 | Parameters
5992 | ----------
5993 | other : Series, DataFrame, or constant
5994 | axis : {0, 1, 'index', 'columns'}
5995 | For Series input, axis to match Series index on
5996 | level : int or name
5997 | Broadcast across a level, matching Index values on the
5998 | passed MultiIndex level
5999 | fill_value : None or float value, default None
6000 | Fill existing missing (NaN) values, and any new element needed for
6001 | successful DataFrame alignment, with this value before computation.
6002 | If data in both corresponding DataFrame locations is missing
6003 | the result will be missing
6004 |
6005 | Notes
6006 | -----
6007 | Mismatched indices will be unioned together
6008 |
6009 | Returns
6010 | -------
6011 | result : DataFrame
6012 |
6017 | See also
6018 | --------
6019 | DataFrame.pow
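A short sketch contrasting ``rpow`` with ``pow``, assuming a small integer frame invented for this example:

```python
import pandas as pd

df = pd.DataFrame({'exp': [0, 1, 2, 3]})

# rpow puts the frame in the exponent: same as 2 ** df.
powers_of_two = df.rpow(2)

# pow puts the frame in the base: same as df ** 2.
squares = df.pow(2)
```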
6020 |
6021 | rsub(self, other, axis='columns', level=None, fill_value=None)
6022 | Subtraction of dataframe and other, element-wise (binary operator `rsub`).
6023 |
6024 | Equivalent to ``other - dataframe``, but with support to substitute a fill_value for
6025 | missing data in one of the inputs.
6026 |
6027 | Parameters
6028 | ----------
6029 | other : Series, DataFrame, or constant
6030 | axis : {0, 1, 'index', 'columns'}
6031 | For Series input, axis to match Series index on
6032 | level : int or name
6033 | Broadcast across a level, matching Index values on the
6034 | passed MultiIndex level
6035 | fill_value : None or float value, default None
6036 | Fill existing missing (NaN) values, and any new element needed for
6037 | successful DataFrame alignment, with this value before computation.
6038 | If data in both corresponding DataFrame locations is missing
6039 | the result will be missing
6040 |
6041 | Notes
6042 | -----
6043 | Mismatched indices will be unioned together
6044 |
6045 | Returns
6046 | -------
6047 | result : DataFrame
6048 |
6049 | Examples
6050 | --------
6051 |
6052 | >>> a = pd.DataFrame([2, 1, 1, np.nan], index=['a', 'b', 'c', 'd'],
6053 | ... columns=['one'])
6054 | >>> a
6055 | one
6056 | a 2.0
6057 | b 1.0
6058 | c 1.0
6059 | d NaN
6060 | >>> b = pd.DataFrame(dict(one=[1, np.nan, 1, np.nan],
6061 | ... two=[3, 2, np.nan, 2]),
6062 | ... index=['a', 'b', 'd', 'e'])
6063 | >>> b
6064 | one two
6065 | a 1.0 3.0
6066 | b NaN 2.0
6067 | d 1.0 NaN
6068 | e NaN 2.0
6069 | >>> a.sub(b, fill_value=0)
6070 | one two
6071 | a 1.0 -3.0
6072 | b 1.0 -2.0
6073 | c 1.0 NaN
6074 | d -1.0 NaN
6075 | e NaN -2.0
6076 |
6077 |
6078 | See also
6079 | --------
6080 | DataFrame.sub
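The example above calls ``sub``. As a sketch of the reflected form itself, the following uses a made-up one-column frame and a scalar ``other``:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [2.0, 1.0, np.nan]})

# Same as 10 - df: the NaN stays NaN.
from_ten = df.rsub(10)

# fill_value=0 replaces the NaN in df before subtracting.
from_ten_filled = df.rsub(10, fill_value=0)
```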
6081 |
6082 | rtruediv(self, other, axis='columns', level=None, fill_value=None)
6083 | Floating division of dataframe and other, element-wise (binary operator `rtruediv`).
6084 |
6085 | Equivalent to ``other / dataframe``, but with support to substitute a fill_value for
6086 | missing data in one of the inputs.
6087 |
6088 | Parameters
6089 | ----------
6090 | other : Series, DataFrame, or constant
6091 | axis : {0, 1, 'index', 'columns'}
6092 | For Series input, axis to match Series index on
6093 | level : int or name
6094 | Broadcast across a level, matching Index values on the
6095 | passed MultiIndex level
6096 | fill_value : None or float value, default None
6097 | Fill existing missing (NaN) values, and any new element needed for
6098 | successful DataFrame alignment, with this value before computation.
6099 | If data in both corresponding DataFrame locations is missing
6100 | the result will be missing
6101 |
6102 | Notes
6103 | -----
6104 | Mismatched indices will be unioned together
6105 |
6106 | Returns
6107 | -------
6108 | result : DataFrame
6109 |
6110 | Examples
6111 | --------
6112 | None
6113 |
6114 | See also
6115 | --------
6116 | DataFrame.truediv
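A minimal sketch of ``rtruediv`` with a scalar, which gives element-wise reciprocals; the frame is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 4.0]})

# Same as 1 / df.
reciprocal = df.rtruediv(1)

# For comparison, truediv keeps the frame as the numerator: df / 2.
halved = df.truediv(2)
```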
6117 |
6118 | select_dtypes(self, include=None, exclude=None)
6119 | Return a subset of the DataFrame's columns based on the column dtypes.
6120 |
6121 | Parameters
6122 | ----------
6123 | include, exclude : scalar or list-like
6124 | A selection of dtypes or strings to be included/excluded. At least
6125 | one of these parameters must be supplied.
6126 |
6127 | Raises
6128 | ------
6129 | ValueError
6130 | * If both of ``include`` and ``exclude`` are empty
6131 | * If ``include`` and ``exclude`` have overlapping elements
6132 | * If any kind of string dtype is passed in.
6133 |
6134 | Returns
6135 | -------
6136 | subset : DataFrame
6137 | The subset of the frame including the dtypes in ``include`` and
6138 | excluding the dtypes in ``exclude``.
6139 |
6140 | Notes
6141 | -----
6142 | * To select all *numeric* types, use ``np.number`` or ``'number'``
6143 | * To select strings you must use the ``object`` dtype, but note that
6144 | this will return *all* object dtype columns
6145 | * See the `numpy dtype hierarchy
6146 | <http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html>`__
6147 | * To select datetimes, use ``np.datetime64``, ``'datetime'`` or
6148 | ``'datetime64'``
6149 | * To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or
6150 | ``'timedelta64'``
6151 | * To select Pandas categorical dtypes, use ``'category'``
6152 | * To select Pandas datetimetz dtypes, use ``'datetimetz'`` (new in
6153 | 0.20.0) or ``'datetime64[ns, tz]'``
6154 |
6155 | Examples
6156 | --------
6157 | >>> df = pd.DataFrame({'a': [1, 2] * 3,
6158 | ... 'b': [True, False] * 3,
6159 | ... 'c': [1.0, 2.0] * 3})
6160 | >>> df
6161 | a b c
6162 | 0 1 True 1.0
6163 | 1 2 False 2.0
6164 | 2 1 True 1.0
6165 | 3 2 False 2.0
6166 | 4 1 True 1.0
6167 | 5 2 False 2.0
6168 |
6169 | >>> df.select_dtypes(include='bool')
6170 | b
6171 | 0 True
6172 | 1 False
6173 | 2 True
6174 | 3 False
6175 | 4 True
6176 | 5 False
6177 |
6178 | >>> df.select_dtypes(include=['float64'])
6179 | c
6180 | 0 1.0
6181 | 1 2.0
6182 | 2 1.0
6183 | 3 2.0
6184 | 4 1.0
6185 | 5 2.0
6186 |
6187 | >>> df.select_dtypes(exclude=['int'])
6188 | b c
6189 | 0 True 1.0
6190 | 1 False 2.0
6191 | 2 True 1.0
6192 | 3 False 2.0
6193 | 4 True 1.0
6194 | 5 False 2.0
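The Notes mention ``'number'`` as a way to select all numeric types. A sketch of that case, with a made-up frame that also carries a boolean and a string column:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2],
                   'b': [True, False],
                   'c': [1.0, 2.0],
                   'd': ['x', 'y']})

# Integer and float columns are numeric; bool and string columns are not.
numeric = df.select_dtypes(include='number')

# The complement: everything that is not a numeric dtype.
non_numeric = df.select_dtypes(exclude='number')
```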
6195 |
6196 | sem(self, axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
6197 | Return unbiased standard error of the mean over requested axis.
6198 |
6199 | Normalized by N-1 by default. This can be changed using the ddof argument
6200 |
6201 | Parameters
6202 | ----------
6203 | axis : {index (0), columns (1)}
6204 | skipna : boolean, default True
6205 | Exclude NA/null values. If an entire row/column is NA, the result
6206 | will be NA
6207 | level : int or level name, default None
6208 | If the axis is a MultiIndex (hierarchical), count along a
6209 | particular level, collapsing into a Series
6210 | ddof : int, default 1
6211 | Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
6212 | where N represents the number of elements.
6213 | numeric_only : boolean, default None
6214 | Include only float, int, boolean columns. If None, will attempt to use
6215 | everything, then use only numeric data. Not implemented for Series.
6216 |
6217 | Returns
6218 | -------
6219 | sem : Series or DataFrame (if level specified)
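As a rough check on what ``sem`` computes, the sketch below compares it with the standard deviation divided by the square root of the number of observations. The data are invented and contain no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0]})

# Default ddof=1: sample standard deviation / sqrt(N).
sem_default = df.sem()
manual_default = df.std(ddof=1) / np.sqrt(len(df))

# ddof=0 switches to the population standard deviation.
sem_population = df.sem(ddof=0)
manual_population = df.std(ddof=0) / np.sqrt(len(df))
```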
6220 |
6221 | set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)
6222 | Set the DataFrame index (row labels) using one or more existing
6223 | columns. By default yields a new object.
6224 |
6225 | Parameters
6226 | ----------
6227 | keys : column label or list of column labels / arrays
6228 | drop : boolean, default True
6229 | Delete columns to be used as the new index
6230 | append : boolean, default False
6231 | Whether to append columns to existing index
6232 | inplace : boolean, default False
6233 | Modify the DataFrame in place (do not create a new object)
6234 | verify_integrity : boolean, default False
6235 | Check the new index for duplicates. Otherwise defer the check until
6236 | necessary. Setting to False will improve the performance of this
6237 | method
6238 |
6239 | Examples
6240 | --------
6241 | >>> df = pd.DataFrame({'month': [1, 4, 7, 10],
6242 | ... 'year': [2012, 2014, 2013, 2014],
6243 | ... 'sale':[55, 40, 84, 31]})
 | >>> df
6244 | month sale year
6245 | 0 1 55 2012
6246 | 1 4 40 2014
6247 | 2 7 84 2013
6248 | 3 10 31 2014
6249 |
6250 | Set the index to become the 'month' column:
6251 |
6252 | >>> df.set_index('month')
6253 | sale year
6254 | month
6255 | 1 55 2012
6256 | 4 40 2014
6257 | 7 84 2013
6258 | 10 31 2014
6259 |
6260 | Create a multi-index using columns 'year' and 'month':
6261 |
6262 | >>> df.set_index(['year', 'month'])
6263 | sale
6264 | year month
6265 | 2012 1 55
6266 | 2014 4 40
6267 | 2013 7 84
6268 | 2014 10 31
6269 |
6270 | Create a multi-index using a set of values and a column:
6271 |
6272 | >>> df.set_index([[1, 2, 3, 4], 'year'])
6273 | month sale
6274 | year
6275 | 1 2012 1 55
6276 | 2 2014 4 40
6277 | 3 2013 7 84
6278 | 4 2014 10 31
6279 |
6280 | Returns
6281 | -------
6282 | dataframe : DataFrame
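A sketch of ``verify_integrity``, assuming a made-up frame whose ``year`` column contains a duplicate. With the check enabled the call is expected to raise ``ValueError``; without it the duplicate index is accepted.

```python
import pandas as pd

df = pd.DataFrame({'year': [2012, 2014, 2014],
                   'sale': [55, 40, 31]})

# Default (verify_integrity=False): duplicates in the new index are allowed.
indexed = df.set_index('year')

# verify_integrity=True: duplicates are rejected up front.
try:
    df.set_index('year', verify_integrity=True)
    raised = False
except ValueError:
    raised = True
```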
6283 |
6284 | set_value(self, index, col, value, takeable=False)
6285 | Put single value at passed column and index
6286 |
6287 | .. deprecated:: 0.21.0
6288 | Use .at[] or .iat[] accessors instead.
6289 |
6290 | Parameters
6291 | ----------
6292 | index : row label
6293 | col : column label
6294 | value : scalar value
6295 | takeable : interpret the index/col as indexers, default False
6296 |
6297 | Returns
6298 | -------
6299 | frame : DataFrame
6300 | If label pair is contained, will be reference to calling DataFrame,
6301 | otherwise a new object
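Since ``set_value`` is deprecated, a sketch of the suggested replacements may help. The frame is hypothetical; ``.at`` takes labels and ``.iat`` takes integer positions (roughly the ``takeable=True`` case).

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])

# Label-based scalar assignment: row 'x', column 'B'.
df.at['x', 'B'] = 30

# Position-based scalar assignment: second row, first column.
df.iat[1, 0] = 20
```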
6302 |
6303 | shift(self, periods=1, freq=None, axis=0)
6304 | Shift index by desired number of periods with an optional time freq
6305 |
6306 | Parameters
6307 | ----------
6308 | periods : int
6309 | Number of periods to move, can be positive or negative
6310 | freq : DateOffset, timedelta, or time rule string, optional
6311 | Increment to use from the tseries module or time rule (e.g. 'EOM').
6312 | See Notes.
6313 | axis : {0 or 'index', 1 or 'columns'}
6314 |
6315 | Notes
6316 | -----
6317 | If freq is specified then the index values are shifted but the data
6318 | is not realigned. That is, use freq if you would like to extend the
6319 | index when shifting and preserve the original data.
6320 |
6321 | Returns
6322 | -------
6323 | shifted : DataFrame
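The Notes distinguish shifting the data from shifting the index. A small sketch of both, on a made-up daily series:

```python
import pandas as pd

idx = pd.date_range('2018-01-01', periods=3, freq='D')
df = pd.DataFrame({'v': [1.0, 2.0, 3.0]}, index=idx)

# Without freq: the data move down one row and the index stays put,
# so the first row becomes NaN and the last value drops off.
by_position = df.shift(1)

# With freq: the index labels move forward one day and the data
# stay attached to their (shifted) labels.
by_freq = df.shift(1, freq='D')
```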
6324 |
6325 | skew(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
6326 | Return unbiased skew over requested axis.
6327 | Normalized by N-1.
6328 |
6329 | Parameters
6330 | ----------
6331 | axis : {index (0), columns (1)}
6332 | skipna : boolean, default True
6333 | Exclude NA/null values when computing the result.
6334 | level : int or level name, default None
6335 | If the axis is a MultiIndex (hierarchical), count along a
6336 | particular level, collapsing into a Series
6337 | numeric_only : boolean, default None
6338 | Include only float, int, boolean columns. If None, will attempt to use
6339 | everything, then use only numeric data. Not implemented for Series.
6340 |
6341 | Returns
6342 | -------
6343 | skew : Series or DataFrame (if level specified)
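A sketch of how the sign of ``skew`` reflects the shape of a column, using two invented columns: one symmetric about its mean and one with a long right tail.

```python
import pandas as pd

df = pd.DataFrame({'symmetric': [1.0, 2.0, 3.0, 4.0, 5.0],
                   'right_tail': [1.0, 1.0, 1.0, 2.0, 10.0]})

# One skewness value per column (axis=0 is the default for a DataFrame).
skewness = df.skew()
```

The symmetric column gives a skew of zero, while the column with one large value gives a positive skew.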
6344 |
6345 | sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None)
6346 | Sort object by labels (along an axis)
6347 |
6348 | Parameters
6349 | ----------
6350 | axis : index, columns to direct sorting
6351 | level : int or level name or list of ints or list of level names
6352 | if not None, sort on values in specified index level(s)
6353 | ascending : boolean, default True
6354 | Sort ascending vs. descending
6355 | inplace : bool, default False
6356 | if True, perform operation in-place
6357 | kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
6358 | Choice of sorting algorithm. See also numpy.sort for more
6359 | information. `mergesort` is the only stable algorithm. For
6360 | DataFrames, this option is only applied when sorting on a single
6361 | column or label.
6362 | na_position : {'first', 'last'}, default 'last'
6363 | `first` puts NaNs at the beginning, `last` puts NaNs at the end.
6364 | Not implemented for MultiIndex.
6365 | sort_remaining : bool, default True
6366 | if true and sorting by level and index is multilevel, sort by other
6367 | levels too (in order) after sorting by specified level
6368 |
6369 | Returns
6370 | -------
6371 | sorted_obj : DataFrame
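A minimal sketch of ``sort_index`` on row labels, with a made-up frame whose index is out of order:

```python
import pandas as pd

df = pd.DataFrame({'v': [1, 2, 3]}, index=['b', 'c', 'a'])

# Ascending by row label (the default).
ascending = df.sort_index()

# Descending by row label.
descending = df.sort_index(ascending=False)
```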
6372 |
6373 | sort_values(self, by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
6374 | Sort by the values along either axis
6375 |
6376 | Parameters
6377 | ----------
6378 | by : str or list of str
6379 | Name or list of names to sort by.
6380 |
6381 | - if `axis` is 0 or `'index'` then `by` may contain index
6382 | levels and/or column labels
6383 | - if `axis` is 1 or `'columns'` then `by` may contain column
6384 | levels and/or index labels
6385 |
6386 | .. versionchanged:: 0.23.0
6387 | Allow specifying index or column level names.
6388 | axis : {0 or 'index', 1 or 'columns'}, default 0
6389 | Axis to be sorted
6390 | ascending : bool or list of bool, default True
6391 | Sort ascending vs. descending. Specify list for multiple sort
6392 | orders. If this is a list of bools, it must match the length of
6393 | `by`.
6394 | inplace : bool, default False
6395 | if True, perform operation in-place
6396 | kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
6397 | Choice of sorting algorithm. See also numpy.sort for more
6398 | information. `mergesort` is the only stable algorithm. For
6399 | DataFrames, this option is only applied when sorting on a single
6400 | column or label.
6401 | na_position : {'first', 'last'}, default 'last'
6402 | `first` puts NaNs at the beginning, `last` puts NaNs at the end
6403 |
6404 | Returns
6405 | -------
6406 | sorted_obj : DataFrame
6407 |
6408 | Examples
6409 | --------
6410 | >>> df = pd.DataFrame({
6411 | ... 'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'],
6412 | ... 'col2' : [2, 1, 9, 8, 7, 4],
6413 | ... 'col3': [0, 1, 9, 4, 2, 3],
6414 | ... })
6415 | >>> df
6416 | col1 col2 col3
6417 | 0 A 2 0
6418 | 1 A 1 1
6419 | 2 B 9 9
6420 | 3 NaN 8 4
6421 | 4 D 7 2
6422 | 5 C 4 3
6423 |
6424 | Sort by col1
6425 |
6426 | >>> df.sort_values(by=['col1'])
6427 | col1 col2 col3
6428 | 0 A 2 0
6429 | 1 A 1 1
6430 | 2 B 9 9
6431 | 5 C 4 3
6432 | 4 D 7 2
6433 | 3 NaN 8 4
6434 |
6435 | Sort by multiple columns
6436 |
6437 | >>> df.sort_values(by=['col1', 'col2'])
6438 | col1 col2 col3
6439 | 1 A 1 1
6440 | 0 A 2 0
6441 | 2 B 9 9
6442 | 5 C 4 3
6443 | 4 D 7 2
6444 | 3 NaN 8 4
6445 |
6446 | Sort Descending
6447 |
6448 | >>> df.sort_values(by='col1', ascending=False)
6449 | col1 col2 col3
6450 | 4 D 7 2
6451 | 5 C 4 3
6452 | 2 B 9 9
6453 | 0 A 2 0
6454 | 1 A 1 1
6455 | 3 NaN 8 4
6456 |
6457 | Putting NAs first
6458 |
6459 | >>> df.sort_values(by='col1', ascending=False, na_position='first')
6460 | col1 col2 col3
6461 | 3 NaN 8 4
6462 | 4 D 7 2
6463 | 5 C 4 3
6464 | 2 B 9 9
6465 | 0 A 2 0
6466 | 1 A 1 1
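The ``ascending`` parameter also accepts a list, one entry per name in ``by``. A sketch with a made-up frame, sorting one column ascending and the next descending:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [1, 2, 3, 4]})

# col1 ascending, then col2 descending within each col1 group.
mixed = df.sort_values(by=['col1', 'col2'], ascending=[True, False])
```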
6467 |
6468 | sortlevel(self, level=0, axis=0, ascending=True, inplace=False, sort_remaining=True)
6469 | Sort multilevel index by chosen axis and primary level. Data will be
6470 | lexicographically sorted by the chosen level followed by the other
6471 | levels (in order).
6472 |
6473 | .. deprecated:: 0.20.0
6474 | Use :meth:`DataFrame.sort_index`
6475 |
6476 |
6477 | Parameters
6478 | ----------
6479 | level : int
6480 | axis : {0 or 'index', 1 or 'columns'}, default 0
6481 | ascending : boolean, default True
6482 | inplace : boolean, default False
6483 | Sort the DataFrame without creating a new instance
6484 | sort_remaining : boolean, default True
6485 | Sort by the other levels too.
6486 |
6487 | Returns
6488 | -------
6489 | sorted : DataFrame
6490 |
6491 | See Also
6492 | --------
6493 | DataFrame.sort_index(level=...)
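Because ``sortlevel`` is deprecated, here is a sketch of the ``sort_index(level=...)`` replacement. The two-level index and its level names ``outer`` and ``inner`` are invented for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('b', 2), ('a', 2), ('b', 1), ('a', 1)], names=['outer', 'inner'])
df = pd.DataFrame({'v': [1, 2, 3, 4]}, index=idx)

# Sort by the outer level first; sort_remaining=True (the default)
# then sorts by the inner level as well.
by_outer = df.sort_index(level='outer')

# Sort by the inner level first, then by the remaining outer level.
by_inner = df.sort_index(level='inner')
```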
6494 |
6495 | stack(self, level=-1, dropna=True)
6496 | Stack the prescribed level(s) from columns to index.
6497 |
6498 | Return a reshaped DataFrame or Series having a multi-level
6499 | index with one or more new inner-most levels compared to the current
6500 | DataFrame. The new inner-most levels are created by pivoting the
6501 | columns of the current dataframe:
6502 |
6503 | - if the columns have a single level, the output is a Series;
6504 | - if the columns have multiple levels, the new index
6505 | level(s) is (are) taken from the prescribed level(s) and
6506 | the output is a DataFrame.
6507 |
6508 | The new index levels are sorted.
6509 |
6510 | Parameters
6511 | ----------
6512 | level : int, str, list, default -1
6513 | Level(s) to stack from the column axis onto the index
6514 | axis, defined as one index or label, or a list of indices
6515 | or labels.
6516 | dropna : bool, default True
6517 | Whether to drop rows in the resulting Frame/Series with
6518 | missing values. Stacking a column level onto the index
6519 | axis can create combinations of index and column values
6520 | that are missing from the original dataframe. See Examples
6521 | section.
6522 |
6523 | Returns
6524 | -------
6525 | DataFrame or Series
6526 | Stacked dataframe or series.
6527 |
6528 | See Also
6529 | --------
6530 | DataFrame.unstack : Unstack prescribed level(s) from index axis
6531 | onto column axis.
6532 | DataFrame.pivot : Reshape dataframe from long format to wide
6533 | format.
6534 | DataFrame.pivot_table : Create a spreadsheet-style pivot table
6535 | as a DataFrame.
6536 |
6537 | Notes
6538 | -----
6539 | The function is named by analogy with a collection of books
6540 | being re-organised from being side by side on a horizontal
6541 | position (the columns of the dataframe) to being stacked
6542 | vertically on top of each other (in the index of the
6543 | dataframe).
6544 |
6545 | Examples
6546 | --------
6547 | **Single level columns**
6548 |
6549 | >>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
6550 | ... index=['cat', 'dog'],
6551 | ... columns=['weight', 'height'])
6552 |
6553 | Stacking a dataframe with a single level column axis returns a Series:
6554 |
6555 | >>> df_single_level_cols
6556 | weight height
6557 | cat 0 1
6558 | dog 2 3
6559 | >>> df_single_level_cols.stack()
6560 | cat weight 0
6561 | height 1
6562 | dog weight 2
6563 | height 3
6564 | dtype: int64
6565 |
6566 | **Multi level columns: simple case**
6567 |
6568 | >>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
6569 | ... ('weight', 'pounds')])
6570 | >>> df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
6571 | ... index=['cat', 'dog'],
6572 | ... columns=multicol1)
6573 |
6574 | Stacking a dataframe with a multi-level column axis:
6575 |
6576 | >>> df_multi_level_cols1
6577 | weight
6578 | kg pounds
6579 | cat 1 2
6580 | dog 2 4
6581 | >>> df_multi_level_cols1.stack()
6582 | weight
6583 | cat kg 1
6584 | pounds 2
6585 | dog kg 2
6586 | pounds 4
6587 |
6588 | **Missing values**
6589 |
6590 | >>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
6591 | ... ('height', 'm')])
6592 | >>> df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
6593 | ... index=['cat', 'dog'],
6594 | ... columns=multicol2)
6595 |
6596 | It is common to have missing values when stacking a dataframe
6597 | with multi-level columns, as the stacked dataframe typically
6598 | has more values than the original dataframe. Missing values
6599 | are filled with NaNs:
6600 |
6601 | >>> df_multi_level_cols2
6602 | weight height
6603 | kg m
6604 | cat 1.0 2.0
6605 | dog 3.0 4.0
6606 | >>> df_multi_level_cols2.stack()
6607 | height weight
6608 | cat kg NaN 1.0
6609 | m 2.0 NaN
6610 | dog kg NaN 3.0
6611 | m 4.0 NaN
6612 |
6613 | **Prescribing the level(s) to be stacked**
6614 |
6615 | The first parameter controls which level or levels are stacked:
6616 |
6617 | >>> df_multi_level_cols2.stack(0)
6618 | kg m
6619 | cat height NaN 2.0
6620 | weight 1.0 NaN
6621 | dog height NaN 4.0
6622 | weight 3.0 NaN
6623 | >>> df_multi_level_cols2.stack([0, 1])
6624 | cat height m 2.0
6625 | weight kg 1.0
6626 | dog height m 4.0
6627 | weight kg 3.0
6628 | dtype: float64
6629 |
6630 | **Dropping missing values**
6631 |
6632 | >>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
6633 | ... index=['cat', 'dog'],
6634 | ... columns=multicol2)
6635 |
6636 | Note that rows where all values are missing are dropped by
6637 | default but this behaviour can be controlled via the dropna
6638 | keyword parameter:
6639 |
6640 | >>> df_multi_level_cols3
6641 | weight height
6642 | kg m
6643 | cat NaN 1.0
6644 | dog 2.0 3.0
6645 | >>> df_multi_level_cols3.stack(dropna=False)
6646 | height weight
6647 | cat kg NaN NaN
6648 | m 1.0 NaN
6649 | dog kg NaN 2.0
6650 | m 3.0 NaN
6651 | >>> df_multi_level_cols3.stack(dropna=True)
6652 | height weight
6653 | cat m 1.0 NaN
6654 | dog kg NaN 2.0
6655 | m 3.0 NaN
6656 |
6657 | std(self, axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
6658 | Return sample standard deviation over requested axis.
6659 |
6660 | Normalized by N-1 by default. This can be changed using the ddof argument
6661 |
6662 | Parameters
6663 | ----------
6664 | axis : {index (0), columns (1)}
6665 | skipna : boolean, default True
6666 | Exclude NA/null values. If an entire row/column is NA, the result
6667 | will be NA
6668 | level : int or level name, default None
6669 | If the axis is a MultiIndex (hierarchical), count along a
6670 | particular level, collapsing into a Series
6671 | ddof : int, default 1
6672 | Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
6673 | where N represents the number of elements.
6674 | numeric_only : boolean, default None
6675 | Include only float, int, boolean columns. If None, will attempt to use
6676 | everything, then use only numeric data. Not implemented for Series.
6677 |
6678 | Returns
6679 | -------
6680 | std : Series or DataFrame (if level specified)
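A sketch of the effect of ``ddof``, comparing the default sample standard deviation with the population version on made-up data. ``numpy.std`` is used only as an independent cross-check.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0, 4.0]
df = pd.DataFrame({'a': values})

# Default ddof=1: divide by N - 1 (sample standard deviation).
sample_std = df.std()

# ddof=0: divide by N (population standard deviation).
population_std = df.std(ddof=0)
```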
6681 |
6682 | sub(self, other, axis='columns', level=None, fill_value=None)
6683 | Subtraction of dataframe and other, element-wise (binary operator `sub`).
6684 |
6685 | Equivalent to ``dataframe - other``, but with support to substitute a fill_value for
6686 | missing data in one of the inputs.
6687 |
6688 | Parameters
6689 | ----------
6690 | other : Series, DataFrame, or constant
6691 | axis : {0, 1, 'index', 'columns'}
6692 | For Series input, axis to match Series index on
6693 | level : int or name
6694 | Broadcast across a level, matching Index values on the
6695 | passed MultiIndex level
6696 | fill_value : None or float value, default None
6697 | Fill existing missing (NaN) values, and any new element needed for
6698 | successful DataFrame alignment, with this value before computation.
6699 | If data in both corresponding DataFrame locations is missing
6700 | the result will be missing
6701 |
6702 | Notes
6703 | -----
6704 | Mismatched indices will be unioned together
6705 |
6706 | Returns
6707 | -------
6708 | result : DataFrame
6709 |
6710 | Examples
6711 | --------
6712 |
6713 | >>> a = pd.DataFrame([2, 1, 1, np.nan], index=['a', 'b', 'c', 'd'],
6714 | ... columns=['one'])
6715 | >>> a
6716 | one
6717 | a 2.0
6718 | b 1.0
6719 | c 1.0
6720 | d NaN
6721 | >>> b = pd.DataFrame(dict(one=[1, np.nan, 1, np.nan],
6722 | ... two=[3, 2, np.nan, 2]),
6723 | ... index=['a', 'b', 'd', 'e'])
6724 | >>> b
6725 | one two
6726 | a 1.0 3.0
6727 | b NaN 2.0
6728 | d 1.0 NaN
6729 | e NaN 2.0
6730 | >>> a.sub(b, fill_value=0)
6731 | one two
6732 | a 1.0 -3.0
6733 | b 1.0 -2.0
6734 | c 1.0 NaN
6735 | d -1.0 NaN
6736 | e NaN -2.0
6737 |
6738 |
6739 | See also
6740 | --------
6741 | DataFrame.rsub
6742 |
6743 | subtract = sub(self, other, axis='columns', level=None, fill_value=None)
6744 |
6745 | sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
6746 | Return the sum of the values for the requested axis
6747 |
6748 | Parameters
6749 | ----------
6750 | axis : {index (0), columns (1)}
6751 | skipna : boolean, default True
6752 | Exclude NA/null values when computing the result.
6753 | level : int or level name, default None
6754 | If the axis is a MultiIndex (hierarchical), count along a
6755 | particular level, collapsing into a Series
6756 | numeric_only : boolean, default None
6757 | Include only float, int, boolean columns. If None, will attempt to use
6758 | everything, then use only numeric data. Not implemented for Series.
6759 | min_count : int, default 0
6760 | The required number of valid values to perform the operation. If fewer than
6761 | ``min_count`` non-NA values are present the result will be NA.
6762 |
6763 | .. versionadded :: 0.22.0
6764 |
6765 | Added with the default being 0. This means the sum of an all-NA
6766 | or empty Series is 0, and the product of an all-NA or empty
6767 | Series is 1.
6768 |
6769 | Returns
6770 | -------
6771 | sum : Series or DataFrame (if level specified)
6772 |
6773 | Examples
6774 | --------
6775 | By default, the sum of an empty or all-NA Series is ``0``.
6776 |
6777 | >>> pd.Series([]).sum() # min_count=0 is the default
6778 | 0.0
6779 |
6780 | This can be controlled with the ``min_count`` parameter. For example, if
6781 | you'd like the sum of an empty series to be NaN, pass ``min_count=1``.
6782 |
6783 | >>> pd.Series([]).sum(min_count=1)
6784 | nan
6785 |
6786 | Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
6787 | empty series identically.
6788 |
6789 | >>> pd.Series([np.nan]).sum()
6790 | 0.0
6791 |
6792 | >>> pd.Series([np.nan]).sum(min_count=1)
6793 | nan
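The examples above use Series. A sketch of the same ``min_count`` behaviour on a DataFrame, with a made-up frame whose second row is entirely missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': [2.0, np.nan]})

# Column totals (axis=0 is the default for a DataFrame); NaN is skipped.
column_totals = df.sum()

# Row totals: the all-NA row sums to 0 with the default min_count=0.
row_totals = df.sum(axis=1)

# Requiring at least one valid value turns that row's total into NaN.
row_totals_strict = df.sum(axis=1, min_count=1)
```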
6794 |
6795 | swaplevel(self, i=-2, j=-1, axis=0)
6796 | Swap levels i and j in a MultiIndex on a particular axis
6797 |
6798 | Parameters
6799 | ----------
6800 | i, j : int, string (can be mixed)
6801 | Level of index to be swapped. Can pass level name as string.
6802 |
6803 | Returns
6804 | -------
6805 | swapped : type of caller (new object)
6806 |
6807 | .. versionchanged:: 0.18.1
6808 |
6809 | The indexes ``i`` and ``j`` are now optional, and default to
6810 | the two innermost levels of the index.
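As a rough illustration of the defaults described above, the following sketch swaps the two levels of a small MultiIndex. The level names ``letter`` and ``number`` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical two-level index.
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["letter", "number"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# With no arguments, the two innermost levels (i=-2, j=-1) are swapped.
swapped = df.swaplevel()

# Levels can also be given by name; here this is the same swap.
by_name = df.swaplevel("letter", "number")
```

The caller is left unchanged; ``swapped`` is a new object whose index tuples are reordered, and the rows themselves are not re-sorted.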
6811 |
6812 | to_csv(self, path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')
6813 | Write DataFrame to a comma-separated values (csv) file
6814 |
6815 | Parameters
6816 | ----------
6817 | path_or_buf : string or file handle, default None
6818 | File path or object. If None is provided, the result is returned as
6819 | a string.
6820 | sep : character, default ','
6821 | Field delimiter for the output file.
6822 | na_rep : string, default ''
6823 | Missing data representation
6824 | float_format : string, default None
6825 | Format string for floating point numbers
6826 | columns : sequence, optional
6827 | Columns to write
6828 | header : boolean or list of string, default True
6829 | Write out the column names. If a list of strings is given it is
6830 | assumed to be aliases for the column names
6831 | index : boolean, default True
6832 | Write row names (index)
6833 | index_label : string or sequence, or False, default None
6834 | Column label for index column(s) if desired. If None is given, and
6835 | `header` and `index` are True, then the index names are used. A
6836 | sequence should be given if the DataFrame uses MultiIndex. If
6837 | False do not print fields for index names. Use index_label=False
6838 | for easier importing in R
6839 | mode : str
6840 | Python write mode, default 'w'
6841 | encoding : string, optional
6842 | A string representing the encoding to use in the output file,
6843 | defaults to 'ascii' on Python 2 and 'utf-8' on Python 3.
6844 | compression : string, optional
6845 | A string representing the compression to use in the output file.
6846 | Allowed values are 'gzip', 'bz2', 'zip', 'xz'. This input is only
6847 | used when the first argument is a filename.
6848 | line_terminator : string, default ``'\n'``
6849 | The newline character or character sequence to use in the output
6850 | file
6851 | quoting : optional constant from csv module
6852 | defaults to csv.QUOTE_MINIMAL. If you have set a `float_format`
6853 | then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
6854 | will treat them as non-numeric
6855 | quotechar : string (length 1), default '"'
6856 | character used to quote fields
6857 | doublequote : boolean, default True
6858 | Control quoting of `quotechar` inside a field
6859 | escapechar : string (length 1), default None
6860 | character used to escape `sep` and `quotechar` when appropriate
6861 | chunksize : int or None
6862 | rows to write at a time
6863 | tupleize_cols : boolean, default False
6864 | .. deprecated:: 0.21.0
6865 | This argument will be removed and will always write each row
6866 | of the multi-index as a separate row in the CSV file.
6867 |
6868 | Write MultiIndex columns as a list of tuples (if True) or in
6869 | the new, expanded format, where each MultiIndex column is a row
6870 | in the CSV (if False).
6871 | date_format : string, default None
6872 | Format string for datetime objects
6873 | decimal : string, default '.'
6874 | Character recognized as decimal separator. E.g. use ',' for
6875 | European data
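A short sketch of how a few of these options combine when no path is given, so that the CSV text is returned as a string. The frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, np.nan]})

# path_or_buf=None (the default) returns the CSV as a string.
# sep changes the delimiter, na_rep replaces missing values, and
# float_format is applied to floating point numbers.
csv_text = df.to_csv(sep=";", na_rep="NA", index=False, float_format="%.2f")

# European-style output: ';' as the field delimiter, ',' as the decimal mark.
euro_text = df.to_csv(sep=";", decimal=",", index=False)
```

Using a delimiter other than ``','`` together with ``decimal=','`` keeps the decimal comma from colliding with the field separator.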
6876 |
6877 | to_dict(self, orient='dict', into=<class 'dict'>)
6878 | Convert the DataFrame to a dictionary.
6879 |
6880 | The type of the key-value pairs can be customized with the parameters
6881 | (see below).
6882 |
6883 | Parameters
6884 | ----------
6885 | orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
6886 | Determines the type of the values of the dictionary.
6887 |
6888 | - 'dict' (default) : dict like {column -> {index -> value}}
6889 | - 'list' : dict like {column -> [values]}
6890 | - 'series' : dict like {column -> Series(values)}
6891 | - 'split' : dict like
6892 | {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
6893 | - 'records' : list like
6894 | [{column -> value}, ... , {column -> value}]
6895 | - 'index' : dict like {index -> {column -> value}}
6896 |
6897 | Abbreviations are allowed. `s` indicates `series` and `sp`
6898 | indicates `split`.
6899 |
6900 | into : class, default dict
6901 | The collections.Mapping subclass used for all Mappings
6902 | in the return value. Can be the actual class or an empty
6903 | instance of the mapping type you want. If you want a
6904 | collections.defaultdict, you must pass it initialized.
6905 |
6906 | .. versionadded:: 0.21.0
6907 |
6908 | Returns
6909 | -------
6910 | result : collections.Mapping like {column -> {index -> value}}
6911 |
6912 | See Also
6913 | --------
6914 | DataFrame.from_dict: create a DataFrame from a dictionary
6915 | DataFrame.to_json: convert a DataFrame to JSON format
6916 |
6917 | Examples
6918 | --------
6919 | >>> df = pd.DataFrame({'col1': [1, 2],
6920 | ... 'col2': [0.5, 0.75]},
6921 | ... index=['a', 'b'])
6922 | >>> df
6923 | col1 col2
6924 | a 1 0.50
6925 | b 2 0.75
6926 | >>> df.to_dict()
6927 | {'col1': {'a': 1, 'b': 2}, 'col2': {'a': 0.5, 'b': 0.75}}
6928 |
6929 | You can specify the return orientation.
6930 |
6931 | >>> df.to_dict('series')
6932 | {'col1': a 1
6933 | b 2
6934 | Name: col1, dtype: int64,
6935 | 'col2': a 0.50
6936 | b 0.75
6937 | Name: col2, dtype: float64}
6938 |
6939 | >>> df.to_dict('split')
6940 | {'index': ['a', 'b'], 'columns': ['col1', 'col2'],
6941 | 'data': [[1.0, 0.5], [2.0, 0.75]]}
6942 |
6943 | >>> df.to_dict('records')
6944 | [{'col1': 1.0, 'col2': 0.5}, {'col1': 2.0, 'col2': 0.75}]
6945 |
6946 | >>> df.to_dict('index')
6947 | {'a': {'col1': 1.0, 'col2': 0.5}, 'b': {'col1': 2.0, 'col2': 0.75}}
6948 |
6949 | You can also specify the mapping type.
6950 |
6951 | >>> from collections import OrderedDict, defaultdict
6952 | >>> df.to_dict(into=OrderedDict)
6953 | OrderedDict([('col1', OrderedDict([('a', 1), ('b', 2)])),
6954 | ('col2', OrderedDict([('a', 0.5), ('b', 0.75)]))])
6955 |
6956 | If you want a `defaultdict`, you need to initialize it:
6957 |
6958 | >>> dd = defaultdict(list)
6959 | >>> df.to_dict('records', into=dd)
6960 | [defaultdict(<class 'list'>, {'col1': 1.0, 'col2': 0.5}),
6961 | defaultdict(<class 'list'>, {'col1': 2.0, 'col2': 0.75})]
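The ``'list'`` orientation is not shown above. A minimal sketch, reusing the same two-column frame as in the examples:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 0.75]}, index=["a", "b"])

# 'list' maps each column name to a plain list of its values;
# the index labels are dropped.
as_lists = df.to_dict(orient="list")
```

This form is convenient when the index carries no information and only the column values are needed.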
6962 |
6963 | to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)
6964 | Write DataFrame to an Excel sheet.
6965 |
6966 |
6967 | Parameters
6968 | ----------
6969 | excel_writer : string or ExcelWriter object
6970 | File path or existing ExcelWriter
6971 | sheet_name : string, default 'Sheet1'
6972 | Name of sheet which will contain DataFrame
6973 | na_rep : string, default ''
6974 | Missing data representation
6975 | float_format : string, default None
6976 | Format string for floating point numbers
6977 | columns : sequence, optional
6978 | Columns to write
6979 | header : boolean or list of string, default True
6980 | Write out the column names. If a list of strings is given it is
6981 | assumed to be aliases for the column names
6982 | index : boolean, default True
6983 | Write row names (index)
6984 | index_label : string or sequence, default None
6985 | Column label for index column(s) if desired. If None is given, and
6986 | `header` and `index` are True, then the index names are used. A
6987 | sequence should be given if the DataFrame uses MultiIndex.
6988 | startrow : int, default 0
6989 | Upper left cell row at which to write the DataFrame
6990 | startcol : int, default 0
6991 | Upper left cell column at which to write the DataFrame
6992 | engine : string, default None
6993 | write engine to use - you can also set this via the options
6994 | ``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
6995 | ``io.excel.xlsm.writer``.
6996 | merge_cells : boolean, default True
6997 | Write MultiIndex and Hierarchical Rows as merged cells.
6998 | encoding : string, default None
6999 | encoding of the resulting excel file. Only necessary for xlwt,
7000 | other writers support unicode natively.
7001 | inf_rep : string, default 'inf'
7002 | Representation for infinity (there is no native representation for
7003 | infinity in Excel)
7004 | freeze_panes : tuple of integer (length 2), default None
7005 | Specifies the one-based bottommost row and rightmost column that
7006 | is to be frozen
7007 |
7008 | .. versionadded:: 0.20.0
7009 |
7010 | Notes
7011 | -----
7012 | If passing an existing ExcelWriter object, then the sheet will be added
7013 | to the existing workbook. This can be used to save different
7014 | DataFrames to one workbook:
7015 |
7016 | >>> writer = pd.ExcelWriter('output.xlsx')
7017 | >>> df1.to_excel(writer,'Sheet1')
7018 | >>> df2.to_excel(writer,'Sheet2')
7019 | >>> writer.save()
7020 |
7021 | For compatibility with to_csv, to_excel serializes lists and dicts to
7022 | strings before writing.
7023 |
7024 | to_feather(self, fname)
7025 | Write out the binary feather-format for DataFrames.
7026 |
7027 | .. versionadded:: 0.20.0
7028 |
7029 | Parameters
7030 | ----------
7031 | fname : str
7032 | string file path
7033 |
7034 | to_gbq(self, destination_table, project_id, chunksize=None, verbose=None, reauth=False, if_exists='fail', private_key=None, auth_local_webserver=False, table_schema=None)
7035 | Write a DataFrame to a Google BigQuery table.
7036 |
7037 | This function requires the `pandas-gbq package
7038 | <https://pandas-gbq.readthedocs.io>`__.
7039 |
7040 | Authentication to the Google BigQuery service is via OAuth 2.0.
7041 |
7042 | - If ``private_key`` is provided, the library loads the JSON service
7043 | account credentials and uses those to authenticate.
7044 |
7045 | - If no ``private_key`` is provided, the library tries `application
7046 | default credentials`_.
7047 |
7048 | .. _application default credentials:
7049 | https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application
7050 |
7051 | - If application default credentials are not found or cannot be used
7052 | with BigQuery, the library authenticates with user account
7053 | credentials. In this case, you will be asked to grant permissions
7054 | for product name 'pandas GBQ'.
7055 |
7056 | Parameters
7057 | ----------
7058 | destination_table : str
7059 | Name of table to be written, in the form 'dataset.tablename'.
7060 | project_id : str
7061 | Google BigQuery Account project ID.
7062 | chunksize : int, optional
7063 | Number of rows to be inserted in each chunk from the dataframe.
7064 | Set to ``None`` to load the whole dataframe at once.
7065 | reauth : bool, default False
7066 | Force Google BigQuery to reauthenticate the user. This is useful
7067 | if multiple accounts are used.
7068 | if_exists : str, default 'fail'
7069 | Behavior when the destination table exists. Value can be one of:
7070 |
7071 | ``'fail'``
7072 | If table exists, raise an error (``pandas_gbq.gbq.TableCreationError``).
7073 | ``'replace'``
7074 | If table exists, drop it, recreate it, and insert data.
7075 | ``'append'``
7076 | If table exists, insert data. Create if does not exist.
7077 | private_key : str, optional
7078 | Service account private key in JSON format. Can be file path
7079 | or string contents. This is useful for remote server
7080 | authentication (e.g., Jupyter/IPython notebook on remote host).
7081 | auth_local_webserver : bool, default False
7082 | Use the `local webserver flow`_ instead of the `console flow`_
7083 | when getting user credentials.
7084 |
7085 | .. _local webserver flow:
7086 | http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server
7087 | .. _console flow:
7088 | http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_console
7089 |
7090 | *New in version 0.2.0 of pandas-gbq*.
7091 | table_schema : list of dicts, optional
7092 | List of BigQuery table fields to which the corresponding DataFrame
7093 | columns conform, e.g. ``[{'name': 'col1', 'type':
7094 | 'STRING'},...]``. If schema is not provided, it will be
7095 | generated according to dtypes of DataFrame columns. See
7096 | BigQuery API documentation on available names of a field.
7097 |
7098 | *New in version 0.3.1 of pandas-gbq*.
7099 | verbose : boolean, deprecated
7100 | *Deprecated in Pandas-GBQ 0.4.0.* Use the `logging module
7101 | to adjust verbosity instead
7102 | <https://pandas-gbq.readthedocs.io/en/latest/intro.html#logging>`__.
7103 |
7104 | See Also
7105 | --------
7106 | pandas_gbq.to_gbq : This function in the pandas-gbq library.
7107 | pandas.read_gbq : Read a DataFrame from Google BigQuery.
7108 |
7109 | to_html(self, buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False, notebook=False, decimal='.', border=None, table_id=None)
7110 | Render a DataFrame as an HTML table.
7111 |
7112 | `to_html`-specific options:
7113 |
7114 | bold_rows : boolean, default True
7115 | Make the row labels bold in the output
7116 | classes : str or list or tuple, default None
7117 | CSS class(es) to apply to the resulting html table
7118 | escape : boolean, default True
7119 | Convert the characters <, >, and & to HTML-safe sequences.
7120 | max_rows : int, optional
7121 | Maximum number of rows to show before truncating. If None, show
7122 | all.
7123 | max_cols : int, optional
7124 | Maximum number of columns to show before truncating. If None, show
7125 | all.
7126 | decimal : string, default '.'
7127 | Character recognized as decimal separator, e.g. ',' in Europe
7128 |
7129 | .. versionadded:: 0.18.0
7130 |
7131 | border : int
7132 | A ``border=border`` attribute is included in the opening
7133 | `<table>` tag. Default ``pd.options.html.border``.
7134 |
7135 | .. versionadded:: 0.19.0
7136 |
7137 | table_id : str, optional
7138 | A CSS id is included in the opening `<table>` tag if specified.
7139 |
7140 | .. versionadded:: 0.23.0
7141 |
7142 |
7143 | Parameters
7144 | ----------
7145 | buf : StringIO-like, optional
7146 | buffer to write to
7147 | columns : sequence, optional
7148 | the subset of columns to write; default None writes all columns
7149 | col_space : int, optional
7150 | the minimum width of each column
7151 | header : bool, optional
7152 | whether to print column labels, default True
7153 | index : bool, optional
7154 | whether to print index (row) labels, default True
7155 | na_rep : string, optional
7156 | string representation of NaN to use, default 'NaN'
7157 | formatters : list or dict of one-parameter functions, optional
7158 | formatter functions to apply to columns' elements by position or name,
7159 | default None. The result of each function must be a unicode string.
7160 | List must be of length equal to the number of columns.
7161 | float_format : one-parameter function, optional
7162 | formatter function to apply to columns' elements if they are floats,
7163 | default None. The result of this function must be a unicode string.
7164 | sparsify : bool, optional
7165 | Set to False for a DataFrame with a hierarchical index to print every
7166 | multiindex key at each row, default True
7167 | index_names : bool, optional
7168 | Prints the names of the indexes, default True
7169 | line_width : int, optional
7170 | Width to wrap a line in characters, default no wrap
7171 | table_id : str, optional
7172 | id for the <table> element created by to_html
7173 |
7174 | .. versionadded:: 0.23.0
7175 | justify : str, default None
7176 | How to justify the column labels. If None uses the option from
7177 | the print configuration (controlled by set_option), 'right' out
7178 | of the box. Valid values are
7179 |
7180 | * left
7181 | * right
7182 | * center
7183 | * justify
7184 | * justify-all
7185 | * start
7186 | * end
7187 | * inherit
7188 | * match-parent
7189 | * initial
7190 | * unset
7191 |
7192 |
7193 | Returns
7194 | -------
7195 | formatted : string (or unicode, depending on data and options)
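The following sketch illustrates ``escape``, ``classes`` and ``table_id`` on a one-row frame. The class name ``report`` and the id ``scores`` are arbitrary values chosen for this example.

```python
import pandas as pd

# Hypothetical frame whose cell contains HTML-special characters.
df = pd.DataFrame({"tag": ["<b>"], "n": [1]})

# escape=True (the default) converts <, > and & to HTML-safe sequences.
# classes adds CSS classes next to the built-in "dataframe" class, and
# table_id sets the id attribute of the <table> tag.
html = df.to_html(index=False, classes="report", table_id="scores")

# escape=False leaves the cell content untouched.
raw = df.to_html(index=False, escape=False)
```

Leaving ``escape`` at its default is the safer choice when cell values come from untrusted input.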
7196 |
7197 | to_panel(self)
7198 | Transform long (stacked) format (DataFrame) into wide (3D, Panel)
7199 | format.
7200 |
7201 | .. deprecated:: 0.20.0
7202 |
7203 | Currently the index of the DataFrame must be a 2-level MultiIndex. This
7204 | may be generalized later
7205 |
7206 | Returns
7207 | -------
7208 | panel : Panel
7209 |
7210 | to_parquet(self, fname, engine='auto', compression='snappy', **kwargs)
7211 | Write a DataFrame to the binary parquet format.
7212 |
7213 | .. versionadded:: 0.21.0
7214 |
7215 | This function writes the dataframe as a `parquet file
7216 | <https://parquet.apache.org/>`_. You can choose different parquet
7217 | backends, and have the option of compression. See
7218 | :ref:`the user guide <io.parquet>` for more details.
7219 |
7220 | Parameters
7221 | ----------
7222 | fname : str
7223 | String file path.
7224 | engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
7225 | Parquet library to use. If 'auto', then the option
7226 | ``io.parquet.engine`` is used. The default ``io.parquet.engine``
7227 | behavior is to try 'pyarrow', falling back to 'fastparquet' if
7228 | 'pyarrow' is unavailable.
7229 | compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
7230 | Name of the compression to use. Use ``None`` for no compression.
7231 | **kwargs
7232 | Additional arguments passed to the parquet library. See
7233 | :ref:`pandas io <io.parquet>` for more details.
7234 |
7235 | See Also
7236 | --------
7237 | read_parquet : Read a parquet file.
7238 | DataFrame.to_csv : Write a csv file.
7239 | DataFrame.to_sql : Write to a sql table.
7240 | DataFrame.to_hdf : Write to hdf.
7241 |
7242 | Notes
7243 | -----
7244 | This function requires either the `fastparquet
7245 | <https://pypi.org/project/fastparquet>`_ or `pyarrow
7246 | <https://arrow.apache.org/docs/python/>`_ library.
7247 |
7248 | Examples
7249 | --------
7250 | >>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
7251 | >>> df.to_parquet('df.parquet.gzip', compression='gzip')
7252 | >>> pd.read_parquet('df.parquet.gzip')
7253 | col1 col2
7254 | 0 1 3
7255 | 1 2 4
7256 |
7257 | to_period(self, freq=None, axis=0, copy=True)
7258 | Convert DataFrame from DatetimeIndex to PeriodIndex with desired
7259 | frequency (inferred from index if not passed)
7260 |
7261 | Parameters
7262 | ----------
7263 | freq : string, default None
7264 | axis : {0 or 'index', 1 or 'columns'}, default 0
7265 | The axis to convert (the index by default)
7266 | copy : boolean, default True
7267 | If False then underlying input data is not copied
7268 |
7269 | Returns
7270 | -------
7271 | ts : DataFrame with PeriodIndex
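A minimal sketch of the conversion, assuming a daily DatetimeIndex that spans a month boundary (the ``sales`` column is illustrative):

```python
import pandas as pd

# Hypothetical daily data: 31 January, 1 February, 2 February 2018.
idx = pd.date_range("2018-01-31", periods=3, freq="D")
df = pd.DataFrame({"sales": [10, 20, 30]}, index=idx)

# Explicit frequency: each timestamp is mapped to its calendar month.
monthly = df.to_period("M")

# No frequency passed: it is inferred from the index (daily here).
daily = df.to_period()
```

Several timestamps can map to the same period, so the resulting PeriodIndex may contain duplicates, as ``monthly`` does here.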
7272 |
7273 | to_records(self, index=True, convert_datetime64=None)
7274 | Convert DataFrame to a NumPy record array.
7275 |
7276 | Index will be put in the 'index' field of the record array if
7277 | requested.
7278 |
7279 | Parameters
7280 | ----------
7281 | index : boolean, default True
7282 | Include index in resulting record array, stored in 'index' field.
7283 | convert_datetime64 : boolean, default None
7284 | .. deprecated:: 0.23.0
7285 |
7286 | Whether to convert the index to datetime.datetime if it is a
7287 | DatetimeIndex.
7288 |
7289 | Returns
7290 | -------
7291 | y : numpy.recarray
7292 |
7293 | See Also
7294 | --------
7295 | DataFrame.from_records: convert structured or record ndarray
7296 | to DataFrame.
7297 | numpy.recarray: ndarray that allows field access using
7298 | attributes, analogous to typed columns in a
7299 | spreadsheet.
7300 |
7301 | Examples
7302 | --------
7303 | >>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
7304 | ... index=['a', 'b'])
7305 | >>> df
7306 | A B
7307 | a 1 0.50
7308 | b 2 0.75
7309 | >>> df.to_records()
7310 | rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
7311 | dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])
7312 |
7313 | The index can be excluded from the record array:
7314 |
7315 | >>> df.to_records(index=False)
7316 | rec.array([(1, 0.5 ), (2, 0.75)],
7317 | dtype=[('A', '<i8'), ('B', '<f8')])
7318 |
7319 | By default, timestamps are converted to `datetime.datetime`:
7320 |
7321 | >>> df.index = pd.date_range('2018-01-01 09:00', periods=2, freq='min')
7322 | >>> df
7323 | A B
7324 | 2018-01-01 09:00:00 1 0.50
7325 | 2018-01-01 09:01:00 2 0.75
7326 | >>> df.to_records()
7327 | rec.array([(datetime.datetime(2018, 1, 1, 9, 0), 1, 0.5 ),
7328 | (datetime.datetime(2018, 1, 1, 9, 1), 2, 0.75)],
7329 | dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])
7330 |
7331 | The timestamp conversion can be disabled so NumPy's datetime64
7332 | data type is used instead:
7333 |
7334 | >>> df.to_records(convert_datetime64=False)
7335 | rec.array([('2018-01-01T09:00:00.000000000', 1, 0.5 ),
7336 | ('2018-01-01T09:01:00.000000000', 2, 0.75)],
7337 | dtype=[('index', '<M8[ns]'), ('A', '<i8'), ('B', '<f8')])
7338 |
7339 | to_sparse(self, fill_value=None, kind='block')
7340 | Convert to SparseDataFrame
7341 |
7342 | Parameters
7343 | ----------
7344 | fill_value : float, default NaN
7345 | kind : {'block', 'integer'}
7346 |
7347 | Returns
7348 | -------
7349 | y : SparseDataFrame
7350 |
7351 | to_stata(self, fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None, time_stamp=None, data_label=None, variable_labels=None, version=114, convert_strl=None)
7352 | Export Stata binary dta files.
7353 |
7354 | Parameters
7355 | ----------
7356 | fname : path (string), buffer or path object
7357 | string, path object (pathlib.Path or py._path.local.LocalPath) or
7358 | object implementing a binary write() function. If using a buffer
7359 | then the buffer will not be automatically closed after the file
7360 | data has been written.
7361 | convert_dates : dict
7362 | Dictionary mapping columns containing datetime types to stata
7363 | internal format to use when writing the dates. Options are 'tc',
7364 | 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer
7365 | or a name. Datetime columns that do not have a conversion type
7366 | specified will be converted to 'tc'. Raises NotImplementedError if
7367 | a datetime column has timezone information.
7368 | write_index : bool
7369 | Write the index to Stata dataset.
7370 | encoding : str
7371 | Default is latin-1. Unicode is not supported.
7372 | byteorder : str
7373 | Can be ">", "<", "little", or "big". default is `sys.byteorder`.
7374 | time_stamp : datetime
7375 | A datetime to use as file creation date. Default is the current
7376 | time.
7377 | data_label : str
7378 | A label for the data set. Must be 80 characters or smaller.
7379 | variable_labels : dict
7380 | Dictionary containing columns as keys and variable labels as
7381 | values. Each label must be 80 characters or smaller.
7382 |
7383 | .. versionadded:: 0.19.0
7384 |
7385 | version : {114, 117}
7386 | Version to use in the output dta file. Version 114 can be
7387 | read by Stata 10 and later. Version 117 can be read by Stata 13
7388 | or later. Version 114 limits string variables to 244 characters or
7389 | fewer while 117 allows strings with lengths up to 2,000,000
7390 | characters.
7391 |
7392 | .. versionadded:: 0.23.0
7393 |
7394 | convert_strl : list, optional
7395 | List of names of string columns to convert to Stata StrL
7396 | format. Only available if version is 117. Storing strings in the
7397 | StrL format can produce smaller dta files if strings have more than
7398 | 8 characters and values are repeated.
7399 |
7400 | .. versionadded:: 0.23.0
7401 |
7402 | Raises
7403 | ------
7404 | NotImplementedError
7405 | * If datetimes contain timezone information
7406 | * Column dtype is not representable in Stata
7407 | ValueError
7408 | * Columns listed in convert_dates are neither datetime64[ns]
7409 | nor datetime.datetime
7410 | * Column listed in convert_dates is not in DataFrame
7411 | * Categorical label contains more than 32,000 characters
7412 |
7413 | .. versionadded:: 0.19.0
7414 |
7415 | See Also
7416 | --------
7417 | pandas.read_stata : Import Stata data files
7418 | pandas.io.stata.StataWriter : low-level writer for Stata data files
7419 | pandas.io.stata.StataWriter117 : low-level writer for version 117 files
7420 |
7421 | Examples
7422 | --------
7423 | >>> data.to_stata('./data_file.dta')
7424 |
7425 | Or with dates
7426 |
7427 | >>> data.to_stata('./date_data_file.dta', {2 : 'tw'})
7428 |
7429 | Alternatively you can create an instance of the StataWriter class
7430 |
7431 | >>> writer = StataWriter('./data_file.dta', data)
7432 | >>> writer.write_file()
7433 |
7434 | With dates:
7435 |
7436 | >>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
7437 | >>> writer.write_file()
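The examples above assume an existing ``data`` frame. A self-contained round trip might look like the sketch below; the frame, the label texts and the temporary file name are all made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data to export.
df = pd.DataFrame({"animal": ["falcon", "parrot"], "speed": [350.0, 18.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "animals.dta")
    # write_index=False keeps the default integer index out of the file;
    # data_label and variable_labels must each be 80 characters or fewer.
    df.to_stata(
        path,
        write_index=False,
        data_label="Demo data",
        variable_labels={"speed": "Top speed"},
    )
    # Read the file back to check what was written.
    restored = pd.read_stata(path)
```

Reading the file back with ``pandas.read_stata`` is a quick way to confirm which columns and values were actually stored.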
7438 |
7439 | to_string(self, buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)
7440 | Render a DataFrame to a console-friendly tabular output.
7441 |
7442 | Parameters
7443 | ----------
7444 | buf : StringIO-like, optional
7445 | buffer to write to
7446 | columns : sequence, optional
7447 | the subset of columns to write; default None writes all columns
7448 | col_space : int, optional
7449 | the minimum width of each column
7450 | header : bool, optional
7451 | Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names
7452 | index : bool, optional
7453 | whether to print index (row) labels, default True
7454 | na_rep : string, optional
7455 | string representation of NaN to use, default 'NaN'
7456 | formatters : list or dict of one-parameter functions, optional
7457 | formatter functions to apply to columns' elements by position or name,
7458 | default None. The result of each function must be a unicode string.
7459 | List must be of length equal to the number of columns.
7460 | float_format : one-parameter function, optional
7461 | formatter function to apply to columns' elements if they are floats,
7462 | default None. The result of this function must be a unicode string.
7463 | sparsify : bool, optional
7464 | Set to False for a DataFrame with a hierarchical index to print every
7465 | multiindex key at each row, default True
7466 | index_names : bool, optional
7467 | Prints the names of the indexes, default True
7468 | line_width : int, optional
7469 | Width to wrap a line in characters, default no wrap
7470 | table_id : str, optional
7471 | id for the <table> element created by to_html
7472 |
7473 | .. versionadded:: 0.23.0
7474 | justify : str, default None
7475 | How to justify the column labels. If None uses the option from
7476 | the print configuration (controlled by set_option), 'right' out
7477 | of the box. Valid values are
7478 |
7479 | * left
7480 | * right
7481 | * center
7482 | * justify
7483 | * justify-all
7484 | * start
7485 | * end
7486 | * inherit
7487 | * match-parent
7488 | * initial
7489 | * unset
7490 |
7491 |
7492 | Returns
7493 | -------
7494 | formatted : string (or unicode, depending on data and options)
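A small sketch of ``formatters`` and ``float_format`` working together; the frame and the chosen formatting functions are assumptions for this example.

```python
import pandas as pd

# Hypothetical price list.
df = pd.DataFrame({"item": ["pen", "ink"], "price": [1.5, 12.0]})

# formatters maps a column name to a one-parameter function applied to
# each element; float_format is applied to float columns.
text = df.to_string(
    index=False,
    formatters={"item": str.upper},
    float_format="{:.2f}".format,
)
```

With no ``buf`` given, the rendered table is returned as a string, which makes it easy to log or embed in a plain-text report.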
7495 |
7496 | to_timestamp(self, freq=None, how='start', axis=0, copy=True)
7497 | Cast to DatetimeIndex of timestamps, at *beginning* of period
7498 |
7499 | Parameters
7500 | ----------
7501 | freq : string, default frequency of PeriodIndex
7502 | Desired frequency
7503 | how : {'s', 'e', 'start', 'end'}
7504 | Convention for converting period to timestamp; start of period
7505 | vs. end
7506 | axis : {0 or 'index', 1 or 'columns'}, default 0
7507 | The axis to convert (the index by default)
7508 | copy : boolean, default True
7509 | If False then underlying input data is not copied
7510 |
7511 | Returns
7512 | -------
7513 | df : DataFrame with DatetimeIndex
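To illustrate the ``how`` convention, this sketch converts a monthly PeriodIndex both ways. The data is hypothetical, and the exact time of day returned for ``how='end'`` may differ between pandas versions, so only the date is checked.

```python
import pandas as pd

# Hypothetical monthly data indexed by period.
periods = pd.period_range("2018-01", periods=3, freq="M")
df = pd.DataFrame({"sales": [10, 20, 30]}, index=periods)

# how='start' (the default) uses the first instant of each period.
starts = df.to_timestamp()

# how='end' uses the end of each period instead.
ends = df.to_timestamp(how="end")

# Normalizing drops the time of day and leaves the last day of January.
first_end_date = ends.index[0].normalize()
```

The values are unchanged; only the index type switches from PeriodIndex to DatetimeIndex.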
7514 |
7515 | transform(self, func, *args, **kwargs)
7516 | Call function producing a like-indexed NDFrame
7517 | and return a NDFrame with the transformed values
7518 |
7519 | .. versionadded:: 0.20.0
7520 |
7521 | Parameters
7522 | ----------
7523 | func : callable, string, dictionary, or list of string/callables
7524 | To apply to column
7525 |
7526 | Accepted Combinations are:
7527 |
7528 | - string function name
7529 | - function
7530 | - list of functions
7531 | - dict of column names -> functions (or list of functions)
7532 |
7533 | Returns
7534 | -------
7535 | transformed : NDFrame
7536 |
7537 | Examples
7538 | --------
7539 | >>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
7540 | ... index=pd.date_range('1/1/2000', periods=10))
7541 | >>> df.iloc[3:7] = np.nan
7542 |
7543 | >>> df.transform(lambda x: (x - x.mean()) / x.std())
7544 | A B C
7545 | 2000-01-01 0.579457 1.236184 0.123424
7546 | 2000-01-02 0.370357 -0.605875 -1.231325
7547 | 2000-01-03 1.455756 -0.277446 0.288967
7548 | 2000-01-04 NaN NaN NaN
7549 | 2000-01-05 NaN NaN NaN
7550 | 2000-01-06 NaN NaN NaN
7551 | 2000-01-07 NaN NaN NaN
7552 | 2000-01-08 -0.498658 1.274522 1.642524
7553 | 2000-01-09 -0.540524 -1.012676 -0.828968
7554 | 2000-01-10 -1.366388 -0.614710 0.005378
7555 |
7556 | See Also
7557 | --------
7558 | pandas.NDFrame.aggregate
7559 | pandas.NDFrame.apply
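The accepted combinations listed under ``func`` can be sketched as follows, assuming a small numeric frame (the column names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 4.0, 9.0], "B": [16.0, 25.0, 36.0]})

# String function name: resolved to the method of the same name.
running = df.transform("cumsum")

# List of functions: the result gets one sub-column per function.
both = df.transform([np.sqrt, np.exp])

# Dict of column names -> functions: each column gets its own function.
per_col = df.transform({"A": np.sqrt, "B": lambda x: x + 1})
```

In every case the result keeps the index of the input, which is what distinguishes ``transform`` from an aggregation.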
7560 |
7561 | transpose(self, *args, **kwargs)
7562 | Transpose index and columns.
7563 |
7564 | Reflect the DataFrame over its main diagonal by writing rows as columns
7565 | and vice-versa. The property :attr:`.T` is an accessor to the method
7566 | :meth:`transpose`.
7567 |
7568 | Parameters
7569 | ----------
7570 | copy : bool, default False
7571 | If True, the underlying data is copied. Otherwise (default), no
7572 | copy is made if possible.
7573 | *args, **kwargs
7574 | Additional keywords have no effect but might be accepted for
7575 | compatibility with numpy.
7576 |
7577 | Returns
7578 | -------
7579 | DataFrame
7580 | The transposed DataFrame.
7581 |
7582 | See Also
7583 | --------
7584 | numpy.transpose : Permute the dimensions of a given array.
7585 |
7586 | Notes
7587 | -----
7588 | Transposing a DataFrame with mixed dtypes will result in a homogeneous
7589 | DataFrame with the `object` dtype. In such a case, a copy of the data
7590 | is always made.
7591 |
7592 | Examples
7593 | --------
7594 | **Square DataFrame with homogeneous dtype**
7595 |
7596 | >>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
7597 | >>> df1 = pd.DataFrame(data=d1)
7598 | >>> df1
7599 | col1 col2
7600 | 0 1 3
7601 | 1 2 4
7602 |
7603 | >>> df1_transposed = df1.T # or df1.transpose()
7604 | >>> df1_transposed
7605 | 0 1
7606 | col1 1 2
7607 | col2 3 4
7608 |
7609 | When the dtype is homogeneous in the original DataFrame, we get a
7610 | transposed DataFrame with the same dtype:
7611 |
7612 | >>> df1.dtypes
7613 | col1 int64
7614 | col2 int64
7615 | dtype: object
7616 | >>> df1_transposed.dtypes
7617 | 0 int64
7618 | 1 int64
7619 | dtype: object
7620 |
7621 | **Non-square DataFrame with mixed dtypes**
7622 |
7623 | >>> d2 = {'name': ['Alice', 'Bob'],
7624 | ... 'score': [9.5, 8],
7625 | ... 'employed': [False, True],
7626 | ... 'kids': [0, 0]}
7627 | >>> df2 = pd.DataFrame(data=d2)
7628 | >>> df2
7629 | name score employed kids
7630 | 0 Alice 9.5 False 0
7631 | 1 Bob 8.0 True 0
7632 |
7633 | >>> df2_transposed = df2.T # or df2.transpose()
7634 | >>> df2_transposed
7635 | 0 1
7636 | name Alice Bob
7637 | score 9.5 8
7638 | employed False True
7639 | kids 0 0
7640 |
7641 | When the DataFrame has mixed dtypes, we get a transposed DataFrame with
7642 | the `object` dtype:
7643 |
7644 | >>> df2.dtypes
7645 | name object
7646 | score float64
7647 | employed bool
7648 | kids int64
7649 | dtype: object
7650 | >>> df2_transposed.dtypes
7651 | 0 object
7652 | 1 object
7653 | dtype: object
7654 |
7655 | truediv(self, other, axis='columns', level=None, fill_value=None)
7656 | Floating division of dataframe and other, element-wise (binary operator `truediv`).
7657 |
7658 | Equivalent to ``dataframe / other``, but with support to substitute a fill_value for
7659 | missing data in one of the inputs.
7660 |
7661 | Parameters
7662 | ----------
7663 | other : Series, DataFrame, or constant
7664 | axis : {0, 1, 'index', 'columns'}
7665 | For Series input, axis to match Series index on
7666 | level : int or name
7667 | Broadcast across a level, matching Index values on the
7668 | passed MultiIndex level
7669 | fill_value : None or float value, default None
7670 | Fill existing missing (NaN) values, and any new element needed for
7671 | successful DataFrame alignment, with this value before computation.
7672 | If data in both corresponding DataFrame locations is missing
7673 | the result will be missing
7674 |
7675 | Notes
7676 | -----
7677 | Mismatched indices will be unioned together
7678 |
7679 | Returns
7680 | -------
7681 | result : DataFrame
7682 |
7683 | Examples
7684 | --------
7685 | None
7686 |
7687 | See also
7688 | --------
7689 | DataFrame.rtruediv
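
The Examples section above is empty, so here is a minimal sketch of how `fill_value` changes the result of `truediv`. The frames `df` and `other` are made up for illustration; the behaviour shown follows the parameter description above (a value missing on one side is filled, a value missing on both sides stays missing).

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the docstring): row 1 is missing only in `df`,
# row 2 is missing in both frames.
df = pd.DataFrame({'a': [1.0, np.nan, np.nan]})
other = pd.DataFrame({'a': [2.0, 4.0, np.nan]})

plain = df / other                           # NaN wherever either side is NaN
filled = df.truediv(other, fill_value=1.0)   # one-sided NaN replaced by 1.0 first

print(plain)
print(filled)
```

Row 1 becomes `1.0 / 4.0` in `filled`, while row 2 stays NaN because both inputs are missing there.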
7690 |
7691 | unstack(self, level=-1, fill_value=None)
7692 | Pivot a level of the (necessarily hierarchical) index labels, returning
7693 | a DataFrame having a new level of column labels whose inner-most level
7694 | consists of the pivoted index labels. If the index is not a MultiIndex,
7695 | the output will be a Series (the analogue of stack when the columns are
7696 | not a MultiIndex).
7697 | The level involved will automatically get sorted.
7698 |
7699 | Parameters
7700 | ----------
7701 | level : int, string, or list of these, default -1 (last level)
7702 | Level(s) of index to unstack, can pass level name
7703 | fill_value : replace NaN with this value if the unstack produces
7704 | missing values
7705 |
7706 | .. versionadded:: 0.18.0
7707 |
7708 | See also
7709 | --------
7710 | DataFrame.pivot : Pivot a table based on column values.
7711 | DataFrame.stack : Pivot a level of the column labels (inverse operation
7712 | from `unstack`).
7713 |
7714 | Examples
7715 | --------
7716 | >>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
7717 | ... ('two', 'a'), ('two', 'b')])
7718 | >>> s = pd.Series(np.arange(1.0, 5.0), index=index)
7719 | >>> s
7720 | one a 1.0
7721 | b 2.0
7722 | two a 3.0
7723 | b 4.0
7724 | dtype: float64
7725 |
7726 | >>> s.unstack(level=-1)
7727 | a b
7728 | one 1.0 2.0
7729 | two 3.0 4.0
7730 |
7731 | >>> s.unstack(level=0)
7732 | one two
7733 | a 1.0 3.0
7734 | b 2.0 4.0
7735 |
7736 | >>> df = s.unstack(level=0)
7737 | >>> df.unstack()
7738 | one a 1.0
7739 | b 2.0
7740 | two a 3.0
7741 | b 4.0
7742 | dtype: float64
7743 |
7744 | Returns
7745 | -------
7746 | unstacked : DataFrame or Series
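
The examples above never exercise `fill_value`. A small sketch, assuming a Series whose MultiIndex lacks one combination (`('two', 'b')`), so that unstacking has to produce a missing cell:

```python
import numpy as np
import pandas as pd

# Illustrative Series: the ('two', 'b') combination is deliberately absent.
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a')])
s = pd.Series([1.0, 2.0, 3.0], index=index)

wide = s.unstack()                   # the missing cell is NaN
filled = s.unstack(fill_value=0.0)   # the missing cell is replaced by 0.0

print(wide)
print(filled)
```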
7747 |
7748 | update(self, other, join='left', overwrite=True, filter_func=None, raise_conflict=False)
7749 | Modify in place using non-NA values from another DataFrame.
7750 |
7751 | Aligns on indices. There is no return value.
7752 |
7753 | Parameters
7754 | ----------
7755 | other : DataFrame, or object coercible into a DataFrame
7756 | Should have at least one matching index/column label
7757 | with the original DataFrame. If a Series is passed,
7758 | its name attribute must be set, and that will be
7759 | used as the column name to align with the original DataFrame.
7760 | join : {'left'}, default 'left'
7761 | Only left join is implemented, keeping the index and columns of the
7762 | original object.
7763 | overwrite : bool, default True
7764 | How to handle non-NA values for overlapping keys:
7765 |
7766 | * True: overwrite original DataFrame's values
7767 | with values from `other`.
7768 | * False: only update values that are NA in
7769 | the original DataFrame.
7770 |
7771 | filter_func : callable(1d-array) -> boolean 1d-array, optional
7772 | Can choose to replace values other than NA. Return True for values
7773 | that should be updated.
7774 | raise_conflict : bool, default False
7775 | If True, will raise a ValueError if the DataFrame and `other`
7776 | both contain non-NA data in the same place.
7777 |
7778 | Raises
7779 | ------
7780 | ValueError
7781 | When `raise_conflict` is True and there's overlapping non-NA data.
7782 |
7783 | See Also
7784 | --------
7785 | dict.update : Similar method for dictionaries.
7786 | DataFrame.merge : For column(s)-on-columns(s) operations.
7787 |
7788 | Examples
7789 | --------
7790 | >>> df = pd.DataFrame({'A': [1, 2, 3],
7791 | ... 'B': [400, 500, 600]})
7792 | >>> new_df = pd.DataFrame({'B': [4, 5, 6],
7793 | ... 'C': [7, 8, 9]})
7794 | >>> df.update(new_df)
7795 | >>> df
7796 | A B
7797 | 0 1 4
7798 | 1 2 5
7799 | 2 3 6
7800 |
7801 | The DataFrame's length does not increase as a result of the update,
7802 | only values at matching index/column labels are updated.
7803 |
7804 | >>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
7805 | ... 'B': ['x', 'y', 'z']})
7806 | >>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
7807 | >>> df.update(new_df)
7808 | >>> df
7809 | A B
7810 | 0 a d
7811 | 1 b e
7812 | 2 c f
7813 |
7814 | For a Series, its name attribute must be set.
7815 |
7816 | >>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
7817 | ... 'B': ['x', 'y', 'z']})
7818 | >>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
7819 | >>> df.update(new_column)
7820 | >>> df
7821 | A B
7822 | 0 a d
7823 | 1 b y
7824 | 2 c e
7825 | >>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
7826 | ... 'B': ['x', 'y', 'z']})
7827 | >>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
7828 | >>> df.update(new_df)
7829 | >>> df
7830 | A B
7831 | 0 a x
7832 | 1 b d
7833 | 2 c e
7834 |
7835 | If `other` contains NaNs the corresponding values are not updated
7836 | in the original dataframe.
7837 |
7838 | >>> df = pd.DataFrame({'A': [1, 2, 3],
7839 | ... 'B': [400, 500, 600]})
7840 | >>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
7841 | >>> df.update(new_df)
7842 | >>> df
7843 | A B
7844 | 0 1 4.0
7845 | 1 2 500.0
7846 | 2 3 6.0
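
None of the examples above show `overwrite=False`. A minimal sketch with made-up data: only the NA entry of the original frame is filled, and existing values are left alone.

```python
import numpy as np
import pandas as pd

# Illustrative frames; float columns are used so no dtype change is involved.
df = pd.DataFrame({'A': [1.0, np.nan, 3.0]})
other = pd.DataFrame({'A': [10.0, 20.0, 30.0]})

# With overwrite=False, only positions that are NA in `df` take values from `other`.
df.update(other, overwrite=False)

print(df)
```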
7847 |
7848 | var(self, axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
7849 | Return unbiased variance over requested axis.
7850 |
7851 | Normalized by N-1 by default. This can be changed using the ddof argument.
7852 |
7853 | Parameters
7854 | ----------
7855 | axis : {index (0), columns (1)}
7856 | skipna : boolean, default True
7857 | Exclude NA/null values. If an entire row/column is NA, the result
7858 | will be NA
7859 | level : int or level name, default None
7860 | If the axis is a MultiIndex (hierarchical), count along a
7861 | particular level, collapsing into a Series
7862 | ddof : int, default 1
7863 | Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
7864 | where N represents the number of elements.
7865 | numeric_only : boolean, default None
7866 | Include only float, int, boolean columns. If None, will attempt to use
7867 | everything, then use only numeric data. Not implemented for Series.
7868 |
7869 | Returns
7870 | -------
7871 | var : Series or DataFrame (if level specified)
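
To make the effect of `ddof` concrete, here is a short sketch on a made-up column. The squared deviations from the mean (2.5) sum to 5.0, so the divisor is the only thing that differs between the two results.

```python
import pandas as pd

# Illustrative data: N = 4, sum of squared deviations = 5.0.
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]})

sample_var = df.var()            # divisor N - 1 = 3 (default ddof=1)
population_var = df.var(ddof=0)  # divisor N = 4

print(sample_var['x'])       # 5.0 / 3
print(population_var['x'])   # 1.25
```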
7872 |
7873 | ----------------------------------------------------------------------
7874 | Class methods defined here:
7875 |
7876 | from_csv(path, header=0, sep=',', index_col=0, parse_dates=True, encoding=None, tupleize_cols=None, infer_datetime_format=False) from builtins.type
7877 | Read CSV file.
7878 |
7879 | .. deprecated:: 0.21.0
7880 | Use :func:`pandas.read_csv` instead.
7881 |
7882 | It is preferable to use the more powerful :func:`pandas.read_csv`
7883 | for most general purposes, but ``from_csv`` makes for an easy
7884 | roundtrip to and from a file (the exact counterpart of
7885 | ``to_csv``), especially with a DataFrame of time series data.
7886 |
7887 | This method only differs from the preferred :func:`pandas.read_csv`
7888 | in some defaults:
7889 |
7890 | - `index_col` is ``0`` instead of ``None`` (take first column as index
7891 | by default)
7892 | - `parse_dates` is ``True`` instead of ``False`` (try parsing the index
7893 | as datetime by default)
7894 |
7895 | So a ``pd.DataFrame.from_csv(path)`` can be replaced by
7896 | ``pd.read_csv(path, index_col=0, parse_dates=True)``.
7897 |
7898 | Parameters
7899 | ----------
7900 | path : string file path or file handle / StringIO
7901 | header : int, default 0
7902 | Row to use as header (skip prior rows)
7903 | sep : string, default ','
7904 | Field delimiter
7905 | index_col : int or sequence, default 0
7906 | Column to use for index. If a sequence is given, a MultiIndex
7907 | is used. Different default from read_table
7908 | parse_dates : boolean, default True
7909 | Parse dates. Different default from read_table
7910 | tupleize_cols : boolean, default False
7911 | Write multi_index columns as a list of tuples (if True)
7912 | or in the new, expanded format (if False).
7913 | infer_datetime_format: boolean, default False
7914 | If True and `parse_dates` is True for a column, try to infer the
7915 | datetime format based on the first datetime string. If the format
7916 | can be inferred, there often will be a large parsing speed-up.
7917 |
7918 | See also
7919 | --------
7920 | pandas.read_csv
7921 |
7922 | Returns
7923 | -------
7924 | y : DataFrame
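
Since `from_csv` is deprecated, the replacement call named above can be sketched as follows. An in-memory `StringIO` buffer stands in for a real file path, and the CSV content is invented for the example.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content; a file path could be passed instead of the buffer.
csv_text = "date,value\n2018-01-01,1.5\n2018-01-02,2.5\n"

# Same defaults that from_csv applied: first column as index, parsed as dates.
df = pd.read_csv(StringIO(csv_text), index_col=0, parse_dates=True)

print(df)
print(type(df.index).__name__)
```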
7925 |
7926 | from_dict(data, orient='columns', dtype=None, columns=None) from builtins.type
7927 | Construct DataFrame from dict of array-like or dicts.
7928 |
7929 | Creates DataFrame object from dictionary by columns or by index
7930 | allowing dtype specification.
7931 |
7932 | Parameters
7933 | ----------
7934 | data : dict
7935 | Of the form {field : array-like} or {field : dict}.
7936 | orient : {'columns', 'index'}, default 'columns'
7937 | The "orientation" of the data. If the keys of the passed dict
7938 | should be the columns of the resulting DataFrame, pass 'columns'
7939 | (default). Otherwise if the keys should be rows, pass 'index'.
7940 | dtype : dtype, default None
7941 | Data type to force, otherwise infer.
7942 | columns : list, default None
7943 | Column labels to use when ``orient='index'``. Raises a ValueError
7944 | if used with ``orient='columns'``.
7945 |
7946 | .. versionadded:: 0.23.0
7947 |
7948 | Returns
7949 | -------
7950 | pandas.DataFrame
7951 |
7952 | See Also
7953 | --------
7954 | DataFrame.from_records : DataFrame from ndarray (structured
7955 | dtype), list of tuples, dict, or DataFrame
7956 | DataFrame : DataFrame object creation using constructor
7957 |
7958 | Examples
7959 | --------
7960 | By default the keys of the dict become the DataFrame columns:
7961 |
7962 | >>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
7963 | >>> pd.DataFrame.from_dict(data)
7964 | col_1 col_2
7965 | 0 3 a
7966 | 1 2 b
7967 | 2 1 c
7968 | 3 0 d
7969 |
7970 | Specify ``orient='index'`` to create the DataFrame using dictionary
7971 | keys as rows:
7972 |
7973 | >>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
7974 | >>> pd.DataFrame.from_dict(data, orient='index')
7975 | 0 1 2 3
7976 | row_1 3 2 1 0
7977 | row_2 a b c d
7978 |
7979 | When using the 'index' orientation, the column names can be
7980 | specified manually:
7981 |
7982 | >>> pd.DataFrame.from_dict(data, orient='index',
7983 | ... columns=['A', 'B', 'C', 'D'])
7984 | A B C D
7985 | row_1 3 2 1 0
7986 | row_2 a b c d
7987 |
7988 | from_items(items, columns=None, orient='columns') from builtins.type
7989 | Construct a dataframe from a list of tuples
7990 |
7991 | .. deprecated:: 0.23.0
7992 | `from_items` is deprecated and will be removed in a future version.
7993 | Use :meth:`DataFrame.from_dict(dict(items)) <DataFrame.from_dict>`
7994 | instead.
7995 | :meth:`DataFrame.from_dict(OrderedDict(items)) <DataFrame.from_dict>`
7996 | may be used to preserve the key order.
7997 |
7998 | Convert (key, value) pairs to DataFrame. The keys will be the axis
7999 | index (usually the columns, but depends on the specified
8000 | orientation). The values should be arrays or Series.
8001 |
8002 | Parameters
8003 | ----------
8004 | items : sequence of (key, value) pairs
8005 | Values should be arrays or Series.
8006 | columns : sequence of column labels, optional
8007 | Must be passed if orient='index'.
8008 | orient : {'columns', 'index'}, default 'columns'
8009 | The "orientation" of the data. If the keys of the
8010 | input correspond to column labels, pass 'columns'
8011 | (default). Otherwise if the keys correspond to the index,
8012 | pass 'index'.
8013 |
8014 | Returns
8015 | -------
8016 | frame : DataFrame
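
The deprecation note above points to `from_dict` as the replacement. A brief sketch of both orientations, using a made-up `items` list and `OrderedDict` to keep the key order:

```python
from collections import OrderedDict

import pandas as pd

# Illustrative (key, value) pairs, deliberately not in alphabetical order.
items = [('B', [4, 5, 6]), ('A', [1, 2, 3])]

# Keys become columns (the default orientation).
by_columns = pd.DataFrame.from_dict(OrderedDict(items))

# Keys become row labels; column labels are supplied explicitly.
by_index = pd.DataFrame.from_dict(OrderedDict(items), orient='index',
                                  columns=['x', 'y', 'z'])

print(by_columns)
print(by_index)
```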
8017 |
8018 | from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None) from builtins.type
8019 | Convert structured or record ndarray to DataFrame
8020 |
8021 | Parameters
8022 | ----------
8023 | data : ndarray (structured dtype), list of tuples, dict, or DataFrame
8024 | index : string, list of fields, array-like
8025 | Field of array to use as the index, alternately a specific set of
8026 | input labels to use
8027 | exclude : sequence, default None
8028 | Columns or fields to exclude
8029 | columns : sequence, default None
8030 | Column names to use. If the passed data do not have names
8031 | associated with them, this argument provides names for the
8032 | columns. Otherwise this argument indicates the order of the columns
8033 | in the result (any names not found in the data will become all-NA
8034 | columns)
8035 | coerce_float : boolean, default False
8036 | Attempt to convert values of non-string, non-numeric objects (like
8037 | decimal.Decimal) to floating point, useful for SQL result sets
8038 |
8039 | Returns
8040 | -------
8041 | df : DataFrame
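
This docstring has no Examples section. A minimal sketch, assuming a list of tuples as input: `columns` names the fields, and `index` picks one of those fields as the row index. The record values are invented.

```python
import pandas as pd

# Illustrative records: (id, label, value).
records = [(1, 'a', 0.5), (2, 'b', 1.5)]

df = pd.DataFrame.from_records(records,
                               columns=['id', 'label', 'value'],
                               index='id')  # use the 'id' field as the index

print(df)
```

The field used as the index is removed from the columns, so the result has only `label` and `value` as columns.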
8042 |
8043 | ----------------------------------------------------------------------
8044 | Data descriptors defined here:
8045 |
8046 | T
8047 | Transpose index and columns.
8048 |
8049 | Reflect the DataFrame over its main diagonal by writing rows as columns
8050 | and vice-versa. The property :attr:`.T` is an accessor to the method
8051 | :meth:`transpose`.
8052 |
8053 | Parameters
8054 | ----------
8055 | copy : bool, default False
8056 | If True, the underlying data is copied. Otherwise (default), no
8057 | copy is made if possible.
8058 | *args, **kwargs
8059 | Additional keywords have no effect but might be accepted for
8060 | compatibility with numpy.
8061 |
8062 | Returns
8063 | -------
8064 | DataFrame
8065 | The transposed DataFrame.
8066 |
8067 | See Also
8068 | --------
8069 | numpy.transpose : Permute the dimensions of a given array.
8070 |
8071 | Notes
8072 | -----
8073 | Transposing a DataFrame with mixed dtypes will result in a homogeneous
8074 | DataFrame with the `object` dtype. In such a case, a copy of the data
8075 | is always made.
8076 |
8077 | Examples
8078 | --------
8079 | **Square DataFrame with homogeneous dtype**
8080 |
8081 | >>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
8082 | >>> df1 = pd.DataFrame(data=d1)
8083 | >>> df1
8084 | col1 col2
8085 | 0 1 3
8086 | 1 2 4
8087 |
8088 | >>> df1_transposed = df1.T # or df1.transpose()
8089 | >>> df1_transposed
8090 | 0 1
8091 | col1 1 2
8092 | col2 3 4
8093 |
8094 | When the dtype is homogeneous in the original DataFrame, we get a
8095 | transposed DataFrame with the same dtype:
8096 |
8097 | >>> df1.dtypes
8098 | col1 int64
8099 | col2 int64
8100 | dtype: object
8101 | >>> df1_transposed.dtypes
8102 | 0 int64
8103 | 1 int64
8104 | dtype: object
8105 |
8106 | **Non-square DataFrame with mixed dtypes**
8107 |
8108 | >>> d2 = {'name': ['Alice', 'Bob'],
8109 | ... 'score': [9.5, 8],
8110 | ... 'employed': [False, True],
8111 | ... 'kids': [0, 0]}
8112 | >>> df2 = pd.DataFrame(data=d2)
8113 | >>> df2
8114 | name score employed kids
8115 | 0 Alice 9.5 False 0
8116 | 1 Bob 8.0 True 0
8117 |
8118 | >>> df2_transposed = df2.T # or df2.transpose()
8119 | >>> df2_transposed
8120 | 0 1
8121 | name Alice Bob
8122 | score 9.5 8
8123 | employed False True
8124 | kids 0 0
8125 |
8126 | When the DataFrame has mixed dtypes, we get a transposed DataFrame with
8127 | the `object` dtype:
8128 |
8129 | >>> df2.dtypes
8130 | name object
8131 | score float64
8132 | employed bool
8133 | kids int64
8134 | dtype: object
8135 | >>> df2_transposed.dtypes
8136 | 0 object
8137 | 1 object
8138 | dtype: object
8139 |
8140 | axes
8141 | Return a list representing the axes of the DataFrame.
8142 |
8143 | It has the row axis labels and column axis labels as the only members.
8144 | They are returned in that order.
8145 |
8146 | Examples
8147 | --------
8148 | >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
8149 | >>> df.axes
8150 | [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'],
8151 | dtype='object')]
8152 |
8153 | columns
8154 | The column labels of the DataFrame.
8155 |
8156 | index
8157 | The index (row labels) of the DataFrame.
8158 |
8159 | shape
8160 | Return a tuple representing the dimensionality of the DataFrame.
8161 |
8162 | See Also
8163 | --------
8164 | ndarray.shape
8165 |
8166 | Examples
8167 | --------
8168 | >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
8169 | >>> df.shape
8170 | (2, 2)
8171 |
8172 | >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
8173 | ... 'col3': [5, 6]})
8174 | >>> df.shape
8175 | (2, 3)
8176 |
8177 | style
8178 | Property returning a Styler object containing methods for
8179 | building a styled HTML representation of the DataFrame.
8180 |
8181 | See Also
8182 | --------
8183 | pandas.io.formats.style.Styler
8184 |
8185 | ----------------------------------------------------------------------
8186 | Data and other attributes defined here:
8187 |
8188 | plot = <class 'pandas.plotting._core.FramePlotMethods'>
8189 | DataFrame plotting accessor and method
8190 |
8191 | Examples
8192 | --------
8193 | >>> df.plot.line()
8194 | >>> df.plot.scatter('x', 'y')
8195 | >>> df.plot.hexbin()
8196 |
8197 | These plotting methods can also be accessed by calling the accessor as a
8198 | method with the ``kind`` argument:
8199 | ``df.plot(kind='line')`` is equivalent to ``df.plot.line()``
8200 |
8201 | ----------------------------------------------------------------------
8202 | Methods inherited from pandas.core.generic.NDFrame:
8203 |
8204 | __abs__(self)
8205 |
8206 | __array__(self, dtype=None)
8207 |
8208 | __array_wrap__(self, result, context=None)
8209 |
8210 | __bool__ = __nonzero__(self)
8211 |
8212 | __contains__(self, key)
8213 | True if the key is in the info axis
8214 |
8215 | __copy__(self, deep=True)
8216 |
8217 | __deepcopy__(self, memo=None)
8218 |
8219 | __delitem__(self, key)
8220 | Delete item
8221 |
8222 | __finalize__(self, other, method=None, **kwargs)
8223 | Propagate metadata from other to self.
8224 |
8225 | Parameters
8226 | ----------
8227 | other : the object from which to get the attributes that we are going
8228 | to propagate
8229 | method : optional, a passed method name ; possibly to take different
8230 | types of propagation actions based on this
8231 |
8232 | __getattr__(self, name)
8233 | After regular attribute access, try looking up the name
8234 | This allows simpler access to columns for interactive use.
8235 |
8236 | __getstate__(self)
8237 |
8238 | __hash__(self)
8239 | Return hash(self).
8240 |
8241 | __invert__(self)
8242 |
8243 | __iter__(self)
8244 | Iterate over info axis
8245 |
8246 | __neg__(self)
8247 |
8248 | __nonzero__(self)
8249 |
8250 | __pos__(self)
8251 |
8252 | __round__(self, decimals=0)
8253 |
8254 | __setattr__(self, name, value)
8255 | After regular attribute access, try setting the name
8256 | This allows simpler access to columns for interactive use.
8257 |
8258 | __setstate__(self, state)
8259 |
8260 | abs(self)
8261 | Return a Series/DataFrame with absolute numeric value of each element.
8262 |
8263 | This function only applies to elements that are all numeric.
8264 |
8265 | Returns
8266 | -------
8267 | abs
8268 | Series/DataFrame containing the absolute value of each element.
8269 |
8270 | Notes
8271 | -----
8272 | For ``complex`` inputs, ``1.2 + 1j``, the absolute value is
8273 | :math:`\sqrt{ a^2 + b^2 }`.
8274 |
8275 | Examples
8276 | --------
8277 | Absolute numeric values in a Series.
8278 |
8279 | >>> s = pd.Series([-1.10, 2, -3.33, 4])
8280 | >>> s.abs()
8281 | 0 1.10
8282 | 1 2.00
8283 | 2 3.33
8284 | 3 4.00
8285 | dtype: float64
8286 |
8287 | Absolute numeric values in a Series with complex numbers.
8288 |
8289 | >>> s = pd.Series([1.2 + 1j])
8290 | >>> s.abs()
8291 | 0 1.56205
8292 | dtype: float64
8293 |
8294 | Absolute numeric values in a Series with a Timedelta element.
8295 |
8296 | >>> s = pd.Series([pd.Timedelta('1 days')])
8297 | >>> s.abs()
8298 | 0 1 days
8299 | dtype: timedelta64[ns]
8300 |
8301 | Select rows with data closest to certain value using argsort (from
8302 | `StackOverflow <https://stackoverflow.com/a/17758115>`__).
8303 |
8304 | >>> df = pd.DataFrame({
8305 | ... 'a': [4, 5, 6, 7],
8306 | ... 'b': [10, 20, 30, 40],
8307 | ... 'c': [100, 50, -30, -50]
8308 | ... })
8309 | >>> df
8310 | a b c
8311 | 0 4 10 100
8312 | 1 5 20 50
8313 | 2 6 30 -30
8314 | 3 7 40 -50
8315 | >>> df.loc[(df.c - 43).abs().argsort()]
8316 | a b c
8317 | 1 5 20 50
8318 | 0 4 10 100
8319 | 2 6 30 -30
8320 | 3 7 40 -50
8321 |
8322 | See Also
8323 | --------
8324 | numpy.absolute : calculate the absolute value element-wise.
8325 |
8326 | add_prefix(self, prefix)
8327 | Prefix labels with string `prefix`.
8328 |
8329 | For Series, the row labels are prefixed.
8330 | For DataFrame, the column labels are prefixed.
8331 |
8332 | Parameters
8333 | ----------
8334 | prefix : str
8335 | The string to add before each label.
8336 |
8337 | Returns
8338 | -------
8339 | Series or DataFrame
8340 | New Series or DataFrame with updated labels.
8341 |
8342 | See Also
8343 | --------
8344 | Series.add_suffix: Suffix row labels with string `suffix`.
8345 | DataFrame.add_suffix: Suffix column labels with string `suffix`.
8346 |
8347 | Examples
8348 | --------
8349 | >>> s = pd.Series([1, 2, 3, 4])
8350 | >>> s
8351 | 0 1
8352 | 1 2
8353 | 2 3
8354 | 3 4
8355 | dtype: int64
8356 |
8357 | >>> s.add_prefix('item_')
8358 | item_0 1
8359 | item_1 2
8360 | item_2 3
8361 | item_3 4
8362 | dtype: int64
8363 |
8364 | >>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
8365 | >>> df
8366 | A B
8367 | 0 1 3
8368 | 1 2 4
8369 | 2 3 5
8370 | 3 4 6
8371 |
8372 | >>> df.add_prefix('col_')
8373 | col_A col_B
8374 | 0 1 3
8375 | 1 2 4
8376 | 2 3 5
8377 | 3 4 6
8378 |
8379 | add_suffix(self, suffix)
8380 | Suffix labels with string `suffix`.
8381 |
8382 | For Series, the row labels are suffixed.
8383 | For DataFrame, the column labels are suffixed.
8384 |
8385 | Parameters
8386 | ----------
8387 | suffix : str
8388 | The string to add after each label.
8389 |
8390 | Returns
8391 | -------
8392 | Series or DataFrame
8393 | New Series or DataFrame with updated labels.
8394 |
8395 | See Also
8396 | --------
8397 | Series.add_prefix: Prefix row labels with string `prefix`.
8398 | DataFrame.add_prefix: Prefix column labels with string `prefix`.
8399 |
8400 | Examples
8401 | --------
8402 | >>> s = pd.Series([1, 2, 3, 4])
8403 | >>> s
8404 | 0 1
8405 | 1 2
8406 | 2 3
8407 | 3 4
8408 | dtype: int64
8409 |
8410 | >>> s.add_suffix('_item')
8411 | 0_item 1
8412 | 1_item 2
8413 | 2_item 3
8414 | 3_item 4
8415 | dtype: int64
8416 |
8417 | >>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
8418 | >>> df
8419 | A B
8420 | 0 1 3
8421 | 1 2 4
8422 | 2 3 5
8423 | 3 4 6
8424 |
8425 | >>> df.add_suffix('_col')
8426 | A_col B_col
8427 | 0 1 3
8428 | 1 2 4
8429 | 2 3 5
8430 | 3 4 6
8431 |
8432 | as_blocks(self, copy=True)
8433 | Convert the frame to a dict of dtype -> Constructor Types that each has
8434 | a homogeneous dtype.
8435 |
8436 | .. deprecated:: 0.21.0
8437 |
8438 | NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in
8439 | as_matrix)
8440 |
8441 | Parameters
8442 | ----------
8443 | copy : boolean, default True
8444 |
8445 | Returns
8446 | -------
8447 | values : a dict of dtype -> Constructor Types
8448 |
8449 | as_matrix(self, columns=None)
8450 | Convert the frame to its Numpy-array representation.
8451 |
8452 | .. deprecated:: 0.23.0
8453 | Use :meth:`DataFrame.values` instead.
8454 |
8455 | Parameters
8456 | ----------
8457 | columns: list, optional, default:None
8458 | If None, return all columns, otherwise, returns specified columns.
8459 |
8460 | Returns
8461 | -------
8462 | values : ndarray
8463 | If the caller is heterogeneous and contains booleans or objects,
8464 | the result will be of dtype=object. See Notes.
8465 |
8466 |
8467 | Notes
8468 | -----
8469 | Return is NOT a Numpy-matrix, rather, a Numpy-array.
8470 |
8471 | The dtype will be a lower-common-denominator dtype (implicit
8472 | upcasting); that is to say if the dtypes (even of numeric types)
8473 | are mixed, the one that accommodates all will be chosen. Use this
8474 | with care if you are not dealing with the blocks.
8475 |
8476 | e.g. If the dtypes are float16 and float32, dtype will be upcast to
8477 | float32. If dtypes are int32 and uint8, dtype will be upcast to
8478 | int32. By numpy.find_common_type convention, mixing int64 and uint64
8479 | will result in a float64 dtype.
8480 |
8481 | This method is provided for backwards compatibility. Generally,
8482 | it is recommended to use '.values'.
8483 |
8484 | See Also
8485 | --------
8486 | pandas.DataFrame.values
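
The upcasting described in the Notes can be observed through `.values`, the recommended replacement. A small sketch with made-up frames: one mixes float16 and float32 (the first case mentioned in the Notes), the other mixes integers and strings.

```python
import numpy as np
import pandas as pd

# Illustrative frame with two float widths.
floats = pd.DataFrame({'small': np.array([1.0, 2.0], dtype='float16'),
                       'big': np.array([3.0, 4.0], dtype='float32')})

# Illustrative frame mixing numbers and strings.
mixed = pd.DataFrame({'n': [1, 2], 's': ['a', 'b']})

print(floats.values.dtype)  # float32: the dtype that accommodates both columns
print(mixed.values.dtype)   # object: no common numeric dtype exists
```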
8487 |
8488 | asfreq(self, freq, method=None, how=None, normalize=False, fill_value=None)
8489 | Convert TimeSeries to specified frequency.
8490 |
8491 | Optionally provide filling method to pad/backfill missing values.
8492 |
8493 | Returns the original data conformed to a new index with the specified
8494 | frequency. ``resample`` is more appropriate if an operation, such as
8495 | summarization, is necessary to represent the data at the new frequency.
8496 |
8497 | Parameters
8498 | ----------
8499 | freq : DateOffset object, or string
8500 | method : {'backfill'/'bfill', 'pad'/'ffill'}, default None
8501 | Method to use for filling holes in reindexed Series (note this
8502 | does not fill NaNs that already were present):
8503 |
8504 | * 'pad' / 'ffill': propagate last valid observation forward to next
8505 | valid
8506 | * 'backfill' / 'bfill': use NEXT valid observation to fill
8507 | how : {'start', 'end'}, default end
8508 | For PeriodIndex only, see PeriodIndex.asfreq
8509 | normalize : bool, default False
8510 | Whether to reset output index to midnight
8511 | fill_value: scalar, optional
8512 | Value to use for missing values, applied during upsampling (note
8513 | this does not fill NaNs that already were present).
8514 |
8515 | .. versionadded:: 0.20.0
8516 |
8517 | Returns
8518 | -------
8519 | converted : type of caller
8520 |
8521 | Examples
8522 | --------
8523 |
8524 | Start by creating a series with 4 one minute timestamps.
8525 |
8526 | >>> index = pd.date_range('1/1/2000', periods=4, freq='T')
8527 | >>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
8528 | >>> df = pd.DataFrame({'s':series})
8529 | >>> df
8530 | s
8531 | 2000-01-01 00:00:00 0.0
8532 | 2000-01-01 00:01:00 NaN
8533 | 2000-01-01 00:02:00 2.0
8534 | 2000-01-01 00:03:00 3.0
8535 |
8536 | Upsample the series into 30 second bins.
8537 |
8538 | >>> df.asfreq(freq='30S')
8539 | s
8540 | 2000-01-01 00:00:00 0.0
8541 | 2000-01-01 00:00:30 NaN
8542 | 2000-01-01 00:01:00 NaN
8543 | 2000-01-01 00:01:30 NaN
8544 | 2000-01-01 00:02:00 2.0
8545 | 2000-01-01 00:02:30 NaN
8546 | 2000-01-01 00:03:00 3.0
8547 |
8548 | Upsample again, providing a ``fill value``.
8549 |
8550 | >>> df.asfreq(freq='30S', fill_value=9.0)
8551 | s
8552 | 2000-01-01 00:00:00 0.0
8553 | 2000-01-01 00:00:30 9.0
8554 | 2000-01-01 00:01:00 NaN
8555 | 2000-01-01 00:01:30 9.0
8556 | 2000-01-01 00:02:00 2.0
8557 | 2000-01-01 00:02:30 9.0
8558 | 2000-01-01 00:03:00 3.0
8559 |
8560 | Upsample again, providing a ``method``.
8561 |
8562 | >>> df.asfreq(freq='30S', method='bfill')
8563 | s
8564 | 2000-01-01 00:00:00 0.0
8565 | 2000-01-01 00:00:30 NaN
8566 | 2000-01-01 00:01:00 NaN
8567 | 2000-01-01 00:01:30 2.0
8568 | 2000-01-01 00:02:00 2.0
8569 | 2000-01-01 00:02:30 3.0
8570 | 2000-01-01 00:03:00 3.0
8571 |
8572 | See Also
8573 | --------
8574 | reindex
8575 |
8576 | Notes
8577 | -----
8578 | To learn more about the frequency strings, please see `this link
8579 | <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__.
8580 |
8581 | asof(self, where, subset=None)
8582 | Return the last row without any NaN at or before `where` (or the last row
8583 | without NaN considering only the subset of columns in the case of a DataFrame).
8584 |
8585 | .. versionadded:: 0.19.0 For DataFrame
8586 |
8587 | If there is no good value, NaN is returned for a Series, or
8588 | a Series of NaN values for a DataFrame.
8589 |
8590 | Parameters
8591 | ----------
8592 | where : date or array of dates
8593 | subset : string or list of strings, default None
8594 | if not None use these columns for NaN propagation
8595 |
8596 | Notes
8597 | -----
8598 | Dates are assumed to be sorted.
8599 | Raises if this is not the case.
8600 |
8601 | Returns
8602 | -------
8603 | where is scalar
8604 |
8605 | - value or NaN if input is Series
8606 | - Series if input is DataFrame
8607 |
8608 | where is Index: same shape object as input
8609 |
8610 | See Also
8611 | --------
8612 | merge_asof
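
This docstring has no Examples section. A minimal sketch on a made-up Series with a sorted integer index shows the three cases described above: an exact match, a fallback past a NaN, and no usable value at all.

```python
import numpy as np
import pandas as pd

# Illustrative Series; the index must be sorted.
s = pd.Series([1.0, 2.0, np.nan, 4.0], index=[10, 20, 30, 40])

at_20 = s.asof(20)    # 2.0: exact label, value is not NaN
at_30 = s.asof(30)    # 2.0: the value at 30 is NaN, so the previous one is used
before = s.asof(5)    # NaN: nothing at or before 5

print(at_20, at_30, before)
```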
8613 |
8614 | astype(self, dtype, copy=True, errors='raise', **kwargs)
8615 | Cast a pandas object to a specified dtype ``dtype``.
8616 |
8617 | Parameters
8618 | ----------
8619 | dtype : data type, or dict of column name -> data type
8620 | Use a numpy.dtype or Python type to cast entire pandas object to
8621 | the same type. Alternatively, use {col: dtype, ...}, where col is a
8622 | column label and dtype is a numpy.dtype or Python type to cast one
8623 | or more of the DataFrame's columns to column-specific types.
8624 | copy : bool, default True
8625 | Return a copy when ``copy=True`` (be very careful setting
8626 | ``copy=False`` as changes to values then may propagate to other
8627 | pandas objects).
8628 | errors : {'raise', 'ignore'}, default 'raise'
8629 | Control raising of exceptions on invalid data for provided dtype.
8630 |
8631 | - ``raise`` : allow exceptions to be raised
8632 | - ``ignore`` : suppress exceptions. On error return original object
8633 |
8634 | .. versionadded:: 0.20.0
8635 |
8636 | raise_on_error : raise on invalid input
8637 | .. deprecated:: 0.20.0
8638 | Use ``errors`` instead
8639 | kwargs : keyword arguments to pass on to the constructor
8640 |
8641 | Returns
8642 | -------
8643 | casted : type of caller
8644 |
8645 | Examples
8646 | --------
8647 | >>> ser = pd.Series([1, 2], dtype='int32')
8648 | >>> ser
8649 | 0 1
8650 | 1 2
8651 | dtype: int32
8652 | >>> ser.astype('int64')
8653 | 0 1
8654 | 1 2
8655 | dtype: int64
8656 |
8657 | Convert to categorical type:
8658 |
8659 | >>> ser.astype('category')
8660 | 0 1
8661 | 1 2
8662 | dtype: category
8663 | Categories (2, int64): [1, 2]
8664 |
8665 | Convert to ordered categorical type with custom ordering:
8666 |
8667 | >>> ser.astype('category', ordered=True, categories=[2, 1])
8668 | 0 1
8669 | 1 2
8670 | dtype: category
8671 | Categories (2, int64): [2 < 1]
8672 |
8673 | Note that using ``copy=False`` and changing data on a new
8674 | pandas object may propagate changes:
8675 |
8676 | >>> s1 = pd.Series([1,2])
8677 | >>> s2 = s1.astype('int64', copy=False)
8678 | >>> s2[0] = 10
8679 | >>> s1 # note that s1[0] has changed too
8680 | 0 10
8681 | 1 2
8682 | dtype: int64
8683 |
8684 | See also
8685 | --------
8686 | pandas.to_datetime : Convert argument to datetime.
8687 | pandas.to_timedelta : Convert argument to timedelta.
8688 | pandas.to_numeric : Convert argument to a numeric type.
8689 | numpy.ndarray.astype : Cast a numpy array to a specified type.
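
The examples above only cast a whole Series. The dict form of `dtype` described under Parameters can be sketched like this, with a made-up frame in which one column holds numeric strings:

```python
import pandas as pd

# Illustrative frame: 'b' holds digits stored as strings.
df = pd.DataFrame({'a': [1, 2], 'b': ['3', '4']})

# Cast each column to its own dtype; columns not listed would be left unchanged.
converted = df.astype({'a': 'float64', 'b': 'int64'})

print(converted.dtypes)
```

The original `df` is not modified, because `copy=True` is the default.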
8690 |
8691 | at_time(self, time, asof=False)
8692 | Select values at particular time of day (e.g. 9:30AM).
8693 |
8694 | Raises
8695 | ------
8696 | TypeError
8697 | If the index is not a :class:`DatetimeIndex`
8698 |
8699 | Parameters
8700 | ----------
8701 | time : datetime.time or string
8702 |
8703 | Returns
8704 | -------
8705 | values_at_time : type of caller
8706 |
8707 | Examples
8708 | --------
8709 | >>> i = pd.date_range('2018-04-09', periods=4, freq='12H')
8710 | >>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
8711 | >>> ts
8712 | A
8713 | 2018-04-09 00:00:00 1
8714 | 2018-04-09 12:00:00 2
8715 | 2018-04-10 00:00:00 3
8716 | 2018-04-10 12:00:00 4
8717 |
8718 | >>> ts.at_time('12:00')
8719 | A
8720 | 2018-04-09 12:00:00 2
8721 | 2018-04-10 12:00:00 4
8722 |
8723 | See Also
8724 | --------
8725 | between_time : Select values between particular times of the day
8726 | first : Select initial periods of time series based on a date offset
8727 | last : Select final periods of time series based on a date offset
8728 | DatetimeIndex.indexer_at_time : Get just the index locations for
8729 | values at particular time of the day
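The ``time`` parameter is documented as accepting either a ``datetime.time`` or a string. The following sketch, which rebuilds the example frame from explicit timestamps, suggests that both spellings select the same rows.

```python
import datetime as dt

import pandas as pd

# Same four timestamps as the docstring example, written out explicitly.
idx = pd.to_datetime([
    "2018-04-09 00:00", "2018-04-09 12:00",
    "2018-04-10 00:00", "2018-04-10 12:00",
])
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=idx)

# A datetime.time object and the equivalent string select the same rows.
noon_rows = ts.at_time(dt.time(12, 0))
same_rows = ts.at_time("12:00")
```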
8730 |
8731 | between_time(self, start_time, end_time, include_start=True, include_end=True)
8732 | Select values between particular times of the day (e.g., 9:00-9:30 AM).
8733 |
8734 | By setting ``start_time`` to be later than ``end_time``,
8735 | you can get the times that are *not* between the two times.
8736 |
8737 | Raises
8738 | ------
8739 | TypeError
8740 | If the index is not a :class:`DatetimeIndex`
8741 |
8742 | Parameters
8743 | ----------
8744 | start_time : datetime.time or string
8745 | end_time : datetime.time or string
8746 | include_start : boolean, default True
8747 | include_end : boolean, default True
8748 |
8749 | Returns
8750 | -------
8751 | values_between_time : type of caller
8752 |
8753 | Examples
8754 | --------
8755 | >>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
8756 | >>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
8757 | >>> ts
8758 | A
8759 | 2018-04-09 00:00:00 1
8760 | 2018-04-10 00:20:00 2
8761 | 2018-04-11 00:40:00 3
8762 | 2018-04-12 01:00:00 4
8763 |
8764 | >>> ts.between_time('0:15', '0:45')
8765 | A
8766 | 2018-04-10 00:20:00 2
8767 | 2018-04-11 00:40:00 3
8768 |
8769 | You get the times that are *not* between two times by setting
8770 | ``start_time`` later than ``end_time``:
8771 |
8772 | >>> ts.between_time('0:45', '0:15')
8773 | A
8774 | 2018-04-09 00:00:00 1
8775 | 2018-04-12 01:00:00 4
8776 |
8777 | See Also
8778 | --------
8779 | at_time : Select values at a particular time of the day
8780 | first : Select initial periods of time series based on a date offset
8781 | last : Select final periods of time series based on a date offset
8782 | DatetimeIndex.indexer_between_time : Get just the index locations for
8783 | values between particular times of the day
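A common use of the reversed-bounds behaviour described above is selecting a window that wraps around midnight. The sketch below uses made-up timestamps and also demonstrates the documented ``TypeError`` for an index that is not a ``DatetimeIndex``.

```python
import datetime as dt

import pandas as pd

# Hypothetical observations scattered across two days.
idx = pd.to_datetime([
    "2018-04-09 23:30", "2018-04-10 00:20",
    "2018-04-10 09:00", "2018-04-10 23:50",
])
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=idx)

# start_time later than end_time: keeps 23:00-24:00 and 00:00-01:00.
overnight = ts.between_time(dt.time(23, 0), dt.time(1, 0))

# A frame with the default integer index is rejected.
try:
    pd.DataFrame({"A": [1]}).between_time("0:15", "0:45")
except TypeError as exc:
    error_message = str(exc)
```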
8784 |
8785 | bfill(self, axis=None, inplace=False, limit=None, downcast=None)
8786 | Synonym for :meth:`DataFrame.fillna(method='bfill') <DataFrame.fillna>`
8787 |
8788 | bool(self)
8789 | Return the bool of a single element PandasObject.
8790 |
8791 | This must be a boolean scalar value, either True or False. Raise a
8792 | ValueError if the PandasObject does not have exactly 1 element, or if
8793 | that element is not boolean.
8794 |
8795 | clip(self, lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
8796 | Trim values at input threshold(s).
8797 |
8798 | Assigns values outside boundary to boundary values. Thresholds
8799 | can be singular values or array like, and in the latter case
8800 | the clipping is performed element-wise in the specified axis.
8801 |
8802 | Parameters
8803 | ----------
8804 | lower : float or array_like, default None
8805 | Minimum threshold value. All values below this
8806 | threshold will be set to it.
8807 | upper : float or array_like, default None
8808 | Maximum threshold value. All values above this
8809 | threshold will be set to it.
8810 | axis : int or string axis name, optional
8811 | Align object with lower and upper along the given axis.
8812 | inplace : boolean, default False
8813 | Whether to perform the operation in place on the data.
8814 |
8815 | .. versionadded:: 0.21.0
8816 | *args, **kwargs
8817 | Additional keywords have no effect but might be accepted
8818 | for compatibility with numpy.
8819 |
8820 | See Also
8821 | --------
8822 | clip_lower : Clip values below specified threshold(s).
8823 | clip_upper : Clip values above specified threshold(s).
8824 |
8825 | Returns
8826 | -------
8827 | Series or DataFrame
8828 | Same type as calling object with the values outside the
8829 | clip boundaries replaced
8830 |
8831 | Examples
8832 | --------
8833 | >>> data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
8834 | >>> df = pd.DataFrame(data)
8835 | >>> df
8836 | col_0 col_1
8837 | 0 9 -2
8838 | 1 -3 -7
8839 | 2 0 6
8840 | 3 -1 8
8841 | 4 5 -5
8842 |
8843 | Clips per column using lower and upper thresholds:
8844 |
8845 | >>> df.clip(-4, 6)
8846 | col_0 col_1
8847 | 0 6 -2
8848 | 1 -3 -4
8849 | 2 0 6
8850 | 3 -1 6
8851 | 4 5 -4
8852 |
8853 | Clips using specific lower and upper thresholds per column element:
8854 |
8855 | >>> t = pd.Series([2, -4, -1, 6, 3])
8856 | >>> t
8857 | 0 2
8858 | 1 -4
8859 | 2 -1
8860 | 3 6
8861 | 4 3
8862 | dtype: int64
8863 |
8864 | >>> df.clip(t, t + 4, axis=0)
8865 | col_0 col_1
8866 | 0 6 2
8867 | 1 -3 -4
8868 | 2 0 3
8869 | 3 6 8
8870 | 4 5 3
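Both thresholds default to ``None``, so it should be possible to clip on one side only. The sketch below reuses the example data and adds a per-column lower bound. The Series of bounds, keyed by column label, is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col_0": [9, -3, 0, -1, 5], "col_1": [-2, -7, 6, 8, -5]})

# Only a lower bound: negative values become 0, large values are untouched.
floor_only = df.clip(lower=0)

# Only an upper bound: values above 3 become 3.
ceiling_only = df.clip(upper=3)

# One lower bound per column, aligned on the column labels via axis=1.
bounds = pd.Series({"col_0": 0, "col_1": -5})
per_column = df.clip(lower=bounds, axis=1)
```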
8871 |
8872 | clip_lower(self, threshold, axis=None, inplace=False)
8873 | Return copy of the input with values below a threshold truncated.
8874 |
8875 | Parameters
8876 | ----------
8877 | threshold : numeric or array-like
8878 | Minimum value allowed. All values below threshold will be set to
8879 | this value.
8880 |
8881 | * float : every value is compared to `threshold`.
8882 | * array-like : The shape of `threshold` should match the object
8883 | it's compared to. When `self` is a Series, `threshold` should be
8884 | the same length. When `self` is a DataFrame, `threshold` should be
8885 | 2-D and the same shape as `self` for ``axis=None``, or 1-D and the
8886 | same length as the axis being compared.
8887 |
8888 | axis : {0 or 'index', 1 or 'columns'}, default 0
8889 | Align `self` with `threshold` along the given axis.
8890 |
8891 | inplace : boolean, default False
8892 | Whether to perform the operation in place on the data.
8893 |
8894 | .. versionadded:: 0.21.0
8895 |
8896 | See Also
8897 | --------
8898 | Series.clip : Return copy of input with values below and above
8899 | thresholds truncated.
8900 | Series.clip_upper : Return copy of input with values above
8901 | threshold truncated.
8902 |
8903 | Returns
8904 | -------
8905 | clipped : same type as input
8906 |
8907 | Examples
8908 | --------
8909 | Series single threshold clipping:
8910 |
8911 | >>> s = pd.Series([5, 6, 7, 8, 9])
8912 | >>> s.clip_lower(8)
8913 | 0 8
8914 | 1 8
8915 | 2 8
8916 | 3 8
8917 | 4 9
8918 | dtype: int64
8919 |
8920 | Series clipping element-wise using an array of thresholds. `threshold`
8921 | should be the same length as the Series.
8922 |
8923 | >>> elemwise_thresholds = [4, 8, 7, 2, 5]
8924 | >>> s.clip_lower(elemwise_thresholds)
8925 | 0 5
8926 | 1 8
8927 | 2 7
8928 | 3 8
8929 | 4 9
8930 | dtype: int64
8931 |
8932 | DataFrames can be compared to a scalar.
8933 |
8934 | >>> df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})
8935 | >>> df
8936 | A B
8937 | 0 1 2
8938 | 1 3 4
8939 | 2 5 6
8940 |
8941 | >>> df.clip_lower(3)
8942 | A B
8943 | 0 3 3
8944 | 1 3 4
8945 | 2 5 6
8946 |
8947 | Or to an array of values. By default, `threshold` should be the same
8948 | shape as the DataFrame.
8949 |
8950 | >>> df.clip_lower(np.array([[3, 4], [2, 2], [6, 2]]))
8951 | A B
8952 | 0 3 4
8953 | 1 3 4
8954 | 2 6 6
8955 |
8956 | Control how `threshold` is broadcast with `axis`. In this case
8957 | `threshold` should be the same length as the axis specified by
8958 | `axis`.
8959 |
8960 | >>> df.clip_lower(np.array([3, 3, 5]), axis='index')
8961 | A B
8962 | 0 3 3
8963 | 1 3 4
8964 | 2 5 6
8965 |
8966 | >>> df.clip_lower(np.array([4, 5]), axis='columns')
8967 | A B
8968 | 0 4 5
8969 | 1 4 5
8970 | 2 5 6
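One way to reason about the ``axis='index'`` case above is as an element-wise maximum between each column and the threshold vector. The sketch below checks that reading with NumPy. Because ``clip_lower`` is not available in every pandas release, it uses ``clip(lower=...)`` instead, which is assumed here to behave the same way for a lower bound.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})
thresholds = pd.Series([3, 3, 5])  # one threshold per row

# Lower-bound clipping, broadcasting the thresholds down the rows.
by_index = df.clip(lower=thresholds, axis="index")

# The same result written as a broadcast element-wise maximum.
expected = np.maximum(df.values, thresholds.values[:, None])
```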
8971 |
8972 | clip_upper(self, threshold, axis=None, inplace=False)
8973 | Return copy of input with values above given value(s) truncated.
8974 |
8975 | Parameters
8976 | ----------
8977 | threshold : float or array_like
8978 | axis : int or string axis name, optional
8979 | Align object with threshold along the given axis.
8980 | inplace : boolean, default False
8981 | Whether to perform the operation in place on the data
8982 |
8983 | .. versionadded:: 0.21.0
8984 |
8985 | See Also
8986 | --------
8987 | clip
8988 |
8989 | Returns
8990 | -------
8991 | clipped : same type as input
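This entry has no example, so here is a minimal sketch of upper-bound clipping with a scalar and with element-wise thresholds. It is written with ``clip(upper=...)``, on the assumption that this matches ``clip_upper`` for an upper bound and is available in more pandas releases.

```python
import pandas as pd

s = pd.Series([5, 6, 7, 8, 9])

# Scalar threshold: every value above 7 is set to 7.
capped = s.clip(upper=7)

# Element-wise thresholds, one per value of the Series.
limits = pd.Series([4, 8, 7, 2, 5])
capped_each = s.clip(upper=limits)
```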
8992 |
8993 | consolidate(self, inplace=False)
8994 | Compute NDFrame with "consolidated" internals (data of each dtype
8995 | grouped together in a single ndarray).
8996 |
8997 | .. deprecated:: 0.20.0
8998 | Consolidate will be an internal implementation only.
8999 |
9000 | convert_objects(self, convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)
9001 | Attempt to infer better dtype for object columns.
9002 |
9003 | .. deprecated:: 0.21.0
9004 |
9005 | Parameters
9006 | ----------
9007 | convert_dates : boolean, default True
9008 | If True, convert to date where possible. If 'coerce', force
9009 | conversion, with unconvertible values becoming NaT.
9010 | convert_numeric : boolean, default False
9011 | If True, attempt to coerce to numbers (including strings), with
9012 | unconvertible values becoming NaN.
9013 | convert_timedeltas : boolean, default True
9014 | If True, convert to timedelta where possible. If 'coerce', force
9015 | conversion, with unconvertible values becoming NaT.
9016 | copy : boolean, default True
9017 | If True, return a copy even if no copy is necessary (e.g. no
9018 | conversion was done). Note: This is meant for internal use, and
9019 | should not be confused with inplace.
9020 |
9021 | See Also
9022 | --------
9023 | pandas.to_datetime : Convert argument to datetime.
9024 | pandas.to_timedelta : Convert argument to timedelta.
9025 | pandas.to_numeric : Convert argument to a numeric type.
9027 |
9028 | Returns
9029 | -------
9030 | converted : same as input object
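Since this method is marked deprecated, the functions listed under See Also are the likely replacements. The sketch below shows the coercing behaviour (unconvertible values becoming ``NaN`` or ``NaT``) with ``pandas.to_numeric`` and ``pandas.to_datetime``. The frame and its column names are invented for this example.

```python
import pandas as pd

# Hypothetical raw data where each column contains one bad value.
raw = pd.DataFrame({
    "num": ["1", "2", "x"],
    "when": ["2018-01-01", "not a date", "2018-01-03"],
})

# errors="coerce" turns unconvertible entries into NaN / NaT.
numbers = pd.to_numeric(raw["num"], errors="coerce")
dates = pd.to_datetime(raw["when"], errors="coerce")
```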
9031 |
9032 | copy(self, deep=True)
9033 | Make a copy of this object's indices and data.
9034 |
9035 | When ``deep=True`` (default), a new object will be created with a
9036 | copy of the calling object's data and indices. Modifications to
9037 | the data or indices of the copy will not be reflected in the
9038 | original object (see notes below).
9039 |
9040 | When ``deep=False``, a new object will be created without copying
9041 | the calling object's data or index (only references to the data
9042 | and index are copied). Any changes to the data of the original
9043 | will be reflected in the shallow copy (and vice versa).
9044 |
9045 | Parameters
9046 | ----------
9047 | deep : bool, default True
9048 | Make a deep copy, including a copy of the data and the indices.
9049 | With ``deep=False`` neither the indices nor the data are copied.
9050 |
9051 | Returns
9052 | -------
9053 | copy : Series, DataFrame or Panel
9054 | Object type matches caller.
9055 |
9056 | Notes
9057 | -----
9058 | When ``deep=True``, data is copied but actual Python objects
9059 | will not be copied recursively, only the reference to the object.
9060 | This is in contrast to `copy.deepcopy` in the Standard Library,
9061 | which recursively copies object data (see examples below).
9062 |
9063 | While ``Index`` objects are copied when ``deep=True``, the underlying
9064 | numpy array is not copied for performance reasons. Since ``Index`` is
9065 | immutable, the underlying data can be safely shared and a copy
9066 | is not needed.
9067 |
9068 | Examples
9069 | --------
9070 | >>> s = pd.Series([1, 2], index=["a", "b"])
9071 | >>> s
9072 | a 1
9073 | b 2
9074 | dtype: int64
9075 |
9076 | >>> s_copy = s.copy()
9077 | >>> s_copy
9078 | a 1
9079 | b 2
9080 | dtype: int64
9081 |
9082 | **Shallow copy versus default (deep) copy:**
9083 |
9084 | >>> s = pd.Series([1, 2], index=["a", "b"])
9085 | >>> deep = s.copy()
9086 | >>> shallow = s.copy(deep=False)
9087 |
9088 | Shallow copy shares data and index with original.
9089 |
9090 | >>> s is shallow
9091 | False
9092 | >>> s.values is shallow.values and s.index is shallow.index
9093 | True
9094 |
9095 | Deep copy has own copy of data and index.
9096 |
9097 | >>> s is deep
9098 | False
9099 | >>> s.values is deep.values or s.index is deep.index
9100 | False
9101 |
9102 | Updates to the data shared by shallow copy and original are reflected
9103 | in both; deep copy remains unchanged.
9104 |
9105 | >>> s[0] = 3
9106 | >>> shallow[1] = 4
9107 | >>> s
9108 | a 3
9109 | b 4
9110 | dtype: int64
9111 | >>> shallow
9112 | a 3
9113 | b 4
9114 | dtype: int64
9115 | >>> deep
9116 | a 1
9117 | b 2
9118 | dtype: int64
9119 |
9120 | Note that when copying an object containing Python objects, a deep copy
9121 | will copy the data, but will not do so recursively. Updating a nested
9122 | data object will be reflected in the deep copy.
9123 |
9124 | >>> s = pd.Series([[1, 2], [3, 4]])
9125 | >>> deep = s.copy()
9126 | >>> s[0][0] = 10
9127 | >>> s
9128 | 0 [10, 2]
9129 | 1 [3, 4]
9130 | dtype: object
9131 | >>> deep
9132 | 0 [10, 2]
9133 | 1 [3, 4]
9134 | dtype: object
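If fully independent nested objects are needed, one possible workaround (not part of the ``copy`` method itself) is to deep-copy each element with the standard library. The sketch below contrasts that with the default deep copy.

```python
import copy

import pandas as pd

s = pd.Series([[1, 2], [3, 4]])

# Default deep copy: new Series, but the lists inside are the same objects.
deep = s.copy()

# Element-wise copy.deepcopy: every nested list is duplicated.
independent = s.map(copy.deepcopy)

# Mutate a nested list through the original Series.
s.iloc[0][0] = 10
```

After the mutation, ``deep`` sees the change while ``independent`` keeps the original values.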
9135 |
9136 | describe(self, percentiles=None, include=None, exclude=None)
9137 | Generates descriptive statistics that summarize the central tendency,
9138 | dispersion and shape of a dataset's distribution, excluding
9139 | ``NaN`` values.
9140 |
9141 | Analyzes both numeric and object series, as well
9142 | as ``DataFrame`` column sets of mixed data types. The output
9143 | will vary depending on what is provided. Refer to the notes
9144 | below for more detail.
9145 |
9146 | Parameters
9147 | ----------
9148 | percentiles : list-like of numbers, optional
9149 | The percentiles to include in the output. All should
9150 | fall between 0 and 1. The default is
9151 | ``[.25, .5, .75]``, which returns the 25th, 50th, and
9152 | 75th percentiles.
9153 | include : 'all', list-like of dtypes or None (default), optional
9154 | A white list of data types to include in the result. Ignored
9155 | for ``Series``. Here are the options:
9156 |
9157 | - 'all' : All columns of the input will be included in the output.
9158 | - A list-like of dtypes : Limits the results to the
9159 | provided data types.
9160 | To limit the result to numeric types submit
9161 | ``numpy.number``. To limit it instead to object columns submit
9162 | the ``numpy.object`` data type. Strings
9163 | can also be used in the style of
9164 | ``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
9165 | select pandas categorical columns, use ``'category'``
9166 | - None (default) : The result will include all numeric columns.
9167 | exclude : list-like of dtypes or None (default), optional
9168 | A black list of data types to omit from the result. Ignored
9169 | for ``Series``. Here are the options:
9170 |
9171 | - A list-like of dtypes : Excludes the provided data types
9172 | from the result. To exclude numeric types submit
9173 | ``numpy.number``. To exclude object columns submit the data
9174 | type ``numpy.object``. Strings can also be used in the style of
9175 | ``select_dtypes`` (e.g. ``df.describe(exclude=['O'])``). To
9176 | exclude pandas categorical columns, use ``'category'``
9177 | - None (default) : The result will exclude nothing.
9178 |
9179 | Returns
9180 | -------
9181 | summary : Series/DataFrame of summary statistics
9182 |
9183 | Notes
9184 | -----
9185 | For numeric data, the result's index will include ``count``,
9186 | ``mean``, ``std``, ``min``, ``max`` as well as lower, ``50`` and
9187 | upper percentiles. By default the lower percentile is ``25`` and the
9188 | upper percentile is ``75``. The ``50`` percentile is the
9189 | same as the median.
9190 |
9191 | For object data (e.g. strings or timestamps), the result's index
9192 | will include ``count``, ``unique``, ``top``, and ``freq``. The ``top``
9193 | is the most common value. The ``freq`` is the most common value's
9194 | frequency. Timestamps also include the ``first`` and ``last`` items.
9195 |
9196 | If multiple object values have the highest count, then the
9197 | ``top`` result will be arbitrarily chosen from among those
9198 | with the highest count.
9199 |
9200 | For mixed data types provided via a ``DataFrame``, the default is to
9201 | return only an analysis of numeric columns. If the dataframe consists
9202 | only of object and categorical data without any numeric columns, the
9203 | default is to return an analysis of both the object and categorical
9204 | columns. If ``include='all'`` is provided as an option, the result
9205 | will include a union of attributes of each type.
9206 |
9207 | The `include` and `exclude` parameters can be used to limit
9208 | which columns in a ``DataFrame`` are analyzed for the output.
9209 | The parameters are ignored when analyzing a ``Series``.
9210 |
9211 | Examples
9212 | --------
9213 | Describing a numeric ``Series``.
9214 |
9215 | >>> s = pd.Series([1, 2, 3])
9216 | >>> s.describe()
9217 | count 3.0
9218 | mean 2.0
9219 | std 1.0
9220 | min 1.0
9221 | 25% 1.5
9222 | 50% 2.0
9223 | 75% 2.5
9224 | max 3.0
9225 |
9226 | Describing an object ``Series`` of strings.
9227 |
9228 | >>> s = pd.Series(['a', 'a', 'b', 'c'])
9229 | >>> s.describe()
9230 | count 4
9231 | unique 3
9232 | top a
9233 | freq 2
9234 | dtype: object
9235 |
9236 | Describing a timestamp ``Series``.
9237 |
9238 | >>> s = pd.Series([
9239 | ... np.datetime64("2000-01-01"),
9240 | ... np.datetime64("2010-01-01"),
9241 | ... np.datetime64("2010-01-01")
9242 | ... ])
9243 | >>> s.describe()
9244 | count 3
9245 | unique 2
9246 | top 2010-01-01 00:00:00
9247 | freq 2
9248 | first 2000-01-01 00:00:00
9249 | last 2010-01-01 00:00:00
9250 | dtype: object
9251 |
9252 | Describing a ``DataFrame``. By default only numeric fields
9253 | are returned.
9254 |
9255 | >>> df = pd.DataFrame({ 'object': ['a', 'b', 'c'],
9256 | ... 'numeric': [1, 2, 3],
9257 | ... 'categorical': pd.Categorical(['d','e','f'])
9258 | ... })
9259 | >>> df.describe()
9260 | numeric
9261 | count 3.0
9262 | mean 2.0
9263 | std 1.0
9264 | min 1.0
9265 | 25% 1.5
9266 | 50% 2.0
9267 | 75% 2.5
9268 | max 3.0
9269 |
9270 | Describing all columns of a ``DataFrame`` regardless of data type.
9271 |
9272 | >>> df.describe(include='all')
9273 | categorical numeric object
9274 | count 3 3.0 3
9275 | unique 3 NaN 3
9276 | top f NaN c
9277 | freq 1 NaN 1
9278 | mean NaN 2.0 NaN
9279 | std NaN 1.0 NaN
9280 | min NaN 1.0 NaN
9281 | 25% NaN 1.5 NaN
9282 | 50% NaN 2.0 NaN
9283 | 75% NaN 2.5 NaN
9284 | max NaN 3.0 NaN
9285 |
9286 | Describing a column from a ``DataFrame`` by accessing it as
9287 | an attribute.
9288 |
9289 | >>> df.numeric.describe()
9290 | count 3.0
9291 | mean 2.0
9292 | std 1.0
9293 | min 1.0
9294 | 25% 1.5
9295 | 50% 2.0
9296 | 75% 2.5
9297 | max 3.0
9298 | Name: numeric, dtype: float64
9299 |
9300 | Including only numeric columns in a ``DataFrame`` description.
9301 |
9302 | >>> df.describe(include=[np.number])
9303 | numeric
9304 | count 3.0
9305 | mean 2.0
9306 | std 1.0
9307 | min 1.0
9308 | 25% 1.5
9309 | 50% 2.0
9310 | 75% 2.5
9311 | max 3.0
9312 |
9313 | Including only string columns in a ``DataFrame`` description.
9314 |
9315 | >>> df.describe(include=[np.object])
9316 | object
9317 | count 3
9318 | unique 3
9319 | top c
9320 | freq 1
9321 |
9322 | Including only categorical columns from a ``DataFrame`` description.
9323 |
9324 | >>> df.describe(include=['category'])
9325 | categorical
9326 | count 3
9327 | unique 3
9328 | top f
9329 | freq 1
9330 |
9331 | Excluding numeric columns from a ``DataFrame`` description.
9332 |
9333 | >>> df.describe(exclude=[np.number])
9334 | categorical object
9335 | count 3 3
9336 | unique 3 3
9337 | top f c
9338 | freq 1 1
9339 |
9340 | Excluding object columns from a ``DataFrame`` description.
9341 |
9342 | >>> df.describe(exclude=[np.object])
9343 | categorical numeric
9344 | count 3 3.0
9345 | unique 3 NaN
9346 | top f NaN
9347 | freq 1 NaN
9348 | mean NaN 2.0
9349 | std NaN 1.0
9350 | min NaN 1.0
9351 | 25% NaN 1.5
9352 | 50% NaN 2.0
9353 | 75% NaN 2.5
9354 | max NaN 3.0
9355 |
9356 | See Also
9357 | --------
9358 | DataFrame.count
9359 | DataFrame.max
9360 | DataFrame.min
9361 | DataFrame.mean
9362 | DataFrame.std
9363 | DataFrame.select_dtypes
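None of the examples above exercises the ``percentiles`` parameter. A small sketch with a made-up Series follows. The median is requested explicitly here rather than relying on it being added automatically.

```python
import pandas as pd

s = pd.Series(range(1, 11))  # the integers 1 through 10

# Ask for the 10th, 50th and 90th percentiles instead of the default quartiles.
summary = s.describe(percentiles=[0.1, 0.5, 0.9])
labels = list(summary.index)
```

The percentile rows are labelled with the percentage, e.g. ``10%``.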
9364 |
9365 | equals(self, other)
9366 | Determines if two NDFrame objects contain the same elements. NaNs in
9367 | the same location are considered equal.
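The treatment of NaN is what distinguishes ``equals`` from an element-wise ``==`` comparison. The following sketch, using two small made-up frames, shows the difference.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan]})
right = pd.DataFrame({"a": [1.0, np.nan]})

# Element-wise comparison: NaN != NaN, so not every cell compares equal.
elementwise_all = bool((left == right).all().all())

# equals treats NaNs in the same location as equal.
same = left.equals(right)
```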
9368 |
9369 | ffill(self, axis=None, inplace=False, limit=None, downcast=None)
9370 | Synonym for :meth:`DataFrame.fillna(method='ffill') <DataFrame.fillna>`
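A brief sketch of how ``ffill`` and ``bfill`` differ, and of what ``limit`` does, using a made-up Series with gaps:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

# Forward fill: propagate the last valid value; a leading NaN stays NaN.
forward = s.ffill()

# Backward fill: use the next valid value instead.
backward = s.bfill()

# limit=1 fills at most one consecutive NaN per gap.
limited = s.ffill(limit=1)
```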
9371 |
9372 | filter(self, items=None, like=None, regex=None, axis=None)
9373 | Subset rows or columns of dataframe according to labels in
9374 | the specified index.
9375 |
9376 | Note that this routine does not filter a dataframe on its
9377 | contents. The filter is applied to the labels of the index.
9378 |
9379 | Parameters
9380 | ----------
9381 | items : list-like
9382 | List of info axis to restrict to (not all of them need to be present)
9383 | like : string
9384 | Keep info axis where "like in col == True"
9385 | regex : string (regular expression)
9386 | Keep info axis with re.search(regex, col) == True
9387 | axis : int or string axis name
9388 | The axis to filter on. By default this is the info axis,
9389 | 'index' for Series, 'columns' for DataFrame
9390 |
9391 | Returns
9392 | -------
9393 | same type as input object
9394 |
9395 | Examples
9396 | --------
9397 | >>> df
9398 | one two three
9399 | mouse 1 2 3
9400 | rabbit 4 5 6
9401 |
9402 | >>> # select columns by name
9403 | >>> df.filter(items=['one', 'three'])
9404 | one three
9405 | mouse 1 3
9406 | rabbit 4 6
9407 |
9408 | >>> # select columns by regular expression
9409 | >>> df.filter(regex='e$', axis=1)
9410 | one three
9411 | mouse 1 3
9412 | rabbit 4 6
9413 |
9414 | >>> # select rows containing 'bbi'
9415 | >>> df.filter(like='bbi', axis=0)
9416 | one two three
9417 | rabbit 4 5 6
9418 |
9419 | See Also
9420 | --------
9421 | pandas.DataFrame.loc
9422 |
9423 | Notes
9424 | -----
9425 | The ``items``, ``like``, and ``regex`` parameters are
9426 | enforced to be mutually exclusive.
9427 |
9428 | ``axis`` defaults to the info axis that is used when indexing
9429 | with ``[]``.
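The examples above start from an existing ``df``. A self-contained sketch that builds a frame matching the printed output and runs the same three selections might look like this. The last call illustrates that items missing from the axis are skipped.

```python
import pandas as pd

# Reconstruct the frame shown in the examples.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["mouse", "rabbit"],
    columns=["one", "two", "three"],
)

by_items = df.filter(items=["one", "three"])   # exact column labels
by_regex = df.filter(regex="e$", axis=1)       # column labels ending in "e"
by_like = df.filter(like="bbi", axis=0)        # row labels containing "bbi"

# Labels that are not present ("absent" is a made-up name) are ignored.
missing_ok = df.filter(items=["one", "absent"])
```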
9430 |
9431 | first(self, offset)
9432 | Convenience method for subsetting initial periods of time series data
9433 | based on a date offset.
9434 |
9435 | Raises
9436 | ------
9437 | TypeError
9438 | If the index is not a :class:`DatetimeIndex`
9439 |
9440 | Parameters
9441 | ----------
9442 | offset : string, DateOffset, dateutil.relativedelta
9443 |
9444 | Examples
9445 | --------
9446 | >>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
9447 | >>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
9448 | >>> ts
9449 | A
9450 | 2018-04-09 1
9451 | 2018-04-11 2
9452 | 2018-04-13 3
9453 | 2018-04-15 4
9454 |
9455 | Get the rows for the first 3 days:
9456 |
9457 | >>> ts.first('3D')
9458 | A
9459 | 2018-04-09 1
9460 | 2018-04-11 2
9461 |
9462 | Notice that the data for the first 3 calendar days were returned, not
9463 | the first 3 days observed in the dataset, and therefore data for
9464 | 2018-04-13 was not returned.
9465 |
9466 | Returns
9467 | -------
9468 | subset : type of caller
9469 |
9470 | See Also
9471 | --------
9472 | last : Select final periods of time series based on a date offset
9473 | at_time : Select values at a particular time of the day
9474 | between_time : Select values between particular times of the day
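The calendar-based behaviour can also be written as an explicit mask on the index. This is a sketch of the idea for a fixed-length offset such as ``'3D'``, not a description of how pandas implements ``first``, and it may be useful where ``first`` is unavailable.

```python
import pandas as pd

idx = pd.to_datetime(["2018-04-09", "2018-04-11", "2018-04-13", "2018-04-15"])
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=idx)

# Everything strictly before (first timestamp + 3 calendar days).
cutoff = ts.index[0] + pd.Timedelta(days=3)
first_three_days = ts[ts.index < cutoff]
```

Only 2018-04-09 and 2018-04-11 fall before the cutoff of 2018-04-12.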
9475 |
9476 | first_valid_index(self)
9477 | Return index for first non-NA/null value.
9478 |
9479 | Notes
9480 | -----
9481 | If all elements are NA/null, returns None.
9482 | Also returns None for empty NDFrame.
9483 |
9484 | Returns
9485 | -------
9486 | scalar : type of index
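A short sketch of the three cases (a normal Series, an all-missing Series, and an empty one), using made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan], index=["a", "b", "c"])
first_label = s.first_valid_index()  # label of the first non-missing value

# No valid values at all: the result is None.
all_missing = pd.Series([np.nan, np.nan]).first_valid_index()

# An empty object also yields None.
empty = pd.Series([], dtype="float64").first_valid_index()
```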
9487 |
9488 | get(self, key, default=None)
9489 | Get item from object for given key (DataFrame column, Panel slice,
9490 | etc.). Returns default value if not found.
9491 |
9492 | Parameters
9493 | ----------
9494 | key : object
9495 |
9496 | Returns
9497 | -------
9498 | value : type of items contained in object
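``get`` mirrors ``dict.get``: a missing key returns the default rather than raising ``KeyError``. A minimal sketch with a hypothetical one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

present = df.get("a")                        # the column, as a Series
absent = df.get("b")                         # None, since "b" does not exist
fallback = df.get("b", default="missing")    # caller-supplied default
```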
9499 |
9500 | get_dtype_counts(self)
9501 | Return counts of unique dtypes in this object.
9502 |
9503 | Returns
9504 | -------
9505 | dtype : Series
9506 | Series with the count of columns with each dtype.
9507 |
9508 | See Also
9509 | --------
9510 | dtypes : Return the dtypes in this object.
9511 |
9512 | Examples
9513 | --------
9514 | >>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
9515 | >>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
9516 | >>> df
9517 | str int float
9518 | 0 a 1 1.0
9519 | 1 b 2 2.0
9520 | 2 c 3 3.0
9521 |
9522 | >>> df.get_dtype_counts()
9523 | float64 1
9524 | int64 1
9525 | object 1
9526 | dtype: int64
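A similar tally can be computed from ``dtypes`` directly, which may be helpful in pandas releases where ``get_dtype_counts`` is not available. This is a sketch of an alternative, not the method's implementation.

```python
import pandas as pd

a = [["a", 1, 1.0], ["b", 2, 2.0], ["c", 3, 3.0]]
df = pd.DataFrame(a, columns=["str", "int", "float"])

# Convert each column's dtype to its name, then count the names.
counts = df.dtypes.astype(str).value_counts()
```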
9527 |
9528 | get_ftype_counts(self)
9529 | Return counts of unique ftypes in this object.
9530 |
9531 | .. deprecated:: 0.23.0
9532 |
9533 | This is useful for SparseDataFrame or for DataFrames containing
9534 | sparse arrays.
9535 |
9536 | Returns
9537 | -------
9538 | dtype : Series
9539 | Series with the count of columns with each type and
9540 | sparsity (dense/sparse)
9541 |
9542 | See Also
9543 | --------
9544 | ftypes : Return ftypes (indication of sparse/dense and dtype) in
9545 | this object.
9546 |
9547 | Examples
9548 | --------
9549 | >>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
9550 | >>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
9551 | >>> df
9552 | str int float
9553 | 0 a 1 1.0
9554 | 1 b 2 2.0
9555 | 2 c 3 3.0
9556 |
9557 | >>> df.get_ftype_counts()
9558 | float64:dense 1
9559 | int64:dense 1
9560 | object:dense 1
9561 | dtype: int64
9562 |
9563 | get_values(self)
9564 | Return an ndarray after converting sparse values to dense.
9565 |
9566 | This is the same as ``.values`` for non-sparse data. For sparse
9567 | data contained in a `pandas.SparseArray`, the data are first
9568 | converted to a dense representation.
9569 |
9570 | Returns
9571 | -------
9572 | numpy.ndarray
9573 | Numpy representation of DataFrame
9574 |
9575 | See Also
9576 | --------
9577 | values : Numpy representation of DataFrame.
9578 | pandas.SparseArray : Container for sparse data.
9579 |
9580 | Examples
9581 | --------
9582 | >>> df = pd.DataFrame({'a': [1, 2], 'b': [True, False],
9583 | ... 'c': [1.0, 2.0]})
9584 | >>> df
9585 | a b c
9586 | 0 1 True 1.0
9587 | 1 2 False 2.0
9588 |
9589 | >>> df.get_values()
9590 | array([[1, True, 1.0], [2, False, 2.0]], dtype=object)
9591 |
9592 | >>> df = pd.DataFrame({"a": pd.SparseArray([1, None, None]),
9593 | ... "c": [1.0, 2.0, 3.0]})
9594 | >>> df
9595 | a c
9596 | 0 1.0 1.0
9597 | 1 NaN 2.0
9598 | 2 NaN 3.0
9599 |
9600 | >>> df.get_values()
9601 | array([[ 1., 1.],
9602 | [nan, 2.],
9603 | [nan, 3.]])
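For non-sparse data the text above says the result is the same as ``.values``. The sketch below uses ``.values`` to show how the resulting array dtype depends on the mix of column types, as seen in the first example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [True, False], "c": [1.0, 2.0]})

# Integer, boolean and float columns together fall back to object dtype.
mixed = df.values

# Integer and float columns alone share a common float64 dtype.
numeric_only = df[["a", "c"]].values
```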
9604 |
9605 | groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
9606 | Group the object using a mapper (a dict or key function) or by a series
9607 | of columns, so that a function can be applied to each group.
9608 |
9609 | Parameters
9610 | ----------
9611 | by : mapping, function, label, or list of labels
9612 | Used to determine the groups for the groupby.
9613 | If ``by`` is a function, it's called on each value of the object's
9614 | index. If a dict or Series is passed, the Series or dict VALUES
9615 | will be used to determine the groups (the Series' values are first
9616 | aligned; see ``.align()`` method). If an ndarray is passed, the
9617 | values are used as-is to determine the groups. A label or list of
9618 | labels may be passed to group by the columns in ``self``. Notice
9619 | that a tuple is interpreted as a (single) key.
9620 | axis : int, default 0
9621 | level : int, level name, or sequence of such, default None
9622 | If the axis is a MultiIndex (hierarchical), group by a particular
9623 | level or levels
9624 | as_index : boolean, default True
9625 | For aggregated output, return object with group labels as the
9626 | index. Only relevant for DataFrame input. as_index=False is
9627 | effectively "SQL-style" grouped output
9628 | sort : boolean, default True
9629 | Sort group keys. Get better performance by turning this off.
9630 | Note this does not influence the order of observations within each
9631 | group. groupby preserves the order of rows within each group.
9632 | group_keys : boolean, default True
9633 | When calling apply, add group keys to index to identify pieces
9634 | squeeze : boolean, default False
9635 | Reduce the dimensionality of the return type if possible,
9636 | otherwise return a consistent type
9637 | observed : boolean, default False
9638 | This only applies if any of the groupers are Categoricals.
9639 | If True: only show observed values for categorical groupers.
9640 | If False: show all values for categorical groupers.
9641 |
9642 | .. versionadded:: 0.23.0
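To make the ``observed`` parameter concrete, the sketch below groups by a categorical column that declares a category (``"b"``) which never occurs in the data. The frame and its column names are assumptions for this example.

```python
import pandas as pd

# Category "b" is declared but never observed.
key = pd.Categorical(["a", "a"], categories=["a", "b"])
df = pd.DataFrame({"key": key, "val": [1, 2]})

# observed=False: every declared category appears in the result.
all_cats = df.groupby("key", observed=False)["val"].sum()

# observed=True: only categories that occur in the data appear.
seen = df.groupby("key", observed=True)["val"].sum()
```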
9643 |
9644 | Returns
9645 | -------
9646 | GroupBy object
9647 |
9648 | Examples
9649 | --------
9650 | DataFrame results
9651 |
9652 | >>> data.groupby(func, axis=0).mean()
9653 | >>> data.groupby(['col1', 'col2'])['col3'].mean()
9654 |
9655 | DataFrame with hierarchical index
9656 |
9657 | >>> data.groupby(['col1', 'col2']).mean()
9658 |
9659 | Notes
9660 | -----
9661 | See the `user guide
9662 | <http://pandas.pydata.org/pandas-docs/stable/groupby.html>`_ for more.
9663 |
9664 | See Also
9665 | --------
9666 | resample : Convenience method for frequency conversion and resampling
9667 | of time series.
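The examples above assume an existing ``data`` frame and ``func``. A self-contained sketch with invented values might look like the following. The column names follow the examples, while the lambda key function and ``as_index=False`` call are additions for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "y", "x", "x"],
    "col3": [1.0, 2.0, 3.0, 5.0],
})

# Group by two columns: the result is indexed by a (col1, col2) MultiIndex.
by_two = data.groupby(["col1", "col2"])["col3"].mean()

# as_index=False keeps the group keys as ordinary columns ("SQL-style").
flat = data.groupby(["col1", "col2"], as_index=False).mean()

# A function passed as ``by`` is called on each index value.
by_func = data.set_index("col1").groupby(lambda label: label.upper())["col3"].sum()
```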
9668 |
9669 | head(self, n=5)
9670 | Return the first `n` rows.
9671 |
9672 | This function returns the first `n` rows for the object based
9673 | on position. It is useful for quickly testing if your object
9674 | has the right type of data in it.
9675 |
9676 | Parameters
9677 | ----------
9678 | n : int, default 5
9679 | Number of rows to select.
9680 |
9681 | Returns
9682 | -------
9683 | obj_head : type of caller
9684 | The first `n` rows of the caller object.
9685 |
9686 | See Also
9687 | --------
9688 | pandas.DataFrame.tail: Returns the last `n` rows.
9689 |
9690 | Examples
9691 | --------
9692 | >>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
9693 | ... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
9694 | >>> df
9695 | animal
9696 | 0 alligator
9697 | 1 bee
9698 | 2 falcon
9699 | 3 lion
9700 | 4 monkey
9701 | 5 parrot
9702 | 6 shark
9703 | 7 whale
9704 | 8 zebra
9705 |
9706 | Viewing the first 5 lines
9707 |
9708 | >>> df.head()
9709 | animal
9710 | 0 alligator
9711 | 1 bee
9712 | 2 falcon
9713 | 3 lion
9714 | 4 monkey
9715 |
9716 | Viewing the first `n` lines (three in this case)
9717 |
9718 | >>> df.head(3)
9719 | animal
9720 | 0 alligator
9721 | 1 bee
9722 | 2 falcon
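Because selection is by position, ``head(n)`` can be read as a positional slice. The sketch below, on a shortened version of the example frame, checks that reading and shows that asking for more rows than exist simply returns everything.

```python
import pandas as pd

df = pd.DataFrame({"animal": ["alligator", "bee", "falcon", "lion", "monkey"]})

top = df.head(2)
same = df.iloc[:2]        # positional slice covering the same rows

too_many = df.head(10)    # n larger than the number of rows
```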
9723 |
9724 | infer_objects(self)
9725 | Attempt to infer better dtypes for object columns.
9726 |
9727 | Attempts soft conversion of object-dtyped
9728 | columns, leaving non-object and unconvertible
9729 | columns unchanged. The inference rules are the
9730 | same as during normal Series/DataFrame construction.
9731 |
9732 | .. versionadded:: 0.21.0
9733 |
9734 | See Also
9735 | --------
9736 | pandas.to_datetime : Convert argument to datetime.
9737 | pandas.to_timedelta : Convert argument to timedelta.
9738 | pandas.to_numeric : Convert argument to a numeric type.
9739 |
9740 | Returns
9741 | -------
9742 | converted : same type as input object
9743 |
9744 | Examples
9745 | --------
9746 | >>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
9747 | >>> df = df.iloc[1:]
9748 | >>> df
9749 | A
9750 | 1 1
9751 | 2 2
9752 | 3 3
9753 |
9754 | >>> df.dtypes
9755 | A object
9756 | dtype: object
9757 |
9758 | >>> df.infer_objects().dtypes
9759 | A int64
9760 | dtype: object
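To illustrate the "unconvertible columns unchanged" part of the description, here is a small sketch with one column that becomes all-integer after slicing and one that stays mixed. The column `B` and the variable names are made up for this illustration.

```python
import pandas as pd

# Column "A" is all-integer after slicing; column "B" stays mixed.
df = pd.DataFrame({"A": ["a", 1, 2, 3], "B": ["x", 1, 2.5, "y"]}).iloc[1:]

converted = df.infer_objects()

dtype_a = str(converted["A"].dtype)   # a numeric dtype was inferred
dtype_b = str(converted["B"].dtype)   # mixed values are left as object
```

The conversion is "soft": nothing is coerced, so a column is only retyped when every value already fits the better dtype.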
9761 |
9762 | interpolate(self, method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)
9763 | Interpolate values according to different methods.
9764 |
9765 | Please note that only ``method='linear'`` is supported for
9766 | DataFrames/Series with a MultiIndex.
9767 |
9768 | Parameters
9769 | ----------
9770 | method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
9771 | 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
9772 | 'polynomial', 'spline', 'piecewise_polynomial',
9773 | 'from_derivatives', 'pchip', 'akima'}
9774 |
9775 | * 'linear': ignore the index and treat the values as equally
9776 | spaced. This is the default, and the only method supported on
9777 | MultiIndexes.
9778 | * 'time': works on daily and higher resolution data, interpolating
9779 | according to the length of each interval
9780 | * 'index', 'values': use the actual numerical values of the index
9781 | * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
9782 | 'barycentric', 'polynomial' are passed to
9783 | ``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
9784 | require that you also specify an `order` (int),
9785 | e.g. df.interpolate(method='polynomial', order=4).
9786 | These use the actual numerical values of the index.
9787 | * 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
9788 | are all wrappers around the scipy interpolation methods of
9789 | similar names. These use the actual numerical values of the
9790 | index. For more information on their behavior, see the
9791 | `scipy documentation
9792 | <http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
9793 | and `tutorial documentation
9794 | <http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__
9795 | * 'from_derivatives' refers to BPoly.from_derivatives which
9796 | replaces 'piecewise_polynomial' interpolation method in
9797 | scipy 0.18
9798 |
9799 | .. versionadded:: 0.18.1
9800 |
9801 | Added support for the 'akima' method
9802 | Added interpolate method 'from_derivatives' which replaces
9803 | 'piecewise_polynomial' in scipy 0.18; backwards-compatible with
9804 | scipy < 0.18
9805 |
9806 | axis : {0, 1}, default 0
9807 | * 0: fill column-by-column
9808 | * 1: fill row-by-row
9809 | limit : int, default None
9810 | Maximum number of consecutive NaNs to fill. Must be greater than 0.
9811 | limit_direction : {'forward', 'backward', 'both'}, default 'forward'
9812 | If limit is specified, consecutive NaNs will be filled in this
9813 | direction.
9814 | limit_area : {'inside', 'outside'}, default None
9815 | * None: (default) no fill restriction
9816 | * 'inside': only fill NaNs surrounded by valid values (interpolate).
9817 | * 'outside': only fill NaNs outside valid values (extrapolate).
9818 |
9819 | .. versionadded:: 0.21.0
9820 |
9821 | inplace : bool, default False
9822 | Update the NDFrame in place if possible.
9823 | downcast : optional, 'infer' or None, defaults to None
9824 | Downcast dtypes if possible.
9825 | kwargs : keyword arguments to pass on to the interpolating function.
9826 |
9827 | Returns
9828 | -------
9829 | Series or DataFrame of same shape interpolated at the NaNs
9830 |
9831 | See Also
9832 | --------
9833 | reindex, replace, fillna
9834 |
9835 | Examples
9836 | --------
9837 |
9838 | Filling in NaNs
9839 |
9840 | >>> s = pd.Series([0, 1, np.nan, 3])
9841 | >>> s.interpolate()
9842 | 0 0.0
9843 | 1 1.0
9844 | 2 2.0
9845 | 3 3.0
9846 | dtype: float64
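The `limit` and `limit_area` parameters are easier to follow on a series with leading, interior and trailing gaps. The following is a minimal sketch using the default linear method; the sample values are invented for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0, np.nan])

# Fill at most one consecutive NaN, working in the forward direction.
limited = s.interpolate(limit=1)

# Fill only NaNs that sit between two valid values.
inside = s.interpolate(limit_area='inside')
```

With `limit=1` only the first NaN of the interior run is filled, while `limit_area='inside'` fills the whole interior run and leaves the leading and trailing NaNs alone.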
9847 |
9848 | keys(self)
9849 | Get the 'info axis' (see Indexing for more)
9850 |
9851 | This is index for Series, columns for DataFrame and major_axis for
9852 | Panel.
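As a small illustration of the "info axis" for the two common containers, the sketch below calls `keys` on a Series and a DataFrame. The labels are arbitrary.

```python
import pandas as pd

s = pd.Series([10, 20], index=['x', 'y'])
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

series_keys = list(s.keys())   # the index labels
frame_keys = list(df.keys())   # the column labels
```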
9853 |
9854 | last(self, offset)
9855 | Convenience method for subsetting final periods of time series data
9856 | based on a date offset.
9857 |
9858 | Raises
9859 | ------
9860 | TypeError
9861 | If the index is not a :class:`DatetimeIndex`
9862 |
9863 | Parameters
9864 | ----------
9865 | offset : string, DateOffset, dateutil.relativedelta
9866 |
9867 | Examples
9868 | --------
9869 | >>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
9870 | >>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
9871 | >>> ts
9872 | A
9873 | 2018-04-09 1
9874 | 2018-04-11 2
9875 | 2018-04-13 3
9876 | 2018-04-15 4
9877 |
9878 | Get the rows for the last 3 days:
9879 |
9880 | >>> ts.last('3D')
9881 | A
9882 | 2018-04-13 3
9883 | 2018-04-15 4
9884 |
9885 | Notice that data for the last 3 calendar days were returned, not the
9886 | last 3 observed days in the dataset, and therefore data for 2018-04-11
9887 | was not returned.
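The difference between a calendar window and an observation window can be spelled out with an explicit comparison. This sketch does not call `last` itself; it rebuilds the same selection with a boolean filter, on the assumption that the window starts strictly after the last timestamp minus the offset, and contrasts it with `tail`.

```python
import pandas as pd

i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)

# Calendar-based window: keep rows later than (last timestamp - 3 days).
cutoff = ts.index[-1] - pd.Timedelta('3D')
last_three_days = ts.loc[ts.index > cutoff]

# Observation-based window: always the final three rows.
last_three_rows = ts.tail(3)
```

Here the calendar window holds two rows (2018-04-13 and 2018-04-15), whereas `tail(3)` also includes 2018-04-11.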
9888 |
9889 | Returns
9890 | -------
9891 | subset : type of caller
9892 |
9893 | See Also
9894 | --------
9895 | first : Select initial periods of time series based on a date offset
9896 | at_time : Select values at a particular time of the day
9897 | between_time : Select values between particular times of the day
9898 |
9899 | last_valid_index(self)
9900 | Return index for last non-NA/null value.
9901 |
9902 | Notes
9903 | -----
9904 | If all elements are NA/null, returns None.
9905 | Also returns None for empty NDFrame.
9906 |
9907 | Returns
9908 | -------
9909 | scalar : type of index
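The entry has no example, so here is a minimal sketch of the three cases mentioned in the notes: a normal series, an all-missing series and an empty one. The labels and values are invented.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan], index=['a', 'b', 'c', 'd'])
last_label = s.last_valid_index()          # label of the last non-NA value

all_missing = pd.Series([np.nan, np.nan])
no_label = all_missing.last_valid_index()  # nothing valid to report

empty = pd.Series([], dtype='float64')
empty_label = empty.last_valid_index()
```

Note that the return value is an index label, not a position, so it can be passed straight to `.loc`.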
9910 |
9911 | mask(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)
9912 | Return an object of same shape as self and whose corresponding
9913 | entries are from self where `cond` is False and otherwise are from
9914 | `other`.
9915 |
9916 | Parameters
9917 | ----------
9918 | cond : boolean NDFrame, array-like, or callable
9919 | Where `cond` is False, keep the original value. Where
9920 | True, replace with corresponding value from `other`.
9921 | If `cond` is callable, it is computed on the NDFrame and
9922 | should return boolean NDFrame or array. The callable must
9923 | not change input NDFrame (though pandas doesn't check it).
9924 |
9925 | .. versionadded:: 0.18.1
9926 | A callable can be used as cond.
9927 |
9928 | other : scalar, NDFrame, or callable
9929 | Entries where `cond` is True are replaced with
9930 | corresponding value from `other`.
9931 | If other is callable, it is computed on the NDFrame and
9932 | should return scalar or NDFrame. The callable must not
9933 | change input NDFrame (though pandas doesn't check it).
9934 |
9935 | .. versionadded:: 0.18.1
9936 | A callable can be used as other.
9937 |
9938 | inplace : boolean, default False
9939 | Whether to perform the operation in place on the data
9940 | axis : alignment axis if needed, default None
9941 | level : alignment level if needed, default None
9942 | errors : str, {'raise', 'ignore'}, default 'raise'
9943 | - ``raise`` : allow exceptions to be raised
9944 | - ``ignore`` : suppress exceptions. On error return original object
9945 |
9946 | Note that currently this parameter won't affect
9947 | the results and will always coerce to a suitable dtype.
9948 |
9949 | try_cast : boolean, default False
9950 | Try to cast the result back to the input type (if possible).
9951 | raise_on_error : boolean, default True
9952 | Whether to raise on invalid data types (e.g. calling ``where`` on
9953 | strings)
9954 |
9955 | .. deprecated:: 0.21.0
9956 |
9957 | Returns
9958 | -------
9959 | wh : same type as caller
9960 |
9961 | Notes
9962 | -----
9963 | The mask method is an application of the if-then idiom. For each
9964 | element in the calling DataFrame, if ``cond`` is ``False`` the
9965 | element is used; otherwise the corresponding element from the DataFrame
9966 | ``other`` is used.
9967 |
9968 | The signature for :func:`DataFrame.where` differs from
9969 | :func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
9970 | ``np.where(m, df1, df2)``.
9971 |
9972 | For further details and examples see the ``mask`` documentation in
9973 | :ref:`indexing <indexing.where_mask>`.
9974 |
9975 | Examples
9976 | --------
9977 | >>> s = pd.Series(range(5))
9978 | >>> s.where(s > 0)
9979 | 0 NaN
9980 | 1 1.0
9981 | 2 2.0
9982 | 3 3.0
9983 | 4 4.0
9984 |
9985 | >>> s.mask(s > 0)
9986 | 0 0.0
9987 | 1 NaN
9988 | 2 NaN
9989 | 3 NaN
9990 | 4 NaN
9991 |
9992 | >>> s.where(s > 1, 10)
9993 | 0 10.0
9994 | 1 10.0
9995 | 2 2.0
9996 | 3 3.0
9997 | 4 4.0
9998 |
9999 | >>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
10000 | >>> m = df % 3 == 0
10001 | >>> df.where(m, -df)
10002 | A B
10003 | 0 0 -1
10004 | 1 -2 3
10005 | 2 -4 -5
10006 | 3 6 -7
10007 | 4 -8 9
10008 | >>> df.where(m, -df) == np.where(m, df, -df)
10009 | A B
10010 | 0 True True
10011 | 1 True True
10012 | 2 True True
10013 | 3 True True
10014 | 4 True True
10015 | >>> df.where(m, -df) == df.mask(~m, -df)
10016 | A B
10017 | 0 True True
10018 | 1 True True
10019 | 2 True True
10020 | 3 True True
10021 | 4 True True
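The parameter descriptions mention that both `cond` and `other` may be callables, which the examples above do not show. A minimal sketch, with lambdas chosen only for illustration:

```python
import pandas as pd

s = pd.Series(range(5))

# Both `cond` and `other` may be callables evaluated on the Series itself.
masked = s.mask(lambda x: x > 2, lambda x: x * 10)

# `where` keeps values where the condition holds; `mask` is the inverse.
same = s.where(s <= 2, s * 10).equals(masked)
```

Callables are mainly useful in method chains, where the intermediate object has no name to refer to.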
10022 |
10023 | See Also
10024 | --------
10025 | :func:`DataFrame.where`
10026 |
10027 | pct_change(self, periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
10028 | Percentage change between the current and a prior element.
10029 |
10030 | Computes the percentage change from the immediately previous row by
10031 | default. This is useful in comparing the percentage of change in a time
10032 | series of elements.
10033 |
10034 | Parameters
10035 | ----------
10036 | periods : int, default 1
10037 | Periods to shift for forming percent change.
10038 | fill_method : str, default 'pad'
10039 | How to handle NAs before computing percent changes.
10040 | limit : int, default None
10041 | The number of consecutive NAs to fill before stopping.
10042 | freq : DateOffset, timedelta, or offset alias string, optional
10043 | Increment to use from time series API (e.g. 'M' or BDay()).
10044 | **kwargs
10045 | Additional keyword arguments are passed into
10046 | `DataFrame.shift` or `Series.shift`.
10047 |
10048 | Returns
10049 | -------
10050 | chg : Series or DataFrame
10051 | The same type as the calling object.
10052 |
10053 | See Also
10054 | --------
10055 | Series.diff : Compute the difference of two elements in a Series.
10056 | DataFrame.diff : Compute the difference of two elements in a DataFrame.
10057 | Series.shift : Shift the index by some number of periods.
10058 | DataFrame.shift : Shift the index by some number of periods.
10059 |
10060 | Examples
10061 | --------
10062 | **Series**
10063 |
10064 | >>> s = pd.Series([90, 91, 85])
10065 | >>> s
10066 | 0 90
10067 | 1 91
10068 | 2 85
10069 | dtype: int64
10070 |
10071 | >>> s.pct_change()
10072 | 0 NaN
10073 | 1 0.011111
10074 | 2 -0.065934
10075 | dtype: float64
10076 |
10077 | >>> s.pct_change(periods=2)
10078 | 0 NaN
10079 | 1 NaN
10080 | 2 -0.055556
10081 | dtype: float64
10082 |
10083 | Percentage change in a Series where NAs are filled by carrying the
10084 | last valid observation forward to the next valid one.
10085 |
10086 | >>> s = pd.Series([90, 91, None, 85])
10087 | >>> s
10088 | 0 90.0
10089 | 1 91.0
10090 | 2 NaN
10091 | 3 85.0
10092 | dtype: float64
10093 |
10094 | >>> s.pct_change(fill_method='ffill')
10095 | 0 NaN
10096 | 1 0.011111
10097 | 2 0.000000
10098 | 3 -0.065934
10099 | dtype: float64
10100 |
10101 | **DataFrame**
10102 |
10103 | Percentage change in French franc, Deutsche Mark, and Italian lira from
10104 | 1980-01-01 to 1980-03-01.
10105 |
10106 | >>> df = pd.DataFrame({
10107 | ... 'FR': [4.0405, 4.0963, 4.3149],
10108 | ... 'GR': [1.7246, 1.7482, 1.8519],
10109 | ... 'IT': [804.74, 810.01, 860.13]},
10110 | ... index=['1980-01-01', '1980-02-01', '1980-03-01'])
10111 | >>> df
10112 | FR GR IT
10113 | 1980-01-01 4.0405 1.7246 804.74
10114 | 1980-02-01 4.0963 1.7482 810.01
10115 | 1980-03-01 4.3149 1.8519 860.13
10116 |
10117 | >>> df.pct_change()
10118 | FR GR IT
10119 | 1980-01-01 NaN NaN NaN
10120 | 1980-02-01 0.013810 0.013684 0.006549
10121 | 1980-03-01 0.053365 0.059318 0.061876
10122 |
10123 | Percentage of change in GOOG and APPL stock volume. Shows computing
10124 | the percentage change between columns.
10125 |
10126 | >>> df = pd.DataFrame({
10127 | ... '2016': [1769950, 30586265],
10128 | ... '2015': [1500923, 40912316],
10129 | ... '2014': [1371819, 41403351]},
10130 | ... index=['GOOG', 'APPL'])
10131 | >>> df
10132 | 2016 2015 2014
10133 | GOOG 1769950 1500923 1371819
10134 | APPL 30586265 40912316 41403351
10135 |
10136 | >>> df.pct_change(axis='columns')
10137 | 2016 2015 2014
10138 | GOOG NaN -0.151997 -0.086016
10139 | APPL NaN 0.337604 0.012002
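Since `periods` is described in terms of a shift, the calculation can be written out by hand. The sketch below assumes the result is the ratio to the shifted value minus one, and checks that against `pct_change` on data without missing values.

```python
import numpy as np
import pandas as pd

s = pd.Series([90, 91, 85])

# Percentage change expressed directly through `shift`.
manual = s / s.shift(1) - 1
builtin = s.pct_change()

matches = np.allclose(manual, builtin, equal_nan=True)
```

The first element is NaN in both versions because there is no earlier element to compare against.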
10140 |
10141 | pipe(self, func, *args, **kwargs)
10142 | Apply func(self, *args, **kwargs).
10143 |
10144 | Parameters
10145 | ----------
10146 | func : function
10147 | function to apply to the NDFrame.
10148 | ``args``, and ``kwargs`` are passed into ``func``.
10149 | Alternatively a ``(callable, data_keyword)`` tuple where
10150 | ``data_keyword`` is a string indicating the keyword of
10151 | ``callable`` that expects the NDFrame.
10152 | args : iterable, optional
10153 | positional arguments passed into ``func``.
10154 | kwargs : mapping, optional
10155 | a dictionary of keyword arguments passed into ``func``.
10156 |
10157 | Returns
10158 | -------
10159 | object : the return type of ``func``.
10160 |
10161 | Notes
10162 | -----
10163 |
10164 | Use ``.pipe`` when chaining together functions that expect
10165 | Series, DataFrames or GroupBy objects. Instead of writing
10166 |
10167 | >>> f(g(h(df), arg1=a), arg2=b, arg3=c)
10168 |
10169 | You can write
10170 |
10171 | >>> (df.pipe(h)
10172 | ... .pipe(g, arg1=a)
10173 | ... .pipe(f, arg2=b, arg3=c)
10174 | ... )
10175 |
10176 | If you have a function that takes the data as (say) the second
10177 | argument, pass a tuple indicating which keyword expects the
10178 | data. For example, suppose ``f`` takes its data as ``arg2``:
10179 |
10180 | >>> (df.pipe(h)
10181 | ... .pipe(g, arg1=a)
10182 | ... .pipe((f, 'arg2'), arg1=a, arg3=c)
10183 | ... )
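The snippets above use placeholder names (`f`, `g`, `h`), so they cannot be run as written. Here is a runnable sketch of both calling styles; `add_column` and `scale` are hypothetical helpers defined only for this illustration.

```python
import pandas as pd

def add_column(df, name, value):
    """Return a copy of `df` with a constant column added."""
    return df.assign(**{name: value})

def scale(factor, data):
    """Multiply every value by `factor`; the data is the second argument."""
    return data * factor

df = pd.DataFrame({'a': [1, 2]})

# Plain form for a data-first function, tuple form for a data-second one.
result = (df.pipe(add_column, 'b', 10)
            .pipe((scale, 'data'), 2))
```

The tuple form passes the frame as the keyword `data`, so `2` lands in `factor`.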
10184 |
10185 | See Also
10186 | --------
10187 | pandas.DataFrame.apply
10188 | pandas.DataFrame.applymap
10189 | pandas.Series.map
10190 |
10191 | pop(self, item)
10192 | Return item and drop from frame. Raise KeyError if not found.
10193 |
10194 | Parameters
10195 | ----------
10196 | item : str
10197 | Column label to be popped
10198 |
10199 | Returns
10200 | -------
10201 | popped : Series
10202 |
10203 | Examples
10204 | --------
10205 | >>> df = pd.DataFrame([('falcon', 'bird', 389.0),
10206 | ... ('parrot', 'bird', 24.0),
10207 | ... ('lion', 'mammal', 80.5),
10208 | ... ('monkey', 'mammal', np.nan)],
10209 | ... columns=('name', 'class', 'max_speed'))
10210 | >>> df
10211 | name class max_speed
10212 | 0 falcon bird 389.0
10213 | 1 parrot bird 24.0
10214 | 2 lion mammal 80.5
10215 | 3 monkey mammal NaN
10216 |
10217 | >>> df.pop('class')
10218 | 0 bird
10219 | 1 bird
10220 | 2 mammal
10221 | 3 mammal
10222 | Name: class, dtype: object
10223 |
10224 | >>> df
10225 | name max_speed
10226 | 0 falcon 389.0
10227 | 1 parrot 24.0
10228 | 2 lion 80.5
10229 | 3 monkey NaN
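The `KeyError` mentioned in the summary is not shown in the example. A short sketch, using a cut-down version of the frame above:

```python
import pandas as pd

df = pd.DataFrame({'name': ['falcon', 'lion'], 'max_speed': [389.0, 80.5]})

speeds = df.pop('max_speed')      # removes the column from df in place
remaining = list(df.columns)

try:
    df.pop('max_speed')           # the column is already gone
    raised = False
except KeyError:
    raised = True
```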
10230 |
10231 | rank(self, axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
10232 | Compute numerical data ranks (1 through n) along axis. By default, equal
10233 | values are assigned a rank that is the average of the ranks of those values.
10234 |
10235 | Parameters
10236 | ----------
10237 | axis : {0 or 'index', 1 or 'columns'}, default 0
10238 | Axis along which to rank.
10239 | method : {'average', 'min', 'max', 'first', 'dense'}
10240 | * average: average rank of group
10241 | * min: lowest rank in group
10242 | * max: highest rank in group
10243 | * first: ranks assigned in order they appear in the array
10244 | * dense: like 'min', but rank always increases by 1 between groups
10245 | numeric_only : boolean, default None
10246 | Include only float, int, boolean data. Valid only for DataFrame or
10247 | Panel objects
10248 | na_option : {'keep', 'top', 'bottom'}
10249 | * keep: leave NA values where they are
10250 | * top: smallest rank if ascending
10251 | * bottom: smallest rank if descending
10252 | ascending : boolean, default True
10253 | False for ranks by high (1) to low (N)
10254 | pct : boolean, default False
10255 | Computes percentage rank of data
10256 |
10257 | Returns
10258 | -------
10259 | ranks : same type as caller
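The tie-breaking methods are easiest to compare side by side. This sketch ranks a small series containing one tie with each `method`, and once with `pct=True`; the values are arbitrary.

```python
import pandas as pd

s = pd.Series([7, 3, 7, 1, 9])

# One ranking per tie-breaking method, keyed by method name.
ranks = {method: s.rank(method=method).tolist()
         for method in ('average', 'min', 'max', 'first', 'dense')}

# Ranks expressed as a fraction of the number of values.
pct_ranks = s.rank(pct=True).tolist()
```

The two 7s occupy ranks 3 and 4, so they receive 3.5 under `average`, 3 under `min`, 4 under `max`, and 3 then 4 under `first`. Under `dense` the following value (9) is ranked 4 rather than 5.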
10260 |
10261 | reindex_like(self, other, method=None, copy=True, limit=None, tolerance=None)
10262 | Return an object with indices matching those of `other`.
10263 |
10264 | Parameters
10265 | ----------
10266 | other : Object
10267 | method : string or None
10268 | copy : boolean, default True
10269 | limit : int, default None
10270 | Maximum number of consecutive labels to fill for inexact matches.
10271 | tolerance : optional
10272 | Maximum distance between labels of the other object and this
10273 | object for inexact matches. Can be list-like.
10274 |
10275 | .. versionadded:: 0.21.0 (list-like tolerance)
10276 |
10277 | Notes
10278 | -----
10279 | Like calling s.reindex(index=other.index, columns=other.columns,
10280 | method=...)
10281 |
10282 | Returns
10283 | -------
10284 | reindexed : same as input
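The equivalence given in the notes can be checked directly. In this sketch the two frames and their labels are invented; `other` supplies only the labels, never the values.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['a', 'b', 'c'])
other = pd.DataFrame({'y': [0, 0, 0]}, index=['b', 'c', 'd'])

# Take the row and column labels from `other`, the values from `df`.
aligned = df.reindex_like(other)
explicit = df.reindex(index=other.index, columns=other.columns)
```

Row `d` does not exist in `df`, so it appears in the result filled with NaN.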
10285 |
10286 | rename_axis(self, mapper, axis=0, copy=True, inplace=False)
10287 | Alter the name of the index or columns.
10288 |
10289 | Parameters
10290 | ----------
10291 | mapper : scalar, list-like, optional
10292 | Value to set as the axis name attribute.
10293 | axis : {0 or 'index', 1 or 'columns'}, default 0
10294 | The index or the name of the axis.
10295 | copy : boolean, default True
10296 | Also copy underlying data.
10297 | inplace : boolean, default False
10298 | Modifies the object directly, instead of creating a new Series
10299 | or DataFrame.
10300 |
10301 | Returns
10302 | -------
10303 | renamed : Series, DataFrame, or None
10304 | The same type as the caller or None if `inplace` is True.
10305 |
10306 | Notes
10307 | -----
10308 | Prior to version 0.21.0, ``rename_axis`` could also be used to change
10309 | the axis *labels* by passing a mapping or scalar. This behavior is
10310 | deprecated and will be removed in a future version. Use ``rename``
10311 | instead.
10312 |
10313 | See Also
10314 | --------
10315 | pandas.Series.rename : Alter Series index labels or name
10316 | pandas.DataFrame.rename : Alter DataFrame index labels or name
10317 | pandas.Index.rename : Set new names on index
10318 |
10319 | Examples
10320 | --------
10321 | **Series**
10322 |
10323 | >>> s = pd.Series([1, 2, 3])
10324 | >>> s.rename_axis("foo")
10325 | foo
10326 | 0 1
10327 | 1 2
10328 | 2 3
10329 | dtype: int64
10330 |
10331 | **DataFrame**
10332 |
10333 | >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
10334 | >>> df.rename_axis("foo")
10335 | A B
10336 | foo
10337 | 0 1 4
10338 | 1 2 5
10339 | 2 3 6
10340 |
10341 | >>> df.rename_axis("bar", axis="columns")
10342 | bar A B
10343 | 0 1 4
10344 | 1 2 5
10345 | 2 3 6
10346 |
10347 | resample(self, rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
10348 | Convenience method for frequency conversion and resampling of time
10349 | series. Object must have a datetime-like index (DatetimeIndex,
10350 | PeriodIndex, or TimedeltaIndex), or pass datetime-like values
10351 | to the on or level keyword.
10352 |
10353 | Parameters
10354 | ----------
10355 | rule : string
10356 | the offset string or object representing target conversion
10357 | axis : int, optional, default 0
10358 | closed : {'right', 'left'}
10359 | Which side of bin interval is closed. The default is 'left'
10360 | for all frequency offsets except for 'M', 'A', 'Q', 'BM',
10361 | 'BA', 'BQ', and 'W' which all have a default of 'right'.
10362 | label : {'right', 'left'}
10363 | Which bin edge label to label bucket with. The default is 'left'
10364 | for all frequency offsets except for 'M', 'A', 'Q', 'BM',
10365 | 'BA', 'BQ', and 'W' which all have a default of 'right'.
10366 | convention : {'start', 'end', 's', 'e'}
10367 | For PeriodIndex only, controls whether to use the start or end of
10368 | `rule`
10369 | kind : {'timestamp', 'period'}, optional
10370 | Pass 'timestamp' to convert the resulting index to a
10371 | ``DatetimeIndex`` or 'period' to convert it to a ``PeriodIndex``.
10372 | By default the input representation is retained.
10373 | loffset : timedelta
10374 | Adjust the resampled time labels
10375 | base : int, default 0
10376 | For frequencies that evenly subdivide 1 day, the "origin" of the
10377 | aggregated intervals. For example, for '5min' frequency, base could
10378 | range from 0 through 4. Defaults to 0
10379 | on : string, optional
10380 | For a DataFrame, column to use instead of index for resampling.
10381 | Column must be datetime-like.
10382 |
10383 | .. versionadded:: 0.19.0
10384 |
10385 | level : string or int, optional
10386 | For a MultiIndex, level (name or number) to use for
10387 | resampling. Level must be datetime-like.
10388 |
10389 | .. versionadded:: 0.19.0
10390 |
10391 | Returns
10392 | -------
10393 | Resampler object
10394 |
10395 | Notes
10396 | -----
10397 | See the `user guide
10398 | <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling>`_
10399 | for more.
10400 |
10401 | To learn more about the offset strings, please see `this link
10402 | <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__.
10403 |
10404 | Examples
10405 | --------
10406 |
10407 | Start by creating a series with 9 one minute timestamps.
10408 |
10409 | >>> index = pd.date_range('1/1/2000', periods=9, freq='T')
10410 | >>> series = pd.Series(range(9), index=index)
10411 | >>> series
10412 | 2000-01-01 00:00:00 0
10413 | 2000-01-01 00:01:00 1
10414 | 2000-01-01 00:02:00 2
10415 | 2000-01-01 00:03:00 3
10416 | 2000-01-01 00:04:00 4
10417 | 2000-01-01 00:05:00 5
10418 | 2000-01-01 00:06:00 6
10419 | 2000-01-01 00:07:00 7
10420 | 2000-01-01 00:08:00 8
10421 | Freq: T, dtype: int64
10422 |
10423 | Downsample the series into 3 minute bins and sum the values
10424 | of the timestamps falling into a bin.
10425 |
10426 | >>> series.resample('3T').sum()
10427 | 2000-01-01 00:00:00 3
10428 | 2000-01-01 00:03:00 12
10429 | 2000-01-01 00:06:00 21
10430 | Freq: 3T, dtype: int64
10431 |
10432 | Downsample the series into 3 minute bins as above, but label each
10433 | bin using the right edge instead of the left. Please note that the
10434 | value in the bucket used as the label is not included in the bucket,
10435 | which it labels. For example, in the original series the
10436 | bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
10437 | value in the resampled bucket with the label ``2000-01-01 00:03:00``
10438 | does not include 3 (if it did, the summed value would be 6, not 3).
10439 | To include this value close the right side of the bin interval as
10440 | illustrated in the example below this one.
10441 |
10442 | >>> series.resample('3T', label='right').sum()
10443 | 2000-01-01 00:03:00 3
10444 | 2000-01-01 00:06:00 12
10445 | 2000-01-01 00:09:00 21
10446 | Freq: 3T, dtype: int64
10447 |
10448 | Downsample the series into 3 minute bins as above, but close the right
10449 | side of the bin interval.
10450 |
10451 | >>> series.resample('3T', label='right', closed='right').sum()
10452 | 2000-01-01 00:00:00 0
10453 | 2000-01-01 00:03:00 6
10454 | 2000-01-01 00:06:00 15
10455 | 2000-01-01 00:09:00 15
10456 | Freq: 3T, dtype: int64
10457 |
10458 | Upsample the series into 30 second bins.
10459 |
10460 | >>> series.resample('30S').asfreq()[0:5] #select first 5 rows
10461 | 2000-01-01 00:00:00 0.0
10462 | 2000-01-01 00:00:30 NaN
10463 | 2000-01-01 00:01:00 1.0
10464 | 2000-01-01 00:01:30 NaN
10465 | 2000-01-01 00:02:00 2.0
10466 | Freq: 30S, dtype: float64
10467 |
10468 | Upsample the series into 30 second bins and fill the ``NaN``
10469 | values using the ``pad`` method.
10470 |
10471 | >>> series.resample('30S').pad()[0:5]
10472 | 2000-01-01 00:00:00 0
10473 | 2000-01-01 00:00:30 0
10474 | 2000-01-01 00:01:00 1
10475 | 2000-01-01 00:01:30 1
10476 | 2000-01-01 00:02:00 2
10477 | Freq: 30S, dtype: int64
10478 |
10479 | Upsample the series into 30 second bins and fill the
10480 | ``NaN`` values using the ``bfill`` method.
10481 |
10482 | >>> series.resample('30S').bfill()[0:5]
10483 | 2000-01-01 00:00:00 0
10484 | 2000-01-01 00:00:30 1
10485 | 2000-01-01 00:01:00 1
10486 | 2000-01-01 00:01:30 2
10487 | 2000-01-01 00:02:00 2
10488 | Freq: 30S, dtype: int64
10489 |
10490 | Pass a custom function via ``apply``
10491 |
10492 | >>> def custom_resampler(array_like):
10493 | ... return np.sum(array_like)+5
10494 |
10495 | >>> series.resample('3T').apply(custom_resampler)
10496 | 2000-01-01 00:00:00 8
10497 | 2000-01-01 00:03:00 17
10498 | 2000-01-01 00:06:00 26
10499 | Freq: 3T, dtype: int64
10500 |
10501 | For a Series with a PeriodIndex, the keyword `convention` can be
10502 | used to control whether to use the start or end of `rule`.
10503 |
10504 | >>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
10505 | ... freq='A',
10506 | ... periods=2))
10507 | >>> s
10508 | 2012 1
10509 | 2013 2
10510 | Freq: A-DEC, dtype: int64
10511 |
10512 | Resample by month using 'start' `convention`. Values are assigned to
10513 | the first month of the period.
10514 |
10515 | >>> s.resample('M', convention='start').asfreq().head()
10516 | 2012-01 1.0
10517 | 2012-02 NaN
10518 | 2012-03 NaN
10519 | 2012-04 NaN
10520 | 2012-05 NaN
10521 | Freq: M, dtype: float64
10522 |
10523 | Resample by month using 'end' `convention`. Values are assigned to
10524 | the last month of the period.
10525 |
10526 | >>> s.resample('M', convention='end').asfreq()
10527 | 2012-12 1.0
10528 | 2013-01 NaN
10529 | 2013-02 NaN
10530 | 2013-03 NaN
10531 | 2013-04 NaN
10532 | 2013-05 NaN
10533 | 2013-06 NaN
10534 | 2013-07 NaN
10535 | 2013-08 NaN
10536 | 2013-09 NaN
10537 | 2013-10 NaN
10538 | 2013-11 NaN
10539 | 2013-12 2.0
10540 | Freq: M, dtype: float64
10541 |
10542 | For DataFrame objects, the keyword ``on`` can be used to specify the
10543 | column instead of the index for resampling.
10544 |
10545 | >>> df = pd.DataFrame(data=9*[range(4)], columns=['a', 'b', 'c', 'd'])
10546 | >>> df['time'] = pd.date_range('1/1/2000', periods=9, freq='T')
10547 | >>> df.resample('3T', on='time').sum()
10548 | a b c d
10549 | time
10550 | 2000-01-01 00:00:00 0 3 6 9
10551 | 2000-01-01 00:03:00 0 3 6 9
10552 | 2000-01-01 00:06:00 0 3 6 9
10553 |
10554 | For a DataFrame with MultiIndex, the keyword ``level`` can be used to
10555 | specify on which level the resampling needs to take place.
10556 |
10557 | >>> time = pd.date_range('1/1/2000', periods=5, freq='T')
10558 | >>> df2 = pd.DataFrame(data=10*[range(4)],
10559 | ... columns=['a', 'b', 'c', 'd'],
10560 | ... index=pd.MultiIndex.from_product([time, [1, 2]])
10561 | ... )
10562 | >>> df2.resample('3T', level=0).sum()
10563 | a b c d
10564 | 2000-01-01 00:00:00 0 6 12 18
10565 | 2000-01-01 00:03:00 0 4 8 12
10566 |
10567 | See Also
10568 | --------
10569 | groupby : Group by mapping, function, label, or list of labels.
10570 |
10571 | sample(self, n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
10572 | Return a random sample of items from an axis of object.
10573 |
10574 | You can use `random_state` for reproducibility.
10575 |
10576 | Parameters
10577 | ----------
10578 | n : int, optional
10579 | Number of items from axis to return. Cannot be used with `frac`.
10580 | Default = 1 if `frac` = None.
10581 | frac : float, optional
10582 | Fraction of axis items to return. Cannot be used with `n`.
10583 | replace : boolean, optional
10584 | Sample with or without replacement. Default = False.
10585 | weights : str or ndarray-like, optional
10586 | Default 'None' results in equal probability weighting.
10587 | If passed a Series, will align with target object on index. Index
10588 | values in weights not found in sampled object will be ignored and
10589 | index values in sampled object not in weights will be assigned
10590 | weights of zero.
10591 | If called on a DataFrame, will accept the name of a column
10592 | when axis = 0.
10593 | Unless weights are a Series, weights must be same length as axis
10594 | being sampled.
10595 | If weights do not sum to 1, they will be normalized to sum to 1.
10596 | Missing values in the weights column will be treated as zero.
10597 | inf and -inf values not allowed.
10598 | random_state : int or numpy.random.RandomState, optional
10599 | Seed for the random number generator (if int), or numpy RandomState
10600 | object.
10601 | axis : int or string, optional
10602 | Axis to sample. Accepts axis number or name. Default is stat axis
10603 | for given data type (0 for Series and DataFrames, 1 for Panels).
10604 |
10605 | Returns
10606 | -------
10607 | A new object of same type as caller.
10608 |
10609 | Examples
10610 | --------
10611 | Generate an example ``Series`` and ``DataFrame``:
10612 |
10613 | >>> s = pd.Series(np.random.randn(50))
10614 | >>> s.head()
10615 | 0 -0.038497
10616 | 1 1.820773
10617 | 2 -0.972766
10618 | 3 -1.598270
10619 | 4 -1.095526
10620 | dtype: float64
10621 | >>> df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
10622 | >>> df.head()
10623 | A B C D
10624 | 0 0.016443 -2.318952 -0.566372 -1.028078
10625 | 1 -1.051921 0.438836 0.658280 -0.175797
10626 | 2 -1.243569 -0.364626 -0.215065 0.057736
10627 | 3 1.768216 0.404512 -0.385604 -1.457834
10628 | 4 1.072446 -1.137172 0.314194 -0.046661
10629 |
10630 | Next extract a random sample from both of these objects...
10631 |
10632 | 3 random elements from the ``Series``:
10633 |
10634 | >>> s.sample(n=3)
10635 | 27 -0.994689
10636 | 55 -1.049016
10637 | 67 -0.224565
10638 | dtype: float64
10639 |
10640 | And a random 10% of the ``DataFrame`` with replacement:
10641 |
10642 | >>> df.sample(frac=0.1, replace=True)
10643 | A B C D
10644 | 35 1.981780 0.142106 1.817165 -0.290805
10645 | 49 -1.336199 -0.448634 -0.789640 0.217116
10646 | 40 0.823173 -0.078816 1.009536 1.015108
10647 | 15 1.421154 -0.055301 -1.922594 -0.019696
10648 | 6 -0.148339 0.832938 1.787600 -1.383767
10649 |
10650 | You can use `random_state` for reproducibility:
10651 |
10652 | >>> df.sample(n=5, random_state=1)
10653 | A B C D
10654 | 37 -2.027662 0.103611 0.237496 -0.165867
10655 | 43 -0.259323 -0.583426 1.516140 -0.479118
10656 | 12 -1.686325 -0.579510 0.985195 -0.460286
10657 | 8 1.167946 0.429082 1.215742 -1.636041
10658 | 9 1.197475 -0.864188 1.554031 -1.505264
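The `weights` rules in the parameter list are not exercised by the examples above. The sketch below uses a hypothetical weight column `w` in which two rows have zero weight, and repeats the draw with the same seed.

```python
import pandas as pd

df = pd.DataFrame({'x': range(5), 'w': [0, 0, 1, 1, 2]})

# Rows with zero weight are never drawn.
picked = df.sample(n=3, weights='w', random_state=0)

# The same seed yields the same sample.
again = df.sample(n=3, weights='w', random_state=0)
```

The weights sum to 4 rather than 1, which is accepted because they are normalized before sampling.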
10659 |
10660 | select(self, crit, axis=0)
10661 | Return data corresponding to axis labels matching criteria
10662 |
10663 | .. deprecated:: 0.21.0
10664 | Use df.loc[df.index.map(crit)] to select via labels
10665 |
10666 | Parameters
10667 | ----------
10668 | crit : function
10669 | To be called on each index (label). Should return True or False
10670 | axis : int
10671 |
10672 | Returns
10673 | -------
10674 | selection : type of caller
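Because `select` is deprecated, a short sketch of the replacement suggested in the deprecation note may help. The frame and the criterion `crit` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'v': [1, 2, 3]}, index=['apple', 'bob', 'avocado'])

def crit(label):
    """Hypothetical criterion: keep labels starting with 'a'."""
    return label.startswith('a')

# Map the criterion over the labels and use the result as a boolean filter.
selected = df.loc[df.index.map(crit)]
```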
10675 |
10676 | set_axis(self, labels, axis=0, inplace=None)
10677 | Assign desired index to given axis.
10678 |
10679 | Indexes for column or row labels can be changed by assigning
10680 | a list-like or Index.
10681 |
10682 | .. versionchanged:: 0.21.0
10683 |
10684 | The signature is now `labels` and `axis`, consistent with
10685 | the rest of pandas API. Previously, the `axis` and `labels`
10686 | arguments were respectively the first and second positional
10687 | arguments.
10688 |
10689 | Parameters
10690 | ----------
10691 | labels : list-like, Index
10692 | The values for the new index.
10693 |
10694 | axis : {0 or 'index', 1 or 'columns'}, default 0
10695 | The axis to update. The value 0 identifies the rows, and 1
10696 | identifies the columns.
10697 |
10698 | inplace : boolean, default None
10699 | Whether to modify the object in place instead of returning a new one.
10700 |
10701 | .. warning::
10702 |
10703 | ``inplace=None`` currently falls back to True, but in a
10704 | future version, will default to False. Use inplace=True
10705 | explicitly rather than relying on the default.
10706 |
10707 | Returns
10708 | -------
10709 | renamed : type of caller or None
10710 | An object of same type as caller if inplace=False, None otherwise.
10711 |
10712 | See Also
10713 | --------
10714 | pandas.DataFrame.rename_axis : Alter the name of the index or columns.
10715 |
10716 | Examples
10717 | --------
10718 | **Series**
10719 |
10720 | >>> s = pd.Series([1, 2, 3])
10721 | >>> s
10722 | 0 1
10723 | 1 2
10724 | 2 3
10725 | dtype: int64
10726 |
10727 | >>> s.set_axis(['a', 'b', 'c'], axis=0, inplace=False)
10728 | a 1
10729 | b 2
10730 | c 3
10731 | dtype: int64
10732 |
10733 | The original object is not modified.
10734 |
10735 | >>> s
10736 | 0 1
10737 | 1 2
10738 | 2 3
10739 | dtype: int64
10740 |
10741 | **DataFrame**
10742 |
10743 | >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
10744 |
10745 | Change the row labels.
10746 |
10747 | >>> df.set_axis(['a', 'b', 'c'], axis='index', inplace=False)
10748 | A B
10749 | a 1 4
10750 | b 2 5
10751 | c 3 6
10752 |
10753 | Change the column labels.
10754 |
10755 | >>> df.set_axis(['I', 'II'], axis='columns', inplace=False)
10756 | I II
10757 | 0 1 4
10758 | 1 2 5
10759 | 2 3 6
10760 |
10761 | Now, update the labels inplace.
10762 |
10763 | >>> df.set_axis(['i', 'ii'], axis='columns', inplace=True)
10764 | >>> df
10765 | i ii
10766 | 0 1 4
10767 | 1 2 5
10768 | 2 3 6
10769 |
10770 | slice_shift(self, periods=1, axis=0)
10771 | Equivalent to `shift` without copying data. The shifted data will
10772 | not include the dropped periods and the shifted axis will be smaller
10773 | than the original.
10774 |
10775 | Parameters
10776 | ----------
10777 | periods : int
10778 | Number of periods to move, can be positive or negative
10779 |
10780 | Notes
10781 | -----
10782 | While the `slice_shift` is faster than `shift`, you may pay for it
10783 | later during alignment.
10784 |
10785 | Returns
10786 | -------
10787 | shifted : same type as caller
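To make the `slice_shift` description concrete, the sketch below reproduces by hand what the text describes for `periods=1`: the result is one row shorter, and the remaining values are paired with the later labels. It uses only `iloc`, index slicing and `shift` rather than calling `slice_shift` itself, and the frame is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30]}, index=[0, 1, 2])

# Manual equivalent of a one-period slice shift: keep the first n-1 rows of
# data and relabel them with the last n-1 index labels.
manual = df.iloc[:-1].copy()
manual.index = df.index[1:]

# A regular shift keeps the full index and fills the gap with NaN.
regular = df.shift(1)

print(manual)
print(regular)
```

The shorter index is where the alignment cost mentioned in the Notes can show up: combining `manual` with the original frame aligns on labels 1 and 2 only.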
10788 |
10789 | squeeze(self, axis=None)
10790 | Squeeze length 1 dimensions.
10791 |
10792 | Parameters
10793 | ----------
10794 | axis : None, integer or string axis name, optional
10795 | The axis to squeeze if 1-sized.
10796 |
10797 | .. versionadded:: 0.20.0
10798 |
10799 | Returns
10800 | -------
10801 | scalar if 1-sized, else original object
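A short sketch of how `squeeze` behaves for different shapes may help here. The frame below is illustrative; the three cases show a one-column frame, a single cell, and a frame with no length-1 axis.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A single column squeezes to a Series when the column axis is targeted.
one_col = df[["a"]].squeeze(axis="columns")

# A 1x1 frame squeezes all the way down to a scalar.
one_cell = df.loc[[0], ["a"]].squeeze()

# No axis has length 1 here, so the frame keeps its shape.
unchanged = df.squeeze()

print(type(one_col).__name__, one_cell, unchanged.shape)
```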
10802 |
10803 | swapaxes(self, axis1, axis2, copy=True)
10804 | Interchange axes and swap values axes appropriately
10805 |
10806 | Returns
10807 | -------
10808 | y : same as input
10809 |
10810 | tail(self, n=5)
10811 | Return the last `n` rows.
10812 |
10813 | This function returns last `n` rows from the object based on
10814 | position. It is useful for quickly verifying data, for example,
10815 | after sorting or appending rows.
10816 |
10817 | Parameters
10818 | ----------
10819 | n : int, default 5
10820 | Number of rows to select.
10821 |
10822 | Returns
10823 | -------
10824 | type of caller
10825 | The last `n` rows of the caller object.
10826 |
10827 | See Also
10828 | --------
10829 | pandas.DataFrame.head : The first `n` rows of the caller object.
10830 |
10831 | Examples
10832 | --------
10833 | >>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
10834 | ... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
10835 | >>> df
10836 | animal
10837 | 0 alligator
10838 | 1 bee
10839 | 2 falcon
10840 | 3 lion
10841 | 4 monkey
10842 | 5 parrot
10843 | 6 shark
10844 | 7 whale
10845 | 8 zebra
10846 |
10847 | Viewing the last 5 lines
10848 |
10849 | >>> df.tail()
10850 | animal
10851 | 4 monkey
10852 | 5 parrot
10853 | 6 shark
10854 | 7 whale
10855 | 8 zebra
10856 |
10857 | Viewing the last `n` lines (three in this case)
10858 |
10859 | >>> df.tail(3)
10860 | animal
10861 | 6 shark
10862 | 7 whale
10863 | 8 zebra
10864 |
10865 | take(self, indices, axis=0, convert=None, is_copy=True, **kwargs)
10866 | Return the elements in the given *positional* indices along an axis.
10867 |
10868 | This means that we are not indexing according to actual values in
10869 | the index attribute of the object. We are indexing according to the
10870 | actual position of the element in the object.
10871 |
10872 | Parameters
10873 | ----------
10874 | indices : array-like
10875 | An array of ints indicating which positions to take.
10876 | axis : {0 or 'index', 1 or 'columns', None}, default 0
10877 | The axis on which to select elements. ``0`` means that we are
10878 | selecting rows, ``1`` means that we are selecting columns.
10879 | convert : bool, default True
10880 | Whether to convert negative indices into positive ones.
10881 | For example, ``-1`` would map to the ``len(axis) - 1``.
10882 | The conversions are similar to the behavior of indexing a
10883 | regular Python list.
10884 |
10885 | .. deprecated:: 0.21.0
10886 | In the future, negative indices will always be converted.
10887 |
10888 | is_copy : bool, default True
10889 | Whether to return a copy of the original object or not.
10890 | **kwargs
10891 | For compatibility with :meth:`numpy.take`. Has no effect on the
10892 | output.
10893 |
10894 | Returns
10895 | -------
10896 | taken : type of caller
10897 | An array-like containing the elements taken from the object.
10898 |
10899 | See Also
10900 | --------
10901 | DataFrame.loc : Select a subset of a DataFrame by labels.
10902 | DataFrame.iloc : Select a subset of a DataFrame by positions.
10903 | numpy.take : Take elements from an array along an axis.
10904 |
10905 | Examples
10906 | --------
10907 | >>> df = pd.DataFrame([('falcon', 'bird', 389.0),
10908 | ... ('parrot', 'bird', 24.0),
10909 | ... ('lion', 'mammal', 80.5),
10910 | ... ('monkey', 'mammal', np.nan)],
10911 | ... columns=['name', 'class', 'max_speed'],
10912 | ... index=[0, 2, 3, 1])
10913 | >>> df
10914 | name class max_speed
10915 | 0 falcon bird 389.0
10916 | 2 parrot bird 24.0
10917 | 3 lion mammal 80.5
10918 | 1 monkey mammal NaN
10919 |
10920 | Take elements at positions 0 and 3 along the axis 0 (default).
10921 |
10922 | Note how the actual indices selected (0 and 1) do not correspond to
10923 | our selected indices 0 and 3. That's because we are selecting the 0th
10924 | and 3rd rows, not rows whose indices equal 0 and 3.
10925 |
10926 | >>> df.take([0, 3])
10927 | name class max_speed
10928 | 0 falcon bird 389.0
10929 | 1 monkey mammal NaN
10930 |
10931 | Take elements at indices 1 and 2 along the axis 1 (column selection).
10932 |
10933 | >>> df.take([1, 2], axis=1)
10934 | class max_speed
10935 | 0 bird 389.0
10936 | 2 bird 24.0
10937 | 3 mammal 80.5
10938 | 1 mammal NaN
10939 |
10940 | We may take elements using negative integers for positive indices,
10941 | starting from the end of the object, just like with Python lists.
10942 |
10943 | >>> df.take([-1, -2])
10944 | name class max_speed
10945 | 1 monkey mammal NaN
10946 | 3 lion mammal 80.5
10947 |
10948 | to_clipboard(self, excel=True, sep=None, **kwargs)
10949 | Copy object to the system clipboard.
10950 |
10951 | Write a text representation of object to the system clipboard.
10952 | This can be pasted into Excel, for example.
10953 |
10954 | Parameters
10955 | ----------
10956 | excel : bool, default True
10957 | - True, use the provided separator, writing in a csv format for
10958 | allowing easy pasting into excel.
10959 | - False, write a string representation of the object to the
10960 | clipboard.
10961 |
10962 | sep : str, default ``'\t'``
10963 | Field delimiter.
10964 | **kwargs
10965 | These parameters will be passed to DataFrame.to_csv.
10966 |
10967 | See Also
10968 | --------
10969 | DataFrame.to_csv : Write a DataFrame to a comma-separated values
10970 | (csv) file.
10971 | read_clipboard : Read text from clipboard and pass to read_table.
10972 |
10973 | Notes
10974 | -----
10975 | Requirements for your platform.
10976 |
10977 | - Linux : `xclip`, or `xsel` (with `gtk` or `PyQt4` modules)
10978 | - Windows : none
10979 | - OS X : none
10980 |
10981 | Examples
10982 | --------
10983 | Copy the contents of a DataFrame to the clipboard.
10984 |
10985 | >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
10986 | >>> df.to_clipboard(sep=',')
10987 | ... # Wrote the following to the system clipboard:
10988 | ... # ,A,B,C
10989 | ... # 0,1,2,3
10990 | ... # 1,4,5,6
10991 |
10992 | We can omit the index by passing the keyword `index` and setting
10993 | it to False.
10994 |
10995 | >>> df.to_clipboard(sep=',', index=False)
10996 | ... # Wrote the following to the system clipboard:
10997 | ... # A,B,C
10998 | ... # 1,2,3
10999 | ... # 4,5,6
11000 |
11001 | to_dense(self)
11002 | Return dense representation of NDFrame (as opposed to sparse)
11003 |
11004 | to_hdf(self, path_or_buf, key, **kwargs)
11005 | Write the contained data to an HDF5 file using HDFStore.
11006 |
11007 | Hierarchical Data Format (HDF) is self-describing, allowing an
11008 | application to interpret the structure and contents of a file with
11009 | no outside information. One HDF file can hold a mix of related objects
11010 | which can be accessed as a group or as individual objects.
11011 |
11012 | In order to add another DataFrame or Series to an existing HDF file
11013 | please use append mode and a different key.
11014 |
11015 | For more information see the :ref:`user guide <io.hdf5>`.
11016 |
11017 | Parameters
11018 | ----------
11019 | path_or_buf : str or pandas.HDFStore
11020 | File path or HDFStore object.
11021 | key : str
11022 | Identifier for the group in the store.
11023 | mode : {'a', 'w', 'r+'}, default 'a'
11024 | Mode to open file:
11025 |
11026 | - 'w': write, a new file is created (an existing file with
11027 | the same name would be deleted).
11028 | - 'a': append, an existing file is opened for reading and
11029 | writing, and if the file does not exist it is created.
11030 | - 'r+': similar to 'a', but the file must already exist.
11031 | format : {'fixed', 'table'}, default 'fixed'
11032 | Possible values:
11033 |
11034 | - 'fixed': Fixed format. Fast writing/reading. Not-appendable,
11035 | nor searchable.
11036 | - 'table': Table format. Write as a PyTables Table structure
11037 | which may perform worse but allow more flexible operations
11038 | like searching / selecting subsets of the data.
11039 | append : bool, default False
11040 | For Table formats, append the input data to the existing data.
11041 | data_columns : list of columns or True, optional
11042 | List of columns to create as indexed data columns for on-disk
11043 | queries, or True to use all columns. By default only the axes
11044 | of the object are indexed. See :ref:`io.hdf5-query-data-columns`.
11045 | Applicable only to format='table'.
11046 | complevel : {0-9}, optional
11047 | Specifies a compression level for data.
11048 | A value of 0 disables compression.
11049 | complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
11050 | Specifies the compression library to be used.
11051 | As of v0.20.2 these additional compressors for Blosc are supported
11052 | (default if no compressor specified: 'blosc:blosclz'):
11053 | {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
11054 | 'blosc:zlib', 'blosc:zstd'}.
11055 | Specifying a compression library which is not available issues
11056 | a ValueError.
11057 | fletcher32 : bool, default False
11058 | If applying compression use the fletcher32 checksum.
11059 | dropna : bool, default False
11060 | If True, rows in which ALL values are NaN are not written to the store.
11061 | errors : str, default 'strict'
11062 | Specifies how encoding and decoding errors are to be handled.
11063 | See the errors argument for :func:`open` for a full list
11064 | of options.
11065 |
11066 | See Also
11067 | --------
11068 | read_hdf : Read from HDF file.
11069 | DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
11070 | DataFrame.to_sql : Write to a sql table.
11071 | DataFrame.to_feather : Write out feather-format for DataFrames.
11072 | DataFrame.to_csv : Write out to a csv file.
11073 |
11074 | Examples
11075 | --------
11076 | >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
11077 | ... index=['a', 'b', 'c'])
11078 | >>> df.to_hdf('data.h5', key='df', mode='w')
11079 |
11080 | We can add another object to the same file:
11081 |
11082 | >>> s = pd.Series([1, 2, 3, 4])
11083 | >>> s.to_hdf('data.h5', key='s')
11084 |
11085 | Reading from HDF file:
11086 |
11087 | >>> pd.read_hdf('data.h5', 'df')
11088 | A B
11089 | a 1 4
11090 | b 2 5
11091 | c 3 6
11092 | >>> pd.read_hdf('data.h5', 's')
11093 | 0 1
11094 | 1 2
11095 | 2 3
11096 | 3 4
11097 | dtype: int64
11098 |
11099 | Deleting file with data:
11100 |
11101 | >>> import os
11102 | >>> os.remove('data.h5')
11103 |
11104 | to_json(self, path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression=None, index=True)
11105 | Convert the object to a JSON string.
11106 |
11107 | Note that NaN and None values are converted to null, and datetime
11108 | objects are converted to UNIX timestamps.
11109 |
11110 | Parameters
11111 | ----------
11112 | path_or_buf : string or file handle, optional
11113 | File path or object. If not specified, the result is returned as
11114 | a string.
11115 | orient : string
11116 | Indication of expected JSON string format.
11117 |
11118 | * Series
11119 |
11120 | - default is 'index'
11121 | - allowed values are: {'split','records','index'}
11122 |
11123 | * DataFrame
11124 |
11125 | - default is 'columns'
11126 | - allowed values are:
11127 | {'split','records','index','columns','values'}
11128 |
11129 | * The format of the JSON string
11130 |
11131 | - 'split' : dict like {'index' -> [index],
11132 | 'columns' -> [columns], 'data' -> [values]}
11133 | - 'records' : list like
11134 | [{column -> value}, ... , {column -> value}]
11135 | - 'index' : dict like {index -> {column -> value}}
11136 | - 'columns' : dict like {column -> {index -> value}}
11137 | - 'values' : just the values array
11138 | - 'table' : dict like {'schema': {schema}, 'data': {data}}
11139 | describing the data, and the data component is
11140 | like ``orient='records'``.
11141 |
11142 | .. versionchanged:: 0.20.0
11143 |
11144 | date_format : {None, 'epoch', 'iso'}
11145 | Type of date conversion. 'epoch' = epoch milliseconds,
11146 | 'iso' = ISO8601. The default depends on the `orient`. For
11147 | ``orient='table'``, the default is 'iso'. For all other orients,
11148 | the default is 'epoch'.
11149 | double_precision : int, default 10
11150 | The number of decimal places to use when encoding
11151 | floating point values.
11152 | force_ascii : boolean, default True
11153 | Force encoded string to be ASCII.
11154 | date_unit : string, default 'ms' (milliseconds)
11155 | The time unit to encode to, governs timestamp and ISO8601
11156 | precision. One of 's', 'ms', 'us', 'ns' for second, millisecond,
11157 | microsecond, and nanosecond respectively.
11158 | default_handler : callable, default None
11159 | Handler to call if object cannot otherwise be converted to a
11160 | suitable format for JSON. Should receive a single argument which is
11161 | the object to convert and return a serialisable object.
11162 | lines : boolean, default False
11163 | If `orient` is 'records', write out line-delimited JSON. A
11164 | ValueError is raised for any other `orient`, since the other
11165 | formats are not list-like.
11166 |
11167 | .. versionadded:: 0.19.0
11168 |
11169 | compression : {None, 'gzip', 'bz2', 'zip', 'xz'}
11170 | A string representing the compression to use in the output file,
11171 | only used when the first argument is a filename.
11172 |
11173 | .. versionadded:: 0.21.0
11174 |
11175 | index : boolean, default True
11176 | Whether to include the index values in the JSON string. Not
11177 | including the index (``index=False``) is only supported when
11178 | orient is 'split' or 'table'.
11179 |
11180 | .. versionadded:: 0.23.0
11181 |
11182 | See Also
11183 | --------
11184 | pandas.read_json
11185 |
11186 | Examples
11187 | --------
11188 |
11189 | >>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
11190 | ... index=['row 1', 'row 2'],
11191 | ... columns=['col 1', 'col 2'])
11192 | >>> df.to_json(orient='split')
11193 | '{"columns":["col 1","col 2"],
11194 | "index":["row 1","row 2"],
11195 | "data":[["a","b"],["c","d"]]}'
11196 |
11197 | Encoding/decoding a DataFrame using ``'records'`` formatted JSON.
11198 | Note that index labels are not preserved with this encoding.
11199 |
11200 | >>> df.to_json(orient='records')
11201 | '[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
11202 |
11203 | Encoding/decoding a DataFrame using ``'index'`` formatted JSON:
11204 |
11205 | >>> df.to_json(orient='index')
11206 | '{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
11207 |
11208 | Encoding/decoding a DataFrame using ``'columns'`` formatted JSON:
11209 |
11210 | >>> df.to_json(orient='columns')
11211 | '{"col 1":{"row 1":"a","row 2":"c"},"col 2":{"row 1":"b","row 2":"d"}}'
11212 |
11213 | Encoding/decoding a DataFrame using ``'values'`` formatted JSON:
11214 |
11215 | >>> df.to_json(orient='values')
11216 | '[["a","b"],["c","d"]]'
11217 |
11218 | Encoding with Table Schema
11219 |
11220 | >>> df.to_json(orient='table')
11221 | '{"schema": {"fields": [{"name": "index", "type": "string"},
11222 | {"name": "col 1", "type": "string"},
11223 | {"name": "col 2", "type": "string"}],
11224 | "primaryKey": "index",
11225 | "pandas_version": "0.20.0"},
11226 | "data": [{"index": "row 1", "col 1": "a", "col 2": "b"},
11227 | {"index": "row 2", "col 1": "c", "col 2": "d"}]}'
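The `lines` and `index` parameters of `to_json` are described above but not exercised in the examples. The sketch below reuses the same two-row frame and parses the output with the standard-library `json` module so that the result does not depend on exact string formatting.

```python
import json

import pandas as pd

df = pd.DataFrame([["a", "b"], ["c", "d"]],
                  index=["row 1", "row 2"],
                  columns=["col 1", "col 2"])

# One JSON object per line; only valid together with orient='records'.
text = df.to_json(orient="records", lines=True)
records = [json.loads(line) for line in text.splitlines() if line]

# index=False drops the index, which is supported for orient='split'.
split_no_index = json.loads(df.to_json(orient="split", index=False))

# Any other orient combined with lines=True is rejected.
try:
    df.to_json(orient="columns", lines=True)
except ValueError as exc:
    print("ValueError:", exc)

print(records)
print(split_no_index)
```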
11228 |
11229 | to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None)
11230 | Render an object to a tabular environment table. You can splice
11231 | this into a LaTeX document. Requires \usepackage{booktabs}.
11232 |
11233 | .. versionchanged:: 0.20.2
11234 | Added to Series
11235 |
11236 | `to_latex`-specific options:
11237 |
11238 | bold_rows : boolean, default False
11239 | Make the row labels bold in the output
11240 | column_format : str, default None
11241 | The columns format as specified in `LaTeX table format
11242 | <https://en.wikibooks.org/wiki/LaTeX/Tables>`__, e.g. 'rcl' for 3
11243 | columns
11244 | longtable : boolean, default will be read from the pandas config module
11245 | Default: False.
11246 | Use a longtable environment instead of tabular. Requires adding
11247 | a \usepackage{longtable} to your LaTeX preamble.
11248 | escape : boolean, default will be read from the pandas config module
11249 | Default: True.
11250 | When set to False, LaTeX special characters in column names
11251 | are not escaped.
11252 | encoding : str, default None
11253 | A string representing the encoding to use in the output file,
11254 | defaults to 'ascii' on Python 2 and 'utf-8' on Python 3.
11255 | decimal : string, default '.'
11256 | Character recognized as decimal separator, e.g. ',' in Europe.
11257 |
11258 | .. versionadded:: 0.18.0
11259 |
11260 | multicolumn : boolean, default True
11261 | Use \multicolumn to enhance MultiIndex columns.
11262 | The default will be read from the config module.
11263 |
11264 | .. versionadded:: 0.20.0
11265 |
11266 | multicolumn_format : str, default 'l'
11267 | The alignment for multicolumns, similar to `column_format`
11268 | The default will be read from the config module.
11269 |
11270 | .. versionadded:: 0.20.0
11271 |
11272 | multirow : boolean, default False
11273 | Use \multirow to enhance MultiIndex rows.
11274 | Requires adding a \usepackage{multirow} to your LaTeX preamble.
11275 | Will print centered labels (instead of top-aligned)
11276 | across the contained rows, separating groups via clines.
11277 | The default will be read from the pandas config module.
11278 |
11279 | .. versionadded:: 0.20.0
11280 |
11281 | to_msgpack(self, path_or_buf=None, encoding='utf-8', **kwargs)
11282 | msgpack (serialize) object to input file path
11283 |
11284 | THIS IS AN EXPERIMENTAL LIBRARY and the storage format
11285 | may not be stable until a future release.
11286 |
11287 | Parameters
11288 | ----------
11289 | path : string file path, buffer-like, or None
11290 | If None, return the generated string.
11291 | append : boolean, default False
11292 | Whether to append to an existing msgpack.
11293 | compress : {'zlib', 'blosc', None}, default None
11294 | Type of compressor to use. None means no compression.
11295 |
11296 | to_pickle(self, path, compression='infer', protocol=4)
11297 | Pickle (serialize) object to file.
11298 |
11299 | Parameters
11300 | ----------
11301 | path : str
11302 | File path where the pickled object will be stored.
11303 | compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
11304 | A string representing the compression to use in the output file. By
11305 | default, infers from the file extension in specified path.
11306 |
11307 | .. versionadded:: 0.20.0
11308 | protocol : int
11309 | Int which indicates which protocol should be used by the pickler,
11310 | default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
11311 | values for this parameter depend on the version of Python. For
11312 | Python 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a
11313 | valid value. For Python >= 3.4, 4 is a valid value. A negative
11314 | value for the protocol parameter is equivalent to setting its value
11315 | to HIGHEST_PROTOCOL.
11316 |
11317 | .. [1] https://docs.python.org/3/library/pickle.html
11318 | .. versionadded:: 0.21.0
11319 |
11320 | See Also
11321 | --------
11322 | read_pickle : Load pickled pandas object (or any object) from file.
11323 | DataFrame.to_hdf : Write DataFrame to an HDF5 file.
11324 | DataFrame.to_sql : Write DataFrame to a SQL database.
11325 | DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
11326 |
11327 | Examples
11328 | --------
11329 | >>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
11330 | >>> original_df
11331 | foo bar
11332 | 0 0 5
11333 | 1 1 6
11334 | 2 2 7
11335 | 3 3 8
11336 | 4 4 9
11337 | >>> original_df.to_pickle("./dummy.pkl")
11338 |
11339 | >>> unpickled_df = pd.read_pickle("./dummy.pkl")
11340 | >>> unpickled_df
11341 | foo bar
11342 | 0 0 5
11343 | 1 1 6
11344 | 2 2 7
11345 | 3 3 8
11346 | 4 4 9
11347 |
11348 | >>> import os
11349 | >>> os.remove("./dummy.pkl")
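The `compression='infer'` default can be illustrated with a gzip suffix. This is a sketch under the assumption that a temporary directory is writable; the file name `dummy.pkl.gz` is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dummy.pkl.gz")  # hypothetical file name

    # compression='infer' (the default) picks gzip from the '.gz' suffix.
    df.to_pickle(path)

    # gzip streams start with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as fh:
        magic = fh.read(2)

    # read_pickle infers the compression from the suffix in the same way.
    restored = pd.read_pickle(path)

print(magic == b"\x1f\x8b", restored.equals(df))
```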
11350 |
11351 | to_sql(self, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)
11352 | Write records stored in a DataFrame to a SQL database.
11353 |
11354 | Databases supported by SQLAlchemy [1]_ are supported. Tables can be
11355 | newly created, appended to, or overwritten.
11356 |
11357 | Parameters
11358 | ----------
11359 | name : string
11360 | Name of SQL table.
11361 | con : sqlalchemy.engine.Engine or sqlite3.Connection
11362 | Using SQLAlchemy makes it possible to use any DB supported by that
11363 | library. Legacy support is provided for sqlite3.Connection objects.
11364 | schema : string, optional
11365 | Specify the schema (if database flavor supports this). If None, use
11366 | default schema.
11367 | if_exists : {'fail', 'replace', 'append'}, default 'fail'
11368 | How to behave if the table already exists.
11369 |
11370 | * fail: Raise a ValueError.
11371 | * replace: Drop the table before inserting new values.
11372 | * append: Insert new values to the existing table.
11373 |
11374 | index : boolean, default True
11375 | Write DataFrame index as a column. Uses `index_label` as the column
11376 | name in the table.
11377 | index_label : string or sequence, default None
11378 | Column label for index column(s). If None is given (default) and
11379 | `index` is True, then the index names are used.
11380 | A sequence should be given if the DataFrame uses MultiIndex.
11381 | chunksize : int, optional
11382 | Rows will be written in batches of this size at a time. By default,
11383 | all rows will be written at once.
11384 | dtype : dict, optional
11385 | Specifying the datatype for columns. The keys should be the column
11386 | names and the values should be the SQLAlchemy types or strings for
11387 | the sqlite3 legacy mode.
11388 |
11389 | Raises
11390 | ------
11391 | ValueError
11392 | When the table already exists and `if_exists` is 'fail' (the
11393 | default).
11394 |
11395 | See Also
11396 | --------
11397 | pandas.read_sql : Read a DataFrame from a table.
11398 |
11399 | References
11400 | ----------
11401 | .. [1] http://docs.sqlalchemy.org
11402 | .. [2] https://www.python.org/dev/peps/pep-0249/
11403 |
11404 | Examples
11405 | --------
11406 |
11407 | Create an in-memory SQLite database.
11408 |
11409 | >>> from sqlalchemy import create_engine
11410 | >>> engine = create_engine('sqlite://', echo=False)
11411 |
11412 | Create a table from scratch with 3 rows.
11413 |
11414 | >>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
11415 | >>> df
11416 | name
11417 | 0 User 1
11418 | 1 User 2
11419 | 2 User 3
11420 |
11421 | >>> df.to_sql('users', con=engine)
11422 | >>> engine.execute("SELECT * FROM users").fetchall()
11423 | [(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]
11424 |
11425 | >>> df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
11426 | >>> df1.to_sql('users', con=engine, if_exists='append')
11427 | >>> engine.execute("SELECT * FROM users").fetchall()
11428 | [(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
11429 | (0, 'User 4'), (1, 'User 5')]
11430 |
11431 | Overwrite the table with just ``df1``.
11432 |
11433 | >>> df1.to_sql('users', con=engine, if_exists='replace',
11434 | ... index_label='id')
11435 | >>> engine.execute("SELECT * FROM users").fetchall()
11436 | [(0, 'User 4'), (1, 'User 5')]
11437 |
11438 | Specify the dtype (especially useful for integers with missing values).
11439 | Notice that while pandas is forced to store the data as floating point,
11440 | the database supports nullable integers. When fetching the data with
11441 | Python, we get back integer scalars.
11442 |
11443 | >>> df = pd.DataFrame({"A": [1, None, 2]})
11444 | >>> df
11445 | A
11446 | 0 1.0
11447 | 1 NaN
11448 | 2 2.0
11449 |
11450 | >>> from sqlalchemy.types import Integer
11451 | >>> df.to_sql('integers', con=engine, index=False,
11452 | ... dtype={"A": Integer()})
11453 |
11454 | >>> engine.execute("SELECT * FROM integers").fetchall()
11455 | [(1,), (None,), (2,)]
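The `con` parameter description mentions legacy support for `sqlite3.Connection` objects. The following sketch shows the three `if_exists` outcomes with only the standard-library driver; it is an illustration under that assumption, not a replacement for the SQLAlchemy examples above, and the table contents are made up.

```python
import sqlite3

import pandas as pd

# In-memory database through the standard-library driver.
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({"name": ["User 1", "User 2", "User 3"]})
df.to_sql("users", con=conn, index=False)

# The table exists now, so the default if_exists='fail' raises ValueError.
try:
    df.to_sql("users", con=conn, index=False)
    raised = False
except ValueError:
    raised = True

# if_exists='append' adds rows to the existing table instead.
pd.DataFrame({"name": ["User 4"]}).to_sql(
    "users", con=conn, index=False, if_exists="append")

rows = conn.execute("SELECT name FROM users ORDER BY name").fetchall()
conn.close()
print(raised, rows)
```

Passing `index=False` keeps the DataFrame index out of the table, so the query returns only the `name` column.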
11456 |
11457 | to_xarray(self)
11458 | Return an xarray object from the pandas object.
11459 |
11460 | Returns
11461 | -------
11462 | a DataArray for a Series
11463 | a Dataset for a DataFrame
11464 | a DataArray for higher dims
11465 |
11466 | Examples
11467 | --------
11468 | >>> df = pd.DataFrame({'A' : [1, 1, 2],
11469 | ... 'B' : ['foo', 'bar', 'foo'],
11470 | ... 'C' : np.arange(4.,7)})
11471 | >>> df
11472 | A B C
11473 | 0 1 foo 4.0
11474 | 1 1 bar 5.0
11475 | 2 2 foo 6.0
11476 |
11477 | >>> df.to_xarray()
11478 | <xarray.Dataset>
11479 | Dimensions: (index: 3)
11480 | Coordinates:
11481 | * index (index) int64 0 1 2
11482 | Data variables:
11483 | A (index) int64 1 1 2
11484 | B (index) object 'foo' 'bar' 'foo'
11485 | C (index) float64 4.0 5.0 6.0
11486 |
11487 | >>> df = pd.DataFrame({'A' : [1, 1, 2],
11488 | ... 'B' : ['foo', 'bar', 'foo'],
11489 | ... 'C' : np.arange(4.,7)}
11490 | ... ).set_index(['B','A'])
11491 | >>> df
11492 | C
11493 | B A
11494 | foo 1 4.0
11495 | bar 1 5.0
11496 | foo 2 6.0
11497 |
11498 | >>> df.to_xarray()
11499 | <xarray.Dataset>
11500 | Dimensions: (A: 2, B: 2)
11501 | Coordinates:
11502 | * B (B) object 'bar' 'foo'
11503 | * A (A) int64 1 2
11504 | Data variables:
11505 | C (B, A) float64 5.0 nan 4.0 6.0
11506 |
11507 | >>> p = pd.Panel(np.arange(24).reshape(4,3,2),
11508 | ... items=list('ABCD'),
11509 | ... major_axis=pd.date_range('20130101', periods=3),
11510 | ... minor_axis=['first', 'second'])
11511 | >>> p
11512 | <class 'pandas.core.panel.Panel'>
11513 | Dimensions: 4 (items) x 3 (major_axis) x 2 (minor_axis)
11514 | Items axis: A to D
11515 | Major_axis axis: 2013-01-01 00:00:00 to 2013-01-03 00:00:00
11516 | Minor_axis axis: first to second
11517 |
11518 | >>> p.to_xarray()
11519 | <xarray.DataArray (items: 4, major_axis: 3, minor_axis: 2)>
11520 | array([[[ 0, 1],
11521 | [ 2, 3],
11522 | [ 4, 5]],
11523 | [[ 6, 7],
11524 | [ 8, 9],
11525 | [10, 11]],
11526 | [[12, 13],
11527 | [14, 15],
11528 | [16, 17]],
11529 | [[18, 19],
11530 | [20, 21],
11531 | [22, 23]]])
11532 | Coordinates:
11533 | * items (items) object 'A' 'B' 'C' 'D'
11534 | * major_axis (major_axis) datetime64[ns] 2013-01-01 2013-01-02 2013-01-03
11535 | * minor_axis (minor_axis) object 'first' 'second'
11536 |
11537 | Notes
11538 | -----
11539 | See the `xarray docs <http://xarray.pydata.org/en/stable/>`__
11540 |
11541 | truncate(self, before=None, after=None, axis=None, copy=True)
11542 | Truncate a Series or DataFrame before and after some index value.
11543 |
11544 | This is a useful shorthand for boolean indexing based on index
11545 | values above or below certain thresholds.
11546 |
11547 | Parameters
11548 | ----------
11549 | before : date, string, int
11550 | Truncate all rows before this index value.
11551 | after : date, string, int
11552 | Truncate all rows after this index value.
11553 | axis : {0 or 'index', 1 or 'columns'}, optional
11554 | Axis to truncate. Truncates the index (rows) by default.
11555 | copy : boolean, default True
11556 | Return a copy of the truncated section.
11557 |
11558 | Returns
11559 | -------
11560 | type of caller
11561 | The truncated Series or DataFrame.
11562 |
11563 | See Also
11564 | --------
11565 | DataFrame.loc : Select a subset of a DataFrame by label.
11566 | DataFrame.iloc : Select a subset of a DataFrame by position.
11567 |
11568 | Notes
11569 | -----
11570 | If the index being truncated contains only datetime values,
11571 | `before` and `after` may be specified as strings instead of
11572 | Timestamps.
11573 |
11574 | Examples
11575 | --------
11576 | >>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
11577 | ... 'B': ['f', 'g', 'h', 'i', 'j'],
11578 | ... 'C': ['k', 'l', 'm', 'n', 'o']},
11579 | ... index=[1, 2, 3, 4, 5])
11580 | >>> df
11581 | A B C
11582 | 1 a f k
11583 | 2 b g l
11584 | 3 c h m
11585 | 4 d i n
11586 | 5 e j o
11587 |
11588 | >>> df.truncate(before=2, after=4)
11589 | A B C
11590 | 2 b g l
11591 | 3 c h m
11592 | 4 d i n
11593 |
11594 | The columns of a DataFrame can be truncated.
11595 |
11596 | >>> df.truncate(before="A", after="B", axis="columns")
11597 | A B
11598 | 1 a f
11599 | 2 b g
11600 | 3 c h
11601 | 4 d i
11602 | 5 e j
11603 |
11604 | For Series, only rows can be truncated.
11605 |
11606 | >>> df['A'].truncate(before=2, after=4)
11607 | 2 b
11608 | 3 c
11609 | 4 d
11610 | Name: A, dtype: object
11611 |
11612 | The index values in ``truncate`` can be datetimes or string
11613 | dates.
11614 |
11615 | >>> dates = pd.date_range('2016-01-01', '2016-02-01', freq='s')
11616 | >>> df = pd.DataFrame(index=dates, data={'A': 1})
11617 | >>> df.tail()
11618 | A
11619 | 2016-01-31 23:59:56 1
11620 | 2016-01-31 23:59:57 1
11621 | 2016-01-31 23:59:58 1
11622 | 2016-01-31 23:59:59 1
11623 | 2016-02-01 00:00:00 1
11624 |
11625 | >>> df.truncate(before=pd.Timestamp('2016-01-05'),
11626 | ... after=pd.Timestamp('2016-01-10')).tail()
11627 | A
11628 | 2016-01-09 23:59:56 1
11629 | 2016-01-09 23:59:57 1
11630 | 2016-01-09 23:59:58 1
11631 | 2016-01-09 23:59:59 1
11632 | 2016-01-10 00:00:00 1
11633 |
11634 | Because the index is a DatetimeIndex containing only dates, we can
11635 | specify `before` and `after` as strings. They will be coerced to
11636 | Timestamps before truncation.
11637 |
11638 | >>> df.truncate('2016-01-05', '2016-01-10').tail()
11639 | A
11640 | 2016-01-09 23:59:56 1
11641 | 2016-01-09 23:59:57 1
11642 | 2016-01-09 23:59:58 1
11643 | 2016-01-09 23:59:59 1
11644 | 2016-01-10 00:00:00 1
11645 |
11646 | Note that ``truncate`` assumes a 0 value for any unspecified time
11647 | component (midnight). This differs from partial string slicing, which
11648 | returns any partially matching dates.
11649 |
11650 | >>> df.loc['2016-01-05':'2016-01-10', :].tail()
11651 | A
11652 | 2016-01-10 23:59:55 1
11653 | 2016-01-10 23:59:56 1
11654 | 2016-01-10 23:59:57 1
11655 | 2016-01-10 23:59:58 1
11656 | 2016-01-10 23:59:59 1
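
The difference between ``truncate`` and partial string slicing is easier to see on a small index. The following is a minimal sketch, not part of the original docstring; the hourly three-day index and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Hourly index covering three full days. A Timedelta is used as the
# frequency so that no particular frequency-alias string is assumed.
idx = pd.date_range("2016-01-01", periods=72, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"A": 1}, index=idx)

# truncate: '2016-01-02' is read as 2016-01-02 00:00:00, so only the
# midnight row of the second day is kept.
truncated = df.truncate(before="2016-01-01", after="2016-01-02")

# Partial string slicing: '2016-01-02' matches every row on that day.
sliced = df.loc["2016-01-01":"2016-01-02"]

print(len(truncated), truncated.index[-1])
print(len(sliced), sliced.index[-1])
```

Here ``truncated`` stops at midnight of the second day, while ``sliced`` runs through the last hour of that day.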
11657 |
11658 | tshift(self, periods=1, freq=None, axis=0)
11659 | Shift the time index, using the index's frequency if available.
11660 |
11661 | Parameters
11662 | ----------
11663 | periods : int
11664 | Number of periods to move, can be positive or negative
11665 | freq : DateOffset, timedelta, or time rule string, default None
11666 | Increment to use from the tseries module or time rule (e.g. 'EOM')
11667 | axis : int or basestring
11668 | Corresponds to the axis that contains the Index
11669 |
11670 | Notes
11671 | -----
11672 | If freq is not specified then tries to use the freq or inferred_freq
11673 | attributes of the index. If neither of those attributes exists, a
11674 | ValueError is raised.
11675 |
11676 | Returns
11677 | -------
11678 | shifted : NDFrame
11679 |
11680 | tz_convert(self, tz, axis=0, level=None, copy=True)
11681 | Convert tz-aware axis to target time zone.
11682 |
11683 | Parameters
11684 | ----------
11685 | tz : string or pytz.timezone object
11686 | axis : the axis to convert
11687 | level : int, str, default None
11688 | If axis is a MultiIndex, convert a specific level. Otherwise
11689 | must be None
11690 | copy : boolean, default True
11691 | Also make a copy of the underlying data
11692 |
11693 | Returns
11694 | -------
11695 |
11696 | Raises
11697 | ------
11698 | TypeError
11699 | If the axis is tz-naive.
11700 |
11701 | tz_localize(self, tz, axis=0, level=None, copy=True, ambiguous='raise')
11702 | Localize tz-naive TimeSeries to target time zone.
11703 |
11704 | Parameters
11705 | ----------
11706 | tz : string or pytz.timezone object
11707 | axis : the axis to localize
11708 | level : int, str, default None
11709 | If axis is a MultiIndex, localize a specific level. Otherwise
11710 | must be None
11711 | copy : boolean, default True
11712 | Also make a copy of the underlying data
11713 | ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
11714 | - 'infer' will attempt to infer fall dst-transition hours based on
11715 | order
11716 | - bool-ndarray where True signifies a DST time, False designates
11717 | a non-DST time (note that this flag is only applicable for
11718 | ambiguous times)
11719 | - 'NaT' will return NaT where there are ambiguous times
11720 | - 'raise' will raise an AmbiguousTimeError if there are ambiguous
11721 | times
11722 |
11723 | Returns
11724 | -------
11725 |
11726 | Raises
11727 | ------
11728 | TypeError
11729 | If the TimeSeries is tz-aware and tz is not None.
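
Neither ``tz_convert`` nor ``tz_localize`` ships with an example above, so here is a minimal sketch of how the two are typically combined. The series, the dates and the choice of the 'Europe/Berlin' zone are assumptions made for illustration only.

```python
import pandas as pd

# A tz-naive hourly index (hypothetical data).
naive_index = pd.date_range("2018-03-01 09:00", periods=3,
                            freq=pd.Timedelta(hours=1))
s = pd.Series([1, 2, 3], index=naive_index)

# tz_convert needs a tz-aware axis, so this raises TypeError.
try:
    s.tz_convert("UTC")
    convert_failed = False
except TypeError:
    convert_failed = True

# Attach a time zone first, then convert to another one.
utc = s.tz_localize("UTC")
berlin = utc.tz_convert("Europe/Berlin")

# tz_localize on an axis that is already tz-aware raises TypeError.
try:
    utc.tz_localize("Europe/Berlin")
    relocalize_failed = False
except TypeError:
    relocalize_failed = True

print(berlin)
```

``tz_localize`` attaches a zone without changing the wall-clock readings, whereas ``tz_convert`` keeps the underlying instants and changes how they are displayed.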
11730 |
11731 | where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)
11732 | Return an object of same shape as self and whose corresponding
11733 | entries are from self where `cond` is True and otherwise are from
11734 | `other`.
11735 |
11736 | Parameters
11737 | ----------
11738 | cond : boolean NDFrame, array-like, or callable
11739 | Where `cond` is True, keep the original value. Where
11740 | False, replace with corresponding value from `other`.
11741 | If `cond` is callable, it is computed on the NDFrame and
11742 | should return boolean NDFrame or array. The callable must
11743 | not change input NDFrame (though pandas doesn't check it).
11744 |
11745 | .. versionadded:: 0.18.1
11746 | A callable can be used as cond.
11747 |
11748 | other : scalar, NDFrame, or callable
11749 | Entries where `cond` is False are replaced with
11750 | corresponding value from `other`.
11751 | If other is callable, it is computed on the NDFrame and
11752 | should return scalar or NDFrame. The callable must not
11753 | change input NDFrame (though pandas doesn't check it).
11754 |
11755 | .. versionadded:: 0.18.1
11756 | A callable can be used as other.
11757 |
11758 | inplace : boolean, default False
11759 | Whether to perform the operation in place on the data
11760 | axis : alignment axis if needed, default None
11761 | level : alignment level if needed, default None
11762 | errors : str, {'raise', 'ignore'}, default 'raise'
11763 | - ``raise`` : allow exceptions to be raised
11764 | - ``ignore`` : suppress exceptions. On error return original object
11765 |
11766 | Note that currently this parameter won't affect
11767 | the results and will always coerce to a suitable dtype.
11768 |
11769 | try_cast : boolean, default False
11770 | Try to cast the result back to the input type (if possible).
11771 | raise_on_error : boolean, default True
11772 | Whether to raise on invalid data types (e.g. trying to where on
11773 | strings)
11774 |
11775 | .. deprecated:: 0.21.0
11776 |
11777 | Returns
11778 | -------
11779 | wh : same type as caller
11780 |
11781 | Notes
11782 | -----
11783 | The where method is an application of the if-then idiom. For each
11784 | element in the calling DataFrame, if ``cond`` is ``True`` the
11785 | element is used; otherwise the corresponding element from the DataFrame
11786 | ``other`` is used.
11787 |
11788 | The signature for :func:`DataFrame.where` differs from
11789 | :func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
11790 | ``np.where(m, df1, df2)``.
11791 |
11792 | For further details and examples see the ``where`` documentation in
11793 | :ref:`indexing <indexing.where_mask>`.
11794 |
11795 | Examples
11796 | --------
11797 | >>> s = pd.Series(range(5))
11798 | >>> s.where(s > 0)
11799 | 0 NaN
11800 | 1 1.0
11801 | 2 2.0
11802 | 3 3.0
11803 | 4 4.0
11804 |
11805 | >>> s.mask(s > 0)
11806 | 0 0.0
11807 | 1 NaN
11808 | 2 NaN
11809 | 3 NaN
11810 | 4 NaN
11811 |
11812 | >>> s.where(s > 1, 10)
11813 | 0 10.0
11814 | 1 10.0
11815 | 2 2.0
11816 | 3 3.0
11817 | 4 4.0
11818 |
11819 | >>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
11820 | >>> m = df % 3 == 0
11821 | >>> df.where(m, -df)
11822 | A B
11823 | 0 0 -1
11824 | 1 -2 3
11825 | 2 -4 -5
11826 | 3 6 -7
11827 | 4 -8 9
11828 | >>> df.where(m, -df) == np.where(m, df, -df)
11829 | A B
11830 | 0 True True
11831 | 1 True True
11832 | 2 True True
11833 | 3 True True
11834 | 4 True True
11835 | >>> df.where(m, -df) == df.mask(~m, -df)
11836 | A B
11837 | 0 True True
11838 | 1 True True
11839 | 2 True True
11840 | 3 True True
11841 | 4 True True
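
The parameter descriptions say that ``cond`` and ``other`` may be callables, but the examples only pass arrays and scalars. The sketch below illustrates the callable form; the frame and the lambdas are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Both callables receive the calling DataFrame. Values greater than 2
# are kept; the rest are replaced by ten times the original value.
result = df.where(lambda d: d > 2, lambda d: d * 10)

# mask is the inverse: it replaces where the condition is True.
same = df.mask(lambda d: d <= 2, lambda d: d * 10)

print(result)
```

Callables are mainly convenient in method chains, where the intermediate frame has no name of its own.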
11842 |
11843 | See Also
11844 | --------
11845 | :func:`DataFrame.mask`
11846 |
11847 | xs(self, key, axis=0, level=None, drop_level=True)
11848 | Returns a cross-section (row(s) or column(s)) from the
11849 | Series/DataFrame. Defaults to cross-section on the rows (axis=0).
11850 |
11851 | Parameters
11852 | ----------
11853 | key : object
11854 | Some label contained in the index, or partially in a MultiIndex
11855 | axis : int, default 0
11856 | Axis to retrieve cross-section on
11857 | level : object, defaults to first n levels (n=1 or len(key))
11858 | In case of a key partially contained in a MultiIndex, indicate
11859 | which levels are used. Levels can be referred by label or position.
11860 | drop_level : boolean, default True
11861 | If False, returns object with same levels as self.
11862 |
11863 | Examples
11864 | --------
11865 | >>> df
11866 | A B C
11867 | a 4 5 2
11868 | b 4 0 9
11869 | c 9 7 3
11870 | >>> df.xs('a')
11871 | A 4
11872 | B 5
11873 | C 2
11874 | Name: a
11875 | >>> df.xs('C', axis=1)
11876 | a 2
11877 | b 9
11878 | c 3
11879 | Name: C
11880 |
11881 | >>> df
11882 | A B C D
11883 | first second third
11884 | bar one 1 4 1 8 9
11885 | two 1 7 5 5 0
11886 | baz one 1 6 6 8 0
11887 | three 2 5 3 5 3
11888 | >>> df.xs(('baz', 'three'))
11889 | A B C D
11890 | third
11891 | 2 5 3 5 3
11892 | >>> df.xs('one', level=1)
11893 | A B C D
11894 | first third
11895 | bar 1 4 1 8 9
11896 | baz 1 6 6 8 0
11897 | >>> df.xs(('baz', 2), level=[0, 'third'])
11898 | A B C D
11899 | second
11900 | three 5 3 5 3
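
The examples above print ``df`` without showing how it was built. The following sketch reconstructs a frame with the same labels and values as the MultiIndex example (the construction itself is an assumption, since the original does not show it) and repeats a few of the lookups, including ``drop_level=False``.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one", 1), ("bar", "two", 1),
     ("baz", "one", 1), ("baz", "three", 2)],
    names=["first", "second", "third"],
)
df = pd.DataFrame(
    [[4, 1, 8, 9], [7, 5, 5, 0], [6, 6, 8, 0], [5, 3, 5, 3]],
    index=index, columns=list("ABCD"),
)

# Key on the first two levels; both are dropped from the result.
by_prefix = df.xs(("baz", "three"))

# Key on an inner level, referred to by name.
by_level = df.xs("one", level="second")

# Same selection, but keep every index level in the result.
kept = df.xs("one", level="second", drop_level=False)

# Cross-section on the columns.
col_c = df.xs("C", axis=1)

print(by_prefix, by_level, kept, sep="\n\n")
```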
11901 |
11902 | Returns
11903 | -------
11904 | xs : Series or DataFrame
11905 |
11906 | Notes
11907 | -----
11908 | xs is only for getting, not setting values.
11909 |
11910 | MultiIndex Slicers is a generic way to get/set values on any level or
11911 | levels. It is a superset of xs functionality, see
11912 | :ref:`MultiIndex Slicers <advanced.mi_slicers>`
11913 |
11914 | ----------------------------------------------------------------------
11915 | Data descriptors inherited from pandas.core.generic.NDFrame:
11916 |
11917 | at
11918 | Access a single value for a row/column label pair.
11919 |
11920 | Similar to ``loc``, in that both provide label-based lookups. Use
11921 | ``at`` if you only need to get or set a single value in a DataFrame
11922 | or Series.
11923 |
11924 | See Also
11925 | --------
11926 | DataFrame.iat : Access a single value for a row/column pair by integer
11927 | position
11928 | DataFrame.loc : Access a group of rows and columns by label(s)
11929 | Series.at : Access a single value using a label
11930 |
11931 | Examples
11932 | --------
11933 | >>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
11934 | ... index=[4, 5, 6], columns=['A', 'B', 'C'])
11935 | >>> df
11936 | A B C
11937 | 4 0 2 3
11938 | 5 0 4 1
11939 | 6 10 20 30
11940 |
11941 | Get value at specified row/column pair
11942 |
11943 | >>> df.at[4, 'B']
11944 | 2
11945 |
11946 | Set value at specified row/column pair
11947 |
11948 | >>> df.at[4, 'B'] = 10
11949 | >>> df.at[4, 'B']
11950 | 10
11951 |
11952 | Get value within a Series
11953 |
11954 | >>> df.loc[5].at['B']
11955 | 4
11956 |
11957 | Raises
11958 | ------
11959 | KeyError
11960 | When label does not exist in DataFrame
11961 |
11962 | blocks
11963 | Internal property, property synonym for as_blocks()
11964 |
11965 | .. deprecated:: 0.21.0
11966 |
11967 | dtypes
11968 | Return the dtypes in the DataFrame.
11969 |
11970 | This returns a Series with the data type of each column.
11971 | The result's index is the original DataFrame's columns. Columns
11972 | with mixed types are stored with the ``object`` dtype. See
11973 | :ref:`the User Guide <basics.dtypes>` for more.
11974 |
11975 | Returns
11976 | -------
11977 | pandas.Series
11978 | The data type of each column.
11979 |
11980 | See Also
11981 | --------
11982 | pandas.DataFrame.ftypes : dtype and sparsity information.
11983 |
11984 | Examples
11985 | --------
11986 | >>> df = pd.DataFrame({'float': [1.0],
11987 | ... 'int': [1],
11988 | ... 'datetime': [pd.Timestamp('20180310')],
11989 | ... 'string': ['foo']})
11990 | >>> df.dtypes
11991 | float float64
11992 | int int64
11993 | datetime datetime64[ns]
11994 | string object
11995 | dtype: object
11996 |
11997 | empty
11998 | Indicator whether DataFrame is empty.
11999 |
12000 | True if DataFrame is entirely empty (no items), meaning any of the
12001 | axes are of length 0.
12002 |
12003 | Returns
12004 | -------
12005 | bool
12006 | If DataFrame is empty, return True, if not return False.
12007 |
12008 | Notes
12009 | -----
12010 | If DataFrame contains only NaNs, it is still not considered empty. See
12011 | the example below.
12012 |
12013 | Examples
12014 | --------
12015 | An example of an actual empty DataFrame. Notice the index is empty:
12016 |
12017 | >>> df_empty = pd.DataFrame({'A' : []})
12018 | >>> df_empty
12019 | Empty DataFrame
12020 | Columns: [A]
12021 | Index: []
12022 | >>> df_empty.empty
12023 | True
12024 |
12025 | If we only have NaNs in our DataFrame, it is not considered empty! We
12026 | will need to drop the NaNs to make the DataFrame empty:
12027 |
12028 | >>> df = pd.DataFrame({'A' : [np.nan]})
12029 | >>> df
12030 | A
12031 | 0 NaN
12032 | >>> df.empty
12033 | False
12034 | >>> df.dropna().empty
12035 | True
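
Since a DataFrame counts as empty when any axis has length 0, a frame with columns but no rows and a frame with rows but no columns are both empty. A minimal sketch with hypothetical frames:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame(columns=["A", "B"])   # shape (0, 2)
no_cols = pd.DataFrame(index=[0, 1, 2])      # shape (3, 0)
only_nan = pd.DataFrame({"A": [np.nan]})     # shape (1, 1)

print(no_rows.empty, no_cols.empty, only_nan.empty)
```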
12036 |
12037 | See Also
12038 | --------
12039 | pandas.Series.dropna
12040 | pandas.DataFrame.dropna
12041 |
12042 | ftypes
12043 | Return the ftypes (indication of sparse/dense and dtype) in DataFrame.
12044 |
12045 | This returns a Series with the data type of each column.
12046 | The result's index is the original DataFrame's columns. Columns
12047 | with mixed types are stored with the ``object`` dtype. See
12048 | :ref:`the User Guide <basics.dtypes>` for more.
12049 |
12050 | Returns
12051 | -------
12052 | pandas.Series
12053 | The data type and indication of sparse/dense of each column.
12054 |
12055 | See Also
12056 | --------
12057 | pandas.DataFrame.dtypes : Series with just dtype information.
12058 | pandas.SparseDataFrame : Container for sparse tabular data.
12059 |
12060 | Notes
12061 | -----
12062 | Sparse data should have the same dtypes as its dense representation.
12063 |
12064 | Examples
12065 | --------
12066 | >>> import numpy as np
12067 | >>> arr = np.random.RandomState(0).randn(100, 4)
12068 | >>> arr[arr < .8] = np.nan
12069 | >>> pd.DataFrame(arr).ftypes
12070 | 0 float64:dense
12071 | 1 float64:dense
12072 | 2 float64:dense
12073 | 3 float64:dense
12074 | dtype: object
12075 |
12076 | >>> pd.SparseDataFrame(arr).ftypes
12077 | 0 float64:sparse
12078 | 1 float64:sparse
12079 | 2 float64:sparse
12080 | 3 float64:sparse
12081 | dtype: object
12082 |
12083 | iat
12084 | Access a single value for a row/column pair by integer position.
12085 |
12086 | Similar to ``iloc``, in that both provide integer-based lookups. Use
12087 | ``iat`` if you only need to get or set a single value in a DataFrame
12088 | or Series.
12089 |
12090 | See Also
12091 | --------
12092 | DataFrame.at : Access a single value for a row/column label pair
12093 | DataFrame.loc : Access a group of rows and columns by label(s)
12094 | DataFrame.iloc : Access a group of rows and columns by integer position(s)
12095 |
12096 | Examples
12097 | --------
12098 | >>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
12099 | ... columns=['A', 'B', 'C'])
12100 | >>> df
12101 | A B C
12102 | 0 0 2 3
12103 | 1 0 4 1
12104 | 2 10 20 30
12105 |
12106 | Get value at specified row/column pair
12107 |
12108 | >>> df.iat[1, 2]
12109 | 1
12110 |
12111 | Set value at specified row/column pair
12112 |
12113 | >>> df.iat[1, 2] = 10
12114 | >>> df.iat[1, 2]
12115 | 10
12116 |
12117 | Get value within a Series
12118 |
12119 | >>> df.loc[0].iat[1]
12120 | 2
12121 |
12122 | Raises
12123 | ------
12124 | IndexError
12125 | When integer position is out of bounds
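
The contrast between ``at`` (labels) and ``iat`` (positions) shows up as soon as the index is not the default ``0..n-1``. The sketch below reuses the frame from the ``at`` example; the failing lookups are added here for illustration.

```python
import pandas as pd

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=["A", "B", "C"])

by_label = df.at[4, "B"]      # row labelled 4, column labelled 'B'
by_position = df.iat[0, 1]    # first row, second column

# 0 is a valid position but not a label in this index.
try:
    df.at[0, "B"]
    at_failed = False
except KeyError:
    at_failed = True

# 4 is a valid label but not a valid position (only 3 rows).
try:
    df.iat[4, 1]
    iat_failed = False
except IndexError:
    iat_failed = True

print(by_label, by_position, at_failed, iat_failed)
```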
12126 |
12127 | iloc
12128 | Purely integer-location based indexing for selection by position.
12129 |
12130 | ``.iloc[]`` is primarily integer position based (from ``0`` to
12131 | ``length-1`` of the axis), but may also be used with a boolean
12132 | array.
12133 |
12134 | Allowed inputs are:
12135 |
12136 | - An integer, e.g. ``5``.
12137 | - A list or array of integers, e.g. ``[4, 3, 0]``.
12138 | - A slice object with ints, e.g. ``1:7``.
12139 | - A boolean array.
12140 | - A ``callable`` function with one argument (the calling Series, DataFrame
12141 | or Panel) and that returns valid output for indexing (one of the above)
12142 |
12143 | ``.iloc`` will raise ``IndexError`` if a requested indexer is
12144 | out-of-bounds, except *slice* indexers which allow out-of-bounds
12145 | indexing (this conforms with python/numpy *slice* semantics).
12146 |
12147 | See more at :ref:`Selection by Position <indexing.integer>`
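
The ``iloc`` entry lists the allowed inputs without examples. The following sketch walks through each of them on a small hypothetical frame and shows the out-of-bounds behaviour described above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
                  index=list("wxyz"))

row = df.iloc[1]                              # integer: one row as a Series
picked = df.iloc[[3, 0]]                      # list of integers, in that order
window = df.iloc[1:3]                         # slice: stop is excluded
masked = df.iloc[[True, False, True, False]]  # boolean array
called = df.iloc[lambda d: [0, 2]]            # callable returning an indexer

# A slice may run past the end of the axis; it is simply clipped.
clipped = df.iloc[2:10]

# A single out-of-bounds integer raises IndexError.
try:
    df.iloc[10]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(window, clipped, sep="\n\n")
```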
12148 |
12149 | is_copy
12150 |
12151 | ix
12152 | A primarily label-location based indexer, with integer position
12153 | fallback.
12154 |
12155 | Warning: Starting in 0.20.0, the .ix indexer is deprecated, in
12156 | favor of the more strict .iloc and .loc indexers.
12157 |
12158 | ``.ix[]`` supports mixed integer and label based access. It is
12159 | primarily label based, but will fall back to integer positional
12160 | access unless the corresponding axis is of integer type.
12161 |
12162 | ``.ix`` is the most general indexer and will support any of the
12163 | inputs in ``.loc`` and ``.iloc``. ``.ix`` also supports floating
12164 | point label schemes. ``.ix`` is exceptionally useful when dealing
12165 | with mixed positional and label based hierarchical indexes.
12166 |
12167 | However, when an axis is integer based, ONLY label based access
12168 | and not positional access is supported. Thus, in such cases, it's
12169 | usually better to be explicit and use ``.iloc`` or ``.loc``.
12170 |
12171 | See more at :ref:`Advanced Indexing <advanced>`.
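
Because ``.ix`` is deprecated, mixed positional and label access is usually rewritten in terms of ``.loc`` or ``.iloc``. The sketch below shows two common ways to select "first row by position, column by label"; it is an illustration with a hypothetical frame, not a recipe taken from this docstring.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=list("xyz"))

# Translate the position into a label, then use .loc.
via_loc = df.loc[df.index[0], "b"]

# Translate the label into a position, then use .iloc.
via_iloc = df.iloc[0, df.columns.get_loc("b")]

print(via_loc, via_iloc)
```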
12172 |
12173 | loc
12174 | Access a group of rows and columns by label(s) or a boolean array.
12175 |
12176 | ``.loc[]`` is primarily label based, but may also be used with a
12177 | boolean array.
12178 |
12179 | Allowed inputs are:
12180 |
12181 | - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
12182 | interpreted as a *label* of the index, and **never** as an
12183 | integer position along the index).
12184 | - A list or array of labels, e.g. ``['a', 'b', 'c']``.
12185 | - A slice object with labels, e.g. ``'a':'f'``.
12186 |
12187 | .. warning:: Note that contrary to usual python slices, **both** the
12188 | start and the stop are included
12189 |
12190 | - A boolean array of the same length as the axis being sliced,
12191 | e.g. ``[True, False, True]``.
12192 | - A ``callable`` function with one argument (the calling Series, DataFrame
12193 | or Panel) and that returns valid output for indexing (one of the above)
12194 |
12195 | See more at :ref:`Selection by Label <indexing.label>`
12196 |
12197 | See Also
12198 | --------
12199 | DataFrame.at : Access a single value for a row/column label pair
12200 | DataFrame.iloc : Access group of rows and columns by integer position(s)
12201 | DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the
12202 | Series/DataFrame.
12203 | Series.loc : Access group of values using labels
12204 |
12205 | Examples
12206 | --------
12207 | **Getting values**
12208 |
12209 | >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
12210 | ... index=['cobra', 'viper', 'sidewinder'],
12211 | ... columns=['max_speed', 'shield'])
12212 | >>> df
12213 | max_speed shield
12214 | cobra 1 2
12215 | viper 4 5
12216 | sidewinder 7 8
12217 |
12218 | Single label. Note this returns the row as a Series.
12219 |
12220 | >>> df.loc['viper']
12221 | max_speed 4
12222 | shield 5
12223 | Name: viper, dtype: int64
12224 |
12225 | List of labels. Note using ``[[]]`` returns a DataFrame.
12226 |
12227 | >>> df.loc[['viper', 'sidewinder']]
12228 | max_speed shield
12229 | viper 4 5
12230 | sidewinder 7 8
12231 |
12232 | Single label for row and column
12233 |
12234 | >>> df.loc['cobra', 'shield']
12235 | 2
12236 |
12237 | Slice with labels for row and single label for column. As mentioned
12238 | above, note that both the start and stop of the slice are included.
12239 |
12240 | >>> df.loc['cobra':'viper', 'max_speed']
12241 | cobra 1
12242 | viper 4
12243 | Name: max_speed, dtype: int64
12244 |
12245 | Boolean list with the same length as the row axis
12246 |
12247 | >>> df.loc[[False, False, True]]
12248 | max_speed shield
12249 | sidewinder 7 8
12250 |
12251 | Conditional that returns a boolean Series
12252 |
12253 | >>> df.loc[df['shield'] > 6]
12254 | max_speed shield
12255 | sidewinder 7 8
12256 |
12257 | Conditional that returns a boolean Series with column labels specified
12258 |
12259 | >>> df.loc[df['shield'] > 6, ['max_speed']]
12260 | max_speed
12261 | sidewinder 7
12262 |
12263 | Callable that returns a boolean Series
12264 |
12265 | >>> df.loc[lambda df: df['shield'] == 8]
12266 | max_speed shield
12267 | sidewinder 7 8
12268 |
12269 | **Setting values**
12270 |
12271 | Set value for all items matching the list of labels
12272 |
12273 | >>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
12274 | >>> df
12275 | max_speed shield
12276 | cobra 1 2
12277 | viper 4 50
12278 | sidewinder 7 50
12279 |
12280 | Set value for an entire row
12281 |
12282 | >>> df.loc['cobra'] = 10
12283 | >>> df
12284 | max_speed shield
12285 | cobra 10 10
12286 | viper 4 50
12287 | sidewinder 7 50
12288 |
12289 | Set value for an entire column
12290 |
12291 | >>> df.loc[:, 'max_speed'] = 30
12292 | >>> df
12293 | max_speed shield
12294 | cobra 30 10
12295 | viper 30 50
12296 | sidewinder 30 50
12297 |
12298 | Set value for rows matching a boolean condition
12299 |
12300 | >>> df.loc[df['shield'] > 35] = 0
12301 | >>> df
12302 | max_speed shield
12303 | cobra 30 10
12304 | viper 0 0
12305 | sidewinder 0 0
12306 |
12307 | **Getting values on a DataFrame with an index that has integer labels**
12308 |
12309 | Another example using integers for the index
12310 |
12311 | >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
12312 | ... index=[7, 8, 9], columns=['max_speed', 'shield'])
12313 | >>> df
12314 | max_speed shield
12315 | 7 1 2
12316 | 8 4 5
12317 | 9 7 8
12318 |
12319 | Slice with integer labels for rows. As mentioned above, note that both
12320 | the start and stop of the slice are included.
12321 |
12322 | >>> df.loc[7:9]
12323 | max_speed shield
12324 | 7 1 2
12325 | 8 4 5
12326 | 9 7 8
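
With integer labels it is easy to confuse label slices with positional slices. As a small sketch using the same frame, ``loc`` includes the stop label while ``iloc`` follows ordinary Python slicing and excludes the stop position:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[7, 8, 9], columns=["max_speed", "shield"])

by_label = df.loc[7:8]       # labels 7 and 8, stop included
by_position = df.iloc[0:1]   # position 0 only, stop excluded

print(by_label, by_position, sep="\n\n")
```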
12327 |
12328 | **Getting values with a MultiIndex**
12329 |
12330 | A number of examples using a DataFrame with a MultiIndex
12331 |
12332 | >>> tuples = [
12333 | ... ('cobra', 'mark i'), ('cobra', 'mark ii'),
12334 | ... ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
12335 | ... ('viper', 'mark ii'), ('viper', 'mark iii')
12336 | ... ]
12337 | >>> index = pd.MultiIndex.from_tuples(tuples)
12338 | >>> values = [[12, 2], [0, 4], [10, 20],
12339 | ... [1, 4], [7, 1], [16, 36]]
12340 | >>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
12341 | >>> df
12342 | max_speed shield
12343 | cobra mark i 12 2
12344 | mark ii 0 4
12345 | sidewinder mark i 10 20
12346 | mark ii 1 4
12347 | viper mark ii 7 1
12348 | mark iii 16 36
12349 |
12350 | Single label. Note this returns a DataFrame with a single index.
12351 |
12352 | >>> df.loc['cobra']
12353 | max_speed shield
12354 | mark i 12 2
12355 | mark ii 0 4
12356 |
12357 | Single index tuple. Note this returns a Series.
12358 |
12359 | >>> df.loc[('cobra', 'mark ii')]
12360 | max_speed 0
12361 | shield 4
12362 | Name: (cobra, mark ii), dtype: int64
12363 |
12364 | Single label for row and column. Similar to passing in a tuple, this
12365 | returns a Series.
12366 |
12367 | >>> df.loc['cobra', 'mark i']
12368 | max_speed 12
12369 | shield 2
12370 | Name: (cobra, mark i), dtype: int64
12371 |
12372 | Single tuple. Note using ``[[]]`` returns a DataFrame.
12373 |
12374 | >>> df.loc[[('cobra', 'mark ii')]]
12375 | max_speed shield
12376 | cobra mark ii 0 4
12377 |
12378 | Single tuple for the index with a single label for the column
12379 |
12380 | >>> df.loc[('cobra', 'mark i'), 'shield']
12381 | 2
12382 |
12383 | Slice from index tuple to single label
12384 |
12385 | >>> df.loc[('cobra', 'mark i'):'viper']
12386 | max_speed shield
12387 | cobra mark i 12 2
12388 | mark ii 0 4
12389 | sidewinder mark i 10 20
12390 | mark ii 1 4
12391 | viper mark ii 7 1
12392 | mark iii 16 36
12393 |
12394 | Slice from index tuple to index tuple
12395 |
12396 | >>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
12397 | max_speed shield
12398 | cobra mark i 12 2
12399 | mark ii 0 4
12400 | sidewinder mark i 10 20
12401 | mark ii 1 4
12402 | viper mark ii 7 1
12403 |
12404 | Raises
12405 | ------
12406 | KeyError
12407 | when any items are not found
12408 |
12409 | ndim
12410 | Return an int representing the number of axes / array dimensions.
12411 |
12412 | Return 1 if Series. Otherwise return 2 if DataFrame.
12413 |
12414 | See Also
12415 | --------
12416 | ndarray.ndim
12417 |
12418 | Examples
12419 | --------
12420 | >>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
12421 | >>> s.ndim
12422 | 1
12423 |
12424 | >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
12425 | >>> df.ndim
12426 | 2
12427 |
12428 | size
12429 | Return an int representing the number of elements in this object.
12430 |
12431 | Return the number of rows if Series. Otherwise return the number of
12432 | rows times number of columns if DataFrame.
12433 |
12434 | See Also
12435 | --------
12436 | ndarray.size
12437 |
12438 | Examples
12439 | --------
12440 | >>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
12441 | >>> s.size
12442 | 3
12443 |
12444 | >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
12445 | >>> df.size
12446 | 4
12447 |
12448 | values
12449 | Return a Numpy representation of the DataFrame.
12450 |
12451 | Only the values in the DataFrame will be returned, the axes labels
12452 | will be removed.
12453 |
12454 | Returns
12455 | -------
12456 | numpy.ndarray
12457 | The values of the DataFrame.
12458 |
12459 | Examples
12460 | --------
12461 | A DataFrame where all columns are the same type (e.g., int64) results
12462 | in an array of the same type.
12463 |
12464 | >>> df = pd.DataFrame({'age': [ 3, 29],
12465 | ... 'height': [94, 170],
12466 | ... 'weight': [31, 115]})
12467 | >>> df
12468 | age height weight
12469 | 0 3 94 31
12470 | 1 29 170 115
12471 | >>> df.dtypes
12472 | age int64
12473 | height int64
12474 | weight int64
12475 | dtype: object
12476 | >>> df.values
12477 | array([[ 3, 94, 31],
12478 | [ 29, 170, 115]], dtype=int64)
12479 |
12480 | A DataFrame with mixed type columns (e.g., str/object, int64, float32)
12481 | results in an ndarray of the broadest type that accommodates these
12482 | mixed types (e.g., object).
12483 |
12484 | >>> df2 = pd.DataFrame([('parrot', 24.0, 'second'),
12485 | ... ('lion', 80.5, 1),
12486 | ... ('monkey', np.nan, None)],
12487 | ... columns=('name', 'max_speed', 'rank'))
12488 | >>> df2.dtypes
12489 | name object
12490 | max_speed float64
12491 | rank object
12492 | dtype: object
12493 | >>> df2.values
12494 | array([['parrot', 24.0, 'second'],
12495 | ['lion', 80.5, 1],
12496 | ['monkey', nan, None]], dtype=object)
12497 |
12498 | Notes
12499 | -----
12500 | The dtype will be a lowest-common-denominator dtype (implicit
12501 | upcasting); that is to say if the dtypes (even of numeric types)
12502 | are mixed, the one that accommodates all will be chosen. Use this
12503 | with care if you are not dealing with the blocks.
12504 |
12505 | e.g. If the dtypes are float16 and float32, dtype will be upcast to
12506 | float32. If dtypes are int32 and uint8, dtype will be upcast to
12507 | int32. By :func:`numpy.find_common_type` convention, mixing int64
12508 | and uint64 will result in a float64 dtype.
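
The upcasting rules in the note can be checked directly. The sketch below builds small hypothetical frames with mixed column dtypes and inspects the dtype of ``.values``; it covers the float16/float32, int32/uint8 and numeric/string cases.

```python
import numpy as np
import pandas as pd

floats = pd.DataFrame({
    "a": np.array([1, 2], dtype=np.float16),
    "b": np.array([3, 4], dtype=np.float32),
})
ints = pd.DataFrame({
    "a": np.array([1, 2], dtype=np.int32),
    "b": np.array([3, 4], dtype=np.uint8),
})
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

print(floats.values.dtype)  # the wider float type
print(ints.values.dtype)    # the type that holds both integer ranges
print(mixed.values.dtype)   # object, since no numeric type fits strings
```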
12509 |
12510 | See Also
12511 | --------
12512 | pandas.DataFrame.index : Retrieve the index labels
12513 | pandas.DataFrame.columns : Retrieve the column names
12514 |
12515 | ----------------------------------------------------------------------
12516 | Methods inherited from pandas.core.base.PandasObject:
12517 |
12518 | __sizeof__(self)
12519 | Generates the total memory usage for an object that returns
12520 | either a value or Series of values
12521 |
12522 | ----------------------------------------------------------------------
12523 | Methods inherited from pandas.core.base.StringMixin:
12524 |
12525 | __bytes__(self)
12526 | Return a string representation for a particular object.
12527 |
12528 | Invoked by bytes(obj) in py3 only.
12529 | Yields a bytestring in both py2/py3.
12530 |
12531 | __repr__(self)
12532 | Return a string representation for a particular object.
12533 |
12534 | Yields Bytestring in Py2, Unicode String in py3.
12535 |
12536 | __str__(self)
12537 | Return a string representation for a particular object.
12538 |
12539 | Invoked by str(df) in both py2/py3.
12540 | Yields Bytestring in Py2, Unicode String in py3.
12541 |
12542 | ----------------------------------------------------------------------
12543 | Data descriptors inherited from pandas.core.base.StringMixin:
12544 |
12545 | __dict__
12546 | dictionary for instance variables (if defined)
12547 |
12548 | __weakref__
12549 | list of weak references to the object (if defined)
12550 |
12551 | ----------------------------------------------------------------------
12552 | Methods inherited from pandas.core.accessor.DirNamesMixin:
12553 |
12554 | __dir__(self)
12555 | Provide method name lookup and completion
12556 | Only provide 'public' methods